Authors: Juan Miguel Cejuela and Jorge Campos
tagtog is a web-based tool to fetch and edit manual and automatic annotations of biomedical literature. Multiple users can collaborate to create corpora of annotated text. Furthermore, automatic entity recognizers and automatic relationship extractors can be applied to accelerate the annotation process. The tool can be accessed interactively via the web interface or programmatically via a REST API. At the time of writing, the API is in an alpha, demo-like state. Current internal efforts at tagtog aim to document a formal API. We expect to finish this work before the hackathon, so it is beyond the scope of the BLAH2 hackathon. Instead, our objective for BLAH2 is to provide an official way to share annotations from tagtog with PubAnnotation. This includes 1) a conversion from tagtog's internal format to PubAnnotation format (callable from the API) and 2) a transparent way to store tagtog projects' annotations in PubAnnotation. As a result, the PubAnnotation archive of biomedical annotations will benefit from the growing annotation efforts performed on the tagtog editor. We have already implemented important parts of the needed code, and we gave a live demonstration of tagtog interoperating with PubAnnotation at the first edition of the hackathon, BLAH1. Thus, our goals are achievable within the BLAH hackathon week.
At the time of writing, tagtog can automatically extract the following entities: GGP (Gene or Gene Product; normalized to UniProt IDs), Organism (normalized to NCBI Taxonomy IDs), SNOMED (normalized to the SNOMED CT ontology), and Sequencing Platforms (companies and methods). We are currently working on providing automatic entity and relationship annotations for GO (Gene Ontology), HPO (Human Phenotype Ontology), and Disease.
The existing demo API can automatically annotate documents given their PMID, PMCID, or plain text. For example, the annotations of PMID:25821226 can be retrieved through the following GET URLs:
- Visual results on tagtog: https://www.tagtog.net/-demo?docid=at7_FwmvMGJhC9FZyPtVio232Jlq-25821226&output=visualize
- Results in JSON (ann.json format): https://www.tagtog.net/-demo?docid=at7_FwmvMGJhC9FZyPtVio232Jlq-25821226&output=ann.json
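Retrieving these results programmatically amounts to building the GET URL from a `docid` and an `output` value. The minimal sketch below assumes only what the two example URLs show: the `https://www.tagtog.net/-demo` endpoint and the `docid` and `output` parameters with the values `visualize` and `ann.json`. The function names `demo_url` and `fetch_ann_json` are hypothetical, and since the demo API is alpha, the returned JSON structure should not be relied upon.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base endpoint of the existing alpha demo API, taken from the example URLs.
DEMO_ENDPOINT = "https://www.tagtog.net/-demo"

# The two output values that appear in the example URLs.
DEMO_OUTPUTS = ("visualize", "ann.json")


def demo_url(docid: str, output: str) -> str:
    """Build a GET URL for the tagtog demo API (hypothetical helper)."""
    if output not in DEMO_OUTPUTS:
        raise ValueError(f"unsupported output {output!r}; expected one of {DEMO_OUTPUTS}")
    # urlencode percent-escapes unsafe characters; the example docid needs none.
    return f"{DEMO_ENDPOINT}?{urlencode({'docid': docid, 'output': output})}"


def fetch_ann_json(docid: str, timeout: float = 30.0) -> dict:
    """Download a document's annotations in ann.json format (needs network access).

    The demo API is alpha, so the shape of the returned JSON may change.
    """
    with urlopen(demo_url(docid, "ann.json"), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(demo_url("at7_FwmvMGJhC9FZyPtVio232Jlq-25821226", "visualize"))
```

Keeping URL construction separate from the network call makes the URL logic easy to check offline.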
Before the hackathon, our internal efforts will focus on simplifying and documenting a formal API, which will be freely accessible to researchers. For the hackathon, our first goal is to allow the API to also return annotation results in PubAnnotation format. For example, a tentative call would be https://www.tagtog.net/-api?pmid=25821226&output=pubannotation, whose results could then be stored in PubAnnotation or visualized in TextAE.
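The core of this first goal is a format conversion. A PubAnnotation document carries the `text` plus `denotations` (each with an `id`, a `span` of `begin`/`end` character offsets, and a label in `obj`) and, optionally, `relations` that link denotation ids through `subj`, `pred`, and `obj`. The sketch below assumes a hypothetical, simplified input (entity dicts with `id`, `start`, `text`, `class`) rather than tagtog's actual ann.json schema; a real converter would also have to map tagtog's own offset and class conventions, which this sketch does not attempt. The predicate name `interactsWith` and the source id `example-1` are placeholders.

```python
import json
from typing import Any


def to_pubannotation(text: str,
                     entities: list[dict[str, Any]],
                     relations: list[dict[str, Any]],
                     sourcedb: str,
                     sourceid: str) -> dict[str, Any]:
    """Convert a simplified annotation record into a PubAnnotation JSON document.

    ASSUMPTION (hypothetical input, not tagtog's actual ann.json schema):
      entities:  [{"id": "e1", "start": 0, "text": "BRCA1", "class": "GGP"}, ...]
                 where "start" is a character offset into `text`;
      relations: [{"subj": "e1", "pred": "interactsWith", "obj": "e2"}, ...].
    """
    denotations = []
    id_map = {}  # input entity id -> PubAnnotation denotation id
    for i, ent in enumerate(entities, start=1):
        begin = ent["start"]
        end = begin + len(ent["text"])
        # Refuse to emit a span whose offsets do not match the document text.
        if text[begin:end] != ent["text"]:
            raise ValueError(f"offset mismatch for entity {ent['id']!r}")
        id_map[ent["id"]] = f"T{i}"
        denotations.append({"id": f"T{i}",
                            "span": {"begin": begin, "end": end},
                            "obj": ent["class"]})

    doc: dict[str, Any] = {"sourcedb": sourcedb, "sourceid": sourceid,
                           "text": text, "denotations": denotations}
    if relations:
        # Relations refer to denotation ids, so subjects and objects are remapped.
        doc["relations"] = [{"id": f"R{i}",
                             "subj": id_map[rel["subj"]],
                             "pred": rel["pred"],
                             "obj": id_map[rel["obj"]]}
                            for i, rel in enumerate(relations, start=1)]
    return doc


if __name__ == "__main__":
    # Toy sentence and placeholder source id, for illustration only.
    text = "BRCA1 binds BARD1 in human cells."
    entities = [
        {"id": "e1", "start": 0, "text": "BRCA1", "class": "GGP"},
        {"id": "e2", "start": 12, "text": "BARD1", "class": "GGP"},
        {"id": "e3", "start": 21, "text": "human", "class": "Organism"},
    ]
    relations = [{"subj": "e1", "pred": "interactsWith", "obj": "e2"}]
    print(json.dumps(to_pubannotation(text, entities, relations,
                                      "PubMed", "example-1"), indent=2))
```

Validating each span against the text before emitting it catches offset errors early, which matters because PubAnnotation aligns annotations purely by character offsets.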
tagtog annotations are organized around projects, which contain the corpora. A user can own or participate in multiple projects, and a project can be accessed by multiple users. For the hackathon, our second goal is to let project owners choose whether their annotations are stored in PubAnnotation. If this project setting is enabled, the annotations will first be converted to PubAnnotation format and then stored using PubAnnotation's API. The process will be transparent and will require no manual intervention from the user.
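The opt-in storage step could look roughly like the hook below, which runs after a project document is saved and uploads it only if the owner enabled the setting. Everything here is an assumption for illustration: the setting keys `store_in_pubannotation` and `pubannotation_project`, the hook name, and especially the PubAnnotation endpoint path, which should be checked against PubAnnotation's API documentation. The document is assumed to be already converted to PubAnnotation format, and any authentication PubAnnotation requires for writes is omitted.

```python
import json
from typing import Any, Callable
from urllib.parse import quote
from urllib.request import Request

# ASSUMPTION: base URL and path layout of PubAnnotation's annotation-upload
# endpoint; verify against the PubAnnotation API documentation. Credentials
# are deliberately not handled in this sketch.
PUBANNOTATION_BASE = "https://pubannotation.org"


def build_upload_request(pa_project: str, doc: dict[str, Any]) -> Request:
    """Build (but do not send) a POST request adding `doc` to a PubAnnotation project."""
    path = "/".join([
        "projects", quote(pa_project, safe=""),
        "docs", "sourcedb", quote(doc["sourcedb"], safe=""),
        "sourceid", quote(doc["sourceid"], safe=""),
        "annotations.json",
    ])
    return Request(f"{PUBANNOTATION_BASE}/{path}",
                   data=json.dumps(doc).encode("utf-8"),
                   headers={"Content-Type": "application/json"},
                   method="POST")


def on_annotations_saved(project_settings: dict[str, Any],
                         doc: dict[str, Any],
                         send: Callable[[Request], object]) -> bool:
    """Hypothetical hook run after a tagtog project document is saved.

    `doc` is assumed to be already in PubAnnotation format.
    Returns True if an upload was triggered.
    """
    # Opt-in per project: projects without the setting are left untouched.
    if not project_settings.get("store_in_pubannotation", False):
        return False
    send(build_upload_request(project_settings["pubannotation_project"], doc))
    return True
```

Injecting `send` keeps the hook testable offline; in a deployment it could be `urllib.request.urlopen` (plus whatever credentials PubAnnotation requires), so the user never triggers the upload manually.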