TY - JOUR
T1 - Text mining in the biocuration workflow
T2 - applications for literature curation at WormBase, dictyBase and TAIR.
AU - Van Auken, Kimberly
AU - Fey, Petra
AU - Berardini, Tanya Z.
AU - Dodson, Robert
AU - Cooper, Laurel
AU - Li, Donghui
AU - Chan, Juancarlos
AU - Li, Yuling
AU - Basu, Siddhartha
AU - Muller, Hans Michael
AU - Chisholm, Rex
AU - Huala, Eva
AU - Sternberg, Paul W.
AU - WormBase Consortium
PY - 2012
Y1 - 2012
N2 - WormBase, dictyBase and The Arabidopsis Information Resource (TAIR) are model organism databases containing information about Caenorhabditis elegans and other nematodes, the social amoeba Dictyostelium discoideum and related Dictyostelids and the flowering plant Arabidopsis thaliana, respectively. Each database curates multiple data types from the primary research literature. In this article, we describe the curation workflow at WormBase, with particular emphasis on our use of text-mining tools (BioCreative 2012, Workshop Track II). We then describe the application of a specific component of that workflow, Textpresso for Cellular Component Curation (CCC), to Gene Ontology (GO) curation at dictyBase and TAIR (BioCreative 2012, Workshop Track III). We find that, with organism-specific modifications, Textpresso can be used by dictyBase and TAIR to annotate gene products to GO's Cellular Component (CC) ontology.
AB - WormBase, dictyBase and The Arabidopsis Information Resource (TAIR) are model organism databases containing information about Caenorhabditis elegans and other nematodes, the social amoeba Dictyostelium discoideum and related Dictyostelids and the flowering plant Arabidopsis thaliana, respectively. Each database curates multiple data types from the primary research literature. In this article, we describe the curation workflow at WormBase, with particular emphasis on our use of text-mining tools (BioCreative 2012, Workshop Track II). We then describe the application of a specific component of that workflow, Textpresso for Cellular Component Curation (CCC), to Gene Ontology (GO) curation at dictyBase and TAIR (BioCreative 2012, Workshop Track III). We find that, with organism-specific modifications, Textpresso can be used by dictyBase and TAIR to annotate gene products to GO's Cellular Component (CC) ontology.
UR - http://www.scopus.com/inward/record.url?scp=84876468442&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84876468442&partnerID=8YFLogxK
U2 - 10.1093/database/bas040
DO - 10.1093/database/bas040
M3 - Article
C2 - 23160413
AN - SCOPUS:84876468442
SN - 1758-0463
VL - 2012
SP - bas040
JO - Database : the journal of biological databases and curation
JF - Database : the journal of biological databases and curation
ER -