TY - JOUR
T1 - Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge
AU - Cormack, James
AU - Nath, Chinmoy
AU - Milward, David
AU - Raja, Kalpana
AU - Jonnalagadda, Siddhartha R.
N1 - Funding Information:
We would like to acknowledge the organizers of the i2b2/UTHealth 2014 challenge and the time and effort invested by the annotators of the data. We would also like to acknowledge the helpful comments about the paper from reviewers. This work was partly supported by the NLM R00 Grant LM011389.
Publisher Copyright:
© 2015 Elsevier Inc.
PY - 2015/12/1
Y1 - 2015/12/1
N2 - This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system.
AB - This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-Score of 91.7%, one percent behind the top performing system.
KW - Clinical natural language processing
KW - Information extraction
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=84937854995&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84937854995&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2015.06.030
DO - 10.1016/j.jbi.2015.06.030
M3 - Article
C2 - 26209007
AN - SCOPUS:84937854995
SN - 1532-0464
VL - 58
SP - S120-S127
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
ER -