TY - JOUR
T1 - Comprehensive temporal information detection from clinical text
T2 - Medical events, time, and TLINK identification
AU - Sohn, Sunghwan
AU - Wagholikar, Kavishwar B.
AU - Li, Dingcheng
AU - Jonnalagadda, Siddhartha R.
AU - Tao, Cui
AU - Elayavilli, Ravikumar Komandur
AU - Liu, Hongfang
PY - 2013
Y1 - 2013
AB - Background: Temporal information detection systems have been developed by the Mayo Clinic for the 2012 i2b2 Natural Language Processing Challenge. Objective: To construct automated systems for EVENT/TIMEX3 extraction and temporal link (TLINK) identification from clinical text. Materials and methods: The i2b2 organizers provided 190 annotated discharge summaries as the training set and 120 discharge summaries as the test set. Our Event system used a conditional random field classifier with a variety of features including lexical information, natural language elements, and medical ontology. The TIMEX3 system employed a rule-based method using regular expression pattern match and systematic reasoning to determine normalized values. The TLINK system employed both rule-based reasoning and machine learning. All three systems were built in an Apache Unstructured Information Management Architecture framework. Results: Our TIMEX3 system performed the best (F-measure of 0.900, value accuracy 0.731) among the challenge teams. The Event system produced an F-measure of 0.870, and the TLINK system an F-measure of 0.537. Conclusions: Our TIMEX3 system demonstrated good capability of regular expression rules to extract and normalize time information. Event and TLINK machine learning systems required well-defined feature sets to perform well. We could also leverage expert knowledge as part of the machine learning features to further improve TLINK identification performance.
UR - http://www.scopus.com/inward/record.url?scp=84881183249&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84881183249&partnerID=8YFLogxK
U2 - 10.1136/amiajnl-2013-001622
DO - 10.1136/amiajnl-2013-001622
M3 - Article
C2 - 23558168
AN - SCOPUS:84881183249
SN - 1067-5027
VL - 20
SP - 836
EP - 842
JO - Journal of the American Medical Informatics Association : JAMIA
JF - Journal of the American Medical Informatics Association : JAMIA
IS - 5
ER -