TY - JOUR
T1 - Classification of clinically useful sentences in clinical evidence resources
AU - Morid, Mohammad Amin
AU - Fiszman, Marcelo
AU - Raja, Kalpana
AU - Jonnalagadda, Siddhartha R.
AU - Del Fiol, Guilherme
N1 - Funding Information:
This project was supported by Grants 1R01LM011416-01 and 4R00LM011389-02 from the National Library of Medicine.
Publisher Copyright:
© 2016 Elsevier Inc.
PY - 2016/4/1
Y1 - 2016/4/1
N2 - Most patient care questions raised by clinicians can be answered by online clinical knowledge resources. However, important barriers still challenge the use of these resources at the point of care. Objective: To design and assess a method for extracting clinically useful sentences from synthesized online clinical resources that represent the most clinically useful information for directly answering clinicians' information needs. Materials and methods: We developed a Kernel-based Bayesian Network classification model based on different domain-specific feature types extracted from sentences in a gold standard composed of 18 UpToDate documents. These features included UMLS concepts and their semantic groups, semantic predications extracted by SemRep, patient population identified by a pattern-based natural language processing (NLP) algorithm, and cue words extracted by a feature selection technique. Algorithm performance was measured in terms of precision, recall, and F-measure. Results: The feature-rich approach yielded an F-measure of 74% versus 37% for a feature co-occurrence method (p < 0.001). Excluding predication, population, semantic concept or text-based features reduced the F-measure to 62%, 66%, 58% and 69% respectively (p < 0.01). The classifier applied to Medline sentences reached an F-measure of 73%, which is equivalent to the performance of the classifier on UpToDate sentences (p = 0.62). Conclusions: The feature-rich approach significantly outperformed general baseline methods. This approach significantly outperformed classifiers based on a single type of feature. Different types of semantic features provided a unique contribution to overall classification performance. The classifier's model and features used for UpToDate generalized well to Medline abstracts.
KW - Clinical decision support
KW - Machine learning
KW - Natural language processing
KW - Text summarization
UR - http://www.scopus.com/inward/record.url?scp=84963569397&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84963569397&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2016.01.003
DO - 10.1016/j.jbi.2016.01.003
M3 - Article
C2 - 26774763
AN - SCOPUS:84963569397
SN - 1532-0464
VL - 60
SP - 14
EP - 22
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
ER -