Classification of clinically useful sentences in clinical evidence resources

Mohammad Amin Morid, Marcelo Fiszman, Kalpana Raja, Siddhartha R. Jonnalagadda, Guilherme Del Fiol*

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    23 Scopus citations


    Most patient care questions raised by clinicians can be answered by online clinical knowledge resources. However, important barriers still challenge the use of these resources at the point of care. Objective: To design and assess a method for extracting clinically useful sentences from synthesized online clinical resources that represent the most clinically useful information for directly answering clinicians' information needs. Materials and methods: We developed a Kernel-based Bayesian Network classification model based on different domain-specific feature types extracted from sentences in a gold standard composed of 18 UpToDate documents. These features included UMLS concepts and their semantic groups, semantic predications extracted by SemRep, patient population identified by a pattern-based natural language processing (NLP) algorithm, and cue words extracted by a feature selection technique. Algorithm performance was measured in terms of precision, recall, and F-measure. Results: The feature-rich approach yielded an F-measure of 74% versus 37% for a feature co-occurrence method (p < 0.001). Excluding predication, population, semantic concept or text-based features reduced the F-measure to 62%, 66%, 58% and 69% respectively (p < 0.01). The classifier applied to Medline sentences reached an F-measure of 73%, which is equivalent to the performance of the classifier on UpToDate sentences (p = 0.62). Conclusions: The feature-rich approach significantly outperformed general baseline methods. This approach significantly outperformed classifiers based on a single type of feature. Different types of semantic features provided a unique contribution to overall classification performance. The classifier's model and features used for UpToDate generalized well to Medline abstracts.
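    The evaluation metrics named in the abstract (precision, recall, and F-measure) can be illustrated with a minimal sketch. It assumes a binary labeling of sentences as clinically useful or not, and a balanced F-measure (F1); the abstract does not state which F-measure variant was used. The function name `precision_recall_f1` and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary sentence labels.

    True means "clinically useful". Assumes balanced F1; this is an
    illustrative helper, not the authors' evaluation code.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (made-up labels): six sentences, three useful in the gold standard.
gold_labels = [True, True, True, False, False, False]
predicted_labels = [True, True, False, True, False, False]
p, r, f = precision_recall_f1(gold_labels, predicted_labels)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

    In this toy case the classifier finds two of the three useful sentences and raises one false alarm, so precision, recall, and F1 all equal 2/3.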

    Original language: English (US)
    Pages (from-to): 14-22
    Number of pages: 9
    Journal: Journal of Biomedical Informatics
    State: Published - Apr 1 2016


    • Clinical decision support
    • Machine learning
    • Natural language processing
    • Text summarization

    ASJC Scopus subject areas

    • Health Informatics
    • Computer Science Applications

