Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge

James Cormack*, Chinmoy Nath, David Milward, Kalpana Raja, Siddhartha R. Jonnalagadda

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    33 Scopus citations


    This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors in patient records as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-score of 91.7%, one percent behind the top-performing system.

    Original language: English (US)
    Pages (from-to): S120-S127
    Journal: Journal of Biomedical Informatics
    State: Published - Dec 1 2015


    Keywords

    • Clinical natural language processing
    • Information extraction
    • Text mining

    ASJC Scopus subject areas

    • Health Informatics
    • Computer Science Applications

