An anatomic pathology natural language dictionary (LEXICON) has evolved over a nine-year period as a result of scanning more than one million words of narrative text from tissue examination request forms and surgical pathology reports. The text is parsed into individual words. Each word is looked up in LEXICON and flagged with action codes that determine how it is used in constructing a KWIC index file and an on-line database retrievable by keyword. The dictionary now resides on an IBM 370/168 system and has survived several transfers between computer systems. After each batch of narrative text is scanned, an update program is run to modify LEXICON. LEXICON currently contains 24,228 medical and nonmedical terms: 24.8% are errors (misspellings), 45.9% are keywords retrievable both on and off line, and 52.2% are cross-referenced to a supplementary word. A preliminary study shows that many of the “nonmedical” terms in LEXICON carry significant medical information, and that there is considerable overlap of medical words among LEXICON, SNOMED, and ICDA-8. Our LEXICON appears to be an intermediate step toward an algorithm capable of “understanding” medical narrative text.
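The abstract does not specify LEXICON's action codes or record layout. The following is therefore only a minimal Python sketch of how dictionary lookup can drive construction of a keyword-in-context (KWIC) index. It assumes a hypothetical three-code scheme: keyword, ignore, and misspelling with a cross-referenced correct form. All names (`LEXICON` entries, `KEYWORD`, `IGNORE`, `MISSPELLING`, `normalize`, `build_kwic`) are illustrative and are not taken from the original system.

```python
import re
from collections import defaultdict

# Hypothetical action codes; the real LEXICON codes are not described in the abstract.
KEYWORD = "K"      # index the word in the KWIC file / keyword database
IGNORE = "I"       # known word with no retrieval value
MISSPELLING = "E"  # known error; replaced by its cross-referenced correct form

# Hypothetical miniature lexicon: word -> (action code, cross-referenced word or None)
LEXICON = {
    "adenocarcinoma": (KEYWORD, None),
    "adenocarcinma": (MISSPELLING, "adenocarcinoma"),
    "biopsy": (KEYWORD, None),
    "colon": (KEYWORD, None),
    "of": (IGNORE, None),
    "the": (IGNORE, None),
}


def tokenize(text):
    """Split narrative text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(word):
    """Replace a known misspelling with its cross-referenced correct form."""
    entry = LEXICON.get(word)
    if entry is not None and entry[0] == MISSPELLING:
        return entry[1]
    return word


def build_kwic(reports, width=2):
    """Build a KWIC index from {report_id: text}.

    Returns (index, unknown), where index maps each keyword (alphabetized) to a
    list of (report_id, left_context, keyword, right_context) tuples, and
    unknown is the set of words not found in the lexicon.
    """
    index = defaultdict(list)
    unknown = set()
    for report_id, text in reports.items():
        words = [normalize(w) for w in tokenize(text)]
        for i, word in enumerate(words):
            entry = LEXICON.get(word)
            if entry is None:
                unknown.add(word)  # candidate for a later lexicon update run
                continue
            if entry[0] != KEYWORD:
                continue
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            index[word].append((report_id, left, word, right))
    return dict(sorted(index.items())), unknown


reports = {
    "S-1": "Biopsy of the colon: adenocarcinma",
    "S-2": "Colon, polyp",
}
index, unknown = build_kwic(reports)
```

Here the misspelling "adenocarcinma" is indexed under its corrected form, and the unrecognized word "polyp" is returned separately. That loosely parallels the abstract's description of running an update program after each scanned batch, though how the original program handled unknown words is not stated.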