TY - JOUR
T1 - Automatically finding relevant citations for clinical guideline development
AU - Bui, Duy Duc An
AU - Jonnalagadda, Siddhartha
AU - Del Fiol, Guilherme
N1 - Funding Information:
This project was supported in part by grants 1R01LM011416-02 and 4R00LM011389-02 from the National Library of Medicine.
Publisher Copyright:
© 2015 Elsevier Inc.
PY - 2015/10/1
Y1 - 2015/10/1
N2 - Objective: Literature database search is a crucial step in the development of clinical practice guidelines and systematic reviews. In the age of information technology, the literature search process is still conducted manually; it is therefore costly, slow, and subject to human error. In this research, we sought to improve the traditional search process using innovative query expansion and citation ranking approaches. Methods: We developed a citation retrieval system composed of query expansion and citation ranking methods. The methods are unsupervised and easily integrated with the PubMed search engine. To validate the system, we developed a gold standard consisting of citations that were systematically searched and screened to support the development of cardiovascular clinical practice guidelines. The expansion and ranking methods were evaluated separately and compared with baseline approaches. Results: Compared with the baseline PubMed expansion, the query expansion algorithm improved recall (80.2% vs. 51.5%) with a small loss in precision (0.4% vs. 0.6%). The algorithm could find all citations used to support a larger number of guideline recommendations than the baseline approach (64.5% vs. 37.2%, p < 0.001). In addition, the citation ranking approach performed better than PubMed's "most recent" ranking (average precision +6.5%, recall at k +21.1%, p < 0.001), PubMed's rank by "relevance" (average precision +6.1%, recall at k +14.8%, p < 0.001), and the machine learning classifier that identifies scientifically sound studies from MEDLINE citations (average precision +4.9%, recall at k +4.2%, p < 0.001). Conclusions: Our unsupervised query expansion and ranking techniques are more flexible and effective than PubMed's default search engine behavior and the machine learning classifier. Automated citation finding shows promise for augmenting the traditional literature search.
AB - Objective: Literature database search is a crucial step in the development of clinical practice guidelines and systematic reviews. In the age of information technology, the literature search process is still conducted manually; it is therefore costly, slow, and subject to human error. In this research, we sought to improve the traditional search process using innovative query expansion and citation ranking approaches. Methods: We developed a citation retrieval system composed of query expansion and citation ranking methods. The methods are unsupervised and easily integrated with the PubMed search engine. To validate the system, we developed a gold standard consisting of citations that were systematically searched and screened to support the development of cardiovascular clinical practice guidelines. The expansion and ranking methods were evaluated separately and compared with baseline approaches. Results: Compared with the baseline PubMed expansion, the query expansion algorithm improved recall (80.2% vs. 51.5%) with a small loss in precision (0.4% vs. 0.6%). The algorithm could find all citations used to support a larger number of guideline recommendations than the baseline approach (64.5% vs. 37.2%, p < 0.001). In addition, the citation ranking approach performed better than PubMed's "most recent" ranking (average precision +6.5%, recall at k +21.1%, p < 0.001), PubMed's rank by "relevance" (average precision +6.1%, recall at k +14.8%, p < 0.001), and the machine learning classifier that identifies scientifically sound studies from MEDLINE citations (average precision +4.9%, recall at k +4.2%, p < 0.001). Conclusions: Our unsupervised query expansion and ranking techniques are more flexible and effective than PubMed's default search engine behavior and the machine learning classifier. Automated citation finding shows promise for augmenting the traditional literature search.
KW - Information retrieval
KW - Medical subject headings
KW - Natural language processing
KW - Practice guideline
KW - PubMed
UR - http://www.scopus.com/inward/record.url?scp=84949503529&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84949503529&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2015.09.003
DO - 10.1016/j.jbi.2015.09.003
M3 - Article
C2 - 26363352
AN - SCOPUS:84949503529
SN - 1532-0464
VL - 57
SP - 436
EP - 445
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
ER -