Nowcasting sexually transmitted infections in Chicago: Predictive modeling and evaluation study using Google Trends

Amy Kristen Johnson*, Runa Bhaumik, Irina Tabidze, Supriya D. Mehta

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

5 Scopus citations


Background: Sexually transmitted infections (STIs) pose a significant public health challenge in the United States. Traditional surveillance systems are adversely affected by data quality issues, underreporting of cases, and reporting delays, resulting in missed opportunities to respond to trends in disease prevalence. Search engine data can potentially provide an efficient and economical enhancement to the surveillance reporting systems established for STIs.

Objective: We aimed to develop and train a predictive model using reported STI case data from Chicago, Illinois, and to investigate the model’s predictive capacity, timeliness, and ability to target interventions to subpopulations using Google Trends data.

Methods: Deidentified STI case data for chlamydia, gonorrhea, and primary and secondary syphilis from 2011-2017 were obtained from the Chicago Department of Public Health. The data set included race/ethnicity, age, and birth sex. Google Correlate was used to identify the 100 search terms most highly correlated with “STD symptoms,” and an autocrawler was established using the Google Health Application Programming Interface to collect the search volume for each term. Elastic net regression was used to evaluate prediction accuracy, and cross-correlation analysis was used to identify the timeliness of prediction. Subgroup elastic net regression analysis was performed for race, sex, and age.

Results: For gonorrhea and chlamydia, actual and predicted STI values correlated moderately in 2011 (chlamydia: r=0.65; gonorrhea: r=0.72) but correlated highly (chlamydia: r=0.90; gonorrhea: r=0.94) from 2012 to 2017. However, for primary and secondary syphilis, high correlation was observed only for 2012 (r=0.79), 2013 (r=0.77), 2016 (r=0.80), and 2017 (r=0.84), with 2011, 2014, and 2015 showing moderate correlations (r=0.55-0.70). Model performance was the most accurate (highest correlation and lowest mean absolute error) for gonorrhea. Subgroup analyses improved model fit across diseases and years. Regression models using search terms selected from the cross-correlation analysis improved the prediction accuracy and timeliness across diseases and years.

Conclusions: Integrating nowcasting with Google Trends in surveillance activities can potentially enhance the prediction and timeliness of outbreak detection and response as well as target interventions to subpopulations. Future studies should prospectively examine the utility of Google Trends applied to STI surveillance and response.
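The abstract names two evaluation metrics (correlation between actual and predicted values, and mean absolute error) and a cross-correlation analysis for timeliness. The following is a minimal sketch of what those computations might look like, not the authors' implementation. The function names, the lag range, and the data are all hypothetical and synthetic; the abstract does not state the time resolution or the lags examined.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def lagged_correlations(search, cases, max_lag):
    """Correlate search volume at period t with case counts at period t + lag.

    A peak at a positive lag suggests the search term leads reported cases
    by that many periods (hypothetical helper, for illustration only).
    """
    result = {}
    for lag in range(max_lag + 1):
        s = search[: len(search) - lag]  # drop the last `lag` search values
        c = cases[lag:]                  # drop the first `lag` case values
        result[lag] = pearson_r(s, c)
    return result

# Synthetic data (assumption): case counts echo search volume two periods later.
search = [10, 12, 15, 11, 9, 14, 18, 13, 10, 16, 20, 15]
cases = [50, 52] + [5 * v for v in search[:-2]]

corrs = lagged_correlations(search, cases, max_lag=4)
best_lag = max(corrs, key=corrs.get)
print("lag with highest correlation:", best_lag)
```

In this toy series the correlation peaks at the lag that was built into the data, which is the kind of signal a cross-correlation analysis would use to select search terms that lead reported cases.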

Original language: English (US)
Article number: e20588
Journal: JMIR Public Health and Surveillance
Issue number: 4
State: Published - Oct 2020


Keywords

  • Google Trends
  • Health information technology
  • Infodemiology
  • Infoveillance
  • Sexually transmitted infections
  • Surveillance

ASJC Scopus subject areas

  • Public Health, Environmental and Occupational Health
  • Medicine(all)
  • Health Informatics

