TY - JOUR
T1 - Using Machine Learning of Online Expression to Explain Recovery Trajectories
T2 - Content Analytic Approach to Studying a Substance Use Disorder Forum
AU - Yang, Ellie Fan
AU - Kornfield, Rachel
AU - Liu, Yan
AU - Chih, Ming Yuan
AU - Sarma, Prathusha
AU - Gustafson, David
AU - Curtin, John
AU - Shah, Dhavan
N1 - Funding Information:
This work was supported by the National Institute on Drug Abuse under grant R01-DA034279. All interpretations of these data should be attributed to the authors, who thank the Center for Health Enhancement Systems Studies at UW-Madison for making these data available. YL was supported by the Humanities and Social Science Foundation of the Ministry of Education of China (19YJC860029) and the Shanghai Pujiang Program (2020PJC056). RK was supported by the National Institute of Mental Health (P50MH119029 and K01MH125172).
Publisher Copyright:
©Ellie Fan Yang, Rachel Kornfield, Yan Liu, Ming-Yuan Chih, Prathusha Sarma, David Gustafson, John Curtin, Dhavan Shah.
PY - 2023
Y1 - 2023
N2 - Background: Smartphone-based apps are increasingly used to prevent relapse among those with substance use disorders (SUDs). These systems collect a wealth of data from participants, including the content of messages exchanged in peer-to-peer support forums. How individuals self-disclose and exchange social support in these forums may provide insight into their recovery course, but a manual review of a large corpus of text by human coders is inefficient. Objective: The study sought to evaluate the feasibility of applying supervised machine learning (ML) to perform large-scale content analysis of an online peer-to-peer discussion forum. Machine-coded data were also used to understand how communication styles relate to writers’ substance use and well-being outcomes. Methods: Data were collected from a smartphone app that connects patients with SUDs to online peer support via a discussion forum. Overall, 268 adult patients with SUD diagnoses were recruited from 3 federally qualified health centers in the United States beginning in 2014. Two waves of survey data were collected to measure demographic characteristics and study outcomes: at baseline (before accessing the app) and after 6 months of using the app. Messages were downloaded from the peer-to-peer forum and subjected to manual content analysis. These data were used to train supervised ML algorithms using features extracted from the Linguistic Inquiry and Word Count (LIWC) system to automatically identify the types of expression relevant to peer-to-peer support. Regression analyses examined how each expression type was associated with recovery outcomes. Results: Our manual content analysis identified 7 expression types relevant to the recovery process (emotional support, informational support, negative affect, change talk, insightful disclosure, gratitude, and universality disclosure). Over 6 months of app use, 86.2% (231/268) of participants posted on the app’s support forum. Of these participants, 93.5% (216/231) posted at least 1 message in the content categories of interest, generating 10,503 messages. Supervised ML algorithms were trained on the hand-coded data, achieving F1-scores ranging from 0.57 to 0.85. Regression analyses revealed that a greater proportion of the messages giving emotional support to peers was related to reduced substance use. For self-disclosure, a greater proportion of the messages expressing universality was related to improved quality of life, whereas a greater proportion of the negative affect expressions was negatively related to quality of life and mood. Conclusions: This study highlights a method of natural language processing with potential to provide real-time insights into peer-to-peer communication dynamics. First, we found that our ML approach allowed for large-scale content coding while retaining moderate-to-high levels of accuracy. Second, individuals’ expression styles were associated with recovery outcomes. The expression types of emotional support, universality disclosure, and negative affect were significantly related to recovery outcomes, and attending to these dynamics may be important for appropriate intervention.
KW - content analysis
KW - expression effects
KW - mobile phone
KW - online peer support forum
KW - substance use disorder
KW - supervised machine learning
UR - http://www.scopus.com/inward/record.url?scp=85168534947&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85168534947&partnerID=8YFLogxK
U2 - 10.2196/45589
DO - 10.2196/45589
M3 - Article
C2 - 37606984
AN - SCOPUS:85168534947
SN - 1439-4456
VL - 25
JO - Journal of Medical Internet Research
JF - Journal of Medical Internet Research
M1 - e45589
ER -