TY - JOUR
T1 - Efficient estimation of children’s language exposure in two bilingual communities
AU - Cychosz, Margaret
AU - Villanueva, Anele
AU - Weisleder, Adriana
N1 - Funding Information: The authors thank the families who participated in this research program and acknowledge funding from the following sources: National Institute on Deafness and Other Communication Disorders Grants T32DC000046 (M. C.) and R21DC018357 (A. W.), two Oswalt Documenting Endangered Language Grants (M. C.), Academic Pediatric Association Young Investigator Award (A. W.), and International Congress of Infant Studies Summer Fellowship (A. V.). Additional thanks to Jan Edwards for the use of her recording equipment to construct the Quechua–Spanish corpus in Bolivia and to Alan Mendelsohn and the BELLE team for support in creating the Spanish–English corpus in the United States. Publisher Copyright: © 2021 American Speech-Language-Hearing Association.
PY - 2021/10
Y1 - 2021/10
N2 - Purpose: The language that children hear early in life is associated with their speech-language outcomes. This line of research relies on naturalistic observations of children’s language input, often captured with daylong audio recordings. However, the large quantity of data that daylong recordings generate requires novel analytical tools to feasibly parse thousands of hours of naturalistic speech. This study outlines a new approach to efficiently process and sample from daylong audio recordings made in two bilingual communities, Spanish–English in the United States and Quechua–Spanish in Bolivia, to derive estimates of children’s language exposure. Method: We employed a general sampling with replacement technique to efficiently estimate two key elements of children’s early language environments: (a) proportion of child-directed speech (CDS) and (b) dual language exposure. Proportions estimated from random sampling of 30-s segments were compared to those from annotations over the entire daylong recording (every other segment), as well as parental report of dual language exposure. Results: Results showed that approximately 49 min from each recording or just 7% of the overall recording was required to reach a stable proportion of CDS and bilingual exposure. In both speech communities, strong correlations were found between bilingual language estimates made using random sampling and all-day annotation techniques. A strong association was additionally found for CDS estimates in the United States, but this was weaker at the Bolivian site, where CDS was less frequent. Dual language estimates from the audio recordings did not correspond well to estimates derived from parental report collected months apart. Conclusions: Daylong recordings offer tremendous insight into children’s daily language experiences, but they will not become widely used in developmental research until data processing and annotation time substantially decrease. We show that annotation based on random sampling is a promising approach to efficiently estimate ambient characteristics from daylong recordings that cannot currently be estimated via automated methods.
UR - http://www.scopus.com/inward/record.url?scp=85116763426&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85116763426&partnerID=8YFLogxK
U2 - 10.1044/2021_JSLHR-20-00755
DO - 10.1044/2021_JSLHR-20-00755
M3 - Article
C2 - 34520232
AN - SCOPUS:85116763426
VL - 64
SP - 3843
EP - 3866
JO - Journal of Speech, Language, and Hearing Research
JF - Journal of Speech, Language, and Hearing Research
SN - 1092-4388
IS - 10
ER -