Automatic measurement of voice onset time and prevoicing using recurrent neural networks

Yossi Adi, Joseph Keshet, Olga Dmitrieva, Matthew A Goldrick

Research output: Contribution to journal › Conference article

5 Citations (Scopus)

Abstract

Voice onset time (VOT) is defined as the time from the onset of the burst to the onset of voicing. When voicing begins before the burst, the stop is called prevoiced and the VOT is negative; when voicing begins after the burst, the VOT is positive. Most work on automatic VOT measurement has focused on positive VOT, which is typical of American English, but in many languages VOT can be negative. We propose an algorithm that determines whether a stop is prevoiced and measures its positive or negative VOT accordingly. Specifically, the input to the algorithm is a speech segment of arbitrary length containing a single stop consonant, and the output is the time of the burst onset, the duration of the burst, and the time of the prevoicing onset together with a confidence score. Manually labeled data are used to train a recurrent neural network that models the dynamic temporal behavior of the input signal and outputs the onset and duration of each event. Results suggest that the proposed algorithm outperforms the current state of the art in both VOT measurement and prevoicing detection.
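To make the sign convention and the listed outputs concrete, here is a minimal sketch of how burst onset, burst duration, and a prevoicing onset with a confidence score could be combined into a signed VOT. It is not the authors' implementation. Several things are assumptions for illustration only: the per-frame labels ("sil", "prevoicing", "burst", "vowel"), the 1 ms frame hop FRAME_MS, the 0.5 confidence threshold, the hypothetical helpers longest_run, decode, vot_ms and the StopEstimate container, and the simple longest-run decoding, which stands in for whatever decoding the network actually uses. It also assumes that the "burst" segment runs from the release to the onset of voicing, so that for a stop that is not prevoiced the burst duration equals the positive VOT.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical frame hop in milliseconds; the abstract does not state one.
FRAME_MS = 1.0


@dataclass
class StopEstimate:
    """Hypothetical container for the outputs listed in the abstract (all times in ms)."""
    burst_onset: float
    burst_duration: float
    prevoicing_onset: Optional[float]  # None when no prevoicing frames were found
    prevoicing_confidence: float       # assumed to lie in [0, 1]


def longest_run(labels: List[str], target: str) -> Optional[Tuple[int, int]]:
    """Return (first_frame, n_frames) of the longest contiguous run of `target`, or None."""
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    # The trailing None is a sentinel so that a run reaching the last frame is closed.
    for i, lab in enumerate(list(labels) + [None]):
        if lab == target and start is None:
            start = i
        elif lab != target and start is not None:
            if best is None or i - start > best[1]:
                best = (start, i - start)
            start = None
    return best


def decode(labels: List[str], prevoicing_confidence: float) -> StopEstimate:
    """Turn per-frame labels into burst onset/duration and prevoicing onset (simple stand-in)."""
    burst = longest_run(labels, "burst")
    if burst is None:
        raise ValueError("no burst frames found")
    prevoicing = longest_run(labels, "prevoicing")
    return StopEstimate(
        burst_onset=burst[0] * FRAME_MS,
        burst_duration=burst[1] * FRAME_MS,
        prevoicing_onset=None if prevoicing is None else prevoicing[0] * FRAME_MS,
        prevoicing_confidence=prevoicing_confidence,
    )


def vot_ms(est: StopEstimate, threshold: float = 0.5) -> float:
    """Signed VOT = voicing onset - burst onset (negative for prevoiced stops)."""
    if est.prevoicing_onset is not None and est.prevoicing_confidence >= threshold:
        # Prevoiced: voicing starts at the prevoicing onset, before the burst.
        return est.prevoicing_onset - est.burst_onset
    # Not prevoiced: assume voicing starts where the burst segment ends.
    return est.burst_duration


labels = ["sil"] * 5 + ["prevoicing"] * 80 + ["burst"] * 10 + ["vowel"] * 20
print(vot_ms(decode(labels, prevoicing_confidence=0.9)))  # -80.0
print(vot_ms(decode(labels, prevoicing_confidence=0.2)))  # 10.0
```

Keeping the prevoicing decision (a thresholded confidence) separate from the timing estimates mirrors the abstract's description of the output: the same burst and prevoicing times yield a negative VOT when prevoicing is accepted and a positive one when it is rejected.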

Original language: English (US)
Pages (from-to): 3152-3155
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 08-12-September-2016
DOI: 10.21437/Interspeech.2016-893
State: Published - Jan 1 2016
Event: 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States
Duration: Sep 8 2016 to Sep 16 2016

Keywords

  • Prevoicing
  • Recurrent neural networks
  • Voice onset time

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Cite this

@article{6f06487af5dd4e6087c5d12b7ca72b04,
title = "Automatic measurement of voice onset time and prevoicing using recurrent neural networks",
keywords = "Prevoicing, Recurrent neural networks, Voice onset time",
author = "Yossi Adi and Joseph Keshet and Olga Dmitrieva and Goldrick, {Matthew A}",
year = "2016",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2016-893",
language = "English (US)",
volume = "08-12-September-2016",
pages = "3152--3155",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

Automatic measurement of voice onset time and prevoicing using recurrent neural networks. / Adi, Yossi; Keshet, Joseph; Dmitrieva, Olga; Goldrick, Matthew A.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 08-12-September-2016, 01.01.2016, p. 3152-3155.

TY  - JOUR
T1  - Automatic measurement of voice onset time and prevoicing using recurrent neural networks
AU  - Adi, Yossi
AU  - Keshet, Joseph
AU  - Dmitrieva, Olga
AU  - Goldrick, Matthew A
PY  - 2016/1/1
Y1  - 2016/1/1
KW  - Prevoicing
KW  - Recurrent neural networks
KW  - Voice onset time
UR  - http://www.scopus.com/inward/record.url?scp=84994253210&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84994253210&partnerID=8YFLogxK
U2  - 10.21437/Interspeech.2016-893
DO  - 10.21437/Interspeech.2016-893
M3  - Conference article
AN  - SCOPUS:84994253210
VL  - 08-12-September-2016
SP  - 3152
EP  - 3155
JO  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN  - 2308-457X
ER  -