Blind Estimation of the Speech Transmission Index for Speech Quality Prediction

Prem Seetharaman, Gautham J. Mysore, Paris Smaragdis, Bryan A Pardo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

The speech transmission index (STI) of a listening position within a given room indicates the quality and intelligibility of speech uttered in that room. The measure reliably predicts speech intelligibility in many room conditions, but computing it requires a measurement of the room's impulse response. We present a method that uses deep convolutional neural networks to estimate the STI blindly, without measuring or modeling the room's impulse response. Our model is trained entirely on simulated room impulse responses combined with clean speech examples from the DAPS dataset [1] and operates directly on PCM audio. Our experiments show that the method predicts the true STI with high accuracy, with an average error of under 4%. It can also distinguish between STI conditions at a granularity comparable to that of human listeners.
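
The training recipe described above (simulated room impulse responses combined with clean speech, with the STI as the regression target) can be illustrated with a minimal Python/NumPy sketch of how one labeled training example might be produced. This is not the authors' pipeline. The exponentially decaying noise RIR and the broadband, noise-free STI are assumptions. That STI uses Schroeder's modulation transfer function averaged over the 14 standard modulation frequencies, without the octave-band analysis and weighting of IEC 60268-16. The function names `synthetic_rir`, `simplified_sti`, `reverberate` and `make_training_pair` are hypothetical. The convolutional network itself is not sketched.

```python
import numpy as np

# The 14 modulation frequencies (Hz) of the STI method, in one-third-octave
# steps from 0.63 Hz to 12.5 Hz.
MOD_FREQS = np.array([0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                      3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5])


def synthetic_rir(rt60, fs, seed=0):
    """Hypothetical stand-in for a room simulator: Gaussian noise whose energy
    decays by 60 dB after rt60 seconds (amplitude ~ exp(-6.91 t / rt60))."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(1.5 * rt60 * fs)) / fs
    envelope = np.exp(-3.0 * np.log(10.0) * t / rt60)
    return rng.standard_normal(t.size) * envelope


def simplified_sti(rir, fs):
    """Broadband, noise-free STI approximation from an impulse response.

    Uses Schroeder's modulation transfer function with equal weighting over
    the 14 modulation frequencies; the full IEC 60268-16 procedure also splits
    the response into 7 octave bands and applies band weights."""
    energy = rir ** 2
    t = np.arange(rir.size) / fs
    # m(F) = |sum_t h^2(t) exp(-j 2 pi F t)| / sum_t h^2(t)
    m = np.abs(np.exp(-2j * np.pi * np.outer(MOD_FREQS, t)) @ energy) / energy.sum()
    m = np.clip(m, 1e-12, 1.0 - 1e-12)            # keep the log finite
    snr_app = 10.0 * np.log10(m / (1.0 - m))      # apparent SNR in dB
    ti = (np.clip(snr_app, -15.0, 15.0) + 15.0) / 30.0  # transmission indices
    return float(ti.mean())


def reverberate(speech, rir):
    """Linear convolution via the FFT, trimmed to the input length and
    peak-normalised to +/-1."""
    n = speech.size + rir.size - 1
    wet = np.fft.irfft(np.fft.rfft(speech, n) * np.fft.rfft(rir, n), n)
    wet = wet[: speech.size]
    return wet / (np.max(np.abs(wet)) + 1e-12)


def make_training_pair(speech, fs, rt60, seed=0):
    """Return (reverberant waveform, STI label) for one simulated room."""
    rir = synthetic_rir(rt60, fs, seed=seed)
    return reverberate(speech, rir), simplified_sti(rir, fs)


if __name__ == "__main__":
    fs = 16000
    # Placeholder signal; in practice this would be a clean speech clip.
    speech = np.random.default_rng(1).standard_normal(fs)
    for rt60 in (0.3, 0.8, 2.0):
        audio, sti = make_training_pair(speech, fs, rt60)
        print(f"RT60 = {rt60:.1f} s -> STI ~ {sti:.2f}")
```

In this sketch the label is computed from the impulse response, while the model would only ever see the reverberant waveform. That asymmetry is what makes the estimation "blind": at inference time no impulse response is needed.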

Original language: English (US)
Title of host publication: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 591-595
Number of pages: 5
ISBN (Print): 9781538646588
DOI: 10.1109/ICASSP.2018.8461827
State: Published - Sep 10 2018
Event: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
Duration: Apr 15 2018 to Apr 20 2018

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2018-April
ISSN (Print): 1520-6149

Other

Other: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Country: Canada
City: Calgary
Period: 4/15/18 to 4/20/18

Fingerprint

Speech transmission
Impulse response
Speech intelligibility
Pulse code modulation
Neural networks
Experiments

Keywords

  • Speech enhancement
  • Speech quality
  • Speech transmission index

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Seetharaman, P., Mysore, G. J., Smaragdis, P., & Pardo, B. A. (2018). Blind Estimation of the Speech Transmission Index for Speech Quality Prediction. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings (pp. 591-595). [8461827] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2018-April). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2018.8461827
Seetharaman, Prem ; Mysore, Gautham J. ; Smaragdis, Paris ; Pardo, Bryan A. / Blind Estimation of the Speech Transmission Index for Speech Quality Prediction. 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 591-595 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).
@inproceedings{e5fe7377606d46b2b2effc59865caeba,
title = "Blind Estimation of the Speech Transmission Index for Speech Quality Prediction",
abstract = "The speech transmission index (STI) of a listening position within a given room indicates the quality and intelligibility of speech uttered in that room. The measure is very reliable for predicting speech intelligibility in many room conditions but requires an STI measurement of the impulse response for the room. We present a method for blindly estimating the STI without measuring or modeling the impulse response of the room using deep convolutional neural networks. Our model is trained entirely using simulated room impulse responses combined with clean speech examples from the DAPS dataset [1] and works directly on PCM audio. Our experiments show that our method predicts true STI with a high degree of accuracy: an average error of under 4{\%}. It can also distinguish between different STI conditions to a level of granularity that is comparable to humans.",
keywords = "Speech enhancement, Speech quality, Speech transmission index",
author = "Prem Seetharaman and Mysore, {Gautham J.} and Paris Smaragdis and Pardo, {Bryan A}",
year = "2018",
month = "9",
day = "10",
doi = "10.1109/ICASSP.2018.8461827",
language = "English (US)",
isbn = "9781538646588",
series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "591--595",
booktitle = "2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings",
address = "United States",

}

Seetharaman, P, Mysore, GJ, Smaragdis, P & Pardo, BA 2018, Blind Estimation of the Speech Transmission Index for Speech Quality Prediction. in 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings, 8461827, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2018-April, Institute of Electrical and Electronics Engineers Inc., pp. 591-595, 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018, Calgary, Canada, 4/15/18. https://doi.org/10.1109/ICASSP.2018.8461827

Blind Estimation of the Speech Transmission Index for Speech Quality Prediction. / Seetharaman, Prem; Mysore, Gautham J.; Smaragdis, Paris; Pardo, Bryan A.

2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2018. p. 591-595 8461827 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2018-April).

TY - GEN

T1 - Blind Estimation of the Speech Transmission Index for Speech Quality Prediction

AU - Seetharaman, Prem

AU - Mysore, Gautham J.

AU - Smaragdis, Paris

AU - Pardo, Bryan A

PY - 2018/9/10

Y1 - 2018/9/10

N2 - The speech transmission index (STI) of a listening position within a given room indicates the quality and intelligibility of speech uttered in that room. The measure is very reliable for predicting speech intelligibility in many room conditions but requires an STI measurement of the impulse response for the room. We present a method for blindly estimating the STI without measuring or modeling the impulse response of the room using deep convolutional neural networks. Our model is trained entirely using simulated room impulse responses combined with clean speech examples from the DAPS dataset [1] and works directly on PCM audio. Our experiments show that our method predicts true STI with a high degree of accuracy: an average error of under 4%. It can also distinguish between different STI conditions to a level of granularity that is comparable to humans.

AB - The speech transmission index (STI) of a listening position within a given room indicates the quality and intelligibility of speech uttered in that room. The measure is very reliable for predicting speech intelligibility in many room conditions but requires an STI measurement of the impulse response for the room. We present a method for blindly estimating the STI without measuring or modeling the impulse response of the room using deep convolutional neural networks. Our model is trained entirely using simulated room impulse responses combined with clean speech examples from the DAPS dataset [1] and works directly on PCM audio. Our experiments show that our method predicts true STI with a high degree of accuracy: an average error of under 4%. It can also distinguish between different STI conditions to a level of granularity that is comparable to humans.

KW - Speech enhancement

KW - Speech quality

KW - Speech transmission index

UR - http://www.scopus.com/inward/record.url?scp=85054235329&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054235329&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.2018.8461827

DO - 10.1109/ICASSP.2018.8461827

M3 - Conference contribution

SN - 9781538646588

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 591

EP - 595

BT - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Seetharaman P, Mysore GJ, Smaragdis P, Pardo BA. Blind Estimation of the Speech Transmission Index for Speech Quality Prediction. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2018. p. 591-595. 8461827. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). https://doi.org/10.1109/ICASSP.2018.8461827