Sound event detection using point-labeled data

Bongjun Kim, Bryan Pardo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
Abstract

Sound Event Detection (SED) in audio scenes is a task being studied by an increasing number of researchers. Recent SED systems often use deep learning models. Building these systems typically requires a large amount of carefully annotated, strongly labeled data, where the exact time span of each sound event in an audio scene is indicated (e.g., the 'dog bark' in a recording of a city park starts at 1.2 seconds and ends at 2.0 seconds). However, manually labeling sound events with their time boundaries within a recording is very time-consuming. One way to address this issue is to collect data with weak labels that contain only the names of the sound classes present in an audio file, without time boundary information for the events in the file. Weakly labeled sound event detection has therefore become popular recently. However, there is still a large performance gap between models built on weakly labeled data and those built on strongly labeled data, especially in predicting the time boundaries of sound events. In this work, we introduce a new type of sound event label that is easier for people to provide than strong labels. We call these 'point labels'. To create a point label, a user simply listens to the recording and hits the space bar whenever they hear a sound event (e.g., a 'dog bark'). This is much easier than specifying exact time boundaries. We illustrate methods to train an SED model on point-labeled data. Our results show that a model trained on point-labeled audio data significantly outperforms models trained on weakly labeled data and is comparable to a model trained on strongly labeled data.
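
The abstract does not specify how a single space-bar timestamp is converted into frame-level training targets, so the following Python code is only a minimal sketch of one plausible strategy, not the authors' method: each point label is expanded into a fixed-width positive window of frames. The function name point_labels_to_frame_targets and the parameters hop_seconds and window_seconds are illustrative assumptions.

import numpy as np

def point_labels_to_frame_targets(points, num_frames, num_classes,
                                  hop_seconds=0.02, window_seconds=1.0):
    # points: list of (timestamp_in_seconds, class_index) pairs, one per
    # space-bar press by the annotator (hypothetical label format).
    # Returns a (num_frames, num_classes) matrix in which frames falling
    # within +/- window_seconds / 2 of a point label are marked positive.
    targets = np.zeros((num_frames, num_classes), dtype=np.float32)
    half = window_seconds / 2.0
    for t, c in points:
        start = max(0, int((t - half) / hop_seconds))
        end = min(num_frames, int(np.ceil((t + half) / hop_seconds)))
        targets[start:end, c] = 1.0
    return targets

# Example: the annotator pressed the space bar at 1.5 s for class 0 ('dog bark')
frame_targets = point_labels_to_frame_targets([(1.5, 0)], num_frames=500, num_classes=10)

In such a scheme, a wider window trades off the precision of the implied onset and offset against coverage of long events; frames far from any point label might also be treated as weakly labeled rather than definitely negative.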

Original language: English (US)
Title of host publication: 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-5
Number of pages: 5
ISBN (Electronic): 9781728111230
DOIs
State: Published - Oct 2019
Event: 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019 - New Paltz, United States
Duration: Oct 20, 2019 - Oct 23, 2019

Publication series

Name: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
Volume: 2019-October
ISSN (Print): 1931-1168
ISSN (Electronic): 1947-1629

Conference

Conference: 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019
Country: United States
City: New Paltz
Period: 10/20/19 - 10/23/19

Keywords

  • Deep learning
  • Point labels
  • Sound event detection
  • Weak labels

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Science Applications
