Abstract
Sound Event Detection (SED) in audio scenes is a task studied by a growing number of researchers. Recent SED systems often use deep learning models. Building these systems typically requires a large amount of carefully annotated, strongly labeled data, in which the exact time span of each sound event in an audio scene is indicated (e.g., in a recording of a city park, a 'dog bark' starts at 1.2 seconds and ends at 2.0 seconds). However, manually labeling sound events with their time boundaries within a recording is very time-consuming. One way to address this issue is to collect weakly labeled data, whose labels contain only the names of the sound classes present in an audio file, with no time-boundary information for the events in that file. Therefore, weakly labeled sound event detection has recently become popular. However, there is still a large performance gap between models trained on weakly labeled data and those trained on strongly labeled data, especially in predicting the time boundaries of sound events. In this work, we introduce a new type of sound event label that is easier for people to provide than strong labels. We call them 'point labels'. To create a point label, a user simply listens to the recording and hits the space bar upon hearing a sound event (e.g., a 'dog bark'). This is much easier than specifying exact time boundaries. We present methods to train a SED model on point-labeled data. Our results show that a model trained on point-labeled audio data significantly outperforms models trained on weakly labeled data and performs comparably to a model trained on strongly labeled data.
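To make the difference between the three label types concrete, here is a minimal sketch of how strong, weak, and point labels for the same recording might be represented and turned into frame-level targets. It is not the authors' training method. The class names (`StrongLabel`, `PointLabel`), the helper functions, the 10 frames-per-second grid, the 5-second recording length, and the point time of 1.5 s are all hypothetical choices made for illustration; only the 'dog bark' from 1.2 s to 2.0 s comes from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrongLabel:
    """Class name plus exact onset and offset times, in seconds."""
    label: str
    onset: float
    offset: float


@dataclass(frozen=True)
class PointLabel:
    """Class name plus the single time (seconds) at which the annotator hit the key."""
    label: str
    time: float


def to_frame(t: float, frames_per_second: int) -> int:
    # Hypothetical frame grid: round a time to the nearest frame boundary.
    return int(round(t * frames_per_second))


def weak_from_strong(strong: list[StrongLabel]) -> set[str]:
    # A weak label keeps only the names of the classes present in the file.
    return {s.label for s in strong}


def strong_targets(strong, n_frames, frames_per_second=10):
    # Every frame in [onset, offset) is marked active for that class.
    targets: dict[str, list[int]] = {}
    for s in strong:
        row = targets.setdefault(s.label, [0] * n_frames)
        start = max(0, to_frame(s.onset, frames_per_second))
        end = min(n_frames, to_frame(s.offset, frames_per_second))
        for f in range(start, end):
            row[f] = 1
    return targets


def point_targets(points, n_frames, frames_per_second=10):
    # Only the frame where the key was pressed is marked;
    # the event's onset and offset are not recorded.
    targets: dict[str, list[int]] = {}
    for p in points:
        row = targets.setdefault(p.label, [0] * n_frames)
        f = to_frame(p.time, frames_per_second)
        if 0 <= f < n_frames:
            row[f] = 1
    return targets


if __name__ == "__main__":
    n_frames = 50  # a hypothetical 5-second recording at 10 frames per second
    strong = [StrongLabel("dog bark", 1.2, 2.0)]
    points = [PointLabel("dog bark", 1.5)]
    print(weak_from_strong(strong))
    print(sum(strong_targets(strong, n_frames)["dog bark"]))
    print(point_targets(points, n_frames)["dog bark"].index(1))
```

The strong label activates eight frames (1.2 s to 2.0 s), the weak label keeps only the class name, and the point label marks a single frame. This is why a point label costs the annotator roughly one key press per event while still carrying some timing information that a weak label lacks.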
Original language | English (US) |
---|---|
Title of host publication | 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 1-5 |
Number of pages | 5 |
ISBN (Electronic) | 9781728111230 |
DOIs | |
State | Published - Oct 2019 |
Event | 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019 - New Paltz, United States. Duration: Oct 20 2019 → Oct 23 2019 |
Publication series
Name | IEEE Workshop on Applications of Signal Processing to Audio and Acoustics |
---|---|
Volume | 2019-October |
ISSN (Print) | 1931-1168 |
ISSN (Electronic) | 1947-1629 |
Conference
Conference | 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019 |
---|---|
Country/Territory | United States |
City | New Paltz |
Period | 10/20/19 → 10/23/19 |
Funding
This work was funded, in part, by NSF Award Number 1617497.
Keywords
- Deep learning
- Point labels
- Sound event detection
- Weak labels
ASJC Scopus subject areas
- Electrical and Electronic Engineering
- Computer Science Applications