REU for CHS: Small: Robust Interactive Audio Source Separation

Project: Research project

Project Details


Computational auditory scene analysis (CASA) is the study of how a computational system can
organize and understand sound. A fundamental problem in CASA is audio source separation,
the process of extracting elements of interest
(like individual voices) from an audio scene (like a cocktail party).
While many algorithms have been proposed to separate audio scenes into individual sources,
these methods are brittle and difficult to use. Because of this, potential users have not
broadly adopted the technology. Audio source separation methods are brittle because each algorithm
relies on a single cue to separate sources. When the cue is not reliable, the method fails.
Methods are difficult to use because algorithms cannot predict which audio scenes they are
likely to work on. Therefore, the user does not know which method to apply in any given case.
Methods are also difficult to use because their control parameters are hard to understand for
those who are not experts in signal processing.
The PI proposes to address algorithm brittleness by researching how to integrate multiple
source separation algorithms into a single framework. They will also research how to build
interfaces that let the user easily and interactively define what they wish to separate from
the audio scene. The final topic of research focuses on methods for systems to automatically
learn evaluation measures for audio and for the algorithms applied to it, so that a system could
provide the user guidance on tool selection and parameter settings.
The expected outcomes of this research are: (1) Audio source separation algorithms that adaptively
combine multiple cues to robustly separate sounds in cases where single-cue approaches fail;
(2) Interfaces that let the user guide the separation process towards a goal, without having
to understand the complex internals of the algorithms; (3) Methods to automatically learn
evaluation measures from past user interactions so systems can suggest approaches and settings
likely to work on the current interaction; and (4) An open-source audio source separation
tool that embodies these outcomes.
Effective start/end date: 10/1/14 to 9/30/18


  • National Science Foundation (IIS-1420971)

