Participant Support Subproject for CHS:Small: Robust Interactive Audio Source Separation

Project: Research project

Project Details


Computational auditory scene analysis (CASA) is the study of how a computational system can organize and understand sound. A fundamental problem in CASA is audio source separation: the process of extracting elements of interest (like individual voices) from an audio scene (like a cocktail party). While many algorithms have been proposed to separate audio scenes into individual sources, these methods are brittle and difficult to use. Because of this, potential users have not broadly adopted the technology. Audio source separation methods are brittle because each algorithm relies on a single cue to separate sources. When the cue is not reliable, the method fails. Methods are difficult to use for two reasons. First, algorithms cannot predict which audio scenes they are likely to work on, so the user does not know which method to apply in any given case. Second, their control parameters are hard to understand for those who are not experts in signal processing. The PI proposes to address algorithm brittleness by researching how to integrate multiple source separation algorithms into a single framework. The PI will also research how to build interfaces that let the user easily and interactively define what they wish to separate from the audio scene. The final topic of research focuses on methods for systems to automatically learn evaluation measures for audio and for the algorithms applied to it, so that a system can give the user guidance on tool selection and parameter settings.
The expected outcomes of this research are: (1) Audio source separation algorithms that adaptively combine multiple cues to robustly separate sounds in cases where single-cue approaches fail; (2) Interfaces that let the user guide the separation process towards a goal, without having to understand the complex internals of the algorithms; (3) Methods to automatically learn evaluation measures from past user interactions so systems can suggest approaches and settings likely to work on the current interaction; and (4) An open-source audio source separation tool that embodies these outcomes.
Effective start/end date: 10/1/14 to 9/30/18


  • National Science Foundation (IIS-1420971)

