Abstract
The multi-resolution common fate transform (MCFT) is an audio signal representation useful for representing mixtures of multiple audio signals that overlap in both time and frequency. The MCFT combines the invertibility of a state-of-the-art representation, the common fate transform (CFT), and the multi-resolution property of the cortical stage output of an auditory model. Since the MCFT is computed based on a fully invertible complex time-frequency representation, separation of audio sources with high time-frequency overlap may be performed directly in the MCFT domain, where there is less overlap between sources than in the time-frequency domain. The MCFT circumvents the resolution issue of the CFT by using a multi-resolution two-dimensional (2D) filter bank instead of fixed-size 2D windows. This enables higher quality separation without the need to hand-tune the window size to the specific case. In this work, we describe the MCFT, discuss its properties with the aid of illustrative examples, and provide definitions and objective measures for two desirable representation properties: separability of source signals and clusterability of components of each signal. The utility of the MCFT for source separation is illustrated by performing ideal masking on a comprehensive dataset of audio mixtures of musical tones played in unison, including audio samples from a wide pitch range and a variety of instruments and playing techniques. Results show that the ideal masks made in the MCFT domain yield better separability than those made in commonly used time-frequency signal representations as well as the CFT. The use of the MCFT also results in more reliable clusterability than the CFT in most cases.
Original language | English (US) |
---|---|
Article number | 8516327 |
Pages (from-to) | 342-354 |
Number of pages | 13 |
Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
Volume | 27 |
Issue number | 2 |
DOIs | 10.1109/TASLP.2018.2878616 |
State | Published - Feb 2019 |
Funding
Manuscript received May 3, 2018; revised August 28, 2018, September 20, 2018, and October 22, 2018; accepted October 23, 2018. Date of publication October 31, 2018; date of current version November 29, 2018. This work was supported by United States National Science Foundation award number 1420971. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Roland Badeau. (Corresponding author: Fatemeh Pishdadian.) The authors are with the Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208-3118 USA. Digital Object Identifier 10.1109/TASLP.2018.2878616
Keywords
- Audio source separation
- Clusterability
- Multi-resolution common fate transform
- Separability
ASJC Scopus subject areas
- Computer Science (miscellaneous)
- Acoustics and Ultrasonics
- Computational Mathematics
- Electrical and Electronic Engineering