Music/voice separation using the similarity matrix

Zafar Rafii*, Bryan A Pardo

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

62 Scopus citations

Abstract

Repetition is a fundamental element in generating and perceiving structure in music. Recent work has applied this principle to separate the musical background from the vocal foreground in a mixture by simply extracting the underlying repeating structure. While existing methods are effective, they depend on an assumption of periodically repeating patterns. In this work, we generalize the repetition-based source separation approach to handle cases where repetitions also happen intermittently or without a fixed period, thus allowing the processing of music pieces with fast-varying repeating structures and isolated repeating elements. Instead of looking for periodicities, the proposed method uses a similarity matrix to identify the repeating elements. It then calculates a repeating spectrogram model using the median and extracts the repeating patterns using time-frequency masking. Evaluation on a data set of 14 full-track real-world pop songs showed that using a similarity matrix can improve overall separation performance compared with a previous repetition-based source separation method and a recent competitive music/voice separation method, while remaining computationally efficient.

Original language: English (US)
Title of host publication: Proceedings of the 13th International Society for Music Information Retrieval Conference, ISMIR 2012
Pages: 583-588
Number of pages: 6
State: Published - Dec 1 2012
Event: 13th International Society for Music Information Retrieval Conference, ISMIR 2012 - Porto, Portugal
Duration: Oct 8 2012 to Oct 12 2012

ASJC Scopus subject areas

  • Music
  • Information Systems
