TY - GEN
T1 - Music/voice separation using the similarity matrix
AU - Rafii, Zafar
AU - Pardo, Bryan A
PY - 2012
Y1 - 2012
N2 - Repetition is a fundamental element in generating and perceiving structure in music. Recent work has applied this principle to separate the musical background from the vocal foreground in a mixture by extracting the underlying repeating structure. While existing methods are effective, they depend on an assumption of periodically repeating patterns. In this work, we generalize the repetition-based source separation approach to handle cases where repetitions also happen intermittently or without a fixed period, thus allowing the processing of music pieces with fast-varying repeating structures and isolated repeating elements. Instead of looking for periodicities, the proposed method uses a similarity matrix to identify the repeating elements. It then calculates a repeating spectrogram model using the median and extracts the repeating patterns using time-frequency masking. Evaluation on a data set of 14 full-track real-world pop songs showed that use of a similarity matrix can improve overall separation performance compared with a previous repetition-based source separation method and a recent competitive music/voice separation method, while still being computationally efficient.
UR - http://www.scopus.com/inward/record.url?scp=84873416007&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84873416007&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84873416007
SN - 9789727521449
T3 - Proceedings of the 13th International Society for Music Information Retrieval Conference, ISMIR 2012
SP - 583
EP - 588
BT - Proceedings of the 13th International Society for Music Information Retrieval Conference, ISMIR 2012
T2 - 13th International Society for Music Information Retrieval Conference, ISMIR 2012
Y2 - 8 October 2012 through 12 October 2012
ER -