TY - GEN
T1 - Music Separation Enhancement with Generative Modeling
AU - Schaffer, Noah
AU - Cogan, Boaz
AU - Manilow, Ethan
AU - Morrison, Max
AU - Seetharaman, Prem
AU - Pardo, Bryan
N1 - Publisher Copyright:
© N. Schaffer, B. Cogan, E. Manilow, M. Morrison, P. Seetharaman, and B. Pardo.
PY - 2022
Y1 - 2022
AB - Despite phenomenal progress in recent years, state-of-the-art music separation systems produce source estimates with significant perceptual shortcomings, such as adding extraneous noise or removing harmonics. We propose a post-processing model (the Make it Sound Good (MSG) post-processor) to enhance the output of music source separation systems. We apply our post-processing model to state-of-the-art waveform-based and spectrogram-based music source separators, including a separator unseen by MSG during training. Our analysis of the errors produced by source separators shows that waveform models tend to introduce more high-frequency noise, while spectrogram models tend to lose transients and high-frequency content. We introduce objective measures to quantify both kinds of errors and show that MSG improves source reconstruction for both kinds of errors. Crowdsourced subjective evaluations demonstrate that human listeners prefer source estimates of bass and drums that have been post-processed by MSG.
UR - http://www.scopus.com/inward/record.url?scp=85163968789&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85163968789&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85163968789
T3 - Proceedings of the 23rd International Society for Music Information Retrieval Conference, ISMIR 2022
SP - 772
EP - 780
BT - Proceedings of the 23rd International Society for Music Information Retrieval Conference, ISMIR 2022
A2 - Rao, Preeti
A2 - Murthy, Hema
A2 - Srinivasamurthy, Ajay
A2 - Bittner, Rachel
A2 - Repetto, Rafael Caro
A2 - Goto, Masataka
A2 - Serra, Xavier
A2 - Miron, Marius
PB - International Society for Music Information Retrieval
T2 - 23rd International Society for Music Information Retrieval Conference, ISMIR 2022
Y2 - 4 December 2022 through 8 December 2022
ER -