TY - JOUR
T1 - Multiple fundamental frequency estimation by modeling spectral peaks and non-peak regions
AU - Duan, Zhiyao
AU - Pardo, Bryan A
AU - Zhang, Changshui
N1 - Funding Information:
Manuscript received May 26, 2009; revised January 07, 2010. Date of publication February 02, 2010; date of current version September 08, 2010. This work was supported in part by the U.S. National Science Foundation under Grant IIS-0643752, in part by a China 973 Program (2009CB320602), and in part by the China National Science Foundation under Grant 60721003. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Sylvain Marchand.
PY - 2010
Y1 - 2010
N2 - This paper presents a maximum-likelihood approach to multiple fundamental frequency (F0) estimation for a mixture of harmonic sound sources, where the power spectrum of a time frame is the observation and the F0s are the parameters to be estimated. When defining the likelihood model, the proposed method models both spectral peaks and non-peak regions (frequencies further than a musical quarter tone from all observed peaks). It is shown that the peak likelihood and the non-peak region likelihood act as a complementary pair. The former helps find F0s that have harmonics that explain peaks, while the latter helps avoid F0s that have harmonics in non-peak regions. Parameters of these models are learned from monophonic and polyphonic training data. This paper proposes an iterative greedy search strategy to estimate F0s one by one, to avoid the combinatorial problem of concurrent F0 estimation. It also proposes a polyphony estimation method to terminate the iterative process. Finally, this paper proposes a postprocessing method to refine polyphony and F0 estimates using neighboring frames. This paper also analyzes the relative contributions of different components of the proposed method. It is shown that the refinement component eliminates many inconsistent estimation errors. Evaluations are done on ten recorded four-part J. S. Bach chorales. Results show that the proposed method shows superior F0 estimation and polyphony estimation compared to two state-of-the-art algorithms.
AB - This paper presents a maximum-likelihood approach to multiple fundamental frequency (F0) estimation for a mixture of harmonic sound sources, where the power spectrum of a time frame is the observation and the F0s are the parameters to be estimated. When defining the likelihood model, the proposed method models both spectral peaks and non-peak regions (frequencies further than a musical quarter tone from all observed peaks). It is shown that the peak likelihood and the non-peak region likelihood act as a complementary pair. The former helps find F0s that have harmonics that explain peaks, while the latter helps avoid F0s that have harmonics in non-peak regions. Parameters of these models are learned from monophonic and polyphonic training data. This paper proposes an iterative greedy search strategy to estimate F0s one by one, to avoid the combinatorial problem of concurrent F0 estimation. It also proposes a polyphony estimation method to terminate the iterative process. Finally, this paper proposes a postprocessing method to refine polyphony and F0 estimates using neighboring frames. This paper also analyzes the relative contributions of different components of the proposed method. It is shown that the refinement component eliminates many inconsistent estimation errors. Evaluations are done on ten recorded four-part J. S. Bach chorales. Results show that the proposed method shows superior F0 estimation and polyphony estimation compared to two state-of-the-art algorithms.
KW - Fundamental frequency
KW - Maximum likelihood
KW - Pitch estimation
KW - Spectral peaks
UR - http://www.scopus.com/inward/record.url?scp=77956540787&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77956540787&partnerID=8YFLogxK
U2 - 10.1109/TASL.2010.2042119
DO - 10.1109/TASL.2010.2042119
M3 - Article
AN - SCOPUS:77956540787
SN - 1558-7916
VL - 18
SP - 2121
EP - 2133
JO - IEEE Transactions on Audio, Speech and Language Processing
JF - IEEE Transactions on Audio, Speech and Language Processing
IS - 8
ER -