A state space model for online polyphonic audio-score alignment

Zhiyao Duan*, Bryan A Pardo

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

30 Scopus citations

Abstract

We present a novel online audio-score alignment approach for multi-instrument polyphonic music. This approach uses a 2-dimensional state vector to model the underlying score position and tempo of each time frame of the audio performance. The process model is defined by dynamic equations that govern transitions between states. Two representations of the observed audio frame are proposed, resulting in two observation models: a multi-pitch-based one and a chroma-based one. Particle filtering is used to infer the hidden states from the observations. Experiments on 150 music pieces with polyphony ranging from one to four show that the proposed approach outperforms an existing offline score alignment approach based on global string alignment. Results also show that the multi-pitch-based observation model works better than the chroma-based one.
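The abstract does not give the dynamic equations or the likelihoods, so the following Python code is only a rough sketch of the general idea: a bootstrap particle filter over a 2-D state (score position in beats, tempo in beats per second). The constant-tempo-plus-noise dynamics, all noise parameters, and the function name `track` are assumptions for illustration. The Gaussian "position reading" observation is a toy stand-in for the paper's multi-pitch-based and chroma-based observation models, which score audio frames rather than position readings.

```python
import math
import random


def track(observations, dt, n_particles=500, obs_sigma=0.2,
          pos_sigma=0.01, tempo_sigma=0.01, seed=0):
    """Bootstrap particle filter over a 2-D state (score position, tempo).

    observations: one noisy score-position reading (in beats) per audio
    frame; a toy stand-in for an audio-based observation model (assumption).
    dt: frame hop in seconds. Returns one (position, tempo) estimate per frame.
    """
    rng = random.Random(seed)
    # Initial belief (assumed): start near beat 0, tempo unknown in [1, 3] beats/s.
    particles = [(rng.gauss(0.0, 0.1), rng.uniform(1.0, 3.0))
                 for _ in range(n_particles)]
    estimates = []
    for obs in observations:
        # Process model (assumed): tempo does a small random walk, and the
        # score position advances by tempo * dt plus a little noise.
        moved = []
        for pos, tempo in particles:
            new_tempo = max(0.1, tempo + rng.gauss(0.0, tempo_sigma))
            new_pos = pos + new_tempo * dt + rng.gauss(0.0, pos_sigma)
            moved.append((new_pos, new_tempo))
        # Observation model (assumed): Gaussian log-likelihood of the reading.
        log_w = [-0.5 * ((obs - pos) / obs_sigma) ** 2 for pos, _ in moved]
        peak = max(log_w)
        w = [math.exp(lw - peak) for lw in log_w]  # largest weight is 1.0
        total = sum(w)
        w = [x / total for x in w]
        # Report the weighted posterior mean of both state components.
        est_pos = sum(wi * p for wi, (p, _) in zip(w, moved))
        est_tempo = sum(wi * t for wi, (_, t) in zip(w, moved))
        estimates.append((est_pos, est_tempo))
        # Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(moved, weights=w, k=n_particles)
    return estimates


if __name__ == "__main__":
    # Synthetic performance: constant 2 beats/s, 50 ms frames, noisy readings.
    sim = random.Random(1)
    dt = 0.05
    truth = [2.0 * dt * (k + 1) for k in range(100)]
    readings = [x + sim.gauss(0.0, 0.2) for x in truth]
    pos, tempo = track(readings, dt)[-1]
    print(f"true position {truth[-1]:.2f}, estimate {pos:.2f}, tempo {tempo:.2f}")
```

Because each frame is processed as soon as it arrives and only the current particle set is kept, the filter runs online. Replacing the Gaussian reading with a likelihood computed from the audio frame against the score content at each particle's position is where observation models like the paper's two would presumably plug in; that substitution is not shown here.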

Original language: English (US)
Title of host publication: 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
Pages: 197-200
Number of pages: 4
State: Published - 2011
Event: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague, Czech Republic
Duration: May 22, 2011 to May 27, 2011

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Other

Other: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Country/Territory: Czech Republic
City: Prague
Period: 5/22/11 to 5/27/11

Keywords

  • Score following
  • audio-score alignment
  • hidden Markov model
  • online algorithm
  • realtime

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
