Abstract
In order to determine the points at which meeting discourse changes from one topic to another, probabilistic models were used to approximate the process through which meeting transcripts were produced. Gibbs sampling was used to estimate the values of random variables in the models, including the locations of topic boundaries. This paper shows how discourse features were integrated into the Bayesian model and reports empirical evaluations of the benefit obtained through the inclusion of each feature and of the suitability of alternative models of the placement of topic boundaries. It demonstrates how multiple cues to segmentation can be combined in a principled way, and empirical tests show a clear improvement over previous work.
Original language | English (US) |
---|---|
Pages (from-to) | 1238-1248 |
Number of pages | 11 |
Journal | IEEE Transactions on Audio, Speech and Language Processing |
Volume | 16 |
Issue number | 7 |
DOIs | 10.1109/TASL.2008.925867 |
State | Published - Sep 2008 |
Funding
Manuscript received June 18, 2007; revised April 30, 2008. This work was supported by the CALO project (DARPA Grant NBCH-D-03-0010) and the work of M. Dowman was supported by a Japan Society for the Promotion of Science postdoctoral fellowship. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Mark Johnson. M. Dowman is with the Department of General System Studies, University of Tokyo, Tokyo 153-8902, Japan (e-mail: [email protected]). V. Savova and J. B. Tenenbaum are with the Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: [email protected]; [email protected]). T. L. Griffiths is with the Department of Psychology, University of California at Berkeley, Berkeley, CA 94720 USA (e-mail: [email protected]). K. P. Körding is with Physical Medicine and Rehabilitation, Northwestern University, Chicago, IL 60611 USA (e-mail: [email protected]). M. Purver is with the Center for the Study of Language and Information, Stanford University, Stanford, CA 94305 USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TASL.2008.925867
Keywords
- Gibbs sampling
- Hierarchical Bayesian models
- Latent Dirichlet allocation
- Markov chain Monte Carlo
- Topical segmentation
ASJC Scopus subject areas
- Acoustics and Ultrasonics
- Electrical and Electronic Engineering