TY - GEN
T1 - Speed regularization and optimality in word classing
AU - Zweig, Geoffrey
AU - Makarychev, Konstantin
PY - 2013/10/18
Y1 - 2013/10/18
AB - Word-classing has been used in language modeling for two distinct purposes: to improve the likelihood of the language model, and to improve the runtime speed. In particular, frequency-based heuristics have been proposed to improve the speed of recurrent neural network language models (RNN-LMs). In this paper, we present a dynamic programming algorithm for determining classes in a way that provably minimizes the runtime of the resulting class-based language models. However, we also find that the speed-based methods degrade the perplexity of the language models by 5-10% over traditional likelihood-based classing. We remedy this via the introduction of a speed-based regularization term in the likelihood objective function. This achieves a runtime close to that of the speed-based methods without loss in perplexity performance. We demonstrate these improvements with both an RNN-LM and the Model M exponential language model, for three different tasks involving two different languages.
KW - Language Modeling
KW - Model M
KW - Recurrent Neural Network
KW - Word Classes
UR - http://www.scopus.com/inward/record.url?scp=84890477112&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84890477112&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2013.6639271
DO - 10.1109/ICASSP.2013.6639271
M3 - Conference contribution
AN - SCOPUS:84890477112
SN - 9781479903566
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 8237
EP - 8241
BT - 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Proceedings
T2 - 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013
Y2 - 26 May 2013 through 31 May 2013
ER -