TY - GEN
T1 - Adapting n-gram maximum entropy language models with conditional entropy regularization
AU - Rastrow, Ariya
AU - Dredze, Mark
AU - Khudanpur, Sanjeev
PY - 2011
Y1 - 2011
N2 - Accurate estimates of language model parameters are critical for building quality text generation systems, such as automatic speech recognition. However, text training data for a domain of interest is often unavailable. Instead, we use semi-supervised model adaptation: parameters are estimated using both unlabeled in-domain data (raw speech audio) and labeled out-of-domain data (text). In this work, we present a new semi-supervised language model adaptation procedure for maximum entropy models with n-gram features. We augment the conventional maximum likelihood training criterion on out-of-domain text data with an additional term to minimize conditional entropy on in-domain audio. Additionally, we demonstrate how to compute conditional entropy efficiently on speech lattices using first- and second-order expectation semirings. We show word error rate improvements over other adaptation techniques when adapting a maximum entropy language model from broadcast news to MIT lectures.
UR - http://www.scopus.com/inward/record.url?scp=84858988783&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84858988783&partnerID=8YFLogxK
U2 - 10.1109/ASRU.2011.6163934
DO - 10.1109/ASRU.2011.6163934
M3 - Conference contribution
AN - SCOPUS:84858988783
SN - 978-1-4673-0367-5
T3 - 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings
SP - 220
EP - 225
BT - 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings
T2 - 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011
Y2 - 11 December 2011 through 15 December 2011
ER -