TY - GEN
T1 - Learning sub-word units for open vocabulary speech recognition
AU - Parada, Carolina
AU - Dredze, Mark
AU - Sethy, Abhinav
AU - Rastrow, Ariya
PY - 2011
Y1 - 2011
N2 - Large vocabulary speech recognition systems fail to recognize words beyond their vocabulary, many of which are information rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this problem by adding sub-word units to large vocabulary word based systems; new words can then be represented by combinations of subword units. Previous work heuristically created the sub-word lexicon from phonetic representations of text using simple statistics to select common phone sequences. We propose a probabilistic model to learn the subword lexicon optimized for a given task. We consider the task of out of vocabulary (OOV) word detection, which relies on output from a hybrid model. A hybrid model with our learned sub-word lexicon reduces error by 6.3% and 7.6% (absolute) at a 5% false alarm rate on an English Broadcast News and MIT Lectures task respectively.
AB - Large vocabulary speech recognition systems fail to recognize words beyond their vocabulary, many of which are information rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this problem by adding sub-word units to large vocabulary word based systems; new words can then be represented by combinations of subword units. Previous work heuristically created the sub-word lexicon from phonetic representations of text using simple statistics to select common phone sequences. We propose a probabilistic model to learn the subword lexicon optimized for a given task. We consider the task of out of vocabulary (OOV) word detection, which relies on output from a hybrid model. A hybrid model with our learned sub-word lexicon reduces error by 6.3% and 7.6% (absolute) at a 5% false alarm rate on an English Broadcast News and MIT Lectures task respectively.
UR - http://www.scopus.com/inward/record.url?scp=84859091119&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84859091119&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84859091119
SN - 9781932432879
T3 - ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
SP - 712
EP - 721
BT - ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics
T2 - 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011
Y2 - 19 June 2011 through 24 June 2011
ER -