TY  - CONF
T1 - Everything Is All It Takes: A Multipronged Strategy for Zero-Shot Cross-Lingual Information Extraction
T2 - 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
AU - Yarmohammadi, Mahsa
AU - Wu, Shijie
AU - Marone, Marc
AU - Xu, Haoran
AU - Ebner, Seth
AU - Qin, Guanghui
AU - Chen, Yunmo
AU - Guo, Jialiang
AU - Harman, Craig
AU - Murray, Kenton
AU - White, Aaron Steven
AU - Dredze, Mark
AU - Van Durme, Benjamin
N1 - Funding Information:
We thank the anonymous reviewers for their valuable comments. We thank João Sedoc for helpful discussions and Shabnam Behzad for post-submission experiments. This work was supported in part by IARPA BETTER (#2019-19051600005). The views and conclusions contained in this work are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, or endorsements of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein. This research made use of the following open-source software: AllenNLP (Gardner et al., 2018), FairSeq (Ott et al., 2019), NumPy (Harris et al., 2020), PyTorch (Paszke et al., 2017), PyTorch lightning (Falcon, 2019), scikit-learn (Pedregosa et al., 2011), and Transformers (Wolf et al., 2020).
N1 - Fei et al. (2020) and Daza and Frank (2020) also find improvements when training on a mixture of gold source language data and projected silver target language data. Ideas from domain adaptation can be used to make more effective use of gold and silver data to mitigate the effects of language shift (Xu et al., 2021). Improvements to task-specific models for zero-shot transfer are orthogonal to our work. For example, language-specific information can be incorporated using language indicators or embeddings (Johnson et al., 2017), contextual parameter generators (Platanios et al., 2018), or language-specific semantic spaces (Luo et al., 2021). Conversely, adversarial training (Ganin et al., 2016) has been used to discourage models from learning language-specific information (Chen et al., 2018; Keung et al., 2019; Ahmad et al., 2019).
Publisher Copyright:
© 2021 Association for Computational Linguistics
PY - 2021
Y1 - 2021
N2 - Zero-shot cross-lingual information extraction (IE) describes the construction of an IE model for some target language, given existing annotations exclusively in some other language, typically English. While the advance of pretrained multilingual encoders suggests an easy optimism of "train on English, run on any language", we find through a thorough exploration and extension of techniques that a combination of approaches, both new and old, leads to better performance than any one cross-lingual strategy in particular. We explore techniques including data projection and self-training, and how different pretrained encoders impact them. We use English-to-Arabic IE as our initial example, demonstrating strong performance in this setting for event extraction, named entity recognition, part-of-speech tagging, and dependency parsing. We then apply data projection and self-training to three tasks across eight target languages. Because no single set of techniques performs the best across all tasks, we encourage practitioners to explore various configurations of the techniques described in this work when seeking to improve on zero-shot training.
AB - Zero-shot cross-lingual information extraction (IE) describes the construction of an IE model for some target language, given existing annotations exclusively in some other language, typically English. While the advance of pretrained multilingual encoders suggests an easy optimism of "train on English, run on any language", we find through a thorough exploration and extension of techniques that a combination of approaches, both new and old, leads to better performance than any one cross-lingual strategy in particular. We explore techniques including data projection and self-training, and how different pretrained encoders impact them. We use English-to-Arabic IE as our initial example, demonstrating strong performance in this setting for event extraction, named entity recognition, part-of-speech tagging, and dependency parsing. We then apply data projection and self-training to three tasks across eight target languages. Because no single set of techniques performs the best across all tasks, we encourage practitioners to explore various configurations of the techniques described in this work when seeking to improve on zero-shot training.
UR - http://www.scopus.com/inward/record.url?scp=85115689468&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85115689468&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85115689468
T3 - EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
SP - 1950
EP - 1967
BT - EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
PB - Association for Computational Linguistics (ACL)
Y2 - 7 November 2021 through 11 November 2021
ER -