Zero-shot Cross-lingual Transfer is Under-specified Optimization

Shijie Wu, Benjamin Van Durme, Mark Dredze

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Pretrained multilingual encoders enable zero-shot cross-lingual transfer, but often produce unreliable models that exhibit high performance variance on the target language. We postulate that this high variance results from zero-shot cross-lingual transfer solving an under-specified optimization problem. We show that any linearly interpolated model between the source language monolingual model and the source + target bilingual model has equally low source language generalization error, yet the target language generalization error decreases smoothly and linearly as we move from the monolingual to the bilingual model, suggesting that the model struggles to identify good solutions for both source and target languages using the source language alone. Additionally, we show that the zero-shot solution lies in a non-flat region of the target language generalization error surface, causing the high variance.
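The linear interpolation described in the abstract can be illustrated with a minimal sketch. It assumes the usual convex combination of parameters, (1 - alpha) * theta_mono + alpha * theta_bi; the abstract does not spell out the exact formula, so this form is an assumption. The function name `interpolate`, the parameter names `w` and `b`, and the toy values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# A model is represented here as a mapping from parameter name to a flat
# list of floats. This layout is an illustrative assumption.
Params = Dict[str, List[float]]


def interpolate(theta_mono: Params, theta_bi: Params, alpha: float) -> Params:
    """Return (1 - alpha) * theta_mono + alpha * theta_bi, parameter by parameter."""
    if theta_mono.keys() != theta_bi.keys():
        raise ValueError("both models must share the same parameter names")
    out: Params = {}
    for name in theta_mono:
        a, b = theta_mono[name], theta_bi[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        out[name] = [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return out


# Toy parameters standing in for the monolingual and bilingual models.
theta_mono = {"w": [0.0, 2.0], "b": [1.0]}
theta_bi = {"w": [4.0, 0.0], "b": [3.0]}

# Five evenly spaced points on the path from the monolingual model
# (alpha = 0) to the bilingual model (alpha = 1).
path = [interpolate(theta_mono, theta_bi, k / 4) for k in range(5)]
```

In an experiment of the kind the abstract describes, each model on `path` would be evaluated on source and target language data to trace how the two generalization errors change along the interpolation.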

Original language: English (US)
Title of host publication: ACL 2022 - 7th Workshop on Representation Learning for NLP, RepL4NLP 2022 - Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 236-248
Number of pages: 13
ISBN (Electronic): 9781955917483
State: Published - 2022
Event: 7th Workshop on Representation Learning for NLP, RepL4NLP 2022 at ACL 2022 - Dublin, Ireland
Duration: May 26 2022 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 7th Workshop on Representation Learning for NLP, RepL4NLP 2022 at ACL 2022
Country/Territory: Ireland
City: Dublin
Period: 5/26/22 → …

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
