Multi-Task Transfer Matters During Instruction-Tuning

David Mueller, Mark Dredze, Nicholas Andrews

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Instruction-tuning trains a language model jointly on hundreds of tasks to improve its ability to learn in-context, whether from task descriptions, task samples, or both. However, the mechanisms that drive in-context learning are poorly understood and, as a result, so is the role of instruction-tuning in in-context generalization. In this work, we study the impact of instruction-tuning on multi-task transfer: how well a model's parameters adapt to an unseen task via fine-tuning. We find that instruction-tuning negatively impacts a model's transfer to unseen tasks, and that model transfer and in-context generalization are highly correlated, suggesting that this catastrophic forgetting may also affect in-context learning. We study methods to improve model transfer, finding that multi-task training (how well the training tasks are optimized) can significantly impact ICL generalization; additionally, we find that continual training on unsupervised pre-training data can mitigate forgetting and likewise improve ICL generalization. Finally, we demonstrate that, early in training, the impact of instruction-tuning on a model's transfer to a task affects in-context generalization on that task. Overall, we provide significant evidence that multi-task transfer is deeply connected to a model's ability to learn a task in-context.

Original language: English (US)
Title of host publication: 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Proceedings of the Conference
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics (ACL)
Pages: 14880-14891
Number of pages: 12
ISBN (Electronic): 9798891760998
State: Published - 2024
Event: Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Hybrid, Bangkok, Thailand
Duration: Aug 11, 2024 - Aug 16, 2024

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Country/Territory: Thailand
City: Hybrid, Bangkok
Period: 8/11/24 - 8/16/24

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
