TY - GEN
T1 - Multi-Task Transfer Matters During Instruction-Tuning
AU - Mueller, David
AU - Dredze, Mark
AU - Andrews, Nicholas
N1 - Publisher Copyright:
© 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
AB - Instruction-tuning trains a language model jointly on hundreds of tasks to improve its ability to learn in-context, from task descriptions, task samples, or both; however, the mechanisms that drive in-context learning (ICL) are poorly understood and, as a result, so is the role of instruction-tuning in in-context generalization. In this work, we study the impact of instruction-tuning on multi-task transfer: how well a model's parameters adapt to an unseen task via fine-tuning. We find that instruction-tuning negatively impacts a model's transfer to unseen tasks, and that model transfer and in-context generalization are highly correlated, suggesting that this catastrophic forgetting may impact in-context learning. We study methods to improve model transfer, finding that multi-task training (how well the training tasks are optimized) can significantly impact ICL generalization; additionally, we find that continual training on unsupervised pre-training data can mitigate forgetting and improve ICL generalization as well. Finally, we demonstrate that, early in training, the effect of instruction-tuning on a model's transfer to a task impacts in-context generalization on that task. Overall, we provide significant evidence that multi-task transfer is deeply connected to a model's ability to learn a task in-context.
UR - http://www.scopus.com/inward/record.url?scp=85205323259&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85205323259&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85205323259
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 14880
EP - 14891
BT - 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Proceedings of the Conference
A2 - Ku, Lun-Wei
A2 - Martins, Andre
A2 - Srikumar, Vivek
PB - Association for Computational Linguistics (ACL)
T2 - Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Y2 - 11 August 2024 through 16 August 2024
ER -