TY - GEN
T1 - MIXCE
T2 - 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
AU - Zhang, Shiyue
AU - Wu, Shijie
AU - Irsoy, Ozan
AU - Lu, Steven
AU - Bansal, Mohit
AU - Dredze, Mark
AU - Rosenberg, David
N1 - Publisher Copyright: © 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - Autoregressive language models are trained by minimizing the cross-entropy of the model distribution Qθ relative to the data distribution P, that is, minimizing the forward cross-entropy, which is equivalent to maximum likelihood estimation (MLE). We have observed that models trained in this way may “over-generalize”, in the sense that they produce non-human-like text. Moreover, we believe that reverse cross-entropy, i.e., the cross-entropy of P relative to Qθ, is a better reflection of how a human would evaluate text generated by a model. Hence, we propose learning with MIXCE, an objective that mixes the forward and reverse cross-entropies. We evaluate models trained with this objective on synthetic data settings (where P is known) and real data, and show that the resulting models yield better generated text without complex decoding strategies.
UR - http://www.scopus.com/inward/record.url?scp=85174389304&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85174389304&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85174389304
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 9027
EP - 9050
BT - Long Papers
PB - Association for Computational Linguistics (ACL)
Y2 - 9 July 2023 through 14 July 2023
ER -