Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Much of the recent progress in NLU has been shown to stem from models learning dataset-specific heuristics. We conduct a case study of generalization in NLI (from MNLI to the adversarially constructed HANS dataset) across a range of BERT-based architectures (adapters, Siamese Transformers, HEX debiasing), as well as with subsampling the data and increasing the model size. We report two successful and three unsuccessful strategies, all providing insights into how Transformer-based models learn to generalize.
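
The setup summarized above fine-tunes BERT variants on MNLI and tests whether they transfer to HANS, whose examples are built to break shallow heuristics. Below is a minimal sketch of that evaluation loop, assuming the HuggingFace transformers and datasets libraries; the checkpoint name and its label ordering are assumptions for illustration, not the authors' actual models.

    # Minimal sketch of MNLI -> HANS evaluation (not the authors' code).
    # Assumes HuggingFace `transformers`/`datasets`; checkpoint name and
    # label ordering are assumptions.
    import torch
    from datasets import load_dataset
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    MODEL = "textattack/bert-base-uncased-MNLI"  # assumed MNLI-finetuned checkpoint
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL).eval()

    hans = load_dataset("hans", split="validation")  # adversarial NLI pairs

    correct = 0
    for ex in hans:
        enc = tokenizer(ex["premise"], ex["hypothesis"],
                        return_tensors="pt", truncation=True)
        with torch.no_grad():
            pred = model(**enc).logits.argmax(dim=-1).item()
        # HANS is binary (0 = entailment, 1 = non-entailment), so MNLI's
        # neutral and contradiction classes are collapsed into non-entailment.
        # We assume label index 0 = entailment for this checkpoint.
        pred_binary = 0 if pred == 0 else 1
        correct += int(pred_binary == ex["label"])

    print(f"HANS accuracy: {correct / len(hans):.3f}")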
Original language: English
Title of host publication: Proceedings of the Second Workshop on Insights from Negative Results in NLP
Number of pages: 11
Place of publication: Online and Punta Cana, Dominican Republic
Publisher: Association for Computational Linguistics (ACL)
Publication date: 1 Nov 2021
Pages: 125-135
Publication status: Published - 1 Nov 2021

Research areas

• t/generalization, task/NLI
