BERT Busters: Outlier Dimensions That Disrupt Transformers

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Multiple studies have shown that Transformers are remarkably robust to pruning. Contrary to this received wisdom, we demonstrate that pre-trained Transformer encoders are surprisingly fragile to the removal of a very small number of features in the layer outputs.
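A minimal sketch of the kind of ablation the abstract describes, assuming the HuggingFace transformers library and BERT: selected hidden-state features are zeroed in every layer's output via forward hooks, and the masked-LM prediction is inspected before and after. The dimension index used here is an arbitrary placeholder, not an outlier reported in the paper, and zeroing is only one way to "remove" a feature.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

DIMS_TO_DISABLE = [42]  # placeholder feature index, a tiny fraction of the hidden size

def zero_dims_hook(module, inputs, output):
    # Zero the selected hidden-state features in this layer's output.
    output = output.clone()
    output[..., DIMS_TO_DISABLE] = 0.0
    return output

# Hook each encoder layer's output LayerNorm so the chosen features are
# removed from every layer's output representation.
handles = [
    layer.output.LayerNorm.register_forward_hook(zero_dims_hook)
    for layer in model.bert.encoder.layer
]

text = "The capital of France is [MASK]."
batch = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits

# Decode the model's top prediction for the masked position.
mask_pos = (batch["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
print(tokenizer.decode(logits[0, mask_pos].argmax().item()))

# Remove the hooks to restore the unmodified model.
for h in handles:
    h.remove()
```

Running the same snippet with and without the hooks registered gives a quick before/after comparison of masked-LM behaviour under this kind of feature removal.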
Original language: English
Title of host publication: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
Number of pages: 14
Place of Publication: Online
Publisher: Association for Computational Linguistics (ACL)
Publication date: 1 Aug 2021
Pages: 3392-3405
DOIs
Publication status: Published - 1 Aug 2021
