Transformer Uncertainty Estimation with Hierarchical Stochastic Attention

Jiahuan Pei, Wang Cheng, György Szarvas

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Transformers are state-of-the-art in a wide range of NLP tasks and have also been applied to many real-world products. Understanding the reliability and certainty of transformer model predictions is crucial for building trustable machine learning applications, e.g., medical diagnosis. Although many recent transformer extensions have been proposed, the study of the uncertainty estimation of transformer models is under-explored. In this work, we propose a novel way to enable transformers to have the capability of uncertainty estimation and, meanwhile, retain the original predictive performance. This is achieved by learning a hierarchical stochastic self-attention that attends to values and a set of learnable centroids, respectively. Then new attention heads are formed with a mixture of sampled centroids using the Gumbel-Softmax trick. We theoretically show that the self-attention approximation by sampling from a Gumbel distribution is upper bounded. We empirically evaluate our model on two text classification tasks with both in-domain (ID) and out-of-domain (OOD) datasets. The experimental results demonstrate that our approach: (1) achieves the best predictive performance and uncertainty trade-off among compared methods; (2) exhibits very competitive (in most cases, improved) predictive performance on ID datasets; (3) is on par with Monte Carlo dropout and ensemble methods in uncertainty estimation on OOD datasets.
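
The Gumbel-Softmax trick mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `gumbel_softmax`, the temperature `tau`, the tensor shapes, and the way the sampled weights are mixed with the centroids are all assumptions chosen only to show the sampling step.

```python
import numpy as np


def gumbel_softmax(logits, tau=1.0, rng=None):
    """Draw a relaxed (soft) one-hot sample per row of `logits`.

    Hypothetical helper for illustration; `tau` is the softmax temperature.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)), u ~ Uniform(0, 1).
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    y = (logits + g) / tau
    # Subtract the row maximum for a numerically stable softmax.
    y = y - y.max(axis=-1, keepdims=True)
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


# Assumed toy sizes: 4 tokens, 3 learnable centroids, dimension 8.
rng = np.random.default_rng(0)
n_tokens, n_centroids, d = 4, 3, 8
keys = rng.normal(size=(n_tokens, d))
centroids = rng.normal(size=(n_centroids, d))

# Score each token against each centroid, then sample stochastic mixture weights.
logits = keys @ centroids.T / np.sqrt(d)
weights = gumbel_softmax(logits, tau=0.5, rng=rng)

# Each token's representation becomes a sampled mixture of centroids.
stochastic_keys = weights @ centroids
```

Running the sampling step several times with different random draws yields different `weights`, which is the source of stochasticity that an uncertainty estimate could be built on; lowering `tau` pushes each row of `weights` closer to a one-hot selection of a single centroid.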
Original language: English
Title of host publication: Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022
Publisher: Association for the Advancement of Artificial Intelligence
Pages: 11147-11155
ISBN (Electronic): 1577358767, 9781577358763
Publication status: Published - 30 Jun 2022
Externally published: Yes
Event: 36th AAAI Conference on Artificial Intelligence, AAAI 2022 - Virtual, Online
Duration: 22 Feb 2022 to 1 Mar 2022

Conference

Conference: 36th AAAI Conference on Artificial Intelligence, AAAI 2022
City: Virtual, Online
Period: 22/02/22 to 1/03/22
