Laughing Hyena Distillery: Extracting Compact Recurrences from Convolutions

Stefano Massaroli, Michael Poli, Daniel Y. Fu, Hermann Kumbong, Rom N. Parnichkun, Aman Timalsina, David W. Romero, Quinn McIntyre, Beidi Chen, Atri Rudra, Ce Zhang, Christopher Ré, Stefano Ermon, Yoshua Bengio

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review


Abstract

Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers. In particular, long convolution sequence models have achieved state-of-the-art performance in many domains, but incur a significant cost during auto-regressive inference workloads: naively, they require a full pass over the input sequence (or a cache of activations) for each generated token, similarly to attention-based models. In this paper, we seek to enable O(1) compute and memory cost per token in any pre-trained long convolution architecture, reducing the memory footprint and increasing throughput during generation. Concretely, our method consists of extracting low-dimensional linear state-space models from each convolution layer, building on rational interpolation and model-order reduction techniques. We further introduce architectural improvements to convolution-based layers such as Hyena: by weight-tying the filters across channels into heads, we achieve higher pretraining quality and reduce the number of filters to be distilled. The resulting model achieves 10× higher throughput than Transformers and 1.5× higher than Hyena at 1.3B parameters, without any loss in quality after distillation.
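As a rough illustration of why a distilled state-space representation allows O(1) compute and memory per generated token, the following NumPy sketch (not the authors' code; the state dimension, parameter values, and names such as filter_from_ssm and generate_recurrent are illustrative assumptions) builds a small diagonal state-space model, derives the long-convolution filter it induces, and checks that evaluating the model as a recurrence reproduces the causal convolution while touching only a fixed-size state at each step. The distillation procedure itself (rational interpolation and model-order reduction) is out of scope of this sketch.

import numpy as np

# Minimal sketch: once a long convolution filter has been distilled into a
# low-dimensional diagonal state-space model (A, B, C), each new token costs
# O(d_state) work instead of a full pass over the prefix.
d_state = 16                                     # assumed small state dimension after distillation
rng = np.random.default_rng(0)

# Hypothetical distilled SSM parameters (stand-ins for the output of the
# distillation step, which is not shown here).
A = 0.9 * np.exp(2j * np.pi * rng.random(d_state))   # diagonal transition, stable (|A| < 1)
B = rng.standard_normal(d_state) + 0j
C = rng.standard_normal(d_state) + 0j

def filter_from_ssm(L):
    # Implicit long-convolution filter induced by the SSM: h[t] = C^T A^t B.
    t = np.arange(L)[:, None]
    return (C * (A ** t) * B).sum(-1).real           # shape (L,)

def generate_recurrent(u):
    # Auto-regressive-style evaluation: constant work and memory per element.
    x = np.zeros(d_state, dtype=complex)             # recurrent state
    ys = []
    for u_t in u:
        x = A * x + B * u_t                          # state update
        ys.append((C @ x).real)                      # readout
    return np.array(ys)

# Check: the recurrence matches the causal convolution with the induced filter.
L = 64
u = rng.standard_normal(L)
h = filter_from_ssm(L)
y_conv = np.array([np.dot(h[:t + 1][::-1], u[:t + 1]) for t in range(L)])
y_rec = generate_recurrent(u)
print(np.allclose(y_conv, y_rec))                    # True (up to numerical error)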

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 36 (NeurIPS 2023)
Subtitle of host publication: [Proceedings]
Editors: A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, S. Levine
Publisher: NeurIPS
Pages: 1-45
Number of pages: 45
ISBN (Electronic): 9781713899921
DOIs
Publication status: Published - 2023
Event: 37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: 10 Dec 2023 – 16 Dec 2023

Conference

Conference: 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Country/Territory: United States
City: New Orleans
Period: 10/12/23 – 16/12/23

Bibliographical note

Publisher Copyright:
© 2023 Neural Information Processing Systems Foundation. All rights reserved.
