The Impact of Knowledge Distillation on the Energy Consumption and Runtime Efficiency of NLP Models

Ye Yuan, Jiacheng Shi, Zongyao Zhang, Kaiwei Chen, Jingzhi Zhang, Vincenzo Stoico, Ivano Malavolta*

*Corresponding author for this work

Research output: Chapter in Book / Report / Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Context. While models like BERT and GPT are powerful, they require substantial resources. Knowledge distillation can be employed to improve their efficiency, yet we lack a clear understanding of its impact on performance and energy consumption. This uncertainty is a major concern in practical applications, where these models can strain resources and limit accessibility for developers with limited means. Our work is also driven by the pressing need for environmentally friendly and sustainable applications in light of growing environmental concerns; addressing it requires accurately measuring the energy consumption of these models. Goal. This study aims to determine how knowledge distillation affects the energy consumption and performance of NLP models. Method. We benchmark BERT, Distilled-BERT, GPT-2, and Distilled-GPT-2 on three tasks from three different categories, selected from a third-party dataset. We measure and statistically analyze the energy consumption, CPU utilization, memory utilization, and inference time of the considered NLP models. Results. We observed notable differences between the original and distilled versions of the measured NLP models: distilled versions tend to consume less energy, and Distilled-GPT-2 also uses less CPU. Conclusion. The results of this study highlight the critical impact of model choice on performance and energy consumption metrics. Future research should consider a wider range of distilled models, diverse benchmarks, and deployment environments, and should further explore the ecological footprint of these models, particularly in the context of environmental sustainability.
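The benchmarking procedure described in the abstract (repeated inference runs, per-run resource measurements, then statistical comparison) can be sketched with Python's standard library. This is a minimal illustration, not the paper's actual harness: the model callables below are hypothetical stand-ins for an original and a distilled model, the memory figure is Python-level allocation only, and real energy and CPU-utilization readings would come from an external power profiler.

```python
import time
import tracemalloc
import statistics

def benchmark(model_fn, inputs, runs=5):
    """Run `model_fn` over `inputs` several times, recording wall-clock
    time per run and peak Python-level memory as rough efficiency proxies."""
    times, peaks = [], []
    for _ in range(runs):
        tracemalloc.start()
        start = time.perf_counter()
        for x in inputs:
            model_fn(x)  # inference call for one input
        times.append(time.perf_counter() - start)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peaks.append(peak)
    return {
        "mean_time_s": statistics.mean(times),
        "peak_mem_bytes": max(peaks),
    }

# Hypothetical stand-ins: a "full" model doing more work than its
# "distilled" counterpart (placeholders for BERT vs. Distilled-BERT).
full_model = lambda x: sum(i * i for i in range(50_000))
distilled_model = lambda x: sum(i * i for i in range(10_000))

full = benchmark(full_model, range(10))
distilled = benchmark(distilled_model, range(10))
```

Comparing `full["mean_time_s"]` against `distilled["mean_time_s"]` across repeated runs mirrors the study's statistical comparison of inference time between original and distilled variants.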

Original language: English
Title of host publication: CAIN '24
Subtitle of host publication: Proceedings of the IEEE/ACM 3rd International Conference on AI Engineering - Software Engineering for AI
Publisher: Association for Computing Machinery, Inc
Pages: 129-133
Number of pages: 5
ISBN (Electronic): 9798400705915
DOIs
Publication status: Published - 2024
Event: 3rd International Conference on AI Engineering, CAIN 2024, co-located with the 46th International Conference on Software Engineering, ICSE 2024 - Lisbon, Portugal
Duration: 14 Apr 2024 - 15 Apr 2024

Conference

Conference: 3rd International Conference on AI Engineering, CAIN 2024, co-located with the 46th International Conference on Software Engineering, ICSE 2024
Country/Territory: Portugal
City: Lisbon
Period: 14/04/24 - 15/04/24

Bibliographical note

Publisher Copyright:
© 2024 Copyright held by the owner/author(s).

Funding

Funders (funder number):
Rijksdienst voor Ondernemend Nederland
Horizon 2020 Framework Programme
Horizon 2020
H2020 Marie Skłodowska-Curie Actions (871342)
European Cooperation in Science and Technology (CA19135)
