Abstract
Memory usage is becoming an increasingly pressing bottleneck in the training process of Deep Neural Networks (DNNs), especially when training on Graphics Processing Units (GPUs). Existing solutions for multi-GPU training setups partition the neural network over the GPUs in a way that favors training throughput over memory usage, and thus over the maximum trainable network size. We propose mCAP, a partitioning solution for pipeline-parallel DNN training that focuses specifically on memory usage. It evenly distributes Deep Learning models over the available resources with respect to per-device peak memory usage. Our partitioning approach uses a novel incremental profiling strategy to extract per-layer memory usage statistics. A model-based predictor uses the profiling data to recommend a partitioning that balances peak memory usage across devices. Our approach is DL-framework agnostic and orthogonal to existing memory optimizations found in large-scale DNN training systems. Our results show that our approach enables the training of neural networks that are 1.55 times larger, in terms of parameter count, than those trainable with existing partitioning solutions.
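To make the balancing objective from the abstract concrete, the sketch below illustrates one classical way to balance per-stage peak memory: binary-searching the smallest per-stage memory budget under which profiled layers still fit into the requested number of contiguous pipeline stages. This is a minimal illustration, not the mCAP algorithm itself; it assumes, for simplicity, that a stage's peak memory is roughly the sum of its layers' footprints, and all function names and example numbers are hypothetical.

```python
# Minimal sketch (NOT the paper's mCAP method): partition profiled layers
# into contiguous pipeline stages so that the largest per-stage memory
# footprint is minimized. Assumes stage memory is additive over layers,
# which is a simplification; all names and numbers are illustrative.

from typing import List


def stages_needed(layer_mem: List[float], budget: float) -> int:
    """Greedily count the stages needed if no stage may exceed `budget`."""
    stages, current = 1, 0.0
    for m in layer_mem:
        if current + m > budget:
            stages += 1
            current = m
        else:
            current += m
    return stages


def balance_partition(layer_mem: List[float], num_stages: int) -> List[List[float]]:
    """Binary-search the smallest per-stage budget that fits `num_stages`
    contiguous stages, then materialize the stages under that budget."""
    lo, hi = max(layer_mem), sum(layer_mem)
    while hi - lo > 1e-6:
        mid = (lo + hi) / 2
        if stages_needed(layer_mem, mid) <= num_stages:
            hi = mid  # feasible budget: try smaller
        else:
            lo = mid  # infeasible: need a larger budget
    # Greedily cut stages under the found (feasible) budget `hi`.
    stages, current = [[]], 0.0
    for m in layer_mem:
        if current + m > hi and stages[-1]:
            stages.append([])
            current = 0.0
        stages[-1].append(m)
        current += m
    return stages


if __name__ == "__main__":
    # Hypothetical per-layer peak memory (GB), as a profiling pass might report.
    mem = [1.2, 0.8, 2.5, 1.1, 0.9, 3.0, 1.4, 0.6]
    for i, stage in enumerate(balance_partition(mem, num_stages=4)):
        print(f"stage {i}: layers={stage}, peak~{sum(stage):.2f} GB")
```

In practice, per-stage peak memory is not simply additive over layers (activations, gradients, and optimizer state interact), which is why the paper relies on incremental profiling and a model-based predictor rather than raw per-layer sums.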
Original language | English |
---|---|
Title of host publication | Euro-Par 2022: Parallel Processing |
Subtitle of host publication | 28th International Conference on Parallel and Distributed Computing, Glasgow, UK, August 22–26, 2022, Proceedings |
Editors | José Cano, Phil Trinder |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 155-170 |
Number of pages | 16 |
ISBN (Electronic) | 9783031125973 |
ISBN (Print) | 9783031125966 |
DOIs | |
Publication status | Published - 2022 |
Event | 28th International European Conference on Parallel and Distributed Computing, Euro-Par 2022 - Glasgow, United Kingdom |
Duration | 22 Aug 2022 → 26 Aug 2022 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 13440 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 28th International European Conference on Parallel and Distributed Computing, Euro-Par 2022 |
---|---|
Country/Territory | United Kingdom |
City | Glasgow |
Period | 22/08/22 → 26/08/22 |
Bibliographical note
Funding Information: The authors thank the anonymous reviewers for their valuable feedback. This work is part of the Efficient Deep Learning (EDL) programme (grant number P16-25), financed by the Dutch Research Council (NWO). This work was carried out on the Dutch national e-infrastructure with the support of SURF Cooperative. The datasets generated and/or analysed during the current study are available in the Figshare repository: https://doi.org/10.6084/m9.figshare.20000960 [4].
Publisher Copyright:
© 2022, Springer Nature Switzerland AG.
Keywords
- Deep Learning
- HPC
- Pipeline Parallelism