Wang, F, Zhang, W, Lai, S et al. (2 more authors) (2021) Dynamic GPU Energy Optimization for Machine Learning Training Workloads. IEEE Transactions on Parallel and Distributed Systems. p. 1. ISSN 1045-9219
Abstract
GPUs are widely used to accelerate the training of machine learning workloads. As machine learning models grow larger, they take longer to train, which in turn leads to higher GPU energy consumption. This paper presents GPOEO, an online GPU energy optimization framework for machine learning training workloads. GPOEO dynamically determines the optimal energy configuration by employing a set of novel techniques for online measurement, multi-objective prediction modeling, and search optimization. To characterize the target workload behavior, GPOEO utilizes GPU performance counters. To reduce the performance counter profiling overhead, it uses an analytical model to detect changes in the training iteration and reprofiles the performance counters only when an iteration shift is detected. GPOEO then uses multi-objective models, based on the gradient boosting method, and a local search algorithm to find a trade-off between execution time and energy consumption. We evaluate GPOEO by applying it to 71 machine learning workloads from two AI benchmark suites on an NVIDIA RTX3080Ti GPU. Compared with the NVIDIA default scheduling strategy, GPOEO delivers a mean energy saving of 16.2% with an average execution time increase of 5.1%.
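The time/energy trade-off search described in the abstract can be illustrated with a minimal sketch. It is not the GPOEO implementation. The abstract does not say which settings make up an "energy configuration", so the sketch assumes a single knob, a list of GPU clock gears. The functions `predicted_time_ratio` and `predicted_energy_ratio` are hypothetical toy formulas standing in for the paper's gradient-boosting models, and the 5% slowdown cap, the penalty weight, and `MAX_CLOCK_MHZ` are also assumptions chosen for illustration.

```python
from typing import Callable, Sequence

MAX_CLOCK_MHZ = 1900  # hypothetical top clock gear, not taken from the paper


def predicted_time_ratio(clock_mhz: float) -> float:
    """Toy stand-in for a time model: half of the runtime scales with 1/clock."""
    return 0.5 + 0.5 * MAX_CLOCK_MHZ / clock_mhz


def predicted_energy_ratio(clock_mhz: float) -> float:
    """Toy stand-in for an energy model: (static + cubic dynamic power) * time."""
    f = clock_mhz / MAX_CLOCK_MHZ
    return (0.3 + 0.7 * f ** 3) * predicted_time_ratio(clock_mhz)


def make_objective(max_slowdown: float) -> Callable[[float], float]:
    """Build a cost: predicted energy plus a penalty beyond the allowed slowdown."""
    def objective(clock_mhz: float) -> float:
        t = predicted_time_ratio(clock_mhz)
        excess = max(0.0, t - (1.0 + max_slowdown))
        return predicted_energy_ratio(clock_mhz) + 10.0 * excess  # assumed weight
    return objective


def local_search(gears: Sequence[float], start: int,
                 objective: Callable[[float], float]) -> int:
    """Hill-climb over neighbouring gears; return the index of a local optimum."""
    best, best_cost = start, objective(gears[start])
    while True:
        neighbours = [i for i in (best - 1, best + 1) if 0 <= i < len(gears)]
        if not neighbours:
            return best
        cand = min(neighbours, key=lambda i: objective(gears[i]))
        cand_cost = objective(gears[cand])
        if cand_cost >= best_cost:
            return best  # no neighbour improves the cost
        best, best_cost = cand, cand_cost


if __name__ == "__main__":
    gears = list(range(1000, 2000, 100))  # 1000, 1100, ..., 1900 MHz (assumed)
    idx = local_search(gears, len(gears) - 1, make_objective(0.05))
    clock = gears[idx]
    print(clock, predicted_energy_ratio(clock), predicted_time_ratio(clock))
```

Starting from the highest gear mirrors a default, performance-first setting. The search then steps down only while the predicted energy saving outweighs the slowdown penalty, so it evaluates a handful of neighbouring gears rather than the whole configuration space.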
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | |
Copyright, Publisher and Additional Information: | © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Keywords: | Dynamic energy optimization, online application iteration detection, multi-objective machine learning, GPU |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 20 Jan 2022 15:18 |
Last Modified: | 20 Jan 2022 15:18 |
Status: | Published online |
Publisher: | IEEE |
Identification Number: | 10.1109/tpds.2021.3137867 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:182579 |