Lu, G, Zhang, W and Wang, Z (orcid.org/0000-0001-6157-0662) (2020) Optimizing GPU Memory Transactions for Convolution Operations. In: Proceedings of 2020 IEEE International Conference on Cluster Computing. 2020 IEEE International Conference on Cluster Computing (Cluster 2020), 14-17 Sep 2020, Kobe, Japan (Online). IEEE. ISBN 978-1-7281-6678-0
Abstract
Convolution is a common operation in deep neural networks (DNNs) and is often responsible for performance bottlenecks during training and inference. Existing approaches for accelerating convolution operations aim to reduce computational complexity. However, these strategies often increase the memory footprint with extra memory accesses, leaving much room for performance improvement. This paper presents a novel approach to optimizing memory access for convolution operations, specifically targeting GPU execution. Our approach leverages two optimization techniques to reduce the number of memory operations for convolutions performed along the width and height dimensions. For convolution along the width dimension, we exploit shuffle instructions to exchange the overlapped columns of the input, reducing the number of memory transactions. For convolution along the height dimension, we multiply each overlapped row of the input with multiple rows of a filter to compute multiple output elements, improving the data locality of row elements. We apply our approach to 2D and multi-channel 2D convolutions on an NVIDIA 2080Ti GPU. For 2D convolution, our approach delivers over 2× faster performance than the state-of-the-art image processing libraries. For multi-channel 2D convolutions, we obtain up to 1.3× speedups over the fastest cuDNN algorithm.
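The two memory optimizations described above can be illustrated with a small CPU simulation. This is a sketch under stated assumptions, not the authors' CUDA implementation: it models 32-lane warps in Python, and the helpers `shfl_down` and `shfl_up` are hypothetical stand-ins for CUDA's `__shfl_down_sync` and `__shfl_up_sync` intrinsics. Along the width, each lane loads one input element (lanes 0 to K-2 also load one halo element) and receives the overlapped columns from other lanes through shuffles instead of re-reading memory. Along the height, each loaded row tile is multiplied by every filter row it overlaps, so a single load feeds up to K output rows. The names `load_row`, `shuffle_dot`, `conv2d_shuffle` and `conv2d_naive` are illustrative, convolution is computed as cross-correlation (the usual DNN convention), and the `loads` counter counts element reads in this toy model only, not the paper's measurements.

```python
# Illustrative CPU model of two GPU memory optimizations for convolution.
# Hypothetical helpers; this is NOT the authors' CUDA code.
WARP = 32  # lanes per warp on NVIDIA GPUs


def shfl_down(regs, delta):
    # Stand-in for __shfl_down_sync: lane i reads lane i + delta;
    # lanes whose source lane is out of range keep their own value.
    return [regs[i + delta] if i + delta < WARP else regs[i] for i in range(WARP)]


def shfl_up(regs, delta):
    # Stand-in for __shfl_up_sync: lane i reads lane i - delta;
    # lanes whose source lane is out of range keep their own value.
    return [regs[i - delta] if i - delta >= 0 else regs[i] for i in range(WARP)]


def load_row(row, base, k, stats):
    """Load one 32-wide tile of an input row into per-lane registers.

    Every lane loads one element (r0); lanes 0..k-2 also load one halo
    element (r1). Out-of-range reads are skipped, like guarded loads."""
    def load(idx):
        if idx < len(row):
            stats["loads"] += 1
            return row[idx]
        return 0.0

    r0 = [load(base + lane) for lane in range(WARP)]
    r1 = [load(base + WARP + lane) if lane < k - 1 else 0.0 for lane in range(WARP)]
    return r0, r1


def shuffle_dot(r0, r1, frow):
    """Lane x computes sum_k in[base + x + k] * frow[k] with no extra loads:
    the overlapped columns are fetched from other lanes' registers."""
    acc = [0.0] * WARP
    for k, weight in enumerate(frow):
        lo = shfl_down(r0, k)       # element held in r0 by lane x + k
        hi = shfl_up(r1, WARP - k)  # element held in r1 by lane x + k - 32
        for lane in range(WARP):
            acc[lane] += (lo[lane] if lane + k < WARP else hi[lane]) * weight
    return acc


def conv2d_shuffle(img, filt, stats):
    """Valid 2-D correlation. Each input row tile is loaded once and
    multiplied by every filter row it overlaps (row reuse along the height)."""
    h, w, k = len(img), len(img[0]), len(filt)
    assert k - 1 <= WARP, "halo must fit in one extra register per lane"
    oh, ow = h - k + 1, w - k + 1
    out = [[0.0] * ow for _ in range(oh)]
    for base in range(0, ow, WARP):
        for r in range(h):
            r0, r1 = load_row(img[r], base, k, stats)
            for j in range(k):
                y = r - j  # output row fed by input row r through filter row j
                if 0 <= y < oh:
                    acc = shuffle_dot(r0, r1, filt[j])
                    for lane in range(min(WARP, ow - base)):
                        out[y][base + lane] += acc[lane]
    return out


def conv2d_naive(img, filt, stats):
    """Reference version: every output element re-reads its K x K window."""
    h, w, k = len(img), len(img[0]), len(filt)
    oh, ow = h - k + 1, w - k + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            for j in range(k):
                for i in range(k):
                    stats["loads"] += 1
                    out[y][x] += img[y + j][x + i] * filt[j][i]
    return out


if __name__ == "__main__":
    img = [[(3 * r + c) % 7 for c in range(40)] for r in range(6)]
    filt = [[1, 0, -1], [2, 1, 0], [0, 1, 1]]
    fast, slow = {"loads": 0}, {"loads": 0}
    assert conv2d_shuffle(img, filt, fast) == conv2d_naive(img, filt, slow)
    print(fast["loads"], slow["loads"])  # prints 252 1368
```

For a 6×40 input and a 3×3 filter, both versions produce the same output, while the shuffle-and-reuse model performs 252 element loads against 1368 for the naive loop. Because `r1` only needs the K-1 columns that spill past a warp's 32-element tile, each lane issues at most two loads per row tile, which is why the sketch requires K-1 ≤ 32. It models data movement only; coalescing, occupancy and register pressure, which matter on a real GPU, are not simulated.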
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Lu, G, Zhang, W and Wang, Z (orcid.org/0000-0001-6157-0662) |
Copyright, Publisher and Additional Information: | © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Keywords: | Performance Optimization, Convolution, Memory Optimization, GPUs |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 18 Aug 2020 12:04 |
Last Modified: | 10 Dec 2020 00:53 |
Status: | Published |
Publisher: | IEEE |
Identification Number: | 10.1109/CLUSTER49012.2020.00050 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:164433 |