Lu, G, Zhang, W and Wang, Z orcid.org/0000-0001-6157-0662 (2021) Optimizing Depthwise Separable Convolution Operations on GPUs. IEEE Transactions on Parallel and Distributed Systems. p. 1. ISSN 1045-9219
Abstract
The depthwise separable convolution is widely used to reduce the computation overhead of multi-channel 2D convolutions. Existing implementations of depthwise separable convolutions target accelerating model training with large batch size with a large number of samples to be processed at once. Such approaches are inadequate for small-batch-sized model training and the typical scenario of model inference where the model takes in a few samples at once. This paper aims to bridge the gap of optimizing depthwise separable convolutions by targeting the GPU architecture. We achieve this by designing two novel algorithms to improve the column and row reuse of convolution operations to reduce the number of memory operations. Our approach employs a dynamic tile size scheme to adaptively distribute the computational data across GPU threads to improve the GPU utilization and to hide the memory access latency. We apply our approach on two GPU platforms: NVIDIA RTX 2080Ti and NVIDIA Jetson AGX Xavier GPUs, and two data types: 32-bit floating point (FP32) and 8-bit integer (INT8). We compared our approach against cuDNN that is heavily tuned for the NVIDIA GPU architecture. Experimental results show that our approach delivers over 2 (up to 3) performance improvement over cuDNN.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: |
|
Copyright, Publisher and Additional Information: | © 2021, IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Keywords: | Performance Optimization, Convolution, Depthwise, Pointwise, Memory Optimization, GPU Utilization |
Dates: |
|
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 07 Jun 2021 09:43 |
Last Modified: | 07 Jun 2021 09:45 |
Status: | Published online |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Identification Number: | 10.1109/tpds.2021.3084813 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:174797 |