Wang, P., Yang, W., Fang, J. et al. (5 more authors) (2023) Optimizing Direct Convolutions on ARM Multi-Cores. In: Proceedings of SC23: The International Conference for High Performance Computing, Networking, Storage, and Analysis. SC23: The International Conference for High Performance Computing, Networking, Storage, and Analysis, 12-17 Nov 2023, Denver, USA. ACM ISBN 9798400701092
Abstract
Convolution kernels are widely seen in deep learning workloads and are often responsible for performance bottlenecks. Recent research has demonstrated that a direct convolution approach can outperform the traditional convolution implementation based on tensor-to-matrix conversions. However, existing approaches for direct convolution still have room for performance improvement. We present nDirect, a new direct convolution approach that targets ARM-based multi-core CPUs commonly found in smartphones and HPC systems. nDirect is designed to be compatible with the data layout formats used by mainstream deep learning frameworks but offers new optimizations for the computational kernel, data packing, and parallelization. We evaluate nDirect by applying it to representative convolution kernels and demonstrating its performance on four distinct ARM multi-core CPU platforms. We compare nDirect against state-of-the-art convolution optimization techniques. Experimental results show that nDirect gives the best overall performance across evaluation scenarios and platforms.
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | This item is protected by copyright. This is an author-produced version of a conference paper accepted for publication in Proceedings of SC23: The International Conference for High Performance Computing, Networking, Storage, and Analysis, made available under the terms of the Creative Commons Attribution License (CC-BY), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. |
Keywords: | Convolution, Direct Algorithm, Neural networks, ARMv8 MultiCore, Performance Optimization |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Funding Information: | Funder: Royal Society; Grant number: IEC\NSFC\191465 |
Depositing User: | Symplectic Publications |
Date Deposited: | 30 Aug 2023 10:08 |
Last Modified: | 24 Nov 2023 15:12 |
Published Version: | https://dl.acm.org/doi/10.1145/3581784.3607107 |
Status: | Published |
Publisher: | ACM |
Identification Number: | 10.1145/3581784.3607107 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:202768 |