Yang, W, Fang, J, Dong, D et al. (2 more authors) (2022) LibShalom: Optimizing Small and Irregular-shaped Matrix Multiplications on ARMv8 Multi-Cores. In: Proceedings of SC21: International Conference for High Performance Computing, Networking, Storage and Analysis. SC21: International Conference for High Performance Computing, Networking, Storage and Analysis, 14-19 Nov 2021, St. Louis, Missouri, USA. IEEE. ISBN 978-1-4503-8442-1
Abstract
General Matrix Multiplication (GEMM) is a key subroutine in high-performance computing. While the mainstream linear algebra libraries can deliver high performance on large and regular-shaped GEMM, they are inadequate for optimizing small and irregular-shaped GEMMs, which are commonly seen in new HPC applications. Some of the recent works in this direction have made promising progress on x86 architectures and GPUs but still leave much room for improvement on emerging HPC hardware built upon the ARMv8 architecture. We present LibShalom, an open-source library for optimizing small and irregular-shaped GEMMs, explicitly targeting the ARMv8 architecture. LibShalom builds upon the classical Goto algorithm but tailors it to minimize the expensive memory accessing overhead for data packing and processing small matrices. It uses analytic methods to determine GEMM kernel optimization parameters, enhancing the computation and parallelization efficiency of the GEMM kernels. We evaluate LibShalom by applying it to three ARMv8 multi-core architectures and comparing it against five mainstream linear algebra libraries. Experimental results show that LibShalom can consistently outperform existing solutions across GEMM workloads and hardware architectures.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | |
Copyright, Publisher and Additional Information: | © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Keywords: | Matrix Multiplication; Small and Irregular-Shaped; ARMv8 Multi-Core; Performance Optimization |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Funding Information: | Funder: Royal Society; Grant number: IEC\NSFC\191465 |
Depositing User: | Symplectic Publications |
Date Deposited: | 03 Sep 2021 09:23 |
Last Modified: | 04 Aug 2023 13:48 |
Status: | Published |
Publisher: | IEEE |
Identification Number: | 10.1145/3458817.3476217 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:177559 |