Gao, W, Fang, J, Huang, C et al. (2 more authors) (2021) Optimizing Barrier Synchronization on ARMv8 Many-Core Architectures. In: 2021 IEEE International Conference on Cluster Computing (CLUSTER). 2021 IEEE International Conference on Cluster Computing (CLUSTER), 07-10 Sep 2021, Online. IEEE, pp. 542-552. ISBN 978-1-7281-9666-4
Abstract
Synchronization operations are commonly seen in OpenMP programs where a parallel construct often works with an explicit or implicit barrier operation. While OpenMP synchronization has been extensively studied on the traditional x86 CPU architectures, there is little work on understanding OpenMP barrier synchronization operations on ARMv8 high-performance many-cores. This paper presents the first comprehensive performance study on OpenMP barrier implementations on emerging ARMv8-based many-cores. We evaluate seven representative barrier algorithms on three distinct ARMv8 architectures: Phytium 2000+, ThunderX2, and Kunpeng920. We empirically show that the existing synchronization implementations exhibit poor scalability on ARMv8 architectures compared to the x86 counterpart. We then propose various optimization strategies for improving these widely used synchronization algorithms on each platform. We showcase that our optimizations yield 12.6x performance improvement over the GCC implementation and 4.7x improvement over the LLVM implementation, translating to 1.6x improvement over the state-of-the-art best-performing algorithm. We share our experience and practical insights on optimizing OpenMP synchronization operations on emerging ARMv8 multi-core CPU architectures.
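For readers unfamiliar with the term, a barrier makes every thread wait until all threads in the team have reached the same point. The sketch below shows one textbook design, the centralized sense-reversing barrier. It is written in Python purely for illustration and is not the paper's code: the abstract does not say whether this design is among the seven algorithms evaluated, and the names `SenseReversingBarrier` and `run_demo` are hypothetical.

```python
import threading
import time


class SenseReversingBarrier:
    """Illustrative centralized sense-reversing barrier (not from the paper)."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.count = num_threads        # threads still expected in this phase
        self.sense = False              # shared flag, flipped once per phase
        self.lock = threading.Lock()    # protects the counter
        self.local = threading.local()  # holds each thread's private sense

    def wait(self):
        # Each thread flips its private sense when it enters a new phase.
        my_sense = not getattr(self.local, "sense", False)
        self.local.sense = my_sense
        with self.lock:
            self.count -= 1
            last = self.count == 0
            if last:
                # The last arrival resets the counter and releases the waiters.
                self.count = self.num_threads
                self.sense = my_sense
        if not last:
            # Spin until the shared flag matches this phase's sense.
            # sleep(0) only yields so that other Python threads can run.
            while self.sense != my_sense:
                time.sleep(0)


def run_demo(num_threads=4, phases=3):
    """Hypothetical driver: each thread logs its phase, then waits at the barrier."""
    barrier = SenseReversingBarrier(num_threads)
    log = []
    log_lock = threading.Lock()

    def worker():
        for phase in range(phases):
            with log_lock:
                log.append(phase)
            barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because no thread can leave the barrier before every thread has arrived, all log entries for one phase precede every entry for the next. In this design every thread updates a single shared counter and polls a single shared flag, so contention on those two locations grows with the thread count.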
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | |
Copyright, Publisher and Additional Information: | © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Keywords: | Barrier synchronization; ARMv8 many-cores |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 02 Aug 2021 14:06 |
Last Modified: | 07 Aug 2023 03:22 |
Status: | Published |
Publisher: | IEEE |
Identification Number: | 10.1109/Cluster48925.2021.00044 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:176699 |