Peng, J., Fang, J., Liu, J. et al. (5 more authors) (2023) Optimizing MPI Collectives on Shared Memory Multi-cores. In: Proceedings of SC23: The International Conference for High Performance Computing, Networking, Storage, and Analysis. SC23: The International Conference for High Performance Computing, Networking, Storage, and Analysis, 12-17 Nov 2023, Denver, USA. ACM ISBN 9798400701092
Abstract
Message Passing Interface (MPI) programs often experience performance slowdowns due to collective communication operations, such as broadcasts and reductions. As modern CPUs integrate more processor cores, running multiple MPI processes on shared-memory machines to take advantage of hardware parallelism is becoming increasingly common. In this context, it is crucial to optimize MPI collective communications for shared-memory execution. However, existing MPI collective implementations on shared-memory systems have two primary drawbacks. The first is extensive redundant data movement when performing reduction collectives, and the second is the ineffective use of non-temporal instructions to optimize streamed data processing. To address these limitations, this paper proposes two optimization techniques that minimize data movement and enhance the use of non-temporal instructions. We evaluated our techniques by integrating them into the OpenMPI library and tested their performance using micro-benchmarks and real-world applications running on two multi-core clusters. Experimental results show that our approach significantly outperforms existing techniques, yielding a 1.2x to 6.4x performance improvement.
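As a rough illustration of what "redundant data movement" in a reduction can mean, the following minimal Python sketch compares two toy schemes on in-memory lists. It is not the paper's technique and does not use MPI or OpenMPI. The function names `reduce_with_staging` and `reduce_direct`, the per-rank buffers, and the "element moves" counter are all hypothetical, introduced only for this example.

```python
# Toy model (an assumption for illustration, not the paper's method):
# each inner list stands in for one MPI rank's send buffer in shared memory.

def reduce_with_staging(send_buffers):
    """Sum-reduce after first copying every buffer into a staging area."""
    moves = 0
    staging = []
    for buf in send_buffers:
        staging.append(list(buf))   # copy-in: one extra move per element
        moves += len(buf)
    result = [0] * len(send_buffers[0])
    for buf in staging:
        for i, value in enumerate(buf):
            result[i] += value      # reduction reads the staged copy
        moves += len(buf)
    return result, moves


def reduce_direct(send_buffers):
    """Sum-reduce by reading each rank's buffer in place (no staging copy)."""
    moves = 0
    result = [0] * len(send_buffers[0])
    for buf in send_buffers:
        for i, value in enumerate(buf):
            result[i] += value      # reduction reads the original buffer
        moves += len(buf)
    return result, moves


# Four hypothetical ranks, eight elements each.
buffers = [[rank + i for i in range(8)] for rank in range(4)]
staged_result, staged_moves = reduce_with_staging(buffers)
direct_result, direct_moves = reduce_direct(buffers)
```

In this toy model both schemes produce the same reduced vector, but the staged variant touches every element twice. The count is only a proxy; real costs depend on cache behavior and the memory hierarchy.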
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Peng, J., Fang, J., Liu, J. et al. (5 more authors) |
Copyright, Publisher and Additional Information: | This item is protected by copyright. This is an author produced version of a conference paper accepted for publication in Proceedings of SC23: The International Conference for High Performance Computing, Networking, Storage, and Analysis, made available under the terms of the Creative Commons Attribution License (CC-BY), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. |
Keywords: | MPI, Collective Communication, Memory Access, Optimization |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 30 Aug 2023 10:16 |
Last Modified: | 24 Nov 2023 14:52 |
Published Version: | https://dl.acm.org/doi/10.1145/3581784.3607074 |
Status: | Published |
Publisher: | ACM |
Identification Number: | 10.1145/3581784.3607074 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:202767 |