Zheng, W, Fang, J, Juan, C et al. (9 more authors) (2019) Auto-Tuning MPI Collective Operations on Large-Scale Parallel Systems. In: 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 10-12 Aug 2019, Zhangjiajie, China. IEEE, pp. 670-677. ISBN 978-1-7281-2058-4
Abstract
MPI libraries are widely used in applications of high performance computing. Yet, effective tuning of MPI collectives on large parallel systems is an outstanding challenge. This process often follows a trial-and-error approach and requires expert insights into the subtle interactions between software and the underlying hardware. This paper presents an empirical approach to choose and switch MPI communication algorithms at runtime to optimize the application performance. We achieve this by first modeling offline, through microbenchmarks, to find how the runtime parameters with different message sizes affect the choice of MPI communication algorithms. We then apply the knowledge to automatically optimize new unseen MPI programs. We evaluate our approach by applying it to NPB and HPCC benchmarks on a 384-node computer cluster of the Tianhe-2 supercomputer. Experimental results show that our approach achieves, on average, 22.7% (up to 40.7%) improvement over the default setting.
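As a rough illustration of the two-phase idea in the abstract (offline microbenchmarking, then runtime selection by message size), here is a minimal sketch. It assumes a plain lookup table rather than whatever model the paper actually builds, which the abstract does not specify. The timing values, the table layout, and the helper names `build_table` and `choose_algorithm` are all hypothetical; the algorithm names are only example labels for broadcast strategies.

```python
from bisect import bisect_right

# Hypothetical microbenchmark timings in seconds, invented for illustration
# (not taken from the paper): collective -> message size in bytes -> {algorithm: time}.
TIMINGS = {
    "bcast": {
        1024: {"binomial_tree": 1.0e-5, "scatter_allgather": 3.0e-5},
        65536: {"binomial_tree": 4.0e-4, "scatter_allgather": 2.5e-4},
        1048576: {"binomial_tree": 7.0e-3, "scatter_allgather": 3.0e-3},
    },
}


def build_table(timings):
    """Offline step: for each benchmarked size, keep the fastest algorithm."""
    table = {}
    for collective, by_size in timings.items():
        sizes = sorted(by_size)
        best = [min(by_size[s], key=by_size[s].get) for s in sizes]
        table[collective] = (sizes, best)
    return table


def choose_algorithm(table, collective, message_size):
    """Runtime step: reuse the choice for the largest benchmarked size <= message_size.

    Messages smaller than every benchmarked size fall back to the smallest entry.
    """
    sizes, best = table[collective]
    idx = bisect_right(sizes, message_size) - 1
    return best[max(idx, 0)]


TABLE = build_table(TIMINGS)
print(choose_algorithm(TABLE, "bcast", 4096))
print(choose_algorithm(TABLE, "bcast", 200000))
```

A step-function lookup like this is the simplest possible stand-in; a real tuner would also have to account for process count, network topology, and other runtime parameters that the abstract mentions but does not enumerate.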
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | |
Copyright, Publisher and Additional Information: | © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 30 Jan 2020 14:03 |
Last Modified: | 30 Jan 2020 14:03 |
Status: | Published |
Publisher: | IEEE |
Identification Number: | 10.1109/hpcc/smartcity/dss.2019.00101 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:156217 |