Speck, R and Ruprecht, D orcid.org/0000-0003-1904-2473 (2017) Toward fault-tolerant parallel-in-time integration with PFASST. Parallel Computing, 62. pp. 20-37. ISSN 0167-8191
Abstract
We introduce and analyze different strategies for the parallel-in-time integration method PFASST to recover from hard faults and subsequent data loss. Since PFASST stores solutions at multiple time steps on different processors, information from adjacent steps can be used to recover after a processor has failed. PFASST's multi-level hierarchy allows the coarse level to be used to correct the reconstructed solution, which can help to minimize overhead. A theoretical model is devised linking overhead to the number of additional PFASST iterations required for convergence after a fault. The potential efficiency of different strategies is assessed in terms of required additional iterations for examples of diffusive and advective type.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Speck, R; Ruprecht, D (orcid.org/0000-0003-1904-2473) |
Copyright, Publisher and Additional Information: | © 2016 Elsevier B.V. This is an author produced version of a paper published in Parallel Computing. Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | algorithm-based fault tolerance, resilience, parallel-in-time integration, Gray-Scott model, Boussinesq equations |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Mechanical Engineering (Leeds) > Institute of Engineering Thermofluids, Surfaces & Interfaces (iETSI) (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 20 Dec 2016 11:57 |
Last Modified: | 11 Jan 2023 11:47 |
Published Version: | https://doi.org/10.1016/j.parco.2016.12.001 |
Status: | Published |
Publisher: | Elsevier |
Identification Number: | 10.1016/j.parco.2016.12.001 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:109710 |