Garraghan, P, Perks, S, Ouyang, X et al. (2 more authors) (2016) Tolerating Transient Late-Timing Faults in Cloud-Based Real-Time Stream Processing. In: Proceedings: 2016 IEEE 19th International Symposium on Real-Time Distributed Computing. ISORC 2016, 17-20 May 2016, York, UK. IEEE, pp. 108-115. ISBN 978-1-4673-9032-3
Abstract
Real-time stream processing is a frequently deployed application within Cloud datacenters and is required to provide high levels of performance and reliability. Numerous fault-tolerant approaches have been proposed to achieve this objective in the presence of crash failures. However, such systems struggle with transient late-timing faults, a fault class that is challenging to tolerate effectively and that manifests increasingly within large-scale distributed systems. Such faults pose a significant threat to minimizing the soft real-time execution of streaming applications in the presence of failures. This work proposes a fault-tolerant approach that uses QoS-aware data prediction to tolerate transient late-timing faults. At run-time, the approach determines the most effective data prediction algorithm for the QoS constraints imposed on a failed stream processor. We integrated our approach into Apache Storm; experimental results show that it reduces stream processor end-to-end execution time by 61% compared to other fault-tolerant approaches. The approach incurs 12% additional CPU utilization while reducing network usage by 44%.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: |
Copyright, Publisher and Additional Information: | © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. |
Keywords: | Fault-tolerance; Stream Processing; Data Prediction; Cloud computing; Prediction algorithms; Real-time systems; Fault tolerant systems; Transient analysis; Quality of service; Predictive models |
Dates: |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Institute for Computational and Systems Science (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 23 Jun 2016 09:40 |
Last Modified: | 16 Nov 2016 08:17 |
Published Version: | http://dx.doi.org/10.1109/ISORC.2016.24 |
Status: | Published |
Publisher: | IEEE |
Identification Number: | 10.1109/ISORC.2016.24 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:98900 |