Ouyang, X, Zhou, H, Clement, S orcid.org/0000-0003-2918-5881 et al. (2 more authors) (2018) Mitigate data skew caused stragglers through ImKP partition in MapReduce. In: IEEE International Performance, Computing and Communications Conference, Proceedings. 36th IEEE International Performance Computing and Communications Conference (IPCCC), 10-12 Dec 2017, San Diego, California, USA. Institute of Electrical and Electronics Engineers ISBN 978-1-5090-6468-7
Abstract
Speculative execution is the mechanism adopted by current MapReduce framework when dealing with the straggler problem, and it functions through creating redundant copies for identified stragglers. The result of the quicker task will be adopted to improve the overall job execution performance. Although proved to be effective for contention caused stragglers, speculative execution can easily meet its bottleneck when mitigating data skew caused stragglers due to its replication nature: the identical unbalanced input data will lead to a slow speculative task. The Map inputs are typically even in size according to the HDFS block configuration, therefore the skew caused stragglers happen mainly in the Reduce phase because of the unknown intermediate key distribution. In this paper, we focus on mitigating data skew caused Reduce stragglers, propose ImKP, an Intermediate Key Pre-processing framework that enables the even distributed partition for Reduce inputs. A group based ranking technique has been developed that dramatically decreases the pre-processing time, and ImKP manages to eliminate this timing overhead through parallelizing the pre-processing with the file uploading procedure (from local file system to HDFS). For jobs that take input directly from HDFS, ImKP minimizes the overhead by storing the <; GroupedKey, Reducer > mapping result on every node within the cluster for reuse. Experiments are conducted on different datasets with various workloads. Results show that, compared to the popular hash partition, ImKP can dramatically decrease Reduce skew, achieving a 99.8% reduction in the coefficient of variation of the input sizes in average, and improve up to 29.37% job response performance.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: |
|
Copyright, Publisher and Additional Information: | © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Keywords: | Stragglers; MapReduce; Skew-handling; Partition |
Dates: |
|
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 01 Nov 2017 16:35 |
Last Modified: | 27 Mar 2018 11:01 |
Status: | Published |
Publisher: | Institute of Electrical and Electronics Engineers |
Identification Number: | 10.1109/PCCC.2017.8280475 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:123203 |