Zhu, J, Yang, R (ORCID: 0000-0001-6334-4925), Hu, C et al. (4 more authors) (2021) Perph: A Workload Co-location Agent with Online Performance Prediction and Resource Inference. In: 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid). 21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 10-13 May 2021, Melbourne, Australia. ISBN 978-1-7281-9586-5
Abstract
Striking a balance between improved cluster utilization and guaranteed application QoS is a long-standing research problem in cluster resource management. The majority of current solutions require a large number of sandboxed experiments on different workload combinations and leverage the results to predict possible interference for incoming workloads. This incurs a non-negligible time cost that severely restricts their applicability to complex workload co-locations. Purely offline profiling may also lead to a model aging problem that drastically degrades model precision. In this paper, we present Perph, a runtime agent deployed on a per-node basis, which decouples ML-based performance prediction and resource inference from the centralized scheduler. We exploit the sensitivity of long-running applications to multiple resources to establish a relationship between resource allocation and the resulting performance. We use an Online Gradient Boost Regression Tree (OGBRT) to enable continuous model evolution. Once performance degradation is detected, resource inference is conducted to work out a proper slice of resources that will be reallocated to recover the target performance. Integration with the Node Manager (NM) of Apache YARN shows that the throughput of a Kafka data-streaming application is 2.0x and 1.82x that of the isolation execution schemes in native YARN and the pure cgroup cpu subsystem, respectively. In TPC-C benchmarking, throughput also improves by 35% and 23% against native YARN and the cgroup cpu subsystem, respectively.
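The detect-then-infer loop described in the abstract can be illustrated with a minimal Python sketch. This is not Perph's implementation: `OnlineLinearModel` is a hypothetical one-weight stand-in for the OGBRT predictor, only a single resource (CPU cores) is modeled, and the function names, the 5% tolerance, and the synthetic training data are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class OnlineLinearModel:
    """Hypothetical stand-in for OGBRT: predicts throughput from CPU cores."""
    weight: float = 0.0
    lr: float = 0.01

    def predict(self, cores: float) -> float:
        return self.weight * cores

    def update(self, cores: float, observed: float) -> None:
        # One stochastic-gradient step on squared error, so the model
        # keeps evolving as new runtime observations arrive.
        error = self.predict(cores) - observed
        self.weight -= self.lr * error * cores


def is_degraded(observed: float, target: float, tolerance: float = 0.05) -> bool:
    """Flag degradation when observed performance falls below target by more than tolerance."""
    return observed < target * (1.0 - tolerance)


def infer_cores(model: OnlineLinearModel, target: float, candidates):
    """Return the smallest candidate allocation predicted to meet the target, else None."""
    for cores in sorted(candidates):
        if model.predict(cores) >= target:
            return cores
    return None


# Synthetic observations (assumed): throughput grows by 100 units per core.
model = OnlineLinearModel()
for _ in range(50):
    for cores in (1, 2, 4, 8):
        model.update(cores, 100.0 * cores)

target = 350.0
observed = 300.0  # measured throughput under co-location (made-up value)
chosen = infer_cores(model, target, [1, 2, 4, 8]) if is_degraded(observed, target) else None
```

Scanning candidates in ascending order returns the smallest allocation the model predicts will restore the target, which mirrors the idea of reallocating only "a proper slice of resources"; a real agent would search over several resource dimensions at once.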
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | |
Copyright, Publisher and Additional Information: | © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Keywords: | performance isolation, co-location, multi-dimensional resource |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Funding Information: | Funder: EPSRC (Engineering and Physical Sciences Research Council); Grant number: EP/T01461X/1 |
Depositing User: | Symplectic Publications |
Date Deposited: | 11 Aug 2021 10:11 |
Last Modified: | 11 Aug 2021 10:11 |
Status: | Published |
Identification Number: | 10.1109/CCGrid51090.2021.00027 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:176926 |