Xia, C., Zhao, J., Sun, Q. et al. (4 more authors) (2024) Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions. In: The ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 27 Apr - 01 May 2024, San Diego, USA. ACM, pp. 286-301. ISBN 979-8-4007-0372-0
Abstract
Optimizing deep neural network (DNN) execution is important but becomes increasingly difficult as DNN complexity grows. Existing DNN compilers cannot effectively exploit optimization opportunities across operator boundaries, leaving room for improvement. To address this challenge, we present Souffle, an open-source compiler that optimizes DNN inference across operator boundaries. Souffle creates a global tensor dependency graph using tensor expressions, traces data flow and tensor information, and partitions the computation graph into subprograms based on dataflow analysis and resource constraints. Within a subprogram, Souffle performs local optimization via semantic-preserving transformations, finds an optimized program schedule, and improves instruction-level parallelism and data reuse. We evaluated Souffle using six representative DNN models on an NVIDIA A100 GPU. Experimental results show that Souffle consistently outperforms six state-of-the-art DNN optimizers by delivering a geometric mean speedup of up to 3.7× over TensorRT and 7.8× over TensorFlow XLA.
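The abstract reports its results as geometric mean speedups. As a minimal sketch of how such a figure is typically computed, the Python below takes per-model latencies for a baseline and an optimized build, forms the per-model speedup ratios, and averages them geometrically. The function name `geometric_mean` and all latency values are hypothetical illustrations, not measurements or code from the paper.

```python
import math


def geometric_mean(values):
    """Return the geometric mean of a non-empty list of positive numbers."""
    if not values:
        raise ValueError("geometric mean needs at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean is only defined for positive values")
    # Average in log space to avoid overflow when multiplying many ratios.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-model latencies in milliseconds (NOT data from the paper).
baseline_ms = [8.0, 2.0, 9.0]
optimized_ms = [2.0, 1.0, 3.0]

# Per-model speedup is baseline latency divided by optimized latency.
speedups = [b / o for b, o in zip(baseline_ms, optimized_ms)]

print(f"per-model speedups: {speedups}")
print(f"geometric mean speedup: {geometric_mean(speedups):.3f}x")
```

The geometric mean is the usual choice for summarizing ratios such as speedups, because a single outlier model skews it less than it would skew an arithmetic mean.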
Metadata
| Item Type: | Proceedings Paper |
|---|---|
| Authors/Creators: | |
| Copyright, Publisher and Additional Information: | © 2024 Copyright held by the owner/author(s). This work is licensed under a Creative Commons Attribution International 4.0 License. |
| Keywords: | Deep Neural Network, Compiler Optimization, Tensor Expression, GPU |
| Dates: | |
| Institution: | The University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
| Funding Information: | Funder: EPSRC (Engineering and Physical Sciences Research Council); Grant number: EP/X018202/1 |
| Depositing User: | Symplectic Publications |
| Date Deposited: | 26 Sep 2023 12:45 |
| Last Modified: | 16 May 2024 12:39 |
| Published Version: | https://dl.acm.org/doi/10.1145/3617232.3624858 |
| Status: | Published |
| Publisher: | ACM |
| Identification Number: | 10.1145/3617232.3624858 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:203681 |
Download
Filename: Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions.pdf
Licence: CC-BY 4.0