Zhang, S., Zhao, J., Xia, C. et al. (3 more authors) (2024) Introducing Compiler Semantics into Large Language Models as Programming Language Translators: A Case Study of C to x86 Assembly. In: Findings of the Association for Computational Linguistics: EMNLP 2024. 2024 Conference on Empirical Methods in Natural Language Processing, 12-16 Nov 2024, Miami, Florida, USA. Association for Computational Linguistics, pp. 996-1011.
Abstract
Compilers are complex software containing millions of lines of code, taking years to develop. This paper investigates to what extent Large Language Models (LLMs) can replace hand-crafted compilers in translating high-level programming languages to machine instructions, using C to x86 assembly as a case study. We identify two challenges of using LLMs for code translation and introduce two novel data pre-processing techniques to address the challenges: numerical value conversion and training data resampling. While only using a 13B model, our approach achieves a behavioral accuracy of over 91%, outperforming the much larger GPT-4 Turbo model by over 50%. Our results are encouraging, showing that LLMs have the potential to transform how compilation tools are constructed.
Metadata
| Field | Value |
|---|---|
| Item Type | Proceedings Paper |
| Copyright, Publisher and Additional Information | ACL materials are Copyright © 1963–2025 ACL; other materials are copyrighted by their respective copyright holders. Materials prior to 2016 here are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 International License. Permission is granted to make copies for the purposes of teaching and research. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License. |
| Institution | The University of Leeds |
| Academic Units | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Distributed Systems & Services |
| Depositing User | Symplectic Publications |
| Date Deposited | 21 Feb 2025 10:42 |
| Last Modified | 21 Feb 2025 10:43 |
| Published Version | https://aclanthology.org/2024.findings-emnlp.55/ |
| Status | Published |
| Publisher | Association for Computational Linguistics |
| Identification Number | 10.18653/v1/2024.findings-emnlp.55 |
| Open Archives Initiative ID (OAI ID) | oai:eprints.whiterose.ac.uk:223607 |
Download
Filename: Introducing Compiler Semantics into Large Language Models as.pdf
Licence: CC-BY 4.0