Yang, K., Liu, L., Liu, H. et al. (1 more author) (2025) A novel parallel processing element architecture for accelerating ODE and AI. Tsinghua Science and Technology, 30 (5). pp. 1954-1964. ISSN: 1007-0214
Abstract
Transforming complex problems into simpler computational tasks, for example recasting ordinary differential equations (ODEs) in matrix form, is key to AI advancement and paves the way for more efficient computing architectures. Systolic arrays, known for their computational efficiency, low power use, and ease of implementation, address AI's computational challenges. They are central to mainstream industry AI accelerators, and improvements to the Processing Element (PE) significantly boost systolic array performance and streamline computing architectures, enabling more efficient solutions across technology fields. This research presents a novel PE design and its integration into a systolic array, based on a novel computing theory: bit-level mathematics for the Multiply-Accumulate (MAC) operation. We present three architectures for the PE and provide a comprehensive comparison between them and state-of-the-art designs, focusing on power, area, and throughput. This research also demonstrates the integration of the proposed MAC unit design with systolic arrays, highlighting significant improvements in computational efficiency. Our implementations show 2380952.38 times lower latency while using 64.19 times fewer DSP48E1 slices, 1.26 times fewer Look-Up Tables (LUTs), and 10.76 times fewer Flip-Flops (FFs), with 99.63 times lower power consumption and 15.19 times higher performance per PE compared to the state-of-the-art design.
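As a rough illustration of the idea in the abstract's first sentence, the sketch below recasts a linear ODE x' = A x as repeated matrix-vector products using the forward-Euler update x_{k+1} = (I + hA) x_k, with every product built from multiply-accumulate steps. It is a minimal sketch under stated assumptions, not the paper's bit-level PE design: the function names (`mac`, `matvec_mac`, `euler_step_matrix`), the choice of forward Euler, and the oscillator example are all hypothetical and do not come from the article.

```python
def mac(acc, a, b):
    """One multiply-accumulate (MAC) step: returns acc + a * b."""
    return acc + a * b


def matvec_mac(M, x):
    """Matrix-vector product computed only from MAC steps.

    Each output element plays the role of one accumulator, which is the
    kind of work a single processing element performs in a systolic array.
    """
    out = []
    for row in M:
        acc = 0.0
        for m_ij, x_j in zip(row, x):
            acc = mac(acc, m_ij, x_j)
        out.append(acc)
    return out


def euler_step_matrix(A, h):
    """Return I + h*A, the forward-Euler update matrix for x' = A x."""
    n = len(A)
    return [
        [(1.0 if i == j else 0.0) + h * A[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical example (not from the article): the harmonic oscillator
# x'' = -x written as a first-order system with state [x, x'].
A = [[0.0, 1.0], [-1.0, 0.0]]
h = 0.01                      # assumed step size
M = euler_step_matrix(A, h)

x = [1.0, 0.0]                # initial state: x(0) = 1, x'(0) = 0
for _ in range(100):          # integrate up to t = 1
    x = matvec_mac(M, x)
```

After the loop, `x` approximates the exact solution [cos(1), -sin(1)] up to the forward-Euler discretisation error. The point of the sketch is only that the whole integration reduces to MAC operations, which is the workload that PE-level improvements target.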
Metadata
| Item Type: | Article |
|---|---|
| Authors/Creators: | |
| Copyright, Publisher and Additional Information: | © 2025 The Authors. This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
| Keywords: | Power demand; AI accelerators; Computer architecture; Throughput; Systolic arrays; computational efficiency; Table lookup; Sparse matrices; Resource management; Low latency communication |
| Dates: | |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Electronic and Electrical Engineering (Sheffield) |
| Date Deposited: | 13 Nov 2025 14:35 |
| Last Modified: | 13 Nov 2025 14:35 |
| Published Version: | https://doi.org/10.26599/tst.2024.9010090 |
| Status: | Published |
| Publisher: | Tsinghua University Press |
| Refereed: | Yes |
| Identification Number: | 10.26599/tst.2024.9010090 |
| Related URLs: | |
| Sustainable Development Goals: | |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:234460 |
Download
Filename: A_Novel_Parallel_Processing_Element_Architecture_for_Accelerating_ODE_and_AI.pdf
Licence: CC-BY 4.0
