Accelerate On-Chip Artificial Neural Network Training: a Customised fNIRS Signals Processing System-on-Chip

Zhao, Y., Zhao, H. and Yang, S. orcid.org/0000-0003-0531-2903 (2025) Accelerate On-Chip Artificial Neural Network Training: a Customised fNIRS Signals Processing System-on-Chip. In: Proceedings of 38th IEEE International System-on-Chip Conference. 38th IEEE International System-on-Chip Conference (SOCC), 29 Sep - 01 Oct 2025, Dubai, United Arab Emirates. IEEE. ISBN: 979-8-3315-9478-7. ISSN: 2164-1706. EISSN: 2164-1706.

Abstract

During artificial neural network (ANN) training, batch normalisation (BN) improves training performance algorithmically but creates hardware implementation challenges. This paper addresses that specific bottleneck within the broader challenge of on-chip training. We propose an all-on-chip ANN training framework that uses AMD’s Versal XCVC1902 heterogeneous SoC with a purpose-built BN accelerator to overcome the two main barriers to edge computing: memory bandwidth and BN layer latency. By hardwiring the BN calculation as a hardware primitive and using fixed-point arithmetic, the design reduces BN latency from tens of cycles to nine cycles, improving the efficiency of the ANN training process. The hardware-software co-design strategy demonstrates how deep processing unit cores, a BN engine built on FPGA fabric, and CPUs can be orchestrated through a unifying abstraction, pointing toward application-specific SoCs that hide low-level scheduling from machine learning developers. The framework is evaluated on real-time functional near-infrared spectroscopy (fNIRS) signal reconstruction, where cloud off-loading is undesirable for privacy and latency reasons. Although evaluated on fNIRS, the tiled BN engine and zero-copy dataflow can be reused in any ANN training workload that spends a significant fraction of its time in BN layers.
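
As an illustration of what a fixed-point BN calculation involves, the following is a minimal software sketch of a BN forward pass using integer arithmetic only. It is not the authors' hardware design: the Q16.16 format, the helper names (to_fixed, fixed_mul, batch_norm_fixed) and the sample batch are all assumptions made for this example, and the sketch does not model cycle counts, tiling, or the zero-copy dataflow.

```python
from math import isqrt

FRAC_BITS = 16          # assumed Q16.16 format; the abstract does not state one
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0


def to_fixed(x: float) -> int:
    """Convert a float to a Q16.16 integer."""
    return int(round(x * ONE))


def to_float(q: int) -> float:
    """Convert a Q16.16 integer back to a float."""
    return q / ONE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values and rescale the product to Q16.16."""
    return (a * b) >> FRAC_BITS


def batch_norm_fixed(xs, gamma, beta, eps):
    """BN forward pass over one feature; every argument is Q16.16."""
    n = len(xs)
    mean = sum(xs) // n
    # Biased variance: mean of squared deviations from the batch mean.
    var = sum(fixed_mul(x - mean, x - mean) for x in xs) // n
    # sqrt(v) in Q16.16 equals isqrt(V << FRAC_BITS), where V is the raw integer.
    std = isqrt((var + eps) << FRAC_BITS)
    out = []
    for x in xs:
        # Pre-shift the numerator so the quotient stays in Q16.16.
        x_hat = ((x - mean) << FRAC_BITS) // std
        out.append(fixed_mul(gamma, x_hat) + beta)
    return out


# Hypothetical four-sample batch with gamma = 1, beta = 0, eps = 1e-5.
batch = [to_fixed(v) for v in (1.0, 2.0, 3.0, 4.0)]
result = batch_norm_fixed(batch, to_fixed(1.0), to_fixed(0.0), to_fixed(1e-5))
print([round(to_float(q), 4) for q in result])
```

Every step in the sketch reduces to integer adds, multiplies, shifts, one square root and one division per sample, which is the kind of fixed operation sequence that lends itself to being hardwired as a primitive.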

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Zhao, Y.; Zhao, H.; Yang, S. https://orcid.org/0000-0003-0531-2903
Copyright, Publisher and Additional Information:	This is an author-produced version of a conference paper published in Proceedings of 38th IEEE International System-on-Chip Conference, made available under the terms of the Creative Commons Attribution License (CC-BY), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited.
Keywords:	fNIRS, Edge Acceleration, Hardware/Software Co-design, Optimisation, System-on-chip
Dates:	Accepted: 16 July 2025; Published: 17 November 2025
Institution:	The University of Leeds
Academic Units:	The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Mechanical Engineering (Leeds) > Institute of Medical and Biological Engineering (iMBE) (Leeds)
Funding Information:	Funder: Napier University Edinburgh; Grant number: R2183
Date Deposited:	24 Jul 2025 10:59
Last Modified:	20 Apr 2026 12:04
Status:	Published
Publisher:	IEEE
Identification Number:	10.1109/SOCC66126.2025.11235415
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:229435

Download

Accepted Version

Filename: IEEESOCC2025.pdf

Licence: CC-BY 4.0

