Mai, Y. and Goetze, S. orcid.org/0000-0003-1044-7343 (2025) MetricGAN+KAN: Kolmogorov-Arnold networks in metric-driven speech enhancement systems. In: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Proceedings. ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 06-11 Apr 2025, Hyderabad, India. Institute of Electrical and Electronics Engineers (IEEE) ISBN 9798350368758
Abstract
Neural-network-based speech enhancement (SE) approaches have been shown to be particularly powerful in combination with perceptually motivated metrics, producing high-quality enhanced speech signals. Among these deep learning (DL)-based SE models, MetricGAN and its extension can generate output signals by directly optimising quality metrics. The recently proposed Kolmogorov-Arnold networks (KANs), which use learnable activation functions, have shown great success in replacing multi-layer perceptrons (MLPs). This work proposes the use of KANs in a MetricGAN framework and analyses their performance when replacing different types of network layers. Compared to the MetricGAN+ baseline, the best-performing proposed MetricGAN+KAN model uses approximately 80% fewer parameters and achieves 13.2% higher SE performance (measured by PESQ) on the Voicebank-DEMAND dataset.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Mai, Y.; Goetze, S. (ORCID: 0000-0003-1044-7343) |
Copyright, Publisher and Additional Information: | © 2025 The Author(s). Except as otherwise noted, this author-accepted version of a paper published in ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Proceedings is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
Keywords: | Speech enhancement; quality metrics; Kolmogorov-Arnold network (KAN); Generative adversarial network (GAN); MetricGAN |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 23 Jan 2025 17:06 |
Last Modified: | 14 Mar 2025 16:25 |
Status: | Published |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Refereed: | Yes |
Identification Number: | 10.1109/ICASSP49660.2025.10890542 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:221982 |