Ollerenshaw, A., Jalal, M.A. and Hain, T. orcid.org/0000-0003-0939-3464 (2022) Insights of neural representations in multi-banded and multi-channel convolutional transformers for end-to-end ASR. In: Proceedings of 2022 30th European Signal Processing Conference (EUSIPCO). 2022 30th European Signal Processing Conference (EUSIPCO), 29 Aug - 02 Sep 2022, Belgrade, Serbia. Institute of Electrical and Electronics Engineers (IEEE), pp. 434-438. ISBN 9781665467995
Abstract
End-to-end automatic speech recognition (ASR) models aim to learn generalised representations of speech. Popular end-to-end approaches have relied on extremely large amounts of data and large models to improve recognition performance. However, it is not clear whether these models generalise from the training data or memorise it. This paper combines a mixture-of-experts (MoE) approach, referred to here as multi-band, multi-channel, with a popular ASR model, the CNN-transformer, to capture longer-term dependencies without increasing the computational complexity of training. The goal is to investigate how transformer models adapt to these different input representations of the same data. No external language models were used, removing their impact during inference. Although the proposed multi-band transformer shows a performance gain, the main finding of this paper is the adaptive memorisation behaviour of transformers and the neural representations of transformer embeddings. Using the statistical correlation index SVCCA, a comparative discussion of the neural representations of the proposed model and the transformer approach is provided, with key insights into the distinct learned structures.
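The abstract's comparison of neural representations rests on SVCCA (singular vector canonical correlation analysis). As a rough illustration of the general technique (not the paper's implementation), the sketch below first SVD-reduces each activation matrix to the directions explaining most of its variance, then computes canonical correlations between the reduced subspaces; the function name, `keep` threshold, and neuron-by-datapoint layout are illustrative assumptions.

```python
import numpy as np

def svcca(acts1, acts2, keep=0.99):
    """Mean SVCCA correlation between two activation matrices (neurons x datapoints)."""
    def svd_reduce(x, keep):
        x = x - x.mean(axis=1, keepdims=True)          # centre each neuron
        u, s, vt = np.linalg.svd(x, full_matrices=False)
        # smallest number of singular directions explaining `keep` of the variance
        k = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), keep)) + 1
        return np.diag(s[:k]) @ vt[:k]                  # reduced (k x datapoints)

    x = svd_reduce(np.asarray(acts1, dtype=float), keep)
    y = svd_reduce(np.asarray(acts2, dtype=float), keep)
    # CCA: orthonormalise each subspace over datapoints; the singular values of
    # the cross-product are the canonical correlations (all in [0, 1])
    qx, _ = np.linalg.qr(x.T)
    qy, _ = np.linalg.qr(y.T)
    corrs = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(corrs.mean())
```

Comparing a layer's activations with themselves yields a similarity of 1.0, while independent random activations score lower, which is what makes the index useful for contrasting learned structures across models.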
Metadata
| Field | Value |
|---|---|
| Item Type | Proceedings Paper |
| Authors/Creators | Ollerenshaw, A., Jalal, M.A. and Hain, T. |
| Copyright, Publisher and Additional Information | © 2022 by European Association for Signal Processing (EURASIP). Published by IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Reproduced in accordance with the publisher's self-archiving policy. |
| Keywords | end-to-end; automatic speech recognition; transformer; interpretability; convolutional neural networks |
| Institution | The University of Sheffield |
| Academic Units | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
| Depositing User | Symplectic Sheffield |
| Date Deposited | 28 Jul 2022 12:58 |
| Last Modified | 18 Oct 2023 00:13 |
| Published Version | https://ieeexplore.ieee.org/document/9909875 |
| Status | Published |
| Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
| Refereed | Yes |
| Open Archives Initiative ID (OAI ID) | oai:eprints.whiterose.ac.uk:189115 |