Pahar, M. orcid.org/0000-0002-5926-0144, Mirheidari, B. orcid.org/0009-0009-8679-203X, Illingworth, C. orcid.org/0009-0002-3800-7999 et al. (5 more authors) (2025) Automatic detection of early cognitive decline using multimodal feature fusion and transfer learning on real-world conversational speech. IEEE Journal of Biomedical and Health Informatics, 29 (12). pp. 8727-8734. ISSN: 2168-2194
Abstract
Early signs of cognitive decline, such as dementia and mild cognitive impairment (MCI), often manifest in conversational speech. Early and accurate identification is essential for potential interventions before the onset of more severe stages of neurodegenerative disease. We present CognoMemory, a system for detecting cognitive decline from a person's speech, which was used to collect 307 hours of real-world conversational speech, corresponding to 1.92 million Whisper-transcribed words, from 1,639 participants. Speech was recorded as participants answered 14 memory-probing, clinically effective questions asked by a virtual agent, starting with a motivation prompt, followed by memory, cognitive functioning, fluency, picture description and reading tasks. Acoustic and linguistic features, along with large language model (LLM) embeddings, were extracted for all 1,639 participants. A subset of 614 participants, either with an unconfirmed diagnosis or younger than 50 years, was used for pre-training. The remaining three groups (64 dementia, 169 MCI and 792 healthy participants) were used to fine-tune our proposed model. Our multimodal feature fusion and CNN/Bi-LSTM-based transfer learning approach outperforms LLM-based approaches (BART, DistilBERT, RoBERTa and HuBERT), achieving the highest F1-scores of 0.83 and 0.54 for 2-way and 3-way classification, respectively, using just the initial ‘motivation’ question. It shows a 3% performance increase due to transfer learning, while also being 38% faster. Finally, the classifiers trained on the CognoMemory data, the largest of its kind, were tested on the second-largest available dataset, DementiaBank (Pitt corpus), where a CNN-based transfer learning architecture achieved an F1-score of 0.89, demonstrating better stability and generalisation across datasets for our novel feature fusion and architecture.
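As a quick consistency check on the cohort figures quoted in the abstract, the sketch below sums the pre-training and fine-tuning groups and reports the class shares within the fine-tuning set. The participant counts come from the abstract; the function and variable names are hypothetical, chosen for illustration only, and are not part of the authors' code.

```python
# Cohort sizes as stated in the abstract.
TOTAL_PARTICIPANTS = 1639
PRETRAIN = 614  # unconfirmed diagnosis or younger than 50 years
FINETUNE = {"dementia": 64, "MCI": 169, "healthy": 792}


def cohort_summary(pretrain=PRETRAIN, finetune=FINETUNE):
    """Return totals and per-class shares for the fine-tuning groups.

    Hypothetical helper for illustration; not taken from the paper.
    """
    finetune_total = sum(finetune.values())
    shares = {label: n / finetune_total for label, n in finetune.items()}
    return {
        "finetune_total": finetune_total,
        "overall_total": pretrain + finetune_total,
        "class_shares": shares,
    }


summary = cohort_summary()
print(summary["finetune_total"])   # 1025
print(summary["overall_total"])    # 1639, matching the stated participant count
for label, share in summary["class_shares"].items():
    print(f"{label}: {share:.1%}")
```

The fine-tuning groups are strongly imbalanced, with healthy participants forming roughly three quarters of the labelled data. That imbalance is one plausible reason for reporting F1-scores rather than plain accuracy, although the abstract does not say so explicitly.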
Metadata
| Item Type: | Article |
|---|---|
| Copyright, Publisher and Additional Information: | © 2025 The Author(s). Except as otherwise noted, this author-accepted version of a journal article published in IEEE Journal of Biomedical and Health Informatics is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
| Keywords: | Dementia; MCI; transfer learning; feature fusion; cognitive decline; pathological speech; multimodal |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield); The University of Sheffield > Faculty of Medicine, Dentistry and Health (Sheffield) > School of Medicine and Population Health; The University of Sheffield > Faculty of Medicine, Dentistry and Health (Sheffield) > Department of Neuroscience (Sheffield) |
| Date Deposited: | 09 Dec 2025 17:04 |
| Last Modified: | 10 Dec 2025 09:07 |
| Status: | Published |
| Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
| Refereed: | Yes |
| Identification Number: | 10.1109/jbhi.2025.3624043 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:235344 |
Download
Filename: CNN_LSTM_Audio_Text_feats_BHI_2025__JBHI_special_8_page_FINAL.pdf
Licence: CC-BY 4.0

CORE (COnnecting REpositories)