Preiss, J. (2023) Automatic named entity obfuscation in speech. In: Rogers, A., Boyd-Graber, J. and Okazaki, N., (eds.) Findings of the Association for Computational Linguistics: ACL 2023. 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), 09-14 Jul 2023, Toronto, Canada. Association for Computational Linguistics, pp. 615-622. ISBN 9781959429623
Abstract
Sharing data containing personal information often requires its anonymization, even when consent for sharing was obtained from the data originator. While approaches exist for automated anonymization of text, the area is not as thoroughly explored in speech. This work focuses on identifying named entities, finding replacements for them, and inserting the replacement named entities, synthesized using voice cloning, into the original audio, thereby retaining prosodic information while reducing the likelihood of deanonymization. The approach employs a novel named entity recognition (NER) system built directly on speech by training HuBERT (Hsu et al., 2021) using the English speech NER dataset (Yadav et al., 2020). Name substitutes are found using a masked language model and are synthesized using text-to-speech voice cloning (Eren and team, 2021), upon which the substitute named entities are re-inserted into the original text. The approach is prototyped on a sample of the LibriSpeech corpus (Panayotov et al., 2015) with each step evaluated individually.
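The final re-insertion step described in the abstract can be pictured with a minimal sketch. This is not the paper's implementation: the function name `splice_entity`, its parameters, and the assumption that the NER step yields start and end times in seconds are all hypothetical, and audio is represented as a plain sequence of float samples for simplicity.

```python
from typing import List, Sequence


def splice_entity(
    samples: Sequence[float],
    sample_rate: int,
    start_s: float,
    end_s: float,
    replacement: Sequence[float],
) -> List[float]:
    """Replace the audio between start_s and end_s with `replacement`.

    Hypothetical helper: assumes the entity span is given in seconds and
    that `replacement` is already synthesized at the same sample rate.
    """
    if not 0 <= start_s <= end_s:
        raise ValueError("expected 0 <= start_s <= end_s")
    # Convert times to sample indices, clamped to the signal length.
    start = min(round(start_s * sample_rate), len(samples))
    end = min(round(end_s * sample_rate), len(samples))
    # Keep everything outside the span untouched; swap in the new segment.
    return list(samples[:start]) + list(replacement) + list(samples[end:])
```

Because only the entity span is swapped, the surrounding audio is left unchanged. The replacement need not have the same duration as the original span, so the output length can differ from the input length.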
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Preiss, J. |
Editors: | Rogers, A., Boyd-Graber, J. and Okazaki, N. |
Copyright, Publisher and Additional Information: | © 2023 Association for Computational Linguistics. Licensed on a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 15 Apr 2024 15:40 |
Last Modified: | 15 Apr 2024 15:40 |
Status: | Published |
Publisher: | Association for Computational Linguistics |
Refereed: | Yes |
Identification Number: | 10.18653/v1/2023.findings-acl.39 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:211372 |