Newman-Griffis, D. (orcid.org/0000-0002-0473-4226), Sivaraman, V., Perer, A. et al. (2 more authors) (2021) TextEssence: a tool for interactive analysis of semantic shifts between corpora. In: Sil, A. and Lin, X.V., (eds.) Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations. The 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations, 06-11 Jun 2021, Online. Association for Computational Linguistics, pp. 106-115. ISBN 9781954085480
Abstract
Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface. We further propose a new measure of embedding confidence based on nearest neighborhood overlap, to assist in identifying high-quality embeddings for corpus analysis. A case study on COVID-19 scientific literature illustrates the utility of the system. TextEssence can be found at https://textessence.github.io.
Metadata
Item Type: | Proceedings Paper |
---|---|
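Authors/Creators: | Newman-Griffis, D. (orcid.org/0000-0002-0473-4226); Sivaraman, V.; Perer, A.; et al. (2 more authors) |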
Editors: | Sil, A.; Lin, X.V. |
Copyright, Publisher and Additional Information: | © 2021 Association for Computational Linguistics. Licensed on a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 17 Feb 2023 10:00 |
Last Modified: | 18 Feb 2023 01:17 |
Status: | Published |
Publisher: | Association for Computational Linguistics |
Refereed: | Yes |
Identification Number: | 10.18653/v1/2021.naacl-demos.13 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:196485 |