Preiss, J. orcid.org/0000-0002-2158-5832 and Paramita, M.L. orcid.org/0000-0002-9414-1853 (2023) Do origin and facts identify automatically generated text? In: Montes-y-Gómez, M., Rangel, F., Jiménez-Zafra, S.M., Casavantes, M., Altuna, B., Alvarez-Carmona, M.Á., Bel-Enguix, G., Chiruzzo, L., de la Iglesia, I., Escalante, H.J., Garcia-Cumbreras, M.Á., García-Díaz, J.A., González Barba, J.Á., Tamayo, R.L., Lima, S., Moral, P., Plaza del Arco, F.M. and Valencia-García, R. (eds.) Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023). Iberian Languages Evaluation Forum 2023, 26 Sep 2023, Andalusia, Spain. CEUR Workshop Proceedings, 3496. CEUR-WS.org
Abstract
We present a proof of concept investigating whether native language identification and fact checking information improve a language model (GPT-2) classifier that determines whether a piece of text was written by a human or a machine. Since automatic text generators are trained on the writings of many individuals, we hypothesize that there will not be a clear native language for 'the writer', and therefore that a native language identification module can be used in reverse: when a native language cannot be identified, the probability of automatic generation is higher. Automatic generation is also known to hallucinate, making up content. To detect such content, we integrate a Wikipedia fact checking module. Both pieces of information are simply added to the input of the GPT-2 classifier, and result in an improvement over its baseline performance in the English-language human-or-generated subtask of the Automated Text Identification (AuTexTification) shared task [1].
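As an illustration of what "added to the input" could look like, the sketch below prepends both signals to the text as plain-text tags before it would reach a classifier. The tag format, the `Signals` fields and the `augment_input` helper are hypothetical and are not taken from the paper; the native language identification and fact checking modules themselves are not implemented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    """Hypothetical outputs of the two auxiliary modules."""
    native_language: Optional[str]  # None when no native language could be identified
    facts_checked: int              # number of claims looked up (assumed representation)
    facts_supported: int            # number of those claims found to be supported (assumed)


def augment_input(text: str, signals: Signals) -> str:
    """Prepend both signals to the text as plain-text tags (the format is an assumption)."""
    nli_tag = f"[NLI: {signals.native_language or 'unknown'}]"
    fact_tag = f"[FACTS: {signals.facts_supported}/{signals.facts_checked}]"
    return f"{nli_tag} {fact_tag} {text}"


example = augment_input(
    "The Eiffel Tower is in Berlin.",
    Signals(native_language=None, facts_checked=1, facts_supported=0),
)
print(example)
```

Under the hypothesis in the abstract, an unidentified native language and unsupported facts would both be cues that the classifier can associate with machine generation; the augmented string would then be tokenized and classified in the usual way.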
Metadata
Item Type: | Proceedings Paper |
---|---|
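Authors/Creators: | Preiss, J. (orcid.org/0000-0002-2158-5832); Paramita, M.L. (orcid.org/0000-0002-9414-1853) |
Editors: | Montes-y-Gómez, M.; Rangel, F.; Jiménez-Zafra, S.M.; Casavantes, M.; Altuna, B.; Alvarez-Carmona, M.Á.; Bel-Enguix, G.; Chiruzzo, L.; de la Iglesia, I.; Escalante, H.J.; Garcia-Cumbreras, M.Á.; García-Díaz, J.A.; González Barba, J.Á.; Tamayo, R.L.; Lima, S.; Moral, P.; Plaza del Arco, F.M.; Valencia-García, R. |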
Copyright, Publisher and Additional Information: | © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (https://creativecommons.org/licenses/by/4.0/). |
Keywords: | GPT-2 classifier; native language identification; Wikipedia fact checking |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 10 Apr 2024 10:20 |
Last Modified: | 10 Apr 2024 10:20 |
Published Version: | https://ceur-ws.org/Vol-3496/ |
Status: | Published |
Publisher: | CEUR-WS.org |
Series Name: | CEUR Workshop Proceedings |
Refereed: | Yes |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:211371 |