How can we effectively expand the vocabulary of LLMs with 0.01GB of target language text?

This is the latest version of this eprint.

Yamaguchi, A. orcid.org/0000-0001-8327-7598, Villavicencio, A. and Aletras, N. (Accepted: 2025) How can we effectively expand the vocabulary of LLMs with 0.01GB of target language text? Computational Linguistics. ISSN: 0891-2017 (In Press)

Metadata

Item Type: Article
Authors/Creators: Yamaguchi, A. (orcid.org/0000-0001-8327-7598); Villavicencio, A.; Aletras, N.
Copyright, Publisher and Additional Information:

© 2025 Association for Computational Linguistics.

Dates:
  • Accepted: 24 October 2025
Institution: The University of Sheffield
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Funding Information:
Funder: Engineering and Physical Sciences Research Council
Grant number: 2894795
Date Deposited: 07 Nov 2025 10:05
Last Modified: 07 Nov 2025 10:05
Status: In Press
Publisher: The MIT Press
Refereed: Yes
Related URLs:
Open Archives Initiative ID (OAI ID):

Download

Accepted Version: 2406.11477v3.pdf (under temporary embargo; file not available)
