This is the latest version of this eprint.
Yamaguchi, A. orcid.org/0000-0001-8327-7598, Mi, M. and Aletras, N. (Accepted: 2026) Enhancing linguistic competence of language models through pre-training with language learning tasks. In: Proceedings of 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026). 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), 02-07 Jul 2026, San Diego, California. Association for Computational Linguistics (ACL). (In Press)
Abstract
Language models (LMs) are pre-trained on raw text datasets to generate text sequences token-by-token. While this approach facilitates the learning of world knowledge and reasoning, it does not explicitly optimize for linguistic competence. To bridge this gap, we propose L2T, a pre-training framework integrating Language Learning Tasks alongside standard next-token prediction. Inspired by human language acquisition, L2T transforms raw text into structured input-output pairs to provide explicit linguistic stimulation. Pre-training LMs on a mixture of raw text and L2T data not only improves overall performance on linguistic competence benchmarks but accelerates its acquisition, while maintaining competitive performance on general reasoning tasks.
Metadata
| Item Type: | Proceedings Paper |
|---|---|
| Authors/Creators: | Yamaguchi, A. (orcid.org/0000-0001-8327-7598); Mi, M.; Aletras, N. |
| Copyright, Publisher and Additional Information: | © 2026 Association for Computational Linguistics. |
| Dates: | Accepted: 2026 |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
| Funding Information: | Funder: Engineering and Physical Sciences Research Council; Grant number: 2894795 |
| Date Deposited: | 06 May 2026 14:36 |
| Last Modified: | 06 May 2026 14:40 |
| Status: | In Press |
| Publisher: | Association for Computational Linguistics (ACL) |
| Refereed: | Yes |
| Related URLs: | |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:240804 |
Available Versions of this Item
- Enhancing linguistic competence of language models through pre-training with Language Learning Tasks. (deposited 06 May 2026 14:28)
- Enhancing linguistic competence of language models through pre-training with language learning tasks. (deposited 06 May 2026 14:36) [Currently Displayed]
Download
Filename: l2t.pdf
