Enhancing linguistic competence of language models through pre-training with language learning tasks

This is the latest version of this eprint.

Yamaguchi, A. orcid.org/0000-0001-8327-7598, Mi, M. and Aletras, N. (2026) Enhancing linguistic competence of language models through pre-training with language learning tasks. In: Liakata, M., Moreira, V.P., Zhang, J. and Jurgens, D., (eds.) Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026). 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), 02-07 Jul 2026, San Diego, California. Association for Computational Linguistics (ACL), 2, pp. 316-336. ISBN: 9798891763913.

Abstract

Language models (LMs) are pre-trained on raw text datasets to generate text sequences token-by-token. While this approach facilitates the learning of world knowledge and reasoning, it does not explicitly optimize for linguistic competence. To bridge this gap, we propose L2T, a pre-training framework integrating Language Learning Tasks alongside standard next-token prediction. Inspired by human language acquisition, L2T transforms raw text into structured input-output pairs to provide explicit linguistic stimulation. Pre-training LMs on a mixture of raw text and L2T data not only improves overall performance on linguistic competence benchmarks but accelerates its acquisition, while maintaining competitive performance on general reasoning tasks.

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Yamaguchi, A. (https://orcid.org/0000-0001-8327-7598); Mi, M.; Aletras, N.
Editors:	Liakata, M.; Moreira, V.P.; Zhang, J.; Jurgens, D.
Copyright, Publisher and Additional Information:	© 2026 Association for Computational Linguistics. Licensed under a Creative Commons Attribution 4.0 International License - https://creativecommons.org/licenses/by/4.0/
Dates:	Accepted: 15 April 2026; Published (online): July 2026; Published: July 2026
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Funding Information:	Funder: Engineering and Physical Sciences Research Council; Grant number: 2894795
Date Deposited:	06 May 2026 14:36
Last Modified:	26 Jun 2026 08:37
Published Version:	https://aclanthology.org/2026.acl-short.27/
Status:	Published
Publisher:	Association for Computational Linguistics (ACL)
Refereed:	Yes
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:240804

Available Versions of this Item

Enhancing linguistic competence of language models through pre-training with Language Learning Tasks. (deposited 06 May 2026 14:28)
- Enhancing linguistic competence of language models through pre-training with language learning tasks. (deposited 06 May 2026 14:36) [Currently Displayed]

Download

Published Version

Filename: 2026.acl-short.27.pdf

Licence: CC-BY 4.0