Zhang, Y., Valentino, M. orcid.org/0000-0002-9959-8385, Carvalho, D. et al. (1 more author) (2026) Learning to disentangle latent reasoning rules with language VAEs: a systematic study. In: Proceedings of the AAAI Conference on Artificial Intelligence. 40th AAAI Conference on Artificial Intelligence, 20-27 Jan 2026, Singapore. Vol. 40 (23). Association for the Advancement of Artificial Intelligence (AAAI), pp. 19458-19466. ISSN: 2159-5399. EISSN: 2374-3468.
Abstract
Incorporating explicit reasoning rules within the latent space of language models (LMs) offers a promising pathway to enhance generalisation, interpretability, and controllability. While current Transformer-based language models have shown strong performance on Natural Language Inference (NLI) tasks, they often rely on memorisation rather than explicit rule-based generalisation. This work investigates how human-interpretable reasoning rules can be explicitly encoded within LMs with the support of Language Variational Autoencoders (VAEs), as a mechanism for generative control. We propose a complete pipeline for learning reasoning rules within Transformer-based language VAEs. This pipeline encompasses three rule-based reasoning tasks, a supporting theoretical framework, and a practical end-to-end architecture. The experiments yield the following findings. (1) Disentangled reasoning: under explicit signal supervision, reasoning rules (viewed as functional mappings) can be disentangled within the encoder's parametric space, which results in distinct clustering of rules in the output feature space. (2) Prior knowledge injection: injecting rule-based constraints into the Query enables the model to retrieve the stored Value from memory based on the Key more effectively, which offers a simple method for integrating prior knowledge into decoder-only language models. Moreover, we found that FFN layers are better than attention layers at preserving the separation of reasoning rules in the model's parameters.
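The Query/Key/Value retrieval idea mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: the memory layout, the `attend` helper, and the additive `rule_embedding` are all hypothetical choices made for illustration, assuming plain scaled dot-product attention over a three-slot memory with one slot per rule.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scores = keys @ query / np.sqrt(query.shape[-1])
    weights = softmax(scores)
    return weights @ values, weights


# Hypothetical toy memory: one Key/Value pair per reasoning rule.
keys = np.eye(3) * 4.0
values = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])

# A content-only Query that matches every Key equally (ambiguous).
content_query = np.array([1.0, 1.0, 1.0])

# Assumed form of prior knowledge: a vector aligned with rule 1's Key,
# added to the Query before retrieval.
rule_embedding = keys[1]

_, w_plain = attend(content_query, keys, values)
out_rule, w_rule = attend(content_query + rule_embedding, keys, values)

print("weights without rule signal:", np.round(w_plain, 3))
print("weights with rule signal:   ", np.round(w_rule, 3))
```

In this toy setting the content-only Query spreads its attention evenly over the memory, while the rule-augmented Query concentrates almost all weight on the slot for the indicated rule, so the retrieved Value is close to `values[1]`.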
Metadata
| Item Type: | Proceedings Paper |
|---|---|
| Authors/Creators: | Zhang, Y., Valentino, M., Carvalho, D. et al. (1 more author) |
| Copyright, Publisher and Additional Information: | © 2026 The Authors. Except as otherwise noted, this author-accepted version of a conference paper published in Proceedings of the AAAI Conference on Artificial Intelligence is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
| Keywords: | Information and Computing Sciences; Artificial Intelligence; Mental health |
| Dates: | |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
| Date Deposited: | 17 Apr 2026 09:01 |
| Last Modified: | 17 Apr 2026 13:21 |
| Status: | Published |
| Publisher: | Association for the Advancement of Artificial Intelligence (AAAI) |
| Refereed: | Yes |
| Identification Number: | 10.1609/aaai.v40i23.39024 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:240162 |
