Kazakov, Dimitar Lubomirov orcid.org/0000-0002-0637-8106, Minkov, Stefan, Margova, Ruslana et al. (2 more authors) (Accepted: 2025) Towards Creating a Bulgarian Readability Index. In: Proceedings of the Workshop on Advancing NLP for Low-Resource Languages (LowResNLP) at RANLP 2025. Workshop on Advancing NLP for Low-Resource Languages at RANLP 2025, 11 Sep 2025. Association for Computational Linguistics (ACL), BGR. (In Press)
Abstract
Readability assessment plays a crucial role in education and text accessibility. While numerous indices exist for English and have been extended to Romance and Slavic languages, Bulgarian remains under-served in this regard. This paper reviews established readability metrics across these language families, examining their underlying features and modelling methods. We then report the first attempt to develop a readability index for Bulgarian, using end-of-school-year assessment questions and literary works targeted at children of various ages. Key linguistic attributes, namely, word length, sentence length, syllable count, and information content (based on word frequency), were extracted, and their first two statistical moments, mean and variance, were modelled against grade levels using linear and polynomial regression. Results suggest that polynomial models outperform linear ones by capturing non-linear relationships between textual features and perceived difficulty, but may be harder to interpret. This work provides an initial framework for building a reliable readability measure for Bulgarian, with applications in educational text design, adaptive learning, and corpus annotation.
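The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the vowel-counting approximation of syllables, the regex tokenizer, and the add-one smoothing of word frequencies are all assumptions made for illustration. For each of the four attributes (word length, sentence length, syllable count, and frequency-based information content), it returns the mean and variance over one text.

```python
import math
import re
from collections import Counter
from statistics import fmean, pvariance

# Assumption: syllables are approximated by counting Bulgarian vowel letters.
BG_VOWELS = set("аъоуеиюя")


def split_sentences(text):
    """Split on sentence-final punctuation (a crude, assumed heuristic)."""
    return [s for s in re.split(r"[.!?…]+", text) if s.strip()]


def tokenize(text):
    """Lower-case the text and keep runs of letters only."""
    return re.findall(r"[^\W\d_]+", text.lower())


def count_syllables(word):
    """Approximate syllable count: number of vowel letters, at least 1."""
    return max(1, sum(ch in BG_VOWELS for ch in word))


def moments(values):
    """Return (mean, population variance) of a non-empty list of numbers."""
    return fmean(values), pvariance(values)


def extract_features(text, word_freq):
    """Compute (mean, variance) for each of the four attributes.

    word_freq is a Counter of word counts from some reference corpus
    (hypothetical here). Information content is -log2 of an add-one
    smoothed relative frequency, so unseen words get a finite value.
    """
    words = tokenize(text)
    sentences = split_sentences(text)
    total = sum(word_freq.values())
    vocab = len(word_freq) + 1  # one extra slot for unseen words

    info = [
        -math.log2((word_freq.get(w, 0) + 1) / (total + vocab))
        for w in words
    ]
    return {
        "word_length": moments([len(w) for w in words]),
        "sentence_length": moments([len(tokenize(s)) for s in sentences]),
        "syllable_count": moments([count_syllables(w) for w in words]),
        "information_content": moments(info),
    }


# Toy usage with an invented two-sentence text and a tiny frequency table.
reference = Counter(tokenize("котката спи котката лае"))
features = extract_features("Котката спи. Кучето лае силно!", reference)
for name, (mean, var) in features.items():
    print(f"{name}: mean={mean:.3f}, variance={var:.3f}")
```

The resulting eight numbers per text (two moments for each of four attributes) are the kind of predictors that could then be regressed against grade level, for instance with `numpy.polyfit` at degree 1 for a linear fit and a higher degree for a polynomial fit.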
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Kazakov, Dimitar Lubomirov; Minkov, Stefan; Margova, Ruslana; et al. (2 more authors) |
Copyright, Publisher and Additional Information: | This is an author-produced version of the published paper. Uploaded in accordance with the University’s Research Publications and Open Access policy. |
Keywords: | READABILITY, index, Bulgarian |
Dates: | Accepted: 2025 |
Institution: | The University of York |
Academic Units: | The University of York > Faculty of Sciences (York) > Computer Science (York) |
Depositing User: | Pure (York) |
Date Deposited: | 09 Sep 2025 11:00 |
Last Modified: | 09 Sep 2025 12:51 |
Status: | In Press |
Publisher: | Association for Computational Linguistics (ACL) |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:231312 |
Download
Filename: RANLP_2025_WS_BG_READABILITY_CRC_.pdf
Licence: CC-BY 2.5