Nikolaev, A. and Bermel, N. orcid.org/0000-0002-1663-9322 (2023) Studying negative evidence in Finnish language corpora. Word Structure, 16 (2-3). pp. 206-232. ISSN 1750-1245
Abstract
This study explores the relationship between lower-than-expected frequencies of word forms and inherent gaps in Finnish inflectional paradigms. The research aims to determine whether paradigmatic gaps can be predicted from lower-than-expected frequencies of word forms. We examined Finnish nouns inflected in a marginal case (the instructive) and hypothesized that some of these nouns may have gaps in their inflectional paradigms. However, we found that such gaps are contingent and do not cause uncertainty when filled. We also found that the correlation between inherent gaps and lower frequencies is one-directional: predicting inherent gaps from lower-than-expected frequencies is problematic. The results suggest that any paradigmatic gap suggested by corpus frequency is more likely to be contingent than inherent, and that the less semantic need there is for a particular word form, the more likely that form is to be unattested, even in a large corpus. The research highlights the importance of considering semantic profiles when analyzing the grammaticality of word forms, and suggests that statistical tests such as Fisher’s exact test are not necessarily the right approach to the problem of negative evidence in corpus studies.
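To make concrete what "lower-than-expected frequency" means for a test such as Fisher’s exact test, here is a minimal sketch of a one-sided (lower-tail) Fisher’s exact test on a 2×2 table of token counts: target lemma versus a comparison set, instructive form versus all other forms. The table layout, the function names `expected_count` and `fisher_exact_less`, and the counts in the usage lines are hypothetical illustrations, not the authors’ procedure or data. In practice `scipy.stats.fisher_exact` with `alternative="less"` computes the same kind of p-value; the version below uses only the standard library so the hypergeometric calculation is visible.

```python
from math import comb


def expected_count(a: int, b: int, c: int, d: int) -> float:
    """Expected value of cell a under independence, given the table margins."""
    n = a + b + c + d
    return (a + b) * (a + c) / n


def fisher_exact_less(a: int, b: int, c: int, d: int) -> float:
    """One-sided (lower-tail) Fisher's exact test for the 2x2 table

                        form of interest   other forms
        target lemma           a                b
        comparison set         c                d

    Returns P(X <= a), where X is hypergeometric with the observed margins:
    the probability of a count at least this low if lemma and form were
    independent.
    """
    row1 = a + b          # tokens of the target lemma
    col1 = a + c          # tokens of the form of interest, all lemmas
    n = a + b + c + d     # total tokens in the table
    lowest = max(0, row1 + col1 - n)  # smallest feasible value of cell a
    # Sum exact integer numerators first, then divide once for accuracy.
    numerator = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(lowest, a + 1)
    )
    return numerator / comb(n, row1)


# Hypothetical counts, for illustration only: a lemma with 2,000 tokens and
# no instructive forms, compared with a pool of 100,000 tokens of which 150
# are instructive.
a, b, c, d = 0, 2_000, 150, 99_850
print(round(expected_count(a, b, c, d), 2))  # 2.94
print(fisher_exact_less(a, b, c, d))
```

With counts like these, zero observed tokens against roughly three expected need not yield a small p-value, which illustrates why an unattested form in a large corpus is weak evidence on its own; the test also says nothing about whether the form is semantically needed.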
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Nikolaev, A.; Bermel, N. (orcid.org/0000-0002-1663-9322) |
Copyright, Publisher and Additional Information: | © 2023 Edinburgh University Press. This is an author-produced version of a paper accepted for publication in Word Structure. Uploaded in accordance with the publisher's self-archiving policy. This version is made available under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
Keywords: | Finnish; inflectional morphology; defectivity; corpus linguistics; Bayesian statistics; word semantics |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Arts and Humanities (Sheffield) > School of Languages and Cultures (Sheffield) |
Funding Information: | Funder: Arts and Humanities Research Council; Grant number: AH/T002859/1 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 27 Nov 2023 12:55 |
Last Modified: | 27 Nov 2023 14:07 |
Status: | Published |
Publisher: | Edinburgh University Press |
Refereed: | Yes |
Identification Number: | 10.3366/word.2023.0229 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:205951 |