West, R. orcid.org/0000-0001-6398-0921, Brown, J. orcid.org/0000-0002-2797-5428, Shahab, L. orcid.org/0000-0003-4033-442X et al. (10 more authors) (2025) Annotating datasets in behavioural and social sciences to promote interoperability: development of the schema for ontology-based dataset annotation (SODA) version 1.0. Wellcome Open Research, 10. p. 455. ISSN: 2398-502X
Abstract
Background and aims
Ontologies are increasingly employed to help find, use and synthesise information, but methods for using them to annotate documents and datasets remain in their infancy in the behavioural and social sciences. The Behavioural Research UK DEMO-DATA project aimed to develop a prototype schema for annotating datasets in behavioural and social sciences.
Methods
A case-study dataset (the ‘Smoking Toolkit Study’), used to inform an Agent-Based Model of trajectories in cigarette smoking and cessation in England, was chosen for annotation using two ontologies: the Behaviour Change Intervention Ontology (BCIO) and the Addiction Ontology (AddictO). The dataset included 21 variables representing sociodemographic attributes and tobacco and nicotine use attributes of the study population. A preliminary version of the schema for linking variables to ontology classes was developed as a basis for annotating each variable in the dataset. This was applied and revised iteratively until a panel of domain experts and modellers judged that it represented the variables accurately enough to enable searching for and integration of data.
Results
The prototype Schema for Ontology-based Dataset Annotation (SODA) version 1.0 was developed over seven iterations. Variables were represented by an ‘object property’|‘ontology class’ expression (e.g., ‘has characteristic’|‘extent of social smoking’) together with information about the data type (e.g., numbers, ontology subclasses, or Boolean values), measurement source, unit of measurement, any coding or data transformations, and whether or not the variable was fully characterised by the annotation. The prototype schema was applied successfully to the smoking dataset, with 15 new ontology classes created as required.
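To make the shape of such an annotation concrete, the following minimal Python sketch models one annotated variable with the components listed above. It is an illustration only, not the SODA specification: the class name `VariableAnnotation`, all field names, the helper `find_by_class`, the column name `social_smoking` and the data type assigned to it are assumptions introduced here. Only the ‘has characteristic’|‘extent of social smoking’ expression comes from the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VariableAnnotation:
    """One annotated dataset variable (all field names are illustrative assumptions)."""

    variable_name: str                         # column name in the dataset (hypothetical)
    object_property: str                       # ontology relation, e.g. 'has characteristic'
    ontology_class: str                        # ontology class label
    data_type: str                             # e.g. 'number', 'ontology subclass', 'boolean'
    measurement_source: Optional[str] = None   # left empty when not known
    unit: Optional[str] = None                 # unit of measurement, if any
    coding: Optional[str] = None               # any coding or data transformation
    fully_characterised: bool = True           # does the annotation fully describe the variable?

    def expression(self) -> str:
        """Return the 'object property'|'ontology class' expression as text."""
        return f"'{self.object_property}'|'{self.ontology_class}'"


def find_by_class(
    annotations: List[VariableAnnotation], ontology_class: str
) -> List[VariableAnnotation]:
    """Return the annotations whose ontology class label matches exactly."""
    return [a for a in annotations if a.ontology_class == ontology_class]


annotations = [
    VariableAnnotation(
        variable_name="social_smoking",        # hypothetical column name
        object_property="has characteristic",
        ontology_class="extent of social smoking",
        data_type="number",                    # assumed for illustration
    ),
]

print(annotations[0].expression())
```

Keeping the relation and the class as separate fields, rather than as one string, is what would let a search such as `find_by_class(annotations, "extent of social smoking")` match variables across datasets that use different column names.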
Conclusions
A prototype schema for annotating behavioural and social science datasets was developed and successfully applied to a dataset on smoking in England using ontology relations and classes. The next step is to further develop and evaluate the schema by application to case studies with a range of users and other datasets.
Metadata
Item Type: | Article |
---|---|
Copyright, Publisher and Additional Information: | © 2025 West R et al. This is an open access work distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. https://creativecommons.org/licenses/by/4.0/ |
Keywords: | Biomedical and Clinical Sciences; Health Sciences; Bioengineering; Drug Abuse (NIDA only); Behavioral and Social Science; Substance Misuse; Tobacco; Networking and Information Technology R&D (NITRD); Tobacco Smoke and Health; Good Health and Well Being |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Medicine, Dentistry and Health (Sheffield) > School of Medicine and Population Health; The University of Sheffield > Faculty of Science (Sheffield) > Department of Psychology (Sheffield); The University of Sheffield > Faculty of Engineering (Sheffield) > School of Electrical and Electronic Engineering; The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Funding Information: | Funder: Economic & Social Research Council; Grant number: ES/Y001044/1 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 28 Aug 2025 13:57 |
Last Modified: | 28 Aug 2025 13:57 |
Status: | Published |
Publisher: | F1000 Research Ltd |
Refereed: | Yes |
Identification Number: | 10.12688/wellcomeopenres.24234.1 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:230887 |
Licence: CC-BY 4.0