Sickotra, S. orcid.org/0000-0002-7351-9255 (2025) Data resource profile: a guide for constructing school-to-work sequence analysis trajectories using the Longitudinal Education Outcomes (LEO) data. International Journal of Population Data Science, 8 (6). 11. ISSN 2399-4908
Abstract
Introduction
Sequence analysis is a powerful methodology for examining longitudinal school-to-work trajectories. Despite its growing use, guidance on preparing suitable datasets is limited. This resource details the creation of a dataset specifically designed for sequence analysis, capturing yearly education and employment activity states for 556,182 individuals from England's 2010/11 school-leaver cohort.
Methods
The dataset was constructed using the Department for Education's Longitudinal Education Outcomes (LEO) data. SQL was used to extract relevant variables, and data linkage and preprocessing were performed in R. Data processing was tailored to sequence analysis, including reducing the number of activity states and applying a hierarchy to integrate education and employment data.
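The idea of applying a hierarchy so that each person has exactly one activity state per academic year can be illustrated with a minimal sketch. This is not the author's code (the paper's processing was done in SQL and R); it is written in Python purely for illustration. The state labels, their priority order, the "unknown" filler for years with no record, and the function names are all hypothetical assumptions, since the abstract does not specify them.

```python
# Illustrative sketch only: state names and priority order are ASSUMED,
# not taken from the LEO data or from the paper.
PRIORITY = [
    "higher_education",   # assumed highest priority
    "further_education",
    "employment",
    "benefits",
    "unknown",            # assumed filler when nothing is recorded
]
RANK = {state: position for position, state in enumerate(PRIORITY)}

# Academic years 2011/12 to 2018/19, the span described in the abstract.
ACADEMIC_YEARS = [f"{y}/{(y + 1) % 100:02d}" for y in range(2011, 2019)]


def resolve_state(candidate_states):
    """Return the single highest-priority state among those recorded in a year."""
    known = [s for s in candidate_states if s in RANK]
    if not known:
        return "unknown"
    return min(known, key=RANK.__getitem__)


def build_sequence(records):
    """Build one yearly state sequence.

    records: dict mapping academic year -> iterable of states recorded that year.
    Years absent from the dict are filled with "unknown".
    """
    return [resolve_state(records.get(year, ())) for year in ACADEMIC_YEARS]


# Hypothetical individual with overlapping education and employment records.
example = {
    "2011/12": ["further_education", "employment"],
    "2012/13": ["employment"],
    "2014/15": ["employment", "higher_education"],
}
sequence = build_sequence(example)
```

Under these assumed rules, a year with both an education and an employment record resolves to the education state, and each individual ends up with one state per year across eight years, which is the wide, one-state-per-period shape that sequence analysis tools generally expect.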
Results
The resulting dataset spans activities from the first non-compulsory state in 2011/12 until 2018/19, tracking trajectories from ages 16/17 to 23/24. The dataset was designed with the ability to subset school-leavers by their initial Combined Authority residence to aid in regional analysis of school-to-work trajectories. Individual-level socio-demographic characteristics that can be linked to the longitudinal activity histories were also built, alongside longitudinal geographic locations and employment earnings data. Additionally, the limitations of the developed data are discussed.
Conclusion
This resource provides guidance for researchers and practitioners who may lack experience preparing input datasets for sequence analysis, addressing the current gap in available resources. By offering step-by-step instructions and shared code, it enables users to recreate or adapt the dataset for their specific research needs. Its ability to subset by region further supports localised and comparative studies of school-to-work trajectories, making it a valuable tool for advancing existing research. The LEO data can be accessed by application through the Office for National Statistics Secure Research Service.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: |
|
Copyright, Publisher and Additional Information: | © 2025 The Authors. Open Access under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/deed.en) |
Keywords: | Education data linkage; Administrative Data Linkage; Sequence Analysis; Longitudinal Education Outcomes; School-to-work; Data Development; Data Pre-processing; Big Data |
Dates: |
|
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Sheffield Methods Institute |
Funding Information: | Funder: Economic and Social Research Council; Grant number: 2433665 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 18 Feb 2025 11:41 |
Last Modified: | 26 Mar 2025 13:11 |
Status: | Published |
Publisher: | Swansea University |
Refereed: | Yes |
Identification Number: | 10.23889/ijpds.v8i6.2953 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:223128 |