Towards a more nuanced conceptualisation of differential examiner stringency in OSCEs

Abstract

Systematic differences in OSCE scoring across examiners (often termed examiner stringency) can threaten the validity of examination outcomes. Such effects are usually conceptualised and operationalised based solely on checklist/domain scores in a station, and global grades are not often used in this type of analysis. In this work, a large candidate-level exam dataset is analysed to develop a more sophisticated understanding of examiner stringency. Station scores are modelled based on global grades, with each candidate, station and examiner allowed to vary in their ability, difficulty and stringency respectively. In addition, examiners are allowed to vary in how they discriminate across grades; to our knowledge, this is the first time this has been investigated. Results show that examiners contribute strongly to variance in scoring in two distinct ways: via the traditional conception of score stringency (34% of score variance), but also in how they discriminate in scoring across grades (7%). As one might expect, candidate and station account for only a small amount of score variance at the station level once candidate grades are accounted for (3% and 2% respectively), with the remainder being residual (54%). Investigation of impacts on station-level candidate pass/fail decisions suggests that examiner differential stringency effects combine to give false positive (candidates passing in error) and false negative (candidates failing in error) rates in stations of around 5% each, but at the exam level these reduce to 0.4% and 3.3% respectively. This work adds to our understanding of examiner behaviour by demonstrating that examiners can vary in qualitatively different ways in their judgments. For institutions, it emphasises the key message that it is important to sample widely from the examiner pool via sufficient stations to ensure OSCE-level decisions are sufficiently defensible. It also suggests that examiner training should include discussion of global grading, and the combined effect of scoring and grading on candidate outcomes.
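
The modelling described in the abstract can be read as a linear mixed model with crossed random effects and an examiner-specific slope on grade. The following is only a sketch under that assumption; the symbols, the linear form and the normality assumptions are illustrative choices, not taken from the article, and the published specification may differ.

```latex
% Illustrative sketch only: notation and distributional assumptions are not from the article.
% y_{cse}: station score for candidate c at station s, marked by examiner e
% g_{cse}: global grade awarded in that encounter (treated here as numeric)
\begin{align}
  y_{cse} &= \beta_0 + \beta_1\, g_{cse}
             + u_c + v_s + w_e + b_e\, g_{cse} + \varepsilon_{cse}, \\
  u_c &\sim N(0, \sigma^2_{\text{cand}}), \quad
  v_s \sim N(0, \sigma^2_{\text{stat}}), \quad
  w_e \sim N(0, \sigma^2_{\text{exam}}), \\
  b_e &\sim N(0, \sigma^2_{\text{slope}}), \quad
  \varepsilon_{cse} \sim N(0, \sigma^2_{\text{res}}).
\end{align}
```

Under this reading, the examiner intercept w_e corresponds to the traditional conception of score stringency (34% of score variance), the examiner slope b_e to differences in how examiners discriminate across grades (7%), u_c and v_s to the candidate and station effects (3% and 2%), and the error term to the residual (54%). These reported shares sum to 100%.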

Metadata

Item Type:	Article
Authors/Creators:	Homer, M. https://orcid.org/0000-0002-1161-5938
Copyright, Publisher and Additional Information:	© The Author(s) 2023. This is an open access article under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited.
Keywords:	OSCE; Examiner stringency; Standard setting; Borderline regression
Dates:	Published: July 2024; Published (online): 16 October 2023; Accepted: 24 September 2023
Institution:	The University of Leeds
Academic Units:	The University of Leeds > Faculty of Education, Social Sciences and Law (Leeds) > School of Education (Leeds)
Depositing User:	Symplectic Publications
Date Deposited:	04 Oct 2023 12:06
Last Modified:	10 Dec 2024 09:59
Status:	Published
Publisher:	Springer
Identification Number:	10.1007/s10459-023-10289-w
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:203908

Download

Published Version

Filename: s10459-023-10289-w.pdf

Licence: CC-BY 4.0
