Using Instruction-Tuned Large Language Models to Identify Indicators of Vulnerability in Police Incident Narratives

Abstract

Objectives Police routinely collect unstructured narrative reports of their interactions with civilians. These accounts have the potential to reveal the extent of police engagement with vulnerable populations. We test whether large language models (LLMs) can effectively replicate human qualitative coding of these narratives—a task that would otherwise be highly resource intensive.

Methods Using publicly available narrative reports from the Boston Police Department, we compare human-generated and LLM-generated labels for four vulnerabilities: mental ill health, substance misuse, alcohol dependence, and homelessness. We assess multiple LLM sizes and prompting strategies, measure label variability through repeated prompts, and conduct counterfactual experiments to examine potential classification biases related to sex and race.

Results LLMs demonstrate high agreement with human coders in identifying narratives without vulnerabilities, particularly when repeated classifications are unanimous or near-unanimous. Human-LLM agreement improves with larger models and tailored prompting strategies, though effectiveness varies by vulnerability type. These findings suggest that a human-LLM collaborative approach, in which LLMs screen the majority of cases whilst humans review ambiguous instances, would significantly reduce manual coding requirements. Counterfactual analyses indicate minimal influence of subject sex and race on LLM classifications beyond that expected by chance.
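The collaborative screening idea described in the Results can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `triage`, the label strings, the routing outcomes, and the `min_agreement` threshold are all hypothetical. It assumes each narrative has been classified several times for one vulnerability, and that only unanimous or near-unanimous "absent" results are accepted automatically, with every other case sent to a human coder.

```python
from collections import Counter

# Hypothetical label strings; the paper's actual label scheme may differ.
LABEL_ABSENT = "absent"
LABEL_PRESENT = "present"


def triage(repeated_labels, min_agreement=1.0):
    """Route one narrative based on repeated LLM labels for one vulnerability.

    repeated_labels: labels from repeated prompts on the same narrative.
    min_agreement:   share of runs that must say "absent" before the case is
                     accepted without review (1.0 = unanimous). Assumed value.
    Returns "auto_absent" or "human_review".
    """
    if not repeated_labels:
        raise ValueError("at least one label is required")

    counts = Counter(repeated_labels)
    absent_share = counts[LABEL_ABSENT] / len(repeated_labels)

    # Accept only confident "no vulnerability" results automatically.
    # Everything else, including unanimous "present", goes to a human;
    # that conservative choice is an assumption of this sketch.
    if absent_share >= min_agreement:
        return "auto_absent"
    return "human_review"


if __name__ == "__main__":
    runs = [LABEL_ABSENT] * 4 + [LABEL_PRESENT]
    print(triage(runs))                      # strict, unanimous threshold
    print(triage(runs, min_agreement=0.75))  # near-unanimous threshold
```

Lowering `min_agreement` trades a smaller human workload for a higher risk of missing a vulnerability, so any threshold would need to be validated against human-coded data for each vulnerability type.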

Conclusions LLMs can substantially reduce resource requirements for analyzing large narrative datasets, whilst enhancing coding specificity and transparency, and enabling new approaches to replication and comparative analysis. These advances present promising opportunities for criminology and related fields.

Metadata

Item Type:	Article
Authors/Creators:	Relins, S.; Birks, D. (https://orcid.org/0000-0003-3055-7398); Lloyd, C.
Copyright, Publisher and Additional Information:	© The Author(s) 2025. This is an open access article under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited.
Keywords:	Large Language Models, Unstructured Data, Policing, Vulnerability, Deductive Coding
Dates:	Accepted: 22 April 2025; Published (online): 17 June 2025
Institution:	The University of Leeds
Academic Units:	The University of Leeds > Faculty of Education, Social Sciences and Law (Leeds) > School of Law (Leeds)
Depositing User:	Symplectic Publications
Date Deposited:	08 Jul 2025 10:33
Last Modified:	08 Jul 2025 10:33
Status:	Published online
Publisher:	Springer Nature
Identification Number:	10.1007/s10940-025-09611-z
Related URLs:	Author Dataset
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:228890