Wen, C., Clough, P. orcid.org/0000-0003-1739-175X, Paton, R. et al. (1 more author) (2025) Leveraging large language models for thematic analysis: a case study in the charity sector. AI & Society.
Abstract
This study explores how large language models (LLMs) can support deductive and inductive thematic coding in real-life contexts, balancing AI-driven efficiency with essential human oversight. Using three datasets from Tearfund, a UK-based Christian charity, we propose a dual-role human–LLM collaborative framework in which the LLM functions as both an initial annotator and a validator. In the deductive phase, GPT-4o and GPT-4o-mini were compared against human coders. GPT-4o achieved substantial agreement in multi-label thematic categorization (κ = 0.61–0.65), while GPT-4o-mini showed moderate agreement (κ = 0.41–0.58). Both models excelled in sentiment analysis (κ = 0.91–0.95) but struggled with evaluating evidence of impact due to contextual complexity (κ ≤ 0.01). GPT-4o-mini exhibited greater output variability and instability than GPT-4o, but benefited more from few-shot learning to mitigate hallucinations. In the inductive phase, GPT-4o demonstrated strong semantic alignment with human-generated themes (cosine similarity = 0.76–0.79), though its tendency toward broad themes required human refinement. Despite their potential to streamline thematic analysis, LLMs also pose limitations and implementation challenges, including inconsistencies in excerpt extraction (precision = 0.41, recall = 0.53) and the trade-off between the time saved in coding and the time required for human validation. To facilitate practical implementation, we provide reusable prompt templates for four stages: context, instructions, data processing, and verification. Our findings underline the indispensable role of human expertise (from prompt engineering and managing hallucinations to final verification) in ensuring accurate and trustworthy AI-assisted analyses. While LLMs can enhance qualitative analysis, their full potential is only realized under skilled human guidance.
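The agreement figures quoted in the abstract rest on three standard measures: Cohen's kappa, cosine similarity, and precision/recall. The Python sketch below shows how each is conventionally computed. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the toy labels, and the exact-match treatment of excerpts are all hypothetical, and this record does not state how the paper aggregated multi-label agreement or which embeddings it used for theme similarity. The labels "moderate" and "substantial" in the abstract appear to follow the commonly used Landis and Koch bands (0.41–0.60 and 0.61–0.80 respectively).

```python
from collections import Counter
from math import sqrt


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def precision_recall(predicted, reference):
    """Precision and recall of predicted items against a reference set.

    Assumption: items count as a hit only on exact match.
    """
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = theme applies, 0 = theme does not apply.
    human = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
    model = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
    print(round(cohen_kappa(human, model), 3))
    print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 1.0, 0.0]))
    print(precision_recall({"a", "b", "c"}, {"b", "c", "d", "e"}))
```

In the toy example the two coders agree on 8 of 10 items (observed agreement 0.80) against a chance agreement of 0.48, which gives κ = 0.32 / 0.52 ≈ 0.62. This shows why kappa can sit well below raw percentage agreement.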
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Wen, C.; Clough, P. (orcid.org/0000-0003-1739-175X); Paton, R.; et al. (1 more author) |
Copyright, Publisher and Additional Information: | © 2025 The Authors. This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
Keywords: | Large language models (LLMs); Generative AI (GenAI); GPT-4o; Prompt engineering; Thematic analysis |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 02 Sep 2025 08:31 |
Last Modified: | 02 Sep 2025 08:31 |
Published Version: | https://doi.org/10.1007/s00146-025-02487-4 |
Status: | Published online |
Publisher: | Springer Verlag |
Refereed: | Yes |
Identification Number: | 10.1007/s00146-025-02487-4 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:230962 |