Wen, C., Clough, P. orcid.org/0000-0003-1739-175X, Paton, R. et al. (1 more author) (2025) Leveraging large language models for thematic analysis: a case study in the charity sector. AI & Society.
Abstract
This study explores how large language models (LLMs) can support deductive and inductive thematic coding in real-life contexts, balancing AI-driven efficiency with essential human oversight. Using three datasets from Tearfund, a UK-based Christian charity, we propose a dual-role human–LLM collaborative framework where the LLM functions as an initial annotator and a validator. In the deductive phase, GPT-4o and GPT-4o-mini were compared against human coders. GPT-4o achieved a substantial agreement in multi-label thematic categorization (κ = 0.61–0.65), while GPT-4o-mini showed a moderate agreement (κ = 0.41–0.58). Both models excelled in sentiment analysis (κ = 0.91–0.95), but struggled with evaluating evidence of impact due to contextual complexity (κ ≤ 0.01). GPT-4o-mini exhibited greater output variability and instability than GPT-4o, but benefited more from few-shot learning to mitigate hallucinations. In the inductive phase, GPT-4o demonstrated a strong semantic alignment with human-generated themes (cosine similarity = 0.76–0.79) though its tendency toward broad themes required human refinement. Despite their potential to streamline thematic analysis, LLMs also pose limitations and implementation challenges, including inconsistencies in excerpt extraction (precision = 0.41, recall = 0.53) and the trade-off between the time saved in coding and the time required for human validation. To facilitate practical implementation, we provide reusable prompt templates for four stages: context, instructions, data processing, and verification. Our findings underline the indispensable role of human expertise—from prompt engineering and managing hallucinations to final verification—to ensure accurate and trustworthy AI-assisted analyses. While LLMs can enhance qualitative analysis, their full potential is only realized under skilled human guidance.
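The agreement figures above are Cohen's kappa values, which correct raw agreement for the agreement expected by chance. The labels "moderate" and "substantial" match the commonly used Landis and Koch bands (0.41–0.60 and 0.61–0.80). The following is a minimal sketch for two annotators who each assign one label per item. The function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the study, and the paper's multi-label setting may aggregate kappa differently (for example, per theme).

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels, not data from the study.
human = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
model = ["pos", "pos", "neg", "pos", "pos", "neg", "neg", "neg"]
kappa = cohen_kappa(human, model)
print(round(kappa, 2))
```

In this toy case the annotators agree on 6 of 8 items (0.75), chance agreement is 0.5, and kappa is therefore 0.5, which would fall in the "moderate" band.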
Metadata
| Item Type: | Article |
|---|---|
| Authors/Creators: | Wen, C., Clough, P. (orcid.org/0000-0003-1739-175X), Paton, R. et al. (1 more author) |
| Copyright, Publisher and Additional Information: | © 2025 The Authors. This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
| Keywords: | Large language models (LLMs); Generative AI (GenAI); GPT-4o; Prompt engineering; Thematic analysis |
| Dates: | 2025 |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
| Depositing User: | Symplectic Sheffield |
| Date Deposited: | 02 Sep 2025 08:31 |
| Last Modified: | 02 Sep 2025 08:31 |
| Published Version: | https://doi.org/10.1007/s00146-025-02487-4 |
| Status: | Published online |
| Publisher: | Springer Verlag |
| Refereed: | Yes |
| Identification Number: | 10.1007/s00146-025-02487-4 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:230962 |
