Rinke, EM (orcid.org/0000-0002-5330-7634), Dobbrick, T, Löb, C et al. (2 more authors) (2022) Expert-Informed Topic Models for Document Set Discovery. Communication Methods and Measures, 16 (1). pp. 39-58. ISSN 1931-2458
Abstract
The first step in many text-as-data studies is to find documents that address a specific topic within a larger document set. Researchers often rely on simple keyword searches to do this, even though this may introduce considerable selection bias. Such bias may be even greater when researchers lack the domain knowledge required to make informed search decisions, for example, in cross-national research or research on unfamiliar social contexts. We propose expert-informed topic modeling (EITM) as a hybrid approach to tackle this problem. EITM combines the validity of external domain knowledge captured through expert surveys with probabilistic topic models to help researchers identify subsets of documents that cover initially unknown domain-specific topics, such as specific events and debates, that belong to a researcher-defined master topic. EITM is a flexible and efficient approach to the thematic selection of documents from large text corpora for further study. We benchmark and validate the method by discovering blog posts that address the public role of religion within large corpora of Australian, Swiss, and Turkish blog posts and provide researchers with a complete workflow to guide the application of EITM in their own work.
Metadata
Item Type: | Article |
---|---|
Copyright, Publisher and Additional Information: | © 2021 The Author(s). This is an open access article under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/) |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Education, Social Sciences and Law (Leeds) > School of Politics & International Studies (POLIS) (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 02 Jul 2021 15:10 |
Last Modified: | 23 Apr 2022 06:12 |
Status: | Published |
Publisher: | Routledge |
Identification Number: | 10.1080/19312458.2021.1920008 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:175813 |