Srijith, P.K., Hepple, M., Bontcheva, K. et al. (1 more author) (2017) Sub-story detection in Twitter with hierarchical Dirichlet processes. Information Processing & Management, 53 (4). pp. 989-1003. ISSN 0306-4573
Abstract
Social media has become the de facto information source on real-world events. The challenge, given the high volume and velocity of social media streams, is how to follow all posts pertaining to a given event over time, a task referred to as story detection. Moreover, there are often several different stories pertaining to a given event, which we refer to as sub-stories; we refer to the corresponding task of detecting them automatically as sub-story detection. This paper proposes hierarchical Dirichlet processes (HDP), a probabilistic topic model, as an effective method for automatic sub-story detection. HDP can learn the sub-topics associated with sub-stories, which enables it to handle subtle variations in sub-stories. It is compared with state-of-the-art story detection approaches based on locality sensitive hashing and spectral clustering. We demonstrate the superior performance of HDP for sub-story detection on real-world Twitter data sets using various evaluation measures. The ability of HDP to learn sub-topics helps it to recall the sub-stories with high precision. This results in an improvement of up to 60% in the F-score of the HDP-based sub-story detection approach compared to standard story detection approaches. A similar performance improvement is also seen using an information theoretic evaluation measure proposed for the sub-story detection task. Another contribution of this paper is in demonstrating that considering the conversational structures within the Twitter stream can bring up to 200% improvement in sub-story detection performance.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: |
|
Copyright, Publisher and Additional Information: | © 2016 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
Keywords: | Sub-story detection; Hierarchical Dirichlet process; Spectral clustering; Locality sensitive hashing |
Dates: |
|
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Funder: EUROPEAN COMMISSION - FP6/FP7, Grant number: PHEME - 611233; Funder: EUROPEAN COMMISSION - FP6/FP7, Grant number: TRENDMINER - 287863 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 17 Aug 2016 12:04 |
Last Modified: | 07 Nov 2018 10:21 |
Published Version: | https://doi.org/10.1016/j.ipm.2016.10.004 |
Status: | Published |
Publisher: | Elsevier |
Refereed: | Yes |
Identification Number: | 10.1016/j.ipm.2016.10.004 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:101394 |