Alokaili, A., Aletras, N. and Stevenson, M. orcid.org/0000-0002-9483-6006 (2020) Automatic generation of topic labels. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20), 25-30 Jul 2020, Online conference. SIGIR Proceedings. Association for Computing Machinery (ACM), pp. 1965-1968. ISBN 9781450380164
Abstract
Topic modelling is a popular unsupervised method for identifying the underlying themes in document collections that has many applications in information retrieval. A topic is usually represented by a list of terms ranked by their probability but, since these can be difficult to interpret, various approaches have been developed to assign descriptive labels to topics. Previous work on the automatic assignment of labels to topics has relied on a two-stage approach: (1) candidate labels are retrieved from a large pool (e.g. Wikipedia article titles); and then (2) re-ranked based on their semantic similarity to the topic terms.
However, these extractive approaches can only assign candidate labels from a restricted set that may not include any suitable ones. This paper proposes using a sequence-to-sequence neural-based approach to generate labels that does not suffer from this limitation. The model is trained over a new large synthetic dataset created using distant supervision. The method is evaluated by comparing the labels it generates to ones rated by humans.
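The two-stage extractive pipeline described in the abstract (retrieve candidates from a pool, then re-rank them by similarity to the topic terms) can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate pool, the function names (`retrieve`, `rerank`, `label_topic`) and the use of bag-of-words cosine similarity as a stand-in for semantic similarity are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical candidate pool: label -> short description.
# It stands in for a large pool such as Wikipedia article titles.
CANDIDATE_POOL = {
    "Stock market": "shares trading investors exchange prices market",
    "Association football": "football league goal players match team",
    "Machine learning": "model training data algorithm learning neural",
}


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(topic_terms, pool):
    """Stage 1: keep candidates whose description shares a term with the topic."""
    terms = set(topic_terms)
    return [label for label, desc in pool.items() if terms & set(desc.split())]


def rerank(topic_terms, candidates, pool):
    """Stage 2: order candidates by similarity to the topic terms.

    Bag-of-words cosine is an assumed, simplified proxy for semantic similarity.
    """
    topic_vec = Counter(topic_terms)
    return sorted(
        candidates,
        key=lambda label: cosine(topic_vec, Counter(pool[label].split())),
        reverse=True,
    )


def label_topic(topic_terms, pool=CANDIDATE_POOL):
    """Run both stages and return candidate labels, best first."""
    return rerank(topic_terms, retrieve(topic_terms, pool), pool)


print(label_topic(["market", "shares", "prices", "data"]))
print(label_topic(["zebra"]))  # no candidate in the pool matches this topic
```

The second call returns an empty list, which mirrors the limitation the abstract points out: an extractive method can only return labels that already exist in its candidate pool.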
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | © 2020 The Authors. This is an author-produced version of a paper subsequently published in SIGIR '20: Proceedings. Uploaded in accordance with the publisher's self-archiving policy. |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 28 May 2020 08:44 |
Last Modified: | 27 Jul 2020 09:41 |
Status: | Published |
Publisher: | Association for Computing Machinery (ACM) |
Series Name: | SIGIR Proceedings |
Refereed: | Yes |
Identification Number: | 10.1145/3397271.3401185 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:161293 |