Zhang, Z. orcid.org/0000-0002-8587-8618, Gao, J. and Ciravegna, F. orcid.org/0000-0001-5817-4810 (2018) SemRe-Rank: Improving Automatic Term Extraction by Incorporating Semantic Relatedness with Personalised PageRank. ACM Transactions on Knowledge Discovery from Data, 12 (5). 57. ISSN 1556-4681
Abstract
Automatic Term Extraction (ATE) deals with the extraction of terminology from a domain-specific corpus and has long been an established research area in data and knowledge acquisition. ATE remains a challenging task, as it is known that no existing ATE method consistently outperforms the others across all domains. This work adopts a fresh perspective on the problem: instead of searching for a ‘one-size-fits-all’ solution that may never exist, we propose to develop generic methods that ‘enhance’ existing ATE methods. We introduce SemRe-Rank, the first method based on this principle, which incorporates semantic relatedness (an often overlooked avenue) into an existing ATE method to further improve its performance. SemRe-Rank incorporates word embeddings into a personalised PageRank process to compute ‘semantic importance’ scores for candidate terms from a graph of semantically related words (nodes); these scores are then used to revise the scores of candidate terms computed by a base ATE algorithm. Extensively evaluated with 13 state-of-the-art base ATE methods on four datasets of diverse nature, SemRe-Rank is shown to achieve widespread improvement over all base methods and across all datasets: up to 15 percentage points when measured by Precision over the top K ranked candidate terms (averaged over a set of K values), or up to 28 percentage points in F1 measured at a K equal to the expected number of real terms among the candidates (F1 in short). Compared to an alternative approach built on the well-known TextRank algorithm, SemRe-Rank can potentially outperform it by up to 8 points in Precision at top K, or up to 17 points in F1.
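The scoring idea summarised above (a graph of semantically related words, personalised PageRank over it, and a revision of base ATE scores) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the cosine-similarity `threshold` used to link words, the `seeds` that receive all teleportation mass, and the `revise_scores` rule (scaling a base score by one plus the mean normalised importance of the term's words) are hypothetical choices made only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_graph(embeddings, threshold=0.5):
    """Undirected word graph: two words are linked if their embedding
    similarity exceeds `threshold` (a hypothetical linking rule)."""
    words = list(embeddings)
    graph = {w: set() for w in words}
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) > threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def personalised_pagerank(graph, seeds, damping=0.85, iters=100, tol=1e-10):
    """Power iteration where teleportation jumps only to seed words
    (assumed personalisation); falls back to uniform if no seed is present."""
    nodes = list(graph)
    present = [w for w in nodes if w in seeds] or nodes
    pers = {w: (1.0 / len(present) if w in present else 0.0) for w in nodes}
    rank = dict(pers)
    for _ in range(iters):
        new = {w: (1 - damping) * pers[w] for w in nodes}
        for w in nodes:
            if graph[w]:
                share = damping * rank[w] / len(graph[w])
                for nb in graph[w]:
                    new[nb] += share
            else:
                # Dangling word: send its mass back via the personalisation vector.
                for v in nodes:
                    new[v] += damping * rank[w] * pers[v]
        delta = sum(abs(new[w] - rank[w]) for w in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def revise_scores(base_scores, importance):
    """Hypothetical revision rule: base score * (1 + mean normalised
    importance of the words in the candidate term)."""
    top = max(importance.values()) or 1.0
    revised = {}
    for term, score in base_scores.items():
        words = term.split()
        mean_imp = sum(importance.get(w, 0.0) / top for w in words) / len(words)
        revised[term] = score * (1 + mean_imp)
    return revised


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real systems would use trained vectors.
    embeddings = {
        "neural": [0.9, 0.1, 0.0],
        "network": [0.8, 0.2, 0.1],
        "learning": [0.85, 0.15, 0.05],
        "banana": [0.0, 0.1, 0.95],
    }
    graph = build_graph(embeddings, threshold=0.9)
    importance = personalised_pagerank(graph, seeds={"neural"})
    print(revise_scores({"neural network": 2.0, "banana": 2.0}, importance))
```

With equal base scores, the candidate whose words sit close to the seed in the word graph ("neural network") is promoted, while the unrelated, isolated word ("banana") keeps its base score. Restricting teleportation to seed words is what makes the PageRank "personalised": importance flows outward from the seeds along semantic-similarity edges.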
Metadata
Field | Value |
---|---|
Item Type: | Article |
Authors/Creators: | Zhang, Z.; Gao, J.; Ciravegna, F. |
Copyright, Publisher and Additional Information: | © 2018 ACM. This is an author produced version of a paper subsequently published in ACM Transactions on Knowledge Discovery from Data. Uploaded in accordance with the publisher's self-archiving policy. |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield); The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Funding Information: | Funder: European Commission, Horizon 2020; grant number 688082 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 07 Sep 2018 09:14 |
Last Modified: | 27 Jan 2020 09:49 |
Published Version: | https://doi.org/10.1145/3201408 |
Status: | Published |
Publisher: | Association for Computing Machinery |
Identification Number: | 10.1145/3201408 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:135437 |