Yuan, Y., Gao, J. orcid.org/0000-0002-3610-8748 and Zhang, Y. (2018) Supervised Learning for Robust Term Extraction. In: Asian Language Processing (IALP), 2017 International Conference on. International Conference on Asian Language Processing (IALP 2017), 05-07 Dec 2017, Singapore. IEEE
Abstract
We propose a machine learning method that automatically classifies n-grams extracted from a corpus into terms and non-terms. As training features we use 10 statistics that are common in the term extraction literature. The method is applicable to term recognition in multiple domains and languages, and it can help 1) avoid laborious post-processing (e.g. subjective threshold setting); 2) handle skewed training data while showing noticeable resilience to domain shift. Experiments are carried out on 6 corpora spanning multiple domains and languages: the GENIA and ACLRD-TEC(1.0) corpora serve as the training set, and four TTC subcorpora on wind energy and mobile technology, in both Chinese and English, serve as the test set. The results are promising and indicate that the approach can identify both single-word terms and multiword terms with reasonably good precision and recall.
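The pipeline summarised in the abstract (extract n-grams, describe each with statistical features, train a binary term/non-term classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the 10 statistics or the classifier, so the three features here (n-gram length, log frequency, document-frequency ratio) and the perceptron are stand-ins, and every function name is hypothetical.

```python
from collections import Counter
import math

def extract_ngrams(tokens, max_n=3):
    """Return all contiguous n-grams (as tuples) of length 1..max_n."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def ngram_features(docs, max_n=3):
    """Map each n-gram to [length, log(1 + frequency), document-frequency ratio].

    These three features are illustrative assumptions, not the paper's 10 statistics.
    """
    tf, df = Counter(), Counter()
    for tokens in docs:
        grams = extract_ngrams(tokens, max_n)
        tf.update(grams)        # total occurrences across the corpus
        df.update(set(grams))   # number of documents containing the n-gram
    return {g: [len(g), math.log(1 + tf[g]), df[g] / len(docs)] for g in tf}

def predict(w, x):
    """Return 1 (term) if the linear score is positive, else 0 (non-term)."""
    score = sum(wj * v for wj, v in zip(w, x + [1.0]))  # trailing 1.0 is the bias input
    return 1 if score > 0 else 0

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Train a perceptron (an assumed stand-in classifier); y holds 1 or 0."""
    w = [0.0] * (len(X[0]) + 1)  # last weight is the bias
    for _ in range(epochs):
        for x, label in zip(X, y):
            error = label - predict(w, x)
            for j, v in enumerate(x + [1.0]):
                w[j] += lr * error * v
    return w
```

In use, the feature vectors from `ngram_features` for manually labelled n-grams would be passed to `train_perceptron`, and `predict` would then label unseen candidates, which replaces a hand-tuned score threshold with a learned decision boundary.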
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Yuan, Y.; Gao, J. (orcid.org/0000-0002-3610-8748); Zhang, Y. |
Copyright, Publisher and Additional Information: | © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Reproduced in accordance with the publisher's self-archiving policy. |
Keywords: | term extraction; supervised learning; classification; n-gram |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 30 Oct 2017 09:48 |
Last Modified: | 19 Dec 2022 13:37 |
Published Version: | https://doi.org/10.1109/IALP.2017.8300603 |
Status: | Published |
Publisher: | IEEE |
Refereed: | Yes |
Identification Number: | 10.1109/IALP.2017.8300603 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:123210 |