Green, T.A.F. orcid.org/0000-0002-5643-2473, Maynard, D. orcid.org/0000-0002-1773-7020 and Lin, C. orcid.org/0000-0003-3454-2468 (2022) Development of a benchmark corpus to support entity recognition in job descriptions. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference. Thirteenth Language Resources and Evaluation Conference, 20-25 Jun 2022, Marseille, France. European Language Resources Association, pp. 1201-1208. ISBN 9791095546726
Abstract
We present the development of a benchmark suite for Entity Recognition (ER) in job descriptions, consisting of an annotation schema, a training corpus and a baseline model, published under a Creative Commons license. The suite was created to address the distinct lack of resources available to the community for extracting salient entities, such as skills, from job descriptions. The dataset contains 18.6k entities of five types (Skill, Qualification, Experience, Occupation and Domain). We include a baseline ER model based on Conditional Random Fields (CRF), which achieves an F1 score of 0.59. By establishing a standard definition of entities and a training/testing corpus, the suite is designed as a foundation for future work on tasks such as the development of job recommender systems.
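The abstract summarises baseline performance as a single F1 score. For entity recognition, F1 is commonly computed at the entity level: a predicted entity counts as correct only when both its token span and its type match a gold annotation. The sketch below illustrates that calculation over the five entity types. It assumes BIO-encoded token tags (e.g. `B-Skill`, `I-Skill`, `O`), and the helper names `bio_to_spans` and `entity_f1` are hypothetical. The exact encoding and matching criteria behind the reported 0.59 are not stated here, so this is an illustration rather than the paper's evaluation code.

```python
from typing import List, Set, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed encoding)."""
    spans: Set[Span] = set()
    start, ent_type = None, None
    # A sentinel "O" at the end flushes an entity that runs to the last token.
    for i, tag in enumerate(tags + ["O"]):
        prefix, _, label = tag.partition("-")
        # Close the open entity unless this tag continues it with the same type.
        if start is not None and not (prefix == "I" and label == ent_type):
            spans.add((start, i, ent_type))
            start, ent_type = None, None
        # A B- tag starts an entity; a stray I- tag is also treated as a start.
        if prefix == "B" or (prefix == "I" and start is None):
            start, ent_type = i, label
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 with exact span-and-type matching."""
    tp = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)  # entities matching in both span and type
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical one-sentence example: the Skill is found, but the
    # Qualification is mislabelled as an Occupation.
    gold = [["B-Skill", "I-Skill", "O", "B-Qualification"]]
    pred = [["B-Skill", "I-Skill", "O", "B-Occupation"]]
    print(entity_f1(gold, pred))
```

In this example one of the two gold entities is recovered with the correct type, so precision, recall and F1 are each 0.5. Under exact matching, a correct span with the wrong type earns no credit. Partial-match or type-agnostic schemes would score the same prediction higher.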
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Green, T.A.F., Maynard, D. and Lin, C. |
Copyright, Publisher and Additional Information: | © European Language Resources Association (ELRA), licensed under CC-BY-NC-4.0 (http://creativecommons.org/licenses/by-nc/4.0/). |
Keywords: | entity recognition; corpus development; job descriptions; natural language processing |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 25 Jan 2024 17:24 |
Last Modified: | 25 Jan 2024 17:33 |
Published Version: | https://aclanthology.org/2022.lrec-1.128 |
Status: | Published |
Publisher: | European Language Resources Association |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:208054 |