Zhang, Z., Gao, J. and Ciravegna, F. (2016) JATE 2.0: Java Automatic Term Extraction with Apache Solr. In: Calzolari, N., Choukri, K., Declerck, T., Goggi, S., Grobelnik, M., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J. and Piperidis, S. (eds.) Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). LREC 2016, Tenth International Conference on Language Resources and Evaluation, 23-28 May 2016, Portorož, Slovenia. European Language Resources Association (ELRA). ISBN 9782951740891
Abstract
Automatic Term Extraction (ATE) or Recognition (ATR) is a fundamental processing step preceding many complex knowledge engineering tasks. However, few methods have been implemented as public tools, and fewer still are available as open-source freeware. Further, little effort has been made to develop an adaptable and scalable framework that enables customization, development, and comparison of algorithms in a uniform environment. This paper introduces JATE 2.0, a complete remake of the free Java Automatic Term Extraction Toolkit (Zhang et al., 2008), delivering new features including: (1) highly modular, adaptable and scalable ATE thanks to integration with Apache Solr, the open-source free-text indexing and search platform; (2) an extended collection of state-of-the-art algorithms. We carry out experiments on two well-known benchmarking datasets and compare the algorithms along the dimensions of effectiveness (precision) and efficiency (speed and memory consumption). To the best of our knowledge, this is the only free ATE library offering a flexible architecture and the most comprehensive collection of algorithms.
Metadata
Item Type: | Proceedings Paper |
---|---|
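Authors/Creators: | Zhang, Z., Gao, J. and Ciravegna, F. |
Editors: | Calzolari, N., Choukri, K., Declerck, T., Goggi, S., Grobelnik, M., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J. and Piperidis, S. |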
Copyright, Publisher and Additional Information: | © 2016 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial Licence (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. You may not use the material for commercial purposes. |
Keywords: | term extraction; term recognition; NLP; text mining; Solr; search; indexing |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Innovate UK (TSB), grant number 101947 / 41205-293373; European Commission - FP6/FP7, grant number WESENSEIT - 308429 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 02 Jun 2016 09:57 |
Last Modified: | 17 Jun 2020 11:25 |
Published Version: | http://www.lrec-conf.org/proceedings/lrec2016/summ... |
Status: | Published |
Publisher: | European Language Resources Association (ELRA) |
Refereed: | Yes |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:96573 |