JATE2.0 : Java Automatic Term Extraction with Apache Solr

Zhang, Z., Gao, J. and Ciravegna, F. (2016) JATE2.0 : Java Automatic Term Extraction with Apache Solr. In: Calzolari, N., Choukri, K., Declerck, T., Goggi, S., Grobelnik, M., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J. and Piperidis, S. (eds.) Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). LREC 2016, Tenth International Conference on Language Resources and Evaluation, 23-28 May 2016, Portorož, Slovenia. European Language Resources Association (ELRA). ISBN 9782951740891

Abstract

Automatic Term Extraction (ATE), also known as Automatic Term Recognition (ATR), is a fundamental processing step preceding many complex knowledge engineering tasks. However, few methods have been implemented as public tools, and fewer still are available as open-source freeware. Further, little effort has been made to develop an adaptable and scalable framework that enables the customization, development, and comparison of algorithms in a uniform environment. This paper introduces JATE 2.0, a complete remake of the free Java Automatic Term Extraction Toolkit (Zhang et al., 2008) that delivers new features including: (1) highly modular, adaptable and scalable ATE, thanks to integration with Apache Solr, the open-source free-text indexing and search platform; and (2) an extended collection of state-of-the-art algorithms. We carry out experiments on two well-known benchmarking datasets and compare the algorithms along the dimensions of effectiveness (precision) and efficiency (speed and memory consumption). To the best of our knowledge, this is the only free ATE library that offers both a flexible architecture and the most comprehensive collection of algorithms.

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Zhang, Z.; Gao, J.; Ciravegna, F.
Editors:	Calzolari, N.; Choukri, K.; Declerck, T.; Goggi, S.; Grobelnik, M.; Maegaard, B.; Mariani, J.; Mazo, H.; Moreno, A.; Odijk, J.; Piperidis, S.
Copyright, Publisher and Additional Information:	© 2016 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial Licence (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. You may not use the material for commercial purposes.
Keywords:	term extraction; term recognition; NLP; text mining; Solr; search; indexing
Dates:	Accepted: 1 February 2016; Published: 23 May 2016
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Funding Information:	Innovate UK (TSB), grant number 101947 / 41205-293373; European Commission FP6/FP7, WESENSEIT, grant number 308429
Depositing User:	Symplectic Sheffield
Date Deposited:	02 Jun 2016 09:57
Last Modified:	17 Jun 2020 11:25
Published Version:	http://www.lrec-conf.org/proceedings/lrec2016/summ...
Status:	Published
Publisher:	European Language Resources Association (ELRA)
Refereed:	Yes
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:96573

Download

Published Version

Filename: 211_Paper.pdf

Licence: CC-BY-NC 4.0
