Tablan, V, Roberts, I, Cunningham, H et al. (1 more author) (2012) GATECloud.net: a Platform for Large-Scale, Open-Source Text Processing on the Cloud. Philosophical Transactions: Mathematical, Physical and Engineering Sciences, 371. ISSN 1471-2962
Abstract
Cloud computing is increasingly being regarded as a key enabler of the ‘democratization of science’, because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research: GATECloud.net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. Important infrastructural issues are dealt with by the platform, completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost–benefit analysis and usage evaluation.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Tablan, V; Roberts, I; Cunningham, H; et al. (1 more author) |
Copyright, Publisher and Additional Information: | © 2012 Tablan et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
Keywords: | text mining, cloud computing, big data |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | EPSRC (grant number EP/I034092/1); EPSRC (grant number EP/I004327/1) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 22 Aug 2013 10:23 |
Last Modified: | 22 Aug 2013 10:23 |
Status: | Published |
Publisher: | Royal Society, The |
Identification Number: | 10.1098/rsta.2012.0071 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:76298 |