Jaiswal, AK orcid.org/0000-0001-8848-7041, Tiwari, P, Garg, S et al. (1 more author) (2021) Entity-aware capsule network for multi-class classification of big data: A deep learning approach. Future Generation Computer Systems, 117. pp. 1-11. ISSN 0167-739X
Abstract
Named entity recognition (NER) is one of the most challenging natural language processing (NLP) tasks, because its performance depends both on constantly evolving languages and on expert (human) annotation. The diverse and dynamic content on the web significantly raises the need for a more generalized approach, one that can correctly classify terms in a corpus and feed subsequent NLP tasks such as machine translation, query expansion, and many other applications. Although NER has been extensively researched in recent years, the variety of public corpora now available leaves room for new and more accurate methods to tackle the problem. This paper presents a novel method that uses deep learning techniques based on the capsule network architecture to predict entities in a corpus. Unlike convolutional neural networks with their 'max-pooling' strategy, this type of network groups neurons into so-called capsules that detect specific features of an object without reducing the original input. Our extensive evaluation on several benchmark datasets demonstrates that our method is competitive with state-of-the-art techniques and that the proposed architecture may offer a significant benefit to further NLP tasks, especially in cases where experts are needed. We also explore NER within a theoretical framework that leverages big data for security. For the sake of reproducibility, we make the codebase open source.
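The contrast drawn above between capsules and max-pooling can be made concrete with the routing-by-agreement scheme of Sabour, Frosst and Hinton (2017), on which capsule networks are generally built. The sketch below is an illustration under stated assumptions, not the authors' released code: the function names `squash`, `softmax` and `dynamic_routing`, the default of three routing iterations, and the toy prediction vectors are all hypothetical choices. Instead of keeping only the maximum activation, every lower capsule sends a full prediction vector to every upper capsule, and the coupling weights are refined according to how well those predictions agree.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Capsule non-linearity: keeps the direction of s, maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """Routing by agreement.

    u_hat[i][j] is the prediction vector that lower capsule i makes for
    upper capsule j. Returns one output vector per upper capsule.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    # Routing logits b[i][j], initially zero (uniform coupling).
    b = [[0.0] * n_out for _ in range(n_in)]
    v: List[Vector] = []
    for _ in range(iterations):
        # Coupling coefficients: each lower capsule distributes its output over upper capsules.
        c = [softmax(b_i) for b_i in b]
        v = []
        for j in range(n_out):
            # Weighted sum of predictions, then squash (no pooling discards the vectors).
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s_j))
        # Increase the logit where a prediction agrees (dot product) with the output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


if __name__ == "__main__":
    # Hypothetical toy input: two lower capsules, two upper capsules, 2-D vectors.
    toy = [[[2.0, 0.0], [0.1, 0.0]], [[2.0, 0.0], [0.0, 0.1]]]
    for j, vec in enumerate(dynamic_routing(toy)):
        print(j, math.hypot(*vec))
```

In the capsule literature the length of each output vector is read as the probability that the corresponding entity is present; in an NER setting one could, hypothetically, assign one output capsule per entity class and take the longest vector as the prediction. The exact capsule layers used in this paper may differ from this sketch.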
Metadata
Item Type: | Article |
---|---|
Keywords: | Natural language processing; Named entity recognition; Capsule network |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Mathematics (Leeds) > Statistics (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 26 Apr 2021 13:15 |
Last Modified: | 26 Apr 2021 13:15 |
Status: | Published |
Publisher: | Elsevier |
Identification Number: | 10.1016/j.future.2020.11.012 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:173382 |