Jiang, B, Li, Z, Chen, H et al. (1 more author) (2018) Latent Topic Text Representation Learning on Statistical Manifolds. IEEE Transactions on Neural Networks and Learning Systems, 29 (11). pp. 5643-5654. ISSN 2162-237X
Abstract
The explosive growth of text data requires effective methods to represent and classify these texts. Many text learning methods have been proposed, like statistics-based methods, semantic similarity methods, and deep learning methods. The statistics-based methods focus on comparing the substructure of text, which ignores the semantic similarity between different words. Semantic similarity methods learn a text representation by training word embedding and representing text as the average vector of all words. However, these methods cannot capture the topic diversity of words and texts clearly. Recently, deep learning methods such as CNNs and RNNs have been studied. However, the vanishing gradient problem and time complexity for parameter selection limit their applications. In this paper, we propose a novel and efficient text learning framework, named Latent Topic Text Representation Learning. Our method aims to provide an effective text representation and text measurement with latent topics. With the assumption that words on the same topic follow a Gaussian distribution, texts are represented as a mixture of topics, i.e., a Gaussian mixture model. Our framework is able to effectively measure text distance to perform text categorization tasks by leveraging statistical manifolds. Experimental results on text representation and classification, and topic coherence demonstrate the effectiveness of the proposed method.
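As a rough illustration of the idea in the abstract (topics modelled as Gaussians over word vectors and compared as distributions), the sketch below fits a diagonal Gaussian to a handful of toy word vectors per topic and compares two topics with the symmetric Kullback-Leibler divergence. This is not the paper's algorithm: the diagonal-covariance simplification, the choice of symmetric KL as the divergence, the function names, and the 2-D toy vectors are all assumptions made here for illustration, and the mixture-level text distance used in the paper is not reproduced.

```python
import math

def fit_diag_gaussian(vectors, eps=1e-6):
    """Estimate a per-dimension mean and variance from equal-length vectors.

    eps is a small variance floor (an assumption) to avoid division by zero.
    """
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [sum((v[d] - mean[d]) ** 2 for v in vectors) / n + eps
           for d in range(dim)]
    return mean, var

def kl_diag(p, q):
    """Closed-form KL(p || q) for diagonal Gaussians given as (mean, var)."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

def symmetric_kl(p, q):
    """Symmetrised KL divergence, used here as a simple topic dissimilarity."""
    return kl_diag(p, q) + kl_diag(q, p)

# Hypothetical 2-D "word embeddings" for two topics (toy data, not from the paper).
topic_a_words = [[1.0, 0.0], [0.8, 0.2], [1.2, -0.1]]
topic_b_words = [[-1.0, 1.0], [-0.8, 1.2], [-1.1, 0.9]]

topic_a = fit_diag_gaussian(topic_a_words)
topic_b = fit_diag_gaussian(topic_b_words)

print("d(a, a) =", symmetric_kl(topic_a, topic_a))
print("d(a, b) =", symmetric_kl(topic_a, topic_b))
```

A topic compared with itself yields a divergence of zero, while well-separated topics yield a large positive value. Extending this to whole texts would require a distance between mixtures, which has no closed form under KL and is left out of this sketch.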
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | |
Copyright, Publisher and Additional Information: | © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Funding Information: | Funder: Royal Society; Grant number: IE121685 |
Depositing User: | Symplectic Publications |
Date Deposited: | 05 Apr 2018 10:08 |
Last Modified: | 30 Jan 2019 16:01 |
Status: | Published |
Publisher: | IEEE |
Identification Number: | 10.1109/TNNLS.2018.2808332 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:129178 |