Hu, X and Atwell, ES (2003) A survey of machine learning approaches to analysis of large corpora. In: Simov, K and Osenova, P, (eds.) Proceedings of SProLaC: Workshop on Shallow Processing of Large Corpora. SProLaC: Workshop on Shallow Processing of Large Corpora, 28-31 Mar 2003, Lancaster University, UK. UCREL, Lancaster University, 45-52.
Abstract
Corpus-based Machine Learning of linguistic annotations has been a key topic in all areas of Natural Language Processing. This paper presents a survey along three dimensions of classification. First, we outline the different linguistic levels of analysis: Tokenisation, Part-of-Speech tagging, Parsing, Semantic analysis and Discourse annotation. Secondly, we introduce alternative approaches to Machine Learning applicable to linguistic annotation of corpora: N-gram and Markov models, Neural Networks, Transformation-Based Learning, Decision Tree learning, and Vector-based classification. Thirdly, we examine a range of Machine Learning systems for the most challenging level of linguistic annotation, discourse analysis; these illustrate the various Machine Learning approaches. Our overall aim is to provide an ontology or framework for further development of our research.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Hu, X and Atwell, ES |
Editors: | Simov, K and Osenova, P |
Copyright, Publisher and Additional Information: | (c) 2003, UCREL. Reproduced in accordance with the publisher's self-archiving policy. |
Keywords: | Machine Learning; corpus; annotation; tagging; linguistic analysis; dialogue |
Dates: |
|
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 19 Jan 2015 11:14 |
Last Modified: | 19 Dec 2022 13:30 |
Published Version: | http://ucrel.lancs.ac.uk/cl2003/ |
Status: | Published |
Publisher: | UCREL, Lancaster University |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:82305 |