Souter, DC and Atwell, ES (1992) A richly annotated corpus for probabilistic parsing. In: Weir, C and Grishman, R, (eds.) AAAI Technical Report WS-92-01. The Tenth National Conference on Artificial Intelligence: Workshop on Statistically-Based NLP Techniques, 12-16 Jul 1992, San Jose, California, USA. AAAI, 22-32.
Abstract
This paper describes the use of a small but syntactically rich parsed corpus of English in probabilistic parsing. Software has been developed to extract probabilistic systemic-functional grammars (SFGs) from the Polytechnic of Wales Corpus in several formalisms, which could equally well be applied to other parsed corpora. To complement the large probabilistic grammar, we discuss progress in the provision of lexical resources, which range from corpus wordlists to a large lexical database supplemented with word frequencies and SFG categories. The lexicon and grammar resources may be used in a variety of probabilistic parsing programs, one of which is presented in some detail: The Realistic Annealing Parser. Compared to traditional rule-based methods, such parsers usually implement complex algorithms, and are relatively slow, but are more robust in providing analyses to unrestricted and even semi-grammatical English.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Souter, DC; Atwell, ES |
Editors: | Weir, C; Grishman, R |
Keywords: | Natural language processing (Computer science) |
Dates: | 1992 |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 18 Dec 2014 12:27 |
Last Modified: | 19 Dec 2022 13:29 |
Published Version: | http://www.aaai.org/Papers/Workshops/1992/WS-92-01... |
Status: | Published |
Publisher: | AAAI |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:82291 |