Krotov, A., Hepple, M., Gaizauskas, R. et al. (1 more author) (1999) Evaluating two methods for Treebank grammar compaction. Natural Language Engineering, 5 (4). pp. 377-394. ISSN 1351-3249
Abstract
Treebanks, such as the Penn Treebank, provide a basis for the automatic creation of broad-coverage grammars. In the simplest case, rules can be ‘read off’ the parse annotations of the corpus, producing either a simple or a probabilistic context-free grammar. Such grammars, however, can be very large, which makes parsing with them computationally expensive.
In this paper, we explore ways in which a treebank grammar can be reduced in size, or ‘compacted’, using two kinds of technique: (i) thresholding of rules by their number of occurrences; and (ii) a method of rule-parsing, which has both probabilistic and non-probabilistic variants. Our results show that, by combining these two techniques, a probabilistic context-free grammar can be reduced in size by 62% without any loss in parsing performance, and by 71% with a gain in recall but some loss in precision.
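As a rough illustration of the first technique only, the sketch below reads context-free rules off a handful of bracketed trees and then drops rules that occur fewer than a chosen number of times. It is not the authors' implementation: the function names (`parse_tree`, `read_off_rules`, `grammar_counts`, `threshold`), the toy trees, the choice to stop at part-of-speech tags, and the "keep rules seen at least `min_count` times" cutoff are all assumptions made for the example. The rule-parsing method is not shown.

```python
from collections import Counter

# Toy bracketed trees in Penn Treebank style (invented for illustration).
TOY_TREES = [
    "(S (NP (DT the) (NN dog)) (VP (VBD barked)))",
    "(S (NP (DT a) (NN cat)) (VP (VBD slept)))",
    "(S (NP (DT the) (JJ old) (NN dog)) (VP (VBD slept)))",
]

def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples.
    A leaf word is kept as a plain string."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())   # nested constituent
            else:
                children.append(tokens[pos])  # word
                pos += 1
        pos += 1                          # skip ")"
        return (label, children)

    return node()

def read_off_rules(tree, counts):
    """Count one rule (parent label -> child labels) per phrasal node.
    Assumption: part-of-speech tags act as terminals, so nodes that
    dominate only a word, such as (DT the), yield no rule."""
    label, children = tree
    subtrees = [c for c in children if isinstance(c, tuple)]
    if subtrees:
        counts[(label, tuple(c[0] for c in subtrees))] += 1
        for child in subtrees:
            read_off_rules(child, counts)

def grammar_counts(trees):
    """Return a Counter mapping each rule to its number of occurrences."""
    counts = Counter()
    for text in trees:
        read_off_rules(parse_tree(text), counts)
    return counts

def threshold(counts, min_count):
    """Keep only rules that occur at least min_count times."""
    return {rule: n for rule, n in counts.items() if n >= min_count}

if __name__ == "__main__":
    counts = grammar_counts(TOY_TREES)
    for (lhs, rhs), n in sorted(threshold(counts, 2).items()):
        print(f"{lhs} -> {' '.join(rhs)}  ({n})")
```

On the toy trees, four distinct rules are read off, and a cutoff of 2 removes the single-occurrence rule `NP -> DT JJ NN`. Dividing each surviving count by the total for its left-hand side would give relative-frequency estimates of the kind used for a probabilistic grammar.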
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Krotov, A., Hepple, M., Gaizauskas, R. et al. (1 more author) |
Copyright, Publisher and Additional Information: | © 1999 Cambridge University Press. Reproduced in accordance with the publisher's self-archiving policy. |
Dates: | 1999 (published) |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Repository Assistant |
Date Deposited: | 03 Oct 2006 |
Last Modified: | 05 Jun 2014 12:37 |
Published Version: | http://dx.doi.org/10.1017/S1351324900002308 |
Status: | Published |
Publisher: | Cambridge University Press |
Refereed: | Yes |
Identification Number: | 10.1017/S1351324900002308 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:1631 |