Evaluating two methods for Treebank grammar compaction

Krotov, A., Hepple, M., Gaizauskas, R. and Wilks, Y. (1999) Evaluating two methods for Treebank grammar compaction. Natural Language Engineering, 5 (4). pp. 377-394. ISSN 1351-3249




Treebanks, such as the Penn Treebank, provide a basis for the automatic creation of broad-coverage grammars. In the simplest case, rules can be ‘read off’ the parse annotations of the corpus, producing either a simple or a probabilistic context-free grammar. Such grammars, however, can be very large, which raises the computational cost of parsing with them.
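
As a rough illustration of how rules might be ‘read off’ a treebank, the sketch below counts the productions in two invented bracketed trees and turns the counts into relative-frequency rule probabilities. The function names (`parse_bracketed`, `read_off_rules`, `to_pcfg`), the toy trees, and the decision to skip lexical (tag-to-word) productions are all assumptions made for this example; none of them comes from the paper.

```python
from collections import Counter


def parse_bracketed(text):
    """Parse a Penn-style bracketed string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def read_off_rules(tree, counts):
    """Add one count per phrasal production (LHS -> daughter labels) in the tree."""
    label, children = tree
    daughters = [c for c in children if not isinstance(c, str)]
    if not daughters:
        return  # assumption: tag -> word productions are left to the lexicon
    counts[(label, tuple(d[0] for d in daughters))] += 1
    for d in daughters:
        read_off_rules(d, counts)


def to_pcfg(counts):
    """Relative-frequency estimate: count(A -> rhs) / count(A -> anything)."""
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}


# Two invented trees standing in for a treebank.
treebank = [
    "(S (NP (DT the) (NN dog)) (VP (VBD barked)))",
    "(S (NP (NNP Kim)) (VP (VBD saw) (NP (DT a) (NN cat))))",
]

counts = Counter()
for line in treebank:
    read_off_rules(parse_bracketed(line), counts)

pcfg = to_pcfg(counts)
```

Dropping the probabilities and keeping only the keys of `counts` gives the simple (non-probabilistic) grammar; keeping them gives the probabilistic one.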

In this paper, we explore ways in which a treebank grammar can be reduced in size, or ‘compacted’, using two kinds of technique: (i) thresholding of rules by their number of occurrences; and (ii) a method of rule-parsing, which has both probabilistic and non-probabilistic variants. Our results show that, by combining these two techniques, a probabilistic context-free grammar can be reduced in size by 62% without any loss in parsing performance, and by 71% with a gain in recall but some loss in precision.
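
The two techniques can be sketched on a toy grammar as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the procedure, so the sketch assumes that rule-parsing means dropping a rule whenever its right-hand side can already be parsed into its left-hand side by the remaining rules, and it shows only a non-probabilistic variant. The function names, the rule counts, the rarest-first removal order, and the threshold of 2 are all hypothetical.

```python
def recognises(rules, symbols, goal):
    """True if `symbols` reduces to `goal` under `rules`.

    Assumes every rule has a non-empty right-hand side and `symbols` is non-empty.
    """
    n = len(symbols)
    chart = {(i, i + 1): {symbols[i]} for i in range(n)}

    def matches(rhs, i, j):
        # Can rhs be laid out over span (i, j), each symbol covering >= 1 position?
        if len(rhs) == 1:
            return rhs[0] in chart.get((i, j), ())
        return any(
            rhs[0] in chart.get((i, m), ()) and matches(rhs[1:], m, j)
            for m in range(i + 1, j - len(rhs) + 2)
        )

    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart.setdefault((i, j), set())
            # Rules with two or more daughters only consult strictly shorter spans.
            for lhs, rhs in rules:
                if 2 <= len(rhs) <= length and matches(rhs, i, j):
                    cell.add(lhs)
            # Unary rules are closed to a fixpoint within the cell.
            changed = True
            while changed:
                changed = False
                for lhs, rhs in rules:
                    if len(rhs) == 1 and rhs[0] in cell and lhs not in cell:
                        cell.add(lhs)
                        changed = True
    return goal in chart[(0, n)]


def threshold(counts, min_count):
    """Technique (i): keep only rules seen at least `min_count` times."""
    return {rule: n for rule, n in counts.items() if n >= min_count}


def compact(counts):
    """Technique (ii), non-probabilistic sketch: drop rules the rest can re-derive."""
    kept = dict(counts)
    for rule in sorted(counts, key=counts.get):  # assumption: try rarest rules first
        lhs, rhs = rule
        others = [r for r in kept if r != rule]
        if recognises(others, rhs, lhs):
            del kept[rule]
    return kept


# Invented rule counts, for illustration only.
rule_counts = {
    ("NP", ("DT", "NN")): 50,
    ("PP", ("IN", "NP")): 30,
    ("NP", ("NP", "PP")): 20,
    ("NP", ("DT", "NN", "PP")): 5,   # derivable from NP -> DT NN and NP -> NP PP
    ("NP", ("DT", "JJ", "NN")): 1,   # falls below the threshold
}

thresholded = threshold(rule_counts, min_count=2)
compacted = compact(thresholded)
```

In this toy case thresholding removes the single-occurrence rule and rule-parsing then removes `NP -> DT NN PP`, leaving three rules. The reductions of 62% and 71% reported above come from the paper's experiments and are not something this sketch reproduces.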

Item Type: Article
Copyright, Publisher and Additional Information: © 1999 Cambridge University Press. Reproduced in accordance with the publisher's self-archiving policy.
Institution: The University of Sheffield
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Depositing User: Repository Assistant
Date Deposited: 03 Oct 2006
Last Modified: 05 Jun 2014 12:37
Published Version: http://dx.doi.org/10.1017/S1351324900002308
Status: Published
Publisher: Cambridge University Press
Refereed: Yes
Identification Number: 10.1017/S1351324900002308
URI: http://eprints.whiterose.ac.uk/id/eprint/1631
