Gaizauskas, R., Demetriou, G., Artymiuk, P.J. et al. (1 more author) (2003) Protein structures and information extraction from biological texts: The PASTA system. Bioinformatics, 19 (1). 135 - 143. ISSN 1367-4803
Abstract
Motivation: The rapid increase in the volume of protein structure literature means useful information may be hidden or lost in the published literature, and the process of finding relevant material, sometimes the rate-determining factor in new research, may be arduous and slow.

Results: We describe the Protein Active Site Template Acquisition (PASTA) system, which addresses these problems by performing automatic extraction of information relating to the roles of specific amino acid residues in protein molecules from online scientific articles and abstracts. Both the terminology recognition and extraction capabilities of the system have been extensively evaluated against manually annotated data, and the results compare favourably with state-of-the-art results obtained in less challenging domains. PASTA is the first information extraction (IE) system developed for the protein structure domain and one of the most thoroughly evaluated IE systems operating on biological scientific text to date.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Gaizauskas, R., Demetriou, G., Artymiuk, P.J. et al. (1 more author) |
Dates: | 2003 (published) |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > School of Biosciences (Sheffield) > Department of Molecular Biology and Biotechnology (Sheffield); The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 10 Feb 2014 15:17 |
Last Modified: | 15 Sep 2014 02:05 |
Identification Number: | 10.1093/bioinformatics/19.1.135 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:77600 |
