Protein structures and information extraction from biological texts: The PASTA system

Abstract

Motivation: The rapid increase in the volume of protein structure literature means that useful information may be hidden or lost in the published literature, and that finding relevant material, sometimes the rate-determining factor in new research, may be arduous and slow.

Results: We describe the Protein Active Site Template Acquisition (PASTA) system, which addresses these problems by automatically extracting information about the roles of specific amino acid residues in protein molecules from online scientific articles and abstracts. Both the terminology recognition and the extraction capabilities of the system have been extensively evaluated against manually annotated data, and the results compare favourably with state-of-the-art results obtained in less challenging domains. PASTA is the first information extraction (IE) system developed for the protein structure domain and one of the most thoroughly evaluated IE systems operating on biological scientific text to date.
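As a rough illustration of the two capabilities the abstract says were evaluated, terminology recognition and scoring against manually annotated data, the sketch below finds residue mentions such as "Ser195" with a regular expression and scores them against a hand-annotated gold list using precision, recall and F1. The pattern, the function names `find_residues` and `score`, and the choice of these metrics are assumptions for illustration only; this is not PASTA's actual terminology grammar or evaluation protocol.

```python
import re

# Illustrative assumption: a residue mention is a three-letter amino acid code,
# optionally followed by a hyphen or space, then a sequence number.
# This is a hypothetical stand-in, NOT the PASTA terminology grammar.
AMINO_ACIDS = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
               "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val")
RESIDUE_RE = re.compile(rf"\b({AMINO_ACIDS})[- ]?(\d+)\b")


def find_residues(text: str) -> list[tuple[str, int]]:
    """Return (three-letter code, sequence position) pairs found in text."""
    return [(m.group(1), int(m.group(2))) for m in RESIDUE_RE.finditer(text)]


def score(predicted, gold) -> tuple[float, float, float]:
    """Precision, recall and F1 of predicted mentions against gold annotations.

    Matching is exact on (code, position) pairs; duplicates are ignored.
    """
    pred, ref = set(predicted), set(gold)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = ("The catalytic triad of chymotrypsin comprises Ser195, "
                "His-57 and Asp 102.")
    found = find_residues(sentence)
    gold = [("Ser", 195), ("His", 57), ("Asp", 102)]  # hand annotation
    print(found)
    print(score(found, gold))
```

Run as a script, this prints the three pairs `[('Ser', 195), ('His', 57), ('Asp', 102)]` followed by `(1.0, 1.0, 1.0)`. Because matching is exact on (code, position) pairs, a looser matching scheme (for example, crediting a correct code with a wrong position) would yield different scores.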

Metadata

Item Type:	Article
Authors/Creators:	Gaizauskas, R.; Demetriou, G.; Artymiuk, P.J.; Willett, P.
Dates:	Published: 1 January 2003
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield); The University of Sheffield > Faculty of Science (Sheffield) > School of Biosciences (Sheffield) > Department of Molecular Biology and Biotechnology (Sheffield); The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
Depositing User:	Symplectic Sheffield
Date Deposited:	10 Feb 2014 15:17
Last Modified:	15 Sep 2014 02:05
Identification Number:	10.1093/bioinformatics/19.1.135
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:77600

Download

Published Version

Embargoed until: 2 February 2114

Filename: WRRO_77600.pdf
