Oxnard, L. and Evans, A. (2003) Methodologies for the Automatic Location of Academic and Educational Texts on the Internet. Working Paper. School of Geography, University of Leeds.
Traditionally, online databases of web resources have been compiled by a human editor, or through the submissions of authors or interested parties. Considerable resources are needed to maintain a constant level of input and relevance in the face of increasing material quantity and quality, and much of what is in databases is of an ephemeral nature. These pressures dictate that many databases stagnate after an initial period of enthusiastic data entry. The solution to this problem would seem to be the automatic harvesting of resources; however, this process necessitates the automatic classification of resources as ‘appropriate’ to a given database, a problem only solved by complex text content analysis. This paper outlines the component methodologies necessary to construct such an automated harvesting system, including a number of novel approaches. In particular, this paper looks at the specific problems of automatically identifying academic research work and Higher Education pedagogic materials. Where appropriate, experimental data is presented from searches in the field of Geography as well as the Earth and Environmental Sciences. In addition, appropriate software is reviewed where it exists, and future directions are outlined.
|Copyright, Publisher and Additional Information:||Copyright of the School of Geography, University of Leeds.|
|Institution:||The University of Leeds|
|Academic Units:||The University of Leeds > Faculty of Environment (Leeds) > School of Geography (Leeds) > Geography Working Papers (Leeds)|
|Depositing User:||Mr CIC Carson|
|Date Deposited:||22 Dec 2008 13:01|
|Last Modified:||12 Jun 2014 01:23|
|Publisher:||School of Geography|
|Identification Number:||School of Geography Working Paper 03/01|