Use of graph theory for data mining in public health

Bath, P.A., Craigs, C., Maheswaran, R., Raymond, J. and Willett, P. (2002) Use of graph theory for data mining in public health. In: Zanasi, A., Brebbia, C.A., Ebecken, N.F. and Melli, P. (eds.) Data Mining III: Proceedings of the Third International Conference on Data Mining. Third International Conference on Data Mining, 2002, Bologna, Italy. Southampton: WIT Press, pp. 819-828. ISBN 1-85312-925-9

Full text not available from this repository.

Abstract

Data mining problems are common in public health, for example identifying disease clusters and detecting multidimensional patterns within large databases, such as socioeconomic differentials in health. Although numerous data mining methods have been developed, currently available methods are not designed to handle complex pattern-searching queries, and no satisfactory method exists for this purpose.

The aim of the study reported here was to test graph-theoretical methods for data mining in public health databases, specifically to identify areas of high deprivation that are surrounded by affluent areas and deprived areas that are surrounded by other deprived areas. A graph-theoretical method, maximum common subgraph isomorphism (mcs), was used to search a database containing information on the 10920 enumeration districts (EDs) in the Trent Region of England. Each ED was allocated to a deprivation quintile based on its Townsend Deprivation Score. The mcs program was then used to identify deprived EDs that are adjacent to other deprived EDs and deprived EDs that are adjacent to affluent EDs. The mcs program identified 1528 deprived EDs adjacent to at least two deprived EDs, 1181 adjacent to at least three, 802 adjacent to at least four, and 505 adjacent to at least five. It also identified 147 deprived EDs adjacent to at least two affluent EDs, 54 adjacent to at least three, 14 adjacent to at least four, and six adjacent to at least five. The retrieved EDs were then used for hypothesis testing with statistical methods. The study demonstrates the potential of graph-theoretical techniques for data mining in public health databases.
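The adjacency query described in the abstract can be illustrated with a minimal sketch. This is not the mcs program used in the study: instead of computing a maximum common subgraph, it counts labelled neighbours directly, which answers the same question when the query pattern is a labelled star (one central ED with k neighbouring EDs of a given quintile). The quintile convention (5 = most deprived, 1 = most affluent), the function names and the 20-ED toy data are all hypothetical.

```python
from collections import defaultdict

# Assumed labels (hypothetical): quintile 5 = most deprived, quintile 1 = most affluent.
DEPRIVED, AFFLUENT = 5, 1


def assign_quintiles(scores):
    """Rank EDs by deprivation score and split them into five equal-sized bands.

    Assumes a higher score means greater deprivation, so the top fifth of
    EDs receives quintile 5.
    """
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {ed: 1 + (5 * i) // n for i, ed in enumerate(ranked)}


def build_adjacency(edges):
    """Turn an undirected list of (ed, ed) boundary pairs into neighbour sets."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def count_at_least(quintile, adj, centre, neighbour, thresholds):
    """For each k, count EDs labelled `centre` with >= k neighbours labelled `neighbour`.

    Equivalent to matching a labelled star pattern (one centre node and
    k leaf nodes) around every candidate ED.
    """
    matches = [
        sum(1 for nb in adj.get(ed, ()) if quintile.get(nb) == neighbour)
        for ed, q in quintile.items()
        if q == centre
    ]
    return {k: sum(1 for m in matches if m >= k) for k in thresholds}


# Hypothetical toy data: 20 EDs with made-up scores and boundaries.
scores = {ed: float(ed) for ed in range(20)}  # EDs 16-19 -> quintile 5, EDs 0-3 -> quintile 1
edges = [(19, 16), (19, 17), (19, 18), (16, 17),
         (16, 0), (16, 1), (16, 2), (18, 3), (18, 10)]

quintile = assign_quintiles(scores)
adj = build_adjacency(edges)
thresholds = range(2, 6)  # "at least two" ... "at least five", as in the abstract
deprived_by_deprived = count_at_least(quintile, adj, DEPRIVED, DEPRIVED, thresholds)
deprived_by_affluent = count_at_least(quintile, adj, DEPRIVED, AFFLUENT, thresholds)
print(deprived_by_deprived)  # {2: 3, 3: 1, 4: 0, 5: 0}
print(deprived_by_affluent)  # {2: 1, 3: 1, 4: 0, 5: 0}
```

Because each ED's neighbours are scanned once, this direct count runs in time linear in the number of ED boundaries; a general subgraph-matching method such as mcs would matter mainly for query patterns more complex than a star.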

Item Type: Proceedings Paper
Institution: The University of Sheffield
Academic Units: The University of Sheffield > Faculty of Medicine, Dentistry and Health (Sheffield) > School of Health and Related Research (Sheffield) > Section of Public Health (Sheffield)
The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
Depositing User: Information Studies
Date Deposited: 25 Mar 2009 12:28
Last Modified: 13 May 2009 14:55
Published Version: http://dx.doi.org/10.2495/DATA020791
Status: Published
Publisher: Southampton: WIT Press
Refereed: Yes
Identification Number: 10.2495/DATA020791
URI: http://eprints.whiterose.ac.uk/id/eprint/8376

Actions (repository staff only: login required)