Abushilah, SF, Taylor, CC orcid.org/0000-0003-0181-1094 and Gusnanto, A (2020) Geometry-based Distance for Clustering Amino Acids. Journal of Applied Statistics, 47 (7). pp. 1235-1250. ISSN 0266-4763
Abstract
Clustering amino acids is one of the most challenging problems in the functional and structural prediction of proteins. Previous studies have proposed clusters based on measurements of physical and biochemical characteristics of the amino acids, such as volume, area, hydrophilicity, polarity, hydrogen bonding, shape, and charge. These characteristics, although important, are less directly related to protein structure than geometrical characteristics such as the dihedral angles between amino acids. We propose using the p-value from a test of equality of dihedral-angle distributions as the basis of a distance measure for the clustering. In this novel approach, an energy test is modified to deal with bivariate angular data, and the p-value is obtained via a permutation method. The results indicate that the clusters of amino acids have a sensible interpretation, in which Glycine, Proline, and Asparagine each form a distinct cluster. A simulation study suggests that this approach has good working characteristics for clustering amino acids.
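The idea in the abstract can be illustrated with a minimal sketch of a two-sample permutation energy test on pairs of dihedral angles. This is not the authors' implementation. The torus distance used here, the function names, and the mapping from p-value to dissimilarity (`1 - p`) are all assumptions made for illustration; the paper's actual modification of the energy test and its distance definition may differ.

```python
import math
import random


def circ_diff(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def torus_dist(u, v):
    """Distance between two (phi, psi) pairs. Assumed metric, not the paper's."""
    return math.hypot(circ_diff(u[0], v[0]), circ_diff(u[1], v[1]))


def energy_stat(x, y):
    """Two-sample energy statistic computed with the torus distance above."""
    n, m = len(x), len(y)
    between = sum(torus_dist(a, b) for a in x for b in y) / (n * m)
    within_x = sum(torus_dist(a, b) for a in x for b in x) / (n * n)
    within_y = sum(torus_dist(a, b) for a in y for b in y) / (m * m)
    return n * m / (n + m) * (2 * between - within_x - within_y)


def permutation_pvalue(x, y, n_perm=199, seed=0):
    """Permutation p-value: share of relabelled statistics >= the observed one."""
    rng = random.Random(seed)
    observed = energy_stat(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        if energy_stat(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (exceed + 1) / (n_perm + 1)


def pvalue_to_distance(p):
    """Hypothetical mapping: a small p-value gives a large dissimilarity."""
    return 1.0 - p


# Usage with synthetic angles (illustrative data, not real amino acid angles).
rng = random.Random(42)
sample_a = [(-1.0 + 0.2 * rng.gauss(0, 1), 2.0 + 0.2 * rng.gauss(0, 1))
            for _ in range(20)]
sample_b = [(-1.2 + 0.2 * rng.gauss(0, 1), 2.3 + 0.2 * rng.gauss(0, 1))
            for _ in range(20)]
p = permutation_pvalue(sample_a, sample_b)
print("p-value:", p, "dissimilarity:", pvalue_to_distance(p))
```

Computing such a dissimilarity for every pair of amino acids would give a matrix that a standard hierarchical clustering routine could take as input, in line with the keywords listed for the paper.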
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Abushilah, SF; Taylor, CC; Gusnanto, A |
Copyright, Publisher and Additional Information: | © 2019, Informa UK Limited, trading as Taylor & Francis Group. This is an author produced version of a paper published in the Journal of Applied Statistics. Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | Circular distance, squared Euclidean distance, permutation two-sample test, energy statistic, hierarchical clustering, similarity indices |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Mathematics (Leeds) > Statistics (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 11 Oct 2019 15:54 |
Last Modified: | 27 Jan 2022 10:24 |
Status: | Published |
Publisher: | Taylor and Francis |
Identification Number: | 10.1080/02664763.2019.1673324 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:151789 |