Flamm, C, Hellmuth, M orcid.org/0000-0002-1620-5508, Merkle, D et al. (2 more authors) (2020) Generic Context-Aware Group Contributions. IEEE/ACM Transactions on Computational Biology and Bioinformatics. ISSN 1545-5963
Abstract
Many properties of molecules vary systematically with changes in the structural formula and can thus be estimated from regression models defined on small structural building blocks, usually functional groups. Typically, such approaches are limited to a particular class of compounds and require hand-curated lists of chemically plausible groups. This limits their use, in particular in the context of generative approaches that explore large chemical spaces. Here we overcome this limitation by proposing a generic group contribution method that iteratively identifies significant regressors of increasing size. To this end, LASSO regression is used, and the context-dependent contributions are “anchored” around a reference edge to reduce ambiguities and prevent overcounting due to multiple embeddings. We benchmark our approach, which is available as “Context AwaRe Group cOntribution” (CARGO), on artificial data and on typical applications from chemical thermodynamics. As we shall see, this method yields stable results with accuracies comparable to other regression techniques. As a by-product, we obtain interpretable additive contributions for individual chemical bonds and correction terms depending on local contexts.
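The role of LASSO in selecting significant regressors can be illustrated with a minimal sketch. It is not the CARGO implementation: the four building blocks, their counts, the true contributions, and the penalty `alpha` are all made-up toy values, and the solver is a plain cyclic coordinate descent written out for illustration. The iterative growth of regressors and the anchoring around a reference edge described in the abstract are not modelled here.

```python
# Toy illustration only (assumed data, not from the paper): each row counts
# how often four hypothetical building blocks occur in one artificial molecule.

def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return 0.0 inside the dead zone."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw for the initial w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # building block never occurs, leave its weight at zero
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Hypothetical counts of four building blocks in eight artificial molecules.
X = [
    [3, 0, 0, 0],
    [0, 3, 0, 0],
    [0, 0, 3, 0],
    [0, 0, 0, 3],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
]
# Assumed "true" additive contributions; the fourth block does not contribute.
true_contributions = [2.0, -1.0, 0.5, 0.0]
y = [sum(c * t for c, t in zip(row, true_contributions)) for row in X]

w = lasso_coordinate_descent(X, y, alpha=0.05)
print([round(value, 3) for value in w])
```

On this toy design the estimated contributions of the first three blocks are slightly shrunk towards zero by the penalty, while the non-contributing fourth block is dropped from the model entirely. A larger `alpha` removes more regressors, so the penalty controls how many terms are considered significant.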
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | |
Copyright, Publisher and Additional Information: | © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 06 Jul 2020 10:19 |
Last Modified: | 09 Aug 2020 23:54 |
Status: | Published online |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Identification Number: | 10.1109/tcbb.2020.2998948 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:162674 |