Ghiandoni, G.M., Bodkin, M.J., Chen, B. et al. (4 more authors) (2019) Development and application of a data-driven reaction classification model: comparison of an electronic lab notebook and the medicinal chemistry literature. Journal of Chemical Information and Modeling, 59 (10). pp. 4167-4187. ISSN 1549-9596
Abstract
Reaction classification has often been considered an important task for many different applications and has traditionally been accomplished using hand-coded, rule-based approaches. However, the availability of large collections of reactions enables data-driven approaches to be developed. We present the development and validation of a 336-class machine learning-based classification model integrated within a Conformal Prediction (CP) framework in order to associate reaction class predictions with confidence estimations. We also propose a data-driven approach for 'dynamic' reaction fingerprinting to maximise the effectiveness of reaction encoding, and develop a novel reaction classification system that organises labels in four hierarchical levels (SHREC: Sheffield Hierarchical REaction Classification). We show that the performance of the CP-augmented model can be improved by defining confidence thresholds to detect predictions that are less likely to be false. For example, in the external validation of the model, 95% of the predictions are correct after filtering out less than 15% of the classifications as uncertain. The application of the model is demonstrated by classifying two reaction datasets: one extracted from an industrial ELN and the other from the medicinal chemistry literature. We show how confidence estimations and class compositions across different levels of information can be used to gain immediate insights into the nature of reaction collections and hidden relationships between reaction classes.
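The idea of attaching a confidence to each class prediction and discarding low-confidence ones can be illustrated with a generic Mondrian (class-conditional) inductive conformal predictor. This is a minimal sketch under stated assumptions, not the authors' implementation: the nonconformity measure (1 minus the underlying classifier's probability for a class), the use of the standard CP definitions of confidence (1 minus the second-largest p-value) and credibility (the largest p-value), the threshold of 0.7, the helper names and the class labels `amide`, `suzuki` and `boc` are all hypothetical. The abstract does not state which of these choices the model uses.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Probs = Dict[str, float]  # hypothetical format: class label -> classifier probability
Prediction = Tuple[str, float, float]  # (label, confidence, credibility)


def nonconformity(probs: Probs, label: str) -> float:
    """Assumed nonconformity measure: 1 minus the probability given to `label`."""
    return 1.0 - probs.get(label, 0.0)


def calibrate(cal_probs: Sequence[Probs], cal_labels: Sequence[str]) -> Dict[str, List[float]]:
    """Mondrian calibration: store nonconformity scores separately per true class."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for probs, label in zip(cal_probs, cal_labels):
        scores[label].append(nonconformity(probs, label))
    return dict(scores)


def p_values(probs: Probs, cal_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """p-value per candidate class: fraction of that class's calibration scores
    at least as nonconforming as the test example (with +1 smoothing)."""
    pvals = {}
    for label, scores in cal_scores.items():
        alpha = nonconformity(probs, label)
        n_ge = sum(1 for s in scores if s >= alpha)
        pvals[label] = (n_ge + 1) / (len(scores) + 1)
    return pvals


def predict_with_confidence(probs: Probs, cal_scores: Dict[str, List[float]]) -> Prediction:
    """Predict the class with the largest p-value and report confidence/credibility."""
    ranked = sorted(p_values(probs, cal_scores).items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_p = ranked[0]
    second_p = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_label, 1.0 - second_p, best_p


def filter_confident(preds: List[Prediction], threshold: float) -> List[Prediction]:
    """Keep only predictions whose confidence reaches the (hypothetical) threshold."""
    return [p for p in preds if p[1] >= threshold]


# Toy calibration set: hypothetical classes and probabilities, three examples per class.
CAL_PROBS = [
    {"amide": 0.9, "suzuki": 0.05, "boc": 0.05},
    {"amide": 0.6, "suzuki": 0.3, "boc": 0.1},
    {"amide": 0.4, "suzuki": 0.35, "boc": 0.25},
    {"amide": 0.1, "suzuki": 0.8, "boc": 0.1},
    {"amide": 0.4, "suzuki": 0.5, "boc": 0.1},
    {"amide": 0.45, "suzuki": 0.35, "boc": 0.2},
    {"amide": 0.05, "suzuki": 0.05, "boc": 0.9},
    {"amide": 0.2, "suzuki": 0.1, "boc": 0.7},
    {"amide": 0.3, "suzuki": 0.2, "boc": 0.5},
]
CAL_LABELS = ["amide"] * 3 + ["suzuki"] * 3 + ["boc"] * 3
cal_scores = calibrate(CAL_PROBS, CAL_LABELS)

test_probs = [
    {"amide": 0.95, "suzuki": 0.03, "boc": 0.02},  # clear-cut example
    {"amide": 0.50, "suzuki": 0.45, "boc": 0.05},  # ambiguous example
]
predictions = [predict_with_confidence(p, cal_scores) for p in test_probs]
kept = filter_confident(predictions, threshold=0.7)
print(predictions)  # [('amide', 0.75, 1.0), ('amide', 0.5, 0.5)]
print(len(kept), "of", len(predictions), "predictions kept")  # 1 of 2 predictions kept
```

The ambiguous example receives equal p-values for two classes, so its confidence is low and the threshold removes it, while the clear-cut example is retained. With a realistic calibration set the p-values are far finer-grained than in this nine-example toy; raising the threshold trades the fraction of reactions that receive a prediction against the accuracy of those that remain, which is the trade-off summarised by the external-validation figures above.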
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Ghiandoni, G.M.; Bodkin, M.J.; Chen, B.; et al. (4 more authors) |
Copyright, Publisher and Additional Information: | © 2019 American Chemical Society. This is an open access article published under a Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0) License, which permits unrestricted use, distribution and reproduction in any medium, provided the author and source are cited. |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 26 Sep 2019 09:17 |
Last Modified: | 10 Jul 2020 09:19 |
Status: | Published |
Publisher: | American Chemical Society (ACS) |
Refereed: | Yes |
Identification Number: | 10.1021/acs.jcim.9b00537 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:151385 |