Barzdajn, B. orcid.org/0000-0002-3081-4131 and Race, C.P. orcid.org/0000-0002-9775-687X (2025) Optimal design of experiments in the context of machine-learning inter-atomic potentials: improving the efficiency and transferability of kernel based methods. Modelling and Simulation in Materials Science and Engineering, 33 (2). 025011. ISSN 0965-0393
Abstract
Data-driven machine learning (ML) models of atomistic interactions are often based on flexible and non-physical functions that can relate nuanced aspects of atomic arrangements to predictions of energies and forces. As a result, these potentials are only as good as the training data (usually the results of so-called ab initio simulations), and we need to ensure that we have enough information to make a model sufficiently accurate, reliable and transferable. The main challenge stems from the fact that descriptors of chemical environments are often sparse, high-dimensional objects without a well-defined continuous metric. Therefore, it is rather unlikely that any ad hoc method for selecting training examples will be indiscriminate, and it is easy to fall into the trap of confirmation bias, where the same narrow and biased sampling is used to generate training and test sets. We will show that an approach derived from classical concepts of statistical planning of experiments and optimal design can help to mitigate such problems at a relatively low computational cost. The key feature of the method we will investigate is that it allows us to assess the quality of the data without obtaining reference energies and forces—a so-called offline approach. In other words, we are focusing on an approach that is easy to implement and does not require sophisticated frameworks that involve automated access to high performance computing.
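The offline idea in the abstract, ranking candidate configurations from their descriptors alone before any reference energies or forces are computed, can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a squared-exponential kernel on generic descriptor vectors and a greedy D-optimality-style criterion (maximising the log-determinant of the kernel matrix of the selected set, via pivoted Cholesky). The names `rbf_kernel`, `greedy_d_optimal`, `length_scale` and `jitter` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def rbf_kernel(X, Y, length_scale=1.0):
    """Squared-exponential kernel between rows of X and rows of Y (assumed kernel)."""
    sq_dist = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def greedy_d_optimal(X, n_select, length_scale=1.0, jitter=1e-8):
    """Greedily pick rows of X that maximise log det of the selected kernel block.

    Only descriptors are used: no energies or forces are needed.
    Returns the selected indices and the log-determinant of their kernel matrix.
    """
    n = X.shape[0]
    if not 0 < n_select <= n:
        raise ValueError("n_select must be between 1 and the number of candidates")

    # Small diagonal jitter keeps the kernel matrix numerically positive definite.
    K = rbf_kernel(X, X, length_scale) + jitter * np.eye(n)

    resid_var = np.diag(K).copy()   # variance not yet explained by the selection
    L = np.zeros((n, n_select))     # columns of the pivoted Cholesky factor
    selected = []
    log_det = 0.0

    for j in range(n_select):
        i = int(np.argmax(resid_var))   # most "novel" remaining candidate
        pivot = resid_var[i]
        selected.append(i)
        log_det += np.log(pivot)        # det of selected block = product of pivots
        L[:, j] = (K[:, i] - L[:, :j] @ L[i, :j]) / np.sqrt(pivot)
        resid_var = resid_var - L[:, j] ** 2
        resid_var[selected] = -np.inf   # never pick the same candidate twice

    return selected, log_det


# Toy candidates: two tight pairs of near-duplicates and one isolated point.
X = np.array([[0.0, 0.0], [0.01, 0.0], [5.0, 5.0], [5.01, 5.0], [10.0, 0.0]])
selected, log_det = greedy_d_optimal(X, n_select=3)
print(selected, log_det)
```

In this toy set, a near-duplicate of an already selected point has almost no residual variance, so the sketch spreads its picks across the distinct regions of descriptor space rather than sampling the same region repeatedly.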
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Barzdajn, B.; Race, C.P. |
Copyright, Publisher and Additional Information: | © 2025 The Authors. This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > School of Chemical, Materials and Biological Engineering |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 04 Feb 2025 14:32 |
Last Modified: | 04 Feb 2025 14:32 |
Published Version: | https://doi.org/10.1088/1361-651x/ada050 |
Status: | Published |
Publisher: | IOP Publishing |
Refereed: | Yes |
Identification Number: | 10.1088/1361-651x/ada050 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:222803 |
Download
Filename: Barzdajn_2025_Modelling_Simul._Mater._Sci._Eng._33_025011.pdf
Licence: CC-BY 4.0