Bai, P. orcid.org/0000-0003-3027-5518, Miljković, F. orcid.org/0000-0001-5365-505X, Liu, X. orcid.org/0000-0002-3084-519X et al. (4 more authors) (2025) Mask-prior-guided denoising diffusion improves inverse protein folding. Nature Machine Intelligence, 7 (6). pp. 876-888. ISSN 2522-5839
Abstract
Inverse protein folding generates valid amino acid sequences that can fold into a desired protein structure, with recent deep learning advances showing strong potential and competitive performance. However, challenges remain, such as predicting elements with high structural uncertainty, including disordered regions. To tackle such low-confidence residue prediction, we propose a mask-prior-guided denoising diffusion (MapDiff) framework that accurately captures both structural information and residue interactions for inverse protein folding. MapDiff is a discrete diffusion probabilistic model that iteratively generates amino acid sequences with reduced noise, conditioned on a given protein backbone. To incorporate structural information and residue interactions, we have developed a graph-based denoising network with a mask-prior pretraining strategy. Moreover, in the generative process, we combine the denoising diffusion implicit model with Monte-Carlo dropout to reduce uncertainty. Evaluation on four challenging sequence design benchmarks shows that MapDiff substantially outperforms state-of-the-art methods. Furthermore, the in silico sequences generated by MapDiff closely resemble the physico-chemical and structural characteristics of native proteins across different protein families and architectures.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: |
|
Copyright, Publisher and Additional Information: | © The Author(s) 2025. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
Keywords: | Computer science; Machine learning; Protein design |
Dates: |
|
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 30 Jun 2025 13:36 |
Last Modified: | 30 Jun 2025 13:36 |
Status: | Published |
Publisher: | Springer Science and Business Media LLC |
Refereed: | Yes |
Identification Number: | 10.1038/s42256-025-01042-6 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:228531 |