Turgut-Ogme, S.S., Aydin, N. and Kurt, Z. orcid.org/0000-0003-3186-8091 (Submitted: 2025) Evaluation of generative AI models for processing single-cell RNA-sequencing data in human pancreatic tissue. [Preprint - bioRxiv] (Submitted)
Abstract
Single-cell RNA-seq (scRNAseq) analyses aim to characterize the cellular landscape of tissue sections, offer insights into rare cell types, and identify marker genes for annotating distinct cell types. scRNAseq analyses are also widely applied in cancer research to understand tumor heterogeneity, disease progression, and resistance to therapy. Processing single-cell data is challenging because of its high dimensionality, sparsity, and imbalanced class distributions, and accurate cell-type identification depends heavily on preprocessing and quality control steps. To address these issues, generative models have been widely used in recent years. Frequently used techniques include Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Gaussian-based methods, and, more recently, Flow-based (FB) generative models. We conducted a comparative analysis of fundamental generative models, intended as preliminary guidance for developing novel automated scRNAseq data analysis systems. We performed a meta-analysis by integrating four datasets derived from pancreatic tissue sections. To balance class distributions, synthetic cells were generated for underrepresented cell types using VAE, GAN, Gaussian Copula, and FB models. To evaluate the generative models, we compared automated cell-type classification performance in the original and dimensionality-reduced spaces. We also identified differentially expressed genes for each cell type and inferred cell-cell interactions based on ligand-receptor pairs across distinct cell types. Among the generative models, FB consistently outperformed the others across all experimental setups in cell-type classification (with an F1-score of 0.8811, precision of 0.8531, and recall of 0.8643). FB also produced more biologically relevant synthetic data according to correlation structures (with a correlation discrepancy score of 0.0511), and the cell-cell interactions inferred from synthetic cells closely resembled those of the original data. These findings highlight the promise of FB models in scRNAseq analyses.
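The balancing-and-evaluation workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. A per-class diagonal Gaussian sampler stands in for the VAE, GAN, Gaussian Copula, and FB models, and the function names (`balance_with_gaussian`, `correlation_discrepancy`, `macro_f1`) are hypothetical. The abstract does not define its correlation discrepancy score, so the mean absolute difference between gene-gene correlation matrices used here is an assumption, as is the use of macro averaging for F1.

```python
import numpy as np


def balance_with_gaussian(X, y, rng):
    """Oversample minority classes up to the majority class size.

    Assumption: a per-class diagonal Gaussian is used as a simple
    stand-in for the generative models named in the abstract.
    """
    labels, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for label, count in zip(labels, counts):
        deficit = target - count
        if deficit == 0:
            continue
        cells = X[y == label]
        mean = cells.mean(axis=0)
        std = cells.std(axis=0)
        # Draw `deficit` synthetic cells, one Gaussian per gene.
        synthetic = rng.normal(mean, std, size=(deficit, X.shape[1]))
        X_parts.append(synthetic)
        y_parts.append(np.full(deficit, label))
    return np.vstack(X_parts), np.concatenate(y_parts)


def correlation_discrepancy(real, synthetic):
    """Mean absolute difference between gene-gene correlation matrices.

    Assumption: this definition is illustrative; the abstract does not
    state how its correlation discrepancy score is computed.
    """
    c_real = np.corrcoef(real, rowvar=False)
    c_syn = np.corrcoef(synthetic, rowvar=False)
    return float(np.mean(np.abs(c_real - c_syn)))


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in np.unique(y_true):
        tp = np.sum((y_pred == label) & (y_true == label))
        fp = np.sum((y_pred == label) & (y_true != label))
        fn = np.sum((y_pred != label) & (y_true == label))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


# Toy data: 50 cells of type 0 and 8 cells of type 1, 5 genes each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(3, 1, (8, 5))])
y = np.array([0] * 50 + [1] * 8)

X_bal, y_bal = balance_with_gaussian(X, y, rng)
counts_after = np.bincount(y_bal)
```

After balancing, `counts_after` holds equal counts for both toy cell types. In a fuller comparison, a classifier would be trained on `X_bal` and scored with `macro_f1` on held-out real cells, and `correlation_discrepancy` would be computed per cell type between real and synthetic cells.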
Metadata
Item Type: | Preprint |
---|---|
Authors/Creators: | Turgut-Ogme, S.S., Aydin, N. and Kurt, Z. (orcid.org/0000-0003-3186-8091) |
Copyright, Publisher and Additional Information: | © 2025 The Author(s). This preprint is made available under a Creative Commons Attribution 4.0 International License. (https://creativecommons.org/licenses/by/4.0/) |
Dates: | Submitted: 2025 |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 22 Jan 2025 14:12 |
Last Modified: | 22 Jan 2025 14:12 |
Status: | Submitted |
Publisher: | Cold Spring Harbor Laboratory |
Identification Number: | 10.1101/2025.01.15.633192 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:222102 |