Identification and characterization of in vitro expanded hematopoietic stem cells

Abstract Hematopoietic stem cells (HSCs) cultured outside the body are the fundamental component of a wide range of cellular and gene therapies. Recent efforts have achieved > 200‐fold expansion of functional HSCs, but their molecular characterization has not been possible since the majority of cells are non‐HSCs and single cell‐initiated cultures have substantial clone‐to‐clone variability. Using the Fgd5 reporter mouse in combination with the EPCR surface marker, we report exclusive identification of HSCs from non‐HSCs in expansion cultures. By directly linking single‐clone functional transplantation data with single‐clone gene expression profiling, we show that the molecular profile of expanded HSCs is similar to proliferating fetal HSCs and reveals a gene expression signature, including Esam, Prdm16, Fstl1, and Palld, that can identify functional HSCs from multiple cellular states. This “repopulation signature” (RepopSig) also enriches for HSCs in human datasets. Together, these findings demonstrate the power of integrating functional and molecular datasets to better derive meaningful gene signatures and opens the opportunity for a wide range of functional screening and molecular experiments previously not possible due to limited HSC numbers.


Introduction
Achieving efficient and controlled in vitro HSC expansion and defined mature cell production would have substantial therapeutic implications. HSC transplantation has been the bedrock of therapy in hematological malignancies for over 60 years and its success strongly correlates with the number of HSCs transplanted (Eaves, 2015). Increasing the purity of transplanted HSCs relative to mature cells would help reduce the likelihood of graft-versus-host disease (Lang & Handgretinger, 2008). Similarly, the expansion of functional HSCs outside the body would benefit gene therapy efforts for congenital hematopoietic diseases by preserving HSC function during genetic manipulation (Naldini, 2015) while also seeding in vitro production of virtually limitless numbers of mature blood cells, alleviating the need for blood cell donations (Batta et al, 2016).
Decades of research have identified a wide range of intrinsic genetic regulators that substantially increase HSC expansion in vitro, including Hoxb4, Fbxw7, Dppa5, Prdm16, among others (Antonchuk et al, 2002;Deneault et al, 2009;Iriuchishima et al, 2011;Miharada et al, 2014). Despite significant increases in functional HSCs, these strategies uniformly required genetic integration, resulting in a risk of leukemic initiation via over-activation of self-renewal programs or blockage of differentiation. To overcome this, numerous groups have explored transient use of extrinsic regulators such as hematopoietic cytokines, growth factors, and small molecules to increase HSC self-renewal (Zhang & Lodish, 2008;Fares et al, 2014;Wohrer et al, 2014;Wen et al, 2020). These efforts have culminated in a fully defined culture system that expands mouse HSCs > 200-fold over a 28-day period (Wilkinson et al, 2019).
Despite this significant breakthrough, several outstanding issues remain. First, single HSCs cultured under these conditions display considerable heterogeneity in terms of their expansion and functional transplantation outcomes, and there is currently no way to prospectively identify clones containing functional HSCs. Second, since the large majority of cells are not HSCs (even in a successfully expanded HSC culture), it is challenging to undertake any molecular experiments on purified populations of expanded HSCs. As a result, despite the urgent need to understand and manipulate the molecular program of an expanded HSC, there are no methods currently available to undertake these studies amidst a lack of robust markers to isolate functional HSCs in vitro (Zhang & Lodish, 2005).
Here, we describe novel methods for prospectively isolating and characterizing in vitro expanded HSCs. Using the Fgd5 reporter mouse (Gazit et al, 2014) in combination with the HSC cell surface marker EPCR (Kent et al, 2009), we present a specific purification strategy for expanded HSCs and validate its functional utility in transplantation experiments. Combining the functional outcomes of these experiments with transcriptional profiling of the same clones split into expanded HSCs and their non-HSC counterparts, we report the first molecular profile of expanded HSCs. Finally, by integrating single-cell gene expression profiles of cycling and in vitro hibernating HSCs with a freshly isolated hematopoietic cell landscape, we identify a robust transcriptional signature that can identify functional HSCs irrespective of cellular source or activation status, including highly efficient enrichment of functional human HSCs within human RNA-seq datasets.

Fgd5 and EPCR mark stem cells in vitro
EPCR has previously been identified as a highly selective marker for HSCs in vivo (Kent et al, 2009) and has also been demonstrated to track expanded human HSCs in culture (Fares et al, 2017), but on its own, it is insufficient to obtain highly purified HSCs. The advent of numerous mouse HSC reporter strains (Gazit et al, 2014;Busch et al, 2015;Chen et al, 2016;Cabezas-Wallscheid et al, 2017;Tajima et al, 2017;Pinho et al, 2018) represents a potentially novel set of tools for improving the identification of expanded HSCs in vitro. Because Fgd5 was previously described as a highly specific reporter for HSCs (Gazit et al, 2014) and to enrich for a subset of primitive HSCs in EPCR + cell fractions as well as immune-activated cells (Bujanover et al, 2018;Rabe et al, 2019), we investigated whether Fgd5 and EPCR could, in combination, mark functional mouse HSCs in vitro after periods of culture.
Since expanded HSCs are actively cycling-unlike adult HSCswe first assessed whether Fgd5 was expressed on actively cycling HSCs in vivo by studying fetal liver (FL) HSCs. All phenotypic HSCs (as defined by the E-SLAM gating strategy; Benz et al, 2012;Kent et al, 2009) in both bone marrow (BM) and FL were Fgd5-ZsGreen + (Fig 1A and B), suggesting that Fgd5 and EPCR expression could mark actively cycling HSCs. Next, we utilized 10-day single-cell HSC expansion cultures to compare Fgd5 and EPCR expression with traditional in vitro markers of expanded HSCs. We observed a strong correlation between the percentage of Fgd5-ZsGreen + EPCR + (FE + ) cells and the Lin À Sca1 + Kit + (LSK) phenotype in 10-day clones ( Fig 1C). However, some clones with high LSK percentages had lower FE + percentages, suggesting that FE + could be more selective for phenotypic HSCs than LSK alone (Fig 1C). Similarly, 10-day single-cell cultures  with a wide-range of clone sizes showed that the FE + fraction of cells contained significantly higher proportions of LSK cells and that all of the larger, more differentiated clones lost both Fgd5 and EPCR expression ( Fig EV1A).
To test the functional HSC content of FE + cells within the culture, we cultured E-SLAM HSCs for 3 days in conditions that maintain HSC function (Kent et al, 2008) and re-sorted Fgd5-ZsGreen high EPCR high (F hi E hi ) and Fgd5-ZsGreen low EPCR low (F lo E lo ) cells for transplantation into irradiated recipients ( Fig EV1B). Fgd5 and EPCR expression were correlated (r 2 = 0.5; Fig EV1C) and even though fewer F hi E hi cells were transplanted (583 vs. 1,509 cells), only mice receiving F hi E hi cells displayed long-term multilineage reconstitution, indicating that Fgd5 and EPCR expression are retained on functional HSCs in vitro (Figs 1D and EV1D). In addition, FE + cells were also more numerous in single HSC cultures with higher levels of expansion (e.g., cultures containing PVA compared to those containing HSA; Wilkinson et al, 2019;Figs 1E, and EV1E and F), thereby supporting the use of FE + as a simple two-color screening tool for functional HSC content in vitro.
Recently, a fully defined cell culture protocol was reported to expand HSCs between 236-and 899-fold over a 28-day period (Wilkinson et al, 2019). However, since individual clones showed substantial heterogeneity in clone size and functional output, we set out to determine whether the FE + strategy could help prospectively identify clones containing functional HSCs. After 28 days of culture, single HSC clones were harvested for transplantation  and 10% of each clone was collected for flow cytometric analysis. At day 28, the percentage of FE + cells strongly correlated (r 2 = 0.6658) with functional HSC content as measured by transplantation ( Fig 1F). The addition of LSK markers further enhanced the correlation (r 2 = 0.7425; Fig 1G and H). Taken together, these results indicate that Fgd5 and EPCR can reliably identify clones containing functional HSCs in both short-and long-term cultures.

Linked functional and gene expression assays enable analyses of heterogeneous HSC clones
To understand the molecular drivers of heterogeneity in functional HSC expansion, we performed simultaneous functional and molecular assessment of 20 single HSC-derived 28-day clones. Individual clones were expanded for 28 days and cells were fractionated into phenotypic HSCs and non-HSCs, with 50% of the sorted cells used for transplantation and 50% for RNA-sequencing (RNA-seq; Fig 2A). For these experiments, although Fgd5 ZsGreen•ZsGreen/+ mice were used for immunophenotypic profiling of the clones, Fgd5 was not used in the gating strategy of the re-sort for two reasons: (i) EPCR and Fgd5 levels correlated strongly within the LSK fraction and (ii) reliance on a reporter mouse would decrease the broader applicability of the results and method. Therefore, for the reisolation post culture, phenotypic HSCs were defined as EPCR + LSK (ELSK) cells and non-HSCs represented the remaining cell fraction (nonELSK; Fig EV2A).
Recipients of phenotypic HSCs from 8 of 20 (40%) clones displayed high levels of multilineage engraftment, which accords with the previously reported single clone engraftment frequency of 28.5% (Wilkinson et al, 2019;Fig 2B and C). Donor cell contribution in recipient mice did not correlate with absolute live cell numbers or absolute numbers of phenotypic HSCs within each clone (r 2 = 0.0096 and r 2 = 0.1894), suggesting that HSC self-renewal is not linked with overall clonal proliferation (Fig 2D and E). In contrast, donor cell contribution was highly correlated with the percentage of ELSK cells present in the clone (r 2 = 0.8279) and there was an even stronger correlation of %ELSK with donor cell contribution to the granulocyte-monocyte (GM) lineage, which is a reliable indicator of serial repopulating ability (Dykstra et al, 2007;Fig 2F and G). The addition of Fgd5 to the gating strategy resulted in a slight improvement to the correlation (r 2 = 0.8642; Fig EV2B), but for the reasons outlined above it was not used for cell isolation. Using these data, we identified a conservative > 20% cutoff for the %ELSK proportion that could reliably identify clones with functional HSCs. Transplantation of non-HSCs (nonELSK cells) from multiple clones was also performed and despite transplanting an average of 26-fold more cells per mouse, nonELSK cells largely lacked multilineage reconstitution capacity (Fig EV2C and D) with a calculated estimate of HSC frequency within the nonELSK fraction being less than 1 in 750,494 cells (Fig EV2E), strongly indicating that the vast majority of functional HSCs are contained within the ELSK fraction. The frequency of functional HSCs in the ELSK fraction, on the other hand, was much higher with single cell transplantation data from 28-day bulk cultures showing a very high rate of success. At 12 weeks posttransplantation, 33 of 50 (66%) single cell transplantations gave donor chimerism values of > 1% ( Fig EV2F). Seventeen of 50 single-cell transplantations (34%) contributed > 0.5% to each of the GM, B, and T cell lineages (Fig EV2G-I). These data indicate that the ELSK fraction is highly enriched for HSCs (> 100,000 fold) compared to non-ELSK cells although a small minority of functional HSCs likely still exist outside of the ELSK gate.
The above experiments were performed by transplanting 50% of the HSCs from each clone. In order to further test the HSC expansion capacity of single cell-derived cultures, we selected seven clones for transplantation using just 5% of sorted ELSK cells (ranging from 3 to 48% of total cells in the clone). Out of the seven transplanted clones, only the two with the highest ELSK content showed successful multilineage reconstitution at 5% doses (Fig EV2J-L). Overall, our data strongly indicate that the ELSK phenotype can be used to reliably track functional HSC content in heterogeneous expansion cultures, and therefore acts as a robust tool for functional validation of the molecular characterization of expanded HSCs described below.

Expanded HSC clones share molecular features with freshly isolated HSCs
To characterize the molecular state of in vitro expanded HSCs and identify potential drivers of repopulating versus nonrepopulating clones, RNA-seq was performed on 12 clones which were split into phenotypic HSCs (ELSK) and non-HSCs (nonELSK; Fig EV3A) and coupled with transplantation assays for functional validation. Clones were selected to represent a range of phenotypic HSC content and donor chimerism in transplantation assays, directly linking the functional HSC content to the transcriptional profile. To simplify the analysis, the samples were categorized into four groups: (i) ELSK cells from clones that repopulated mice (PosELSK); (ii) ELSK cells from clones that did not repopulate mice (NegELSK); (iii) nonELSK cells from clones that repopulated mice (PosNo-nELSK); and (iv) nonELSK cells from clones that did not repopulate mice (NegNonELSK; Fig 3A).
Of the 24 cell fractions, two samples failed quality control due to low read counts (Table 1). After removal of lowly expressed genes, 16,648 genes were detected across 22 unique samples. We first performed multiple dimensional scaling (MDS), which showed clear separation between samples originating from clones which repopulated mice and clones that did not ( Fig 3B). Samples could be further resolved by whether they were phenotypic HSCs (ELSK cells) or not (nonELSK cells). Notably, the nonELSK fraction of repopulating clones overlapped with the profiles of ELSK cells from nonrepopulating clones (negELSK), suggesting that molecular profiles are more closely linked to cellular function than cell surface immunophenotype ( Fig 3B). In line with this observation, posELSK samples with an increasing proportion of donor chimerism and GM contribution clustered separately from negELSK cells ( Fig EV3B).
In order to assess the similarity of repopulating ELSK cells to freshly isolated HSCs, we first computed the geometric mean for a previously described gene signature for freshly isolated HSCs (termed molecular overlap, or "MolO"; Wilson et al, 2015). Here, increasing repopulation potency was closely correlated with the MolO geometric mean score, and nonELSK cell fractions had significantly lower MolO scores than ELSK cells (Figs 3B and EV3C). NonELSK cells from repopulating clones (PosNo-nELSK) showed higher MolO scores than ELSK cells obtained from clones without functional HSCs (NegELSK), suggesting that the MolO signature correlated with functional HSC content ( Fig EV3C). Of note, several MolO signature genes were below the minimum threshold of expressed genes across all samples (Cldn10, Ramp2, Smtnl1, Sox18, and Sqrdl), indicating that although these genes are expressed in freshly isolated HSCs (and may play a biological role in those cells), they are not highly expressed in ex vivo cultured HSCs. Overall, these data highlight the prospect of identifying functional HSCs with durable selfrenewal and repopulation potency based on their transcriptional profiles.
◀ Figure 1. Fgd5 and EPCR mark HSCs in short and long-term cultures.

A
Representative flow analysis for adult BM and FL ESLAM HSCs. B Percentage of ESLAM HSCs that are Fgd5 + in adult BM and FL (n = 7 and 4 biological replicates, respectively). Error bars represent SD. C The correlation between %LSK and %Fgd5 + EPCR + (FE + ) cells in single-cell clones cultured for 10 days (n = 2). Pearson correlation, ****P < 0.001. D The percentage of donor chimerism in primary recipients over 16 weeks is displayed for F hi E hi cells (n = 3), F lo E lo cells (n = 3), and bulk live cells (n = 2). t-Test, *P < 0.05, **P < 0.01, ***P < 0.001. E The percentage of cells that are FELSK within each clone is displayed in the graph for 10-day F12 cultures containing HSA (n = 81) or PVA (n = 89). t-Test, **P < 0.01. F-H The correlation between donor chimerism and different phenotypic gating strategies for clones transplanted after 28-days of culture in F12 PVA medium supplemented with 10 ng/ml SCF, 100 ng/ml TPO, and 20 ng/ml IL-11. Pearson correlation, r 2 values displayed, *P < 0.05, **P < 0.01.
84% 82% 74% 70% 45% 42% 32% 31% 12% 7% 6% 5% 4% Nonrepopulating clones express mature cell gene signatures and lose HSC gene expression signatures To first identify the dominant cell types in isolated cell fractions, we computed the correlation of each fraction with previously defined gene expression profiles for a broad spectrum of hematopoietic stem and progenitor cell types (curated within the ImmGen database; . While all ELSK cell fractions were correlated with short-term HSCs (ST34F), only repopulating ELSKs were specifically correlated with long-term HSCs (LT-HSC; Fig 3C). Despite the stark differences in HSC functional content and associated gene signatures, a direct comparison of posELSK to negELSK cells revealed a limited set of 44 differentially expressed genes (Figs EV3D and E), with negELSK enriched for differentiation-associated gene ontology (GO) terms and posELSKs enriched for cell surface GO terms ( Fig EV3F). The non-HSC fraction (nonELSK) on the other hand, were subject to a wider range of transcriptional differences between repopulating and nonrepopulating clones (Appendix Fig S1A), suggesting that the cellular composition of each clone might affect HSC expansion. In order to gain a better understanding of the cellular compositions of the in vitro expanded clones, we performed 10× RNA-seq on a culture initiated with 50 LT-HSCs. In accordance with the earlier Immgen analysis, the majority of cells shared a transcriptomic signature with primitive hematopoietic cells, with a small minority of single cells expressing mature myeloid genes and an even smaller proportion expressing lymphoid genes ( Fig 3D). Cell state transitions of this in vitro progenitor cell production were also predicted to share a high degree of similarity with previously mapped in vivo differentiation trajectories, including the HSC fraction bearing the molecular signature of the G 1 /G 0 phases of the cell cycle (Appendix Fig S1B and C). A customized online resource allows full exploration of these data (http://128.232.224.252/gp_apps/CheBode2021/). In order to infer cell identities from the bulk transcriptomes, we computed direction of state transition (DoT) scores (Kucinski et al, 2020) for PosNo-nELSK and NegNonELSK fractions and projected these onto a previously defined single-cell hematopoietic landscape, using differentially expressed genes (DEGs) of matched ELSK and nonELSK cells within each clone ( Fig 3E and Appendix Fig S1D). While both ELSK fractions were enriched for genes expressed in HSCs, both nonELSK fractions showed enrichment for genes associated with myeloid cell types such as monocytes, neutrophils, and basophils ( Fig 3E,  Appendix Fig S1E). Interestingly, only the non-HSC fractions from nonrepopulating clones (NegNonELSK) showed enrichment of megakaryocyte and erythrocyte genes ( Fig 3E). Overall, these results indicate that nonrepopulating clones undergo increased myeloid and megakaryocyte-erythroid differentiation, and that these cell types may negatively regulate HSC self-renewal.

A molecular signature for expanded HSCs
In order to identify a gene signature most strongly associated with ex vivo expanded functional HSCs, we derived the PCA-based dimensionality reduction for all bulk transcriptome samples and computed Pearson correlations of transplantation metadata with each principal component (PC) and the associated loading plots ( Fig 4A). In line with the previous MDS plots, repopulating and nonrepopulating samples showed distinct clustering ( Fig 4B). Intriguingly, a single PC (PC1, 35.62% of variation) was significantly correlated with donor chimerism, GM contribution, and a binary repopulation outcome score (Figs 4C and EV4A). We identified the top 50 genes driving PC1 ( Fig 4D) and further curated the gene signature using fitted logistic and linear regression models for each transplantation parameter (Figs 4E, and EV4B and C) to identify the most significant drivers of repopulation potential. The resulting "repopulation signature" gene list (RepopSig) contains 23 genes ( Fig 4F), including previously described HSC markers and selfrenewal regulators, such as Esam, Slamf1 (CD150), and Prdm16 (Deneault et al, 2009;Yokota et al, 2009;Oguro et al, 2013;Gudmundsson et al, 2020), as well as novel genes that were not previously associated with HSC self-renewal, such as Klhl4, Mpdz and Insyn1. Next, we computed the geometric mean for the RepopSig across all samples, confirming a robust identification of repopulating clones (Figs 4G and EV4D). Intriguingly, the RepopSig gene signature score improved the distinction of repopulating samples when compared to the MolO signature (Figs 4H, and EV4D and E). Of note, a subset of MolO signature genes was not correlated with the RepopSig, possibly indicating their limited role for the function of ex vivo expanded HSCs ( Fig EV4F).
The top signaling pathway enriched among genes closely correlated with the RepopSig (r > 0.75) was sphingolipid signaling, which was recently implicated in human HSC self-renewal (Xie et al, 2019; Fig 4I). A number of other enriched pathways, namely Hippo, FoxO, Ras, and VEGF, involve RhoGTPase signaling and several previous studies in mouse HSCs have specifically implicated CDC42 (Florian et al, 2018;Liu et al, 2019) and ARHGAP5 (Hinge et al, 2017) in mouse HSC function ( Fig 4I). To test the role of RhoGTPase signaling in regulating HSC expansion, we undertook expansion cultures with or without various RhoGTPase inhibitors (CASIN, NSC23766, Rhosin) or an activator (ML099; Gao et al, 2004;Surviladze et al, 2010;Shang et al, 2013;Liu et al, 2019). Inhibitors uniformly decreased the percentage of phenotypic HSCs (ELSK cells) in a dose-dependent manner and in some cases resulted in substantially reduced survival (Figs EV4G and H). Activating RhoGTPase signaling with ML099, on the other hand, did not alter HSC expansion, suggesting that increased RhoGTPase ◀ Figure 2. Reporter strategy deciphers clonal heterogeneity in expansion cultures.

A
Schematic of experimental design in which single ESLAM HSCs were cultured for 28 days in F12 media containing PVA, 10 ng/ml SCF, and 100 ng/ml TPO. At day 28, 20 clones were harvested and re-sorted for phenotypic HSCs, defined as EPCR + Lin À Sca-1 + C-kit À (ELSK) cells and remaining nonELSK cells. The two fractions were each split in half, 50% for transplantation and 50% for bulk RNA-sequencing. On average for each clone, 22,382 ELSK cells were sorted compared to 90,700 nonELSK cells. B The donor chimerism in animals receiving ELSK cells re-sorted from the 20 clones (45-50% dose). One mouse was culled for health reasons before experimental endpoint. C The proportion of donor GM, B, and T cells in each clone above 1% donor chimerism at week 16 with overall donor chimerism listed underneath each bar. D-G The correlation between donor chimerism and absolute numbers and proportions of live cells or FELSK cells. Pearson correlation, ****P < 0.001.  signaling is not sufficient to actively drive HSC self-renewal beyond the current limitations of the expansion system (Fig EV4G and H). These experiments further underscore the power of the ELSK reporter system to replace lengthy and expensive functional transplantation assays for validating such molecules for their effect on HSC expansion.

Repopulation signature identifies HSCs from multiple cellular states
Compared to the MolO signature, the RepopSig score was better able to separate repopulating clones from nonrepopulating cells in our initial experiments (Figs 4H and EV4D). However, since the RepopSig was initially derived from this training dataset, we next generated a validation dataset by an additional series of 28-day single-cell cultures with concomitant qPCR, flow cytometry, and transplantation assays. We selected 9 clones with > 20% ELSK and 10 with < 1% ELSK for parallel qPCR and transplantation assays. Functional HSC activity was exclusively restricted to clones with > 20% ELSK, where all nine had robust multilineage contribution in recipient animals (Fig 5A and B). Signature gene expression strongly correlated with clones that had a high percentage of phenotypic HSCs (> 20% ELSK) and genes that associated with negative transplantation outcomes were more highly expressed in clones ◀ Figure 3. Expanded HSC clones are transcriptionally similar to freshly isolated HSCs.
A Schematic of color-coded cell population categories. Repopulation was defined as having > 1% donor chimerism and > 1% contribution to GM at 16 weeks. B MDS plot of bulk RNA sequencing samples colored by their population categories and their corresponding MolO score. C Correlation of each sample to gene expression profiles of various hematopoietic stem and progenitor cell populations, as defined in the Immgen database. D UMAP depiction of single cell profiling of a 28-day culture initiated by 50 HSCs. Cell type annotations were derived using marker gene signatures. E UMAP representation of mouse LK/LSK (Dahlin et al, 2018)

Repopulation
Chimerism GM  with fewer phenotypic HSCs (< 1% ELSK; Fig 5C and D). To further demonstrate the applicability of RepopSig genes in the identification of repopulating clones, we probed 28-day single cell-derived clones for cell surface expression of ESAM (one of the RepopSig genes) by flow cytometric profiling. Interestingly, we observed a strong correlation (r 2 = 0.951) between the ESAM-ELSK (EELSK) and Fgd5-ELSK phenotypes, indicating that Fgd5 can be replaced by ESAM, thus removing the need for using the reporter mouse in screening experiments ( Fig 5E). The utility of RepopSig is further indicated by the 10× single cell RNA-sequencing data where the HSC compartment is more specifically identified by the RepopSig compared to the MolO gene signature, with a multitude of RepopSig genes exclusively expressed in the HSC cluster ( Fig 5F, Appendix Fig S2A and  B). Overall, these data underscore the robustness of both the ELSK phenotype and the RepopSig score for identifying cultures with high numbers of functional HSCs. Following its validation in transplantation assays, we next assessed the general applicability of the RepopSig for identifying HSCs from a variety of cellular states using published single-cell RNA-sequencing (scRNA-seq) datasets (Nestorowa et al, 2016;Oedekoven et al, 2021). We first assessed data from~1,600 freshly isolated stem and progenitor cells from Nestorowa et al, where the RepopSig was able to distinguish LT-HSCs from HSPCs and progenitors (Figs 6A and B,and EV5A and B). Although the MolO score outperforms the RepopSig score in this dataset (largely due to cell cycle genes in the MolO signature), we hypothesized that the RepopSig might perform better for uniformly marking HSCs in culture and in cell cycle. To test this, we generated new scRNA-seq libraries for cycling FL HSCs and 7-day in vitro hibernating HSCs (Oedekoven et al, 2021). Whereas the MolO score consistently ranked hibernating HSCs higher than FL HSCs (Fig 6C and D), the RepopSig scored both FL HSCs and hibernating HSCs similarly despite their distinct cell cycle status (Fig 6C and D). In addition, higher RepopSig scores were also observed in freshly isolated and hibernating HSCs compared to cytokine-stimulated cells with reduced HSC frequency (Figs 6E and F, and EV5C). In addition, we observed RepopSig enrichment in freshly isolated HSCs against a broad range of hematopoietic cell types (Appendix Fig S3A and B). Overall, this suggests that the RepopSig can mark HSCs from multiple distinct cellular states ranging from active versus quiescent, freshly isolated versus cultured, and adult versus fetal origin.

Repopulation Signature
Finally, to test the applicability of the HSC RepopSig across species and to explore its potential translatability to humans, human hematopoietic cell datasets were obtained and assessed for the expression of RepopSig homologs. Out of 23 repopulation signature genes, only Skint3 and Gm38066 lacked human homologs and the expression of the remaining 21 genes was highly enriched within human HSCs compared to all other examined hematopoietic cell subsets ( Fig 6G). This further affirms the strength of RepopSig for identifying HSCs from a wide range of cellular states and sets the stage for translating these findings to the human system, including an exploration of individual genes for their impact on HSC function.

Discussion
Ex vivo HSC expansion has been a long-standing goal in the field, with substantial clinical implications for improving stem cell transplantation, production of limitless populations of mature blood cells, and the base cellular product for gene therapy. While the recent report of 200-899 fold mouse HSC expansion ex vivo represents a major breakthrough (Wilkinson et al, 2019), the substantial heterogeneity in single cell-derived clones has thus far precluded the molecular characterization of expanded HSCs or high throughput screening for cultures containing large numbers of functional HSCs. Here, we report an in vitro reporter strategy that overcomes these issues by using Fgd5 and EPCR as markers of functional HSCs in culture and by prospectively separating HSCs from non-HSCs, thus allowing molecular profiling. By integrating singleclone functional transplantation data with gene expression profiling from the same clones, we report that (i) EPCR and Fgd5 are reliable in vitro markers for enriching functional HSCs; (ii) ESAM can substitute for the Fgd5 reporter as a reliable in vitro marker; (iii) expanded HSCs share a core molecular program with freshly isolated HSCs; (iv) megakaryocytic and erythroid genes are overrepresented in non-HSCs in nonrepopulating clones, which may provide a source of negative feedback signals; (v) the molecular profile of expanded HSCs can be defined with a new repopulation signature, which can also identify HSCs in multiple cellular states. This reporter system represents a highly efficient way of identifying functional HSCs in vitro and avoids the costly and timeconsuming in vivo transplantation step, thereby setting the stage for large-scale screening that has been previously impossible to undertake.
One of the main barriers that has hindered the study of HSC expansion has been the lack of robust markers to isolate stem cells in vitro (Zhang & Lodish, 2005). Fares et al, 2017 first indicated that EPCR expression tracked with human HSC content in vitro, and ◀ Figure 4. A molecular signature (RepopSig) of expanded HSCs.
A Schematic of how the repopulation signature (RepopSig) was derived using PCA analysis and transplantation-associated metadata. B PCA plot of samples colored by their categories and by functional outcome. C Correlation between the first 7 principal components and the metadata, showing r 2 values and significance. Pearson correlation, *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. D PCA loading plot for PC1 and PC2. E Regression coefficients of top 50 repopulation-associated genes. Logistic regression coefficients (Firth penalized) depicted for repopulation outcome and linear regression of chimerism and GM contribution. Cut-off for signature inclusion is indicated by the red dotted line. N = 22 individual clones, error bars represent SD. F List of RepopSig genes. G MDS plots depicting sample categories and RepopSig signature scores. H MolO and RepopSig scores of each sample colored by their categories. I KEGG pathway analysis of genes correlated with the RepopSig (r > 0.7, Signature) and genes least correlated with the RepopSig (r < À0.7).  . The ability to expand, and subsequently highly enrich, HSCs also enables a wide range of previously impossible experiments requiring large numbers of input cells, including global proteomics, metabolomics, and ChIP-sequencing. Moreover, the faithful tracking of the FELSK and EELSK phenotypes with functional HSC content now permits large-scale functional screens (small molecules, CRISPR, etc.) and directed differentiation experiments on a previously unimaginable scale. Importantly, while a > 20% ELSK cutoff provided a reliable metric for identifying single cell-derived 28-day expanded clones with robust expansion of functional HSCs, it would require validation and potential adjustment if culture length and/or input cells were altered. Whereas the molecular profile of freshly isolated HSCs from different isolation strategies and developmental timepoints have been firmly established (Kent et al, 2009;Wilson et al, 2015;Nestorowa et al, 2016;Dong et al, 2020), our data represent the first description of the molecular profile of in vitro expanded HSCs. A complete understanding of the molecular machinery governing HSC selfrenewal ex vivo will be instrumental in the improvement of gene therapy and directed differentiation protocols. Our data implicate a wide range of signaling pathways and cellular feedback mechanisms that could be key to unlocking this clinical potential.

A B
At present, extensive clonal heterogeneity remains in 28-day cultures and our dataset accords with previous data from Wilkinson et al, demonstrating that single HSCs can expand in F12 PVA-based conditions. Surprisingly, considering the length of culture, the molecular profile of expanded HSCs resembles freshly isolated HSCs to a high degree. Interestingly, the total cell number of a clone did not correlate significantly with its repopulation potential, further affirming the established negative relationship between HSC selfrenewal and proliferation (Wilson et al, 2008). Our data suggest that irrespective of clone size, the percentage of phenotypic HSCs in a 28-day clone is the best predictor of its repopulation potential. The identification of previously established self-renewal regulators such as Prdm16 (Deneault et al, 2009;Chuikov et al, 2010;Gudmundsson et al, 2020), Esam (Ooi et al, 2009;Yokota et al, 2009) and Fstl1 (Holmfeldt et al, 2016) in the RepopSig is perhaps not surprising; however, the RepopSig additionally includes a number of genes that have not been implicated in HSC biology previously, such as Klhl4 and Mpdz, offering exciting new targets for potentially regulating HSC expansion. Although filtered out from the final RepopSig by linear regression analysis, Tcf15, which was recently shown to be enriched in mouse HSCs, was also identified within the loading plot of PC1 to be associated with repopulation (Rodriguez-Fraticelli et al, 2020). Pathway analysis of the RepopSig also accords with recent reports on the role of sphingolipid signaling in the self-renewal of human HSCs ex-vivo (Xie et al, 2019).
Our data also suggest that clones which no longer contain functional HSCs at 28 days predominantly generate cells with molecular characteristics of the megakaryocyte and erythrocyte lineages. Future expansion strategies might take advantage of this by targeted removal of such cells, and some efforts (such as fed-batch cultures; Csaszar et al, 2012) have already demonstrated that dilution of exogenous factors can increase expansion efficiency. Additionally, the RepopSig can be combined with FELSK markers as a quality control tool for rapid monitoring of long-term HSC content via qPCR and flow cytometry, respectively. While previous HSC gene signatures such as MolO are biased toward freshly isolated, quiescent HSCs, the RepopSig appears to represent a more general functional HSC signature, capable of identifying cycling as well as cultured HSCs. Our strategy also extends on the growing number of studies that demonstrate the power of linking functional and molecular data to better impart biological meaning behind transcriptomic information (Wilson et al, 2015;Psaila et al, 2016;Shepherd et al, 2018;Shepherd & Kent, 2019).
Ultimately, such approaches will be applied to human HSC expansion by adding combinations of extrinsic self-renewal regulators, utilizing fed-batch negative feedback regulation (Csaszar et al, 2012), or even engineering artificial 3D niches with ECM proteins and functionalized hydrogels (Bai et al, 2019). To exemplify the direct applicability and translatability of mouse HSC expansion profiling to the human system, we also demonstrate that homologs to genes with high RepopSig scores are highly enriched in human HSCs, including a number of genes having previously described associations with human HSC biology, such as Hlf and EPCR, which have recently been reported to also mark expanded human HSCs in culture (Fares et al, 2017;Garg et al, 2019;preprint: Lehnertz et al, 2020). This, in combination with the early indication that F12 PVA-based cultures can modestly expand human HSCs, bodes well for moving these findings rapidly into the human system. Similarly, fed-batch systems are already being applied with novel small ◀ Figure 5. Reporter strategy and RepopSig gene signature reliably mark clones with functional HSCs.
A A selection of clones was transplanted into irradiated recipients using 50% of the cells harvested at day 28 and donor chimerism of mice from clones with > 20% ELSK (red, n = 9) and < 1% ELSK (blue, pooled from 10 clones) are displayed. B Corresponding lineage output of clones as a percentage of donor cells at week 12 post-transplantation with donor chimerism displayed under each bar. C Correlation of the relative gene expression against the ELSK percentage of the clones. Red and blue dots indicate clones that were transplanted in Fig 5C. Pearson correlation, ****P < 0.0001,***P < 0.001, **P < 0.01, *P < 0.05. D Single ESLAM HSCs were cultured for 28 days and 10% of the cultures were analyzed by flow cytometry on day 27. Clones with above 20% (n = 13) and below 1% ELSK cells (n = 15) were analyzed for their relative gene expression of RepopSig genes (both positive and negative markers) ***P < 0.001, **P < 0.01, *P < 0.05. Error bars represent SD. E The correlation between the proportion of Esam + ELSK (%EELSK) and FELSK cells in single-cell clones cultured for 28 days, Pearson correlation, ****P < 0.0001. F Projections of geometric mean scores for MolO (middle) and RepopSig (right) gene signatures onto the previously derived ( Fig 3D)   molecules such as UM171 and SR1 (Fares et al, 2014) to achieve modest levels of expansion. Combining such promising avenues will undoubtedly lead to success in future clinical scale human HSC expansion.

Mice
Fgd5 ZsGreenÁZsGeenr/+ knock in/knock out mice were purchased from Jackson Laboratories and wild-type (WT) mice were either Fgd5 +/+ litter mates or CD45.2 C57BL/6. All transplantation recipients were C57BL/6 W41/W41 -Ly5.1 (W41). All mice were kept in microisolator cages in Central Biomedical Service animal facility of University of Cambridge and University of York, and provided continuously with sterile food, water, and bedding. All mice were kept in specified pathogen-free conditions, and all procedures performed according to the United Kingdom Home Office regulations, in accordance with the Animal Scientific Procedure Act.

F12-based short-term (< 10 days) cultures
For short-term cultures up to 10-days, cells were cultured as above, except 96 well U-bottom plates (Corning) were used and no media changes were performed.
A UMAP representation of mouse HSC transcriptomes (Nestorowa et al, 2016), colored by their cell type. B Boxplot of MolO and RepopSig scores for each cell group with both signatures being able to identify LT-HSCs. C Projections of hibernating (hibHSC) and fetal liver (FL) HSC scRNA-seq profiles onto the single-cell landscape showing the majority of cells in both cases localizing to the LT-HSC region. D MolO and RepopSig scores for FL HSCs and hibHSCs where MolO preferentially associates with quiescent hibHSCs and RepopSig associates with both equally. This is true both when (I) all single cells and (II) excluding cells falling outside the LT-HSC compartment are assessed. E UMAP landscape for unstimulated and SCF-stimulated freshly isolated HSCs and hibHSCs (Oedekoven et al, 2021). 14 of 19 EMBO reports 23: e55502 | 2022 Ó 2022 The Authors number of fluorescent beads (Trucount Control Beads, BD) were added to each well and each sample was back calculated to the proportion of the total that were run through the cytometer. Flow cytometry was performed on an LSRFortessa (BD) with a High Throughput Sampler (BD; for single clone analysis).

Bone marrow transplantation assay
Recipient mice were W41 mice as described previously. Recipient mice were sub-lethally irradiated with a single dose (400 cGy) of Cesium irradiation. All transplantations were performed by intravenous tail vein injection of cell fractions suspended in 200-300 ll PBS using a 29.5G insulin syringe. Repopulation was defined as having > 1% donor chimerism and > 1% contribution to GM at 16 weeks.

Bulk RNA sequencing
RNA was extracted using the Picopure RNA Isolation Kit (Thermo) according to the manufacturer's protocol. Libraries were prepared using the SMARTer Stranded Total RNA-seq Kit v2-Pico Input mammalian (Takara Bio, CA, USA) according to manufacturer's protocol. Quality control (QC) steps were performed using Qubit RNA HS Assay Kit and bioanalyzer. Sequencing was run at the Cancer Research UK Cambridge Institute Genomics core on a Novaseq 6000 (Illumina), using 50 bp paired-end reads. Reads were trimmed using trim_galore (parameters: --paired --quality 30 --clip_R2 3) and aligned to the Mus musculus genome build (mm10) using STAR (default parameters). Gene counts were acquired using HTSeq (parameters: --format = bam --stranded = reverse --type = exon --mode = intersection-nonempty -additional-attr = gene_name). Raw data and processed gene count tables are available via GEO accession number: GSE175400. Raw counts were processed using edgeR (version 3.28.1; McCarthy et al, 2012). First, lowly expressed genes were excluded from downstream analysis. Here, genes with fewer than two libraries expressing a minimum of 1 CPM (counts per million) were considered lowly expressed. Subsequently, read counts were normalized using the trimmed mean of M values (TMM) method (Robinson & Oshlack, 2010). Where there are multiple sequencing runs across an experiment, technical replicates were used to inform batch correction, performed with Limma (version 3.42.2; Ritchie et al, 2015). With little variation between Batch1 and Batch2, batch correction was performed on Batch1 and Batch3, where a significant variation of technical replicates was identified. Log-transformed and batch corrected values were subsequently used for downstream analysis.
Single-Cell RNA sequencing -Smart-Seq2 Data scRNA-seq analysis was performed according to the previously described Smart-seq2 protocol (Picelli et al, 2013). Freshly isolated fetal liver (FL) HSCs and 7-day cultured hibernating HSCs (hibHSC) were deposited into 96-well plates, containing lysis buffer [0.2% Triton X-100 (Sigma), RNAse inhibitor (SUPERase, Thermo Fisher), nuclease-free water (ThermoFisher)]. The Illumina Nextera XT DNA preparation kit was used for library construction. The pooled library (single end, 50 bp reads) was sequenced on the Illumina HiSeq 4000 at the Cancer Research UK Cambridge Institute Genomics core. Raw data and processed gene count tables are available via GEO accession number: GSE175400. Raw reads were aligned to the Mus musculus genome build (mm10) using STAR and read counts were computed using featureCounts. Cells not passing quality control thresholds below were excluded. First, a threshold of mapped reads was set to > 1e5 and < 3e7, with mapped reads comprising nuclear genes, mitochondrial genes and ERCCs. A minimum threshold of 20% for reads mapping to known genes was set, in order to exclude empty wells and dead cells. In addition, the threshold for reads mapping to mitochondrial genes was > 0.075, to ensure a minimum of 7.5% of reads to map to nonmitochondrial genes. Finally, an ERCC cutoff of 5% and a high gene cutoff of 1,800 were selected. Besides newly-generated scRNA-seq data for fetal liver and hibernating HSCs, the following previously published datasets were used: (i) Hematopoietic stem and progenitor compartment and (ii) freshly isolated, hibernating, and stimulated HSCs. All datasets were processed using the Seurat R package (version 4.0.0). Data were normalized using a scaling factor of 10,000 and 7,500 variable features were computed. Data were scaled using default parameters. Gene signature scoring and visualizations were performed using Seurat (version 4.0.0), ggplot2 (version 3.3.3) and native R functions). FL HSC and hibHSC single cells were projected onto the single-cell hematopoietic stem and progenitor landscape using default settings for finding anchors between the reference landscape and query datasets (version 4.0.0). RNA-seq profiles of human stem, progenitor, and mature cell types were retrieved from Xie et al (2019Xie et al ( , 2021. Gene expression profiles were normalized using variance-stabilizing transformation (performed using DESeq2; Love et al, 2014), prior to Repopulation Signature geometric mean scoring. Human homologs for 19 out of 23 Repopulation Signature genes were identified and utilized for gene scoring. The four missing genes included Gm38066, INSYN1, SKINT3, and ZFP532.
Single-cell RNA sequencing-10× genomics data A 30-day expansion culture (as previously described Wilkinson et al, 2020) was collected for single-cell RNA-seq analysis from 12week-old male C57BL/6 mice. HSCs were isolated by FACS as previously described using the following strategy: CD150 + CD34 À/lo c-Kit + Sca1 + Lineage À (Wilkinson et al, 2020). Droplet scRNA-Seq experiment was performed using the 10× Genomics Single Cell 3 0 v3 kit according to manufacturer's protocol. The UMI counts per cell were calculated using the cellranger package (v3.1.0) and downstream analysis was performed using the scanpy package (Wolf et al, 2018). Low quality cells were excluded with following criteria per cell: < 1,500 genes and < 5,000 UMI counts. Putative doublet cells were removed using the scrublet tool (Wolock et al, 2019) (threshold of 0.3). The scRNA-Seq landscape was generated by: cell count log-normalization, selection of 5,000 top variable genes, computation of 40 principal components, nearest neighbor search (k = 10), leiden clustering (resolution = 1), and finding UMAP embedding. To minimize cell cycle effects a set of genes correlated with cell cycle signature were removed from the highly variable gene set (as used previously (Dahlin et al, 2018)) and residual cell-cycle effects were regressed out by using the cell cycle phase as covariate (re-gress_out function). Cell cycle phases were computed using the score_genes_cell_cycle function. Xist and Y-chromosome genes were also excluded. Clusters were manually annotated on markers identified in the literature. Cluster connectivity and putative trajectories were computed using the PAGA method (Wolf et al, 2019).

Statistical analysis
Differential expression was performed using a likelihood ratio test approach. For this purpose, a negative binomial generalized linear model (GLM) was fitted. Multidimensional scaling (MDS) plots were computed using Limma (version 3.42.2). Genes were considered differentially expressed when a LogFC ≥ 2 and FDR < 0.05.
To compute gene ontology (GO) enrichment and KEGG pathway enrichment, gene symbols were converted to Entrez gene identifiers, using the mouse genome annotation database (org.Mm.eg.db, version 3.10.0). GO terms were extracted from the GO annotation database (GO.db, version 3.10.0) and GO term enrichment was computed using the Limma package (version 3.42.2). Biological process GO terms with a P-value < 0.05 were considered enriched. KEGG pathways were extracted from the KEGG annotation database (version 3.2.3) and were also computed using the Limma package (version 3.42.2).
Principal component analysis (PCA) was performed using the PCAtools R package (version 1.2.0). To ensure a Gaussian distribution of gene expression values for PCA computations, lowly expressed genes were removed based on a cumulative cut-off > 40CPM across all samples per gene. During PCA computation, 10% of the most nonvariable genes were excluded from analysis. To identify key genes driving separation of principal components, loadings plots were computed using the top 15% variable genes. Subsequently, a 0.05 cut-off irrespective of directionality was applied to select genes. Pearson correlation coefficients and the respective r 2 values were computed to determine the correlation of transplantation metadata with principal components.
A molecular overlap (MolO) gene signature associated with freshly isolated LT-HSCs was previously described. MolO signature genes which passed the threshold for expressed genes (minimum 1 CPM in at least 2 libraries) were extracted from the dataset. The geometric mean was computed on log-transformed expression values for all MolO genes to derive the MolO score for each sample. A geometric mean was also computed for a novel repopulation gene signature, derived from the loading plots of the PCA.
To identify dominant cell types of each sample library, the scRNA-seq-based cell type recognition tool SingleR (version 1.0.6) was repurposed and applied to the bulk RNA-seq dataset at hand (Aran et al, 2019). Default parameters were used to compute the correlation of each sample against the curated ImmGen reference dataset (Jojic et al, 2013;Aguilar et al, 2020). In particular, subtypes within the broad hematopoietic stem cell compartment were used as reference.
Pathway analysis was performed based on the curated Reactome pathway database, using the ReactomePA tool (version 1.30.0; Yu & He, 2016). Entrez gene identifiers for genes of interest were used as input. A P-value cut-off < 0.05 was applied. Gene Set Enrichment Analysis was performed using the GSEA software (US San Diego and Broad Institute; Subramanian et al, 2005;Mootha et al, 2003). Gene sets for hematopoietic cell types were retried from Chambers et al (2007).
To determine the cell type composition of single HSC-derived clones and deconvolute bulk transcriptomes, the direction of state transition (DoT) score was computed (Kucinski et al, 2020). Differentially expressed genes between (i) PosELSK vs NegELSK, (ii) PosNonELSK vs PosELSK, and (iii) NegNonELSK vs PosELSK were used for computing DoT scores. The previously described scRNAseq data of mouse LK and LSK cells (Dahlin et al, 2018) was used as a reference landscape. The DoT score was computed as described previously. The point of origin was set to a naive stem and progenitor compartment (Figs 3G and Appendix Fig S3I).
Logistic and linear regression models were fitted to curate the repopulation gene signature for a binary repopulation outcome, donor chimerism, and GM contribution. Logistic regression models were fitted using logistf (version 1.24) using Firth's penalized maximum likelihood and alpha = 0.05. Linear models were fitted using native R functions. Coefficients and standard errors for each model were extracted. A signature inclusion cutoff was set to the lower bound of the standard error of the gene with the highest coefficient for each transplantation parameter.

Data availability
The datasets produced in this study are available in the following databases: RNA-Seq data: Gene Expression Omnibus GSE175400 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE175400). Flow cytometry data and code used for statistical analyses are available upon request.

Disclosure and competing interests statement
The authors declare that they have no conflict of interest.