on bulk and pseudo-bulk

practical considerations from personal experiences

Posted by brainfo on July 7, 2025

This topic was also discussed in a previous post.

A preprint questioning pseudo-bulk fidelity

Main results from this preprint:

  • Most of the variance is between pseudo-bulk and bulk; the biological interpretation of that variance:
    • Pseudo-bulk is enriched for intracellular transcripts
    • Bulk carries more tissue-relevant transcripts

Following their results, the major question would be whether the variance from the “clinical” conditions is replicated within the pseudo-bulk data and within the bulk data. From the data they analyzed, it could be similar (PC2).
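One way to check which principal component tracks technology and which tracks the clinical condition is sketched below. This is a toy illustration on simulated data, not the preprint's analysis: the function names (`pca_scores`, `r_squared_with_label`), the effect sizes and the balanced 12-sample design are all assumptions made for the example.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA via SVD on a (samples x genes) matrix.
    Returns the sample scores and the fraction of variance per component."""
    Xc = X - X.mean(axis=0)                      # center each gene
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

def r_squared_with_label(score, label):
    """Squared Pearson correlation between one PC and a 0/1 label."""
    r = np.corrcoef(score, label)[0, 1]
    return float(r ** 2)

# Simulated design (assumption): 6 bulk + 6 pseudo-bulk samples,
# each half split 3 vs 3 between two clinical groups.
rng = np.random.default_rng(1)
n_genes = 100
tech = np.array([0] * 6 + [1] * 6)               # 0 = bulk, 1 = pseudo-bulk
clinic = np.array([0, 0, 0, 1, 1, 1] * 2)        # 0 = control, 1 = case

tech_effect = rng.normal(0, 3.0, n_genes)        # large technology effect
clinic_effect = rng.normal(0, 1.0, n_genes)      # smaller clinical effect
X = (np.outer(tech, tech_effect)
     + np.outer(clinic, clinic_effect)
     + rng.normal(0, 0.3, (12, n_genes)))        # residual noise

scores, explained = pca_scores(X, n_components=2)
for i in range(2):
    print(f"PC{i + 1}: variance explained = {explained[i]:.2f}, "
          f"R2 with tech = {r_squared_with_label(scores[:, i], tech):.2f}, "
          f"R2 with clinic = {r_squared_with_label(scores[:, i], clinic):.2f}")
```

With real data, `X` would be a normalized expression matrix with bulk and pseudo-bulk samples stacked on shared genes, and the two labels would come from the sample metadata.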

Personal experiences

As for the question of which to trust: in reality, there are other factors (besides (ps)bulk and clinical condition) to be concerned about.

  • The standard of the sampling, sample preparation and library construction experiments
  • The sequencing platforms

Apart from the question of similarity, there is another question: how dissimilar is too dissimilar?

Case 1, snRNA-seq pseudo-bulk to standardized public bulk RNA-seq

Not matched samples

Pearson correlation $\approx 0.80$. Our clinical groups have only minor effects, so their correlations to the public control are similar to each other.
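For reference, a minimal sketch of how such a correlation can be computed: sum the raw counts of all nuclei per sample into a pseudo-bulk profile, normalize, and correlate with a bulk profile over shared genes. The data here are simulated, and the helper names (`pseudobulk`, `log_cpm`, `pearson`) and the log2(CPM + 1) normalization are assumptions for illustration, not necessarily what was used for the number above.

```python
import numpy as np

def pseudobulk(counts, sample_ids):
    """Sum raw counts over all nuclei belonging to each sample.
    counts: (n_nuclei, n_genes) array; sample_ids: one label per nucleus.
    Returns the sorted sample labels and a (n_samples, n_genes) array."""
    sample_ids = np.asarray(sample_ids)
    samples = np.unique(sample_ids)
    summed = np.vstack([counts[sample_ids == s].sum(axis=0) for s in samples])
    return samples, summed

def log_cpm(counts):
    """log2(counts per million + 1), computed per sample (last axis)."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=-1, keepdims=True)
    return np.log2(counts / lib_size * 1e6 + 1.0)

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    return float(np.corrcoef(x, y)[0, 1])

# Simulated data (assumption): 300 nuclei from 3 donors and one bulk
# reference, all drawn around the same underlying expression profile.
rng = np.random.default_rng(0)
n_genes = 200
base = rng.gamma(shape=2.0, scale=50.0, size=n_genes)
nuclei = rng.poisson(base / 20.0, size=(300, n_genes))
sample_ids = np.repeat(["donor_a", "donor_b", "donor_c"], 100)
bulk_ref = rng.poisson(base * 10.0)

samples, pb = pseudobulk(nuclei, sample_ids)
pb_log = log_cpm(pb)
bulk_log = log_cpm(bulk_ref)
for s, profile in zip(samples, pb_log):
    print(s, round(pearson(profile, bulk_log), 3))
```

On real data the gene axis of both matrices has to be restricted to the shared genes, in the same order, before correlating.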

Case 2, matched snRNA-seq and bulk RNA-seq (Prime-seq)

In this case, the bulk RNA-seq failed to capture clinically explainable DEGs. The DEGs from comparing clinical groups within any cell type from snRNA-seq (at the nuclei level or the pseudo-bulk level) and those from bulk RNA-seq had only one overlap, in one cell type. We deemed snRNA-seq relatively better here at conveying clinical variation, because of:

  • The sampling issue: the sampling sites varied a lot between human donors. Though both the sn- and bulk data had the same issue, the variation in the sn- data might be distributed at the nuclei level, and not be fully linear or parallel to other factors.
  • The experimental standards: here the samples subjected to bulk RNA-seq were not washed (to remove blood) as required, whereas the sn- protocol has its own washing step.
  • The library construction procedure and sequencing platform: my lab has used standard library construction, Smart-seq3 mini-bulk, or Prime-seq on a NovaSeq 6000, and also Prime-seq on an MGI G400. The resulting read QC and biological findings (PCA, DEGs) varied massively. This year the lab relocated, and many results could not be replicated, which might trace back to the $H_2O$ being used. So, could the difference between pseudo-bulk from sc- data and bulk observed in the preprint be attributed to technical differences other than single cell versus bulk?

Case 3, how much dissimilarity means infidelity

Should this be judged relative to the intra-(ps)bulk similarity?

In one case, we compared the bulk of a cell line under two culture conditions with the corresponding cell type from a tissue pseudo-bulk. The Pearson correlation within bulk between the two conditions was $\approx 0.88$ (Spearman $\approx 0.92$), while the correlation between pseudo-bulk and bulk was $\approx 0.30$. We concluded that the cell line could not represent the cell type in that specific tissue.
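The comparison above can be sketched as follows: compute the intra-bulk correlation as a baseline and read the bulk vs pseudo-bulk correlation against it. The profiles are simulated and only loosely tuned to resemble the situation described; the names (`average_ranks`, `spearman`, `cond_a`, `cond_b`, `tissue_pb`) are assumptions for the example, and no cutoff for "too dissimilar" is implied.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1..n, giving tied values their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Replace the ranks inside each group of ties by their mean.
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse, weights=ranks)
    return sums[inverse] / counts[inverse]

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Simulated log-expression profiles (assumption): two culture conditions of
# one cell line share most of their signal; the tissue pseudo-bulk shares little.
rng = np.random.default_rng(2)
n_genes = 300
line_base = rng.normal(5.0, 2.0, n_genes)
cond_a = line_base + rng.normal(0.0, 0.7, n_genes)
cond_b = line_base + rng.normal(0.0, 0.7, n_genes)
tissue_pb = 0.3 * line_base + rng.normal(5.0, 2.0, n_genes)

intra = pearson(cond_a, cond_b)      # baseline: bulk vs bulk
cross = pearson(cond_a, tissue_pb)   # bulk vs tissue pseudo-bulk
print(f"intra-bulk  Pearson = {intra:.2f}, Spearman = {spearman(cond_a, cond_b):.2f}")
print(f"cross       Pearson = {cross:.2f}, Spearman = {spearman(cond_a, tissue_pb):.2f}")
```

Reporting both numbers side by side keeps the cross-technology correlation anchored to what two samples of the same technology achieve under a known perturbation.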