Chapter 1 Preface

This is a minimal benchmark for analyzing snRNA-seq data, especially with scverse and bioconductor communities. It will be continuously updated throughout my PhD years.

From my current experience:

  • bioconductor has more solid basement of biological (annotation) data
  • questions and methods to tackle them (except for dl) are usually first developed in R, then the python community would propose competing methods. Since seruat comes earlier than scanpy
  • the scanpy community evolves fast but somehow the developers are not considering better integrating each other’s new features
  • packages in R is in general easier to use, though I personally is more used to python and could adapt more myself
  • R utils are too easily to crash with the same hardware facility, same data size and same kind of job

1.1 Comments on snRNA-seq vs scRNA-seq

advantages of snRNA-seq: does not require the preservation of cellular integrity during sample preparation, especially dissociation.

disadvantages of snRNA-seq: loss of biological signal for genes with little nuclear localization.

Interpretation of DE results:

  1. assumption: nuclear abundances are a good proxy for the overall expression profile.
  2. considerations:
    • transcripts for strongly expressed genes might localize to the cytoplasm for efficient translation and subsequently be lost upon stripping.
    • genes with the same overall expression but differences in the rate of nuclear export may appear to be differentially expressed between clusters.
    • In the most pathological case, higher snRNA-seq abundances may indicate nuclear sequestration of transcripts for protein-coding genes and reduced activity of the relevant biological process, contrary to the usual interpretation of the effect of upregulation.