Single Cell Genomics Day

This is a recap from SCGD 2023, which includes:

a very concise list of concepts that catch my eye
my comments in quotes.

Rahul Satija

Single-cell genomics: Recent advances and future directions

in-silico “experiments”
sample size > cell numbers

Think about what matters to you: detecting rare cell types, or extracting information from major cell types (e.g. a perturbation's effect on a major cell type)? Most of my cases would be the latter.

A prior, project-specific investigation of “how” to lower the cell number per sample remains to be done (sample quality and properties play a part).
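A back-of-the-envelope sketch of why sample size beats cell number. It assumes a simple two-level model that is my own simplification, not from the talk: per-sample means vary with between-sample variance `sigma_b2`, and cells vary around their sample mean with variance `sigma_w2`. The variance of a group-level mean is then `sigma_b2 / n_samples + sigma_w2 / (n_samples * n_cells)`, so extra cells only shrink the second term, and the first term sets a floor.

```python
def variance_of_group_mean(n_samples: int, n_cells: int,
                           sigma_b2: float, sigma_w2: float) -> float:
    """Variance of a group-level mean under an assumed two-level (sample, cell) model.

    sigma_b2: between-sample (biological) variance
    sigma_w2: within-sample, cell-to-cell variance
    """
    between = sigma_b2 / n_samples               # shrinks only with more samples
    within = sigma_w2 / (n_samples * n_cells)    # shrinks with samples and with cells
    return between + within


if __name__ == "__main__":
    # Illustrative (made-up) variances; compare doubling cells vs doubling samples.
    sb2, sw2 = 1.0, 4.0
    print("4 samples x 1000 cells:", variance_of_group_mean(4, 1000, sb2, sw2))
    print("8 samples x 500 cells: ", variance_of_group_mean(8, 500, sb2, sw2))
```

With these made-up numbers, going from 4 to 8 samples roughly halves the variance, while going from 500 to 1000 cells per sample barely moves it, because the between-sample term dominates.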

Cell states

Samantha Morris

New genomic technologies to deconstruct and control cell identity

“Dead end” and “reprogramming permissive state”
computation fills the gap between sampling/experimental measurement and the “truth”.

our field exists only because of the limitations of experimental technology.

Sydney Shaffer

Tracing lineages and cell states from metaplasia to cancer

lentivirus doesn’t work in human tissues
barcode in vivo: mitochondrial mutations
- survival advantages? Assumption: the mtDNA mutations are not drivers of cancer cell growth
metaplasia, dysplasia, their source clone, and molecular drivers (confirmed by scRNA-seq & WES)
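A toy sketch of the mitochondrial-barcode idea: cells sharing somatic mtDNA variants are likely clonally related. It assumes a cells × variants matrix of allele fractions (heteroplasmy) has already been called; the 0.05 detection threshold and the Jaccard similarity are illustrative choices, not the speaker's pipeline.

```python
import numpy as np


def binarize_heteroplasmy(af: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """Call a variant present in a cell if its allele fraction exceeds threshold (assumed cutoff)."""
    return af > threshold


def jaccard_similarity(present: np.ndarray) -> np.ndarray:
    """Pairwise Jaccard similarity between cells (rows) based on shared mtDNA variants."""
    p = present.astype(float)
    shared = p @ p.T                                   # variants present in both cells
    n_var = p.sum(axis=1)
    union = n_var[:, None] + n_var[None, :] - shared   # variants present in either cell
    safe_union = np.where(union > 0, union, 1.0)       # avoid 0/0 for cells with no variants
    return np.where(union > 0, shared / safe_union, 0.0)


if __name__ == "__main__":
    # Hypothetical allele fractions: cells 0 and 1 share two variants, cell 2 carries another.
    af = np.array([[0.30, 0.00, 0.20],
                   [0.40, 0.01, 0.25],
                   [0.00, 0.50, 0.00]])
    print(jaccard_similarity(binarize_heteroplasmy(af)))
```

The similarity matrix can then be clustered to propose clones, which is where orthogonal data (scRNA-seq states, WES drivers) come in to check them.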
Atlas

Sten Linnarsson

Cellular diversity of the developing and adult human nervous system
1. dissection-dissection correlation
  
  One can get very good insights from an “atlas” study
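A minimal sketch of a dissection-dissection correlation, assuming a cells × genes count matrix and a per-cell dissection label: sum counts into one pseudobulk profile per dissection, normalize to counts per million, log-transform, and take Pearson correlations between dissections. The normalization choices are mine, not necessarily those of the study.

```python
import numpy as np


def pseudobulk(counts: np.ndarray, labels: np.ndarray):
    """Sum counts over cells sharing a label; returns (label order, labels x genes matrix)."""
    uniq = np.unique(labels)
    profiles = np.vstack([counts[labels == u].sum(axis=0) for u in uniq])
    return uniq, profiles


def dissection_correlation(counts: np.ndarray, dissection: np.ndarray):
    """Pearson correlation between log-CPM pseudobulk profiles of each dissection."""
    uniq, prof = pseudobulk(counts, dissection)
    cpm = prof / prof.sum(axis=1, keepdims=True) * 1e6   # depth-normalize each dissection
    return uniq, np.corrcoef(np.log1p(cpm))              # rows = dissections
```

Hierarchically clustering the resulting matrix is a quick way to see which dissected regions group together.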

Gaussian mixture model clusters and Topics
Representative sampling, based on the factors contributing most to phenotypes
- e.g. sex: if you really want to dissect it, you need large sample sizes. If you don't have enough samples, there is no point in a sex-specific analysis
  
  use prior knowledge of the factors contributing to phenotypes to guide study design before initiating a project
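To make “need large sample sizes” concrete, here is the standard normal-approximation sample size for comparing two groups (e.g. male vs female donors) with a two-sided test. The effect size `d` (difference in means divided by the between-donor standard deviation) is an assumption you must supply from prior knowledge; this is generic statistics, not a calculation from the talk.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Donors per group for a two-sided, two-sample comparison (normal approximation).

    n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2, rounded up.
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


if __name__ == "__main__":
    for d in (1.0, 0.5, 0.25):
        print(f"effect size {d}: {n_per_group(d)} donors per group")
```

This gives about 16, 63 and 252 donors per group: halving the effect size roughly quadruples the requirement, which is why a sex-specific analysis on a handful of donors is rarely informative.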

Jay Shendure

Global views of mammalian development, from zygote to pup

Mice: bounded, reproducible, accessible
Whole embryos flash frozen
dynamic integration
nearest neighbors mostly come from the same cell type (at different time points)

Does this raise a question about the fluidity of, or cutoffs in, time within a cell-type ontology?
drastic changes at birth; C-section pups look quite different

birth itself is a dramatic process
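The nearest-neighbor observation above can be quantified with a simple neighbor-composition score. A sketch, assuming an embedding matrix `X` (cells × dimensions) plus per-cell cell-type and time-point labels; it uses brute-force Euclidean distances, so it is only meant for small illustrative datasets.

```python
import numpy as np


def neighbour_composition(X, cell_type, time_point, k=5):
    """For each cell, among its k nearest neighbours (excluding itself), return:
    - the fraction sharing its cell type
    - the fraction sharing its cell type but sampled at a different time point
    """
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                                 # a cell is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]                           # indices of k nearest neighbours
    same_type = cell_type[nn] == cell_type[:, None]
    other_time = time_point[nn] != time_point[:, None]
    return same_type.mean(axis=1), (same_type & other_time).mean(axis=1)
```

A high second score means neighbors are the same cell type drawn from other time points, which is exactly the situation that blurs any hard temporal cutoff.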

Spatial

Sanja Vickovic

Spatial host-microbiome sequencing

compromise between resolution and throughput

integrate/cross-compare the results from techs with different resolution-throughput
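One simple way to cross-compare technologies with different resolution is to coarsen the higher-resolution data into pseudo-spots. A sketch, assuming single-cell (or bead) coordinates and a counts matrix; square bins are a simplification (real spot-based arrays use a different layout), and `spot_size` is in the same units as the coordinates.

```python
import numpy as np


def bin_to_pseudospots(coords: np.ndarray, counts: np.ndarray, spot_size: float):
    """Sum per-cell counts into square bins of side `spot_size`.

    Returns (grid indices of occupied bins, bins x genes summed counts).
    """
    grid = np.floor(coords / spot_size).astype(int)              # bin index for each cell
    keys, inverse = np.unique(grid, axis=0, return_inverse=True)
    inverse = np.asarray(inverse).ravel()                        # 1-D bin id per cell
    spots = np.zeros((len(keys), counts.shape[1]), dtype=counts.dtype)
    np.add.at(spots, inverse, counts)                            # unbuffered sum per bin
    return keys, spots
```

Because you know which cells fall into each pseudo-spot, this also yields ground-truth cell-type mixtures, which is useful for benchmarking spot deconvolution.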

DataInno: in-silico tissue generators
SHM-seq: host-microbiome

how to dissect mutual influences?
communication or just proximity? Computationally, more work is needed on live tissues.
subcellular spatial information: toward higher resolution

Jackson Weir

Slide-tags: A new methodology for generating spatially localized single cell genomics data

single cell, quality ≈ snRNA-seq
tonsil: immune, germinal center reaction, spatially confined cell-cell interactions
- receptor-ligand in two (not so) proximal clusters
dedifferentiation in melanoma tumors and the TF motifs associated with it

They only did cluster-level analysis, but this concept is one insight we would want from “trajectory” analysis.
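A minimal cluster-level receptor-ligand score, roughly the kind of analysis noted above: multiply the mean ligand expression in a sender cluster by the mean receptor expression in a receiver cluster, and compare against a null from shuffling cluster labels. The product-of-means score is one of several conventions used by existing tools; gene and cluster names in the example are hypothetical.

```python
import numpy as np


def lr_score(expr, clusters, genes, ligand, receptor, sender, receiver):
    """Mean ligand expression in `sender` times mean receptor expression in `receiver`."""
    col = {g: i for i, g in enumerate(genes)}
    lig = expr[clusters == sender, col[ligand]].mean()
    rec = expr[clusters == receiver, col[receptor]].mean()
    return lig * rec


def lr_permutation_pvalue(expr, clusters, genes, ligand, receptor, sender, receiver,
                          n_perm=1000, seed=0):
    """One-sided p-value: how often shuffled cluster labels give a score >= the observed one."""
    rng = np.random.default_rng(seed)
    observed = lr_score(expr, clusters, genes, ligand, receptor, sender, receiver)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(clusters)
        hits += int(lr_score(expr, shuffled, genes, ligand, receptor, sender, receiver) >= observed)
    return (hits + 1) / (n_perm + 1)
```

With spatially resolved single cells (as in Slide-tags), the same score could be restricted to cell pairs within a distance cutoff instead of whole clusters.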

DNA perturbation

Brian Cleary

Compressed Perturb-seq: highly efficient screens for regulatory circuits using random composite perturbations

assumption: perturbations affect a small number of co-regulated modules
from prior knowledge of the sparsity of (nonzero) perturbation effects, compress the number of samples to O(k log n) (k nonzero effects among n perturbations), then do sparse inference with methods such as LASSO
composite sample: $cs_1 = \sum_{i=1}^{s} w_{1,i} x_i$ (weighted sum of the outcomes of a subset of s conventional samples)
- Non-linearity (genetic interactions), complexity: compression theoretically helps with this. But in their 600-gene study, there were still too many pairs ($\binom{600}{2} = 179{,}700$) to find any significant one
Methods:
1. cell pooling (multiple cells per droplet), resulting in droplet level observations in matrices
2. Guide pooling (multiple gRNAs per cell), resulting in (target) gene level …
3. Decompress: sparse factorization (sparse PCA) on the expression count matrix (droplets by genes), sparse recovery (LASSO) on the perturbation design matrix (droplets by sgRNAs)
Evaluation:
- 4-20x reduced costs
- Guide pooling more effective in degree of overloading
biological analysis: cis-, trans-eQTL
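A toy sketch of the compressed-sensing logic in these notes (not the authors' pipeline, which combines sparse factorization with LASSO): simulate n perturbations with only k nonzero effects, build m < n composite samples where each pools s perturbations with random positive weights, i.e. $cs_j = \sum_i w_{j,i} x_i$, and recover the sparse effects with LASSO solved by iterative soft-thresholding (ISTA, a generic solver swapped in here). Sizes, weights and the penalty `lam` are arbitrary choices for illustration.

```python
import numpy as np


def make_composite_design(m, n, s, rng):
    """m composite samples, each pooling s of n perturbations with random positive weights."""
    W = np.zeros((m, n))
    for j in range(m):
        idx = rng.choice(n, size=s, replace=False)
        W[j, idx] = rng.uniform(0.5, 1.5, size=s)
    return W


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(W, y, lam, n_iter=5000):
    """Minimize 0.5 * ||y - W x||^2 + lam * ||x||_1 by iterative soft-thresholding."""
    step = 1.0 / np.linalg.norm(W, 2) ** 2      # 1 / Lipschitz constant of the gradient
    x = np.zeros(W.shape[1])
    for _ in range(n_iter):
        grad = W.T @ (W @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, s = 100, 60, 15                      # 100 perturbations measured in 60 composites
    x_true = np.zeros(n)
    x_true[[5, 37, 80]] = [2.0, -1.5, 3.0]     # k = 3 nonzero effects
    W = make_composite_design(m, n, s, rng)
    y = W @ x_true                             # noise-free composite measurements
    x_hat = lasso_ista(W, y, lam=0.1)
    print("largest recovered effects at:", sorted(np.argsort(-np.abs(x_hat))[:3].tolist()))
```

The point is the ratio: 60 measurements for 100 perturbations, which only works because most effects are assumed to be zero.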

Helen Kang

Variant-to-Gene-to-Program: an approach to trace the path between GWAS variants and common disease

Variant-Gene & Gene-Program: intersection

Why a simple intersection? Is there a softer association analysis?
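To make the intersection-vs-soft-association question concrete, a sketch contrasting the two. It assumes per-gene variant-to-gene scores and per-gene program loadings are already available as dictionaries; the thresholds, the product score and the permutation null are my own illustrative choices, not the V2G2P method.

```python
import random


def hard_intersection(v2g, program, v2g_min=0.5, load_min=0.5):
    """Genes passing a threshold in both the variant-to-gene and the gene-to-program step."""
    return {g for g, s in v2g.items() if s >= v2g_min and program.get(g, 0.0) >= load_min}


def soft_score(v2g, program):
    """Sum over genes of (variant-to-gene score x program loading); nothing is thresholded."""
    return sum(s * program.get(g, 0.0) for g, s in v2g.items())


def soft_score_pvalue(v2g, program, n_perm=1000, seed=0):
    """Permutation p-value: shuffle loadings across genes and recompute the soft score."""
    rng = random.Random(seed)
    genes, loads = list(program), list(program.values())
    observed = soft_score(v2g, program)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(loads)
        hits += soft_score(v2g, dict(zip(genes, loads))) >= observed
    return (hits + 1) / (n_perm + 1)
```

For example, with hypothetical genes A, B, C where B has a moderate V2G score (0.4) but a high program loading (0.9), the hard intersection drops B while the soft score still counts it.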

post-transcriptional process

Madeline Kowalski

PerturbSci-Kinetics: Characterizing heterogeneity in post-transcriptional regulation with single-cell sequencing

with info about RNA/degradation/production
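For reference, the textbook first-order kinetics behind labeling-based rate estimates, assuming metabolic labeling (e.g. 4sU) for time $t$ and steady-state expression. With constant synthesis rate $\alpha$ and degradation rate $k$, the labeled (new) fraction after time $t$ is $f = 1 - e^{-kt}$, so $k = -\ln(1 - f)/t$, the half-life is $\ln 2 / k$, and at steady state $\alpha = k \cdot \text{total}$. This is a generic sketch, not the PerturbSci-Kinetics model itself.

```python
import math


def degradation_rate(frac_new: float, label_time: float) -> float:
    """k from the new-RNA fraction f after labeling time t, using f = 1 - exp(-k t)."""
    if not 0.0 <= frac_new < 1.0:
        raise ValueError("fraction of new RNA must be in [0, 1)")
    return -math.log(1.0 - frac_new) / label_time


def synthesis_rate(total: float, frac_new: float, label_time: float) -> float:
    """At steady state, synthesis alpha = k * total."""
    return degradation_rate(frac_new, label_time) * total


def half_life(k: float) -> float:
    """Half-life of a first-order decay with rate k."""
    return math.log(2.0) / k
```

For example, if half of a transcript's molecules are new after a 2 h label, its half-life is 2 h.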

experimental data that I should use to evaluate my metrics built from RNA-seq data
3’UTR elongation or shortening
APARENT-PERTURB to predict sequence (motif and regions) ~ CPA regulators
VASA-seq, EasySci, Parse Biosciences, Smart-seq3, long-read scRNA-seq: alternative splicing
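A simplified way to quantify 3’UTR lengthening or shortening, in the spirit of the distal poly(A) site usage index (PDUI) used by tools such as DaPars: take reads assigned to the proximal and distal polyadenylation sites of a gene and compute the distal fraction; a positive change after perturbation means lengthening. Assigning reads to sites is assumed to happen upstream, and the counts below are made up.

```python
def pdui(proximal_reads: float, distal_reads: float) -> float:
    """Distal poly(A) site usage: fraction of reads at the distal site (NaN if no reads)."""
    total = proximal_reads + distal_reads
    return distal_reads / total if total > 0 else float("nan")


def delta_pdui(control: tuple, perturbed: tuple) -> float:
    """> 0: 3'UTR lengthening after perturbation; < 0: shortening."""
    return pdui(*perturbed) - pdui(*control)


if __name__ == "__main__":
    # Hypothetical (proximal, distal) read counts for one gene
    print(delta_pdui(control=(30, 10), perturbed=(10, 30)))  # 0.5 -> lengthening
```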

John Blair

Phospho-seq: Reconstructing regulatory networks from multimodal single cell data

Could serve as experimental data to evaluate signaling regulatory network inference, but it is better to build directly on data that measure the matched subjects (proteins) per se. I.e., these experimental advances render the computational methods designed before obsolete.

TF-focused association network: Pando; in silico perturbation by CellOracle
TF binding: in silico ChIP-seq by Argelaguet, et al., 2022
protein level: TF-protein by NEAT-seq
Bridge integration: high-quality RNA from unfixed scRNA-seq, fixed RNA+ATAC as the bridge, final trimodal dataset with ATAC+Protein

Money!

Scalable strategies

Gesmira Molla

Strategies for massively scalable single-cell analysis

Aggregate:
- MetaCell-2: Parallel and dynamic
- SEACells: similarity from an adaptive Gaussian kernel, then archetypal analysis decomposes the kernel. Metacells are good for correlating ATAC-seq peaks with genes, since single-cell ATAC-seq is quite sparse
  
  this kernel and archetypal analysis might have other uses
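A sketch of an adaptive-bandwidth Gaussian kernel of the kind SEACells starts from: each cell's bandwidth $\sigma_i$ is its distance to its k-th nearest neighbor, so dense regions get narrow kernels and sparse regions wide ones, and $K_{ij} = \exp(-d_{ij}^2 / (\sigma_i \sigma_j))$. This is a dense, brute-force version for illustration; as I understand it, SEACells computes the kernel on a kNN graph of a low-dimensional embedding, and the archetypal-analysis step that follows is not shown.

```python
import numpy as np


def adaptive_gaussian_kernel(X: np.ndarray, k: int = 3) -> np.ndarray:
    """K[i, j] = exp(-d_ij^2 / (sigma_i * sigma_j)), sigma_i = distance to the k-th neighbour."""
    X = np.asarray(X, dtype=float)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))  # pairwise distances
    sigma = np.sort(d, axis=1)[:, k]          # column 0 is the cell itself (distance 0)
    sigma = np.maximum(sigma, 1e-12)          # guard against duplicated cells
    return np.exp(-(d ** 2) / (sigma[:, None] * sigma[None, :]))
```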
Sketch (sampling)
- geometrically even coverage of the data
Stream
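A simplified sketch of the “geometrically even” sampling idea: cover the embedding with equal-size grid boxes and sample round-robin, one cell per occupied box per pass, so rare but distinct regions are not drowned out by abundant ones. This is a plain grid version in the spirit of geometric sketching (geosketch), not that tool's actual covering algorithm; `n_bins` is an arbitrary resolution parameter.

```python
import numpy as np


def grid_sketch(X: np.ndarray, n_sample: int, n_bins: int = 10, seed: int = 0) -> np.ndarray:
    """Return indices of up to n_sample cells, spread evenly across occupied grid boxes."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)                      # avoid division by zero
    cell_box = np.minimum(((X - lo) / span * n_bins).astype(int), n_bins - 1)
    _, box_id = np.unique(cell_box, axis=0, return_inverse=True)
    box_id = np.asarray(box_id).ravel()
    # One shuffled pool of cell indices per occupied box, boxes visited in random order
    pools = [list(rng.permutation(np.flatnonzero(box_id == b)))
             for b in rng.permutation(int(box_id.max()) + 1)]
    chosen = []
    while len(chosen) < n_sample and any(pools):
        for pool in pools:                                      # one cell per box per pass
            if pool and len(chosen) < n_sample:
                chosen.append(int(pool.pop()))
    return np.array(chosen, dtype=int)
```

With uniform random sampling, 40 draws from 1,000 abundant and 10 rare cells would be expected to include well under one rare cell; the grid version keeps several.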
Then why bother with massive datasets?
1. computational sampling is not the same as experimental sampling
2. the pace of technological advances is just not matched in this era
Confirms that samples > cells

How true is the homogeneity assumption (within a metacell)? Would this method change the intra-“cell type” heterogeneity?

Why we must put together cell types