API
Import as:
Submodules
You can import everything via the top-level too, e.g., sps.load_anndata
, sps.subset_by_values
, sps.aggregate_st_by_cell_class
, etc.
IO
load_anndata
Load an AnnData object from an .h5ad file.
- path: path to .h5ad
- backed: pass 'r' for backed mode, or None for in-memory
Source code in spstencil/io.py
save_anndata
Save an AnnData object to .h5ad.
- compression: None|'gzip'|'lzf' etc.
Source code in spstencil/io.py
- What: Read/write AnnData
.h5ad
. - Key functions:
load_anndata(path, backed=None)
save_anndata(adata, path, compression="gzip")
- Example:
Generic ops
aggregate_cell_counts
aggregate_cell_counts(adata: AnnData, cell_type_key: str, group_key: str | None = None, normalize: bool = True) -> pd.DataFrame
Aggregate counts of cell types, optionally per group.
Returns a DataFrame of percentages if normalize else counts. - If group_key is None, returns one-row DataFrame across all cells. - Expects categorical/string columns in obs.
Source code in spstencil/ops.py
save_split_by_key
save_split_by_key(adata: AnnData, key: str, out_dir: str | Path, compression: str | None = 'gzip') -> Dict[str, Path]
Split by obs[key] and save each part to out_dir/
Source code in spstencil/ops.py
split_by_key
Split AnnData into a dict keyed by unique values in obs[key].
Source code in spstencil/ops.py
subset_by_values
subset_by_values(adata: AnnData, key: str, include_values: Sequence[str] | None = None, exclude_values: Sequence[str] | None = None) -> ad.AnnData
Return a new AnnData subset by obs[key] membership.
At least one of include_values or exclude_values must be provided.
Source code in spstencil/ops.py
- What: Generic AnnData operations on
obs
. - Key functions:
subset_by_values(adata, key, include_values=None, exclude_values=None)
split_by_key(adata, key) -> dict[str, AnnData]
aggregate_cell_counts(adata, cell_type_key, group_key=None, normalize=True)
save_split_by_key(adata, key, out_dir, compression='gzip') -> dict[str, Path]
- Example:
Spatial tools
aggregate_hdf5_by_cell_class
aggregate_hdf5_by_cell_class(hdf5_path: str | Path, *, tissue_tsv: Optional[str | Path] = None, group_by_tissue: bool = False, normalize: bool = True, swap_axes: bool = False) -> pd.DataFrame
Aggregate counts/percentages of HDF5 cell classes overall or by tissue type.
If group_by_tissue is True, requires tissue_tsv to map cells to tissue classes.
Source code in spstencil/st.py
aggregate_st_by_cell_class
aggregate_st_by_cell_class(adata: AnnData, cell_hdf5_path: str | Path, *, normalize: bool = True, group_key: Optional[str] = None, tissue_df: Optional[DataFrame] = None, swap_axes: bool = False) -> ad.AnnData
Aggregate ST expression matrix by cell-class predictions from cell classifier HDF5.
Strategy: - Map ST spots to tile indices (t_x, t_y) using their pixel coords. - Map cell predictions to tile indices using their pixel coords from HDF5. - Join on tile indices to assign cell classes to spots. - Aggregate gene expression by summing across spots with same cell class. - Optionally normalize by total expression per cell class.
Returns AnnData with cell classes as obs and aggregated expression matrix.
Source code in spstencil/st.py
521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 |
|
annotate_tissue_to_spots
annotate_tissue_to_spots(adata: AnnData, tissue_df: DataFrame, *, swap_axes: bool = False, column_name: str = 'tissue_type', fill_value: str = 'Unlabelled') -> ad.AnnData
Add a column in obs with tissue class by mapping each spot/cell to tile (x,y).
The mapping uses obsm['spatial'] pixel coords when available.
Source code in spstencil/st.py
assign_tiles_from_coords
assign_tiles_from_coords(coords: ndarray, tissue_df: DataFrame, swap_axes: bool = False) -> np.ndarray
Map pixel coords to tile (x, y) indices defined by tissue_preds.tsv grid.
coords should be in [x, y] format to match tissue_df columns. Returns an array of shape (N, 2) of int tile indices [tile_x, tile_y].
Source code in spstencil/st.py
compute_keep_mask
compute_keep_mask(cell_coords: ndarray, tissue_df: DataFrame, exclude_classes: Sequence[str], swap_axes: bool = False) -> np.ndarray
Compute a boolean mask of cells to keep by excluding tiles of given classes.
- cell_coords: array shape (N, 2) with [x, y] pixel coordinates
- tissue_df: DataFrame with columns [x, y, class]
- exclude_classes: classes to exclude (e.g., ['Chorion'])
- swap_axes: if True, swap (max_x, max_y) estimation to handle axis-swapped cases
Source code in spstencil/st.py
crop_cell_data
crop_cell_data(cell_hdf5_path: str | Path, bounds: Dict[str, Tuple[int, int]], output_path: str | Path) -> int
Crop cell HDF5 data based on coordinate bounds.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cell_hdf5_path
|
str | Path
|
Path to cell HDF5 file |
required |
bounds
|
Dict[str, Tuple[int, int]]
|
Dict with 'x' and 'y' bounds as (min, max) tuples |
required |
output_path
|
str | Path
|
Path for output cropped HDF5 file |
required |
Returns:
Type | Description |
---|---|
int
|
Number of cells kept after cropping |
Source code in spstencil/st.py
crop_spatial_data
Crop spatial data based on coordinate bounds.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata
|
AnnData
|
AnnData object with spatial coordinates |
required |
bounds
|
Dict[str, Tuple[int, int]]
|
Dict with 'x' and 'y' bounds as (min, max) tuples |
required |
Returns:
Type | Description |
---|---|
AnnData
|
Cropped AnnData object |
Source code in spstencil/st.py
crop_tissue_data
crop_tissue_data(tissue_tsv_path: str | Path, bounds: Dict[str, Tuple[int, int]], output_path: str | Path, bin_size: int = 100) -> int
Crop tissue TSV data based on coordinate bounds.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tissue_tsv_path
|
str | Path
|
Path to tissue TSV file |
required |
bounds
|
Dict[str, Tuple[int, int]]
|
Dict with 'x' and 'y' bounds as (min, max) tuples |
required |
output_path
|
str | Path
|
Path for output cropped TSV file |
required |
bin_size
|
int
|
Bin size used for coordinate conversion |
100
|
Returns:
Type | Description |
---|---|
int
|
Number of tissue tiles kept after cropping |
Source code in spstencil/st.py
determine_crop_bounds
determine_crop_bounds(spatial: Dict[str, ndarray], bin_size: int = 100, default_bounds: Optional[Dict[str, Tuple[int, int]]] = None) -> Dict[str, Tuple[int, int]]
Determine cropping bounds based on gaps in spatial data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spatial
|
Dict[str, ndarray]
|
Dict with 'x' and 'y' coordinate arrays |
required |
bin_size
|
int
|
Binning size for gap detection |
100
|
default_bounds
|
Optional[Dict[str, Tuple[int, int]]]
|
Default bounds to use if no gaps near edges |
None
|
Returns:
Type | Description |
---|---|
Dict[str, Tuple[int, int]]
|
Dict with 'x' and 'y' bounds as (min, max) tuples |
Source code in spstencil/st.py
filter_cell_predictions_by_tissue
filter_cell_predictions_by_tissue(hdf5_path: str | Path, tissue_tsv: str | Path, exclude_classes: Sequence[str], swap_axes: bool = False) -> Tuple[np.ndarray, np.ndarray, np.ndarray]
Load HDF5 and tissue TSV, compute keep mask, and return (preds_kept, coords_kept, keep_mask).
Source code in spstencil/st.py
get_spatial_coords
Return Nx2 pixel coordinates for ST spots/cells in [x, y] format.
Prefers obsm['spatial'] (Scanpy/Visium convention). If absent, tries obs columns 'x', 'y' or 'px', 'py' or 'array_row', 'array_col' (converted to pixels via max range).
Source code in spstencil/st.py
is_continuous
load_cell_predictions
Read predictions and coords arrays from an HDF5 produced by cell classifier.
Expects datasets named 'predictions' and 'coords'. Returns (predictions, coords).
Source code in spstencil/st.py
load_tissue_predictions
Load tissue tile predictions with columns [x, y, class].
Source code in spstencil/st.py
map_cells_to_tissue_classes
map_cells_to_tissue_classes(cell_coords: ndarray, tissue_df: DataFrame, *, swap_axes: bool = False, fill_value: str = 'Unlabelled') -> np.ndarray
Return an array of tissue class labels per cell by tiling coords with tissue_preds grid.
Source code in spstencil/st.py
missing_ranges
Find gaps in sequence as (start, end) tuples (exclusive end).
Source code in spstencil/st.py
split_hdf5_by_tissue
split_hdf5_by_tissue(hdf5_path: str | Path, tissue_tsv: str | Path, *, swap_axes: bool = False) -> Dict[str, np.ndarray]
Return dict mapping tissue class -> indices of cells in that tissue tile.
Source code in spstencil/st.py
split_st_by_tissue
split_st_by_tissue(adata: AnnData, tissue_df: DataFrame, swap_axes: bool = False, column_name: str = 'tissue_type') -> Dict[str, ad.AnnData]
Split AnnData into dict keyed by tissue class.
Source code in spstencil/st.py
subset_st_by_tissue
subset_st_by_tissue(adata: AnnData, tissue_df: DataFrame, include: Optional[Sequence[str]] = None, exclude: Optional[Sequence[str]] = None, swap_axes: bool = False, column_name: str = 'tissue_type') -> ad.AnnData
Return AnnData subset where mapped tissue_type is included/excluded.
If include is provided, only those classes are kept. If exclude is provided, those are dropped.
Source code in spstencil/st.py
validate_coordinate_compatibility
validate_coordinate_compatibility(h5ad_coords: ndarray, hdf5_coords: Optional[ndarray] = None, tissue_coords: Optional[ndarray] = None, adata: Optional[AnnData] = None, tolerance_factor: float = 10.0) -> Dict[str, any]
Validate coordinate system compatibility between files.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
h5ad_coords
|
ndarray
|
Coordinates from H5AD file [N, 2] in [x, y] format |
required |
hdf5_coords
|
Optional[ndarray]
|
Optional coordinates from HDF5 file [M, 2] in [x, y] format |
None
|
tissue_coords
|
Optional[ndarray]
|
Optional coordinates from tissue TSV [P, 2] in [x, y] format |
None
|
adata
|
Optional[AnnData]
|
Optional AnnData object for metadata extraction |
None
|
tolerance_factor
|
float
|
Maximum acceptable ratio difference between coordinate ranges |
10.0
|
Returns:
Type | Description |
---|---|
Dict[str, any]
|
Dict with validation results and warnings |
Source code in spstencil/st.py
203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 |
|
- What: Spatial tools: coordinate handling, tissue grid mapping, aggregation, cropping, HDF5 helpers.
- Common workflow:
- Load inputs:
load_tissue_predictions(tissue_tsv)
# expects columnsx,y,class
load_cell_predictions(cell_hdf5) -> (preds, coords)
get_spatial_coords(adata) -> Nx2 [x,y]
- Map and subset/split:
annota
te_tissue_to_spots(adata, tissue_df, swap_axes=False, column_name="tissue_type")`subset_st_by_tissue(adata, tissue_df, include=..., exclude=...)
split_st_by_tissue(adata, tissue_df)
- Aggregate by cell class:
aggregate_st_by_cell_class(adata, cell_hdf5, normalize=True, group_key=None, tissue_df=None, swap_axes=False) -> AnnData
- Validate coordinates:
validate_coordinate_compatibility(h5ad_coords, hdf5_coords=None, tissue_coords=None, adata=None, tolerance_factor=10.0)
- Cropping:
determine_crop_bounds({"x": xs, "y": ys}, bin_size=100) -> bounds
crop_spatial_data(adata, bounds) -> AnnData
crop_cell_data(cell_hdf5, bounds, out_hdf5) -> int
crop_tissue_data(tissue_tsv, bounds, out_tsv, bin_size=100) -> int
- Utilities:
map_cells_to_tissue_classes(cell_coords, tissue_df)
split_hdf5_by_tissue(cell_hdf5, tissue_tsv)
aggregate_hdf5_by_cell_class(cell_hdf5, tissue_tsv=None, group_by_tissue=False, normalize=True)
is_continuous(seq)
,missing_ranges(seq)
- Example:
tissue = sps.st.load_tissue_predictions("tissue_preds.tsv") adata_t = sps.st.subset_st_by_tissue(adata, tissue, include=["Epithelium"]) preds, coords = sps.st.load_cell_predictions("cells.hdf5") adata_agg = sps.st.aggregate_st_by_cell_class(adata, "cells.hdf5", normalize=True) xy = sps.st.get_spatial_coords(adata) spatial = {"x": xy[:,0], "y": xy[:,1]} bounds = sps.st.determine_crop_bounds(spatial, bin_size=100) adata_crop = sps.st.crop_spatial_data(adata, bounds)
Projection and Unroll
- What: Project 2D points onto an arbitrary polyline/curve and aggregate expression along a normalized 0–1 axis.
- Key functions:
project_points_onto_polyline(points, polyline, *, use_kdtree=True, kdtree_k=64, use_grid=False, grid_cell_size=None)
extract_polyline_from_image(path, threshold=64, invert=False, step=1)
unroll_adata_along_polyline(adata, polyline, *, coord_keys=("imagecol","imagerow"), features=None, layer=None, n_bins=50, agg="mean", distance_max=None)
-
unroll_sdata(sdata, points_key, line_key, *, table_key=None, features=None, layer=None, n_bins=50, agg="mean", distance_max=None)
(optional integration with SpatialData; see SpatialData) -
Examples:
import numpy as np import spstencil as sps # 1) Project arbitrary points onto a sine-wave polyline x = np.linspace(-500, 500, 1000) polyline = np.stack([x, np.sin(x/20.0)], axis=1) pts = np.random.uniform([-500, polyline[:,1].min()], [500, polyline[:,1].max()], size=(2000, 2)) res = sps.project_points_onto_polyline(pts, polyline, use_kdtree=True, kdtree_k=64) # res.normalized_positions in [0,1], res.projected_points are closest points on the polyline # 2) Extract a polyline from a thin-curve ROI image poly_from_img = sps.extract_polyline_from_image("testdata/roi.jpg", threshold=64, step=5) # 3) Unroll AnnData expression along a curve agg_df, mapping_df = sps.unroll_adata_along_polyline( adata, poly_from_img, coord_keys=("imagecol", "imagerow"), features=["Car1", "Retnlb", "Prdx6"], n_bins=50, agg="mean", distance_max=250.0, ) # 4) SpatialData adapter (if using SpatialData objects) # agg_df, mapping_df = sps.unroll_sdata( # sdata, points_key="spots", line_key="roi_line", table_key="table", # features=["Car1", "Retnlb", "Prdx6"], n_bins=50 # )
Utilities
convert_cyx_to_yxc_array
Convert a 3D CYX numpy array to YXC by transposing (1, 2, 0).
Raises ValueError if the input is not 3D.
Source code in spstencil/_utils/axes.py
get_tiff_axes
Return axes string from first series of a TIFF, if available.
Examples: "YXC", "CYX", "TCZYX". Returns None if not present.
Source code in spstencil/_utils/axes.py
get_yxc_dims
get_yxc_dims(path: str | Path, *, on_missing: str = 'infer', channel_upper: int = 256, min_spatial: int = 64) -> tuple[int, int, int]
Return (Y, X, C) dimension sizes for a 3D TIFF.
- Uses TIFF axes metadata when available (supports 'YXC' and 'CYX').
- If metadata is missing or unsupported:
- on_missing='error': raise ValueError
- on_missing='force_cyx': assume CYX ordering
- on_missing='infer': 1) if TIFF SamplesPerPixel matches a dimension, use that as C 2) else prefer a dimension <= channel_upper as C; others become Y/X (ideally both >= min_spatial)
Source code in spstencil/_utils/axes.py
88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 |
|
imread_yxc
Read a TIFF from disk and return a 3D YXC numpy array.
Uses TIFF metadata to determine axes: - If axes == 'YXC': returns as-is - If axes == 'CYX': converts to YXC - If axes missing or different: raises unless force=True (assumes CYX)
Source code in spstencil/_utils/axes.py
- What: TIFF axis handling and conversions for arrays/images.
- Key functions:
get_tiff_axes(path) -> Optional[str]
imread_yxc(path, force=False) -> np.ndarray
# returns YXCconvert_cyx_to_yxc_array(img) -> np.ndarray
get_yxc_dims(path, on_missing='infer', channel_upper=256, min_spatial=64) -> (Y, X, C)
- Example:
convert_cyx_to_yxc.py ───────────────────────────────────────────────
Walk through a directory, find every *.tif(f) image assumed to be in CYX (channels–first) order, transpose it to YXC (channels–last) order, and save the result as a new TIFF.
The output keeps only the first three dimensions and adds an axes
entry to the TIFF metadata so downstream tools (e.g. happy
) recognise
the layout.
Usage
$ python -m spstencil._utils.cyx2yxc /path/to/input_dir [/path/to/output_dir]
• If output_dir is omitted, files are written next to the originals
with the suffix _yxc.tif
.
• Recurses into sub‑directories.
convert_cyx_dir_to_yxc
convert_cyx_dir_to_yxc(input_dir: Path, output_dir: Path | None = None, *, workers: int | None = None, force: bool = False) -> int
Convert all TIFFs under input_dir from CYX to YXC, respecting metadata.
Returns the number of successfully converted files. Skips files that are already YXC or have incompatible/missing axes unless force=True.
Source code in spstencil/_utils/cyx2yxc.py
- What: Batch convert directories of TIFFs from CYX to YXC.
- Key function:
convert_cyx_dir_to_yxc(input_dir, output_dir=None, workers=None, force=False) -> bool|int
- Example: