sparrow.tb.preprocess_transcriptomics
- sparrow.tb.preprocess_transcriptomics(sdata, labels_layer, table_layer, output_layer, min_counts=10, min_cells=5, size_norm=True, highly_variable_genes=False, highly_variable_genes_kwargs=mappingproxy({}), max_value_scale=10, n_comps=50, update_shapes_layers=True, overwrite=False)
Preprocess a table (AnnData) attribute of a SpatialData object for transcriptomics data.
Performs filtering (via scanpy.pp.filter_cells and scanpy.pp.filter_genes) and optional normalization (on size or via scanpy.pp.normalize_total), log transformation (scanpy.pp.log1p), highly variable genes selection (scanpy.pp.highly_variable_genes), scaling (scanpy.pp.scale), and PCA calculation (scanpy.tl.pca) for transcriptomics data contained in sdata. QC metrics are added to sdata.tables[output_layer].obs using scanpy.pp.calculate_qc_metrics.
- Parameters:
sdata (SpatialData) – The input SpatialData object.
labels_layer (Union[str, Iterable[str]]) – The labels layer(s) of sdata used to select the cells via the _REGION_KEY in sdata.tables[table_layer].obs. Note that if output_layer is equal to table_layer and overwrite is True, cells in sdata.tables[table_layer] linked to other labels layers (via the _REGION_KEY) will be removed from sdata.tables[table_layer] (and from the backing zarr store, if sdata is backed).
table_layer (str) – The table layer in sdata on which to perform preprocessing.
output_layer (str) – The output table layer in sdata to which the preprocessed table layer will be written.
min_counts (int (default: 10)) – Minimum number of counts a cell should contain to be kept (passed to scanpy.pp.filter_cells).
min_cells (int (default: 5)) – Minimum number of cells a gene should be in to be kept (passed to scanpy.pp.filter_genes).
size_norm (bool (default: True)) – If True, normalization is based on the size of the nucleus/cell. If False, scanpy.pp.normalize_total is used for normalization.
highly_variable_genes (bool (default: False)) – If True, will only retain highly variable genes, as calculated by scanpy.pp.highly_variable_genes.
highly_variable_genes_kwargs (Mapping[str, Any] (default: mappingproxy({}))) – Keyword arguments passed to scanpy.pp.highly_variable_genes. Ignored if highly_variable_genes is False.
max_value_scale (int (default: 10)) – The maximum value to which data will be scaled, using scanpy.pp.scale.
n_comps (int (default: 50)) – Number of principal components to calculate.
update_shapes_layers (bool (default: True)) – Whether to filter the shapes layers associated with labels_layer. If set to True, cells that do not appear in the resulting output_layer (with _REGION_KEY equal to labels_layer) will be removed from the shapes layers (via _INSTANCE_KEY) in the sdata object. Filtered shapes will be added to sdata with prefix ‘filtered_low_counts’.
overwrite (bool (default: False)) – If True, overwrites the output_layer if it already exists in sdata.
- Return type:
SpatialData
- Returns:
The sdata containing the preprocessed AnnData object as an attribute (sdata.tables[output_layer]).
- Raises:
ValueError – If sdata does not have a labels attribute.
ValueError – If sdata does not have a tables attribute.
ValueError – If labels_layer, or one of the elements of labels_layer, is not a labels layer in sdata.
ValueError – If table_layer is not a table layer in sdata.
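The effect of min_counts and min_cells can be illustrated with a small, dependency-free sketch. This is not the sparrow or scanpy implementation: filter_counts is a hypothetical helper, the count matrix is made up, and the order (cells first, then genes) is an assumption based on the order in which the description lists the two scanpy calls.

```python
def filter_counts(matrix, min_counts=10, min_cells=5):
    """Toy filter on a cells x genes count matrix given as a list of rows.

    Hypothetical helper for illustration only; it mimics the thresholds
    described for min_counts and min_cells, not the library internals.
    """
    # Keep cells whose total count reaches min_counts.
    kept_cells = [i for i, row in enumerate(matrix) if sum(row) >= min_counts]
    rows = [matrix[i] for i in kept_cells]

    # Among the remaining cells, keep genes detected (count > 0)
    # in at least min_cells cells.
    n_genes = len(matrix[0]) if matrix else 0
    kept_genes = [
        j for j in range(n_genes)
        if sum(1 for row in rows if row[j] > 0) >= min_cells
    ]

    filtered = [[row[j] for j in kept_genes] for row in rows]
    return filtered, kept_cells, kept_genes


# Made-up counts: 4 cells (rows) x 3 genes (columns).
counts = [
    [5, 0, 7],   # total 12
    [1, 0, 2],   # total 3, below min_counts
    [6, 1, 4],   # total 11
    [0, 0, 15],  # total 15
]
filtered, kept_cells, kept_genes = filter_counts(counts, min_counts=10, min_cells=2)
print(kept_cells, kept_genes, filtered)
```

In this toy input the second cell is dropped for having too few counts, and the second gene is then dropped because only one remaining cell expresses it.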
Warning
If max_value_scale is set too low, it may overly constrain the variability of the data, potentially impacting downstream analyses.
If the dimensionality of sdata.tables[table_layer] is smaller than the desired number of principal components, n_comps is set to the minimum dimensionality, and a message is printed.
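Both points in the warning can be sketched in plain Python. The helpers scale_and_clip and effective_n_comps are hypothetical names, not part of sparrow or scanpy. The scaling sketch uses the population standard deviation and clips only the upper bound, which is a simplification of what scanpy.pp.scale does. The clamp simply follows the behaviour described in the warning.

```python
import statistics


def scale_and_clip(values, max_value=10):
    """Standardize one gene's values, then cap them at max_value.

    Simplified illustration: population standard deviation, upper clip only.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant gene carries no variance; return zeros.
        return [0.0 for _ in values]
    scaled = [(v - mean) / std for v in values]
    if max_value is None:
        return scaled
    return [min(z, max_value) for z in scaled]


def effective_n_comps(n_obs, n_vars, n_comps=50):
    """Reduce n_comps to the smallest table dimension when needed."""
    smallest = min(n_obs, n_vars)
    if smallest < n_comps:
        print(f"n_comps reduced from {n_comps} to {smallest}")
        return smallest
    return n_comps


# One made-up gene: 99 cells with zero signal and one strong outlier.
values = [0.0] * 99 + [100.0]
unclipped = scale_and_clip(values, max_value=None)
clipped = scale_and_clip(values, max_value=3)
print(max(unclipped), max(clipped))

print(effective_n_comps(n_obs=200, n_vars=30))   # fewer genes than requested components
print(effective_n_comps(n_obs=200, n_vars=300))  # enough dimensions, unchanged
```

With a low cap, the outlier cell becomes indistinguishable from any other cell above the threshold, which is the loss of variability the warning refers to.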
See also
sparrow.tb.allocate
Create an AnnData table in sdata using a points_layer and a labels_layer.