sparrow.tb.preprocess_proteomics

sparrow.tb.preprocess_proteomics(sdata, labels_layer, table_layer, output_layer, size_norm=True, log1p=True, scale=False, max_value_scale=10, q=None, calculate_pca=False, n_comps=50, overwrite=False)

Preprocess a table (AnnData) of a SpatialData object containing proteomics data.

Performs normalization (by nucleus/cell size, or via scanpy.pp.normalize_total) and, optionally, log transformation (scanpy.pp.log1p), scaling (scanpy.pp.scale) or quantile normalization, and PCA (scanpy.tl.pca) on the proteomics data contained in sdata.
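The normalization, log1p and scaling chain can be sketched on a plain NumPy matrix. This is a minimal illustration under stated assumptions, not sparrow's implementation: the function preprocess_sketch and its arguments are hypothetical, cell sizes are assumed to be available as a per-cell array, and the size_norm=False branch imitates scanpy.pp.normalize_total with its default target_sum=None (each cell rescaled to the median total intensity).

```python
import numpy as np


def preprocess_sketch(X, cell_size=None, log1p=True, scale=False, max_value=10.0):
    """Hypothetical normalize -> log1p -> scale chain on a cells x channels matrix.

    If cell_size is given, intensities are divided by it (like size_norm=True);
    otherwise each cell is rescaled to the median total intensity, similar to
    scanpy.pp.normalize_total with target_sum=None.
    """
    X = np.asarray(X, dtype=float)
    if cell_size is not None:
        # Size normalization: divide every channel of a cell by that cell's size.
        X = X / np.asarray(cell_size, dtype=float)[:, None]
    else:
        # Total-intensity normalization: target is the median over cells with a non-zero total.
        totals = X.sum(axis=1)
        positive = totals[totals > 0]
        target = np.median(positive) if positive.size else 0.0
        safe_totals = np.where(totals > 0, totals, 1.0)  # empty cells stay at zero
        X = X / safe_totals[:, None] * target
    if log1p:
        X = np.log1p(X)
    if scale:
        # Z-score each channel (column); constant channels get std 1 to avoid 0/0.
        std = X.std(axis=0)
        std[std == 0] = 1.0
        X = (X - X.mean(axis=0)) / std
        if max_value is not None:
            # This sketch clips from above only; check scanpy.pp.scale for its exact clipping.
            X = np.minimum(X, max_value)
    return X
```

For example, preprocess_sketch(X, cell_size=areas, scale=True) mirrors size_norm=True, log1p=True, scale=True under the assumptions above.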

Parameters:

sdata (SpatialData) – The input SpatialData object.
labels_layer (Union[str, Iterable[str]]) – The labels layer(s) of sdata used to select cells via the _REGION_KEY column in sdata.tables[table_layer].obs. If a list of labels layers is provided, their cells are preprocessed together (e.g. multiple samples). Note that if output_layer is equal to table_layer and overwrite is True, cells in sdata.tables[table_layer] linked to other labels layers (via the _REGION_KEY) will be removed from sdata.tables[table_layer].
table_layer (str) – The table layer in sdata to which preprocessing is applied. It is an AnnData object with cells as rows (.obs) and channels as columns (.var), holding the total intensity of each channel per cell.
output_layer (str) – The table layer in sdata to which the preprocessed table will be written.
size_norm (bool (default: True)) – If True, intensities are normalized by the size of the nucleus/cell. If False, scanpy.pp.normalize_total is used for normalization.
log1p (bool (default: True)) – If True, applies log1p transformation to the data.
scale (bool (default: False)) – If True, scales each channel to zero mean and unit variance. Values are clipped at max_value_scale after scaling.
max_value_scale (float | None (default: 10)) – The value at which scaled data are clipped. Ignored if scale is False.
q (float | None (default: None)) – Quantile used for normalization. If specified, each channel (each entry of adata.var) is divided by its q-quantile computed across cells, and the result is multiplied by 100. A typical value is 0.999.
calculate_pca (bool (default: False)) – If True, calculates principal component analysis (PCA) on the data.
n_comps (int (default: 50)) – Number of principal components to calculate. Ignored if calculate_pca is False.
overwrite (bool (default: False)) – If True, overwrites the output_layer if it already exists in sdata.
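The q parameter describes a per-channel quantile normalization. A minimal sketch, assuming X is a cells x channels NumPy array; quantile_normalize is a hypothetical helper, and whether sparrow applies this step before or after log1p, or clips values above 100, is not stated here and is left out.

```python
import numpy as np


def quantile_normalize(X, q=0.999):
    """Hypothetical helper: divide each channel by its q-quantile, then multiply by 100."""
    X = np.asarray(X, dtype=float)
    # One q-quantile per channel (column), computed across all cells (rows).
    quantiles = np.quantile(X, q, axis=0)
    # Guard against all-zero channels to avoid division by zero.
    quantiles = np.where(quantiles > 0, quantiles, 1.0)
    return X / quantiles * 100.0
```

With q=0.999, a cell at the 99.9th percentile of a channel maps to 100, so a few extremely bright cells do not compress the range of all other cells the way dividing by the maximum would.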

Return type:

SpatialData

Returns:

The sdata object containing the preprocessed AnnData object as a table (sdata.tables[output_layer]).

Raises:

ValueError –

- If sdata does not contain any labels layers.
- If sdata does not contain any table layers.
- If labels_layer, or one of the elements of labels_layer, is not a labels layer in sdata.
- If table_layer is not a table layer in sdata.
- If scale is True and q is not None.
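The ValueError conditions above, together with the selection of cells through the _REGION_KEY column, can be sketched with plain Python containers standing in for a SpatialData object. Everything here is hypothetical: validate_and_select, its arguments, and the region array (the per-cell values of the _REGION_KEY column) are illustrative stand-ins, not sparrow's API.

```python
import numpy as np


def validate_and_select(labels_layers, table_layers, labels_layer, table_layer, scale, q, region):
    """Hypothetical checks mirroring the documented ValueError conditions.

    labels_layers / table_layers: names of the labels and table layers present in sdata.
    region: per-cell values of the _REGION_KEY column of sdata.tables[table_layer].obs.
    Returns a boolean mask selecting the cells that belong to labels_layer.
    """
    if isinstance(labels_layer, str):
        labels_layer = [labels_layer]
    labels_layer = list(labels_layer)
    if not labels_layers:
        raise ValueError("sdata does not contain any labels layers.")
    if not table_layers:
        raise ValueError("sdata does not contain any table layers.")
    missing = [name for name in labels_layer if name not in labels_layers]
    if missing:
        raise ValueError(f"{missing} are not labels layers in sdata.")
    if table_layer not in table_layers:
        raise ValueError(f"'{table_layer}' is not a table layer in sdata.")
    if scale and q is not None:
        raise ValueError("scale=True and q are mutually exclusive; choose one.")
    # Several labels layers (e.g. one per sample) select their cells jointly.
    return np.isin(np.asarray(region), labels_layer)
```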

Warning

If scale is True and max_value_scale is set too low, it may overly constrain the variability of the data, potentially impacting downstream analyses.
If calculate_pca is True and the dimensionality of sdata.tables[table_layer] is smaller than n_comps, n_comps is set to the minimum dimensionality and a message is printed.
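The n_comps adjustment and the PCA step can be sketched with NumPy's SVD. This is an illustration under assumptions: pca_sketch is hypothetical, it clamps n_comps to min(n_cells, n_channels) (the exact bound sparrow uses may differ, e.g. ARPACK-based solvers require strictly fewer components than the smallest dimension), and it uses a dense SVD rather than scanpy.tl.pca.

```python
import numpy as np


def pca_sketch(X, n_comps=50):
    """Hypothetical PCA via SVD, clamping n_comps to the smallest dimension of X."""
    X = np.asarray(X, dtype=float)
    max_comps = min(X.shape)
    if n_comps > max_comps:
        print(f"n_comps={n_comps} exceeds the dimensionality of the data; using {max_comps}.")
        n_comps = max_comps
    # Center each channel; PCA operates on mean-centered data.
    Xc = X - X.mean(axis=0)
    # Thin SVD: Xc = U @ diag(S) @ Vt, rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Cell scores on the first n_comps components (equal to Xc @ Vt[:n_comps].T).
    scores = U[:, :n_comps] * S[:n_comps]
    return scores, Vt[:n_comps]
```

Because proteomics panels often have fewer than 50 channels, the default n_comps=50 will frequently be reduced in this way.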
