sparrow.tb.preprocess_proteomics
sparrow.tb.preprocess_proteomics(sdata, labels_layer, table_layer, output_layer, size_norm=True, log1p=True, scale=False, max_value_scale=10, q=None, calculate_pca=False, n_comps=50, overwrite=False)
Preprocess a table (AnnData) attribute of a SpatialData object for proteomics data.
Performs optional normalization (based on cell size or via scanpy.pp.normalize_total), log transformation (scanpy.pp.log1p), scaling (scanpy.pp.scale) or quantile normalization, and PCA calculation (scanpy.tl.pca) on the proteomics data contained in sdata.
- Parameters:
sdata (SpatialData) – The input SpatialData object.
labels_layer (Union[str, Iterable[str]]) – The labels layer(s) of sdata used to select the cells via the _REGION_KEY in sdata.tables[table_layer].obs. Note that if output_layer is equal to table_layer and overwrite is True, cells in sdata.tables[table_layer] linked to other labels layers (via the _REGION_KEY) will be removed from sdata.tables[table_layer]. If a list of labels layers is provided, they will therefore be preprocessed together (e.g. multiple samples).
table_layer (str) – The table layer in sdata on which to perform preprocessing.
output_layer (str) – The output table layer in sdata to which the preprocessed table layer will be written.
size_norm (bool (default: True)) – If True, normalization is based on the size of the nucleus/cell. If False, scanpy.pp.normalize_total is used for normalization.
log1p (bool (default: True)) – If True, applies a log1p transformation to the data.
scale (bool (default: False)) – If True, scales the data to have zero mean and a variance of one. Scaled values are capped at max_value_scale.
max_value_scale (float (default: 10)) – The maximum value to which data will be scaled. Ignored if scale is False.
q (Optional[float] (default: None)) – Quantile used for normalization. If specified, values are normalized by this quantile, calculated for each variable in adata.var. Values are multiplied by 100 after normalization. A typical value is 0.999.
calculate_pca (bool (default: False)) – If True, calculates principal component analysis (PCA) on the data.
n_comps (int (default: 50)) – Number of principal components to calculate. Ignored if calculate_pca is False.
overwrite (bool (default: False)) – If True, overwrites the output_layer if it already exists in sdata.
- Return type:
SpatialData
- Returns:
The sdata containing the preprocessed AnnData object as an attribute (sdata.tables[output_layer]).
- Raises:
ValueError –
  - If sdata does not contain any labels layers.
  - If sdata does not contain any table layers.
  - If labels_layer, or one of the elements of labels_layer, is not a labels layer in sdata.
  - If table_layer is not a table layer in sdata.
  - If scale is True and q is not None.
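To make the order of the optional steps concrete, here is a minimal NumPy sketch that operates on a plain array rather than on a SpatialData table. It is not sparrow's implementation: the function name preprocess_sketch, the cell_size input, the exact step order, the upper-side-only cap, and the median-total stand-in for scanpy.pp.normalize_total are all assumptions made for illustration.

```python
import numpy as np


def preprocess_sketch(X, cell_size, size_norm=True, log1p=True,
                      scale=False, max_value_scale=10, q=None):
    """Illustrative only: applies the documented steps to a plain array.

    X is (n_cells, n_markers); cell_size is (n_cells,). Both are hypothetical inputs.
    """
    if scale and q is not None:
        # Mirrors the documented ValueError for combining scale and q.
        raise ValueError("Use either scale=True or q, not both.")

    X = np.asarray(X, dtype=float).copy()

    if size_norm:
        # Assumption: divide each cell's intensities by that cell's size.
        X = X / np.asarray(cell_size, dtype=float)[:, None]
    else:
        # Stand-in for scanpy.pp.normalize_total: rescale every cell
        # so its total equals the median total across cells.
        totals = X.sum(axis=1, keepdims=True)
        X = X / totals * np.median(totals)

    if log1p:
        X = np.log1p(X)

    if scale:
        # Zero mean and unit variance per marker, then cap large values.
        X = (X - X.mean(axis=0)) / X.std(axis=0)
        X = np.clip(X, None, max_value_scale)
    elif q is not None:
        # Divide each marker by its q-th quantile, then multiply by 100.
        X = X / np.quantile(X, q, axis=0) * 100

    return X
```

With q set, each marker ends up on a comparable 0 to roughly 100 range; with scale=True, each marker is centered instead. The two are alternatives, which is why requesting both is an error.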
Warning
If scale is True and max_value_scale is set too low, it may overly constrain the variability of the data, potentially impacting downstream analyses.
If the dimensionality of sdata.tables[table_layer] is smaller than the desired number of principal components when calculate_pca is True, n_comps is set to the minimum dimensionality, and a message is printed.
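Both warnings can be illustrated with two small helpers. These are hypothetical functions written for this page, not part of sparrow: clamp_n_comps follows the behavior described above (fall back to the smallest table dimension and print a message), and fraction_capped reports how many already-scaled values a given max_value_scale would cut off.

```python
def clamp_n_comps(n_obs, n_vars, n_comps=50):
    """Hypothetical helper: limit n_comps to the smallest table dimension."""
    limit = min(n_obs, n_vars)
    if n_comps > limit:
        print(f"n_comps={n_comps} exceeds the table dimensionality; using {limit}.")
        return limit
    return n_comps


def fraction_capped(scaled_values, max_value_scale=10):
    """Hypothetical helper: share of scaled values above max_value_scale."""
    values = list(scaled_values)
    return sum(v > max_value_scale for v in values) / len(values)


# A table with 1000 cells and 40 variables cannot supply the default 50 components.
print(clamp_n_comps(n_obs=1000, n_vars=40))
```

Because the number of variables in a proteomics table can be smaller than the default n_comps=50, it may be worth passing a smaller n_comps explicitly. A high value from fraction_capped would suggest that max_value_scale is flattening real variation.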
See also
sparrow.tb.allocate_intensity
Create an AnnData table in sdata using an image_layer and a labels_layer.
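A call based on the signature documented on this page might look like the sketch below. The wrapper name run_preprocessing and all three layer names are hypothetical placeholders; it assumes sdata already contains a table, for example one created with sparrow.tb.allocate_intensity.

```python
def run_preprocessing(sdata):
    """Sketch of a call; the layer names below are hypothetical placeholders."""
    # Imported lazily so this snippet can be loaded without sparrow installed.
    import sparrow as sp

    return sp.tb.preprocess_proteomics(
        sdata,
        labels_layer="segmentation_mask",       # hypothetical labels layer
        table_layer="table_intensities",        # hypothetical input table
        output_layer="table_intensities_prep",  # hypothetical output table
        size_norm=True,       # normalize by nucleus/cell size
        log1p=True,           # apply the log1p transformation
        scale=False,          # must stay False whenever q is given
        q=0.999,              # typical quantile per the parameter description
        calculate_pca=True,
        n_comps=20,
    )
```

Writing to a separate output_layer leaves the input table untouched, which avoids the removal of cells linked to other labels layers that is described for the case where output_layer equals table_layer and overwrite is True.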