sparrow.tb.score_genes_iter
- sparrow.tb.score_genes_iter(sdata, labels_layer, table_layer, output_layer, path_marker_genes, delimiter=',', min_score='Zero', min_score_p=25, scaling='Nmarkers', scale_score_p=1, n_iter=5, calculate_umap=False, calculate_neighbors=False, neigbors_kwargs=mappingproxy({}), umap_kwargs=mappingproxy({}), output_dir=None, celltype_column='annotation', overwrite=False)
Iterative annotation algorithm.
For each cell, a score is calculated for each cell type.
In iteration 0, the score is computed as follows:
First, the mean expression of each gene is subtracted from its expression levels. The score for each cell type is then the sum of these normalized expressions over that cell type's marker genes in the cell.
In the following iterations:
Expression levels are normalized by subtracting the mean over all cell types assigned in iteration i-1. The score for each cell type is again the sum of these normalized expressions over its marker genes in the cell.
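The iterative scheme above can be sketched in NumPy as follows. This is a minimal, hypothetical illustration, not sparrow's implementation: it leaves out the min_score and scaling options, assumes each cell is assigned its highest-scoring cell type, and interprets "the mean over all cell types assigned in iteration i-1" as the unweighted mean of the per-cell-type mean expression profiles. The function name score_iter_sketch is made up, and the toy values are left unscaled for readability.

```python
import numpy as np


def score_iter_sketch(X, markers, n_iter=5):
    """Hypothetical sketch of the iterative scoring scheme (not sparrow's code).

    X       : (n_cells, n_genes) array of scaled expression values.
    markers : (n_genes, n_celltypes) one-hot matrix; markers[g, c] == 1 if
              gene g is a marker of cell type c.
    Returns a list with the assigned cell type index per cell, per iteration.
    """
    X = np.asarray(X, dtype=float)
    markers = np.asarray(markers, dtype=float)
    # Iteration 0: the baseline is the mean expression of each gene over all cells.
    baseline = X.mean(axis=0)
    assignments = []
    for _ in range(n_iter):
        # Normalize by subtracting the current baseline from every cell.
        normalized = X - baseline
        # Score of cell type c = sum of the normalized expression of its markers.
        scores = normalized @ markers  # shape (n_cells, n_celltypes)
        # Assumption: each cell is assigned its highest-scoring cell type.
        labels = scores.argmax(axis=1)
        assignments.append(labels)
        # Baseline for the next iteration (assumed interpretation): the
        # unweighted mean of the mean profiles of the cell types assigned now.
        type_means = [X[labels == c].mean(axis=0) for c in np.unique(labels)]
        baseline = np.mean(type_means, axis=0)
    return assignments


# Toy data: gene 0 marks cell type 0, gene 1 marks cell type 1, gene 2 marks nothing.
X = np.array([[3, 0, 0], [2, 0, 1], [0, 2, 0], [0, 3, 1]])
markers = np.array([[1, 0], [0, 1], [0, 0]])
labels_per_iter = score_iter_sketch(X, markers, n_iter=2)
print(labels_per_iter[-1])  # [0 0 1 1]
```

Under this interpretation every assigned cell type contributes equally to the baseline used in later iterations, so an abundant cell type cannot dominate it; with the balanced toy data above, the baseline does not change between iterations.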
The function expects scaled data (obtained through, e.g., scanpy.pp.scale).
- Parameters:
  - sdata (SpatialData) – The SpatialData object.
  - labels_layer (list[str]) – The labels layer(s) of sdata used to select the cells via the _REGION_KEY in sdata.tables[table_layer].obs. Note that if output_layer is equal to table_layer and overwrite is True, cells in sdata.tables[table_layer] linked to other labels_layer (via the _REGION_KEY) will be removed from sdata.tables[table_layer]. If a list of labels layers is provided, the cells of all these layers are scored together (e.g. multiple samples).
  - table_layer (str) – The table layer in sdata on which to perform the annotation. We assume the data is already preprocessed, e.g. by sp.tb.preprocess_transcriptomics. Features should all have approximately the same variance.
  - output_layer (str) – The output table layer in sdata to which the table layer with the annotation results will be written.
  - path_marker_genes (str | Path | DataFrame) – Path to the CSV file containing the marker genes, or a pandas DataFrame. It should be a one-hot encoded matrix with cell types listed in the first row and marker genes in the first column.
  - delimiter (str, default: ',') – Delimiter used in the CSV file.
  - min_score (Literal['Zero', 'Quantile', None], default: 'Zero') – Min score method. Choose from one of these options: “Zero”, “Quantile”, None.
  - min_score_p (float, default: 25) – Min score percentile. Ignored if min_score is not set to “Quantile”.
  - scaling (Literal['MinMax', 'ZeroMax', 'Nmarkers', 'Robust', 'Rank'], default: 'Nmarkers') – Scaling method. Choose from one of these options: “MinMax”, “ZeroMax”, “Nmarkers”, “Robust”, “Rank”.
  - scale_score_p (float, default: 1) – Scale score percentile.
  - n_iter (int, default: 5) – Number of iterations.
  - calculate_umap (default: False) – If True, calculates a UMAP via scanpy.tl.umap to visualize the annotations obtained in each iteration. If False and ‘umap’ or ‘X_umap’ is not in .obsm, no UMAP will be plotted.
  - calculate_neighbors (default: False) – If True, calculates neighbors via scanpy.pp.neighbors. Ignored if calculate_umap is set to False.
  - umap_kwargs (Mapping[str, Any], default: mappingproxy({})) – Keyword arguments passed to scanpy.tl.umap. Ignored if calculate_umap is False.
  - neigbors_kwargs (Mapping[str, Any], default: mappingproxy({})) – Keyword arguments passed to scanpy.pp.neighbors. Ignored if calculate_umap is False, or if calculate_neighbors is set to False and “neighbors” is already in .uns.keys().
  - output_dir (default: None) – If specified, figures with UMAPs will be saved in this directory after each iteration. If None, the plots are displayed directly without being saved.
  - celltype_column (str, default: 'annotation') – The column name in the SpatialData object’s table that holds the cell type annotations. The default corresponds to _ANNOTATION_KEY.
  - overwrite (bool, default: False) – If True, overwrites the output_layer if it already exists in sdata.
- Return type:
tuple[SpatialData, list[str], list[str]]
- Returns:
A tuple containing:
  - The updated sdata.
  - A list of strings with all cell types that are scored (excluding those in the del_celltypes list).
  - A list of strings with all cell types, some of which may not be scored because their corresponding transcripts do not appear in the region of interest. _UNKNOWN_CELLTYPE_KEY is also added if it is detected.
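The marker-gene input and the difference between the two returned lists of cell types can be illustrated with pandas. The sketch below is hypothetical and not sparrow's code: it builds a one-hot marker table in the layout described for path_marker_genes (cell types in the first row, marker genes in the first column), round-trips it through CSV text with the default ',' delimiter, and then assumes that a cell type counts as scored only if at least one of its marker genes is present among the table's genes. The helper split_scorable_celltypes and the gene lists are made up for illustration; del_celltypes and _UNKNOWN_CELLTYPE_KEY are ignored. When passing a DataFrame directly, it presumably follows the same orientation (genes as the index, cell types as the columns).

```python
import io

import pandas as pd

# Hypothetical marker sets: cell type -> marker genes.
marker_sets = {
    "T_cell": ["CD3D", "CD3E"],
    "B_cell": ["CD79A", "MS4A1"],
}

# One-hot matrix: marker genes as rows (first column), cell types as columns (first row).
genes = sorted({g for gs in marker_sets.values() for g in gs})
df_markers = pd.DataFrame(0, index=genes, columns=list(marker_sets))
for celltype, gs in marker_sets.items():
    df_markers.loc[gs, celltype] = 1

# Round-trip through CSV text using the default ',' delimiter.
csv_text = df_markers.to_csv(sep=",")
df_markers = pd.read_csv(io.StringIO(csv_text), sep=",", index_col=0)


def split_scorable_celltypes(df_markers, genes_in_table):
    """Hypothetical helper: return (scored, all) cell type lists.

    Assumes a cell type can only be scored if at least one of its
    marker genes is among genes_in_table (e.g. the table's var_names).
    """
    present = df_markers.index.intersection(pd.Index(genes_in_table))
    # Number of present marker genes per cell type.
    n_present = df_markers.loc[present].sum(axis=0)
    scored = [c for c in df_markers.columns if n_present[c] > 0]
    return scored, list(df_markers.columns)


# Only CD3E (plus an unrelated gene) was measured in this toy region.
scored, all_celltypes = split_scorable_celltypes(df_markers, ["CD3E", "EPCAM"])
print(scored)         # ['T_cell']
print(all_celltypes)  # ['T_cell', 'B_cell']
```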