sparrow.tb.score_genes_iter

sparrow.tb.score_genes_iter(sdata, labels_layer, table_layer, output_layer, path_marker_genes, delimiter=',', min_score='Zero', min_score_p=25, scaling='Nmarkers', scale_score_p=1, n_iter=5, calculate_umap=False, calculate_neighbors=False, neigbors_kwargs=mappingproxy({}), umap_kwargs=mappingproxy({}), output_dir=None, celltype_column='annotation', overwrite=False)

Iterative annotation algorithm.

For each cell, a score is calculated for each cell type.

In iteration 0:

The mean expression is first subtracted from the expression levels. The score for each cell type is then the sum of these normalized expression values over that cell type's markers in the cell.

In each following iteration i:

Expression levels are normalized by subtracting the mean over all cell types assigned in iteration i-1. The score for each cell type is again the sum of these normalized expression values over that cell type's markers in the cell.

The function expects scaled data (obtained through e.g. scanpy.pp.scale).
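The iteration described above can be illustrated with a minimal NumPy sketch. It is not sparrow's implementation: the helper name `score_iteratively` is hypothetical, "the mean over all cell types" is interpreted here as the average of the per-cell-type mean expression profiles, and the `min_score` and `scaling` steps are left out entirely.

```python
import numpy as np


def score_iteratively(X, markers, n_iter=5):
    """Toy version of the iterative scoring (hypothetical helper, not sparrow's code).

    X       : (n_cells, n_genes) array of scaled expression values.
    markers : (n_genes, n_celltypes) one-hot marker matrix (1 = gene marks the type).
    """
    # Iteration 0: subtract the mean expression of each gene over all cells.
    centered = X - X.mean(axis=0)
    # Score per cell and cell type = sum of centered expression over the markers.
    scores = centered @ markers
    labels = scores.argmax(axis=1)

    for _ in range(n_iter):
        # Assumption: average the mean profiles of the cell types assigned in
        # the previous iteration, so abundant types do not dominate the mean.
        assigned = np.unique(labels)
        type_means = np.stack([X[labels == t].mean(axis=0) for t in assigned])
        centered = X - type_means.mean(axis=0)
        scores = centered @ markers
        new_labels = scores.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, further iterations change nothing
        labels = new_labels
    return scores, labels


# Toy data: 6 cells x 4 genes; genes 0-1 mark type 0, genes 2-3 mark type 1.
X = np.array([
    [2.0, 2.0, 0.0, 0.0],
    [2.0, 1.5, 0.0, 0.0],
    [1.5, 2.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.5],
    [0.0, 0.0, 2.0, 2.0],
    [0.0, 0.5, 2.0, 1.5],
])
markers = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
scores, labels = score_iteratively(X, markers)
```

In this toy data type 0 is twice as abundant as type 1, so the plain mean over cells is pulled towards type 0. Averaging per-type means instead weights both types equally, which is one plausible motivation for the re-normalization in later iterations.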

Parameters:
  • sdata (SpatialData) – The SpatialData object.

  • labels_layer (list[str]) – The labels layer(s) of sdata used to select the cells via the _REGION_KEY in sdata.tables[table_layer].obs. Note that if output_layer is equal to table_layer and overwrite is True, cells in sdata.tables[table_layer] linked to other labels layers (via the _REGION_KEY) will be removed from sdata.tables[table_layer]. If a list of labels layers is provided, they will be scored together (e.g. multiple samples).

  • table_layer (str) – The table layer in sdata on which to perform annotation. We assume the data is already preprocessed by e.g. sp.tb.preprocess_transcriptomics. All features should have approximately the same variance.

  • output_layer (str) – The output table layer in sdata to which the table layer with the annotation results will be written.

  • path_marker_genes (str | Path | DataFrame) – Path to the CSV file containing the marker genes, or a pandas DataFrame. It should be a one-hot encoded matrix with cell types listed in the first row and marker genes in the first column.

  • delimiter (str (default: ',')) – Delimiter used in the CSV file.

  • min_score (Literal['Zero', 'Quantile', None] (default: 'Zero')) – Min score method. Choose from one of these options: “Zero”, “Quantile”, None.

  • min_score_p (float (default: 25)) – Min score percentile. Ignored if min_score is not set to “Quantile”.

  • scaling (Literal['MinMax', 'ZeroMax', 'Nmarkers', 'Robust', 'Rank'] (default: 'Nmarkers')) – Scaling method. Choose from one of these options: “MinMax”, “ZeroMax”, “Nmarkers”, “Robust”, “Rank”.

  • scale_score_p (float (default: 1)) – Scale score percentile.

  • n_iter (int (default: 5)) – Number of iterations.

  • calculate_umap (default: False) – If True, calculates a UMAP via scanpy.tl.umap for visualization of the obtained annotations per iteration. If False and neither ‘umap’ nor ‘X_umap’ is in .obsm, no UMAP will be plotted.

  • calculate_neighbors (default: False) – If True, calculates neighbors via scanpy.pp.neighbors. Ignored if calculate_umap is set to False.

  • umap_kwargs (Mapping[str, Any] (default: mappingproxy({}))) – Keyword arguments passed to scanpy.tl.umap. Ignored if calculate_umap is False.

  • neigbors_kwargs (Mapping[str, Any] (default: mappingproxy({}))) – Keyword arguments passed to scanpy.pp.neighbors. Ignored if calculate_umap is False, or if calculate_neighbors is set to False and “neighbors” is already in .uns.keys().

  • output_dir (default: None) – If specified, figures with UMAPs will be saved in this directory after each iteration. If None, the plots will be displayed directly without saving.

  • celltype_column (str (default: 'annotation')) – The column name in the SpatialData object’s table that specifies the cell type annotations. The default value is _ANNOTATION_KEY.

  • overwrite (bool (default: False)) – If True, overwrites the output_layer if it already exists in sdata.
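As an illustration of the one-hot marker matrix expected by path_marker_genes, the sketch below builds one from CSV text with pandas. The gene and cell type names are hypothetical placeholders, and reading the first column as the index is an assumption about the layout, not a statement about how sparrow parses the file internally.

```python
import io

import pandas as pd

# Hypothetical marker file: cell types in the first row, genes in the first column.
csv_text = """gene,TypeA,TypeB,TypeC
GeneA,1,0,0
GeneB,1,0,0
GeneC,0,1,0
GeneD,0,0,1
"""

# sep matches the `delimiter` parameter; the first column becomes the gene index.
marker_df = pd.read_csv(io.StringIO(csv_text), sep=",", index_col=0)

# Sanity check: a one-hot matrix only contains 0 and 1.
is_one_hot = bool(marker_df.isin([0, 1]).all().all())

# List the marker genes per cell type for a quick visual check.
markers_per_type = {
    celltype: marker_df.index[marker_df[celltype] == 1].tolist()
    for celltype in marker_df.columns
}
```

A DataFrame like `marker_df` could be passed directly instead of a path, since the parameter accepts either form.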

Return type:

tuple[SpatialData, list[str], list[str]]

Returns:

A tuple containing:

  • Updated sdata.

  • A list of strings with all cell types that are scored (and are not in the del_celltypes list).

  • A list of strings with all cell types, some of which may not be scored because their corresponding transcripts do not appear in the region of interest. _UNKNOWN_CELLTYPE_KEY is also added if it is detected.