sparrow.tb.score_genes

sparrow.tb.score_genes(sdata, labels_layer, table_layer, output_layer, path_marker_genes, delimiter=',', row_norm=False, repl_columns=None, del_celltypes=None, input_dict=False, celltype_column='annotation', overwrite=False, **kwargs)

Load marker genes from a CSV file and score each cell for every cell type with scanpy’s sc.tl.score_genes function.

Each cell is annotated with the cell type that has the maximum score obtained through sc.tl.score_genes. Marker genes can be provided either as a one-hot encoded matrix, with cell types listed in the first row and marker genes in the first column, or in dictionary format. The function also supports renaming cell types (repl_columns) and excluding specific cell types from assignment (del_celltypes).
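The one-hot marker format and the maximum-score assignment described above can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: the gene names, cell type names, and score values are made up, the helper functions are hypothetical, and the scores are hard-coded rather than computed by sc.tl.score_genes.

```python
import csv
import io

# Hypothetical one-hot marker matrix: cell types in the first row,
# marker genes in the first column (all names are made up).
MARKERS_CSV = """gene,T_cell,B_cell
CD3E,1,0
CD19,0,1
MS4A1,0,1
"""


def markers_from_one_hot(text, delimiter=","):
    """Convert a one-hot marker matrix into {cell type: [marker genes]}."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    celltypes = rows[0][1:]  # first row: cell type names
    markers = {ct: [] for ct in celltypes}
    for row in rows[1:]:
        gene, flags = row[0], row[1:]  # first column: gene name
        for ct, flag in zip(celltypes, flags):
            if flag.strip() == "1":
                markers[ct].append(gene)
    return markers


def annotate_by_max_score(scores):
    """Map {cell: {cell type: score}} to {cell: cell type with the highest score}."""
    return {cell: max(per_type, key=per_type.get) for cell, per_type in scores.items()}


markers = markers_from_one_hot(MARKERS_CSV)

# Made-up per-cell scores standing in for the output of sc.tl.score_genes.
scores = {
    "cell_1": {"T_cell": 0.8, "B_cell": -0.1},
    "cell_2": {"T_cell": 0.0, "B_cell": 0.5},
}
annotation = annotate_by_max_score(scores)
```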

Parameters:
  • sdata (SpatialData) – The SpatialData object.

  • labels_layer (list[str]) – The labels layer(s) of sdata used to select the cells via the _REGION_KEY in sdata.tables[table_layer].obs. If a list of labels layers is provided, the corresponding cells are scored together (e.g. multiple samples). Note that if output_layer is equal to table_layer and overwrite is True, cells in sdata.tables[table_layer] linked to other labels layers (via the _REGION_KEY) will be removed from sdata.tables[table_layer].

  • table_layer (str) – The table layer in sdata on which to perform annotation.

  • output_layer (str) – The table layer in sdata to which the annotated table will be written.

  • path_marker_genes (str | Path | DataFrame) – Path to the CSV file containing the marker genes or a pandas dataframe. It should be a one-hot encoded matrix with cell types listed in the first row, and marker genes in the first column.

  • delimiter (str (default: ',')) – Delimiter used in the CSV file.

  • row_norm (bool (default: False)) – Flag to determine if row normalization is applied, default is False.

  • repl_columns (Optional[dict[str, str]] (default: None)) – Dictionary containing cell types to be replaced. The keys are the original cell type names and the values are their replacements.

  • del_celltypes (Optional[list[str]] (default: None)) – List of cell types to exclude from the possible cell type candidates. Cells are still scored for these cell types, but will not be assigned a cell type from this list.

  • input_dict (bool (default: False)) – If True, the marker gene list from the CSV file is treated as a dictionary with the first column being the cell type names and the subsequent columns being the marker genes for those cell types. Default is False.

  • celltype_column (str (default: 'annotation')) – The column name in the SpatialData object’s table that specifies the cell type annotations. The default value is _ANNOTATION_KEY.

  • overwrite (bool (default: False)) – If True, overwrites the output_layer if it already exists in sdata.

  • **kwargs (Any) – Additional keyword arguments passed to scanpy.tl.score_genes.
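The behaviour described for input_dict, repl_columns and del_celltypes can be illustrated with a small plain-Python sketch. It is not the library's code: the helper names, the example cell types, and the scores are hypothetical, and the dictionary-format CSV is assumed to have no header row.

```python
import csv
import io


def markers_from_dict_csv(text, delimiter=","):
    """Dictionary format: first column is the cell type, remaining columns are its markers.

    Assumes no header row (an assumption of this sketch).
    """
    markers = {}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue
        # Skip empty trailing cells so ragged rows are handled.
        markers[row[0]] = [g.strip() for g in row[1:] if g.strip()]
    return markers


def rename_celltypes(markers, repl_columns=None):
    """Rename cell types: keys are original names, values are replacements."""
    repl_columns = repl_columns or {}
    return {repl_columns.get(ct, ct): genes for ct, genes in markers.items()}


def assign_excluding(scores, del_celltypes=None):
    """Pick the highest-scoring cell type per cell, ignoring excluded cell types.

    Excluded cell types remain in `scores` (they were scored) but are never assigned.
    Assumes at least one non-excluded cell type per cell.
    """
    excluded = set(del_celltypes or [])
    result = {}
    for cell, per_type in scores.items():
        candidates = {ct: s for ct, s in per_type.items() if ct not in excluded}
        result[cell] = max(candidates, key=candidates.get)
    return result


markers = markers_from_dict_csv("T_cell,CD3E,CD3D\nB_cell,CD19,\n")
renamed = rename_celltypes(markers, {"T_cell": "T cell"})

# Made-up scores; "doublet" has the highest score but is excluded from assignment.
scores = {"cell_1": {"T cell": 0.2, "B_cell": 0.1, "doublet": 0.9}}
assigned = assign_excluding(scores, del_celltypes=["doublet"])
```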

Return type:

tuple[SpatialData, list[str], list[str]]

Returns:

A tuple containing:

  • Updated sdata.

  • A list of strings with all cell types that are scored (and are not in the del_celltypes list).

  • A list of strings with all cell types, some of which may not have been scored because their corresponding transcripts do not appear in the region of interest. _UNKNOWN_CELLTYPE_KEY is also included if it is detected.
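A call that unpacks the three return values might look like the sketch below. The layer names and the CSV path are placeholders chosen for illustration, and the wrapper function annotate_cells is hypothetical; only the sparrow.tb.score_genes signature comes from this page. An existing SpatialData object with a matching table and labels layer is assumed.

```python
def annotate_cells(sdata, marker_csv="markers.csv"):
    """Hypothetical wrapper showing how the documented return tuple can be unpacked."""
    # Imported lazily so this sketch can be defined without sparrow installed.
    import sparrow as sp

    sdata, scored_celltypes, all_celltypes = sp.tb.score_genes(
        sdata,
        labels_layer=["segmentation_mask"],   # placeholder layer name
        table_layer="table_transcriptomics",  # placeholder layer name
        output_layer="table_annotated",       # placeholder layer name
        path_marker_genes=marker_csv,         # placeholder path to a one-hot marker CSV
        delimiter=",",
        overwrite=True,
    )
    # scored_celltypes: cell types that were scored and are not in del_celltypes.
    # all_celltypes: all cell types, including ones that could not be scored.
    return sdata, scored_celltypes, all_celltypes
```

Writing to a separate output_layer, as in this sketch, leaves the input table untouched; setting output_layer equal to table_layer with overwrite=True would drop cells linked to other labels layers, as noted for labels_layer.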

Notes

The cell type _UNKNOWN_CELLTYPE_KEY is reserved for cells that could not be assigned a specific cell type.