sparrow.tb.score_genes
- sparrow.tb.score_genes(sdata, labels_layer, table_layer, output_layer, path_marker_genes, delimiter=',', row_norm=False, repl_columns=None, del_celltypes=None, input_dict=False, celltype_column='annotation', overwrite=False, **kwargs)
The function loads marker genes from a CSV file (or takes them from a pandas DataFrame) and scores each cell for every cell type with those markers, using scanpy’s sc.tl.score_genes function. Each cell is then annotated with the cell type that obtains the maximum score. Marker genes can be provided as a one-hot encoded matrix with cell types listed in the first row and marker genes in the first column, or in dictionary format. The function further allows renaming of cell types (the column names of the marker matrix) and deletion of specific cell types from the assignment candidates.
- Parameters:
  - sdata (SpatialData) – The SpatialData object.
  - labels_layer (list[str]) – The labels layer(s) of sdata used to select the cells via the _REGION_KEY in sdata.tables[table_layer].obs. Note that if output_layer is equal to table_layer and overwrite is True, cells in sdata.tables[table_layer] linked to other labels layers (via the _REGION_KEY) will be removed from sdata.tables[table_layer]. If a list of labels layers is provided, the cells of all these layers are scored together (e.g. multiple samples).
  - table_layer (str) – The table layer in sdata on which to perform the annotation.
  - output_layer (str) – The table layer in sdata to which the annotated table will be written.
  - path_marker_genes (str | Path | DataFrame) – Path to the CSV file containing the marker genes, or a pandas DataFrame with the same content. Unless input_dict is True, it should be a one-hot encoded matrix with cell types listed in the first row and marker genes in the first column.
  - delimiter (default: ',') – Delimiter used in the CSV file.
  - row_norm (bool (default: False)) – Whether to apply row normalization.
  - repl_columns (Optional[dict[str, str]] (default: None)) – Dictionary of cell types to rename. The keys are the original cell type names and the values are their replacements.
  - del_celltypes (Optional[dict[str]] (default: None)) – List of cell types to remove from the candidate cell types. Cells are still scored for these cell types, but are never assigned one of them.
  - input_dict (bool (default: False)) – If True, the marker gene file is treated as a dictionary: the first column holds the cell type names and the subsequent columns hold the marker genes for each cell type.
  - celltype_column (str (default: 'annotation')) – The column name in the table of the SpatialData object that holds the cell type annotations. The default value is _ANNOTATION_KEY.
  - overwrite (bool (default: False)) – If True, overwrites output_layer if it already exists in sdata.
  - **kwargs (Any) – Additional keyword arguments passed to scanpy.tl.score_genes.
- Return type:
  tuple[SpatialData, list[str], list[str]]
- Returns:
  A tuple containing:
  - The updated sdata.
  - A list of all cell types that are scored, excluding those in del_celltypes.
  - A list of all cell types, some of which may not be scored because their corresponding transcripts do not appear in the region of interest. _UNKNOWN_CELLTYPE_KEY is also added if it is detected.
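The two marker gene layouts described for path_marker_genes and input_dict can be illustrated with a small, made-up marker set. This is a sketch, not sparrow's own parser: it uses Python's csv module, and the helper names markers_from_one_hot and markers_from_dict_rows, the gene names, and the corner label "gene" in the one-hot header are hypothetical. It assumes that a value of 1 marks a marker gene and that the dictionary layout has no header row; under those assumptions both layouts reduce to the same mapping from cell type to marker genes.

```python
from __future__ import annotations

import csv
import io

# One-hot layout (input_dict=False): cell types in the first row, marker genes
# in the first column. The corner label "gene" is a hypothetical placeholder.
ONE_HOT_CSV = """gene,T_cell,B_cell
CD3E,1,0
CD3D,1,0
MS4A1,0,1
CD79A,0,1
"""

# Dictionary layout (input_dict=True): the first column holds the cell type,
# the remaining columns hold its marker genes (no header row assumed).
DICT_CSV = """T_cell,CD3E,CD3D
B_cell,MS4A1,CD79A
"""


def markers_from_one_hot(text: str, delimiter: str = ",") -> dict[str, list[str]]:
    """Parse a one-hot marker matrix into {cell_type: [marker genes]}."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    celltypes = rows[0][1:]  # first row, minus the corner cell, lists cell types
    markers: dict[str, list[str]] = {ct: [] for ct in celltypes}
    for row in rows[1:]:
        gene, flags = row[0], row[1:]
        for ct, flag in zip(celltypes, flags):
            if float(flag) == 1:  # assumed: 1 marks a marker gene
                markers[ct].append(gene)
    return markers


def markers_from_dict_rows(text: str, delimiter: str = ",") -> dict[str, list[str]]:
    """Parse rows of 'cell_type, gene, gene, ...' into {cell_type: [marker genes]}."""
    markers: dict[str, list[str]] = {}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if row:
            # Rows can be padded with empty fields when cell types have
            # different numbers of markers, so blank fields are skipped.
            markers[row[0]] = [g for g in row[1:] if g.strip()]
    return markers


print(markers_from_one_hot(ONE_HOT_CSV))
print(markers_from_dict_rows(DICT_CSV))
```

Both calls print {'T_cell': ['CD3E', 'CD3D'], 'B_cell': ['MS4A1', 'CD79A']}.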
Notes
The cell type _UNKNOWN_CELLTYPE_KEY is reserved for cells that could not be assigned a specific cell type.
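The scoring and assignment steps described for this function (score every cell type, optionally rename cell types, exclude deleted cell types from the candidates, take the maximum) can be sketched in plain Python. This is a simplified illustration, not sparrow's implementation: simple_score is a hypothetical stand-in for scanpy's sc.tl.score_genes, which subtracts the average expression of a reference set of control genes, randomly sampled per expression bin, rather than the mean over all genes. The rule that a cell whose best candidate score is not positive becomes unknown, and the UNKNOWN placeholder standing in for _UNKNOWN_CELLTYPE_KEY, are assumptions made only for this example.

```python
from __future__ import annotations

from statistics import mean

# Placeholder for _UNKNOWN_CELLTYPE_KEY; its real value is not given on this page.
UNKNOWN = "UNKNOWN"


def simple_score(expr: dict[str, float], genes: list[str]) -> float:
    """Mean expression of the present marker genes minus the mean over all genes.

    Simplified stand-in for sc.tl.score_genes, which subtracts the mean of
    sampled control genes instead of the mean over all genes.
    """
    present = [g for g in genes if g in expr]
    return mean(expr[g] for g in present) - mean(expr.values())


def annotate(
    cells: dict[str, dict[str, float]],
    markers: dict[str, list[str]],
    repl_columns: dict[str, str] | None = None,
    del_celltypes: list[str] | None = None,
) -> dict[str, str]:
    """Assign each cell the candidate cell type with the highest score."""
    repl_columns = repl_columns or {}
    del_celltypes = del_celltypes or []
    # Rename cell types as repl_columns describes (original name -> replacement).
    markers = {repl_columns.get(ct, ct): genes for ct, genes in markers.items()}
    labels: dict[str, str] = {}
    for cell, expr in cells.items():
        # Score every cell type with at least one marker present,
        # including the cell types listed in del_celltypes.
        scores = {
            ct: simple_score(expr, genes)
            for ct, genes in markers.items()
            if any(g in expr for g in genes)
        }
        # Deleted cell types are scored but never assigned.
        candidates = {ct: s for ct, s in scores.items() if ct not in del_celltypes}
        if not candidates:
            labels[cell] = UNKNOWN
            continue
        best = max(candidates, key=candidates.get)
        # Assumed rule for this sketch: no positive score means unknown.
        labels[cell] = best if candidates[best] > 0 else UNKNOWN
    return labels


cells = {
    "cell_1": {"CD3E": 5.0, "MS4A1": 0.0, "GAPDH": 1.0},
    "cell_2": {"CD3E": 0.0, "MS4A1": 4.0, "GAPDH": 2.0},
    "cell_3": {"CD3E": 1.0, "MS4A1": 1.0, "GAPDH": 1.0},
}
markers = {"T": ["CD3E"], "B": ["MS4A1"]}

print(annotate(cells, markers))
print(annotate(cells, markers, repl_columns={"T": "T_cell"}))
print(annotate(cells, markers, del_celltypes=["B"]))
```

With del_celltypes=["B"], cell_2 is still scored for B, but B is no longer a candidate; its best remaining score (T) is negative, so under the assumed rule it falls back to the unknown label.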