sparrow.pl.analyse_genes_left_out

sparrow.pl.analyse_genes_left_out(sdata, labels_layer, table_layer, points_layer='transcripts', to_coordinate_system='global', name_x='x', name_y='y', name_gene_column='gene', output=None)

Analyse and visualize, per gene, the proportion of transcripts that could not be assigned to a cell during the allocation step.

Parameters:
  • sdata (SpatialData) – Data containing spatial information for plotting.

  • labels_layer (str) – The layer in sdata that contains the segmentation masks. It is used to calculate the crd (region of interest) that was used in the segmentation step. Without this restriction, the transcript counts in points_layer of sdata (which contains all transcripts) and the counts obtained via sdata.tables[table_layer] would not be comparable. It is also used to select the cells in sdata.tables[table_layer] that are linked to this labels_layer via the _REGION_KEY.

  • table_layer (str) – The table layer in sdata on which to perform analysis.

  • points_layer (str (default: 'transcripts')) – The layer in sdata containing transcript information.

  • to_coordinate_system (str (default: 'global')) – The coordinate system that holds labels_layer and points_layer.

  • name_x (str (default: 'x')) – The column name representing the x-coordinate in points_layer.

  • name_y (str (default: 'y')) – The column name representing the y-coordinate in points_layer.

  • name_gene_column (str (default: 'gene')) – The column name representing the gene name in points_layer.

  • output (Union[str, Path, None] (default: None)) – The path to save the generated plots. If None, plots will be shown directly using plt.show().

Return type:

DataFrame

Returns:

A DataFrame containing, for each gene, the proportion of transcripts kept, the raw counts (i.e. obtained from points_layer of sdata), and the log of the raw counts.
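To make the returned quantities concrete, the following sketch computes them by hand on toy data. It is not sparrow's implementation: the gene names, the counts, the keys raw_counts, proportion_kept and log_raw_counts, and the choice of the natural logarithm are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data (not produced by sparrow):
# one gene name per transcript in the points layer, restricted to the
# segmented region, and one gene name per transcript assigned to a cell.
raw_transcripts = ["A"] * 100 + ["B"] * 50 + ["C"] * 10
assigned_transcripts = ["A"] * 80 + ["B"] * 45 + ["C"] * 2

raw_counts = Counter(raw_transcripts)
assigned_counts = Counter(assigned_transcripts)

# Per-gene summary; the key names are illustrative assumptions.
summary = {
    gene: {
        "raw_counts": n_raw,
        "proportion_kept": assigned_counts[gene] / n_raw,
        # Natural log is an assumption; the documentation does not state the base.
        "log_raw_counts": math.log(n_raw),
    }
    for gene, n_raw in raw_counts.items()
}

for gene, row in summary.items():
    print(gene, row)
```

In this toy example gene C keeps only 2 of its 10 transcripts, so it would stand out as a gene that is largely left out.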

Raises:

AttributeError – If the provided sdata does not contain the necessary attributes (i.e., ‘labels’ or ‘points’).

Notes

This function produces two plots:
  • A scatter plot of the log of raw gene counts vs. the proportion of transcripts kept.

  • A regression plot for the same data with Pearson correlation coefficients.

The function also prints the ten genes with the highest proportion of transcripts filtered out.
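The quantities behind the regression plot and the printed gene list can be sketched as follows. This is a standalone illustration under stated assumptions, not the function's actual code: the per-gene values are made up, the helper pearson is hypothetical, and ranking by the lowest proportion kept is assumed to be equivalent to "highest proportion filtered out".

```python
import math

# Hypothetical per-gene values: (log of raw counts, proportion of transcripts kept).
genes = {
    "A": (math.log(100), 0.80),
    "B": (math.log(50), 0.90),
    "C": (math.log(10), 0.20),
    "D": (math.log(20), 0.50),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [values[0] for values in genes.values()]
ys = [values[1] for values in genes.values()]
r = pearson(xs, ys)

# Genes with the lowest proportion kept, i.e. the most filtered out (at most ten).
top_filtered = sorted(genes, key=lambda gene: genes[gene][1])[:10]

print(f"Pearson r = {r:.3f}")
print("Most filtered-out genes:", top_filtered)
```

A positive correlation in such a plot would suggest that low-abundance genes lose a larger share of their transcripts during allocation.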