sparrow.io.read_transcripts

sparrow.io.read_transcripts(sdata, path_count_matrix, transform_matrix=None, pixelSize=None, output_layer='transcripts', overwrite=False, debug=False, column_x=0, column_y=1, column_z=None, column_gene=3, column_midcount=None, delimiter=',', header=None, comment=None, crd=None, to_coordinate_system='global', filter_gene_names=None, blocksize='64MB')

Reads transcript information from a file with each row listing the x and y coordinates, along with the gene name.

If a transform matrix is provided, an affine transformation is applied to the coordinates of the transcripts. The transformation is applied to the dask dataframe before it is added to sdata. The SpatialData object is augmented with a points layer named output_layer that contains the transcripts.
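To make the effect of the affine transformation concrete, here is a minimal sketch in plain Python of how a 3x3 matrix of the form described under transform_matrix maps a single (x, y) coordinate. The helper apply_affine is hypothetical and is not part of sparrow; the library applies the transformation to a dask dataframe, and its implementation may differ.

```python
def apply_affine(matrix, x, y):
    """Apply a 3x3 affine matrix (nested lists, row-major) to one (x, y) point.

    Hypothetical helper for illustration only; not sparrow's implementation.
    """
    # Homogeneous coordinates: [x', y', 1]^T = M . [x, y, 1]^T
    x_new = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    y_new = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    return x_new, y_new


# Scale by Sx=2 and Sy=3, then translate by Tx=10 and Ty=20.
scale_translate = [
    [2.0, 0.0, 10.0],
    [0.0, 3.0, 20.0],
    [0.0, 0.0, 1.0],
]

# The identity matrix leaves coordinates unchanged.
identity = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

print(apply_affine(scale_translate, 5.0, 4.0))  # (20.0, 32.0)
print(apply_affine(identity, 5.0, 4.0))         # (5.0, 4.0)
```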

Parameters:

sdata (SpatialData) – The SpatialData object to which the transcripts will be added.
path_count_matrix (str | Path) – Path to a .parquet or .csv file containing the transcript information. Each row should contain an x coordinate (column_x), a y coordinate (column_y) and a gene name (column_gene). Optionally, a count column can be provided (see column_midcount).
transform_matrix (str | Path | ndarray | None (default: None)) – A 3x3 affine transformation matrix, given either as a numpy array or as a path to a file, in which case it is read via numpy.loadtxt. The matrix is applied to the coordinates of the transcripts before they are added as a points layer to sdata. If no transform matrix is specified, the identity matrix is used. E.g.:
    | Sx  0   Tx |
    | 0   Sy  Ty |
    | 0   0   1  |
output_layer (str (default: 'transcripts')) – Name of the points layer of the SpatialData object to which the transcripts will be added.
overwrite (bool (default: False)) – If True, overwrites the output_layer (a points layer) if it already exists.
debug (bool (default: False)) – If True, a sample of the data is processed for debugging purposes.
pixelSize (float | None (default: None)) – Pixel size in microns. If provided, a scaling transformation matrix is created based on this value. Ignored if transform_matrix is provided.
column_x (int (default: 0)) – Column index of the X coordinate in the count matrix.
column_y (int (default: 1)) – Column index of the Y coordinate in the count matrix.
column_z (int | None (default: None)) – Column index of the Z coordinate in the count matrix.
column_gene (int (default: 3)) – Column index of the gene information in the count matrix.
column_midcount (int | None (default: None)) – Specifies the column index that contains the count of how many times the gene is detected at that particular location. Ignored when set to None.
delimiter (str (default: ',')) – Delimiter used to separate values in the .csv file. Ignored if path_count_matrix is a .parquet file.
header (int | None (default: None)) – Row number to use as the header in the .csv file. If None, no header is used. Ignored if path_count_matrix is a .parquet file.
comment (str | None (default: None)) – Character indicating that the remainder of the line should not be parsed. If found at the beginning of a line, the line is ignored altogether. This parameter must be a single character. Ignored if path_count_matrix is a .parquet file.
crd (tuple[int, int, int, int] | None (default: None)) – The coordinates (in pixels) for the region of interest in the format (xmin, xmax, ymin, ymax). If None, all transcripts are considered.
to_coordinate_system (str (default: 'global')) – Coordinate system to which output_layer will be added.
filter_gene_names (str | list[str] | None (default: None)) – Gene names to filter out (matched via str.contains). These are typically added control genes that you do not want to use. Filtering is case insensitive.
blocksize (str (default: '64MB')) – Block size of the partitions of the dask dataframe stored as points_layer in sdata.
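The row-level effect of crd and filter_gene_names can be sketched in plain Python as follows. This is an illustration under stated assumptions, not sparrow's code: keep_transcript is a hypothetical helper, the region bounds are assumed to be inclusive, and the gene filter is approximated by a case-insensitive substring match (the library matches via str.contains on a dask dataframe, which may behave differently for special characters).

```python
def keep_transcript(x, y, gene, crd=None, filter_gene_names=None):
    """Return True if a transcript survives the region and gene-name filters.

    Hypothetical helper for illustration only; bounds are assumed inclusive.
    """
    if crd is not None:
        # crd follows the documented order: (xmin, xmax, ymin, ymax).
        xmin, xmax, ymin, ymax = crd
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            return False
    if filter_gene_names is not None:
        # Accept a single string as well as a list of strings.
        if isinstance(filter_gene_names, str):
            filter_gene_names = [filter_gene_names]
        gene_lower = gene.lower()
        # Case-insensitive substring match against each pattern.
        if any(name.lower() in gene_lower for name in filter_gene_names):
            return False
    return True


rows = [
    (5, 5, "Actb"),
    (50, 5, "Gapdh"),         # outside the region of interest
    (6, 7, "NegControl_1"),   # matches "negcontrol"
    (8, 9, "blank-03"),       # matches "BLANK"
]
kept = [
    row
    for row in rows
    if keep_transcript(*row, crd=(0, 10, 0, 10), filter_gene_names=["negcontrol", "BLANK"])
]
print(kept)  # [(5, 5, 'Actb')]
```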

Return type:

SpatialData

Returns:

The updated SpatialData object containing the transcripts.
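As a usage sketch, the snippet below writes a small headerless .csv file whose layout matches the default column indices (x in column 0, y in column 1, gene name in column 3) and wraps the call in a function. The file contents, the z value in column 2 and the helper names write_example_csv and add_transcripts are made up for illustration. The wrapper assumes that the package is importable as sparrow and that an existing SpatialData object is passed in; it is only defined here, not executed.

```python
import csv
import tempfile
from pathlib import Path


def write_example_csv(path):
    """Write a headerless CSV with columns: x, y, z, gene (illustrative data)."""
    rows = [
        (10.5, 20.0, 0, "Actb"),
        (11.0, 22.5, 0, "Gapdh"),
    ]
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter=",").writerows(rows)
    return rows


def add_transcripts(sdata, csv_path):
    """Hypothetical wrapper; assumes the package is importable as `sparrow`."""
    import sparrow  # imported lazily so the sketch loads without the package

    return sparrow.io.read_transcripts(
        sdata,
        path_count_matrix=csv_path,
        column_x=0,
        column_y=1,
        column_gene=3,
        delimiter=",",
        header=None,
        output_layer="transcripts",
        overwrite=True,
    )


csv_path = Path(tempfile.mkdtemp()) / "transcripts.csv"
rows = write_example_csv(csv_path)
# With an existing SpatialData object `sdata`:
# sdata = add_transcripts(sdata, csv_path)
```

Because no transform_matrix or pixelSize is passed in this sketch, the documented default (the identity matrix) applies and the coordinates are stored unchanged.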