sparrow.io.read_transcripts
- sparrow.io.read_transcripts(sdata, path_count_matrix, path_transform_matrix=None, output_layer='transcripts', overwrite=False, debug=False, column_x=0, column_y=1, column_z=None, column_gene=3, column_midcount=None, delimiter=',', header=None, comment=None, crd=None, to_coordinate_system='global', filter_gene_names=None, blocksize='64MB')
Reads transcript information from a file with each row listing the x and y coordinates, along with the gene name.
If a transform matrix is provided, a linear transformation is applied to the coordinates of the transcripts. The transformation is applied to the Dask dataframe before it is added to sdata. The SpatialData object is augmented with a points layer named output_layer that contains the transcripts.
- Parameters:
sdata (SpatialData) – The SpatialData object to which the transcripts will be added.
path_count_matrix (str | Path) – Path to a .parquet file or .csv file containing the transcript information. Each row should contain x and y coordinates and a gene name. Optionally, a midcount column can be provided; if it is, rows are repeated according to the count value.
path_transform_matrix (Union[str, Path, None] (default: None)) – This file should contain a 3x3 transformation matrix for the affine transformation. The matrix defines the linear transformation to be applied to the coordinates of the transcripts. If no transform matrix is specified, the identity matrix will be used.
output_layer (str, default='transcripts') – Name of the points layer of the SpatialData object to which the transcripts will be added.
overwrite (bool, default=False) – If True, overwrites output_layer (a points layer) if it already exists.
debug (bool, default=False) – If True, a sample of the data is processed for debugging purposes.
column_x (int (default: 0)) – Column index of the X coordinate in the count matrix.
column_y (int (default: 1)) – Column index of the Y coordinate in the count matrix.
column_z (Optional[int] (default: None)) – Column index of the Z coordinate in the count matrix.
column_gene (int (default: 3)) – Column index of the gene information in the count matrix.
column_midcount (Optional[int] (default: None)) – Column index of the count value used to repeat rows in the count matrix. Ignored when set to None.
delimiter (str (default: ',')) – Delimiter used to separate values in the .csv file. Ignored if path_count_matrix is a .parquet file.
header (Optional[int] (default: None)) – Row number to use as the header in the CSV file. If None, no header is used. Ignored if path_count_matrix is a .parquet file.
comment (Optional[str] (default: None)) – Character indicating that the remainder of the line should not be parsed. If found at the beginning of a line, the line is ignored altogether. This parameter must be a single character. Ignored if path_count_matrix is a .parquet file.
crd (Optional[tuple[int, int, int, int]] (default: None)) – The coordinates (in pixels) of the region of interest, in the format (xmin, xmax, ymin, ymax). If None, all transcripts are considered.
to_coordinate_system (str (default: 'global')) – Coordinate system to which output_layer will be added.
filter_gene_names (Union[str, list, None] (default: None)) – Regular expression(s) matching gene names to filter out (via str.contains), typically added control genes that you do not want to use. If a list of strings is given, every item is treated as a regular expression. Filtering is case-insensitive.
blocksize (str (default: '64MB')) – Block size of the partitions of the Dask dataframe stored as the points layer in sdata.
- Return type:
SpatialData
- Returns:
The updated SpatialData object containing the transcripts.
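A minimal usage sketch of the signature documented above. It assumes that `sdata` is an existing SpatialData object and that a file named `transcripts.csv` (a hypothetical path) has one header row, with x, y, and the gene name in columns 0, 1, and 3. The control-gene patterns are illustrative only. The import is deferred into the helper so that the keyword arguments can be inspected without the package installed.

```python
from pathlib import Path

# Keyword arguments for read_transcripts. The path and the gene patterns
# are hypothetical placeholders; replace them with values for your data.
read_kwargs = {
    "path_count_matrix": Path("transcripts.csv"),
    "output_layer": "transcripts",
    "overwrite": True,
    "column_x": 0,
    "column_y": 1,
    "column_gene": 3,
    "delimiter": ",",
    "header": 0,  # first row of the CSV holds the column names
    "to_coordinate_system": "global",
    "filter_gene_names": ["NegControl", "Blank"],  # treated as regular expressions
}


def add_transcripts(sdata):
    """Add a points layer with transcripts to an existing SpatialData object."""
    # Imported here so the module loads even when sparrow is not installed.
    from sparrow.io import read_transcripts

    return read_transcripts(sdata, **read_kwargs)
```

Calling `add_transcripts(sdata)` returns the updated SpatialData object; the transcripts are then available under the points layer named by `output_layer`.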
Notes
This function reads a .csv file using Dask and applies a transformation matrix to the coordinates. It can also repeat rows based on the MIDCount value and can work in a debug mode that samples the data.
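The steps described in these notes can be illustrated with a small pure-Python sketch. This is not the library's implementation (which operates on a Dask dataframe). It assumes that the 3x3 matrix acts on homogeneous coordinates (x, y, 1), that a row with count n becomes n identical rows, and that gene filtering is a case-insensitive regular-expression search. The names `RAW`, `MATRIX`, `transform_point`, and `read_rows` are hypothetical.

```python
import csv
import io
import re

# Hypothetical CSV content: x, y, unused column, gene, midcount.
RAW = """10.0,20.0,a,Gapdh,2
30.0,40.0,b,NegControl1,1
50.0,60.0,c,Actb,1
"""

# Assumed convention: a 3x3 affine matrix in homogeneous coordinates.
# This one scales x and y by 2, then translates by (5, -5).
MATRIX = [
    [2.0, 0.0, 5.0],
    [0.0, 2.0, -5.0],
    [0.0, 0.0, 1.0],
]


def transform_point(matrix, x, y):
    """Apply a 3x3 affine matrix to the point (x, y, 1)."""
    new_x = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    new_y = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    return new_x, new_y


def read_rows(text, column_x=0, column_y=1, column_gene=3,
              column_midcount=None, filter_gene_names=None, matrix=None):
    """Return a list of (x, y, gene) tuples parsed from CSV text."""
    if isinstance(filter_gene_names, str):
        patterns = [filter_gene_names]
    else:
        patterns = filter_gene_names or []
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]

    rows = []
    for record in csv.reader(io.StringIO(text)):
        gene = record[column_gene]
        # Drop genes matching any pattern (case-insensitive search).
        if any(p.search(gene) for p in compiled):
            continue
        x, y = float(record[column_x]), float(record[column_y])
        if matrix is not None:
            x, y = transform_point(matrix, x, y)
        # Repeat the row according to its count value, if a count column is given.
        repeats = 1 if column_midcount is None else int(record[column_midcount])
        rows.extend([(x, y, gene)] * repeats)
    return rows


rows = read_rows(RAW, column_midcount=4, filter_gene_names="negcontrol", matrix=MATRIX)
for row in rows:
    print(row)
# Gapdh appears twice (count 2); NegControl1 is filtered out.
```

With these inputs, the Gapdh transcript at (10, 20) is mapped to (25.0, 35.0) and listed twice, while the Actb transcript at (50, 60) is mapped to (105.0, 115.0) and listed once.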