Assignment
Now that we have the PCR sequences aligned to their respective guides we can perform the assignment step.
Summary
The assignment step will be done with geomux
.
This is a tool I created based on a hypergeometric test which can assign cells to guides with high confidence.
The intuition behind the mathematics is that every cell with an sgRNA will have a background distribution (or sequencing noise) and a signal distribution (i.e. its true sgRNAs). The hypergeometric test is used to identify what sgRNA abundance occurs above random chance, and then a ratio test is used to select for cells with only a single signal sgRNA (i.e. filter against high multiplicity of infection).
Installation
We can install geomux
directly from its github repo.
pip install git+https://github.com/noamteyssier/geomux
Usage
We can use geomux
directly from the command line, or we can use it from
a jupyter notebook.
I prefer to run it from the commandline and will include those instructions here.
If you prefer to use it from a jupyter notebook, check out the tutorial on the repo.
Creating an assignment directory
First we create a directory to hold our assignment results.
mkdir -p meta/assignment/
Running geomux
We can now run geomux
with the following command:
geomux \
-i alignment/pcr/sampleX/counts_filtered/adata.h5ad \
-o meta/assignment/sampleX.tab
This will create a tab-separated file (*.tab
) which will have the cell-barcode,
associated guide, and statistics about the assignment provided.
Intersecting the 10X sequencing data and the Assignments
We have now done all the steps necessary to intersect the 10X single-cell sequencing data with the enrichment PCR.
We can now just load in the 10X sequencing data and merge the dataframes together to have our cells assigned to their respective sgRNAs.
The next step will assume you have scanpy
installed already.
import pandas as pd
import scanpy as sc
# load assignments
assignments = pd.read_csv(
"meta/assignment/sampleX.tab",
sep="\t",
index_col="barcode"
)
# load anndata (10X)
adata = sc.read(
"alignment/10x/sampleX/counts_filtered/adata.h5ad"
)
# merge the two
adata.obs = adata.obs.merge(
assignment,
left_index=True,
right_index=True,
how="left"
)
# drop cells that do not have an assignment
adata = adata.obs[~adata.obs.guide.isna()].copy()
# view the merged anndata
adata