Requirements

Before starting you’ll need to make sure you have the following files prepared:

  • Reference library ( *.fasta / *.fasta.gz )
  • Sample Libraries ( *.fastq / *.fastq.gz )
  • sgRNA to Gene Mapping ( *.txt / *.tsv / *.tab ) [ optional ]

Reference Library

This should be a FASTA formatted file, and can have any file extension (*.fasta, *.fa) and their gzip counterparts.

This is expected to the the variable regions of your sgRNAs - and should not have any adapter regions on either side.

Here is tiny example library:

>lib.0
ATAGCCCGGCGGTCTGCTGG
>lib.1
TAAGGCACTATAGCAATGAG
>lib.2
GTAGATAAAACGTGTGGCCC
>lib.3
AAGGCGACCATCTACCCTTG

Sample Library

These are the larger FASTQ formatted files which represent the results of your screen. They can have any file extension (*.fastq / *.fq ) and their gzip counterparts.

This are not expected to be the variable regions, so don’t worry about pretrimming these sequences before running them with sgcount.

Here is a tiny example sample:

@seq.AACGTTCTCCAGTATGAAAG.0
ATNGCAACGTTCTCCAGTATGAAAGTAGCGACAAGACGGGCCAAGAGGGACTGCGCACCACGTAGTTACCCCGATCCTAT
+
43212322322242515413324331541432414553224213511111344532442224113253532413451225
@seq.CGGTTCCCTGCCGCTACGAG.1
ATNGCCGGTTCCCTGCCGCTACGAGTAGCGACAAGACGGGCCAAGAGGGACTGCGCACCACGTAGTTACCCCGATCCTAT
+
23233555242215242532355415123114534342422111212445152424453152255425331534444213
@seq.CTCGCCGCGCGGCACTATTG.2
ATNGCCTCGCCGCGCGGCACTATTGTAGCGACAAGACGGGCCAAGAGGGACTGCGCACCACGTAGTTACCCCGATCCTAT
+
54532443112431133412311213532322244241224451345215242125451241523232121145343513
@seq.TATAGACATATTATACGTCC.3
ATNGCTATAGACATATTATACGTCCTAGCGACAAGACGGGCCAAGAGGGACTGCGCACCACGTAGTTACCCCGATCCTAT
+
33231435244335232142144245314521453354531535215154523311555133141253412544112225
@seq.GGTTTGTTACGCGAGCAGTT.4
ATNGCGGTTTGTTACGCGAGCAGTTTAGCGACAAGACGGGCCAAGAGGGACTGCGCACCACGTAGTTACCCCGATCCTAT
+
52245315235112214142511531543122452153335313154325215245554114252235434421423233

sgRNA to Gene Mapping

Generally, and especially for v2 libraries, you have multiple sgRNAs for each target gene.

If you’d like to include the sgRNA → Gene information in the final output table (like if you were planning on doing differential expression next!) you can include that information with a two-column tab-delim file.

This <s2g> is a two-column tab-delim file where the first column is the sgRNA name as it appears in your <input_library> and the second column is the gene that it targets (or any alias you would like to assign it).

sgrna_1	AP2S1
sgrna_2	AP2S1
sgrna_3	RFX3
sgrna_4	RFX3
sgrna_5	LDB1
sgrna_6	LDB1
sgrna_7	LDB1

If you don’t have this table, and you don’t know how to make it easily with a bash script, I invite you to checkout my tool fxtools where I’ve written in a simple tool to extract the gene names from sgRNA libraries.