[ fxtools extract-variable ]

Summary

This command extracts the variable region from each sequence in an input fastx (FASTA or FASTQ) file and writes those regions to the output fastx.

Expected Input Sequences

The command assumes that all sequences are the same length and that each is prefixed and suffixed by a largely constant nucleotide region (for example, CRISPRi/a libraries with a constant adapter sequence on either side of a highly variable region).

[prefix][variable][suffix]
[prefix][variable][suffix]
           ...
[prefix][variable][suffix]

Expected Output Sequences

Each output sequence contains only the contiguous region of the input sequence identified as variable, that is, the positions whose entropy is high relative to the other positions in the sequence.

[variable]
[variable]
   ...
[variable]

How it Works

The command calculates the entropy of the nucleotides at each position across the input sequences, then applies a z-score threshold to those entropies to determine a contiguous variable region. The bounds of that region are then used to write the output sequences.
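
The steps above can be sketched in Python. This is a minimal illustration, not the fxtools implementation: the function names are hypothetical, and it assumes Shannon entropy in bits, a population standard deviation, an inclusive (>=) comparison against the threshold, and that the variable region runs from the first to the last position that passes the threshold.

```python
import math
from collections import Counter


def positional_entropy(seqs):
    """Shannon entropy (in bits) of the nucleotides at each position."""
    entropies = []
    for i in range(len(seqs[0])):
        counts = Counter(seq[i] for seq in seqs)
        total = sum(counts.values())
        entropies.append(
            -sum((c / total) * math.log2(c / total) for c in counts.values())
        )
    return entropies


def variable_bounds(entropies, z_threshold=1.0):
    """Half-open (start, end) span from the first to the last passing position.

    Assumption: population standard deviation and an inclusive comparison.
    Returns None when no position passes or all entropies are identical.
    """
    n = len(entropies)
    mean = sum(entropies) / n
    std = math.sqrt(sum((h - mean) ** 2 for h in entropies) / n)
    if std == 0:
        return None
    passing = [
        i for i, h in enumerate(entropies) if (h - mean) / std >= z_threshold
    ]
    if not passing:
        return None
    return passing[0], passing[-1] + 1


def extract_variable(seqs, z_threshold=1.0):
    """Slice every sequence to the bounds of the variable region."""
    bounds = variable_bounds(positional_entropy(seqs), z_threshold)
    if bounds is None:
        raise ValueError("no variable region found at this threshold")
    start, end = bounds
    return [seq[start:end] for seq in seqs]


# Toy input: constant 8-nt prefix and suffix around a 4-nt variable region.
seqs = ["ACGTACGT" + v + "TTGGCCAA" for v in ("AAAA", "CCCC", "GGGG", "TTTT")]
print(extract_variable(seqs))  # ['AAAA', 'CCCC', 'GGGG', 'TTTT']
```

In the toy input the constant positions have zero entropy and the four variable positions have 2 bits each, giving them a z-score of 2.0, so they clear the default threshold of 1.0.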

Parameters

By default, output is written to stdout; provide an output file with the -o flag. Use the -n flag to set how many sequences are used to calculate the entropy, and the -z flag to set the z-score threshold for your data.

Note:

The default z-score threshold is arbitrary. If you have a small number of sequences, try reducing the threshold to 0.5 and see if that helps.
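
To see why a lower threshold can help, consider a toy entropy profile in which the edges of the variable region are less diverse than its center, as might happen when few sequences are sampled. The sketch below is illustrative only: the entropy values are made up, the function name is hypothetical, and it assumes a population standard deviation and an inclusive comparison, which may differ from what fxtools does.

```python
import statistics


def passing_span(entropies, z_threshold):
    """Half-open span from the first to the last position whose z-score passes."""
    mean = statistics.mean(entropies)
    std = statistics.pstdev(entropies)  # population standard deviation (assumption)
    passing = [
        i for i, h in enumerate(entropies) if (h - mean) / std >= z_threshold
    ]
    return (passing[0], passing[-1] + 1) if passing else None


# Hypothetical per-position entropies: 4 constant positions, a variable
# region with weaker edges, then 4 more constant positions.
entropies = [0.0] * 4 + [1.0, 2.0, 2.0, 1.0] + [0.0] * 4

print(passing_span(entropies, 1.0))  # (5, 7): the weaker edge positions are dropped
print(passing_span(entropies, 0.5))  # (4, 8): the full variable region is kept
```

Here the edge positions have a z-score of about 0.65, so they fail a threshold of 1.0 but pass a threshold of 0.5.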

Usage

fxtools extract-variable \
  -i <input_fastx> \
  -o <output_fastx> \
  -n <number of sequences to use in fitting entropy [default: 5000]> \
  -z <z-score threshold to use [default: 1.0]>