Background

Set Theory on the Genome

Genomic Intervals

The core piece of genomic interval arithmetic is the interval object:


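// Field types are illustrative; conceptually only these three attributes are required.
struct GenomicInterval {
    chr: String, // chromosome name, e.g. "chr1"
    start: u64,  // start position on the chromosome
    end: u64,    // end position on the chromosome
}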

This is an abstract object with, at minimum, three attributes: the chromosome it lies on and its start and end positions along that chromosome.
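As a minimal sketch, assuming half-open [start, end) coordinates and the hypothetical field names chr, start and end, the object can be made concrete in Rust with two basic queries: a length and an overlap test. The overlap test shows why the chromosome attribute matters: intervals on different chromosomes never overlap, whatever their coordinates.

```rust
/// Hypothetical concrete interval type; names and types are illustrative.
/// Coordinates are assumed to be half-open, i.e. [start, end).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GenomicInterval {
    pub chr: String,
    pub start: u64,
    pub end: u64,
}

impl GenomicInterval {
    pub fn new(chr: &str, start: u64, end: u64) -> Self {
        assert!(start <= end, "start must not exceed end");
        Self { chr: chr.to_string(), start, end }
    }

    /// Number of positions covered under the half-open convention.
    pub fn size(&self) -> u64 {
        self.end - self.start
    }

    /// Two intervals overlap only if they share a chromosome and
    /// each one starts before the other ends.
    pub fn overlaps(&self, other: &Self) -> bool {
        self.chr == other.chr && self.start < other.end && other.start < self.end
    }
}
```

Under the half-open convention, intervals that merely touch (one ends exactly where the other starts) do not overlap.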

Genomic Interval Sets

A collection of genomic intervals can be treated as a set, which, in brief, is a collection of objects that share some defining property.

There is a branch of mathematics known as set theory, which describes a range of operations on sets, such as union, intersection, difference, and complement.
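Union is the one operation named here that has no diagram. A minimal sketch, assuming both sets lie on a single chromosome and are given as hypothetical half-open (start, end) pairs: pool the intervals, sort them by start, and sweep once, merging each interval into the previous one whenever they overlap or abut. Whether abutting (book-ended) intervals should be merged is a design choice; this sketch merges them.

```rust
/// Union of two interval sets on a single chromosome.
/// Intervals are hypothetical half-open (start, end) pairs; inputs may be
/// unsorted and may overlap.
pub fn union(a: &[(u64, u64)], b: &[(u64, u64)]) -> Vec<(u64, u64)> {
    // Pool both sets and sort by start (ties broken by end).
    let mut all: Vec<(u64, u64)> = a.iter().chain(b.iter()).copied().collect();
    all.sort_unstable();

    let mut merged: Vec<(u64, u64)> = Vec::with_capacity(all.len());
    for (start, end) in all {
        if let Some(last) = merged.last_mut() {
            // Overlapping or book-ended: extend the current merged interval.
            if start <= last.1 {
                last.1 = last.1.max(end);
                continue;
            }
        }
        // Gap (or first interval): begin a new merged interval.
        merged.push((start, end));
    }
    merged
}
```

For example, union(&[(1, 5), (10, 20)], &[(3, 8), (20, 25), (30, 40)]) yields [(1, 8), (10, 25), (30, 40)].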

Some examples of these are shown below:

Intersection

This generates all regions covered by both interval sets, that is, the stretches where an interval in a overlaps an interval in b.

(a)   x---------y    x-----------y
(b)     i--j  i--------j    i--------j
========================================
        i--j  i-y    x-j    i----y
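One way to compute this is a two-pointer sweep. The sketch below assumes a single chromosome, hypothetical half-open (start, end) pairs, and that each input set is sorted by start with no overlaps inside it (for example, after merging). Each step emits the overlap of the current pair, if any, then advances whichever interval ends first, since it cannot overlap anything further along.

```rust
/// Intersection of two interval sets on a single chromosome.
/// Assumes each input is sorted by start and internally non-overlapping;
/// intervals are hypothetical half-open (start, end) pairs.
pub fn intersect(a: &[(u64, u64)], b: &[(u64, u64)]) -> Vec<(u64, u64)> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        let (a_start, a_end) = a[i];
        let (b_start, b_end) = b[j];
        // Any overlap runs from the later start to the earlier end.
        let start = a_start.max(b_start);
        let end = a_end.min(b_end);
        if start < end {
            out.push((start, end));
        }
        // The interval that ends first cannot overlap any later interval.
        if a_end < b_end {
            i += 1;
        } else {
            j += 1;
        }
    }
    out
}
```

Reading the diagram's columns as coordinates, a = [(6, 16), (21, 33)] and b = [(8, 11), (14, 23), (28, 37)] give [(8, 11), (14, 16), (21, 23), (28, 33)], matching the result row.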

Difference

This generates all regions of a that are not covered by b. This is the difference, or relative complement, of a with respect to b.

(a)  x----------y   x------------y
(b)     i--j  i--------j    i--------j
========================================
     x--i  j--i        j----i    
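A minimal sketch, again assuming a single chromosome, hypothetical half-open (start, end) pairs, and that b is sorted by start and non-overlapping. For each interval in a, a cursor walks forward through the b intervals, emitting any uncovered gap before each one and any uncovered tail at the end. For clarity this rescans b for every interval in a; a production version would advance through both sets together, as in the intersection sweep.

```rust
/// Difference a \ b on a single chromosome: the parts of each interval in `a`
/// not covered by any interval in `b`. Assumes `b` is sorted by start and
/// non-overlapping; intervals are hypothetical half-open (start, end) pairs.
pub fn subtract(a: &[(u64, u64)], b: &[(u64, u64)]) -> Vec<(u64, u64)> {
    let mut out = Vec::new();
    for &(a_start, a_end) in a {
        // `cursor` is the first position of this interval not yet handled.
        let mut cursor = a_start;
        for &(b_start, b_end) in b {
            if b_end <= cursor {
                continue; // b ends before the unhandled part begins
            }
            if b_start >= a_end {
                break; // b, and every later b, starts after this interval ends
            }
            if b_start > cursor {
                out.push((cursor, b_start)); // uncovered gap before b
            }
            cursor = b_end; // b_end > cursor here, so the cursor only moves forward
            if cursor >= a_end {
                break; // the rest of this interval is covered
            }
        }
        if cursor < a_end {
            out.push((cursor, a_end)); // uncovered tail
        }
    }
    out
}
```

With the diagram's columns as coordinates, a = [(5, 16), (20, 33)] and b = [(8, 11), (14, 23), (28, 37)] give [(5, 8), (11, 14), (23, 28)].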

Complement

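This generates all regions within the span (s) of a that are not covered by any interval in a, that is, the gaps between its intervals. This is the absolute complement of the set, with the span playing the role of the universe.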

(s)  s----------------------------s
========================================
(a)  x-----y   x------y    x------y
========================================
           y---x      y----x    
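When the set is sorted by start and non-overlapping, its span runs from the first start to the last end, so the complement within that span is just the gaps between neighbouring intervals. A minimal sketch, assuming a single chromosome and hypothetical half-open (start, end) pairs:

```rust
/// Complement of an interval set within its own span on a single chromosome.
/// Assumes `a` is sorted by start and non-overlapping; intervals are
/// hypothetical half-open (start, end) pairs.
pub fn complement(a: &[(u64, u64)]) -> Vec<(u64, u64)> {
    // Compare each interval with its right-hand neighbour; a gap exists
    // when one interval ends strictly before the next begins.
    a.windows(2)
        .filter(|w| w[0].1 < w[1].0)
        .map(|w| (w[0].1, w[1].0))
        .collect()
}
```

With the diagram's columns as coordinates, a = [(5, 11), (15, 22), (27, 34)] gives [(11, 15), (22, 27)]. Book-ended neighbours leave no gap, and an empty or single-interval set has an empty complement.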

Genomic Interval Arithmetic

Genomic interval arithmetic applies these set-theoretic operations to genomic regions, and it is useful for a wide range of bioinformatics analyses, such as finding which intervals in one set (for example, peaks) overlap those in another (for example, gene promoters).