Purpose - i.e. why gia?
gia was developed to split the difference between bedtools and bedops and be a
tool that can match both philosophies without sacrificing either efficiency or
convenience.
That being said, the author of gia was greatly inspired by both tools and has
used them extensively for years.
gia would not exist if not for the work of their authors and maintainers as well
as their meticulous documentation.
Philosophies
bedtools - utility over efficiency
bedtools is the original genome interval toolkit and is the go-to for many people.
It prioritizes convenience and utility over computational efficiency, and does that very well.
One of the major design choices for most of the tools in the toolkit is that the genome interval sets are loaded into memory completely before processing occurs.
This incurs substantial memory and computational overhead as genome interval sets grow, which is increasingly the case for large high-throughput genomic datasets.
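As a rough illustration of the load-everything-first approach, here is a minimal Python sketch. It is not how bedtools is implemented: the function names are hypothetical and a naive linear scan stands in for whatever index a real tool would build. The point is only that every interval is parsed into memory before any query runs, so the input does not need to be sorted, but memory use grows with the size of the interval set.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end), half-open coordinates as in BED files
Interval = Tuple[str, int, int]


def load_intervals(lines: Iterable[str]) -> List[Interval]:
    """Parse every BED-like line into memory before any processing happens."""
    intervals = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def overlaps_in_memory(query: Interval, targets: List[Interval]) -> List[Interval]:
    """Return every target overlapping the query; input order does not matter."""
    chrom, start, end = query
    return [t for t in targets if t[0] == chrom and t[1] < end and start < t[2]]


# Hypothetical, deliberately unsorted input.
bed_lines = ["chr1\t50\t80", "chr1\t10\t20", "chr2\t5\t15", "chr1\t15\t30"]
targets = load_intervals(bed_lines)  # the whole set now lives in memory
hits = overlaps_in_memory(("chr1", 12, 18), targets)
print(hits)
```

The convenience is that any operation can revisit any interval at any time; the cost is that `targets` holds the full dataset for the lifetime of the run.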
bedops - efficiency over utility
bedops came after bedtools and was built for computational efficiency.
Most of its methods center on pre-sorted data, and the computational and memory efficiency comes from the fact that everything is built around streams (i.e. intervals are assumed sorted and are kept in memory only for the absolute minimum amount of time required for the operation).
This leads to highly efficient streaming operations with a constant memory overhead, but introduces some inconveniences, as all inputs must be pre-sorted, and some functional limitations, as most of the set operations implicitly merge intervals on input.
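The streaming idea can be sketched in a few lines of Python. This is an illustrative generator, not bedops code: the function name is hypothetical, chromosomes are assumed to be ordered lexicographically, and touching (book-ended) intervals are merged along with overlapping ones. Because the input is sorted, only one pending interval is held at a time, so memory stays constant regardless of input size. The price is visible too: unsorted input has to be rejected.

```python
from typing import Iterable, Iterator, Optional, Tuple

# (chrom, start, end), half-open coordinates as in BED files
Interval = Tuple[str, int, int]


def merge_sorted_stream(intervals: Iterable[Interval]) -> Iterator[Interval]:
    """Merge overlapping intervals from a stream sorted by (chrom, start)."""
    current: Optional[Interval] = None  # the only interval kept in memory
    prev_key: Optional[Tuple[str, int]] = None
    for chrom, start, end in intervals:
        key = (chrom, start)
        if prev_key is not None and key < prev_key:
            raise ValueError(f"input not sorted at {chrom}:{start}-{end}")
        prev_key = key
        if current is not None and chrom == current[0] and start <= current[2]:
            # Overlaps (or touches) the pending interval: extend it.
            current = (chrom, current[1], max(current[2], end))
        else:
            # No overlap: the pending interval is final and can be emitted.
            if current is not None:
                yield current
            current = (chrom, start, end)
    if current is not None:
        yield current


sorted_input = [("chr1", 10, 20), ("chr1", 15, 30), ("chr1", 50, 80), ("chr2", 5, 15)]
merged = list(merge_sorted_stream(sorted_input))
print(merged)
```

Since the generator consumes any iterable, it could just as well be fed line by line from a file or pipe without ever materializing the full dataset.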
gia - both in a single tool
gia was built with the idea that both philosophies are useful for different
purposes and that the same operations and underlying implementations can be
shared.
By default, all tools load their inputs fully into memory, which allows for the
complete set of functionality available in bedtools with no expectation that
the dataset is sorted or merged a priori. Where relevant, an argument may be
passed to enable streaming, which performs highly efficient constant-memory
operations on pre-sorted inputs, as in bedops.
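
One way to picture a shared implementation behind both modes is the sketch below. It is written in Python for illustration only and does not reflect gia's actual code or command-line interface: `merge`, `_merge_kernel`, and the `stream` parameter are all hypothetical names. The default path loads and sorts the input in memory and then hands it to the same kernel that the streaming path calls directly on pre-sorted data.

```python
from typing import Iterable, Iterator, Optional, Tuple

# (chrom, start, end), half-open coordinates as in BED files
Interval = Tuple[str, int, int]


def _merge_kernel(sorted_intervals: Iterable[Interval]) -> Iterator[Interval]:
    """Shared core: merge overlapping intervals, assuming sorted input."""
    current: Optional[Interval] = None
    for iv in sorted_intervals:
        if current is not None and iv[0] == current[0] and iv[1] <= current[2]:
            current = (current[0], current[1], max(current[2], iv[2]))
        else:
            if current is not None:
                yield current
            current = iv
    if current is not None:
        yield current


def merge(intervals: Iterable[Interval], stream: bool = False) -> Iterator[Interval]:
    """Hypothetical front end; `stream` is an illustrative name, not a gia flag."""
    if stream:
        # Caller promises sorted input: pass it straight through, constant memory.
        return _merge_kernel(intervals)
    # Default: load everything, sort in memory, then reuse the same kernel.
    return _merge_kernel(sorted(intervals))


unsorted = [("chr1", 50, 80), ("chr1", 10, 20), ("chr2", 5, 15), ("chr1", 15, 30)]
default_result = list(merge(unsorted))                       # no sorting required
stream_result = list(merge(sorted(unsorted), stream=True))   # pre-sorted input
print(default_result == stream_result)
```

Under these assumptions the two modes differ only in who is responsible for ordering the data, which is what lets the operation itself be written once.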