| R Allan Barker science | technology | history | philosophy + curiosity |
Last update April 28, 2025
Incredibly Fast Genome Assembly and Variant Calling

A human genome is very long, and every human has two of them, one paternal and one maternal. Each genome has roughly 3 billion DNA bases, for a total of 6 billion. If each base were roughly the size of an ant, your genomes would circle the Earth. However, many DNA scanning machines produce only small pieces about 100-300 bases long. Analyzing just one human's DNA requires billions of pieces and nearly a trillion bases.

These pieces form a puzzle that can take days to assemble and then search for the millions of variants that make you you. Those millions of variants make assembly more difficult, like assembling a jigsaw puzzle with many wrong pieces. Moreover, the paternal and maternal genomes may have different wrong pieces.

Illumina, the world's leading maker of DNA scanning equipment, recently announced a series of new machines that will lead to $100 genomes and process up to one genome per hour. However, Illumina's CEO acknowledged there is currently no way to process genomes in such a short time. As the old saw goes, the difficult we do immediately, the impossible takes a bit longer.

The video below shows the complete assembly and variant calling of a full human genome in just minutes.

Video contents in order:
The assembly window in part two shows:
Video: Fullscreen Recommended
Discussion

The software runs on ordinary PC/GPU hardware and can be deployed anywhere at low cost. The underlying algorithm can be described as an indexed neural network. The same method applies to a number of computational problems where the data can be described in terms of features. The novel aspect of applying it to genome assembly is the ability to fuse the alignment and variant calling steps into a single application, eliminating a multitude of intermediate data processing steps.
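To make the idea of fusing alignment and variant calling more concrete, here is a toy sketch in Python. It is not the indexed neural network described above, whose details are not given here. It swaps in a plain k-mer hash index as a stand-in, and every name in it (`build_index`, `place_and_call`, the seed length `K`, the sample sequences) is a hypothetical chosen for illustration. It only shows how a read can be placed by index lookup and its differences from the reference reported in the same pass, with no intermediate alignment file.

```python
from collections import defaultdict

K = 4  # toy seed length (assumption; real tools use much longer seeds)


def build_index(reference, k=K):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def place_and_call(read, reference, index, k=K, max_mismatches=2):
    """Place one read by seed lookup and report mismatches in the same pass.

    Returns (start, variants), where variants is a list of
    (reference_position, reference_base, read_base) tuples, or None when
    no placement has at most max_mismatches differences.
    """
    best = None
    # Try every k-mer of the read as a seed, so that a variant inside
    # one seed does not prevent the read from being placed.
    for offset in range(len(read) - k + 1):
        for hit in index.get(read[offset:offset + k], []):
            start = hit - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            variants = [
                (start + i, reference[start + i], base)
                for i, base in enumerate(read)
                if reference[start + i] != base
            ]
            if len(variants) > max_mismatches:
                continue
            if best is None or len(variants) < len(best[1]):
                best = (start, variants)
    return best


reference = "ACGTTGCAAGGCTTAACCGT"
index = build_index(reference)

read = "GCAAGGATTAAC"  # equals reference[5:17] except for one base
result = place_and_call(read, reference, index)
print(result)  # (5, [(11, 'C', 'A')])
```

The sketch handles only single-base substitutions on one strand of a single reference. Insertions, deletions, sequencing errors, and telling paternal from maternal variants are all left out.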