iGenomics: Free Genome Analysis For The iPhone
By Allison Proffitt
December 15, 2020 | Sequencing in the palm of your hand is now a reality. Oxford Nanopore’s MinION is a handheld sequencer that has been used “in the field” to sequence genomic material in Africa in the middle of the Ebola outbreak, in South America during the Zika outbreak, and even on the International Space Station. But a sequencer spitting out reads does not solve the analysis problem.
iGenomics can do that for you.
In a paper published in GigaScience (DOI: 10.1093/gigascience/giaa138), a team of researchers presents what they call the first comprehensive mobile genome analysis application, with capabilities to align reads, call variants, and visualize the results entirely on an iOS device. What started as a high school student’s internship project is now freely available in the App Store.
The concept is straightforward, Michael Schatz, senior author on the paper, told Bio-IT World. “DNA sequencers are getting really small. In fact, Oxford Nanopore manufactures a sequencer that fits in the palm of your hand.” But sequencers are “kind of dumb,” Schatz observed, shooting out raw data without any context. “It doesn’t tell you what species it is, what the variants are, how it relates to what’s known at a genetic level. So we wanted to develop an application that was super simple to use that could answer those questions.”
The project began about ten years ago when first author Aspyn Palatnick began interning with Schatz. Palatnick was a 14-year-old Cold Spring Harbor High School student, a self-taught coder who had already been creating (and selling) iOS apps for a few years; Schatz then co-led the Cancer Genetics & Genomics Program in the Cold Spring Harbor Laboratory Cancer Center.
“95% of the stuff going on in his lab was beyond me,” Palatnick remembers of the Schatz lab. “But through Mike’s mentorship, I learned a lot of algorithms and hardcore computer science concepts… Mike’s mentorship—in conjunction with my previous experience building… these apps with fun interfaces—sort of led to this realization that it was actually really possible for us to work together to build something impactful in the genomics space.”
The result—now available—will be very impactful, they believe.
iGenomics, available for both iPhone and iPad, takes input reads in either FASTA or FASTQ format. Reads can be loaded through Dropbox, Google Drive, AirDrop, or directly onto the phone. From a single interface, users choose the reads file, the reference file, and, optionally, a tab-delimited file annotating known mutations. After choosing the files to align, users either select the “analyze” button to align reads to the reference genome using the default parameters, or configure certain parameters before aligning.
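For readers unfamiliar with the two read formats, here is a minimal Python sketch of how such files can be parsed. It is an illustration only and is not taken from the iGenomics source; the function name `read_sequences` is hypothetical, and the sketch assumes plain-text input with four-line FASTQ records and no blank lines inside a record.

```python
import io
from typing import Iterator, TextIO, Tuple


def _first_token(header: str) -> str:
    """Return the record name: the header text up to the first whitespace."""
    parts = header.split()
    return parts[0] if parts else ""


def read_sequences(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA or four-line FASTQ text.

    Hypothetical helper for illustration. The format is guessed from the
    first non-blank line: '>' means FASTA, '@' means FASTQ.
    """
    lines = [line.rstrip("\r\n") for line in handle if line.strip()]
    if not lines:
        return
    if lines[0].startswith(">"):
        # FASTA: a header line followed by one or more sequence lines.
        name, chunks = None, []
        for line in lines:
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = _first_token(line[1:]), []
            else:
                chunks.append(line.strip())
        if name is not None:
            yield name, "".join(chunks)
    elif lines[0].startswith("@"):
        # FASTQ: header, sequence, '+' separator, quality string.
        for i in range(0, len(lines) - 3, 4):
            yield _first_token(lines[i][1:]), lines[i + 1].strip()
    else:
        raise ValueError("input does not look like FASTA or FASTQ")


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGT\n+\nIIII\n")
    for name, seq in read_sequences(demo):
        print(name, seq)
```

The quality strings in FASTQ input are simply skipped here; a real aligner may use them, and the article does not say how iGenomics treats them.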
After alignment is complete, users can explore the aligned reads by scrolling along the reference genome, using standard iOS gestures to move through the genome. Variations from the reference genome are highlighted in a different color; a long touch pops up the read name, edit distance of the alignment, gapped read and gapped substring of the reference, and more details. Additional views include coverage profile and mutation lists.
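Two of the quantities shown in these views, the edit distance of an alignment and the coverage profile, are simple to define. The Python sketch below illustrates both under stated assumptions: it uses the textbook Levenshtein definition of edit distance and takes alignments as hypothetical (start, length) pairs on a 0-based reference. It is not the method used inside iGenomics, which the article does not describe at this level.

```python
from typing import Iterable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def coverage_profile(ref_len: int,
                     alignments: Iterable[Tuple[int, int]]) -> List[int]:
    """Count how many aligned reads overlap each reference position.

    `alignments` holds (start, length) pairs, 0-based (an assumed layout).
    A difference array keeps the work linear in reads plus reference length.
    """
    diff = [0] * (ref_len + 1)
    for start, length in alignments:
        begin = max(0, start)
        end = min(ref_len, start + length)
        if begin < end:
            diff[begin] += 1
            diff[end] -= 1
    coverage, running = [], 0
    for delta in diff[:ref_len]:
        running += delta
        coverage.append(running)
    return coverage


if __name__ == "__main__":
    print(edit_distance("ACGTAC", "ACCTAC"))
    print(coverage_profile(6, [(0, 3), (2, 4)]))
```

Positions where coverage is low are the ones where a called variant deserves the most skepticism, which is one reason a coverage view sits alongside the mutation list.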
The team validated iGenomics’ functionality and accuracy against BWA-MEM and SAMtools using Ebola and Zika data. In all cases, iGenomics had a faster runtime than the desktop pipeline, likely due to a difference in how iGenomics and the desktop software store the alignments in memory, the authors write in the paper.
“We compare head to head with scientific software. The variants you call are just as accurate; the way you identify species is just as accurate; and it’s just as fast to use as it would be to use these scientific software packages on your server,” Schatz says.
Of course, there are limits to the size of genomes one can easily align and analyze on an iPhone. The team focused on viral and microbial genomes, validating the app with real and simulated datasets including public Ebola, Zika, and influenza genomes from Illumina and Oxford Nanopore. They observed a near-linear relationship between genome length and alignment runtime, with viral genomes of up to 18,000 bp analyzed in less than 5 seconds.
“In theory you could work with much, much larger genomes: microbes, different pathogens, maybe even a whole human genome. But at some point, it becomes a little bit silly to tie up your phone for hours and hours,” Schatz said.
Code Evolution
iGenomics was built over the years that Palatnick was in high school and then college at the University of Pennsylvania. “The bulk of the hard algorithmic work was done in high school and a large amount of the visualization work was done then and a bit during my freshman year in college,” Palatnick says. “In the few years after that, it was sort of just optimizing things.”
At Penn, Palatnick majored in Networks and Social Systems Engineering as an undergraduate and added a Master’s in Computer Science in his fifth year. Those years were optimization years for iGenomics. “As I became more experienced in computer science and started getting more formal training—even through Mike—a lot of the code became cleaner,” he said. “I wouldn’t write anything about the code being beautiful when I was 14-16,” Palatnick laughs. “It was not my prime code beauty time.”
Palatnick wasn’t the only one learning and improving. The past ten years have seen great power gains for iPhones and iPads. “As those got more powerful, our results just kept getting better and better,” he says. “Our results are probably 5-10 times better without even changing the core algorithms because iPhones have improved so much in their hardware and speed.”
And the computation is still improving. “Today’s really state-of-the-art methods that run on supercomputers can do [whole human genome] analysis in a few hours,” Schatz says. “A few years ago that took days and days and days; there’s been steady progress on both the hardware and the software.”
“The good news is the human genome is not getting any bigger. But our smart phones keep getting faster; new algorithms keep getting invented. There’s a lot of applications in deep learning using GPUs and other accelerated hardware… It is very much in our future to be able to do that analysis in a matter of a few seconds.”
Democratization of Genomics
iGenomics as published and available today is hardly a high school science project. “We think you can do real science with this even though it’s kind of fun that it’s an iPhone app, an iPad app that you can take with you,” Schatz says. Though he does concede that the functionalities may be most attractive to those who aren’t already doing genomics work. “I think the most immediate application—realistically—will be for students that maybe have no training in bioinformatics. Because users avoid the command line and can interact with results in a familiar iOS environment, it’s easy to learn and quick to get started.”
To kick off new users, the application comes pre-loaded with a list of critical mutations for influenza A that indicate which antiviral agents are most likely to be ineffective and a tutorial for looking at flu virus genomes and SARS-CoV-2 genomes.
“One of the notions with iGenomics is to make this idea of analyzing and studying DNA not some foreign concept that is only available to researchers at major institutions,” Palatnick says. “It’s meant to make it so that anybody that has an iPhone or iPad can just go to the App Store and look at some of the samples we provide and just study DNA on their own.”
The pair made iGenomics fully open source; full details are listed on GitHub. Palatnick, who just started a job with Facebook, plans to maintain the code indefinitely. Schatz has moved to Johns Hopkins University where he is the Bloomberg Distinguished Associate Professor in the Computational Biology & Medicine Group. He remains a CSHL adjunct associate professor of Quantitative Biology.
“Our goal with making it open source is really that we wanted to make it easy for this to become more and more advanced and to make it clear that mobile genomics and mobile DNA analysis is totally feasible,” Palatnick says. “Right now devices are fast enough. If you are looking at the flu on your phone and it takes one second to perform the analysis and let you browse around mutations and alignments, you can’t really complain about that.”
“Now that it’s very clear that DNA analysis can be done on an iPhone,” Palatnick says, “hopefully it’ll spark some motivation to say that once the sequencer is there, this will all be portable.”