The Prosecution RESTs
By Aaron Kitzmiller
October 8, 2012 | Inside the Box Guest Column | Life Technologies has recently delivered version 3.0 of the Torrent Suite software. If you’re not familiar with it, the Torrent Suite bestrides the Ion Torrent PGM and the outside world by managing data and processing on the Torrent Server, a hardware partner that comes with the PGM. The software release includes updates to the RESTful API, a set of URL commands and JSON data objects that allow interrogation of the analysis results along with setup of experimental runs. A RESTful API is not novel; particularly since the rise of AJAX applications, it has become fairly common. What is novel is that, finally, in an industry chronically plagued with data integrity problems, a simple, modern API is available.
Ever since the banks of capillary sequencers started cranking out the last gigabases of the human genome, genomics labs have been in sore need of APIs that could set up experiment runs, collect the results and even drive execution. A typical modern genomics lab may be running highly multiplexed sequencing experiments on instruments generating a gigabase per run; confirming with qPCR; integrating BioAnalyzer and Nanodrop results; and tying results together with downstream analysis. If you layer on chronic staffing pressure, it becomes clear that a powerful mission-control software hub could be built if only instruments would expose experiment setup and data collection APIs.
An argument can be made that genomics instruments have been using file-based APIs for many years. The venerable capillary sequencer used sample sheets to drive runs and provided tracefiles laden with metadata. A decade ago, files were really the only language-independent API available, despite the pitfalls of representing types in text. While, in the right hands, files are a perfectly reasonable data communication mechanism, genomics instrument companies have not consistently treated them as the API that they are. Directories have been shuffled and reshuffled; objects and elements have been renamed and refactored; formats have been tweaked and tossed.
In the intervening years, first SOAP, then the simpler REST, have provided global standards for language- and location-independent APIs. A desktop or server-based system written in Ruby, C#, Perl, or Python could easily set up instrument runs and collect results via POST/GET/PUT operations and JSON objects. Sample name, customer, project, organism, plate and all the other crucial metadata needed to tie together the results of a modern genomics lab could be passed seamlessly from instrument input to output. Reruns of failed experiments could be initiated from home. Results could be QC’d between talks at a meeting.
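To make the idea concrete, here is a minimal Python sketch of what such a client might look like. It only builds the HTTP requests and does not send them. The base URL, the `/runs/` resource paths, and the field names are all invented for illustration; they are not taken from any vendor's API.

```python
import json
import urllib.request

# Hypothetical base URL; a real instrument server would define its own.
BASE_URL = "http://instrument.example.org/api/v1"


def build_run_request(sample_sheet, run_id=None):
    """Build (but do not send) a request that creates or updates a planned run.

    With no run_id the request is a POST that creates a new run;
    with a run_id it is a PUT that updates that existing run.
    """
    if run_id is None:
        url, method = f"{BASE_URL}/runs/", "POST"
    else:
        url, method = f"{BASE_URL}/runs/{run_id}/", "PUT"
    body = json.dumps(sample_sheet).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def build_results_request(run_id):
    """Build a GET request for the results of a run (hypothetical path)."""
    return urllib.request.Request(f"{BASE_URL}/runs/{run_id}/results/", method="GET")


# Illustrative metadata; the keys are assumptions, not a real schema.
sample_sheet = {
    "sample_name": "S-001",
    "customer": "Lab A",
    "project": "Pilot",
    "organism": "E. coli",
    "plate": "P1",
}
request = build_run_request(sample_sheet)
# Against a real server, urllib.request.urlopen(request) would send it.
```

The point of the sketch is how little is involved: the same metadata dictionary that describes the sample becomes the request body, so nothing has to be retyped between the lab's tracking system and the instrument.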
While I was first at Helicos, then at Life Technologies (prior to the Ion Torrent acquisition), I had some informal discussions with the instrument control software teams about exposing APIs. In general, they were reluctant to do so for a variety of reasons: concerns about opening the instrument to HTTP requests, added software layers in the tight confines of the on-board processing environment, and new field service issues on already complex machines. Though these concerns are important, providing sample sheet-level input, remote access to progress information, and structured output objects is a straightforward extension of the functionality already built into the software. Arguments of insufficient processing power on the instrument are difficult to reconcile with modern cell phones and automobile dashboards.
The Ion Torrent PGM is not directly controlled by the RESTful API. The equivalent of sample sheets, Planned Experiment objects, can be posted to the Torrent Server. A scientist must then access the instrument console, select the Planned Experiment, and run through a few more screens to initiate the run. Links to sequence files and base- and read-level statistics can then be retrieved. While not fully automated, this system is a dramatic improvement over past integration schemes, supporting the type of round-trip data integrity that eliminates a lot of copy-and-paste-from-spreadsheet errors.
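The round-trip check itself is simple to express once both ends are structured objects. The Python sketch below compares the metadata submitted with a planned run against the metadata that comes back with the results. The field names and sample values are made up for illustration and are not the Torrent Suite schema.

```python
# Metadata fields that should survive unchanged from run setup to results.
# These names are illustrative assumptions, not a real vendor schema.
TRACKED_FIELDS = ("sample_name", "project", "organism", "plate")


def find_mismatches(planned, reported, fields=TRACKED_FIELDS):
    """Return {field: (planned_value, reported_value)} for fields that differ."""
    mismatches = {}
    for field in fields:
        if planned.get(field) != reported.get(field):
            mismatches[field] = (planned.get(field), reported.get(field))
    return mismatches


# What the lab submitted when planning the run.
planned = {
    "sample_name": "S-001",
    "project": "Pilot",
    "organism": "E. coli",
    "plate": "P1",
}
# What came back with the results; extra fields such as read counts are ignored.
reported = {
    "sample_name": "S-001",
    "project": "Pilot",
    "organism": "E coli",  # a retyped value that drifted from the original
    "plate": "P1",
    "reads": 412345,
}

print(find_mismatches(planned, reported))
# prints {'organism': ('E. coli', 'E coli')}
```

When the metadata travels through the API in both directions, this comparison should come back empty; a non-empty result is exactly the kind of hand-entry drift that spreadsheets invite.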
PacBio has provided similar access to the RS sequencer, pushing sample sheets to the instrument and supporting output integration via RESTful APIs. If this simple, modern idea can take hold and propagate, the lab of the future will be within reach.
Aaron Kitzmiller is a consultant at The BioTeam. He can be reached at aaron@bioteam.net.