Clouds, Drugs and Big Data at Bio-IT Europe 2012

October 23, 2012

By Kevin Davies  

October 23, 2012 | VIENNA—Scientists shared some important advances in fields from big data and cloud computing to bioinformatics and clinical genomics in the beautiful surroundings of Vienna at the fourth annual Bio-IT World Europe conference earlier this month.* 

The conference was the first standalone event for Bio-IT World Europe, which for the previous three years had been held in conjunction with the BioTechnica trade fair in Germany. In addition to industry announcements (see “News from Vienna”), there were many standout speakers over three days and four sessions, including Yike Guo (Imperial College London), Paul Flicek (European Bioinformatics Institute), Andrew Hopkins (University of Dundee), Corrado Priami (CoSBI, Italy), Hermann Hauser (Amadeus Capital Partners), Eric Perakslis (FDA) and Dirk Evers (New York Genome Center).

Imperial College’s Yike Guo presented an important update on eTRIKS, a knowledge management platform built on the tranSMART foundation originally developed by Perakslis and colleagues at Johnson & Johnson. The project is receiving 24 million euros in funding from the Innovative Medicines Initiative (IMI) over the next five years.

eTRIKS (European Translational Information & Knowledge Management Services) will develop resources for knowledge management, cloud hosting, standards, analytics, and training, in collaboration with some 18 big pharma and academic partners (including CNRS, the University of Luxembourg, IDBS and CDISC, as well as AstraZeneca, Roche, GSK, Sanofi-Aventis, Bayer, Pfizer and Lilly).


Guo says the idea is to put these platforms into the cloud so that projects can store and share study data on a common platform. He also says that eTRIKS has the potential to become the “Google for translational research,” building an Information Commons that pharma can use for biomarker research and that clinical scientists can use for personalized medicine. Stay tuned.

Paul Flicek (European Bioinformatics Institute) described resources, such as the popular genome browser and database Ensembl, that are designed on the premise that “researchers want to get down to biology, more than aligned reads or a genome browser.” Ensembl currently draws about 45% of its traffic from Europe, 30% from the Americas, and 25% from Asia (Japan). Three servers at Amazon Web Services now handle half of EBI’s traffic, while costing less than the single co-located server that EBI set up in California five years ago.

Taking advantage of Helix Nebula, a new “science cloud” co-developed by EMBL, CERN and the European Space Agency, Ensembl aims to provide worldwide content delivery, genome annotation, and variant analysis, Flicek said. To help users seeking programmatic access to Ensembl data, Flicek and colleagues have elected to offer a REST web services API (rather than a Perl API).
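The practical appeal of a REST interface is that any language with an HTTP client can query Ensembl, with no local Perl API installation. A minimal Python sketch follows, assuming the present-day public server at rest.ensembl.org and its /lookup/symbol resource (the service discussed in Vienna was still young and may have differed); the helper names lookup_url and lookup_gene are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: today's public Ensembl REST server, not necessarily the 2012 endpoint
ENSEMBL_REST = "https://rest.ensembl.org"


def lookup_url(species: str, symbol: str) -> str:
    """Build the URL for the /lookup/symbol/:species/:symbol resource."""
    return f"{ENSEMBL_REST}/lookup/symbol/{quote(species)}/{quote(symbol)}"


def lookup_gene(species: str, symbol: str) -> dict:
    """Fetch gene metadata as JSON (needs network access)."""
    # The response format is requested through the Content-Type header
    req = Request(lookup_url(species, symbol),
                  headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds the URL; call lookup_gene(...) to actually query the server
    print(lookup_url("homo_sapiens", "BRCA2"))
```

The same request could be issued from curl, R or Java just as easily, which is the point of choosing a language-neutral web API.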

Vienna-born venture capitalist Hermann Hauser (Amadeus Capital Partners, UK) delivered an illuminating talk on his experience building a series of successful companies. Citing the computing industry, Hauser noted that most companies do not make it from one technology wave to the next. One of Hauser’s biggest success stories is ARM, which he helped spin out of Acorn in the 1990s and which shipped 8 billion chips in 2012, more than Intel has shipped in its entire history. Hauser also reflected on his early backing of the British next-gen sequencing company Solexa, which was ultimately acquired by Illumina. Hauser said he has personally invested in former Solexa CEO John West’s new genome interpretation company, Personalis. That company meets his general criteria for investment, which include a clean interface, large markets, star executive teams, and defensible technology.

Designing Drugs  

Andrew Hopkins (University of Dundee, UK) said his colleagues, both in academia and at a new company, Ex Scientia Ltd, are making progress in automating drug design with the aim of reducing the cost of lead discovery. Hopkins quoted Sir James Black, who said: “The most fruitful basis for the discovery of a new drug is to start with an old drug.” Indeed, this was the approach of Paul Janssen, whom Hopkins called “the single most successful drug discoverer of all time”: Janssen brought 80 drugs to market using a chemocentric rather than a gene-based strategy.

Hopkins’ automation strategy begins by capturing the favorite rules and tactics medicinal chemists use to refine the structure of drug compounds. This yielded a few hundred “plays” for generating ideas, which were then combined with 1 million or more bioactive compounds found in public chemical databases. Using a variety of Bayesian models and other approaches, Hopkins and colleagues can computationally evolve drugs to find optimal binders against a particular receptor, while selecting for other desirable features including oral availability, molecular weight, and patentability. The first results of this approach were published recently in Nature Chemistry.

In a plenary roundtable session, Eric Perakslis (CIO, Food and Drug Administration) sat down with his European counterparts Alison Davis (the UK’s Medicines and Healthcare products Regulatory Agency) and Luc Verhelst (European Medicines Agency) for a wide-ranging discussion of the IT issues shaping their agencies.

As Perakslis noted in his keynote six months earlier at the Bio-IT World Conference & Expo, his range of responsibilities extends far beyond drugs to areas such as monitoring the global food supply chain. On the drug front, the FDA is striving to cut the time for a generic drug approval from 36 months to just 6 months.

On drug development, Perakslis said that electronic Common Technical Document (eCTD) submissions were trending in the right direction, but “we’ve still got a lot [of] paper.” According to Perakslis, an average new drug approval submission consists of some 18 boxes of paper, and there are roughly 20,000 submissions a month. Perakslis is clearly not satisfied with the status quo. Three weeks of data manipulation used to mean “36 pages taped together” and stuck on a wall. His definition of data mining? “You walk closer to it,” he said.

Perakslis admitted he was “not a big standards guy” and didn’t believe in waiting for them. A rich source of toxicology data on more than 2,000 drugs was being released to the public. “Every now and again, you’ve got to throw a pass,” he said. He was eager to move as much as possible to the cloud, while excluding, of course, anything that represented the only copy of a piece of data. “If I could get rid of a datacenter, I’d do it in a heartbeat,” he said. One idea gaining traction is a Virtual Scientific Innovation Hotel for rapidly evaluating proposals from the scientific computing community.

Davis and Verhelst offered somewhat more modest agendas for their respective agencies. Both stressed the importance of people as much as data. “Our size and networking can be a strength but the chain is only as good as the weakest link,” said Davis. She said the iPad had revolutionized data delivery, and that a system called Sentinel would provide some cloud capabilities. Commercial collaborations with major IT firms such as Oracle were ongoing.

Patrick Baker (University of Sheffield) discussed the 3D visualization of protein structures with the aid of a stunning projection technology from a UK company, Virtalis. “It allows collaborators to look at the same molecule and understand what we’re talking about,” he said. By overlaying the 3D structure of an uncharacterized bacterial virulence factor on that of a known cytotoxic agent, Baker said, researchers could glean valuable insights into the mystery protein’s function.

Managing Data 

Several talks focused on the challenges of managing clinical genomic data. From Iceland, deCODE Genetics’ Hakon Gudbjartsson previewed the company’s Clinical Sequence Miner, a powerful new database designed to exploit the clinical insights the company has compiled on most of the nation’s 320,000 residents. “We’re quite familiar with massive data,” he said modestly, noting that the company currently manages some 2 petabytes of storage.

In designing a clinical database, Gudbjartsson said, a relational database structure was impractical because of the myriad data formats. deCODE has instead implemented an architecture called GOR (Genomics Ordered Relations), which not only stores data according to access patterns but also enables “smarter declarative querying.” The architecture queries the data directly and scales very well, yet still provides ad hoc query capabilities.
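The talk did not go into GOR’s internals, but the general idea of storing genomic records in the order they are accessed, sorted by chromosome and position, can be sketched in a few lines of Python. In this minimal sketch, which assumes a simple in-memory table, a region query becomes a binary search followed by a contiguous slice; the class OrderedGenomicTable and its query_range method are hypothetical illustrations of the technique, not deCODE’s implementation.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

# Hypothetical record layout: (chromosome, position, payload)
Record = Tuple[str, int, str]


class OrderedGenomicTable:
    """Rows kept sorted by (chromosome, position), so a region query
    only touches a contiguous run of rows instead of scanning everything."""

    def __init__(self, rows: List[Record]):
        # Chromosome names sort as strings ("chr10" < "chr2"); this is fine
        # as long as the same ordering is used for storing and querying.
        self.rows = sorted(rows, key=lambda r: (r[0], r[1]))
        self.keys = [(r[0], r[1]) for r in self.rows]

    def query_range(self, chrom: str, start: int, end: int) -> List[Record]:
        """Return rows on chrom with start <= position <= end."""
        lo = bisect_left(self.keys, (chrom, start))   # first row >= start
        hi = bisect_right(self.keys, (chrom, end))    # just past last row <= end
        return self.rows[lo:hi]


table = OrderedGenomicTable([
    ("chr1", 1500, "variant A"),
    ("chr1", 900, "variant B"),
    ("chr2", 1200, "variant C"),
    ("chr1", 2500, "variant D"),
])
print(table.query_range("chr1", 1000, 2000))  # → [('chr1', 1500, 'variant A')]
```

Keeping every table in the same genomic order also lets two tables be joined by walking them side by side, which is one plausible reason such a layout scales without a conventional relational schema.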

Having developed such systems for research purposes, deCODE will market this one as a solution for hospitals, he said. “Eventually we see the market being any physician doing genetic testing.”

Several other speakers, including Gholson Lyon (Cold Spring Harbor Laboratory) and Robert Lyle (Norwegian Sequencing Centre, NSC), presented results of exome sequencing in real-life medical cases. Lyle said his colleagues were confronting multiple ethical issues, including clinical consent, de-identification of material, and the return of incidental findings. “Norwegian law has very strict requirements for human data,” Lyle said. “The cloud? Forget it.”

The first successful exome sequencing diagnosis at NSC concerned a pregnant mother who had previously lost a daughter to a genetically heterogeneous disorder called CDG1 (congenital disorder of glycosylation type I). “Wow, this really works,” Lyle recalled thinking, “we really helped people.”

Another recent success story involved an extremely rare disorder called Stormorken syndrome, which was first described in 1985. The NSC team identified a mutated gene in individuals from Norway and France. Lyle said it was difficult to assess the overall success rate because of variability in selecting cases, but he put the current rate at about 30 percent. 

Michal Blazejczyk (Montreal Heart Institute) described his team’s experience analyzing gene panels in a cardiovascular genetics clinic as it moved from Sanger sequencing to next-gen sequencing. Challenges included more data (and less clarity), the cost of optimization, non-trivial bioinformatics support, and policy changes. Because buying a commercial LIMS (Laboratory Information Management System) was not an option, a five-person team built a clinical LIMS code-named KIWI. The system uses a Java EE back end and a MySQL database. The team also elected to curate the genetics literature themselves rather than subscribe to the Human Gene Mutation Database.

Dirk Evers discussed early progress in building the scientific and data management infrastructure at the New York Genome Center. (A full interview with Evers conducted at the conference can be found on the NYGC blog.)

Nour Shublaq (University College London) concluded the conference with an overview of some ambitious European programs and initiatives in personal health records and translational research. These include the Virtual Physiological Human (VPH), a simulator for personalized drug rankings (the BAC Simulator), and programs under IT Future of Medicine, a short-listed project under the EU Framework Programme.

The 2012 conference proved a stimulating and convivial meeting place in a charming venue. In 2013, Bio-IT World Europe will be held in Lisbon, Portugal (October 30 to November 1, 2013).

Editor’s note: A complete chronological compilation of the tweets from Bio-IT World Europe 2012 can be found here: http://chirpstory.com/li/27632 

*Bio-IT World Europe Conference & Expo 2012, Vienna, Austria: October 9-11, 2012.