Moleculo Man: Mickey Kertesz on Illumina’s Sub-Assembly Acquisition

By Kevin Davies

January 18, 2013 | You could have gotten pretty long odds on a major genomics company snapping up a stealth start-up less than one year old, named after a Saturday Night Live character and without a scientific publication to its name. But last month, Moleculo co-founders Mickey Kertesz and Dmitry Pushkarev sold their San Francisco start-up to Illumina. The prize was a proprietary technology, part wet-lab and part computation, for greatly increasing the assembled virtual read-length of short-read next-gen sequencing data, addressing a shortcoming in Illumina’s second-generation HiSeq and MiSeq instruments (a technology that traces back to the acquisition of British sequencing company Solexa in 2007).

Illumina has squeezed out impressive improvements in accuracy, throughput and read-length in recent years, pushing read lengths to 150 or 250 basepairs (bp). But this lags behind the median read-length of Pacific Biosciences, the long-fragment read technology of Complete Genomics, and the potential of emerging single-molecule nanopore firms such as Oxford Nanopore and Genia Technologies. One route around the problem—which is critical when attempting de novo genome assemblies—is to marry the throughput and accuracy of a short-read platform with the longer, but more error-prone, reads of PacBio.

Late last year, Illumina executives started to hear about promising results using Moleculo’s technology from some early customers. In January 2013, Illumina CEO Jay Flatley announced the acquisition of the company. On Moleculo’s first anniversary, Mickey Kertesz gave Bio-IT World editor Kevin Davies an exclusive interview on the company’s origins, technology, and potential advantages.

Bio-IT World: Mickey, let’s start by asking about your research background.

[Photo: The original Moleculo Man (Conan O’Brien on Saturday Night Live, circa 2001). Source: Tumblr]

Michael Kertesz: I grew up in Israel, did my master’s in computer science. I spent about seven years in industry in image processing at a couple of start-ups… As the years went by, I became more and more excited by life sciences. I decided that rather than doing this as a hobby, I would turn this into my career. After I did some consulting for a couple of companies in New York, I hooked up with my advisor, who was then at Rockefeller University, moved with him back to the Weizmann Institute, and did my PhD in computational biology there.

I did my postdoc with Steve Quake at Stanford. I had some ideas on probing the complex genomic landscapes that viruses occupy using high-throughput sequencing, and to do that I had to develop some methods to improve accuracy and ‘virtual’ read lengths, so that the very small differences between different viral species could be probed accurately. At the same time, my co-founder [and chief technology officer], Dmitry Pushkarev, was a PhD student in the lab working on orthogonal problems – I was working on very small genomes, he was working on huge genomes, but he had the same problem with short reads not allowing the accurate assembly and understanding of the genome.

The genesis of the company started with Dmitry’s work on complex genomes, about two years ago. During the year that we overlapped on campus, I added some components of the technology on viruses, and then about 18 months ago, we looked at this and decided it made more sense to try to develop this in industry rather than academia. Steve Quake joined us as scientific co-founder, and we incorporated the company. I’m really happy we did that, because it seems to be doing magic to our customers and we wanted to bring this to the scientific community. The paper that describes this technology is still not published—it is still in review.

By stepping off campus, there’s a higher activation energy and it’s a little more risky getting funding, but by the end of last year, there’s already two dozen customers using this for some really fascinating projects. And now that we’ve teamed up with Illumina, it means we can go even quicker to market and have the impact that this technology should make.

I bet I know the answer, but for the record: Where did the name ‘Moleculo’ come from?

If you search online, there’s an old Saturday Night Live sketch with Conan O’Brien called ‘Moleculo Man.’ A few days before we had to choose a name, Dmitry showed me the clip as a joke. The next day I emailed him to say that the Moleculo.com domain name was free. An hour later, I get back the registration form and he tells me it is now ours! So that’s how we got our name.

Does Moleculo’s technology have both a wet lab and a bioinformatics aspect?

Yes, it’s about 50:50. One doesn’t make sense without the other. There are two components: first, there is a molecular biology kit and protocol that takes in genomic DNA and turns it into a sequencer-compatible library. The DNA is modified and tagged, and this allows the second component, the algorithmic part, to take the short reads and reconstruct long reads using those tags. Those are two separate parts. We developed both on campus, and improved upon them after we started the company last year.
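To make the two-step idea concrete, here is a minimal toy sketch in Python. It is not Moleculo's algorithm, which is unpublished; it only illustrates the general principle described above: short reads that carry the same tag are grouped together, and each group is then stitched into a longer sequence. The tag names, the function names (`group_by_tag`, `greedy_merge`, `reconstruct`), the minimum-overlap parameter, and the toy sequences are all assumptions made up for this illustration. Real data would also need error handling that this sketch ignores.

```python
from collections import defaultdict

def group_by_tag(tagged_reads):
    """Collect short reads that share a tag (a hypothetical barcode label)."""
    groups = defaultdict(list)
    for tag, seq in tagged_reads:
        groups[tag].append(seq)
    return dict(groups)

def overlap(a, b, min_len):
    """Longest suffix of a that equals a prefix of b; 0 if shorter than min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_merge(reads, min_len=4):
    """Toy assembler: repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to join; return the remaining pieces
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

def reconstruct(tagged_reads, min_len=4):
    """Assemble each tag's reads independently into longer 'virtual' reads."""
    return {tag: greedy_merge(seqs, min_len)
            for tag, seqs in group_by_tag(tagged_reads).items()}

# Made-up example: reads from two tagged fragments, interleaved as they
# might come off a sequencer.
tagged = [
    ("tagA", "ATGCGTAC"), ("tagB", "TTGACCGA"),
    ("tagA", "GTACCTGA"), ("tagB", "CCGATAGG"),
    ("tagA", "CTGAAGTC"), ("tagB", "TAGGCTAA"),
    ("tagA", "AGTCCATG"), ("tagB", "CTAACGTT"),
]
long_reads = reconstruct(tagged)
print(long_reads)
```

The point of the grouping step is that each assembly problem becomes small and local: reads from different tags are never compared, so a sequence that recurs elsewhere in the genome cannot confuse the stitching within a single tag.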

Pretty impressive given the company is barely one year old!

Yes, today is the company’s first birthday!

Did you have Illumina in mind when you were devising this or was the idea to build a platform-agnostic technology?

Yes, it was a platform-agnostic technology. We had a version for Ion Torrent, and early on started to develop a version for SOLiD, but figured out there’s no customers for that. It’s easy to make the technology platform agnostic because the same tagging strategy that works for Illumina could work for other platforms. But it turned out that all our customers were Illumina customers, because those are the only machines that can currently generate the throughput that makes sense for complex genome assembly.

[Photo: Moleculo Lift Off: (L-to-R) Pushkarev, Blauwkamp and Kertesz start experiments.]

How does the process work? Do customers use a kit for library construction and then run an algorithm for assembly?

The way we currently provide access to the technology is in a service mode. We get a tube of genomic DNA, we do the library prep in house, and then send back to the customers the sequencing-ready library—because we never had an Illumina sequencer! The customers—including big genome centers—had the capacity to sequence on their machines.

Then there’s a small software tool we run that hooks up to the sequencer and pushes the sequencing data up to the [Amazon] cloud in real time as it is being generated. That allows us to start the analysis the moment the run is done, and not have to wait a few days for the shipping of hard drives. The analysis is done on our cloud infrastructure within a couple of hours, and the long reads are then returned to the customer.

Did you take any outside funding when you started the company?

We took a seed round from angel investors, which allowed us to buy some second-hand PCR machines and centrifuges on eBay! We also recruited the initial team. When we left campus, there were three of us: myself, Dmitry and Tim Blauwkamp, who was a postdoc at Stanford and led the molecular biology. We set up the lab and the R&D plan, and Tim carried out all the experiments for the first nine months, until we went wild and recruited another scientist for the lab. Those funds were only supposed to last until March or April 2012, but we lowered the burn rate by keeping the head count down.

We had a few consultants on the computational side. Our base caller, the algorithm that runs in the cloud, was written remotely by probably the best person in the world to do that, a PhD student named Jared Simpson at the Sanger Institute in Cambridge, UK. He’s written some of the most commonly used de novo short-read assemblers, ABySS and SGA. He wrote the algorithm for us from scratch. We used some consultants on the business side as well.

The other thing that helped us survive for so long was the fact that the customers were all paying customers. That was a bit strange because after six months we were already profitable.

Did you buy a sequencer on eBay too?

No, we never had a sequencer. What worked well was that we started Moleculo at the QB3 incubator space at UCSF. In our building there were 25 micro-companies, which was really helpful. We could borrow reagents and share a MiSeq and other expensive equipment that we couldn’t afford ourselves.

How did a relatively unknown company with a quirky name attract paying customers so quickly?

This is thanks to our connection to Stanford and Steve Quake and some of his contacts, and also some of our advisory board contacts. We chose those customers because they are big groups and needed long reads. I was the marketing, sales, support, etc. It was my job to talk to them and introduce the technology. Initially it took a bit longer with the first customers because we didn’t even have any data to share, just some preliminary results from our Stanford lab. But it worked out, because the need for long reads is so strong that when anyone comes in with long reads, customers will at least try it…. It was also important that the early customers know what to do with the data. The worst thing would be to give researchers 10-kb reads but they wouldn’t know what to do with that information.

Have you applied for patents on the technology?

Sure, there are patents from the Stanford days. The core technology is patent [pending] by Stanford and we in-licensed it to the company. [Steve Quake is also a co-inventor.] We filed additional patents during this year.

What sort of read lengths does Moleculo’s technology offer?

First, why are long reads so important? In complex genomes, the main problem in assembly is repeat regions. A 200-basepair (bp) read is not unique enough to map to a specific area and cannot assemble into a unique fragment… the assembly is then shattered into very short pieces. The only way to overcome that is to have a very long read that spans the repeat regions, typically 2-5 kb (kilobases). Once you have a read that spans that repeat range, then you can reconstruct the genome.
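The repeat problem can be shown with a small Python sketch. The toy genome, the repeat sequence and the helper `find_all` below are invented for illustration and are far shorter than real repeats; the point is only that a read lying entirely inside a repeat matches more than one location, while a read long enough to include unique sequence on both sides of the repeat matches exactly one.

```python
def find_all(genome, read):
    """Return every start position at which read occurs in genome."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions

# Made-up toy genome: the same 20-base repeat appears twice, separated
# and flanked by unique 10-base stretches.
repeat = "CAGGTTCAAGCGATTCTCCT"
genome = "ATGGCGTTAC" + repeat + "GGATCCTTGA" + repeat + "TCCGAATGCA"

short_read = repeat[2:14]   # 12 bases, entirely inside the repeat
long_read = genome[5:35]    # 30 bases: unique flank + whole repeat + unique flank

print("short read placements:", find_all(genome, short_read))
print("long read placements: ", find_all(genome, long_read))
```

The short read has two equally good placements, so an assembler cannot tell which copy of the repeat it came from. The long read is anchored by the unique flanks and has a single placement.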

As reported at PAG [Plant and Animal Genomes conference, January 2013], from a single HiSeq lane, you can get about 700 megabases in long reads. The read length, as presented [by customers] at PAG, is a distribution with a nice peak around 8-10 kb. We get a lot of reads that are very long…

I don’t want to over-promise. The read-length distribution is not a normal distribution, it is not a bell-shaped curve. There are a lot of reads that are shorter than 8-10 kb. The average varies a lot. As more data come out on different genomes and applications, it will be easier to give the average metrics and throughput.
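For a sense of scale, the figures quoted above can be turned into a rough back-of-envelope calculation. This Python sketch uses the roughly 700 megabases per HiSeq lane mentioned in the interview; the 9 kb representative read length (the midpoint of the 8-10 kb peak) and the approximate 3.1-gigabase human genome size are assumptions added here. Because the length distribution is skewed toward shorter reads, the true read count per lane would be higher than this estimate, and the function names are illustrative only.

```python
import math

LANE_YIELD_BP = 700e6   # ~700 megabases of long reads per lane (from the interview)
PEAK_READ_BP = 9_000    # assumption: midpoint of the 8-10 kb peak
GENOME_BP = 3.1e9       # assumption: approximate human genome size

def reads_per_lane(lane_yield_bp, read_len_bp):
    """Number of reads if every read had the given length."""
    return lane_yield_bp / read_len_bp

def coverage_per_lane(lane_yield_bp, genome_bp):
    """Fold coverage of the genome contributed by one lane."""
    return lane_yield_bp / genome_bp

def lanes_for_coverage(target_fold, lane_yield_bp, genome_bp):
    """Whole lanes needed to reach a target fold coverage."""
    return math.ceil(target_fold * genome_bp / lane_yield_bp)

print(f"reads per lane at 9 kb: about {reads_per_lane(LANE_YIELD_BP, PEAK_READ_BP):,.0f}")
print(f"coverage per lane: about {coverage_per_lane(LANE_YIELD_BP, GENOME_BP):.2f}x")
print(f"lanes for 1x long-read coverage: {lanes_for_coverage(1, LANE_YIELD_BP, GENOME_BP)}")
```

Under these assumptions a single lane yields long reads covering only a fraction of a human-sized genome, which is consistent with the caution above that average metrics will become clearer as more data come out.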

Why would your technology be so useful for small viral genomes?

Our customers right now are using it for human and complex genomes. It requires some modification to use on viral genomes, and we’ve not done that yet. I’m very excited by this, and this is what brought me to Steve Quake’s lab. In ten years, our kids won’t believe we lost so many days—and lives also—fighting viruses. Having said that, our current technology is not best suited for that type of very heterogeneous short genome.

How did the discussions with Illumina come about?

It just happened because all our customers are Illumina customers. I guess some went back to Illumina and told them about us. We tried to stay under the radar, nothing on the web, nothing in public, just working hard in the lab developing the technology…

We were reaching a point where we had to develop manufacturing and distribution, sales and support capabilities as an independent company. We were close to raising a very nice Series A round—we had the term sheets and were ready to go—when Illumina came in a couple of days before the closing with an offer we thought was fair to everyone. Also for me and the rest of the team, it was a great way to ensure we can keep focusing on the R&D, because at the end of the day, we are scientists and that’s what excites us, and let others deal with the manufacturing, distribution etc. I’m sure we could have done this, but it would have been a redundant effort. Illumina already has working relations with all our customers, and tapping into those resources is resulting in an incredible acceleration.

Did you pocket what Jonathan Rothberg got for Ion Torrent?

[Laughter] I won’t comment on that!

Will you keep operating your shop in Mission Bay?

Jay Flatley [Illumina CEO] addressed this in his keynote at J.P. Morgan. Our team will join Illumina’s facility in Hayward, CA [the former Solexa headquarters]. We will move there pretty soon. Teaming up with such great people, it’s been a great interaction. It is a big publicly traded company, over 2,500 employees, but so far, very manageable.

What are your priorities for 2013?

We will provide access to this technology as a service initially. In the second half of the year, we will make this into a kit. That is the exciting part. It will scale up nicely.

One other point I’d like to make is related to the broader story of Moleculo. I hope this inspires others to do the same. From my academic years—six years in the life sciences and before that in computer science—I see that there is a huge amount of talent, innovation and creativity on campus, but in too many cases they don’t leave the lab or find themselves in the hands of the broader scientific community and have the deserved impact. That’s what we did—it’s a bit scary, we may find ourselves running out of money and out of a job, but we decided to try to accelerate the development going the entrepreneurial route. I think things are even worse in biotech because it takes years of development and FDA approval.

I think the underlying sequencing technology is becoming so powerful, now is the time to come up with applications like ours that sit on top of an existing technology and do something very useful. I hope people will look at this story, one year of very hard work, ups and downs, but in the end, the technology finds itself in the best possible hands, marketed widely to make a great impact.

1/22: The title and introduction have been revised to more accurately convey the features of Moleculo’s technology.