The Broad’s Approach to Genome Sequencing (part I)
September 16, 2010 | The Broad Institute is the largest genome center in the United States, building on the pivotal role of the Whitehead Institute Center for Genome Research during the Human Genome Project. Chad Nusbaum and Rob Nicol are two key figures behind the success and smooth running of the Broad’s genome sequencing facility. Nusbaum, the co-director of the genome sequencing and analysis program, joined the Whitehead, under the direction of Eric Lander, in 1996, while Nicol, director of sequencing operations and technology development, was recruited in 2001.
In Part I of a two-part exclusive interview, Nicol and Nusbaum share some of the keys to the Broad’s sequencing success with Bio-IT World chief editor Kevin Davies. In Part II (tomorrow), Davies talks to Toby Bloom, the Broad’s director of informatics for the production sequencing group.
Bio-IT World: What’s the current state of the Broad Institute’s sequencing fleet?
NICOL: We’re transitioning our fleet. We used to have 94 Illumina GA IIs, and we’re transitioning to 51 HiSeqs . . . There’s greater productivity on the new instrument when run at full capacity. It’s far more efficient to phase out the older generation. That’s a theme here: everything we do is, ‘how can we streamline the process?’ It was carefully modeled, not just installing the latest and greatest. If the GA IIs had made sense for our purposes, we would have kept them.
NUSBAUM: The one thing that probably wasn’t going to make sense is a split shop. Historically we’ve always done it this way. [Fifteen years ago,] we had slab gels and tested out the [Applied Biosystems] 3700. As soon as we decided we liked the 3700, the slab gels were gone in a month. When we passed to the 3730, the 3700s were gone. We really moved the 3730s out pretty fast as we moved to Illumina. We hung onto a few of them for certain legacy applications.
NICOL: We’ve tailored our Illumina infrastructure to match the various project deliverables we are responsible for, mainly high-throughput whole genome and exome sequencing, but we also maintain 454 processes for viral sequencing and certain niche applications. That said, our organization’s processes have been designed to be as adaptable as possible. This means we try to keep our eye on emerging technologies through testing and collaborations but also design existing processes for flexibility where possible. We take very seriously the ability to be nimble, and I think it’s true of all the NHGRI sequencing network centers as well. We all have to continuously adapt.
Rob, what is your team’s role in managing operations?
NICOL: My group covers all the implementation of any new technology into production, development of process technology as needed, and the actual running of not just the sequencers themselves, but also the upstream processes. There’s so much emphasis on the boxes that the upstream [sample preparation] work gets kind of lost. Increasingly that’s where the action is.
NUSBAUM: Oftentimes, we’ll design new processes based on new projects. Someone will say, we’re going to write a grant to get $10 million to do this, but it requires this process which we don’t have. So we sit down and say, what’s it going to take us to build it? That usually goes right into the timelines of the proposal. A good example is cDNA sequencing/RNA-Seq by Illumina. We’ve had a modest capacity; we’ve been making 10-20 libraries every two weeks for the last year, and we’re continuously polling people to ask, when are you going to want to do a lot? Early this year, there were 2-3 groups saying they needed to do many hundreds of them. Samples were coming at the end of the year, so we’ve been porting this over to an automated process, which means completely redesigning the process.
NICOL: Every technology early on has some issues. It’s going to be a learning curve. They’re “opportunities,” not headaches. We’ll systematically go through instruments and do what is called a “failure mode and effects analysis.” Whenever we observe a “failure mode,” a headache, we’ll catalogue it, describe what we think caused it, try to reproduce it and understand the variables that caused it. We’ll either report it back to the vendor or build it into our knowledgebase so it gets eliminated. That’s key — that process is ongoing, not a one-time thing. People will continuously be looking at these things. Over time, things improve – instead of 90% up time, it will be 99%, then 99.5%. It does get harder and harder [to keep improving], but by then, some new generation of instrument has probably come in.
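Nicol’s up-time figures are easier to compare when converted into hours lost. The sketch below is an illustration only: it assumes a single instrument scheduled around the clock for a 365-day year, which the interview does not state, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes round-the-clock scheduling, ignores leap years


def downtime_hours(uptime_fraction: float) -> float:
    """Hours per year an instrument is unavailable at a given up-time fraction."""
    if not 0.0 <= uptime_fraction <= 1.0:
        raise ValueError("uptime_fraction must be between 0 and 1")
    return (1.0 - uptime_fraction) * HOURS_PER_YEAR


if __name__ == "__main__":
    # The three up-time levels Nicol mentions.
    for uptime in (0.90, 0.99, 0.995):
        print(f"{uptime:.1%} up time -> {downtime_hours(uptime):.1f} h down per year")
```

Going from 90% to 99% recovers roughly 788 hours a year, while the next step to 99.5% recovers under 44, one way to see why each further gain is worth less in absolute terms even as it becomes harder to achieve.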
NUSBAUM: Rob’s technology development group and production group are one and the same. The technology development is done wholly by the people who do the production work. They’re constantly charged with making their own jobs easier and more effective. They cycle through what Rob calls a sabbatical system, where you do production line work for a period, then you have a short time where you do sabbatical work, which can be technology improvement for your job or something else in the process. It’s a philosophy of process improvement that runs through the whole organization.
Rob, your background was not in life sciences. Has that been to your advantage?
NICOL: I had spent a number of years building refineries and power plants, petrochemical and industrial processes. I’d just come out of a program at MIT focused on operations. The program was about leadership in manufacturing. It was designed as a reaction to when the Japanese seemed to be taking over everything, to develop US talent to seed Boeing and Intel etc. with a lot of these process and operations and manufacturing tools.
NUSBAUM: In 2000, we were engaged in a job search to find someone to run our sequencing factory. The factory was run at that time by a bunch of biologists like me, amateurs really. We were looking for someone who knew how to run a production operation. We knew everyone in the sequencing field and we realized we knew as much about sequencing as anyone else. We weren’t going to improve the organization by hiring someone who knew stuff we already knew. Just then, I meet this guy [Nicol], he doesn’t know a damn thing about sequencing, but he knows about stuff. As soon as he tells me about this, I’m thinking, “this is where we need to go.” There was some internal discussion -- “This guy doesn’t know anything about sequencing.” But I was thinking, “Sequencing just ain’t that hard. I can do it! What’s hard is doing process really well.” There was no-one here that really knew anything about that.
The knowledge that Rob brought in here changed how we did genome sequencing because he professionalized the factories. Beyond learning from others in our own industry, we started learning from manufacturing experts like Toyota and Boeing, people who know how to run a factory.
NICOL: What’s most gratifying to me is that our folks have taken these process ideas and come up with improvements, their own biotech version of it . . . So much of what we do has followed the evolution of other industries in manufacturing. Initially, you could almost compare large-scale sequencing to a Henry Ford assembly line. The reason it’s not done that way anymore is you give up a lot of flexibility. It wasn’t easy to make changes or be nimble with those manufacturing models. Sequencing has changed rapidly since I’ve been here and we’ve had to adapt.
NUSBAUM: It was an assembly line [early on] but it was also very wild west. We were innovating from the gut. I think we innovate even more now, but in a much more mature, disciplined way. Rob has grown up several generations of leaders, and there is a very strong emphasis on people development at every level of the organization, which is key.
NICOL: Sequencing these days is a unique industry. There’s nothing like it anywhere. If you plot the cycle time for innovation, it’s far faster than for semiconductors or almost anything. The scale at which it’s changing is mind boggling. Charles Fine, in his book Clockspeed, talks about various industries and their rate of innovation. A Boeing 747 cycle time is almost measured in decades for a major technology shift. Then you get faster in various industries, until you get to desktop computers, maybe 18 months. Then the components that feed into these are a little bit shorter. That’s about the limit, unless you get into media and content. Our cycle time is even faster. No industrial process is as fast. We should come up with a term for it!
We’ve had to come up with a lot of different and unique methods to deal with this rate of change. We talked about the sabbatical system, which was invented largely as a response to this. Traditional organizational structures, of which we had some, would require a separate R&D organization to develop protocols, test new instruments and make them robust, then train and hand off to production folks, who will likely encounter problems and have to loop back to R&D. You’ve already lost six months just coordinating all of that.
NUSBAUM: We also have more sophisticated “production” workers, because they’re actually also the technology development people, they understand how their process works and want to make it better. Stuff comes from the ground up as much as it comes from formal R&D.
How closely do you interact with your counterpart on the informatics side, Toby Bloom?
NICOL: Both teams interact on a daily basis. She and I have to coordinate very closely to make this work. She faces the exact same pressures – the rate of change. She needs to synchronize with the downstream algorithms, feeding the massive amounts of data we’re producing through software pipelines that are changing just as rapidly and have tremendous complexity. Plus she also has the additional dimension of coordinating the necessary IT. One concern people have is we’re improving at a faster rate than storage.
NUSBAUM: The cost of storage is coming down very slowly [compared to sequencing costs]. It’s not very hard to foresee a time when storage is half the [total] cost [of sequencing].
NICOL: Or we store it as DNA – and resequence it!
NUSBAUM: It’s been a couple of years since we saved the primary [raw image] data. It is cheaper to pull the sample out of the freezer and redo the sequence. There are 5,000 tubes in a freezer. Storing a tube isn’t very expensive. Storing 1 Terabyte of data that comes out of that tube costs half as much as the freezer! People [like Ewan Birney at EBI] are working on very elaborate algorithms for storing data, because you can’t compress bases any more than nature already has. The new paradigm is, the bases are here, only indicate the places where the bases are different . . . In 2-3 years, you’ll wonder about even storing the bases. And forget about quality scores.
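The “only indicate the places where the bases are different” idea can be sketched as storing a sample as a list of differences against a shared reference. This Python sketch is purely illustrative and the function names are hypothetical: it assumes the sample is already aligned to the reference and handles substitutions only. Real reference-based formats (CRAM, for example) also have to record insertions, deletions, alignment positions and, optionally, quality scores.

```python
from typing import List, Tuple

# (0-based position, base that differs from the reference)
Diff = Tuple[int, str]


def encode_against_reference(reference: str, sample: str) -> List[Diff]:
    """Keep only the positions where `sample` differs from `reference`.

    Assumes equal-length, pre-aligned sequences (substitutions only).
    """
    if len(reference) != len(sample):
        raise ValueError("sketch assumes equal-length, pre-aligned sequences")
    return [(i, b) for i, (r, b) in enumerate(zip(reference, sample)) if r != b]


def decode_from_reference(reference: str, diffs: List[Diff]) -> str:
    """Rebuild the sample by applying the stored differences to the reference."""
    bases = list(reference)
    for pos, base in diffs:
        bases[pos] = base
    return "".join(bases)


if __name__ == "__main__":
    ref = "ACGTACGTACGT"
    sample = "ACGTTCGTACGA"
    diffs = encode_against_reference(ref, sample)
    print(diffs)  # [(4, 'T'), (11, 'A')]
    assert decode_from_reference(ref, diffs) == sample
```

For a sample that matches the reference at almost every position, the stored list is a tiny fraction of the sequence length, which is the saving Nusbaum is describing.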
The cost of DNA sequencing might not matter in a few years. People are saying they’ll be able to sequence the human genome for $100 or less. That’s lovely, but it still could cost you $2,500 to store the data, so the cost of storage ultimately becomes the limiting factor, not the cost of sequencing. We can quibble about the dollars and cents, but you can’t argue about the trends at all.
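The arithmetic behind this point is simple. This sketch uses only the illustrative per-genome figures quoted above ($100 to sequence, $2,500 to store) and ignores every other cost, such as sample prep, labor and analysis; the function name is hypothetical.

```python
def storage_share(sequencing_cost: float, storage_cost: float) -> float:
    """Fraction of the combined per-genome cost that goes to storing the data."""
    total = sequencing_cost + storage_cost
    if total <= 0:
        raise ValueError("total cost must be positive")
    return storage_cost / total


if __name__ == "__main__":
    # Illustrative figures quoted in the interview, not measured costs.
    share = storage_share(sequencing_cost=100.0, storage_cost=2500.0)
    print(f"{share:.1%}")  # 96.2%
```

When storage and sequencing cost the same, the function returns 0.5, the “storage is half the cost” scenario Nusbaum describes earlier.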
Historically the Broad has favored Illumina, but if you were starting out today, is there anything to choose between the leading next-gen platforms?
NUSBAUM: There’s always a cost point that will tell us it’s time to change horses. We’re always looking for that cost point. We hope not to find it too often because it does create upheaval. But if a technology showed us now that, in six months, it has a good chance of being way ahead of Illumina, we’ll be watching it. Of course, the cost of switching the actual hardware out and amortization has to be factored in, but if it still beats it, then the dollars say you got to go. I’m not going to handicap the likelihood of that, but we’re always looking for that.
That’s looking ahead, but what if you were starting fresh today?
NUSBAUM: If I was starting out right now, I would test them head-to-head and run the numbers. The answer is complicated – a proper model should take into account the cost of the machines, whether you buy or lease them, what your reagent contract looks like, maintenance, how much labor is involved and how much support you have to build, and what your scientific goals are . . . If changing from one to the other requires a new sophisticated informatics process, that costs money too.
NICOL: You have to build your cost model to take into account the technology, but also your local conditions. If you have particular interests, types of projects or applications, you have to build all of them into the cost model and your technology selection. The sample prep infrastructure is also non-trivial.
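A head-to-head comparison of the kind Nusbaum and Nicol describe usually comes down to a cost per unit of output. The sketch below is a minimal, hypothetical model, not the Broad’s: the class, its fields and every number in the example are placeholders chosen only to exercise the arithmetic, not vendor prices or measured throughput.

```python
from dataclasses import dataclass


@dataclass
class PlatformCosts:
    """Hypothetical inputs for a per-gigabase cost model; all values are placeholders."""
    name: str
    instrument_price: float      # purchase price of one instrument (USD)
    amortization_years: float    # years over which the purchase is written off
    maintenance_per_year: float  # service contract per instrument (USD/year)
    runs_per_year: float         # completed runs per instrument per year
    gigabases_per_run: float     # usable output per run (Gb)
    reagents_per_run: float      # reagent cost per run (USD)
    labor_per_run: float         # hands-on labor, including sample prep (USD/run)

    def cost_per_gigabase(self) -> float:
        # Spread the fixed annual costs (amortized purchase + maintenance) over the year's runs.
        fixed_per_year = self.instrument_price / self.amortization_years + self.maintenance_per_year
        fixed_per_run = fixed_per_year / self.runs_per_year
        # Add the variable costs incurred on every run, then divide by output.
        per_run = fixed_per_run + self.reagents_per_run + self.labor_per_run
        return per_run / self.gigabases_per_run


if __name__ == "__main__":
    # Placeholder numbers only; not real platform or vendor figures.
    candidates = [
        PlatformCosts("Platform A", 600_000, 3, 60_000, 40, 200, 20_000, 1_000),
        PlatformCosts("Platform B", 300_000, 3, 30_000, 50, 50, 5_000, 1_000),
    ]
    for p in sorted(candidates, key=PlatformCosts.cost_per_gigabase):
        print(f"{p.name}: {p.cost_per_gigabase():.2f} USD/Gb")
```

The model deliberately leaves out the items the interviewees flag as local: buy versus lease, informatics rebuilds and retraining. Those could be added as a one-off switching cost, with the “change horses” question becoming whether the per-gigabase saving pays that cost back within the amortization period.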
Is the HiSeq 2000 meeting its targeted output of 200 Gigabases/run?
NUSBAUM: Yes. That was their launch spec and they hit it at launch. It’s been slightly exceeded here.
Is there a steady two-way exchange of information between the Broad and Illumina?
NUSBAUM: That’s been the nature of our relationship since before they were on the market. In the very early days, before Solexa had anything, we were just talking to them and said, ‘Show us the data.’ They said, ‘No, it’s not ready.’ We said, ‘Show it to us anyway!’ There was a fellow over there named Clive Brown, a brilliant guy. Clive said, ‘OK, I’m going to show them the data and see what they say.’ He got into an instant dialogue with David Jaffe (Broad Institute, Computational R&D), and said, ‘Wow, these guys see stuff that we don’t see!’ That was the starting point of our relationship. It was open, as was practical for them to be, sharing a lot of information back and forth, and really accelerating things here and there . . .
When Illumina acquired Solexa, we hoped that the Solexa corporate culture would be maintained in the sequencing business, and it largely has been. We now have a very close collaborative relationship with Illumina.
NICOL: It’s also the nature of many of our vendor collaborations, not just with Illumina. We try to be as good a development partner as possible. They want to put their alpha instruments here, because they get a lot of development and the ultimate instrument that goes out into the community is that much better for it. We obviously have to be very selective with these because they are significant commitments, but they have almost always delivered lots of value to the vendor and us.
What are the major sequencing projects going on at the moment?
NUSBAUM: A lot of 1000 Genomes, a lot of tumor sequencing for various projects including The Cancer Genome Atlas. There are tons of cancer projects scattered all over. The basic type of project is either exome sequencing or whole genome shotgun sequencing, sometimes a combination. It’s basically: find the mutations in a bunch of tumors and look for patterns and the pathways you hit. It’s been pretty fertile ground. As costs go down and throughput goes up, there’s a lot more [discovery] going on. There’s also stuff in other human diseases. We do follow-ups for GWAS [genome-wide association studies]. You get your gene down to 1 Megabase, but then what do you do? We also have major projects, for example in the microbiome and in viruses such as HIV, Dengue and West Nile. These are relatively small in terms of total amount of sequence but represent large numbers of individual samples and so pose their own challenges.
Is there enough demand to keep the sequencer fleet fully occupied?
NUSBAUM: Oh, heavens yes! It’s very rare that a sequencer goes hungry!
How do you set the sequencing project priorities?
NUSBAUM: That’s a key function of our sequencing leadership group. Six of us sit down every Monday morning. There haven’t been too many [project] collisions lately, but there often are things where we have to say, what’s this deadline? Sometimes an emergency comes along, and we have to make tough decisions.
How painful is it to get a new platform or model up and running, and what is the reliability like once you’ve done that?
NICOL: We purposely want to get very early-stage machines. I wouldn’t say our experience is what an eventual downstream user will see. Most of the machines we get are the first ones built, ever. That’s by design. Our organization and collaborations are designed that way.
NUSBAUM: The HiSeqs didn’t work perfectly on Day 1. No machine that comes into the building works on Day 1. You plug it in, test it out, mess around with it, call the service engineer, etc. Rob sets a very high standard for what a machine has to meet to be in production. Once these HiSeqs are in production, there will be greater than 90% up time.
NICOL: The difference between 95% and 97% [up time] is someone loading the machine noticing and saying, ‘There’s this little thing that sometimes happens at loading. We need to incorporate that observation into our standard procedure; for example, we need to re-focus the camera if the image has a certain pattern.’ Capturing and acting on that bit of knowledge is key: there’s a vital connection between the person paying attention and also thinking in an R&D mode. The organization then also has to be ready to take that knowledge and disseminate it to all the other lab personnel who may encounter that failure mode, either through re-training or knowledge reviews.
Do you feel an obligation to disseminate the best practices you’ve established to the, er, broader scientific community?
NUSBAUM: Communication is one of our key functions. We want to write peer reviewed papers about what we do. That’s a very formal way of disseminating information. We give talks and posters. We’ve been running a training course with Illumina where we produced most of the content. That’s in response to needs from the community, internally and externally. There was tremendous interest in setting that thing up and it’s been very well received. We could give it every week and be sold out.
NICOL: Our plan is to do it 1-2 times a year on site, and to eventually make the content available on the web, including video. Right now it is a webinar. Putting these courses together is a lot of work, but it’s actually very helpful in forcing us to make sure we really understand what the best way to do something is. We also do a trial run of the course where we invite key people from other centers, large and small, to critique and add their own experience. This is tremendously valuable as we often learn as much from them as they do from us. There are lots of very sharp and innovative people in all the centers and it would be great to find better ways for them to share best practices.
What do you anticipate from the 3rd-generation sequencing technologies on the horizon?
NUSBAUM: I’ve been around long enough that I don’t know what generation this is. It’s way more than the 3rd generation! I first sequenced by Maxam and Gilbert, you could call that the first generation. So I’d guess it’s now seventh or eighth! The next new technologies offer a bunch of opportunities. They’re different approaches, different ways of thinking about the problem. It doesn’t mean they’re going to come in and take over the world in a week. But it does mean they create a platform on which there may be real game-breaking opportunities.
As soon as a new platform is available, we need to understand it. We want to be ready when it’s there but also to help guide where it’s going. Often, the users can see things that the vendors can’t about their technology. We think it’s important to do as much of a mind meld with the vendors, so they understand what we’re thinking. There’s a very different mentality that goes into using these things than building them and the two are highly complementary.
The other thing is, even if they’re not immediately going to take over the world, they offer the opportunity to do things you can’t do. Let’s take PacBio, for example: the machine’s run time is claimed to be something like 20 minutes, the read lengths are claimed to be up to 1,000 bases. Those two things right there, there’s nothing else that can do that in a remotely affordable way. If you do nothing else with this machine, you can get your answer over lunch! That’s already enough of a thing, so we couldn’t possibly not be looking at it.
What’s your reaction to what is going on in Hong Kong and China at BGI?
NUSBAUM: We’re actually in pretty close contact with them at a lot of levels. They’ve been here several times. We’re building ongoing collaborations with them. Ideally we want them to be a sister center with us. There’s so much sequencing in the world that needs to be done, right now, I don’t see any need to compete with them. They’re trying to prove themselves in the world. It’s a friendly competition. There is an exchange of knowledge. We’re learning from them.
Yes, it upsets the balance of power, but I think it upsets the balance of power in a good way, because there’s more sequencing in the world. It’s not that different to the opposite phenomenon – there’s a large number of labs with one machine. They also change the balance of power, perhaps even more disruptively.
Chad, what are your personal research interests? I take it you have some?!
NUSBAUM: The funny thing is, I’m actually trained as a biologist, a developmental geneticist/gene expression guy. But in fact, my job now is to be an amateur engineer. Sometimes I get to go back and do my other job. I do a lot of advising on different projects. I could tell you I work on malaria, TB, cancer, human variation, human evolution, fungal evolution. All of that is true, in the sense that I’m behind the scenes providing access to technology. I do have my own research projects, but they’re not the major part of my job. I’ve been doing some work in plant pathogens, a study on yeast comparative genomics, trying to understand the origins of multicellularity in the Metazoan lineage. I’ve published on six Kingdoms, and several different kinds of viruses. I’m not biased. I have a short attention span!
In closing, what’s your near-term outlook for next-gen sequencing technology?
NUSBAUM: My hope is those technologies, aside from the “better-cheaper-faster,” will give us more power to answer the biological questions we want to ask. We’ve been able to sequence human genomes for a couple of years. Now it’s affordable. I’d like to be able to sequence human genomes really, really well with structural accuracy, so you can really understand polymorphism, not just SNPs but long-range haplotypes. It’s exactly the same problem in our studies of HIV. The problem is, what’s the haplotype across the molecule? Nothing can tell us what the haplotype is across 10 kb right now, or even 1 kb, and yet those are key questions in HIV biology. So at the very top end and the very bottom end of the genome spectrum, what’s the impact of mutations that are in cis, on the same strand? How can we understand structural changes really accurately? I think that’s a general question we haven’t been able to address. Next year’s going to be a very good year for that, from the smallest to the largest genomes.
NICOL: None of that is going to matter without the upstream sample prep!