Microsoft’s Azure Cloud BLASTs a Challenge to Amazon
By Kevin Davies
November 17, 2010 | Backing up the claim that cloud computing is its top strategic priority, Microsoft marked the beginning of this year’s Supercomputing conference (SC10) by announcing that, by porting the sequence aligner BLAST to Windows Azure, it is releasing a major protein dataset that should not only be of value to researchers in its own right but also demonstrate the potential value of Microsoft’s Azure Cloud to the bioscience community.
Bill Hilf, general manager of Microsoft’s Technical Computing Group, which is charged with driving the company’s strategy around technical computing, sees researchers increasingly working at their computers as if they had “large batteries behind the desktop” providing seamless computing capabilities, mainstreaming and democratizing access for scientific research.
Hilf told Bio-IT World that Microsoft has ported the BLAST code to Windows Azure and built software around it to make it easy to use. By running BLAST against the entire NCBI protein database, an operation that used more than 4,000 cores over six days, Microsoft generated some 10 million sequences, for a net 100 billion protein comparisons.
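To give a rough sense of how a run like this can be parallelized, here is a minimal sketch of splitting a large set of query proteins into chunks that independent workers can process. It assumes the NCBI BLAST+ blastp binary and a formatted copy of the nr database are available on each worker; the chunk size and file names are illustrative and not anything Microsoft has described.

```python
# Minimal sketch of fanning a large BLAST workload out over many workers.
# Assumes the NCBI BLAST+ 'blastp' binary and a formatted 'nr' database are
# on each worker; paths, chunk size, and file names are illustrative only.
import subprocess
from pathlib import Path

CHUNK_SIZE = 1000  # query sequences per work item (illustrative)

def split_fasta(fasta_path, out_dir):
    """Split a FASTA file of query proteins into fixed-size chunks."""
    out_dir = Path(out_dir)
    out_dir.mkdir(exist_ok=True)
    chunks, buf, count = [], [], 0
    with open(fasta_path) as fh:
        for line in fh:
            # Start a new chunk at every CHUNK_SIZE-th sequence header.
            if line.startswith(">") and count and count % CHUNK_SIZE == 0:
                chunks.append(_flush(buf, out_dir, len(chunks)))
                buf = []
            if line.startswith(">"):
                count += 1
            buf.append(line)
    if buf:
        chunks.append(_flush(buf, out_dir, len(chunks)))
    return chunks

def _flush(buf, out_dir, index):
    path = out_dir / f"chunk_{index:05d}.fasta"
    path.write_text("".join(buf))
    return path

def run_chunk(chunk_path, db="nr"):
    """Run one chunk on a worker node; each chunk is an independent job."""
    out_path = str(chunk_path) + ".blast.tsv"
    subprocess.run(
        ["blastp", "-query", str(chunk_path), "-db", db,
         "-outfmt", "6", "-out", out_path],
        check=True,
    )
    return out_path
```

Because every chunk is independent, thousands of such jobs can run side by side, which is what makes the workload a natural fit for a large pool of cloud cores.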
Microsoft is releasing the BLAST implementation and the resulting protein dataset, which is hosted on Azure, for free. “We’re showing how to use cloud computing to make it powerful and a lot more mainstream,” says Hilf. “It cost less than $20,000 using normal pricing for the Azure system. A comparable system would be millions of dollars to do it on [their] own.”
Hilf also mentioned a project with next-generation sequencing company Pacific Biosciences, through the University of Washington, running 5,000 BLAST sequences on this system. The run took 30 minutes and cost $150 at list price. “It opens up the market,” says Hilf. “It’s the first example of scientific application at scale running on the cloud.”
In effect, Hilf says Azure is “a massive datacenter.” Azure is the name of the virtualized operating system that runs on thousands of servers. “When running on thousands of cores, failure is common. In the middle of our [BLAST] run, the Azure team did a system-wide upgrade, which would typically slow or explode the job.” It didn’t matter in this case. “What differentiates the system is it is fully virtualized and automated. Other machines spin up and pick up the job and keep running with it.”
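The fault tolerance Hilf describes can be pictured with a simple lease-based work queue: a worker claims a chunk of work, and if it disappears before finishing, the lease expires and another worker picks the chunk up. The sketch below is an in-process illustration of that pattern only, not Azure’s actual queue service or API.

```python
# Minimal sketch of a fault-tolerant work-queue pattern: each chunk of work
# is leased to a worker, and if the worker disappears before completing it,
# the lease expires and another worker picks it up. This is an in-process
# illustration only, not Azure's actual queue API.
import time

LEASE_SECONDS = 30  # illustrative lease length

class LeasedQueue:
    def __init__(self, items):
        self.pending = {i: item for i, item in enumerate(items)}
        self.leases = {}   # item id -> lease expiry time
        self.done = set()

    def claim(self):
        """Hand out one unfinished item whose lease is free or expired."""
        now = time.time()
        for item_id, item in self.pending.items():
            if item_id in self.done:
                continue
            if self.leases.get(item_id, 0) <= now:
                self.leases[item_id] = now + LEASE_SECONDS
                return item_id, item
        return None  # nothing available right now

    def complete(self, item_id):
        """Mark an item finished so no other worker re-runs it."""
        self.done.add(item_id)

# A worker that crashes after claiming chunk 0 simply lets its lease lapse;
# once the lease expires, another worker's claim() call returns the same chunk.
```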
Hilf sees cloud computing as “the evolution of HPC,” and reckons life science applications will quickly become “the largest consumer of cloud computing.”
He says Microsoft’s cores constitute a “truly worldwide system” where data can be shared very quickly and users can keep the data close to the compute. “We have a set of ‘Uber’ datacenters – Chicago, Seattle, the East Coast, some in Latin America, quite a few in Europe and Asia,” says Hilf. “We don’t know where [the cores] are at, at any given point, because of this highly virtualized system.”
“Can you run a secure service at scale?” asks Hilf. “We’re offering a scale service around the world.”
Azure meets Amazon
Hilf acknowledges that Amazon has made “a great presence with cloud computing,” but says there are some fundamental differences between Amazon and Azure.
“Cloud computing for Microsoft is the number one strategic bet across our company,” says Hilf emphatically. “For the largest software company in the world with our financial assets, to say that is a non-trivial thing!” Hilf says Microsoft’s CEO recently claimed that in the next few years, more than 85% of the firm’s software programmers will be working on the cloud. Considering Microsoft’s assets and commitment to making software for consumers and video games, “that’s a very large investment from the datacenter side and transforming the business.”
One thing Amazon does not do, Hilf argues, is offer customers the ability to keep their data on premise in their own datacenter. It is the difference between viewing the cloud as an addition and viewing it as a replacement. “Amazon says, ‘Bring everything to the Cloud,’” says Hilf. “There is no on-premise version of Amazon.”
Hilf says he has heard many life science organizations express reservations about public cloud computing, perhaps because they have made big investments in their own datacenters, or because they have concerns about security or about transporting large volumes of data. They want to be able to handle sudden spikes in workload and add resources without having to build a new facility. They want to rent cores when they need them, rather than keep paying week over week, month over month.
Azure provides the flexibility to schedule the same job “in concert” both on premise and off, says Hilf. “It’s the ability to extend versus replace. One of our tenets with Azure is to provide the same enterprise-class capability as on premise. It’s one of our guiding principles . . . Our customers have developed trust in our systems. You need that same security in the cloud. We look at it as a top priority.”
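One way to picture the “extend versus replace” model is a scheduler that fills local capacity first and bursts the overflow to a cloud pool. The sketch below is purely illustrative; the pool names, capacities, and submit functions are hypothetical and not part of any Azure or Windows HPC Server API.

```python
# Minimal sketch of the "extend versus replace" idea: fill the on-premise
# cluster first and burst overflow work to the cloud pool. Pool names,
# capacities, and the submit functions are hypothetical placeholders.
def submit_on_premise(job):
    print(f"on-premise cluster <- {job}")

def submit_to_cloud(job):
    print(f"cloud pool <- {job}")

def schedule(jobs, on_prem_free_slots):
    """Send jobs to local capacity first, then burst the rest to the cloud."""
    for i, job in enumerate(jobs):
        if i < on_prem_free_slots:
            submit_on_premise(job)
        else:
            submit_to_cloud(job)

# With five free local slots, the last three chunks spill over to the cloud.
schedule([f"blast_chunk_{n}" for n in range(8)], on_prem_free_slots=5)
```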
Hilf cites Windows Update, which touches some 600 million computers around the world on a monthly basis, as an example of the resources and importance Microsoft has attached to improving security.
At SC10, Microsoft is also announcing the imminent release of Service Pack 1 for Windows HPC Server 2008 R2, which will allow customers to connect their HPC systems to Windows Azure. Moreover, a Japanese group is reporting that Windows HPC Server has surpassed a petaflop level of performance (a quadrillion mathematical computations per second).