Is A Science DMZ A Key To Solving Poor Data Utilization?

April 16, 2019

Contributed Commentary by Chin Fang

April 16, 2019 | It is often said that the bio-pharmaceutical industry is facing an onslaught of data, in volume, velocity, variety, and veracity. Can we quantify such observations? Luckily we can. Mr. Navin Shenoy, the Executive Vice President of Intel Corp.'s Data Center Group, revealed the following two incredible facts at the August 2018 Intel Innovation Summit:

  1. 90% of the world’s data was generated in the past two years (since 2016)
  2. Only about 1% of it is utilized, processed and acted upon

But what is the main reason for such poor data utilization? I believe it is that, so far, data movement at scale and speed has not been tackled with a balanced approach that integrates storage, computing, networking, and the software stack for the best performance. Instead, it is often treated as a software-only or a network-only task. In particular, the widespread use of firewalls for network security is almost always the foremost hindrance to the large scientific data flows the bio-pharmaceutical industry requires. As a result, many bio-pharmaceutical businesses, after investing in high-bandwidth connections, fail to see the benefit of them. One may wonder, "Without firewalls, how can an enterprise maintain its security?"

It turns out you can!

The Office of Science of the U.S. Department of Energy is the lead federal agency supporting fundamental scientific research for energy and the nation's largest supporter of basic research in the physical sciences. Under it are 17 national labs and several lesser-known but just as critical user facilities. One of them is the Energy Sciences Network (ESnet), a high-performance, unclassified network built to support scientific research. ESnet, under the direction of Inder Monga and led by Eli Dart and other ESnet scientists and engineers, pioneered the concept of the "Science DMZ". DMZ commonly means "demilitarized zone", so what is a "Science DMZ"? According to ESnet, the Science DMZ is a portion of the network, built at or near the campus or laboratory's local network perimeter, that is designed so that its equipment, configuration, and security policies are optimized for high-performance scientific applications rather than for general-purpose business systems or "enterprise" computing. The takeaway: always separate the general-purpose enterprise networks from the networks used for large science data flows.
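To make the architectural difference concrete: instead of a stateful firewall, a Science DMZ typically protects its data transfer nodes with simple, stateless access-control lists on the border router, which can run at line rate. The sketch below is purely illustrative (the peer subnet and port range are my own hypothetical examples, not ESnet policy); it shows the flavor of such a filter, where each packet is judged on its own, with no per-connection state to maintain.

```python
# Illustrative sketch of a stateless ACL, the kind of lightweight filter a
# Science DMZ border router applies in place of a stateful firewall.
# The trusted subnet and port range below are hypothetical examples.
import ipaddress

ALLOWED_PEERS = [ipaddress.ip_network("198.51.100.0/24")]  # example collaborator subnet
ALLOWED_PORTS = range(50000, 51001)  # hypothetical data-transfer port range

def permit(src_ip: str, dst_port: int) -> bool:
    """Stateless check: each packet is evaluated independently, with no
    connection tracking, so filtering does not limit throughput."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in ALLOWED_PEERS) and dst_port in ALLOWED_PORTS

print(permit("198.51.100.7", 50500))  # trusted peer, transfer port -> True
print(permit("203.0.113.9", 50500))   # untrusted source -> False
```

The design choice this illustrates is the one ESnet emphasizes: security for the Science DMZ comes from narrowly scoped, per-application policy on dedicated transfer hosts, not from pushing every science flow through a general-purpose stateful firewall.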

The Science DMZ approach has been adopted worldwide and its popularity keeps growing. Almost all well-established research institutions, including DOE national labs, NSF-funded research institutions, and large universities, use it, and their Science DMZs are linked into elaborate networks that facilitate large-scale, data-intensive collaborations, even internationally. Soon there will be a worldwide "grid" of Science DMZs. Two excellent examples are the StarLight international networks and the Asia Pacific Research Platform (APRP).

Some bio-pharmaceutical IT security professionals may think, "Ah, these traditional science networks, who cares?" In fact, both StarLight and the APRP are far more elaborate than the most built-out enterprise networks. A quick glimpse of the StarLight network topologies should make the contrast obvious. Furthermore, in the words of Joe Mambretti, Director of the International Center for Advanced Internet Research, the term "high-performance enterprise network" is an oxymoron!

In my experience, the Science DMZ methodology, regrettably, has not been widely adopted by the IT security and networking professionals working in the bio-pharmaceutical industry. I strongly recommend that they study and test the Science DMZ concept and put it into practice. In addition, several connectivity providers, owing to their collaboration with ESnet, are very proficient in deploying Science DMZs and can help their customers do the same. Internet2 is a good example.

Finally, I will provide an actual example of why firewalls are bottlenecks to attainable data transfer rates. High-end Palo Alto Networks firewalls are quite capable, with roughly 60 Gbps of throughput. An active/active configuration can thus provide 120 Gbps, which seems sufficient. But that throughput pales next to what modern large science data flows need: multi-100 Gbps and even terabyte-per-second transfer rates. Given the hockey-stick data growth we all face, it is time to put the well-proven Science DMZ methodology into practice, be ready for the future, and create value from the data.
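To see what that throughput gap means in practice, here is a back-of-the-envelope calculation. The 1 PB dataset size and the 400 Gbps Science DMZ path are my own illustrative assumptions, and the figures assume the link is the only bottleneck:

```python
# Ideal transfer time for a 1 PB dataset (illustrative size) at two line
# rates: a 120 Gbps active/active firewall pair versus a hypothetical
# 400 Gbps Science DMZ path. Assumes the link rate is the only limit.

PETABYTE_BITS = 1e15 * 8  # 1 PB expressed in bits

def transfer_hours(rate_gbps: float) -> float:
    """Ideal transfer time in hours at the given line rate."""
    return PETABYTE_BITS / (rate_gbps * 1e9) / 3600

print(f"120 Gbps firewall pair: {transfer_hours(120):.1f} h")  # ~18.5 h
print(f"400 Gbps Science DMZ : {transfer_hours(400):.1f} h")   # ~5.6 h
```

Even in this idealized sketch, routing the flow through the firewall pair more than triples the transfer time, and the gap widens as science flows move toward terabyte-per-second rates.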

Chin Fang, Ph.D. is the founder and CEO of Zettar Inc., a Mountain View, California-based software startup delivering a GA-grade hyperscale data distribution software solution for moving data at scale and speed. Chin is reachable at fangchin@zettar.com.