Modern scale-out architectures need fast node interconnects, that's a fact. 10Gbit Ethernet is quickly becoming the standard and 40Gbit is just around the corner. This is what is happening in the traditional enterprise datacenter. Does it make sense to think about something different? And if so, what should we target in the near future?
What is a scale-out infrastructure?
To quote from a recent paper I wrote:
“Scalability is the capability of an IT system to be enlarged to sustain a growing amount of work. There are two types of scalability: vertical (scale-up) and horizontal (scale-out).
The main difference is that scale-up systems are monolithic while scale-out systems are composed of small nodes connected together. In practice, when you talk about vertical scalability you talk about the ability of the system to add more resources in the same box (e.g. adding CPUs, RAM, disks, etc. to the same computer). On the other hand, a scale-out system/infrastructure expands by adding more nodes, with each node contributing its own resources to the cluster. Both approaches have their advantages and trade-offs but, in the past few years, technology has come a long way and scale-out systems are now chosen for a wider range of applications. Connection latency between nodes and management complexity are no longer a big problem (at least while node counts remain relatively small), and it's relatively easy to deploy huge computing systems at a reasonable cost.
Scale-out is also becoming more popular in the storage industry to solve performance and space problems when the numbers get huge: Big Data Hadoop/HDFS clusters are the most visible examples.”
The “real” world
Scale-out has its drawbacks! Leaving aside specific implementations, each node of the cluster has to talk to the other nodes as fast as possible.
Latency, bandwidth, packet size, protocol overhead and RDMA support are all important, and together they determine the real scalability of the system.
This might not be a problem in smaller clusters, the kind of infrastructure we usually see in traditional enterprise environments, but things change drastically when we talk about massive-scale clusters (thousands of nodes).
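Just to give an idea of why latency matters so much, here is a quick back-of-the-envelope sketch in C. The 10 µs latency and 10Gbit/s bandwidth figures are illustrative assumptions, not measurements of any particular fabric:

```c
#include <stdio.h>

int main(void)
{
    /* Illustrative assumptions only: a 10 Gbit/s link with 10 us of
       one-way latency (switching, protocol stack, and so on). */
    const double latency_us = 10.0;
    const double bandwidth_gbit_s = 10.0;

    const double sizes[] = { 4e3, 64e3, 1e6, 64e6 };   /* message sizes in bytes */

    for (int i = 0; i < 4; i++) {
        /* 10 Gbit/s is 10,000 bits per microsecond */
        double wire_us  = sizes[i] * 8.0 / (bandwidth_gbit_s * 1000.0);
        double total_us = latency_us + wire_us;
        printf("%10.0f bytes: %9.1f us total, %5.1f%% of it spent in latency\n",
               sizes[i], total_us, 100.0 * latency_us / total_us);
    }
    return 0;
}
```

With lots of small messages, which is exactly what chatty scale-out workloads tend to generate, the fabric's latency and per-message overhead matter far more than its headline bandwidth.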
HPC clusters, for example, face these problems every day. In fact, many big clusters don't use Ethernet for node interconnection but InfiniBand. This technology hasn't been very successful in enterprise data centers, but it is more present than you might think: many vendors (Oracle, EMC Isilon, Pure Storage, just to name a few) use InfiniBand in their backend, and some offer InfiniBand frontend options too. They chose it for the speed and latency benefits, but probably also because InfiniBand can efficiently use RDMA to transfer data (which means faster data movement and lower CPU utilization).
Even VMware added InfiniBand/RDMA support in one of its latest ESXi releases, and benchmarks demonstrated a huge benefit compared to traditional Ethernet environments.
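If you are wondering where the lower CPU utilization comes from: an RDMA-capable NIC can read and write application buffers directly, without the kernel copying data around. Below is a minimal sketch using libibverbs, showing just the memory-registration step (it assumes a machine that actually has an RDMA device, and leaves out queue pairs, connection setup and error handling, which are a longer story):

```c
/* Minimal sketch: register a buffer so the NIC can access it directly
   (zero-copy). Requires an RDMA-capable device; link with -libverbs.
   Error handling largely omitted for brevity. */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* The buffer the NIC will read and write directly. */
    size_t len = 1 << 20;
    void *buf = malloc(len);

    /* Registration pins the memory and returns keys that a peer can use
       for one-sided RDMA reads/writes, with no CPU copy on either end. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);

    printf("registered %zu bytes, lkey=0x%x rkey=0x%x\n",
           len, (unsigned)mr->lkey, (unsigned)mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

Once the buffer is registered, a peer can move data in and out of it with one-sided operations, leaving the CPUs on both ends free to do real work instead of shovelling packets.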
Obviously, InfiniBand doesn't come cheap and support from most vendors is very limited.
Why ask for more?
Last week I had the chance to talk with Emilio Billi, founder and CTO of A3Cube. He is developing an innovative new technology to connect cluster nodes together.
This technology, called Ronnie Express (perhaps more should be invested in marketing here, IMHO), makes it possible to build a huge interconnection fabric (up to 64K nodes) by extending each node's internal PCI bus and leveraging the memory-mapping capabilities of PCI Express.
Basically, this creates a huge, globally shared memory infrastructure, where the term “in-memory network” is not that far off! The implications are huge: no bottlenecks, impressive linear scalability and end-to-end I/O control. Moreover, the cluster can be configured with automatic redundant links for high availability.
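To visualize what a globally shared memory fabric means for an application, here is a purely conceptual sketch. The /dev/remote_mem0 device and its semantics are my own invention for illustration (this is not A3Cube's actual interface): the point is simply that once a window of a peer node's memory is mapped into your address space, moving data becomes an ordinary load/store instead of a trip through a protocol stack.

```c
/* Conceptual sketch only: the device path below is hypothetical and does
   not represent A3Cube's real API. It shows what memory-mapped access to
   a remote node's memory looks like from the application's point of view. */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical character device exporting a window of a peer node's
       memory over the PCIe fabric. */
    int fd = open("/dev/remote_mem0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    size_t win = 4096;
    char *remote = mmap(NULL, win, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (remote == MAP_FAILED) { perror("mmap"); return 1; }

    /* No send/receive calls, no extra copies: a plain store lands the
       data in the peer's memory window. */
    strcpy(remote, "hello from node A");

    munmap(remote, win);
    close(fd);
    return 0;
}
```

Whether the real fabric exposes itself exactly this way is another matter, but it conveys the idea of end-to-end I/O without a traditional network stack in the middle.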
A3Cube has also been developing a storage solution, based on an operating system they carved out of the team's experience in highly parallel supercomputing… and some of the benchmarks I saw during the presentation just blew my mind, really!
And you know what? It's cheaper than 10Gb Ethernet and InfiniBand (or, at least, that's what A3Cube says about Ronnie Express).
Now, think about all those operations that move data back and forth between your cluster nodes: would you like to do them at local memory speed? I'm sure you would!
Why it matters
Well, 10Gbit sounds like more than enough for many of us at the moment, but it's clear that the more scale-out and software-defined we go, the more networking we need.
Ronnie Express has a really brilliant design, but it is not meant for small deployments; in fact, the first A3Cube customers will be in HPC, biotech and the like. At the same time, there are a few verticals where this technology could be disruptive, especially when you think about the backend of high-end network equipment or storage systems.
We probably won't be seeing Ronnie Express in enterprise data centers anytime soon, but the potential is really high and I hope they'll find a way to demonstrate it.