I usually look at technology from a technical point of view, but when I'm in the field talking to end users it often turns out they bought it for the same reason you pitched it… only to realize later that they actually like it for totally different reasons.
 
What I’m about to say was confirmed for me a few days ago while speaking to Andy Warfield (CTO and co-founder of Coho Data) about a couple of statements he had made during the last Storage Field Day (find them starting at 2:20 of the following video).

Scale-up Vs. Scale-out

The difference between the two is very simple. Scale-up (or vertical scaling) happens when you add more resources to the same box. Scale-out (or horizontal scaling), by contrast, is an architecture where you add more boxes to obtain the same result.
 
The first one is much simpler but limited while the latter is now at the base of larger infrastructure designs.
 
In fact, the advantages of scale-out are seen at large scale. If the system is well designed, the more you add to the cluster the more you obtain in terms of performance and capacity, in a linear and predictable manner. On the other hand, if you don’t plan on growing substantially in the near future, starting small with scale-out is usually more expensive.
 
But it turns out that in the real world this is both true and false at the same time!

Modern “scale-well” infrastructures

For several reasons, ranging from ease of procurement to organizational issues or just project isolation, when it comes to building new infrastructures most enterprises have been adopting a POD-like approach for quite a while now, with standard building blocks (usually the size of a rack) made up of a certain number of servers, storage and networking. This also helps to facilitate operations and defines fault domains very clearly.
 
This is much more an organizational aspect than a technological one, but enterprises seem to like this idea and indeed you can find these PODs becoming more and more common in their datacenters. They can be pre-packaged by vendors or built in a DIY fashion. In either case, they all look the same and, if you analyze the content of a POD, it resembles a small autonomous infrastructure. Don’t you think?

So why spend more money and add complexity to a small infrastructure by using scale-out components? At first glance it doesn’t make any sense. But…

Where are the benefits of scale-out in a POD?

If you go deeper into analyzing what large enterprise IT usually deals with, the reality is quite different. Scale-out is not just about scalability: most of its benefits also apply to smaller infrastructures or, better, to small parts of a large infrastructure.
 
What are the biggest problems for a large IT organization when it comes to day-to-day operations? Mileage may vary but, usually, procurement, provisioning and migrations are always in the top 5 (and please, correct me if I’m wrong). Let’s take them one at a time and analyze what the difference is between scale-up and scale-out.

Let me make a quick example. You need 100TB of storage in three years, let’s say a hybrid array made of disks and flash. Today you have the chance to buy a traditional 100TB scale-up system for $1/GB or a similarly configured scale-out system for $2/GB. Even though these numbers are just hypothetical, for a small system like this they are credible, because the scale-out system needs many more resources (in terms of nodes and connectivity) to build a minimal cluster with comparable availability and resiliency features.
 
Procurement. With traditional scale-up storage, enterprises usually tend to buy more than is actually needed at the beginning of a system’s life. This is justified by long-term forecasts and capacity planning… which often prove to be wrong (and we will come to this later…).
 
In more general terms, the big problem is that we are buying today what we suppose will be used in 2 or more years! Look at flash memory, for example: buying flash today that you’ll only be using in two years’ time means that, even with considerable discounts, you will still be paying more than for anything you could buy later.
 
There is nothing new here: with scale-out you can buy exactly what you need today and, even though it is relatively more expensive than an equivalent scale-up system in terms of IOPS or capacity, in time you’ll be able to buy additional resources at better prices.
 
No doubt that if you buy 100TB upfront today, the economics work in favor of the simpler system. It just costs less… but the problem is that you are buying something that you’ll probably only be using in two or more years!
On the other hand, if you buy a smaller scale-out system today, you’ll pay a higher $/GB, but chances are that you’ll pay much less for all the expansions in the future… and eventually the scale-out system will end up costing you less.
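To make the arithmetic concrete, here is a minimal Python sketch of the two purchasing strategies. It reuses the hypothetical $1/GB and $2/GB figures from the example above; the function names, the purchase schedules and the yearly price decline rates are all assumptions of mine, not figures from any vendor, and the sketch looks at purchase price only.

```python
GB_PER_TB = 1000  # decimal terabytes, as capacity is usually quoted


def upfront_cost(capacity_tb, price_per_gb):
    """Cost of buying the whole capacity today (the scale-up case)."""
    return capacity_tb * GB_PER_TB * price_per_gb


def staged_cost(purchases_tb, start_price_per_gb, yearly_decline):
    """Cost of buying capacity in yearly steps (the scale-out case).

    purchases_tb[i] is the capacity bought in year i. The $/GB price is
    assumed (hypothetically) to drop by a fixed fraction every year.
    """
    total = 0.0
    for year, tb in enumerate(purchases_tb):
        price = start_price_per_gb * (1 - yearly_decline) ** year
        total += tb * GB_PER_TB * price
    return total


# Hypothetical scenarios: same 100TB in the end, bought in different steps.
print("scale-up, 100TB today:", upfront_cost(100, 1.0))
for schedule in ([40, 30, 30], [20, 40, 40]):
    for decline in (0.3, 0.5):
        cost = staged_cost(schedule, 2.0, decline)
        print("scale-out", schedule, "decline", decline, "->", round(cost))
```

The result depends heavily on the assumed price decline and on how much capacity you can defer, so it is worth plugging in your own quotes. Keep in mind that this covers the purchase price only, not the operational costs discussed in the next sections.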
 
Scale-out systems, by their nature, are easy to expand, and modern products let you build clusters with nodes of different size and performance. It is also likely that the new nodes added over time will be more efficient and powerful than the older ones.
 
Provisioning. Still today, this is one of the most critical issues in all IT organizations. Once you have the resources in place, you have to provision them and, after some time, you realize you need more of them… maybe more than you had planned for! (This is the exact moment you realize your forecast was off.)
 
It’s time to go back to the vendor and buy expansions for your infrastructure. In the case of scale-out it’s easy: you buy a new box and add it to the cluster. As I’ve already said, it’s likely to be a different, faster and higher-capacity node type, but it will work as expected.
 
Things are totally different when it comes to scale-up systems. You can buy a new tray of disks or add new ports, more cache and so on. Plenty of choices in theory… but very few in practice. A traditional two-controller system has limited expandability, and chances are that the disks you need to expand your pool are already out of production. (How many times has this happened to you??! Many to me.)
At this point the vendor comes out with different options:

  • Buy old disks at the cost of new ones!
  • Buy larger disks and use only a portion of them to keep the system balanced and easy to configure/maintain.
  • Buy the same larger disks and start messing with the configuration.

This goes on up to the limit of the controller, because then you’ll have to do a controller upgrade (Argh! But again, we will come to this later).
 
In the end, a scale-out system is just much more manageable and has a longer life.

Migrations. Isn’t a migration the worst part of a SysAdmin’s life??! And in large IT organizations this is a daily task! Even though things have vastly improved in recent years, you always reach a point where a service disruption could happen (especially if you still have to deal with some physical servers and archaic OSes).
 
Scale-up systems, considering what I’ve already written about procurement and provisioning, are a pain in the neck when it’s time to update controllers or change the system backend. Compatibility matrices, checklists, firmware upgrades and so on… could make your migration activity a trip to hell (and sometimes it’s not a round trip!). But in any case, even when everything goes as smoothly as possible, it’s the activity itself that is expensive and time consuming.
 
For modern scale-out storage systems it is much easier and smoother: you add a new node (just as you do when you want to expand the cluster) and, in the worst case, the cluster gets rebalanced. When it has finished its job, you just decommission the node you no longer need. That’s all. It could take time… but it’s not your or anyone else’s time after all. And above all, it is always safer than a controller upgrade.
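The add, rebalance and decommission cycle can be sketched in a few lines of Python. This is a toy model under strong assumptions, not how Coho Data or any other product implements it: the cluster is just a dictionary of node names to lists of data chunks, all names are hypothetical, and replication, fault domains and nodes of different sizes are ignored.

```python
def rebalance(cluster):
    """Move chunks from the fullest node to the emptiest one until the
    chunk counts differ by at most one."""
    while True:
        fullest = max(cluster, key=lambda n: len(cluster[n]))
        emptiest = min(cluster, key=lambda n: len(cluster[n]))
        if len(cluster[fullest]) - len(cluster[emptiest]) <= 1:
            return
        cluster[emptiest].append(cluster[fullest].pop())


def add_node(cluster, name):
    """Add an empty node and let the cluster spread data onto it."""
    cluster[name] = []
    rebalance(cluster)


def decommission(cluster, name):
    """Drain a node by handing its chunks to the least loaded survivors."""
    for chunk in cluster.pop(name):
        target = min(cluster, key=lambda n: len(cluster[n]))
        cluster[target].append(chunk)


# Hypothetical two-node cluster holding 12 chunks of data.
cluster = {"old-a": list(range(0, 6)), "old-b": list(range(6, 12))}
add_node(cluster, "new-c")       # expansion: data spreads over three nodes
decommission(cluster, "old-a")   # migration: the old node is drained
print({node: len(chunks) for node, chunks in cluster.items()})
```

The point of the sketch is that expansion and migration are the same operation seen from two sides: the data moves in the background while the cluster as a whole stays online.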

Closing the circle

Scale-out storage could cost more than scale-up when we talk about small systems but even in this case, TCO is the real parameter to look at when it comes to the whole life of the infrastructure and its sustainability. 
 
In the particular case of POD-based deployments, the advantage of adopting scale-out storage arrays is not scalability but the overall benefit they bring to daily operations, avoiding most of the limits and constraints of traditional scale-up systems.

If you want to know more about this topic, I’ll be presenting at the next TECHunplugged conference in Austin on 2/2/16. It is a one-day event focused on cloud computing and IT infrastructure, with an innovative formula that combines a group of independent, insightful and well-recognized bloggers with disruptive technology vendors and end users who manage rich technology environments. Join us!

[Disclaimer: Coho Data is a client of Juku Consulting]