The success of services like AWS Lambda, Azure Functions and Google Cloud Functions is indisputable. Serverless is not for every use case, of course, but the technology is intriguing and easy to implement, and developers (and sysadmins!) can use it to offload tasks to the infrastructure and automate many operations that would otherwise have to be handled, less efficiently, at the application level.
The code (a Function) is triggered by events, and object storage is a perfect fit for this model.
Why object storage
Object storage is usually implemented with a shared-nothing scale-out cluster design. Each node of the cluster has its own capacity, CPU, RAM and network connections. At the same time, modern CPUs are very powerful and usually underutilized when the only scope of the storage node is to access objects. By allowing the storage system to use its spare CPU cycles to run Functions, we obtain a sort of very efficient hyperconverged infrastructure (micro-converged?).
Usually, we tend to bring data close to the CPU, but in this case we do the exact opposite (we take advantage of CPU power which is already close to the data), obtaining even better results. CPU-data vicinity coupled with event-triggered micro-services is a very powerful concept that can radically change data and storage management.
Scalability is not an issue. CPU power increases alongside the number of nodes, and the code is instantiated asynchronously and in parallel, triggered by events. This also means that response time, and hence performance, is not always predictable and consistent but, for the kind of operations and services that come to mind, it’s good enough.
Object metadata is another key element. In fact, the Function can easily access the data and metadata of the object that triggered it. Adding and modifying information is child’s play, which helps to build additional information about content, for example.
These are only a few examples, but the list of characteristics that make scale-out storage suitable for this kind of advanced data service is quite long. In general, it’s important to note that, thanks to the architecture design of this type of system, this functionality can boost efficiency of the infrastructure at an unprecedented level while improving application agility. It’s no coincidence that most of the triggering events implemented by cloud providers are related to their object storage service.
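To make the event-plus-metadata pattern described above more concrete, here is a minimal sketch in Python. It is not the API of any real product: `ToyObjectStore`, `on_put` and the metadata keys are hypothetical names invented for illustration, and the handler runs synchronously here, whereas a real system would typically run it asynchronously on a storage node.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StoredObject:
    key: str
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object store with event hooks."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}
        self._put_handlers: List[Callable[[StoredObject], None]] = []

    def on_put(self, handler: Callable[[StoredObject], None]):
        """Register a Function to run whenever an object is written."""
        self._put_handlers.append(handler)
        return handler

    def put(self, key: str, data: bytes) -> None:
        obj = StoredObject(key, data)
        self._objects[key] = obj
        # The "event": every registered Function sees the new object.
        for handler in self._put_handlers:
            handler(obj)

    def get(self, key: str) -> StoredObject:
        return self._objects[key]


store = ToyObjectStore()


@store.on_put
def tag_new_object(obj: StoredObject) -> None:
    # The Function reads the object's data and enriches its metadata.
    obj.metadata["size-bytes"] = str(len(obj.data))
    is_png = obj.data.startswith(b"\x89PNG\r\n\x1a\n")
    obj.metadata["looks-like-png"] = "true" if is_png else "false"


store.put("photos/cat.png", b"\x89PNG\r\n\x1a\n" + b"\x00" * 16)
print(store.get("photos/cat.png").metadata)
```

The point of the sketch is only the shape of the interaction: the write is the trigger, and the Function receives both the data and the metadata of the object that caused it.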
Possible applications
Ok, Serverless-enabled storage is cool but what can I do with it?
Even though this kind of system is not specifically designed to provide low-latency responses, there are a lot of applications, even real-time applications, that can make use of this feature. Here are some examples:
Image recognition: for each new image that lands in the storage system, a process can verify relevant information (identify a person, check a plate number, analyze the quality of the image, classify the image by its characteristics, make comparisons and so on). All this new data can be added as metadata or in the object itself.
Security: for each new or modified file in the system, a process can check whether it contains a virus, sensitive information or specific patterns (e.g. credit card numbers) and take proper action.
Analytics: each action performed on an object can trigger a simple piece of code to populate a DB with relevant information.
Data normalization: every new piece of information added to the system can be easily verified and converted to other formats. This could be useful in complex IoT environments for example, where different types of data sources contribute to a single large database.
Big Data: AWS has already published a reference architecture for Map/Reduce jobs running on S3 and Lambda!
And, as mentioned earlier, these are only the first examples that come to my mind. The only limit here is one’s imagination.
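As an illustration of the security example above, here is a rough sketch of what such a Function body might look like. It assumes the Function is handed the object's bytes and returns metadata to attach; `scan_object` and the metadata keys are hypothetical names. The pattern match is deliberately simple (digit runs filtered by the Luhn checksum), and a production scanner would need far more care.

```python
import re
from typing import Dict, List

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return digit strings that look like card numbers and pass Luhn."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


def scan_object(data: bytes) -> Dict[str, str]:
    """Hypothetical Function body: returns metadata to attach to the object."""
    text = data.decode("utf-8", errors="ignore")
    hits = find_card_numbers(text)
    return {
        "contains-card-number": "true" if hits else "false",
        "card-number-count": str(len(hits)),
    }


# 4111 1111 1111 1111 is a widely used test number that passes Luhn.
print(scan_object(b"please charge 4111 1111 1111 1111, thanks"))
```

What "proper action" means (quarantine, alerting, redaction) is left out on purpose, since it depends entirely on the environment.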
Back-end is the key
There are only a few serverless-enabled storage products at the moment, with others under development and coming in 2017. But I found two key factors that make this kind of solution viable in real production environments.
The first is multiple language support: the product should be capable of running different types of code so as not to limit its possibilities. The second is the internal process/Function scheduler. We are talking about a complex system which shares resources between storage and compute (in a hyperconverged fashion), and resource management is essential in order to guarantee the right level of performance and response time for both storage and applications.
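To give a feel for why the scheduler matters, here is a deliberately naive sketch of the simplest possible policy: reserve some CPUs for storage I/O and cap the number of Functions that may run at once on the rest. `FunctionScheduler`, `function_slots` and the numbers used are assumptions made for illustration, not how any particular product works; real schedulers also have to weigh load, data placement and priorities.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


def function_slots(total_cpus: int, reserved_for_storage: int) -> int:
    """CPUs left for Functions after the storage reservation (at least 1)."""
    return max(1, total_cpus - reserved_for_storage)


class FunctionScheduler:
    """Hypothetical scheduler: a fixed cap on concurrently running Functions."""

    def __init__(self, total_cpus: int, reserved_for_storage: int) -> None:
        self.slots = function_slots(total_cpus, reserved_for_storage)
        self._pool = ThreadPoolExecutor(max_workers=self.slots)

    def submit(self, fn, *args) -> Future:
        # Extra work queues up instead of competing with storage I/O.
        return self._pool.submit(fn, *args)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


# Demo: track how many Functions actually run at the same time.
stats = {"running": 0, "peak": 0}
lock = threading.Lock()


def fake_function(object_key: str) -> str:
    with lock:
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
    time.sleep(0.05)  # stand-in for real work on the object
    with lock:
        stats["running"] -= 1
    return f"processed {object_key}"


scheduler = FunctionScheduler(total_cpus=8, reserved_for_storage=6)
futures = [scheduler.submit(fake_function, f"obj-{i}") for i in range(6)]
results = [f.result() for f in futures]
scheduler.shutdown()
print("slots:", scheduler.slots, "peak concurrency:", stats["peak"])
```

Even this toy version shows the trade-off: a tighter cap protects storage response time, at the cost of Functions waiting longer in the queue.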
One of the best Serverless-enabled products I’m aware of is OpenIO (I wrote a paper about this product a while ago). The feature is called Grid For Apps while another component called Conscience technology is in charge of internal load balancing, data placement and overall resource management. The implementation is pretty slick and efficient. The product is open source, and there is a free download from their website. I strongly suggest taking a look at it to understand the potential of this technology. I installed it in a few minutes, and if I can do it… anyone can.
No standards… yet
Unlike object storage, where the de facto standard is the S3 API, Serverless is quite new and has no clear winner yet. Consequently, there are neither official nor de facto standards to look at.
I think it will take a while before one of these services prevails over the others but, when that happens, API compatibility shouldn’t be hard to achieve. Most of these services have the same goal and similar functionalities…
Closing the circle
Data storage as we know it is a thing of the past. More and more end users are looking at object storage, even when the capacity requirement is under 100TB. Many begin with one application (usually as a replacement for traditional file services) but, after grasping its full potential, they adopt it for more use cases, ranging from backup to back-ends for IoT applications accessed through APIs.
Serverless-enabled storage is a step forward and introduces a new class of advanced data services which will help to simplify storage and data management. It has a huge potential, and I’m keeping my eye on it… I suggest you do the same.
Good afternoon, and thank you for the excellent article. How safe and secure is all of this?
You may also want to look at the OpenStack Storlets project, which takes a similar approach with the OpenStack Swift object store. [1] is a link to a use cases talk given recently at the Barcelona OpenStack summit, [2] has some additional thoughts on co-locating storage and compute, and [3] is the project documentation. Hope you will find this interesting.
[1] https://www.openstack.org/videos/video/plethora-of-use-cases-with-openstack-storlets
[2] http://itsonlyme.name/blog
[3] http://storlets.readthedocs.io/en/latest/