I like to think that sometimes I get it right. A couple of years ago I wrote a lot about the role of metadata in large storage repositories and about its potential (here an example), and it’s interesting to see real-world implementations finally seeing the light of day.
I’m not going to make comparisons in this article, but I’d like to point out a couple of rather intriguing implementations I’ve seen in the last couple of months. They are not unique in the market, but they are part of a trend worth mentioning.
Why metadata?
I don’t think it’s even necessary to ask this question. Metadata is additional data that describes a data set. In this particular case, metadata describes the content of a file or, more generally, an object.
Some metadata is quite easy to generate from the data itself: date of creation, file name, file type and so on. In other cases it’s anything but easy: data has to be analyzed during the ingestion process to create more information or, in the worst case, the metadata has to be added manually. Depending on the application, this could be image recognition (e.g. face, text, object…), some sort of tagging or anything else that enriches the meaning, as well as the value, of the data being saved in the system.
To be more precise, metadata can also be added after the file/object has been stored in the system, and this can be useful in a wide range of situations, from analytics to compliance and auditing.
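To make the distinction a bit more concrete, here is a minimal Python sketch of the two cases: metadata derived from the file itself, and metadata added later. The function names (`basic_metadata`, `enrich`) and the example tags are my own assumptions for illustration, not the API of any product mentioned in this article.

```python
import mimetypes
import os
from datetime import datetime, timezone


def basic_metadata(path):
    """Collect metadata that can be derived from the file itself."""
    info = os.stat(path)
    # Guess the type from the file extension; fall back to a generic type.
    mime_type, _ = mimetypes.guess_type(path)
    return {
        "file_name": os.path.basename(path),
        "file_type": mime_type or "application/octet-stream",
        "size_bytes": info.st_size,
        # Creation time is not portable across operating systems,
        # so this sketch records the last modification time instead.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }


def enrich(metadata, **extra):
    """Return a copy of the metadata with extra tags added after ingestion."""
    merged = dict(metadata)
    merged.update(extra)
    return merged
```

In a real system the extra tags (for example a hypothetical `faces_detected=2` or `retention="7y"`) would come from an analysis step or from a human, which is exactly the part that is harder to automate.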
Why now?
As I said, rich metadata is neither new nor unique… especially in object storage. But now it is much more relevant than in the past. The number of developers taking advantage of object stores for their applications is growing like mad, and the ability to search objects and files according to their characteristics is very useful: it lets you offload some of the complexity to the infrastructure instead of keeping it in the application. For the developer it is just a query, and there is no need to worry about how to maintain a complex data structure (or an external DB), its scalability, consistency, etc… it’s just there and it works! And since most modern object storage systems are S3 compatible, application portability shouldn’t be a problem.
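To illustrate what "just a query" means from the developer's side, here is a toy in-memory sketch in Python. The `MetadataIndex` class and its `put`, `tag` and `search` methods are hypothetical names I made up for this example. They are not the Caringo, Cloudian or S3 API, and a real object store would obviously do this at scale on the infrastructure side.

```python
class MetadataIndex:
    """Toy in-memory stand-in for an object store's metadata search."""

    def __init__(self):
        self._objects = {}  # object key -> metadata dict

    def put(self, key, **metadata):
        """Store (or overwrite) the metadata for an object at ingestion."""
        self._objects[key] = dict(metadata)

    def tag(self, key, **metadata):
        """Add or update metadata on an object that is already stored."""
        self._objects[key].update(metadata)

    def search(self, **criteria):
        """Return the sorted keys of objects matching every criterion."""
        return sorted(
            key
            for key, meta in self._objects.items()
            if all(meta.get(field) == value for field, value in criteria.items())
        )


if __name__ == "__main__":
    index = MetadataIndex()
    index.put("photos/a.jpg", content_type="image/jpeg", project="alpha")
    index.put("docs/c.pdf", content_type="application/pdf", project="alpha")
    index.tag("photos/a.jpg", reviewed=True)
    print(index.search(project="alpha", reviewed=True))
```

The point is that the application only expresses what it is looking for; how the index is stored, scaled and kept consistent is somebody else's problem.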
About the implementations
Caringo and Cloudian both presented their solutions at Tech Field Day (TFD10 and SFD10). There are some differences between the two, but the core functionality is the same.
For example, what I loved about Caringo is that it has a nice UI which allows you to save queries that can be visualized and monitored over time. Even if it’s not aimed at the developer, it could be very useful for sysadmin tasks or reporting.
In any case, both presentations, which you can find at the bottom of this article, give you a clear idea of the potential of this approach.
Closing the circle
It’s not only about DevOps: the borders between infrastructure and applications are blurring, and new paradigms are surfacing even in the most traditional of IT organizations.
The idea of offloading part of the application intelligence to the infrastructure is anything but new, but now it’s easier to do and, thanks to common APIs, it’s also less risky. At the same time, infrastructure components that act and behave as application elements help to overcome developer skepticism… an object store with the capability described above can be compared to a basic DB engine after all, right? And eventually this helps to build more flexible and scalable applications without the need to manage a proper DB…
Disclaimer: I was invited to this meeting by Gestalt IT and they paid for travel and accommodation. I have not been compensated for my time and am not obliged to blog. Furthermore, the content is not reviewed, approved or edited by any person other than the Juku team.
If you want to know more about this topic, I’ll be presenting at the next TECHunplugged conferences in Amsterdam (6/Oct) and Chicago (27/Oct). TECHunplugged is a one-day event focused on cloud computing and IT infrastructure, with an innovative formula that combines a group of independent, insightful and well-recognized bloggers with disruptive technology vendors and end users who manage rich technology environments. Join us!