Henry Golas, Director of Technology, Cloudian
There is a lot of talk in object storage about separating metadata from user data. How metadata is stored makes a critical difference in the success of your storage project. When you understand the benefits of a dedicated metadata store, you quickly realize why Cloudian’s HyperStore object storage platform has one.
When we talk about data and data storage today, the conversations are typically about data at scale: petabytes or even exabytes. To put this in context, if you binge-watched Netflix non-stop for about 3.5 years, you would have watched a petabyte's worth of data. An exabyte, which is 1,024 petabytes, would take about 3,584 years to watch.
So, what does metadata have to do with this?
Metadata is information about data. System metadata records items such as dates and other system information about the data, while custom metadata provides context for the data. For example, custom metadata for the 1993 movie Groundhog Day might include tags such as “1993, Bill Murray, romantic comedy, classics, feel-good.”
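To make the two kinds of metadata concrete, here is a minimal sketch of a single object's metadata record. The class name, field names, and values are hypothetical and purely illustrative; they are not taken from HyperStore or any other product. System fields are ones the platform would maintain, while the custom dictionary holds user-supplied context as string key/value pairs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ObjectMetadata:
    """Hypothetical record separating system metadata from custom metadata."""

    key: str                      # object name within its bucket
    # System metadata: maintained by the storage platform itself.
    size_bytes: int
    last_modified: datetime
    # Custom metadata: user-supplied context, stored as string key/value pairs.
    custom: dict[str, str] = field(default_factory=dict)


groundhog_day = ObjectMetadata(
    key="movies/groundhog-day.mp4",
    size_bytes=4_700_000_000,     # illustrative value only
    last_modified=datetime(2020, 1, 1, tzinfo=timezone.utc),
    custom={
        "year": "1993",
        "star": "Bill Murray",
        "genre": "romantic comedy",
        "tags": "classics,feel-good",
    },
)

print(groundhog_day.custom["star"])  # Bill Murray
```

Keeping the two groups in separate fields mirrors the distinction above: the platform can update system fields without touching the context a user attached.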
With that context, here are four reasons why a dedicated metadata store is so important:
1. Storing metadata in a dedicated store (i.e., a database) makes data management and metadata operations easier without impacting system performance. Without a centralized store, every data request, such as an object search or bucket listing, must traverse the entire object store every single time, and that store can contain billions of objects and hundreds of petabytes of data.
See the difference below? On the left, I’m scrolling titles based on metadata, and when I find Groundhog Day, I can simply click it and start watching. On the right, without the help of metadata, I must start searching at the front of the row and keep going until I find my movie.
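The difference between the two sides of the comparison can be sketched in a few lines. This is a simplified model under stated assumptions: a Python list of hypothetical (bucket, key) pairs stands in for the object store, and an in-memory dictionary built at write time stands in for the dedicated metadata store. A real platform's store is far more involved; the point is only how much work each lookup does.

```python
from collections import defaultdict

# Hypothetical catalogue: (bucket, key) pairs standing in for stored objects.
objects = [("movies", f"title-{i:05d}") for i in range(100_000)]
objects.append(("movies", "groundhog-day"))


def find_by_scan(bucket: str, key: str) -> tuple[bool, int]:
    """No metadata store: walk every object until a match is found."""
    examined = 0
    for obj_bucket, obj_key in objects:
        examined += 1
        if obj_bucket == bucket and obj_key == key:
            return True, examined
    return False, examined


# Hypothetical metadata store: an index built once, as objects are written.
index: dict[str, set[str]] = defaultdict(set)
for obj_bucket, obj_key in objects:
    index[obj_bucket].add(obj_key)


def find_by_index(bucket: str, key: str) -> bool:
    """Metadata store: one hash lookup, on average independent of object count."""
    return key in index.get(bucket, set())


found, examined = find_by_scan("movies", "groundhog-day")
print(found, examined)                             # True 100001
print(find_by_index("movies", "groundhog-day"))    # True
```

The scan has to examine all 100,001 entries to reach the last title, and a miss costs just as much; the indexed lookup does the same job with a single probe.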
2. Keeping custom metadata in a metadata store enables advanced analytics and searches on virtually any piece of metadata, whether custom or system.
Continuing with the movie example, if I wanted to list all the movies from 1993, it’s a simple metadata query. Without a metadata store, I’d have to scan each title and compile a list. Now, imagine a real-world analytics example with millions or billions of objects. Because of the sheer volume, scanning each object is not a realistic option.
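Here is a minimal sketch of the "all movies from 1993" query, assuming an in-memory SQLite database (Python's standard sqlite3 module) as a stand-in for a dedicated metadata store. The table layout and column names are hypothetical. Only metadata rows are read; no movie file is opened.

```python
import sqlite3

# In-memory SQLite database standing in for a dedicated metadata store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE object_metadata ("
    "key TEXT PRIMARY KEY, title TEXT, year INTEGER, genre TEXT)"
)
# An index on year lets the database answer the query without a full table scan.
conn.execute("CREATE INDEX idx_year ON object_metadata (year)")
conn.executemany(
    "INSERT INTO object_metadata VALUES (?, ?, ?, ?)",
    [
        ("movies/groundhog-day.mp4", "Groundhog Day", 1993, "romantic comedy"),
        ("movies/jurassic-park.mp4", "Jurassic Park", 1993, "adventure"),
        ("movies/ghostbusters.mp4", "Ghostbusters", 1984, "comedy"),
    ],
)

# "List all the movies from 1993" becomes one query against metadata only.
rows = conn.execute(
    "SELECT title FROM object_metadata WHERE year = ? ORDER BY title", (1993,)
).fetchall()
titles_1993 = [title for (title,) in rows]
print(titles_1993)  # ['Groundhog Day', 'Jurassic Park']
```

The same pattern extends to any other field, custom or system: add a column or key/value table and query it, rather than opening each object.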
3. With an API-accessible metadata store, search and analytics engines such as the ELK Stack can integrate seamlessly, so search and analytics operations can run without impacting data services.
While this type of integration is possible without a metadata store, it carries significant performance risk, because every query has to scan the system (see point #1 above).
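One way such an integration could look is sketched below, assuming metadata records have already been fetched through the metadata store's API (that call is not shown and its shape here is hypothetical). The records are formatted as newline-delimited JSON for Elasticsearch's _bulk endpoint, the "E" in ELK. The index name and record fields are illustrative assumptions, and this is not a description of how HyperStore itself integrates.

```python
import json

# Hypothetical records as they might be returned by a metadata-store API call.
# No object payloads are read; only metadata is exported.
metadata_records = [
    {"key": "movies/groundhog-day.mp4", "year": 1993, "star": "Bill Murray"},
    {"key": "movies/ghostbusters.mp4", "year": 1984, "star": "Bill Murray"},
]


def to_bulk_ndjson(records: list[dict], index_name: str) -> str:
    """Format records for the Elasticsearch _bulk endpoint (newline-delimited JSON)."""
    lines = []
    for record in records:
        # Action line: index the document, using the object key as its ID.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": record["key"]}}))
        # Source line: the metadata document itself.
        lines.append(json.dumps(record))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = to_bulk_ndjson(metadata_records, index_name="object-metadata")
print(body)
```

The resulting body would be sent as a POST to the cluster's /_bulk endpoint with the Content-Type application/x-ndjson. Because the export reads only the metadata store, the object data path is never involved.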
4. Lastly, without a dedicated store, searching for metadata means traversing the actual data along with it. This can be a security concern in sensitive or restricted data environments, such as those handling PII, PCI-regulated payment data, or healthcare data subject to HIPAA. Having a dedicated metadata store removes this concern.
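The separation behind this point can be modeled with two hypothetical classes: one holding object payloads (and counting every read) and one holding only metadata. The class and method names are illustrative assumptions. The sketch shows that a search served by the metadata store never opens a payload; it does not by itself implement access control or encryption.

```python
class ObjectStore:
    """Hypothetical data store holding object payloads; counts every read."""

    def __init__(self) -> None:
        self._bodies: dict[str, bytes] = {}
        self.body_reads = 0

    def put(self, key: str, body: bytes) -> None:
        self._bodies[key] = body

    def get(self, key: str) -> bytes:
        self.body_reads += 1
        return self._bodies[key]


class MetadataStore:
    """Hypothetical metadata store; holds no object payloads."""

    def __init__(self) -> None:
        self._records: dict[str, dict[str, str]] = {}

    def put(self, key: str, metadata: dict[str, str]) -> None:
        self._records[key] = dict(metadata)

    def search(self, field: str, value: str) -> list[str]:
        # Only metadata records are inspected; object contents are never opened.
        return sorted(k for k, md in self._records.items() if md.get(field) == value)


def store_object(
    data: ObjectStore, meta: MetadataStore, key: str, body: bytes, metadata: dict[str, str]
) -> None:
    """Write path: payload and metadata go to separate stores."""
    data.put(key, body)
    meta.put(key, metadata)


data, meta = ObjectStore(), MetadataStore()
store_object(data, meta, "patients/0001.json", b"<sensitive payload>", {"department": "cardiology"})
store_object(data, meta, "patients/0002.json", b"<sensitive payload>", {"department": "oncology"})

print(meta.search("department", "cardiology"))  # ['patients/0001.json']
print(data.body_reads)                           # 0
```

Searches go through MetadataStore alone, so the read counter on the payload store stays at zero; only an explicit get() touches the sensitive data.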
By now, it should be clear that enterprise object storage platforms need dedicated metadata stores.
Cloudian’s HyperStore object storage platform leverages a dedicated metadata store, enabling metadata transactions to occur without impacting system performance or data services. Cloudian HyperStore also exposes APIs that search and analytics engines such as the ELK Stack can call. Lastly, Cloudian HyperStore provides industry-leading, government-certified data security, including protection against ransomware and unauthorized root access.
Learn more at cloudian.com/products/hyperstore/