Cloudian Engineering VP, Gary Ogasawara, shares his thoughts on object storage and the future of AI/ML.
2017 was a year of change, especially in the storage sector. We witnessed exploding data growth driven by AI/ML/DL and data-intensive formats, the rise of cloud storage as a service, and the decline of flash memory prices. In 2018, a growing number of organizations will capitalize on object storage for structured/tagged data, and we’ll see object storage creating structured/tagged data from unstructured data. Metadata will be used to make sense of the avalanche of data generated by artificial intelligence. And as data volumes continue to mushroom, organizations will start adopting new budget-friendly storage strategies.
What does the future hold for storage in 2018? Here are my predictions:
- Object storage for structured/tagged data, and object storage creating structured/tagged data from unstructured data, will be increasingly important.
The recent past saw an explosion of analytics technology as businesses demanded tools to unlock insights from their vast and growing sets of data. Now that those tools are available, the pendulum will swing back to the demand side of the equation and force businesses to pay more attention to collection, management and storage of that increasingly valuable data.
The amount of data that businesses are collecting is spiking – witness Intel’s statement late last year that the average car will generate 4000 GB of data per hour of driving. But it’s not just the raw data that’s driving the increase. Emerging data formats contribute too, as we collect more unstructured data – think of the video captured by a Tesla and shared with the company. In order to use this data, you have to understand it. That need for rapid understanding is driving growth in data tagging and in the collection of the associated metadata. As data volume increases, the value of structured or tagged data increases disproportionately to the value of unstructured data.
This data is collected by sensors of various types – wear and use sensors built into machines, video and radar recorders and transmitters in cars, and medical equipment that captures patient health information, for example. As that data – and its associated metadata – is leveraged for business advantage, it’s likely to spawn even more data.
As businesses analyze these enormous data sets, they will start to identify additional data types that could lead to further correlations and additional business advantages, which will justify the deployment of more sensors to understand more about the real-world experiences of customers. The data from these sensors will increase the need for storage capacity and advanced data storage management, which will lead to more insights and more sensors, and on and on. We’re at the beginning of a cycle in which businesses will strive to understand more about their customers digitally, and it will increase the strain on IT to keep growing storage infrastructures while preserving the usefulness of the data.
- AI Will Trigger an Avalanche of Data – And Help Dig Out from It, with the Help of Metadata
Businesses are taking advantage of a range of new technologies: artificial intelligence, high-quality video, the Internet of Things, analytics and more. These technologies have three things in common: they are all data-intensive, they demand ever-greater storage capacity, and they will depend on tagged data to function most effectively.
It does little good to store vast amounts of data if you have no way to access what you need to retrieve, or if you don’t have any idea of which data assets exist in the first place. Metadata is the key to extracting value from the data.
Structured/tagged data is a type of metadata, or a model of the data. Metadata and models sit at a higher level of abstraction than the raw object data and are required for analytics. Without metadata, the unstructured data captured in formats like video becomes an unsearchable liability instead of an asset. With metadata, the data can be navigated, analyzed, understood and put to use.
A good example is the management of video assets. Media and entertainment, surveillance and security, and even automotive uses of video are increasing dramatically. But it’s not reasonable to expect your employees to watch endless hours of video in search of the single clip you need. Instead, facial recognition software built on AI will be used to sift through tens of thousands of hours of material and tag recognized faces, so that when the need arises it’s a simple task to locate just the right clip or clips.
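To make the idea of locating clips by tag concrete, here is a minimal sketch in Python. It assumes the tags have already been produced by some recognition step and only shows the lookup side. The `Clip` and `TagIndex` names, the object keys and the tag format are all hypothetical and do not refer to any particular product's API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Clip:
    """A stored video object plus its tags (hypothetical schema)."""
    key: str                                # object key in the store
    tags: set = field(default_factory=set)  # e.g. output of a face tagger


class TagIndex:
    """Inverted index from tag to object keys; lookups never scan the video."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def add(self, clip):
        for tag in clip.tags:
            self._by_tag[tag].add(clip.key)

    def find(self, *tags):
        """Return the sorted keys of clips that carry every given tag."""
        if not tags:
            return []
        matches = [self._by_tag.get(tag, set()) for tag in tags]
        return sorted(set.intersection(*matches))


# Illustrative data only: keys and tags are made up.
index = TagIndex()
index.add(Clip("cam1/0001.mp4", {"person:alice", "lobby"}))
index.add(Clip("cam1/0002.mp4", {"person:bob", "lobby"}))
index.add(Clip("cam2/0001.mp4", {"person:alice", "garage"}))

print(index.find("person:alice"))           # ['cam1/0001.mp4', 'cam2/0001.mp4']
print(index.find("person:alice", "lobby"))  # ['cam1/0001.mp4']
```

The point of the sketch is that the search cost depends on the size of the metadata, not on the hours of video behind it.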
As AI/ML generates and uses metadata and models, systems that can efficiently and effectively manage the metadata and models become critical. AI will become an indispensable tool for finding the most valuable data within enormous data sets.
- Economics Will Force Businesses to Adopt New Storage Strategies as Data Volumes Explode
Not only is data a tremendously valuable asset, but businesses are creating more of it, faster than ever. That means that businesses must invest in new strategies for safeguarding and protecting data – but at the same time, they have to store data in economically responsible ways. That’s not as easy as it might sound. Configurable data policies that control how the data are stored once inside a storage system will become critical. These data policies can control the durability, cost, availability, and other properties of the data according to dynamic optimization criteria. A simple example is moving data from hot to cold storage. But the optimization criteria are continuously variable, and can reflect tradeoffs based on business priorities.
For example, a user may want to trade data durability for lower cost. One of the paradoxes of data storage is that the more you store in primary storage, the more per unit of storage it costs. That was bearable when a terabyte was considered a lot of storage, but today most large businesses have multiple petabytes of data under management. Is it affordable to keep it all in primary storage? Or is it smarter to look for secure archiving combined with advanced search tools to keep data costs down while making sure you can find the data when you need it?
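The hot-to-cold example of a configurable data policy mentioned above might be sketched as follows. This is an illustration under stated assumptions, not how any specific storage system implements policies: the `StoredObject` and `TieringPolicy` types, the 90-day idle threshold and the object keys are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredObject:
    key: str
    last_accessed: datetime
    tier: str = "hot"


@dataclass
class TieringPolicy:
    """Hypothetical policy: demote objects idle for longer than max_idle."""
    max_idle: timedelta

    def target_tier(self, obj, now):
        return "cold" if now - obj.last_accessed > self.max_idle else "hot"


def apply_policy(objects, policy, now):
    """Update each object's tier in place and return the keys that moved."""
    moved = []
    for obj in objects:
        target = policy.target_tier(obj, now)
        if target != obj.tier:
            obj.tier = target
            moved.append(obj.key)
    return moved


# Illustrative data only: keys, dates and the threshold are made up.
now = datetime(2018, 1, 31)
objects = [
    StoredObject("logs/2017-06.tar", datetime(2017, 7, 1)),
    StoredObject("reports/q4.pdf", datetime(2018, 1, 20)),
]
policy = TieringPolicy(max_idle=timedelta(days=90))

moved = apply_policy(objects, policy, now)
print(moved)  # ['logs/2017-06.tar']
```

A real policy would likely weigh more than idle time, for instance durability or availability targets, but the shape is the same: a rule evaluates each object and the system moves data accordingly.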
The concept of tiered storage has existed for 20 years, but it was deemed an unnecessary expense as disk storage prices tumbled in the early 2000s – drive capacity was so inexpensive there was no reason to consider it. Today, however, the volume of data is flipping the economic formula on its head. Storage prices that once seemed negligible are now looking like less of a bargain since the amount of data stored is so enormous.
These economic factors will prompt businesses to revisit tiered storage in the coming year. Returning to this older strategy is easier today for a couple of reasons: archived data is much more quickly available, and search technology has come a very long way, allowing archived data to be examined as easily as data in primary storage.