S3 Data Lake for VMware Tanzu Greenplum
The amount of data consumed and generated by enterprises is growing at an unprecedented pace. Modern analytics applications, including AI/ML, need to analyze data rapidly while capturing and storing it at petabyte scale. With this increasing volume comes a growing variety of data types, ranging from traditional enterprise database data, log and security data, video, voice, web, mobile, and clickstream data to IoT data such as geospatial and graph data, among many others.
To handle these growing data challenges, analytics applications in the cloud have already modernized, combining the flexibility, cost-efficiency, and scale of data lakes with the data management and ACID transactions of data warehouses to create “data lakehouses.”
The need for a similar on-premises solution that is affordable, manageable, and scalable has never been greater. Key to this is the data lake built on open standards, with the ability to accommodate varied data types; the ability to expand to petabyte scale on-demand; and support for multi-cluster, multi-cloud, geo-distributed architectures.
VMware Tanzu Greenplum, a massively parallel processing (MPP) data warehouse platform, seamlessly integrates with Cloudian HyperStore S3-compatible object storage to provide enterprise customers the same data lakehouse architectures on-premises. This VMware-certified solution enables new efficiencies and savings and is ideal for the creation and deployment of advanced analytics models for complex enterprise applications. Customers can vary the number of compute nodes running Greenplum or HyperStore nodes elastically, independently, and on-demand. Adjusting the size of the cluster does not interrupt analytic workloads, allowing customers cloud-like flexibility and economics within the security of their own data center.
Figure 1. VMware Tanzu Greenplum | Cloudian HyperStore solution
Cloudian integrates with VMware Tanzu Greenplum enabling new efficiencies and savings with highly scalable, secure, and cost-effective data storage supporting the creation and deployment of advanced analytics models for complex enterprise applications, at scale.
VMware Tanzu Greenplum integrates seamlessly with Cloudian HyperStore, the industry’s leading on-prem S3 storage system, with native support for the S3 API. Only Cloudian offers a 100% native S3 API for objects on-prem. Together, Greenplum and Cloudian provide customers with a robust data lakehouse solution, including scalable storage, high data durability, and fast recovery times, at up to 70% lower total cost than traditional storage solutions.
Exabyte-scalable to Handle the Largest Greenplum Environments
The Greenplum-Cloudian data lakehouse separates compute and storage scalability, so customers can scale whichever resource they need most. Cloudian HyperStore, a scale-out object storage system, provides a fast, cost-effective storage platform ideal for data lakes of semi-structured or unstructured data. Available as an appliance or as software, HyperStore lets customers start small and expand seamlessly from TBs to PBs without interruption. High-performance all-flash Cloudian HyperStore is also an option for performance-sensitive apps – at 3X better price/performance than leading competitors.
Data lakes built on Cloudian HyperStore provide 14 nines of resiliency. HyperStore provides administrator-selectable storage policies that protect data with either replication (RF) or erasure coding (EC). Administrators can configure the number of replicas or the erasure coding scheme required to meet SLA and cost objectives. Storage policies also provide fine-grained control of data placement across data centers, taking into consideration factors such as cost-efficiency, security levels, and proximity.
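The cost side of the replication-versus-erasure-coding choice comes down to simple arithmetic, sketched below in Python. The two schemes shown (three replicas, and a 4+2 erasure code) are illustrative assumptions rather than a list of policies HyperStore offers, and the `Policy` class is a hypothetical helper, not part of any Cloudian API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical model of a data-protection policy (not a Cloudian API)."""
    name: str
    data_fragments: int   # k: fragments needed to read an object (1 for replication)
    extra_fragments: int  # m: additional replicas or parity fragments

    @property
    def raw_per_usable(self) -> float:
        # Raw capacity consumed for every unit of usable data stored.
        return (self.data_fragments + self.extra_fragments) / self.data_fragments

    @property
    def failures_tolerated(self) -> int:
        # Replicas or fragments that can be lost without losing the object.
        return self.extra_fragments


def replication(copies: int) -> Policy:
    # N full copies: one is enough to read, the other N-1 are redundancy.
    return Policy(f"RF{copies}", 1, copies - 1)


def erasure_coding(k: int, m: int) -> Policy:
    # k data fragments plus m parity fragments; any k of them rebuild the object.
    return Policy(f"EC{k}+{m}", k, m)


if __name__ == "__main__":
    for p in (replication(3), erasure_coding(4, 2)):
        print(f"{p.name}: {p.raw_per_usable:.2f} TB raw per usable TB, "
              f"tolerates {p.failures_tolerated} lost replicas/fragments")
```

Under these assumptions both example policies survive two losses, but the erasure-coded one uses half the raw capacity of triple replication. This is the kind of trade-off an administrator weighs against SLA and cost objectives.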
The solution provides extensive security features that make it possible to deploy and operate a cost-effective data lake that keeps customers’ data extremely secure. These features include data encryption and transparent key management, AES-256 server-side encryption for data stored at rest, SSL for data in transit (HTTPS), role-based access controls with specified levels of access, audit trail logging, WORM (Write Once Read Many) for storage of immutable data, and more.
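Because the storage layer speaks the S3 API, encryption at rest and in transit can be requested with a standard S3 client. The Python sketch below uses the boto3 library and assumes the endpoint honors the standard S3 `ServerSideEncryption` request parameter. The endpoint URL, bucket, key, and credentials are placeholders, and the helper functions are hypothetical names introduced for illustration.

```python
def encrypted_put_args(bucket: str, key: str, body: bytes) -> dict:
    """Build put_object arguments that request AES-256 server-side encryption."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ServerSideEncryption": "AES256",  # standard S3 API value for SSE with AES-256
    }


def make_client(endpoint_url: str, access_key: str, secret_key: str):
    """Create an S3 client for an S3-compatible endpoint, insisting on HTTPS."""
    if not endpoint_url.startswith("https://"):
        raise ValueError("use an https:// endpoint so data is encrypted in transit")
    import boto3  # third-party dependency, imported lazily
    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


if __name__ == "__main__":
    # Placeholder values (assumptions): substitute a real endpoint and credentials.
    client = make_client("https://s3.example.internal", "ACCESS_KEY", "SECRET_KEY")
    client.put_object(**encrypted_put_args("analytics-lake", "raw/events.csv", b"id,value\n1,42\n"))
```

Keeping the argument builder separate from the client makes the encryption setting easy to check without network access.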
The solution allows multiple users to analyze data sitting on a single shared data lake infrastructure without compromising security. HyperStore gives role-based access to system and group administrators and to end users. Users can select and provision storage services on demand from a service catalog. Billing, metering, and QoS are built into the HyperStore platform to enable a full multi-tenant deployment.
Greenplum and Cloudian are both part of VMware’s Cloud Provider Program, enabling service providers to build and offer Data Lakehouse-as-a-Service to enterprise customers in a true consumption model.
Cloud Economics: Up to 70% Less Cost Than Conventional Storage
Running on standard x86 hardware, Cloudian drives down the cost of on-prem, disk-based storage to ½¢/GB/month or less, including support. High-performance all-flash Cloudian HyperStore is also an option for performance-sensitive apps – at 3X better price/performance than leading competitors.
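To put the half-cent figure in concrete terms, the following Python sketch converts it to a monthly cost for a given capacity. It assumes decimal units (1 TB = 1,000 GB) and treats ½¢/GB/month as an upper bound, as the text does. The capacities shown are arbitrary examples, not customer data.

```python
from decimal import Decimal

PRICE_PER_GB_MONTH_USD = Decimal("0.005")  # half a cent per GB per month, as quoted above
GB_PER_TB = 1000                           # assumption: decimal units


def monthly_cost_usd(capacity_tb) -> Decimal:
    """Upper-bound monthly storage cost in USD for the given usable capacity in TB."""
    return Decimal(str(capacity_tb)) * GB_PER_TB * PRICE_PER_GB_MONTH_USD


if __name__ == "__main__":
    for tb in (100, 1000):  # 100 TB and 1 PB, chosen only for illustration
        print(f"{tb} TB: up to ${monthly_cost_usd(tb):,.2f} per month")
```

At this rate, 1 PB works out to at most $5,000 per month, or $60,000 per year.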
Jacque Istok, VMware Greenplum, describes some creative AI/ML use cases for VMware Greenplum & Cloudian. Fraud prevention and open source real estate pricing models are two of the use cases discussed.
- Enterprise-grade object storage software with proven VMware Tanzu Greenplum platform
- Single data analytics platform that can scale as needs evolve
- Separation of compute and storage scale
- Military-grade data storage security
- Hybrid and multi-cloud ready
- Shared storage with up to 70% TCO savings
- VMware certified for trouble-free integration
- Flexible deployment options: bare metal, VM, and container
- Available for purchase with VCPP points
- Provides all MSPs need to create a Data Lakehouse-as-a-Service offering
- High-performance, low-cost all-flash storage available
Proven at over 700 enterprise customers worldwide (including many MSPs), with nearly two exabytes of capacity under management, Cloudian scored highest for all use cases in the Gartner 2020 Critical Capabilities for Object Storage report and was named a Gartner Peer Insights Customers’ Choice in 2020, 2021, and 2022.
- Modern data analytics platform
- HDFS offload and replacement for data lake modernization
- Data protection for VMware Tanzu Greenplum environments
- DR for VMware Greenplum environments