Data Analytics on a Cloudian S3 Data Lakehouse

The modernization of data analytics architecture that started in the cloud combines the flexibility, cost-efficiency, and scale of data lakes with the data management and ACID transactions of data warehouses, giving us the Data Lakehouse. A Data Lakehouse allows enterprises to analyze large amounts of structured, semi-structured, and unstructured data (for instance, web server logs, RDBMS data, NoSQL data, social media, sensor and IoT data, and third-party data), all stored in one place in a modern data lake, in the same format as in the source systems.

However, not all data and not all enterprises can move to the public cloud. Customers are increasingly looking to replicate the Lakehouse architecture on-premises to overcome challenges such as scalability, flexibility, and cost, and to gain a business advantage.

Solution

Cloudian HyperStore, running on Lenovo SR650 servers as a scale-out storage cluster of shared-nothing storage nodes, is the S3 data lake that enterprise customers need to create their own on-prem Data Lakehouse.

Lenovo Data Lakehouse

Cloudian HyperStore, a native S3 object storage platform that can be deployed on-premises or in the cloud, provides customers a complete solution with limitless scalability and integrates seamlessly with leading data warehouse solutions. HyperStore is validated and certified for SQL Server 2022, Vertica, Teradata, Greenplum, and Snowflake workloads that analyze data directly in external tables in an S3 data lake built on HyperStore. It also serves as a target for backing up native data warehouse tables for data protection.

It Brings the Cloud to Your Data Center

Modernization of data analytics started with the cloud. However, while most organizations want to be cloud-centric, only a fraction of their data can reside in the cloud. Cloudian HyperStore, running on a Lenovo ThinkSystem cluster, brings the scalability and flexibility of the cloud into your data center as part of a hybrid cloud Data Lakehouse solution built on S3, enabling modern data analytics on-premises. Scale compute and storage independently, and expand by simply adding nodes in one site or across multiple sites. Manage it all within a single namespace, from a single management console, and search metadata across all your sites with a single query. Integrated tools even let you copy data to a public cloud for disaster recovery, if desired.

  • Cloud-like – Modernize your data warehousing solution and get the flexibility of the cloud
  • Industry-leading storage infrastructure with Lenovo ThinkSystem servers
  • Limitlessly scalable architecture with scalable throughput (>18 TB/hr with 16 nodes). Accelerate query performance and tuning with no code changes required.
  • Disaster recovery – Ensure uptime with fully managed DR within HyperStore
  • S3 API – Connect to any provider of S3-compatible object storage, such as HyperStore
  • Data protection – Use an immutable ledger to protect data from tampering
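
As a rough sizing aid, the throughput figure in the list above can be turned into per-node and time-to-ingest numbers. The sketch below assumes decimal units (1 TB = 10^12 bytes), throughput that divides evenly across nodes, and a hypothetical 90 TB dataset. None of these assumptions come from the validation itself, and the quoted figure is a lower bound (>18 TB/hr), so treat the results as conservative estimates only.

```python
# Back-of-the-envelope sizing from the ">18 TB/hr with 16 nodes" figure.
# Assumptions (not from the source): decimal TB, even spread across nodes.
TB = 10**12          # bytes in one decimal terabyte
SECONDS_PER_HOUR = 3600


def per_node_mb_per_s(cluster_tb_per_hr: float, nodes: int) -> float:
    """Average throughput each node must sustain, in MB/s (decimal)."""
    cluster_bytes_per_s = cluster_tb_per_hr * TB / SECONDS_PER_HOUR
    return cluster_bytes_per_s / nodes / 10**6


def hours_to_move(dataset_tb: float, cluster_tb_per_hr: float) -> float:
    """Hours needed to move a dataset at the given aggregate rate."""
    return dataset_tb / cluster_tb_per_hr


print(per_node_mb_per_s(18, 16))   # 312.5 MB/s per node
print(hours_to_move(90, 18))       # 5.0 hours for a hypothetical 90 TB dataset
```

At the quoted rate the cluster as a whole moves about 5 GB/s, which is the number to compare against the network and query-engine bandwidth available on the compute side.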

USE CASES

  • Modern Data Warehousing with S3 Data Lake
  • Backup / Restore SQL Server data to S3 for Data Protection 

Award-Winning
Proven at over 700 enterprise customers worldwide, with nearly two exabytes of capacity under management, Cloudian scored highest for all use cases in the Gartner 2020 Critical Capabilities for Object Storage report and was named a Gartner Peer Insights Customers’ Choice in 2020, 2021, and 2022.

Easy Integration with All Leading Analytics Platforms

Cloudian HyperStore on Lenovo ThinkSystem servers, a best-in-class on-prem hybrid cloud storage platform with native support for the S3 API, is validated and certified with leading data analytics platforms, including Greenplum, Teradata, Vertica, Apache Druid, Microsoft SQL Server 2022, Snowflake, Splunk, Elastic, and Cribl, among others. Only Cloudian offers a 100% native S3 API object storage system to modernize data analytics architectures on-premises. Together with these industry-leading analytics applications, Cloudian provides customers with a robust Data Lakehouse solution that includes scalable storage, high data durability, and fast recovery times, within the security of your own firewall, at up to 70% lower total cost than traditional storage solutions.

The Data Lakehouse for All Use Cases

A Data Lakehouse built on Cloudian HyperStore can be used as a storage destination for any application built for the S3 API. In addition to data analytics, organizations can use HyperStore as the repository for machine logs, media files, and remote directories, as well as a secure backup target (for databases and/or system-level backups) with integration into Veeam, Rubrik, Commvault, Veritas, HYCU, and any other application that supports the S3 API. HyperStore brings the data to the application, for more efficient storage, management, and security.
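
Because applications reach the storage through the S3 API, pointing one at an on-prem cluster is largely a matter of endpoint configuration. As an illustration, the sketch below builds object URLs in the two standard S3 addressing styles. The endpoint name, bucket, and key are hypothetical placeholders, not HyperStore defaults, and which style a given deployment expects depends on its DNS and configuration.

```python
from urllib.parse import quote

# Hypothetical on-prem S3 endpoint; replace with the address of your cluster.
ENDPOINT = "s3.lakehouse.example.internal"


def path_style_url(endpoint: str, bucket: str, key: str) -> str:
    """Path-style addressing: the bucket is the first path segment."""
    return f"https://{endpoint}/{bucket}/{quote(key)}"


def virtual_hosted_url(endpoint: str, bucket: str, key: str) -> str:
    """Virtual-hosted-style addressing: the bucket is a DNS label."""
    return f"https://{bucket}.{endpoint}/{quote(key)}"


key = "logs/2024/web 01.log"   # quote() keeps "/" and percent-encodes the space
print(path_style_url(ENDPOINT, "analytics", key))
print(virtual_hosted_url(ENDPOINT, "analytics", key))
```

Virtual-hosted-style addressing needs wildcard DNS for the endpoint, so path-style is often the simpler choice for a first on-prem test.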

The Most Security Certifications of Any On-Prem Object Storage

Cloudian HyperStore provides extensive security features that make it possible to deploy and operate a cost-effective Data Lakehouse that keeps organizational data extremely secure. Cloudian HyperStore has more security certifications than any other on-prem object storage. Our DoD-level security certifications even make it possible to operate a secure AWS Outposts deployment. Capabilities include AES-256 server-side encryption for data at rest, SSL/TLS for data in transit (HTTPS), transparent key management, AWS-compatible Identity and Access Management (IAM), API and role-based access controls (RBAC), audit trail logging, secure shell with root disable, and, last but certainly not least, S3 Object Lock to keep your data immutable against the threat of ransomware.
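
To make the Object Lock capability concrete, the sketch below builds a bucket-level default retention rule in the shape the S3 API uses for its Object Lock configuration, and computes when an object written under that rule becomes deletable again. The 30-day period and COMPLIANCE mode are illustrative assumptions, and the code only constructs the data; it does not send anything to a cluster.

```python
from datetime import datetime, timedelta, timezone

# The two retention modes defined by S3 Object Lock.
VALID_MODES = {"GOVERNANCE", "COMPLIANCE"}


def default_retention_config(mode: str, days: int) -> dict:
    """Build an Object Lock configuration with a default retention rule."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": days}},
    }


def retain_until(written_at: datetime, days: int) -> datetime:
    """Earliest moment a locked object version may be deleted."""
    return written_at + timedelta(days=days)


config = default_retention_config("COMPLIANCE", 30)   # illustrative values
written = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(config)
print(retain_until(written, 30).isoformat())   # 2024-01-31T00:00:00+00:00
```

In COMPLIANCE mode the retention period cannot be shortened or removed by any user until it expires, which is what makes backups stored this way resistant to ransomware that has obtained valid credentials.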

High Data Resiliency 

Cloudian HyperStore provides 14 nines of resiliency for Data Lakehouse architectures running on Lenovo ThinkSystem clusters. HyperStore offers administrator-selectable storage policies that implement data protection based on replication (RF) or erasure coding (EC). Administrators can configure the number of replicas or the erasure coding scheme required to meet SLA and cost objectives. Storage policies also provide fine-grained control of data placement across data centers, taking into consideration factors such as cost-efficiency, security levels, and proximity.
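
The cost side of the replication versus erasure coding choice comes down to simple arithmetic, sketched below. The specific schemes (3 replicas, a 4+2 erasure code) and the 1,000 TB usable capacity are illustrative assumptions, not recommended or default HyperStore policies.

```python
def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per usable byte when keeping full copies."""
    return float(replicas)


def erasure_coding_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw bytes stored per usable byte for a k+m erasure code."""
    return (data_fragments + parity_fragments) / data_fragments


def raw_capacity_tb(usable_tb: float, overhead: float) -> float:
    """Raw capacity needed to provide the requested usable capacity."""
    return usable_tb * overhead


usable = 1000  # TB of usable capacity, a hypothetical requirement

# 3-way replication: survives the loss of 2 copies, stores 3x the data.
print(raw_capacity_tb(usable, replication_overhead(3)))        # 3000.0

# 4+2 erasure coding: survives the loss of any 2 fragments, stores 1.5x.
print(raw_capacity_tb(usable, erasure_coding_overhead(4, 2)))  # 1500.0
```

Both example policies tolerate two simultaneous failures, but the erasure-coded policy needs half the raw capacity. The trade-off is that reading or rebuilding erasure-coded data involves more nodes than reading a single replica.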

Hybrid and Multi-Cloud Ready

Cloudian HyperStore’s native S3 API cloud integration gives Data Lakehouses the power to tier data across on-prem storage and private or public clouds. This lets organizations cost-effectively combine storage across environments into a single pool while consolidating enterprise-wide storage management into a single screen. This flexibility is ideal for cloud bursting (a temporary use of more storage resources than usual) or disaster recovery (DR) in the public cloud.