data management Archives

How to Get Full GitHub Benefits On-Premises

Henry Chu, Director of Solution Management, Cloudian


GitHub has traditionally been known as a commonly used cloud service to store, manage and share code using tools like Git. It is this service that’s most familiar to many developers. However, GitHub also has an offering that provides the ability to use GitHub on-premises with GitHub Enterprise Server (GHES) for enterprises that require a solution behind their firewall due to network restrictions or generally want tighter control over their data and access to it. GHES includes GitHub Actions and GitHub Packages, both of which can use not only public cloud storage but also on-prem S3-compatible object storage, which is what’s needed for a complete on-prem deployment of GHES. Cloudian HyperStore provides a validated on-prem object storage for GHES with the highest levels of S3 compatibility. In addition, HyperStore can also be deployed across multiple public clouds, providing a distributed S3 service for GHES with a single namespace across multi-cloud infrastructure.

This blog focuses on how to deploy HyperStore with GitHub Actions and GitHub Packages on-prem.

GitHub Actions

GitHub Actions automates CI/CD workflows, such as building, testing, and deploying code, triggered by events like pull requests. Cloudian HyperStore provides the on-prem S3-compatible object storage that GitHub Actions uses to store data such as artifacts and logs.

Cloudian has validated the S3 operations used by GitHub Actions with HyperStore (see Cloudian’s validation at GHES Storage Partners). All storage operations passed GitHub’s ghe-storage-test.sh test.

To configure GitHub Actions with Cloudian HyperStore, open the GHES Site admin interface.

On Site admin, go to the Management console.

In the Management console, go to Applications and enable GitHub Actions. Under Artifact & Log Storage, choose Amazon S3.

Enter the following fields:

AWS Service URL: Enter the S3 endpoint configured for your HyperStore cluster.
AWS S3 Bucket: Enter the name of the bucket to be used for GitHub Actions.
AWS S3 Access Key: Enter the access key of the HyperStore user for the given bucket.
AWS S3 Secret Key: Enter the secret key of the HyperStore user for the given bucket.

Click on Test storage settings to validate the configuration. Then save the settings.
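Before pasting values into the Management console, a quick offline check can catch typos in the four fields. The sketch below is a hypothetical helper, not part of GHES or HyperStore: it assumes the endpoint is entered with an http:// or https:// scheme and that HyperStore bucket names follow (a simplified form of) the standard S3 naming rules. It only checks formats; Test storage settings remains the real connectivity test.

```python
import re
from urllib.parse import urlparse

# Simplified S3 bucket naming rules (assumed to apply to HyperStore too):
# 3-63 chars, lowercase letters, digits, dots, hyphens, start/end with letter or digit.
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def check_storage_settings(service_url: str, bucket: str,
                           access_key: str, secret_key: str) -> list[str]:
    """Return format problems in the four storage fields; an empty list means they look sane."""
    problems = []
    parsed = urlparse(service_url)
    # Expect a full URL such as https://host[:port], not a bare host name.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("AWS Service URL should look like https://host[:port]")
    if not BUCKET_RE.match(bucket):
        problems.append("AWS S3 Bucket is not a valid S3 bucket name")
    if not access_key or not secret_key:
        problems.append("AWS S3 Access Key and Secret Key are both required")
    return problems
```

For example, with a hypothetical endpoint, `check_storage_settings("https://s3.hyperstore.example.com", "ghes-actions", "AK", "SK")` returns an empty list. The same check applies to the GitHub Packages fields.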

GitHub Packages

GitHub Packages gives you a safe way to publish and share application packages within your organization. With Cloudian HyperStore, you can achieve a complete on-prem deployment.

To configure GitHub Packages with HyperStore, go to the Management console and then to Packages. Enable GitHub Packages.

Choose Amazon S3 for your storage. Enter the following fields:

AWS Service URL: Enter the S3 endpoint configured for your HyperStore cluster.
AWS S3 Bucket: Enter the name of the bucket to be used for GitHub Packages.
AWS S3 Access Key: Enter the access key of the HyperStore user for the given bucket.
AWS S3 Secret Key: Enter the secret key of the HyperStore user for the given bucket.

Click on Test storage settings to validate the configuration and then save the settings.

Summary

With GitHub Enterprise Server and Cloudian HyperStore, enterprises now have a solution to store, manage, and share code with full control behind the security of their firewall. HyperStore provides the S3-compatible object storage on-prem (or distributed over multi-cloud) for GitHub Actions, a solution for CI/CD workflows. Similarly, HyperStore can also be used as a storage target for GitHub Packages, a repository for application packages.

To learn more about HyperStore’s features and benefits, go to Scalable Enterprise Object Storage | Cloudian HyperStore.

Building and Protecting Data Lakehouse Projects with Cloudian and Vertica

See how to start a data lakehouse with Vertica EON mode and Cloudian, extend the data lakehouse with Vertica external tables and Cloudian, and protect Vertica datasets with data backup to Cloudian.

Henry Golas, Director of Technology, Cloudian


Over the past year, Cloudian has greatly expanded its support for data analytics through new partnerships. One of those key partnerships is with Vertica, where the combination of Vertica and Cloudian HyperStore enables organizations to build and protect data lakehouses for modern data analytics applications.

This blog highlights the three main use cases we’re currently serving together:

Starting a data lakehouse with Vertica in Eon mode and Cloudian
Extending the data lakehouse with Vertica external tables and Cloudian
Protecting Vertica datasets with data backup to Cloudian

Just as a reminder, Vertica is a unified analytics data warehouse platform, based on a massively scalable architecture, and Cloudian is a software-defined, limitlessly scalable, S3-compatible object storage platform for on-premises and hybrid cloud environments.

Starting a Data Lakehouse with Vertica in Eon Mode and Cloudian

In the data analytics space, Vertica is known for performance, whether it runs in “Enterprise Mode” or “Eon Mode.” In Enterprise Mode, each database node stores a portion of the dataset and performs a portion of the computation. In Eon Mode, Vertica brings its cloud architecture to on-premises deployments and decouples compute and storage: each Vertica node accesses a shared communal storage space via the S3 API. The advantages are: a) compute can be scaled as required without having to scale storage, meaning no more server sprawl, and b) storage can be consolidated into a single platform and accessed by various data tools.

Building out Vertica communal storage on Cloudian is easy. For this exercise, we assume you have functional Vertica and Cloudian HyperStore instances that can communicate via HTTP(S):

Configure a bucket via Cloudian Management Console (CMC) on your HyperStore cluster:
- Let’s use the name “verticabucketoncloudian” for this example.
Create an auth_params.conf file:
- On your Vertica node, create an auth_params.conf file that will be accessible when you create the Vertica database instance.
  The required auth_params.conf values are:
  awsauth = Access_Key:Secret_Key
  awsendpoint = HyperstoreAddress:Port
  awsenablehttps = 0
  Use port 443 or 80 for the endpoint; the awsenablehttps = 0 line is required only if you are not using HTTPS.
Create your Vertica in Eon Mode database instance:
- On your Vertica node, create the database instance. Specify the location of your auth_params.conf file to leverage a Cloudian S3 bucket for communal storage.
  admintools -t create_db -x auth_params.conf \
    --communal-storage-location=s3://verticabucketoncloudian \
    --depot-path=/home/dbadmin/depot --shard-count=6 \
    -s vnode01,vnode02,vnode03,vnode04,vnode05,vnode06 -d verticadb -p 'YourDBAdminPasswordHere'
Success! Let’s test.
- Once the above command returns successfully, you can test the Vertica in Eon Mode instance.
- Connect to your db instance and load a dataset.
- Open the Cloudian bucket “verticabucketoncloudian” in CMC or an S3 browser, and you will see objects in the bucket.
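If you script your Vertica node setup, the auth_params.conf file can be generated rather than hand-edited. A minimal sketch, assuming the three keys listed in the steps above are all you need; the helper names and the host and key values are hypothetical placeholders, not Vertica or Cloudian tooling:

```python
from pathlib import Path


def render_auth_params(access_key: str, secret_key: str, host: str,
                       port: int, use_https: bool) -> str:
    """Return auth_params.conf text pointing Vertica at a HyperStore S3 endpoint."""
    # The steps above use port 443 (HTTPS) or 80 (HTTP); anything else is rejected here.
    if port not in (80, 443):
        raise ValueError("expected port 80 or 443")
    lines = [
        f"awsauth = {access_key}:{secret_key}",
        f"awsendpoint = {host}:{port}",
    ]
    if not use_https:
        # Required only when the endpoint is plain HTTP.
        lines.append("awsenablehttps = 0")
    return "\n".join(lines) + "\n"


def write_auth_params(path: Path, text: str) -> None:
    """Write the file and restrict it to the owner, since it contains the secret key."""
    path.write_text(text)
    path.chmod(0o600)
```

Calling `render_auth_params("ACCESS_KEY", "SECRET_KEY", "hyperstore.example.com", 80, use_https=False)` yields the awsauth, awsendpoint (hyperstore.example.com:80), and awsenablehttps = 0 lines; pass the written file's path to `admintools -x`.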

Extending the Data Lakehouse with Vertica External Tables and Cloudian

One of the key tenets of a successful data lakehouse initiative is the ability to access and analyze datasets that have been generated by other analytics platforms.

Prior to the data lakehouse, an ETL (Extract Transform Load) operation would have been required to move data from one analytics platform to another. Today, Vertica can analyze the data in-place by leveraging external tables, without the need for complex and expensive data moves.

Let’s consider the following scenario… we have an ORC dataset, which was generated by an Apache Hive instance, stored on Cloudian, and we need to connect to it with Vertica. To analyze this dataset in-place, use the following Vertica syntax to connect to the ORC dataset:

That is much simpler and easier than working through any data ETL.

Here are the details for the S3 parameters and configuration.
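As a rough illustration of the in-place approach, the sketch below builds a statement in Vertica’s CREATE EXTERNAL TABLE ... AS COPY FROM ... ORC form. The helper, table name, columns, and s3://hivebucket/... path are hypothetical, and it assumes the database’s S3 parameters already point at your HyperStore endpoint; it does not run anything against Vertica.

```python
def external_orc_table_ddl(table: str, columns: dict[str, str], s3_glob: str) -> str:
    """Return a Vertica CREATE EXTERNAL TABLE statement that reads ORC files in place."""
    if not s3_glob.startswith("s3://"):
        raise ValueError("expected an s3:// URI")
    # Column definitions in insertion order, e.g. "id INT, region VARCHAR(32)".
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    # The data stays on Cloudian; Vertica reads the ORC files at query time.
    return f"CREATE EXTERNAL TABLE {table} ({cols}) AS COPY FROM '{s3_glob}' ORC;"
```

The column list must match the schema Hive wrote into the ORC files; no data is copied when the table is created.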

Protecting Vertica Datasets with Data Backup to Cloudian

As with all datasets, backups of data are key to protecting and preserving data. For this purpose, Vertica has its own backup and recovery tool called “vbr,” and Vertica can leverage Cloudian as a backup target.

Vertica has thoroughly documented the process, but here’s a condensed version:

Configure connectivity and credentials for HyperStore
1. HyperStore credentials must be configured in two places: within the database, as a security function, and as environment variables that allow vbr to connect.
  - For the database that is going to be backed up, set the AWSAuth credentials (S3 credentials):
    ALTER DATABASE DEFAULT SET AWSAuth = 'accesskeyid:secretaccesskey';
2. Configure the HyperStore URL and credentials for vbr
  export VBR_COMMUNAL_STORAGE_ENDPOINT_URL=http://
  export VBR_COMMUNAL_STORAGE_ACCESS_KEY_ID=
  export VBR_COMMUNAL_STORAGE_SECRET_ACCESS_KEY=
  export VBR_BACKUP_STORAGE_ENDPOINT_URL=http://
  export VBR_BACKUP_STORAGE_ACCESS_KEY_ID=
  export VBR_BACKUP_STORAGE_SECRET_ACCESS_KEY=
  - Keep in mind that you can back up to the same endpoint, using the same credentials as the communal storage, but to a different bucket. Alternatively, you can back up to a second endpoint with different credentials. Most users will want to back up to a different bucket on the same endpoint to reduce associated cost.
Setting the configuration file for vbr
1. Some additional parameters must be stored in a configuration file for Vertica to successfully back up and restore with Cloudian.
2. Create a file called “eon_backup_restore.ini” in the home directory of dbadmin.
  As a quick reference, /opt/vertica/share/vbr/example_configs contains examples for cloud backups.
  eon_backup_restore.ini:
  [CloudStorage]
  cloud_storage_backup_path = s3://verticabackuponcloudian/fullbackup/
  cloud_storage_backup_file_system_path = []:/home/dbadmin/backup_locks_dir/
  cloud_storage_concurrency_backup = 10
  cloud_storage_concurrency_restore = 10
  [Misc]
  snapshotName = EONbackup_snapshot
  tempDir = /tmp/vbr
  restorePointLimit = 1
  [Database]
  dbName =
  dbPromptForPassword = True
  dbUser = dbadmin
Target initialization and performing data backup
1. Vertica requires the S3 bucket to be initialized prior to use.
  - vbr -t init -c eon_backup_restore.ini
    Initializing backup locations.
    Backup locations initialized.
2. Run the Vertica backup
  - vbr -t backup -c eon_backup_restore.ini
    Enter vertica password:
    Starting backup of database VMart.
    Participating nodes: v_vmart_node0001, …., v_vmart_node0006.
    Snapshotting database.
    Snapshot complete.
    Approximate bytes to copy: x of y total.
    [================================================] 100%
    Copying backup metadata.
    Finalizing backup.
    Backup complete!
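Because vbr depends on both the exported VBR_* variables and the .ini file, a small pre-flight check can save a failed run. This is a hypothetical helper, not part of vbr: `preflight` and its list of required keys are assumptions based on the example configuration above, and vbr’s own validation may differ.

```python
import configparser
import os

# Environment variable names taken from the vbr credential step above.
REQUIRED_ENV = (
    "VBR_COMMUNAL_STORAGE_ENDPOINT_URL",
    "VBR_COMMUNAL_STORAGE_ACCESS_KEY_ID",
    "VBR_COMMUNAL_STORAGE_SECRET_ACCESS_KEY",
    "VBR_BACKUP_STORAGE_ENDPOINT_URL",
    "VBR_BACKUP_STORAGE_ACCESS_KEY_ID",
    "VBR_BACKUP_STORAGE_SECRET_ACCESS_KEY",
)

# (section, key) pairs this sketch treats as mandatory; an assumption, not vbr's own rule.
REQUIRED_INI = (
    ("CloudStorage", "cloud_storage_backup_path"),
    ("CloudStorage", "cloud_storage_backup_file_system_path"),
    ("Misc", "snapshotName"),
    ("Database", "dbName"),
    ("Database", "dbUser"),
)


def preflight(ini_text: str, env=None) -> list[str]:
    """Return problems found; an empty list means the settings look ready for vbr."""
    env = os.environ if env is None else env
    problems = [f"missing env var {name}" for name in REQUIRED_ENV if not env.get(name)]

    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep camelCase keys such as snapshotName as written
    parser.read_string(ini_text)
    for section, key in REQUIRED_INI:
        # fallback="" also covers a missing section, not just a missing key.
        if not parser.get(section, key, fallback="").strip():
            problems.append(f"[{section}] {key} is empty or missing")

    path = parser.get("CloudStorage", "cloud_storage_backup_path", fallback="")
    if path and not path.startswith("s3://"):
        problems.append("cloud_storage_backup_path should be an s3:// URI")
    return problems
```

Run it with the contents of eon_backup_restore.ini before running vbr; with the example file above, it would flag the empty dbName entry.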

I hope this tech blog post helps make your Cloudian and Vertica data lakehouse project a success.

For more information about Cloudian data lakehouse / data analytics solutions, go to S3 Data Lakehouse for Modern Data Analytics.

Meeting Hybrid Cloud Demands: Microsoft AzureStack HCI and Cloudian HyperStore

Microsoft and Cloudian enable organizations to leverage the benefits of public cloud while keeping some infrastructure, applications and data on-premises, behind the firewall and fully under the organization’s control.

Steve Connors, Senior Alliances Manager, Cloudian


Over the last two years, hybrid cloud has become the dominant IT deployment model: in a Cisco report earlier this year, 82% of IT leaders said they had adopted it.^[1] It enables organizations to leverage the benefits of public cloud while keeping some infrastructure, applications and data on-premises, behind the firewall and fully under the organization’s control. Reflecting the increasing adoption of hybrid cloud, global hyperscalers have introduced new services to meet the demand and ensure a seamless experience across public and private clouds. Here we take a look at the Microsoft AzureStack HCI service and how Cloudian’s HyperStore object storage works with the service.

According to Microsoft, “AzureStack HCI is a hyperconverged infrastructure host platform integrated with Azure. Run Windows and Linux virtual machines on-premises with existing IT skills and familiar tools. Delivered as an Azure subscription service, Azure Stack HCI is always up-to-date and can be installed on your choice of server hardware.”

Cloudian HyperStore, a leading scale-out storage system, has been validated to work with Azure Stack HCI, enabling customers to store and protect large amounts of unstructured data on-prem and use the Azure public cloud for real-time, on-demand computing power, which is more cost-effective than buying additional hardware.

HyperStore employs policy-based tools to replicate or tier data to Azure for offsite disaster recovery, capacity expansion or data analysis in the cloud. HyperStore offers limitless scalability, multi-tenancy and military-grade security. This includes the ability to isolate storage pools using local and remote authentication methods such as Password, AD, LDAP, IAM and certificate-based authentication.

Deploying Cloudian HyperStore on Azure Stack HCI provides the following key benefits:

Turnkey HCI solution – network, compute, storage and virtualization
Hybrid cloud readiness – seamless movement of data across on-prem and public cloud environments
Unified view of data – a single namespace across multiple locations
Reduced operational costs – savings on data egress and bandwidth charges

To learn more about the Cloudian HyperStore-Azure Stack HCI hybrid cloud solution, go to cloudian.com/microsoft/#azure.

^[1] Report: 82% of IT leaders are adopting the hybrid cloud, Tech Republic, May 25, 2022

Cloudian Enhances MSP Partner Program

Organizations face a range of challenges in managing and protecting their data, providing new market opportunities for service providers. Cloudian’s enhanced MSP program ensures MSP partners are well-positioned to capitalize on these opportunities by delivering greater value-add to their customers and driving increased growth and profit.


Organizations face a range of challenges in managing and protecting their data. Unstructured data is growing more than 50% annually, and users expect access to their files anytime and anywhere. At the same time, ransomware continues to pose a major threat (attacks rose 92.7% in 2021 compared to 2020*), showing how important it is to strengthen data security on multiple levels. In addition, regulatory requirements such as GDPR and HIPAA, along with concerns about data sovereignty, are imposing new constraints on how data is stored and handled. Finally, IT budgets are under increasing pressure, and hiring needed IT expertise is often difficult in today’s labor market.

All of this creates opportunities for Managed Service Providers (MSPs) to help customers address these challenges, and Cloudian’s newly enhanced MSP partner program ensures our partners are well-positioned to capitalize on these opportunities.

Providing a Leading S3 Data Management Platform

Cloudian’s HyperStore object storage solution offers many benefits, including:

Feature-rich management tool set – Integrated tools such as billing and quality of service controls make it easy for MSPs to manage their business and service delivery.
Secure multi-tenancy – Keeping one customer’s data separated from others while leveraging a shared infrastructure is a must-have feature for any cloud storage service. HyperStore allows multiple users and applications to share the same infrastructure while ensuring no-compromise data protection. Role-Based Access Controls (RBAC) also give MSPs’ customers management access within their environments.
Customized SLAs – HyperStore manages data at the bucket level, allowing MSP partners to build different storage and protection policies based on data durability, efficiency, and cost.
Native S3 API compatibility – HyperStore is fully S3 compatible. That means it supports all S3-compatible applications and can provide seamless data management across on-premises and public cloud environments.
Military-grade security – Features include Object Lock-based data immutability for ransomware protection, encryption for data at rest and in flight, and RBAC/IAM access policies and authentication. Certifications include Common Criteria EAL-2, FIPS 140-2, SEC Rule 17a-4(f), FINRA Rule 4511 and CFTC Rule 1.31(c)-(d).
Limitless, modular scalability – Unlike many other object storage providers requiring users to over-provision, Cloudian lets you start small and grow without limitations. As more capacity is needed, you can non-disruptively add additional nodes or even new sites, and that capacity becomes part of the available storage pool.
Global data fabric, single namespace – With HyperStore, you can manage data as a single storage pool, no matter where the data resides. That means petabytes of capacity can be spread across regions and data centers and still be managed from a central location.
Cost-effective capacity – HyperStore offers cost savings of up to 70% compared to traditional disk-based storage systems, enabling users to add significantly more capacity for the same budget.

Foundation for High-Value Services

Cloudian’s MSP Partner Program enables MSPs to create innovative offerings for various high-value use cases, either stand-alone or in conjunction with Cloudian’s technology partners. These services include:

Storage-as-a-Service (STaaS) – Provide additional storage capacity on a subscription or consumption (VMware VCPP) basis to help customers address their growing volumes of data. Seamlessly integrate with VMware Cloud Director.
Backup-as-a-Service (BaaS) – Deploy HyperStore as the backup target for leading backup platforms such as Veeam, Commvault, Rubrik, and Veritas.
Ransomware Protection-as-a-Service (RPaaS) – Leverage HyperStore’s Object Lock-based data immutability to prevent cybercriminals from encrypting or deleting data, enabling quick recovery of the unchanged data in the event of a ransomware attack, without paying ransom.
Archive-as-a-Service (AaaS) – With HyperStore’s limitless scalability and data durability (up to 14 nines), provide an ideal long-term data repository.
Disaster Recovery-as-a-Service (DRaaS) – Help customers avoid the risks of the business and organizational disruptions resulting from disasters by keeping a copy of their data offsite.
Big Data-as-a-Service (BDaaS) – Leveraging Cloudian’s rich metadata tagging, facilitate the application of machine learning and analytics to large data sets, enabling new insights, discoveries, and operational efficiencies.
Containers-as-a-Service (CaaS) – With Cloudian’s fully native S3 compatibility, provide persistent, enterprise-grade storage for Kubernetes environments.
Office 365-Backup-as-a-Service (ObaaS) – Help your clients address the urgent need to protect their Office 365 data with HyperStore as a secure backup repository.

Besides HyperStore, MSP partners can also incorporate Cloudian’s HyperIQ, HyperFile, and HyperBalance solutions into their service offerings. In addition, MSP program partners get access to Cloudian training, technical support, and collateral.

MSP Partner Testimonials

Here is just a sampling of what partners say about our solutions:

“Cloudian and VMware are helping us compete with the largest public cloud providers on a level playing field.” – Peter Zafiris, Senior Infrastructure Engineer, AUCloud (Read more about AUCloud and Cloudian here.)

“Our Cloudian partnership has enabled us to pursue new market opportunities, deliver enhanced value to our customers and drive more profitable growth.” – Adam Svoboda, principal cybersecurity strategist, Eagle Technologies (Read more about Eagle Technologies and Cloudian here.)

To learn more about Cloudian MSP Partner program, visit https://cloudian.com/msp-partners/.

To sign up for the program, go to https://cloudian.com/reseller-partners/.

* https://www.securitymagazine.com/articles/97166-ransomware-attacks-nearly-doubled-in-2021

From Data Warehouse to Data Lakehouse: The Evolution of Data Analytics Platforms

As a data management company, Cloudian has always been interested in how organizations manage their data. A lot of attention has been paid to the WHY and HOW of data interactions, as well as WHERE data is stored. One particularly interesting combination of why, how and where is data analytics.

Henry Golas, Director of Technology, Cloudian

To date, object storage has not had a defining role in the data analytics space. Instead, organizations have mostly relied on traditional block and file storage solutions housing structured/semi-structured data. At best, organizations might have placed a database backup onto an S3 object storage target, but object storage was rarely used as a primary data repository.

Today, with the business driver of building out successful data lakehouses, analytics platforms such as Greenplum, Vertica and SQL Server 2022 now support object storage data repositories via the S3 API. Many other platforms, such as Teradata, have the functionality coming soon. This means that as an S3 compatible object storage platform, Cloudian can be used to house a variety of data sets for a variety of analytics (and non-analytics) use cases!

WHY is this important?

A brief history of data warehousing and analytics will help explain.

Data warehouses have existed for decades and are great for performing specific queries on structured data, such as a company billing/invoicing system. In a data warehouse, data inputs are structured; the data isn’t growing exponentially; and many frameworks/workflows exist as part of business intelligence (BI) and reporting tools. The challenge here is that as organizations have developed and evolved, so has the relationship between the organization and its data.

In the mid-to-late 2000s, a need to collect, query and monetize a large amount of company data began to emerge. This new data was structured, semi-structured and unstructured and came from different data sources at blinding speeds. Organizations wanted to leverage data science or machine learning techniques to provide some desired output or piece of monetizable information, such as a formula that would predict the failure rate of a widget based on millions of data points. The term “data lake” was coined, and a data lake’s purpose was to store data in raw formats. The challenge here was that data lakes are good for storing data, not enforcing data quality or running transactions on top of them.

Coming back to the present, object storage and the standardization of the S3 API for communication have changed the game. From a storage perspective, object stores can store a variety of data sets, everything from structured to unstructured data. From an analytics platform/BI tool perspective, it is now possible to tap into the entire data set via the S3 API.

HOW does this all come together?

S3-based object storage enables the creation of a modern data lakehouse, where storage can be decoupled from compute, diverse analytic workloads can be supported and tools/platforms are able to access data directly with standard S3 API calls.

WHERE does this all happen?

This all happens on premises, where Cloudian underpins a data lakehouse by providing scalable, cost-effective storage which is accessible by the S3 API.

Use Cases

Check out Cloudian’s data lakehouse/data analytics-focused solution briefs at: Hybrid Cloud Storage for Data Analytics

Data Management Partners Unite to Provide Comprehensive Object Storage

We just announced our Data Management Partners program to help our customers solve more capacity management problems in less time. The program combines technology, testing, and support to make it easy to put object storage to work. Inaugural members of this program are Rubrik, Komprise, Evolphin, and CTERA Networks.

Here’s why this program is exciting: object storage has the potential to solve many capacity management problems in the data center. It costs two-thirds less than traditional storage and is infinitely scalable. In a recent survey, Gartner found that capacity management was the #1 concern of Infrastructure and Operations managers, so these are important benefits.

The question is: how do you get started with object storage? You can piece together solutions on your own, but that can be risky. We’ve done the homework for you and proved out these solutions.

The Solution for Unstructured Data Consolidation

These solutions solve capacity-intensive challenges where Cloudian’s scalability and cost benefits deliver huge savings. Cloudian consolidates data into one big storage pool, so you can add as many nodes as you want. With one set of users, groups, permissions, file structures, etc., storage managers still see only one thing to manage. This cuts management workloads by 90% and makes it possible to grow with less headache and cost.

Solution areas in this program include:

Data protection: Rubrik and Cloudian together unify and automate backup, instant recovery, replication, global indexed search, archival, compliance, and copy data management into a single scale-out fabric across the data center and public cloud.
Data lifecycle management: Komprise and Cloudian tackle one of the biggest challenges in the data center industry, unstructured data lifecycle management, with solutions that offload non-critical data that is typically 70%+ of the footprint from costly Tier-1 NAS to a limitless scalable storage pool.
Media active archiving: Evolphin and Cloudian help media professionals address capacity-intensive formats (e.g., 4k, 8k, VR/360) with the performance to handle time-pressed workflows.
File sync and share: CTERA Networks and Cloudian provide enterprises with tools for collaboration in capacity-rich environments.

Reducing Risk with Proven Partners

This program is 100% proven solutions. All are deployed, with customers, in live production data centers, right now. They solve real capacity management problems and do not create new problems along the way.

Object storage is seeing rapid adoption. It costs significantly less than traditional storage and fixes the capacity problem with infinite scalability. If you’re looking into object storage, make sure you’re getting a complete solution, though. Learn more about our Data Management Partners today.
