How-To: S3 Your Data Center

As the Storage Administrator or a Data Protection Specialist in your data center, you are likely looking for an alternative storage solution to keep up with your big data growth. And with all that’s been reported by Amazon (stellar growth, strong quarterly earnings report), I am pretty sure their Simple Storage Service (S3) is on your radar. S3 is a secure, highly durable, and highly scalable cloud storage solution that is also very robust. Here’s an API view of what you can do with S3:

S3 API view
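
As a rough text version of that API view, here is a minimal Python sketch of how the basic bucket and object operations map onto HTTP requests. The endpoint name, bucket, and keys are hypothetical, path-style addressing is assumed, and nothing is actually sent; a real request would also have to be authenticated (signed).

```python
from urllib.parse import quote

# Assumption: a hypothetical endpoint using path-style addressing.
ENDPOINT = "https://s3.example.com"

# Operation name -> (HTTP method, whether an object key is required).
OPERATIONS = {
    "create_bucket": ("PUT", False),
    "list_objects": ("GET", False),
    "delete_bucket": ("DELETE", False),
    "put_object": ("PUT", True),
    "get_object": ("GET", True),
    "head_object": ("HEAD", True),
    "delete_object": ("DELETE", True),
}


def describe_request(operation, bucket, key=None):
    """Return (method, url) for a basic S3 operation without sending it."""
    method, needs_key = OPERATIONS[operation]
    if needs_key and key is None:
        raise ValueError(f"{operation} requires an object key")
    url = f"{ENDPOINT}/{bucket}"
    if needs_key:
        # Keep "/" unescaped so keys that look like paths stay readable.
        url += "/" + quote(key, safe="/")
    return method, url


if __name__ == "__main__":
    print(describe_request("create_bucket", "backups"))
    print(describe_request("put_object", "backups", "2016/db dump.sql"))
```

The same bucket URL serves different purposes depending on the HTTP method, which is why the method is part of every entry in the table.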

As a user or developer, you can securely manage and access your bucket and your data, anytime and anywhere in the world where you have web access. As a storage administrator, you can easily manage and provision storage to any group and any user on always-on, highly scalable cloud storage. So if you are convinced that you want to explore S3 as a cloud storage solution, Cloudian HyperStore should be on your radar as well. I believe a solution that is easy to deploy and use helps accelerate the adoption of the technology. Here’s what you will need to deploy your own cloud storage solution:

  • Cloudian’s HyperStore Software – Free Community Edition
  • Recommended minimum hardware configuration
    • Intel-compatible hardware
    • Processor: 1 CPU, 8 cores, 2.4GHz
    • Memory: 32GB
    • Disk: 12 x 2TB HDD, 2 x 250GB HDD (12 drives for data, 2 drives for OS/Metadata)
    • RAID: RAID-1 recommended for the OS/Metadata, JBOD for the Data Drives
    • Network: 1x1GbE Port


You can install a single Cloudian HyperStore node for non-production purposes, but it is best practice to deploy a minimum 3-node HyperStore cluster so that you can use logical storage policies (replication and erasure coding) to ensure your S3 cloud storage is highly available in your production cluster. It is also recommended to use physical servers for production environments.
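
To get a feel for what replication costs on the recommended hardware, here is a back-of-the-envelope Python sketch. It assumes three nodes with 12 x 2TB data drives each and a three-replica policy, places each replica on a different node, and ignores filesystem, metadata, and free-space overhead, so treat the result as an upper bound rather than a sizing guide.

```python
def usable_capacity_tb(nodes, data_drives_per_node, drive_tb, replicas):
    """Return (raw_tb, usable_tb) for a simple replication policy.

    Assumption: usable capacity is raw capacity divided by the number of
    replicas; filesystem and metadata overhead are ignored.
    """
    if replicas > nodes:
        # Assumption: each replica lives on a different node.
        raise ValueError("cannot place more replicas than there are nodes")
    raw_tb = nodes * data_drives_per_node * drive_tb
    return raw_tb, raw_tb / replicas


raw, usable = usable_capacity_tb(
    nodes=3, data_drives_per_node=12, drive_tb=2, replicas=3
)
print(f"raw: {raw} TB, usable with 3 replicas: {usable:.0f} TB")
```

With these inputs the cluster holds 72 TB raw and about 24 TB of user data.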

Here are the steps to set up a 3-node Cloudian HyperStore S3 Cluster:

  1. Use the Cloudian HyperStore Community Edition ISO for OS installation on all 3 nodes. This will install CentOS 6.7 on your new servers.
  2. Log on to your servers
    1. The default root password is password (Update your root access for production environments)
  3. Under /root, there are 2 Cloudian directories:
    1. CloudianTools
      1. configure_appliance.sh allows you to perform the following tasks:
        1. Change the default root password
        2. Change time zone
        3. Configure network
        4. Format and mount available disks for Cloudian S3 data storage
          1. Available disks that were automatically formatted and mounted during the ISO install for S3 storage will look similar to the following /cloudian1 mount:
            Format and mount available disks for Cloudian S3 data storage
    2. CloudianPackages
      1. Run ./CloudianHyperStore-6.0.1.2.bin cloudian_xxxxxxxxxxxx.lic on one of your nodes to extract the package content. This node will be the Puppet master node.
        S3 Puppet master mode
      2. Copy sample-survey.csv to survey.csv
        sample-survey.csv
      3. Edit the survey.csv file
        Edit survey.csv
        In the survey.csv file, specify the region, the node name(s), IP address(es), DC, and RAC of your Cloudian HyperStore S3 Cluster.

        NOTE: You can specify an additional NIC on your x86 servers for internal cluster communication.

      4. Run ./cloudianInstall.sh and select “Install Cloudian HyperStore”. When prompted, input the survey.csv file name. Continue with the setup.
        NOTE: If you are deploying in a non-production environment, your servers (virtual or physical) may not have the minimum resources or a DNS server. In that case, you can run your install with ./cloudianInstall.sh dnsmasq force. Cloudian HyperStore includes an open source domain resolution utility to resolve all HyperStore service endpoints.
      5. In the following screenshot, the information that we provided in the survey.csv file is used in the Cloudian HyperStore cluster configuration. In this non-production setup, I am also using a DNS server for domain name resolution with my virtual environment.
        Cloudian HyperStore cluster configuration
      6. Your Cloudian HyperStore S3 Cloud Storage is now up and running.
        Cloudian HyperStore S3 cloud storage
      7. Access your Cloudian Management Console. The default System Admin group user ID is admin and the default password is public.
        Cloudian Management Console
      8. Complete the Storage Policies, Group, and SMTP settings.
        Cloudian HyperStore - near final

Congratulations! You have successfully deployed a 3-node Cloudian HyperStore S3 Cluster.
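
For the survey.csv step above, a small script can help avoid typos when describing several nodes. The Python sketch below is only an illustration: the column order (region, node name, IP address, DC, RAC) follows the order in which the step lists the fields, the absence of a header row is an assumption, and all node names and addresses are made up. Check the sample-survey.csv shipped with your package for the actual layout.

```python
import csv
import io
import ipaddress

# Hypothetical 3-node inventory; every value here is a placeholder.
NODES = [
    ("region1", "cloudian-node1", "10.10.0.11", "DC1", "RAC1"),
    ("region1", "cloudian-node2", "10.10.0.12", "DC1", "RAC1"),
    ("region1", "cloudian-node3", "10.10.0.13", "DC1", "RAC1"),
]


def build_survey(nodes):
    """Render one CSV line per node after basic sanity checks."""
    seen = set()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for region, host, ip, dc, rack in nodes:
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        if host in seen:
            raise ValueError(f"duplicate node name: {host}")
        seen.add(host)
        writer.writerow([region, host, ip, dc, rack])
    return out.getvalue()


if __name__ == "__main__":
    print(build_survey(NODES), end="")
```

Redirecting the output to survey.csv gives a file with one line per node that can then be reviewed by hand before running the installer.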

S3 API & Extensions for Enterprise Object Storage

Amazon’s S3 API is the de facto standard for object storage APIs. Having multiple service providers, software providers, and applications standardize on S3 has made it easier to switch between them and to rapidly stand up new uses for object storage. But there are different grades of S3 compatibility. Some software and solutions provide only the basic CRUD (create, read, update, delete) functions. At the other end is Cloudian’s HyperStore, committed to providing the highest-fidelity S3 compatibility, backed by a guarantee.

The S3 API is an HTTP/S REST API where all operations are via HTTP PUT, POST, GET, DELETE, and HEAD requests. Each object is stored in a bucket. Beyond the basic object CRUD operations provided by S3, there are many advanced APIs like versioning, multi-part upload, access control lists, and location constraints. There are multiple options for encryption, including (1) server-side encryption where the server manages encryption keys, (2) server-side encryption with customer keys, and (3) client-side encryption where the data is encrypted/decrypted at the client side. Though no single S3 user is likely to use all of the advanced APIs, the union of APIs used by different users quickly covers them all. The table below highlights some advanced object storage APIs supported by S3 and whether other platforms offer them:

S3 Feature                                  Azure   Google Cloud   OpenStack Swift
Object versioning                           No      Yes            Yes
Object ACL                                  No      Yes            No
Bucket Lifecycle Expiry                     No      Yes            Yes
Multi-object delete                         No      Yes            Yes
Server-side encryption                      No      Yes            Yes
Server-side encryption with customer keys   No      No             No
Cross-region replication                    Yes     No             Yes
Website                                     No      No             No
Bucket logging                              No      No             No
POST object                                 No      No             No

Table 1 – Comparison of some S3 advanced object storage APIs[1]
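
The three encryption options described before the table differ mainly in what the client sends along with an object upload. The Python sketch below builds the request headers for each case. The header names are the ones defined by Amazon's S3 REST API; the helper function names are made up for this illustration, and the sketch only builds dictionaries, it does not send or encrypt anything.

```python
import base64
import hashlib
import os


def sse_s3_headers():
    """Option 1: the server generates and manages the encryption key."""
    return {"x-amz-server-side-encryption": "AES256"}


def sse_c_headers(key: bytes):
    """Option 2: the client supplies a 256-bit key with every request."""
    if len(key) != 32:
        raise ValueError("SSE-C expects a 32-byte (256-bit) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key":
            base64.b64encode(key).decode(),
        # The MD5 digest lets the server detect a key corrupted in transit.
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key).digest()).decode(),
    }


def client_side_headers():
    """Option 3: data is encrypted before upload, so no SSE header is sent."""
    return {}


if __name__ == "__main__":
    customer_key = os.urandom(32)  # in practice, keep this key somewhere safe
    for name in sorted(sse_c_headers(customer_key)):
        print(name)
```

With option 2 the server does not keep the key, so the same key has to be presented again on every later read of the object.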

S3 API compatibility is necessary but not sufficient to provide object storage for enterprises. Cloudian has added four additional areas to make S3 object storage enterprise-ready.

  1. Software or appliance, not a service. The software-only package includes a Puppet-based installer with a wizard-style interface. It runs on commodity software (CentOS/Red Hat) and commodity hardware. The appliances come in a few fixed models ranging from a 1U (24TB) model to the PB-scale FL3000 series in an 8U form factor.
  2. APIs for all functions
    • Configuration
    • Multi-Tenancy: User/Tenant provisioning
    • Quality of Service (QoS)
    • Reporting
    • S3 Extensions: Compression, Metadata APIs, Per-bucket Protection Policies.

    To highlight the per-bucket protection policies feature: each bucket can have its own protection policy. For example, a “UK3US2” policy can be defined as the UK DC with 3 replicas and the US DC with 2 replicas. Another example is an “ECk6m2” policy, defined as DC1 with erasure coding using 6 data and 2 coding fragments. As buckets are created, they can be assigned a policy.

Figure 1 – Per-bucket protection policies example
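
To compare what the two example policies cost in raw storage, here is a short Python sketch. It treats replication overhead as the total number of copies and erasure coding overhead as (k + m) / k, which holds for a standard erasure code with k data and m coding fragments. The function names are made up for this illustration, and metadata overhead is ignored.

```python
def replication_overhead(replicas_per_dc):
    """Total copies stored across all data centers, e.g. {"UK": 3, "US": 2}."""
    return sum(replicas_per_dc.values())


def erasure_coding_overhead(k, m):
    """Raw bytes stored per byte of user data with k data and m coding fragments."""
    return (k + m) / k


# "UK3US2": 3 replicas in the UK DC plus 2 replicas in the US DC.
print(replication_overhead({"UK": 3, "US": 2}))

# "ECk6m2": 6 data fragments plus 2 coding fragments.
print(round(erasure_coding_overhead(6, 2), 2))
```

So 1 TB of user data takes 5 TB of raw capacity under “UK3US2” but only about 1.33 TB under “ECk6m2”, at the price of keeping all fragments in a single DC.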

  3. O&M tools to install, monitor, and manage. In addition to the installer, the web-based Cloudian Management Console (CMC) provides a single pane for system administration from the perspective of the system operator, a tenant/group administrator, or a regular user. It is used to provision groups and users, view reports, and manage and monitor the cluster.

Figure 2 – CMC dashboard

  4. Integration with Other Products
    • NFS/CIFS file interface
    • OpenStack, CloudPlatform
    • Tiering to any S3 system (public or private).
    • Active Directory, LDAP

The opportunity and use case for object storage in the enterprise have never been more compelling. Amazon S3 API compatibility ensures full portability of already working applications. Using Cloudian’s HyperStore platform instead of AWS, enterprise data can be brought on-premises for better data security and manageability at lower cost. For STaaS providers, S3 API compatibility, backed by a full guarantee, provides the same benefits of a fully controlled storage platform and opens up a large range of compatible applications. Beyond the S3 API, Cloudian is committed to providing all operations by API and has added APIs to make the platform enterprise-ready, including multi-tenancy.

If you would like a technical overview, you can check out this webinar I recently presented, “S3 Technical Deep Dive”, and make sure to check out more information on our S3 Guarantee…we’ll run all your S3 Apps anytime and anywhere – Guaranteed!

– Gary


[1] References:
http://docs.openstack.org/developer/swift/#object-storage-v1-rest-api-documentation
https://cloud.google.com/storage/docs/xml-api-overview
https://msdn.microsoft.com/en-us/library/azure/dd135733.aspx