Machine learning workflows can require significant storage capacity, particularly when imaging data is in use.

Now PyTorch users have an easy way to deploy limitless storage capacity on-prem. The Cloudian contribution to the PyTorch Amazon S3 Connector repository allows PyTorch users to connect to Cloudian HyperStore S3-compatible storage, providing local capacity that is secure and exabyte-scalable.

By enabling direct access to a cost-effective, scalable data repository, Cloudian is simplifying the ML process, reducing both complexity and costs associated with data analysis.

Here are the steps to connect your Cloudian HyperStore object storage system to your PyTorch projects.

Getting Started

Prerequisites

  • Python 3.8 or later (note: Python 3.12 and later are not recommended, as PyTorch does not support them)
  • PyTorch 2.0 or later

Installation

  • # pip install s3torchconnector

Configuration

To use s3torchconnector, AWS credentials must be provided through one of the following methods:

  • Install the AWS CLI (awscli) and run `aws configure`.
  • Set credentials in the AWS credentials profile file on the local system, located at: `~/.aws/credentials` on Unix or macOS.
  • Set the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables.
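
Before running a training job, it can help to confirm that one of these sources is actually populated. The following is a minimal sketch, not part of the connector: `find_credentials_source` is a hypothetical helper that only inspects the two environment variables and the credentials file named above, and it assumes the profile is called `default`.

```python
import configparser
import os
from pathlib import Path


def find_credentials_source(env=None, credentials_path=None, profile="default"):
    """Report where AWS-style credentials were found.

    Hypothetical helper for illustration. Returns "environment",
    "credentials file", or None if neither source is populated.
    """
    env = os.environ if env is None else env

    # Source 1: the two environment variables listed above.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"

    # Source 2: the shared credentials file (~/.aws/credentials by default).
    path = Path(credentials_path) if credentials_path else Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is skipped silently
    if parser.has_section(profile):
        section = parser[profile]
        if section.get("aws_access_key_id") and section.get("aws_secret_access_key"):
            return "credentials file"

    return None
```

Calling `find_credentials_source()` with no arguments checks the current process environment and home directory. A result of `None` suggests that none of the methods above has been completed yet.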

Example with Cloudian Endpoint

The easiest way to use the S3 Connector for PyTorch is to create a dataset, which can be either map-style or iterable-style. To do so, define an S3 URI (a bucket and, optionally, a prefix) and specify the region where the bucket resides and the custom S3 endpoint URL.
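
As a rough sketch of what this looks like in code, the example below builds both dataset styles with `S3MapDataset.from_prefix` and `S3IterableDataset.from_prefix`, passing the custom endpoint through the `endpoint` argument. The helper names `build_s3_uri` and `open_datasets`, and the bucket, prefix, region, and endpoint values in the commented call, are placeholders for illustration. Substitute the values for your own HyperStore system.

```python
def build_s3_uri(bucket, prefix=""):
    """Join a bucket and an optional prefix into an s3:// URI (illustrative helper)."""
    return f"s3://{bucket}/{prefix.lstrip('/')}"


def open_datasets(bucket, prefix, region, endpoint):
    """Create a map-style and an iterable-style dataset over the same prefix."""
    # Imported inside the function so this module can be loaded
    # on a machine where the connector is not installed yet.
    from s3torchconnector import S3IterableDataset, S3MapDataset

    uri = build_s3_uri(bucket, prefix)

    # Map-style: supports random access by index, e.g. map_dataset[0].
    map_dataset = S3MapDataset.from_prefix(uri, region=region, endpoint=endpoint)

    # Iterable-style: streams objects sequentially, e.g. `for item in iterable_dataset`.
    iterable_dataset = S3IterableDataset.from_prefix(uri, region=region, endpoint=endpoint)

    return map_dataset, iterable_dataset


# Example call (placeholder values; requires a reachable endpoint and valid credentials):
# map_ds, iter_ds = open_datasets(
#     "training-data", "images/", "us-east-1", "https://s3.hyperstore.example.com"
# )
# first = map_ds[0]
# print(first.key, len(first.read()))
```

Each item returned by either dataset exposes the object's `bucket`, `key`, and a `read()` method for its contents, so it can be passed through a transform or decoded directly in the training loop.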


In addition to data loading primitives, the S3 Connector for PyTorch also provides an interface for saving and loading model checkpoints directly to and from an S3 bucket.
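
A minimal sketch of checkpointing against a custom endpoint follows, assuming the connector's `S3Checkpoint` class with its `writer` and `reader` context managers. The helper names `checkpoint_uri`, `save_checkpoint`, and `load_checkpoint`, and the `epoch<N>.ckpt` naming scheme, are illustrative choices and not part of the connector.

```python
def checkpoint_uri(base_uri, epoch):
    """Build the object URI for a given epoch (illustrative naming scheme)."""
    return f"{base_uri.rstrip('/')}/epoch{epoch}.ckpt"


def save_checkpoint(model, uri, region, endpoint):
    """Write the model's state_dict to the given S3 URI."""
    # Imported inside the function so this module can be loaded
    # without torch or the connector installed.
    import torch
    from s3torchconnector import S3Checkpoint

    checkpoint = S3Checkpoint(region=region, endpoint=endpoint)
    with checkpoint.writer(uri) as writer:
        torch.save(model.state_dict(), writer)


def load_checkpoint(model, uri, region, endpoint):
    """Read a state_dict from the given S3 URI and load it into the model."""
    import torch
    from s3torchconnector import S3Checkpoint

    checkpoint = S3Checkpoint(region=region, endpoint=endpoint)
    with checkpoint.reader(uri) as reader:
        state_dict = torch.load(reader)
    model.load_state_dict(state_dict)
    return model


# Example call (placeholder values; requires a reachable endpoint and valid credentials):
# uri = checkpoint_uri("s3://training-data/checkpoints/", epoch=0)
# save_checkpoint(model, uri, "us-east-1", "https://s3.hyperstore.example.com")
# model = load_checkpoint(model, uri, "us-east-1", "https://s3.hyperstore.example.com")
```

Saving only the `state_dict`, and not the whole model object, keeps the checkpoint independent of the class definition's import path, which is the pattern PyTorch generally recommends.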


Conclusion

Integrating PyTorch with on-premises S3 storage powered by Cloudian gives organizations an efficient, scalable foundation for deep learning workflows. By pairing PyTorch's framework with Cloudian's storage infrastructure, users can train their models while securely storing and accessing data within their own premises.

This setup not only ensures data privacy and compliance but also optimizes performance and reduces latency by keeping data close to compute resources. As deep learning continues to drive innovation across industries, the combination of PyTorch and Cloudian’s S3 storage offers a compelling platform for organizations to unlock the full potential of their data and accelerate their AI initiatives.

The enhanced S3 connector is available from the GitHub repositories of AWS Labs and Cloudian.

View a demonstration of this installation process here:

Learn more at cloudian.com

Or, sign up for a free trial