How to Accelerate Genomics Data Analysis Pipelines by 10X
High performance and scalable storage for AI, ML and Advanced Analytics
Organizations are consuming and creating more data than ever before, and many are applying AI/ML to these large data sets to make better decisions in real time or near real time and to unlock new revenue streams. These advanced analytics workloads create and use massive data sets that pose new challenges for data storage. Traditional storage systems cannot deliver the processing performance or the scalability that iterative analytics workloads require, and they introduce bottlenecks to productivity and data-driven decision making. Extracting timely insights from data while managing the rapid growth of this strategic asset is the biggest challenge for enterprises today.
WekaFS and Cloudian HyperStore™ provide an integrated storage solution that lets you overcome the challenges of accelerating and scaling your data pipeline while lowering the overall storage costs associated with data analytics. WekaFS is a distributed, scale-out, POSIX-compliant file system built on a modern architecture that uses NVMe flash. It supports Ethernet or InfiniBand transport and offers low-latency, high-performance, multi-protocol access (POSIX, NFS, SMB, S3, GPUDirect Storage, CSI). WekaFS distributes metadata throughout the cluster via patented mechanisms that prevent hot spots and maximize performance. Performance is predictable and consistent, and it scales linearly as more hosts are added to the storage cluster. Cloudian’s HyperStore complements WekaFS and is integrated through Weka’s tiering function, adding cost-effective, exabyte-scale, software-defined object storage to the solution. Cloudian offers modular growth, letting you expand from terabytes to an exabyte without disruption. Embedded data redundancy features provide up to 14 nines of data durability, removing the need for a separate data backup process.
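To put the "14 nines" durability figure in perspective, the arithmetic can be sketched as follows. This is an illustrative calculation only: it assumes the figure is a per-object, per-year durability (the brief does not state the basis), and the function names and the object count are hypothetical.

```python
def durability_percent(nines: int) -> str:
    """Format a durability of `nines` nines, e.g. 3 -> '99.9%'."""
    if nines < 2:
        raise ValueError("nines must be at least 2")
    # Build the string directly to avoid floating-point rounding.
    return "99." + "9" * (nines - 2) + "%"


def annual_loss_probability(nines: int) -> float:
    """Per-object loss probability implied by `nines` nines of durability.

    Assumption: the durability figure applies per object, per year.
    """
    if nines < 2:
        raise ValueError("nines must be at least 2")
    return 10.0 ** -nines


def expected_objects_lost(object_count: int, nines: int) -> float:
    """Expected number of objects lost per year under the same assumption."""
    return object_count * annual_loss_probability(nines)


if __name__ == "__main__":
    nines = 14                      # figure quoted in the brief
    object_count = 1_000_000_000    # hypothetical: one billion stored objects
    print("Durability:", durability_percent(nines))
    print("Expected objects lost per year:",
          expected_objects_lost(object_count, nines))
```

Under these assumptions, even a store holding a billion objects would expect far less than one object lost per year, which is the reasoning behind treating the capacity tier as self-protecting.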
Together, WekaFS and Cloudian HyperStore unify and simplify the data pipeline for performance-intensive workloads and accelerated DataOps, all at one-third the TCO of traditional storage systems.
Weka-Cloudian integrated solution for high performance storage use cases
WekaFS is an ultra-low-latency, high-throughput solution that is purpose-built for environments running concurrent workloads. It is architected to eliminate compute cluster bottlenecks and reduce processing times, which makes it ideal for iterative workloads like AI and machine learning. As the world’s fastest shared parallel file system, WekaFS is 3x faster than local file systems and 10x faster than traditional NAS.
Cloudian HyperStore brings the flexibility and elasticity of the cloud into your data center. Organizations can start with a small 3-node configuration and scale out to thousands of nodes as needed. These nodes can be physical or virtual, running on industry-standard x86 hardware. In addition, unlike some systems that require all nodes to be identical, HyperStore lets you add heterogeneous nodes of any size, providing scalability across multiple data centers or facilities anywhere in the world.
The solution provides extensive security features for deploying and operating secure storage that is compliant with, and certified for, FIPS, CFTC 4511, SEC 17a-4, and Common Criteria at the capacity tier. Security features include:
- Data encryption and transparent key management
- AES-256 server-side encryption for data stored at rest
- SSL encryption for data in transit (HTTPS)
- Role-based access controls with specified levels of access
- Fine-grained storage policies and audit trail logging
- WORM (Write Once Read Many) for storage of immutable data
- Extremely short RTO providing near-instant recovery of files after a ransomware attack
- Flexible RPO scheduling to meet any file protection requirement
The solution provides high data durability with the option to protect and distribute data using replication or erasure coding. Administrators can configure the number of replicas or the type of erasure coding scheme required to meet SLA and cost objectives. Storage policies also provide fine-grained control of data placement across data centers, taking into consideration factors such as cost efficiency and security levels.
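The trade-off between replication and erasure coding mentioned above can be illustrated with a short calculation. This is a minimal sketch of the general arithmetic, not a description of HyperStore's configuration interface: the policy names, the 3-replica and 4+2 schemes, and the 100 TB figure are hypothetical examples chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionPolicy:
    """A hypothetical data protection policy used only for this example."""
    name: str
    data_fragments: int      # k: fragments holding data (1 for replication)
    extra_fragments: int     # m: parity fragments, or additional replicas

    @property
    def storage_overhead(self) -> float:
        """Raw bytes stored per usable byte: (k + m) / k."""
        return (self.data_fragments + self.extra_fragments) / self.data_fragments

    @property
    def failures_tolerated(self) -> int:
        """Number of fragment (or replica) losses survivable without data loss."""
        return self.extra_fragments

    def raw_capacity_tb(self, usable_tb: float) -> float:
        """Raw capacity needed to hold `usable_tb` of user data."""
        return usable_tb * self.storage_overhead


def replication(replicas: int) -> ProtectionPolicy:
    # N replicas = one data copy plus N - 1 additional full copies.
    return ProtectionPolicy(f"{replicas}x replication", 1, replicas - 1)


def erasure_coding(k: int, m: int) -> ProtectionPolicy:
    return ProtectionPolicy(f"EC {k}+{m}", k, m)


if __name__ == "__main__":
    usable_tb = 100.0  # hypothetical usable capacity target
    for policy in (replication(3), erasure_coding(4, 2)):
        print(f"{policy.name}: overhead {policy.storage_overhead:.2f}x, "
              f"tolerates {policy.failures_tolerated} failures, "
              f"needs {policy.raw_capacity_tb(usable_tb):.0f} TB raw")
```

In this example both policies survive two failures, but the erasure-coded layout needs half the raw capacity of three-way replication, which is why the choice of scheme is usually framed as a balance between SLA and cost.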
The solution allows multiple users on shared infrastructure without compromising security. Granular access control and audit logging capabilities control and logically separate data access. Users can securely access data from the same nodes without impacting operations. Administrators can also control quality of service (QoS) by limiting usage rates and setting quotas on a per-group or per-user basis.
Running on standard x86 hardware with local NVMe SSDs, the solution drives down the cost of storage for analytics workloads by one-third compared with traditional storage.
FEATURES & BENEFITS
- Proven, plug-and-play, exabyte-scalable Rubrik backup target
- Costs under $0.01 per GB per month, including support
- Policy-based tiering to AWS, Google, Microsoft
- Most S3-compatible on-prem object storage
- Fast, local access to data
- High performance file and object storage
- Machine Learning, AI, Advanced Analytics and Big Data
- Life Sciences Research, Genomics
- Financial Services, High Frequency Trading