Enhancing Object Storage Analytics: Adding Metadata Labels to S3 Images with TensorFlow


Posted by Gary Ogasawara on April 17, 2020

Gary Ogasawara
CTO, Cloudian

Object storage is known for its scalability and easy-to-use S3 APIs, but making object data useful for analytics sometimes requires adding metadata about the objects. This article walks through a case study of adding and then using S3 object metadata with Cloudian’s HyperStore Analytics Platform (HAP). Starting with images stored in HyperStore object storage, we use a TensorFlow machine learning model to identify what each image depicts, attach those labels to the image as S3 metadata, and then automatically index and search the object metadata using Elasticsearch and Kibana.

[Figure: TensorFlow pod diagram]

INPUT
Unlabeled images stored in a HyperStore S3 bucket.

OUTPUT
The same images, each with metadata labels describing its contents, stored back in HyperStore and indexed in Elasticsearch.

METHOD
Use a TensorFlow deep learning model to determine labels for what’s in each image.
Use HyperStore’s Elasticsearch plugin to make the metadata searchable and visualizable.

In an S3 bucket named “images,” we upload about 300 images of common items, including animals, vehicles, and household goods. An object store is a convenient home for a collection like this because it stores large amounts of data easily and economically.

[Figure: images bucket]

HyperStore Analytics Platform (HAP) is a software package composed of Apache Spark, TensorFlow, and optional applications like this image recognition system. HAP is managed by Kubernetes, and its Pods are typically deployed on the same hardware nodes as HyperStore. By locating the analytics processing close to the data, HAP with HyperStore takes advantage of data locality and an edge-hub topology for efficient and timely processing. The approach is fast, because processing happens as close as possible to where the data is generated; cheap, because network transfer costs for the upload and subsequent downloads are minimal; and secure, because the data can be kept private and protected.

The image recognition process reads each object from the S3 bucket and computes image classifications by applying the TensorFlow model. The S3 list-objects API is used to iterate over the objects in the bucket. Each object is first checked, including confirming that its Content-Type is an image type and that its size does not exceed a threshold. The image is then scaled to a fixed size, and the model is run using code based on TensorFlow’s LabelImage class. The model used for image recognition is the pre-trained Inception 5h, which recognizes 1,000 classes of images from ImageNet.
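The iteration and pre-checks described above can be sketched in Python. This is an illustration only, not the HAP implementation (the log output in this article suggests that is Java). The function names and the 10 MB threshold are assumptions, and `client` is assumed to behave like a boto3 S3 client pointed at a HyperStore endpoint.

```python
from typing import Any, Iterator

# Hypothetical size threshold; the article does not state the actual value.
MAX_BYTES = 10 * 1024 * 1024


def is_candidate(content_type: str, size: int, max_bytes: int = MAX_BYTES) -> bool:
    """Return True if the object looks like a non-empty image under the size limit."""
    return content_type.lower().startswith("image/") and 0 < size <= max_bytes


def iter_image_keys(client: Any, bucket: str, max_bytes: int = MAX_BYTES) -> Iterator[str]:
    """Yield keys in `bucket` that pass the pre-classification checks.

    `client` is expected to expose the boto3 S3 client methods
    get_paginator("list_objects_v2") and head_object(Bucket=..., Key=...).
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # An empty bucket (or empty page) has no "Contents" entry.
        for obj in page.get("Contents", []):
            # Listing returns the size but not the Content-Type, so a HEAD
            # request per object is used to read it.
            head = client.head_object(Bucket=bucket, Key=obj["Key"])
            if is_candidate(head.get("ContentType", ""), obj["Size"], max_bytes):
                yield obj["Key"]
```

With boto3 installed, a client for an S3-compatible endpoint can be created with `boto3.client("s3", endpoint_url=...)` and passed in as `client`; each yielded key would then be downloaded, scaled, and classified.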

Below are examples of input images and the classification output produced by the image recognition process, shown as labels with their associated probabilities.

[Figure: red fox]
2925 [main] INFO com.cloudian.hap.LabelImage images/fox.jpg:
red fox (58.52% likely)
kit fox (39.54% likely)
coyote (0.73% likely)
grey fox (0.71% likely)
red wolf (0.25% likely)

[Figure: image meta data]

5095 [main] INFO com.cloudian.hap.LabelImage images/iphone.jpeg:
cellular telephone (41.24% likely)
hand-held computer (40.34% likely)
pay-phone (7.52% likely)
iPod (3.88% likely)
remote control (1.54% likely)
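Output like the listings above amounts to taking the five highest-probability classes from the model's output vector. A minimal Python sketch of that selection and formatting step follows; the function names are hypothetical, and the model is assumed to return one probability per class label.

```python
from typing import List, Sequence, Tuple


def top_k_labels(
    probabilities: Sequence[float], labels: Sequence[str], k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k (label, probability) pairs with the highest probability."""
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per label")
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def format_prediction(label: str, probability: float) -> str:
    """Render one prediction in the same style as the log lines above."""
    return f"{label} ({probability * 100:.2f}% likely)"
```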

Several configuration settings control the classifier:

[Figure: image classifier configuration]

The image labels and their associated probabilities are added to the object as S3 user-defined metadata, where the key is the prefix “imgtag_” plus the label (e.g., “red fox”) and the value is the associated probability (e.g., “0.59”). The label is URL-encoded to ASCII to conform to the metadata key requirements; notably, the <SPACE> character is converted to ‘+’. To update an existing object’s user-defined metadata, the S3 Copy Object API is used with the x-amz-metadata-directive: REPLACE header. The object and its metadata are then stored in HyperStore S3. This example, an S3 GET on bucket “images” and object “fox.jpg”, shows the user-defined metadata in the output:

[Figure: metadata output]
[Figure: metadata summary]
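The key scheme and the copy-in-place update can be sketched as follows. This is a hedged illustration, not Cloudian's code: the function names are hypothetical, the two-decimal formatting is inferred from the “0.59” example, and `client` is assumed to behave like a boto3 S3 client.

```python
from typing import Any, Dict, Iterable, Tuple
from urllib.parse import quote_plus

PREFIX = "imgtag_"  # key prefix named in the article


def build_label_metadata(
    predictions: Iterable[Tuple[str, float]], prefix: str = PREFIX
) -> Dict[str, str]:
    """Map (label, probability) pairs to S3 user-defined metadata entries.

    quote_plus URL-encodes the label and turns spaces into '+'.
    """
    return {prefix + quote_plus(label): f"{prob:.2f}" for label, prob in predictions}


def replace_user_metadata(
    client: Any, bucket: str, key: str, metadata: Dict[str, str], content_type: str
) -> Any:
    """Copy an object onto itself so that its user-defined metadata is replaced.

    REPLACE discards the existing metadata, so callers who want to keep old
    entries should merge them into `metadata` first. Content-Type is passed
    explicitly on the assumption that it would otherwise not be carried over.
    """
    return client.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        Metadata=metadata,
        MetadataDirective="REPLACE",  # sent as x-amz-metadata-directive: REPLACE
        ContentType=content_type,
    )
```

For the fox image, `build_label_metadata([("red fox", 0.5852)])` produces the key `imgtag_red+fox` with the value `0.59`.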

HyperStore can index object metadata in Elasticsearch. Once the metadata is in Elasticsearch, Kibana can be used for data exploration.

[Figure: Elasticsearch results]

Here’s an example query to find all images where the label “kit fox” has a probability greater than 0.4. The Kibana query bucketname:images AND userMetadata.imgtag_kit+fox>0.4 returns 2 objects:

[Figure: Kibana results for the “kit fox” query]

If you don’t care what type of “fox” it is, you can use wildcards: the Kibana query bucketname:images AND userMetadata.imgtag\*fox\*:* returns 13 objects:

[Figure: Kibana results for the wildcard “fox” query]
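To make the semantics of the two queries concrete, here is a small local stand-in in Python that applies the same two filters to in-memory metadata dictionaries. It is only an illustration of what the queries select, not how Elasticsearch evaluates them; the function names and sample data shape are assumptions.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional
from urllib.parse import unquote_plus

PREFIX = "imgtag_"


def labels_from_metadata(metadata: Dict[str, str], prefix: str = PREFIX) -> Dict[str, float]:
    """Decode 'imgtag_<url-encoded label>' entries back to {label: probability}."""
    return {
        unquote_plus(key[len(prefix):]): float(value)
        for key, value in metadata.items()
        if key.startswith(prefix)
    }


def matching_objects(
    objects: Dict[str, Dict[str, str]],
    pattern: str,
    min_probability: Optional[float] = None,
) -> List[str]:
    """Return object keys having a label that matches `pattern` (shell-style
    wildcards) and, if a threshold is given, a probability strictly above it."""
    hits = []
    for key, metadata in objects.items():
        for label, prob in labels_from_metadata(metadata).items():
            if fnmatchcase(label, pattern) and (
                min_probability is None or prob > min_probability
            ):
                hits.append(key)
                break  # one matching label is enough for this object
    return hits
```

Calling `matching_objects(objects, "kit fox", 0.4)` mirrors the first query, and `matching_objects(objects, "*fox*")` mirrors the wildcard query.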

S3 object stores like HyperStore have made it practical to store petabytes of data, so the focus can now turn to making that data usable for analytics. HAP provides a convenient way to move the compute to the data and, as in this use case, to add metadata to the object data. In the same spirit, we are developing more use cases to enhance object storage analytics, including processing streaming data and other machine learning tasks.
