Enhancing Object Storage Analytics: Adding Metadata Labels to S3 Images with TensorFlow

Posted by Gary Ogasawara on April 17, 2020

Gary Ogasawara
CTO, Cloudian

Object storage is known for its scalability and easy-to-use S3 APIs, but to make object data useful for analytics, metadata about the objects sometimes needs to be added. This article describes a case study of adding, and then using, metadata on S3 objects with Cloudian’s HyperStore Analytics Platform (HAP). Starting with images stored in HyperStore object storage, we use a TensorFlow machine learning model to identify what each image depicts, attach those labels to each image as S3 metadata, and finally index and search the object metadata automatically using Elasticsearch and Kibana.

[Figure: TensorFlow Pod diagram]

INPUT
Unlabeled images stored in a HyperStore S3 bucket.

OUTPUT
The same images, with metadata labels describing what each one depicts, stored back in HyperStore and indexed in Elasticsearch.

METHOD
Use a TensorFlow deep learning model to determine labels for what is in each image.
Use HyperStore’s Elasticsearch plugin to make the metadata searchable and visualizable.

In an S3 bucket named “images,” we upload about 300 images of common items, including animals, vehicles, and household goods. An object store makes it easy and economical to keep a large collection of images like this.

[Figure: contents of the “images” bucket]
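Because the classifier later skips objects whose Content-Type is not an image, it helps to set Content-Type when uploading. A minimal sketch, assuming the boto3 SDK pointed at a HyperStore S3 endpoint (the article does not say which client was used); `upload_args` is a hypothetical helper that guesses the type from the file extension:

```python
import mimetypes
from pathlib import Path


def upload_args(path, bucket="images"):
    """Build keyword arguments for boto3's S3 upload_file.

    Content-Type is guessed from the file extension so that later
    "is this an image?" checks work; unknown types fall back to a
    generic binary type. Using the bare file name as the key is a
    simplification for this sketch.
    """
    content_type, _encoding = mimetypes.guess_type(str(path))
    return {
        "Filename": str(path),
        "Bucket": bucket,
        "Key": Path(path).name,
        "ExtraArgs": {"ContentType": content_type or "application/octet-stream"},
    }
```

With a configured client, `s3.upload_file(**upload_args("fox.jpg"))` would store the file with `Content-Type: image/jpeg`.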

HyperStore Analytics Platform (HAP) is a software package composed of Apache Spark, TensorFlow, and optional applications such as this image recognition system. HAP is managed by Kubernetes, and its Pods are typically deployed on the same hardware nodes as HyperStore. By placing analytics and computation close to the data, HAP with HyperStore takes advantage of data locality and an edge-hub topology for efficient, timely processing. The approach is fast, because processing happens as close as possible to where the data is generated; cheap, because network transfer costs for the upload and subsequent downloads are minimal; and secure, because the data can be kept private and protected.

The image recognition process reads each object from the S3 bucket and computes its image classifications by applying the TensorFlow model. The S3 list-objects API is used to iterate over the objects in the bucket. Each object is first checked, for example to confirm that its Content-Type is an image and that its size does not exceed a threshold. The image is then scaled to a fixed size, and the model is executed using code based on TensorFlow’s LabelImage class. The model used for image recognition is the pre-trained Inception 5h model, which recognizes 1,000 ImageNet image classes.
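The per-object checks can be sketched as follows. This is an illustration, not HAP’s code: it assumes a boto3 S3 client, a hypothetical 10 MiB size limit (the article does not give the real threshold), and hypothetical helper names. Note that list-objects results carry each object’s key and size but not its Content-Type, so the sketch fetches that with a HEAD Object request.

```python
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # hypothetical limit; the article does not state the real one


def should_classify(content_type, size, max_bytes=MAX_IMAGE_BYTES):
    """True if the object is a non-empty image no larger than max_bytes."""
    if not content_type:
        return False
    # Ignore parameters such as "; charset=..." and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("image/") and 0 < size <= max_bytes


def iter_candidate_keys(s3, bucket, max_bytes=MAX_IMAGE_BYTES):
    """Yield keys in `bucket` that pass should_classify.

    `s3` is assumed to be a boto3 S3 client. list_objects_v2 returns Key
    and Size but not Content-Type, so each object also gets a HEAD request.
    """
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            # Cheap size check first, so oversized objects skip the HEAD request.
            if not 0 < obj["Size"] <= max_bytes:
                continue
            head = s3.head_object(Bucket=bucket, Key=obj["Key"])
            if should_classify(head.get("ContentType"), obj["Size"], max_bytes):
                yield obj["Key"]
```

Each key yielded by `iter_candidate_keys(s3, "images")` would then be downloaded, scaled, and passed to the model.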

Below are examples of input images and the classification output the image recognition process produces for each: a list of labels, each with its associated probability.

[Image: red fox]
2925 [main] INFO com.cloudian.hap.LabelImage images/fox.jpg:
red fox (58.52% likely)
kit fox (39.54% likely)
coyote (0.73% likely)
grey fox (0.71% likely)
red wolf (0.25% likely)

[Figure: image metadata]

5095 [main] INFO com.cloudian.hap.LabelImage images/iphone.jpeg:
cellular telephone (41.24% likely)
hand-held computer (40.34% likely)
pay-phone (7.52% likely)
iPod (3.88% likely)
remote control (1.54% likely)
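Each log entry above lists the five most likely labels. Assuming the model returns one probability per class label (as a softmax output does), choosing and printing them reduces to a sort. A minimal sketch; the function names are hypothetical:

```python
def top_k(probabilities, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def format_predictions(image_key, predictions):
    """Render predictions in the style of the LabelImage log output above."""
    lines = [f"{image_key}:"]
    lines += [f"{label} ({p * 100:.2f}% likely)" for label, p in predictions]
    return "\n".join(lines)
```

For example, `print(format_predictions("images/fox.jpg", top_k(probs, labels)))` prints a block shaped like the red fox output, given the model’s probability vector `probs` and its matching `labels` list.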

A few configuration settings control the classifier:

[Figure: image classifier configuration]

The image labels and their associated probabilities are added to the object as S3 user-defined metadata: the key is the prefix “imgtag_” plus the label (e.g., “red fox”), and the value is the associated probability (e.g., “0.59”). The label is URL-encoded so that the key is valid ASCII, as metadata keys require; notably, each <SPACE> character is converted to ‘+’. To update an existing object’s user-defined metadata, the S3 Copy Object API is used with the x-amz-metadata-directive: REPLACE header. The object and its metadata are then stored in HyperStore S3. This example of an S3 GET request on bucket “images” and object “fox.jpg” shows the user-defined metadata:

[Figure: metadata output]
[Figure: metadata summary]
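A minimal sketch of the metadata step, assuming boto3 as the S3 client; the helper names are hypothetical. It builds the `imgtag_` keys with Python’s `quote_plus`, which turns spaces into `+`, and prepares a Copy Object request that copies the object onto itself with `MetadataDirective="REPLACE"`. On AWS S3 a REPLACE copy also resets Content-Type unless it is supplied, so the sketch passes it through; we assume HyperStore behaves the same way.

```python
from urllib.parse import quote_plus

TAG_PREFIX = "imgtag_"


def label_metadata(predictions):
    """Map (label, probability) pairs to S3 user-defined metadata.

    Keys are "imgtag_" plus the URL-encoded label (spaces become '+');
    values are the probabilities rounded to two decimals, as strings.
    """
    return {TAG_PREFIX + quote_plus(label): f"{p:.2f}" for label, p in predictions}


def replace_metadata_request(bucket, key, metadata, content_type):
    """Build keyword arguments for boto3's copy_object that rewrite an
    object's metadata in place by copying the object onto itself."""
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},
        "Metadata": metadata,
        "MetadataDirective": "REPLACE",
        # REPLACE drops the old metadata, so keep the original content type.
        "ContentType": content_type,
    }
```

Calling `s3.copy_object(**replace_metadata_request("images", "fox.jpg", label_metadata(preds), "image/jpeg"))` would then rewrite the metadata. Because REPLACE discards all existing user-defined metadata, any other keys worth keeping would need to be merged in first, for example from a `head_object` response.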

HyperStore can index object metadata in Elasticsearch. Once the metadata is indexed, Kibana can be used to explore it.

[Figure: Elasticsearch results]

Here’s an example query to find all images where the label “kit fox” has a probability greater than 0.4. The Kibana query bucketname:images AND userMetadata.imgtag_kit+fox>0.4 returns 2 objects:

[Figure: Kibana results for the “kit fox” query]

If you don’t care which type of “fox” it is, you can use wildcards in the field name. The Kibana query bucketname:images AND userMetadata.imgtag\*fox\*:* returns 13 objects:

[Figure: Kibana results for the wildcard “fox” query]
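To make the semantics of the two Kibana queries concrete, here is a sketch that applies the same filters to metadata held in memory rather than in Elasticsearch. The document shape (a `key`, a `bucketname`, and a `userMetadata` dict of string values) is an assumption modeled on the field names in the queries, not HyperStore’s actual index schema, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def over_threshold(docs, bucket, field, threshold):
    """Roughly: bucketname:<bucket> AND userMetadata.<field> > <threshold>."""
    return [
        d["key"]
        for d in docs
        if d["bucketname"] == bucket
        and field in d["userMetadata"]
        # Values were stored as strings such as "0.59", so parse before comparing.
        and float(d["userMetadata"][field]) > threshold
    ]


def has_field_matching(docs, bucket, pattern):
    """Roughly: bucketname:<bucket> AND userMetadata.<pattern>:*, i.e. at least
    one metadata key matches a shell-style wildcard such as "imgtag*fox*"."""
    return [
        d["key"]
        for d in docs
        if d["bucketname"] == bucket
        and any(fnmatchcase(name, pattern) for name in d["userMetadata"])
    ]
```

The comparison is strict, so an object whose “kit fox” value was stored as “0.40” does not match `> 0.4`; rounding the stored values to two decimals can therefore change which objects a threshold query returns.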

S3 object stores like HyperStore make it possible to store petabytes of data, so the focus can now turn to making that data usable for analytics. HAP provides a convenient way to move the compute to the data and, as in this use case, to add metadata to the object data. In the same spirit, we are developing more use cases to enhance object storage analytics, including processing streaming data and other machine learning tasks.
