Many companies will, at some point, need to process raw data and generate a report from it. The data may come from one or many sources: small appliances, logs, or files exported by an application. Hadoop is a powerful Big Data analytics tool used by everyone from enthusiasts to the enterprise. When used with Cloudian HyperStore, uploading and processing that data becomes faster and more efficient. For example, Hadoop includes a built-in connector, S3N (S3 Native), that communicates directly with Cloudian HyperStore over the S3 API. MapReduce jobs can read and write data directly in a HyperStore bucket, bypassing HDFS. In a typical data analytics scenario, data must be copied from the source to the Hadoop server and then copied again into HDFS. By instead using the S3 API to copy the data from the source machine or appliance, the upload can take advantage of parallel multipart uploads to save time, and the extra copy from the Hadoop cluster into HDFS is avoided entirely.
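To illustrate that upload path, here is a minimal sketch using the AWS SDK for Java's TransferManager, which splits large files into parts and uploads them in parallel. The endpoint, region, credentials, bucket, and file names are placeholders for a HyperStore deployment, not values from the demo.

```java
// Minimal sketch: parallel multipart upload to a HyperStore bucket using
// the AWS SDK for Java (v1) TransferManager. All endpoint, credential,
// bucket, and file values below are placeholders.
import java.io.File;

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.amazonaws.services.s3.transfer.Upload;

public class HyperStoreUpload {
    public static void main(String[] args) throws Exception {
        // Point the S3 client at the HyperStore endpoint rather than AWS.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
                        "http://s3.hyperstore.example.com", "us-east-1"))
                .withCredentials(new AWSStaticCredentialsProvider(
                        new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY")))
                .withPathStyleAccessEnabled(true) // often needed for non-AWS endpoints
                .build();

        // TransferManager automatically performs parallel multipart uploads
        // for files above its multipart threshold.
        TransferManager tm = TransferManagerBuilder.standard()
                .withS3Client(s3)
                .build();
        Upload upload = tm.upload("support-logs", "raw/node1.tar.gz",
                new File("/var/log/cloudian/node1.tar.gz"));
        upload.waitForCompletion();
        tm.shutdownNow();
    }
}
```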

In this demonstration, a script is run that gathers the support logs from a Cloudian HyperStore cluster into a bucket using the S3N connector. Once the logs are uploaded to the bucket, MapReduce jobs are run to analyze the data. Finally, the results are exported to a comma-separated values (CSV) file and graphed with gnuplot. The video demonstrates how easily two great technologies, HyperStore and Hadoop, can be combined to analyze big data efficiently.
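A job along those lines might look like the following minimal sketch: a mapper that tallies log lines by severity level and a reducer that sums the counts, reading from and writing to a HyperStore bucket over s3n://. The bucket name, credentials, and log-line format are assumptions for illustration, not taken from the video. (Note that pointing S3N at a non-AWS endpoint such as HyperStore is typically done through the JetS3t properties file, e.g. the s3service.s3-endpoint setting.)

```java
// Sketch of a MapReduce job in the spirit of the demo: count log lines by
// severity level, with both input and output living in a HyperStore bucket.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogLevelCount {

    public static class LevelMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text level = new Text();

        @Override
        protected void map(LongWritable key, Text line, Context ctx)
                throws IOException, InterruptedException {
            // Assumes lines like "2016-03-01 12:00:00,123 ERROR ..." where
            // the third whitespace-delimited field is the severity level.
            String[] fields = line.toString().split("\\s+");
            if (fields.length >= 3) {
                level.set(fields[2]);
                ctx.write(level, ONE);
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text level, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            ctx.write(level, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.s3n.awsAccessKeyId", "ACCESS_KEY");      // HyperStore access key
        conf.set("fs.s3n.awsSecretAccessKey", "SECRET_KEY");  // HyperStore secret key
        // Emit "LEVEL,count" instead of the tab-separated default, so the
        // output is already CSV that gnuplot can read.
        conf.set("mapreduce.output.textoutputformat.separator", ",");

        Job job = Job.getInstance(conf, "log-level-count");
        job.setJarByClass(LogLevelCount.class);
        job.setMapperClass(LevelMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Both input and output live in the HyperStore bucket; HDFS is
        // never touched.
        FileInputFormat.addInputPath(job, new Path("s3n://support-logs/raw/"));
        FileOutputFormat.setOutputPath(job, new Path("s3n://support-logs/results/"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because the part files the job writes are already comma separated, they can be copied down with hadoop fs -copyToLocal and fed straight to gnuplot (with its datafile separator set to a comma) to produce a graph like the one in the video.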

