Difference between revisions of "Hadoop Workshop"

From Gridkaschool

Revision as of 18:46, 18 August 2014

Content

  • Session 1 - Setup
    • The Hadoop Ecosystem
      • Prepare the DEMO VM
      • HUE, the Hadoop Web-UI
  • Session 2 - The Kite-SDK, a Convenient Framework
    • The KITE-SDK
      • Accessing data
      • Metadata management
      • Kite-Modules
  • Session 3 - Real Time Indexing
    • Importing data with Flume
    • Indexing datasets using Morphlines
  • Session 4 - Introduction to Apache Crunch
    • The Crunch Datamodel
    • Crunch data pipelines


Material

Slides:

 will be available after the workshop

Hand-Out:

 will be provided as a hardcopy and as PDF after the workshop


Important Information

  • A personal notebook is required for this workshop. You will use [VirtualBox] to run the workshop VM.
  • If you use Windows, please download the program "PuTTY" before the workshop:
 http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe

Abstract

Over the last few years, cloud computing has become an important part of the IT landscape.
Renting computing power, storage, and applications on demand is regarded as a business model of the future.
This tutorial course introduces the basic concepts of the Infrastructure-as-a-Service (IaaS) model,
based on the cloud offerings provided by Amazon, one of the leading commercial cloud computing providers.

Workshop Exercise

Efficient Data Management with Apache Hadoop

We will walk through the dataset life cycle. Starting with data ingestion and real-time indexing, we use several tools to conserve important datasets and to extract information using high-level processing and query frameworks.

Tools

  • HUE
  • Flume
  • SOLR
    • Hive & Impala
    • Crunch & MapReduce