Difference between revisions of "Hadoop Hands-on"

From Gridkaschool
Revision as of 19:54, 25 August 2012

28.8.2012 – 13:30

Session A

The Hadoop ecosystem: HDFS, MR, HUE, Sqoop, Hive, Pig, HBase, Flume, Oozie

What are CDH and Cloudera Manager?

Installing, starting, and basic configuration of a small cluster

Session B

HDFS intro (NameNode, DataNode, Secondary NameNode)

How is data stored in HDFS?

Properties and configuration settings relevant for working efficiently with HDFS

HDFS commands

Session C

Working with the web-based GUI

Running and tracking jobs

Java API and samples

Streaming API sample

Session D

MapReduce details, Java API and Streaming (awk sample)

HDFS details, using the web-based GUI for deeper insights

Breaking a cluster and healing it

Session E

Intro to Hive and Sqoop

Data import via Sqoop

Hive scripts

Session F (optional)

Serialisation / Deserialisation (SerDe) and user-defined functions (UDF) with Hive

Workflows with Oozie