Hadoop Hands-on
From Gridkaschool
28.8.2012 – 13:30
Session 1
The Hadoop ecosystem: HDFS, MR, HUE, Sqoop, Hive, Pig, HBase, Flume, Oozie
What are CDH and Cloudera Manager?
Installing, starting, and basic configuration of a small cluster
Session 2
HDFS intro (NameNode, DataNode, Secondary NameNode)
How is data stored in HDFS?
Properties and configurations relevant for working efficiently with HDFS
HDFS commands
Session 3
Working with the web-based GUI
Running and tracking jobs
Java API and samples
Streaming API sample
Session 4
MapReduce details, Java API and Streaming (awk sample)
HDFS details, using the web-based GUI for deeper insights
Breaking a cluster and healing it
Session 5
Intro to Hive and Sqoop
Data import via Sqoop
Hive scripts
Session 6 (optional)
SerDe and UDF with Hive
Workflows with Oozie