Hadoop Hands-on
28.8.2012 – 13:30
Session A
The Hadoop ecosystem: HDFS, MapReduce, Hue, Sqoop, Hive, Pig, HBase, Flume, Oozie
What are CDH and Cloudera Manager?
Installing, starting, and configuring a small cluster
Session B
HDFS intro (NameNode, DataNode, Secondary NameNode)
How is data stored in HDFS?
Properties and configuration settings relevant for working efficiently with HDFS
HDFS commands
Session C
Working with the web-based GUI
Running and tracking jobs
Java API and samples
Streaming API sample
Session D
MapReduce details: Java API and Streaming (awk sample)
HDFS details: using the web-based GUI for deeper insights
Breaking a cluster and healing it
Session E
Intro to Hive and Sqoop
Data import via Sqoop
Hive scripts
Session F (optional)
+ Serialisation / deserialisation and user-defined functions with Hive
+ Workflows with Oozie