Hadoop Hands-on

Revision as of 19:55, 25 August 2012

28.8.2012 – 13:30

The Hadoop ecosystem: HDFS, MapReduce, Hue, Sqoop, Hive, Pig, HBase, Flume, Oozie

What are CDH and Cloudera Manager?

Installing, starting, and configuring a small cluster

HDFS intro (NameNode, DataNode, Secondary NameNode)

How is data stored in HDFS?

Properties and configuration settings relevant to working efficiently with HDFS

HDFS commands

Working with the web-based GUI

Running and tracking jobs

Java-API and samples

Streaming API sample

MapReduce details, Java API and Streaming (awk sample)

HDFS details, using the web-based GUI for deeper insights

Breaking a cluster and healing it

Intro to Hive and Sqoop

Data import via Sqoop

Hive scripts

==Session F (optional)==
* Serialisation / deserialisation and user-defined functions with Hive
* Workflows with Oozie