Hadoop Hands-on

From Gridkaschool
Revision as of 19:55, 25 August 2012 by Kamir1604

28.8.2012 – 13:30

Session A

The Hadoop ecosystem: HDFS, MapReduce, HUE, Sqoop, Hive, Pig, HBase, Flume, Oozie

What are CDH and the Cloudera Manager?

Installation, startup, and basic configuration of a small cluster

Session B

HDFS introduction (NameNode, DataNode, Secondary NameNode)

How is data stored in HDFS?

Properties and configuration settings relevant for working efficiently with HDFS

HDFS commands
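The HDFS commands in this session could look like the following; they assume a running cluster, and the user and path names are placeholders.

```shell
# List the root of HDFS
hadoop fs -ls /

# Create a home directory for the training user (user name is an example)
hadoop fs -mkdir /user/training

# Copy a local file into HDFS and read it back
hadoop fs -put /etc/hosts /user/training/hosts.txt
hadoop fs -cat /user/training/hosts.txt

# Copy a file from HDFS back to the local file system
hadoop fs -get /user/training/hosts.txt /tmp/hosts.copy

# Remove the file again and check overall file system health
hadoop fs -rm /user/training/hosts.txt
hadoop fsck /
```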

Session C

Working with the web-based GUI

Running and tracking jobs

Java API and samples

Streaming API sample

Session D

MapReduce details: Java API and Streaming (awk sample)
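The awk-based Streaming sample can be sketched as follows. The streaming jar path varies between CDH versions, and the input/output paths are placeholders; the same mapper and reducer can be tested locally with a plain Unix pipe before submitting the job:

```shell
# Mapper: emit "word<TAB>1" for every word on every input line
cat > mapper.awk <<'EOF'
{ for (i = 1; i <= NF; i++) print $i "\t1" }
EOF

# Reducer: sum the counts per word (Streaming delivers input sorted by key)
cat > reducer.awk <<'EOF'
BEGIN { FS = "\t" }
{ sum[$1] += $2 }
END { for (w in sum) print w "\t" sum[w] }
EOF

# Simulate the MapReduce data flow locally with a pipe:
echo "foo bar foo" | awk -f mapper.awk | sort | awk -f reducer.awk | sort
# prints: bar 1, foo 2 (tab-separated)

# Submit the same scripts as a Streaming job (jar path varies by CDH version,
# input/output paths are examples):
hadoop jar /usr/lib/hadoop-*/contrib/streaming/hadoop-streaming-*.jar \
  -input  /user/training/input \
  -output /user/training/wordcount-out \
  -mapper  'awk -f mapper.awk'  -file mapper.awk \
  -reducer 'awk -f reducer.awk' -file reducer.awk
```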

HDFS details: using the web-based GUI for deeper insight

Breaking a cluster and healing it
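A typical break-and-heal exercise stops one DataNode, observes the effect, and lets HDFS re-replicate on its own; the service name below is the one used by CDH packages and may differ on other installations.

```shell
# Stop one DataNode to simulate a failure (CDH package service name assumed)
sudo service hadoop-hdfs-datanode stop

# Watch the NameNode notice the dead node and the under-replicated blocks
hadoop dfsadmin -report
hadoop fsck / -blocks

# Restart the DataNode; HDFS heals itself by re-replicating blocks
sudo service hadoop-hdfs-datanode start
hadoop dfsadmin -report
```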

Session E

Intro to Hive and Sqoop

Data import via Sqoop
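A Sqoop import from a relational database into HDFS might look like this; the JDBC URL, credentials, and table name are placeholders for whatever database the course provides.

```shell
# Import one table from MySQL into HDFS as text files
# (host, database, user, and table names are examples)
sqoop import \
  --connect jdbc:mysql://dbhost/training \
  --username training --password training \
  --table customers \
  --target-dir /user/training/customers \
  --num-mappers 1

# Inspect the imported files
hadoop fs -cat /user/training/customers/part-m-00000 | head
```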

Hive scripts
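A small Hive script over data imported as above could look like the following; the table layout, column names, and HDFS location are assumptions for illustration.

```shell
# Run an inline Hive script: map a table onto the imported files,
# then aggregate (table name, columns, and path are examples)
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS customers (
  id INT,
  name STRING,
  city STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/training/customers';

SELECT city, COUNT(*) FROM customers GROUP BY city;
"
```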

Session F (optional)

Serialisation/deserialisation and user-defined functions with Hive

Workflows with Oozie