Hadoop Hands-on

Hadoop hands-on

28.8.2012 – 13:30

Session 1

The Hadoop ecosystem: HDFS, MapReduce, Hue, Sqoop, Hive, Pig, HBase, Flume, Oozie

What are CDH and Cloudera Manager?

Installation, startup, and basic configuration of a small cluster
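
If the cluster is configured by hand rather than through Cloudera Manager, the most basic setting is the default file system URI in core-site.xml. A minimal sketch with a placeholder host name (on older Hadoop releases the property is called fs.default.name instead of fs.defaultFS):

 <!-- core-site.xml: minimal configuration for a small cluster -->
 <configuration>
   <property>
     <name>fs.defaultFS</name>
     <value>hdfs://master.example.org:8020</value>  <!-- placeholder NameNode host -->
   </property>
 </configuration>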

Session 2

HDFS introduction (NameNode, DataNode, Secondary NameNode)

How is data stored in HDFS?

Properties and configuration settings relevant for working efficiently with HDFS
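
Two of the settings discussed here are the replication factor and the block size in hdfs-site.xml. The values below are only illustrative, and on older releases the block size property is named dfs.block.size rather than dfs.blocksize:

 <!-- hdfs-site.xml: replication and block size, illustrative values -->
 <configuration>
   <property>
     <name>dfs.replication</name>
     <value>3</value>          <!-- number of copies of each block -->
   </property>
   <property>
     <name>dfs.blocksize</name>
     <value>134217728</value>  <!-- 128 MB blocks -->
   </property>
 </configuration>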

HDFS commands
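
A few of the commands used in the exercises; the /user/train paths are only examples:

 # copy a local file into HDFS and inspect it
 hadoop fs -mkdir /user/train/input
 hadoop fs -put access.log /user/train/input/
 hadoop fs -ls /user/train/input
 hadoop fs -cat /user/train/input/access.log | head
 hadoop fs -get /user/train/input/access.log local-copy.log
 # check files, blocks and overall file system health
 hadoop fsck /user/train -files -blocks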

Session 3

Working with the web-based GUI

Running and tracking jobs
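
Jobs can also be submitted and tracked from the command line. A sketch using the example jar that ships with Hadoop (its exact path depends on the installation):

 # submit one of the bundled example jobs
 hadoop jar /usr/lib/hadoop/hadoop-examples.jar wordcount /user/train/input /user/train/output
 # list running jobs and query one of them
 hadoop job -list
 hadoop job -status <job_id>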

Java API and samples
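
A minimal sketch of the Java API as used in the samples: the classic word count with the org.apache.hadoop.mapreduce classes. Class names and the two path arguments are only examples.

 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.Mapper;
 import org.apache.hadoop.mapreduce.Reducer;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

 public class WordCount {

   // emits (word, 1) for every token of the input line
   public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
     private static final IntWritable ONE = new IntWritable(1);
     private final Text word = new Text();

     @Override
     protected void map(LongWritable key, Text value, Context context)
         throws IOException, InterruptedException {
       for (String token : value.toString().split("\\s+")) {
         if (!token.isEmpty()) {
           word.set(token);
           context.write(word, ONE);
         }
       }
     }
   }

   // sums all counts emitted for the same word
   public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
     @Override
     protected void reduce(Text key, Iterable<IntWritable> values, Context context)
         throws IOException, InterruptedException {
       int sum = 0;
       for (IntWritable v : values) {
         sum += v.get();
       }
       context.write(key, new IntWritable(sum));
     }
   }

   public static void main(String[] args) throws Exception {
     Job job = new Job(new Configuration(), "word count");
     job.setJarByClass(WordCount.class);
     job.setMapperClass(TokenMapper.class);
     job.setCombinerClass(SumReducer.class);
     job.setReducerClass(SumReducer.class);
     job.setOutputKeyClass(Text.class);
     job.setOutputValueClass(IntWritable.class);
     FileInputFormat.addInputPath(job, new Path(args[0]));
     FileOutputFormat.setOutputPath(job, new Path(args[1]));
     System.exit(job.waitForCompletion(true) ? 0 : 1);
   }
 }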

Streaming API sample
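
The simplest possible Streaming job reuses standard Unix tools as mapper and reducer. The jar path below is a placeholder that differs between installations:

 hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-*.jar \
   -input /user/train/input \
   -output /user/train/out-streaming \
   -mapper /bin/cat \
   -reducer /usr/bin/wc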

Session 4

MapReduce details, Java API and Streaming (awk sample)
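
One way the awk sample can look: two small awk scripts shipped with the Streaming job, counting requests per client IP in a web server log. Script names, paths and the log format are assumptions.

 # mapper.awk: emit "<ip> TAB 1" for every log line (field 1 assumed to be the client IP)
 { print $1 "\t" 1 }

 # reducer.awk: sum the counts per IP (reducer input arrives sorted by key)
 BEGIN { FS = "\t" }
 { sum[$1] += $2 }
 END { for (ip in sum) print ip "\t" sum[ip] }

 # submit both scripts together with the streaming job
 hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-*.jar \
   -input /user/train/access_logs -output /user/train/ip_counts \
   -mapper "awk -f mapper.awk" -reducer "awk -f reducer.awk" \
   -file mapper.awk -file reducer.awk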

HDFS details, using the web-based GUI for deeper insight

Breaking a cluster and healing it
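
A typical way to run this exercise: stop one DataNode, watch HDFS report the dead node and under-replicated blocks, then start it again. The service name is the one used by the CDH packages and may differ on other installations:

 # on one worker node: simulate a failure
 sudo service hadoop-hdfs-datanode stop
 # on the NameNode: observe the dead node and missing replicas
 hadoop dfsadmin -report
 hadoop fsck / -blocks -locations
 # heal the cluster again
 sudo service hadoop-hdfs-datanode start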

Session 5

Intro to Hive and Sqoop

Data import via Sqoop
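
A sketch of a Sqoop import from MySQL into HDFS; connection string, credentials and table name are placeholders for whatever database is used in the exercise:

 sqoop import \
   --connect jdbc:mysql://dbhost.example.org/training \
   --username train --password secret \
   --table customers \
   --target-dir /user/train/customers \
   --num-mappers 2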

Hive scripts
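
A small HiveQL script of the kind used here, run with hive -f customers.hql; the table layout and paths are illustrative and assume the Sqoop import above:

 -- customers.hql: table over the imported data plus a simple aggregation
 CREATE EXTERNAL TABLE IF NOT EXISTS customers (
   id INT,
   name STRING,
   country STRING
 )
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
 LOCATION '/user/train/customers';

 SELECT country, COUNT(*) AS cnt
 FROM customers
 GROUP BY country
 ORDER BY cnt DESC;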

Session 6 (optional)

SerDe and UDF with Hive
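
A UDF in the classic style, extending org.apache.hadoop.hive.ql.exec.UDF; class, jar and function names are examples:

 import org.apache.hadoop.hive.ql.exec.UDF;
 import org.apache.hadoop.io.Text;

 // simple Hive UDF: returns its argument in lower case
 public class Lower extends UDF {
   public Text evaluate(Text input) {
     if (input == null) {
       return null;
     }
     return new Text(input.toString().toLowerCase());
   }
 }

After packaging the class into a jar, it is registered from within Hive:

 ADD JAR lower-udf.jar;
 CREATE TEMPORARY FUNCTION my_lower AS 'Lower';
 SELECT my_lower(name) FROM customers LIMIT 10;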

Workflows with Oozie
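
An Oozie workflow is an XML document of chained actions. A minimal sketch with a single Hive action that runs the script from Session 5; the schema versions are assumptions and the cluster addresses come from the accompanying job.properties file:

 <workflow-app name="hive-demo" xmlns="uri:oozie:workflow:0.2">
   <start to="run-hive"/>
   <action name="run-hive">
     <hive xmlns="uri:oozie:hive-action:0.2">
       <job-tracker>${jobTracker}</job-tracker>
       <name-node>${nameNode}</name-node>
       <script>customers.hql</script>
     </hive>
     <ok to="end"/>
     <error to="fail"/>
   </action>
   <kill name="fail">
     <message>Hive action failed</message>
   </kill>
   <end name="end"/>
 </workflow-app>

The workflow is then submitted with the Oozie command line client, e.g. oozie job -config job.properties -run.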