Difference between revisions of "Hadoop Hands-on"

From Gridkaschool

Revision as of 19:50, 25 August 2012


Hadoop hands-on

28.8.2012 – 13:30

Session 1

The Hadoop ecosystem: HDFS, MR, HUE, Sqoop, Hive, Pig, HBase, Flume, Oozie

What are CDH and Cloudera Manager?

Installing, starting, and configuring a small cluster

Session 2

HDFS intro (Name Node, Data Node, Secondary Name Node)

How is data stored in HDFS?

Properties and configuration settings relevant to working efficiently with HDFS

HDFS commands
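
As a rough illustration of the "How is data stored in HDFS?" item: HDFS splits a file into fixed-size blocks and stores each block on several Data Nodes. The sketch below only does the arithmetic. The function name `hdfs_block_layout` is hypothetical, and the 128 MB block size and replication factor of 3 are assumed example values; the real values depend on the cluster configuration.

```python
def hdfs_block_layout(file_size_bytes, block_size_bytes=128 * 1024 * 1024, replication=3):
    """Estimate how a file is split into blocks (illustrative values, not cluster defaults)."""
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication <= 0:
        raise ValueError("sizes must be non-negative and block size/replication positive")

    # Number of completely filled blocks, plus the bytes left over for a final partial block.
    full_blocks, remainder = divmod(file_size_bytes, block_size_bytes)
    num_blocks = full_blocks + (1 if remainder else 0)

    # The last block only occupies as many bytes as it actually holds.
    if remainder:
        last_block_bytes = remainder
    elif full_blocks:
        last_block_bytes = block_size_bytes
    else:
        last_block_bytes = 0

    return {
        "blocks": num_blocks,
        "last_block_bytes": last_block_bytes,
        "block_replicas": num_blocks * replication,
        "raw_bytes_on_disk": file_size_bytes * replication,
    }


if __name__ == "__main__":
    mb = 1024 * 1024
    print(hdfs_block_layout(300 * mb))
```

Under these assumed values, a 300 MB file occupies two full blocks plus one partial block, and every block is stored three times across the cluster.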

Session 3

Working with the web-based GUI

Running and tracking jobs

Java API and samples

Streaming API sample
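
The Streaming API runs any executable as mapper or reducer: records arrive as text lines on standard input, and key/value pairs leave as tab-separated lines on standard output. The course sample itself is not reproduced on this page, so the following is only a minimal word-count sketch in Python. The function names are hypothetical, and `run_locally` merely imitates the sort step that Hadoop performs between the map and reduce phases.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts of consecutive lines that share a key.

    Assumes the input is sorted by key, which is what the reducer receives
    after the shuffle/sort phase.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Imitate map -> sort -> reduce in a single process for testing without a cluster."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == "__main__":
    for out in run_locally(["hadoop hdfs hadoop", "hdfs hive"]):
        print(out)
```

In a real job, the mapper and reducer would be two separate scripts reading `sys.stdin`, passed to the Hadoop streaming jar; the exact jar path depends on the installation.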

Session 4

MapReduce details, Java API and Streaming (awk sample)

HDFS details, using the web-based GUI for deeper insights

Breaking a cluster and healing it

Session 5

Intro to Hive and Sqoop

Data import via Sqoop

Hive scripts

Session 6 (optional)

SerDe and UDF with Hive

Workflows with Oozie