Hadoop Hands-on

From Gridkaschool

Revision as of 20:01, 25 August 2012 by Kamir1604



Tuesday, 28.8.2012, 13:00 - 18:30


Preparation

Important Information

A personal notebook is recommended for this workshop.
If you use Windows:

- please have the program "PuTTY" ready for this workshop: http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe

- please have VMware Player ready for this workshop

Content

Session A

The Hadoop ecosystem: HDFS, MapReduce (MR), Hue, Sqoop, Hive, Pig, HBase, Flume, Oozie
What are CDH and Cloudera Manager?
Installing, starting, and basic configuration of a small cluster

Session B

HDFS intro (NameNode, DataNode, Secondary NameNode)
How is data stored in HDFS?
Properties and configuration settings relevant for working efficiently with HDFS
HDFS commands
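
As a rough illustration of the "How is data stored in HDFS?" topic: HDFS splits a file into fixed-size blocks and keeps several replicas of each block on different DataNodes. The sketch below is not workshop material. The function name `hdfs_block_layout` is hypothetical, and the 64 MB block size and replication factor of 3 are assumed values that depend on the cluster configuration.

```python
MB = 1024 * 1024

def hdfs_block_layout(file_size_bytes, block_size_bytes=64 * MB, replication=3):
    """Estimate how a file would be split into HDFS blocks.

    block_size_bytes and replication are assumed values; the real ones
    come from the cluster configuration.
    Returns (number of blocks, size of the last block, raw bytes stored).
    """
    if file_size_bytes <= 0:
        return 0, 0, 0
    # Integer ceiling division: a partially filled last block still counts.
    num_blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    # The last block only holds the remaining data; it is not padded.
    last_block = file_size_bytes - (num_blocks - 1) * block_size_bytes
    # Every block is stored `replication` times across the DataNodes.
    raw_bytes = file_size_bytes * replication
    return num_blocks, last_block, raw_bytes
```

Under these assumptions a 200 MB file would occupy four blocks (three full ones and one of 8 MB) and 600 MB of raw disk space across the cluster.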

Session C

Working with the web-based GUI
Running and tracking jobs
Java-API and samples
Streaming API sample
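
The workshop's own samples are not reproduced on this page. To give an idea of what a Streaming job looks like: Hadoop Streaming runs any executable as mapper or reducer, feeding it lines on standard input and reading tab-separated key/value lines from standard output. The following is a minimal word-count sketch under that assumption; all function names are hypothetical.

```python
from itertools import groupby

def map_words(lines):
    """Mapper logic: emit (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def parse_pairs(lines):
    """Parse 'key<TAB>value' lines as they would arrive at the reducer."""
    for line in lines:
        word, count = line.rstrip("\n").split("\t", 1)
        yield word, int(count)

def reduce_counts(pairs):
    """Reducer logic: sum the counts per word.

    Relies on the input being sorted by key, which the framework's
    shuffle/sort phase provides between map and reduce.
    """
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

def main(argv, stdin, stdout):
    """Run as mapper ('map') or reducer ('reduce') depending on argv[1]."""
    role = argv[1] if len(argv) > 1 else "map"
    if role == "map":
        results = map_words(stdin)
    else:
        results = reduce_counts(parse_pairs(stdin))
    for word, count in results:
        stdout.write(f"{word}\t{count}\n")
```

Calling `main(sys.argv, sys.stdin, sys.stdout)` from a script's entry point would let the same file serve as both the `-mapper` and the `-reducer` command of a streaming job. Locally, the pipeline can be imitated by sorting the mapper output before passing it to the reducer.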

Session D

MapReduce details, Java API and Streaming (awk sample)
HDFS details, using the web-based GUI for deeper insights
Breaking a cluster and healing it

Session E

Intro to Hive and Sqoop
Data import via Sqoop
Hive scripts

Session F (optional)

Serialisation and deserialisation (SerDe) and user-defined functions (UDF) with Hive
Workflows with Oozie

