Hadoop Hands-on
Tuesday, 28.8.2012, 13:00 - 18:30
Requirements
Computer
- A personal notebook is recommended for this workshop.
- If you use Windows, please install PuTTY and VMware Player.
Material
- Hadoop: The Definitive Guide [www.amazon.de]
- Hadoop in Action
- Data-Intensive Text Processing with MapReduce
Content
Session A
- The Hadoop ecosystem: HDFS, MapReduce, Hue, Sqoop, Hive, Pig, HBase, Flume, Oozie
- What are CDH and the Cloudera Manager?
- Installation, startup and basic configuration of a small cluster (see the quick check sketched after this list)
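As a rough idea of what a first hands-on check of a freshly installed node could look like, here is a minimal command-line sketch; the exact daemon layout and paths depend on the CDH version and are assumptions.

  # List the Hadoop daemons (NameNode, DataNode, ...) running on this node;
  # run as root so jps can also see the JVMs of the hdfs/mapred service users
  sudo jps

  # Print the installed Hadoop version
  hadoop version

  # Verify that HDFS answers by listing its root directory
  hadoop fs -ls /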
Session B
- HDFS intro (NameNode, DataNode, Secondary NameNode)
- How is data stored in HDFS?
- Properties and configuration settings relevant for working efficiently with HDFS
- HDFS commands (a sample session is sketched after this list)
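A short sketch of the kind of HDFS shell session this part aims at; the file paths and the user name "student" are placeholders, and the HDFS home directory /user/student is assumed to exist.

  # Copy a local file into the user's HDFS home directory
  hadoop fs -put /tmp/sample.txt /user/student/sample.txt

  # List the directory and print the file contents
  hadoop fs -ls /user/student
  hadoop fs -cat /user/student/sample.txt

  # Show how the file is stored: blocks, replication factor and block locations
  hadoop fsck /user/student/sample.txt -files -blocks -locations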
Session C
- Working with the web-based GUI
- Running and tracking jobs
- Java API and samples
- Streaming API sample (a first job is sketched after this list)
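A sketch of how a first job could be run and tracked from the shell; the jar location and the input/output directories are assumptions and differ between CDH layouts.

  # Run the bundled WordCount example (jar path varies with the installation)
  hadoop jar /usr/lib/hadoop/hadoop-examples.jar wordcount input output

  # List running jobs and query the status of a particular one
  hadoop job -list
  hadoop job -status <job_id>

  # Inspect the result files written by the reducers
  hadoop fs -cat output/part-*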
Session D
- MapReduce details, Java API and Streaming (awk sample, sketched after this list)
- HDFS details, using the web-based GUI for deeper insight
- Breaking a cluster and healing it
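A sketch of the awk word-count job with the Streaming API; the streaming jar path and the script file names are assumptions.

  # mapper.awk emits every word with a count of 1:
  #   { for (i = 1; i <= NF; i++) print $i "\t" 1 }
  # reducer.awk sums the counts per word:
  #   { count[$1] += $2 }
  #   END { for (w in count) print w "\t" count[w] }

  hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming.jar \
      -input input -output output-awk \
      -mapper "awk -f mapper.awk" -reducer "awk -f reducer.awk" \
      -file mapper.awk -file reducer.awk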
Session E
- Intro to Hive and Sqoop
- Data import via Sqoop
- Hive scripts (a sketch follows this list)
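A sketch of a Sqoop import followed by a first Hive query; the JDBC URL, table and column names are placeholders.

  # Import a MySQL table into HDFS (connection details are placeholders)
  sqoop import --connect jdbc:mysql://dbhost/gks --table customers \
      --username student -P --target-dir /user/student/customers -m 1

  # Define a Hive table on top of the imported data and query it
  hive -e "CREATE EXTERNAL TABLE customers (id INT, name STRING)
           ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
           LOCATION '/user/student/customers';
           SELECT COUNT(*) FROM customers;"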
Session F (optional)
- Serialisation and deserialisation (SerDe) and user-defined functions (UDF) with Hive (see the sketch after this list)
- Workflows with Oozie
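A sketch of how a custom Hive UDF could be registered and an Oozie workflow submitted; the jar file, class name, Oozie URL and properties file are assumptions.

  # Register a UDF packaged in a jar and use it in a query
  hive -e "ADD JAR /home/student/my-udfs.jar;
           CREATE TEMPORARY FUNCTION to_upper AS 'de.gridka.hive.ToUpper';
           SELECT to_upper(name) FROM customers;"

  # Submit and start a workflow on the Oozie server
  oozie job -oozie http://localhost:11000/oozie -config job.properties -run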