Hadoop Hands-on

From Gridkaschool
Revision as of 20:09, 25 August 2012 by Kamir1604 (talk | contribs)

Tuesday, 28.8.2012, 13:00 - 18:30

Prerequisites

  • A basic understanding of Unix/Linux system administration is needed for the exercises.
  • No prior knowledge of Hadoop is required, as we will go through the basic concepts.
  • A personal notebook is recommended for this workshop.
  • If you use Windows: please install "PuTTY" and the VMware Player.



Recommended Material

Books

Content

Session A

  • The Hadoop ecosystem: HDFS, MapReduce, HUE, Sqoop, Hive, Pig, HBase, Flume, Oozie
  • What are CDH and Cloudera Manager?
  • Installation, startup, and basic configuration of a small cluster
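
Basic configuration happens in a handful of XML files. A minimal sketch of `core-site.xml`, which tells every daemon and client where the default filesystem lives (the hostname "namenode" is a placeholder; the property name applies to Hadoop 2 / CDH4-era clusters):

```xml
<!-- core-site.xml: minimal sketch; "namenode" is a placeholder hostname -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:8020</value>
  </property>
</configuration>
```

Cloudera Manager generates and distributes these files for you; editing them by hand is mainly useful for understanding what the manager does.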

Session B

  • HDFS intro (Name Node, Data Node, Secondary Name Node)
  • How is data stored in HDFS?
  • Properties and configuration settings relevant for working efficiently with HDFS
  • HDFS commands
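
To make "how is data stored in HDFS?" concrete, a small back-of-envelope calculation. It assumes the classic defaults of a 64 MB block size and a replication factor of 3; both are cluster-configurable (`dfs.blocksize`, `dfs.replication`):

```python
import math

def hdfs_block_usage(file_size_mb, block_size_mb=64, replication=3):
    """Number of HDFS blocks and raw disk usage for a single file."""
    # A file is split into fixed-size blocks; the last one may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # A partial block only consumes its actual bytes, so raw usage
    # is simply file size times the replication factor.
    raw_usage_mb = file_size_mb * replication
    return blocks, raw_usage_mb

print(hdfs_block_usage(200))   # a 200 MB file -> (4, 600)
```

So a 200 MB file occupies 4 blocks and 600 MB of raw cluster disk, which is why HDFS favours few large files over many small ones (every block also costs Name Node memory).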

Session C

  • Working with the web-based GUI
  • Running and tracking jobs
  • Java-API and samples
  • Streaming API sample
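
The Streaming API lets any executable that reads stdin and writes stdout act as mapper or reducer. A hedged word-count sketch in Python; on a cluster the two functions would be split into a mapper and a reducer script and each fed `sys.stdin` line by line:

```python
def mapper(lines):
    # Emit "word\t1" for every word in the input.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    # Streaming delivers reducer input sorted by key, so equal
    # words arrive on adjacent lines and can be summed in one pass.
    current, count = None, 0
    for line in lines:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                yield f"{current}\t{count}"
            current, count = word, 0
        count += int(n)
    if current is not None:
        yield f"{current}\t{count}"

if __name__ == "__main__":
    # Local demo; sorted() stands in for the cluster's shuffle phase.
    for out in reducer(sorted(mapper(["the cat", "the dog"]))):
        print(out)
```

Submitted roughly via the streaming jar with `-input`, `-output`, `-mapper`, and `-reducer` options; the exact jar path depends on the installation.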

Session D

  • MapReduce details, Java API and Streaming (awk sample)
  • HDFS details, using the web-based GUI for deeper insights
  • Breaking a cluster and healing it
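
A key MapReduce detail is the shuffle phase between map and reduce: all pairs are partitioned by key, sorted, and grouped before any reducer runs. A minimal local simulation of that step (on a real cluster the keys are additionally hash-partitioned across reducers):

```python
from itertools import groupby

def shuffle(pairs):
    """Simulate the MapReduce shuffle: sort by key, then group values."""
    by_key = sorted(pairs, key=lambda kv: kv[0])
    return {key: [v for _, v in group]
            for key, group in groupby(by_key, key=lambda kv: kv[0])}

print(shuffle([("b", 1), ("a", 1), ("b", 1)]))   # {'a': [1], 'b': [1, 1]}
```

This sorted grouping is why a streaming reducer can rely on equal keys arriving on consecutive input lines.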

Session E

  • Intro to Hive and Sqoop
  • Data import via Sqoop
  • Hive scripts
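
A typical pattern: Sqoop lands a relational table as delimited files in HDFS (via `sqoop import` with `--connect`, `--table`, and `--target-dir`), and an external Hive table then reads those files in place. A hedged HiveQL sketch; table and column names are hypothetical:

```sql
-- "customers" and its columns are illustrative, not from the workshop.
CREATE EXTERNAL TABLE customers (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/user/hive/customers';

-- Hive compiles this into MapReduce jobs behind the scenes.
SELECT name, COUNT(*) FROM customers GROUP BY name;
```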

Session F (optional)

  • Serialisation and deserialisation (SerDe) and user-defined functions (UDFs) with Hive
  • Workflows with Oozie
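
An Oozie workflow is a DAG of actions described in XML. A hedged skeleton showing the overall shape (names, the schema version, and the parameterised `${jobTracker}`/`${nameNode}` properties are illustrative):

```xml
<!-- Hedged sketch of a one-action Oozie workflow; details are illustrative. -->
<workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.2">
  <start to="count"/>
  <action name="count">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>MapReduce action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Each action declares where to go on success (`ok`) and on failure (`error`), which is how Oozie chains MapReduce, Hive, and Sqoop steps into one workflow.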