Hadoop Hands-on
Tuesday, 28.8.2012, 13:00 - 18:30
Prerequisites
- A basic understanding of Unix/Linux system administration is needed for the exercises.
- No prior knowledge of Hadoop is required; we will go through the basic concepts.
- A personal notebook is recommended for this workshop.
- If you use Windows, please install PuTTY and VMware Player.
Recommended Material
Books
- Hadoop: The Definitive Guide [1]
- Hadoop in Action [2]
- Data-Intensive Text Processing with MapReduce [http://www.amazon.de/Data-Intensive-Processing-Mapreduce-Author-Paperback/dp/B006V38ZCK/ref=sr_1_2?ie=UTF8&qid=1345918261&sr=8-2]
Scripts from last year
http://gridka-school.scc.kit.edu/2011/downloads/Hadoop_tutorial-1-Introduction.pdf
http://gridka-school.scc.kit.edu/2011/downloads/Hadoop_tutorial-2_4-MapReduce.pdf
http://gridka-school.scc.kit.edu/2011/downloads/Hadoop_tutorial-5-Pig.pdf
Hand-outs: http://gridka-school.scc.kit.edu/2011/downloads/Hadoop_tutorial-Hand_outs.pdf
Content
Session A
- The Hadoop ecosystem: HDFS, MapReduce, Hue, Sqoop, Hive, Pig, HBase, Flume, Oozie
- What are CDH and Cloudera Manager?
- Installation, startup, and basic configuration of a small cluster (see the sketch below)
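For orientation, here is a minimal sketch of bringing up a single pseudo-distributed node. Package and service names follow the CDH documentation for a Debian-based system with the CDH repository already configured; treat them as assumptions, since they differ between versions and distributions:

  # install the pseudo-distributed example configuration
  sudo apt-get install hadoop-conf-pseudo
  # format the NameNode (runs as the hdfs user)
  sudo -u hdfs hdfs namenode -format
  # start the HDFS daemons
  for svc in /etc/init.d/hadoop-hdfs-*; do sudo $svc start; done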
Session B
- HDFS intro (NameNode, DataNode, Secondary NameNode)
- How is data stored in HDFS?
- Properties and configuration settings relevant for working efficiently with HDFS
- HDFS shell commands (see the sketch below)
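The hands-on part revolves around commands like the following; the paths are examples only:

  hadoop fs -mkdir /user/student/input               # create a directory in HDFS
  hadoop fs -put data.txt /user/student/input        # copy a local file into HDFS
  hadoop fs -ls /user/student/input                  # list the directory
  hadoop fs -cat /user/student/input/data.txt        # print the file contents
  hadoop fsck /user/student/input -files -blocks     # show how the data is split into blocks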
Session C
- Working with the web-based GUI
- Running and tracking jobs
- Java API and samples
- Streaming API sample (see the sketch below)
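As a first taste of the Streaming API, a minimal sketch that plugs ordinary Unix tools in as mapper and reducer (this pairing comes from the Hadoop streaming documentation). The location of the streaming jar varies between distributions, so the path below is an assumption:

  hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-*.jar \
      -input /user/student/input -output /user/student/out \
      -mapper /bin/cat -reducer /usr/bin/wc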
Session D
- MapReduce details: Java API and Streaming (awk sample, sketched below)
- HDFS details: using the web-based GUI for deeper insights
- Breaking a cluster and healing it
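A sketch of the awk-based streaming sample; the input format and field choice are assumptions (here: counting hits per IP address in a log file whose first field is the client IP):

  # mapper.awk emits the first field of each log line
  printf '#!/usr/bin/awk -f\n{ print $1 }\n' > mapper.awk
  chmod +x mapper.awk
  hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-*.jar \
      -input /user/student/logs -output /user/student/hits \
      -mapper mapper.awk -reducer 'uniq -c' -file mapper.awk
  # the shuffle phase sorts the mapper output by key,
  # so 'uniq -c' in the reducer counts occurrences per key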
Session E
- Intro to Hive and Sqoop
- Data import via Sqoop
- Hive scripts (both topics are sketched below)
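A sketch of a Sqoop import followed by a small Hive script over the imported data. The database, table, and column names are invented for illustration; Sqoop writes comma-separated text files by default, which the Hive table definition below relies on:

  # import a MySQL table into HDFS (-P prompts for the password)
  sqoop import --connect jdbc:mysql://dbhost/shop --table orders \
      --username student -P --target-dir /user/student/orders -m 1

  # define a Hive table over the imported files and run a query
  hive -e "
    CREATE EXTERNAL TABLE orders (id INT, customer STRING, amount DOUBLE)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION '/user/student/orders';
    SELECT customer, SUM(amount) FROM orders GROUP BY customer;"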
Session F (optional)
- Serialisation and deserialisation (SerDe) and user-defined functions (UDFs) with Hive
- Workflows with Oozie (both topics are sketched below)
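Two small sketches for this session. First, registering and using a custom Hive UDF; the jar path, function name, and class name are placeholders:

  hive -e "
    ADD JAR /home/student/my-udf.jar;
    CREATE TEMPORARY FUNCTION my_lower AS 'com.example.hive.MyLowerUDF';
    SELECT my_lower(customer) FROM orders;"

Second, submitting and inspecting an Oozie workflow; job.properties is assumed to point at a workflow application that was uploaded to HDFS beforehand, and the server URL is the default:

  oozie job -oozie http://localhost:11000/oozie -config job.properties -run
  oozie job -oozie http://localhost:11000/oozie -info <job-id>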