Latest revision as of 21:05, 26 August 2012

Tuesday, 28.8.2012, 13:00 - 18:30

Objectives

This session focuses on the Hadoop ecosystem and the interplay of its many specialized tools for data analysis. We also look into the Java API, though in less detail than a pure developer class would. We will try to show the big picture of Hadoop in the context of scientific computing. You will learn what Hadoop can be used for and what it is not intended for. To that end, we will discuss the underlying principles as well as the programming model and the installation and configuration procedures. You will test some of the commands on a real cluster, and live demos will give you an idea of the many features provided by the web-based user interface.

Prerequisites

Basic understanding of Unix/Linux OS management is needed to do the exercises.
No prior knowledge of Hadoop is required, as we go through the basic concepts.
A personal notebook is recommended for this workshop.
If you use Windows, please install "PuTTY" and VMware Player.

Recommended Material

Books

Hadoop: The Definitive Guide [1]
Hadoop in Action [2]
Data-Intensive Text Processing with MapReduce [3]

Scripts from last year

Introduction [4]
MapReduce [5]
Pig [6]
Hand-out [7]

Content

Session A

The Hadoop ecosystem: HDFS, MR, HUE, Sqoop, Hive, Pig, HBase, Flume, Oozie
What are CDH and Cloudera Manager?
Installing, starting, and basic configuration of a small cluster

Session B

HDFS intro (NameNode, DataNode, Secondary NameNode)
How is data stored in HDFS?
Properties and configuration settings relevant for working efficiently with HDFS
HDFS commands

Session C

Working with the web-based GUI
Running and tracking jobs
Java API and samples
Streaming API sample
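
To give an idea of what a Streaming API sample can look like, here is a minimal sketch in Python. It assumes word count as the task, which is not necessarily the sample used in the session, and the function names `map_lines`, `reduce_lines` and `run_local` are made up for illustration. In a streaming job the mapper and the reducer are separate programs that read lines from standard input and write tab-separated key/value lines to standard output; the sketch models both as functions and simulates the sort step between them locally.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per word. The input must be sorted by key,
    which is what the framework guarantees between map and reduce."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Simulate map -> sort -> reduce in a single process (no cluster needed)."""
    return list(reduce_lines(sorted(map_lines(lines))))


for record in run_local(["Hadoop stores data", "hadoop processes data"]):
    print(record)
```

On a real cluster, each of the two functions would be wrapped in its own script that iterates over `sys.stdin` and prints its records; the local simulation is only meant for trying out the logic before submitting a job.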

Session D

MapReduce details, Java API and Streaming
HDFS details, using the web-based GUI for deeper insights
Breaking a cluster and healing it

Session E

Intro to Hive and Sqoop
Data import via Sqoop
Hive scripts

Session F (optional)

Serialisation and deserialisation (SerDe) and user-defined functions (UDF) with Hive
Workflows with Oozie

@@ Line 1: / Line 1: @@
-.8.2012 – 13:30
+Tuesday, 28.8.2012, 13:00 - 18:30
+=Objectives=
-==Session A==
-The hadoop ecosystem: HDFS, MR, HUE, Sqoop, Hive, Pig, HBase, Flume, Oozie
+The focus of this session is on the hadoop ecosystem and the interplay of many specialized tools for data analysis.
+We look into the Java API as well, but not in so much detail as in a pure developer class. We will try to show a big picture
-What is CDH and the Cloudera-Manager?
+of hadoop in the context of scientific computing. You will learn, what hadoop can be used for, and what it is not intended to be
-Installation, starting and basic configurations of a small cluster
+applied to. Therefore we will discuss the underlying principles as well as the programming model and installation / configuration
-==Session B==
+procedures. You will test some of the commands on a real cluster and some life demos give you an idea of lots of features provided
-HDFS intro (Name Node, Data Node, Secondary Name Node)
+by the web based user interface.
-How is data stored in HDFS?
+=Prerequisites=
-Properties and configurations, relevant for efficient working with HDFS.
+* Basic understanding of Unix/Linux OS management is needed to do the exercises.
+* No prior knowledge of Hadoop is required, as we go through the basic concepts.
+* For this workshop a personal notebook is recommendet.
+* If you use Windows: please install "PuTTY" and the VMWare-Player.
+=Recommendet Material=
-HDFS commands
-==Session C==
+==Books==
+* Hadoop the Defenitive Guide [http://www.amazon.de/Hadoop-Definitive-Guide-Tom-White/dp/1449311520/ref=sr_1_fkmr1_1?ie=UTF8&qid=1345918087&sr=8-1-fkmr1]
+* Hadoop in Action [http://www.amazon.de/Hadoop-Action-Chuck-Lam/dp/1935182196/ref=sr_1_1?s=books-intl-de&ie=UTF8&qid=1345918219&sr=1-1]
+* Data Intensive Text Processing with MapReduce [http://www.amazon.de/Data-Intensive-Processing-Mapreduce-Author-Paperback/dp/B006V38ZCK/ref=sr_1_2?ie=UTF8&qid=1345918261&sr=8-2]
+==Scripts from last year==
-Working with the webbased-GUI
+* Introduction [http://gridka-school.scc.kit.edu/2011/downloads/Hadoop_tutorial-1-Introduction.pdf]
-Running and tracking jobs
+* MapReduce [http://gridka-school.scc.kit.edu/2011/downloads/Hadoop_tutorial-2_4-MapReduce.pdf]
+* Pig [http://gridka-school.scc.kit.edu/2011/downloads/Hadoop_tutorial-5-Pig.pdf]
+* Hand-out [http://gridka-school.scc.kit.edu/2011/downloads/Hadoop_tutorial-Hand_outs.pdf]
-Java-API and samples
-Streaming API sample
+=Content=
-==Session D==
+==Session A==
+* The hadoop ecosystem: HDFS, MR, HUE, Sqoop, Hive, Pig, HBase, Flume, Oozie
+* What is CDH and the Cloudera-Manager?
+* Installation, starting and basic configurations of a small cluster
+==Session B==
-Map Reduce details, Java-API and Streaming (awk sample)
+* HDFS intro (Name Node, Data Node, Secondary Name Node)
+* How is data stored in HDFS?
+* Properties and configurations, relevant for efficient working with HDFS.
+* HDFS commands
+==Session C==
-HDFS details, using the webbased-GUI for deeper insights
+* Working with the webbased-GUI
+* Running and tracking jobs
+* Java-API and samples
+* Streaming API sample
+==Session D==
-Breaking down a cluster and heal it
+* Map Reduce details, Java-API and Streaming
+* HDFS details, using the webbased-GUI for deeper insights
+* Breaking down a cluster and heal it
 ==Session E==
+* Intro to Hive and Sqoop
-Intro to Hive and Sqoop
+* Dataimport via Sqoop
+* Hive scripts
-Dataimport via Sqoop
-Hive scripts
 ==Session F (optional)==
+* Serialisation and deserialisation (SerDe) and user defined functions (UDF) with Hive
+* Workflows with oozie
-+ Serialisation / Deserialisation and user defined functions with Hive
-+ Workflows with oozie

Difference between revisions of "Hadoop Hands-on"
