Hadoop Hands-on

Tuesday, 28.8.2012, 13:00 - 18:30

Prerequisites

  • A basic understanding of Unix/Linux system administration is needed for the exercises.
  • No prior knowledge of Hadoop is required; we will go through the basic concepts.
  • A personal notebook is recommended for this workshop.
  • If you use Windows, please install PuTTY and VMware Player.

Recommended Material

Books

  • Hadoop: The Definitive Guide, Tom White [http://www.amazon.de/Hadoop-Definitive-Guide-Tom-White/dp/1449311520/ref=sr_1_fkmr1_1?ie=UTF8&qid=1345918087&sr=8-1-fkmr1]
  • Hadoop in Action, Chuck Lam [http://www.amazon.de/Hadoop-Action-Chuck-Lam/dp/1935182196/ref=sr_1_1?s=books-intl-de&ie=UTF8&qid=1345918219&sr=1-1]
  • Data-Intensive Text Processing with MapReduce [http://www.amazon.de/Data-Intensive-Processing-Mapreduce-Author-Paperback/dp/B006V38ZCK/ref=sr_1_2?ie=UTF8&qid=1345918261&sr=8-2]

Scripts from last year

http://gridka-school.scc.kit.edu/2011/downloads/Hadoop_tutorial-1-Introduction.pdf
http://gridka-school.scc.kit.edu/2011/downloads/Hadoop_tutorial-2_4-MapReduce.pdf
http://gridka-school.scc.kit.edu/2011/downloads/Hadoop_tutorial-5-Pig.pdf

Hand-outs: http://gridka-school.scc.kit.edu/2011/downloads/Hadoop_tutorial-Hand_outs.pdf


Content

Session A

  • The Hadoop ecosystem: HDFS, MapReduce, HUE, Sqoop, Hive, Pig, HBase, Flume, Oozie
  • What are CDH and the Cloudera Manager?
  • Installation, start-up, and basic configuration of a small cluster (see the sketch below)
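
To give an impression of this step, a minimal command-line sketch follows; it assumes a CDH4-style package installation of a single pseudo-distributed node on Debian/Ubuntu, so the package and service names may differ for other versions or when the Cloudera Manager is used instead.

    # Minimal sketch, assuming CDH4-style packages on Debian/Ubuntu
    # (package and service names may differ per version).
    sudo apt-get install hadoop-0.20-conf-pseudo   # pseudo-distributed configuration

    # Format HDFS once before the first start (as the hdfs user):
    sudo -u hdfs hdfs namenode -format

    # Start the HDFS daemons:
    sudo service hadoop-hdfs-namenode start
    sudo service hadoop-hdfs-datanode start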

Session B

  • HDFS intro (NameNode, DataNode, Secondary NameNode)
  • How is data stored in HDFS?
  • Properties and configuration settings relevant for working efficiently with HDFS
  • HDFS commands (see the examples below)
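
A few typical HDFS shell commands of the kind used in this session are sketched below; the paths and the user name are made up for illustration.

    # Typical HDFS shell commands (paths and user name are illustrative only).
    hadoop fs -mkdir /user/student/input            # create a directory in HDFS
    hadoop fs -put /etc/hosts /user/student/input   # copy a local file into HDFS
    hadoop fs -ls /user/student/input               # list the directory
    hadoop fs -cat /user/student/input/hosts        # print the file content
    hadoop fs -rm /user/student/input/hosts         # remove the file again

    # Cluster-level status (requires HDFS superuser rights):
    hdfs dfsadmin -report                           # DataNode and capacity overview
    hdfs fsck /                                     # check block health of the file system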

Session C

  • Working with the web-based GUI
  • Running and tracking jobs
  • Java API and samples
  • Streaming API sample (see the sketch below)
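
As a preview of the Streaming API sample, the sketch below runs a streaming job that uses standard Unix tools as mapper and reducer; the location of the streaming jar and the HDFS paths are assumptions that depend on the installation.

    # Sketch of a Hadoop Streaming job with Unix tools as mapper/reducer.
    # The jar path and the HDFS paths are assumptions; adjust them to your setup.
    hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-*.jar \
        -input  /user/student/input \
        -output /user/student/output-streaming \
        -mapper  /bin/cat \
        -reducer /usr/bin/wc

    # Inspect the result once the job has finished:
    hadoop fs -cat /user/student/output-streaming/part-*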

Session D

  • MapReduce details, Java API and Streaming (awk sample)
  • HDFS details, using the web-based GUI for deeper insight
  • Breaking a cluster and healing it again (see the sketch below)
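
The break-and-heal exercise roughly follows the pattern sketched below; the service names assume CDH-style init scripts and may differ on your cluster.

    # Rough sketch of the break-and-heal exercise (CDH-style service names assumed).

    # "Break" the cluster by stopping one DataNode:
    sudo service hadoop-hdfs-datanode stop

    # Observe the effect: a dead DataNode and under-replicated blocks.
    hdfs dfsadmin -report
    hdfs fsck / -blocks

    # "Heal" the cluster by starting the DataNode again:
    sudo service hadoop-hdfs-datanode start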

Session E

  • Intro to Hive and Sqoop
  • Data import via Sqoop
  • Hive scripts (see the sketch below)
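
To illustrate what the Sqoop import and a first Hive query can look like, a minimal sketch follows; the JDBC URL, table name and credentials are made-up placeholders.

    # Minimal sketch; the JDBC URL, table name and credentials are placeholders.
    # Import a relational table into Hadoop and register it as a Hive table:
    sqoop import \
        --connect jdbc:mysql://dbhost/shopdb \
        --username student --password secret \
        --table customers \
        --hive-import

    # Query the imported table from the Hive command line:
    hive -e "SELECT COUNT(*) FROM customers;"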

Session F (optional)

  • Serialisation and deserialisation (SerDe) and user-defined functions (UDFs) with Hive
  • Workflows with Oozie (a short sketch of both topics follows below)
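
As a taste of this optional session, the sketch below shows the typical steps for registering a custom Hive UDF and for submitting an Oozie workflow; the jar, class and file names are hypothetical.

    # Hypothetical jar/class/file names, shown only to illustrate the typical steps.

    # Register a custom UDF packaged in a jar and use it in a query:
    hive -e "ADD JAR /home/student/my-udfs.jar;
             CREATE TEMPORARY FUNCTION to_upper AS 'de.gridka.hive.ToUpper';
             SELECT to_upper(name) FROM customers LIMIT 10;"

    # Submit and run an Oozie workflow described by a job.properties file:
    oozie job -oozie http://localhost:11000/oozie -config job.properties -run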