Difference between revisions of "Hadoop Hands-on"
From Gridkaschool
Revision as of 20:12, 25 August 2012
Tuesday, 28.8.2012, 13:00 - 18:30
Prerequisites
- Basic understanding of Unix/Linux OS management is needed to do the exercises.
- No prior knowledge of Hadoop is required, as we go through the basic concepts.
- A personal notebook is recommended for this workshop.
- If you use Windows, please install PuTTY and VMware Player.
Recommended Material
Books
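- Hadoop: The Definitive Guide [http://www.amazon.de/Hadoop-Definitive-Guide-Tom-White/dp/1449311520/ref=sr_1_fkmr1_1?ie=UTF8&qid=1345918087&sr=8-1-fkmr1]
- Hadoop in Action [http://www.amazon.de/Hadoop-Action-Chuck-Lam/dp/1935182196/ref=sr_1_1?s=books-intl-de&ie=UTF8&qid=1345918219&sr=1-1]
- Data-Intensive Text Processing with MapReduce [http://www.amazon.de/Data-Intensive-Processing-Mapreduce-Author-Paperback/dp/B006V38ZCK/ref=sr_1_2?ie=UTF8&qid=1345918261&sr=8-2]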
Tutorial materials from last year
- http://gridka-school.scc.kit.edu/2011/downloads/Hadoop_tutorial-1-Introduction.pdf
- http://gridka-school.scc.kit.edu/2011/downloads/Hadoop_tutorial-2_4-MapReduce.pdf
- http://gridka-school.scc.kit.edu/2011/downloads/Hadoop_tutorial-5-Pig.pdf
- Hand-outs: http://gridka-school.scc.kit.edu/2011/downloads/Hadoop_tutorial-Hand_outs.pdf
Content
Session A
- The Hadoop ecosystem: HDFS, MapReduce (MR), HUE, Sqoop, Hive, Pig, HBase, Flume, Oozie
- What are CDH and Cloudera Manager?
- Installing, starting, and configuring a small cluster
Session B
- HDFS intro (NameNode, DataNode, Secondary NameNode)
- How is data stored in HDFS?
- Properties and configuration settings relevant to working efficiently with HDFS
- HDFS commands
Session C
- Working with the web-based GUI
- Running and tracking jobs
- Java API and samples
- Streaming API sample
Session D
- MapReduce details: Java API and Streaming (awk sample)
- HDFS details: using the web-based GUI for deeper insights
- Breaking a cluster and healing it
Session E
- Intro to Hive and Sqoop
- Data import via Sqoop
- Hive scripts
Session F (optional)
- Serialisation and deserialisation (SerDe) and user-defined functions (UDFs) with Hive
- Workflows with Oozie