Hadoop Hands-on
Tuesday, 28.8.2012, 13:00 - 18:30
Requirements
Computer
- A personal notebook is recommended for this workshop.
- If you use Windows, please install PuTTY and VMware Player.
Material
- Hadoop: The Definitive Guide [www.amazon.de]
- Hadoop in Action
- Data-Intensive Text Processing with MapReduce
Content
Session A
- The Hadoop ecosystem: HDFS, MapReduce, Hue, Sqoop, Hive, Pig, HBase, Flume, Oozie
- What are CDH and the Cloudera Manager?
- Installation, startup and basic configuration of a small cluster (see the quick check sketched after this list)
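As a rough idea of what a first hands-on check of a freshly installed node could look like, here is a minimal command-line sketch; the exact daemon layout and paths depend on the CDH version and are assumptions.

  # List the Hadoop daemons (NameNode, DataNode, ...) running on this node;
  # run as root so jps can also see the JVMs of the hdfs/mapred service users
  sudo jps

  # Print the installed Hadoop version
  hadoop version

  # Verify that HDFS answers by listing its root directory
  hadoop fs -ls /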
Session B
- HDFS intro (NameNode, DataNode, Secondary NameNode)
- How is data stored in HDFS?
- Properties and configuration settings relevant for working efficiently with HDFS
- HDFS commands (a sample session is sketched after this list)
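A short sketch of the kind of HDFS shell session this part aims at; the file paths and the user name "student" are placeholders, and the HDFS home directory /user/student is assumed to exist.

  # Copy a local file into the user's HDFS home directory
  hadoop fs -put /tmp/sample.txt /user/student/sample.txt

  # List the directory and print the file contents
  hadoop fs -ls /user/student
  hadoop fs -cat /user/student/sample.txt

  # Show how the file is stored: blocks, replication factor and block locations
  hadoop fsck /user/student/sample.txt -files -blocks -locations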
Session C
- Working with the web-based GUI
- Running and tracking jobs
- Java API and samples
- Streaming API sample (a first job is sketched after this list)
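A sketch of how a first job could be run and tracked from the shell; the jar location and the input/output directories are assumptions and differ between CDH layouts.

  # Run the bundled WordCount example (jar path varies with the installation)
  hadoop jar /usr/lib/hadoop/hadoop-examples.jar wordcount input output

  # List running jobs and query the status of a particular one
  hadoop job -list
  hadoop job -status <job_id>

  # Inspect the result files written by the reducers
  hadoop fs -cat output/part-*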
Session D
- MapReduce details, Java API and Streaming (awk sample, sketched after this list)
- HDFS details, using the web-based GUI for deeper insight
- Breaking a cluster and healing it
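A sketch of the awk word-count job with the Streaming API; the streaming jar path and the script file names are assumptions.

  # mapper.awk emits every word with a count of 1:
  #   { for (i = 1; i <= NF; i++) print $i "\t" 1 }
  # reducer.awk sums the counts per word:
  #   { count[$1] += $2 }
  #   END { for (w in count) print w "\t" count[w] }

  hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming.jar \
      -input input -output output-awk \
      -mapper "awk -f mapper.awk" -reducer "awk -f reducer.awk" \
      -file mapper.awk -file reducer.awk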
Session E
- Intro to Hive and Sqoop
- Data import via Sqoop
- Hive scripts (a sketch follows this list)
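A sketch of a Sqoop import followed by a first Hive query; the JDBC URL, table and column names are placeholders.

  # Import a MySQL table into HDFS (connection details are placeholders)
  sqoop import --connect jdbc:mysql://dbhost/gks --table customers \
      --username student -P --target-dir /user/student/customers -m 1

  # Define a Hive table on top of the imported data and query it
  hive -e "CREATE EXTERNAL TABLE customers (id INT, name STRING)
           ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
           LOCATION '/user/student/customers';
           SELECT COUNT(*) FROM customers;"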
Session F (optional)
- Serialisation and deserialisation (SerDe) and user-defined functions (UDF) with Hive (see the sketch after this list)
- Workflows with Oozie
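A sketch of how a custom Hive UDF could be registered and an Oozie workflow submitted; the jar file, class name, Oozie URL and properties file are assumptions.

  # Register a UDF packaged in a jar and use it in a query
  hive -e "ADD JAR /home/student/my-udfs.jar;
           CREATE TEMPORARY FUNCTION to_upper AS 'de.gridka.hive.ToUpper';
           SELECT to_upper(name) FROM customers;"

  # Submit and start a workflow on the Oozie server
  oozie job -oozie http://localhost:11000/oozie -config job.properties -run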