Hadoop Hands-on


Tuesday, 28.8.2012, 13:00 - 18:30

Preparation

Important Information

  • For this workshop, a personal notebook is recommended.
  • If you use Windows:
    • please prepare the program "PuTTY" for this workshop: http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe
    • please prepare the VMware Player

Content

Session A

  • The Hadoop ecosystem: HDFS, MR, HUE, Sqoop, Hive, Pig, HBase, Flume, Oozie
  • What are CDH and the Cloudera Manager?
  • Installation, start-up, and basic configuration of a small cluster

Session B

  • HDFS intro (NameNode, DataNode, Secondary NameNode)
  • How is data stored in HDFS?
  • Properties and configuration settings relevant for working efficiently with HDFS
  • HDFS commands (a programmatic counterpart is sketched after this list)
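
As a complement to the shell commands, the same operations can also be driven from Java through the FileSystem API. The following is only a minimal sketch, assuming that core-site.xml is on the classpath; /user/student/demo is a placeholder path, and the code mirrors the mkdir, put and ls commands.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsTour {
      public static void main(String[] args) throws Exception {
          // Picks up fs.default.name / fs.defaultFS from core-site.xml on the classpath.
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);

          // Equivalent of: hadoop fs -mkdir /user/student/demo
          Path dir = new Path("/user/student/demo");   // placeholder path
          fs.mkdirs(dir);

          // Equivalent of: hadoop fs -put (here a small file is written directly)
          Path file = new Path(dir, "hello.txt");
          FSDataOutputStream out = fs.create(file, true);
          out.writeUTF("Hello HDFS");
          out.close();

          // Equivalent of: hadoop fs -ls /user/student/demo
          for (FileStatus st : fs.listStatus(dir)) {
              System.out.println(st.getPath() + "  size=" + st.getLen()
                      + "  replication=" + st.getReplication()
                      + "  blocksize=" + st.getBlockSize());
          }
          fs.close();
      }
  }

The replication factor and block size printed per file tie directly back to the question of how data is stored in HDFS.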

Session C

  • Working with the web-based GUI
  • Running and tracking jobs
  • Java API and samples (see the word-count sketch after this list)
  • Streaming API sample
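
For the Java API part, a compact word-count job against the org.apache.hadoop.mapreduce API is a typical kind of sample. This is a sketch only: input and output paths come from the command line, and the exact job-setup call differs slightly between Hadoop releases (see the comment in main).

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class WordCount {

      // Mapper: emit (word, 1) for every token of the input line.
      public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
          private static final IntWritable ONE = new IntWritable(1);
          private final Text word = new Text();

          @Override
          protected void map(LongWritable key, Text value, Context ctx)
                  throws IOException, InterruptedException {
              for (String token : value.toString().split("\\s+")) {
                  if (!token.isEmpty()) {
                      word.set(token);
                      ctx.write(word, ONE);
                  }
              }
          }
      }

      // Reducer (also used as combiner): sum the counts per word.
      public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
          @Override
          protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                  throws IOException, InterruptedException {
              int sum = 0;
              for (IntWritable v : values) {
                  sum += v.get();
              }
              ctx.write(key, new IntWritable(sum));
          }
      }

      public static void main(String[] args) throws Exception {
          // On older releases, use: Job job = new Job(new Configuration(), "word count");
          Job job = Job.getInstance(new Configuration(), "word count");
          job.setJarByClass(WordCount.class);
          job.setMapperClass(TokenMapper.class);
          job.setCombinerClass(SumReducer.class);
          job.setReducerClass(SumReducer.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
          FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }

Packaged into a jar, the job is submitted with "hadoop jar wordcount.jar WordCount <input> <output>" and can then be followed in the job tracking GUI.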

Session D

  • MapReduce details, Java API and Streaming (awk sample)
  • HDFS details, using the web-based GUI for deeper insights (block placement can also be inspected programmatically, as sketched after this list)
  • Breaking a cluster and healing it
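
Block placement, which the web GUI visualises per file, can also be queried from Java. A minimal sketch, where the file path is passed as an argument:

  import java.util.Arrays;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.BlockLocation;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class BlockReport {
      public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());
          FileStatus status = fs.getFileStatus(new Path(args[0])); // a file in HDFS

          // One BlockLocation per block, listing the DataNodes that hold a replica.
          BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
          for (BlockLocation block : blocks) {
              System.out.println("offset=" + block.getOffset()
                      + " length=" + block.getLength()
                      + " hosts=" + Arrays.toString(block.getHosts()));
          }
          fs.close();
      }
  }

Watching the host lists change after stopping a DataNode is a simple way to see the cluster healing itself through re-replication.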

Session E

  • Intro to Hive and Sqoop
  • Data import via Sqoop
  • Hive scripts (a query from Java via JDBC is sketched after this list)
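
Besides the hive shell, HiveQL can also be run from Java over JDBC. This is a sketch under assumptions: the driver class and JDBC URL depend on the Hive/HiveServer version in use (a HiveServer2 endpoint on localhost is assumed here), and "customers" stands in for a table imported earlier with Sqoop.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class HiveQuery {
      public static void main(String[] args) throws Exception {
          // Assumption: HiveServer2 with the org.apache.hive.jdbc driver;
          // older HiveServer setups use a different driver class and URL scheme.
          Class.forName("org.apache.hive.jdbc.HiveDriver");
          Connection conn = DriverManager.getConnection(
                  "jdbc:hive2://localhost:10000/default", "student", "");

          Statement stmt = conn.createStatement();
          // "customers" is a placeholder for a table created by the Sqoop import.
          ResultSet rs = stmt.executeQuery(
                  "SELECT country, COUNT(*) FROM customers GROUP BY country");
          while (rs.next()) {
              System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
          }
          rs.close();
          stmt.close();
          conn.close();
      }
  }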

Session F (optional)

  • Serialisation and deserialisation (SerDe) and user-defined functions (UDF) with Hive (a minimal UDF is sketched after this list)
  • Workflows with Oozie
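
A user-defined function of the kind built in this session can be as small as the following sketch (class, function and table names are arbitrary examples): it lower-cases a string column and is registered from the hive shell.

  import org.apache.hadoop.hive.ql.exec.UDF;
  import org.apache.hadoop.io.Text;

  // Trivial Hive UDF that lower-cases a string column.
  // Registered in Hive roughly like this:
  //   ADD JAR my-udfs.jar;
  //   CREATE TEMPORARY FUNCTION to_lower AS 'LowerCaseUDF';
  //   SELECT to_lower(name) FROM some_table;
  public class LowerCaseUDF extends UDF {
      public Text evaluate(Text input) {
          if (input == null) {
              return null;
          }
          return new Text(input.toString().toLowerCase());
      }
  }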