Hadoop Hands-on

From Gridkaschool
Revision as of 20:09, 25 August 2012 by Kamir1604 (talk | contribs)

Tuesday, 28.8.2012, 13:00 - 18:30

Prerequisites

  • A basic understanding of Unix/Linux system administration is needed for the exercises.
  • No prior knowledge of Hadoop is required, as we will go through the basic concepts.
  • A personal notebook is recommended for this workshop.
  • If you use Windows: please install "PuTTY" and the VMware Player.



Recommended Material

Books

Content

Session A

  • The Hadoop ecosystem: HDFS, MapReduce, HUE, Sqoop, Hive, Pig, HBase, Flume, Oozie
  • What are CDH and Cloudera Manager?
  • Installation, startup, and basic configuration of a small cluster
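
Basic configuration happens in a handful of XML files. A minimal sketch of `core-site.xml`, which tells every daemon and client where the default filesystem lives (the hostname "namenode" is a placeholder; the property name applies to Hadoop 2 / CDH4-era clusters):

```xml
<!-- core-site.xml: minimal sketch; "namenode" is a placeholder hostname -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:8020</value>
  </property>
</configuration>
```

Cloudera Manager generates and distributes these files for you; editing them by hand is mainly useful for understanding what the manager does.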

Session B

  • HDFS intro (Name Node, Data Node, Secondary Name Node)
  • How is data stored in HDFS?
  • Properties and configuration settings relevant for working efficiently with HDFS
  • HDFS commands
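
To make "how is data stored in HDFS?" concrete, a small back-of-envelope calculation. It assumes the classic defaults of a 64 MB block size and a replication factor of 3; both are cluster-configurable (`dfs.blocksize`, `dfs.replication`):

```python
import math

def hdfs_block_usage(file_size_mb, block_size_mb=64, replication=3):
    """Number of HDFS blocks and raw disk usage for a single file."""
    # A file is split into fixed-size blocks; the last one may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # A partial block only consumes its actual bytes, so raw usage
    # is simply file size times the replication factor.
    raw_usage_mb = file_size_mb * replication
    return blocks, raw_usage_mb

print(hdfs_block_usage(200))   # a 200 MB file -> (4, 600)
```

So a 200 MB file occupies 4 blocks and 600 MB of raw cluster disk, which is why HDFS favours few large files over many small ones (every block also costs Name Node memory).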

Session C

  • Working with the web-based GUI
  • Running and tracking jobs
  • Java-API and samples
  • Streaming API sample
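
The Streaming API lets any executable that reads stdin and writes stdout act as mapper or reducer. A hedged word-count sketch in Python; on a cluster the two functions would be split into a mapper and a reducer script and each fed `sys.stdin` line by line:

```python
def mapper(lines):
    # Emit "word\t1" for every word in the input.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    # Streaming delivers reducer input sorted by key, so equal
    # words arrive on adjacent lines and can be summed in one pass.
    current, count = None, 0
    for line in lines:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                yield f"{current}\t{count}"
            current, count = word, 0
        count += int(n)
    if current is not None:
        yield f"{current}\t{count}"

if __name__ == "__main__":
    # Local demo; sorted() stands in for the cluster's shuffle phase.
    for out in reducer(sorted(mapper(["the cat", "the dog"]))):
        print(out)
```

Submitted roughly via the streaming jar with `-input`, `-output`, `-mapper`, and `-reducer` options; the exact jar path depends on the installation.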

Session D

  • MapReduce details, Java API and Streaming (awk sample)
  • HDFS details, using the web-based GUI for deeper insights
  • Breaking a cluster and healing it
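
A key MapReduce detail is the shuffle phase between map and reduce: all pairs are partitioned by key, sorted, and grouped before any reducer runs. A minimal local simulation of that step (on a real cluster the keys are additionally hash-partitioned across reducers):

```python
from itertools import groupby

def shuffle(pairs):
    """Simulate the MapReduce shuffle: sort by key, then group values."""
    by_key = sorted(pairs, key=lambda kv: kv[0])
    return {key: [v for _, v in group]
            for key, group in groupby(by_key, key=lambda kv: kv[0])}

print(shuffle([("b", 1), ("a", 1), ("b", 1)]))   # {'a': [1], 'b': [1, 1]}
```

This sorted grouping is why a streaming reducer can rely on equal keys arriving on consecutive input lines.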

Session E

  • Intro to Hive and Sqoop
  • Data import via Sqoop
  • Hive scripts
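
A typical pattern: Sqoop lands a relational table as delimited files in HDFS (via `sqoop import` with `--connect`, `--table`, and `--target-dir`), and an external Hive table then reads those files in place. A hedged HiveQL sketch; table and column names are hypothetical:

```sql
-- "customers" and its columns are illustrative, not from the workshop.
CREATE EXTERNAL TABLE customers (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/user/hive/customers';

-- Hive compiles this into MapReduce jobs behind the scenes.
SELECT name, COUNT(*) FROM customers GROUP BY name;
```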

Session F (optional)

  • Serialisation and deserialisation (SerDe) and user-defined functions (UDFs) with Hive
  • Workflows with Oozie
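
An Oozie workflow is a DAG of actions described in XML. A hedged skeleton showing the overall shape (names, the schema version, and the parameterised `${jobTracker}`/`${nameNode}` properties are illustrative):

```xml
<!-- Hedged sketch of a one-action Oozie workflow; details are illustrative. -->
<workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.2">
  <start to="count"/>
  <action name="count">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>MapReduce action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Each action declares where to go on success (`ok`) and on failure (`error`), which is how Oozie chains MapReduce, Hive, and Sqoop steps into one workflow.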