Hadoop Hands-on

From Gridkaschool
Revision as of 19:56, 25 August 2012 by Kamir1604 (talk | contribs)
Jump to navigationJump to search
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

28.8.2012 – 13:30

Session A

  • The hadoop ecosystem: HDFS, MR, HUE, Sqoop, Hive, Pig, HBase, Flume, Oozie
  • What is CDH and the Cloudera-Manager?
  • Installation, starting and basic configurations of a small cluster

Session B

  • HDFS intro (Name Node, Data Node, Secondary Name Node)
  • How is data stored in HDFS?
  • Properties and configurations, relevant for efficient working with HDFS.
  • HDFS commands

Session C

  • Working with the webbased-GUI
  • Running and tracking jobs
  • Java-API and samples
  • Streaming API sample

Session D

  • Map Reduce details, Java-API and Streaming (awk sample)
  • HDFS details, using the webbased-GUI for deeper insights
  • Breaking down a cluster and heal it

Session E

  • Intro to Hive and Sqoop
  • Dataimport via Sqoop
  • Hive scripts

Session F (optional)

  • Serialisation / Deserialisation and user defined functions with Hive
  • Workflows with oozie