Hadoop Hands-on

From Gridkaschool
Revision as of 19:56, 25 August 2012

28.8.2012 – 13:30

Session A

  • The Hadoop ecosystem: HDFS, MapReduce, Hue, Sqoop, Hive, Pig, HBase, Flume, Oozie
  • What are CDH and Cloudera Manager?
  • Installation, startup and basic configuration of a small cluster

Session B

  • HDFS introduction (NameNode, DataNode, Secondary NameNode)
  • How is data stored in HDFS?
  • Properties and configuration settings relevant to working efficiently with HDFS
  • HDFS commands
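As a rough illustration of how HDFS stores data, the following Python sketch splits a file into fixed-size blocks and computes the raw disk space consumed once every block is replicated. It is a back-of-the-envelope model, not HDFS code: the block size and replication factor are plain parameters here (on a cluster they come from the `dfs.blocksize` / `dfs.block.size` and `dfs.replication` properties), the function names are hypothetical, and the placement of replicas on DataNodes is not modelled.

```python
from typing import List, Tuple

MB = 1024 * 1024

def plan_blocks(file_size: int, block_size: int) -> List[Tuple[int, int]]:
    """Split a file of file_size bytes into (offset, length) blocks.

    Every block is block_size bytes except possibly the last one, which
    only holds the remaining bytes (the final block is not padded).
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

def raw_storage(file_size: int, replication: int) -> int:
    """Bytes used across all DataNodes when every block is stored `replication` times."""
    if replication <= 0:
        raise ValueError("replication must be > 0")
    return file_size * replication

if __name__ == "__main__":
    size = 200 * MB
    for offset, length in plan_blocks(size, 64 * MB):
        print(f"block at {offset // MB:>4} MB, length {length // MB} MB")
    print(f"raw storage with replication 3: {raw_storage(size, 3) // MB} MB")
```

For example, a 200 MB file with 64 MB blocks yields four blocks, the last one only 8 MB long, and with a replication factor of 3 it occupies 600 MB of raw disk space across the cluster.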

Session C

  • Working with the web-based GUI
  • Running and tracking jobs
  • Java API and samples
  • Streaming API sample
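To illustrate the Streaming API, here is a minimal word-count sketch in Python (the session's own sample may use a different script). Hadoop Streaming runs the mapper and reducer as ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output. Between the two stages the framework sorts records by key, so the reducer sees all counts for a given word consecutively. The script name `wc.py` and its `map`/`reduce` arguments are illustrative choices, not part of Hadoop.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator

def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit "word<TAB>1" for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; expects input already sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

def run(stage: str) -> int:
    """Run one stage over stdin/stdout and return a process exit code."""
    if stage == "map":
        records = mapper(sys.stdin)
    elif stage == "reduce":
        records = reducer(sys.stdin)
    else:
        print("usage: wc.py map|reduce", file=sys.stderr)
        return 2
    for record in records:
        print(record)
    return 0

if __name__ == "__main__" and len(sys.argv) == 2:
    # Act only when invoked as "wc.py map" or "wc.py reduce".
    sys.exit(run(sys.argv[1]))
```

The pipeline can be tried locally without a cluster, with `sort` standing in for the shuffle phase: `cat input.txt | python wc.py map | sort | python wc.py reduce`. On a cluster, the same script would be handed to the streaming jar through its `-mapper`, `-reducer` and `-file` options; the jar's location depends on the installation.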

Session D

  • MapReduce details, Java API and Streaming (awk sample)
  • HDFS details, using the web-based GUI for deeper insight
  • Breaking a cluster and healing it

Session E

  • Introduction to Hive and Sqoop
  • Data import via Sqoop
  • Hive scripts

Session F (optional)

  • Serialisation/deserialisation (SerDe) and user-defined functions with Hive
  • Workflows with Oozie