Hadoop Hands-on
Revision as of 19:56, 25 August 2012
28.8.2012 – 13:30
Session A
- The Hadoop ecosystem: HDFS, MR, HUE, Sqoop, Hive, Pig, HBase, Flume, Oozie
- What are CDH and Cloudera Manager?
- Installation, startup, and basic configuration of a small cluster
Session B
- HDFS intro (NameNode, DataNode, Secondary NameNode)
- How is data stored in HDFS?
- Properties and configuration options relevant for working efficiently with HDFS
- HDFS commands
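The HDFS commands covered in this session can be tried from any cluster node. A minimal sketch, assuming a reachable HDFS cluster; the paths and file names below are made-up examples:

```shell
# Assumes a reachable HDFS cluster; all paths are made-up examples.
hadoop fs -mkdir /user/student/data                      # create a directory in HDFS
hadoop fs -put localfile.txt /user/student/data/         # upload a local file
hadoop fs -ls /user/student/data                         # list the directory
hadoop fs -cat /user/student/data/localfile.txt          # print file contents
hadoop fs -setrep -w 2 /user/student/data/localfile.txt  # change the replication factor
hadoop dfsadmin -report                                  # capacity and DataNode status
hadoop fs -rmr /user/student/data                        # recursive delete (clean-up)
```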
Session C
- Working with the web-based GUI
- Running and tracking jobs
- Java API and samples
- Streaming API sample
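Submitting and tracking a job from the shell can be sketched as follows; the examples jar ships with Hadoop/CDH, but its exact path, the input/output directories, and the job id are made-up placeholders:

```shell
# Assumes a running MapReduce cluster; jar path and directories are examples.
hadoop jar /usr/lib/hadoop/hadoop-examples.jar wordcount input out
hadoop job -list                  # running jobs with their ids
JOB_ID=job_XXXX                   # substitute an id printed by -list
hadoop job -status "$JOB_ID"      # completion state and counters
hadoop fs -cat out/part-r-00000   # read the result (part-file name varies by version)
```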
Session D
- MapReduce details, Java API and Streaming (awk sample)
- HDFS details, using the web-based GUI for deeper insight
- Breaking a cluster and healing it
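The streaming model can be dry-run locally without a cluster: the mapper and reducer are plain awk scripts reading stdin and writing tab-separated key/value pairs, and `sort` stands in for the shuffle phase. A word-count sketch with made-up sample input:

```shell
# Local dry run of a Hadoop Streaming word count; sort stands in for the
# shuffle phase that the framework would otherwise provide.
printf 'hello hadoop\nhello hdfs\n' > input.txt

# Mapper: emit one "word<TAB>1" pair per token on stdout.
# Reducer: sum the counts per word (input arrives grouped by key).
awk '{ for (i = 1; i <= NF; i++) print $i "\t1" }' input.txt \
  | sort \
  | awk -F'\t' '{ count[$1] += $2 }
                END { for (w in count) print w "\t" count[w] }' \
  | sort
# prints (tab-separated): hadoop 1, hdfs 1, hello 2
```

On the cluster, the same two scripts would be passed to the streaming jar as `-mapper` and `-reducer` (the jar's path varies by installation).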
Session E
- Intro to Hive and Sqoop
- Data import via Sqoop
- Hive scripts
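The import-then-query flow of this session can be sketched as below; the JDBC host, database, credentials, table, and column names are all made-up examples, and a reachable MySQL server plus configured Sqoop and Hive installations are assumed:

```shell
# Assumes reachable MySQL and configured Sqoop/Hive; all names are examples.
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username student --password secret \
  --table orders \
  --hive-import --hive-table orders

# A Hive script run from a file:
cat > orders.hql <<'EOF'
SELECT customer_id, COUNT(*) AS n_orders
FROM orders
GROUP BY customer_id;
EOF
hive -f orders.hql
```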
Session F (optional)
- Serialisation/deserialisation and user-defined functions with Hive
- Workflows with Oozie
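Running a workflow from the Oozie CLI can be sketched as follows; the server host and port, NameNode/JobTracker addresses, HDFS application path, and job id are made-up examples, and a `workflow.xml` already uploaded to that HDFS path is assumed:

```shell
# Assumes an Oozie server at oozieserver:11000 and a deployed workflow.xml;
# all hosts, ports, and paths are made-up examples.
cat > job.properties <<'EOF'
nameNode=hdfs://namenode:8020
jobTracker=jobtracker:8021
oozie.wf.application.path=${nameNode}/user/student/wordcount-wf
EOF
oozie job -oozie http://oozieserver:11000/oozie -config job.properties -run
JOB_ID=0000001-XXXX-oozie-W        # substitute the id printed by -run
oozie job -oozie http://oozieserver:11000/oozie -info "$JOB_ID"
```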