Design and Deployment of a Sharded Cluster for the KASCADE Cosmic-ray Data Centre

From Lsdf
Revision as of 15:23, 9 February 2016 by M Szuba.

Description

KASCADE Cosmic-ray Data Centre (KCDC) makes the data of the astroparticle-physics experiment KASCADE publicly available. The system will eventually hold over 20 TB of data, or nearly half a billion events. Since 2015 it has been using the NoSQL database MongoDB as its storage back-end.
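For a sense of scale, the figures above give the average event size and, assuming MongoDB's default chunk size of 64 MB at the time of writing (the value is configurable), the approximate number of chunks the full data set would occupy if it were held in sharded collections. Both numbers are rough estimates: they use decimal terabytes and ignore indexes and storage compression.

```latex
\[
\bar{s} \approx \frac{20\,\mathrm{TB}}{5 \times 10^{8}\ \text{events}}
  = \frac{2 \times 10^{13}\,\mathrm{B}}{5 \times 10^{8}}
  = 4 \times 10^{4}\,\mathrm{B} \approx 40\,\mathrm{kB\ per\ event},
\qquad
N_{\text{chunks}} \approx \frac{2 \times 10^{13}\,\mathrm{B}}{64\,\mathrm{MiB}}
  = \frac{2 \times 10^{13}}{6.71 \times 10^{7}} \approx 3 \times 10^{5}.
\]
```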

The goal of this project is to migrate the KCDC database from a single server to a sharded (partitioned) cluster. In particular, you will be required to evaluate candidate shard keys for the partitioned collections and select the optimal ones.
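Before committing to a key, candidates can be profiled on a sample of documents. The sketch below is a minimal, hypothetical helper in plain Python (no MongoDB connection needed) that checks three properties commonly weighed when choosing a shard key: cardinality, the share of the most frequent value, and monotonicity. The field names in the example (`event_id`, `run`, `station`) are invented for illustration and are not taken from the KCDC schema.

```python
from __future__ import annotations

from collections import Counter
from typing import Any, Iterable, Mapping


def profile_candidate(docs: Iterable[Mapping[str, Any]], field: str) -> dict[str, Any]:
    """Summarise how suitable `field` is as a shard key.

    Assumes every document carries `field` (a KeyError is raised otherwise)
    and that the documents are given in insertion order.
    """
    values = [doc[field] for doc in docs]
    if not values:
        raise ValueError("need at least one document")
    counts = Counter(values)
    return {
        # Distinct values bound the number of chunks the data can be split into.
        "cardinality": len(counts),
        # Share of the most common value: documents sharing one key value can
        # never be split across chunks, so a large share risks "jumbo" chunks.
        "max_share": max(counts.values()) / len(values),
        # Non-decreasing in insertion order? Then, with ranged sharding,
        # every new insert targets the chunk holding the current maximum.
        "monotonic": all(a <= b for a, b in zip(values, values[1:])),
    }


if __name__ == "__main__":
    # Synthetic sample with hypothetical field names, in insertion order.
    sample = [
        {"event_id": i, "run": i // 1000, "station": i % 7} for i in range(5000)
    ]
    for field in ("event_id", "run", "station"):
        print(field, profile_candidate(sample, field))
```

In this synthetic sample, `event_id` has the highest possible cardinality but increases monotonically, so with ranged sharding all new inserts would land on the chunk holding the current maximum; this is the usual argument for a hashed shard key on such a field. `run` (five values, 20 % of documents each) and `station` (seven values) are too coarse to be split into more than a handful of chunks on their own, although either might still serve as the prefix of a compound key.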

This is a joint project between the Steinbuch Centre for Computing (SCC) and the Institute for Nuclear Physics (IKP).

Tasks

  • analyse common KCDC workflows from the point of view of database operations
  • identify candidates for shard keys
  • deploy a MongoDB cluster and activate sharding
  • evaluate the performance of the sharded cluster
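Activating sharding comes down to a few admin commands issued through a `mongos` router. The following is a minimal sketch, assuming PyMongo-style access where the admin database object exposes a `command()` method; the database name `kcdc`, the collection `events` and the shard key used below are hypothetical placeholders, not KCDC's actual schema. Building the command documents separately from sending them makes it easy to review or test the sharding plan before touching the cluster.

```python
from __future__ import annotations

from typing import Any, Protocol


class AdminDatabase(Protocol):
    """Anything with a PyMongo-style command() method, e.g. MongoClient(...).admin."""

    def command(self, command: Any, *args: Any, **kwargs: Any) -> Any: ...


def sharding_commands(
    db_name: str, shard_keys: dict[str, dict[str, Any]]
) -> list[dict[str, Any]]:
    """Build the admin commands that enable sharding for `db_name` and
    shard each listed collection on its key specification.

    The command name must be the first key of each document; Python dicts
    preserve insertion order, so plain dicts are sufficient here.
    """
    commands: list[dict[str, Any]] = [{"enableSharding": db_name}]
    for collection, key in shard_keys.items():
        commands.append({"shardCollection": f"{db_name}.{collection}", "key": key})
    return commands


def apply_commands(admin: AdminDatabase, commands: list[dict[str, Any]]) -> None:
    """Send each command, in order, to the admin database of a mongos router."""
    for cmd in commands:
        admin.command(cmd)
```

Against a live cluster one would call something like `apply_commands(MongoClient("mongodb://mongos-host:27017").admin, sharding_commands("kcdc", {"events": {"event_id": "hashed"}}))`, with `MongoClient` imported from `pymongo` and the host name again a placeholder. If a collection already holds data, an index supporting the chosen shard key has to exist before `shardCollection` is run.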

The project can be extended with additional goals to meet the requirements of a Master's thesis.

Requirements

  • familiarity with MongoDB and sharding
  • basic administrator-level knowledge of Linux
  • knowledge of Python, JavaScript (Node.js) or another cross-platform scripting language would be an asset

Contact

Marek.Szuba@kit.edu - 29178

Doris.Wochele@kit.edu - 22418