Design and Deployment of a Sharded Cluster for the KASCADE Cosmic-ray Data Centre

From Lsdf
Revision as of 15:23, 9 February 2016 by M Szuba


Description

KASCADE Cosmic-ray Data Centre (KCDC) makes publicly available the data from the astroparticle-physics experiment KASCADE. The system will eventually hold over 20 TB of data, or nearly half a billion events. Since 2015 it has been using the NoSQL database MongoDB as its storage back-end.

The goal of this project is to assist in migrating the KCDC database from a single server to a sharded (partitioned) cluster. In particular, you will be required to select and evaluate suitable shard keys for the partitioned collections.
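
As a rough illustration of what evaluating a shard key can involve, the Python sketch below compares how evenly two candidate keys would spread synthetic events across shards. The field names `event_id` and `run`, the number of shards, and the use of an MD5-based hash as a stand-in for MongoDB's hashed sharding are all assumptions made for this example; they are not taken from the KCDC schema, and a real evaluation would also have to consider query patterns and chunk migration.

```python
import hashlib
from collections import Counter


def shard_of(value, n_shards):
    """Map a key value to a shard index with a stable hash.

    This is a simplified stand-in for hashed sharding, not MongoDB's
    actual hashing or chunk-assignment logic.
    """
    digest = hashlib.md5(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


def distribution(values, n_shards):
    """Return the number of documents that land on each shard."""
    counts = Counter(shard_of(v, n_shards) for v in values)
    return [counts.get(i, 0) for i in range(n_shards)]


def imbalance(counts):
    """Ratio of the fullest shard to the mean load; 1.0 is perfectly even."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


# Synthetic events with hypothetical fields: a unique event_id and a
# low-cardinality run number (only four distinct values).
N_SHARDS = 8
events = [{"event_id": i, "run": i // 5000} for i in range(20000)]

by_event_id = distribution([e["event_id"] for e in events], N_SHARDS)
by_run = distribution([e["run"] for e in events], N_SHARDS)

print("event_id:", by_event_id, "imbalance = %.2f" % imbalance(by_event_id))
print("run:     ", by_run, "imbalance = %.2f" % imbalance(by_run))
```

In this sketch the high-cardinality key spreads documents almost evenly, whereas the key with only four distinct values can occupy at most four of the eight shards, so at least half of the cluster would stay empty.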

This is a joint project between the Steinbuch Centre for Computing (SCC) and the Institute for Nuclear Physics (IKP).

Tasks

  • analyse common workflows of KCDC from the point of view of database operations
  • identify candidates for shard keys
  • deploy a MongoDB cluster and activate sharding
  • evaluate performance
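
The step of activating sharding could look roughly like the following Python sketch. It assumes a client object that behaves like a `pymongo.MongoClient` connected to a `mongos` router and issues MongoDB's `enableSharding` and `shardCollection` administrative commands through it. The database name `kcdc`, the collection name `events`, the shard key `event_id`, and the `RecordingClient` dry-run helper are hypothetical and serve only to make the example runnable without a cluster.

```python
def activate_sharding(client, database, collection, shard_key):
    """Enable sharding for a database and shard one of its collections.

    `client` is expected to behave like a pymongo.MongoClient connected
    to a mongos router; only `client.admin.command(...)` is used here.
    """
    namespace = f"{database}.{collection}"
    # Allow collections in this database to be sharded.
    client.admin.command("enableSharding", database)
    # Partition the collection using the chosen shard key.
    client.admin.command("shardCollection", namespace, key=shard_key)
    return namespace


class _RecordingAdmin:
    """Dry-run stand-in for `client.admin` that only records commands."""

    def __init__(self):
        self.calls = []

    def command(self, name, value=1, **kwargs):
        self.calls.append((name, value, kwargs))
        return {"ok": 1}


class RecordingClient:
    """Hypothetical dry-run client; replace with a real MongoClient."""

    def __init__(self):
        self.admin = _RecordingAdmin()


# Dry run with hypothetical names; nothing is sent to a server.
client = RecordingClient()
activate_sharding(client, "kcdc", "events", {"event_id": "hashed"})
for call in client.admin.calls:
    print(call)
```

Against a real deployment, `RecordingClient()` would be replaced by a `pymongo.MongoClient` pointing at the cluster's `mongos` instance, and the shard key would be whichever candidate came out best in the evaluation.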

The project can be extended with additional goals to meet the requirements of a Master's thesis.

Requirements

  • familiarity with MongoDB and sharding
  • basic administrator-level knowledge of Linux
  • knowledge of Python, JavaScript (Node.js) or another cross-platform scripting language would be an asset

Contact

Marek.Szuba@kit.edu - 29178

Doris.Wochele@kit.edu - 22418