Difference between revisions of "Design and Deployment of a Sharded Cluster for the KASCADE Cosmic-ray Data Centre"

(Created page with "Back to the list of topics = Description = [https://kcdc.ikp.kit.edu/ KASCADE Cosmic-ray Data Centre] (KCDC) makes publicly available the data f...")
 
(Replaced content with "{{db|1=topic exists no longer}}")
 
Line 1: Line 1:
  +
{{db|1=topic exists no longer}}
[[Studentische_Arbeiten_am_SCC|Back to the list of topics]]
 
 
= Description =
 
[https://kcdc.ikp.kit.edu/ KASCADE Cosmic-ray Data Centre] (KCDC) makes publicly available the data from the astroparticle-physics experiment KASCADE. The system will eventually hold over 20 TB of data, or nearly half a billion events. Since 2015 it has been using the NoSQL database MongoDB as its storage back-end.
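
As a rough sense of scale, the figures above imply an average event size of about 40 kB. The short sketch below works this out. It assumes decimal terabytes and treats "over 20 TB" and "nearly half a billion" as the round numbers 20 TB and 5 × 10^8 events; the shard count of 4 is purely hypothetical and not taken from the project.

```python
# Back-of-the-envelope sizing from the figures quoted above.
TOTAL_BYTES = 20 * 10**12      # assumption: 20 TB in decimal units
TOTAL_EVENTS = 500_000_000     # assumption: "nearly half a billion", rounded
N_SHARDS = 4                   # hypothetical shard count, not from the project

avg_event_bytes = TOTAL_BYTES / TOTAL_EVENTS
bytes_per_shard = TOTAL_BYTES / N_SHARDS
events_per_shard = TOTAL_EVENTS // N_SHARDS

print(f"average event size: {avg_event_bytes / 1000:.0f} kB")
print(f"per shard: {bytes_per_shard / 10**12:.0f} TB, {events_per_shard:,} events")
```

These numbers only hold if data is spread evenly, which is what a well-chosen shard key is meant to achieve.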
 
 
The goal of this project is to assist in migrating the KCDC database from a single server to a sharded (partitioned) cluster. In particular, you will be required to select and evaluate optimal shard keys for partitioned collections.
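
One way to get a first impression of a shard-key candidate is to check its cardinality and how evenly its values would spread over the shards. The following is a minimal sketch of that idea on synthetic data, not a description of the project's actual method. The field names (<code>event_id</code>, <code>run</code>, <code>detector</code>) are hypothetical and do not reflect the real KCDC schema, and the MD5-modulo mapping is only a stand-in for MongoDB's own hashed-key chunk assignment.

```python
import hashlib
import random
from collections import Counter


def shard_for(value, n_shards):
    """Map a key value to a shard via a stable hash (stand-in, not MongoDB's own hash)."""
    digest = hashlib.md5(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


def evaluate_key(docs, field, n_shards):
    """Return (cardinality, imbalance) for one candidate field.

    imbalance = document count of the fullest shard / ideal even share;
    1.0 means a perfectly even spread.
    """
    values = [doc[field] for doc in docs]
    counts = Counter(shard_for(v, n_shards) for v in values)
    ideal = len(values) / n_shards
    return len(set(values)), max(counts.values()) / ideal


def make_events(n, seed=0):
    """Synthetic events with hypothetical fields; not the real KCDC schema."""
    rng = random.Random(seed)
    return [
        {"event_id": i, "run": rng.randrange(50), "detector": rng.choice(["A", "B"])}
        for i in range(n)
    ]


if __name__ == "__main__":
    events = make_events(10_000)
    for field in ("event_id", "run", "detector"):
        cardinality, imbalance = evaluate_key(events, field, n_shards=4)
        print(f"{field:10s} cardinality={cardinality:6d} imbalance={imbalance:.2f}")
```

A low-cardinality field such as the two-valued <code>detector</code> can never fill more than two shards, so its imbalance is at least 2.0 with four shards. Even distribution is only one criterion: a real evaluation would also have to consider which fields the common queries filter on.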
 
 
This is a joint project between the Steinbuch Centre for Computing (SCC) and the Institute for Nuclear Physics (IKP).
 
 
= Tasks =
 
* analyse common workflows of KCDC from the point of view of database operations
 
* identify candidates for shard keys
 
* deploy a MongoDB cluster and activate sharding
 
* evaluate performance
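
To illustrate the "activate sharding" step, the sketch below builds the two MongoDB administrative commands involved, <code>enableSharding</code> and <code>shardCollection</code>, as plain Python dictionaries. The database, collection and key names (<code>kcdc</code>, <code>events</code>, <code>event_id</code>) are placeholders chosen for illustration, and the helper function itself is hypothetical. Whether <code>enableSharding</code> is still required depends on the MongoDB version in use.

```python
def sharding_commands(database, collection, shard_key_field, hashed=True):
    """Build the admin commands that enable sharding and shard one collection.

    The command name must be the first key of each document; Python dicts
    preserve insertion order, so the literals below satisfy that.
    """
    key_spec = {shard_key_field: "hashed" if hashed else 1}
    return [
        {"enableSharding": database},
        {"shardCollection": f"{database}.{collection}", "key": key_spec},
    ]


if __name__ == "__main__":
    # Hypothetical names: "kcdc", "events" and "event_id" are placeholders.
    for cmd in sharding_commands("kcdc", "events", "event_id"):
        print(cmd)
    # With a running cluster and the PyMongo driver installed, each command
    # could be sent through a mongos router, for example:
    #   from pymongo import MongoClient
    #   client = MongoClient("mongodb://localhost:27017")  # assumed mongos address
    #   client.admin.command(cmd)
```

Setting up the cluster itself (config servers, shards and <code>mongos</code> routers) is a separate step that this sketch does not cover.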
 
The project can be extended with additional goals to meet the requirements of a Master's thesis.
 
 
= Requirements =
 
* familiarity with MongoDB and sharding
 
* basic administrator-level knowledge of Linux
 
* knowledge of Python, JavaScript (Node.js) or another cross-platform scripting language would be an asset
 
 
= Contact =
 
Marek.Szuba@kit.edu - 29178
 
 
Doris.Wochele@kit.edu - 22418
 

Latest revision as of 10:00, 6 February 2017