Monitoring Scientific Computing Platforms

Description

At SCC, we operate a large number of computing resources that we offer not only to KIT but also to other research facilities in Europe. One important aspect of operating such platforms is monitoring the resources permanently, so that broken services are detected as soon as possible.

Tasks

In this research project, we want to set up a monitoring system for the batch systems (HTCondor [0] & Hadoop [1]), including the development of plugins to obtain the data, store it in a database, and finally visualize it in a dashboard.
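
As a starting point, the queue statistics could, for instance, be obtained through HTCondor's Python bindings. The following is only a minimal sketch, not the project's code: the exact query signature varies between binding versions, and the status codes follow HTCondor's ClassAd convention.

    # Count jobs per state on the local HTCondor schedd using the
    # official "htcondor" Python bindings (illustrative sketch).
    import htcondor

    schedd = htcondor.Schedd()                       # local schedd by default
    jobs = schedd.query(projection=["JobStatus"])    # fetch only the status attribute

    # ClassAd JobStatus codes: 1 = Idle, 2 = Running, 5 = Held
    counts = {"idle": 0, "running": 0, "held": 0}
    for ad in jobs:
        status = ad.get("JobStatus")
        if status == 1:
            counts["idle"] += 1
        elif status == 2:
            counts["running"] += 1
        elif status == 5:
            counts["held"] += 1

    print(counts)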

We make use of collectd [2] to collect the data on the monitored system, send it to Logstash for pre-processing, store it in an appropriate database (e.g. Elasticsearch [3], InfluxDB [4], or Graphite [5]), and visualize it in Grafana [6]. You need to develop a collectd plugin to collect the data from the batch system, set up the database, and create a Grafana dashboard to show your results.
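
Wrapped into collectd's python plugin interface, such a query could be dispatched roughly as follows. Again a sketch only: the plugin and metric names ("htcondor", "jobs-idle", ...) are illustrative choices, and the module still has to be loaded via collectd's python plugin in collectd.conf.

    # Sketch of a collectd read plugin that publishes job counts as gauges.
    import collectd

    def read_callback():
        counts = {"idle": 0, "running": 0}   # placeholder; use a real query here
        for state, value in counts.items():
            vl = collectd.Values(type="gauge")   # "gauge" is a built-in collectd type
            vl.plugin = "htcondor"               # illustrative plugin name
            vl.type_instance = "jobs-" + state
            vl.values = [value]
            vl.dispatch()                        # hand the value list to collectd

    collectd.register_read(read_callback)

From there, collectd's network plugin can forward the values to Logstash, which provides a collectd codec for its UDP input; Graphite could alternatively be fed directly through collectd's write_graphite plugin.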

After implementing your approach, you need to evaluate it and write documentation about it, covering both the theoretical aspects and your approach.

Once the project has finished, you will also give a presentation about your achievements.


Requirements

  • familiarity with Python and/or C/C++
  • deeper understanding of the Linux operating system

References

[0] http://research.cs.wisc.edu/htcondor
[1] http://hadoop.apache.org
[2] http://collectd.org
[3] http://elastic.co
[4] https://www.influxdata.com
[5] https://graphiteapp.org
[6] http://grafana.org


Contact

Christoph.Koenig@kit.edu