Performance Optimization of the dCache Storage System

Revision as of 11:24, 9 October 2018 by Nico.schlitter

Introduction

dCache is distributed storage management software for data volumes without a hard upper limit, easily reaching the petabyte range. Besides disk storage, it can incorporate additional "tertiary storage systems", such as magnetic tape libraries, as supplementary, cheaper storage extensions. Whenever a file is copied from disk to such a tertiary storage backend, dCache relies on user-provided, third-party executables (scripts or binaries) to perform all necessary steps. The result of that process is a URI, which dCache needs in order to stage the file back to disk on demand, ideally by means of the very same executable.
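The round trip described above (hand a file to an executable, keep the URI it reports, pass that URI back in later to stage the file) can be sketched in Java roughly as follows. This is a minimal illustration, not dCache's actual code: the class name TertiaryStoreSketch, the "put"/"get" verbs, the argument order, the "-uri=" option, and the convention that the URI is the last non-blank line of standard output are all assumptions made for this example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical sketch: hand a file to an external tape executable and keep the URI it reports. */
final class TertiaryStoreSketch {

    /** Command line for copying a file to tape; verb and argument order are assumptions. */
    static List<String> flushCommand(String executable, String fileId, String localPath) {
        return List.of(executable, "put", fileId, localPath);
    }

    /** Command line for staging a file back to disk; the "-uri=" option is an assumption. */
    static List<String> stageCommand(String executable, String fileId, String localPath, URI location) {
        return List.of(executable, "get", fileId, localPath, "-uri=" + location);
    }

    /** Interprets the last non-blank line of the executable's output as the location URI. */
    static URI parseLocation(String stdout) throws URISyntaxException {
        String last = null;
        for (String line : stdout.split("\\R")) {
            if (!line.isBlank()) {
                last = line.trim();
            }
        }
        if (last == null) {
            throw new URISyntaxException(stdout, "no URI in output");
        }
        URI uri = new URI(last);
        if (!uri.isAbsolute()) {
            throw new URISyntaxException(last, "URI has no scheme");
        }
        return uri;
    }

    /** Runs the command as a child process and returns everything it wrote to standard output. */
    static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // pass diagnostics through unchanged
                .start();
        String out = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("executable failed with exit code " + exit);
        }
        return out;
    }
}
```

A flush would then amount to parseLocation(run(flushCommand(...))), with the resulting URI stored alongside the file's metadata and handed to stageCommand when the file is requested again. Note that in this sketch every flush or stage spawns one child process.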

Task

With the current implementation, we run into kernel limits and memory starvation, the causes of which are not fully understood. Your task is to evaluate different solutions and implement the one you consider most appropriate.

Requirements

dCache is developed in Java, so some familiarity with the language would be highly beneficial. Because dCache and the frontend of the tape backend are hosted on different nodes, communication has to travel over the network. Whatever facilities are employed, knowledge of the TCP/IP capabilities of the selected tools is therefore vital. Lastly, we are talking about storage solutions, which (so far) have always been based on POSIX standards. Depending on your ideas, you might need to work in accordance with those standards.

Contact

samuel.perez@kit.edu