About the service
Who are the designated users?
- Employees of KIT, universities and institutions in Baden-Wuerttemberg
What's with the bw in the name bwDataArchiv?
- bw stands for Baden-Wuerttemberg, the state in which KIT is located. The initial investments and the development of the service were funded by the Ministry of Science, Research and the Arts of Baden-Wuerttemberg. The name bwDataArchiv should be self-explanatory, except perhaps for the missing 'e' in Archiv, which is the German word for archive. The English name of the bwDataArchiv brand is RDA, the Research Data Archive (with an 'e' :-)).
Multiple service models
- The service offers three service models. Your home organisation can apply for a contract with bwDataArchiv. It is up to your home organisation to decide who has access and is allowed to store data. Universities in Baden-Wuerttemberg typically have an identity provider (IdP) which forwards the entitlement for users who are allowed to use the archive (idp-service-model). Organisations outside the bwIDM federation designate an administrator who can invite users to register for the service using an advanced administration portal (admin-service-model). The third model is the sa-service-model, in which a single service account (sa) belonging to an organisation or project allows an application to store data in the archive. This model is used, for example, by the RADAR and bwDataDiss projects.
What storage technologies do you use?
What is HPSS?
- HPSS (High Performance Storage System) is a data management application that is being developed at several computer centres that require long-term storage for large amounts of data. For more detailed information, see the HPSS web site.
How is the data secured?
I have a suggestion for improvement. What is the award if my idea gets implemented?
How many copies of the data are made and where are they stored?
How long will the data remain in the archive?
How can I make sure my data did not change? Do you support checksums?
- We store an MD5 checksum for every file. When the file is read, the checksum is computed again and compared with the stored checksum. If they do not match, the file is not delivered to the user. For detailed information see https://www.rda.kit.edu/img/FAQ-bwDataArchiv%20Data%20Protection%20%20-%20V2.pdf
In addition, at a more basic level, the data is protected with checksums both on disk and on tape.
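If you want to run the same kind of check on your side, a minimal Python sketch could look like the following. It assumes you keep your own copy (or at least its MD5 value) of the file you uploaded. The function names and file paths are hypothetical and not part of any bwDataArchiv tooling.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that very large files do not have to fit into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(original_path, downloaded_path):
    """Compare the MD5 of the local original with the MD5 of the
    copy that was downloaded from the archive."""
    return md5_of_file(original_path) == md5_of_file(downloaded_path)
```

Typical use would be to record `md5_of_file("results.tar")` before the upload and to call `is_unchanged("results.tar", "results_from_archive.tar")` after a later download (both file names are placeholders).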
Data transferred to and from the research data archive is encrypted using 2048-bit RSA keys. This ensures secure communication and protects against eavesdropping. Additionally, the encryption guarantees that the data arriving at the archive is the same as the data sent to the archive: the encrypted channel functions as a 2048-bit error detection system. Before the first communication between hosts, a key exchange is needed. During the initial SSH connection a fingerprint of the server key is generated, which the client can use to verify the authenticity of the server. See the table of current fingerprints of the public hosts.
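To compare a server key with a published fingerprint, it can help to know how a fingerprint is derived. The sketch below computes an OpenSSH-style SHA256 fingerprint (unpadded base64 of the SHA-256 hash of the decoded key blob) from a line in public key file format. Whether the published table uses this format or the older MD5 format is not stated here, and the key in the demo is a made-up placeholder, not a real archive key.

```python
import base64
import hashlib


def sha256_fingerprint(public_key_line):
    """Return the OpenSSH-style SHA256 fingerprint for a line of the
    form '<key-type> <base64-key-blob> [comment]'."""
    fields = public_key_line.split()
    key_blob = base64.b64decode(fields[1])
    digest = hashlib.sha256(key_blob).digest()
    # OpenSSH prints the digest as base64 without trailing '=' padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    # Placeholder blob for illustration only; a real key line would come
    # from the server, for example from the output of ssh-keyscan.
    placeholder = base64.b64encode(b"example key blob").decode("ascii")
    print(sha256_fingerprint("ssh-rsa " + placeholder + " demo"))
```

If the value computed from the key your client received differs from the published one, do not accept the host key.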
I have a question. Who do I ask?
- Support and help https://www.rda.kit.edu/english/65.php
I did everything right. Still my client cannot access the archive. What could be wrong?
- Please contact bwDataArchiv by e-mail or, if you are a user from Baden-Württemberg, alternatively via the Baden-Württemberg support portal https://bw-support.scc.kit.edu/. Describe your problem and what you have done so far, and attach screenshots where helpful.
Registration and access
Where do I register for the service?
I have registered but still cannot access the service. What is wrong?
I lost my password
Why do I need a different password for the archive? Can't I use the one I use at my home institution?
Do you have recommendations regarding the size of the data?
- The objects you store should be as large as possible. But "what is large?", you may ask. Remember that your data is stored on tape. Accessing data takes time for locating the tape, winding and positioning. Although current tape drives can read at speeds of over 300 MB/s, the positioning time dominates as files become smaller. So if you can easily construct files of several gigabytes (for example by packing many files into one ZIP archive), please do, but remember that you have to download the whole ZIP before you can pick out one of the internal files. The system accepts files of up to 600 GB, but uploading and downloading 600 GB takes a considerable amount of time. On the other hand, we aggregate smaller files into large compounds so as not to stress our drives too much. The bottom line: if you are not routinely generating large (multiple GB) files, forget about the size and store the files without ZIPping.
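The effect of positioning time can be illustrated with a small calculation. The 300 MB/s drive rate is taken from the answer above; the 60 second positioning time is purely an assumed value for illustration, since real locate and wind times vary and are not stated here.

```python
def effective_rate_mb_s(file_size_mb, drive_rate_mb_s=300.0, positioning_s=60.0):
    """Effective read rate for one file from tape: the file size divided
    by positioning time plus pure transfer time.
    positioning_s is an assumed figure, not a measured one."""
    transfer_s = file_size_mb / drive_rate_mb_s
    return file_size_mb / (positioning_s + transfer_s)


if __name__ == "__main__":
    for size_mb in (1, 100, 1000, 30000):
        rate = effective_rate_mb_s(size_mb)
        print(f"{size_mb:>6} MB file: about {rate:.1f} MB/s effective")
```

Under these assumptions a 30 GB file reaches well over half of the drive rate, while a 1 MB file is dominated almost entirely by positioning.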
Do many small files take up more space than a single large file?
- Yes on disk, but not on tape. The archive system (HPSS) caches files on disk before sending them off to tape. To keep up speed during I/O, disks use fixed-size allocations; strictly speaking, it is the file system software managing the disk space that determines the allocations. Since there is no file system on tape, the space used for one large file is the same as for 10 files that are each one tenth of the size of the large file. The answer to the question is therefore: after the data has been migrated to tape, small files do not take up more space. (This does not take into account that each file on tape has a small header of a few bytes, so many very small files still take up slightly more tape space than the equivalent content in one large file. HPSS aggregates small files into larger objects, however, so the question is academic.)
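The disk side of this can be made concrete with a little arithmetic. The sketch below rounds each file up to a whole number of allocation units; the 1 MiB allocation size is a hypothetical value chosen for illustration, not the actual setting of the archive's disk cache.

```python
def disk_usage_bytes(file_sizes, allocation_bytes):
    """Space occupied on a disk with fixed-size allocations: every file
    is rounded up to a whole number of allocation units."""
    total = 0
    for size in file_sizes:
        units = -(-size // allocation_bytes)  # integer ceiling division
        total += units * allocation_bytes
    return total


if __name__ == "__main__":
    allocation = 1024 * 1024          # assumed 1 MiB allocation unit
    ten_small = [100_000] * 10        # ten files of 100 kB each
    one_large = [1_000_000]           # one file with the same content size
    print(disk_usage_bytes(ten_small, allocation))
    print(disk_usage_bytes(one_large, allocation))
```

With these assumed numbers the ten small files occupy ten allocation units while the single large file occupies one, although both hold the same amount of data.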
I want to routinely create and validate checksums of large numbers of files
- This tool may be of help: Hash build and check
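For readers who prefer to script this themselves, the following Python sketch shows the general idea of building a checksum manifest for a directory tree and checking it later. It is not the tool mentioned above and makes no claim about how that tool works; all function names are hypothetical.

```python
import hashlib
import os


def file_md5(path, chunk_size=1024 * 1024):
    """Hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map every file below root (by relative path) to its MD5 digest."""
    manifest = {}
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(directory, name)
            manifest[os.path.relpath(full_path, root)] = file_md5(full_path)
    return manifest


def check_manifest(root, manifest):
    """Return the sorted relative paths that are missing or whose
    current MD5 differs from the value recorded in the manifest."""
    problems = []
    for relative_path, expected in manifest.items():
        full_path = os.path.join(root, relative_path)
        if not os.path.isfile(full_path) or file_md5(full_path) != expected:
            problems.append(relative_path)
    return sorted(problems)
```

A possible workflow is to call `build_manifest` before uploading, store the result alongside the data (for example as JSON), and run `check_manifest` on the downloaded copy; an empty list means every recorded file is present and unchanged.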
What protocols do you support for uploading and downloading data?
Accessing my data takes a long time. Why?
- Long response times may be due to several reasons:
- Retrieval of lots of small files takes longer than of a few large files.
- Something is broken (but we are fixing it).
I deleted [a file, some files, a directory, my files]. Can I recover the lost data?