Archival services: Difference between revisions
[[File:320px-Storagetek-tape_drive_hg.jpg|border|right|300px|caption]]
The development of large-scale archive services is underway. The archival service will be delivered through the '''[[bwDataArchiv]]''' project, which provides:
* An easy-to-use interface for depositing, retrieving and updating scientific data
* Access to files via persistent URLs, known as persistent identifiers (PIDs)
* Access control for collections (groups of files): a collection can be open or available to depositors only (644 or 600)
In due time, the following are planned in addition:
* Support for specific file types and raw data
* Permanent storage with tools for long-term management of data content (also known as curation)
[[#Terminology|Jump to definitions and terminology related to long-term storage used on these pages]]

Several projects have started to make this happen. The LSDMA activities within the PoF programme "Supercomputing and Big Data" deliver input from a wide range of scientific communities and define requirements for data archival and repositories.
==Service description==

===bwDataArchiv===

The focus of the '''bwDataArchiv''' project, funded by the state of Baden-Wuerttemberg, is the installation and deployment of the High Performance Storage System (HPSS). The result is a reliable yet economical mass storage system with interfaces for users and programmed tools. The first users will be [http://www.hlrs.de/ HLRS], to store expedited projects, the LSDF counterpart in Heidelberg ([http://www.bioquant.uni-heidelberg.de/about_us/organization/bioquant-it/it-services/large-scale-data-facility.html link]) and the [http://www.gridka.de/cgi-bin/frame.pl?seite=/welcome.html GridKa] LHC Tier1 center.

How do I use the archive service and what does it offer?
===RADAR===

The DFG project ‘Research Data Repository’ ('''RADAR''') aims to deploy and establish an infrastructure for both scientific data archiving and scientific data publication, supporting various research areas in scientific data management. Within the project, a first low-level approach will be developed that addresses basic archive functionality. Later, the offering is to be extended into an archive system that is adaptable to specific research areas and includes scientific data publication services. The project will use the infrastructure developed in the '''bwDataArchiv''' project.

The UK-based Digital Curation Centre has drafted a checklist to help decide where to store research data. It can be found at [http://www.dcc.ac.uk/resources/how-guides-checklists/where-keep-research-data].
==Service components==

Initially, the LSDF Archive Service offers bit stream preservation. This form of preservation is concerned with maintaining the existing manifestations of a digital resource. Its function is to ensure the continuing integrity of, and controlled access to, the digital objects contained within the LSDF storage environment, including their associated metadata. It is sometimes referred to as passive preservation, in contrast to “content preservation”.

We will make sure that your information is accessible only to those authorized to have access and that it is protected throughout its lifecycle. Furthermore, integrity checks will ensure that data is complete and unaltered [ISO/DIS 13008 – ISO 15489-1:2001].
==Archival Projects==

Several projects have started to build reliable LSDF archive services. Research on large-scale data management at KIT is contributing to the development of efficient and secure long-term storage of petabytes of data. In the programme '''Supercomputing and Big Data''' of the Helmholtz Association, close collaboration with multiple scientific communities has resulted in clear requirements for future data management, including the need to archive big datasets along with tools for provenance and curation. The archival projects at KIT strive to offer a reliable, dependable and, above all, easy-to-use service for scientists.

* [[bwDataArchiv]] - long-term storage infrastructure for Baden-Wuerttemberg
* [[RADAR]] - archival service for the long tail of scientific data
* bwDataDiss - service for scientific data from publications
* bwDIM - enhancement of archive services and support
==Research and development==

What else? The long-term vision is a standardization of preservation services and interfaces. See [[hidden: programming interfaces for archives|'''here''']] for a collection of [[hidden: programming interfaces for archives|'''interfaces to tape storage''']]. The implementation of these services is not yet complete, but many such services, or component services that can be combined to produce the required results, already exist at the LSDF.

In addition, we aim to quantify trust in the quality of the services using '''reproducible preservation metrics'''.

''[[Preservation and archive software]]'': a collection of software products, services, methods and standards used in long-term storage, preservation and archival of data.
==Follow us on Twitter :-) ==

[[File:Archive_projects_small.jpg]]

Please check back on this page for regular updates.
== <div id="Terminology">Terminology</div> ==

{| class="wikitable"
|+'''Common terms and their use in this wiki'''
|-
|Difference between a backup and an archive
|Backups are created for the express purpose of data restoration and continuity of operations in an emergency. Archives are a means of long-term storage for scientifically or historically important data that requires no immediate access.
|-
|Difference between an archive and a repository
|There is none, although one usually refers to an archive if it is institutional, e.g. the 'national archive'.
|}
Latest revision as of 12:43, 29 June 2016
The development of large-scale archive services is underway. The archival service will be delivered through the bwDataArchiv project, which provides:
- An easy-to-use interface for deposit, retrieval and update of scientific data
- Access to files via persistent URLs, known as persistent identifiers (PIDs)
- Access control for collections (groups of files): a collection can be open or available to depositors only (644 or 600)
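The values 644 and 600 in the last item read like Unix-style permission modes. Assuming that interpretation (the page does not say how the archive actually enforces access), the following minimal sketch shows what the two modes would mean. The constant and function names are hypothetical and chosen only for illustration.

```python
import stat

# Assumption: "644" and "600" are octal Unix permission modes.
OPEN_COLLECTION = 0o644   # owner may read and write, everyone else may read
DEPOSITOR_ONLY = 0o600    # owner may read and write, nobody else has access


def describe(mode):
    """Render a permission mode the way `ls -l` would for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


def others_can_read(mode):
    """Return True if users other than the owner and group may read."""
    return bool(mode & stat.S_IROTH)


if __name__ == "__main__":
    for label, mode in (("open", OPEN_COLLECTION), ("depositor only", DEPOSITOR_ONLY)):
        print(label, describe(mode), "readable by others:", others_can_read(mode))
```

Under this reading, an open collection corresponds to world-readable files, while a depositor-only collection is readable and writable by its owner alone.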
In due time, the following are planned in addition:
- Support for specific file types and raw data
- Permanent storage with tools for long-term management of data content (aka curation)
Service description
How do I use the archive service and what does it offer?
The UK-based Digital Curation Centre has drafted a checklist to help decide where to store research data. It can be found at http://www.dcc.ac.uk/resources/how-guides-checklists/where-keep-research-data
Service components
Initially, the LSDF Archive Service offers bit stream preservation. This form of preservation is concerned with maintaining the existing manifestations of a digital resource. Its function is to ensure the continuing integrity of, and controlled access to, the digital objects contained within the LSDF storage environment, including their associated metadata. It is sometimes referred to as passive preservation, in contrast to “content preservation”.
We will make sure that your information is accessible only to those authorized to have access and that it is protected throughout its lifecycle. Furthermore, integrity checks will ensure that data is complete and unaltered [ISO/DIS 13008 – ISO 15489-1:2001].
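To illustrate what an integrity check for bit stream preservation can look like, here is a minimal sketch based on SHA-256 checksums. It is not a description of how the LSDF Archive Service works internally. The choice of SHA-256, the manifest layout (relative path mapped to hex digest) and the function names sha256_of and verify are all assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(manifest, root):
    """Compare digests recorded at deposit time with freshly computed ones.

    manifest maps a relative file path to its expected hex digest
    (a hypothetical layout). Returns a dict mapping each path to
    "ok", "altered" or "missing".
    """
    report = {}
    for rel_path, expected in manifest.items():
        target = Path(root) / rel_path
        if not target.is_file():
            report[rel_path] = "missing"    # data is incomplete
        elif sha256_of(target) == expected:
            report[rel_path] = "ok"         # bit stream is unchanged
        else:
            report[rel_path] = "altered"    # bit stream differs
    return report
```

A depositor could record the digests before upload and run verify on a retrieved copy. Any entry that is not "ok" indicates that the data is incomplete or has been altered.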
Archival Projects
Several projects have started to build reliable LSDF archive services. Research on large-scale data management at KIT is contributing to the development of efficient and secure long-term storage of petabytes of data. In the programme Supercomputing and Big Data of the Helmholtz Association, close collaboration with multiple scientific communities has resulted in clear requirements for future data management, including the need to archive big datasets along with tools for provenance and curation. The archival projects at KIT strive to offer a reliable, dependable and, above all, easy-to-use service for scientists.
- bwDataArchiv - long-term storage infrastructure for Baden-Wuerttemberg
- RADAR - archival service for the long tail of scientific data
- bwDataDiss - service for scientific data from publications.
- bwDIM - enhancement of archive services and support
Research and development
What else? The long-term vision is a standardization of preservation services and interfaces. See here for a collection of interfaces to tape storage. The implementation of these services is not yet complete, but many such services, or component services that can be combined to produce the required results, already exist at the LSDF.
In addition, we aim to quantify trust in the quality of the services using reproducible preservation metrics.
Preservation and archive software: a collection of software products, services, methods and standards used in long-term storage, preservation and archival of data.
Follow us on Twitter :-)
Please check back on this page for regular updates.
Terminology
Difference between a backup and an archive | Backups are created for the express purpose of data restoration and continuity of operations in an emergency. Archives are a means of long-term storage for scientifically or historically important data that requires no immediate access.
Difference between an archive and a repository | There is none, although one usually refers to an archive if it is institutional, e.g. the 'national archive'.
(last edit 27.05.2016)