Difference between revisions of "Archival services"

From Lsdf
(48 intermediate revisions by 3 users not shown)
The development of large-scale archive services is now underway. Within the coming years it is planned to enhance the LSDF with:
[[File:320px-Storagetek-tape_drive_hg.jpg|border|right|300px|caption]]
The development of large-scale archive services is underway. The archival service will be delivered through the '''[[bwDataArchiv]]''' project, which provides:
* An easy-to-use interface for deposit and update
* An easy-to-use interface for deposit, retrieval and update of scientific data
* Access to files via persistent URLs. Collections can be openly accessible or available to administrators only.
* Access to files via persistent URLs known as persistent identifiers (PIDs).
* Access to collections (groups of files) that is either open or restricted to depositors only (644 or 600)
In due time, the service is also planned to offer:
* Support for specific file types and raw data
* Permanent storage with tools for long-term management
* Permanent storage with tools for long-term management of data content (aka curation)
   
[[#Terminology|Click here to jump to some definitions and terminology related to long-term storage used on these pages]]
==Archival Projects==
 
Several projects have started to make this happen. The LSDMA activities within the PoF programme "Supercomputing and Big Data" deliver input from a wide range of scientific communities and define requirements for data archival and repositories.
 
   
==Service description==
===bwDataArchiv===
 
The project '''bwDataArchiv''', funded by the state of Baden-Wuerttemberg, focuses on the installation and deployment of the High Performance Storage System (HPSS). The result is a reliable yet economical mass storage system with interfaces for users and programmed tools. The first users will be [http://www.hlrs.de/ HLRS], to store expedited projects, the [http://www.bioquant.uni-heidelberg.de/about_us/organization/bioquant-it/it-services/large-scale-data-facility.html LSDF counterpart in Heidelberg] and the [http://www.gridka.de/cgi-bin/frame.pl?seite=/welcome.html GridKa] LHC Tier1 center.
 
   
How do I use the archive service and what does it offer?
===RADAR===
 
The DFG project ‘Research Data Repository’ '''RADAR''' aims to deploy and establish an infrastructure for both scientific data archiving and scientific data publication, supporting various research areas in scientific data management. Within the project, a first low-level approach addressing basic archive functionality will be developed. Later, the offering is to be extended into an archive system that is adaptable to specific research areas and includes scientific data publication services. The project will use the infrastructure developed in the '''bwDataArchiv''' project.
 
The UK-based Digital Curation Centre has drafted a checklist to help decide where to store research data. It can be found at http://www.dcc.ac.uk/resources/how-guides-checklists/where-keep-research-data
==Service components==
Initially, the LSDF Archive Service offers bit-stream preservation. This form of preservation is concerned with maintaining the existing manifestations of a digital resource. Its function is to ensure the continuing integrity of, and controlled access to, the digital objects held in the LSDF storage environment, including their associated metadata. It is sometimes called passive preservation, in contrast to “content preservation”.

We will make sure that your information is accessible only to those authorized to access it and that it is protected throughout its lifecycle. Furthermore, integrity checks will ensure that data is complete and unaltered [ISO/DIS 13008 – ISO 15489-1:2001].
==Archival Projects==
Several projects have started to build reliable LSDF archive services. Research on large-scale data management at KIT contributes to the development of efficient and secure long-term storage of petabytes of data. In the programme '''Supercomputing and Big Data''' of the Helmholtz Association, close collaboration with multiple scientific communities has resulted in clear requirements for future data management, including the need to archive big datasets along with tools for provenance and curation. The archival projects at KIT strive to offer a reliable, dependable and, above all, easy-to-use service for scientists.

* [[bwDataArchiv]] - long-term storage infrastructure for Baden-Wuerttemberg
* [[RADAR]] - archival service for the long tail of scientific data
* bwDataDiss - service for scientific data from publications
* bwDIM - enhancement of archive services and support
   
 
==Research and development==
What else? The long-term vision is a standardization of preservation services and their application programming interfaces (APIs). The implementation of these services is not ready but many such services, or component services which can be brought together to produce the required results, already exist at the LSDF.
What else? The long-term vision is the standardization of preservation services and interfaces. A collection of [[hidden: programming interfaces for archives|'''interfaces to tape storage''']] is available. The implementation of these services is not yet complete, but many such services, or component services that can be combined to produce the required results, already exist at the LSDF.
   
 
In addition, we aim to quantify trust in the quality of the services using '''reproducible preservation metrics'''.
   
''[[Preservation and archive software]]'': a collection of software products, services, methods and standards used in long-term storage, preservation and archival of data.
   
==Follow us on Twitter :-) ==
[[File:Archive_projects_small.jpg]]
 
   
 
Please check back on this page for regular updates.
   
== <div id="Terminology">Terminology</div> ==
(last edit 19.02.2014)
 
{| class="wikitable"
|+'''Common terms and their use in this wiki'''
|-
|Difference between a backup and an archive
|Backups are created for the express purpose of data restoration and continuity of operations in an emergency. Archives are a means of long-term storage for scientifically or historically important data that requires no immediate access.
|-
|Difference between an archive and a repository
|There is none, although one usually refers to an archive if it is institutional, e.g. the 'national archive'.
|}

(last edit 27.05.2016)

Latest revision as of 12:43, 29 June 2016

The development of large-scale archive services is underway. The archival service will be delivered through the bwDataArchiv project, which provides:

  • An easy-to-use interface for deposit, retrieval and update of scientific data
  • Access to files via persistent URLs known as persistent identifiers (PIDs).
  • Access to collections (groups of files) that is either open or restricted to depositors only (644 or 600)
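
The figures 644 and 600 are not explained on this page. Assuming they refer to Unix-style permission modes (an assumption, not something the service documentation confirms), the following minimal Python sketch shows what the two modes would mean for a regular file. The helper names `describe_mode` and `is_world_readable` are hypothetical and chosen only for illustration.

```python
import stat


def describe_mode(mode: int) -> str:
    """Return the ls-style permission string for a regular file with this mode."""
    # S_IFREG marks the file type as "regular file" so filemode prints a leading "-".
    return stat.filemode(stat.S_IFREG | mode)


def is_world_readable(mode: int) -> bool:
    """True if users other than the owner and group may read the file."""
    return bool(mode & stat.S_IROTH)


# Assumed reading: 644 = open collection, 600 = depositor-only collection.
for label, mode in (("open", 0o644), ("depositor only", 0o600)):
    print(label, describe_mode(mode), "world-readable:", is_world_readable(mode))
```

Under this reading, an open collection is readable by everyone but writable only by the depositor, while a restricted collection is readable and writable by the depositor alone.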

In due time, the service is also planned to offer:

  • Support for specific file types and raw data
  • Permanent storage with tools for long-term management of data content (aka curation)

See the Terminology section at the end of this page for definitions and terminology related to long-term storage used on these pages.

Service description

How do I use the archive service and what does it offer?

The UK-based Digital Curation Centre has drafted a checklist to help decide where to store research data. It can be found at http://www.dcc.ac.uk/resources/how-guides-checklists/where-keep-research-data

Service components

Initially, the LSDF Archive Service offers bit-stream preservation. This form of preservation is concerned with maintaining the existing manifestations of a digital resource. Its function is to ensure the continuing integrity of, and controlled access to, the digital objects held in the LSDF storage environment, including their associated metadata. It is sometimes called passive preservation, in contrast to “content preservation”.

We will make sure that your information is accessible only to those authorized to access it and that it is protected throughout its lifecycle. Furthermore, integrity checks will ensure that data is complete and unaltered [ISO/DIS 13008 – ISO 15489-1:2001].
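
How the archive performs its integrity checks is not described on this page. As an illustration of the general idea behind bit-stream preservation, the sketch below records a checksum at deposit time and compares it later. The choice of SHA-256 and the function names `sha256_of` and `is_unaltered` are assumptions made for this example, not part of the LSDF service.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unaltered(path: Path, recorded_checksum: str) -> bool:
    """Compare the current checksum with the one recorded at deposit time."""
    return sha256_of(path) == recorded_checksum
```

A depositor could store the value returned by `sha256_of` alongside the data and call `is_unaltered` after retrieval; a mismatch would indicate that the bit stream has changed.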

Archival Projects

Several projects have started to build reliable LSDF archive services. Research on large-scale data management at KIT contributes to the development of efficient and secure long-term storage of petabytes of data. In the programme Supercomputing and Big Data of the Helmholtz Association, close collaboration with multiple scientific communities has resulted in clear requirements for future data management, including the need to archive big datasets along with tools for provenance and curation. The archival projects at KIT strive to offer a reliable, dependable and, above all, easy-to-use service for scientists.

  • bwDataArchiv - long-term storage infrastructure for Baden-Wuerttemberg
  • RADAR - archival service for the long tail of scientific data
  • bwDataDiss - service for scientific data from publications.
  • bwDIM - enhancement of archive services and support

Research and development

What else? The long-term vision is the standardization of preservation services and interfaces. A collection of interfaces to tape storage is kept on a separate page of this wiki. The implementation of these services is not yet complete, but many such services, or component services that can be combined to produce the required results, already exist at the LSDF.

In addition, we aim to quantify trust in the quality of the services using reproducible preservation metrics.

Preservation and archive software: a collection of software products, services, methods and standards used in long-term storage, preservation and archival of data.

Follow us on Twitter :-)

Please check back on this page for regular updates.

Terminology

Common terms and their use in this wiki

Difference between a backup and an archive: Backups are created for the express purpose of data restoration and continuity of operations in an emergency. Archives are a means of long-term storage for scientifically or historically important data that requires no immediate access.

Difference between an archive and a repository: There is none, although one usually refers to an archive if it is institutional, e.g. the 'national archive'.

(last edit 27.05.2016)