Fast fixity checking with rsync

From Lsdf
Revision as of 09:49, 13 May 2016 by Jvw

By default, rsync decides which files differ between the sending and receiving systems by comparing each file's modification time and size. Because this only requires reading file metadata, it is quick. However, it misses the unusual modifications that change neither, such as silent corruption of a file's contents.
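
The quick check can be sketched in C as shown below. This is a simplified illustration, not rsync's own code. The function name `quick_check_same` is hypothetical, and the sketch compares `st_mtime` in whole seconds, ignoring refinements such as rsync's `--modify-window`. Only `stat()` is called, so the file contents are never read.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Simplified quick check (hypothetical helper): report two files as
 * "unchanged" when their size and modification time (whole seconds)
 * both match. Only metadata is read with stat(); the contents are never
 * opened, so a change that preserves size and mtime goes unnoticed. */
bool quick_check_same(const char *src, const char *dst)
{
    struct stat a, b;

    /* A file that cannot be stat()ed is treated as differing. */
    if (stat(src, &a) != 0 || stat(dst, &b) != 0)
        return false;

    return a.st_size == b.st_size && a.st_mtime == b.st_mtime;
}
```

The cost is one metadata lookup per file, no matter how large the file is. That is why the check is fast, even on a tape-backed file system.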

rsync performs a slower but more thorough check when invoked with --checksum. Instead of relying on modification times, it compares a checksum of every file that exists with the same size on both systems. Barring rare checksum collisions, this avoids the risk of missing changed files, at the cost of reading each of those files in full on both systems. Computing checksums is even more expensive if the files reside on a tape system, because recalling every file involves loading and reading mechanical tapes.
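
The next sketch illustrates why a checksum comparison is expensive: every byte of each candidate file must be read. As a stand-in, it uses the simple 64-bit FNV-1a hash. rsync itself uses a different, stronger hash (for example MD5, or xxHash in recent versions), so treat the hash choice as an assumption made for brevity. The helper names `file_checksum` and `checksum_check_same` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/* Stand-in whole-file checksum (64-bit FNV-1a, not rsync's hash).
 * Reads the entire file; on tape this means recalling it first.
 * Returns 0 on success and -1 if the file cannot be opened or read. */
int file_checksum(const char *path, uint64_t *out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV-1a offset basis */
    unsigned char buf[8192];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        for (size_t i = 0; i < n; i++) {
            h ^= buf[i];
            h *= 0x100000001b3ULL;           /* FNV-1a 64-bit prime */
        }
    }

    int err = ferror(f);
    fclose(f);
    if (err)
        return -1;
    *out = h;
    return 0;
}

/* --checksum style comparison: sizes are compared first (cheap), and
 * only files of equal size have their full contents hashed. */
bool checksum_check_same(const char *src, const char *dst)
{
    struct stat a, b;
    if (stat(src, &a) != 0 || stat(dst, &b) != 0)
        return false;
    if (a.st_size != b.st_size)
        return false;                        /* no data needs reading */

    uint64_t ha, hb;
    if (file_checksum(src, &ha) != 0 || file_checksum(dst, &hb) != 0)
        return false;
    return ha == hb;
}
```

Unlike the metadata-only check, the cost here grows with the total amount of data, which for an archive on tape is exactly what makes a full comparison impractical.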

In this task you will adapt the rsync daemon to read and return the checksums that are stored in the HPSS tape system at SCC. HPSS computes file checksums on a regular basis and stores them as metadata alongside the files. A new option will instruct rsyncd to leave the file on tape and instead read and return the checksum stored in the HPSS database. This would vastly speed up checking massive numbers of files.
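
The proposed option could follow the control flow sketched below. This is only an outline under several assumptions, and it does not use the real HPSS interface. `lookup_stored_checksum` and its in-memory table are hypothetical stand-ins for a query against the HPSS metadata database. The paths are invented examples. `compute_checksum_by_reading` is a placeholder for the existing full-read path, and here it only counts how often a tape recall would happen. Falling back to reading the file when no stored checksum exists is also an assumption about the desired behaviour.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of a checksum HPSS has already stored as metadata. */
struct stored_sum {
    const char *path;
    uint64_t sum;
};

/* Hypothetical stand-in for the HPSS metadata database. */
static const struct stored_sum hpss_db[] = {
    { "/hpss/archive/run1.dat", 0x1111 },
    { "/hpss/archive/run2.dat", 0x2222 },
};

/* Hypothetical lookup: fills *out and returns true if a stored checksum
 * exists for path. No file data is read and no tape is mounted. */
static bool lookup_stored_checksum(const char *path, uint64_t *out)
{
    for (size_t i = 0; i < sizeof hpss_db / sizeof hpss_db[0]; i++) {
        if (strcmp(hpss_db[i].path, path) == 0) {
            *out = hpss_db[i].sum;
            return true;
        }
    }
    return false;
}

/* Counts how often the slow path (recalling the file from tape) is taken. */
static int tape_recalls = 0;

/* Placeholder for the existing full-read checksum; it only counts the
 * recall here. A real implementation would hash the file's contents. */
static bool compute_checksum_by_reading(const char *path, uint64_t *out)
{
    (void)path;
    tape_recalls++;
    *out = 0;
    return true;
}

/* Proposed behaviour: with use_stored set, return the stored checksum and
 * read the file only when no checksum has been recorded for it. */
bool get_checksum(const char *path, bool use_stored, uint64_t *out)
{
    if (use_stored && lookup_stored_checksum(path, out))
        return true;
    return compute_checksum_by_reading(path, out);
}
```

This sketch assumes the stored value was computed with the same algorithm the other side uses for its comparison. If HPSS records a different kind of checksum, the two values cannot be compared directly, so the option would need to account for that.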

You will need C programming experience on Linux.

Contact: Jos van Wezel <jos.vanwezel@kit.edu>