I am adding LZ4 and LZO compression the mlogger and as part of this work, I would like to add
computation of checksums for the midas files.
On one side, such checksums help me confirm that uncompressed data contents is the same as original
data (compression/decompression is okey).
On the other side, such checksums can confirm to the end user that today's contents of the midas file is
the same as originally written by mlogger (maybe years ago) - there was no bit rot, no file corruption, no
accidental or intentional modification of contents.
There are several choices of checksums available:
crc32 - as implemented by zlib (already written inside mid.gz files)
crc32c - improved and hardware accelerated version of CRC32 (http://tools.ietf.org/html/rfc3309)
md5 - cryptographically strong checksum, but obsolete
sha1 - same, also obsolete
sha256 - currently considered to be cryptographically strong
Of these checksums, only sha256 (sha512, etc) are presently considered to be cryptographically strong,
meaning that they can detect intentional file modifications. As opposed to (for example) crc32 where
it is easy to construct 2 files with different contents but the same checksum. Both md5 and sha1 are
presently considered to be similarly cryptographically broken. But all of them are still usable
as checksums - as they will detect non-intentional data modifications (bit rot, etc) with
very high probability.
(Of course the strongest checksum is also the most expensive to compute).
I will probably implement crc32 (already in zlib), crc32c (easy to find hardware-accelerated
implementations) and sha256 (cryptographically strong).
I can write the computed checksums into midas.log, or into runNNN.crc32, runNNN.sha256, etc files. (or
both).
Any thoughts on this?
K.O. |