Midas DAQ System
Message ID: 1059     Entry time: 15 May 2015     In reply to: 1058
Author: Konstantin Olchanski 
Topic: Suggestion 
Subject: checksums for midas data files 
> > Any thoughts on this?
> 
> We use binary midas files now for ~20 years and never felt the necessity to put any checksums or even encryption on these files ...
>

"I have never seen a corrupted file, therefore nobody should ever need checksums". Well,

1) actually if you write mid.gz files, you get gzip checksums "for free" (but the checksums are not recorded anywhere, so 5 years later you cannot confirm that the file did not change).
2) I had a defective computer once where reading the same file several times yielded different data. (the defect was on the motherboard, not in the disks)
3) I am presently testing the btrfs filesystem, which (like ZFS) keeps checksums for all data. For these tests I am using third-rate disks, and I see btrfs regularly detect (and correct) "data corruption" events, where the data on disk has changed.
4) there was a report from CERN(?) where they checked the checksums on a large number of data files and found a good number of corrupted files.

So bit rot does exist.
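
Regarding point 1, the gzip checksum can be pulled out of the file and written down somewhere. A minimal sketch in Python, assuming a single-member .gz file with nothing appended after the gzip stream (the function name gzip_trailer is made up for this example): per the gzip format, the last 8 bytes hold the CRC32 of the uncompressed data and its length modulo 2^32, both little-endian.

```python
import struct

def gzip_trailer(path):
    """Return (crc32, isize) from the last 8 bytes of a single-member gzip file.

    Assumption: the file holds exactly one gzip member and no trailing padding.
    isize is the uncompressed length modulo 2**32.
    """
    with open(path, "rb") as f:
        f.seek(-8, 2)  # 2 = seek relative to the end of the file
        crc32, isize = struct.unpack("<II", f.read(8))
    return crc32, isize
```

The returned CRC32 can be stored in a run database or a log file and compared again years later without decompressing anything. Note that this only reads the stored value; it does not check that the compressed data still matches it.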

In more practical terms:

a) CRC32C is "free" to compute (hardware accelerated on recent CPUs), but does not detect malicious file modifications.
b) SHA256 does detect them (but for how long?), but is probably too expensive to compute (speed measurement TBD).
c) gzip compressed files have internal whole-file CRC32
d) bzip2 compressed files have internal per-block CRC32
e) lz4 compressed files have internal per-block xxhash checksums
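
For the speed question in (a) and (b), a rough measurement could look like the sketch below. Assumptions: Python's standard library has no CRC32C, so zlib.crc32 (plain CRC32, the gzip polynomial) stands in for it here, and the helper names are invented for this example. The numbers depend on the CPU and on how zlib and OpenSSL were built, so treat them as indicative only.

```python
import hashlib
import time
import zlib

def crc32_of(data):
    """Plain CRC32 (as used by gzip), not CRC32C."""
    return zlib.crc32(data) & 0xFFFFFFFF

def sha256_of(data):
    """SHA-256 digest as a hex string."""
    return hashlib.sha256(data).hexdigest()

def throughput_mb_s(func, data, repeats=3):
    """Best observed throughput of func(data) in MB/s over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return len(data) / max(best, 1e-9) / 1e6

if __name__ == "__main__":
    buf = bytes(range(256)) * (8 * 4096)  # 8 MiB in-memory test buffer
    print("CRC32   MB/s:", round(throughput_mb_s(crc32_of, buf)))
    print("SHA-256 MB/s:", round(throughput_mb_s(sha256_of, buf)))
```

Comparing the two figures against the data rate of the logger's output stream would show whether SHA256 is affordable for a given experiment.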

Personally, when dealing with compressed files, I prefer to have a checksum recorded somewhere that I can check against after I decompress the file.
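
One way this could be done is with a sidecar file next to each data file. The sketch below is only an illustration, assuming gzip-compressed files and SHA-256; the ".sha256" sidecar naming and the function names are invented here and are not anything MIDAS provides.

```python
import gzip
import hashlib
from pathlib import Path

def payload_sha256(gz_path, chunk_size=1 << 20):
    """SHA-256 of the decompressed contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with gzip.open(gz_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def sidecar_path(gz_path):
    """Hypothetical convention: run00001.mid.gz -> run00001.mid.gz.sha256"""
    gz_path = Path(gz_path)
    return gz_path.with_name(gz_path.name + ".sha256")

def record_checksum(gz_path):
    """Write the checksum of the decompressed data to the sidecar file."""
    sidecar = sidecar_path(gz_path)
    sidecar.write_text(payload_sha256(gz_path) + "\n")
    return sidecar

def verify_checksum(gz_path):
    """Return True if the decompressed data still matches the recorded checksum."""
    recorded = sidecar_path(gz_path).read_text().split()[0]
    return payload_sha256(gz_path) == recorded
```

Because the checksum covers the decompressed data, it stays valid if the file is later recompressed with a different tool or compression level. A file damaged badly enough that gzip cannot decode it will raise an exception in verify_checksum instead of returning False.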

I think there is no need to add checksums to the MIDAS data file format itself (see c, d, e above).

K.O.