14 May 2015, Stefan Ritt, Suggestion, checksums for midas data files
|
> Any thoughts on this?
We use binary midas files now for ~20 years and never felt the necessity to put any checksums or even encryption on these files. The reason for that is the following: Data on
modern hard disks is already protected by CRC code or even ERC on the lower level, so it's very unlikely that single bytes change. If something happens, then it's a
corruption of the file system, so a few sectors of a file are missing or wrong. In that case a CRC won't help you much, just tells you that the files are corrupt. But you see that
also in the midas event structure. Each event has a header with the size of the event, so you can follow the file event by event. If something is missing, the next event header
is no event header but something in the middle of the date, and you recognise this immediately since the header does not make any send (date is off by many years, event ID
is arbitrary, event size is very different). So this redundancy in the midas event structure helps you to identify any corrupt files as good in my opinion as a CRC code will. I
would not want to waste a single CPU cycle on lengthy CRC or even SHA algorithms, unless I see single bytes change inside events. But in this case this can even happen at
the network level between frontend and backends. So we should add the CRC/SHA code at the frontend level. This could increase the dead time of the experiment which is
bad. And what about VME transfer? While hard disks and Ethernet networks have already built-in CRC checks, VME transfer doesn't. So how can you be sure that no bits
get corrupt between your ADC and your frontend computer?
If people insist of having CRC or SHA protection/encryption for some reason I do not understand yet, we should make this optional, so that I can turn it off, since I don't
need it.
/Stefan |
15 May 2015, Konstantin Olchanski, Suggestion, checksums for midas data files
|
> > Any thoughts on this?
>
> We use binary midas files now for ~20 years and never felt the necessity to put any checksums or even encryption on these files ...
>
"I have never seen a corrupted file, therefore nobody should ever need checksums". Well,
1) actually if you write mid.gz files, you get gzip checksums "for free" (but the checksums are not recorded anywhere, so 5 years later you cannot confirm that the file did not change).
2) I had a defective computer once where reading the same file several times yielded different data. (the defect was on the motherboard, not in the disks)
3) I am presently testing the btrfs filesystem which (like ZFS) keeps checksums for all data. For these tests I am using 3rd quality disks and I see btrfs regularly detect (and correct) "data corruption" events - where data on disk has changed.
4) there was a report from CERN(?) where they checked the checksums on a large number of data files and found a good number of corrupted files.
So bit rot does exist.
In more practical terms:
a) CRC32C is "free" to compute (hardware accelerated on latest CPUs), but does not detect malicious file modifications
b) SHA256 does detect that (but for how long?), but probably too expensive to compute (speed measurement TBD).
c) gzip compressed files have internal whole-file CRC32
d) bzip2 compressed files have internal per-block CRC32
e) lz4 compressed files have internal per-block xxhash checksums
Personally, when dealing with compressed files, I prefer to have a checksum recoded somewhere that I can check against after I decompress the file.
I think there is no need to add checksums to the MIDAS data files format itself (see c,d,e above).
K.O. |
05 Oct 2016, Lee Pool, Suggestion, checksums for midas data files
|
Hi
> On one side, such checksums help me confirm that uncompressed data contents is the same as original
> data (compression/decompression is okey).
>
> I can write the computed checksums into midas.log, or into runNNN.crc32, runNNN.sha256, etc files. (or
> both).
>
Just a thought on my side. I have been using a checksum, on data produced by our experiments via mlogger, the runxxxx.mid.gz, in
the same manner you proposed and I see now implemented.
I have a slight, objection, if I may call it that, to how the checksum is saved to disk, in
run00007.mid.gz.sha256 as an example.
$ cat ~/Data/run00007.mid.gz.sha256
f315af7caf6ca204cc082132862cb4227d77066cb60c6e2b1039d6dc5b04d1ee 650597 Data/run00007.mid.gz
It seems a little misleading to have the gzip'd filename paired with the checksum of the uncompressed content.
May I suggest that the pairing should be ,
f315af7caf6ca204cc082132862cb4227d77066cb60c6e2b1039d6dc5b04d1ee run00007.mid as an example.
As I find, this information will sit in an archive, database in my case for a long period, and it might
be confusing later on, when verification of the checksum is required. |
13 Oct 2016, Konstantin Olchanski, Suggestion, checksums for midas data files
|
Confirmed, this is a bug in mlogger. It should be creating *2* files, one with the before-compression checksum and one with the after-compression checksum. At
least both checksums are written to midas.log, so you can grep them from there. K.O.
> Hi
>
> > On one side, such checksums help me confirm that uncompressed data contents is the same as original
> > data (compression/decompression is okey).
> >
>
> > I can write the computed checksums into midas.log, or into runNNN.crc32, runNNN.sha256, etc files. (or
> > both).
> >
>
> Just a thought on my side. I have been using a checksum, on data produced by our experiments via mlogger, the runxxxx.mid.gz, in
> the same manner you proposed and I see now implemented.
>
> I have a slight, objection, if I may call it that, to how the checksum is saved to disk, in
> run00007.mid.gz.sha256 as an example.
>
>
> $ cat ~/Data/run00007.mid.gz.sha256
> f315af7caf6ca204cc082132862cb4227d77066cb60c6e2b1039d6dc5b04d1ee 650597 Data/run00007.mid.gz
>
>
> It seems a little misleading to have the gzip'd filename paired with the checksum of the uncompressed content.
>
> May I suggest that the pairing should be ,
>
> f315af7caf6ca204cc082132862cb4227d77066cb60c6e2b1039d6dc5b04d1ee run00007.mid as an example.
>
> As I find, this information will sit in an archive, database in my case for a long period, and it might
> be confusing later on, when verification of the checksum is required. |
13 Mar 2017, Konstantin Olchanski, Suggestion, checksums for midas data files
|
> Confirmed, this is a bug in mlogger. It should be creating *2* files, one with the before-compression checksum and one with the after-compression checksum. At
> least both checksums are written to midas.log, so you can grep them from there. K.O.
This should be fixed now. Thank you for nudging me.
K.O.
>
> > Hi
> >
> > > On one side, such checksums help me confirm that uncompressed data contents is the same as original
> > > data (compression/decompression is okey).
> > >
> >
> > > I can write the computed checksums into midas.log, or into runNNN.crc32, runNNN.sha256, etc files. (or
> > > both).
> > >
> >
> > Just a thought on my side. I have been using a checksum, on data produced by our experiments via mlogger, the runxxxx.mid.gz, in
> > the same manner you proposed and I see now implemented.
> >
> > I have a slight, objection, if I may call it that, to how the checksum is saved to disk, in
> > run00007.mid.gz.sha256 as an example.
> >
> >
> > $ cat ~/Data/run00007.mid.gz.sha256
> > f315af7caf6ca204cc082132862cb4227d77066cb60c6e2b1039d6dc5b04d1ee 650597 Data/run00007.mid.gz
> >
> >
> > It seems a little misleading to have the gzip'd filename paired with the checksum of the uncompressed content.
> >
> > May I suggest that the pairing should be ,
> >
> > f315af7caf6ca204cc082132862cb4227d77066cb60c6e2b1039d6dc5b04d1ee run00007.mid as an example.
> >
> > As I find, this information will sit in an archive, database in my case for a long period, and it might
> > be confusing later on, when verification of the checksum is required. |
08 Jun 2010, nicholas, Forum, check out from svn
|
do: svn co svn+ssh://svn@savannah.psi.ch/afs/psi.ch/project/meg/svn/midas/trunk
midas
shows: ssh: connect to host savannah.psi.ch port 22: Connection timed out
svn: Connection closed unexpectedly |
08 Jun 2010, nicholas, Forum, check out from svn
|
> do: svn co svn+ssh://svn@savannah.psi.ch/afs/psi.ch/project/meg/svn/midas/trunk
> midas
> shows: ssh: connect to host savannah.psi.ch port 22: Connection timed out
> svn: Connection closed unexpectedly
sorry, my side network problem.
N. |
19 Mar 2018, Andreas Suter, Suggestion, check current ODB size
|
A feature I always missed (or just missed to find in the docu) is the following:
1) It would be nice to have a command in odbedit which allows to check how full
the ODB currently is.
2) Even more important: I would like to have an ODB routine which allows me to
check the fill level of the ODB, and/or a routine which tells me if I would
create a structure of given size that it still fits in the current ODB or not.
The use case is that some clients create on the fly ODB entries and I would like
to make sure before hand the ODB's remaining space in order not to crash things
by overfilling the ODB. |
19 Mar 2018, Stefan Ritt, Suggestion, check current ODB size
|
> A feature I always missed (or just missed to find in the docu) is the following:
> 1) It would be nice to have a command in odbedit which allows to check how full
> the ODB currently is.
> 2) Even more important: I would like to have an ODB routine which allows me to
> check the fill level of the ODB, and/or a routine which tells me if I would
> create a structure of given size that it still fits in the current ODB or not.
> The use case is that some clients create on the fly ODB entries and I would like
> to make sure before hand the ODB's remaining space in order not to crash things
> by overfilling the ODB.
If you do "mem" in odbedit, you see the currently free areas, one for the keys themselves, one for
the data of the keys. The corresponding C function is db_show_mem. At the moment it outputs the
list of free blocks as a long ASCII string, but if necessary I can write a variant which returns the
number of free bytes.
Stefan |
19 Mar 2018, Andreas Suter, Suggestion, check current ODB size
|
> > A feature I always missed (or just missed to find in the docu) is the following:
> > 1) It would be nice to have a command in odbedit which allows to check how full
> > the ODB currently is.
> > 2) Even more important: I would like to have an ODB routine which allows me to
> > check the fill level of the ODB, and/or a routine which tells me if I would
> > create a structure of given size that it still fits in the current ODB or not.
> > The use case is that some clients create on the fly ODB entries and I would like
> > to make sure before hand the ODB's remaining space in order not to crash things
> > by overfilling the ODB.
>
> If you do "mem" in odbedit, you see the currently free areas, one for the keys themselves, one for
> the data of the keys. The corresponding C function is db_show_mem. At the moment it outputs the
> list of free blocks as a long ASCII string, but if necessary I can write a variant which returns the
> number of free bytes.
>
> Stefan
Thanks for the info, and yes a variant of db_show_mem returning the number of free bytes would be just
prefect! |
19 Mar 2018, Stefan Ritt, Suggestion, check current ODB size
|
> > > A feature I always missed (or just missed to find in the docu) is the following:
> > > 1) It would be nice to have a command in odbedit which allows to check how full
> > > the ODB currently is.
> > > 2) Even more important: I would like to have an ODB routine which allows me to
> > > check the fill level of the ODB, and/or a routine which tells me if I would
> > > create a structure of given size that it still fits in the current ODB or not.
> > > The use case is that some clients create on the fly ODB entries and I would like
> > > to make sure before hand the ODB's remaining space in order not to crash things
> > > by overfilling the ODB.
> >
> > If you do "mem" in odbedit, you see the currently free areas, one for the keys themselves, one for
> > the data of the keys. The corresponding C function is db_show_mem. At the moment it outputs the
> > list of free blocks as a long ASCII string, but if necessary I can write a variant which returns the
> > number of free bytes.
> >
> > Stefan
>
> Thanks for the info, and yes a variant of db_show_mem returning the number of free bytes would be just
> prefect!
I made you db_get_free_mem(HNDLE hDB, INT *key_size, INT *data_size)
The first return gets you the number of free bytes for the key area, the second one for the data area (values of keys).
Committed to develop
Stefan |
24 Jun 2021, Konstantin Olchanski, Bug Fix, changes in history plots
|
I am updating the history plots. Main changes:
- the old history display code should again be easily usable (use the "open in old history display" checkbox)
- the history plot editor has an "edit in ODB" button that takes as to the plot definition in ODB (sometimes it is
easier to editing things in the ODB editor)
- error in history plot editor that created "formula" entry of incorrect size should be fixed
- "reorder" (and "delete entry") functions in the history plot editor should work again (plus added explanation text)
- "factor" and "offset" restored in the history plot editor
- added the long desired "voffset" to simplify plot scaling and positioning
- (factor, offset and voffset do not yet work in the new history plots, TBI ASAP)
- history plot editor and generate_hist_graph() now use the same code to read plot definitions from ODB. There should
be no more confusion about content of history plot entries in ODB and what each entry is supposed to do.
These changes have been precipitated by our inability to plot high voltage voltage and current on the same plot,
see bug https://bitbucket.org/tmidas/midas/issues/308/history-plot-formula-cannot-be-used-to
Voltage is in the range 0..1000 (volts) and current is in the range 0..50 and 0..0.100, autoscaling on voltage
makes the currents invisible at the zero line. In the past, we used the "factor" setting to scale
the graphs so we can see both voltage and currents at the same time (currents scaled up by factor 25 and 600,
as example).
The new "formula" feature was supposed to replace (and improve upon) the "factor" and "offset". But if I use
the formula "x*25", suddenly the plot is telling us that current values are not 50 uA, but 1250 uA (50*25),
and this is just wrong. We do not want to scale the micro-amps, we want to better position the plot on the graph,
like the old "factor" and "offset" allowed us to do.
So the idea is to use this computation:
y_position_on_plot = offset + factor*(formula(history_value) - voffset)
- "formula" is to transform history values into physical values (i.e. pressure meter reports bars, but we want atm, or
voltmeter is reading in discrete units of 0.125V, we want to see volts)
- "factor" and "offset" is to position the graphs on the plot for best visual presentation of data
- I also added is the much desired "voffset", you only know it is needed if you have a non-zero "offset" and you need
to change the "factor", surprise, "offset" has ot be changed, too, and good luck recalculating it correctly in one
try.
The way to use this stuff:
- adjust "voffset" to bring the graph to around y=0
- increase the "factor" to zoom-in on features and stuff
- adjust "offset" to move the graph up and down relative to all the other graphs on the plot
- now one can zoom in and out as needed by changing the "factor" and the plot will stay roughly in the right place
without having to readjust the offsets.
K.O. |
24 Jun 2021, Stefan Ritt, Bug Fix, changes in history plots
|
I disagree with the proposed change to scale the HV current for a "nice" display. If values are scaled, the axis should be
scaled in the same way. Otherwise people might read the current from the plot, look at the axis, and again get the wrong
value (the factor of 25x you mention). Sure you can hover with the cursor over the graph, and see the right value, but think
of taking a screen shot, putting this into a publication, and get complaints from the reviewer.
The only "correct" way in my opinion is to implement two vertical axis, as can be seen in some papers. One for the HV, and a
new TBD right axis for the current values, then indicating for each graph if the left or right vertical axis applies. For
the secondary axis we can have autoscaling or fixed scaling, as we have for the primary axis.
Stefan |
25 Jun 2021, Stefan Ritt, Bug Fix, changes in history plots
|
A general warning: With the recent history changes implemented in the develop branch, starting from a fresh ODB and editing
any history panel, on gets tons of errors and debug output from mhttpd:
MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Minimum" returned status 312
MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Minimum" returned status 312
MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Maximum" returned status 312
MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Maximum" returned status 312
MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Zero ylow" returned status 312
MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Log axis" returned status 312
MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Zero ylow" returned status 312
Load from ODB History/Display/Default/Trigger rate: hist plot: 2 variables
timescale: 1h, minimum: 0.000000, maximum: 0.000000, zero_ylow: 0, log_axis: 0, show_run_markers: 1, show_values: 1,
show_fill: 1
var[0] event [System][Trigger per sec.] formula [], colour [#00AAFF] label [] factor 1.000000 offset 0.000000 voffset
0.000000 order 10
var[1] event [System][Trigger kB per sec.] formula [], colour [#FF9000] label [] factor 1.000000 offset 0.000000 voffset
0.000000 order 20
This has to be fixed by the original author. I strongly recommend to make such modifications on a separate branch not to
break running experiments.
Stefan |
25 Jun 2021, Marco Francesconi, Bug Fix, changes in history plots
|
We are using the new history formula as a quick way to convert signals from sensors to actual physical values (for example Voltage->Temperature, Voltage->relative humidity
...), so it is great that the shown voltage is the calculated one.
I would like to add a point to this discussion.
In our collaboration people attach images of history plots to elogs, meeting presentation and/or physical logbooks.
The proposed scaling formula may work fine online using the cursors, but, once an image is created, I do not understand how it is possible to extract the value for a scaled
variables.
Suppose you see a graph in a presentation with a current increase by some PSU and the current was scaled to be in the same plot of the voltage.
Looking at the delta in the image, how can you judge the current increase without any axis/grid to refer to?
So I support Stefan proposal for a secondary axis, as long as it is clear which value belong to which axis.
Maybe marking the channels in the description or using different line styles/thickness?
Best,
Marco |
25 Jun 2021, Konstantin Olchanski, Bug Fix, changes in history plots
|
> A general warning: With the recent history changes implemented in the develop branch, starting from a fresh ODB and editing
> any history panel, on gets tons of errors and debug output from mhttpd: ...
This is the reason most projects have separate development and production branches.
I recommend everybody to use the released tagged versions of midas for production.
> I strongly recommend to make such modifications on a separate branch not to
> break running experiments.
Is there something that does not work anymore? Did I break something? The debug messages I am still
tuning.
K.O.
>
> MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Minimum" returned status 312
> MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Minimum" returned status 312
> MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Maximum" returned status 312
> MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Maximum" returned status 312
> MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Zero ylow" returned status 312
> MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Log axis" returned status 312
> MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Zero ylow" returned status 312
> Load from ODB History/Display/Default/Trigger rate: hist plot: 2 variables
> timescale: 1h, minimum: 0.000000, maximum: 0.000000, zero_ylow: 0, log_axis: 0, show_run_markers: 1, show_values: 1,
> show_fill: 1
> var[0] event [System][Trigger per sec.] formula [], colour [#00AAFF] label [] factor 1.000000 offset 0.000000 voffset
> 0.000000 order 10
> var[1] event [System][Trigger kB per sec.] formula [], colour [#FF9000] label [] factor 1.000000 offset 0.000000 voffset
> 0.000000 order 20
>
>
>
> This has to be fixed by the original author. I strongly recommend to make such modifications on a separate branch not to
> break running experiments.
>
> Stefan |
25 Jun 2021, Konstantin Olchanski, Bug Fix, changes in history plots
|
> I disagree ...
I am happy with disagreement and differences of opinions. Zest of life, driver of progress and improvements, etc.
I am even more happy with solutions to problems. The current problem is that the offset and factor feature
of history plots has been removed without much discussion.
I stress, we have been using this feature to run experiments for the last 20 years.
I do not understand objections to it being restored. If you do not want to use it, do not use it.
K.O.
> with the proposed change to scale the HV current for a "nice" display. If values are scaled, the axis should be
> scaled in the same way. Otherwise people might read the current from the plot, look at the axis, and again get the wrong
> value (the factor of 25x you mention). Sure you can hover with the cursor over the graph, and see the right value, but think
> of taking a screen shot, putting this into a publication, and get complaints from the reviewer.
>
> The only "correct" way in my opinion is to implement two vertical axis, as can be seen in some papers. One for the HV, and a
> new TBD right axis for the current values, then indicating for each graph if the left or right vertical axis applies. For
> the secondary axis we can have autoscaling or fixed scaling, as we have for the primary axis.
>
> Stefan |
25 Jun 2021, Konstantin Olchanski, Bug Fix, changes in history plots
|
> > The only "correct" way in my opinion is to implement two vertical axis, as can be seen in some papers. One for the HV, and a
> > new TBD right axis for the current values, then indicating for each graph if the left or right vertical axis applies. For
> > the secondary axis we can have autoscaling or fixed scaling, as we have for the primary axis.
In the past, we have done some useful plots with maybe 10 variables plotted
at the same time with different scaling and positioning on the graph.
Having 2 vertical axis is maybe useful for the specific case of plotting high voltages,
but not in the general case.
Actually, just 2 vertical axis will not work to plot high voltages in ALPHA-g, because
we have anode currents on the scale 0..0.1 uA and cathode currents on the scale 50..60 uA.
K.O. |
25 Jun 2021, Konstantin Olchanski, Bug Fix, changes in history plots
|
I will have to post an example of a scaled plot. I figure everybody forgot how they look like.
K.O.
> We are using the new history formula as a quick way to convert signals from sensors to actual physical values (for example Voltage->Temperature, Voltage->relative humidity
> ...), so it is great that the shown voltage is the calculated one.
>
> I would like to add a point to this discussion.
> In our collaboration people attach images of history plots to elogs, meeting presentation and/or physical logbooks.
> The proposed scaling formula may work fine online using the cursors, but, once an image is created, I do not understand how it is possible to extract the value for a scaled
> variables.
> Suppose you see a graph in a presentation with a current increase by some PSU and the current was scaled to be in the same plot of the voltage.
> Looking at the delta in the image, how can you judge the current increase without any axis/grid to refer to?
>
> So I support Stefan proposal for a secondary axis, as long as it is clear which value belong to which axis.
> Maybe marking the channels in the description or using different line styles/thickness?
>
> Best,
> Marco |
30 Jun 2021, Konstantin Olchanski, Bug Fix, changes in history plots
|
> I am updating the history plots.
> So the idea is to use this computation:
> y_position_on_plot = offset + factor*(formula(history_value) - voffset)
Stefan and myself did some brain storming on zoom. Writing it down the way I remember it.
- we distilled the gist of the problem - the numerical values we show in the plot labels and in hover-over-the-graph
are before formula is applied or after the formula is applied?
- I suggested a universal solution using a double formula: use formula1 for one case;
use formula2 for the other case;
use formula1 for "physics calibration", use formula2 for factor and offset for composite plots:
numeric_value = formula1(history_value)
plotted_value = formula2(numeric_value)
- we agree that this is way too complicated, difficult to explain and difficult to coherently present in the history editor
- Stefan suggested a simple solution, a checkbox labeled "show raw value" next to each history variable. by default, the
value after the formula is plotted and displayed. if checked, the raw value (before the formula) is displayed, and the
value after the formula is plotted. (so this works the same as the factor and offset on the old history plots).
- if "show raw value" is enabled, the numerical values shown will be inconsistent against the labels on the vertical axis.
Our solution it to turn the axis labels off. (for composite plots, like oscillator frequency in Hz vs oscillator
temperature in degC, both scaled to see their correlation, the vertical axis is unit-less "arbitrary units", of course)
- to simplify migration of old history plots that use custom factor and offset settings, we think in the direction of
automatically moving them to the "formula". (factor=2, offset=10 automatically populates formula with "2*x+10", "show raw
value" checked/enabled). Thus we can avoid implementing factor and offset in the new history code (an unwelcome
complication).
- I think this covers all the use cases I have seen in the past, so we will move in this direction.
K.O. |
|