Back Midas Rome Roody Rootana
  Midas DAQ System, Page 138 of 138  Not logged in ELOG logo
ID Dateup Author Topic Subject
  2767   16 May 2024 Konstantin OlchanskiBug Reportmidas alarm borked condition evaluation
I am updating the TRIUMF IRIS experiment to the latest version of MIDAS. I see following error messages in midas.log:

19:06:16.806 2024/05/16 [mhttpd,ERROR] [odb.cxx:6967:db_get_data_index,ERROR] index (29) exceeds array length (20) for key 
"/Equipment/Beamline/Variables/Measured"

19:06:15.806 2024/05/16 [mhttpd,ERROR] [odb.cxx:6967:db_get_data_index,ERROR] index (30) exceeds array length (20) for key 
"/Equipment/Beamline/Variables/Measured"

The errors are correct, there is only 20 elements in that array. The errors are coming every few seconds, they spam midas.log. How do I fix 
them? Where do they come from? There is no additional diagnostics or information to go from.

In the worst case, they come from some custom web page that reads wrong index variables from ODB. mhttpd currently provides no diagnostics to 
find out which web page could be causing this.

But maybe it's internal to MIDAS? I save odb to odb.json, "grep Measured odb.json" yields:
iris00:~> grep Measured odb.json 
        "Condition" : "/Equipment/Beamline/Variables/Measured[29] > 1e-5",
        "Condition" : "/Equipment/Beamline/Variables/Measured[30] < 0.5",

So wrong index errors is coming from evaluated alarms.
ODB "/Alarms/Alarm system active" is set to "no" (alarm system is disabled), the errors are coming.
ODB "/Alarms/Alarms/TP4/Active" is set to "no" (specific alarm is disabled), the errors are coming.
WTF? (and this is recentish borkage, old IRIS MIDAS had the same wrong index alarms and did not generate these errors).

Breakage:
- where is the error message "evaluated alarm XXX cannot be computed because YYY cannot be read from ODB!"
- disabled alarm should not be computed
- alarm system is disabled, alarms should not be computed

K.O.

P.S. I am filing bug reports here, I cannot be bothered with the 25-factor authentication to access bitbucket.
  2768   17 May 2024 Stefan RittBug Reportmidas alarm borked condition evaluation
This is a common problem I also encountered in the past. You get a low-level ODB access error (could also be a read of a non-existing key) and you 
have no idea where this comes from. Could be the alarm system, a mhttpd web page, even some user code in a front-end over which the midas library 
has no control.

One option would be to add a complete stack dump to each of these error (ROOT does something like that), but I hear already people shouting "my 
midas.log is flooded with stack dumps!". So what I do in this case is I run a midas program in the debugger and set a breakpoint (in your case at 
odb.cxx:6967). If the breakpoint triggers, I inspect the stack and find out where this comes from.

Not that I print a stack dump for such error in the odbxx API. This goes to stdout, not the midas log, and it helped me in the past. Unfortunately 
stack dumps work only under linux (not MacOSX), and they do not contain all information a debugger can show you.

It is not true that alarm conditions are evaluated when the alarm system is off. I just tried and it works fine. The code is here:

alarm.cxx:591

      /* check global alarm flag */
      flag = TRUE;
      size = sizeof(flag);
      db_get_value(hDB, 0, "/Alarms/Alarm system active", &flag, &size, TID_BOOL, TRUE);
      if (!flag)
         return AL_SUCCESS;

so no idea why you see this error if you correctly st "Alarm system active" to false.

Stefan
  2769   17 May 2024 Konstantin OlchanskiBug Reportmidas alarm borked condition evaluation
> This is a common problem I also encountered in the past. You get a low-level ODB access error (could also be a read of a non-existing key) and you 
> have no idea where this comes from. Could be the alarm system, a mhttpd web page, even some user code in a front-end over which the midas library 
> has no control.

committed a partial fix, added an error message in alarm condition evaluation code to report alarm name and odb paths when we cannot get something from 
ODB. Now at least midas.log gives some clue that ODB errors are coming from alarms.

and the errors are actually coming from the alarms web page.

the alarms web page shows all the alarms even if alarms are disabled and it shows evaluated alarm conditions and current values even for alarms that 
are disabled.

I could change it to show "disabled" for disabled alarms, but I think it would not be an improvement,
right now it is quite convenient to see the alarm values for disabled/inactive alarms,
it is easy to see if they will immediately trip if I enable them. Hiding the values would make
them blind.

And I think I know what caused the original problem in IRIS experiment, I think the list of EPICS variables got truncated from 30 to 20 and EPICS 
values 29 and 30 used in the alarm conditions have become lost.

So the next step is to fix feepics to not truncate the list of variables (right now it is hardwired to 20 variables) and restore
the lost variable definition from a saved odb dump.

K.O.






> 
> One option would be to add a complete stack dump to each of these error (ROOT does something like that), but I hear already people shouting "my 
> midas.log is flooded with stack dumps!". So what I do in this case is I run a midas program in the debugger and set a breakpoint (in your case at 
> odb.cxx:6967). If the breakpoint triggers, I inspect the stack and find out where this comes from.
> 
> Not that I print a stack dump for such error in the odbxx API. This goes to stdout, not the midas log, and it helped me in the past. Unfortunately 
> stack dumps work only under linux (not MacOSX), and they do not contain all information a debugger can show you.
> 
> It is not true that alarm conditions are evaluated when the alarm system is off. I just tried and it works fine. The code is here:
> 
> alarm.cxx:591
> 
>       /* check global alarm flag */
>       flag = TRUE;
>       size = sizeof(flag);
>       db_get_value(hDB, 0, "/Alarms/Alarm system active", &flag, &size, TID_BOOL, TRUE);
>       if (!flag)
>          return AL_SUCCESS;
> 
> so no idea why you see this error if you correctly st "Alarm system active" to false.
> 
> Stefan
  2770   17 May 2024 Konstantin OlchanskiBug Reportodbedit load into the wrong place
Trying to restore IRIS ODB was a nasty surprise, old save files are in .odb format and odbedit "load xxx.odb" 
does an unexpected thing.

mkdir tmp
cd tmp
load odb.xml loads odb.xml into the current directory "tmp"
load odb.json same thing
load odb.odb loads into "/" unexpectedly overwriting everything in my ODB with old data

this makes it impossible for me to restore just /equipment/beamline from old .odb save file (without 
overwriting all of my odb with old data).

I look inside db_paste() and it looks like this is intentional, if ODB path names in the odb save file start 
with "/" (and they do), instead of loading into the current directory it loads into "/", overwriting existing 
data.

The fix would be to ignore the leading "/", always restore into the current directory. This will make odbedit 
load consistent between all 3 odb save file formats.

Should I apply this change?

K.O.
  2771   17 May 2024 Konstantin OlchanskiBug Reportmidas alarm borked condition evaluation
> 
> And I think I know what caused the original problem in IRIS experiment, I think the list of EPICS variables got truncated from 30 to 20 and EPICS 
> values 29 and 30 used in the alarm conditions have become lost.
> 
> So the next step is to fix feepics to not truncate the list of variables (right now it is hardwired to 20 variables) and restore
> the lost variable definition from a saved odb dump.

for the record, I restored the old ODB settings from feepics, my EPICS variables now have the correct size and the alarm now works correctly.

I also updated the example feepics to read the number of EPICS variables from ODB instead of always truncating them to 20 (IRIS MIDAS had a local change 
setting number of variables to 40).

I think I will make no more changes to the alarms, leave well enough alone.

K.O.
  Draft   18 May 2024 Stefan RittBug Reportmidas alarm borked condition evaluation
For everybody using EPICS: There is now a new system called MSetPoint (Midas Set Point) to control whole beamlines via EPICS.
It's under midas/msetpoint and the documentation is here:

   https://bitbucket.org/tmidas/midas/wiki/MSetPoint

It is basically an EPICS frontend and two custom pages. The special thing is that the EPICS elements are not hardcoded in
the frontend but come from the ODB. There is even an editor for the beamline elements (second custom page). By loading different
ODB settings, one can switch easily between completely different beamlines without having to recompile the frontend. The system
can be operated standalone (all other MIDAS pages do not appear), or as a custom page in a normal midas setup. At PSI, this
system is now used as the standard editor for our beamlines.

Attached and example screen.

Stefan
  2773   18 May 2024 Stefan RittBug Reportmidas alarm borked condition evaluation
For everybody using EPICS: There is now a new system called MSetPoint (Midas Set Point) to control whole beamlines via EPICS.
It's under midas/msetpoint and the documentation is here:

   https://bitbucket.org/tmidas/midas/wiki/MSetPoint

It is basically an EPICS frontend and two custom pages. The special thing is that the EPICS elements are not hardcoded in
the frontend but come from the ODB. There is even an editor for the beamline elements (second custom page). By loading different
ODB settings, one can switch easily between completely different beamlines without having to recompile the frontend. The system
can be operated standalone (all other MIDAS pages do not appear), or as a custom page in a normal midas setup. At PSI, this
system is now used as the standard editor for our beamlines.

Attached and example screen.

Stefan
Attachment 1: MSetPoint.png
MSetPoint.png
  2774   18 May 2024 Stefan RittBug Reportodbedit load into the wrong place
Taht's strange. I always was under the impression that .odb files are loaded relatively to the current location in 
the ODB. The behaviour should not be different for different data formats, so I agree to change the .odb loading to 
behave like the .xml and .json save/load.

Stefan
ELOG V3.1.4-2e1708b5