Difference between revisions of "/Alarms ODB tree"

Revision as of 21:43, 15 February 2014


The ODB /Alarms tree contains user and system information related to alarms.

When the ODB is created, two Classes of alarm are created:

  • Alarm
    • Demo ODB
    • Demo Periodic
  • Warning

By default, the alarm system is NOT active.

Currently, the overall alarm is checked once every minute. Once the alarm has been triggered, the message associated with the alarm can be repeated at a different rate. The Alarms structure is split into 2 sections:

  • "Alarms", which define the condition to be tested. The user can create as many Alarms as desired, but each must be one of the four defined Alarm Types.
  • "Classes", which define the action to be taken when the alarm occurs. Two Classes (Alarm and Warning) are defined by default. The user can add more Classes as desired.

The four available Alarm Types are shown in the following table. They are defined in midas.h.

Defined Alarm Types

 Alarm Type        Constant      INT value  Explanation
 Internal alarms   AT_INTERNAL   1          Triggered on internal (program) alarm setting through the use of the al_...() functions.
 Program alarms    AT_PROGRAM    2          Triggered on condition of the state of the defined task.
 Evaluated         AT_EVALUATED  3          Triggered by ODB value on given arithmetical condition.
 Periodic alarms   AT_PERIODIC   4          Triggered by timeout condition defined in the alarm setting.
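The Type key shown in the odbedit listings further down holds one of these integer values. As a minimal sketch (in Python, not MIDAS code), the mapping can be written out as follows. Only the constant names and values come from the table; the class name `AlarmType` and the helper `describe_type` are illustrative.

```python
from enum import IntEnum


class AlarmType(IntEnum):
    """Alarm type codes as listed in the table (class name is illustrative)."""
    AT_INTERNAL = 1
    AT_PROGRAM = 2
    AT_EVALUATED = 3
    AT_PERIODIC = 4


def describe_type(value: int) -> str:
    """Return the symbolic name for a Type value read from the ODB."""
    return AlarmType(value).name
```

For example, `describe_type(3)` gives `AT_EVALUATED`, which is the Type value of the Demo ODB alarm.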

In order to make the system flexible, each alarm class may perform different actions when an alarm is triggered. For example, it may write a system message, write to the elog, stop the run or spawn a detached script listed in the ODB variable /Alarms/Classes/<class_name>/Execute command. This feature is used when an Alarm triggers Email or SMS alerts.

Evaluated Alarm conditions

The alarm condition for evaluated alarms is entered into the ODB key /Alarms/Alarms/<alarm_name>/Condition where <alarm_name> is the name of the alarm. See condition key.

The condition may be simply a comparison between any ODB variable and a threshold parameter, e.g.

/Runinfo/Run number > 100

or it may be an evaluated condition. One can write conditions like

 /Equipment/HV/Variables/Input[*] < 100

or

 /Equipment/HV/Variables/Input[2-3] < 100

to check all values from an array or a certain range. If any array element fulfills the alarm condition, the alarm is triggered. In addition, bit-wise alarm conditions are possible, e.g.

 /Equipment/Environment/Variables/Input[0] & 8

The alarm is triggered if the bit with value 8 (bit 3, counting from 0) is set in Input[0].
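The semantics of the three condition forms above can be illustrated with a short Python sketch. This is not how MIDAS parses or evaluates conditions; the function names and sample values are made up, and treating `[2-3]` as an inclusive range is an assumption. Note that the mask 8 equals 1 << 3, so it selects bit 3 when counting from zero.

```python
def any_element_matches(values, predicate):
    """True if at least one element satisfies the predicate."""
    return any(predicate(v) for v in values)


def inclusive_range(values, first, last):
    """Elements first..last, both ends included (assumed meaning of [2-3])."""
    return values[first:last + 1]


def mask_is_set(value, mask):
    """True if any bit selected by mask is set in value."""
    return (value & mask) != 0


hv_input = [120, 150, 95, 130]  # made-up readings for Input[0..3]

# Like "Input[*] < 100": one low element is enough to trigger.
whole_array = any_element_matches(hv_input, lambda v: v < 100)

# Like "Input[2-3] < 100": only elements 2 and 3 are tested.
sub_range = any_element_matches(inclusive_range(hv_input, 2, 3), lambda v: v < 100)

# Like "Input[0] & 8": 0b1010 has the bit with value 8 set.
status_bit = mask_is_set(0b1010, 8)
```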

Meaning of the keys in the /Alarms ODB tree

Keys in the ODB tree /Alarms

 ODB Key                         Type    Explanation
 Alarms                          DIR
   Alarm system active           BOOL    If set to "y", the alarm system is active. Set to "n" to deactivate.
   Alarms                        DIR     Sub-tree defining each individual alarm condition.
     Demo odb                    DIR     Name of one of the defined alarms
       Active                    BOOL    If set to "y", this particular alarm is active.
       Triggered                 INT     If non-zero, alarm is triggered. Filled by System.
       Type                      INT     One of the listed Alarm Types
       Check interval            INT     Frequency in seconds that alarm condition is checked
       Checked last              DWORD   Written by Alarm System
       Time triggered first      STRING  Written by Alarm System
       Time triggered last       STRING  Written by Alarm System
       Condition                 STRING  Condition on which alarm should trigger.
       Alarm class               STRING  Set to one of the existing Alarm classes, e.g. Alarm, Warning
       Alarm message             STRING  Message to be written when alarm triggers
   Classes                       DIR     Sub-tree defining each individual action to be performed by a pre-defined and requested alarm.
     Warning                     DIR     Name of one of the defined classes
       Write System Message      BOOL    If set to "y", a message will be sent to the System log when alarm is triggered.
       Write Elog Message        BOOL    If set to "y", a message will be written to the Elog when alarm is triggered
       System message interval   INT     Interval in seconds between successive system messages when alarm is triggered
       System message last       DWORD   Filled by System...
       Execute command           STRING  Command to be executed when alarm is triggered.
       Execute last              DWORD
       Stop run                  BOOL
       Display BGColor           STRING  Background colour of alarm banner (mhttpd only).
       Display FGColor           STRING  Foreground colour of alarm banner (mhttpd only).


Examples of an /Alarms tree

Part of the /Alarms tree is shown below using odbedit (see also mhttpd Alarm page).

 [local:pol:S]/>cd /alarms
 [local:pol:S]/Alarms>ls
 Alarm system active            y
 Alarms
 Classes

Some of the types of alarm under the /Alarms/Alarms tree for an experiment are shown below:

 [local:pol:S]/Alarms>ls -r -lt
Key name                       Type    #Val  Size  Last Opn Mode Value
---------------------------------------------------------------------------
Alarms                         DIR
   Alarm system active         BOOL    1     4     4h   0   RWD  y
   Alarms                      DIR
       Demo ODB                DIR
           Active              BOOL    1     4     >99d 0   RWD  n
           Triggered           INT     1     4     >99d 0   RWD  0
           Type                INT     1     4     >99d 0   RWD  3
           Check interval      INT     1     4     >99d 0   RWD  60
           Checked last        DWORD   1     4     >99d 0   RWD  0
           Time triggered firstSTRING  1     32    >99d 0   RWD
           Time triggered last STRING  1     32    >99d 0   RWD
           Condition           STRING  1     256   >99d 0   RWD  /Runinfo/Run number > 100
           Alarm Class         STRING  1     32    >99d 0   RWD  Alarm
           Alarm Message       STRING  1     80    >99d 0   RWD  Run number became too large
       Demo periodic           DIR
           Active              BOOL    1     4     >99d 0   RWD  n
           Triggered           INT     1     4     >99d 0   RWD  0
           Type                INT     1     4     >99d 0   RWD  4
           Check interval      INT     1     4     >99d 0   RWD  28800
           Checked last        DWORD   1     4     >99d 0   RWD  1058817867
           Time triggered firstSTRING  1     32    >99d 0   RWD
           Time triggered last STRING  1     32    >99d 0   RWD
           Condition           STRING  1     256   >99d 0   RWD
           Alarm Class         STRING  1     32    >99d 0   RWD  Warning
           Alarm Message       STRING  1     80    >99d 0   RWD  Please do your shift checks
       fePOL                   DIR
           Active              BOOL    1     4     19s  0   RWD  y
           Triggered           INT     1     4     19s  0   RWD  205
           Type                INT     1     4     3s   0   RWD  2
           Check interval      INT     1     4     19s  0   RWD  60
           Checked last        DWORD   1     4     19s  0   RWD  1259196026
           Time triggered firstSTRING  1     32    19s  0   RWD  Wed Nov 25 12:59:33 2009
           Time triggered last STRING  1     32    19s  0   RWD  Wed Nov 25 16:40:26 2009
           Condition           STRING  1     256   3s   0   RWD  Program not running
           Alarm Class         STRING  1     32    19s  0   RWD  Caution
           Alarm Message       STRING  1     80    19s  0   RWD  Program fePOL is not running
       thr2 trip               DIR
           Active              BOOL    1     4     3s   0   RWD  y
           Triggered           INT     1     4     3s   0   RWD  0
           Type                INT     1     4     3s   0   RWD  3
           Check interval      INT     1     4     3s   0   RWD  15
           Checked last        DWORD   1     4     3s   0   RWD  1259196042
           Time triggered firstSTRING  1     32    3s   0   RWD
           Time triggered last STRING  1     32    3s   0   RWD
           Condition           STRING  1     256   3s   0   RWD  /Equipment/Info ODB/Variables/last failed thr test = 2
           Alarm Class         STRING  1     32    3s   0   RWD  Threshold
           Alarm Message       STRING  1     80    3s   0   RWD  Laser threshold check failed

In the above example,

  • Demo ODB and Demo periodic were created when the ODB was created.
  • The alarm fePOL was added automatically when the user filled the alarm class field in the /Programs/fePOL sub-tree.
  • The other alarm, thr2 trip, was added by the user.

Four Classes of alarms (Alarm, Caution, Warning and Threshold) are defined under the /Alarms/Classes tree for this experiment. Alarm and Warning were created when the ODB was created. The user added two more classes, Caution and Threshold, by copying and editing one of the existing classes. The Classes defined for the experiment are shown below:

   Classes                     DIR
       Alarm                   DIR
           Write system messageBOOL    1     4     27h  0   RWD  y
           Write Elog message  BOOL    1     4     27h  0   RWD  n
           System message interINT     1     4     27h  0   RWD  60
           System message last DWORD   1     4     27h  0   RWD  0
           Execute command     STRING  1     256   27h  0   RWD
           Execute interval    INT     1     4     27h  0   RWD  0
           Execute last        DWORD   1     4     27h  0   RWD  0
           Stop run            BOOL    1     4     27h  0   RWD  n
           Display BGColor     STRING  1     32    27h  0   RWD  red
           Display FGColor     STRING  1     32    27h  0   RWD  black
       Warning                 DIR
           Write system messageBOOL    1     4     >99d 0   RWD  y
           Write Elog message  BOOL    1     4     >99d 0   RWD  n
           System message interINT     1     4     >99d 0   RWD  60
           System message last DWORD   1     4     >99d 0   RWD  0
           Execute command     STRING  1     256   >99d 0   RWD
           Execute interval    INT     1     4     >99d 0   RWD  0
           Execute last        DWORD   1     4     >99d 0   RWD  0
           Stop run            BOOL    1     4     >99d 0   RWD  n
           Display BGColor     STRING  1     32    >99d 0   RWD  red
           Display FGColor     STRING  1     32    >99d 0   RWD  black
       Caution                 DIR
           Write system messageBOOL    1     4     19s  0   RWD  y
           Write Elog message  BOOL    1     4     19s  0   RWD  n
           System message interINT     1     4     19s  0   RWD  60
           System message last DWORD   1     4     19s  0   RWD  1259196026
           Execute command     STRING  1     256   19s  0   RWD
           Execute interval    INT     1     4     19s  0   RWD  0
           Execute last        DWORD   1     4     19s  0   RWD  0
           Stop run            BOOL    1     4     19s  0   RWD  y
           Display BGColor     STRING  1     32    19s  0   RWD  blue
           Display FGColor     STRING  1     32    19s  0   RWD  red
       Threshold               DIR
           Write system messageBOOL    1     4     >99d 0   RWD  n
           Write Elog message  BOOL    1     4     >99d 0   RWD  n
           System message interINT     1     4     >99d 0   RWD  60
           System message last DWORD   1     4     >99d 0   RWD  0
           Execute command     STRING  1     256   >99d 0   RWD
           Execute interval    INT     1     4     >99d 0   RWD  0
           Execute last        DWORD   1     4     >99d 0   RWD  0
           Stop run            BOOL    1     4     >99d 0   RWD  n
           Display BGColor     STRING  1     32    >99d 0   RWD  yellow
           Display FGColor     STRING  1     32    >99d 0   RWD  black


Alarm triggers Email or SMS alerts

It is also possible to have the MIDAS alarm system send email or SMS alerts to cell phones when alarms are triggered. This can be configured by defining an ODB alarm on a critical ODB parameter, e.g.

 /Alarms/Alarms/Liquid Level
 Active                  y
 Triggered               0 (0x0)
 Type                    3 (0x3)
 Check interval          60 (0x3C)
 Checked last            1227690148 (0x492D10A4)
 Time triggered first    (empty)
 Time triggered last     (empty)
 Condition               /Equipment/Environment/Variables/Input[0] < 10
 Alarm Class             Level Alarm
 Alarm Message           Liquid Level is only %s

In this example, the alarm triggers an alarm of class "Level Alarm". This alarm class is defined as follows:

 /Alarms/Classes/Level Alarm
 Write system message    y
 Write Elog message      n
 System message interval 600 (0x258)
 System message last     0 (0x0)
 Execute command         /home/midas/level_alarm '%s'
 Execute interval        1800 (0x708)
 Execute last            0 (0x0)
 Stop run                n
 Display BGColor         red
 Display FGColor         black

The key here is to call a script "level_alarm", which can send emails. Use something like:

 #!/bin/csh
 echo "$1" | mail -s "Level Alarm" your.name@domain.edu
 odbedit -c 'msg 2 level_alarm "Alarm was sent to your.name@domain.edu"'

The second command just generates a MIDAS system message for confirmation. Most cell phones (depending on the provider) have an email address. If you send an email there, it will be translated into an SMS message.

The script file above can of course be more complicated. A perl script could be used that parses an address list, so other interested parties can register by adding their email addresses to that list. The script may also collect some other slow control variables (like pressure, temperature) and combine them into the SMS message.
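A more elaborate script of this kind might look roughly like the following Python sketch (Python is used here instead of perl purely for illustration). The address-list format (one address per line, `#` starting a comment), the function names and the sample variables are all assumptions. The sketch only builds the message; actually sending it (for example through a local mail command or SMTP server) is left out.

```python
from email.message import EmailMessage


def parse_address_list(text):
    """Return the addresses found in a list with one entry per line.

    Blank lines and anything after '#' are ignored (assumed file format).
    """
    addresses = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            addresses.append(line)
    return addresses


def build_alert(alarm_text, recipients, extra=None):
    """Build an alert message; extra maps variable names to current values."""
    msg = EmailMessage()
    msg["Subject"] = "Level Alarm"
    msg["To"] = ", ".join(recipients)
    body = alarm_text
    for name, value in (extra or {}).items():
        body += f"\n{name}: {value}"
    msg.set_content(body)
    return msg


address_file = """
# people on shift (hypothetical entries)
your.name@domain.edu
other.name@domain.edu   # added later
"""
recipients = parse_address_list(address_file)
alert = build_alert("Liquid Level is only 8.5", recipients, {"pressure": 1.2})
```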

For very sensitive systems, having an alarm via SMS may not be sufficient, since the alarm system could be down (e.g. computer crash, network failure). In this case 'negative alarms' can be used. For example, every 30 minutes the system may send an SMS with the current parameter values. If the expected message is not received, it may indicate that something in the MIDAS system is wrong.
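On the receiving side, a 'negative alarm' amounts to checking that the periodic message is not overdue. A minimal Python sketch, assuming the 30-minute period from the example plus a grace period that is not part of the original text (the function name and the grace value are hypothetical):

```python
def heartbeat_missing(last_received, now, period_s=1800.0, grace_s=300.0):
    """True if the expected periodic message is overdue.

    last_received and now are times in seconds; period_s is the expected
    interval between messages (30 minutes here) and grace_s is an assumed
    tolerance for delivery delays.
    """
    return (now - last_received) > (period_s + grace_s)
```

With these values, a message last seen 20 minutes ago is fine, while 40 minutes of silence would be flagged.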

Implementation of the MIDAS Alarm System

Alarms are checked inside alarm.c::al_check(). This function is called by cm_yield() every 10 seconds and by rpc_server_thread(), also every 10 seconds. For remote MIDAS clients, their al_check() issues an RPC_AL_CHECK RPC call into the MIDAS server utility mserver, where rpc_server_dispatch() calls the local al_check(). As a result, all alarm checks run inside a process directly attached to the local MIDAS shared memory (inside a local client or inside an mserver process for a remote client). Each and every MIDAS client runs the alarm checks. To prevent race conditions between different MIDAS clients, access to al_check() is serialized using the ALARM semaphore. Inside al_check(), alarms are triggered using al_trigger_alarm(), which in turn calls al_trigger_class(). Inside al_trigger_class(), the alarm is recorded into an elog or into midas.log using cm_msg(MTALK).

Special note should be made of the ODB setting "/Alarms/Classes/xxx/System message interval", which has a surprising effect: after an alarm is recorded into system messages (using cm_msg(MTALK)), no record is made of any subsequent alarms until the time interval set by this variable elapses. With the default value of 60 seconds, after one alarm, no more alarms are recorded for 60 seconds. Also, because all the alarms are checked at the same time, only the first triggered alarm will be recorded.

As of alarm.c rev 4683, setting "System message interval" to 0 ensures that every alarm is recorded into the MIDAS log file. (In previous revisions, some alarms may still be missed with this setting.)
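The suppression effect of "System message interval" can be illustrated with a small Python simulation. This is a sketch of the behaviour described above, not the code in alarm.c; the function names and the trigger times are made up.

```python
def should_write_message(now, last_message_time, interval_s):
    """True if a system message may be written at time now."""
    if interval_s <= 0:
        # Interval 0: every alarm is recorded.
        return True
    return (now - last_message_time) >= interval_s


def simulate(trigger_times, interval_s):
    """Return the trigger times that would actually be recorded."""
    last = 0  # like "System message last" before any message was written
    recorded = []
    for t in trigger_times:
        if should_write_message(t, last, interval_s):
            recorded.append(t)
            last = t
    return recorded


with_default = simulate([1000, 1010, 1020, 1070], 60)
with_zero = simulate([1000, 1010, 1020, 1070], 0)
```

With the 60 second interval only the alarms at 1000 and 1070 are recorded; with the interval set to 0 all four are.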

There are 3 types of alarms:

1) "program not running" alarms.

These alarms are enabled in ODB by setting /Programs/ppp/Alarm class. Each time al_check() runs, every program listed in /Programs is tested using "cm_exist()" and if the program is not running, the time of first failure is remembered in /Programs/ppp/First failed.

If the program has not been running for longer than the time set in ODB key /Programs/ppp/Check interval, an alarm is triggered (if enabled by /Programs/ppp/Alarm class) and the program is restarted (if enabled by /Programs/ppp/Auto restart).

The "not running" condition is tested every 10 seconds (each time al_check() is called), but the frequency of "program not running" alarms can be reduced by increasing the value of /Alarms/Alarms/ppp/Check interval (default value 60 seconds). This can be useful if System message interval is set to zero.
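The bookkeeping described for "program not running" alarms can be sketched as follows. This is a Python illustration of the description above, not the actual al_check() code. The function name is hypothetical, resetting the first-failure time to 0 once the program runs again is an assumption, and the check interval is taken to be in the same units as the timestamps.

```python
def program_alarm_state(running, first_failed, now, check_interval):
    """Return (new_first_failed, trigger) for one pass of the check.

    first_failed is 0 while no failure is being tracked, mirroring the
    role of /Programs/ppp/First failed.
    """
    if running:
        return 0, False           # assumed: tracking is cleared
    if first_failed == 0:
        return now, False         # remember the time of first failure
    overdue = (now - first_failed) > check_interval
    return first_failed, overdue
```

Called every 10 seconds, this triggers only after the program has been missing for longer than the check interval, not on the first failed test.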

2) "evaluated" alarms

3) "periodic" alarms

There is nothing surprising in these alarms. Each alarm is checked with a time period set by /Alarms/Alarms/xxx/Check interval. The value of an evaluated alarm is computed using al_evaluate_condition().
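The per-alarm timing can be pictured with a one-line test on the Checked last and Check interval keys. This Python sketch is only an illustration, and the function name is hypothetical. The numbers are taken from the Demo periodic alarm listed earlier, whose Check interval of 28800 seconds corresponds to 8 hours.

```python
def check_due(now, checked_last, check_interval_s):
    """True once check_interval_s seconds have passed since checked_last."""
    return (now - checked_last) >= check_interval_s


checked_last = 1058817867                                         # from the Demo periodic listing
due_after_8h = check_due(checked_last + 28800, checked_last, 28800)
due_too_early = check_due(checked_last + 28799, checked_last, 28800)
```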