Difference between revisions of "Alarm System"

From MidasWiki

Revision as of 15:52, 12 August 2014


Links

Introduction

MIDAS provides an alarm system, which is turned off by default. When the alarm system is activated and an alarm condition is detected, the system sends alarm messages, which appear as an alarm banner on the mhttpd status page and as a message in any window running an odbedit client. The alarm system is flexible and can be extensively customized for each experiment using the mhttpd Alarms Page or odbedit.

The alarm system is built in and is part of the main experiment scheduler, so no separate task is needed to benefit from it. Its setup and activation are done through the /Alarms ODB tree. The alarm system includes several other features such as sequencing and control of the experiment. The alarm capabilities are:

  • Alarm setting on any ODB variable against a threshold parameter.
  • Alarm triggered by evaluated condition
  • Selection of Alarm check frequency
  • Selection of Alarm trigger frequency
  • Customizable alarm scheme; under this scheme multiple choices of alarm type can be selected
  • Selection of alarm message destination (to system message log or to elog)
  • email or SMS alerts can be sent
  • Program control on run transition


Implementation of the MIDAS Alarm System

The alarm system source code is alarm.c. Alarms are checked inside alarm.c::al_check(). This function is called by cm_yield() every 10 seconds and by rpc_server_thread(), also every 10 seconds. For remote MIDAS clients, their al_check() issues an RPC_AL_CHECK RPC call into the MIDAS server utility mserver, where rpc_server_dispatch() calls the local al_check(). As a result, all alarm checks run inside a process directly attached to the local MIDAS shared memory (inside a local client, or inside an mserver process for a remote client). Every MIDAS client runs the alarm checks. To prevent race conditions between different MIDAS clients, access to al_check() is serialized using the ALARM semaphore. Inside al_check(), alarms are triggered using al_trigger_alarm(), which in turn calls al_trigger_class(). Inside al_trigger_class(), the alarm is recorded into an elog or into midas.log using cm_msg(MTALK).

Special note should be made of the ODB setting "System message interval", which has a surprising effect: after an alarm is recorded into system messages (using cm_msg(MTALK)), no record is made of any subsequent alarms until the time interval set by this variable elapses. With the default value of 60 seconds, after one alarm, no more alarms are recorded for 60 seconds. Also, because all the alarms are checked at the same time, only the first triggered alarm will be recorded.

As of alarm.c rev 4683, setting "System message interval" to 0 ensures that every alarm is recorded into the MIDAS log file. (In previous revisions, some alarms may still be missed with this setting.)
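
The effect of "System message interval" described above can be illustrated with a small stand-alone simulation. This is only a sketch of the behaviour as described on this page, not the code in alarm.c: the struct `alarm_class_sim` and the functions `record_alarm()` and `count_recorded()` are hypothetical names invented for the illustration, and times are plain seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state of one alarm class (not a MIDAS type). */
typedef struct {
    long message_interval_s; /* models "System message interval", in seconds */
    long last_message_s;     /* time at which the last message was recorded */
    bool has_logged;         /* false until the first message is recorded */
} alarm_class_sim;

/* Returns true if an alarm raised at time now_s would be recorded,
 * false if it falls inside the quiet period after the previous message. */
bool record_alarm(alarm_class_sim *cls, long now_s)
{
    assert(cls != NULL && cls->message_interval_s >= 0);
    if (cls->has_logged &&
        now_s - cls->last_message_s < cls->message_interval_s) {
        return false; /* suppressed: the interval has not elapsed yet */
    }
    cls->has_logged = true;
    cls->last_message_s = now_s;
    return true;
}

/* Counts how many of the n alarms raised at times_s[0..n-1] get recorded. */
int count_recorded(alarm_class_sim *cls, const long *times_s, int n)
{
    int recorded = 0;
    for (int i = 0; i < n; i++) {
        if (record_alarm(cls, times_s[i])) {
            recorded++;
        }
    }
    return recorded;
}
```

In this model, alarms raised at 0, 10, 20 and 70 seconds with an interval of 60 produce two recorded messages (at 0 and at 70), while an interval of 0 records all four. Two alarms raised in the same check (same timestamp) with a non-zero interval produce only one message, which matches the "only the first triggered alarm will be recorded" behaviour.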

There are four types of alarm:

1) "program not running" alarms.

These alarms are controlled through the /Programs ODB tree rather than the /Alarms ODB tree, except for the alarm class, which is defined in the /Alarms ODB tree. All the ODB keys mentioned in this section are in the /Programs/<client-name> subtree unless otherwise specified.

These alarms are enabled by setting the ODB key Alarm class to a valid Alarm class specified in the /Alarms ODB tree. Each time al_check() runs, every client listed in the /Programs ODB tree is tested using "cm_exist()" and if the client is not running, the time of first failure is recorded in the ODB key First failed.

If the client has not been running for longer than the time set in ODB key Check interval, an alarm is triggered (if enabled by Alarm class) and the program is restarted (if enabled by Auto restart).

The "not running" condition is tested every 10 seconds (each time al_check() is called), but the frequency of "program not running" alarms can be reduced by increasing the value of Check interval (default value 60 seconds). This can be useful if System message interval in the specified alarm class subtree is set to zero.

2) "evaluated alarms" are programmed through the /Alarms ODB tree. See evaluated alarms.

3) "periodic alarms" are programmed through the /Alarms ODB tree. See "periodic" alarms.

4) "internal" alarms are programmed through the /Alarms ODB tree.

These are triggered in a program using a call to al_trigger_alarm().

There is nothing surprising in these alarms. Except for "program not running" alarms (see above), each alarm is checked with a time period set by ODB key Check interval in the /Alarms ODB tree. The value of an evaluated alarm is computed using al_evaluate_condition().
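
The "program not running" logic described under 1) above can be sketched as a small stand-alone state machine. This is a simplified model under stated assumptions, not the code in al_check(): `program_sim`, `check_program()` and the `ACTION_*` values are hypothetical names, "Check interval" is taken in seconds as on this page, and clearing the failure time once the client is seen running again is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one /Programs/<client-name> subtree (not a MIDAS type). */
typedef struct {
    bool alarm_class_set;  /* models a valid "Alarm class" being set */
    bool auto_restart;     /* models "Auto restart" */
    long check_interval_s; /* models "Check interval", in seconds */
    bool failing;          /* true while the client is known not to run */
    long first_failed_s;   /* models "First failed" */
} program_sim;

/* Bit flags describing what one check decides to do. */
enum { ACTION_NONE = 0, ACTION_ALARM = 1, ACTION_RESTART = 2 };

/* One pass of the check for a single client at time now_s.
 * `running` plays the role of the cm_exist() result. */
int check_program(program_sim *p, bool running, long now_s)
{
    assert(p != NULL);
    if (running) {
        p->failing = false; /* assumption: failure time is cleared */
        return ACTION_NONE;
    }
    if (!p->failing) {
        p->failing = true;  /* remember the time of first failure */
        p->first_failed_s = now_s;
    }
    if (now_s - p->first_failed_s <= p->check_interval_s) {
        return ACTION_NONE; /* not down for longer than the interval yet */
    }
    int actions = ACTION_NONE;
    if (p->alarm_class_set) {
        actions |= ACTION_ALARM;
    }
    if (p->auto_restart) {
        actions |= ACTION_RESTART;
    }
    return actions;
}
```

With a check interval of 60 and the client first seen missing at 100 seconds, checks at 110 and 160 do nothing, and the check at 161 returns `ACTION_ALARM` (plus `ACTION_RESTART` if auto restart is enabled). Raising the interval therefore delays the alarm without changing how often the running state is tested.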

Setting up the Alarm system

See the /Alarms ODB tree for details and Alarms tree structure. See also Examples.