Back Midas Rome Roody Rootana
  Midas DAQ System, Page 111 of 137  Not logged in ELOG logo
IDdown Date Author Topic Subject
  534   27 Nov 2008 Konstantin OlchanskiInfolazylogger updated
lazylogger was updated to improve handling of the list of runs still on disk
(odb /Lazy/xxx/List).

Previously, each and every run was listed in the List arrays. With modern
Terabyte-sized data disks, many many days worth of runs tend to remain on disk
and these List arrays were getting too big, inflating the size of ODB dumps
written by mlogger into the output data file and slowing down starting and
stopping of runs considerably.

Now, the runs are listed as ranges of "first run" - "last run", (see example below).

This significantly reduces the size of the "List" arrays and makes lazylogger
usable for the ALPHA experiment at CERN and for T2K/ND280 prototype DAQ at
TRIUMF (writing to Castor and Dcache respectively, using the newly added
"Script" method).

The new List format is fully compatible with the old format and you can update
and run the new lazylogger without changing anything in ODB. New runs will be
added to the List arrays in the new format and data in the old format will
eventually go away as old runs are removed from disk.

svn revision 4394.
K.O.

Example: this reads like this:
range from 7100 to 7154
range from 7157 to 7161 (7155-7156 are missing)
range from 7163 to 7168 (7162 is missing)
runs 7170, 7173, 7176
range from 7179 to 7182
and so forth.

ODB /Lazy/Dcache/List
007100
[0] 7100 (0x1BBC)
[1] -7154 (0xFFFFE40E)
[2] 7157 (0x1BF5)
[3] -7161 (0xFFFFE407)
[4] 7163 (0x1BFB)
[5] -7168 (0xFFFFE400)
[6] 7170 (0x1C02)
[7] 7173 (0x1C05)
[8] 7176 (0x1C08)
[9] 7179 (0x1C0B)
[10] -7182 (0xFFFFE3F2)
[11] 7184 (0x1C10)
[12] 7188 (0x1C14)
[13] -7199 (0xFFFFE3E1)
007200
[0] 7200 (0x1C20)
[1] -7225 (0xFFFFE3C7)
  533   27 Nov 2008 Konstantin OlchanskiBug Reportlazylogger complains about zero-size files
I now have a better understanding of this: lazylogger uses ss_file_size() to find
out if a file exists or not. This function used to return 0 (probably) for
non-existant files (there was no check for error status from stat() system call,
so the return value for non-existant files was never well defined).

With ss_file_size() returning 0 for nonexistant files, 0-size files clearly cause
problems to lazylogger.

Now, since svn revision 4397, ss_file_size() returns -1 for non-existant files,
but lazylogger still needs to be tought about this.

The problem "lazylogger does not like 0-size files" remains for now.

K.O.


> With latest midas, I see this:
> 
> Thu Oct 14 19:31:17 2004 [Lazy_Tape] [lazylogger.c:1717:Lazy] lazy_file_exists
> file run17567.ybs doesn't exists
> Thu Oct 14 19:31:27 2004 [Lazy_Tape] [lazylogger.c:1717:Lazy] lazy_file_exists
> file run17567.ybs doesn't exists
> 
> The file run17567.ybs has size zero:
> 
> -rw-r--r--    1 twistonl users      950272 Oct 13 19:29
> /twist/data_onl/current/run17565.ybs
> -rw-r--r--    1 twistonl users      950272 Oct 13 19:45
> /twist/data_onl/current/run17566.ybs
> -rw-r--r--    1 twistonl users           0 Oct 13 20:00
> /twist/data_onl/current/run17567.ybs
> -rw-r--r--    1 twistonl users      983040 Oct 13 20:03
> /twist/data_onl/current/run17568.ybs
> -rw-r--r--    1 twistonl users      950272 Oct 13 20:26
> /twist/data_onl/current/run17569.ybs
> 
> I am not sure how to fix this lazylogger logic. Please help.
> 
> K.O.
  532   27 Nov 2008 Konstantin OlchanskiBug FixFix ss_file_size() on 32-bit Linux
It turns out that on 32-bit Linux, ss_file_size() returns the wrong answer for
files bigger than 2 GB (4GB?). The Linux stat() system call returns an error
(which is ignored) and bogus file size data (returned to the caller).

On 64-bit Linux (compiled with -m64), stat() appears to return correct data.

Related functions ss_disk_size() and ss_disk_free() return correct answers on
both 32-bit and 64-bit Linux (biggest disk I tried was 5.5 TB).

I now fixed this problem by using the stat64() system call for "#ifdef OS_LINUX".

I also changed ss_file_size(), ss_disk_size() and ss_disk_free() to return -1 if
the system call returns an error. I also added a test program
utils/test_ss_file_size.c.

svn revision 4397.
K.O.
  531   26 Nov 2008 Stefan RittInfoSend email alert in alarm system
> We have a temperature/humidity sensor in MIDAS now and will add a liquid level 
> sensor to MIDAS soon. We want the operators to get alerted ASAP when the 
> laboratory environment or the liquid level reached some critical levels. Can 
> MIDAS send email alerts or SMS alerts to cell phones when the alarms are 
> triggered? If yes, how can I config it?

Sure that's possible, that's why MIDAS contains an alarm system. To use it, define 
an ODB alarm on your liquid level, like

/Alarms/Alarms/Liquid Level
Active	                 y
Triggered	         0 (0x0)
Type	                 3 (0x3)
Check interval	        60 (0x3C)
Checked last	1227690148 (0x492D10A4)
Time triggered first	(empty)
Time triggered last	(empty)
Condition	        /Equipment/Environment/Variables/Input[0] < 10
Alarm Class	        Level Alarm
Alarm Message	        Liquid Level is only %s

The Condition if course might be different in your case, just select the correct 
variable from your equipment. In this case, the alarm triggers an alarm of class 
"Level Alarm". Now you define this alarm class:

/Alarms/Classes/Level Alarm
Write system message	y
Write Elog message	n
System message interval	600 (0x258)
System message last	0 (0x0)
Execute command	        /home/midas/level_alarm '%s'
Execute interval	1800 (0x708)
Execute last	        0 (0x0)
Stop run	        n
Display BGColor	        red
Display FGColor	        black

The key here is to call a script "level_alarm", which can send emails. Use 
something like:

#/bin/csh
echo $1 | mail -s \"Level Alarm\" your.name@domain.edu
odbedit -c 'msg 2 level_alarm \"Alarm was sent to your.name@domain.edu\"'

The second command just generates a midas system message for confirmation. Most 
cell phones (depends on the provider) have an email address. If you send an email 
there, it gets translated into a SMS message.

The script file above can of course be more complicated. We use a perl script 
which parses an address list, so everyone can register by adding his/her email 
address to that list. The script collects also some other slow control variables 
(like pressure, temperature) and combines this into the SMS message.

For very sensitive systems, having an alarm via SMS is not everything, since the 
alarm system could be down (computer crash or whatever). In this case we use 
'negative alarms' or however you might call it. The system sends every 30 minutes 
an SMS with the current levels etc. If the SMS is missing for some time, it might 
be an indication that something in the midas system is wrong and one can go there 
and investigate.
  530   26 Nov 2008 Jimmy NgaiInfoSend email alert in alarm system
Dear All,

We have a temperature/humidity sensor in MIDAS now and will add a liquid level 
sensor to MIDAS soon. We want the operators to get alerted ASAP when the 
laboratory environment or the liquid level reached some critical levels. Can 
MIDAS send email alerts or SMS alerts to cell phones when the alarms are 
triggered? If yes, how can I config it?

Many thanks!

Best Regards,
Jimmy
  529   20 Nov 2008 Stefan RittInfoRecommended platform for running MIDAS
> Dear All,
> 
> Is there any recommended platforms for running MIDAS? Have anyone encountered 
> problems when running MIDAS on Scientific Linux?
> 
> Thanks.
> 
> Jimmy

I run MIDAS on scientific Linux 5.1 without any problem.
  528   20 Nov 2008 Jimmy NgaiInfoRecommended platform for running MIDAS
Dear All,

Is there any recommended platforms for running MIDAS? Have anyone encountered 
problems when running MIDAS on Scientific Linux?

Thanks.

Jimmy
  527   09 Nov 2008 Stefan RittBug Reportcustom web pages: customscript buttons and start/stop buttons generate errors
> Thanks Stefan. 
> Your fix works nicely with the start/stop buttons not returning to the same or to a
> different web page.
> 
> However, it does not seem to have fixed the problem with the Customscript button. It does
> not seem to pick up the redirect, nor do the Pause/Resume buttons (which are programmed to
> appear when the run starts).

That has been fixed in rev. 4377
  526   09 Nov 2008 Stefan RittSuggestion<odb ... edit=1> buttons and javascript
> When writing custom webpages, it would be nice to be able to write code such as
> 
> <odb src="/Equipment/TITAN_ACQ/ppg cycle/trans3/time offset (ms)" edit=1>
> 
> from Javascript, e.g.
> <script  type="text/javascript">
> if ( flag != 3)
>    document.write('<odb src="/Equipment/TITAN_ACQ/ppg cycle/trans3/time offset
> (ms)" edit=1>ms');
> else
>    document.write('<odb src="/Equipment/TITAN_ACQ/ppg cycle/trans4/time offset
> (ms)" edit=1>ms');
> </script>
> 
> This is not translated correctly by mhttpd; the final quote and bracket get
> stripped off, and it gives Javascript error
> 
>  Error: unterminated string literal
> Source File: http://titan04:8089/CS/ppg_cycle?cmd=Edit&index=11
> Line: 477, Column: 18
> Source Code:
>    document.write('<input type=text size=10 maxlength=80 name=value value="1">
> 
> I can get round this by using an input box and a combination of ODBGet and
> ODBSet, but it would be easier if the edit=1 form above worked correctly, or
> there was a command like ODBSet that would accept input from the user.
> 
> Thanks.
> 
>  would be nice is there was a command such as ODBGet or ODBSet that would work
> with javascript to 

Actually that won't work, even if I would fix it. The <odb> tag is evaluated on the
server side (mhttpd), where is gets replaced by the actual ODB value. But if you
use JavaScript to generate the <odb> tag dynamically, this only happens on the
client side, so the server has no chance to substitute them. So you have to go with
ODBGet's I'm afraid. Nevertheless, I changed the code such that any ODB tags inside
a JavaScript is not interpreted by mhttpd.
  525   09 Nov 2008 Stefan RittBug Reportbool values in "/custom/images/my_image.gif/labels/src" seem to lose their format string
> Not sure if this is a bug or a feature:
> 
> Writing a boolean label on an image seems to produce rather strange behaviour.
> 
> For example,
> odb>ls /Equipment/gas/settings/my_bool -lt
> Key name                        Type    #Val  Size  Last Opn Mode Value
> ---------------------------------------------------------------------------
> my_bool                         BOOL    1     4     14m  0   RWD  y
> 
> 
> odb>cd /custom/images/my_image.gif/labels
> odb>ls
> Src                             /Equipment/gas/settings/my_bool
> Format                          val: %d (bool)
> Font                            Medium
> X                               10
> Y                               10
> Align                           0
> FGColor                         FFFFFF
> BGColor                         FF8800
> 
> Instead of the expected string "val: y (bool)", only the value of the key
> appears, i.e. "y". 
> The behaviour is the same whether I use %d, %u, %s, %c etc as the format character. 

That has been fixed in rev. 4379
  524   06 Nov 2008 Konstantin OlchanskiInfomidas elog outage
Around Wednesday Noon, there was a power outage at triumf (loss of ups power in the triumf 
computing center) and after rebooting ladd00, https/ssl access stopped working with a complaint 
about mismatching server name and ssl certificate name. This configuration used to work, so one of the 
system updated must have broke it. This problem is now fixed and access to midas elog is restored. 
K.O.
  523   04 Nov 2008 Suannah DavielBug Reportbool values in "/custom/images/my_image.gif/labels/src" seem to lose their format string
Not sure if this is a bug or a feature:

Writing a boolean label on an image seems to produce rather strange behaviour.

For example,
odb>ls /Equipment/gas/settings/my_bool -lt
Key name                        Type    #Val  Size  Last Opn Mode Value
---------------------------------------------------------------------------
my_bool                         BOOL    1     4     14m  0   RWD  y


odb>cd /custom/images/my_image.gif/labels
odb>ls
Src                             /Equipment/gas/settings/my_bool
Format                          val: %d (bool)
Font                            Medium
X                               10
Y                               10
Align                           0
FGColor                         FFFFFF
BGColor                         FF8800

Instead of the expected string "val: y (bool)", only the value of the key
appears, i.e. "y". 
The behaviour is the same whether I use %d, %u, %s, %c etc as the format character. 
  522   04 Nov 2008 Suzannah DavielSuggestion<odb ... edit=1> buttons and javascript
When writing custom webpages, it would be nice to be able to write code such as

<odb src="/Equipment/TITAN_ACQ/ppg cycle/trans3/time offset (ms)" edit=1>

from Javascript, e.g.
<script  type="text/javascript">
if ( flag != 3)
   document.write('<odb src="/Equipment/TITAN_ACQ/ppg cycle/trans3/time offset
(ms)" edit=1>ms');
else
   document.write('<odb src="/Equipment/TITAN_ACQ/ppg cycle/trans4/time offset
(ms)" edit=1>ms');
</script>

This is not translated correctly by mhttpd; the final quote and bracket get
stripped off, and it gives Javascript error

 Error: unterminated string literal
Source File: http://titan04:8089/CS/ppg_cycle?cmd=Edit&index=11
Line: 477, Column: 18
Source Code:
   document.write('<input type=text size=10 maxlength=80 name=value value="1">

I can get round this by using an input box and a combination of ODBGet and
ODBSet, but it would be easier if the edit=1 form above worked correctly, or
there was a command like ODBSet that would accept input from the user.

Thanks.

 would be nice is there was a command such as ODBGet or ODBSet that would work
with javascript to 
  521   04 Nov 2008 Suannah DavielBug Reportcustom web pages: customscript buttons and start/stop buttons generate errors
Thanks Stefan. 
Your fix works nicely with the start/stop buttons not returning to the same or to a
different web page.

However, it does not seem to have fixed the problem with the Customscript button. It does
not seem to pick up the redirect, nor do the Pause/Resume buttons (which are programmed to
appear when the run starts).


> To fix this problem, do the following:
> 
> - Update to the current SVN revision 4368 of mhttpd.c
> - Add following tag into your custom page:
> 
>   <input type=hidden name="redir" value="name">
> 
>   where "name" is the name of your custom page which follows the CS/ in the URL. Like 
> if you have a custom page which you access through httpd://localhost/CS/junk then the 
> tag would be 
> 
>   <input type=hidden name="redir" value="junk">
> 
> The "redir" parameter is now evaluated inside mhttpd and brings you back to the proper 
> custom page. You can also define another custom page as the target, if that makes 
> sense in your application.
> 
> Pierre: Would be nice to document this somewhere more officially.
  520   29 Oct 2008 Stefan RittBug Reportcustom web pages: customscript buttons and start/stop buttons generate errors
To fix this problem, do the following:

- Update to the current SVN revision 4368 of mhttpd.c
- Add following tag into your custom page:

  <input type=hidden name="redir" value="name">

  where "name" is the name of your custom page which follows the CS/ in the URL. Like 
if you have a custom page which you access through httpd://localhost/CS/junk then the 
tag would be 

  <input type=hidden name="redir" value="junk">

The "redir" parameter is now evaluated inside mhttpd and brings you back to the proper 
custom page. You can also define another custom page as the target, if that makes 
sense in your application.

Pierre: Would be nice to document this somewhere more officially.
  519   28 Oct 2008 Stefan RittInfomscb timeouts and retries
> A new set of functions was added to mscb.h to adjust mscb timeouts and retries to better match specific 
> applications:
> 
> +   int EXPRT mscb_get_max_retry();
> +   int EXPRT mscb_set_max_retry(int max_retry);
> +   int EXPRT mscb_get_usb_timeout();
> +   int EXPRT mscb_set_usb_timeout(int timeout);
> +   int EXPRT mscb_get_eth_max_retry();
> +   int EXPRT mscb_set_eth_max_retry(int eth_max_retry);

In the spirit of this, a variable retry scheme has been implemented in the mscbdev.c device driver. At the 
MEG experiment, we have one mscb device which is pretty slow, while the others are fast. Therefore it is 
necessary to have a per-device max retry count which can be different for different submasters. I moved 
therefore the max_eth_retry variable into the mscb_fd structure and adjusted a few functions accordingly. I 
did not bother with the other timeouts and retries, since I don't need this for the moment, but it would be 
nice if they would be handled in the same way. Then I added code into mscbdev.c to read the retry variable 
form the ODB under /Equipment/<name>/Settings/Device/<Name>/Retries. The default is 10, but it can be 
changed and becomes valid after the program has been restarted. 
  518   28 Oct 2008 Stefan RittBug Reportstrange output from "odbedit cleanup"
> When I run odbedit remotely (odbedit -h ladd09), the "cleanup" command unexpectedly produces the 
> output of the "sor" command (sure enough, there is a call to db_get_open_records() there), but when I run 
> it locally, I do not get this output (but db_get_open_records() is still called). Strange. K.O.

The db_get_open_records() call was by mistake there, I removed it. What remains is that the notification 
message if a client is removed from the ODB goes through the system messages. When running locally, odbedit 
echoes it's own messages, but when running remotely, this is not the case. So the messages can be seen by 
everybody else (plus it ends up in the message file), but not by the remote odbedit where the cleanup is 
started. The quick fix for that is to say "old" in odbedit which shows the last few lines of the message 
file, so one can see any successful cleanup. 
  517   28 Oct 2008 Stefan RittBug ReportInconsistent handling of odb and evet buffer timeouts
> In midas.c there are several places where client last activity time stamps are checked against the 
> watchdog timeout and the clients are declared dead if they fail to update their activity time stamps. 
> ODB time stamps and data buffer time stamps appear to be handled in a similar manner.
> 
> Most checks are done like this:
> 
> now = ss_millitime();
> if (client->watchdog > 0      <----- check that the watchdog is enabled
>     && now > client->last_activity    <---- check for crazy time stamps from the future
>     && now - client->last_activity > client->watchdog_timeout)   <--- normal timeout
>         remove_client(client);
> 
> But in a few places, the extra checks are missing:
> 
> now = ss_millitime();
> if (now - client->last_activity > client->watchdog_timeout)
>         remove_client(client);
> 
> Is this an oversight from when additional checks were added?
> Should I make all checks read like the first one?
> 
> K.O.

This is on purpose. Inside cm_watchdog(), the system check for client->watchdog > 0. If the watchdog 
timeout is zero, the client is not removed. This feature is used if you debug a program. If you come to a 
breakpoint and sit there for a while, you might be declared dead and the application is removed from the 
ODB, meaning that you cannot continue debugging (on the next ODB access the application asserts). This 
can be avoided by setting the watchdog to zero, which is implemented in most applications by supplying 
"-d" on the command line. Now assume you debug a program, so you set the watchdog timeout to zero, but in 
the debugging session you decide to quit. Since the watchdog timeout is zero, you will never be removed 
from the ODB. Therefore, the code inside cm_cleanup() doe NOT check client->watchdog > 0. Therefore, a 
"cleanup" inside odbedit will even remove clients having the timeout set to zero. 

Now there might be more clever ways to accomplish that, but that's how it is implemented right now.
  516   23 Oct 2008 Konstantin OlchanskiBug Reportbm_wait_for_free_space never sleeps inside the mserver
When mserver receives events from remote client, writes them into a data buffer and this data buffer 
becomes 100% full, we see mserver go into 100% consumption.

It turns out this happens because bm_wait_for_free_space() never sleeps, instead, it busy-loops waiting 
for free space. bm_wait_for_free_space() does call ss_suspend(), but ss_suspend() does not sleep 
because there is pending data in the event network connection and it want to process it.

Best solution I have is to use silly "if (ss_suspend()!=SS_TIMEOUT) sleep(1);"

Also read this explanation: (bm_cleanup is needed to detect that the client holding the buffer at 100% 
full (a stuck or dead GET_ALL reader, mevb in our case), has been killed off and we can continue as 
usual)

       /* signal other clients wait mode */
       pheader->client[bm_validate_client_index(pbuf)].write_wait = requested_space;
 
+      bm_cleanup("bm_wait_for_free_space", ss_millitime(), FALSE);
+
       status = ss_suspend(1000, MSG_BM);
 
+      /* make sure we do sleep in this loop:
+       * if we are the mserver receiving data on the event
+       * socket and the data buffer is full, ss_suspend() will
+       * never sleep: it will detect data on the event channel,
+       * call rpc_server_receive() (recursively, we already *are* in
+       * rpc_server_receive()) and return without sleeping. Result
+       * is a busy loop waiting for free space in data buffer */
+      if (status != SS_TIMEOUT)
+         sleep(1);
+
       /* validate client index: we could have been removed from the buffer */
       pheader->client[bm_validate_client_index(pbuf)].write_wait = 0;

K.O.
  515   23 Oct 2008 Konstantin OlchanskiBug Reportstrange output from "odbedit cleanup"
When I run odbedit remotely (odbedit -h ladd09), the "cleanup" command unexpectedly produces the 
output of the "sor" command (sure enough, there is a call to db_get_open_records() there), but when I run 
it locally, I do not get this output (but db_get_open_records() is still called). Strange. K.O.
ELOG V3.1.4-2e1708b5