Back Midas Rome Roody Rootana
  Midas DAQ System, Page 95 of 143  Not logged in ELOG logo
    Reply  11 Oct 2008, Stefan Ritt, Bug Report, mhttpd "messages" broken 
> mhttpd "messages" page stopped working after svn revision 4327 because of uninitialized variable 
> "filename2" in midas.c:cm_message_retrieve(). Attached patch fixes the problem for me.
> K.O.
> 
> 
> --- src/midas.c (revision 4342)
> +++ src/midas.c (working copy)
> @@ -978,6 +978,8 @@
>        size = sizeof(filename);
>        db_get_value(hDB, 0, "/Logger/Message file", filename, &size, TID_STRING, TRUE);
>  
> +      strlcpy(filename2, filename, sizeof(filename2));
> +
>        if (strchr(filename, '%')) {
>           /* replace strings such as midas_%y%m%d.mid with current date */
>           tzset();

Ups, was my fault, sorry. I committed your change.
Entry  20 Oct 2008, Suzannah Daviel, Bug Report, custom web pages: customscript buttons and start/stop buttons generate errors 
I am using an external Custom web page via a link in the ODB in /Custom, and
Javascript to add customscript button(s) and run start/stop buttons.

After executing these buttons, instead of returning to the custom page, or
to the Midas main status page, there is an error page generated:

Invalid custom page: NULL path
and the URL is 

http://lxfred:8082/CS/

The behaviour is the same whether the custom page replaces the main status page
or not.

I am using
MIDAS version 2.0.0
mhttpd.c SVN Rev 4282

In an older version of mhttpd.c, buttons of this type used to return to the
Midas main status page regardless of whether the custom page replaced the status
page. I found this behaviour annoying, and I made a custom mhttpd.c that
returned to the custom page. 
Would it be possible to fix this problem, and to return to the custom page after
pressing the buttons?


Here is the Javascript to add the buttons:

<script type="text/javascript">
var rstate = '<odb src="/runinfo/run state">'

 if (rstate == 1) // stopped
    document.write('<input name="cmd" value="Start" type="submit">')
 else if (rstate == 2 // paused
    document.write('<input name="cmd" value="Resume" type="submit">')
 else  // running
 {
    document.write('<input name="cmd" value="Stop" type="submit">')
    document.write('<input name="cmd" value="Pause" type="submit">')
 }

 if (rstate == 1) // stopped
    document.write('<input name="customscript" value="tri_config" type="submit">');
</script>
Entry  23 Oct 2008, Konstantin Olchanski, Bug Report, Inconsistent handling of odb and evet buffer timeouts 
In midas.c there are several places where client last activity time stamps are checked against the 
watchdog timeout and the clients are declared dead if they fail to update their activity time stamps. 
ODB time stamps and data buffer time stamps appear to be handled in a similar manner.

Most checks are done like this:

now = ss_millitime();
if (client->watchdog > 0      <----- check that the watchdog is enabled
    && now > client->last_activity    <---- check for crazy time stamps from the future
    && now - client->last_activity > client->watchdog_timeout)   <--- normal timeout
        remove_client(client);

But in a few places, the extra checks are missing:

now = ss_millitime();
if (now - client->last_activity > client->watchdog_timeout)
        remove_client(client);

Is this an oversight from when additional checks were added?
Should I make all checks read like the first one?

K.O.
Entry  23 Oct 2008, Konstantin Olchanski, Bug Report, strange output from "odbedit cleanup" 
When I run odbedit remotely (odbedit -h ladd09), the "cleanup" command unexpectedly produces the 
output of the "sor" command (sure enough, there is a call to db_get_open_records() there), but when I run 
it locally, I do not get this output (but db_get_open_records() is still called). Strange. K.O.
Entry  23 Oct 2008, Konstantin Olchanski, Bug Report, bm_wait_for_free_space never sleeps inside the mserver 
When mserver receives events from remote client, writes them into a data buffer and this data buffer 
becomes 100% full, we see mserver go into 100% consumption.

It turns out this happens because bm_wait_for_free_space() never sleeps, instead, it busy-loops waiting 
for free space. bm_wait_for_free_space() does call ss_suspend(), but ss_suspend() does not sleep 
because there is pending data in the event network connection and it want to process it.

Best solution I have is to use silly "if (ss_suspend()!=SS_TIMEOUT) sleep(1);"

Also read this explanation: (bm_cleanup is needed to detect that the client holding the buffer at 100% 
full (a stuck or dead GET_ALL reader, mevb in our case), has been killed off and we can continue as 
usual)

       /* signal other clients wait mode */
       pheader->client[bm_validate_client_index(pbuf)].write_wait = requested_space;
 
+      bm_cleanup("bm_wait_for_free_space", ss_millitime(), FALSE);
+
       status = ss_suspend(1000, MSG_BM);
 
+      /* make sure we do sleep in this loop:
+       * if we are the mserver receiving data on the event
+       * socket and the data buffer is full, ss_suspend() will
+       * never sleep: it will detect data on the event channel,
+       * call rpc_server_receive() (recursively, we already *are* in
+       * rpc_server_receive()) and return without sleeping. Result
+       * is a busy loop waiting for free space in data buffer */
+      if (status != SS_TIMEOUT)
+         sleep(1);
+
       /* validate client index: we could have been removed from the buffer */
       pheader->client[bm_validate_client_index(pbuf)].write_wait = 0;

K.O.
    Reply  28 Oct 2008, Stefan Ritt, Bug Report, Inconsistent handling of odb and evet buffer timeouts 
> In midas.c there are several places where client last activity time stamps are checked against the 
> watchdog timeout and the clients are declared dead if they fail to update their activity time stamps. 
> ODB time stamps and data buffer time stamps appear to be handled in a similar manner.
> 
> Most checks are done like this:
> 
> now = ss_millitime();
> if (client->watchdog > 0      <----- check that the watchdog is enabled
>     && now > client->last_activity    <---- check for crazy time stamps from the future
>     && now - client->last_activity > client->watchdog_timeout)   <--- normal timeout
>         remove_client(client);
> 
> But in a few places, the extra checks are missing:
> 
> now = ss_millitime();
> if (now - client->last_activity > client->watchdog_timeout)
>         remove_client(client);
> 
> Is this an oversight from when additional checks were added?
> Should I make all checks read like the first one?
> 
> K.O.

This is on purpose. Inside cm_watchdog(), the system check for client->watchdog > 0. If the watchdog 
timeout is zero, the client is not removed. This feature is used if you debug a program. If you come to a 
breakpoint and sit there for a while, you might be declared dead and the application is removed from the 
ODB, meaning that you cannot continue debugging (on the next ODB access the application asserts). This 
can be avoided by setting the watchdog to zero, which is implemented in most applications by supplying 
"-d" on the command line. Now assume you debug a program, so you set the watchdog timeout to zero, but in 
the debugging session you decide to quit. Since the watchdog timeout is zero, you will never be removed 
from the ODB. Therefore, the code inside cm_cleanup() doe NOT check client->watchdog > 0. Therefore, a 
"cleanup" inside odbedit will even remove clients having the timeout set to zero. 

Now there might be more clever ways to accomplish that, but that's how it is implemented right now.
    Reply  28 Oct 2008, Stefan Ritt, Bug Report, strange output from "odbedit cleanup" 
> When I run odbedit remotely (odbedit -h ladd09), the "cleanup" command unexpectedly produces the 
> output of the "sor" command (sure enough, there is a call to db_get_open_records() there), but when I run 
> it locally, I do not get this output (but db_get_open_records() is still called). Strange. K.O.

The db_get_open_records() call was by mistake there, I removed it. What remains is that the notification 
message if a client is removed from the ODB goes through the system messages. When running locally, odbedit 
echoes it's own messages, but when running remotely, this is not the case. So the messages can be seen by 
everybody else (plus it ends up in the message file), but not by the remote odbedit where the cleanup is 
started. The quick fix for that is to say "old" in odbedit which shows the last few lines of the message 
file, so one can see any successful cleanup. 
    Reply  29 Oct 2008, Stefan Ritt, Bug Report, custom web pages: customscript buttons and start/stop buttons generate errors 
To fix this problem, do the following:

- Update to the current SVN revision 4368 of mhttpd.c
- Add following tag into your custom page:

  <input type=hidden name="redir" value="name">

  where "name" is the name of your custom page which follows the CS/ in the URL. Like 
if you have a custom page which you access through httpd://localhost/CS/junk then the 
tag would be 

  <input type=hidden name="redir" value="junk">

The "redir" parameter is now evaluated inside mhttpd and brings you back to the proper 
custom page. You can also define another custom page as the target, if that makes 
sense in your application.

Pierre: Would be nice to document this somewhere more officially.
    Reply  04 Nov 2008, Suannah Daviel, Bug Report, custom web pages: customscript buttons and start/stop buttons generate errors 
Thanks Stefan. 
Your fix works nicely with the start/stop buttons not returning to the same or to a
different web page.

However, it does not seem to have fixed the problem with the Customscript button. It does
not seem to pick up the redirect, nor do the Pause/Resume buttons (which are programmed to
appear when the run starts).


> To fix this problem, do the following:
> 
> - Update to the current SVN revision 4368 of mhttpd.c
> - Add following tag into your custom page:
> 
>   <input type=hidden name="redir" value="name">
> 
>   where "name" is the name of your custom page which follows the CS/ in the URL. Like 
> if you have a custom page which you access through httpd://localhost/CS/junk then the 
> tag would be 
> 
>   <input type=hidden name="redir" value="junk">
> 
> The "redir" parameter is now evaluated inside mhttpd and brings you back to the proper 
> custom page. You can also define another custom page as the target, if that makes 
> sense in your application.
> 
> Pierre: Would be nice to document this somewhere more officially.
Entry  04 Nov 2008, Suannah Daviel, Bug Report, bool values in "/custom/images/my_image.gif/labels/src" seem to lose their format string 
Not sure if this is a bug or a feature:

Writing a boolean label on an image seems to produce rather strange behaviour.

For example,
odb>ls /Equipment/gas/settings/my_bool -lt
Key name                        Type    #Val  Size  Last Opn Mode Value
---------------------------------------------------------------------------
my_bool                         BOOL    1     4     14m  0   RWD  y


odb>cd /custom/images/my_image.gif/labels
odb>ls
Src                             /Equipment/gas/settings/my_bool
Format                          val: %d (bool)
Font                            Medium
X                               10
Y                               10
Align                           0
FGColor                         FFFFFF
BGColor                         FF8800

Instead of the expected string "val: y (bool)", only the value of the key
appears, i.e. "y". 
The behaviour is the same whether I use %d, %u, %s, %c etc as the format character. 
    Reply  09 Nov 2008, Stefan Ritt, Bug Report, bool values in "/custom/images/my_image.gif/labels/src" seem to lose their format string 
> Not sure if this is a bug or a feature:
> 
> Writing a boolean label on an image seems to produce rather strange behaviour.
> 
> For example,
> odb>ls /Equipment/gas/settings/my_bool -lt
> Key name                        Type    #Val  Size  Last Opn Mode Value
> ---------------------------------------------------------------------------
> my_bool                         BOOL    1     4     14m  0   RWD  y
> 
> 
> odb>cd /custom/images/my_image.gif/labels
> odb>ls
> Src                             /Equipment/gas/settings/my_bool
> Format                          val: %d (bool)
> Font                            Medium
> X                               10
> Y                               10
> Align                           0
> FGColor                         FFFFFF
> BGColor                         FF8800
> 
> Instead of the expected string "val: y (bool)", only the value of the key
> appears, i.e. "y". 
> The behaviour is the same whether I use %d, %u, %s, %c etc as the format character. 

That has been fixed in rev. 4379
    Reply  09 Nov 2008, Stefan Ritt, Bug Report, custom web pages: customscript buttons and start/stop buttons generate errors 
> Thanks Stefan. 
> Your fix works nicely with the start/stop buttons not returning to the same or to a
> different web page.
> 
> However, it does not seem to have fixed the problem with the Customscript button. It does
> not seem to pick up the redirect, nor do the Pause/Resume buttons (which are programmed to
> appear when the run starts).

That has been fixed in rev. 4377
    Reply  27 Nov 2008, Konstantin Olchanski, Bug Report, lazylogger complains about zero-size files 
I now have a better understanding of this: lazylogger uses ss_file_size() to find
out if a file exists or not. This function used to return 0 (probably) for
non-existant files (there was no check for error status from stat() system call,
so the return value for non-existant files was never well defined).

With ss_file_size() returning 0 for nonexistant files, 0-size files clearly cause
problems to lazylogger.

Now, since svn revision 4397, ss_file_size() returns -1 for non-existant files,
but lazylogger still needs to be tought about this.

The problem "lazylogger does not like 0-size files" remains for now.

K.O.


> With latest midas, I see this:
> 
> Thu Oct 14 19:31:17 2004 [Lazy_Tape] [lazylogger.c:1717:Lazy] lazy_file_exists
> file run17567.ybs doesn't exists
> Thu Oct 14 19:31:27 2004 [Lazy_Tape] [lazylogger.c:1717:Lazy] lazy_file_exists
> file run17567.ybs doesn't exists
> 
> The file run17567.ybs has size zero:
> 
> -rw-r--r--    1 twistonl users      950272 Oct 13 19:29
> /twist/data_onl/current/run17565.ybs
> -rw-r--r--    1 twistonl users      950272 Oct 13 19:45
> /twist/data_onl/current/run17566.ybs
> -rw-r--r--    1 twistonl users           0 Oct 13 20:00
> /twist/data_onl/current/run17567.ybs
> -rw-r--r--    1 twistonl users      983040 Oct 13 20:03
> /twist/data_onl/current/run17568.ybs
> -rw-r--r--    1 twistonl users      950272 Oct 13 20:26
> /twist/data_onl/current/run17569.ybs
> 
> I am not sure how to fix this lazylogger logic. Please help.
> 
> K.O.
Entry  01 Dec 2008, Randolf Pohl, Bug Report, gcc warning in melog.c for midas 4401 
Hi all,

I have just compiled midas 4401 using SuSE 11.0.
gcc is some odd SuSE version:
gcc version 4.3.1 20080507 (prerelease) [gcc-4_3-branch revision 135036] (SUSE
Linux) 

Anyway, gcc stumbled over melog.c. I don't see the reason myself, but my
experience is that gcc is usually right when complaining about "array subscript
is above array bounds". So, just in case somebody knowlegeable wants to have a
look at this....

Cheers,

Randolf

The gcc output:

[...]
cc -g -O3 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib
-DINCLUDE_FTPLIB   -D_LARGEFILE64_SOURCE -DHAVE_MYSQL -I/usr/include/mysql
-DHAVE_ROOT -pthread -m64 -I/usr/local/root/root_v5.20.00/include/root
-DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o linux/bin/melog
utils/melog.c linux/lib/libmidas.a -lutil -lpthread -lz
utils/melog.c: In function 'submit_elog':
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
cc -g -O3 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib
-DINCLUDE_FTPLIB   -D_LARGEFILE64_SOURCE -DHAVE_MYSQL -I/usr/include/mysql
-DHAVE_ROOT -pthread -m64 -I/usr/local/root/root_v5.20.00/include/root
-DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o linux/bin/mlxspeaker
utils/mlxspeaker.c linux/lib/libmidas.a -lutil -lpthread -lz
Entry  17 Dec 2008, Renee Poutissou, Bug Report, Overflow on "cm_msg" command generates segfault 
The following error has been reported to me by T2K colleagues:

When using  "odbedit -c "msg my_message", the following behavior 
has been observed depending on the length "n" of the message. 

1)  n < 100        All is well
2)  100 <= n < 245 Log not written but exit code = 0
3)  245 <= n < 280 Error: "Experiment not defined" and exit code = 1
4)  280 <= n       Error: "Cannot connect to remote host" and exit code = 1

Also, when logging from compiled C code - when messages reach some magic length
the MIDAS client sending them segfaults.

Please fix
    Reply  22 Dec 2008, Stefan Ritt, Bug Report, Overflow on "cm_msg" command generates segfault 
> The following error has been reported to me by T2K colleagues:
> 
> When using  "odbedit -c "msg my_message", the following behavior 
> has been observed depending on the length "n" of the message. 
> 
> 1)  n < 100        All is well
> 2)  100 <= n < 245 Log not written but exit code = 0
> 3)  245 <= n < 280 Error: "Experiment not defined" and exit code = 1
> 4)  280 <= n       Error: "Cannot connect to remote host" and exit code = 1
> 
> Also, when logging from compiled C code - when messages reach some magic length
> the MIDAS client sending them segfaults.
> 
> Please fix

Uhhh, who wants this long messages? You should consider to split this into several 
smaller messages. Anyhow, having the above behavior is not good, so I fixed it in 
SVN revision 4422. I increased the maximum length to 1000 characters. Above that, 
the message gets truncated. If you need even more, we can make it a #define.

The second problem you describe (logging from compiled C code) I could not 
reproduce, so maybe it was related to the first one. Please try again and report 
if it persists.
    Reply  21 Jan 2009, Andreas Suter, Bug Report, mhttpd, mlogger updates 
There is an obvious "unwanted feature" in this version of the mhttpd. It writes the
"plot time" into the gif (mhttpd, if-statement starting in line 8853). 

Please check this obvious things more carefully in the future before submitting code. ;-)

> mhttpd and mlogger have been updated with potentially troublesome changes.
> Before using these latest versions, please make a backup of your ODB. This is
> svn revisions 4434 (mhttpd.c) and 4435 (mlogger.c).
> 
> These new features are now available:
> - a "feature complete" implementation of "history in an SQL database". We use
> this new code to write history data from the T2K test setup in the TRIUMF M11
> beam line to a MySQL database (mlogger) and to make history plots directly from
> this database (mhttpd). We still write normal midas history files and we have a
> utility to import midas .hst files into an SQL database (utils/mh2sql). The code
> is functional, but incomplete. For best SQL database data layout, you should
> enable the "per variable history" (but backup your ODB before you do this!). All
> are welcome to try it, kick the tires, report any problems. Documentation TBW.
> - experimental implementation of "ODBRpc" added to the midas javascript library
> (ODBSet, ODBGet & co). This permits buttons on midas "custom" web pages to
> invoke RPC calls directly into user frontend programs, for example to turn
> things on or off. Documentation TBW.
> - the mlogger/mhttpd implementation of /History/Tags has proved troublesome and
> we are moving away from it. The SQL database history implementation already does
> not use it. During the present transition period:
> - mlogger and mhttpd will now work without /History/Tags. This implementation
> reads history tags directly from the history files themselves. Two downsides to
> this: it is slower and tags become non-persistent: if some frontends have not
> been running for a while, their variables may vanish from the history panel
> editor. To run in this mode, set "/History/DisableTags" to "y". Existing
> /History/Tags will be automatically deleted.
> - for the above 2 reasons, I still recommend using /History/Tags, but the format
> of the tags is now changed to simplify management and reduce odb size. mlogger
> will automatically convert the tags to this new format (this is why you should
> make a backup of your ODB).
> - using old mlogger with new mhttpd is okey: new mhttpd understands both formats
> of /History/Tags.
> - using old mhttpd with new mlogger is okey: please set ODB
> "/History/CreateOldTags" to "y" (type TID_BOOL/"boolean") before starting mlogger.
> 
> K.O.
Entry  07 May 2009, Konstantin Olchanski, Bug Report, odbedit bad ctrl-C 
When using "/bin/bash" shell, if I exit odbedit (and other midas programs) using ctrl-C, the terminal 
enters a funny state, "echo" is turned off (I cannot see what I type), "delete" key does not work (echoes 
^H instead).

This problem does not happen if I exit using the "exit" command or if I use the "/bin/tcsh" shell.

When this happens, the terminal can be restored to close to normal state using "stty sane", and "stty 
erase ^H".

The terminal is set into this funny state by system.c::getchar() and normal settings are never restored 
unless the midas program calls getchar(1) at the end. If the program does not finish normally, original 
terminal settings are never restored and the terminal is left in a funny state.

It is not clear why the problem does not happen with /bin/tcsh - perhaps they restore sane terminal 
settings automatically for us.
K.O.
Entry  07 May 2009, Konstantin Olchanski, Bug Report, mlogger duplicate event problem 
We have seen on several daq systems this problem: we start a run and observe that the number of 
events written by mlogger to the output file is double the number of events actually collected. Upon 
inspection of the output file, we see that every event is written twice. Restarting the run usually fixes 
this problem.

We now traced this to an error in mlogger.c. If we start a run and the run transition fails in some 
frontend,  mlogger does not disconnect from the SYSTEM buffer (it does not know the transition failed 
and the run did not really start). The SYSTEM buffer connection and the associated event request 
remain active. Then we start the next run and mlogger connects to the SYSTEM buffer again, creates a 
second (third, etc) event request. Eventually mlogger reaches the maximum permitted number of event 
requests and no more runs can be started unless mlogger is restarted.

If at some point a run actually starts successfully, there are multiple event requests present from 
mlogger and theoretically, each event should be written to the output file many times. This was a 
puzzle until we got a good laugh from looking at mlogger.c::receive_event() callback - in retrospect it 
is obvious why events are only written in duplicate.

Then, after the run is ended, mlogger disconnects from the SYSTEM buffer, all multiple event requests 
are automatically deleted and the problem is not present during the next run.

I am not yet sure how to best fix this, but I see that other midas programs (i.e. mevb) suffer form the 
same problem - multiple connections to the event buffer - in presence of failed run starts. I think we 
have seen "event duplication" from mevb, as well.

K.O.
    Reply  02 Jun 2009, Konstantin Olchanski, Bug Report, mlogger duplicate event problem 
> We have seen on several daq systems this problem: we start a run and observe that the number of 
> events written by mlogger to the output file is double the number of events actually collected. Upon 
> inspection of the output file, we see that every event is written twice. Restarting the run usually fixes 
> this problem.

mlogger.c fixed svn rev 4497. (from tr_start(), call tr_stop() if somehow it was not called already by end-run transition).

K.O.
ELOG V3.1.4-2e1708b5