ID |
Date |
Author |
Topic |
Subject |
516
|
23 Oct 2008 |
Konstantin Olchanski | Bug Report | bm_wait_for_free_space never sleeps inside the mserver | When mserver receives events from a remote client, writes them into a data buffer and this data buffer
becomes 100% full, we see mserver go into 100% CPU consumption.
It turns out this happens because bm_wait_for_free_space() never sleeps; instead, it busy-loops waiting
for free space. bm_wait_for_free_space() does call ss_suspend(), but ss_suspend() does not sleep
because there is pending data in the event network connection and it wants to process it.
The best solution I have is to use the silly "if (ss_suspend()!=SS_TIMEOUT) sleep(1);"
Also read this explanation: (bm_cleanup is needed to detect that the client holding the buffer at 100%
full (a stuck or dead GET_ALL reader, mevb in our case) has been killed off and we can continue as
usual)
/* signal other clients wait mode */
pheader->client[bm_validate_client_index(pbuf)].write_wait = requested_space;
+ bm_cleanup("bm_wait_for_free_space", ss_millitime(), FALSE);
+
status = ss_suspend(1000, MSG_BM);
+ /* make sure we do sleep in this loop:
+ * if we are the mserver receiving data on the event
+ * socket and the data buffer is full, ss_suspend() will
+ * never sleep: it will detect data on the event channel,
+ * call rpc_server_receive() (recursively, we already *are* in
+ * rpc_server_receive()) and return without sleeping. Result
+ * is a busy loop waiting for free space in data buffer */
+ if (status != SS_TIMEOUT)
+ sleep(1);
+
/* validate client index: we could have been removed from the buffer */
pheader->client[bm_validate_client_index(pbuf)].write_wait = 0;
K.O. |
517
|
28 Oct 2008 |
Stefan Ritt | Bug Report | Inconsistent handling of odb and event buffer timeouts | > In midas.c there are several places where client last activity time stamps are checked against the
> watchdog timeout and the clients are declared dead if they fail to update their activity time stamps.
> ODB time stamps and data buffer time stamps appear to be handled in a similar manner.
>
> Most checks are done like this:
>
> now = ss_millitime();
> if (client->watchdog > 0 <----- check that the watchdog is enabled
> && now > client->last_activity <---- check for crazy time stamps from the future
> && now - client->last_activity > client->watchdog_timeout) <--- normal timeout
> remove_client(client);
>
> But in a few places, the extra checks are missing:
>
> now = ss_millitime();
> if (now - client->last_activity > client->watchdog_timeout)
> remove_client(client);
>
> Is this an oversight from when additional checks were added?
> Should I make all checks read like the first one?
>
> K.O.
This is on purpose. Inside cm_watchdog(), the system checks for client->watchdog > 0. If the watchdog
timeout is zero, the client is not removed. This feature is used when you debug a program. If you come to a
breakpoint and sit there for a while, you might be declared dead and the application is removed from the
ODB, meaning that you cannot continue debugging (on the next ODB access the application asserts). This
can be avoided by setting the watchdog to zero, which is implemented in most applications by supplying
"-d" on the command line. Now assume you debug a program, so you set the watchdog timeout to zero, but in
the debugging session you decide to quit. Since the watchdog timeout is zero, you will never be removed
from the ODB. Therefore, the code inside cm_cleanup() does NOT check client->watchdog > 0, so a
"cleanup" inside odbedit will even remove clients having the timeout set to zero.
Now there might be more clever ways to accomplish that, but that's how it is implemented right now. |
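The difference between the two kinds of check described in this reply can be sketched as two predicates. This is only an illustration, not the code in midas.c: the client_t struct, its field names and both function names are hypothetical, and time stamps are assumed to be unsigned millisecond counters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified client record (not the real MIDAS structure). */
typedef struct {
    uint32_t watchdog_timeout; /* in ms; 0 means "watchdog disabled", e.g. while debugging */
    uint32_t last_activity;    /* time stamp of last activity, in ms */
} client_t;

/* Periodic watchdog check: clients with a disabled watchdog are never removed. */
static bool watchdog_should_remove(const client_t *c, uint32_t now)
{
    return c->watchdog_timeout > 0                       /* watchdog enabled */
        && now > c->last_activity                        /* ignore time stamps from the future */
        && now - c->last_activity > c->watchdog_timeout; /* normal timeout */
}

/* Explicit cleanup check: no "watchdog enabled" test, so a stale client
 * with timeout zero is removed as well. */
static bool cleanup_should_remove(const client_t *c, uint32_t now)
{
    return now - c->last_activity > c->watchdog_timeout;
}
```

With a timeout of zero and any elapsed time, the first predicate returns false and the second returns true, which matches the behaviour described for a debugged client that was quit without deregistering.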
518
|
28 Oct 2008 |
Stefan Ritt | Bug Report | strange output from "odbedit cleanup" | > When I run odbedit remotely (odbedit -h ladd09), the "cleanup" command unexpectedly produces the
> output of the "sor" command (sure enough, there is a call to db_get_open_records() there), but when I run
> it locally, I do not get this output (but db_get_open_records() is still called). Strange. K.O.
The db_get_open_records() call was there by mistake; I removed it. What remains is that the notification
message when a client is removed from the ODB goes through the system messages. When running locally, odbedit
echoes its own messages, but when running remotely, this is not the case. So the messages can be seen by
everybody else (and they end up in the message file), but not by the remote odbedit where the cleanup is
started. The quick fix for that is to say "old" in odbedit, which shows the last few lines of the message
file, so one can see any successful cleanup. |
520
|
29 Oct 2008 |
Stefan Ritt | Bug Report | custom web pages: customscript buttons and start/stop buttons generate errors | To fix this problem, do the following:
- Update to the current SVN revision 4368 of mhttpd.c
- Add the following tag to your custom page:
<input type=hidden name="redir" value="name">
where "name" is the name of your custom page which follows the CS/ in the URL. For example,
if you have a custom page which you access through http://localhost/CS/junk, then the
tag would be
<input type=hidden name="redir" value="junk">
The "redir" parameter is now evaluated inside mhttpd and brings you back to the proper
custom page. You can also define another custom page as the target, if that makes
sense in your application.
Pierre: Would be nice to document this somewhere more officially. |
521
|
04 Nov 2008 |
Suannah Daviel | Bug Report | custom web pages: customscript buttons and start/stop buttons generate errors | Thanks Stefan.
Your fix works nicely with the start/stop buttons not returning to the same or to a
different web page.
However, it does not seem to have fixed the problem with the Customscript button. It does
not seem to pick up the redirect, nor do the Pause/Resume buttons (which are programmed to
appear when the run starts).
> To fix this problem, do the following:
>
> - Update to the current SVN revision 4368 of mhttpd.c
> - Add following tag into your custom page:
>
> <input type=hidden name="redir" value="name">
>
> where "name" is the name of your custom page which follows the CS/ in the URL. Like
> if you have a custom page which you access through httpd://localhost/CS/junk then the
> tag would be
>
> <input type=hidden name="redir" value="junk">
>
> The "redir" parameter is now evaluated inside mhttpd and brings you back to the proper
> custom page. You can also define another custom page as the target, if that makes
> sense in your application.
>
> Pierre: Would be nice to document this somewhere more officially. |
523
|
04 Nov 2008 |
Suannah Daviel | Bug Report | bool values in "/custom/images/my_image.gif/labels/src" seem to lose their format string | Not sure if this is a bug or a feature:
Writing a boolean label on an image seems to produce rather strange behaviour.
For example,
odb>ls /Equipment/gas/settings/my_bool -lt
Key name Type #Val Size Last Opn Mode Value
---------------------------------------------------------------------------
my_bool BOOL 1 4 14m 0 RWD y
odb>cd /custom/images/my_image.gif/labels
odb>ls
Src /Equipment/gas/settings/my_bool
Format val: %d (bool)
Font Medium
X 10
Y 10
Align 0
FGColor FFFFFF
BGColor FF8800
Instead of the expected string "val: y (bool)", only the value of the key
appears, i.e. "y".
The behaviour is the same whether I use %d, %u, %s, %c etc as the format character. |
525
|
09 Nov 2008 |
Stefan Ritt | Bug Report | bool values in "/custom/images/my_image.gif/labels/src" seem to lose their format string | > Not sure if this is a bug or a feature:
>
> Writing a boolean label on an image seems to produce rather strange behaviour.
>
> For example,
> odb>ls /Equipment/gas/settings/my_bool -lt
> Key name Type #Val Size Last Opn Mode Value
> ---------------------------------------------------------------------------
> my_bool BOOL 1 4 14m 0 RWD y
>
>
> odb>cd /custom/images/my_image.gif/labels
> odb>ls
> Src /Equipment/gas/settings/my_bool
> Format val: %d (bool)
> Font Medium
> X 10
> Y 10
> Align 0
> FGColor FFFFFF
> BGColor FF8800
>
> Instead of the expected string "val: y (bool)", only the value of the key
> appears, i.e. "y".
> The behaviour is the same whether I use %d, %u, %s, %c etc as the format character.
That has been fixed in rev. 4379 |
527
|
09 Nov 2008 |
Stefan Ritt | Bug Report | custom web pages: customscript buttons and start/stop buttons generate errors | > Thanks Stefan.
> Your fix works nicely with the start/stop buttons not returning to the same or to a
> different web page.
>
> However, it does not seem to have fixed the problem with the Customscript button. It does
> not seem to pick up the redirect, nor do the Pause/Resume buttons (which are programmed to
> appear when the run starts).
That has been fixed in rev. 4377 |
533
|
27 Nov 2008 |
Konstantin Olchanski | Bug Report | lazylogger complains about zero-size files | I now have a better understanding of this: lazylogger uses ss_file_size() to find
out if a file exists or not. This function used to return 0 (probably) for
nonexistent files (there was no check for error status from the stat() system call,
so the return value for nonexistent files was never well defined).
With ss_file_size() returning 0 for nonexistent files, 0-size files clearly cause
problems for lazylogger.
Now, since svn revision 4397, ss_file_size() returns -1 for nonexistent files,
but lazylogger still needs to be taught about this.
The problem "lazylogger does not like 0-size files" remains for now.
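One way the "exists" test could be separated from the "size" test is sketched below. This is not the MIDAS ss_file_size() or lazylogger code: both function names are hypothetical, and only the POSIX stat() call is assumed.

```c
#include <sys/stat.h>

/* Hypothetical helper: file size in bytes, or -1 if stat() fails
 * (for example because the file does not exist). */
static long long file_size_or_minus1(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long long) st.st_size;
}

/* A zero-size file still exists: only a negative size means "not there". */
static int file_exists(const char *path)
{
    return file_size_or_minus1(path) >= 0;
}
```

A caller that tests "size >= 0" instead of "size > 0" would then treat a zero-size run file as present.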
K.O.
> With latest midas, I see this:
>
> Thu Oct 14 19:31:17 2004 [Lazy_Tape] [lazylogger.c:1717:Lazy] lazy_file_exists
> file run17567.ybs doesn't exists
> Thu Oct 14 19:31:27 2004 [Lazy_Tape] [lazylogger.c:1717:Lazy] lazy_file_exists
> file run17567.ybs doesn't exists
>
> The file run17567.ybs has size zero:
>
> -rw-r--r-- 1 twistonl users 950272 Oct 13 19:29
> /twist/data_onl/current/run17565.ybs
> -rw-r--r-- 1 twistonl users 950272 Oct 13 19:45
> /twist/data_onl/current/run17566.ybs
> -rw-r--r-- 1 twistonl users 0 Oct 13 20:00
> /twist/data_onl/current/run17567.ybs
> -rw-r--r-- 1 twistonl users 983040 Oct 13 20:03
> /twist/data_onl/current/run17568.ybs
> -rw-r--r-- 1 twistonl users 950272 Oct 13 20:26
> /twist/data_onl/current/run17569.ybs
>
> I am not sure how to fix this lazylogger logic. Please help.
>
> K.O. |
537
|
01 Dec 2008 |
Randolf Pohl | Bug Report | gcc warning in melog.c for midas 4401 | Hi all,
I have just compiled midas 4401 using SuSE 11.0.
gcc is some odd SuSE version:
gcc version 4.3.1 20080507 (prerelease) [gcc-4_3-branch revision 135036] (SUSE
Linux)
Anyway, gcc stumbled over melog.c. I don't see the reason myself, but my
experience is that gcc is usually right when complaining about "array subscript
is above array bounds". So, just in case somebody knowledgeable wants to have a
look at this....
Cheers,
Randolf
The gcc output:
[...]
cc -g -O3 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib
-DINCLUDE_FTPLIB -D_LARGEFILE64_SOURCE -DHAVE_MYSQL -I/usr/include/mysql
-DHAVE_ROOT -pthread -m64 -I/usr/local/root/root_v5.20.00/include/root
-DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o linux/bin/melog
utils/melog.c linux/lib/libmidas.a -lutil -lpthread -lz
utils/melog.c: In function 'submit_elog':
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
cc -g -O3 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib
-DINCLUDE_FTPLIB -D_LARGEFILE64_SOURCE -DHAVE_MYSQL -I/usr/include/mysql
-DHAVE_ROOT -pthread -m64 -I/usr/local/root/root_v5.20.00/include/root
-DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o linux/bin/mlxspeaker
utils/mlxspeaker.c linux/lib/libmidas.a -lutil -lpthread -lz |
543
|
17 Dec 2008 |
Renee Poutissou | Bug Report | Overflow on "cm_msg" command generates segfault | The following error has been reported to me by T2K colleagues:
When using odbedit -c "msg my_message", the following behavior
has been observed depending on the length "n" of the message.
1) n < 100 All is well
2) 100 <= n < 245 Log not written but exit code = 0
3) 245 <= n < 280 Error: "Experiment not defined" and exit code = 1
4) 280 <= n Error: "Cannot connect to remote host" and exit code = 1
Also, when logging from compiled C code - when messages reach some magic length
the MIDAS client sending them segfaults.
Please fix |
545
|
22 Dec 2008 |
Stefan Ritt | Bug Report | Overflow on "cm_msg" command generates segfault | > The following error has been reported to me by T2K colleagues:
>
> When using "odbedit -c "msg my_message", the following behavior
> has been observed depending on the length "n" of the message.
>
> 1) n < 100 All is well
> 2) 100 <= n < 245 Log not written but exit code = 0
> 3) 245 <= n < 280 Error: "Experiment not defined" and exit code = 1
> 4) 280 <= n Error: "Cannot connect to remote host" and exit code = 1
>
> Also, when logging from compiled C code - when messages reach some magic length
> the MIDAS client sending them segfaults.
>
> Please fix
Uhhh, who wants such long messages? You should consider splitting them into several
smaller messages. Anyhow, having the above behavior is not good, so I fixed it in
SVN revision 4422. I increased the maximum length to 1000 characters. Above that,
the message gets truncated. If you need even more, we can make it a #define.
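Truncating instead of overflowing can be sketched with vsnprintf(). This is an illustration only, not the cm_msg() code from revision 4422: the function name and the MSG_MAX_LEN macro are hypothetical, and only the value 1000 is taken from the text above.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define MSG_MAX_LEN 1000 /* hypothetical name; the value follows the post */

/* Format a message into dst without ever writing more than dst_size bytes.
 * Returns the number of characters actually stored (excluding the '\0').
 * Intended use: char msg[MSG_MAX_LEN + 1]; format_truncated(msg, sizeof(msg), ...); */
static size_t format_truncated(char *dst, size_t dst_size, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (dst_size == 0)
        return 0;

    va_start(ap, fmt);
    n = vsnprintf(dst, dst_size, fmt, ap); /* always '\0'-terminates within dst_size */
    va_end(ap);

    if (n < 0) {            /* encoding error: return an empty string */
        dst[0] = '\0';
        return 0;
    }
    /* vsnprintf() reports the untruncated length; clamp it to what fits */
    return (size_t) n < dst_size ? (size_t) n : dst_size - 1;
}
```

The point of the design is that an over-long message costs only its tail, rather than corrupting whatever follows the buffer.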
The second problem you describe (logging from compiled C code) I could not
reproduce, so maybe it was related to the first one. Please try again and report
if it persists. |
557
|
21 Jan 2009 |
Andreas Suter | Bug Report | mhttpd, mlogger updates | There is an obvious "unwanted feature" in this version of the mhttpd. It writes the
"plot time" into the gif (mhttpd, if-statement starting in line 8853).
Please check these obvious things more carefully in the future before submitting code. ;-)
> mhttpd and mlogger have been updated with potentially troublesome changes.
> Before using these latest versions, please make a backup of your ODB. This is
> svn revisions 4434 (mhttpd.c) and 4435 (mlogger.c).
>
> These new features are now available:
> - a "feature complete" implementation of "history in an SQL database". We use
> this new code to write history data from the T2K test setup in the TRIUMF M11
> beam line to a MySQL database (mlogger) and to make history plots directly from
> this database (mhttpd). We still write normal midas history files and we have a
> utility to import midas .hst files into an SQL database (utils/mh2sql). The code
> is functional, but incomplete. For best SQL database data layout, you should
> enable the "per variable history" (but backup your ODB before you do this!). All
> are welcome to try it, kick the tires, report any problems. Documentation TBW.
> - experimental implementation of "ODBRpc" added to the midas javascript library
> (ODBSet, ODBGet & co). This permits buttons on midas "custom" web pages to
> invoke RPC calls directly into user frontend programs, for example to turn
> things on or off. Documentation TBW.
> - the mlogger/mhttpd implementation of /History/Tags has proved troublesome and
> we are moving away from it. The SQL database history implementation already does
> not use it. During the present transition period:
> - mlogger and mhttpd will now work without /History/Tags. This implementation
> reads history tags directly from the history files themselves. Two downsides to
> this: it is slower and tags become non-persistent: if some frontends have not
> been running for a while, their variables may vanish from the history panel
> editor. To run in this mode, set "/History/DisableTags" to "y". Existing
> /History/Tags will be automatically deleted.
> - for the above 2 reasons, I still recommend using /History/Tags, but the format
> of the tags is now changed to simplify management and reduce odb size. mlogger
> will automatically convert the tags to this new format (this is why you should
> make a backup of your ODB).
> - using old mlogger with new mhttpd is okey: new mhttpd understands both formats
> of /History/Tags.
> - using old mhttpd with new mlogger is okey: please set ODB
> "/History/CreateOldTags" to "y" (type TID_BOOL/"boolean") before starting mlogger.
>
> K.O. |
575
|
07 May 2009 |
Konstantin Olchanski | Bug Report | odbedit bad ctrl-C | When using "/bin/bash" shell, if I exit odbedit (and other midas programs) using ctrl-C, the terminal
enters a funny state, "echo" is turned off (I cannot see what I type), "delete" key does not work (echoes
^H instead).
This problem does not happen if I exit using the "exit" command or if I use the "/bin/tcsh" shell.
When this happens, the terminal can be restored to close to normal state using "stty sane", and "stty
erase ^H".
The terminal is set into this funny state by system.c::getchar() and normal settings are never restored
unless the midas program calls getchar(1) at the end. If the program does not finish normally, original
terminal settings are never restored and the terminal is left in a funny state.
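A common way to make sure the settings are restored even on ctrl-C is to save them before changing the terminal mode and to restore them from a SIGINT handler. The sketch below is not the MIDAS code: all function names are hypothetical, and only the POSIX termios and signal calls are assumed.

```c
#include <signal.h>
#include <termios.h>
#include <unistd.h>

static struct termios saved_termios;          /* settings before we touched the terminal */
static volatile sig_atomic_t have_saved = 0;  /* set once saved_termios is valid */

/* Put the terminal back the way we found it (no-op if nothing was saved). */
static void restore_terminal(void)
{
    if (have_saved)
        tcsetattr(STDIN_FILENO, TCSANOW, &saved_termios);
}

/* SIGINT handler: restore the terminal, then exit. tcsetattr() and _exit()
 * are both async-signal-safe in POSIX. */
static void on_sigint(int signo)
{
    (void) signo;
    restore_terminal();
    _exit(130); /* conventional exit status for "killed by SIGINT" */
}

/* Switch stdin to unbuffered, no-echo mode. Returns 0 on success and -1 on
 * failure (for example if stdin is not a terminal). */
static int enter_raw_mode(void)
{
    struct termios raw;

    if (tcgetattr(STDIN_FILENO, &saved_termios) != 0)
        return -1;
    have_saved = 1;
    signal(SIGINT, on_sigint); /* install the handler before changing the mode */

    raw = saved_termios;
    raw.c_lflag &= ~(tcflag_t) (ICANON | ECHO);
    return tcsetattr(STDIN_FILENO, TCSANOW, &raw) == 0 ? 0 : -1;
}
```

A program would call enter_raw_mode() once at startup and restore_terminal() on its normal exit path. A kill -9 still cannot be intercepted this way.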
It is not clear why the problem does not happen with /bin/tcsh - perhaps they restore sane terminal
settings automatically for us.
K.O. |
576
|
07 May 2009 |
Konstantin Olchanski | Bug Report | mlogger duplicate event problem | We have seen this problem on several daq systems: we start a run and observe that the number of
events written by mlogger to the output file is double the number of events actually collected. Upon
inspection of the output file, we see that every event is written twice. Restarting the run usually fixes
this problem.
We now traced this to an error in mlogger.c. If we start a run and the run transition fails in some
frontend, mlogger does not disconnect from the SYSTEM buffer (it does not know the transition failed
and the run did not really start). The SYSTEM buffer connection and the associated event request
remain active. Then we start the next run and mlogger connects to the SYSTEM buffer again, creates a
second (third, etc) event request. Eventually mlogger reaches the maximum permitted number of event
requests and no more runs can be started unless mlogger is restarted.
If at some point a run actually starts successfully, there are multiple event requests present from
mlogger and theoretically, each event should be written to the output file many times. This was a
puzzle until we got a good laugh from looking at mlogger.c::receive_event() callback - in retrospect it
is obvious why events are only written in duplicate.
Then, after the run is ended, mlogger disconnects from the SYSTEM buffer, all multiple event requests
are automatically deleted and the problem is not present during the next run.
I am not yet sure how to best fix this, but I see that other midas programs (i.e. mevb) suffer from the
same problem - multiple connections to the event buffer - in presence of failed run starts. I think we
have seen "event duplication" from mevb, as well.
K.O. |
585
|
02 Jun 2009 |
Konstantin Olchanski | Bug Report | mlogger duplicate event problem | > We have seen on several daq systems this problem: we start a run and observe that the number of
> events written by mlogger to the output file is double the number of events actually collected. Upon
> inspection of the output file, we see that every event is written twice. Restarting the run usually fixes
> this problem.
Fixed in mlogger.c, svn rev 4497: tr_start() now calls tr_stop() if it was somehow not already called by the end-run transition.
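The idea of the fix, making "start" clean up after a previous start that was never followed by a "stop", can be sketched as below. This is not the mlogger.c code: the struct and function names are hypothetical stand-ins for the buffer connection and its event requests.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a logger channel's connection to an event buffer. */
typedef struct {
    bool connected;   /* currently connected to the buffer */
    int  n_requests;  /* number of active event requests */
} logger_channel_t;

/* Stop is idempotent: calling it when not connected does nothing. */
static void channel_stop(logger_channel_t *ch)
{
    if (!ch->connected)
        return;
    ch->n_requests = 0;    /* disconnecting drops all event requests */
    ch->connected = false;
}

/* Start first undoes any leftover state from a failed run start, so event
 * requests cannot accumulate across repeated starts. */
static void channel_start(logger_channel_t *ch)
{
    channel_stop(ch);
    ch->connected = true;
    ch->n_requests += 1;
}
```

Without the channel_stop() call in channel_start(), two starts in a row would leave n_requests at 2, which is the situation that led to duplicated events.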
K.O. |
588
|
04 Jun 2009 |
Stefan Ritt | Bug Report | odbedit bad ctrl-C | > When using "/bin/bash" shell, if I exit odbedit (and other midas programs) using ctrl-C, the terminal
> enters a funny state, "echo" is turned off (I cannot see what I type), "delete" key does not work (echoes
> ^H instead).
>
> This problem does not happen if I exit using the "exit" command or if I use the "/bin/tcsh" shell.
>
> When this happens, the terminal can be restored to close to normal state using "stty sane", and "stty
> erase ^H".
>
> The terminal is set into this funny state by system.c::getchar() and normal settings are never restored
> unless the midas program calls getchar(1) at the end. If the program does not finish normally, original
> terminal settings are never restored and the terminal is left in a funny state.
>
> It is not clear why the problem does not happen with /bin/tcsh - perhaps they restore sane terminal
> settings automatically for us.
> K.O.
Who uses bash ??? And who keeps banging on Ctrl-C, when there is a nice "exit" command ;-)
Well, I implemented a simple CTRL-C handler in odbedit (Rev. 4503) which resets the terminal before exiting.
Give it a try. Of course this cannot catch a hard kill (-9), but CTRL-C works now correctly under bash at
least. |
590
|
04 Jun 2009 |
bazinski | Bug Report | mhttpd command line experiment specifying | Hi
Not sure how the rest of you specify mhttpd to work with multiple experiments on
one machine, but it would seem not the same as me ;-)
When executing mhttpd with
mhttpd -e "experimentname" -p "experimentport" -D
the experiment name is not transferred to transitions, as cm_transition never
specifies the experiment in the call to "transition STOP" etc.
The only flag it sends is a -d for debug, if selected.
The result is that the stop and start buttons of the web interface do not work,
and transitions sit endlessly doing nothing but consuming all the processor;
odbedit works fine though.
Does everyone else use an apache reverse proxy and/or an explicit experiment choice
in the URL?
As an aside, the reply to -? in mhttpd.c (line 13378) lists two -h options; the second
should be -e.
Thanks
Sean |
591
|
05 Jun 2009 |
Stefan Ritt | Bug Report | mhttpd command line experiment specifying | > Not sure how the rest of you specify mhttpd to work with multiple experiments on
> one machine, but it would seem not the same as me ;-)
Please note that there has been a change concerning multiple experiments inside
mhttpd. From revision 4346 on, mhttpd can only connect to one single experiment,
and the experiment name in the URL (aka ?exp=name) is not supported any more. So if
you have several experiments, you start several instances of mhttpd now on
different ports.
> that experiment name is not transfered to transitions as cm_transition never
> specifies the experiment in the call to "transition STOP" etc.
> the only flag it sends is a -d for debug if selected.
When connecting to an experiment, any midas client uses the ODB from that
experiment and so lives in that "namespace". So one client can never call any client
from another experiment. So your problem must be something else. Of course there is
no parameter "experiment" passed to cm_transition() since the experiment is
implicitly defined by the ODB mhttpd is attached to.
> The result is that the stop and start button of the webinterface does not work,
> and transitions sit endlessly doing nothing but consuming all the processor,
> odbedit works fine though.
I guess you have to do some debugging there. Note that "detached" transitions have
been implemented recently by Konstantin, so maybe your problem is related to that.
In this case Konstantin should check what's wrong.
> Does everyone else use an apache reverse proxy and or explicit experiment choice
> in the url ?
I use a
ProxyPass /megon/ http://megon.psi.ch/
on our public web server to make an online machine accessible from outside the
firewall, but just with a single experiment.
> As an aside in mhttpd.c in the reply to -? it states 2 -h options the second
> should be a -e. line 13378.
Fixed in revision 4504. |
592
|
05 Jun 2009 |
bazinski | Bug Report | mhttpd command line experiment specifying | Hi
> > Not sure how the rest of you specify mhttpd to work with multiple experiments on
> > one machine, but it would seem not the same as me ;-)
>
> Please note that there has been a change concerning multiple experiments inside
> mhttpd. From revision 4346 on, mhttpd can only connect to one single experiment,
> and the experiment name in the URL (aka ?exp=name) is not supported any more. So if
> you have several experiments, you start several instances of mhttpd now on
> different ports.
That I do with:
mhttpd -p xx -e experiment_name -D
>
> > that experiment name is not transfered to transitions as cm_transition never
> > specifies the experiment in the call to "transition STOP" etc.
> > the only flag it sends is a -d for debug if selected.
>
> When connecting to an experiment, any midas client uses the ODB from that
> experiment so lives in that "namespace". So one client can never call any client
> from another experiment. So your problem must be something else. Of course there is
> not parameter "experiment" passed to cm_transition() since the experiment is
> implicitly defined by the ODB mhttpd is attached to.
Will have to look elsewhere.
>
> > The result is that the stop and start button of the webinterface does not work,
> > and transitions sit endlessly doing nothing but consuming all the processor,
> > odbedit works fine though.
>
> I guess you have to do some debugging there. Note that "detached" transitions have
> been implemented recently by Konstantin, so maybe your problem is related to that.
> In this case Konstantin should check what's wrong.
cm_transition does a "system(str)" on line 3243, inside the "if(async_flag == DETACH)" of
line 3219. How does an external program know about the state of the originating mhttpd
process? Surely the str which executes "mtransition ......." should get a -e
specifying the experiment explicitly, and probably a -h as well to be thorough.
The only other way that mtransition.cxx will be able to pull in the experiment name is
from the environment variable, in its call to cm_get_environment(....) on startup.
OK, after some testing...
If I start mhttpd with the environment variable MIDAS_EXPT_NAME set, then it is happy,
as mtransition inherits the environment of mhttpd, so cm_get_environment(...) in
mtransition picks up the experiment. Similarly, if I insert "-e experimentname" into the
string "str" that is passed to system(str) on line 3243, then the start and stop buttons work.
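The second variant (adding -e to the command string) could look roughly like the sketch below. This is not the cm_transition() code: the function name is hypothetical, and the "mtransition -e <experiment> <verb>" form is assumed from the description above rather than checked against the mtransition sources.

```c
#include <stddef.h>
#include <stdio.h>

/* Build the command line for a detached transition. If an experiment name
 * is given, pass it explicitly with -e instead of relying on the child
 * inheriting MIDAS_EXPT_NAME from the environment.
 * Returns 0 on success and -1 if the command does not fit into dst. */
static int build_transition_cmd(char *dst, size_t size,
                                const char *verb, const char *expt)
{
    int n;

    if (expt != NULL && expt[0] != '\0')
        n = snprintf(dst, size, "mtransition -e %s %s", expt, verb);
    else
        n = snprintf(dst, size, "mtransition %s", verb);

    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

Since the resulting string would be handed to system(), a real implementation would also have to make sure the experiment name contains no shell metacharacters.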
Konstantin, any comments?
I suppose I can live with setting the environment before starting mhttpd, but
that kind of negates the command line argument to mhttpd.
Thanks for the help
Sean |
|