ID |
Date |
Author |
Topic |
Subject |
96
|
20 Nov 2003 |
Renee Poutissou | | cannot shutdown defunct clients | Indeed the ODB command "cleanup" really works. I have used it several
times with the TWIST DAQ and regularly with the BNMR/MUSR setups where
we have these stubborn clients (ie feepics) that do not want to shutdown
cleanly.
But there is one problem with "cleanup". It has a hardwired timeout of
2 seconds. This is a problem for tasks like lazylogger which set a timeout
of 60 seconds when moving the tape. So BEWARE, if you issue the "cleanup"
command, it might kill some clients who have setup their timeout to longer
than 2 seconds.
I have asked Stefan to change this before. He said that, to be effective,
the timeout value used for "cleanup" has to be rather short.
One possibility, would be to allow for a user entered "cleanup" timeout.
The default could stay at 2 seconds.
> > Have you tried a "cleanup" in ODBEdit?
>
> Nope. Will try next time...
> |
95
|
20 Nov 2003 |
Stefan Ritt | | cannot shutdown defunct clients | > INT == "int", wraparound in 1 month
> DWORD == "unsigned int", wraparound in 2 months
>
> should we make it the 64-bit "long long" (or C98's "int64_t")?
Won't work on all supported compilers. The point is that DWORD wraps around in
2 months, but the difference of two DWORDs is alywas positive, never negative
like you had it. We only have to distinguish if the difference of the current
time (im ms) minus the last_activity of a client is larget than the timeout,
typically 10 seconds or so. If you have a wraparound on 32-bit DWORD, the
difference is still ok. Like
current "time" : 0x0000 0100
last_activity: 0xFFFF FF00
then current_time - last_activity = 0x00000100 - 0xFFFFFF00 = 0x00000200 if
calculated with 32-bit values. |
94
|
20 Nov 2003 |
Konstantin Olchanski | | cannot shutdown defunct clients | > > 1) shutdown from mhttpd "programs" page -> "cannot shutdown client"
> Have you tried a "cleanup" in ODBEdit?
Nope. Will try next time...
> The "last_activity" is a 32-bit int, filled with milliseconds. So indeed it
> wraps around after about one month.... change last_activity
> from INT to DWORD. This way it's alway positive and the wraparound does not
> hurt.
INT == "int", wraparound in 1 month
DWORD == "unsigned int", wraparound in 2 months
should we make it the 64-bit "long long" (or C98's "int64_t")?
K.O. |
93
|
20 Nov 2003 |
Stefan Ritt | | cannot shutdown defunct clients | > 1) shutdown from mhttpd "programs" page -> "cannot shutdown client"
> 2) "sh mhttpd" from odbedit ->
> [midas.c:5298:cm_shutdown] cannot connect to client mhttpd on host
> midtig01.triumf.ca, port 32853
> Client mhttpd not active
> 3) in odbedit: "cd /system/clients; rm xxxx"
> refuses to delete the key
Have you tried a "cleanup" in ODBEdit?
The "last_activity" is a 32-bit int, filled with milliseconds. So indeed it
wraps around after about one month. So if a all clients are stopped
simultaneously the hard way (such that nobody's watchdog can clean any other
client from the ODB), like with a power off, and you start the thing one
month later, there might be a problem. I never tried that before. So next
time to a cleanup. If that does not help, we should change last_activity
from INT to DWORD. This way it's alway positive and the wraparound does not
hurt. |
92
|
20 Nov 2003 |
Konstantin Olchanski | | cannot shutdown defunct clients | > While reviving midas on midtig01 after it was not used for a while ...
> [local:tigress:S]/>scl -w
> Name Host Timeout Last called
> mhttpd midtig01.triumf.ca 10000 -2037131082
These clients cannot be deleted. I tried:
1) shutdown from mhttpd "programs" page -> "cannot shutdown client"
2) "sh mhttpd" from odbedit ->
[midas.c:5298:cm_shutdown] cannot connect to client mhttpd on host
midtig01.triumf.ca, port 32853
Client mhttpd not active
3) in odbedit: "cd /system/clients; rm xxxx"
refuses to delete the key
Lacking any better ideas, I deleted them via brain surgery on the odb file:
1) stop everything
2) ipcrm the SYSV shared memory segment
3) odbedit -> save xxx.odb
4) xemacs xxx.odb, delete offending odb entries
5) rm .ODB.SHM
6) odbedit -> load xxx.odb
7) voila, bad clients gone, gone, gone.
K.O. |
91
|
20 Nov 2003 |
Konstantin Olchanski | | midas timeout wraparound | While reviving midas on midtig01 after it was not used for a while, we see
this. Notice negative "last called" numbers. Looks like a time_t wraparound
somewhere...
[local:tigress:S]/>scl -w
Name Host Timeout Last called
mhttpd midtig01.triumf.ca 10000 -2037131082
Logger midtig01.triumf.ca 10000 -2037131166
Analyzer midtig01.triumf.ca 10000 -2037131048
JACQ midtig01.triumf.ca 10000 -2037131667
mhttpd1 midtig01.triumf.ca 10000 325
ODBEdit midtig01.triumf.ca 10000 829
K.O. |
90
|
30 Nov 2003 |
Stefan Ritt | | bad call to cm_cleanup() in fal.c | > fal.c does not compile: it calls cm_cleanup() with one argument when there
> should be two arguments. K.O.
Fixed and committed. |
89
|
30 Nov 2003 |
Konstantin Olchanski | | bad call to cm_cleanup() in fal.c | fal.c does not compile: it calls cm_cleanup() with one argument when there
should be two arguments. K.O. |
88
|
01 Dec 2003 |
Stefan Ritt | | delete key followed by create record leads to empty structure in experim.h | > I have noticed a problem with deleting a key to an array in odb, then
> recreating the record as in the code below. The record is recreated
> successfully, but when viewing it with mhttpd, a spurious blank line
> (coloured orange) is visible, followed by the rest of the data as normal.
>
> db_create_record(hDB,0,"/Equipment/Cycle_scalers/Settings/",strcomb(type1_str));
> }
> else {
> exp_mode = 2; /* TDmusr types - noscans */
> status =
> db_create_record(hDB,0,"/Equipment/Cycle_scalers/Settings/",strcomb(type2_str));
> }
The first problem is that the db_create_record has a trailing "/" in the key name
after Settings. This causes the (empty) subsirectory which causes your trouble.
Simple removing it fixes the problem. I agree that this is not obvious, so I
added some code in db_create_record() which removes such a trailing slash if
present. New version under CVS.
Second, the db_create_record() call is deprecated. You should use the new
function db_check_record() instead, and remove your db_delete_key(). This avoids
possible ODB trouble since the structure is not re-created each time, but only
when necessary.
- Stefan |
87
|
25 Nov 2003 |
Suzannah Daviel | | delete key followed by create record leads to empty structure in experim.h | Hi,
I have noticed a problem with deleting a key to an array in odb, then
recreating the record as in the code below. The record is recreated
successfully, but when viewing it with mhttpd, a spurious blank line
(coloured orange) is visible, followed by the rest of the data as normal.
This blank line causes trouble with experim.h because it
produces an empty structure e.g. :
#define CYCLE_SCALERS_SETTINGS_DEFINED
typedef struct {
struct {
} ;
char names[60][32];
} CYCLE_SCALERS_SETTINGS;
rather than :
#define CYCLE_SCALERS_SETTINGS_DEFINED
typedef struct {
char names[60][32];
} CYCLE_SCALERS_SETTINGS;
This empty structure causes a compilation error when rebuilding clients that
use experim.h
SD
CYCLE_SCALERS_TYPE1_SETTINGS_STR(type1_str);
CYCLE_SCALERS_TYPE2_SETTINGS_STR(type2_str);
Both type1_str and type2_str have been defined as in
experim.h
i.e.
#define CYCLE_SCALERS_TYPE1_SETTINGS_STR(_name) char *_name[] = {\
"[.]",\
"Names = STRING[60] :",\
"[32] Back%BSeg00",\
"[32] Back%BSeg01",\
........
........
"[32] General%NeutBm Cycle Sum",\
"[32] General%NeutBm Cycle Asym",\
"",\
NULL }
#define CYCLE_SCALERS_TYPE2_SETTINGS_STR(_name) char *_name[] = {\
"[.]",\
"Names = STRING[60] :",\
"[32] Back%BSeg00",\
"[32] Back%BSeg01",\
...........
............
"[32] General%B/F Cumul -",\
"[32] General%Asym Cumul -",\
"",\
NULL }
if (db_find_key(hDB, 0, "/Equipment/Cycle_scalers/Settings/",&hKey) ==
DB_SUCCESS)
db_delete_key(hDB,hKey,FALSE);
if ( strncmp(fs.input.experiment_name,"1",1) == 0) {
exp_mode = 1; /* Imusr type - scans */
status =
db_create_record(hDB,0,"/Equipment/Cycle_scalers/Settings/",strcomb(type1_str));
}
else {
exp_mode = 2; /* TDmusr types - noscans */
status =
db_create_record(hDB,0,"/Equipment/Cycle_scalers/Settings/",strcomb(type2_str));
} |
86
|
01 Dec 2003 |
Konstantin Olchanski | | Implementation of db_check_record() | > Fixed and committed. Can you check if it's working?
Yes, it is fixed. Thanks. K.O. |
85
|
30 Nov 2003 |
Stefan Ritt | | Implementation of db_check_record() | Fixed and committed. Can you check if it's working? |
84
|
30 Nov 2003 |
Konstantin Olchanski | | Implementation of db_check_record() | > > I have therefore implemented the function
> > db_check_record(HNDLE hDB, HNDLE hKey, char *keyname, char *rec_str, BOOL
> correct)
>
> Stephan, something is very wrong with the new code. My
> "/logger/channels/0/settings" is being destroyed on "begin run".
Okey. I found the problem in db_check_record(): when we decide that we have a
mismatch, we call db_create_record(...,rec_str), but by this time, rec_str no
longer points to the beginning of the ODB string because we started parsing it.
I tried this solution: save rec_str into rec_str_orig, then when we decide that
we have a mismatch, call db_create_record() with this saved rec_str_orig. It
fixes my immediate problem (destruction of "/logger/channels/0/settings"), but is
it correct?
I would like to fix it ASAP to get cvs-head working again: our mhttpd dumps core
on an assert() failure in db_create_record() and the set of db_check_record()
changes might fix it for me.
Here is the CVS diff:
RCS file: /usr/local/cvsroot/midas/src/odb.c,v
retrieving revision 1.73
diff -r1.73 odb.c
7810a7811
> char *rec_str_orig = rec_str;
7820c7821
< return db_create_record(hDB, hKey, keyname, rec_str);
---
> return db_create_record(hDB, hKey, keyname, rec_str_orig);
7838c7839
< return db_create_record(hDB, hKey, keyname, rec_str);
---
> return db_create_record(hDB, hKey, keyname, rec_str_orig);
8023c8024
< return db_create_record(hDB, hKey, keyname, rec_str);
---
> return db_create_record(hDB, hKey, keyname, rec_str_orig);
8037c8038
< return db_create_record(hDB, hKey, keyname, rec_str);
---
> return db_create_record(hDB, hKey, keyname, rec_str_orig);
K.O. |
83
|
27 Nov 2003 |
Konstantin Olchanski | | Implementation of db_check_record() | > I have therefore implemented the function
> db_check_record(HNDLE hDB, HNDLE hKey, char *keyname, char *rec_str, BOOL
correct)
Stephan, something is very wrong with the new code. My
"/logger/channels/0/settings" is being destroyed on "begin run". Midas
checkout from october 31st is okey. This is a show stopper, but I am in a rush
and cannot debug it. I am falling back to the Oct 31st version... K.O. |
82
|
20 Nov 2003 |
Stefan Ritt | | Implementation of db_check_record() | As Konstantin pointed out correctly, the db_create_record() call is pretty
heavy since it copies whole structures around the ODB. Therefore, it
should not used frequently. It might be that several problems are caused
by that, for example the "phantom" records reported in elog:40 .
I have therefore implemented the function
db_check_record(HNDLE hDB, HNDLE hKey, char *keyname, char *rec_str,
BOOL correct)
which takes an ASCII structure in the same way as db_create_record(), but
only checks this ASCII structure against the ODB contents without writing
anything to the ODB.
If the record does not exist at all, it is created via db_create_record().
This is useful for example with the /Runinfo structure on a virgin ODB.
If the parameter "correct" is FALSE, the function returns
DB_STRUCT_MISMATCH if the ODB contents is wrong (wrong order of variables,
wrong name of variables, wrong type or array size). The calling function
should then abort, since a subsequent db_open_record() would fail. Note
that although abort() is useful, one should add cm_disconnect_experiment()
just before the abort() in order to have the application "log out" from
the ODB gracefully. If the parameter "correct" is TRUE, the function
db_create_record() is called internally to correct a mismatching record.
I have changed most calls of db_create_record() in mhttpd.c, mfe.c, mana.c
and mlogger.c. Pierre, could you do the same for lazylogger.c?
I also started to put assert()'s everywhere and encourage everyone to
follow. Under Windows, the asserts() are removed automatically if
compiling in "Release" mode.
So I committed many changes, did some quick tests, but am not 100%
convinced that all the changes are good. So please use the new code
cautiously, and let me know if there is any new problem. I also would like
to get some feedback if the whole thing becomes more stable now. |
81
|
12 Dec 2003 |
Stefan Ritt | | db_close_record non-local/non-return | Hi Paul,
sorry my late reply, I had to find some time for debugging your problem.
Thank you very much for the detailed description of the problem, I wish all
bug reports would be such elaborate!
You were right that there was a bug in the RPC system. The function
db_remove_open_record() got a new parameter recently, which was not changed
in the RPC call, and caused the mserver side to crash on any
db_close_record() call.
I fixed it and the update is under CVS (http://midas.psi.ch/cgi-
bin/cvsweb/midas/src/). Since you need to update many files, I wonder if I
should enable anonymous CVS read access. Does anybody know how to set this
up using "ssh" as the protocol (via CVS_RSH=ssh)?
Please note that db_close_record() is not necessary as
cm_disconnect_experiment() takes care of this, but having it there does not
hurt. |
80
|
09 Dec 2003 |
Paul Knowles | | db_close_record non-local/non-return |
Hi All,
I have found a weird one:
The following code executes on the frontend machine in the
frontend_exit() routine, and connects to the odb running on
another separate machine:
...
cm_msg(MINFO,__func__, "line %d", __LINE__);
cm_get_experiment_database(&hdb, NULL);
cm_msg(MINFO,__func__, "line %d", __LINE__);
status = db_find_key(hdb, 0, "/Experiment/Run Parameters", &hkey);
cm_msg(MINFO,__func__, "line %d, hkey=%d, status=%d",
__LINE__, hkey, status);
checkstat("db_find_key returned status %d", status);
cm_msg(MINFO,__func__, "line %d", __LINE__);
status = db_close_record(hdb, hkey);
/* NOTREACHED!! the above call to db_close_record
doesn't return!
*/
cm_msg(MINFO,__func__, "line %d, status=%d", __LINE__, status);
checkstat("db_close_record returned status %d", status);
checkstat is a macro that does the following:
#define checkstat(format, arg...)\
do{ if(status != DB_SUCCESS) {\
cm_msg(MERROR, __func__, format, ## arg);\
return FE_ERR_ODB;}}while(0)
The key exists, and the status of the search is 1
(i.e., DB_SUCCESS) and rest of the code tries to run. What gets
really weird is that the db_close_record _doesn't_ _return_.
The code following the NOTREACHED comment just doesn't get
called. I get the message from the __LINE__ just in front
of the call, but not the message afterwards (cm_msg and printf
were tried). Somehow db_close_record is causing a non-local
exit or signal or something. No error message is printed and the
frontend continues to exit with exit code 0. But, since the rest
of my frontend_exit/odb closing doesn't happen, the odb is left in
a lost state requiring a cleanup. If I comment out the calls to
db_close_record, the rest of my frontend_exit runs normally
and the cm_disconnect_experiment() in mfe.c eventually closes my
open records correctly (I expect, anyway) and this is the present
workaround i am using. The terror i have is that several of my
hotlinked callback routines will call the close_record routine
when resetting illegal values. No end of hilarity will result there...
I was using the same code in the frontend under 1.9.2 and
have only recently upgraded to 1.9.3-? tarball from PAA and
there were no problems using the 1.9.2 code: this is a 1.9.3
issue.
I have localized the weirdness to what I think is the RPC interface.
Running the nullfrontend (no camac access) on the same machine as
hosts the ODB I can make the problem appear and disappear in the
following way:
(odb is local on machine ``monet'')
nullfe -h monet -e acqmonad : db_close_record will get lost
nullfe -e acqmonad : db_close_record works as expected.
I've tried also with the patch for the 256 byte odb string bug since
many of the open records have strings of that length, but that isn't
it. The only substancial looking change to mserver from 1.9.2 to 1.9.3
is the SIGPIPE ignore and that doesn't look like a good candidate either.
Can this be that some of the
#IFDEF LOCAL_ROUTINES
that got moved about in odb.c and others
are causing the remote call to get confused?
Clearly the answer is to just use stable and happy 1.9.2, but the
people for whom I am working now really want to use ROOT for
an analyzer...
cheers,
.p.
Paul Knowles. phone: 41 26 300 90 64
email: Paul.Knowles@unifr.ch Fax: 41 26 300 97 47
finger me at pexppc33.unifr.ch for more contact information |
79
|
12 Dec 2003 |
Stefan Ritt | | Several small fixes and changes | I committed several small fixes and changes:
- install.txt which mentions explicitly ROOT
- mana.c and the main Makefile which fixes all HBOOK compiler warnings
- mana.c to write an explicit warning if the experiment directoy contains
uppercase letters in the path (HBOOK does not like this and refuses to
read/write histos)
- mserver.c, mrpc.c, odb.c to fix a wrong parameter in
db_remove_open_record() (see previous entry from Paul)
- added experim.h into the dependency of the hbookexpt Makefile |
78
|
06 Jan 2004 |
Stefan Ritt | | Poll about default indent style | Ok, taking all comments so far into account, I conclude adopting the ROOT
coding style would be best for us. So I put
indent:
find . -name "*.[hc]" -exec indent -kr -nut -i3 {} \;
Into the makefile. Hope everybody is happy now (;-))) |
77
|
01 Jan 2004 |
Konstantin Olchanski | | Poll about default indent style | > I don't feel a strong need of giving up a "-i2"...
I am comfortable with the current MIDAS styling convention and I would rather not
have yet another private religious war over the right location for the curley braces.
If we are to consider changing the MIDAS coding convention, I urge all and sundry
to read the ROOT coding convention, as written by Rene Brun and Fons Rademakers at
http://root.cern.ch/root/Conventions.html. The ROOT people did their homework, they
did read the literature and they produced a well considered and well argumented style.
Also, while there, do read the Taligent documentation- by far, one of the most
coherent manuals to C++ programming style.
K.O. |
|