Entry  09 Dec 2003, Paul Knowles, , db_close_record non-local/non-return 
    Reply  12 Dec 2003, Stefan Ritt, , db_close_record non-local/non-return 
Entry time: 09 Dec 2003
Author: Paul Knowles 
Subject: db_close_record non-local/non-return 
Hi All,

I have found a weird one:

The following code executes on the frontend machine in the
frontend_exit() routine, and connects to the ODB running on
a separate machine:
     cm_msg(MINFO,__func__, "line %d", __LINE__);

     cm_get_experiment_database(&hdb, NULL);

     cm_msg(MINFO,__func__, "line %d", __LINE__);
     status = db_find_key(hdb, 0, "/Experiment/Run Parameters", &hkey);
     cm_msg(MINFO,__func__, "line %d, hkey=%d, status=%d",
            __LINE__, hkey, status);
     checkstat("db_find_key returned status %d", status);
     cm_msg(MINFO,__func__, "line %d", __LINE__);
     status = db_close_record(hdb, hkey);

     /* NOTREACHED!! the above call to db_close_record
        doesn't return! */
     cm_msg(MINFO,__func__, "line %d, status=%d", __LINE__, status);
     checkstat("db_close_record returned status %d", status);

checkstat is a macro that does the following:
#define checkstat(format, arg...)\
do{ if(status != DB_SUCCESS) {\
cm_msg(MERROR, __func__, format, ## arg);\
return FE_ERR_ODB;}}while(0)
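
For readers who want to reuse this pattern outside GCC: the macro above relies on the GNU named-variadic form (arg...) and on a variable called status being in scope. A minimal sketch of the same idea in standard C99 follows. It is an illustration only, not the code from the frontend: log_error() is a hypothetical stand-in for cm_msg(MERROR, ...), check_close_status() is a made-up caller, and the FE_ERR_ODB value is a placeholder to be taken from midas.h in real use.

```c
#include <stdarg.h>
#include <stdio.h>

/* Stand-ins so the sketch builds without midas.h. */
#ifndef DB_SUCCESS
#define DB_SUCCESS 1      /* the post reports DB_SUCCESS == 1 */
#endif
#ifndef FE_ERR_ODB
#define FE_ERR_ODB 602    /* placeholder: use the value from midas.h */
#endif

static int  log_count = 0;       /* number of errors logged so far */
static char last_log[256] = "";  /* text of the most recent error */

/* Hypothetical logger standing in for cm_msg(MERROR, ...). */
static void log_error(const char *where, const char *format, ...)
{
    char text[200];
    va_list ap;

    va_start(ap, format);
    vsnprintf(text, sizeof text, format, ap);
    va_end(ap);

    snprintf(last_log, sizeof last_log, "%s: %s", where, text);
    fprintf(stderr, "%s\n", last_log);
    log_count++;
}

/* C99 variadic form: status is passed explicitly, and the format string
   is the first variadic argument, so no GNU extension is needed. */
#define CHECKSTAT(status, ...)                      \
    do {                                            \
        if ((status) != DB_SUCCESS) {               \
            log_error(__func__, __VA_ARGS__);       \
            return FE_ERR_ODB;                      \
        }                                           \
    } while (0)

/* Example caller: returns FE_ERR_ODB on any status other than DB_SUCCESS. */
static int check_close_status(int status)
{
    CHECKSTAT(status, "db_close_record returned status %d", status);
    return DB_SUCCESS;
}
```

Passing status explicitly makes it visible at the call site which variable is being tested, at the cost of a slightly longer call.
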

The key exists, and the status of the search is 1
(i.e., DB_SUCCESS), so the rest of the code tries to run.  What gets
really weird is that db_close_record _doesn't_ _return_.
The code following the NOTREACHED comment just doesn't get
called.  I get the message from the __LINE__ just in front
of the call, but not the message afterwards (both cm_msg and printf
were tried).  Somehow db_close_record is causing a non-local
exit or signal or something.  No error message is printed and the
frontend continues to exit with exit code 0.  But, since the rest
of my frontend_exit/odb closing doesn't happen, the odb is left in
a lost state requiring a cleanup.  If I comment out the calls to
db_close_record, the rest of my frontend_exit runs normally
and the cm_disconnect_experiment() in mfe.c eventually closes my
open records correctly (I expect, anyway); this is the
workaround I am using at present.  The terror I have is that several of my
hotlinked callback routines will call db_close_record
when resetting illegal values.  No end of hilarity will result there...
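
One way to narrow down a call that never returns while the process still ends with exit code 0 is to check whether exit() is being reached somewhere below the call. The sketch below is a generic C debugging aid, not anything taken from the MIDAS sources: it marks a call as in flight and reports from an atexit() handler if the mark was never cleared. All names in it (in_flight_call, install_exit_tracer, TRACED, marker_is_set) are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Text of the call currently in progress, or NULL when none is. */
static const char *volatile in_flight_call = NULL;

/* atexit handler: reports a call that was entered but never returned. */
static void report_in_flight_call(void)
{
    if (in_flight_call != NULL)
        fprintf(stderr, "exit() was reached inside: %s\n", in_flight_call);
}

/* Register the handler; returns 0 on success, as atexit() does. */
static int install_exit_tracer(void)
{
    return atexit(report_in_flight_call);
}

/* Mark the call, run it, then clear the mark once it has returned. */
#define TRACED(result, call)            \
    do {                                \
        in_flight_call = #call;         \
        (result) = (call);              \
        in_flight_call = NULL;          \
    } while (0)

/* Demo helper: returns 1 if it runs while a traced call is marked. */
static int marker_is_set(void)
{
    return in_flight_call != NULL;
}
```

In frontend_exit() this would be used as TRACED(status, db_close_record(hdb, hkey)); after one call to install_exit_tracer(). If the message appears, something below the call invoked exit(). If the process ends and nothing is printed, the termination came through a path that skips atexit handlers, such as _exit(), abort() or a fatal signal.
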

I was using the same code in the frontend under 1.9.2 and
have only recently upgraded to the 1.9.3-? tarball from PAA.
There were no problems using the 1.9.2 code: this appears only with 1.9.3.

I have localized the weirdness to what I think is the RPC interface.
Running the nullfrontend (no camac access) on the same machine that
hosts the ODB, I can make the problem appear and disappear in the
following way:
(odb is local on machine ``monet'')

nullfe -h monet -e acqmonad     : db_close_record will get lost

nullfe -e acqmonad              : db_close_record works as expected.

I've also tried with the patch for the 256 byte odb string bug, since
many of the open records have strings of that length, but that isn't
it. The only substantial-looking change to mserver from 1.9.2 to 1.9.3
is the SIGPIPE ignore, and that doesn't look like a good candidate either.
Can this be that some of the 
that got moved about in odb.c and others
are causing the remote call to get confused?

Clearly the answer is to just use stable and happy 1.9.2, but the 
people for whom I am working now really want to use ROOT for
an analyzer...


Paul Knowles.                   phone: 41 26 300 90 64
                                Fax: 41 26 300 97 47
