When using remote midas clients with mserver, you may have noticed the zero-size .RPC.SHM files
these clients create in the directory where you run them. These files are associated with the semaphore
created by the midas rpc layer (rpc_call) to synchronize rpc calls between multiple threads. This
semaphore is always created, even for single-threaded midas applications. Also, midas
semaphore files are normally created in the midas experiment directory specified in exptab (the same place as
.ODB.SHM), but a remote client does not know that location until it starts making rpc calls, so the
semaphore file is created in the current directory (the experiment directory is on a remote machine
anyway, so it may not be visible locally).
There are 2 problems with these semaphores:
1) in multiple experiments, we have observed the RPC.SHM semaphore stuck in a locked state,
requiring manual cleanup (ipcrm -s xxx). So far, I have failed to reproduce this lockup using test
programs and test experiments. The code appears to be written correctly: it should automatically unlock the
semaphore when the program exits or is killed.
2) RPC.SHM is created as a global shared semaphore so it synchronizes rpc calls not just for all threads
inside one application, but across all threads in all applications (excessive locking - separate
applications are connected to separate mservers and do not need this locking); but only for applications
that run from the same current directory - RPC.SHM files in different directories are "connected" to
different semaphores.
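
To illustrate the two points above, here is a minimal sketch (not the midas code) of a SysV semaphore keyed on a file. It assumes Linux and assumes the semaphore key is derived from the file with ftok(), which would explain why files in different directories map to different semaphores. The SEM_UNDO flag is the standard kernel mechanism for releasing a lock when the holder exits or is killed. All demo_* names are hypothetical and exist only for this example.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <unistd.h>

/* On Linux the caller must define union semun (see semctl(2)). */
union semun {
   int val;
   struct semid_ds *buf;
   unsigned short *array;
};

/* Create, or attach to, a binary semaphore keyed on a file path.
   Returns the semaphore id, or -1 on error. */
int demo_sem_open(const char *path)
{
   /* ftok() needs an existing file, so create a zero-size one if missing */
   int fd = open(path, O_CREAT | O_RDWR, 0644);
   if (fd < 0)
      return -1;
   close(fd);

   /* the key depends on the file itself, so the same file name in
      another directory normally yields a different semaphore */
   key_t key = ftok(path, 'M');
   if (key == (key_t) -1)
      return -1;

   /* try exclusive creation first so only the creator sets the value */
   int semid = semget(key, 1, IPC_CREAT | IPC_EXCL | 0666);
   if (semid >= 0) {
      union semun arg;
      arg.val = 1; /* 1 means unlocked */
      if (semctl(semid, 0, SETVAL, arg) < 0)
         return -1;
      return semid;
   }

   /* the semaphore already exists: attach to it */
   return semget(key, 1, 0);
}

/* Apply one operation with SEM_UNDO, so the kernel reverts it
   if the process exits or is killed before undoing it itself. */
static int demo_sem_op(int semid, short delta)
{
   struct sembuf sb;
   sb.sem_num = 0;
   sb.sem_op = delta;
   sb.sem_flg = SEM_UNDO;
   return semop(semid, &sb, 1);
}

int demo_sem_lock(int semid)   { return demo_sem_op(semid, -1); }
int demo_sem_unlock(int semid) { return demo_sem_op(semid, +1); }

/* Remove the semaphore from the system (what "ipcrm -s" does). */
int demo_sem_delete(int semid) { return semctl(semid, 0, IPC_RMID); }
```

Two programs calling demo_sem_open(".RPC.SHM") from the same directory would end up sharing one semaphore, while programs started in different directories would not. Existing semaphores can be listed with "ipcs -s".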
To try to fix this, I implemented "private semaphores" in system.c and made rpc_call() use them.
This introduced a major bug - a semaphore leak - quickly using up all sysv semaphores (see sysctl
kernel.sem).
The code has now been reverted to using RPC.SHM as described above.
The "bad" svn revisions start with rev 4472; the problem is fixed in rev 4480.
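
For background on why a semaphore leak exhausts the system so quickly: SysV semaphores are kernel objects that outlive the process that created them, and the kernel.sem limits cap how many can exist. The sketch below is a general illustration only. It does not reproduce the actual bug in system.c, and the demo_* names are hypothetical. It shows that a private semaphore stays allocated until it is removed explicitly.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Create a private semaphore set holding one semaphore.
   Every call allocates a new kernel object.
   Returns the semaphore id, or -1 on error. */
int demo_private_sem_create(void)
{
   return semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
}

/* Must be called explicitly: the kernel does not free SysV
   semaphores when the creating process exits. */
int demo_private_sem_destroy(int semid)
{
   return semctl(semid, 0, IPC_RMID);
}

/* Returns 1 if the semaphore id is still valid, 0 otherwise. */
int demo_private_sem_exists(int semid)
{
   return semctl(semid, 0, GETVAL) >= 0;
}
```

A program that calls demo_private_sem_create() without a matching demo_private_sem_destroy() leaves one semaphore behind each time, and the leftovers remain visible in "ipcs -s" until they are removed with ipcrm.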
If you use remote midas clients and have one of these bad revisions, either update midas.c to rev 4480
or apply this patch to midas.c::rpc_call():
ss_mutex_create("", &_mutex_rpc);
should read
ss_mutex_create("RPC", &_mutex_rpc);
Apologies for any inconvenience caused by this problem.
K.O.