Back Midas Rome Roody Rootana
  Midas DAQ System  Not logged in ELOG logo
Entry  07 May 2009, Konstantin Olchanski, Info, mhttpd now uses mtransition 
    Reply  21 May 2009, Konstantin Olchanski, Info, mhttpd now uses mtransition 
       Reply  02 Jun 2009, Konstantin Olchanski, Info, mhttpd now uses mtransition 
          Reply  26 Jun 2009, Konstantin Olchanski, Info, mhttpd now uses mtransition 
Message ID: 583     Entry time: 21 May 2009     In reply to: 571     Reply to this: 584
Author: Konstantin Olchanski 
Topic: Info 
Subject: mhttpd now uses mtransition 
> mhttpd function for starting and stopping runs now uses cm_transition(DETACH) which spawns an 
> external helper program called mtransition to handle the transition sequencing.
> 
> P.S. In one of our experiments, I sometimes see mhttpd getting "stuck" when starting or stopping a run 
> using this feature. strace shows it is stuck in repeated calls to wait(), but I am unable to reproduce this 
> problem in a test system and it happens only sometimes in the experiment. When it does, mhttpd has to 
> be restarted. Replacing system("mtransition ...") with ss_sysem("mtransition ...") seems to fix this problem, 
> but there are downsides to this (mtransition debug output vanishes) so I am not committing this yet.
> K.O.

Found the problem. As observed on SL5 systems, the GLIBC "system()" function breaks if the user application
installs a SIGCHLD handler that "steals" wait() notifications. Such a handler is installed by the MIDAS ss_exec()
function in system.c.

I would count this as a GLIBC bug - their "system()" function should survive in the presence of non-default signal
handlers installed by the user, and in fact my copy of "man signal" talks about the "system()" doing something
special about SIGCHLD. Obviously whatever they do is broken, at least in the SL5 GLIBC.

I am now testing an implementation using MIDAS ss_spawnvp().

The simplest way to reproduce the problem: start mhttpd; start/stop runs - mtransition works perfectly; start some
program from the MIDAS "programs" page (this calls "ss_exec()"), try to start a run - mhttpd will hang inside the
system() GLIBC function, every time. mhttpd has to be killed with "kill -KILL" to recover.

K.O.
ELOG V3.1.4-2e1708b5