Message ID: 939     Entry time: 20 Nov 2013     Reply to this: 940
Author: Konstantin Olchanski 
Topic: Bug Report 
Subject: Too many bm_flush_cache() in mfe.c 
I was looking at something in the mserver and noticed that for remote frontends, for every periodic event, 
there are about 3 RPC calls to bm_flush_cache().

Sure enough, in mfe.c::send_event(), for every event sent there are 2 calls to bm_flush_cache() (once for 
the buffer we just used, and a second time for all buffers). Then, for good measure, the mfe idle loop calls 
bm_flush_cache() for all buffers about once per second (even if no events were generated).
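
For concreteness, here is a condensed sketch of that call pattern. This is not the literal mfe.c code - the 
equipment[] bookkeeping, the function name and the error handling are simplified for illustration:

#include "midas.h"

extern EQUIPMENT equipment[];   /* defined by the user frontend, as usual */

/* condensed sketch of the call pattern described above (not literal mfe.c code) */
void send_periodic_event_sketch(int idx, EVENT_HEADER *pevent)
{
   INT size = pevent->data_size + sizeof(EVENT_HEADER);

   /* send the event; for a remote frontend this goes to the mserver
      over the event tcp connection */
   bm_send_event(equipment[idx].buffer_handle, pevent, size, SYNC);

   /* flush #1: the buffer this event went to (an rpc call when remote) */
   bm_flush_cache(equipment[idx].buffer_handle, SYNC);

   /* flush #2: all open buffers (more rpc calls when remote) */
   for (int i = 0; equipment[i].name[0]; i++)
      bm_flush_cache(equipment[i].buffer_handle, SYNC);
}

/* flush #3: independently of the above, the idle loop flushes all buffers
   about once per second, whether or not any events were generated */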

So what is going on here? To allow good performance when processing many small events,
the MIDAS event buffer code (bm_send_event()) buffers small events internally, and only after this internal
buffer is full are the accumulated events flushed into the shared memory event buffer,
where they become visible to the mlogger, mdump and other consumers.
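
For reference, this internal buffer is the per-client write cache enabled with bm_set_cache_size(). A minimal 
sketch of how a client ends up in this situation (buffer name, buffer size and cache size are illustrative, 
not taken from mfe.c):

#include "midas.h"

static INT hbuf;   /* event buffer handle */

/* illustrative setup: open a buffer and enable a client-side write cache */
void open_buffer_with_write_cache(void)
{
   bm_open_buffer("SYSTEM", 2*1024*1024, &hbuf);   /* 2 MB buffer, size illustrative */
   bm_set_cache_size(hbuf, 0, 100000);             /* no read cache, 100 kB write cache */
}

void send_small_event(EVENT_HEADER *pevent)
{
   /* small events first accumulate in the local write cache ... */
   bm_send_event(hbuf, pevent, pevent->data_size + sizeof(EVENT_HEADER), SYNC);

   /* ... and only become visible to mlogger/mdump once the cache fills up,
      or after an explicit flush: */
   bm_flush_cache(hbuf, SYNC);
}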

Because of this internal buffering, infrequent small periodic events can get stuck for quite a long time,
confusing the user: "my frontend is sending events, how come I do not see them in mdump?"

To avoid this, mfe.c manually flushes these internal event buffers by calling bm_flush_cache().

And I think that works just fine for frontends connected directly to the shared memory: one call to 
bm_flush_cache() should be sufficient.

But for remote frontends connected through the mserver, it turns out there is a race condition between 
sending the event data on one tcp connection and sending the bm_flush_cache() rpc request on another 
tcp connection.

I see that the mserver always reads the rpc connection before the event connection, so bm_flush_cache() 
is executed *before* the event is written into the buffer by bm_send_event(). So the newly
sent event is stuck in the buffer until the bm_flush_cache() for the *next* event shows up:

mfe.c: send_event1 -> flush -> ... wait until next event ... -> send_event2 -> flush
mserver: flush -> receive_event1 -> ... wait ... -> flush -> receive_event2 -> ... wait ...
mdump -> ... nothing ... -> ... nothing ... -> event1 -> ... nothing ...

Enter the 2nd call to bm_flush_cache() in mfe.c (the flush of all buffers) - now, because the mserver seems 
to alternate between reading the rpc connection and the event connection, the race condition looks like 
this:

mfe.c: send_event -> flush -> flush
mserver: flush -> receive_event -> flush
mdump: ... -> event -> ...

So in this configuration everything works correctly and the data is not stuck anywhere - but only by 
accident, and at the price of an extra rpc call.

But what about the periodic 1/second bm_flush_cache() on all buffers? I think it does not quite work
either, because the race condition is still there: we send an event, the first flush may race past it, and only 
the 2nd flush gets the job done, so the delay between sending the event and seeing it in mdump would be 
around 1-2 seconds (no more than 2 seconds, I think). Since users expect their events to show up "right
away", a 2 second delay is probably not very good.

Because periodic events are usually not high rate, the current situation (4 network transactions to send 1 
event - 1x send event, 3x flush buffer) is probably acceptable. But it definitely limits the maximum event 
rate to roughly one event per 3x (2x?) the mserver rpc round-trip latency - without the rpc calls to 
bm_flush_cache() there would be no such limit, since the events themselves are sent through a pipelined 
tcp connection without per-event handshaking.
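
For a rough illustration (the numbers are assumed, not measured): if one mserver rpc round trip takes 
about 1 ms on a local network, then 3 synchronous flush calls per event cap the rate at roughly 300 
events/sec, while the pipelined event connection by itself imposes no per-event round-trip cost at all.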

One solution to this would be to implement a periodic bm_flush_cache() in the mserver itself, making all the 
calls to bm_flush_cache() in mfe.c unnecessary (unless it's a direct connection to shared memory).
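
A purely hypothetical sketch of what that could look like on the mserver side, reusing the existing 
bm_flush_cache() and ss_millitime() calls; the open-buffer bookkeeping here is assumed, it is not actual 
mserver code:

#include "midas.h"
#include "msystem.h"   /* for ss_millitime() */

#define MAX_OPEN_BUFFERS 32

static INT   open_buffer_handle[MAX_OPEN_BUFFERS];  /* hypothetical bookkeeping of the buffers
                                                        opened for the remote client */
static int   n_open_buffers;
static DWORD last_flush;

/* hypothetical solution 1: called from the mserver main loop, this flushes the
   write caches of the client's buffers about once per second, so the remote
   frontend would not need to issue any bm_flush_cache() rpc calls at all */
void mserver_periodic_flush_sketch(void)
{
   if (ss_millitime() - last_flush > 1000) {
      for (int i = 0; i < n_open_buffers; i++)
         bm_flush_cache(open_buffer_handle[i], ASYNC);
      last_flush = ss_millitime();
   }
}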

Another solution could be to send events with a special flag telling the mserver to "flush the buffer right 
away".

P.S. Look ma!!! A race condition with no threads!!!

K.O.