I was looking at something in the mserver and noticed that for remote frontends, for every periodic event,
there are about 3 RPC calls to bm_flush_cache().
Sure enough, in mfe.c::send_event(), for every event sent, there are 2 calls to bm_flush_cache() (once for
the buffer we used, a second time for all buffers). Then, for good measure, the mfe idle loop calls
bm_flush_cache() for all buffers about once per second (even if no events were generated).
So what is going on here? To allow good performance when processing many small events,
the MIDAS event buffer code (bm_send_event()) buffers small events internally, and only after this internal
buffer is full, the accumulated events are flushed into the shared memory event buffer,
where they become visible to the mlogger, mdump and other consumers.
Because of this internal buffering, infrequent small size periodic events can become
stuck for quite a long time, confusing the user: "my frontend is sending events, how come I do not
see them in mdump?"
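To make the buffering concrete, here is a minimal sketch in C. It is a toy model, not the MIDAS code: toy_buffer, toy_send_event(), toy_flush_cache() and the sizes are all made up for illustration. It only shows why a small event stays invisible until the private cache fills up or somebody flushes it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model, NOT the MIDAS implementation: all names and sizes are hypothetical. */
enum { CACHE_SIZE = 64, SHM_SIZE = 1024 };

typedef struct {
    char   cache[CACHE_SIZE]; /* private to the sender, invisible to consumers */
    size_t cache_used;
    char   shm[SHM_SIZE];     /* stands in for the shared memory event buffer */
    size_t shm_used;          /* bytes visible to consumers */
} toy_buffer;

/* Move everything accumulated in the private cache into "shared memory". */
static void toy_flush_cache(toy_buffer *b)
{
    assert(b->shm_used + b->cache_used <= SHM_SIZE);
    memcpy(b->shm + b->shm_used, b->cache, b->cache_used);
    b->shm_used += b->cache_used;
    b->cache_used = 0;
}

/* Append an event to the cache; flush first only if the event does not fit. */
static void toy_send_event(toy_buffer *b, const void *event, size_t size)
{
    assert(size <= CACHE_SIZE);
    if (b->cache_used + size > CACHE_SIZE)
        toy_flush_cache(b);
    memcpy(b->cache + b->cache_used, event, size);
    b->cache_used += size;
}
```

With this model, one 8-byte event sent into an empty toy_buffer leaves shm_used at 0 until toy_flush_cache() is called, which is the "stuck event" situation described above.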
To avoid this, mfe.c manually flushes these internal event buffers by calling bm_flush_cache().
And I think that works just fine for frontends directly connected to the shared memory: one call to
bm_flush_cache() should be sufficient.
But for remote frontends connected through the mserver, it turns out there is a race condition between
sending the event data on one tcp connection and sending the bm_flush_cache() rpc request on another.
I see that the mserver always reads the rpc connection before the event connection, so bm_flush_cache()
is done *before* the event is written into the buffer by bm_send_event(). So the newly
sent event is stuck in the buffer until bm_flush_cache() for the *next* event shows up:
mfe.c: send_event1 -> flush -> ... wait until next event ... -> send_event2 -> flush
mserver: flush -> receive_event1 -> ... wait ... -> flush -> receive_event2 -> ... wait ...
mdump: ... nothing ... -> ... nothing ... -> event1 -> ... nothing ...
Enter the 2nd call to bm_flush_cache in mfe.c (flush all buffers) - now because mserver seems to be
alternating between reading the rpc connection and the event connection, the race condition looks like
mfe.c: send_event -> flush -> flush
mserver: flush -> receive_event -> flush
mdump: ... -> event -> ...
So in this configuration, everything works correctly, the data is not stuck anywhere - but by accident, and
at the price of an extra rpc call.
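The two orderings above can be reproduced with a small counting model in C. This is only a sketch under the assumption stated above (the mserver alternates between the two connections and looks at the rpc connection first). toy_mserver, toy_fe_send() and toy_mserver_poll() are hypothetical names, and messages are just counted, nothing is actually sent.

```c
#include <assert.h>

/* Toy model of the ordering described above; every name here is hypothetical. */
typedef struct {
    int n_event_conn; /* events queued on the event tcp connection */
    int n_rpc_conn;   /* flush requests queued on the rpc tcp connection */
    int cached;       /* events sitting in the mserver-side cache */
    int visible;      /* events visible to consumers such as mdump */
} toy_mserver;

/* Frontend side: one event on the event connection, then n_flush flush rpcs. */
static void toy_fe_send(toy_mserver *s, int n_flush)
{
    assert(n_flush >= 0);
    s->n_event_conn += 1;
    s->n_rpc_conn += n_flush;
}

/* mserver side: alternate between the two connections, rpc first,
   until both are drained (assumed behaviour, taken from the text above). */
static void toy_mserver_poll(toy_mserver *s)
{
    while (s->n_rpc_conn > 0 || s->n_event_conn > 0) {
        if (s->n_rpc_conn > 0) {   /* handle one flush request */
            s->n_rpc_conn--;
            s->visible += s->cached;
            s->cached = 0;
        }
        if (s->n_event_conn > 0) { /* receive one event into the cache */
            s->n_event_conn--;
            s->cached++;
        }
    }
}
```

Calling toy_fe_send(&s, 1) followed by toy_mserver_poll(&s) leaves the event in the cache until the next event arrives, while toy_fe_send(&s, 2) makes it visible in the same poll, matching the two diagrams.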
But what about the periodic 1/second bm_flush_cache() on all buffers? I think it does not quite work
either because the race condition is still there: we send an event, and the first flush may race it and only
the 2nd flush gets the job done, so the delay between sending the event and seeing it in mdump would be
around 1-2 seconds (no more than 2 seconds, I think). Since users expect their events to show up "right
away", a 2 second delay is probably not very good.
Because periodic events are usually not high rate, the current situation (4 network transactions to send 1
event - 1x send event, 3x flush buffer) is probably acceptable. But this definitely limits the maximum
event rate to about 1 event per 3x (2x?) the mserver rpc latency. Without the rpc calls to bm_flush_cache() there
would be no limit - the events themselves are sent through a pipelined tcp connection without
One solution to this would be to implement a periodic bm_flush_cache() in the mserver, making all calls to
bm_flush_cache() in mfe.c unnecessary (unless it's a direct connection to shared memory).
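A rough sketch of what a server-side periodic flush could look like, in C. Nothing here exists in the mserver today: toy_server_buffer, toy_receive_event() and toy_periodic_flush() are hypothetical, the time is passed in by the caller, and the flush period is an arbitrary parameter.

```c
#include <assert.h>

/* Hypothetical sketch: none of these names exist in MIDAS. */
typedef struct {
    int    cached;     /* events waiting in the server-side cache */
    int    visible;    /* events already visible to consumers */
    double last_flush; /* time of the last flush, in seconds */
} toy_server_buffer;

/* An event arrived on the event connection: it only goes into the cache. */
static void toy_receive_event(toy_server_buffer *b)
{
    b->cached++;
}

/* Called from the server's idle loop with the current time in seconds.
   Flushes when the cache is non-empty and at least `period` seconds have
   passed since the last flush. Returns the number of events flushed. */
static int toy_periodic_flush(toy_server_buffer *b, double now, double period)
{
    assert(period > 0.0);
    if (b->cached == 0 || now - b->last_flush < period)
        return 0;
    int n = b->cached;
    b->visible += n;
    b->cached = 0;
    b->last_flush = now;
    return n;
}
```

Because the flush runs in the same loop that receives the events, it cannot be reordered against them, and the frontend never has to send a flush rpc at all. The worst-case delay is then about one flush period.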
Another solution could be to send events with a special flag telling the mserver to "flush the buffer right away".
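The flag idea could look roughly like this in C. It is purely illustrative: TOY_FLAG_FLUSH, toy_event and toy_handle_event() are invented for this sketch and are not part of the MIDAS event format or the mserver protocol.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: TOY_FLAG_FLUSH and these structs are not part of MIDAS. */
enum { TOY_FLAG_FLUSH = 1 };

typedef struct { int flags; } toy_event;
typedef struct { int cached; int visible; } toy_sink;

/* Server-side handler: cache the event, then flush at once if the sender
   asked for it. The flush request travels with the event itself, so it
   cannot overtake the event on a different connection. */
static void toy_handle_event(toy_sink *s, const toy_event *e)
{
    assert(s != NULL && e != NULL);
    s->cached++;
    if (e->flags & TOY_FLAG_FLUSH) {
        s->visible += s->cached;
        s->cached = 0;
    }
}
```

A high-rate equipment would leave the flag clear and keep the benefit of the cache, while a slow periodic equipment would set it on every event.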
P.S. Look ma!!! A race condition with no threads!!!