Message ID: 1427     Entry time: 28 Dec 2018     In reply to: 1426     Reply to this: 1428
Author: Konstantin Olchanski 
Topic: Info 
Subject: note on the midas event buffer code, part 2, bm_send_event() 
> In this technical note, I write down the workings of the midas event buffer code
> we need to understand and write down how the event buffer code works.

The data ingress part of the event buffer code is very simple: events are sent to the event buffer via a single
function, bm_send_event(). There is no other way to inject data into an event buffer.

bm_send_event() does this (see the sketch after this list):
= if the write cache is active:
- the new event is written into the local write cache
- if the write cache is full, it is flushed into the event buffer via bm_flush_cache()
- if the new event does not fit into the write cache, the write cache is flushed and the event is written into the 
event buffer (preserving the event ordering).
= if the write cache is inactive, new events are written directly into the event buffer.
= to write an event into the event buffer:
- wait for free space via bm_wait_for_free_space()
- lock buffer semaphore
- copy data into buffer shared memory
- update write pointer
- unlock buffer semaphore
- notify all readers waiting for this event
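
To make the control flow concrete, here is a simplified sketch of the above in C. This is
not the actual midas source: the real code keeps the buffer in shared memory protected by
a system semaphore, while here a pthread mutex, a condition variable and a malloc'ed ring
stand in for them, and free-space accounting and write pointer wraparound are omitted.

#include <pthread.h>
#include <string.h>

typedef struct {
   char  *buf;              /* stands in for the buffer shared memory */
   size_t size;             /* total buffer size                      */
   size_t wp;               /* write pointer                          */
   char  *cache;            /* write cache; NULL if cache is inactive */
   size_t cache_size;       /* write cache capacity                   */
   size_t cache_wp;         /* write cache fill level                 */
   pthread_mutex_t sem;     /* stands in for the buffer semaphore     */
   pthread_cond_t  readers; /* stands in for reader notification      */
} SBUF;

static void write_to_shm(SBUF *b, const void *ev, size_t n)
{
   /* wait_for_free_space(b, n, ...); -- see bm_wait_for_free_space() below */
   pthread_mutex_lock(&b->sem);         /* lock buffer semaphore                 */
   memcpy(b->buf + b->wp, ev, n);       /* copy data into buffer shared memory   */
   b->wp += n;                          /* update write pointer                  */
   pthread_mutex_unlock(&b->sem);       /* unlock buffer semaphore               */
   pthread_cond_broadcast(&b->readers); /* notify readers waiting for this event */
}

static void flush_cache(SBUF *b)
{
   if (b->cache_wp > 0) {
      write_to_shm(b, b->cache, b->cache_wp); /* one locked bulk copy */
      b->cache_wp = 0;
   }
}

void send_event(SBUF *b, const void *ev, size_t n)
{
   if (!b->cache) {                     /* cache inactive: write directly */
      write_to_shm(b, ev, n);
      return;
   }
   if (n > b->cache_size - b->cache_wp) {
      flush_cache(b);                   /* flush first, preserving event ordering */
      if (n > b->cache_size) {          /* event does not fit into the cache:     */
         write_to_shm(b, ev, n);        /* write it directly                      */
         return;
      }
   }
   memcpy(b->cache + b->cache_wp, ev, n); /* write event into the local write cache */
   b->cache_wp += n;
   if (b->cache_wp == b->cache_size)
      flush_cache(b);                   /* cache full: flush it */
}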

bm_flush_cache() does this (see the sketch after this list):
- wait for free space via bm_wait_for_free_space()
- lock buffer semaphore
- copy events from write cache to buffer shared memory
- at the same time keep track of readers that wait for these events
- update write pointer
- unlock buffer semaphore
- notify all readers waiting for previous cached events
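
Continuing the same sketch, the reader bookkeeping could look as follows. The READER
table and its plain event_id matching are hypothetical simplifications: the real code
walks the client request lists stored in the buffer header and matches events against
them (see bm_match_event() in the midas source).

typedef struct {
   short event_id;                      /* what this reader requested (simplified) */
   /* ... wakeup channel, etc ... */
} READER;

typedef struct {                        /* simplified event header */
   short  event_id;
   size_t data_size;
} EVHDR;

void wake_reader(READER *r);            /* send a wakeup to the reader (not shown) */

void flush_cache_and_notify(SBUF *b, READER *readers, int nreaders)
{
   int notify[16] = {0};                /* assume nreaders <= 16 in this sketch */

   /* wait_for_free_space(b, b->cache_wp, ...); */
   pthread_mutex_lock(&b->sem);         /* lock buffer semaphore */
   for (size_t rp = 0; rp < b->cache_wp;) {
      EVHDR  *ev = (EVHDR *)(b->cache + rp);
      size_t  n  = sizeof(EVHDR) + ev->data_size;
      memcpy(b->buf + b->wp, ev, n);    /* copy event from write cache to shared memory */
      b->wp += n;                       /* update write pointer                         */
      for (int i = 0; i < nreaders; i++)
         if (readers[i].event_id == ev->event_id)
            notify[i] = 1;              /* keep track of readers that want this event */
      rp += n;
   }
   b->cache_wp = 0;
   pthread_mutex_unlock(&b->sem);       /* unlock buffer semaphore */

   for (int i = 0; i < nreaders; i++)   /* notify readers waiting for the cached events */
      if (notify[i])
         wake_reader(&readers[i]);
}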

bm_wait_for_free_space() does this (see the sketch after this list):
= if the buffer is full and sync_flag==BM_NO_WAIT,
- return BM_ASYNC_RETURN immediately,
- causing bm_send_event() to immediately return BM_ASYNC_RETURN without writing anything to the event
buffer
= otherwise, if the buffer is full,
- sleep using ss_suspend(1000, MSG_BM), then check again
- there is no timeout: bm_wait_for_free_space() will wait forever
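
In the same simplified setting the wait logic amounts to the following. The status
values are placeholders (see midas.h for the real definitions) and a plain sleep
stands in for ss_suspend(1000, MSG_BM):

#include <unistd.h>

#define BM_SUCCESS      1             /* placeholder values for illustration - */
#define BM_NO_WAIT      1             /* see midas.h for the real definitions  */
#define BM_ASYNC_RETURN 213

static size_t free_space(const SBUF *b)
{
   return b->size - b->wp;            /* grossly simplified: ignores read pointers and wraparound */
}

int wait_for_free_space(SBUF *b, size_t n, int sync_flag)
{
   while (free_space(b) < n) {        /* buffer full?                      */
      if (sync_flag == BM_NO_WAIT)    /* caller asked us not to block:     */
         return BM_ASYNC_RETURN;      /* bail out without writing anything */
      sleep(1);                       /* stands in for ss_suspend(1000, MSG_BM), */
   }                                  /* then loop and check again - no timeout, */
   return BM_SUCCESS;                 /* this can wait forever                   */
}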

The most expensive operation when writing to the event buffer is the locking:
we must wait until all readers finish their reading and all other writers finish their writing
and unlock the buffer for us (read up on "lock fairness and starvation"). In general, the less
locking we do, the better.

To reduce lock contention, the event buffer code has a write cache. In theory, when we write
a large number of small events, it is more efficient to "batch" them together, significantly
reducing the number of locking operations per event. Even if the events are large, if there
is significant lock contention from multiple writers and multiple readers, batching the writes
is still a good idea. The downside is the cost of an extra memcpy(): instead of one memcpy() from
the user buffer to the shared memory, we do two - from the user buffer to the write cache, then from
the write cache to the shared memory. Today's typical PC-type machines have very fast RAM,
so memcpy() is inexpensive. However, embedded and low-power machines (ARM SoCs, FPGA-based SoCs,
etc.) tend to have pretty slow memory, so the extra memcpy() can be expensive.

The bottom line is that the write cache size should be tuned to the actual use case,
but in general it is less useful at low data rates (hardly any contention for the event buffer locks),
and more useful at high data rates especially with very small event sizes (high overhead from
locking on each event).
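
As a back-of-envelope example (the event sizes are just an illustration): with the
default 100000 byte write cache, 100 byte events are batched roughly 1000 at a time,
replacing ~1000 lock/unlock cycles with a single one, at the cost of 100000 bytes of
extra memcpy() per flush. With 50 kbyte events the same cache batches at most 2 events,
so the extra memcpy() buys almost nothing.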

Right now the write cache is always enabled for mfe-based frontends:
bm_set_cache_size(..., 0, SERVER_CACHE_SIZE); // 100000 bytes
(perhaps the cache size should be made configurable via ODB /Eq/xxx/Common).

To ensure that events do not sit in the write cache for too long,
mfe-based frontends call bm_flush_cache() about once per second.
(see mfe.c, good luck!)
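
For reference, here is a minimal sketch of this frontend-side pattern, assuming the
2018-era midas.h API (exact signatures vary between midas versions) and assuming
cm_connect_experiment() has already been called:

#include "midas.h"

void event_sender(void)
{
   INT hbuf;
   bm_open_buffer(EVENT_BUFFER_NAME, DEFAULT_BUFFER_SIZE, &hbuf);
   bm_set_cache_size(hbuf, 0, SERVER_CACHE_SIZE); /* enable the 100000 byte write cache */

   char  event[sizeof(EVENT_HEADER) + 256];
   DWORD last_flush = ss_time();

   for (DWORD serial = 0; /* while running */; serial++) {
      EVENT_HEADER *ph = (EVENT_HEADER *)event;
      bm_compose_event(ph, 1 /*event id*/, 0 /*trigger mask*/, 256 /*data size*/, serial);
      /* ... fill the 256 data bytes here ... */
      bm_send_event(hbuf, event, sizeof(EVENT_HEADER) + ph->data_size, BM_WAIT);

      if (ss_time() - last_flush >= 1) {  /* about once per second, as mfe.c does */
         bm_flush_cache(hbuf, BM_WAIT);
         last_flush = ss_time();
      }
   }
}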

to be continued,
K.O.