Midas Talk

Midas DAQ System
Not logged in
Back | Find | Login | Help
Message ID: 2170 Entry time: 19 May 2021
Author:	Konstantin Olchanski
Topic:	Info
Subject:	update of event buffer code
a big update to the event buffer code was merged today.

two important bug fixes:

- a logic error in bm_receive_event() (actually bm_fill_read_cache_locked()) 
caused use of uninitialized variable to increment the read pointer and crash 
with error "read pointed points to an invalid event")
- missing bm_unlock() in bm_flush_cache() caused double-locking of event buffer 
caused a hang and a subsequent crash via the watchdog timeout.

several improvements:

- bm_receive_event_vec(std::vector<char>) with automatic memory allocation, one 
does not need to worry about providing a large event buffer to receive event 
data. For local connections MAX_EVENT_SIZE is no longer used, for remote 
connections, a buffer of MAX_EVENT_SIZE is allocated automatically, this is a 
limitation of the MIDAS RPC layer (it does not know how to allocate memory to 
receive arbitrary large data)

(MAX_EVENT_SIZE is now only used in bm_receive_event_rpc()).

- rpc_send_event_sg() - thread safe method to send events to the mserver. it 
takes an array of scatter-gather buffers, so a midas event does not have to be 
in one continious buffer.

- bm_send_event_sg() - same for local connections.

- on top of bm_send_event_sg() we now have bm_send_event_vec(std::vector<char>) 
and bm_send_event_vec(std::vector<vector<char>>). now we can move forward with 
implementing a new "event object" (the TMEvent event object from midasio.h 
already works with these new methods).

- remote connected bm_send_event() & co now always send events to the mserver 
using the event socket. (before, bm_send_event() used RPC_BM_SEND_EVENT and 
suffered from the RPC layer encoding/decoding overhead. mfe.c used 
rpc_send_event() for remote connections)

- bm_send_event(), bm_receive_event() & co now take a timeout value (in 
milliseconds) instead of an async_flag. The old async_flag values BM_WAIT and 
BM_NO_WAIT continue working as expected (wait forever and do not wait at all, 
respectively).

- following improvements are only for remote connections:

- in the case of event buffer congestion (event readers are slow, event buffers 
are close to 100% full), the bm_flush_cache() RPC will no longer timeout due to 
mserver being stuck waiting for free buffer space. (RPC is called with a 1000 
msec timeout, infinite loop waiting for flush is done on the frontend side, the 
RPC timeout will never fire)

- in the case of event buffer congestion, ODB RPC will no longer timeout. 
(previously mserver was stuck waiting for free buffer space and did not process 
any RPCs).

- at the end of run, last few events could be stuck in the event socket. now, 
frontends can flush it using bm_flush_cache(0,BM_WAIT) (use zero for the buffer 
handle). correct run transition should stop the trigger, stop generating new 
events, call bm_flush_cache(0,BM_WAIT), call bm_flush_cache("SYSTEM",BM_WAIT) 
and return success. (TMFE frontend already does this). Note that 
bm_flush_cache(BM_WAIT) can be stuck for very long time waiting for the event 
buffers to empty-out, so run transition RPC timeout is still possible.

K.O.
ELOG V3.1.4-2e1708b5