a big update to the event buffer code was merged today.
two important bug fixes:
- a logic error in bm_receive_event() (actually in bm_fill_read_cache_locked())
caused an uninitialized variable to be used to increment the read pointer,
crashing with the error "read pointer points to an invalid event"
- a missing bm_unlock() in bm_flush_cache() caused double-locking of the event
buffer, resulting in a hang and a subsequent crash via the watchdog timeout.
several improvements:
- bm_receive_event_vec(std::vector<char>) with automatic memory allocation: one
no longer needs to worry about providing a big enough buffer to receive event
data (see the receive sketch at the end of this post). For local connections,
MAX_EVENT_SIZE is no longer used; for remote connections, a buffer of
MAX_EVENT_SIZE is allocated automatically. This is a limitation of the MIDAS RPC
layer (it does not know how to allocate memory to receive arbitrarily large
data). MAX_EVENT_SIZE is now only used in bm_receive_event_rpc().
- rpc_send_event_sg() - thread-safe method to send events to the mserver. it
takes an array of scatter-gather buffers, so a midas event does not have to be
in one contiguous buffer.
- bm_send_event_sg() - same for local connections (see the scatter-gather
sketch at the end of this post).
- on top of bm_send_event_sg() we now have bm_send_event_vec(std::vector<char>)
and bm_send_event_vec(std::vector<std::vector<char>>) (see the vector send
sketch at the end of this post). now we can move forward with implementing a
new "event object" (the TMEvent event object from midasio.h already works with
these new methods).
- remotely connected bm_send_event() & co now always send events to the mserver
using the event socket. (before, bm_send_event() used RPC_BM_SEND_EVENT and
suffered from the RPC layer encoding/decoding overhead; mfe.c used
rpc_send_event() for remote connections)
- bm_send_event(), bm_receive_event() & co now take a timeout value (in
milliseconds) instead of an async_flag (see the timeout sketch at the end of
this post). The old async_flag values BM_WAIT and BM_NO_WAIT continue working
as expected (wait forever and do not wait at all, respectively).
- the following improvements are only for remote connections:
- in the case of event buffer congestion (event readers are slow, event buffers
are close to 100% full), the bm_flush_cache() RPC will no longer time out due
to the mserver being stuck waiting for free buffer space. (the RPC is called
with a 1000 msec timeout; the infinite loop waiting for the flush runs on the
frontend side, so the RPC timeout will never fire)
- in the case of event buffer congestion, ODB RPCs will no longer time out.
(previously, the mserver was stuck waiting for free buffer space and did not
process any RPCs).
- at the end of run, the last few events could be stuck in the event socket.
now, frontends can flush it using bm_flush_cache(0,BM_WAIT) (use zero for the
buffer handle). a correct run transition should stop the trigger, stop
generating new events, call bm_flush_cache(0,BM_WAIT), call
bm_flush_cache("SYSTEM",BM_WAIT) and return success (the TMFE frontend already
does this; see the end-of-run sketch at the end of this post). Note that
bm_flush_cache(...,BM_WAIT) can be stuck for a very long time waiting for the
event buffers to empty out, so a run transition RPC timeout is still possible.
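
some usage sketches. these are rough sketches only: the exact argument lists
(and the pointer-vs-reference choice for the std::vector) should be checked
against midas.h. the buffer handle hBuf is assumed to come from the usual
cm_connect_experiment() + bm_open_buffer("SYSTEM", DEFAULT_BUFFER_SIZE, &hBuf)
+ bm_request_event() sequence.

receive sketch - read one event into an automatically sized vector:

   #include "midas.h"
   #include <stdio.h>
   #include <vector>

   // assumed signature: bm_receive_event_vec(INT buf_handle,
   //                       std::vector<char>* pvec, int timeout_msec)
   void read_one_event(INT hBuf)
   {
      std::vector<char> event;   // storage is grown by midas as needed
      int status = bm_receive_event_vec(hBuf, &event, BM_NO_WAIT);
      if (status == BM_SUCCESS) {
         const EVENT_HEADER* pheader = (const EVENT_HEADER*)event.data();
         printf("event id %d, serial %d, data size %d\n",
                pheader->event_id, (int)pheader->serial_number, (int)pheader->data_size);
      } else if (status == BM_ASYNC_RETURN) {
         // no event was waiting (we asked for BM_NO_WAIT)
      }
   }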
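
scatter-gather sketch - send an event whose header and data live in two
separate buffers; the bm_send_event_sg() argument list used here is assumed,
check midas.h (rpc_send_event_sg() is the corresponding call on the mserver
path):

   #include "midas.h"
   #include <stdio.h>

   // assumed signature: bm_send_event_sg(INT buf_handle, int sg_n,
   //    const char* const sg_ptr[], const size_t sg_len[], int timeout_msec)
   void send_scattered_event(INT hBuf, const char* data, size_t data_size, DWORD serial)
   {
      EVENT_HEADER header;
      bm_compose_event(&header, 1 /*event id*/, 0 /*trigger mask*/, (DWORD)data_size, serial);

      // segment 0 is the event header, segment 1 is the event data;
      // the event never has to exist in one contiguous buffer.
      const char* sg_ptr[2] = { (const char*)&header, data };
      size_t      sg_len[2] = { sizeof(header), data_size };

      int status = bm_send_event_sg(hBuf, 2, sg_ptr, sg_len, BM_WAIT);
      if (status != BM_SUCCESS)
         printf("bm_send_event_sg() returned %d\n", status);
   }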
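
vector send sketch - build a complete event (header followed by data) in a
std::vector<char> and send it; the bm_send_event_vec() argument list is again
assumed:

   #include "midas.h"
   #include <stdio.h>
   #include <string.h>
   #include <vector>

   // assumed signature: bm_send_event_vec(INT buf_handle,
   //                       const std::vector<char>& event, int timeout_msec)
   void send_vector_event(INT hBuf, const char* data, size_t data_size, DWORD serial)
   {
      std::vector<char> event(sizeof(EVENT_HEADER) + data_size);
      bm_compose_event((EVENT_HEADER*)event.data(), 1 /*event id*/, 0 /*trigger mask*/,
                       (DWORD)data_size, serial);
      memcpy(event.data() + sizeof(EVENT_HEADER), data, data_size);

      int status = bm_send_event_vec(hBuf, event, BM_WAIT);
      if (status != BM_SUCCESS)
         printf("bm_send_event_vec() returned %d\n", status);
   }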
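
timeout sketch - the last argument of bm_send_event(), bm_receive_event() & co
is now a timeout in milliseconds:

   #include "midas.h"
   #include <vector>

   void timeout_examples(INT hBuf)
   {
      std::vector<char> event;
      bm_receive_event_vec(hBuf, &event, BM_NO_WAIT); // do not wait, return right away
      bm_receive_event_vec(hBuf, &event, 100);        // wait at most 100 ms for an event
      bm_receive_event_vec(hBuf, &event, BM_WAIT);    // wait forever until an event arrives
   }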
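
end-of-run sketch - the flush sequence from the last item above, written for an
mfe.c-style frontend. hBufSystem stands for the handle returned by
bm_open_buffer("SYSTEM", ...), stop_trigger() is a stand-in for whatever
disables event generation in a given frontend, and the
bm_flush_cache("SYSTEM",BM_WAIT) call is written here using that buffer handle:

   #include "midas.h"

   extern INT hBufSystem;   // from bm_open_buffer("SYSTEM", DEFAULT_BUFFER_SIZE, &hBufSystem)
   void stop_trigger();     // stand-in: frontend-specific "stop the trigger" routine

   INT end_of_run(INT run_number, char* error)
   {
      stop_trigger();                       // stop generating new events first
      bm_flush_cache(0, BM_WAIT);           // flush the event socket (buffer handle 0)
      bm_flush_cache(hBufSystem, BM_WAIT);  // flush the write cache of the SYSTEM buffer
      return SUCCESS;
   }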
K.O.