Thanks Konstantin for your detailed description.
I wondered why we never saw this problem at PSI. Here is the reason: in multi-threaded environments, we never call bm_send_event() directly
from the individual threads (since in the old days nothing in midas was thread-safe). Instead, we use a collector thread which gets all events via the
rb_xxx functions from the individual readout threads. This is well integrated into the mfe.cxx framework. Look at examples/mtfe/mtfe.cxx.
Each thread does (simplified):
   while (true) {
      do {
         status = rb_get_wp(&pevent);
      } while (status == DB_TIMEOUT);
      bm_compose_event_threadsafe(pevent, ..., &serial_number);
      bk_init32(pevent+1);
      ... fill event ...
      bk_close(pevent);
      rb_increment_wp(sizeof(EVENT_HEADER) + pevent->data_size);
   }
The framework now collects all these events in receive_trigger_event() which runs in the main thread:
   for (i=0 ; i<n_thread ; i++) {
      rb_get_rp(i, pevent);
      if (pevent->serial_number == prev_serial+1)
         break;
   }
   prev_serial = pevent->serial_number;
   rpc_send_event(pevent);
   rb_increment_rp(sizeof(EVENT_HEADER) + pevent->data_size);
This code ensures that all events are sent in the right sequence (previously the serial numbers were mixed up) and that all events are sent
from a single thread only, so the write buffer can be used effectively without complicated multi-thread locks.
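To illustrate the idea outside of midas, here is a minimal, self-contained C++ sketch. It does not use the midas API: Event, EventQueue and collect_in_order() are made-up names standing in for the event header, the rb_xxx ring buffers and receive_trigger_event(). The shared atomic counter is only an assumption about how unique, increasing serial numbers could be handed out to the readout threads. The collector is the only place where events are "sent", and it always picks the queue whose oldest event carries the next serial number.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a midas event; only the serial number matters here.
struct Event {
    uint32_t serial_number;
    int      source_thread;
};

// Hypothetical stand-in for one rb_xxx ring buffer (one writer, one reader).
class EventQueue {
public:
    void push(const Event& e) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push_back(e);
    }
    // Copies the oldest event into *e without removing it; returns false if empty.
    bool peek(Event* e) {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return false;
        *e = q_.front();
        return true;
    }
    void pop() {
        std::lock_guard<std::mutex> lock(m_);
        q_.pop_front();
    }
private:
    std::mutex        m_;
    std::deque<Event> q_;
};

// Starts n_thread readout threads and collects their events in the calling
// thread. Returns the serial numbers in the order they were "sent".
std::vector<uint32_t> collect_in_order(int n_thread, int events_per_thread) {
    assert(n_thread > 0 && events_per_thread >= 0);

    std::vector<EventQueue> queues(n_thread);
    std::atomic<uint32_t>   next_serial{1};  // assumed source of serial numbers

    // Readout threads: take a serial number, then queue the event locally.
    std::vector<std::thread> readers;
    for (int t = 0; t < n_thread; t++) {
        readers.emplace_back([&, t] {
            for (int k = 0; k < events_per_thread; k++) {
                Event e{next_serial.fetch_add(1), t};
                queues[t].push(e);
            }
        });
    }

    // Collector: the only thread that sends events.
    std::vector<uint32_t> sent;
    uint32_t prev_serial = 0;
    const uint32_t total = static_cast<uint32_t>(n_thread) * events_per_thread;
    while (sent.size() < total) {
        bool found = false;
        for (int i = 0; i < n_thread; i++) {
            Event e;
            if (queues[i].peek(&e) && e.serial_number == prev_serial + 1) {
                sent.push_back(e.serial_number);  // stands in for rpc_send_event()
                prev_serial = e.serial_number;
                queues[i].pop();
                found = true;
                break;
            }
        }
        if (!found) std::this_thread::yield();  // next event not queued yet
    }

    for (auto& r : readers) r.join();
    return sent;
}
```

Calling collect_in_order(4, 1000) should return the serial numbers 1 to 4000 in strictly increasing order, no matter how the four threads interleave. The only locks are the small per-queue ones; nothing on the send path is shared between threads.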
This solution has worked nicely at PSI for many years. Maybe you should give some thought to using it in your tmfe framework in Alpha-g as well,
instead of struggling with all your locks.
Stefan