Midas DAQ System, Page 2 of 121
ID Date Author Topic Subject
  2416   18 Jul 2022   Konstantin Olchanski   Release   midas-2022-05
There is a release branch for midas-2022-05 and corresponding git tag midas-2022-
05-b. This branch is known to be stable and is working well for the ALPHA 
experiment at CERN. The latest update to this branch fixes two problems in the 
mserver (an rpc timeout and a use-after-free internal error).

https://bitbucket.org/tmidas/midas/branch/release/midas-2022-05
K.O.
  2415   18 Jul 2022   Konstantin Olchanski   Bug Report   RPC timeout for manalyzer over network
> In ALPHA, I get RPC timeouts running a (reasonably heavy) analyzer on a remote machine (connected directly via a ~30 meter 10Gbe Ethernet cable) after ~5 minutes of running. If I run the analyser locally, I don't see a timeout...

there is a subtle bug in the mserver. under rare conditions, ss_suspend() will recurse in an unexpected way
and the mserver will go to sleep waiting for data from a udp socket (data that will never arrive, so it sleeps forever).
the remote client sees this as an rpc timeout. in my tests (and in ALPHA-g at CERN, as reported by Joseph),
this rare condition happens about every 5 minutes. this is the first time we have become aware of the problem
in normal use; as best I can tell, this bug has been in the mserver since day one.

the fix is commit https://bitbucket.org/tmidas/midas/commits/fbd06ad9d665b1341bd58b0e28d6625877f3cbd0,
pushed to develop and to release/midas-2022-05.

The stack trace below shows the mserver hang (sleep() stands in for the sleep-forever socket read):

(gdb) bt
#0  0x00007f922c53f9e0 in __nanosleep_nocancel () from /lib64/libc.so.6
#1  0x00007f922c53f894 in sleep () from /lib64/libc.so.6
#2  0x0000000000451922 in ss_suspend (millisec=millisec@entry=100, msg=msg@entry=1) at /home/agmini/packages/midas/src/system.cxx:4433
#3  0x0000000000411d53 in bm_wait_for_more_events_locked (pbuf_guard=..., pc=pc@entry=0x7f920639b93c, timeout_msec=timeout_msec@entry=100, 
    unlock_read_cache=unlock_read_cache@entry=1) at /home/agmini/packages/midas/src/midas.cxx:9429
#4  0x00000000004238c3 in bm_fill_read_cache_locked (timeout_msec=100, pbuf_guard=...) at /home/agmini/packages/midas/src/midas.cxx:9003
#5  bm_read_buffer (pbuf=pbuf@entry=0xdf8b50, buffer_handle=buffer_handle@entry=2, bufptr=bufptr@entry=0x0, buf=buf@entry=0x7f9203d75020, 
    buf_size=buf_size@entry=0x7f920639aa20, vecptr=vecptr@entry=0x0, timeout_msec=timeout_msec@entry=100, convert_flags=0, 
    dispatch=dispatch@entry=0) at /home/agmini/packages/midas/src/midas.cxx:10279
#6  0x0000000000424161 in bm_receive_event (buffer_handle=2, destination=0x7f9203d75020, buf_size=0x7f920639aa20, timeout_msec=100)
    at /home/agmini/packages/midas/src/midas.cxx:10649
#7  0x0000000000406ae4 in rpc_server_dispatch (index=11111, prpc_param=0x7ffcad70b7a0) at /home/agmini/packages/midas/progs/mserver.cxx:575
#8  0x000000000041ce9c in rpc_execute (sock=10, buffer=buffer@entry=0xe11570 "g+", convert_flags=0)
    at /home/agmini/packages/midas/src/midas.cxx:15003
#9  0x000000000041d7a5 in rpc_server_receive_rpc (idx=idx@entry=0, sa=0xde6ba0) at /home/agmini/packages/midas/src/midas.cxx:15958
#10 0x0000000000451455 in ss_suspend (millisec=millisec@entry=1000, msg=msg@entry=0) at /home/agmini/packages/midas/src/system.cxx:4575
#11 0x000000000041deb2 in rpc_server_loop () at /home/agmini/packages/midas/src/midas.cxx:15907
#12 0x0000000000405266 in main (argc=9, argv=<optimized out>) at /home/agmini/packages/midas/progs/mserver.cxx:390
(gdb) 

K.O.
  2414   25 Jun 2022   Joseph McKenna   Bug Report   RPC timeout for manalyzer over network

In ALPHA, I get RPC timeouts running a (reasonably heavy) analyzer on a remote machine (connected directly via a ~30 meter 10Gbe Ethernet cable) after ~5 minutes of running. If I run the analyser locally, I don't see a timeout...

gdb trace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff5d35859 in __GI_abort () at abort.c:79
#2  0x00005555555a2a22 in rpc_call (routine_id=11111) at /home/alpha/packages/midas/src/midas.cxx:13866
#3  0x000055555562699d in bm_receive_event_rpc (buffer_handle=buffer_handle@entry=2, buf=buf@entry=0x0, buf_size=buf_size@entry=0x0, ppevent=ppevent@entry=0x0, pvec=pvec@entry=0x7fffffffd700,
    timeout_msec=timeout_msec@entry=100) at /home/alpha/packages/midas/src/midas.cxx:10510
#4  0x0000555555631082 in bm_receive_event_vec (buffer_handle=2, pvec=pvec@entry=0x7fffffffd700, timeout_msec=timeout_msec@entry=100) at /home/alpha/packages/midas/src/midas.cxx:10794
#5  0x0000555555673dbb in TMEventBuffer::ReceiveEvent (this=this@entry=0x555557388b30, e=e@entry=0x7fffffffd700, timeout_msec=timeout_msec@entry=100) at /home/alpha/packages/midas/src/tmfe.cxx:312
#6  0x0000555555607b56 in ReceiveEvent (b=0x555557388b30, e=0x7fffffffd6c0, timeout_msec=100) at /home/alpha/packages/midas/manalyzer/manalyzer.cxx:1411
#7  0x000055555560d8dc in ProcessMidasOnlineTmfe (args=..., progname=<optimized out>, hostname=<optimized out>, exptname=<optimized out>, bufname=<optimized out>, event_id=<optimized out>,
    trigger_mask=<optimized out>, sampling_type_string=<optimized out>, num_analyze=0, writer=<optimized out>, multithread=<optimized out>, profiler=<optimized out>,
    queue_interval_check=<optimized out>) at /home/alpha/packages/midas/manalyzer/manalyzer.cxx:1534
#8  0x000055555560f93b in manalyzer_main (argc=<optimized out>, argv=<optimized out>) at /usr/include/c++/9/bits/basic_string.h:2304
#9  0x00007ffff5d37083 in __libc_start_main (main=0x5555555b1130 <main(int, char**)>, argc=8, argv=0x7fffffffdda8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
    stack_end=0x7fffffffdd98) at ../csu/libc-start.c:308
#10 0x00005555555b184e in _start () at /usr/include/c++/9/bits/stl_vector.h:94

Any suggestions? Many thanks
  2413   20 Jun 2022   Stefan Ritt   Forum   Alarm on variable not updating
There are two functions to do that: one checks the last write access, the other checks the last write access only while the run is running. The alarm condition looks like:

access(/Equipment/.../Variables/Input[10]) > 60

which will cause an alarm if Input[10] has not been written for more than 60 seconds. The other function, which also checks the run status, is used like:

access_running(...odb key...) > 60

You can actually see an example on the MEG alarm page.

Rather than having an alarm for that, I would however recommend that you program your frontend such that it realizes when it loses the connection, then automatically tries to reconnect or triggers an alarm itself (a so-called "internal" alarm). This is also how the MSCB system works, and it is much more robust.
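
A minimal sketch of such an internal alarm raised directly from frontend code, assuming a placeholder read_hardware() check and made-up alarm/class names (al_trigger_alarm()/al_reset_alarm() are the standard MIDAS calls for internal alarms):

#include "midas.h"

bool read_hardware();   // placeholder for the real readout/ping of the device

void check_hw_connection()
{
   if (!read_hardware()) {
      // AT_INTERNAL alarms are triggered by the program itself,
      // so no ODB condition string is needed (hence the empty string)
      al_trigger_alarm("HW link lost", "Frontend lost connection to hardware",
                       "Alarm", "", AT_INTERNAL);
   } else {
      al_reset_alarm("HW link lost");   // clear the alarm once the link is back
   }
}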

Stefan
  2412   20 Jun 2022   jianrun   Bug Report   Error in "midas/src/mana.cxx"
Dear Midas developers,

When running the examples in $MIDASSYS/examples/experiment/, we run into some 
problems when analyzing the results:
1. When we analyze the data using the analyzer (./analyzer -i run00001.mid -o 
run00001.rz), we find the following errors: 
"
Root server listening on port 9090...
Running analyzer offline. Stop with "!"
[Analyzer,ERROR] [mana.cxx:1832:bor,ERROR] HBOOK support is not compiled in
[Analyzer,INFO] Set run number 6 in ODB
Load ODB from run 6...OK
run00006.mid:2680  events, 0.00s
"
We think this occurs in midas/src/mana.cxx. How can we solve this?

2. When we analyze the above data, an error also occurs: 
[Analyzer,ERROR] [odb.cxx:847:db_validate_name,ERROR] Invalid name 
"/Analyzer/Tests/Always true/Rate [Hz]" passed to db_create_key_wlocked: should 
not contain "["

We fixed that for now simply by replacing "Rate [Hz]" with "Rate" in the 
test_write in midas/src/mana.cxx.
We are curious whether you can fix the problem permanently in the next version, 
or whether we are just not running the code properly. Thanks!
  2411   19 Jun 2022   Francesco Renga   Forum   Alarm on variable not updating
Dear all,
I have an ODB equipment that sometimes loses the connection with the hardware, so that the variables are not updated anymore. The connection can be restored by restarting the frontend. It would be useful to have an alarm based on the time since the last update of some variable (i.e. the alarm is triggered if the variable has not been updated for more than X seconds). Is there a method to implement such an alarm in MIDAS?

Thank you very much,
Francesco
  2410   17 May 2022   Ben Smith   Info   MIDAS switched to C++
>  - have you posted somewhere this guide about converting C frontends to C++?

There's documentation in the wiki at:
https://daq00.triumf.ca/MidasWiki/index.php/Changelog#2019-06

It includes a step-by-step guide of how to upgrade, what changes need to be made to frontends, and common issues that people had.
  2409   17 May 2022   Konstantin Olchanski   Info   MIDAS switched to C++
> Hi, I have three naive questions about this:

all good questions, ask more of them.

>  - have you posted somewhere this guide about converting C frontends to C++?

yes, in this elog I posted a guide for converting C mfe.c frontends to C++ and
a guide for converting an mfe.c frontend to a C++ TMFE frontend. please use the "find" function;
if you cannot find them, let me know and I will look them up for you.

>  - it was mentioned previously that there will be a 'tag the last "C" midas', which version is it?

correct. please run "git tag"; tags before "midas-2019-05-cxx" are "C", tags after are "C++".

>  - it means that even a simple example like odb_test.c cannot be compiled anymore? Even when using g++?
> g++ -I $HOME/daq/packages/midas/include/ -L $HOME/daq/packages/midas/lib/ odb_test.c -l midas
> is this expected to fail, or is it just me glitching? Is it because of thread library differences?

yes, it is expected to fail, you have spaces after "-I", "-L" and "-l", incorrect g++ command syntax. after
correcting this, it may or may not work depending on what you have inside odb_test.c. I would be happy
to help you debug this, but please start a separate thread instead of necroposting into the C++ announcements.
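
for reference, a build line of roughly this shape is what typically works (a sketch, not from this thread: it assumes the file is renamed to odb_test.cxx and is valid C++, that MIDASSYS points at your midas installation, and that the usual pthread/util/rt system libraries are needed; the exact library list may differ per installation):

g++ -std=c++11 -I$MIDASSYS/include -L$MIDASSYS/lib odb_test.cxx -lmidas -lpthread -lutil -lrt -o odb_test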

K.O.
  Draft   17 May 2022   Ben Smith   Info   MIDAS switched to C++
>  - have you posted somewhere this guide about converting C frontends to C++?

See the instructions at:
https://daq00.triumf.ca/MidasWiki/index.php/Changelog#2019-06

>  - it was mentioned previously that there will be a 'tag the last "C" midas', which version is it?

>  - it means that even a simple example like odb_test.c cannot be compiled anymore? Even when using g++?
> g++ -I $HOME/daq/packages/midas/include/ -L $HOME/daq/packages/midas/lib/ odb_test.c -l midas

Correct. Midas is built with C++, so names get mangled 
  2407   17 May 2022   Razvan Stefan Gornea   Info   MIDAS switched to C++
Hi, I have three naive questions about this:
 - have you posted somewhere this guide about converting C frontends to C++?
 - it was mentioned previously that there will be a 'tag the last "C" midas', which version is it?
 - it means that even a simple example like odb_test.c cannot be compiled anymore? Even when using g++?

Something like

g++ -I $HOME/daq/packages/midas/include/ -L $HOME/daq/packages/midas/lib/ odb_test.c -l midas

is this expected to fail, or is it just me glitching? Is it because of thread library differences?

Thanks!


> The last bits of code to switch MIDAS to C++ have been committed, see tag midas-2019-05-cxx.
> 
> Since the cmake conversion is still in progress, for now, I recommend using the old "make" build for trying this update.
> 
> From the switch to C++, the biggest change is the requirement that frontend programs be built and linked
> using the C++ compiler. Since mfe.o and the rest of MIDAS are built with C++, building frontends
> with C is no longer possible.
> 
> To help with this, I will post a short guide for converting C frontends to C++.
> 
> K.O.
  2406   17 May 2022   Stefan Ritt   Info   RO_STOPPED with triggered events
> > > some old front-ends are not running any more since they use RO_ALWAYS together with 
> > > triggered events.
> > 
> > I confirm, if you have mfe.c frontends that have RO_ALWAYS, after you update MIDAS, 
> > some of these frontends will fail to start.
> > https://bitbucket.org/tmidas/midas/commits/1961af0d657e4f76ab9db17f9b70c0c492172b6d
> > 
> > tmfe c++ frontends do not have this restriction but by default only read data when run 
> > is active (per-equipment fEqConfReadOnlyWhenRunning default is true).
> 
> As of commit 
> https://bitbucket.org/tmidas/midas/commits/28d9c96bd6d4f65346ebcd6a04492ea764c90823 mfe.c 
> frontends will no longer fail to start. an error will still be issued "Equipment \"%s\" 
> contains RO_STOPPED or RO_ALWAYS. This can lead to undesired side-effect and should be 
> removed."
> 
> BTW 1:
> 
> Some of our old frontends use EQ_MULTITHREAD to implement multithreaded periodic equipments. 
> They do not generate any events when there is no run (some of them do not generate any 
> events at all). Now they will start printing this error message, for no reason. (no, we will 
> not be rewriting them just to get rid of this message; life is too short).
> 
> BTW 2:
> 
> the c++ tmfe frontend does not have any protections against these "undesired side-effects".
> 
> What are these undesired side effects and should we add protection against them?
> 
> K.O.

The undesired side-effects are the following: the logger tries to collect all events at the end of 
the run by emptying the SYSTEM buffer. If events keep coming after the run is stopped, this loop in 
the logger might become an endless loop, eventually crashing the whole experiment.

Another issue (and actually the reason for this change) is the function receive_trigger_event() in 
mfe.cxx, which gets confused if events are still coming in after a run has been stopped and 
actually enters an infinite loop.

Combining EQ_MULTITHREAD with EQ_PERIODIC or EQ_SLOW is a wrong parameter combination, as stated in 
the documentation. If one wants to have multi-threaded slow control events, one has to use the 
DF_MULTITHREAD flag in the DEVICE_DRIVER structure.
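
A minimal sketch of what that looks like in an equipment's driver list, assuming the nulldev dummy device driver and the null bus driver from the standard slow-control examples as stand-ins for the real ones:

// multi-threaded slow control: request the thread per device via DF_MULTITHREAD
// in the DEVICE_DRIVER list, not via EQ_MULTITHREAD in the equipment flags
DEVICE_DRIVER input_driver[] = {
   {"Input", nulldev, 8, null, DF_INPUT | DF_MULTITHREAD},
   {""}
};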

I would consider it simply wrong to have triggered events sent to the system after a run has been 
stopped. Why would we ever use run start/stop if events are always flowing? Adding 
protections in all places for this case is certainly much more work than just changing one flag in 
the frontends which now produce this error message for a wrong parameter combination.
  2405   16 May 2022   Konstantin Olchanski   Bug Fix   mserver buffer overrun and crash
> There is a memory allocation bug in the mserver.

The fix for this problem introduced a new problem, an infinite loop in bm_flush_cache; see 
bitbucket bugs https://bitbucket.org/tmidas/midas/issues/339/infinite-loop-in-mserver-due-to-mfes 
and https://bitbucket.org/tmidas/midas/issues/331/stuck-semaphore-of-system-buffer

This is now fixed, and the buffer write cache logic and size were rejigged
according to the calculations in https://daq00.triumf.ca/elog-midas/Midas/2401

The event buffer write cache (as set via ODB Equipment/Common and via 
bm_set_cache_size()) now takes 2 possible values:
0 - write cache is disabled, and
MIN_WRITE_CACHE_SIZE (10 Mbytes) - the minimum permitted cache size.
Bigger cache size values are permitted, up to buffer_size/3, but are probably not useful 
if my calculations are right.
Smaller cache size values are generally not useful, again if my calculations are right.

mfe.c and tmfe c++ frontends were updated to request the new write cache size by default.

if events are getting stuck in the write cache for too long, instead of reducing the 
cache size, one should increase the frequency of the bm_flush_cache() calls (1/sec by 
default).
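
for reference, a minimal sketch of setting the write cache by hand from frontend code (assuming the usual bm_open_buffer()/bm_set_cache_size() calls; most frontends will simply rely on the mfe.c/tmfe defaults or on the ODB Equipment/Common setting):

INT hBuf;
bm_open_buffer("SYSTEM", DEFAULT_BUFFER_SIZE, &hBuf);

// this is a writer: no read cache, 10 Mbyte write cache (the new minimum permitted size)
bm_set_cache_size(hBuf, 0, 10*1024*1024);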

commit 373bcc3ab7f83c3c7bf6c051c237de043a982502

K.O.
  2404   16 May 2022   Konstantin Olchanski   Info   analysis of corner cases in event buffer write cache
> > for correct operation of bm_send_event() under all conditions we need to ...
> to continue computation from last message:

if I got my numbers right, for present-day hardware (1gige/10gige data rates, 100 Hz max locking rate), we should 
increase the default buffer write cache size from 100 kbytes to 10 Mbytes.

this cache size will permit processing of the full mix of small/big events
at the full mix of event rates without exceeding the 100 Hz semaphore locking rate.
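
the arithmetic behind this, as a sketch (the variable names are only for illustration): with cached events the buffer semaphore is locked roughly once per cache flush, so the sustainable data rate is about cache_size times the allowed lock rate:

double lock_rate_hz  = 100;              // max acceptable semaphore lock rate
double cache_size    = 10*1024*1024;     // proposed 10 Mbyte write cache
double max_data_rate = cache_size * lock_rate_hz;    // ~1000 Mbytes/sec (10gige)
double old_rate      = 100*1024 * lock_rate_hz;      // old 100 kbyte cache: ~10 Mbytes/sec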

with the 10 Mbyte write cache, the default event buffer size should be 30-40 Mbytes (the current size is 33 Mbytes, so it does 
not need to change).

this computation is for 1 writer (1 reader, mlogger). it is a typical case for our experiments.

multiple writers can run into contention for event buffer space.

consider 10 writers that want to flush their 10 Mbyte write caches all at the same time:

if the buffer size is the default 33 Mbytes, the first 3 writers will flush their write caches successfully,
but the other 7 will stall: there is no space in the buffer, and we have to wait for mlogger to free
some (mlogger writing X Mbytes/sec will take Y milliseconds to liberate 10 Mbytes of space for the 4th writer
to flush successfully, while writers 5..10 are still stalled).

but a system with 10 writers each writing at 10 Mbytes/sec (at the 1 Hz default cache flush rate), i.e. 100 Mbytes/sec total,
will likely have a SYSTEM buffer size of at least 200-300 Mbytes (to buffer 1-2 seconds of data against
any delays in writing to disk/network storage).

so there should be no problem in practice.

K.O.
  2403   16 May 2022   Konstantin Olchanski   Info   analysis of corner cases in event buffer write cache
> for correct operation of bm_send_event() under all conditions we need to ...

to continue computation from last message:

default SYSTEM buffer size: 32 MiBytes
default max event size: 4 MiBytes

hard max buffer size: 2 Gbytes (code is only 31-bit-clean)
hard max event size: 2 Gbytes (code is only 31-bit-clean)

max event size currently: 32 Mbytes (same as buffer size)
max event size per (1) in previous post: 32*0.5..0.3 = 16..9 MiBytes

number of default-max-size events buffered: 32/4 = 8.
number of per (1) max-size events buffered: 2 or 3
number of current max-size events buffered: 0 (bad, frontend is serialized with mlogger)

default write cache size: 100 kbytes

max write cache size currently: buffer size / 4 = 32/4 = 8 MiBytes
max write cache size per (3) in previous post: buffer_size / 3 = 10 Mbytes
hard max write cache size per (3): 2 Gbytes/3 = 600 Mbytes

max size of cached events:

current: 100 kbytes (same as cache size)
per (2) in previous post: 0.1..0.3 * cache size = 10..30 kbytes
per (2), 1 Mbyte cache: 0.1..0.3 * cache size = 100..300 kbytes
hard max size: 0.1..0.3 * hard_max_cache_size = 0.1..0.3 * 600 = 60..180 Mbytes.

max data rate before event buffer semaphore locking rate exceeds 100 Hz:

1 kbyte events, no write cache: 100 kbytes/sec
1 kbyte events, 100 kbyte cache: 100 events cached, cache flush rate 100 Hz -> 100*1kbyte*100Hz -> 10 Mbytes/sec
1 kbyte events, 1 Mbyte cache: 1000 events cached, cache flush rate 100 Hz -> 100 Mbytes/sec (1gige ethernet)
N kbyte events, 1 Mbyte cache: same thing (data rate is limited by cache flush rate 100 Hz)
100 kbyte events, 1 Mbyte cache, not cached per (2): 100kbyte*100Hz = 10 Mbytes/sec
300 kbyte events, 1 Mbyte cache, not cached per (2): 300kbyte*100Hz = 30 Mbytes/sec
N00 kbyte events: N0 Mbytes/sec (500->50, etc)
1 kbyte events, 10 Mbyte cache: 10000 events cached, cache flush rate 100 Hz -> 1000 Mbytes/sec (10gige ethernet)
N kbyte events, 10 Mbyte cache: same thing (data rate is limited by cache flush rate 100 Hz)
1000 kbyte events, 10 Mbyte cache, not cached per (2): 1000kbyte*100Hz = 100 Mbytes/sec
3000 kbyte events, 10 Mbyte cache, not cached per (2): 3000kbyte*100Hz = 300 Mbytes/sec
N000 kbyte events: N00 Mbytes/sec (4000->400, 5000->500, etc)
default max event size: 4 MiBytes*100Hz = 400 Mbytes/sec (exceeds 1gige ethernet)
hard max event size (divided by 10 to buffer 10 events): 200 Mbytes*100Hz -> 20 Gbytes/sec

max event rate before event buffer semaphore locking rate exceeds 100 Hz:

1 kbyte events, no write cache: 100 Hz (obviously)
1 kbyte events, 100 kbyte cache: 100 events cached, cache flush rate 100 Hz -> 10 kHz
1 kbyte events, 1 Mbyte cache: 1000 events cached, cache flush rate 100 Hz -> 100 kHz
N kbyte events, 1 Mbyte cache: 1000/N events cached, cache flush rate 100 Hz -> 100/N kHz
1 kbyte events, 10 Mbyte cache: 10000 events cached, cache flush rate 100 Hz -> 1000 kHz
N kbyte events, 10 Mbyte cache: 10000/N events cached, cache flush rate 100 Hz -> 1000/N kHz
100 kbyte events, not cached per (2): 100 Hz (obviously)
300 kbyte events, not cached per (2): 100 Hz (obviously)
default max event size: 100 Hz (obviously)

K.O.
  2402   16 May 2022   Konstantin Olchanski   Info   RO_STOPPED with triggered events
> > some old front-ends are not running any more since they use RO_ALWAYS together with 
> > triggered events.
> 
> I confirm, if you have mfe.c frontends that have RO_ALWAYS, after you update MIDAS, 
> some of these frontends will fail to start.
> https://bitbucket.org/tmidas/midas/commits/1961af0d657e4f76ab9db17f9b70c0c492172b6d
> 
> tmfe c++ frontends do not have this restriction but by default only read data when run 
> is active (per-equipment fEqConfReadOnlyWhenRunning default is true).

As of commit 
https://bitbucket.org/tmidas/midas/commits/28d9c96bd6d4f65346ebcd6a04492ea764c90823 mfe.c 
frontends will no longer fail to start. an error will still be issued "Equipment \"%s\" 
contains RO_STOPPED or RO_ALWAYS. This can lead to undesired side-effect and should be 
removed."

BTW 1:

Some of our old frontends use EQ_MULTITHREAD to implement multithreaded periodic equipments. 
They do not generate any events when there is no run (some of them do not generate any 
events at all). Now they will start printing this error message, for no reason. (no, we will 
not be rewriting them just to get rid of this message; life is too short).

BTW 2:

the c++ tmfe frontend does not have any protections against these "undesired side-effects".

What are these undesired side effects and should we add protection against them?

K.O.
  2401   13 May 2022   Konstantin Olchanski   Info   analysis of corner cases in event buffer write cache
introduction:

as a reminder, bm_send_event() writes an event to the write cache, and bm_flush_cache() 
writes the contents of the write cache into the shared memory event buffer, consuming buffer 
free space. in the usual case, mlogger reads events from the shared 
memory event buffer, releasing buffer free space. there is also a read cache, which is not 
part of this discussion.

the purpose of the write cache is to reduce contention for the shared memory 
semaphore. in the case of a large number of small events, the semaphore is locked once per 
cache flush instead of once per event. correct tuning of the write cache size and event size can 
reduce the lock rate from >100 kHz to around 100 Hz or lower.

analysis:

for correct operation of bm_send_event() under all conditions we need to consider 
all corner cases:

1) no write cache: (cache size set to 0)

- event_size > buffer_size -> reject the event (obviously)
- event_size > 0.5 * buffer_size -> only 1 event fits into the buffer, next write 
will stall until mlogger reads the previous event (sequential operation, bad)
- event_size < 0.3 * buffer_size -> at least 2 events fit into the buffer (good)

decision: limit event size to 0.5 to 0.3 * buffer_size (current limit is 0.5 * 
buffer_size, I think).

consequence: buffer size limit is 2 Gbytes (32-bit byte offsets, code is only 31-
bit-clean), max event size is between 1 Gbytes and 0.6 Gbytes.

2) writing to write cache:

- event_size > cache_size -> flush cache, write event directly to buffer
- event_size > 0.5 * cache_size -> inefficient use of cache: write to cache, next 
event does not fit, flush to buffer, repeat. no gain in semaphore locking (bad), one 
additional memcpy() (event to cache and cache to buffer) (bad)
- event_size < 0.3 * cache_size -> multiple events fit into cache, but probably no 
gain in semaphore locking

decision: events that are bigger than 0.3 to 0.1 * cache_size should not go through 
the cache. (flush cache, write directly to buffer).
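
as a sketch, the intended bm_send_event() behaviour for case (2) would look roughly like this (illustrative pseudo-code only, not the actual midas source; helper names are made up and the 0.3 threshold is one choice from the 0.3..0.1 range above):

if (event_size > 0.3 * cache_size) {
   // big event: do not pay for the extra memcpy(), flush whatever is
   // already cached and write this event directly to the shared memory buffer
   flush_cache();
   write_to_buffer(event);
} else {
   if (event_size > cache_free_space())
      flush_cache();            // make room first
   write_to_cache(event);       // small event: batch it, lock the semaphore later
}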

3) flush write cache to buffer:

- cache_size > buffer_size -> cannot flush in 1 operation, must have a loop and 
flush the cache in pieces
- cache_size between 0.5 and 1.0 * buffer_size -> can flush in 1 operation, but must 
wait for mlogger to fully empty the buffer (sequential operation, bad)
- cache size < 0.3 * buffer_size -> can flush in 1 operation, at least 2 "flushes" 
fit inside the buffer (good)

decision: limit write cache size to 0.3 * buffer_size. (current limit is 
0.25*buffer_size).

consequences:

- write cache size limit is 0.3..0.25 * 2GB = 0.6..0.5 Gbytes
- cached event size limit is 0.3..0.1 * 0.5 GBytes = 150..50 Mbytes
- minimum number of cached events: 3 to 10
- semaphore locks reduced: 3 to 10 locks become 1 lock (all events cached),
4 to 11 locks become 2 locks (big event causes cache flush).

4) complications:

- there is a periodic 1/second bm_flush_cache() that flushes the cache early and 
reduces its efficiency (but it is needed to avoid having data stuck in the cache for a long 
time)
- if multiple frontends use large write cache (~ 0.3..0.5 * buffer_size), again, 
sequential operation can happen (bad)
- write cache is per-frontend, not per-equipment. if different equipments request 
different cache sizes, mfe.c and tmfe c++ frontends complain about this, but the 
user has to sort it out.

K.O.
  2400   08 May 2022   Konstantin Olchanski   Info   RO_STOPPED with triggered events
> some old front-ends are not running any more since they use RO_ALWAYS together with 
> triggered events.

I confirm, if you have mfe.c frontends that have RO_ALWAYS, after you update MIDAS, 
some of these frontends will fail to start.
https://bitbucket.org/tmidas/midas/commits/1961af0d657e4f76ab9db17f9b70c0c492172b6d

tmfe c++ frontends do not have this restriction, but by default they only read data when a run 
is active (the per-equipment fEqConfReadOnlyWhenRunning default is true).
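
for a tmfe c++ frontend that does want to read data outside of a run, this is a per-equipment setting; a minimal sketch, assuming the usual pattern of adjusting the config in the equipment constructor (the equipment class name here is made up):

class MyPeriodicEq : public TMFeEquipment {
public:
   MyPeriodicEq(const char* eqname, const char* eqfilename)
      : TMFeEquipment(eqname, eqfilename)
   {
      fEqConfReadOnlyWhenRunning = false; // also read data when no run is active
   }
};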

K.O.
  Draft   08 May 2022   Konstantin Olchanski   Info   RO_STOPPED with triggered events
> If events are sent when a run is stopped, this leads to many unexpected results

I think we need to understand what these unexpected results are.

Naively thinking, one would expect midas to not care 
  2398   08 May 2022   Stefan Ritt   Info   RO_STOPPED with triggered events
We had issues in one of our experiments where people used RO_STOPPED in the 
equipment list together with triggered events (EQ_USER). If events are sent when 
a run is stopped, this leads to many unexpected results, so I added a check to 
the mfe.cxx code which prevents RO_STOPPED (or RO_ALWAYS, which includes 
RO_STOPPED) together with EQ_TRIGGERED, EQ_INTERRUPT, EQ_MULTITHREAD and EQ_USER 
types of events.

I have now received complaints that some old front-ends are not running any more since they 
use RO_ALWAYS together with triggered events. Can the authors of these frontends 
please tell me the rationale for why this is needed; then I can maybe add a better 
fix for it.

Stefan
  2397   06 May 2022   Stefan Ritt   Info   Increased timeout for program shut down
We had the problem in our lab that a frontend took about 6 seconds to gracefully 
shut down, mainly because it needed to park some motors. I found that the shutdown command 
had a hard-coded timeout of 5 seconds, after which the frontend gets killed and 
cannot finish the park operation. I changed the code so that the client timeout 
stored in the ODB is used instead of the hard-coded 5 seconds. This allows each 
client to fine-tune its timeout, to allow a graceful shutdown, but also not let the 
user wait too long if the client gets stuck and needs a hard kill.

The default timeout for mfe.cxx-based frontends has now been changed to 10 seconds, 
but this can easily be changed by the user code in the frontend_init function.
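
A minimal sketch of how that could look, assuming the watchdog timeout set via cm_set_watchdog_params() is the client timeout stored in the ODB that is meant here (please check against your MIDAS version):

INT frontend_init()
{
   // allow up to 30 s for a graceful shutdown (e.g. parking motors)
   cm_set_watchdog_params(TRUE, 30000);   // timeout in milliseconds
   return SUCCESS;
}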

I hope this change does not trigger any bad side effects, but if it does, please 
report it here.

Stefan