ID | Date | Author | Topic | Subject
2391 | 01 May 2022 | Konstantin Olchanski | Info | added web page for "mdump"
> added JSON RPC for bm_receive_event()
there are a number of problems with implementing bm_receive_event() as an RPC:
1) mhttpd has only 1 event buffer read pointer for all javascript connections; if two browser tabs are
running mdump, they will "steal" events from each other.
2) javascript connections are stateless and we cannot specify per-connection event_id and trigger_mask
filters to bm_receive_event(). our bm_request_event() has to be for all event_ids and all trigger_masks.
3) for the same reason, we cannot have some requests be GET_ALL, some GET_RECENT and some
GET_OLD (if GET_OLD is ever implemented).
Problem (1) is hard to fix. The only solution I can see is for mhttpd to have its own event buffer that can
somehow track which events have been sent to which javascript connection.
The same scheme allows implementing GET_ALL and per-connection event_id and trigger_mask filters.
The difficulty is in detecting javascript connections that are no longer active, so that their event requests and
the events we have buffered for them can be deleted. Unlike proper rpc clients, javascript browser tabs can be
closed without warning and without an opportunity to tell the rpc server that they are closed and gone.
K.O. |
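A minimal C++ sketch of the bookkeeping described above (illustrative only, not mhttpd code; the ConnState/ConnTable names and the client-supplied token are assumptions): track each javascript connection's filter and last-served event, and expire entries that have not polled recently, since a closed browser tab never announces that it is gone.

#include <chrono>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>

struct ConnState {
   int event_id     = -1;                          // per-connection filter
   int trigger_mask = -1;
   uint32_t last_serial = 0;                       // last event serial served to this connection
   std::chrono::steady_clock::time_point last_seen;
};

class ConnTable {
   std::map<std::string, ConnState> fConns;        // keyed by a client-supplied token
public:
   ConnState& touch(const std::string& token) {    // called on every RPC request
      ConnState& c = fConns[token];
      c.last_seen = std::chrono::steady_clock::now();
      return c;
   }
   void expire(std::chrono::seconds max_idle) {    // drop connections that stopped polling
      auto now = std::chrono::steady_clock::now();
      for (auto it = fConns.begin(); it != fConns.end(); )
         it = (now - it->second.last_seen > max_idle) ? fConns.erase(it) : std::next(it);
   }
};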
2392 | 01 May 2022 | Konstantin Olchanski | Info | added web page for "mdump"
> added a web page for "mdump".
missing functions:
- get a list of existing event buffers (should read event buffer names from /Experiment/Buffer sizes)
- selector box to select event buffer
- button for "get next" and "get new" (should call bm_skip_event() before bm_receive_event())
- entry fields for event_id and trigger_mask event filter
- check box for "keep getting new data" and entry field for update frequency
- (eventually) entry field for bank name filter
K.O. |
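For the first item above ("get a list of existing event buffers"), a rough C++ sketch using the plain ODB C API, assuming the client is already connected to the experiment (the mdump web page itself would presumably do the equivalent through the json-rpc ODB calls):

#include <cstdio>
#include "midas.h"

void list_event_buffers()
{
   HNDLE hDB, hKey;
   cm_get_experiment_database(&hDB, NULL);

   if (db_find_key(hDB, 0, "/Experiment/Buffer sizes", &hKey) != DB_SUCCESS)
      return;                                        // no buffer list in this ODB

   for (int i = 0; ; i++) {
      HNDLE hSubkey;
      if (db_enum_key(hDB, hKey, i, &hSubkey) != DB_SUCCESS)
         break;                                      // no more subkeys
      KEY key;
      db_get_key(hDB, hSubkey, &key);
      printf("event buffer: %s\n", key.name);        // e.g. "SYSTEM"
   }
}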
2394 | 02 May 2022 | Stefan Ritt | Info | added web page for "mdump"
Here are some of my thoughts:
- I volunteer to write the JavaScript midas bank decoder. Just a couple of pure javascript functions, no
midasio.cxx library needed.
- If different javascript connections "steal" events from each other, I would not be concerned. Actually I
would rather like all connections to see the SAME event. So mhttpd keeps one event and serves it to all
links, so the displays are consistent. If a browser wants to see the "next" event, it sends the old serial
number and says "please send the next event AFTER this serial number". If the serial number is larger than the
event in the buffer, mhttpd fetches a new event and puts it into its buffer.
- Since javascript connections are stateless, I would rather pass event_id and trigger_mask with each
request. Then mhttpd can retrieve events until event_id and trigger_mask match, then serve that event.
Since reading events from a midas buffer is fast (many 10'000s of events per second), there won't be much of
a delay.
- GET_ALL does not make sense for browsers, you don't want to slow down any frontend. If someone wants to
do histogramming in the browser, then GET_SOME (which is a kind of GET_OLD) would make sense, but in most
cases we have some single-event display, and there GET_RECENT is most appropriate. |
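A minimal sketch of the single-shared-event scheme described above (illustrative only; the CachedEvent/serve_event names and the fetch helper are not real mhttpd code, and a real fetch would go through bm_receive_event()):

#include <cstdint>
#include <mutex>
#include <vector>

struct CachedEvent {
   uint32_t serial = 0;                 // serial number of the event held by mhttpd
   std::vector<char> data;              // raw event (header + banks)
};

static CachedEvent gCache;              // one shared event, the SAME for all browser tabs
static std::mutex  gCacheMutex;

// stub: a real implementation would call bm_receive_event() with these filters
static bool fetch_next_event(int event_id, int trigger_mask, CachedEvent& e)
{
   (void)event_id; (void)trigger_mask; (void)e;
   return false;
}

// RPC handler: the browser passes the serial number it has already seen, plus its
// event_id/trigger_mask filter, with every request (connections stay stateless).
CachedEvent serve_event(uint32_t after_serial, int event_id, int trigger_mask)
{
   std::lock_guard<std::mutex> lock(gCacheMutex);
   if (after_serial >= gCache.serial) {           // browser is up to date:
      CachedEvent e;                              // fetch a fresh event into the cache
      if (fetch_next_event(event_id, trigger_mask, e))
         gCache = std::move(e);
   }
   return gCache;                                 // consistent display on all links
}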
2395 | 04 May 2022 | Stefan Ritt | Info | added web pages for "show odb clients" and "show open records"
Concerning the "scl" page, we are currently having a discussion. At the moment, one can
see midas clients in three different places:
1) the main status page at the bottom, only names and hosts are there
2) the programs page, where one can also start/stop programs
3) now the new page "Show ODB clients" in the ODB editor page, which shows also the
alive status, PID and timeout
I'm thinking that three locations are two too many, so we are considering merging the
three pages into one. That would mean that 1) goes away, and the "Programs" page will
show more information. We have some rare cases where programs are removed from
/System/Clients in the ODB but still attached to the ODB. For those "zombies" we would
add a "hard kill" function.
I would like to hear feedback from the midas community before we proceed with the
plans. Anybody desperately in need of the programs shown on the status page?
Best,
Stefan |
2397 | 06 May 2022 | Stefan Ritt | Info | Increased timeout for program shut down
We had the problem in our lab that a frontend took about 6 seconds to gracefully
shut down, mainly because it needed to park some motors. I found that the shutdown command
had a hard-coded timeout of 5 seconds, after which the frontend gets killed and
cannot finish the park operation. I changed the code so that the client timeout
stored in the ODB is taken instead of the hard-coded 5 seconds. This allows each
client to fine-tune its timeout, to allow graceful shutdown, but also not let the
user wait too long if the client gets stuck and needs a hard kill.
The default timeout for mfe.cxx based frontends has been changed to 10 seconds
now, but in the frontend_init function this can be changed by the user code
easily.
I hope this change does not trigger any bad side effects, but if it does, please
report here.
Stefan |
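A minimal sketch of how a frontend could request a longer grace period, assuming cm_set_watchdog_params() is what populates the per-client timeout in the ODB (timeout in milliseconds); the 30 s value is just an example:

#include "midas.h"

INT frontend_init()
{
   // ask for a 30 s client timeout so a slow shutdown (e.g. parking motors)
   // is not cut short by a hard kill; the value is stored with this client's ODB entry
   cm_set_watchdog_params(TRUE, 30000);
   return SUCCESS;
}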
2398 | 08 May 2022 | Stefan Ritt | Info | RO_STOPPED with triggered events
We had issues in one of our experiments where people used RO_STOPPED in the
equipment list together with triggered events (EQ_USER). If events are sent when
a run is stopped, this leads to many unexpected results, so I added a check in
the mfe.cxx code which prevents RO_STOPPED (or RO_ALWAYS which includes
RO_STOPPED) together with EQ_TRIGGERED, EQ_INTERRUPT, EQ_MULTITHREAD and EQ_USER
type of events.
I have now received complaints that some old front-ends are not running any more since they
use RO_ALWAYS together with triggered events. Can the authors of these frontends
please tell me the rationale for why this is needed, so that I can maybe add a better
fix for it.
Stefan |
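An illustrative sketch of the kind of consistency check described above (not the actual mfe.cxx code; the function name is made up): reject RO_STOPPED together with triggered equipment types, remembering that RO_ALWAYS includes RO_STOPPED.

#include "midas.h"

// returns false for the flag combination that the new mfe.cxx check complains about
bool read_on_flags_ok(int eq_type, int read_on)
{
   bool triggered = eq_type & (EQ_TRIGGERED | EQ_INTERRUPT | EQ_MULTITHREAD | EQ_USER);
   bool reads_when_stopped = read_on & RO_STOPPED;   // RO_ALWAYS sets RO_STOPPED too
   return !(triggered && reads_when_stopped);
}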
Draft | 08 May 2022 | Konstantin Olchanski | Info | RO_STOPPED with triggered events
> If events are sent when a run is stopped, this leads to many unexpected results
I think we need to understand what these unexpected results are.
Naively thinking, one would expect midas to not care |
2400 | 08 May 2022 | Konstantin Olchanski | Info | RO_STOPPED with triggered events
> some old front-end are not running any more since they do use RO_ALWAYS together with
triggered events.
I confirm, if you have mfe.c frontends that have RO_ALWAYS, after you update MIDAS,
some of these frontends will fail to start.
https://bitbucket.org/tmidas/midas/commits/1961af0d657e4f76ab9db17f9b70c0c492172b6d
tmfe c++ frontends do not have this restriction, but by default they only read data when a run
is active (the per-equipment fEqConfReadOnlyWhenRunning default is true).
K.O. |
2401 | 13 May 2022 | Konstantin Olchanski | Info | analysis of corner cases in event buffer write cache
introduction:
to remember, bm_send_event() writes an event to the write cache, bm_flush_cache()
writes the contents of the write cache into the shared memory event buffer, buffer
free space is consumed. in the usual case, mlogger is reading events from the shared
memory event buffer, buffer free space is released. there is also a read cache, not
part of this discussion.
the purpose of the write cache is to reduce contention for the shared memory
semaphore. in the case of a large number of small events, the semaphore is locked per
cache flush instead of per event. correct tuning of write cache and event size can
reduce the lock rate from >100 kHz to around 100 Hz or lower.
analysis:
for correct operation of bm_send_event() under all conditions we need to consider
all corner cases:
1) no write cache: (cache size set to 0)
- event_size > buffer_size -> reject the event (obviously)
- event_size > 0.5 * buffer_size -> only 1 event fits into the buffer, next write
will stall until mlogger reads the previous event (sequential operation, bad)
- event_size < 0.3 * buffer_size -> at least 2 events fit into the buffer (good)
decision: limit event size to 0.5 to 0.3 * buffer_size (current limit is 0.5 *
buffer_size, I think).
consequence: buffer size limit is 2 Gbytes (32-bit byte offsets, code is only 31-
bit-clean), max event size is between 1 Gbytes and 0.6 Gbytes.
2) writing to write cache:
- event_size > cache_size -> flush cache, write event directly to buffer
- event_size > 0.5 * cache_size -> inefficient use of cache: write to cache, next
event does not fit, flush to buffer, repeat. no gain in semaphore locking (bad), one
additional memcpy() (event to cache and cache to buffer) (bad)
- event_size < 0.3 * cache_size -> multiple events fit into cache, but probably no
gain in semaphore locking
decision: events that are bigger than 0.3 to 0.1 * cache_size should not go through
the cache. (flush cache, write directly to buffer).
3) flush write cache to buffer:
- cache_size > buffer_size -> cannot flush in 1 operation, must have a loop and
flush the cache in pieces
- cache_size between 0.5 and 1.0 * buffer_size -> can flush in 1 operation, but must
wait for mlogger to fully empty the buffer (sequential operation, bad)
- cache size < 0.3 * buffer_size -> can flush in 1 operation, at least 2 "flushes"
fit inside the buffer (good)
decision: limit write cache size to 0.3 * buffer_size. (current limit is
0.25*buffer_size).
consequences:
- write cache size limit is 0.3..0.25 * 2GB = 0.6..0.5 Gbytes
- cached event size limit is 0.3..0.1 * 0.5 GBytes = 150..50 Mbytes
- minimum number of cached events: 3 to 10
- semaphore locks reduced: 3 to 10 locks become 1 lock (all events cached),
4 to 11 locks become 2 locks (big event causes cache flush).
4) complications:
- there is a periodic 1/second bm_flush_cache() that flushes the cache early and
reduces its efficiency (but it is needed to avoid having data stuck in the cache for a long
time)
- if multiple frontends use large write cache (~ 0.3..0.5 * buffer_size), again,
sequential operation can happen (bad)
- write cache is per-frontend, not per-equipment. if different equipments request
different cache sizes, mfe.c and tmfe c++ frontends complain about this, but the
user has to sort it out.
K.O. |
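An editorial sketch of the sizing rules derived above (the 0.3 factors and the helper names are illustrative, not the actual bm_send_event() limits):

#include <cstddef>

constexpr double kMaxEventToBufferRatio = 0.3;  // case (1): at least 2 events fit in the buffer
constexpr double kMaxEventToCacheRatio  = 0.3;  // case (2): event small enough to be worth caching
constexpr double kMaxCacheToBufferRatio = 0.3;  // case (3): at least 2 cache flushes fit in the buffer

inline bool event_fits_buffer(size_t event_size, size_t buffer_size)
{
   return event_size <= kMaxEventToBufferRatio * buffer_size;
}

inline bool event_should_use_cache(size_t event_size, size_t cache_size)
{
   // big events bypass the cache: flush the cache first, then write directly to the buffer
   return cache_size > 0 && event_size <= kMaxEventToCacheRatio * cache_size;
}

inline bool cache_size_ok(size_t cache_size, size_t buffer_size)
{
   return cache_size <= kMaxCacheToBufferRatio * buffer_size;
}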
2402 | 16 May 2022 | Konstantin Olchanski | Info | RO_STOPPED with triggered events
> > some old front-end are not running any more since they do use RO_ALWAYS together with
> triggered events.
>
> I confirm, if you have mfe.c frontends that have RO_ALWAYS, after you update MIDAS,
> some of these frontends will fail to start.
> https://bitbucket.org/tmidas/midas/commits/1961af0d657e4f76ab9db17f9b70c0c492172b6d
>
> tmfe c++ frontends do not have this restriction but by default only read data when run
> is active (per-equipment fEqConfReadOnlyWhenRunning default is true).
As of commit
https://bitbucket.org/tmidas/midas/commits/28d9c96bd6d4f65346ebcd6a04492ea764c90823 mfe.c
frontends will no longer fail to start. an error will still be issued "Equipment \"%s\"
contains RO_STOPPED or RO_ALWAYS. This can lead to undesired side-effect and should be
removed."
BTW 1:
Some of our old frontends use EQ_MULTITHREAD to implement multithreaded periodic equipments.
They do not generate any events when there is no run (some of them do not generate any
events at all). Now they will start printing this error message for no reason. (no, we will
not be rewriting them just to get rid of this message. life is too short).
BTW 2:
the c++ tmfe frontend does not have any protections against these "undesired side-effects".
What are these undesired side effects and should we add protection against them?
K.O. |
2403 | 16 May 2022 | Konstantin Olchanski | Info | analysis of corner cases in event buffer write cache
> for correct operation of bm_send_event() under all conditions we need to ...
to continue computation from last message:
default SYSTEM buffer size: 32 MiBytes
default max event size: 4 MiBytes
hard max buffer size: 2 Gbytes (code is only 31-bit-clean)
hard max event size: 2 Gbytes (code is only 31-bit-clean)
max event size currently: 32 Mbytes (same as buffer size)
max event size per (1) in previous post: 32*0.5..0.3 = 16..9 MiBytes
number of default-max-size events buffered: 32/4 = 8.
number of per (1) max-size events buffered: 2 or 3
number of current max-size events buffered: 0 (bad, frontend is serialized with mlogger)
default write cache size: 100 kbytes
max write cache size currently: buffer size / 4 = 32/4 = 8 MiBytes
max write cache size per (3) in previous post: buffer_size / 3 = 10 Mbytes
hard max write cache size per (3): 2 Gbytes/3 = 600 Mbytes
max size of cached events:
current: 100 kbytes (same as cache size)
per (2) in previous post: 0.1..0.3 * cache size = 10..30 kbytes
per (2), 1 Mbyte cache: 0.1..0.3 * cache size = 100..300 kbytes
hard max size: 0.1..0.3 * hard_max_cache_size = 0.1..0.3 * 600 = 60..180 Mbytes.
max data rate before event buffer semaphore locking rate exceeds 100 Hz:
1 kbyte events, no write cache: 100 kbytes/sec
1 kbyte events, 100 kbyte cache: 100 events cached, cache flush rate 100 Hz -> 100*1kbyte*100Hz -> 10 Mbytes/sec
1 kbyte events, 1 Mbyte cache: 1000 events cached, cache flush rate 100 Hz -> 100 Mbytes/sec (1gige ethernet)
N kbyte events, 1 Mbyte cache: same thing (data rate is limited by cache flush rate 100 Hz)
100 kbyte events, 1 Mbyte cache, not cached per (2): 100kbyte*100Hz = 10 Mbytes/sec
300 kbyte events, 1 Mbyte cache, not cached per (2): 300kbyte*100Hz = 30 Mbytes/sec
N00 kbyte events: N0 Mbytes/sec (500->50, etc)
1 kbyte events, 10 Mbyte cache: 10000 events cached, cache flush rate 100 Hz -> 1000 Mbytes/sec (10gige ethernet)
N kbyte events, 10 Mbyte cache: same thing (data rate is limited by cache flush rate 100 Hz)
1000 kbyte events, 10 Mbyte cache, not cached per (2): 1000kbyte*100Hz = 100 Mbytes/sec
3000 kbyte events, 10 Mbyte cache, not cached per (2): 3000kbyte*100Hz = 300 Mbytes/sec
N000 kbyte events: N00 Mbytes/sec (4000->400, 5000->500, etc)
default max event size: 4 Mibytes*100Hz = 400 Mbytes/sec (exceeds 1gige ethernet)
hard max event size (divided by 10 to buffer 10 events): 200 Mbytes*100Hz -> 20 Gbytes/sec
max event rate before event buffer semaphore locking rate exceeds 100 Hz:
1 kbyte events, no write cache: 100 Hz (obviously)
1 kbyte events, 100 kbyte cache: 100 events cached, cache flush rate 100 Hz -> 10 kHz
1 kbyte events, 1 Mbyte cache: 1000 events cached, cache flush rate 100 Hz -> 100 kHz
N kbyte events, 1 Mbyte cache: 1000/N events cached, cache flush rate 100 Hz -> 100/N kHz
1 kbyte events, 10 Mbyte cache: 10000 events cached, cache flush rate 100 Hz -> 1000 kHz
N kbyte events, 10 Mbyte cache: 10000/N events cached, cache flush rate 100 Hz -> 1000/N kHz
100 kbyte events, not cached per (2): 100 Hz (obviously)
300 kbyte events, not cached per (2): 100 Hz (obviously)
default max event size: 100 Hz (obviously)
K.O. |
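The arithmetic above boils down to data_rate = cache_size * flush_rate for cached events; a small self-contained check of the numbers (assumed 100 Hz lock budget and 1 kbyte events):

#include <cstdio>

int main()
{
   const double lock_rate_hz  = 100.0;                  // target max semaphore locking rate
   const double cache_sizes[] = {100e3, 1e6, 10e6};     // 100 kbytes, 1 Mbyte, 10 Mbytes

   for (double cache : cache_sizes) {
      double data_rate  = cache * lock_rate_hz;         // bytes/sec when events are cached
      double event_rate = data_rate / 1e3;              // event rate for 1 kbyte events
      printf("cache %6.0f kbytes -> %5.0f Mbytes/sec, %5.0f kHz of 1 kbyte events\n",
             cache / 1e3, data_rate / 1e6, event_rate / 1e3);
   }
   return 0;
}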
2404 | 16 May 2022 | Konstantin Olchanski | Info | analysis of corner cases in event buffer write cache
> > for correct operation of bm_send_event() under all conditions we need to ...
> to continue computation from last message:
if I got my numbers right, for present-day hardware (1gige/10gige data rates, 100 Hz max locking rate), we should
increase the default buffer write cache size from 100 kbytes to 10 Mbytes.
this cache size will permit processing of the full mix of small/big events
at the full mix of event rates without exceeding the 100 Hz semaphore locking rate.
with the 10 Mbyte write cache, default event buffer size should be 30-40 Mbytes (current size is 33 Mbytes, so does
not need to change).
this computation is for 1 writer (1 reader, mlogger). it is a typical case for our experiments.
multiple writers can run into contention for event buffer space.
consider 10 writers want to flush their 10 Mbyte write cache all at the same time:
if buffer size is the default 33 Mbytes, the first 3 writers will have successful write cache flush,
but the other 7 will stall, there is no space in the buffer, we have to wait for mlogger to free
some (mlogger writing X Mbytes/sec will take Y milliseconds to liberate 10 Mbytes of space for the 4th writer
to successfully flush, writers 5..10 are still stalled).
but a system with 10 writers writing at 10 Mbytes/sec each (1 Hz default cache flush rate), i.e. 100 Mbytes/sec
total, will likely have a SYSTEM buffer size of at least 200-300 Mbytes (to buffer 1-2 seconds of data against
any delays in writing to disk/network storage).
so there should be no problem in practice.
K.O. |
2406 | 17 May 2022 | Stefan Ritt | Info | RO_STOPPED with triggered events
> > > some old front-end are not running any more since they do use RO_ALWAYS together with
> > triggered events.
> >
> > I confirm, if you have mfe.c frontends that have RO_ALWAYS, after you update MIDAS,
> > some of these frontends will fail to start.
> > https://bitbucket.org/tmidas/midas/commits/1961af0d657e4f76ab9db17f9b70c0c492172b6d
> >
> > tmfe c++ frontends do not have this restriction but by default only read data when run
> > is active (per-equipment fEqConfReadOnlyWhenRunning default is true).
>
> As of commit
> https://bitbucket.org/tmidas/midas/commits/28d9c96bd6d4f65346ebcd6a04492ea764c90823 mfe.c
> frontends will no longer fail to start. an error will still be issued "Equipment \"%s\"
> contains RO_STOPPED or RO_ALWAYS. This can lead to undesired side-effect and should be
> removed."
>
> BTW 1:
>
> Some of our old frontends use EQ_MULTITHREAD to implement multithreaded periodic equipments.
> They do not generate any events when there is no run (some of them do not generate any
> events at all). Now they will start printing this error message, for no reason. (no we will
> not be rewriting them justy to get rid of this message. life is too short).
>
> BTW 2:
>
> the c++ tmfe frontend does not have any protections against these "undersired side-effects".
>
> What are these undesired side effects and should we add protection against them?
>
> K.O.
The undesired side-effects are the following: The logger tries to collect all events at the end of
the run by emptying the SYSTEM buffer. If events keep coming after the run is stopped, this loop in
the logger might become an endless loop, crashing the whole experiment in the end.
Another issue (and actually the reason for this change) is the function receive_trigger_event() in
mfe.cxx, which gets confused if events are still coming in after a run has been stopped and can
actually enter an infinite loop.
Combining EQ_MULTITHREAD with EQ_PERIODIC or EQ_SLOW is a wrong parameter combination as written in
the documentation. If one wants to have multi-threaded slow control events, one has to use the
DF_MULTITHREAD flag in the DEVICE_DRIVER structure.
I would consider it simply wrong to have triggered events being sent to the system after a run has
been stopped. Why should we ever use a run start/stop if events are always flowing? Adding
protections in all places for this case is certainly much more work than just changing one flag for
the frontends which now produce this error message for a wrong parameter combination. |
2407 | 17 May 2022 | Razvan Stefan Gornea | Info | MIDAS switched to C++
Hi, I have three naive questions about this:
- have you posted somewhere this guide about converting C frontends to C++?
- it was mentioned previously that there will be a 'tag the last "C" midas', which version is it?
- does it mean that even a simple example like odb_test.c cannot be compiled anymore? Even when using g++?
Something like
g++ -I $HOME/daq/packages/midas/include/ -L $HOME/daq/packages/midas/lib/ odb_test.c -l midas
is expected to fail, or is it just me glitching? Is it because of thread library differences?
Thanks!
> The last bits of code to switch MIDAS to C++ have been committed, see tag midas-2019-05-cxx.
>
> Since the cmake conversion is still in progress, for now, I recommend using the old "make" build for trying this update.
>
> From the switch to C++, the biggest change is the requirement that frontend programs be build and linked
> using the C++ compiler. Since mfe.o and the rest of MIDAS are built with C++, building frontends
> with C is no longer possible.
>
> To help with this, I will post a short guide for converting C frontends to C++.
>
> K.O. |
Draft | 17 May 2022 | Ben Smith | Info | MIDAS switched to C++
> - have you posted somewhere this guide about converting C frontends to C++?
See the instructions at:
https://daq00.triumf.ca/MidasWiki/index.php/Changelog#2019-06
> - it was mentioned previously that there will be a 'tag the last "C" midas', which version is it?
> - it means that even a simple example like odb_test.c cannot be compile anymore? Even when using g++?
> g++ -I $HOME/daq/packages/midas/include/ -L $HOME/daq/packages/midas/lib/ odb_test.c -l midas
Correct. Midas is built with C++, so names get mangled |
2409 | 17 May 2022 | Konstantin Olchanski | Info | MIDAS switched to C++
> Hi, I have three naive questions about this:
all good questions, ask more of them.
> - have you posted somewhere this guide about converting C frontends to C++?
yes, in this elog here I posted a guide for converting C mfe.c frontends to C++ and
a guide for converting an mfe.c frontend to a C++ TMFE frontend. please use the "find" function;
if you cannot find them, let me know and I will look for them for you.
> - it was mentioned previously that there will be a 'tag the last "C" midas', which version is it?
correct. please run "git tag"; tags before "midas-2019-05-cxx" are "C", after are "C++".
> - it means that even a simple example like odb_test.c cannot be compile anymore? Even when using g++?
> g++ -I $HOME/daq/packages/midas/include/ -L $HOME/daq/packages/midas/lib/ odb_test.c -l midas
> is expected to fail or is just me glitching? Is it because of thread library differences?
yes, it is expected to fail, you have spaces after "-I", "-L" and "-l", incorrect g++ command syntax. after
correcting this, it may or may not work depending on what you have inside odb_test.c. I would be happy
to help you debug this, but please start a separate thread instead of necroposting into the C++ announcements.
K.O. |
2410 | 17 May 2022 | Ben Smith | Info | MIDAS switched to C++
> - have you posted somewhere this guide about converting C frontends to C++?
There's documentation in the wiki at:
https://daq00.triumf.ca/MidasWiki/index.php/Changelog#2019-06
It includes a step-by-step guide of how to upgrade, what changes need to be made to frontends, and common issues that people had. |
2417 | 05 Aug 2022 | Stefan Ritt | Info | Information for midas updates through git
Several submodules of midas have been re-organized, so if you want to pull the
newest version, you need a
git pull --recurse-submodules
git submodule update --init --recursive
before you can build again. To do this automatically the next time, you can do
git config submodule.recurse true
which needs git 2.14 or later. I hope this works for everybody. If there is a
better way to do that (I'm not a big expert on git) please reply here.
Stefan |
2418 | 06 Aug 2022 | Stefan Ritt | Info | Improvement of odbxx API
While the odbxx API has been used successfully over the last months, a potential
problem with large ODBs surfaced. If you have lots of data in the ODB and load it
into an object like
midas::odb o("/Equipment");
this might take quite long, since each ODB value is fetched separately, which is
very quick on a local machine but can take a long time over a client-server connection.
For large experiments this can take up to minutes (!).
To get rid of this problem, the underlying object model has been modified. When an
object is instantiated like above, then the whole ODB tree is fetched in an XML
buffer in a single transfer, which even for large ODBs usually takes much less
than a second. Then the XML buffer is decomposed on the client side and converted
into the proper midas::odb objects. In one case this gave an improvement from 35
seconds to 0.5 seconds which is significant. To enable the new method, the object
can be created with a flag like
midas::odb o("/Equipment", true);
which then switches to the new method. One has to take care not to fool oneself
(like I did) by printing the object like
midas::odb o("/Equipment", true);
std::cout << o << std::endl;
because each read access to any sub-object of o causes a separate read request to
the server, which again can take a long time. Therefore, one has to switch off the auto
refresh via
midas::odb o("/Equipment", true);
o.set_auto_refresh_read(false);
std::cout << o << std::endl;
Accessing any sub-object of o then does not cause a client-server request, which
is not necessary anyway if all objects have just been pulled from the server. However, if
one keeps the object in memory for a long time, one has to be aware that
it only contains "old" values from the time of instantiation. If one needs more
current ODB values, the auto read refresh has to be turned on again.
Stefan |
2419 | 08 Aug 2022 | Stefan Ritt | Info | Improvement of odbxx API
After some thought, I changed the API again and removed the flag in the constructor,
so the system now automatically chooses the best algorithm depending on whether the client
is connected to a local or a remote API. So in all cases you again use the old syntax:
midas::odb o("/Equipment");
Stefan |