19 Aug 2022, Konstantin Olchanski, Bug Fix, "Detected duplicate or non-monotonous data" in history files
|
serious (but rare) bug was fixed in the history reader. an unlucky experiment would see
errors about "Detected duplicate or non-monotonous data" in some history file; the workaround
was to remove or rename the offending file. (reported by MEG experiment)
it turns out there was nothing wrong with the data files (good), but there
was a nasty bug in the history reader. it did not ensure that we read history
files in chronological order. under some conditions order of files could be
reversed, older files would be read after newer files and trip the built-in
protection against returning non-monotonically increasing history data to the user.
fixed in commit
https://bitbucket.org/tmidas/midas/commits/9893f85ebe33e96cc63f501a0f89e1f8932c894d
for more details, see https://bitbucket.org/tmidas/midas/issues/350/file-history-non-monotonic-time
K.O. |
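The ordering problem described above can be illustrated with a small sketch. This is not the code from the commit; `HistoryFile`, `sort_chronologically` and `is_strictly_increasing` are hypothetical names, and it assumes each file is identified by the timestamp of its first entry. The files are sorted before reading, and the second function mimics the kind of monotonicity check that was being tripped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one history file (not a MIDAS structure).
struct HistoryFile {
    std::string path;
    std::int64_t first_time;  // timestamp of the first entry in the file
};

// Put the files in chronological order before any of them is read.
void sort_chronologically(std::vector<HistoryFile>& files) {
    std::stable_sort(files.begin(), files.end(),
                     [](const HistoryFile& a, const HistoryFile& b) {
                         return a.first_time < b.first_time;
                     });
}

// Reader-side protection: true only if every timestamp is later than the one before it.
bool is_strictly_increasing(const std::vector<std::int64_t>& times) {
    auto violation = std::adjacent_find(
        times.begin(), times.end(),
        [](std::int64_t prev, std::int64_t next) { return next <= prev; });
    return violation == times.end();
}

// Usage: files listed newest-first come out oldest-first.
void example_history_order() {
    std::vector<HistoryFile> files = {{"newer.dat", 200}, {"older.dat", 100}};
    sort_chronologically(files);
    assert(files.front().path == "older.dat");
}
```

If the files were read in the unsorted order, the concatenated timestamps would fail `is_strictly_increasing` even though each file is fine on its own.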
23 Aug 2022, Konstantin Olchanski, Bug Fix, "Detected duplicate or non-monotonous data" in history files
|
> serious (but rare) bug was fixed in the history reader.
previous fix was incomplete. please update to git commit
https://bitbucket.org/tmidas/midas/commits/b343c3c98e4e6fd00a00cf686c74c7ccc6da0c63
K.O. |
17 Nov 2022, Konstantin Olchanski, Bug Fix, "Detected duplicate or non-monotonous data" in history files
|
> > serious (but rare) bug was fixed in the history reader.
> previous fix was incomplete. please update to git commit
> https://bitbucket.org/tmidas/midas/commits/b343c3c98e4e6fd00a00cf686c74c7ccc6da0c63
a race condition between reading history file in mhttpd and writing history file in
mlogger was accidentally introduced. mhttpd would file spurious errors about "timestamp
is after last timestamp".
fixed, please update to git commit
https://bitbucket.org/tmidas/midas/commits/7a9f6e0c58ffddcacb9ee19934ce3e2033a805ef
fix race condition in history file reader - a race condition was added accidentally -
first the reader remembers the history file size and the time of the last entry, then it
goes to read the file and bombs if at the same time mlogger added more entries - their
time is after the remembered time of last entry and error "timestamp is after last
timestamp" is triggered.
K.O. |
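One possible way to avoid this kind of race is sketched below. It is an assumption for illustration, not necessarily what the commit does: the reader takes a snapshot of the file (entry count and last timestamp) and then reads only what the snapshot covers, so entries appended by the writer in the meantime are simply picked up by the next read. The file is modelled as an in-memory vector of timestamps; `Entries`, `Snapshot`, `take_snapshot` and `read_snapshot` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory stand-in for a history file that a writer keeps appending to.
using Entries = std::vector<std::int64_t>;  // one timestamp per entry

struct Snapshot {
    std::size_t count;       // number of entries when the snapshot was taken
    std::int64_t last_time;  // timestamp of the last entry at that moment (0 if empty)
};

Snapshot take_snapshot(const Entries& file) {
    return Snapshot{file.size(), file.empty() ? 0 : file.back()};
}

// Return only the entries covered by the snapshot; anything the writer
// appended later is left for the next read instead of being treated as an error.
Entries read_snapshot(const Entries& file, const Snapshot& snap) {
    std::size_t n = snap.count < file.size() ? snap.count : file.size();
    return Entries(file.begin(), file.begin() + static_cast<std::ptrdiff_t>(n));
}

// Usage: the writer adds an entry between the snapshot and the read.
void example_snapshot_read() {
    Entries file = {10, 20, 30};
    Snapshot snap = take_snapshot(file);
    file.push_back(40);  // writer adds an entry after the snapshot
    Entries seen = read_snapshot(file, snap);
    assert(seen.size() == 3 && seen.back() == snap.last_time);
}
```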
11 Nov 2022, Frederik Wauters, Bug Fix, O_CREAT in open in split.cxx
|
midas currently does not compile on linux
/usr/include/x86_64-linux-gnu/bits/fcntl2.h:50:24: error: call to ‘__open_missing_mode’ declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments
50 | __open_missing_mode ();
giving the mode is mandatory: https://man7.org/linux/man-pages/man2/open.2.html
fix is to give open in midas/examples/lowlevel/split.cxx a default mode, e.g. 006600 |
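For reference, a minimal sketch of an `open()` call with `O_CREAT` that passes the required third argument. The helper name `create_output_file`, the flag combination and the mode 0644 are assumptions chosen for illustration, not the values used in split.cxx.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Create (or truncate) a file for writing; returns the descriptor or -1 on error.
int create_output_file(const char* path) {
    // With O_CREAT the third argument (permission bits) is required.
    // 0644 = rw-r--r--, further restricted by the process umask.
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
}

// Usage: assumes /tmp exists and is writable.
void example_create_output_file() {
    const char* path = "/tmp/open_mode_example.dat";
    int fd = create_output_file(path);
    assert(fd >= 0);
    close(fd);
    unlink(path);
}
```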
12 Nov 2022, Stefan Ritt, Bug Fix, O_CREAT in open in split.cxx
|
> midas currently does not compile on linux
>
> /usr/include/x86_64-linux-gnu/bits/fcntl2.h:50:24: error: call to ‘__open_missing_mode’ declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments
> 50 | __open_missing_mode ();
>
> giving the mode is mandatory: https://man7.org/linux/man-pages/man2/open.2.html
>
> fix is to give open in midas/examples/lowlevel/split.cxx a default mode, e.g. 006600
Thanks. Fixed.
Stefan |
17 Nov 2022, Konstantin Olchanski, Bug Fix, O_CREAT in open in split.cxx
|
> > midas currently does not compile on linux
> > fix is to give open in midas/examples/lowlevel/split.cxx a default mode, e.g. 006600
I got more warnings from split.cxx, looked at the code and see so many problems that it is easier
to delete it than it is to fix it.
Check for end of file is done incorrectly (check for read() return 0, -1 or short read),
memory overrun if given file name is longer than 80 bytes, no check for valid event length
read from the file, and so on and so on.
A better example for reading and writing midas files is in midasio/test_midasio.cxx. Proper c++ coding, and can read compressed files.
K.O. |
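The end-of-file check mentioned above (read() returning 0, -1 or a short read) can be sketched as follows. This is a generic POSIX pattern, not code from split.cxx or midasio; `ReadStatus` and `read_exact` are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

enum class ReadStatus { Ok, EndOfFile, Truncated, Error };

// Read exactly `size` bytes from fd, retrying on short reads and on EINTR.
ReadStatus read_exact(int fd, void* buf, std::size_t size) {
    char* p = static_cast<char*>(buf);
    std::size_t done = 0;
    while (done < size) {
        ssize_t n = read(fd, p + done, size - done);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal, try again
            return ReadStatus::Error;      // read() returned -1
        }
        if (n == 0)  // end of file: clean if nothing was read, truncated otherwise
            return done == 0 ? ReadStatus::EndOfFile : ReadStatus::Truncated;
        done += static_cast<std::size_t>(n);
    }
    return ReadStatus::Ok;
}

// Usage: 6 bytes in a pipe give one full 4-byte read, one truncated read, then EOF.
void example_read_exact() {
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);
    ssize_t written = write(fds[1], "abcdef", 6);
    assert(written == 6);
    close(fds[1]);
    char buf[4];
    assert(read_exact(fds[0], buf, 4) == ReadStatus::Ok);
    assert(read_exact(fds[0], buf, 4) == ReadStatus::Truncated);
    assert(read_exact(fds[0], buf, 4) == ReadStatus::EndOfFile);
    close(fds[0]);
}
```

An event length read from the file would additionally have to be checked against the buffer size before it is passed as `size`.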
22 Oct 2022, Lars Martin, Suggestion, read_only odbxx?
|
I really like the concept of the odbxx interface.
I think it would be a nice feature if one could have a read_only connection, e.g. by declaring a "const midas::odb".
Just for fun I tried if this already works, but the compiler doesn't allow const midas::odb for e.g. the [] operator. I'm guessing this would be non-trivial to implement, but I like the idea of certain Midas clients being able to read the odb without risking corruption. |
24 Oct 2022, Stefan Ritt, Suggestion, read_only odbxx?
|
> I really like the concept of the odbxx interface.
> I think it would be a nice feature if one could have a read_only connection, e.g. by declaring a "const midas::odb".
> Just for fun I tried if this already works, but the compiler doesn't allow const midas::odb for e.g. the [] operator. I'm guessing this would be non-trivial to implement, but I like the idea of certain Midas clients being able to read the odb without risking corruption.
Having a "const midas::odb" probably does not work (at least I would not know how to implement that).
But I could make an internal flag analogous to the auto refresh flags. So you would have
o.set_write_protect(true);
to turn on write protection. Would that work for you?
Best,
Stefan |
26 Oct 2022, Lars Martin, Suggestion, read_only odbxx?
|
> Having a "const midas::odb" probably does not work (at least I would not know how to implement that).
>
> But I could make an internal flag analogous to the auto refresh flags. So you would have
>
> o.set_write_protect(true);
>
> to turn on write protection. Would that work for you?
Absolutely. Looking at the underlying code I was also at a loss how const would work.
I'm mostly just interested in having small clients that only read from the odb (for whatever reason) without risking corrupting it by messing something
up in the code, especially since such small clients are almost by definition hacked together quickly on the fly. |
29 Oct 2022, Stefan Ritt, Suggestion, read_only odbxx?
|
Ok, I implemented and committed that. Just call
o.set_write_protect(true)
on a key you don't want to modify. If you then try to modify it, an exception gets thrown.
Best,
Stefan |
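The flag-and-throw behaviour described in this thread can be illustrated with a toy class. This is not the midas::odb implementation; `GuardedValue` is a hypothetical stand-in and only the method name `set_write_protect` is taken from the posts above.

```cpp
#include <cassert>
#include <stdexcept>

// Toy value holder; NOT the midas::odb class.
class GuardedValue {
public:
    explicit GuardedValue(int v) : value_(v) {}

    void set_write_protect(bool on) { write_protect_ = on; }

    int get() const { return value_; }  // reading is always allowed

    void set(int v) {
        if (write_protect_)
            throw std::runtime_error("write protection is on");
        value_ = v;
    }

private:
    int value_;
    bool write_protect_ = false;
};

// Usage: a write after set_write_protect(true) throws and leaves the value unchanged.
void example_write_protect() {
    GuardedValue v(1);
    v.set(2);
    v.set_write_protect(true);
    bool thrown = false;
    try {
        v.set(3);
    } catch (const std::runtime_error&) {
        thrown = true;
    }
    assert(thrown && v.get() == 2);
}
```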
14 Oct 2022, Lars Martin, Suggestion, Allow onchange to refer to arbitrary js function
|
Maybe this is already possible, I have a hard time understanding the mhttpd source code.
I would like to use a function defined in the <script> block of my custom page as an onchange callback.
Specific example:
I have a modbthermo that I would like to change to three different colours for "too cold", "just right", and "too hot" (measuring porridge, presumably). The examples only show the explicit (condition)?(val1):(val2) syntax, which doesn't allow more than two values, so I had hoped to replace
onchange="this.dataset.color=this.value > 40?'red':'blue';"
with something like
onchange="this.dataset.color=check_Temp(this.value);"
or
onchange="check_Temp(this.value, this.dataset.color);"
if that's easier somehow. The function itself would then return the colour string, or set the color argument to that string (I'm not sure if JS passes references or just values.)
Is this a possibility? |
14 Oct 2022, Ben Smith, Info, Allow onchange to refer to arbitrary js function
|
> I would like to use a function defined in the <script> block of my custom page as an onchange callback.
>
> Is this a possibility?
Yes, this is already possible. An example was shown in the "modb" section of the custom page documentation, but not in the "Changing properties of controls dynamically" section. I've updated the wiki with an example.
https://daq00.triumf.ca/MidasWiki/index.php/Custom_Page#Changing_properties_of_controls_dynamically |
22 Oct 2022, Lars Martin, Info, Allow onchange to refer to arbitrary js function
|
I figured I wasn't the first to have this idea.
Works great, thanks! |
22 Oct 2022, Lars Martin, Info, Allow onchange to refer to arbitrary js function
|
Actually, now that I look again, there is a mistake in the instructions:
you say
onchange="this.dataset.color=check_therm(this)"
but check_therm doesn't return anything and instead changes the color itself. So you either want the function to return the string and use the above assignment, or use the function you provide and use
onchange="check_therm(this)" |
10 Oct 2022, Zaher Salman, Suggestion, JSON-RPC function to read files
|
Hello,
The midas sequencer uses the function js_seq_list_files to get a list of files in the /Sequencer/State/Path with extension *.msl. It would be nice to generalize this function to be able to read files with other (or any) extension.
Based on js_seq_list_files I added a function js_any_list_files in mjsonrpc_user.cxx (attached) which does the job. Maybe a better/safer implementation can be made in midas. Are there any plans to do this?
thanks. |
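A minimal sketch of listing files by extension with std::filesystem (C++17). It is not the js_seq_list_files or js_any_list_files code; `has_extension` and `list_files` are hypothetical helpers, and how the result would be returned through JSON-RPC is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// True if `name` ends in extension `ext` (with the dot, e.g. ".msl").
// An empty `ext` accepts every name.
bool has_extension(const std::string& name, const std::string& ext) {
    return ext.empty() || fs::path(name).extension().string() == ext;
}

// Names (not full paths) of the regular files directly inside `dir`
// that match `ext`, sorted alphabetically. Subdirectories are not entered.
// Throws fs::filesystem_error if `dir` cannot be read.
std::vector<std::string> list_files(const fs::path& dir, const std::string& ext) {
    std::vector<std::string> names;
    for (const auto& entry : fs::directory_iterator(dir)) {
        std::string name = entry.path().filename().string();
        if (entry.is_regular_file() && has_extension(name, ext))
            names.push_back(name);
    }
    std::sort(names.begin(), names.end());
    return names;
}

// Usage of the extension filter on its own.
void example_extension_filter() {
    assert(has_extension("scan.msl", ".msl"));
    assert(!has_extension("notes.txt", ".msl"));
    assert(has_extension("notes.txt", ""));
}
```

Returning bare file names rather than paths, and never descending into subdirectories, keeps the caller from reaching outside the directory it asked for.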
21 Aug 2022, Joseph McKenna, Suggestion, mvodb functionality to get the 'LastWritten' property of a key
|
I want to read data from the ODB with the mvodb interface in one of my frontends, it's useful to know how old that data is, so I prototyped functionality in a pull request to mvodb:
https://bitbucket.org/tmidas/mvodb/pull-requests/2/add-readkeylastwritten-function-to-extract |
22 Aug 2022, Stefan Ritt, Suggestion, mvodb functionality to get the 'LastWritten' property of a key
|
> I want to read data from the ODB with the mvodb interface in one of my frontends, it's useful to know how old that data is, so I prototyped functionality in a pull request to mvodb:
>
> https://bitbucket.org/tmidas/mvodb/pull-requests/2/add-readkeylastwritten-function-to-extract
Thanks for raising that point. I realized that the odbxx API was also missing that functionality, so I added it:
https://bitbucket.org/tmidas/midas/commits/6991a92c19292eaf67721cb80f182c61db077f45
Best,
Stefan |
15 Aug 2022, Zaher Salman, Bug Report, firefox hangs due to mhistory
|
Firefox is hanging/becoming unresponsive due to javascript code. After stopping the script manually to get firefox back in control I have the following message in the console
17:21:28.821 Script terminated by timeout at:
MhistoryGraph.prototype.drawTAxis@http://lem03.psi.ch:8081/mhistory.js:2828:7
MhistoryGraph.prototype.draw@http://lem03.psi.ch:8081/mhistory.js:1792:9
mhistory.js:2828:7
Any ideas how to resolve this?? |
15 Aug 2022, Stefan Ritt, Bug Report, firefox hangs due to mhistory
|
> Firefox is hanging/becoming unresponsive due to javascript code. After stopping the script manually to get firefox back in control I have the following message in the console
>
> 17:21:28.821 Script terminated by timeout at:
> MhistoryGraph.prototype.drawTAxis@http://lem03.psi.ch:8081/mhistory.js:2828:7
> MhistoryGraph.prototype.draw@http://lem03.psi.ch:8081/mhistory.js:1792:9
> mhistory.js:2828:7
>
> Any ideas how to resolve this??
I have to reproduce the problem. Can you send me the full URL from your browser when you see that problem? Probably you have some "special" axis limits, so we don't see that
problem anywhere else.
Stefan |
16 Aug 2022, Zaher Salman, Bug Report, firefox hangs due to mhistory
|
> > Firefox is hanging/becoming unresponsive due to javascript code. After stopping the script manually to get firefox back in control I have the following message in the console
> >
> > 17:21:28.821 Script terminated by timeout at:
> > MhistoryGraph.prototype.drawTAxis@http://lem03.psi.ch:8081/mhistory.js:2828:7
> > MhistoryGraph.prototype.draw@http://lem03.psi.ch:8081/mhistory.js:1792:9
> > mhistory.js:2828:7
> >
> > Any ideas how to resolve this??
>
> I have to reproduce the problem. Can you send me the full URL from your browser when you see that problem? Probably you have some "special" axis limits, so we don't see that
> problem anywhere else.
>
> Stefan
Hi Stefan and Konstantin,
The URL (reachable only within PSI) is http://lem03.psi.ch:8081/?cmd=custom&page=Mudas
Firefox is version 91.12.0esr (64-bit), but I had similar issues with chrome/chromium too.
The hangs seem to happen randomly so I have not been able to reproduce it yet.
I have histories here http://lem03.psi.ch:8081/?cmd=custom&page=Mudas&tab=3 (30 minutes each), but I have also histories popping up in modals though they do not cause any issues.
I'll try to reproduce it in the coming few days and report again.
thanks,
Zaher |
16 Aug 2022, Zaher Salman, Bug Report, firefox hangs due to mhistory
|
I found the bug. The problem is triggered by changing the firefox window. This calls a function that is supposed to change the size of the history plot and it works well when the history plots are visible but not if the history plots are hidden in a javascript tab (not another firefox tab).
Is there a clean way to resize the history plot if the parent div changes size?? The offending code is
mhist[i].mhg = new MhistoryGraph(mhist[i]);
mhist[i].mhg.initializePanel(i);
mhist[i].mhg.resize();
mhist[i].resize = function () {
mhis.mhg.resize();
}; |
17 Aug 2022, Stefan Ritt, Bug Report, firefox hangs due to mhistory
|
The problem lies in your function mhistory_init_one() in Mudas.js:1965. You can only call "new MhistoryGraph(e)" with an element "e" which is something like
<div class="mjshistory" data-group="..." data-panel="..." data-base-u-r-l="https://host.psi.ch/?cmd=history" title="">
Please note the "data-base-u-r-l". This gets automatically added by the function mhistory_init() in mhistory.js:48. The URL is necessary so that the upper right button in a history graph works, which goes to a history page showing only the current graph.
In your function mhistory_init_one() you forgot the call
mhist.dataset.baseURL = baseURL;
where baseURL has to come from the current address bar like
let baseURL = window.location.href;
if (baseURL.indexOf("?cmd") > 0)
baseURL = baseURL.substr(0, baseURL.indexOf("?cmd"));
baseURL += "?cmd=history";
If you duplicate some functionality from mhistory.js, please make sure to duplicate it completely.
Best,
Stefan |
17 Aug 2022, Zaher Salman, Bug Report, firefox hangs due to mhistory
|
> The problem lies in your function mhistory_init_one() in Mudas.js:1965. You can only call "new MhistoryGraph(e)" with an element "e" which is something like
>
> <div class="mjshistory" data-group="..." data-panel="..." data-base-u-r-l="https://host.psi.ch/?cmd=history" title="">
>
> Please note the "data-base-u-r-l". This gets automatically added by the function mhistory_init() in mhistory.js:48. The URL is necessary so that the upper right button in a history graph works, which goes to a history page showing only the current graph.
>
> In your function mhistory_init_one() you forgot the call
>
> mhist.dataset.baseURL = baseURL;
>
> where baseURL has to come from the current address bar like
>
> let baseURL = window.location.href;
> if (baseURL.indexOf("?cmd") > 0)
> baseURL = baseURL.substr(0, baseURL.indexOf("?cmd"));
> baseURL += "?cmd=history";
>
> If you duplicate some functionality from mhistory.js, please make sure to duplicate it completely.
>
Thanks Stefan, but this was not the problem since I am setting the baseURL. You may have looked at the code during my debugging.
Some of my histories are placed in an IFrame object. I eventually realized that my code fails when it tries to resize a history which is placed in an invisible IFrame. I resolved the issue by making sure that I am resizing plots only if they are in a visible IFrame.
|
17 Aug 2022, Stefan Ritt, Bug Report, firefox hangs due to mhistory
|
> Some of my histories are placed in an IFrame object. I eventually realized that my code fails
> when it tries to resize a history which is placed in an invisible IFrame. I resolved the issue
> by making sure that I am resizing plots only if they are in a visible IFrame.
Just to be clear: You could resolve everything on your side, or do you need to change anything in mhistory.js?
Just a tip: IFrames are not good to put anything in. I recommend just to dynamically create a <div> element,
append it to the document body, make it floating and initially invisible. Then put everything inside that div. Have
a look at how control.js does it. This takes fewer resources than a complete IFrame and is much easier to handle.
Stefan |
16 Aug 2022, Konstantin Olchanski, Bug Report, firefox hangs due to mhistory
|
> > > Firefox is hanging/becoming unresponsive due to javascript code.
>
> The URL (reachable only within PSI) is http://lem03.psi.ch:8081/?cmd=custom&page=Mudas
so malfunction is not in the midas history page, but in a custom page. I could help you debug it,
but you would have to provide the complete source code (javascript and html).
> Firefox is version 91.12.0esr (64-bit), but I had similar issues with chrome/chromium too.
my firefox is 103.something. when you say google-chrome has "similar issues",
I read it as "google-chrome does not show this same bug, but shows some other
bug somewhere else". (if I misread you, you have to write better).
but this gives you a front to attack your bugs. basically all browsers should render your
custom page exactly the same (unless you use some obscure or experimental feature, which I
recommend against).
so you tweak your page to identify the source of different rendering results, and try to eliminate it,
hopefully by the time you get your page to render exactly the same everywhere, all the real bugs
have gotten shaken out, too. (this is similar to debugging a c++ program by compiling
it on linux, mac, windows, vax, raspberry pi, etc and checking that you get the same result everywhere).
> The hangs seem to happen randomly so I have not been able to reproduce it yet.
I find that javascript debuggers are not setup to debug hangs. I think debugger runs partially
inside the same javascript engine you are debugging, so both hang and debugging is impossible.
(latest google-chrome has another improvement, all pages from the same computer run in the same
javascript engine, so if one midas page stops (on exception or because I debug it), all midas pages
stop and I have to run two different browsers if I want to debug (i.e.) a history page crash
and look at odb at the same time. fun).
K.O. |
05 Aug 2022, Stefan Ritt, Info, Information for midas updates through git
|
Several submodules of midas have been re-organized, so if you want to pull the
newest version, you need a
git pull --recurse-submodules
git submodule update --init --recursive
before you can build again. To do this automatically the next time, you can do
git config submodule.recurse true
which needs git 2.14 or later. I hope this works for everybody. If there is a
better way to do that (I'm not a big expert on git) please reply here.
Stefan |
08 Aug 2022, Konstantin Olchanski, Info, Information for midas updates through git
|
> git pull --recurse-submodules
> git submodule update --init --recursive
> git config submodule.recurse true
does not work for me, macos 12.4 git 2.32.1.
after I set "submodule.recurse true", I still have to type
"git submodule update --init --recursive"; without --recursive, mscb/mxml is empty and the build bombs.
P.S. the underlying issue is that the mxml submodule is now included twice
(midas/mxml and midas/mscb/mxml) and there is nothing to enforce that both copies are
the same. (No idea what happens if the two mxml's are different). |
08 Aug 2022, Stefan Ritt, Info, Information for midas updates through git
|
> after I set "submodule.recurse true", I still have to type
> "git submodule update --init --recursive"; without --recursive, mscb/mxml is empty and the build bombs.
Indeed, doesn't work for me either. If some git guru has some more insight, please post
here!
> P.S. the underlying issue is that the mxml submodule is now included twice
> (midas/mxml and midas/mscb/mxml) and there is nothing to enforce that both copies are
> the same. (No idea what happens if the two mxml's are different).
The version of each mxml is defined by last commit of the parent repository, which contains
the hash of the submodule version. If we have to update mxml for some reason, we have to
commit also mscb with the new version, and then midas with the same version of mxml. If one
checks out midas then with
git clone https://bitbucket.org/tmidas/midas --recursive
one gets the same versions for mxml. |
08 Aug 2022, Konstantin Olchanski, Info, odb disallow key names that start or end with spaces
|
while testing the new odb editor, we ran into a number of problems with key names
that start or end with spaces. we cannot think of any valid use case for such key
names (subdirectories and variables) and we think they could only have been
created by mistake. ODB now disallows such names. K.O. |
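The rule can be illustrated with a small check. This is only a sketch of the stated restriction, not the ODB validation code; `key_name_has_clean_edges` is a hypothetical name, and rejecting empty names is an extra assumption made here.

```cpp
#include <cassert>
#include <string>

// True if `name` is non-empty and neither starts nor ends with a space.
// Spaces inside the name are still accepted.
bool key_name_has_clean_edges(const std::string& name) {
    return !name.empty() && name.front() != ' ' && name.back() != ' ';
}

// Usage: leading or trailing spaces are rejected, inner spaces are fine.
void example_key_names() {
    assert(key_name_has_clean_edges("Always true"));
    assert(!key_name_has_clean_edges(" Variables"));
    assert(!key_name_has_clean_edges("Variables "));
}
```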
08 Aug 2022, Konstantin Olchanski, Info, midas on ubuntu LTS 22.04
|
reporting that as of commit 78f707c0686d22f8329c7a1f1c46d7dccf35ceff, midas builds
without errors or warnings on Ubuntu LTS 22.04, 20.04, CentOS-7 and MacOS 12.4.
(except for some warnings from mscb and msc). K.O. |
06 Aug 2022, Stefan Ritt, Info, Improvement of odbxx API
|
While the odbxx API has been used successfully over the last few months, a potential
problem with large ODBs surfaced. If you have lots of data in the ODB and load it
into an object like
midas::odb o("/Equipment");
this might take quite long, since each ODB value is fetched separately, which is
very quick on a local machine but can take long over a client-server connection.
For large experiments this can take up to minutes (!).
To get rid of this problem, the underlying object model has been modified. When an
object is instantiated like above, then the whole ODB tree is fetched in an XML
buffer in a single transfer, which even for large ODBs usually takes much less
than a second. Then the XML buffer is decomposed on the client side and converted
into the proper midas::odb objects. In one case this gave an improvement from 35
seconds to 0.5 seconds which is significant. To enable the new method, the object
can be created with a flag like
midas::odb o("/Equipment", true);
which then switches to the new method. One has to take care not to fool oneself
(like I did) by printing the object like
midas::odb o("/Equipment", true);
std::cout << o << std::endl;
because each read access to any sub-object of o causes a separate read request to
the server which again can take long. Therefore, one has to switch off the auto
refresh via
midas::odb o("/Equipment", true);
o.set_auto_refresh_read(false);
std::cout << o << std::endl;
Accessing any sub-object of o then does not cause a client-server request, which
is not necessary if all objects just have been pulled from the server before. If
one keeps the object however for a long time in memory, one has to be aware that
it only contains "old" values from the time of instantiation. If one needs more
current ODB values, the auto read refresh has to be turned on again.
Stefan |
08 Aug 2022, Stefan Ritt, Info, Improvement of odbxx API
|
After some thought, I changed the API again and removed the flag in the constructor,
so the system now automatically chooses the best algorithm depending on whether the client
is connected to a local or a remote API. So in all cases you use again the old syntax:
midas::odb o("/Equipment");
Stefan |
18 Jul 2022, Konstantin Olchanski, Release, midas-2022-05
|
There is a release branch for midas-2022-05 and corresponding git tag midas-2022-05-b.
This branch is known to be stable and is working well for the ALPHA
experiment at CERN. Latest update to this branch fixes two problems in the
mserver (rpc timeout and a use-after-free internal error).
https://bitbucket.org/tmidas/midas/branch/release/midas-2022-05
K.O. |
25 Jun 2022, Joseph McKenna, Bug Report, RPC timeout for manalyzer over network
|
In ALPHA, I get RPC timeouts running a (reasonably heavy) analyzer on a remote machine (connected directly via a ~30 meter 10GbE Ethernet cable) after ~5 minutes of running. If I run the analyzer locally, I do not see a timeout...
gdb trace:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff5d35859 in __GI_abort () at abort.c:79
#2 0x00005555555a2a22 in rpc_call (routine_id=11111) at /home/alpha/packages/midas/src/midas.cxx:13866
#3 0x000055555562699d in bm_receive_event_rpc (buffer_handle=buffer_handle@entry=2, buf=buf@entry=0x0, buf_size=buf_size@entry=0x0, ppevent=ppevent@entry=0x0, pvec=pvec@entry=0x7fffffffd700,
timeout_msec=timeout_msec@entry=100) at /home/alpha/packages/midas/src/midas.cxx:10510
#4 0x0000555555631082 in bm_receive_event_vec (buffer_handle=2, pvec=pvec@entry=0x7fffffffd700, timeout_msec=timeout_msec@entry=100) at /home/alpha/packages/midas/src/midas.cxx:10794
#5 0x0000555555673dbb in TMEventBuffer::ReceiveEvent (this=this@entry=0x555557388b30, e=e@entry=0x7fffffffd700, timeout_msec=timeout_msec@entry=100) at /home/alpha/packages/midas/src/tmfe.cxx:312
#6 0x0000555555607b56 in ReceiveEvent (b=0x555557388b30, e=0x7fffffffd6c0, timeout_msec=100) at /home/alpha/packages/midas/manalyzer/manalyzer.cxx:1411
#7 0x000055555560d8dc in ProcessMidasOnlineTmfe (args=..., progname=<optimized out>, hostname=<optimized out>, exptname=<optimized out>, bufname=<optimized out>, event_id=<optimized out>,
trigger_mask=<optimized out>, sampling_type_string=<optimized out>, num_analyze=0, writer=<optimized out>, multithread=<optimized out>, profiler=<optimized out>,
queue_interval_check=<optimized out>) at /home/alpha/packages/midas/manalyzer/manalyzer.cxx:1534
#8 0x000055555560f93b in manalyzer_main (argc=<optimized out>, argv=<optimized out>) at /usr/include/c++/9/bits/basic_string.h:2304
#9 0x00007ffff5d37083 in __libc_start_main (main=0x5555555b1130 <main(int, char**)>, argc=8, argv=0x7fffffffdda8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
stack_end=0x7fffffffdd98) at ../csu/libc-start.c:308
#10 0x00005555555b184e in _start () at /usr/include/c++/9/bits/stl_vector.h:94
Any suggestions? Many thanks |
18 Jul 2022, Konstantin Olchanski, Bug Report, RPC timeout for manalyzer over network
|
> In ALPHA, I get RPC timeouts running a (reasonably heavy) analyzer on a remote machine (connected directly via a ~30 meter 10GbE Ethernet cable) after ~5 minutes of running. If I run the analyzer locally, I do not see a timeout...
there is a subtle bug in the mserver. under rare conditions, ss_suspend() will recurse in an unexpected way
and mserver will go to sleep waiting for data from a udp socket (that will never arrive, so sleep forever).
remote client will see it as an rpc timeout. in my tests (and in ALPHA-g at CERN, as reported by Joseph),
I see this rare condition happen about every 5 minutes. in normal use, this is the first time we have become
aware of this problem; as best I can tell, this bug was in the mserver since day one.
commit https://bitbucket.org/tmidas/midas/commits/fbd06ad9d665b1341bd58b0e28d6625877f3cbd0
to develop and
to release/midas-2022-05
The stack trace that shows the mserver hang/crash (sleep() is the stand-in for the sleep-forever socket read).
(gdb) bt
#0 0x00007f922c53f9e0 in __nanosleep_nocancel () from /lib64/libc.so.6
#1 0x00007f922c53f894 in sleep () from /lib64/libc.so.6
#2 0x0000000000451922 in ss_suspend (millisec=millisec@entry=100, msg=msg@entry=1) at /home/agmini/packages/midas/src/system.cxx:4433
#3 0x0000000000411d53 in bm_wait_for_more_events_locked (pbuf_guard=..., pc=pc@entry=0x7f920639b93c, timeout_msec=timeout_msec@entry=100,
unlock_read_cache=unlock_read_cache@entry=1) at /home/agmini/packages/midas/src/midas.cxx:9429
#4 0x00000000004238c3 in bm_fill_read_cache_locked (timeout_msec=100, pbuf_guard=...) at /home/agmini/packages/midas/src/midas.cxx:9003
#5 bm_read_buffer (pbuf=pbuf@entry=0xdf8b50, buffer_handle=buffer_handle@entry=2, bufptr=bufptr@entry=0x0, buf=buf@entry=0x7f9203d75020,
buf_size=buf_size@entry=0x7f920639aa20, vecptr=vecptr@entry=0x0, timeout_msec=timeout_msec@entry=100, convert_flags=0,
dispatch=dispatch@entry=0) at /home/agmini/packages/midas/src/midas.cxx:10279
#6 0x0000000000424161 in bm_receive_event (buffer_handle=2, destination=0x7f9203d75020, buf_size=0x7f920639aa20, timeout_msec=100)
at /home/agmini/packages/midas/src/midas.cxx:10649
#7 0x0000000000406ae4 in rpc_server_dispatch (index=11111, prpc_param=0x7ffcad70b7a0) at /home/agmini/packages/midas/progs/mserver.cxx:575
#8 0x000000000041ce9c in rpc_execute (sock=10, buffer=buffer@entry=0xe11570 "g+", convert_flags=0)
at /home/agmini/packages/midas/src/midas.cxx:15003
#9 0x000000000041d7a5 in rpc_server_receive_rpc (idx=idx@entry=0, sa=0xde6ba0) at /home/agmini/packages/midas/src/midas.cxx:15958
#10 0x0000000000451455 in ss_suspend (millisec=millisec@entry=1000, msg=msg@entry=0) at /home/agmini/packages/midas/src/system.cxx:4575
#11 0x000000000041deb2 in rpc_server_loop () at /home/agmini/packages/midas/src/midas.cxx:15907
#12 0x0000000000405266 in main (argc=9, argv=<optimized out>) at /home/agmini/packages/midas/progs/mserver.cxx:390
(gdb)
K.O. |
19 Jun 2022, Francesco Renga, Forum, Alarm on variable not updating
|
Dear all,
I've an ODB equipment that sometimes loses the connection with the hardware, so that the variables are not updated anymore. The connection can be restored by restarting the frontend. It would be useful to have an alarm based on the time from the last update of some variable (i.e. the alarm is triggered if the variable is not updated for more than X seconds). Is there a method to implement such an alarm in MIDAS?
Thank you very much,
Francesco |
20 Jun 2022, Stefan Ritt, Forum, Alarm on variable not updating
|
There are two functions to do that: one checks the last write access, the other checks the last write access only while a run is running. The alarm condition looks like:
access(/Equipment/.../Variables/Input[10]) > 60
which will cause an alarm if the Input[10] is not written for more than 60 seconds. The other function which checks the run status as well is like:
access_running(...odb key...) > 60
You can actually see an example on the MEG alarm page.
Rather than having an alarm for that I would however recommend that you program your frontend such that it realizes if it loses the connection, then tries automatically to reconnect or triggers an alarm itself (so-called "internal" alarm). This is also how the MSCB system works, and it is much more robust.
Stefan |
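The frontend-side approach recommended above can be sketched as a small watchdog that records the time of the last successful readout. `UpdateWatchdog` is a hypothetical helper, not part of the MIDAS API; the reconnect attempt or the internal alarm that would follow a stale result is left out.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical helper for a frontend; not part of the MIDAS API.
class UpdateWatchdog {
public:
    UpdateWatchdog(std::chrono::seconds limit, Clock::time_point now)
        : limit_(limit), last_update_(now) {}

    // Call after every successful readout of the hardware.
    void mark_updated(Clock::time_point now) { last_update_ = now; }

    // True once more than `limit` has passed since the last successful readout.
    bool is_stale(Clock::time_point now) const { return now - last_update_ > limit_; }

private:
    std::chrono::seconds limit_;
    Clock::time_point last_update_;
};

// Usage with a 60 second limit, mirroring the "> 60" alarm condition above.
void example_watchdog() {
    Clock::time_point t0 = Clock::now();
    UpdateWatchdog dog(std::chrono::seconds(60), t0);
    assert(!dog.is_stale(t0 + std::chrono::seconds(30)));
    assert(dog.is_stale(t0 + std::chrono::seconds(61)));
    dog.mark_updated(t0 + std::chrono::seconds(61));
    assert(!dog.is_stale(t0 + std::chrono::seconds(90)));
}
```

Passing the current time in as an argument keeps the class easy to test; a frontend would call `is_stale(Clock::now())` in its periodic loop.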
20 Jun 2022, jianrun, Bug Report, Error in "midas/src/mana.cxx"
|
Dear Midas developers,
When we are running the examples in $MIDASSYS/examples/experiment/, we meet some
problems when analyzing the results:
1. When we analyze the data using the analyzer: ./analyzer -i run00001.mid -o
run00001.rz , we find some bugs:
"
Root server listening on port 9090...
Running analyzer offline. Stop with "!"
[Analyzer,ERROR] [mana.cxx:1832:bor,ERROR] HBOOK support is not compiled in
[Analyzer,INFO] Set run number 6 in ODB
Load ODB from run 6...OK
run00006.mid:2680 events, 0.00s
"
We think this occurs in the "midas/src/mana.cxx ". How can we solve this?
2. When we analyze the above data, an error also occurs:
[Analyzer,ERROR] [odb.cxx:847:db_validate_name,ERROR] Invalid name
"/Analyzer/Tests/Always true/Rate [Hz]" passed to db_create_key_wlocked: should
not contain "["
We simply fixed that just by replacing the "Rate [Hz]" with "Rate" in the
test_write in midas/src/mana.cxx
We are curious whether you can fix the problem permanently in the next version,
or we are not running the code properly. Thanks! |
15 Apr 2019, Konstantin Olchanski, Info, switch of MIDAS to C++
|
For a long time now we have kept the core of midas (odb.c, midas.c, etc) compatible with plain C and by default
we have built the MIDAS library using the plain C compiler. Over time, we have switched most MIDAS programs
(mhttpd, mlogger, mdump, odbedit, etc) to C++ (with happy results). (and for a long time now, all of MIDAS
could be built as C++, even if the default build remained plain C).
The main reason for keeping the core of MIDAS as C has been to allow writing MIDAS frontends in C - for
example, in environments with no C++ compilers or no C++ runtime (VxWorks) or where C++ had too much
overhead (small memory machines, etc).
Today, all concerns against using C++ seem to have receded into the past. C++ compilers are now always
available, even for small embedded systems. C++ overheads are now well understood and one can easily write
C++ code that is as efficient as C for using limited CPU and memory resources. (While at the same time, today's
embedded systems tend to have more CPU and RAM than "big" MIDAS DAQ machines had in the past - 1GHz
CPU, 1GB RAM is pretty typical for embedded ARM).
As examples of small hardware where MIDAS frontends written in C++ worked just fine, consider the T2K ND280
FGD data collector running on XILINX FPGA with a 300MHz PowerPC and 128 Mbytes of RAM (standard Linux
kernel) and the GRIFFIN Clock distribution module control running on a Microsemi FPGA with a 300MHz ARM
CPU (ucLinux without an MMU). More typical Cyclone-5 ARM SoCs with 1GB RAM and 1GHz CPU run standard
Linux (CentOS7) and can build MIDAS natively (no need for cross-compiling).
With the removal of the requirement to make it possible to write MIDAS frontends in C, we can switch the MIDAS
default build to C++ and start using C++ features in the MIDAS API (std::string, std::vector, etc).
Next to consider is "which C++ should we use?".
K.O. |
15 Apr 2019, Konstantin Olchanski, Info, switch of MIDAS to C++, which C++?
|
>
> With the removal of the requirement to make it possible to write MIDAS frontends in C, we can switch the MIDAS
> default build to C++ and start using C++ features in the MIDAS API (std::string, std::vector, etc).
>
Consider the most basic C++ construct, std::string, and observe how many member functions are annotated "c++11", "c++17", etc:
https://en.cppreference.com/w/cpp/string/basic_string
For MIDAS this means that we cannot target "a" C++ or "the" C++, we have to choose between C++ "before C++11", C++11, C++17
(plus the incoming c++20).
For example, the ROOT 6 package requires C++11 *and* g++ >= 4.8.
Now consider the platforms we use at TRIUMF:
- Linux RHEL/SL/CentOS6 - gcc 4.4.7, no C++11.
- Linux RHEL/SL/CentOS7 - gcc 4.8.5, full C++11, no C++14, no C++17
- Ubuntu 18.04.2 LTS - gcc 7.3.0, full C++11, full C++14, "experimental" C++17.
- MacOS 10.13 - llvm 10.0.0 (clang-1000.11.45.5), full C++11, full C++14, full C++17.
(see here for GCC C++ support: https://gcc.gnu.org/projects/cxx-status.html)
(see here for LLVM clang c++ support: https://clang.llvm.org/cxx_status.html)
It is easy to see from the std::string reference that C++17 has a large number of very useful new features.
Alas, at TRIUMF we still run MIDAS on many SL6 machines where C++11 and C++17 are not normally available. I estimate another 1-2
years before all our SL6 machines are upgraded to RHEL/SL/CentOS7 (or Ubuntu LTS).
This means we cannot use C++11 and C++17 in MIDAS yet. We are stuck with pre-C++11 for now.
Remarks:
- there will be trouble right away as both Stefan and myself do MIDAS development on MacOS where full C++17 is available and is
tempting to use. (as they say, watch this space)
- it is possible to install a newer C++ compiler into RHEL/SL/CentOS 6 and 7 systems, but we are loath to require this (same as we
are loath to require cmake for building MIDAS) - the "I" in MIDAS means integrated, meaning "does not require installing 100
additional packages before one can use it".
- the MS Windows situation is unclear, but since one has to install the C++ compiler as an additional package anyway, I do not see
any problem with requiring C++17 support, with a choice of MS compilers, GCC and LLVM. I doubt we will support anything older
than Windows 10.
K.O. |
15 Apr 2019, Konstantin Olchanski, Info, switch of MIDAS to C++, how much C++?
|
> >
> > With the removal of the requirement to make it possible to write MIDAS frontends in C, we can switch the MIDAS
> > default build to C++ and start using C++ features in the MIDAS API (std::string, std::vector, etc).
> >
C++ is a big animal. Obviously we want to use std::string, std::vector and similar improvements over plain C (we already use "//" for comments).
But in keeping with the Camel's nose fable (https://en.wikipedia.org/wiki/Camel%27s_nose), there are some parts of C++ we definitely do not want to use in MIDAS. Even the C++ FAQ talks
about "evil features", see https://isocpp.org/wiki/faq/big-picture#use-evil-things-sometimes
Here is my list of things to use and to avoid. Comments on this are very welcome - as everybody's experience with C++ is different (and everybody's experience is very valuable and very
welcome).
- std::string, std::vector, etc are in. I am already using them in the MIDAS API (midas.h)
- extern "C" is out, everything has to be C++, will remove "extern "C"" from all midas header files.
- exceptions are out, see https://stackoverflow.com/questions/1736146/why-is-exception-handling-bad
- std::thread and std::mutex are in, at least for writing new frontends, but see discussion of "cannot use c++11". (maybe replace ss_mutex_xxx() with our own std::mutex look-alike).
- heavy use of templates and heavy use of argument overloading is out - just by looking at the code, impossible to tell what function will be called
- "auto" is on probation. I need to know if "auto v=f()" is an integer or a double when I write "auto w=v/2" or "auto w=v/2.0". see
https://softwareengineering.stackexchange.com/questions/180216/does-auto-make-c-code-harder-to-understand
- unreadable gibberish is out (lambdas, etc)
- C-style malloc()/free() is in. C++ new and delete are okay, but "delete[]" confuses me.
- C-style printf() is in. C++ cout and "<<" gunk provide no way to easily format the output for easy reading.
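To illustrate the "auto" item in the list above, here is a minimal sketch. The functions f_int() and f_double() are hypothetical and not part of MIDAS. The same source text "v / 2" does integer division in one case and floating-point division in the other, and nothing at the point of use says which:

```cpp
#include <type_traits>

// Hypothetical functions, for illustration only (not MIDAS API).
int    f_int()    { return 7; }
double f_double() { return 7.0; }

double half_from_int() {
    auto v = f_int();    // v is deduced as int
    auto w = v / 2;      // integer division: w is int, value 3
    static_assert(std::is_same<decltype(w), int>::value, "w is int");
    return w;            // converted to double only on return
}

double half_from_double() {
    auto v = f_double(); // v is deduced as double
    auto w = v / 2;      // floating-point division: w is double, value 3.5
    static_assert(std::is_same<decltype(w), double>::value, "w is double");
    return w;
}
```

Writing "v / 2.0" forces floating-point division in both cases, which is one way to keep "auto" without depending on the return type of f().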
K.O. |
16 Apr 2019, Pintaudi Giorgio, Info, switch of MIDAS to C++, how much C++?
|
Dear Konstantin,
even if I am still quite young and have only limited experience (but not null), I would like to give my two cents. I have reflected a bit about the C++ issue, also because I am developing a
brand new MIDAS interface for the WAGASCI-T2K experiment, and I feel that the future of MIDAS could influence the future of our DAQ system, too. I'll start from the conclusions: I completely
agree with you on a practical level, even if I kind of disagree on an "ethical" level.
What you propose in essence is to migrate the MIDAS core from pure C to a version of C with some fancy C++ features. Let's say a kind of C+ with only one plus. Theoretically speaking, even if
on the surface C and C++ are very similar, they are completely different languages and require different mindsets (and I am sure that everyone is aware of it). This is the reason why even if I
would have preferred to develop the MIDAS frontend for our experiment in C++, I have chosen to stick to pure C because I feel that MIDAS is still very C-like in its architecture (or from what
I can see from the documentation). So I wanted to "keep on track" for better internal coherence. What I mean is that, if someone told me to port a C project of mine to C++, I would end up
rewriting it almost completely, instead of just modifying it (I really don't know how much of the MIDAS core has been written with C++ in mind, so if a large part of it is already C++-like,
please ignore my comment above).
Anyway, on a practical level, I completely agree with your approach, because I imagine that a complete rewrite of MIDAS is off the table but, at the same time, some new C++ features like
better string and vector handling are very tempting to use. Moreover, in general, physicists are more familiar with the C syntax than with the C++ one (but thanks to ROOT that is changing). As
for the use of MIDAS in embedded devices, I have no experience so I refrain from judging. So, in the particular case of MIDAS, what you propose is probably the best and only option.
As far as the C++ standard to adopt, I would say that the C++11 standard is the best fit for the T2K experiment since the official OS for T2K is CentOS7 and, out of the box, it supports C++11
only. Anyway, I acknowledge that there are many other experiments and requirements. For the record, I do development on Ubuntu 18.04.
Best regards
Giorgio |
17 Apr 2019, John M O'Donnell, Info, switch of MIDAS to C++, how much C++?
|
some semi-random thoughts:
no templates strictly means you can't use std::string, std::vector etc.
printf is in any case part of C++ (#include <cstdio>), but std::ostreams can be faster (for std::cout, std::endl causes buffer flushing, whereas "\n" does not flush the buffer but printf
always flushes the buffer), and formatting is possible (though very long-winded). printf cannot print things other than simple data, e.g. BANK_HEADER* bh; printf( "%?", *bh);
I've been writing all our DAQ code in C++ for a while now.
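A minimal sketch of the last point, printing a whole struct through a stream. DemoHeader is a hypothetical stand-in with made-up fields, not the real MIDAS BANK_HEADER, and describe() is a helper introduced only for this example:

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a header struct (not the MIDAS BANK_HEADER).
struct DemoHeader {
    std::uint32_t data_size;
    std::uint32_t flags;
};

// One overload teaches every std::ostream how to print the struct.
std::ostream& operator<<(std::ostream& os, const DemoHeader& h) {
    return os << "data_size=" << h.data_size << " flags=" << h.flags;
}

// Helper for this example: format the struct into a std::string.
std::string describe(const DemoHeader& h) {
    std::ostringstream ss;
    ss << h;
    return ss.str();
}
```

With this in place, "std::cout << hdr << "\n";" prints the struct directly, while printf() would need each field passed separately.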
> > >
> > > With the removal of the requirement to make it possible to write MIDAS frontends in C, we can switch the MIDAS
> > > default build to C++ and start using C++ features in the MIDAS API (std::string, std::vector, etc).
> > >
>
> C++ is a big animal. Obviously we want to use std::string, std::vector and similar improvements over plain C (we already use "//" for comments).
>
> But in keeping with the Camel's nose fable (https://en.wikipedia.org/wiki/Camel%27s_nose), there are some parts of C++ we definitely do not want to use in MIDAS. Even the C++ FAQ talks
> about "evil features", see https://isocpp.org/wiki/faq/big-picture#use-evil-things-sometimes
>
> Here is my list of things to use and to avoid. Comments on this are very welcome - as everybody's experience with C++ is different (and everybody's experience is very valuable and very
> welcome).
>
> - std::string, std::vector, etc are in. I am already using them in the MIDAS API (midas.h)
> - extern "C" is out, everything has to be C++, will remove "extern "C"" from all midas header files.
> - exceptions are out, see https://stackoverflow.com/questions/1736146/why-is-exception-handling-bad
> - std::thread and std::mutex are in, at least for writing new frontends, but see discussion of "cannot use c++11". (maybe replace ss_mutex_xxx() with our own std::mutex look-alike).
> - heavy use of templates and heavy use of argument overloading is out - just by looking at the code, impossible to tell what function will be called
> - "auto" is on probation. I need to know if "auto v=f()" is an integer or a double when I write "auto w=v/2" or "auto w=v/2.0". see
> https://softwareengineering.stackexchange.com/questions/180216/does-auto-make-c-code-harder-to-understand
> - unreadable gibberish is out (lambdas, etc)
> - C-style malloc()/free() is in. C++ new and delete are okay, but "delete[]" confuses me.
> - C-style printf() is in. C++ cout and "<<" gunk provide no way to easily format the output for easy reading.
>
> K.O. |
22 Apr 2019, Pintaudi Giorgio, Info, switch of MIDAS to C++, how much C++?
|
Dear Konstantin and others,
our recent discussion stimulated my curiosity and I wrote a small frontend for the trigger board of our experiment in C++.
The underlying hardware details are not relevant here. I would just like to briefly report and discuss what I found out.
I have written all the frontend files (but the bus driver) in C++11:
- my_frontend.cpp
- driver/class/my_class_driver.cpp
- driver/device/my_device_driver.cpp
All went quite smoothly, but I feel that the overall structure is still very C-like (that may be a good thing or a bad thing depending on the point of view).
As far as I know, the MIDAS frontend mfe.c has still only the C version (I couldn't find any mfe.cxx). This means that all the points of contact between the MIDAS frontend code and the user
frontend code must be C compatible (no C++ features or name mangling). To accomplish this I needed to slightly modify the midas.h header file like this:
@@ -1141,7 +1141,13 @@ typedef struct eqpmnt {
+#ifdef __cplusplus
+extern "C" {
+#endif
INT device_driver(DEVICE_DRIVER *device_driver, INT cmd, ...);
+#ifdef __cplusplus
+}
+#endif
I also tested the new strcomb1 function and it seems to work OK.
I have attached a source file to show how I implemented the device driver in C++. The code is not meant to be compilable: it is just to show how I implemented it. This is the most C++-like syntax that I could come up with. Feel free to comment on it, and if you think that it could be improved, let me know.
Best Regards
Giorgio
|
23 Apr 2019, Konstantin Olchanski, Info, switch of MIDAS to C++, how much C++?
|
> Dear Konstantin and others, our recent discussion stimulated my curiosity and I wrote a small frontend for the trigger board of our
> experiment in C++.
Yay!
> my_frontend.cpp
In MIDAS we are using .cxx, not .cpp, per ROOT coding convention https://root.cern.ch/coding-conventions
> the overall structure is still very C-like
this is object-oriented programming done in C. (actually C++ looks exactly the same if you look behind the curtain)
right now we do not hope to rewrite the slow control class driver framework in C++, but if somebody does it,
we should be happy to add it to midas.
for the mfe.c framework, I have a new C++ class based frontend framework in development (and already in use
in the ALPHA-g experiment at CERN). There is a number of loose ends to polish before I can add it to midas.
And as usual the last 10% of the work consumes 90% of the time.
> the MIDAS frontend mfe.c has still only the C version (I couldn't find any mfe.cxx).
> This means that all the points of contact between the MIDAS frontend code and the user frontend code must be C compatible
> (no C++ features or name mangling).
this will change with the switch to C++, mfe.c will become mfe.cxx and I shall add the required definitions to mfe.h (or midas.h, TBD)
> To accomplish this I needed to slightly modify the midas.h header file like this:
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> INT device_driver(DEVICE_DRIVER *device_driver, INT cmd, ...);
I intend for all "extern "C"" to go away, everything will use the C++ linkage (and name mangling). This will break existing frontends
and I will need to write clear instructions on converting them to the new scheme.
> I also tested the new strcomb1 function and it seems to work OK.
good.
> I have attached a source file to show how I implemented the device driver in C++
Yup, looks familiar, I have a couple of C++ frontends written like this, too.
K.O. |
11 May 2019, Konstantin Olchanski, Info, switch of MIDAS to C++, which C++?
|
> [which c++]
>
> - Linux RHEL/SL/CentOS6 - gcc 4.4.7, no C++11.
> - Linux RHEL/SL/CentOS7 - gcc 4.8.5, full C++11, no C++14, no C++17
>
The construct I now always use:
class X {
int a = 0; // do not leave data members uninitialized, see "Non-static data member initializers", N2756 and N2628
};
is only available starting from gcc 4.7, see https://gcc.gnu.org/projects/cxx-status.html
Another nail into the coffin of "pre c++11" c++ and el < el7.
Hmm...
K.O. |
22 May 2019, Konstantin Olchanski, Info, switch of MIDAS to C++
|
> switch MIDAS to C++
switch to C++ will proceed as follows:
- create a new branch off develop (feature/switch_to_cxx)
- remove all extern "C", ifdef c++, etc
- switch Makefile from gcc to g++
- test
- merge into develop
- before merge, tag the last "C" midas
- cut a new release branch (tentatively feature/midas-2019-06)
the last recommended "pre-C++" midas will remain the midas-2019-03 release (where we can retroactively apply bug fixes, as I just did a few minutes ago).
K.O. |
05 Jun 2019, Konstantin Olchanski, Info, MIDAS switched to C++
|
The last bits of code to switch MIDAS to C++ have been committed, see tag midas-2019-05-cxx.
Since the cmake conversion is still in progress, for now, I recommend using the old "make" build for trying this update.
From the switch to C++, the biggest change is the requirement that frontend programs be built and linked
using the C++ compiler. Since mfe.o and the rest of MIDAS are built with C++, building frontends
with C is no longer possible.
To help with this, I will post a short guide for converting C frontends to C++.
K.O. |
17 May 2022, Razvan Stefan Gornea, Info, MIDAS switched to C++
|
Hi, I have three naive questions about this:
- have you posted somewhere this guide about converting C frontends to C++?
- it was mentioned previously that there will be a 'tag the last "C" midas', which version is it?
- it means that even a simple example like odb_test.c cannot be compiled anymore? Even when using g++?
Something like
g++ -I $HOME/daq/packages/midas/include/ -L $HOME/daq/packages/midas/lib/ odb_test.c -l midas
is expected to fail or is it just me glitching? Is it because of thread library differences?
Thanks!
> The last bits of code to switch MIDAS to C++ have been committed, see tag midas-2019-05-cxx.
>
> Since the cmake conversion is still in progress, for now, I recommend using the old "make" build for trying this update.
>
> From the switch to C++, the biggest change is the requirement that frontend programs be built and linked
> using the C++ compiler. Since mfe.o and the rest of MIDAS are built with C++, building frontends
> with C is no longer possible.
>
> To help with this, I will post a short guide for converting C frontends to C++.
>
> K.O. |
17 May 2022, Konstantin Olchanski, Info, MIDAS switched to C++
|
> Hi, I have three naive questions about this:
all good questions, ask more of them.
> - have you posted somewhere this guide about converting C frontends to C++?
yes, in this elog here I posted a guide for converting C mfe.c frontends to C++ and
a guide for converting mfe.c frontend to C++ TMFE frontend. please use the "find" function,
if you cannot find them, let me know, I will look for it for you.
> - it was mentioned previously that there will be a 'tag the last "C" midas', which version is it?
correct. please run "git tag": tags before "midas-2019-05-cxx" are "C", tags after it are "C++".
> - it means that even a simple example like odb_test.c cannot be compiled anymore? Even when using g++?
> g++ -I $HOME/daq/packages/midas/include/ -L $HOME/daq/packages/midas/lib/ odb_test.c -l midas
> is expected to fail or is it just me glitching? Is it because of thread library differences?
yes, it is expected to fail, you have spaces after "-I", "-L" and "-l", incorrect g++ command syntax. after
correcting this, it may or may not work depending on what you have inside odb_test.c. I would be happy
to help you debug this, but please start a separate thread instead of necroposting into the C++ announcements.
K.O. |
17 May 2022, Ben Smith, Info, MIDAS switched to C++
|
> - have you posted somewhere this guide about converting C frontends to C++?
There's documentation in the wiki at:
https://daq00.triumf.ca/MidasWiki/index.php/Changelog#2019-06
It includes a step-by-step guide of how to upgrade, what changes need to be made to frontends, and common issues that people had. |
08 May 2022, Stefan Ritt, Info, RO_STOPPED with triggered events
|
We had issues in one of our experiments where people used RO_STOPPED in the
equipment list together with triggered events (EQ_USER). If events are sent when
a run is stopped, this leads to many unexpected results, so I added a check in
the mfe.cxx code which prevents RO_STOPPED (or RO_ALWAYS which includes
RO_STOPPED) together with EQ_TRIGGERED, EQ_INTERRUPT, EQ_MULTITHREAD and EQ_USER
type of events.
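A minimal sketch of such a check follows. It is not the actual mfe.cxx code. The flag names follow this post, but the bit values below are placeholders chosen for the example (the real definitions are in midas.h and may differ), and the function name is hypothetical:

```cpp
#include <cstdint>

namespace sketch {

// Placeholder bit values, chosen only for this example.
// The real EQ_xxx and RO_xxx definitions are in midas.h and may differ.
constexpr std::uint32_t EQ_PERIODIC    = 1u << 0;
constexpr std::uint32_t EQ_TRIGGERED   = 1u << 1;
constexpr std::uint32_t EQ_INTERRUPT   = 1u << 2;
constexpr std::uint32_t EQ_MULTITHREAD = 1u << 3;
constexpr std::uint32_t EQ_USER        = 1u << 4;

constexpr std::uint32_t RO_RUNNING = 1u << 0;
constexpr std::uint32_t RO_STOPPED = 1u << 1;
constexpr std::uint32_t RO_ALWAYS  = 0xFF; // includes RO_STOPPED

// True if a triggered-type equipment asks to be read out while the run is stopped.
constexpr bool reads_triggered_events_when_stopped(std::uint32_t eq_type,
                                                   std::uint32_t read_on) {
    return (eq_type & (EQ_TRIGGERED | EQ_INTERRUPT | EQ_MULTITHREAD | EQ_USER)) != 0
        && (read_on & RO_STOPPED) != 0;
}

} // namespace sketch
```

A frontend framework could call this once per equipment at startup and then either refuse to start or just report an error.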
I have now received complaints that some old frontends no longer run, since they
do use RO_ALWAYS together with triggered events. Can the authors of these frontends
please tell me the rationale why this is needed, then I can maybe add a better
fix for that.
Stefan |
08 May 2022, Konstantin Olchanski, Info, RO_STOPPED with triggered events
|
> some old front-end are not running any more since they do use RO_ALWAYS together with
> triggered events.
I confirm, if you have mfe.c frontends that have RO_ALWAYS, after you update MIDAS,
some of these frontends will fail to start.
https://bitbucket.org/tmidas/midas/commits/1961af0d657e4f76ab9db17f9b70c0c492172b6d
tmfe c++ frontends do not have this restriction but by default only read data when run
is active (per-equipment fEqConfReadOnlyWhenRunning default is true).
K.O. |
16 May 2022, Konstantin Olchanski, Info, RO_STOPPED with triggered events
|
> > some old front-end are not running any more since they do use RO_ALWAYS together with
> triggered events.
>
> I confirm, if you have mfe.c frontends that have RO_ALWAYS, after you update MIDAS,
> some of these frontends will fail to start.
> https://bitbucket.org/tmidas/midas/commits/1961af0d657e4f76ab9db17f9b70c0c492172b6d
>
> tmfe c++ frontends do not have this restriction but by default only read data when run
> is active (per-equipment fEqConfReadOnlyWhenRunning default is true).
As of commit
https://bitbucket.org/tmidas/midas/commits/28d9c96bd6d4f65346ebcd6a04492ea764c90823 mfe.c
frontends will no longer fail to start. an error will still be issued "Equipment \"%s\"
contains RO_STOPPED or RO_ALWAYS. This can lead to undesired side-effect and should be
removed."
BTW 1:
Some of our old frontends use EQ_MULTITHREAD to implement multithreaded periodic equipments.
They do not generate any events when there is no run (some of them do not generate any
events at all). Now they will start printing this error message, for no reason. (no we will
not be rewriting them just to get rid of this message. life is too short).
BTW 2:
the c++ tmfe frontend does not have any protections against these "undesired side-effects".
What are these undesired side effects and should we add protection against them?
K.O. |
17 May 2022, Stefan Ritt, Info, RO_STOPPED with triggered events
|
> > > some old front-end are not running any more since they do use RO_ALWAYS together with
> > triggered events.
> >
> > I confirm, if you have mfe.c frontends that have RO_ALWAYS, after you update MIDAS,
> > some of these frontends will fail to start.
> > https://bitbucket.org/tmidas/midas/commits/1961af0d657e4f76ab9db17f9b70c0c492172b6d
> >
> > tmfe c++ frontends do not have this restriction but by default only read data when run
> > is active (per-equipment fEqConfReadOnlyWhenRunning default is true).
>
> As of commit
> https://bitbucket.org/tmidas/midas/commits/28d9c96bd6d4f65346ebcd6a04492ea764c90823 mfe.c
> frontends will no longer fail to start. an error will still be issued "Equipment \"%s\"
> contains RO_STOPPED or RO_ALWAYS. This can lead to undesired side-effect and should be
> removed."
>
> BTW 1:
>
> Some of our old frontends use EQ_MULTITHREAD to implement multithreaded periodic equipments.
> They do not generate any events when there is no run (some of them do not generate any
> events at all). Now they will start printing this error message, for no reason. (no we will
> not be rewriting them just to get rid of this message. life is too short).
>
> BTW 2:
>
> the c++ tmfe frontend does not have any protections against these "undesired side-effects".
>
> What are these undesired side effects and should we add protection against them?
>
> K.O.
The undesired side-effects are the following: The logger tries to collect all events at the end of
the run by emptying the SYSTEM buffer. If events keep coming after the run is stopped, this loop in
the logger might be an endless loop, crashing the whole experiment in the end.
Another issue (and actually the reason for this change) is the function receive_trigger_event() in
mfe.cxx which will get confused if events are still coming in after a run has been stopped and
actually enters an infinite loop.
Combining EQ_MULTITHREAD with EQ_PERIODIC or EQ_SLOW is a wrong parameter combination as written in
the documentation. If one wants to have multi-threaded slow control events, one has to use the
DF_MULTITHREAD flag in the DEVICE_DRIVER structure.
Having triggered events being sent to the system after a run has been stopped I would consider
simply wrong. Why should we ever use a run start/stop if events are always flowing? Adding
protections in all places for this case is certainly much more work than just changing one flag for
frontends which produce this error message now for a wrong parameter combination. |
24 Apr 2022, Konstantin Olchanski, Bug Fix, mserver buffer overrun and crash
|
There is a memory allocation bug in the mserver.
ALIGN8() was missing when receiving events from the event socket and data buffer
was allocated 4 bytes too short. but only for some received events and only in
very unlucky sequence of received events. result was a rare but obnoxious crash
of fevme frontend in alpha-2 at CERN. (we do not see any crash from this in
alpha-g or anywhere else, the best I can tell).
fixed in commit 4dc06ba47ff7caa5251fd8c48d8533f35799f3a6.
If you use the mserver, please update to this commit or apply following patch in
midas.cxx:
- int bufsize = sizeof(INT) + event_size;
+ int bufsize = sizeof(INT) + total_size;
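A minimal sketch of how a 4-byte shortfall can arise for only some event sizes. This is an illustration, not the mserver code: align8() is assumed to do what ALIGN8() does (round up to a multiple of 8), std::int32_t stands in for INT, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Round x up to the next multiple of 8 (assumed equivalent of ALIGN8()).
constexpr std::size_t align8(std::size_t x) {
    return (x + 7) & ~static_cast<std::size_t>(7);
}

// Allocation computed from the raw event size (the "-" line of the patch).
constexpr std::size_t bufsize_unpadded(std::size_t event_size) {
    return sizeof(std::int32_t) + event_size;
}

// Allocation computed from the 8-byte-aligned size (the "+" line of the patch).
constexpr std::size_t bufsize_padded(std::size_t event_size) {
    return sizeof(std::int32_t) + align8(event_size);
}

// A 20-byte event is padded to 24 bytes, so the unpadded allocation is 4 bytes short.
static_assert(bufsize_padded(20) - bufsize_unpadded(20) == 4, "short by 4 bytes");
// A 24-byte event is already aligned, so both computations agree.
static_assert(bufsize_padded(24) == bufsize_unpadded(24), "no difference");
```

Only event sizes that are not already a multiple of 8 are affected, which fits a crash that shows up rarely.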
K.O. |
16 May 2022, Konstantin Olchanski, Bug Fix, mserver buffer overrun and crash
|
> There is a memory allocation bug in the mserver.
Fix for this problem introduced a new problem, an infinite loop in bm_flush_cache,
bitbucket bugs https://bitbucket.org/tmidas/midas/issues/339/infinite-loop-in-mserver-due-to-mfes
and https://bitbucket.org/tmidas/midas/issues/331/stuck-semaphore-of-system-buffer
This is now fixed and the buffer write cache logic and size were rejigged
according to calculations in https://daq00.triumf.ca/elog-midas/Midas/2401
Event buffer write cache (as set via ODB Equipment/Common and via
bm_set_cache_size()) now takes 2 possible values:
0 - write cache is disabled and
MIN_WRITE_CACHE_SIZE - (10 Mbytes) minimum permitted cache size
bigger cache size values are permitted, up to buffer_size/3, but probably not useful
if my calculations are right.
smaller cache size values are generally not useful, if my calculations are right.
mfe.c and tmfe c++ frontends updated to request the new write cache size by default.
if events are getting stuck in the write cache for too long, instead of reducing the
cache size, one should increase frequency of bm_flush_cache() calls (1/sec by
default).
commit 373bcc3ab7f83c3c7bf6c051c237de043a982502
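A minimal sketch of the size rule described above; it is not the code from this commit. The function name and the constant kMinWriteCacheSize are hypothetical, the exact byte value of "10 Mbytes" is an assumption, and so is the choice to round small non-zero requests up to the minimum:

```cpp
#include <cstddef>

// Assumed value for "10 Mbytes"; the real MIN_WRITE_CACHE_SIZE is defined in MIDAS.
constexpr std::size_t kMinWriteCacheSize = 10 * 1000 * 1000;

// Hypothetical helper: map a requested write cache size to the size actually used.
std::size_t effective_write_cache_size(std::size_t requested, std::size_t buffer_size) {
    if (requested == 0)
        return 0;                               // 0 means the write cache is disabled
    const std::size_t max_size = buffer_size / 3; // upper limit from the post
    if (requested < kMinWriteCacheSize)
        requested = kMinWriteCacheSize;         // assumption: small requests are rounded up
    if (requested > max_size)
        requested = max_size;                   // never more than buffer_size/3
    return requested;
}
```

With the default 32 MiByte buffer, the upper limit is about 11 Mbytes, so in practice the cache is either off or very close to the minimum size.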
K.O. |
13 May 2022, Konstantin Olchanski, Info, analysis of corner cases in event buffer write cache
|
introduction:
to remember, bm_send_event() writes an event to the write cache, bm_flush_cache()
writes the contents of the write cache into the shared memory event buffer, buffer
free space is consumed. in the usual case, mlogger is reading events from the shared
memory event buffer, buffer free space is released. there is also a read cache, not
part of this discussion.
the purpose of the write cache is to reduce contention for the shared memory
semaphore. in the case of large number of small events, semaphore is locked per
cache-flush, instead of per-event. correct tuning of write cache and event size can
reduce lock rate from >100 kHz to around 100 Hz or lower.
analysis:
for correct operation of bm_send_event() under all conditions we need to consider
all corner cases:
1) no write cache: (cache size set to 0)
- event_size > buffer_size -> reject the event (obviously)
- event_size > 0.5 * buffer_size -> only 1 event fits into the buffer, next write
will stall until mlogger reads the previous event (sequential operation, bad)
- event_size < 0.3 * buffer_size -> at least 2 events fit into the buffer (good)
decision: limit event size to 0.5 to 0.3 * buffer_size (current limit is 0.5 *
buffer_size, I think).
consequence: buffer size limit is 2 Gbytes (32-bit byte offsets, code is only 31-
bit-clean), max event size is between 1 Gbytes and 0.6 Gbytes.
2) writing to write cache:
- event_size > cache_size -> flush cache, write event directly to buffer
- event_size > 0.5 * cache_size -> inefficient use of cache: write to cache, next
event does not fit, flush to buffer, repeat. no gain in semaphore locking (bad), one
additional memcpy() (event to cache and cache to buffer) (bad)
- event_size < 0.3 * cache_size -> multiple events fit into cache, but probably no
gain in semaphore locking
decision: events that are bigger than 0.3 to 0.1 * cache_size should not go through
the cache. (flush cache, write directly to buffer).
3) flush write cache to buffer:
- cache_size > buffer_size -> cannot flush in 1 operation, must have a loop and
flush the cache in pieces
- cache_size between 0.5 and 1.0 * buffer_size -> can flush in 1 operation, but must
wait for mlogger to fully empty the buffer (sequential operation, bad)
- cache size < 0.3 * buffer_size -> can flush in 1 operation, at least 2 "flushes"
fit inside the buffer (good)
decision: limit write cache size to 0.3 * buffer_size. (current limit is
0.25*buffer_size).
consequences:
- write cache size limit is 0.3..0.25 * 2GB = 0.6..0.5 Gbytes
- cached event size limit is 0.3..0.1 * 0.5 GBytes = 150..50 Mbytes
- minimum number of cached events: 3 to 10
- semaphore locks reduced: 3 to 10 locks become 1 lock (all events cached),
4 to 11 locks become 2 locks (big event causes cache flush).
4) complications:
- there is a periodic 1/second bm_flush_cache() that flushes the cache early and
reduces its efficiency (but needed to avoid having data stuck in cache for long
time)
- if multiple frontends use large write cache (~ 0.3..0.5 * buffer_size), again,
sequential operation can happen (bad)
- write cache is per-frontend, not per-equipment. if different equipments request
different cache sizes, mfe.c and tmfe c++ frontends complain about this, but the
user has to sort it out.
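The decisions in (1) and (2) can be put together in a minimal sketch. This is not the bm_send_event() code: the enum and function names are hypothetical, and where the analysis gives a range (0.5..0.3 and 0.3..0.1) the sketch simply picks 0.5 for the buffer limit and 0.3 for the cache limit.

```cpp
#include <cstddef>

enum class Route { Reject, DirectToBuffer, ViaCache };

// Hypothetical helper: decide what to do with one event.
Route route_event(std::size_t event_size, std::size_t cache_size, std::size_t buffer_size) {
    // (1) events bigger than 0.5 * buffer_size are rejected
    if (2 * event_size > buffer_size)
        return Route::Reject;
    // no write cache: every event goes straight to the shared memory buffer
    if (cache_size == 0)
        return Route::DirectToBuffer;
    // (2) events bigger than 0.3 * cache_size bypass the cache
    //     (the caller flushes the cache first, then writes directly)
    if (10 * event_size > 3 * cache_size)
        return Route::DirectToBuffer;
    return Route::ViaCache;
}
```

For example, with a 32 MiByte buffer and a 10 Mbyte cache, a 1 kbyte event goes through the cache, a 4 MiByte event is written directly, and a 20 MiByte event is rejected.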
K.O. |
16 May 2022, Konstantin Olchanski, Info, analysis of corner cases in event buffer write cache
|
> for correct operation of bm_send_event() under all conditions we need to ...
to continue computation from last message:
default SYSTEM buffer size: 32 MiBytes
default max event size: 4 MiBytes
hard max buffer size: 2 Gbytes (code is only 31-bit-clean)
hard max event size: 2 Gbytes (code is only 31-bit-clean)
max event size currently: 32 Mbytes (same as buffer size)
max event size per (1) in previous post: 32*0.5..0.3 = 16..9 MiBytes
number of default-max-size events buffered: 32/4 = 8.
number of per (1) max-size events buffered: 2 or 3
number of current max-size events buffered: 0 (bad, frontend is serialized with mlogger)
default write cache size: 100 kbytes
max write cache size currently: buffer size / 4 = 32/4 = 8 MiBytes
max write cache size per (3) in previous post: buffer_size / 3 = 10 Mbytes
hard max write cache size per (3): 2 Gbytes/3 = 600 Mbytes
max size of cached events:
current: 100 kbytes (same as cache size)
per (2) in previous post: 0.1..0.3 * cache size = 10..30 kbytes
per (2), 1 Mbyte cache: 0.1..0.3 * cache size = 100..300 kbytes
hard max size: 0.1..0.3 * hard_max_cache_size = 0.1..0.3 * 600 = 60..180 Mbytes.
max data rate before event buffer semaphore locking rate exceeds 100 Hz:
1 kbyte events, no write cache: 100 kbytes/sec
1 kbyte events, 100 kbyte cache: 100 events cached, cache flush rate 100 Hz -> 100*1kbyte*100Hz -> 10 Mbytes/sec
1 kbyte events, 1 Mbyte cache: 1000 events cached, cache flush rate 100 Hz -> 100 Mbytes/sec (1gige ethernet)
N kbyte events, 1 Mbyte cache: same thing (data rate is limited by cache flush rate 100 Hz)
100 kbyte events, 1 Mbyte cache, not cached per (2): 100kbyte*100Hz = 10 Mbytes/sec
300 kbyte events, 1 Mbyte cache, not cached per (2): 300kbyte*100Hz = 30 Mbytes/sec
N00 kbyte events: N0 Mbytes/sec (500->50, etc)
1 kbyte events, 10 Mbyte cache: 10000 events cached, cache flush rate 100 Hz -> 1000 Mbytes/sec (10gige ethernet)
N kbyte events, 10 Mbyte cache: same thing (data rate is limited by cache flush rate 100 Hz)
1000 kbyte events, 10 Mbyte cache, not cached per (2): 1000kbyte*100Hz = 100 Mbytes/sec
3000 kbyte events, 10 Mbyte cache, not cached per (2): 3000kbyte*100Hz = 300 Mbytes/sec
N000 kbyte events: N00 Mbytes/sec (4000->400, 5000->500, etc)
default max event size: 4 Mibytes*100Hz = 400 Mbytes/sec (exceeds 1gige ethernet)
hard max event size (divided by 10 to buffer 10 events): 200 Mbytes*100Hz -> 20 Gbytes/sec
max event rate before event buffer semaphore locking rate exceeds 100 Hz:
1 kbyte events, no write cache: 100 Hz (obviously)
1 kbyte events, 100 kbyte cache: 100 events cached, cache flush rate 100 Hz -> 10 kHz
1 kbyte events, 1 Mbyte cache: 1000 events cached, cache flush rate 100 Hz -> 100 kHz
N kbyte events, 1 Mbyte cache: 1000/N events cached, cache flush rate 100 Hz -> 100/N kHz
1 kbyte events, 10 Mbyte cache: 10000 events cached, cache flush rate 100 Hz -> 1000 kHz
N kbyte events, 10 Mbyte cache: 10000/N events cached, cache flush rate 100 Hz -> 1000/N kHz
100 kbyte events, not cached per (2): 100 Hz (obviously)
300 kbyte events, not cached per (2): 100 Hz (obviously)
default max event size: 100 Hz (obviously)
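the numbers above can be reproduced with a small model. this is only a sketch under stated assumptions, not code from midas: it assumes one semaphore lock per write cache flush for cached events, one lock per event for events that bypass the cache, and it takes the lower bound of the 0.1..0.3 range from (2) as the "not cached" threshold. the function names and the kbyte units are made up for illustration.

```python
LOCK_RATE_HZ = 100  # max semaphore locking rate assumed in the post


def cached_event_size_range(cache_kb):
    """Proposal (2): events larger than 0.1..0.3 * cache size bypass the cache."""
    return cache_kb // 10, cache_kb * 3 // 10


def max_rates(event_kb, cache_kb, max_cached_event_kb):
    """Return (max event rate in Hz, max data rate in kbytes/sec).

    Assumption: cached events cost one lock per cache flush,
    uncached events cost one lock per event.
    """
    cached = cache_kb > 0 and event_kb < max_cached_event_kb
    events_per_lock = cache_kb // event_kb if cached else 1
    event_rate_hz = events_per_lock * LOCK_RATE_HZ
    return event_rate_hz, event_rate_hz * event_kb


# 1 Mbyte cache: events of 100 kbytes and above are treated as not cached
threshold_kb, _ = cached_event_size_range(1000)
print(max_rates(1, 1000, threshold_kb))    # (100000, 100000): 100 kHz, 100 Mbytes/sec
print(max_rates(100, 1000, threshold_kb))  # (100, 10000): 100 Hz, 10 Mbytes/sec
```

for cached events the data rate is set by cache size times flush rate, for uncached events it is set by event size times locking rate, which is why the tables above split at the (2) threshold.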
K.O. |
16 May 2022, Konstantin Olchanski, Info, analysis of corner cases in event buffer write cache
|
> > for correct operation of bm_send_event() under all conditions we need to ...
> to continue computation from last message:
if I got my numbers right, for present-day hardware (1gige/10gige data rates, 100 Hz max locking rate), we should
increase the default buffer write cache size from 100 kbytes to 10 Mbytes.
this cache size will permit processing of the full mix of small/big events
at the full mix of event rates without exceeding the 100 Hz semaphore locking rate.
with the 10 Mbyte write cache, default event buffer size should be 30-40 Mbytes (current size is 33 Mbytes, so does
not need to change).
this computation is for 1 writer (1 reader, mlogger). it is a typical case for our experiments.
multiple writers can run into contention for event buffer space.
consider 10 writers want to flush their 10 Mbyte write cache all at the same time:
if buffer size is the default 33 Mbytes, the first 3 writers will have successful write cache flush,
but the other 7 will stall, there is no space in the buffer, we have to wait for mlogger to free
some (mlogger writing X Mbytes/sec will take Y milliseconds to liberate 10 Mbytes of space for the 4th writer
to successfully flush, writers 5..10 are still stalled).
but a system with 10 writers, each writing at 10 Mbytes/sec (1 Hz default cache flush rate), 100 Mbytes/sec in total,
will likely have a SYSTEM buffer size of at least 200-300 Mbytes (to buffer 1-2 seconds of data against
any delays in writing to disk/network storage).
so there should be no problem in practice.
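the 10-writer case can be sketched as below. this is a simplified model, not midas code: it assumes the buffer starts empty, all writers flush a full cache at the same instant, and mlogger drains the buffer at a constant rate. the function names are made up, and the 100 Mbytes/sec drain rate in the usage lines is a hypothetical value for X, not a measurement.

```python
def flush_outcome(buffer_mb, cache_mb, n_writers):
    """Return (writers that flush at once, writers that stall) for an empty buffer."""
    fit = min(n_writers, buffer_mb // cache_mb)
    return fit, n_writers - fit


def wait_ms_for_space(cache_mb, drain_mb_per_sec):
    """Time for the reader to free enough space for one more cache flush."""
    return 1000 * cache_mb / drain_mb_per_sec


print(flush_outcome(33, 10, 10))   # (3, 7): 3 writers succeed, 7 stall
print(wait_ms_for_space(10, 100))  # 100.0 ms at a hypothetical 100 Mbytes/sec drain rate
```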
K.O. |
06 May 2022, Stefan Ritt, Info, Increased timeout for program shut down
|
We had the problem in our lab that a frontend took about 6 seconds to gracefully
shut down, mainly it needed to park some motors. I found that the shutdown command
had a hard-coded timeout of 5 seconds, after which the frontend gets killed, and
cannot finish the park operation. I changed the code so that the client timeout
stored in the ODB is taken instead of the hard-coded 5 seconds. This allows each
client to fine-tune its timeout, to allow graceful shutdown, but also not let the
user wait too long if the client gets stuck and needs a hard kill.
The default timeout for mfe.cxx based frontends has been changed to 10 seconds
now, but in the frontend_init function this can be changed by the user code
easily.
I hope this change does not trigger any bad side effects, but if it does, please
report here.
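The effect of the change can be illustrated with a toy model. This is only a sketch, not the MIDAS shutdown code: the function and the client name "motor_fe" are hypothetical, and the dictionary merely stands in for the per-client timeout stored in the ODB.

```python
OLD_HARDCODED_TIMEOUT_S = 5

# stand-in for the per-client timeout in the ODB (hypothetical client name);
# 10 s is the new default for mfe.cxx based frontends
client_timeout_s = {"motor_fe": 10}


def shutdown_outcome(time_needed_s, timeout_s):
    """Client exits cleanly if it finishes within the timeout, otherwise it is killed."""
    return "clean exit" if time_needed_s <= timeout_s else "killed"


# frontend that needs about 6 seconds to park its motors
print(shutdown_outcome(6, OLD_HARDCODED_TIMEOUT_S))       # killed
print(shutdown_outcome(6, client_timeout_s["motor_fe"]))  # clean exit
```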
Stefan |
|