ID |
Date |
Author |
Topic |
Subject |
2283
|
11 Oct 2021 |
Stefan Ritt | Info | Modification in the history logging system | A requested change in the history logging system has been made today. Previously, history values were
logged with a maximum frequency (usually once per second) but also with a minimum frequency, meaning
that values were logged for example every 60 seconds, even if they did not change. This causes a problem:
if a frontend that produces the logged variables is inactive or has crashed, one cannot distinguish between
a crashed or inactive frontend program and a history value which simply did not change much over time.
The history system was designed from the beginning in a way that values are only logged when they actually
change. This design pattern has been broken since about spring 2021, see for example this issue:
https://bitbucket.org/tmidas/midas/issues/305/log_history_periodic-doesnt-account-for
Today I modified the history code to fix this issue. History logging is now controlled by the value of
common/Log history in the following way:
* Common/Log history = 0 means no history logging
* Common/Log history = 1 means log whenever the value changes in the ODB
* Common/Log history = N means log whenever the value changes in the ODB and
the previous write was more than N seconds ago
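The three cases above can be sketched as a small decision function. This is only an illustration of the rule as described here, not the actual midas history code; the name should_log_history and its parameters are made up for this example.

```cpp
#include <ctime>

// Hypothetical helper (not part of midas): decide whether a history
// value should be written, following the "Common/Log history" rule.
//   log_history   - value of Common/Log history (0, 1 or N seconds)
//   value_changed - true if the value changed in the ODB
//   now           - current time in seconds
//   last_write    - time of the previous history write in seconds
bool should_log_history(int log_history, bool value_changed,
                        std::time_t now, std::time_t last_write)
{
   if (log_history <= 0)
      return false;               // 0: no history logging
   if (!value_changed)
      return false;               // unchanged values are never logged
   if (log_history == 1)
      return true;                // 1: log on every change
   // N: log only if the previous write was more than N seconds ago
   return (now - last_write) > log_history;
}
```

With log_history = 10, a change 5 seconds after the last write is skipped, while a change 11 seconds after it is logged.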
So most experiments should be happy with 0 or 1. Only experiments which have fluctuating values due to noisy
sensors might benefit from a value larger than 1 to limit the history logging. However, this is not the preferred
way to limit history logging. This should be done by the front-end limiting the updates to the ODB. Most of the
midas slow control drivers have a “threshold” value. Only if the input changes by more than the threshold is the
value written to the ODB. This allows a per-channel “dead band” rather than a per-event limit on history logging
as ‘log history’ would impose. In addition, the threshold reduces the write accesses to the ODB, although that is
only important for very large experiments.
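A minimal sketch of such a per-channel dead band, assuming a simple structure that remembers the last value written. The names Channel and update_channel are hypothetical and do not come from the midas drivers; the assignment stands in for the real ODB write.

```cpp
#include <cmath>

// Hypothetical per-channel state (not the midas driver structure).
struct Channel {
   double threshold;     // dead band for this channel
   double last_written;  // last value that was pushed to the ODB
};

// Returns true if the measured value differs from the last written value
// by more than the threshold, in which case it is "written" (here: stored).
bool update_channel(Channel& ch, double measured)
{
   if (std::fabs(measured - ch.last_written) > ch.threshold) {
      ch.last_written = measured;  // stand-in for the real ODB write
      return true;
   }
   return false;                   // inside the dead band: no ODB write
}
```

Since each channel carries its own threshold, a noisy channel can be damped without delaying updates of the other channels in the same event.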
Stefan |
2282
|
30 Sep 2021 |
Francesco Renga | Forum | OPC client within MIDAS | Dear all,
I need to integrate in my MIDAS project the communication with an OPC UA
server. My plan is to develop an OPC UA client as a "device" in
midas/drivers/device.
Two questions:
1) Is anybody aware of some similar effort for some other project, so that I can
get some example?
2) What could be the more appropriate driver's class to be used? generic.cxx?
multi.cxx?
Thank you for your help,
Francesco |
2281
|
29 Sep 2021 |
Stefan Ritt | Bug Report | Install clash between MIDAS 2020-08 and mscb | > Thank you, Stefan.
>
> I found these instructions under
> 1) The changelog: https://midas.triumf.ca/MidasWiki/index.php/Changelog#2020-12
> 2) Konstantin's elog announcements (e.g. https://midas.triumf.ca/elog/Midas/2089)
>
> I do see reference to updating the submodules under the TRIUMF install
> instructions
> (https://midas.triumf.ca/MidasWiki/index.php/Setup_MIDAS_experiment_at_TRIUMF#Install_MIDAS)
> although perhaps it can be clarified.
>
> Cheers,
> Richard
Hi Richard,
I updated the documentation at
https://midas.triumf.ca/MidasWiki/index.php/Changelog#Updating_midas
by putting the submodule update command everywhere.
Best,
Stefan |
2280
|
29 Sep 2021 |
Richard Longland | Bug Report | Install clash between MIDAS 2020-08 and mscb | Thank you, Stefan.
I found these instructions under
1) The changelog: https://midas.triumf.ca/MidasWiki/index.php/Changelog#2020-12
2) Konstantin's elog announcements (e.g. https://midas.triumf.ca/elog/Midas/2089)
I do see reference to updating the submodules under the TRIUMF install
instructions
(https://midas.triumf.ca/MidasWiki/index.php/Setup_MIDAS_experiment_at_TRIUMF#Install_MIDAS)
although perhaps it can be clarified.
Cheers,
Richard |
2279
|
28 Sep 2021 |
Stefan Ritt | Bug Report | Install clash between MIDAS 2020-08 and mscb | > 1) git clone https://bitbucket.org/tmidas/midas --recursive
> 2) cd midas
> 3) git checkout release/midas-2020-08
> 4) mkdir build
> 5) cd build
> 6) cmake ..
> 7) make
When you do step 3), you get
~/tmp/midas$ git checkout release/midas-2020-08
warning: unable to rmdir 'manalyzer': Directory not empty
warning: unable to rmdir 'midasio': Directory not empty
M mjson
M mscb
M mvodb
M mxml
The 'M' in front of the submodules like mscb tells you that you
have an older version of midas (namely midas-2020-08), but the
*current* submodules, which won't match. So you have to roll back
also the submodules with:
3.5) git submodule update --recursive
This fetches those versions of the submodules which match the
midas version 2020-08. See here for details:
https://git-scm.com/book/en/v2/Git-Tools-Submodules
From where did you get the command
git checkout release/xxxx ???
If you tell me the location of that documentation, I will take
care that it will be amended with the command
git submodule update --recursive
Best,
Stefan |
2278
|
28 Sep 2021 |
Richard Longland | Bug Report | Install clash between MIDAS 2020-08 and mscb | All,
I am performing a fresh install of MIDAS on an Ubuntu linux box. I follow the
usual installation procedure:
1) git clone https://bitbucket.org/tmidas/midas --recursive
2) cd midas
3) git checkout release/midas-2020-08
4) mkdir build
5) cd build
6) cmake ..
7) make
Step 3 warns me that
"warning: unable to rmdir 'manalyzer': Directory not empty" and
"warning: unable to rmdir 'midasio': Directory not empty"
Step 7 fails.
Compilation fails with an mhttpd error related to mscb:
mhttpd.cxx:8224:59: error: too few arguments to function 'int mscb_ping(int,
short unsigned int, int, int)'
8224 | status = mscb_ping(fd, (unsigned short) ind, 1);
I was able to get around this by rolling mscb back to some old version (commit
74468dd), but am extremely nervous about mix-and-matching the code this way.
Any advice would be greatly appreciated.
Cheers,
Richard |
2277
|
19 Sep 2021 |
Stefan Ritt | Bug Fix | Chat working again | Not sure how many people are using it, but the Chat facility in midas was broken
for some time now and got fixed today again.
Just for your information: Chat can be used like WhatsApp & Co, and connects all
people who access a midas experiment through their browser. It's good to
communicate between shift crew members located at different places. One advantage
is that the chat messages can get 'spoken' by the text-to-speech engine of your
browser, so it can be used to "wake up" shifters. Can be configured through the
"Config" page.
Stefan |
Attachment 1: Screenshot_2021-09-19_at_21.27.19_.png
|
|
2276
|
17 Sep 2021 |
Stefan Ritt | Forum | mhttpd crash | To limit the impact of the numerous crashes of mhttpd, I installed the monit tool at MEG at PSI
(https://en.wikipedia.org/wiki/Monit). It monitors mhttpd, and if it cannot connect to it for a certain
time, it kills the process and restarts it. This covers endless loops, simple crashes (caused by the
known multi-threading issue in mongoose), and also cases where mhttpd develops a memory leak and becomes
unresponsive.
To configure monit for mhttpd, first install the package, make sure the daemon gets started automatically
after reboot (typically "systemctl enable monit"), and put the attached file into
/etc/monit.d/mhttpd
You have to adjust the <path-to-midas> according to your midas installation, and probably also the port
under which mhttpd is listening (8082 in my case). Put
set daemon 10
into /etc/monitrc if you want monit to check mhttpd every 10 seconds (default is 30 seconds). Then, every
10 seconds monit requests "midas.css" from mhttpd, and if it cannot obtain it after 30 seconds, it kills
mhttpd and restarts it.
Loading long history plots taking more than 30 seconds should probably not be an issue since mhttpd is
multi-threaded, but I haven't tested this in detail.
Attached below is a typical status page produced by monit, which has its own built-in web server (normally
listening at port 2812, accessible only from localhost by default).
I hope this helps some of you.
Stefan |
Attachment 1: mhttpd
|
check process mhttpd matching "mhttpd"
  start program = "/bin/su -l meg -c '/<path-to-midas>/bin/mhttpd -D'"
  stop program = "/usr/bin/killall mhttpd"
  if failed
    host 127.0.0.1
    port 8082
    protocol http
    method GET
    request "/midas.css"
    with timeout 30 seconds
  then restart
|
Attachment 2: Screenshot_2021-09-17_at_21.11.15_.png
|
|
2275
|
07 Sep 2021 |
Andreas Suter | Forum | mhttpd crash | Dear Konstantin,
thanks for the prompt response, this helps a lot!
> 1) If you see a way to replicate this crash, or some way to reliably cause
> the crash within 5-10 minutes after starting mhttpd, please let me know. I can work with that
> and I wish to fix this problem very much.
I wish I could! This happens 3-4 times per year only, so close to impossible to trigger.
> 2) My "wrong socket" check calls abort() to produce a core dump. In my experience these core dumps
> are useless for debugging the present problem. There is just no way to examine the state of each
> thread and of each http request using gdb by hand.
>
> 3) this abort() causes linux to write a core dump, this takes a long time and I think it causes
> other MIDAS program to stop, timeout and die. You can try to fix this by disabling core dumps (set "enable core dumps"
> to "false" in ODB and set core dump size limit to 0), or change abort() to exit(). (You can also disable
> the "wrong socket" check, but most likely you will not like the result).
>
I have now changed abort() to exit() on the production machine. Perhaps this should be the default?
Andreas |
2274
|
06 Sep 2021 |
Konstantin Olchanski | Forum | mhttpd crash | > [mhttpd,ERROR] [mhttpd.cxx:18886:on_work_complete,ERROR] Should not send response to request from socket 28 to socket 26, abort!
> Can anybody hint me what is going wrong here?
> The bad thing about the crash is that it sometimes leads to a "chain-reaction" killing multiple midas frontends, which essentially stops the experiment.
This is my code. I am the culprit. I had a bit of discussion about this with Stefan.
Bottom line is something is rotten in the multithreading code inside mhttpd and under conditions unknown,
it sends the wrong data into the wrong socket. This causes midas web pages to be really confused (RPC replies
processed as CSS file, HTML code processed as RPC replies, a mess), this wrong data is cached by the browser,
so restarting mhttpd does not fix the web pages. So a mess.
I find this is impossible to replicate, and so cannot debug it, cannot fix it. Best I was able to do
is to add a check for socket numbers, and thankfully it catches the condition before web browser caches
become poisoned. So, broken web pages replaced by mhttpd crash.
This situation reinforces my opinion that multi-threading and C++ classes "do not mix" (like H2 and O2 do not mix).
If you write a multithreaded C++ program and it works, good for you, if there is a malfunction, good luck with it,
C++ just does not have any built-in support for debugging typical multithreading problems. I think others have come
to the same conclusion and invented all these new "safe" programming languages, like Rust and Go.
Back to your troubles.
1) If you see a way to replicate this crash, or some way to reliably cause
the crash within 5-10 minutes after starting mhttpd, please let me know. I can work with that
and I wish to fix this problem very much.
2) My "wrong socket" check calls abort() to produce a core dump. In my experience these core dumps
are useless for debugging the present problem. There is just no way to examine the state of each
thread and of each http request using gdb by hand.
3) this abort() causes linux to write a core dump, this takes a long time and I think it causes
other MIDAS program to stop, timeout and die. You can try to fix this by disabling core dumps (set "enable core dumps"
to "false" in ODB and set core dump size limit to 0), or change abort() to exit(). (You can also disable
the "wrong socket" check, but most likely you will not like the result).
4) run mhttpd inside a script: "while (1) { start mhttpd; sleep 1 sec; rinse, repeat; }" (run mhttpd without "-D", yes?)
In other news, the mongoose web server library has a new version available, they again changed their
multithreading scheme (I think it is an improvement). If I update mhttpd to this new version, it is very
likely the code with the "wrong socket" bug will be deleted. (with new bugs added to replace old bugs, of course).
K.O. |
2273
|
06 Sep 2021 |
Andreas Suter | Forum | mhttpd crash | midas version used: midas-2019-05-cxx-1461-g906be8b
I find in the systemd log every couple of days/weeks the following error message related to the mhttpd:
[mhttpd,ERROR] [mhttpd.cxx:18886:on_work_complete,ERROR] Should not send response to request from socket 28 to socket 26, abort!
with various socket numbers of course.
Can anybody hint me what is going wrong here?
The bad thing about the crash is that it sometimes leads to a "chain-reaction" killing multiple midas frontends, which essentially stops the experiment.
Help would be very much appreciated!
Andreas |
2272
|
24 Aug 2021 |
Stefan Ritt | Bug Fix | changes in history plots | One addition I would be in favour of is to remove the "Order" and replace it with drag&drop handles, because this is what people are more
used to today. Only the old guys like us remember the /etc/init.d/xx_yy scheme where one uses an integer number in the file name to
determine an order.
See for example: https://jsbin.com/hijetos/edit?js,output
But instead of relying on a foreign library, I would rather implement that myself, since I need the same thing later for the to-be-
implemented ODB editor (next year? next lockdown?)
Stefan |
2271
|
20 Aug 2021 |
Stefan Ritt | Bug Report | select() FD_SETSIZE overrun | > I am looking at the mlogger in the ALPHA anti-hydrogen experiment at CERN. It is
> mysteriously misbehaving during run start and stop.
>
> The problem turns out to be with the select() system call.
>
> The corresponding FD_SET(), FD_ISSET() & co operate on an array of fixed size
> FD_SETSIZE, value 1024, in my case. But the socket number is 1409, so we overrun
> the FD_SET() array. Ouch.
>
> I see that all uses of select() in midas have no protection against this.
>
> (we should probably move away from select() to newer poll() or whatever it is)
>
> Why does mlogger open so many file descriptors? The usual, scaling problems in the
> history. The old midas history does not reuse file descriptors, so opens the same
> 3 history files (.hst, .idx, etc) for each history event. The new FILE history
> opens just one file per history event. But if the number of events is bigger than
> 1024, we run into same trouble.
>
> (BTW, the system limit on file descriptors is 4096 on the affected machine, 1024
> on some other machines, see "limit" or "ulimit -a").
>
> K.O.
I cannot imagine that you have more than 1024 different events in ALPHA. That wouldn't
fit on your status page.
I have some other suspicion: The logger opens a history file on access, then closes it
again after writing to it. In the old days we had a case where we had a return from the
write function BEFORE the file has been closed. This is kind of a memory leak, but with
file descriptors. After some time of course you run out of file descriptors and crash.
Now that bug has been fixed many years ago, but it sounds to me like there is another
"fd leak" somewhere. You should add some debugging in the history code to print the
file descriptors when you open a file and when you leave that routine. The leak could
however also be somewhere else, like writing to the message file, ODB dump, ...
The right thing of course would be to rewrite everything with std::ofstream which
automatically closes the file when the object goes out of scope.
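To illustrate the point about std::ofstream, here is a minimal sketch of a write routine with early returns. It is not taken from the midas history code; append_record and its arguments are hypothetical. The stream destructor closes the file on every return path, so an early return cannot leak a file descriptor.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not midas code): append one line to a file.
// Returns false if the file cannot be opened or the write fails.
bool append_record(const std::string& path, const std::string& record)
{
   std::ofstream out(path, std::ios::app);
   if (!out)
      return false;        // open failed: nothing to close
   out << record << '\n';
   out.flush();
   if (!out)
      return false;        // early return: destructor still closes the file
   return true;            // destructor closes the file here as well
}
```

The same routine written with open() and close() would need an explicit close() before each return, which is exactly where the old leak described above came from.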
Stefan |
2270
|
19 Aug 2021 |
Konstantin Olchanski | Bug Report | select() FD_SETSIZE overrun | I am looking at the mlogger in the ALPHA anti-hydrogen experiment at CERN. It is
mysteriously misbehaving during run start and stop.
The problem turns out to be with the select() system call.
The corresponding FD_SET(), FD_ISSET() & co operate on an array of fixed size
FD_SETSIZE, value 1024, in my case. But the socket number is 1409, so we overrun
the FD_SET() array. Ouch.
I see that all uses of select() in midas have no protection against this.
(we should probably move away from select() to newer poll() or whatever it is)
Why does mlogger open so many file descriptors? The usual, scaling problems in the
history. The old midas history does not reuse file descriptors, so opens the same
3 history files (.hst, .idx, etc) for each history event. The new FILE history
opens just one file per history event. But if the number of events is bigger than
1024, we run into same trouble.
(BTW, the system limit on file descriptors is 4096 on the affected machine, 1024
on some other machines, see "limit" or "ulimit -a").
K.O. |
2269
|
05 Aug 2021 |
Stefan Ritt | Bug Report | mhttpd WebServer ODBTree initialization | Well, we all see it here at PSI, so this is enough reason to turn this off by default. Shall
I do it? |
2268
|
02 Aug 2021 |
Andreas Suter | Bug Report | cmake with CMAKE_INSTALL_PREFIX fails | Dear Konstantin,
I have tried your adapted version. You did already quite a job which is more consistent than what I was suggesting.
Yet, I still have a problem (git sha2 2d3872dfd31) when starting on a clean system (i.e. no midas present yet):
Without CMAKE_INSTALL_PREFIX set, everything is fine.
However, when setting CMAKE_INSTALL_PREFIX, I get the following error message on the build level (cmake --build ./ -- VERBOSE=1) from the manalyzer:
[ 32%] Building CXX object manalyzer/CMakeFiles/manalyzer.dir/manalyzer.cxx.o
cd /home/l_musr_tst/Tmp/midas/build/manalyzer && /usr/bin/c++ -DHAVE_FTPLIB -DHAVE_MIDAS -DHAVE_ROOT_HTTP -DHAVE_THTTP_SERVER -DHAVE_TMFE -DHAVE_ZLIB -D_LARGEFILE64_SOURCE -I/home/l_musr_tst/Tmp/midas/manalyzer -I/usr/local/root/include -O2 -g -Wall -Wformat=2 -Wno-format-nonliteral -Wno-strict-aliasing -Wuninitialized -Wno-unused-function -std=c++11 -pipe -fsigned-char -pthread -DHAVE_ROOT -std=gnu++11 -o CMakeFiles/manalyzer.dir/manalyzer.cxx.o -c /home/l_musr_tst/Tmp/midas/manalyzer/manalyzer.cxx
In file included from /home/l_musr_tst/Tmp/midas/manalyzer/manalyzer.cxx:14:0:
/home/l_musr_tst/Tmp/midas/manalyzer/manalyzer.h:13:21: fatal error: midasio.h: No such file or directory
#include "midasio.h"
^
compilation terminated.
Obviously, still some include paths are missing. I tried quickly to see if an easy fix is possible, but I failed.
Question: is it possible to use manalyzer without midas? I am asking since the MIDAS_FOUND flag is confusing me.
> big thanks to Andreas S. for getting most of this figured out. I now understand
> much better how cmake installs things and how it generates config files, both
> find_package(midas) style and install(export) style.
>
> with the latest updates, CMAKE_INSTALL_PREFIX should work correctly. I now understand how it works,
> how to use it and how to test it, it should not break again.
>
> for posterity, my comments on Andreas's pull request:
>
> thank you for providing this code, it was very helpful. at the end I implemented things slightly differently. It took me a while to understand that I have to provide 2 “install” modes, for your case, I need to
> “install” the header files and everything works “the cmake way”, for our normal case, we use include files in-place and have to include all the git submodules to the include path. I am quite happy with the
> result. K.O.
>
> K.O. |
2267
|
31 Jul 2021 |
Peter Kunz | Bug Report | ss_shm_name: unsupported shared memory type, bye! | I ran into a problem trying to compile the latest MIDAS version on a Fedora
system.
mhttpd and odbedit return:
ss_shm_name: unsupported shared memory type, bye!
check_shm_type: preferred POSIXv4_SHM got SYSV_SHM
The check returns SYSV_SHM which doesn't seem to be supported in ss_shm_name.
Is there an easy solution for this?
Thanks. |
2266
|
14 Jul 2021 |
Konstantin Olchanski | Bug Fix | changes in history plots | > Moving in the direction of this proposal. Remaining missing piece is the "show
> raw value" buttons and code behind them.
added "show raw value" button, updated on-page instructions.
I think this is the final layout of the history panel editor, conversion
to html+javascript will be done "as is". If you have suggestions to improve
the layout (add/remove/move things around, etc), please shout out (on the elog
here or by direct email to me).
I am thinking in the direction of changing the control flow of the history editor:
- midas "history" menu button click redirects to
- current history panel selection (with checkbox to open old history plots), click on "new plot" button redirects to
- new page for creating new plots. this will present a list of all history variables, click on variable name creates a new history
panel containing just this one variable and redirects to it.
In other words, to see the history for any history variable:
- click on "history" menu button
- click on "new"
- click on desired history variable
- see this history plot
From here, click on the "wheel" button to open the existing history panel editor and add any additional variables, change settings,
etc.
In the history panel editor, I am thinking in the direction of replacing the existing drop-down selection of history variables (not
very workable for large experiments) with an overlay dialog to show all history variables, with checkboxes to select them, basically
the same history variable select page as described above. Not sure yet how this will work visually.
K.O. |
2265
|
14 Jul 2021 |
Konstantin Olchanski | Bug Fix | changes in history plots | Moving in the direction of this proposal. History plot editor is updated according to it. Remaining missing piece is the "show
raw value" buttons and code behind them.
Changes:
- "show factor and offset" moved to the top of the page, "off" by default
- factor and offset (if not zero) are automatically migrated to the formula field (if it is empty), one needs to save the panel
for this to take effect.
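As an illustration of the migration rule, a factor of 2 and an offset of 10 would become the formula "2*x+10". The sketch below is not the code in mhttpd; the function migrate_to_formula is hypothetical, "if not zero" is read here as "factor and offset are not both zero", and numbers are printed with the default stream precision.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not mhttpd code): build a formula string from an
// old-style factor and offset, unless a formula already exists.
std::string migrate_to_formula(double factor, double offset,
                               const std::string& existing_formula)
{
   if (!existing_formula.empty())
      return existing_formula;      // never overwrite a user formula
   if (factor == 0 && offset == 0)
      return "";                    // assumption: nothing to migrate

   std::ostringstream os;
   os << factor << "*x";
   if (offset > 0)
      os << "+" << offset;          // e.g. "2*x+10"
   else if (offset < 0)
      os << offset;                 // minus sign comes with the number
   return os.str();
}
```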
K.O.
> > I am updating the history plots.
> > So the idea is to use this computation:
> > y_position_on_plot = offset + factor*(formula(history_value) - voffset)
>
> Stefan and myself did some brain storming on zoom. Writing it down the way I remember it.
>
> - we distilled the gist of the problem - the numerical values we show in the plot labels and in hover-over-the-graph
> are before formula is applied or after the formula is applied?
>
> - I suggested a universal solution using a double formula: use formula1 for one case;
> use formula2 for the other case;
> use formula1 for "physics calibration", use formula2 for factor and offset for composite plots:
> numeric_value = formula1(history_value)
> plotted_value = formula2(numeric_value)
>
> - we agree that this is way too complicated, difficult to explain and difficult to coherently present in the history editor
>
> - Stefan suggested a simple solution, a checkbox labeled "show raw value" next to each history variable. by default, the
> value after the formula is plotted and displayed. if checked, the raw value (before the formula) is displayed, and the
> value after the formula is plotted. (so this works the same as the factor and offset on the old history plots).
>
> - if "show raw value" is enabled, the numerical values shown will be inconsistent against the labels on the vertical axis.
> Our solution is to turn the axis labels off. (for composite plots, like oscillator frequency in Hz vs oscillator
> temperature in degC, both scaled to see their correlation, the vertical axis is unit-less "arbitrary units", of course)
>
> - to simplify migration of old history plots that use custom factor and offset settings, we think in the direction of
> automatically moving them to the "formula". (factor=2, offset=10 automatically populates formula with "2*x+10", "show raw
> value" checked/enabled). Thus we can avoid implementing factor and offset in the new history code (an unwelcome
> complication).
>
> - I think this covers all the use cases I have seen in the past, so we will move in this direction.
>
> K.O. |
2264
|
14 Jul 2021 |
Konstantin Olchanski | Bug Report | cmake question | > > cmake check and mate in 1 move. please help.
> > -std=c++11 and -std=c++14 collision...
>
> I have a solution implemented for this, I am not happy with it, Stefan is not happy with it. See
> discussion: https://bitbucket.org/tmidas/midas/commits/50a15aa70a4fe3927764605e8964b55a3bb1732b
>
I figured it out, solution is to use:
target_compile_features(midas PUBLIC cxx_std_11)
this is how it works:
- centos-7 (g++ has c++11 off by default): -std=gnu++11 is added automatically (not -std=c++11, but
probably correct, as some c++11 functions were available as gnu extensions)
- ubuntu-20.04 LTS without ROOT: nothing added (I guess correct, g++ has c++11 enabled by default)
- ubuntu-20.04 LTS with -std=c++14 from ROOT: nothing added, c++14 as requested by ROOT is in effect.
- macos without ROOT: -std=gnu++11 is added automatically
- macos with -std=c++11 from ROOT: ditto, so both -std=c++11 and -std=gnu++11 are present in this order,
wrong-ish, but works.
and good luck figuring this out just from cmake documentation:
https://cmake.org/cmake/help/latest/command/target_compile_features.html
K.O. |
|