Back Midas Rome Roody Rootana
  Midas DAQ System, Page 2 of 115  Not logged in ELOG logo
ID Date Author Topic Subject
  2289   14 Oct 2021 Amy RobertsSuggestionAdding (or improving discoverability) of TID for odbset
Creating an ODB key requires users to know the Type ID that are defined in 
https://bitbucket.org/tmidas/midas/src/develop/include/midas.h starting at line 320.

I can't find any information on the Midas Wiki about these values or how to find 
them.

Am I missing something obvious?  Is there a way to improve how to find these values?  
Or is this not the best way to interact with the ODB?
  2288   11 Oct 2021 Konstantin OlchanskiForummidas forum updated, moved
The midas forum software (elogd) was updated to latest version and moved from our old server 
(ladd00.triumf.ca) to our new server (daq00.triumf.ca).

The following URLs should work:

https://daq00.triumf.ca/elog-midas/Midas/ (new URL)
https://midas.triumf.ca/elog/Midas/ (old URL, redirects to daq00)
https://midas.triumf.ca/forum (link from midas wiki)

The configuration on the old server ladd00.triumf.ca is quite tangled between
several virtual hosts and several DNS CNAMEs. I think I got all the redirects
correct and all old URLs and links in old emails & etc still work.

If you see something wrong, please reply to this message here or email me directly.

K.O.
  2287   11 Oct 2021 Konstantin OlchanskiForumtest
> > > test, no email. K.O.
> > 
> > test reply, no email. K.O.
> 
> test attachment, no email. K.O.

test email. K.O.
  2286   11 Oct 2021 Konstantin OlchanskiForumtest
> > test, no email. K.O.
> 
> test reply, no email. K.O.

test attachment, no email. K.O.
Attachment 1: image.png
image.png
  2285   11 Oct 2021 Konstantin OlchanskiForumtest
> test, no email. K.O.

test reply, no email. K.O.
  2284   11 Oct 2021 Konstantin OlchanskiForumtest
test, no email. K.O.
  2283   11 Oct 2021 Stefan RittInfoModification in the history logging system
A requested change in the history logging system has been made today. Previously, history values were
logged with a maximum frequency (usually once per second) but also with a minimum frequency, meaning
that values were logged for example every 60 seconds, even if they did not change. This causes a problem.
If a frontend is inactive or crashed which produces variables to be logged, one cannot distinguish between
a crashed or inactive frontend program or a history value which simply did not change much over time.
The history system was designed from the beginning in a way that values are only logged when they actually
change. This design pattern was broken since about spring 2021, see for example this issue:

https://bitbucket.org/tmidas/midas/issues/305/log_history_periodic-doesnt-account-for

Today I modified the history code to fix this issue. History logging is now controlled by the value of 
common/Log history in the following way:

* Common/Log history = 0 means no history logging
* Common/Log history = 1 means log whenever the value changes in the ODB
* Common/Log history = N means log whenever the value changes in the ODB and 
  the previous write was more than N seconds ago

So most experiments should be happy with 0 or 1. Only experiments which have fluctuating values due to noisy 
sensors might benefit from a value larger than 1 to limit the history logging. Anyhow this is not the preferred 
way to limit history logging. This should be done by the front-end limiting the updates to the ODB. Most of the 
midas slow control drivers have a “threshold” value. Only if the input changes by more then the threshold are 
written to the ODB. This allows a per-channel “dead band” and not a per-event limit on history logging 
as ‘log history’ would do. In addition, the threshold reduces the write accesses to the ODB, although that is
only important for very large experiments.

Stefan
  2282   30 Sep 2021 Francesco RengaForumOPC client within MIDAS
Dear all,
     I need to integrate in my MIDAS project the communication with an OPC UA 
server. My plan is to develop an OPC UA client as a "device" in 
midas/drivers/device.

Two questions:

1) Is anybody aware of some similar effort for some other project, so that I can 
get some example?

2) What could be the more appropriate driver's class to be used? generic.cxx? 
multi.cxx?

Thank you for your help,
           Francesco
  2281   29 Sep 2021 Stefan RittBug Reportnstall clash between MIDAS 2020-08 and mscb
> Thank you, Stefan.
> 
> I found these instructions under
> 1) The changelog: https://midas.triumf.ca/MidasWiki/index.php/Changelog#2020-12
> 2) Konstantin's elog announcements (e.g. https://midas.triumf.ca/elog/Midas/2089)
> 
> I do see reference to updating the submodules under the TRIUMF install 
> instructions 
> (https://midas.triumf.ca/MidasWiki/index.php/Setup_MIDAS_experiment_at_TRIUMF#Inst
> all_MIDAS) although perhaps it can be clarified.
> 
> Cheers,
> Richard

Hi Richard,

I updated the documentation at

https://midas.triumf.ca/MidasWiki/index.php/Changelog#Updating_midas

by putting the submodule update command everywhere.

Best,
Stefan
  2280   29 Sep 2021 Richard LonglandBug Reportnstall clash between MIDAS 2020-08 and mscb
Thank you, Stefan.

I found these instructions under
1) The changelog: https://midas.triumf.ca/MidasWiki/index.php/Changelog#2020-12
2) Konstantin's elog announcements (e.g. https://midas.triumf.ca/elog/Midas/2089)

I do see reference to updating the submodules under the TRIUMF install 
instructions 
(https://midas.triumf.ca/MidasWiki/index.php/Setup_MIDAS_experiment_at_TRIUMF#Inst
all_MIDAS) although perhaps it can be clarified.

Cheers,
Richard
  2279   28 Sep 2021 Stefan RittBug ReportInstall clash between MIDAS 2020-08 and mscb
> 1) git clone https://bitbucket.org/tmidas/midas --recursive
> 2) cd midas
> 3) git checkout release/midas-2020-08
> 4) mkdir build
> 5) cd build
> 6) cmake ..
> 7) make

When you do step 3), you get

~/tmp/midas$ git checkout release/midas-2020-08
warning: unable to rmdir 'manalyzer': Directory not empty
warning: unable to rmdir 'midasio': Directory not empty
M	mjson
M	mscb
M	mvodb
M	mxml

The 'M' in front of the submodules like mscb tell you that you
have an older version of midas (namely midas-2020-08), but the 
*current* submodules, which won't match. So you have to roll back
also the submodules with:

3.5) git submodule update --recursive

This fetched those versions of the submodules which match the
midas version 2020-08. See here for details: 

https://git-scm.com/book/en/v2/Git-Tools-Submodules

From where did you get the command

git checkout release/xxxx ???

If you tell me the location of that documentation, I will take
care that it will be amended with the command

git submodule update --recursive

Best,
Stefan
  2278   28 Sep 2021 Richard LonglandBug ReportInstall clash between MIDAS 2020-08 and mscb
All,

I am performing a fresh install of MIDAS on an Ubuntu linux box. I follow the 
usual installation procedure:

1) git clone https://bitbucket.org/tmidas/midas --recursive
2) cd midas
3) git checkout release/midas-2020-08
4) mkdir build
5) cd build
6) cmake ..
7) make

Step 3 warns me that 
"warning: unable to rmdir 'manalyzer': Directory not empty" and 
"warning: unable to rmdir 'midasio': Directory not empty"

Step 7 fails.
Compilation fails with an mhttp error related to mscb:
mhttpd.cxx:8224:59: error: too few arguments to function 'int mscb_ping(int, 
short unsigned int, int, int)'
 8224 |             status = mscb_ping(fd, (unsigned short) ind, 1);


I was able to get around this by rolling mscb back to some old version (commit 
74468dd), but am extremely nervous about mix-and-matching the code this way.

Any advice would be greatly appreciated.

Cheers,
Richard 
  2277   19 Sep 2021 Stefan RittBug FixChat working again
Not sure how many people are using it, but the Chat facility in midas was broken 
for some time now and got fixed today again.

Just for your information: Chat can be used like WhatsApp & Co, and connects all 
people who access a midas experiment through their browser. It's good to 
communicate between shift crew members located at different places. One advantage 
is that the chat messages can get 'spoken' by the text-to-speech engine of your 
browser, so it can be used to "wake up" shifters. Can be configured through the 
"Config" page.

Stefan
Attachment 1: Screenshot_2021-09-19_at_21.27.19_.png
Screenshot_2021-09-19_at_21.27.19_.png
  2276   17 Sep 2021 Stefan RittForummhttpd crash
To limit the impact of the numerous crashes of mhttpd, I installed the monit tool at MEG at PSI 
(https://en.wikipedia.org/wiki/Monit). It monitors mhttpd, and if it cannot connect to it for a certain
time, it kills the process and restarts it. This covers endless loops, simple crashes (caused by the
known multi-threading issue in mongoose), and also cases where mhttpd develops a memory leak and becomes
unresponsive. 

To configure monit for mhttpd, first install the package, make sure the daemon gets started automatically
after reboot (typically "sysemctl enable monit"), and put the attached file into

/etc/monit.d/mhttpd

You have to adjust the <path-to-midas> according to your midas installation, and probably also the port
under which mhttpd is listening (8082 in my case). Put 

set daemon 10

into /etc/monitrc if you want monit to check mhttpd every 10 seconds (default is 30 seconds). Then, every
10 seconds monit request "midas.css" from mhttpd, and if it cannot obtain it after 30 seconds, it kills
mhttpd and restarts it.

Loading long history plots taking more than 30 seconds should probably not be an issue since mhttpd is 
multi-threaded, but I haven't tested this in detail.

Attached below is a typical status page produced by monit, which has its own built-in web server (normally
listening at port 2812, accessible only from localhost by default).

I hope this helps some of you.

Stefan
Attachment 1: mhttpd
check process mhttpd matching "mhttpd"
  start program = "/bin/su -l meg -c '/<path-to-midas>/bin/mhttpd -D'"
  stop program = "/usr/bin/killall mhttpd"
  if failed
    host 127.0.0.1
    port 8082 
    protocol http
    method GET
    request "/midas.css"
    with timeout 30 seconds 
  then restart
Attachment 2: Screenshot_2021-09-17_at_21.11.15_.png
Screenshot_2021-09-17_at_21.11.15_.png
  2275   07 Sep 2021 Andreas SuterForummhttpd crash
Dear Konstantin,

thanks for the prompt response, this helps a lot!

> 1) If you see a way to replicate this crash, or some way to reliably cause
> the crash within 5-10 minutes after starting mhttpd, please let me know. I can work with that
> and I wish to fix this problem very much.

I wished I could! This happens 3-4 times per year only, so close to impossible to trigger.

> 2) My "wrong socket" check calls abort() to produce a core dump. In my experience these core dumps
> are useless for debugging the present problem. There is just no way to examine the state of each
> thread and of each http request using gdb by hand.
> 
> 3) this abort() causes linux to write a core dump, this takes a long time and I think it causes
> other MIDAS program to stop, timeout and die. You can try to fix this by disabling core dumps (set "enable core dumps"
> to "false" in ODB and set core dump size limit to 0), or change abort() to exit(). (You can also disable
> the "wrong socket" check, but most likely you will not like the result).
> 

I changed now to exit() rather than abort on the production machine. Perhaps this should be the default?

Andreas
  2274   06 Sep 2021 Konstantin OlchanskiForummhttpd crash
> [mhttpd,ERROR] [mhttpd.cxx:18886:on_work_complete,ERROR] Should not send response to request from socket 28 to socket 26, abort!
> Can anybody hint me what is going wrong here?
> The bad thing on the crash is, that sometimes it is leading to a "chain-reaction" killing multiple midas frontends, which essentially stop the experiment.

This is my code. I am the culprit. I had a bit of discussion about this with Stefan.

Bottom line is something is rotten in the multithreading code inside mhttpd and under conditions unknown,
it sends the wrong data into the wrong socket. This causes midas web pages to be really confused (RPC replies
processed as CSS file, HTML code processed at RPC replies, a mess), this wrong data is cached by the browser,
so restarting mhttpd does not fix the web pages. So a mess.

I find this is impossible to replicate, and so cannot debug it, cannot fix it. Best I was able to do
is to add a check for socket numbers, and thankfully it catches the condition before web browser caches
become poisoned. So, broken web pages replaced by mhttpd crash.

This situation reinforces my opinion that multi-threading and C++ classes "do not mix" (like H2 and O2 do not mix).
If you write a multithreaded C++ program and it works, good for you, if there is a malfunction, good luck with it,
C++ just does not have any built-in support for debugging typical multithreading problems. I think others have come
to the same conclusion and invented all these new "safe" programming languages, like Rust and Go.

Back to your troubles.

1) If you see a way to replicate this crash, or some way to reliably cause
the crash within 5-10 minutes after starting mhttpd, please let me know. I can work with that
and I wish to fix this problem very much.

2) My "wrong socket" check calls abort() to produce a core dump. In my experience these core dumps
are useless for debugging the present problem. There is just no way to examine the state of each
thread and of each http request using gdb by hand.

3) this abort() causes linux to write a core dump, this takes a long time and I think it causes
other MIDAS program to stop, timeout and die. You can try to fix this by disabling core dumps (set "enable core dumps"
to "false" in ODB and set core dump size limit to 0), or change abort() to exit(). (You can also disable
the "wrong socket" check, but most likely you will not like the result).

4) run mhttpd inside a script: "while (1) { start mhttpd; sleep 1 sec; rinse, repeat; }" (run mhttpd without "-D", yes?)

In other news, the mongoose web server library have a new version available, they again changed their
multithreading scheme (I think it is an improvement). If I update mhttpd to this new version, it is very
likely the code with the "wrong socket" bug will be deleted. (with new bugs added to replace old bugs, of course).

K.O.
  2273   06 Sep 2021 Andreas SuterForummhttpd crash
midas version used: midas-2019-05-cxx-1461-g906be8b

I find in the systemd log every couple of days/weeks the following error message related to the mhttpd:

[mhttpd,ERROR] [mhttpd.cxx:18886:on_work_complete,ERROR] Should not send response to request from socket 28 to socket 26, abort!

with various socket numbers of course.

Can anybody hint me what is going wrong here?

The bad thing on the crash is, that sometimes it is leading to a "chain-reaction" killing multiple midas frontends, which essentially stop the experiment.

Help would be very much appreciated!

Andreas 
  2272   24 Aug 2021 Stefan RittBug Fixchanges in history plots
One addition I would be in favour of is to remove the "Order" and replace it with drag&drop handles, because this is what people are more 
used to today. Only the old guys like us remember the /etc/init.d/xx_yy scheme where one uses an integer number in the file name to 
determine an order. 

See for example: https://jsbin.com/hijetos/edit?js,output

But instead of relying on a foreign library, I would rather implement that myself, since I need the same thing later for the to-be-
implemented ODB editor (next year? next lockdown?)

Stefan
  2271   20 Aug 2021 Stefan RittBug Reportselect() FD_SETSIZE overrun
> I am looking at the mlogger in the ALPHA anti-hydrogen experiment at CERN. It is 
> mysteriously misbehaving during run start and stop.
> 
> The problem turns out to be with the select() system call.
> 
> The corresponding FD_SET(), FD_ISSET() & co operate on a an array of fixed size 
> FD_SETSIZE, value 1024, in my case. But the socket number is 1409, so we overrun 
> the FD_SET() array. Ouch.
> 
> I see that all uses of select() in midas have no protection against this.
> 
> (we should probably move away from select() to newer poll() or whatever it is)
> 
> Why does mlogger open so many file descriptors? The usual, scaling problems in the 
> history. The old midas history does not reuse file descriptors, so opens the same 
> 3 history files (.hst, .idx, etc) for each history event. The new FILE history 
> opens just one file per history event. But if the number of events is bigger than 
> 1024, we run into same trouble.
> 
> (BTW, the system limit on file descriptors is 4096 on the affected machine, 1024 
> on some other machines, see "limit" or "ulimit -a").
> 
> K.O.

I cannot imagine that you have more than 1024 different events in ALPHA. That wouldn't 
fit on your status page. 

I have some other suspicion: The logger opens a history file on access, then closes it 
again after writing to it. In the old days we had a case where we had a return from the 
write function BEFORE the file has been closed. This is kind of a memory leak, but with 
file descriptors. After some time of course you run out of file descriptors and crash. 
Now that bug has been fixed many years ago, but it sounds to me like there is another 
"fd leak" somewhere. You should add some debugging in the history code to print the 
file descriptors when you open a file and when you leave that routine. The leak could 
however also be somewhere else, like writing to the message file, ODB dump, ...

The right thing of course would be to rewrite everything with std::ofstream which 
closes automatically the file when the object gets out of scope.

Stefan
  2270   19 Aug 2021 Konstantin OlchanskiBug Reportselect() FD_SETSIZE overrun
I am looking at the mlogger in the ALPHA anti-hydrogen experiment at CERN. It is 
mysteriously misbehaving during run start and stop.

The problem turns out to be with the select() system call.

The corresponding FD_SET(), FD_ISSET() & co operate on a an array of fixed size 
FD_SETSIZE, value 1024, in my case. But the socket number is 1409, so we overrun 
the FD_SET() array. Ouch.

I see that all uses of select() in midas have no protection against this.

(we should probably move away from select() to newer poll() or whatever it is)

Why does mlogger open so many file descriptors? The usual, scaling problems in the 
history. The old midas history does not reuse file descriptors, so opens the same 
3 history files (.hst, .idx, etc) for each history event. The new FILE history 
opens just one file per history event. But if the number of events is bigger than 
1024, we run into same trouble.

(BTW, the system limit on file descriptors is 4096 on the affected machine, 1024 
on some other machines, see "limit" or "ulimit -a").

K.O.
ELOG V3.1.4-2e1708b5