Back Midas Rome Roody Rootana
  Midas DAQ System, Page 30 of 142  Not logged in ELOG logo
ID Date Author Topic Subject
  2275   07 Sep 2021 Andreas SuterForummhttpd crash
Dear Konstantin,

thanks for the prompt response, this helps a lot!

> 1) If you see a way to replicate this crash, or some way to reliably cause
> the crash within 5-10 minutes after starting mhttpd, please let me know. I can work with that
> and I wish to fix this problem very much.

I wished I could! This happens 3-4 times per year only, so close to impossible to trigger.

> 2) My "wrong socket" check calls abort() to produce a core dump. In my experience these core dumps
> are useless for debugging the present problem. There is just no way to examine the state of each
> thread and of each http request using gdb by hand.
> 
> 3) this abort() causes linux to write a core dump, this takes a long time and I think it causes
> other MIDAS program to stop, timeout and die. You can try to fix this by disabling core dumps (set "enable core dumps"
> to "false" in ODB and set core dump size limit to 0), or change abort() to exit(). (You can also disable
> the "wrong socket" check, but most likely you will not like the result).
> 

I changed now to exit() rather than abort on the production machine. Perhaps this should be the default?

Andreas
  2274   06 Sep 2021 Konstantin OlchanskiForummhttpd crash
> [mhttpd,ERROR] [mhttpd.cxx:18886:on_work_complete,ERROR] Should not send response to request from socket 28 to socket 26, abort!
> Can anybody hint me what is going wrong here?
> The bad thing on the crash is, that sometimes it is leading to a "chain-reaction" killing multiple midas frontends, which essentially stop the experiment.

This is my code. I am the culprit. I had a bit of discussion about this with Stefan.

Bottom line is something is rotten in the multithreading code inside mhttpd and under conditions unknown,
it sends the wrong data into the wrong socket. This causes midas web pages to be really confused (RPC replies
processed as CSS file, HTML code processed at RPC replies, a mess), this wrong data is cached by the browser,
so restarting mhttpd does not fix the web pages. So a mess.

I find this is impossible to replicate, and so cannot debug it, cannot fix it. Best I was able to do
is to add a check for socket numbers, and thankfully it catches the condition before web browser caches
become poisoned. So, broken web pages replaced by mhttpd crash.

This situation reinforces my opinion that multi-threading and C++ classes "do not mix" (like H2 and O2 do not mix).
If you write a multithreaded C++ program and it works, good for you, if there is a malfunction, good luck with it,
C++ just does not have any built-in support for debugging typical multithreading problems. I think others have come
to the same conclusion and invented all these new "safe" programming languages, like Rust and Go.

Back to your troubles.

1) If you see a way to replicate this crash, or some way to reliably cause
the crash within 5-10 minutes after starting mhttpd, please let me know. I can work with that
and I wish to fix this problem very much.

2) My "wrong socket" check calls abort() to produce a core dump. In my experience these core dumps
are useless for debugging the present problem. There is just no way to examine the state of each
thread and of each http request using gdb by hand.

3) this abort() causes linux to write a core dump, this takes a long time and I think it causes
other MIDAS program to stop, timeout and die. You can try to fix this by disabling core dumps (set "enable core dumps"
to "false" in ODB and set core dump size limit to 0), or change abort() to exit(). (You can also disable
the "wrong socket" check, but most likely you will not like the result).

4) run mhttpd inside a script: "while (1) { start mhttpd; sleep 1 sec; rinse, repeat; }" (run mhttpd without "-D", yes?)

In other news, the mongoose web server library have a new version available, they again changed their
multithreading scheme (I think it is an improvement). If I update mhttpd to this new version, it is very
likely the code with the "wrong socket" bug will be deleted. (with new bugs added to replace old bugs, of course).

K.O.
  2273   06 Sep 2021 Andreas SuterForummhttpd crash
midas version used: midas-2019-05-cxx-1461-g906be8b

I find in the systemd log every couple of days/weeks the following error message related to the mhttpd:

[mhttpd,ERROR] [mhttpd.cxx:18886:on_work_complete,ERROR] Should not send response to request from socket 28 to socket 26, abort!

with various socket numbers of course.

Can anybody hint me what is going wrong here?

The bad thing on the crash is, that sometimes it is leading to a "chain-reaction" killing multiple midas frontends, which essentially stop the experiment.

Help would be very much appreciated!

Andreas 
  2272   24 Aug 2021 Stefan RittBug Fixchanges in history plots
One addition I would be in favour of is to remove the "Order" and replace it with drag&drop handles, because this is what people are more 
used to today. Only the old guys like us remember the /etc/init.d/xx_yy scheme where one uses an integer number in the file name to 
determine an order. 

See for example: https://jsbin.com/hijetos/edit?js,output

But instead of relying on a foreign library, I would rather implement that myself, since I need the same thing later for the to-be-
implemented ODB editor (next year? next lockdown?)

Stefan
  2271   20 Aug 2021 Stefan RittBug Reportselect() FD_SETSIZE overrun
> I am looking at the mlogger in the ALPHA anti-hydrogen experiment at CERN. It is 
> mysteriously misbehaving during run start and stop.
> 
> The problem turns out to be with the select() system call.
> 
> The corresponding FD_SET(), FD_ISSET() & co operate on a an array of fixed size 
> FD_SETSIZE, value 1024, in my case. But the socket number is 1409, so we overrun 
> the FD_SET() array. Ouch.
> 
> I see that all uses of select() in midas have no protection against this.
> 
> (we should probably move away from select() to newer poll() or whatever it is)
> 
> Why does mlogger open so many file descriptors? The usual, scaling problems in the 
> history. The old midas history does not reuse file descriptors, so opens the same 
> 3 history files (.hst, .idx, etc) for each history event. The new FILE history 
> opens just one file per history event. But if the number of events is bigger than 
> 1024, we run into same trouble.
> 
> (BTW, the system limit on file descriptors is 4096 on the affected machine, 1024 
> on some other machines, see "limit" or "ulimit -a").
> 
> K.O.

I cannot imagine that you have more than 1024 different events in ALPHA. That wouldn't 
fit on your status page. 

I have some other suspicion: The logger opens a history file on access, then closes it 
again after writing to it. In the old days we had a case where we had a return from the 
write function BEFORE the file has been closed. This is kind of a memory leak, but with 
file descriptors. After some time of course you run out of file descriptors and crash. 
Now that bug has been fixed many years ago, but it sounds to me like there is another 
"fd leak" somewhere. You should add some debugging in the history code to print the 
file descriptors when you open a file and when you leave that routine. The leak could 
however also be somewhere else, like writing to the message file, ODB dump, ...

The right thing of course would be to rewrite everything with std::ofstream which 
closes automatically the file when the object gets out of scope.

Stefan
  2270   19 Aug 2021 Konstantin OlchanskiBug Reportselect() FD_SETSIZE overrun
I am looking at the mlogger in the ALPHA anti-hydrogen experiment at CERN. It is 
mysteriously misbehaving during run start and stop.

The problem turns out to be with the select() system call.

The corresponding FD_SET(), FD_ISSET() & co operate on a an array of fixed size 
FD_SETSIZE, value 1024, in my case. But the socket number is 1409, so we overrun 
the FD_SET() array. Ouch.

I see that all uses of select() in midas have no protection against this.

(we should probably move away from select() to newer poll() or whatever it is)

Why does mlogger open so many file descriptors? The usual, scaling problems in the 
history. The old midas history does not reuse file descriptors, so opens the same 
3 history files (.hst, .idx, etc) for each history event. The new FILE history 
opens just one file per history event. But if the number of events is bigger than 
1024, we run into same trouble.

(BTW, the system limit on file descriptors is 4096 on the affected machine, 1024 
on some other machines, see "limit" or "ulimit -a").

K.O.
  2269   05 Aug 2021 Stefan RittBug Reportmhttpd WebServer ODBTree initialization
Well, we all see it here at PSI, so this is enough reason to turn this off by default. Shall 
I do it?
  2268   02 Aug 2021 Andreas SuterBug Reportcmake with CMAKE_INSTALL_PREFIX fails
Dear Konstantin,

I have tried your adopted version. You did already quite a job which is more consistent than what I was suggesting.
Yet, I still have a problem (git sha2 2d3872dfd31) when starting on a clean system (i.e. no midas present yet): 
Without CMAKE_INSTALL_PREFIX set, everything is fine. 
However, when setting CMAKE_INSTALL_PREFIX, I get the following error message on the build level (cmake --build ./ -- VERBOSE=1) from the manalyzer:

[ 32%] Building CXX object manalyzer/CMakeFiles/manalyzer.dir/manalyzer.cxx.o
cd /home/l_musr_tst/Tmp/midas/build/manalyzer && /usr/bin/c++  -DHAVE_FTPLIB -DHAVE_MIDAS -DHAVE_ROOT_HTTP -DHAVE_THTTP_SERVER -DHAVE_TMFE -DHAVE_ZLIB -D_LARGEFILE64_SOURCE -I/home/l_musr_tst/Tmp/midas/manalyzer -I/usr/local/root/include  -O2 -g -Wall -Wformat=2 -Wno-format-nonliteral -Wno-strict-aliasing -Wuninitialized -Wno-unused-function -std=c++11 -pipe -fsigned-char -pthread -DHAVE_ROOT -std=gnu++11 -o CMakeFiles/manalyzer.dir/manalyzer.cxx.o -c /home/l_musr_tst/Tmp/midas/manalyzer/manalyzer.cxx
In file included from /home/l_musr_tst/Tmp/midas/manalyzer/manalyzer.cxx:14:0:
/home/l_musr_tst/Tmp/midas/manalyzer/manalyzer.h:13:21: fatal error: midasio.h: No such file or directory
 #include "midasio.h"
                     ^
compilation terminated.

Obviously, still some include paths are missing. I tried quickly to see if an easy fix is possible, but I failed.

Question: is it possible to use manalyzer without midas? I am asking since the MIDAS_FOUND flag is confusing me.

> big thanks to Andreas S. for getting most of this figured out. I now understand
> much better how cmake installs things and how it generates config files, both
> find_package(midas) style and install(export) style.
> 
> with the latest updates, CMAKE_INSTALL_PREFIX should work correctly. I now understand how it works,
> how to use it and how to test it, it should not break again.
> 
> for posterity, my commends to Andreas's pull request:
> 
> thank you for providing this code, it was very helpful. at the end I implemented things slightly differently. It took me a while to understand that I have to provide 2 “install” modes, for your case, I need to 
> “install” the header files and everything works “the cmake way”, for our normal case, we use include files in-place and have to include all the git submodules to the include path. I am quite happy with the 
> result. K.O.
> 
> K.O.
  2267   31 Jul 2021 Peter KunzBug Reportss_shm_name: unsupported shared memory type, bye!
I ran into a problem trying to compile the latest MIDAS version on a Fedora 
system.

mhttpd and odbedit return:
ss_shm_name: unsupported shared memory type, bye!

check_shm_type: preferred POSIXv4_SHM got SYSV_SHM

The check returns SYSV_SHM which doesn't seem to be supported in ss_shm_name.

Is there an easy solution for this?

Thanks.
  2266   14 Jul 2021 Konstantin OlchanskiBug Fixchanges in history plots
> Moving in the direction of this proposal. Remaining missing piece is the "show 
> raw value" buttons and code behind them.

added "show raw value" button, updated on-page instructions.

I think this is the final layout of the history panel editor, conversion
to html+javascript will be done "as is". If you have suggestions to improve
the layout (add/remove/move things around, etc), please shoult out (on the elog
here or by direct email to me).

I am thinking in the direction of changing the control flow of the history editor:

- midas "history" manu button click redirects to
- current history panel selection (with checkbox to open old history plots), click on "new plot" button redirects to
- new page for creating new plots. this will present a list of all history variables, click on variable name creates a new history 
panel containing just this one variable and redirects to it.

In other words, to see the history for any history variable:
- click on "history" menu button
- click on "new"
- click on desired history variable
- see this history plot

From here, click on the "wheel" button to open the existing history panel editor and add any additional variables, change settings, 
etc.

In the history panel editor, I am thinking in the direction of replacing the existing drop-down selection of history variables (now 
very workable for large experiments) with an overlay dialog to show all history variables, with checkboxes to select them, basically 
the same history variable select page as described above. Not sure yet how this will work visually.

K.O.
  2265   14 Jul 2021 Konstantin OlchanskiBug Fixchanges in history plots
Moving in the direction of this proposal. History plot editor is updated according to it. Remaining missing piece is the "show 
raw value" buttons and code behind them.

Changes:

- "show factor and offset" moved to the top of the page, "off" by default
- factor and offset (if not zero) are automatically migrated to the formula field (if it is empty), one needs to save the panel 
for this to take effect.

K.O.


> > I am updating the history plots.
> > So the idea is to use this computation:
> > y_position_on_plot = offset + factor*(formula(history_value) - voffset)
> 
> Stefan and myself did some brain storming on zoom. Writing it down the way I remember it.
> 
> - we distilled the gist of the problem - the numerical values we show in the plot labels and in hover-over-the-graph
> are before formula is applied or after the formula is applied?
> 
> - I suggested a universal solution using a double formula: use formula1 for one case;
>   use formula2 for the other case;
>   use formula1 for "physics calibration", use formula2 for factor and offset for composite plots:
>      numeric_value = formula1(history_value)
>      plotted_value = formula2(numeric_value)
> 
> - we agree that this is way too complicated, difficult to explain and difficult to coherently present in the history editor
> 
> - Stefan suggested a simple solution, a checkbox labeled "show raw value" next to each history variable. by default, the 
> value after the formula is plotted and displayed. if checked, the raw value (before the formula) is displayed, and the 
> value after the formula is plotted. (so this works the same as the factor and offset on the old history plots).
> 
> - if "show raw value" is enabled, the numerical values shown will be inconsistent against the labels on the vertical axis. 
> Our solution it to turn the axis labels off. (for composite plots, like oscillator frequency in Hz vs oscillator 
> temperature in degC, both scaled to see their correlation, the vertical axis is unit-less "arbitrary units", of course)
> 
> - to simplify migration of old history plots that use custom factor and offset settings, we think in the direction of 
> automatically moving them to the "formula". (factor=2, offset=10 automatically populates formula with "2*x+10", "show raw 
> value" checked/enabled). Thus we can avoid implementing factor and offset in the new history code (an unwelcome 
> complication).
> 
> - I think this covers all the use cases I have seen in the past, so we will move in this direction.
> 
> K.O.
  2264   14 Jul 2021 Konstantin OlchanskiBug Reportcmake question
> > cmake check and mate in 1 move. please help.
> > -std=c++11 and -std=c++14 collision...
> 
> I have a solution implemented for this, I am not happy with it, Stefan is not happy with it. See 
> discussion: https://bitbucket.org/tmidas/midas/commits/50a15aa70a4fe3927764605e8964b55a3bb1732b
>

I figured it out, solution is to use:

target_compile_features(midas PUBLIC cxx_std_11)

this is how it works:

- centos-7 (g++ has c++11 off by default): -std=gnu++11 is added automatically (not -std=c++11, but 
probably correct, as some c++11 functions were available as gnu extensions)
- ubuntu-20.04 LTS without ROOT: nothing added (I guess correct, g++ has c++11 is enabled by default)
- ubuntu-20.04 LTS with -std=c++14 from ROOT: nothing added, c++14 as requested by ROOT is in affect.
- macos without ROOT: -std=gnu++11 is added automatically
- macos with -std=c++11 from ROOT: ditto, so both -std=c++11 and -std=gnu++11 are present in this order, 
wrong-ish, but works.

and good luck figuring this out just from cmake documentation:
https://cmake.org/cmake/help/latest/command/target_compile_features.html

K.O.
  2263   13 Jul 2021 Konstantin OlchanskiBug Reportcmake question
> cmake check and mate in 1 move. please help.
> -std=c++11 and -std=c++14 collision...

I have a solution implemented for this, I am not happy with it, Stefan is not happy with it. See 
discussion: https://bitbucket.org/tmidas/midas/commits/50a15aa70a4fe3927764605e8964b55a3bb1732b

K.O.
  2262   13 Jul 2021 Konstantin OlchanskiInfoMidasConfig.cmake usage
> $MIDASSYS/drivers/class/
> $MIDASSYS/drivers/device
> $MIDASSYS/mscb/src/
> $MIDASSYS/src/mfe.cxx
> 
> I guess this can be easily added by defining a MIDAS_SOURCES in MidasConfig.cmake, so 
> that I can do things like:
> 
> add_executable(my_fe
>   myfe.cxx
>   $(MIDAS_SOURCES}/src/mfe.cxx
>   ${MIDAS_SOURCES}/drivers/class/hv.cxx
>   ...)

1) remove $(MIDAS_SOURCES}/src/mfe.cxx from "add_executable", add "mfe" to 
target_link_libraries() as in examples/experiment/frontend:

add_executable(frontend frontend.cxx)
target_link_libraries(frontend mfe midas)

2) ${MIDAS_SOURCES}/drivers/class/hv.cxx surely is ${MIDASSYS}/drivers/...

If MIDAS is built with non-default CMAKE_INSTALL_PREFIX, "drivers" and co are not 
available, as we do not "install" them. Where MIDASSYS should point in this case is
anybody's guess. To run MIDAS, $MIDASSYS/resources is needed, but we do not install
them either, so they are not available under CMAKE_INSTALL_PREFIX and setting
MIDASSYS to same place as CMAKE_INSTALL_PREFIX would not work.

I still think this whole business of installing into non-default CMAKE_INSTALL_PREFIX
location has not been thought through well enough. Too much thinking about how cmake works
and not enough thinking about how MIDAS works and how MIDAS is used. Good example
of "my tool is a hammer, everything else must have the shape of a nail".

K.O.
  2261   13 Jul 2021 Stefan RittInfoMidasConfig.cmake usage
Thanks for the contribution of MidasConfig.cmake. May I kindly ask for one extension:

Many of our frontends require inclusion of some midas-supplied drivers and libraries 
residing under

$MIDASSYS/drivers/class/
$MIDASSYS/drivers/device
$MIDASSYS/mscb/src/
$MIDASSYS/src/mfe.cxx

I guess this can be easily added by defining a MIDAS_SOURCES in MidasConfig.cmake, so 
that I can do things like:

add_executable(my_fe
  myfe.cxx
  $(MIDAS_SOURCES}/src/mfe.cxx
  ${MIDAS_SOURCES}/drivers/class/hv.cxx
  ...)

Does this make sense or is there a more elegant way for that?

Stefan
  2260   11 Jul 2021 Konstantin OlchanskiBug Reportcmake with CMAKE_INSTALL_PREFIX fails
big thanks to Andreas S. for getting most of this figured out. I now understand
much better how cmake installs things and how it generates config files, both
find_package(midas) style and install(export) style.

with the latest updates, CMAKE_INSTALL_PREFIX should work correctly. I now understand how it works,
how to use it and how to test it, it should not break again.

for posterity, my commends to Andreas's pull request:

thank you for providing this code, it was very helpful. at the end I implemented things slightly differently. It took me a while to understand that I have to provide 2 “install” modes, for your case, I need to 
“install” the header files and everything works “the cmake way”, for our normal case, we use include files in-place and have to include all the git submodules to the include path. I am quite happy with the 
result. K.O.

K.O.
  2259   11 Jul 2021 Konstantin OlchanskiSuggestionMidasConfig.cmake usage
> > > So you say "nuke ${MIDAS_LIBRARIES}" and "fix ${MIDAS_INCLUDE}". Ok.
> > A more moderate option ...
> 
> For the record, I did not disappear. I have a very short time window
> to complete commissioning the alpha-g daq (now that the network
> and the event builder are cooperating). To add to the fun, our high voltage
> power supply turned into a pumpkin, so plotting voltages and currents 
> on the same history plot at the same time (like we used to be able to do)
> went up in priority. 
> 

in the latest update, find_package(midas) should work correctly, the include path is right, 
the library list is right.

please test.

I find that the cmake install(export) method is simpler on the user side (just one line of 
code) and is easier to support on the midas side (config file is auto-generated).

I request that proponents of the find_package(midas) method contribute the documentation and 
example on how to use it. (see my other message).

K.O.
  2258   11 Jul 2021 Konstantin OlchanskiInfomidas cmake update
I reworked the midas cmake files:
- install via CMAKE_INSTALL_PREFIX should work correctly now:
- installed are bin, lib and include - everything needed to build against the midas library
- if built without CMAKE_INSTALL_PREFIX, a special mode "MIDAS_NO_INSTALL_INCLUDE_FILES" is activated, and the include path 
contains all the subdirectories need for compilation
- -I$MIDASSYS/include and -L$MIDASSYS/lib -lmidas work in both cases
- to "use" midas, I recommend: include($ENV{MIDASSYS}/lib/midas-targets.cmake)
- config files generated for find_package(midas) now have correct information (a manually constructed subset of information 
automatically exported by cmake's install(export))
- people who want to use "find_package(midas)" will have to contribute documentation on how to use it (explain the magic used to 
find the "right midas" in /usr/local/midas or in /midas or in ~/packages/midas or in ~/pacjages/new-midas) and contribute an 
example superproject that shows how to use it and that can be run from the bitpucket automatic build. (features that are not part 
of the automatic build we cannot insure against breakage).

On my side, here is an example of using include($ENV{MIDASSYS}/lib/midas-targets.cmake). I posted this before, it is used in 
midas/examples/experiment and I will ask ben to include it into the midas wiki documentation.

Below is the complete cmake file for building the alpha-g event bnuilder and main control frontend. When presented like this, I 
have to agree that cmake does provide positive value to the user. (the jury is still out whether it balances out against the 
negative value in the extra work to "just support find_package(midas) already!").

#
# CMakeLists.txt for alpha-g frontends
#

cmake_minimum_required(VERSION 3.12)
project(agdaq_frontends)

include($ENV{MIDASSYS}/lib/midas-targets.cmake)

add_compile_options("-O2")
add_compile_options("-g")
#add_compile_options("-std=c++11")
add_compile_options(-Wall -Wformat=2 -Wno-format-nonliteral -Wno-strict-aliasing -Wuninitialized -Wno-unused-function)
add_compile_options("-DTMFE_REV0")
add_compile_options("-DOS_LINUX")

add_executable(feevb feevb.cxx TsSync.cxx)
target_link_libraries(feevb midas)

add_executable(fectrl fectrl.cxx GrifComm.cxx EsperComm.cxx JsonTo.cxx KOtcp.cxx $ENV{MIDASSYS}/src/tmfe_rev0.cxx)
target_link_libraries(fectrl midas)

#end
  2257   09 Jul 2021 Konstantin OlchanskiInfocannot push to bitbucket
the day has arrived when I cannot git push to bitbucket. cloud computing rules!

I have never seen this error before and I do not think we have any hooks installed,
so it must be some bitbucket stuff. their status page says some kind of maintenance
is happening, but the promised error message is "repository is read only" or something 
similar.

I hope this clears out automatically. I am updating all the cmake crud and I have no idea 
which changes I already pushed and which I did not, so no idea if anything will work for 
people who pull from midas until this problem is cleared out.

daq00:mvodb$ git push
X11 forwarding request failed on channel 0
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Delta compression using up to 12 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (2/2), 247 bytes | 247.00 KiB/s, done.
Total 2 (delta 1), reused 0 (delta 0)
remote: null value in column "attempts" violates not-null constraint
remote: DETAIL:  Failing row contains (13586899, 2021-07-10 01:13:28.812076+00, 1970-01-01 
00:00:00+00, 1970-01-01 00:00:00+00, 65975727, null).
To bitbucket.org:tmidas/mvodb.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'git@bitbucket.org:tmidas/mvodb.git'
daq00:mvodb$

K.O.
  2256   09 Jul 2021 Konstantin OlchanskiBug Reportcmake question
cmake check and mate in 1 move. please help.

the midas cmake file has a typo in the ROOT_CXX_FLAGS, I fixed it and now I am dead in the 
water, need help from cmake experts and pushers.

On Ubuntu:
ROOT_CXX_FLAGS has -std=c++14
midas cmake defines -std=gnu++11 (never mind that I asked for c++11, not "c++11 with GNU 
extensions")

the two compiler flags collide and the build explodes, the best I can tell c++11 prevails 
and ROOT header files blow up because they expect c++14.

if I remove the midas cmake request for c++11, -std=gnu++11 is gone, there is no conflict 
with ROOT C++14 request and the build works just fine.

but now it explodes on CentOS-7 because by default, c++11 is not enabled. (include <mutex> 
blows up).

what a mess.

K.O.
ELOG V3.1.4-2e1708b5