Back Midas Rome Roody Rootana
  Midas DAQ System, Page 102 of 150  Not logged in ELOG logo
Entry  08 Aug 2019, Konstantin Olchanski, Info, MIDAS will use C++11 
After much discussion, and following the MIDAS workshop at TRIUMF, we made the decision to use C++11 in MIDAS.

There are many benefits, and only one drawback - no c++11 compilers in the default OS install on older computers (i.e. 
RHEL/SL/CentOS before el7). (the same applies to our use of cmake).

Specifically for el6, the solution is to use c++11 compatible gcc-8 from devtoolset-8, see 
https://midas.triumf.ca/elog/Midas/1649

The c++11 features we most welcome - initialization of class members at declaration time (no more forgetting to add initialization to 
each and every constructor), c++ threads and mutexes, lambdas and "auto".

K.O.
    Reply  08 Aug 2019, Konstantin Olchanski, Suggestion, midas cmake migration 
> Each CMakeLists.txt should specify which version of CMake it requires. The MIDAS CMakeLists.txt requires CMake 3.1 or later. 
> We deliberately stayed away from fancy cutting edge CMake features in order to make midas easier to compile. On top of that,
> midas is a much simpler package compared to root, so things are not so complicated.

The oldest cmake I actually used is 3.6.1 (on SL6), so I do not know if cmake versions between 3.1 and 3.6 actually work for us. Perhaps we should set 
the CMakefile requirement to 3.6.1 to match the oldest version we know works. If somebody has an older cmake, they have a choice of updating it or 
trying it as as and reporting success/failure to the midas forum here.

K.O.
    Reply  08 Aug 2019, Konstantin Olchanski, Bug Report, ROOTANA bug? 
> indnerlt:rootana lindner$ git diff libMidasInterface/TMidasOnline.cxx
> diff --git a/libMidasInterface/TMidasOnline.cxx b/libMidasInterface/TMidasOnline.cxx
> index 92eb3e9..67da613 100644
> --- a/libMidasInterface/TMidasOnline.cxx
> +++ b/libMidasInterface/TMidasOnline.cxx
> @@ -191,7 +191,7 @@ bool TMidasOnline::sleep(int mdelay)
>    #ifdef CH_IPC
>    ss_suspend_set_dispatch(CH_IPC, 0, NULL);
>    #else
> -  ss_suspend_set_dispatch_ipc(NULL);
> +  //  ss_suspend_set_dispatch_ipc(NULL);
>    #endif
>   int status = ss_suspend(mdelay, 0);
>    if (status == SS_SUCCESS)
> 
> This compiles and at least runs for me; so maybe that is helpful for you.  But Konstantin will provide a longer term solution.


This is a problem with the latest development version of MIDAS. ss_suspend overrides have been removed from system.cxx
and there is no way for rootana to avoid the problem of recursive call to the event handler.

I recommend that instead of using the latest development version of MIDAS you use one of the recent released versions (use "git tag -l", midas-2019-06-b is the latest release).

All the released versions of midas have the ss_suspend overrides implemented and rootana will work correctly. For the next release of midas
I will restore the ss_suspend override and update the rootana code.


K.O.
    Reply  09 Aug 2019, Konstantin Olchanski, Info, Precedence of equipment/common structure 
> Today I fixed a long-annoying problem. ...
> /Equipment/<name>/Common
> In the past, the ODB setting took precedence over the frontend structure...
> We defined this like 25 years ago and I forgot what the exact reason was.
> It causes however many people (including myself) to fall into this trap: ...

There is good number of confusions regarding entries in /eq/xxx/common:

- for some of them, the frontend code settings take precedence and overwrite settings in odb ("frontend file name")
- for some of them, ODB takes precedence and frontend code values are ignored ("read on" and "period")
- for some of them, changes in ODB take effect immediately (via db_watch) ("period")
- for some of them, frontend restart is required for changes to take effect (output event buffer name "buffer")
- some of them continuously update the odb values ("status", "status color")

I do not think there is a simple way to improve on this.

(One solution would replace the single "common" with several subdirectories, "per function",
one would have items where the code takes precedence, one would have items where odb takes
precedence (in effect, "standard settings"), one will have items that the frontend always updates
and that should not be changes via odb ("frontend name", etc). I am not sure this one solution
is necessarily an "improvement").

Lacking any ideas for improvements, I vote for the status quo. (plus a review of the documentation to ensure we have clearly
written up what each entry in "common" does and whether the user is permitted to edit it in odb).

K.O.
    Reply  09 Aug 2019, Konstantin Olchanski, Bug Report, Fetest History Plot 
> Hi, our logger was running.


One more thing is to check that the history files are actually being written to. Be default
the history files are written into the same directory where you have ODB, etc and
the file names are "*.hst".

Second thing, you can run "mlogger -v" and it will report that it is writing history events
into the history.

If all these things are happening,

Third, you can run "mhist" to see the data directly from the data files.

If all of these work, but you still get nothing in mhttpd, that would be very strange. If I remember
right, to see the sine wave, the history variable to plot is equipment "slow", variable "slow".


K.O.



> I have tried restarting mlogger (even though we haven't
> changed variable names). We ran the following commands one after another and still no
> luck with history plot. Is there anything else that could be causing these problems?
> 
> Kind regards,
> Hassan 
> 
> ==================================================================================
> 
> [lm17773@it038146 ~]$ cd /opt/midas_software/midas/bin/
> [lm17773@it038146 bin]$ mhttpd
> [mhttpd,ERROR] [odb.cxx:1646:db_open_database,ERROR] Removed ODB client 'mhttpd',
> index 0 because process pid 20094 does not exists
> [mhttpd,ERROR] [odb.cxx:1646:db_open_database,ERROR] Removed ODB client 'Logger',
> index 1 because process pid 20214 does not exists
> [mhttpd,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
> [mhttpd,INFO] Removed exclusive access mode from "/Experiment/Security/RPC
> hosts/Allowed hosts"
> [mhttpd,INFO] Removed open record flag from "/Experiment/Security/mhttpd hosts/Allowed
> hosts"
> [mhttpd,INFO] Removed exclusive access mode from "/Experiment/Security/mhttpd
> hosts/Allowed hosts"
> [mhttpd,INFO] Removed open record flag from "/Logger/History"
> [mhttpd,INFO] Removed exclusive access mode from "/Logger/History"
> [mhttpd,INFO] Removed open record flag from "/Sequencer/State"
> [mhttpd,INFO] Removed exclusive access mode from "/Sequencer/State"
> [mhttpd,INFO] Removed open record flag from "/History/LoggerHistoryChannel"
> [mhttpd,INFO] Removed exclusive access mode from "/History/LoggerHistoryChannel"
> [mhttpd,INFO] Removed open record flag from "/Equipment/slow/Variables"
> [mhttpd,INFO] Removed exclusive access mode from "/Equipment/slow/Variables"
> [mhttpd,INFO] Removed open record flag from "/Equipment/Trigger/Statistics/Events per
> sec."
> [mhttpd,INFO] Removed exclusive access mode from "/Equipment/Trigger/Statistics/Events
> per sec."
> [mhttpd,INFO] Removed open record flag from "/Equipment/Trigger/Statistics/kBytes per
> sec."
> [mhttpd,INFO] Removed exclusive access mode from "/Equipment/Trigger/Statistics/kBytes
> per sec."
> [mhttpd,INFO] Removed open record flag from "/Equipment/Periodic/Variables"
> [mhttpd,INFO] Removed exclusive access mode from "/Equipment/Periodic/Variables"
> [mhttpd,INFO] Removed open record flag from "/Equipment/Scaler/Variables"
> [mhttpd,INFO] Removed exclusive access mode from "/Equipment/Scaler/Variables"
> [mhttpd,INFO] Corrected 10 ODB entries
> [mhttpd,INFO] Deleted entry '/System/Clients/20094' for client 'mhttpd' because it is
> not connected to ODB
> Mongoose web server will use SSL certificate file "/home/lm17773/online/ssl_cert.pem"
> Mongoose web server will use authentication realm "sampleexpt", password file
> "/home/lm17773/online/htpasswd.txt"
> mongoose web server is redirecting HTTP port 8080 to
> https://it038146.users.bris.ac.uk:8443
> mongoose web server is listening on the HTTP port 8080
> mongoose web server is listening on the HTTPS port 8443
> 
====================================================================================
> 
> [lm17773@it038146 bin]$ mlogger
> [Logger,INFO] Deleted entry '/System/Clients/20214' for client 'Logger' because it is
> not connected to ODB
> Log     directory is /home/lm17773/online/
> Data    directory is same as Log unless specified in /Logger/channels/
> History directory is same as Log unless specified in /Logger/history/
> ELog    directory is same as Log
> SQL     database is localhost/sampleexpt/Runlog
> MIDAS logger started. Stop with "!"
> 
====================================================================================
> [lm17773@it038146 bin]$ fetest
> Frontend name          :     fetest
> Event buffer size      :     10485760
> User max event size    :     4194304
> User max frag. size    :     4194304
> # of events per buffer :     2
> 
> Connect to experiment sampleexpt...
> OK
> Init hardware...frontend_init!
> Event size set to 10240 bytes
> Ring buffer wait sleep 1 ms
> OK
> time 1564131394, data 97.814758
> time 1564131395, data 96.592583
> time 1564131396, data 95.105652
> time 1564131397, data 93.358040
> time 1564131398, data 91.354546
> time 1564131399, data 89.100655
> time 1564131400, data 86.602539
> time 1564131401, data 83.867058
> time 1564131402, data 80.901703
> time 1564131403, data 77.714592
> Warning: bank RND4 has zero size
> time 1564131404, data 74.314484
> time 1564131405, data 70.710678
> time 1564131406, data 66.913063
> time 1564131407, data 62.932041
> 
====================================================================================
> 
> 
> 
> 
> 
> 
> > > Hi,
> > > 
> > > We've been trying to run Fetest in the attempt of plotting the sine wave data on
> > > the history page on the web server. However each time we've tried running a new
> > > plot we have come across the error of 'no data' from the variables. In the
> > > status page we are clearly obtaining data from the frontend and it is updating
> > > the variable as expected in SLOW.
> > > 
> > > When setting up MIDAS we managed to produce a history plot from Fetest but are
> > > unable to do so any longer. We did have a go at modifying the Fetest code but
> > > created a backup before doing so and are now running the original backup.
> > > 
> > > What could be causing the Fetest data not to be showing in the history plot?
> > 
> > Is the logger running? (this application is handling the history data).
> > If yes: Did you change the variable names? If yes: restart the logger.
    Reply  09 Aug 2019, Konstantin Olchanski, Bug Report, Fetest History Plot 
> Hi, our logger was running.

Please do these simple tests:

- run "mlogger -v", it should report that it is writing slow/slow data into the history with rate 1 Hz (fetest 
should be running at this point, yes?)
- normally the history files are written into the experiment directory (where ODB is, etc) and have file names 
"*.hst". Observe that the files are growing. Use "ls -ltr". (mlogger and fetest should be running at this point, 
yes?)
- if all if this is happening, you can try to run "mhist" to see the history data

If all of the above works, but you still get nothing from the history plots in mhttpd, then we probably have a 
bug in midas and we would like very much to fix it. For this we will need some more information from you. I 
hope you have some time available to help us with this.

Hmm... the fetest history plots are not defined automatically, you have to create the history plot manually,
maybe this is where the problem happens. One thing to check here, the correct variable to plot
is "slow/slow", if I remember right.


K.O.


> I have tried restarting mlogger (even though we haven't
> changed variable names). We ran the following commands one after another and still no
> luck with history plot. Is there anything else that could be causing these problems?
> 
> Kind regards,
> Hassan 
> 
> 
================================================================================
==
> 
> [lm17773@it038146 ~]$ cd /opt/midas_software/midas/bin/
> [lm17773@it038146 bin]$ mhttpd
> [mhttpd,ERROR] [odb.cxx:1646:db_open_database,ERROR] Removed ODB client 'mhttpd',
> index 0 because process pid 20094 does not exists
> [mhttpd,ERROR] [odb.cxx:1646:db_open_database,ERROR] Removed ODB client 'Logger',
> index 1 because process pid 20214 does not exists
> [mhttpd,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
> [mhttpd,INFO] Removed exclusive access mode from "/Experiment/Security/RPC
> hosts/Allowed hosts"
> [mhttpd,INFO] Removed open record flag from "/Experiment/Security/mhttpd hosts/Allowed
> hosts"
> [mhttpd,INFO] Removed exclusive access mode from "/Experiment/Security/mhttpd
> hosts/Allowed hosts"
> [mhttpd,INFO] Removed open record flag from "/Logger/History"
> [mhttpd,INFO] Removed exclusive access mode from "/Logger/History"
> [mhttpd,INFO] Removed open record flag from "/Sequencer/State"
> [mhttpd,INFO] Removed exclusive access mode from "/Sequencer/State"
> [mhttpd,INFO] Removed open record flag from "/History/LoggerHistoryChannel"
> [mhttpd,INFO] Removed exclusive access mode from "/History/LoggerHistoryChannel"
> [mhttpd,INFO] Removed open record flag from "/Equipment/slow/Variables"
> [mhttpd,INFO] Removed exclusive access mode from "/Equipment/slow/Variables"
> [mhttpd,INFO] Removed open record flag from "/Equipment/Trigger/Statistics/Events per
> sec."
> [mhttpd,INFO] Removed exclusive access mode from "/Equipment/Trigger/Statistics/Events
> per sec."
> [mhttpd,INFO] Removed open record flag from "/Equipment/Trigger/Statistics/kBytes per
> sec."
> [mhttpd,INFO] Removed exclusive access mode from "/Equipment/Trigger/Statistics/kBytes
> per sec."
> [mhttpd,INFO] Removed open record flag from "/Equipment/Periodic/Variables"
> [mhttpd,INFO] Removed exclusive access mode from "/Equipment/Periodic/Variables"
> [mhttpd,INFO] Removed open record flag from "/Equipment/Scaler/Variables"
> [mhttpd,INFO] Removed exclusive access mode from "/Equipment/Scaler/Variables"
> [mhttpd,INFO] Corrected 10 ODB entries
> [mhttpd,INFO] Deleted entry '/System/Clients/20094' for client 'mhttpd' because it is
> not connected to ODB
> Mongoose web server will use SSL certificate file "/home/lm17773/online/ssl_cert.pem"
> Mongoose web server will use authentication realm "sampleexpt", password file
> "/home/lm17773/online/htpasswd.txt"
> mongoose web server is redirecting HTTP port 8080 to
> https://it038146.users.bris.ac.uk:8443
> mongoose web server is listening on the HTTP port 8080
> mongoose web server is listening on the HTTPS port 8443
> 
================================================================================
====
> 
> [lm17773@it038146 bin]$ mlogger
> [Logger,INFO] Deleted entry '/System/Clients/20214' for client 'Logger' because it is
> not connected to ODB
> Log     directory is /home/lm17773/online/
> Data    directory is same as Log unless specified in /Logger/channels/
> History directory is same as Log unless specified in /Logger/history/
> ELog    directory is same as Log
> SQL     database is localhost/sampleexpt/Runlog
> MIDAS logger started. Stop with "!"
> 
================================================================================
====
> [lm17773@it038146 bin]$ fetest
> Frontend name          :     fetest
> Event buffer size      :     10485760
> User max event size    :     4194304
> User max frag. size    :     4194304
> # of events per buffer :     2
> 
> Connect to experiment sampleexpt...
> OK
> Init hardware...frontend_init!
> Event size set to 10240 bytes
> Ring buffer wait sleep 1 ms
> OK
> time 1564131394, data 97.814758
> time 1564131395, data 96.592583
> time 1564131396, data 95.105652
> time 1564131397, data 93.358040
> time 1564131398, data 91.354546
> time 1564131399, data 89.100655
> time 1564131400, data 86.602539
> time 1564131401, data 83.867058
> time 1564131402, data 80.901703
> time 1564131403, data 77.714592
> Warning: bank RND4 has zero size
> time 1564131404, data 74.314484
> time 1564131405, data 70.710678
> time 1564131406, data 66.913063
> time 1564131407, data 62.932041
> 
================================================================================
====
> 
> 
> 
> 
> 
> 
> > > Hi,
> > > 
> > > We've been trying to run Fetest in the attempt of plotting the sine wave data on
> > > the history page on the web server. However each time we've tried running a new
> > > plot we have come across the error of 'no data' from the variables. In the
> > > status page we are clearly obtaining data from the frontend and it is updating
> > > the variable as expected in SLOW.
> > > 
> > > When setting up MIDAS we managed to produce a history plot from Fetest but are
> > > unable to do so any longer. We did have a go at modifying the Fetest code but
> > > created a backup before doing so and are now running the original backup.
> > > 
> > > What could be causing the Fetest data not to be showing in the history plot?
> > 
> > Is the logger running? (this application is handling the history data).
> > If yes: Did you change the variable names? If yes: restart the logger.
    Reply  09 Aug 2019, Konstantin Olchanski, Forum, How to convert C midas frontends to C++ 
> How do I solve mismatched declarations in the mfe (or other places in the midas code)?

I run into such problems all the time. My solution? I grep for the function name in my code and in the header file,
then look very carefully at the definition to confirm that all the argument declarations are the same in both
places. Sometimes my eyes do not see the difference and I ask for a "second pair of eyes".

In your case, you have a mismatch between functions in mfe.h and in your frontend. The difference
is "int source" in mfe.h and "int source[]" in your code.

Because C++ permits functions with identical namesm but different arguments, the compiler thinks
you did this on purpose and does not complain. Later, of course, the linker bombs,
but all it can report at this stage, is what you see "function not found"... Then you grep your code
for the missing function, check arguments, rinse, repeat.

Before C++, the C compiler would probably had complained about the mismatch, except that MIDAS
did not have an mfe.h header file definitions for all this stuff until just now, so again, the mismatch would
have gone unnoticed, unfixed.

K.O.



> It is having issues with the midas defined BOOL/... types. This 
> is what I get for a minimal scfe:
> 
> [ 12%] Building CXX object CMakeFiles/sc_fe_mini.dir/sc_fe_mini.cpp.o
> [ 25%] Building CXX object CMakeFiles/sc_fe_mini.dir/home/frederik/packages/midas/drivers/class/hv.cxx.o
> [ 37%] Building CXX object CMakeFiles/sc_fe_mini.dir/home/frederik/packages/midas/drivers/class/multi.cxx.o
> [ 50%] Building CXX object CMakeFiles/sc_fe_mini.dir/home/frederik/packages/midas/drivers/device/nulldev.cxx.o
> [ 62%] Building CXX object CMakeFiles/sc_fe_mini.dir/home/frederik/packages/midas/drivers/bus/null.cxx.o
> [ 75%] Building CXX object CMakeFiles/sc_fe_mini.dir/home/frederik/packages/midas/drivers/device/mscbdev.cxx.o
> [ 87%] Building CXX object CMakeFiles/sc_fe_mini.dir/home/frederik/packages/midas/mscb/src/mscb.cxx.o
> [100%] Linking CXX executable sc_fe_mini
> /home/frederik/packages/midas/build/libmfe.a(mfe.cxx.o): In function `_readout_thread':
> /home/frederik/packages/midas/src/mfe.cxx:1271: undefined reference to `poll_event(int, int, unsigned int)'
> /home/frederik/packages/midas/build/libmfe.a(mfe.cxx.o): In function `check_polled_events':
> /home/frederik/packages/midas/src/mfe.cxx:1601: undefined reference to `poll_event(int, int, unsigned int)'
> /home/frederik/packages/midas/src/mfe.cxx:1643: undefined reference to `poll_event(int, int, unsigned int)'
> /home/frederik/packages/midas/build/libmfe.a(mfe.cxx.o): In function `readout_enable(unsigned int)':
> /home/frederik/packages/midas/src/mfe.cxx:1158: undefined reference to `interrupt_configure(int, int, long)'
> /home/frederik/packages/midas/src/mfe.cxx:1156: undefined reference to `interrupt_configure(int, int, long)'
> /home/frederik/packages/midas/build/libmfe.a(mfe.cxx.o): In function `initialize_equipment':
> /home/frederik/packages/midas/src/mfe.cxx:614: undefined reference to `interrupt_configure(int, int, long)'
> /home/frederik/packages/midas/src/mfe.cxx:649: undefined reference to `poll_event(int, int, unsigned int)'
> /home/frederik/packages/midas/build/libmfe.a(mfe.cxx.o): In function `scheduler':
> /home/frederik/packages/midas/src/mfe.cxx:1890: undefined reference to `poll_event(int, int, unsigned int)'
> /home/frederik/packages/midas/src/mfe.cxx:1932: undefined reference to `poll_event(int, int, unsigned int)'
> /home/frederik/packages/midas/build/libmfe.a(mfe.cxx.o): In function `main':
> /home/frederik/packages/midas/src/mfe.cxx:2701: undefined reference to `interrupt_configure(int, int, long)'
> /home/frederik/packages/midas/src/mfe.cxx:2702: undefined reference to `interrupt_configure(int, int, long)'
> collect2: error: ld returned 1 exit status
> make[2]: *** [sc_fe_mini] Error 1
> make[1]: *** [CMakeFiles/sc_fe_mini.dir/all] Error 2
> make: *** [all] Error 2
> 
> 
> This is my cmakelists for my user code:
> 
> #
> # cmake for the muX software
> #
> cmake_minimum_required(VERSION 3.3)
> 
> project(muX)
> 
> #
> # find installations
> #
> set(MIDAS_DIR $ENV{MIDASSYS})
> message("MIDAS dir: " ${MIDAS_DIR})
> 
> #
> # set directories
> #
> set(MIDASBUILD_DIR ${MIDAS_DIR}/build)
> set(MIDASINCLUDE_DIR ${MIDAS_DIR}/include)
> set(MXML_DIR ${MIDAS_DIR}/mxml)
> set(MSCB_DIR ${MIDAS_DIR}/mscb)
> set(DRV_DIR ${MIDAS_DIR}/drivers)
> 
> 
> #
> # drivers, libs
> #
> set(DRIVERS
>     ${MIDAS_DIR}/drivers/class/hv
>     ${MIDAS_DIR}/drivers/class/multi
>     ${MIDAS_DIR}/drivers/device/nulldev
>     ${MIDAS_DIR}/drivers/bus/null
> )
> set(MIDASLIB ${MIDASBUILD_DIR}/libmidas.a)
> set(FELIB ${MIDASBUILD_DIR}/libmfe.a)
> 
> #
> # sc_fe
> #
> add_executable(sc_fe_mini
>         sc_fe_mini.cpp
>         ${DRIVERS}
>         ${MIDAS_DIR}/drivers/device/mscbdev
>         ${MIDAS_DIR}/mscb/src/mscb)
> 
> target_include_directories(sc_fe_mini PRIVATE ${DRV_DIR} ${MIDAS_DIR}/mscb/include ${MIDAS_DIR}/include)
> target_link_libraries(sc_fe_mini ${LIBS} ${MIDASLIB} ${FELIB} rt pthread util)
> 
> 
> 
> I seem to be able to compile the current midas distributions, including the scfe frontend
> 
> 
> 
> > To convert a MIDAS frontend to C++ follow this checklist:
> > 
> > a) add #include "mfe.h" after include of midas.h and fix all compilation errors.
> > 
> > NOTE: there should be no "extern C"  brackets around MIDAS include files.
> > 
> > NOTE: Expect to see following problems:
> > 
> > a1) duplicate or mismatched declarations of functions defined in mfe.h
> > a2) frontend_name and frontend_file_name should be "const char*" instead of "char*"
> > a3) duplicate "HNDLE hDB" collision with hDB from mfe.c - not sure why it worked before, either use HNDLE hDB from mfe.h or use "extern HNDLE hDB".
> > a4) poll_event() and interrupt_configure() have "source" as "int[]" instead of "int" (why did this work before?)
> > a5) use of "extern int frontend_index" instead of get_frontend_index() from mfe.h
> > a6) bk_create() last argument needs to be cast to (void**)
> > a7) "bool debug" collides with "debug" from mfe.h (why did this work before?)
> > 
> > b) remove no longer needed "extern C" brackets around mfe related code. Ideally there should be no "extern C" brackets anywhere.
> > 
> > c) in the Makefile, change CC=gcc to CC=g++ for compiling and linking everything as C++
> > 
> > c1) fix all compilation problems. most valid C code will compile as valid C++, but there is some known trouble:
> > - return value of malloc() & co needs to be cast to the correct data type: "char* s = (char*)malloc(...)"
> > - some C++ compilers complain about mismatch between signed and unsigned values
> > 
> > If you need help with converting your frontend from C to C++, I will be most happy
> > to assist you - post your compiler error messages to this forum or email them to me privately.
> > 
> > Good luck,
> > K.O.
Entry  14 Aug 2019, Konstantin Olchanski, Bug Fix, incorrect recursion in ss_suspend() via the user event handler 
The ROOTANA midas analyzer uncovered a problem with recursive use of ss_suspend().

When running in graphical mode, the ROOT graphics main event loop was calling 
ss_suspend(0) (no MSG_BM) which would sometimes call the user event handler (if a new 
event arrives into the midas event buffer). Because this loop was already running in the 
user event handler, there was a crash.

This is the calling sequence leading to the incorrect recursion: (from system.cxx comments 
to ss_suspend())

analyzer ->
     -> cm_yield() in the main loop
     -> ss_suspend(0)
     -> MSG_BM message arrives in the UDP socket
     -> ss_suspend_process_ipc(0)
     -> cm_dispatch_ipc()
     -> bm_push_event()
     -> bm_push_buffer()
     -> bm_read_buffer()
     -> bm_dispatch_event()
     -> user event handler
     -> user event handler ROOT graphics main loop needs to sleep
     -> ss_suspend(0) <--- should be ss_suspend(MSG_BM)!!!     
     -> MSG_BM message arrives in the UDP socket
     -> ss_suspend_process_ipc(0) <- should be ss_suspend_process_ipc(MSG_BM)!!!
     -> cm_dispatch_ipc() <- without MSG_BM, calling cm_dispatch_ipc() again
     -> bm_push_event()
     -> bm_push_buffer()
     -> bm_read_buffer()
     -> bm_dispatch_event()
     -> user event handler <---- called recursively, very bad!

The proper fix for this is to always call ss_suspend(MSG_BM) from the user event handler 
and ss_suspend(MSG_ODB) from the user db_watch handler.

In this second case, ss_suspend(MSG_OBM) will lose/ignore/discard db_watch notifications, 
if you do not want that, call ss_suspend(0) and be ready for recursive calls to your 
db_watch handler. (this can happen in a frontend program that acts on changes in ODB and 
where some of these actions may require sleeping via ss_suspend()).

ss_suspend(MSG_BM) will discard MSG_BM messages, which is not a problem as 
bm_wait_for_events() and cm_yield() will immediately poll the event buffer and there will be 
no delay in receiving new events.

Until commit 99d6e90 there were problems in ss_suspend(MSG_BM) and recursive calls to 
the user event handler were still possible. It is now fixed in this and the previous commits.

K.O.
    Reply  14 Aug 2019, Konstantin Olchanski, Bug Report, ROOTANA bug? 
> -  ss_suspend_set_dispatch_ipc(NULL);
> +  //  ss_suspend_set_dispatch_ipc(NULL);
> 
> This compiles and at least runs for me; so maybe that is helpful for you.  But Konstantin will provide a longer term solution.

I now understand why this fix worked. Around December 2018 timeframe, I reworked the MIDAS event buffer code
and one improvement was to only send UDP buffer notifications if somebody is waiting for them. This probably
reduced to zero the probability of recursive calls to the user event handler - the problem originally fixed by the monkey
work against the midas ipc handler.

After looking at it, I now understand that the correct solution is to call ss_suspend(MSG_BM), but it turns out
inside MIDAS, handling of MSG_BM was incomplete and recursive calls to the user event handler were still
possible. (but most likely not actually happening anymore because of those changes to the event buffer code).

So.

a) ss_suspend(MSG_BM) inside midas now works correctly, recursive call to the user event handler will not happen.
b) TMidasOnline::sleep() now calls ss_suspend(MSG_BM), monkey business with ss_suspend_set_dispatch_ipc() is removed.

The problem of recursive call to the analyzer event handler is now fixed, both rootana and manalyzer (both use the same TMidasOnline code).

Read more about this here:
https://midas.triumf.ca/elog/Midas/1663

K.O.
    Reply  16 Sep 2019, Konstantin Olchanski, Forum, Open a hotlink to a single element in an ODB array 
> Is it possible to open a hotlink to a single element in an ODB array?

Not possible.

> sprintf(element, "%s[%d]", path, i);
> db_find_key(hDB, hv_info->hKeyRoot, element, &hKey);

There is some confusion and inconsistency between db_xxx() functions,
some of them accept the array index "a[10]" syntax, some do not.

db_find_key() and db_watch()/db_open_record() do not operate on array elements
and do not accept the "a[10]" array index syntax.

K.O.
    Reply  16 Sep 2019, Konstantin Olchanski, Bug Report, https redirect and ODB access 
> I'm not sure if these issues are related or not, but I'm getting an error
> message when I want to access the root of the ODB via the webserver:
> [mhttpd,ERROR] [mhttpd.cxx:563:rread,ERROR] Cannot read file '/root', read of
> 4096 returned -1, errno 21 (Is a directory)

This is an old bug. It was part of the "custom path" confusion. Fixed (I think) in all midas-2019 
releases.

To confirm, which version are you using (run "odbedit ver" or look on the mhttpd "help" page)?

If you have an older version, I recommend that you update to midas-2019-03 (cd midas; git pull; 
git checkout midas-2019-03; make clean; make).

If you feel adventurous, you can also update to the head of the development version
and see all the new features (cmake, c++11, new history pages).

If you do not feel adventurous, wait until we have midas-2019-09 ready, use midas-2019-03 
until then.

K.O.
    Reply  16 Sep 2019, Konstantin Olchanski, Info, New history plot facility 
> I see currently quite often is the error hs_read_arraybuffer (see the 
attachement).
> Are there ways to get a log which would document where the problems 
start?
> [also crash of mhttpd]

We can debug it from both ends, javascript and mhttpd:

On the web page, the error message says "see javascript console", do you see 
anything there?

Or the tab is so hung-up that you cannot even access the console? In this 
case, can you open the console before running your test?

In some browsers (firefox, google-chrome) this will also activate the javascript 
debugger and as likely as not will make the bug go away (ouch!)

On the mhttpd side, please capture the stack trace from the crash: enable 
core dumps (ODB "/experiment/enable core dumps" set to "y", after the crash, 
run "ls -l core.*; gdb mhttpd core.9999") or run mhttpd inside gdb or attach 
gdb to a running mhttpd (gdb -p 9999). Once in gdb, run "info thr" to list all 
threads, "thr 0; bt", "thr 1; bt", etc to get stack traces from all threads, only 
one of them contains the crash (tedious!).

Email me the stack trace (or post here), in case we want to look at values
of any variables from the crash, keep the core dump and do not rebuild 
mhttpd.

K.O.
    Reply  16 Sep 2019, Konstantin Olchanski, Info, New history plot facility 
> During my visit at TRIUMF we rewrote the history plotting functionality of midas.

This is a most amazing achievement. We wanted to do this "for years" and I think we have
benefitted greatly from the delay - tools available for building interactive web graphics
have improved so much so recently.

For example, delivering binary data from mhttpd to javascript (avoiding json encoding and decoding
saves tons of CPU cycles) went from "how do I do this?!?" to "I did it in only 3 hours!".

> We are now in a state where this is still work in progress, but already at this stage it might
> be useful for others to report any feedback.

The old gif-based history plots took a lot of effort and a long time to get where they work well
for most experiments and where we are happy with them.

From the TRIUMF side of things, lots of polishing of the graphics and of the user interface came
through use at our bigger experiments - TWIST (TRIUMF), ALPHA (CERN), T2K/ND280 (Japan).

So, much improvement and polishing of the new graphics is still ahead for us.

> Simply upgrade the the newest develop branch of midas, and you will see two menu items 
> "OldHistory" which is the old system and "History" which is the new system.

I hope to start the new release branch for midas-2019-09 soon. For the release, we will try
to have both the old and the new history graphics to integrate smoothly. The old graphics
still has to work well, as some users may prefer the old graphics and the old user interface.

Also the new system is still incomplete, i.e. there is no trivial way to save a history plot into a file:

> Following items are planned, but not yet implemented:
> - Printing of run markers as in the old history
> - Export / Printing / Sending to ELOG any history plot

K.O.
    Reply  16 Sep 2019, Konstantin Olchanski, Forum, History plot problems for frontend with multiple indicies 
> My first question would be why are you using several font-ends at all? That makes things more 
> complicated than needed. In the normal FE framework, you can define either several equipment 
> served by one frontend, or even one equipment linked to several devices.

I am the culprit here, as I wrote the original code for T2K/ND280 that Nick is looking at.

At the time, we needed to control multiple units of identical equipment. Most of these equipments
needed to be controlled independently from each other, so we could not/did not want to use
one single frontend executable to control all of them at the same time. For example, for equipment
not in use, we can stop the corresponding frontend. In case of trouble, we can restart
the corresponding frontend without disrupting the frontends for the other equipments.

The successful operation of the T2K/ND280 experiment is sufficient defence for the validity of this approach.

One lesson learned was that the MIDAS frontend framework did not make it easy to have multiple identical frontends 
for controlling multiple identical equipments. (typical use is control of 2-3 Wiener power supplies, 1-2-3 UPS 
devices, etc). At the time (and today), only the "i NNN" flag is available to tell the frontend "who am I?". To make it 
work, one has to use the hard to "%02d" stuff in the equipment name, and there are other complications. For my 
"next generation" of frontends, I tried to specialize the frontend executables at compile time using C/C++ 
preprocessor defines (-Dwiener01, -Dwiener02, etc), this worked better, but still not super happy.

My current solution as implemented by the tmfe frontend framework is to give the user full control
over the command line arguments (mfe.c did not permit any "user arguments" and did not allow
access to argc/argv) and full control over the equipment names (mfe.c equipment names are fixed at compile time).

K.O.
    Reply  16 Sep 2019, Konstantin Olchanski, Forum, History plot problems for frontend with multiple indicies 
> it's probably better to run a multi-threaded setup, than individual frontends.

I recommend against using multiple threads if at all possible and unless absolutely required.

Only for one reason: multithreaded c++ programs are notoriously hard to debug.

In addition, one has to face several classes of bugs absent in single-threaded applications:

a) which thread "owns" which object
b) locking of all shared data
c) huge overheads from locking at high data rates (a performance bug)
d) correct locking order, dead locks, live locks
e) incomprehensible core dumps and stack traces
f) race conditions

To control 2 power supplies, run 2 frontend programs, 1 per power supply.

To control 64 frontend cards, run 1 frontend with many threads: 64 (per device) + 1 (main thread) + 1 (RPC handler) + 1 
(watchdog) + 1 (common event generator/data transmitter) + 1 (odb/web page status update). You *will* bump into each 
and every one of the problems (a) to (f) above.

K.O.
    Reply  16 Sep 2019, Konstantin Olchanski, Forum, History plot problems for frontend with multiple indicies 
> thanks for your reply. I can confirm that your suggested workaround does indeed
> make the problem dissapear.
> I guess this issue hasn't been seen at T2K since we use MYSQL for the history.

I think you found the source of the problem, confused event id assignments. To confirm,
can you email me (or post here) the output of odbedit "ls -l /History/Events".

If that's the problem, you can avoid it completely by switching to a history storage method
that does not rely on magic mapping between equipment names and numeric event id's:
try the "FILE" method (set odb /Logger/History/FILE/Active to "y", restart the logger) or
the "MYSQL" method (you will need to setup a mysql database). You tell mhttpd and mhist which 
history data to read by setting ODB /History/LoggerHistoryChannel to one of the channel names 
from /logger/history/, restart mhttpd. (mhttpd and mhist used to print a message "reading history 
data from channel XXX", but somebody removed this message).

K.O.
    Reply  17 Sep 2019, Konstantin Olchanski, Info, New history plot facility 
> > On the mhttpd side, please capture the stack trace from the crash
> 
> here comes the stack trace (only happens when using safari 12.1.2 macOS 10.14.6):
> 
> #10 0x000000000041ce0f in check_digest_auth ...
>

The crash is in check_digest_auth() which checks the mongoose web server password (if not using 
password protection from the https proxy i.e. apache httpd).

If so you should see this crash on all pages, not just when you access history pages, yes?

Ok, I just checked, my safari is "Version 12.1.2 (13607.3.10)" and I see no immediate crash, even on 
history pages.

But I am macos 10.13.6, maybe that makes a difference.

If you see the safari crash on all pages, then it is not history-specific.

In this case, I would like you to file a bug report on bitbucket "mhttpd crash with safari" and we follow up 
on it there.

K.O.
    Reply  17 Sep 2019, Konstantin Olchanski, Forum, History plot problems for frontend with multiple indicies 
> [local:e666:S]History>ls -l /History/Events
> Key name                        Type    #Val  Size  Last Opn Mode Value
> ---------------------------------------------------------------------------
> 1                               STRING  1     10    2m   0   RWD  FeDummy02
> 0                               STRING  1     16    2m   0   RWD  Run transitions

Something is very broken. There should be more entries here, at least
there should be entries for "FeDummy01" and usually there is also an entry
for "FeDummy" because one invariably runs fedummy without "-i" at least once.

The fact that changing from "midas" storage to "file" storage makes no difference
also indicates that something is very broken.

I want to debug this.

Since you tried the "file" storage, can you send me the output of "ls -l mhf*.dat" in the directory
with the history files? (it should have the "*.hst" files from the "midas" storage and "mhf*.dat" files
from the "file" storage.

K.O.
Entry  27 Sep 2019, Konstantin Olchanski, Bug Fix, improvement for midas web page resource use 
I noticed that midas web pages consume unexpectedly large amount of resources, as observed by the chrome browser 
"task manager" and by other tools.

For example, size of the "status" page was observe to reach 200, 600 and even 900 Mbytes. The "programs" page (which 
does not have nearly as much stuff as the status page), was observed to reach 200-600 Mbytes. This is comparable to the 
New York Times front page, which has much more stuff, but usually runs at about 200 Mbytes. (they do force a periodic full 
page reload, to deal with exactly this same type of trouble, I suspect).

Also I observed the midas web pages consume an unusual amount of CPU - 5-10-15% - all in inactive tabs in minimized 
windows.

All this was quite noticeable in my oldish mac laptop with only 8 GBytes of RAM.

Using the google-chrome performance analyzer I was able to identify the reason of high memory use - our 1/sec periodic 
updates leak "too many" DOM "nodes" and I suspect that due to throttling of inactive tabs, the garbage collector simply 
does not keep up with us.

(Note that javascript features automatic memory management with garbage collection. In practice in means that where in 
C/C++ we have malloc() and free(), in javascript we only have malloc() and no free(), and cannot explicitly release memory 
we know we no longer need. In the C/C++ sense, all memory allocations are leaked, and one relies on a janitor to "clean it all 
up" eventually, later).

The source of node leakage was unexpected (unexpected to me). It turns out that each assignment to e.innerHTML creates 
a new node, even if the new contents is the same as the old contents. (also the html parser has to run, consuming extra cpu 
cycles).

Obvious solution is to write code like this:
if (v !== e.innerHTML) { e.innerHTML = v };

This helped quite a bit on the "programs" page, but not as much as expected, and hardly at all on the "status" page.

It turns out, that read of innerHTML does not necessarily return the same string as it was written into it.
For example, if "v" is "a&b", e.innerHTML will return "a&amp;b" and the comparison will misfire.
There is more cases like this, see the section "Test set and get e.innerHTML" on the "example" midas page.

To help dealing with this, I suggest that instead of "inline" comparison (as above), one writes this:
mhttpd_set_if_changed(e, v);

Then to check that the comparison is effective, go to mhttpd.js and uncomment the console.log() call in 
mhttpd_set_if_changed(), reload the page and look at the javascript console to see all calls that result
in assignment of innerHTML (and leakage of DOM nodes).

This done, after replacing many "&" with "&amp;" and many "\'" with "\"", node leakage on the "programs" page was reduced 
to 1 node per 1/sec update: the unavoidable change to the timestamp on the top-right of the page.

Luckily, Stefan pointed me to the solution for this: use of e.firstChild.data instead of e.innerHTML. The only quirk is that the 
node should not be empty, which was easy to arrange by setting the initial value of the timestamp to a dummy value.

With these changes, the "programs" page (and most other pages) now leak 0 nodes (from the 1/sec periodic updates). 

There is still some small memory leakage from making the RPC requests and from receiving the RPC replies, but the 
garbage collector seems to have no trouble with them.

Typical memory use for all midas pages is now 50-60 Mbytes (down from 100-200 Mbytes).

The "status" page took a bit more work to fix due to it's curious coding, but it, too now uses 50-60 Mbytes as well. It still 
leaks quite a few nodes (to be fixed!), but the garbage collector seems to keep up with the allocations.

K.O.
    Reply  27 Sep 2019, Konstantin Olchanski, Bug Fix, improvement for midas web page resource use (alarm sound) 
> I noticed that midas web pages consume unexpectedly large amount of resources, as observed by the chrome browser 
> "task manager" and by other tools.
> 
> For example, size of the "status" page was observe to reach 200, 600 and even 900 Mbytes.
> [this was fixed by using set_if_changed(e, v);
> 
> Also I observed the midas web pages consume an unusual amount of CPU - 5-10-15% - all in inactive tabs in minimized 
> windows.
> 

The case of high CPU use turned out to be quite nasty.

The symptoms:
- the "programs" page in an inactive tab in a minimized window sits "doing nothing" for a day or two.
- uses about 0 to 0.1 to 1% CPU and 40-50-60 Mbytes of RAM (after the previous improvements)
- suddenly I see it use 10-15-20% CPU, continuously, non stop
- I open this tab
- suddenly, CPU use goes to 100%, memory use quickly grows from 40-50-60 Mbytes to 100-200 Mbytes.
- after a few seconds everything settles down, CPU use is back to 0-0.1-1%, but memory use does not go down.
- WTH?!?

The culprit turned out to be the playing of the alarm sound. (I have all tabs "muted" by default, also speakers usually powered down).

If I comment-out the playing of the alarm sound, this problem goes away completely. Pretty conclusive, I think.

After adding lots of debug console.log() calls, I think I identified the problem: audio objects were being created,
but they were not starting to play their sound files. When I opened the tab, all of them (about 400) at the same time
loaded the mp3 file (resulting in memory use going from 50 Mbytes to 190 Mbytes, typical) and started playing
(as seen on the audio event activity in the cpu profile traces from the google-chrome "performance" tool).

I think I am looking at an unexpected interaction between audio objects and google-chrome throttling of inactive tabs.

To muddy the waters some more, google-chrome periodically fails audio.play() with an exception to the effect of
"we will not play audio because user is not interacting with this page enough". See
https://bitbucket.org/tmidas/midas/issues/191/exception-on-audioplay

Now I think I have this sort of fixed. I have to handle the audio.play() failure (which is not a normal exception,
but a rejected promise, the handler is quite different), and I do not allow creating new audio objects if previous
audio object did not finish playing.

(note the "normal" timing: periodic update every 1 sec, playing of alarm sound event 60 seconds, length of alarm sound file is 3 sec,
two sound files should never overlap. now a console.log message is printed if overlap is detected)

This leaves us with the problem of alarm sound not playing "because the user didn't interact with the document first",
and I think there is nothing I can do about that.

K.O.

P.S. Another quirk is I discovered: go to the "config" page and press the new buttons "play test sound" and "speak test message". In muted
tabs, the test sound will not sound, but the test message will be shouted out loudly. This seems inconsistent to me. Unwanted audio/video ads
are blocked but loud shouting of "shave with burma-shave" is permitted. I also wonder if speaking is subject to this
"user did not interact" business. If not, we could replace the playing of our relaxing alarm beep with the yelling of "alarm! alarm! alarm!".

K.O.
ELOG V3.1.4-2e1708b5