Back Midas Rome Roody Rootana
  Midas DAQ System, Page 104 of 155  Not logged in ELOG logo
    Reply  08 Jul 2019, Konstantin Olchanski, Info, Limitations of MSL 
> Konstantin, would you mind resurrecting and sharing the python code?

Not until September or later.

I encourage you to start an "issue" in the bitbucket repository (a "request for improvement") and contact
Thomas L. to see if we can/should discuss it at the incoming midas workshop at TRIUMF.

> So far, my workaround is to call odbedit, which is not ideal for many reasons.

Yes, this has been the way to do it for years...

Also I was thinking of writing an command line tool to invoke midas json-rpc methods (via mhttpd),
then you can do from command line or from a script everything that a web page can do. (of course
you can do it today, by using "curl" to send http requests to mhttpd directly).

K.O.
    Reply  08 Jul 2019, Konstantin Olchanski, Info, Limitations of MSL 
Hi, Stefan, on second thought, I agree, I do not know of any scripting language implementation (packaged as a library or not) that
can store it's state in a file ("checkpoint the execution") and that can execute it's program "one line at a time", like the midas
sequencer does now. In the most severe case, one invocation of msequencer executes one line of the msl script.

K.O.

> Sure some existing scripting languages can be used, but they fall short of a few important items in larger experiments:
> 
> - they are typically run from a local terminal in the counting house. A remote observer of the experiment has no idea which script is running and at which state it is.
> 
> - if DAQ crashes during a script or is aborted, it has to be restarted from the beginning. If you run a sequence of let's say 100 runs taking 8 hours, and on run #98 something goes wrong, you are screwed if you have to start at run #1 again.
> 
> This are the main reasons why I developed the midas sequencer. Having everything web-based, everybody can watch remotely how far the sequence progressed. If the whole DAQ crashes, the sequence resumes from the crash point, not from the beginning. This is by saving the current state into the ODB. So even if the sequencer itself is stopped and restarted, that still works.
> 
> I agree that the MSL is missing a few calculations, and I was just waiting to get a few specific requests. I will either add new functions such as basic calculations like adding and subtracting variables, or I will create a way to call an external shell like bash to do calculations. I put this high on my todo list.
> 
> Stefan
    Reply  08 Jul 2019, Konstantin Olchanski, Bug Report, Frontend killed at stop of run 
> > run the frontend inside gdb and post the stack trace after the crash?
> > 
> > if there is no crash (the program is stopped by exit()), you may need
> > to set a breakpoint in exit() or _exit() (not sure what it's latest name is)
> > then with luck your stack trace will show who/what called it from where.
> > 
> 
> If I remember correctly from the last time I tried that, it doesn't use the exit
> function but gdb just reports that the program was terminated and no longer exists. I
> can't set a breakpoint on SIGKILL as the point of SIGKILL is to kill the program and
> gdb can't set a break at that point afaik.

For SIGKILL, my gdb reports "Program terminated with signal SIGKILL, Killed." and there is no stack 
trace. Is this what you see?

If your program stops "normally", not from receiving some signal, set breakpoints on "exit" and 
"_exit".

The normal stop sequence is to call exit(), which runs all the atexit() functions (the midas atexit() 
function prints the message about "cm_disconnect_experiment not called at end of program") and 
calls _exit() to stop the program.

So if you see the midas message "cm_disconnect_experiment not called at end of program", it is a 
good indication that somebody (not mfe.c) called exit() on you. A breakpoint on "exit" should catch 
who does it.

Good luck,
K.O.
    Reply  08 Jul 2019, Konstantin Olchanski, Bug Report, Frontend killed at stop of run 
> > 
> > For SIGKILL, my gdb reports "Program terminated with signal SIGKILL, Killed." and there is no stack 
> > trace. Is this what you see?
> 
> Yes, that is exactly what I remember seeing.
> 

Where would a SIGKILL come from?!?

Look in the syslog (/var/log/messages). If the program was killed by the linux kernel, it would be logged there,
the usual cause is the machine runs out of memory and programs are killed by the OOM killer, this is logged
into the syslog, always.

MIDAS also can issue a SIGKILL sometimes, again this is always logged in midas.log. see src/midas.c, search for SIGKILL to see 
the exact messages printed before it is sent out.

K.O.
    Reply  10 Jul 2019, Konstantin Olchanski, Bug Report, Frontend killed at stop of run 
> ... finding a current midas.log file

On the "help" page, see "midas.log".

Same information is in ODB, the midas log file name is concatenation of "/Logger/Data dir" and "message file".

K.O.
    Reply  10 Jul 2019, Konstantin Olchanski, Bug Report, Header files missing when trying to compile rootana, roody and analyzer 
>> [hh19285@it038146 ~]$ which root-config
> /software/root/v6.06.08/bin/root-config
> [hh19285@it038146 ~]$ root-config --cflags
> -pthread -std=c++11 -Wno-deprecated-declarations -m64 -I/software/root/v6.06.08/include
> 
> [hh19285@it038146 build]$ ./analyzer
> Warning in <TClassTable::Add>: class TApplication already in TClassTable
> ...
> ...
> #2  0x00007f7e911b21a4 in TUnixSystem::StackTrace() () from /usr/lib64/root/libCore.so.6.16

You have a mismatch. Your root-config thinks ROOT is installed in /software/..., but the crash
dump says your ROOT libraries are in /usr/lib64/root (not in /software/...).

You can confirm that you are linking against the correct ROOT by running cmake with VERBOSE=1
and examine the linker command line to see what library link path is specified for ROOT.

You can confirm which ROOT library is actually used when you run the analyzer
by running "ldd ./analyzer". You should see the same library paths as specified
to the linker (/software/.../lib*.so). A mismatch can be caused by the setting of LD_LIBRARY_PATH
and by 100 other reasons.

I suggest that you remove the "wrong" ROOT before you continue debugging this.

K.O.
    Reply  11 Jul 2019, Konstantin Olchanski, Bug Report, Frontend killed at stop of run 
> Wed Jul 10 06:23:58 2019 [mhttpd,ERROR] [system.c:4580:ss_recv_net_command,ERROR] timeout receiving network  command header
> Wed Jul 10 06:23:58 2019 [mhttpd,ERROR] [midas.c:10322:rpc_client_call,ERROR] call to "fedescant" on  "grsmid00.triumf.ca" RPC "rc_transition": timeout waiting for reply

We should have started debugging from here. The error messages mean: your frontend is not responding to run transition (RPC timeout).

> problem in the communication via the A3818 card from CAEN.

Yes, this has been problematic before.

K.O.
    Reply  11 Jul 2019, Konstantin Olchanski, Bug Report, Header files missing when trying to compile rootana, roody and analyzer 
> > You can confirm that you are linking against the correct ROOT by running cmake with VERBOSE=1
> > and examine the linker command line to see what library link path is specified for ROOT.
> 
> $ make VERBOSE=1
> to see the command lines.
>

Most likely, they forgot to rerun "cmake" after installing a new ROOT. The joys of a two-step build (cmake; make).

K.O.
Entry  11 Jul 2019, Konstantin Olchanski, Bug Report, problems with the default mhttpd configuration 
We installed recent mhttpd on a ubuntu machine and discovered a number of problems
with the default mhttpd settings.

We did follow the normal instructions to install and configure an apache https proxy
with a certbot certificate and password protection, this part worked ok. Big thanks
to Lars M. for providing the Ubuntu instructions for apache.

Then we started seeing errors from mhttpd about access to URLs like "manager/html" 
(google "manager/html exploit") that did not go through the proxy.

It turns out that unlike CentOS-7, Ubuntu LTS 18.04 does not run a restrictive firewall
and access to mhttpd ports 8080 and 8443 is not blocked. Then, it turns out that by 
default, the mhttpd access controls are also disabled, and it accepts http requests from 
anywhere/everywhere. Also by default, the mhttpd password is also disabled.

As result, anybody from anywhere can access mhttpd without a password.

One fix for this is to activate the mhttpd access control list by setting ODB 
/Experiment/Security/allowed hosts[0] to "localhost".

K.O.
    Reply  11 Jul 2019, Konstantin Olchanski, Bug Report, problems with the default mhttpd configuration, also elogd 
> It turns out that unlike CentOS-7, Ubuntu LTS 18.04 does not run a restrictive firewall
> and access to mhttpd ports 8080 and 8443 is not blocked
>
> As result, anybody from anywhere can access mhttpd without a password.
> 

elogd can suffer from the same problem, but not as badly, one can connect to elogd and attempt to run 
exploits, but one cannot access elog entries without a password:

a) default configuration is to ask for a password
b) elogd almost immediately redirects to the https URL specified in the URL entry of the config file, which 
normally points to the https proxy, which also immediately asks for a password.

In the absence of firewall protection (as on Ubuntu), 
add "Interface = 127.0.0.1" to the elog config file or run elogd with "-n localhost",
per instructions at https://elog.psi.ch/elog/config.html 

K.O.
    Reply  11 Jul 2019, Konstantin Olchanski, Bug Report, rework of mhttpd configuration 
> Ubuntu LTS 18.04 does not run a restrictive firewall and access to mhttpd ports 8080 and 8443 is not 
blocked.

Clearly, the present defaults settings of mhttpd are out of date.

The best I remember our internal discussions, we have converged on the following new default settings:

- mhttpd only listens on the localhost interface
- only accepts http (not https)
- password protection is off

These settings allow one to easily test midas on a laptop or on a single-user computer.

They also happen to be the correct settings when using an https proxy (i.e. apache httpd).

If the https proxy cannot be on the same computer, (i.e. ALPHA at CERN):

- one would enable mhttpd to listen on the external network interface
- this will enable the mhttpd access controls (ODB /expt/security/mhttpd hosts/allowed hosts)
- one would allow the https proxy machine access to mhttpd by adding it's hostname to "allowed hosts".

In the case where a separate https proxy cannot be used:

- one would enable https on the external network interface
- one would have to obtain an https certificate (there is possibility of adding certbot integration to mhttpd, 
if there is demand for this)
- this will activate the mhttpd password protection, so one would have to define a username and password 
in the .htdigest file (this is done by the mongoose web server library).

I was planning to implement these changes when I update the mongoose web server library to the latest 
version (fixes a memory leak and improves/simplifies multithreading).

But maybe I should implement them sooner.

I am also thinking of adding a proxy function to mhttps (same as "ProxyPass" in apache httpd), set ODB 
/Proxy/webcam to "http://webcam_on_private_network/magic_webcam_url", and access to 
https://midas/webcam will return the data from the webcam without having to set this up in apache httpd 
(requires root access, etc).

K.O.
    Reply  12 Jul 2019, Konstantin Olchanski, Bug Report, rework of mhttpd configuration 
> > - this will activate the mhttpd password protection, so one would have to define a username and password 
> > in the .htdigest file (this is done by the mongoose web server library).
> 
> Actually I'm thinking since a while to have user-level access to mhttpd, similarly to elog.
>

With per-user login, we have the possibility to add better permissions/access controls. In past
discussions we talked about 3 levels of user access:

- read-only user: can look, but cannot affect anything
- operator: same as read-only user, but can start/stop runs, can clear alarms, can push buttons on custom pages, can cause predefined scripts to run, etc.
- root user: can do everything

Technically, this is easy to implement in the mjsonrpc library: each username will be mapped to a privilege level,
and each rpc request handler will specify minimum required privilege: odb write rpc would require root level,
run start would require operator level, odb read permitted for everybody. This will be enforced inside mhttpd.

>
> Each user has to log in with a unique username/password. After some time of inactivity, you're logged out.
>

For now, we use the password protection built into the apache httpd web server.

It is known to be secure, but it does not have the "advanced" user management functions
that we take for granted with the elog, with wiki pages, with github, etc. Missing are self-registration
with approval, password reset and recovery and so forth.

On the other hand, apache httpd is supposed to be easy to integrate with "enterprise" user management
systems, like the CERN single-sign-on system. (We did not look yet at the integration with the TRIUMF
single-sign-on system, based on Microsoft AD).

(I see the nginx web server is gaining in popularity, but I do not know what features it has
for user and password management).

The elog software does have very good user and password management, and we could bring it into midas,
if we figure out how to ensure that it is actually secure. I know a professional security audit was done
for the elog software and I know that mhttpd will not pass such an audit.

But with some extra work it is possible.

>
> This would have the advantage that one knows who is active where, like when using the chat functionality in mhttpd. Or who started/stopped a run etc. This might not be necessary for simple local installations, but if you have 20 
> people controlling an experiment from three different continents simultaneously, this could be beneficial. Using the elog authentication libraries, one could even forward the login process to LDAP or KERBEROS, 
> so you could log in with out institutional account, and don't have to remember an additional password.
> 
> Just some food for thought.
> 

Some of this food looks very good, indeed.

K.O.
    Reply  16 Jul 2019, Konstantin Olchanski, Bug Report, a3818 and signals, Frontend killed at stop of run 
Message from John M O'Donnell <odonnell@lanl.gov>

Folks,

The following might be related, if so great, if not sorry for the spam.

We had problems with MIDAS/CAEN_A3818 until two things happened:

1) CAEN found the root cause of a problem, as the A3818 and MIDAS both
used unix alarm signals, resulting in clashes.  CAEN modified the
A3818 driver to have a "no alarm" option.

2) after downloading the modified driver, edit src/a3818.c to #define
USE_MIDAS 1 somewhere near the top.

Hope this helps,

John.
    Reply  16 Jul 2019, Konstantin Olchanski, Bug Report, a3818 and signals, Frontend killed at stop of run 
> Message from John M O'Donnell <odonnell@lanl.gov>
>
> the A3818 and MIDAS both used unix alarm signals, resulting in clashes.
>

FWIW, current midas no longer uses alarm signals. See forum messages and git commits about 
removal of cm_watchdog().

K.O.
Entry  21 Jul 2019, Konstantin Olchanski, Info, error handling is hard 
Happy summer to everybody.

When programming in general, and when programming MIDAS, there is always a struggle
with error handling. Too much? Too little? Visually, some MIDAS functions are 90% error handling, 10% useful work, if that.

What is the right way to do this?
Where is the balance?
Would c++ exceptions help or hinder?
How to make it better?

It turns out that the Go community have been struggling with exactly this for the last year or so.

Read the (ultimately rejected) proposal for improved error handling in Go. All the problems and difficulties
and struggles facing the programmer and the language designer spread right in front of us.

https://go.googlesource.com/proposal/+/master/design/go2draft-error-handling-overview.md

(To remember, Go is this: https://www.oreilly.com/library/view/the-go-programming/9780134190570/
The language designers are Brian W. Kernighan, Rob Pike, Ken Thompson and "some other people"
(Dennis Ritchie is no longer with us). These people gave us UNIX and C (and C++, the C++ guy was
their next-door-office mate), when that crowd speaks, I listen)

That write-up refers to some blogger's vivid illustration how correct error handling is hard,
with special focus on error handling in the presence of exceptions:

https://devblogs.microsoft.com/oldnewthing/?p=39683
https://devblogs.microsoft.com/oldnewthing/?p=36693

I read all this stuff and said, "wow, this is the reader's digest version of my life in computer programming".

The clincher from this last guy? "My point isn’t that exceptions are bad.
My point is that exceptions are too hard and I’m not smart
enough to handle them."

"back to writing some error handling code",
K.O.
Entry  08 Aug 2019, Konstantin Olchanski, Info, c++11 for RHEL/SL/CentOS-6 
The default el6 (RHEL/SL/CentOS-6) compiler is gcc-4.4.7, it does not support c++11, not even a little bit.

Do this to install newer c++ compilers and build MIDAS with c++11:

ssh root@sl6machine
# yum install centos-release-scl-rh
# yum install devtoolset-8
# yum install cmake3
# scl -l
devtoolset-8
...

$ ssh user@sl6machine
$ scl enable devtoolset-8 bash
$ gcc -v
COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-8/root/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
gcc version 8.3.1 20190311 (Red Hat 8.3.1-3) (GCC) 
$ cd git/midas
$ make cclean
$ make cmake3
$ ls -l bin/odbedit

K.O.
Entry  08 Aug 2019, Konstantin Olchanski, Info, MIDAS will use C++11 
After much discussion, and following the MIDAS workshop at TRIUMF, we made the decision to use C++11 in MIDAS.

There are many benefits, and only one drawback - no c++11 compilers in the default OS install on older computers (i.e. 
RHEL/SL/CentOS before el7). (the same applies to our use of cmake).

Specifically for el6, the solution is to use c++11 compatible gcc-8 from devtoolset-8, see 
https://midas.triumf.ca/elog/Midas/1649

The c++11 features we most welcome - initialization of class members at declaration time (no more forgetting to add initialization to 
each and every constructor), c++ threads and mutexes, lambdas and "auto".

K.O.
    Reply  08 Aug 2019, Konstantin Olchanski, Suggestion, midas cmake migration 
> Each CMakeLists.txt should specify which version of CMake it requires. The MIDAS CMakeLists.txt requires CMake 3.1 or later. 
> We deliberately stayed away from fancy cutting edge CMake features in order to make midas easier to compile. On top of that,
> midas is a much simpler package compared to root, so things are not so complicated.

The oldest cmake I actually used is 3.6.1 (on SL6), so I do not know if cmake versions between 3.1 and 3.6 actually work for us. Perhaps we should set 
the CMakefile requirement to 3.6.1 to match the oldest version we know works. If somebody has an older cmake, they have a choice of updating it or 
trying it as as and reporting success/failure to the midas forum here.

K.O.
    Reply  08 Aug 2019, Konstantin Olchanski, Bug Report, ROOTANA bug? 
> indnerlt:rootana lindner$ git diff libMidasInterface/TMidasOnline.cxx
> diff --git a/libMidasInterface/TMidasOnline.cxx b/libMidasInterface/TMidasOnline.cxx
> index 92eb3e9..67da613 100644
> --- a/libMidasInterface/TMidasOnline.cxx
> +++ b/libMidasInterface/TMidasOnline.cxx
> @@ -191,7 +191,7 @@ bool TMidasOnline::sleep(int mdelay)
>    #ifdef CH_IPC
>    ss_suspend_set_dispatch(CH_IPC, 0, NULL);
>    #else
> -  ss_suspend_set_dispatch_ipc(NULL);
> +  //  ss_suspend_set_dispatch_ipc(NULL);
>    #endif
>   int status = ss_suspend(mdelay, 0);
>    if (status == SS_SUCCESS)
> 
> This compiles and at least runs for me; so maybe that is helpful for you.  But Konstantin will provide a longer term solution.


This is a problem with the latest development version of MIDAS. ss_suspend overrides have been removed from system.cxx
and there is no way for rootana to avoid the problem of recursive call to the event handler.

I recommend that instead of using the latest development version of MIDAS you use one of the recent released versions (use "git tag -l", midas-2019-06-b is the latest release).

All the released versions of midas have the ss_suspend overrides implemented and rootana will work correctly. For the next release of midas
I will restore the ss_suspend override and update the rootana code.


K.O.
    Reply  09 Aug 2019, Konstantin Olchanski, Info, Precedence of equipment/common structure 
> Today I fixed a long-annoying problem. ...
> /Equipment/<name>/Common
> In the past, the ODB setting took precedence over the frontend structure...
> We defined this like 25 years ago and I forgot what the exact reason was.
> It causes however many people (including myself) to fall into this trap: ...

There is good number of confusions regarding entries in /eq/xxx/common:

- for some of them, the frontend code settings take precedence and overwrite settings in odb ("frontend file name")
- for some of them, ODB takes precedence and frontend code values are ignored ("read on" and "period")
- for some of them, changes in ODB take effect immediately (via db_watch) ("period")
- for some of them, frontend restart is required for changes to take effect (output event buffer name "buffer")
- some of them continuously update the odb values ("status", "status color")

I do not think there is a simple way to improve on this.

(One solution would replace the single "common" with several subdirectories, "per function",
one would have items where the code takes precedence, one would have items where odb takes
precedence (in effect, "standard settings"), one will have items that the frontend always updates
and that should not be changes via odb ("frontend name", etc). I am not sure this one solution
is necessarily an "improvement").

Lacking any ideas for improvements, I vote for the status quo. (plus a review of the documentation to ensure we have clearly
written up what each entry in "common" does and whether the user is permitted to edit it in odb).

K.O.
ELOG V3.1.4-2e1708b5