Back Midas Rome Roody Rootana
  Midas DAQ System, Page 99 of 146  Not logged in ELOG logo
    Reply  13 Jun 2023, Konstantin Olchanski, Suggestion, Maximum ODB size 
> > I remember the same, but I tracked it down in git to the very first commit, and there is no if() there,
> > odb is saved to .ODB.SHM on every client shutdown, not just the last client. I guess we both misremebered.

small problem. build an experiment, start taking data, observe how ODB is never saved to disk because the "last client" never stops. as bonus, crash 
the computer, observe how all changes to ODB are now lost. if mlogger is configured to save odb.json at the end of run, and to write ODB dumps at 
begin and end of every data file, you can recover some of the lost data.

for better effect, ODB should be dumped to disk at periodic intervals. but. current implementation writes odb to disk while holding the ODB 
semaphore, which means all ODB access stops for the duration, specifically, there will be gaps in the history because mlogger cannot read history 
data from ODB.

a better implementation could take the ODB lock, make a copy of ODB shared memory, release the ODB lock, complete writing to disk without holding the 
lock. protection is needed against 100 midas programs trying to do this all at the same time. computers with 0.5 GB RAM (many ARM FPGA SoCs) will be 
limited to ~100 Mbyte ODB). plus deal with memory allocation failures when taking a copy of a 2GB ODB.

in theory, the mmap() shared memory (already implemented in midas) does this automatically, but we lose control
over disk writes, we see some OSes write odb to disk "too often" and at wrong times, i.e. while we are in the middle
of creating or deleting something. current sequence of open(), atomic write() and close() ensures ODB.SHM always
contains a valid odb. (minus loss of OS and disk caches to crash or power loss).

K.O.
    Reply  13 Jun 2023, Konstantin Olchanski, Suggestion, Maximum ODB size 
> > BTW, how do I resize the ODB.

ODB cannot be resized "online". Everything has to stop, save content to odb.json, get rid of old ODB.SHM, ensure ODB shared memory is destroyed (SysV or POSIX shared memory), 
create new ODB with new size, load odb.json. Feel free to punch this into chatgpt > odbresize.cxx, commit, test, push.

> I remember we discussed this some time ago, and concluded that odbedit needs a resize flag.

ODB cannot be resized online. ODB API has ODB clients holding ODB handles which are pointers (offsets) into ODB shared memory.

> Has this even been done?
> I guess this is still not done and the issue is still open: https://bitbucket.org/tmidas/midas/issues/329/need-odbresize
> I guess if we touch this maybe the problem with the wrong size should be also fixed: https://bitbucket.org/tmidas/midas/issues/328/odbinit-s-1024mb-creates-odb-with-wrong

please contribute 14 distraction-free days to my patreon. thanks in advance!

K.O.
    Reply  13 Jun 2023, Stefan Ritt, Suggestion, Maximum ODB size 
> small problem. build an experiment, start taking data, observe how ODB is never saved to disk because the "last client" never stops. as bonus, crash 
> the computer, observe how all changes to ODB are now lost. if mlogger is configured to save odb.json at the end of run, and to write ODB dumps at 
> begin and end of every data file, you can recover some of the lost 

The new behavior is not much worse than before. Assume 10 programs running happily for days, computer crashes, all ODB changes lost. 
So indeed a periodic flush without holding the lock might be best. Use a semaphore to prevent all programs flushing at the same time, or put
the flush only in the logger after an end of run.

Stefan
    Reply  13 Jun 2023, Konstantin Olchanski, Suggestion, Maximum ODB size 
> 
> > small problem. build an experiment, start taking data, observe how ODB is never saved to disk because the "last client" never stops. as bonus, crash 
> > the computer, observe how all changes to ODB are now lost. if mlogger is configured to save odb.json at the end of run, and to write ODB dumps at 
> > begin and end of every data file, you can recover some of the lost 
> 
> The new behavior is not much worse than before. Assume 10 programs running happily for days, computer crashes, all ODB changes lost. 
> So indeed a periodic flush without holding the lock might be best. Use a semaphore to prevent all programs flushing at the same time, or put
> the flush only in the logger after an end of run.

are you sure? when/how often does "last midas program finishes" happen? it does not happen on a system crash, not on power loss, not on "shutdown -r now" 
(I am pretty sure). In the experiments you run, how often do you shut down all programs (and check that you did not forget one somehow)?

sanity check. dragon experiment, very active, .ODB.SHM timestamp is 1 second old. not-very-active agmini, today is June 13th, timestamp of .ODB.SHM is June 
2nd. inactive TACTIC, timestamp of .ODB.SHM is May 16th.

so yes, not great, but in the new scheme, ODB.SHM timestamps would probably be from 2021 or 2020.

my vote is to undo this change, it is dangerous because it causes odb to be saved to ODB.SHM never.

K.O.
    Reply  13 Jun 2023, Stefan Ritt, Suggestion, Maximum ODB size 
> are you sure? when/how often does "last midas program finishes" happen? it does not happen on a system crash, not on power loss, not on "shutdown -r now" 
> (I am pretty sure). In the experiments you run, how often do you shut down all programs (and check that you did not forget one somehow)?

Indeed this is almost never the case, maybe once per months. On the other hand, we have a complete crash of the os maybe once a year. Most of the time the programs 
run continuously (we do not need odbedit), so our timestamp is typically one or two days old, so not good either.

> my vote is to undo this change, it is dangerous because it causes odb to be saved to ODB.SHM never.

My vote is to flush the odb either periodically or after each run.

Stefan 
    Reply  15 Jun 2023, Konstantin Olchanski, Suggestion, Maximum ODB size 
> 
> > are you sure? when/how often does "last midas program finishes" happen? it does not happen on a system crash, not on power loss, not on "shutdown -r now" 
> > (I am pretty sure). In the experiments you run, how often do you shut down all programs (and check that you did not forget one somehow)?
> 
> Indeed this is almost never the case, maybe once per months. On the other hand, we have a complete crash of the os maybe once a year. Most of the time the programs 
> run continuously (we do not need odbedit), so our timestamp is typically one or two days old, so not good either.
> 
> > my vote is to undo this change, it is dangerous because it causes odb to be saved to ODB.SHM never.
> 
> My vote is to flush the odb either periodically or after each run.
> 

So we are in agreement.

RFE filed:
https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically

Dangerous change reverted:
60e4c44ad66346b89ba057391acf7a02890049be

K.O.

bash-3.2$ git diff
diff --git a/src/odb.cxx b/src/odb.cxx
index 0d3b88c2..d104ff28 100644
--- a/src/odb.cxx
+++ b/src/odb.cxx
@@ -2199,7 +2199,14 @@ INT db_close_database(HNDLE hDB)
       destroy_flag = (pheader->num_clients == 0);
 
       /* flush shared memory to disk */
-      if (destroy_flag)
+
+      /* if we save ODB to disk only after last client finishes, we will never save ODB to disk
+         in most experiments - none of them ever completely stop MIDAS in normal operation.
+         as result, all changes to ODB contents will be lost on system crash, power loss
+         or normal reboot. see https://daq00.triumf.ca/elog-midas/Midas/2539
+         K.O. June 2023. */
+
+      if (1 || destroy_flag)
          ss_shm_flush(pheader->name, pdb->shm_adr, pdb->shm_size, pdb->shm_handle);
 
       strlcpy(xname, pheader->name, sizeof(xname));

K.O.
    Reply  28 Jul 2023, Stefan Ritt, Suggestion, Maximum ODB size 
> RFE filed:
> https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically

Implemented and closed: https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically

Stefan
    Reply  09 Aug 2023, Konstantin Olchanski, Suggestion, Maximum ODB size 
> > RFE filed:
> > https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
> 
> Implemented and closed: https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
> 
> Stefan

Stefan's comments from the closed bug report:

Ok I implemented some periodic flushing. Here is what I did:

Created

/System/Flush/Flush period : TID_UINT32 /System/Flush/Last flush : TID_UINT32

which control the flushing to disk. The default value for “Flush period” is 60 seconds or one minute.

All clients call db_flush_database() through their cm_yield() function
db_flush_database() checks the “Last flush” and only flushes the ODB when the period has expired. This test is 
done inside the ODB semaphore so that we don’t get a race condigiton
If the period has expired, db_flush_database() calls ss_shm_flush()
ss_shm_flush() tries to allocate a buffer of the shared memory. If the allocation is not successful (out of 
memory), ss_shm_flush() writes directly to the binary file as before.
If the allocation is successful, ss_shm_flush() copies the share memory to a buffer and passes this buffer to a 
dedicated thread which writes the buffer to the binary file. This causes ss_shm_flush() to return immediately and 
not block the calling program during the disk write operation.
Added back the “if (destroy_flag) ss_shm_flush()” so that the ODB is flushed for sure before the shared memory 
gets deleted.
This means now that under normal circumstances, exiting programs like odbedit do NOT flush the ODB. This allows to 
call many “odbedit -c” in a row without the flush penalty. Nevertheless, the ODB then gets flushed by other 
clients latest 60 seconds (or whatever the flush period is) after odbedit exits.

Please note that ODB flushing has two purposes:

When all programs exit, we need a persistent storage for the ODB. In most experiments this only happens very 
seldom. Maybe at the end of a beam time period.
If the computer crashes, a recent version of the ODB is kept on disk to simplify recovery after the crash.
Since crashes are not so often (during production periods we have maybe one hardware failure every few years) the 
flushing of the ODB too often does not make sense and just consumes resources. Flushing does also not help from 
corrupted ODBs, since the binary image will also get corrupted. So the only reason for periodic flushes is to ease 
recovery after a total crash. I put the default to 60 seconds, but if people are really paranoid they can decrease 
it to 10 seconds or so. Or increase it to 600 seconds if their system does not crash every week and disks are 
slow.

I made a dedicated branch feature/periodic_odb_flush so people can test the new functionality. If there are no 
complaints within the next few days, I will merge that into develop.

Stefan
Entry  06 Jun 2004, Konstantin Olchanski, , Makefile: set -rpath 
I commited Makefile bits to set the RPATH on dynamically linked executables
to find libmidas.so and ROOT shared libraries without setting
LD_LIBRARY_PATH , etc. K.O.
Entry  28 Jun 2020, Konstantin Olchanski, Info, Makefile update 
I reworked the MIDAS Makefile to simplify things and to remove redundancy with functions 
provided by cmake.

When you say "make", the list of options is printed.

The first and main options are "make cmake" and "make cclean" to run the cmake build.

This is my recommended way to build midas - the output of "make cmake" was tuned to provide 
the information need to debug build problems (all compiler commands, command line switches 
and file paths are reported). (normal "cmake VERBOSE=1" is tuned for debugging of cmake and 
for maximum obfuscation of problems building the actual project).

Build options are implemented through cmake variables:

options that can be added to "make cmake":
      NO_LOCAL_ROUTINES=1 NO_CURL=1
      NO_ROOT=1 NO_ODBC=1 NO_SQLITE=1 NO_MYSQL=1 NO_SSL=1 NO_MBEDTLS=1
      NO_EXPORT_COMPILE_COMMANDS=1

for example "make cmake NO_ROOT=1" to disable auto-detection of ROOT.

Two more make targets create reduced builds of midas:

"make mini" builds a subset of midas suitable for building frontend programs. Big programs 
like mlogger and mhttpd are excluded, optional components like CURL or SQLITE are not needed.

"make remoteonly" builds a subset of midas suitable for building remotely connected 
frontends. Big parts of midas are excluded, many system-dependent functions are excluded, 
etc. This is intended for embedded applications, such as fpga, uclinux, etc.

But wait, there is more. Here is the full list:

daqubuntu:midas$ make
Usage:

   make cmake     --- full build of midas
   make cclean    --- remove everything build by make cmake

   options that can be added to "make cmake":
      NO_LOCAL_ROUTINES=1 NO_CURL=1
      NO_ROOT=1 NO_ODBC=1 NO_SQLITE=1 NO_MYSQL=1 NO_SSL=1 NO_MBEDTLS=1
      NO_EXPORT_COMPILE_COMMANDS=1

   make dox       --- run doxygen, results are in ./html/index.html
   make cleandox  --- remove doxygen output

   make htmllint  --- run html check on resources/*.html

   make test      --- run midas self test

   make mbedtls   --- enable mhttpd support for https via the mbedtls https library
   make update_mbedtls --- update mbedtls to latest version
   make clean_mbedtls  --- remove mbedtls from this midas build

   make mtcpproxy --- build the https proxy to forward root-only port 443 to mhttpd https 
port 8443

   make mini      --- minimal build, results are in linux/{bin,lib}
   make cleanmini --- remove everything build by make mini

   make remoteonly      --- minimal build, remote connetion only, results are in linux-
remoteonly/{bin,lib}
   make cleanremoteonly --- remove everything build by make remoteonly

   make linux32   --- minimal x86 -m32 build, results are in linux-m32/{bin,lib}
   make clean32   --- remove everything built by make linux32

   make linux64   --- minimal x86 -m64 build, results are in linux-m64/{bin,lib}
   make clean64   --- remove everything built by make linux64

   make linuxarm  --- minimal ARM cross-build, results are in linux-arm/{bin,lib}
   make cleanarm  --- remove everything built by make linuxarm

   make clean     --- run all 'clean' commands

daqubuntu:midas$ 

K.O.
    Reply  15 Jul 2020, Stefan Ritt, Info, Makefile update 
Please note that you can also compile midas in the standard cmake way with

$ mkdir build
$ cd build
$ cmake ..
$ make install

in the root midas directory. You might have to use "cmake3" on some systems.

Stefan
Entry  10 May 2023, Lukas Gerritzen, Suggestion, Make sequencer more compatible with mobile devices 
When trying to select a run script on an iPad or other mobile device, you cannot enter subdirectories. This is caused by the following part:
if (script.substring(0, 1) === "[") {
   // refuse to load script if the selected a subdirectory
   return;
}

and the fact that the <option> elements are listening for double click events, which seem to be impossible on a mobile device.

The following modification allows browsing the directories without changing the double click behaviour on a desktop:
diff --git a/resources/load_script.html b/resources/load_script.html
index 41bfdccd..36caa57f 100644
--- a/resources/load_script.html
+++ b/resources/load_script.html
@@ -59,6 +59,28 @@
 </div>

 <script>
+   document.getElementById("msg_sel").onchange = function() {
+      script = this.value;
+      button = document.getElementById("load_button");
+      if (script.substring(0, 4) === "[..]") {
+         // Change button to go back
+         enable_button_by_id("load_button");
+         button.innerHTML = "Back";
+         button.onclick = up_subdir;
+      } else if (script.substring(0, 1) === "[") {
+         // Change button to load subdirectory
+         enable_button_by_id("load_button");
+         button.innerHTML = "Enter subdirectory";
+         button.onclick = load_subdir;
+      } else {
+         // Change button to load script
+         enable_button_by_id("load_button");
+         button = document.getElementById("load_button");
+         button.innerHTML = "Load script";
+         button.onclick = load_script;
+      }
+   }
+
 function set_if_changed(id, value)
 {
    var e = document.getElementById(id);

This makes the code quoted above redundant, so the check can actually be omitted.
    Reply  10 May 2023, Stefan Ritt, Suggestion, Make sequencer more compatible with mobile devices 

Lukas Gerritzen wrote:
When trying to select a run script on an iPad or other mobile device, you cannot enter subdirectories. This is caused by the following part:


We are working right now on a general file picker, which will replace also the file picker for the sequencer. So please wait until the new thing is out and then test it there.

Stefan
Entry  11 Jul 2011, Konstantin Olchanski, Info, Make "STOP" run transition always succeed 
Over the years, there was some back-and-forth changes in what happens to run transitions when some 
of the participants misbehave (do not respond to RPC calls, timeout, crash, etc).

The very original behaviour was to ignore all errors. This resulted in user confusion when some clients 
would start, some would not, data from frontends that missed the transition did not arrive, etc.

So it was changed to fail the transition if any client misbehaves.

This left mlogger (who is usually the first one to see the TR_START transition) in a funny state - output 
file is open, etc, but there is no run active. This was fixed by adding a TR_STARTABORT transition to tell 
mlogger, event builder & co that the just started run did not start after all.

Also at some point code was added to forcefully kill clients that do not respond to run transitions (do 
not respond to RPC, timeout, etc).

Recently, it was observed how during unattended overnight operation of a MIDAS DAQ system, with the 
logger set to "auto restart", some unnecessary clients misbehave during the run stop transition, and 
prevent the run from stopping and restarting. The user comes in the morning and is unhappy that data 
taking stopped some time during the night.

midas.c svn rev 5136 changes the TR_STOP transition to always succeed, even if some clients had 
transition errors. If these clients are unnecessary for normal operation of the DAQ, the following run 
"auto restart" will continue taking data. If those were important clients, data taking will continue the 
best it can - it *is* unattended operation - nobody is looking - but users can always setup alarms for 
checking that important clients are always running during data taking. (For very important clients, one 
can setup alarms to send email, send SMS messages, etc).

K.O.
Entry  14 Nov 2013, Konstantin Olchanski, Bug Report, MacOS10.9 strlcpy() problem 
On MacOS 10.9 MIDAS will crashes in strlcpy() somewhere inside odb.c. We think this is because strlcpy() 
in MacOS 10.9 was changed to abort() if input and output strings overlap. For overlapping memory one is 
supposed to use memmove(). This is fixed in current midas, for older versions, you can try this patch:

konstantin-olchanskis-macbook:midas olchansk$ git diff
diff --git a/src/odb.c b/src/odb.c
index 1589dfa..762e2ed 100755
--- a/src/odb.c
+++ b/src/odb.c
@@ -6122,7 +6122,10 @@ INT db_paste(HNDLE hDB, HNDLE hKeyRoot, const char *buffer)
                   pc++;
                while ((*pc == ' ' || *pc == ':') && *pc)
                   pc++;
-               strlcpy(data_str, pc, sizeof(data_str));
+
+               //strlcpy(data_str, pc, sizeof(data_str)); // MacOS 10.9 does not permit strlcpy() of overlapping 
strings
+               assert(strlen(pc) < sizeof(data_str)); // "pc" points at a substring inside "data_str"
+               memmove(data_str, pc, strlen(pc)+1);
 
                if (n_data > 1) {
                   data_str[0] = 0;
konstantin-olchanskis-macbook:midas olchansk$ 


As historical reference:

a) MacOS documentation says "behavior is undefined", which is no longer true, the behaviour is KABOOM!
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/strlcpy.3.h
tml

b) the original strlcpy paper from OpenBSD does not contain the word "overlap" 
http://www.courtesan.com/todd/papers/strlcpy.html

c) the OpenBSD man page says the same as Apple man page (behaviour undefined)
http://www.openbsd.org/cgi-bin/man.cgi?query=strlcpy

d) the linux kernel strlcpy() uses memcpy() and is probably unsafe for overlapping strings
http://lxr.free-electrons.com/source/lib/string.c#L149

e) midas strlcpy() looks to be safe for overlapping strings.

K.O.
Entry  01 Oct 2013, Konstantin Olchanski, Info, MacOS select() problem 
The following code found in mhttpd does not work on MacOS (BSD UNIX).

On Linux, the do-loop will finish after 2 seconds as expected. On MacOS (and other BSD systems), it will 
loop forever.

The cause is the MIDAS watchdog alarm() signal that fires every 1 second and always interrupts the 2 
second sleep of select(). The Linux select() updates it's timeout argument to reflect time already slept, so 
eventually we finish. The MacOS (BSD) select() does not update the timeout argument and select goes back 
to sleep for another 2 seconds (to be again interrupted half-way through).

The POSIX standard (specification for select() & co) permits either behaviour. Compare "man select" on 
MacOS and on Linux.

If the select() timeout were not 2 seconds, but 0.9 seconds; or if the MIDAS watchdog alarm fired every 
2.1 seconds, this problem would also not exist.

I think there are several places in MIDAS with code like this. An audit is required.

{
                  FD_ZERO(&readfds);
                  FD_SET(_sock, &readfds);

                  timeout.tv_sec = 2;
                  timeout.tv_usec = 0;

                  do {
                     status = select(FD_SETSIZE, &readfds, NULL, NULL, &timeout);
                     /* if an alarm signal was cought, restart with reduced timeout */
                  } while (status == -1 && errno == EINTR);
}

K.O.
    Reply  25 Oct 2013, Konstantin Olchanski, Info, MacOS select() problem 
> The following code found in mhttpd does not work on MacOS (BSD UNIX). ...

Because of this problem, on MacOS, run transitions can get stuck forever - most timeouts do not work. (Specifically, recv_string() never times out)

K.O.
Entry  05 Jul 2011, Konstantin Olchanski, Bug Report, MacOS network socket timeouts non-functional 
It turns out that because of differences between select() syscall implementation between UNIX (MacOS, 
maybe BSD) and Linux,  network socket timeouts do not work.

This affects timeouts during run transitions (transition calls to dead clients do not timeout), maybe other 
places.

I am looking into fixing this. The main difficulty is with UNIX select() not updating the timeout parameter 
when it is interrupted by the MIDAS watchdog alarm signal. Linux select() subtracts the elapsed time from 
the timeout value and this code from system.c works correctly: while (1) { status = select(..., &timeout); if 
(status==0) break; } (value of timeout becomes smaller each time), while on MacOS it loops forever (value 
of timeout does not change).
K.O.
Entry  11 May 2016, Thomas Lindner, Info, MacOS 10.11 (El Capitan) openssl compilation errors 
I recently upgraded my macbook to MacOS 10.11.  The compilation of MIDAS failed after the upgrade, 
complaining about  

gcc  -c -g -O2 -Wall <snip> src/mongoose.c
src/mongoose.c:322:10: fatal error: 'openssl/ssl.h' file not found

It seems that MacOS has now fully removed openssl header files (they were deprecated for a while).  There 
seems to be some notes on that here

http://lists.apple.com/archives/macnetworkprog/2015/Jun/msg00025.html

Konstantin suggested installing open-source builds of openssl using MacPorts.  I did that and MIDAS 
compiled fine.  I documented the procedure here:

https://midas.triumf.ca/MidasWiki/index.php/Installation/Compilation_problems#MacOS_10.11_.28El_Capitan.2
9_openssl_errors
    Reply  12 May 2016, Stefan Ritt, Info, MacOS 10.11 (El Capitan) openssl compilation errors 
> I recently upgraded my macbook to MacOS 10.11.  The compilation of MIDAS failed after the upgrade, 
> complaining about  
> 
> gcc  -c -g -O2 -Wall <snip> src/mongoose.c
> src/mongoose.c:322:10: fatal error: 'openssl/ssl.h' file not found
> 
> It seems that MacOS has now fully removed openssl header files (they were deprecated for a while).  There 
> seems to be some notes on that here
> 
> http://lists.apple.com/archives/macnetworkprog/2015/Jun/msg00025.html
> 
> Konstantin suggested installing open-source builds of openssl using MacPorts.  I did that and MIDAS 
> compiled fine.  I documented the procedure here:
> 
> https://midas.triumf.ca/MidasWiki/index.php/Installation/Compilation_problems#MacOS_10.11_.28El_Capitan.2
> 9_openssl_errors

The MIDAS Wiki page points to https://guide.macports.org/  which covers OSX up to 10.9. Installers for 10.10 and the current 10.11 
(El Captain) can be found here: https://www.macports.org/install.php

Stefan
ELOG V3.1.4-2e1708b5