ID |
Date |
Author |
Topic |
Subject |
2537
|
13 Jun 2023 |
Stefan Ritt | Suggestion | Maximum ODB size |
> small problem. build an experiment, start taking data, observe how ODB is never saved to disk because the "last client" never stops. as bonus, crash
> the computer, observe how all changes to ODB are now lost. if mlogger is configured to save odb.json at the end of run, and to write ODB dumps at
> begin and end of every data file, you can recover some of the lost
The new behavior is not much worse than before. Assume 10 programs running happily for days, computer crashes, all ODB changes lost.
So indeed a periodic flush without holding the lock might be best. Use a semaphore to prevent all programs flushing at the same time, or put
the flush only in the logger after an end of run.
Stefan |
2538
|
13 Jun 2023 |
Konstantin Olchanski | Suggestion | Maximum ODB size | >
> > small problem. build an experiment, start taking data, observe how ODB is never saved to disk because the "last client" never stops. as bonus, crash
> > the computer, observe how all changes to ODB are now lost. if mlogger is configured to save odb.json at the end of run, and to write ODB dumps at
> > begin and end of every data file, you can recover some of the lost
>
> The new behavior is not much worse than before. Assume 10 programs running happily for days, computer crashes, all ODB changes lost.
> So indeed a periodic flush without holding the lock might be best. Use a semaphore to prevent all programs flushing at the same time, or put
> the flush only in the logger after an end of run.
are you sure? when/how often does "last midas program finishes" happen? it does not happen on a system crash, not on power loss, not on "shutdown -r now"
(I am pretty sure). In the experiments you run, how often do you shut down all programs (and check that you did not forget one somehow)?
sanity check. dragon experiment, very active, .ODB.SHM timestamp is 1 second old. not-very-active agmini, today is June 13th, timestamp of .ODB.SHM is June
2nd. inactive TACTIC, timestamp of .ODB.SHM is May 16th.
so yes, not great, but in the new scheme, ODB.SHM timestamps would probably be from 2021 or 2020.
my vote is to undo this change, it is dangerous because it causes odb to be saved to ODB.SHM never.
K.O. |
2539
|
13 Jun 2023 |
Stefan Ritt | Suggestion | Maximum ODB size |
> are you sure? when/how often does "last midas program finishes" happen? it does not happen on a system crash, not on power loss, not on "shutdown -r now"
> (I am pretty sure). In the experiments you run, how often do you shut down all programs (and check that you did not forget one somehow)?
Indeed this is almost never the case, maybe once per month. On the other hand, we have a complete crash of the OS maybe once a year. Most of the time the programs
run continuously (we do not need odbedit), so our timestamp is typically one or two days old, which is not good either.
> my vote is to undo this change, it is dangerous because it causes odb to be saved to ODB.SHM never.
My vote is to flush the odb either periodically or after each run.
Stefan |
2541
|
15 Jun 2023 |
Konstantin Olchanski | Suggestion | Maximum ODB size | >
> > are you sure? when/how often does "last midas program finishes" happen? it does not happen on a system crash, not on power loss, not on "shutdown -r now"
> > (I am pretty sure). In the experiments you run, how often do you shut down all programs (and check that you did not forget one somehow)?
>
> Indeed this is almost never the case, maybe once per month. On the other hand, we have a complete crash of the OS maybe once a year. Most of the time the programs
> run continuously (we do not need odbedit), so our timestamp is typically one or two days old, which is not good either.
>
> > my vote is to undo this change, it is dangerous because it causes odb to be saved to ODB.SHM never.
>
> My vote is to flush the odb either periodically or after each run.
>
So we are in agreement.
RFE filed:
https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
Dangerous change reverted:
60e4c44ad66346b89ba057391acf7a02890049be
K.O.
bash-3.2$ git diff
diff --git a/src/odb.cxx b/src/odb.cxx
index 0d3b88c2..d104ff28 100644
--- a/src/odb.cxx
+++ b/src/odb.cxx
@@ -2199,7 +2199,14 @@ INT db_close_database(HNDLE hDB)
destroy_flag = (pheader->num_clients == 0);
/* flush shared memory to disk */
- if (destroy_flag)
+
+ /* if we save ODB to disk only after last client finishes, we will never save ODB to disk
+ in most experiments - none of them ever completely stop MIDAS in normal operation.
+ as result, all changes to ODB contents will be lost on system crash, power loss
+ or normal reboot. see https://daq00.triumf.ca/elog-midas/Midas/2539
+ K.O. June 2023. */
+
+ if (1 || destroy_flag)
ss_shm_flush(pheader->name, pdb->shm_adr, pdb->shm_size, pdb->shm_handle);
strlcpy(xname, pheader->name, sizeof(xname));
K.O. |
2565
|
28 Jul 2023 |
Stefan Ritt | Suggestion | Maximum ODB size | > RFE filed:
> https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
Implemented and closed: https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
Stefan |
2578
|
09 Aug 2023 |
Konstantin Olchanski | Suggestion | Maximum ODB size | > > RFE filed:
> > https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
>
> Implemented and closed: https://bitbucket.org/tmidas/midas/issues/367/odb-should-be-saved-to-disk-periodically
>
> Stefan
Stefan's comments from the closed bug report:
Ok I implemented some periodic flushing. Here is what I did:
Created
/System/Flush/Flush period : TID_UINT32
/System/Flush/Last flush : TID_UINT32
which control the flushing to disk. The default value for “Flush period” is 60 seconds or one minute.
All clients call db_flush_database() through their cm_yield() function.
db_flush_database() checks the “Last flush” and only flushes the ODB when the period has expired. This test is
done inside the ODB semaphore so that we don’t get a race condition.
If the period has expired, db_flush_database() calls ss_shm_flush().
ss_shm_flush() tries to allocate a buffer the size of the shared memory. If the allocation is not successful (out of
memory), ss_shm_flush() writes directly to the binary file as before.
If the allocation is successful, ss_shm_flush() copies the shared memory to a buffer and passes this buffer to a
dedicated thread which writes the buffer to the binary file. This causes ss_shm_flush() to return immediately and
not block the calling program during the disk write operation.
Added back the “if (destroy_flag) ss_shm_flush()” so that the ODB is flushed for sure before the shared memory
gets deleted.
This means that under normal circumstances, exiting programs like odbedit do NOT flush the ODB. This makes it possible to
call many “odbedit -c” in a row without the flush penalty. Nevertheless, the ODB then gets flushed by other
clients at the latest 60 seconds (or whatever the flush period is) after odbedit exits.
Please note that ODB flushing has two purposes:
When all programs exit, we need a persistent storage for the ODB. In most experiments this happens very
rarely, maybe at the end of a beam time period.
If the computer crashes, a recent version of the ODB is kept on disk to simplify recovery after the crash.
Since crashes are rare (during production periods we have maybe one hardware failure every few years),
flushing the ODB too often does not make sense and just consumes resources. Flushing also does not protect against
corrupted ODBs, since the binary image will also get corrupted. So the only reason for periodic flushes is to ease
recovery after a total crash. I put the default to 60 seconds, but if people are really paranoid they can decrease
it to 10 seconds or so, or increase it to 600 seconds if their system does not crash every week and disks are
slow.
I made a dedicated branch feature/periodic_odb_flush so people can test the new functionality. If there are no
complaints within the next few days, I will merge that into develop.
Stefan |
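The period check described in the comments above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the MIDAS implementation: the struct flush_state_t and the function flush_if_due() are hypothetical names, the two fields merely stand in for the ODB keys /System/Flush/Flush period and /System/Flush/Last flush, and the caller is assumed to hold the ODB semaphore while calling it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two ODB keys under /System/Flush. */
typedef struct {
   uint32_t flush_period; /* seconds between flushes */
   uint32_t last_flush;   /* time of the last flush, in seconds */
} flush_state_t;

/* Returns 1 and records the flush time if the period has expired,
   otherwise returns 0. Assumed to be called with the ODB lock held,
   so that only one client decides to flush in any given period. */
int flush_if_due(flush_state_t *st, uint32_t now)
{
   assert(st != NULL);

   /* unsigned subtraction stays well defined if the counter wraps */
   if ((uint32_t)(now - st->last_flush) < st->flush_period)
      return 0;

   st->last_flush = now;
   return 1; /* the caller would now copy the shared memory and write it out */
}
```

Doing the comparison and the update of the timestamp in one step under the lock is what keeps several clients from flushing in the same period; the actual disk write can then happen outside the lock.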
61
|
06 Jun 2004 |
Konstantin Olchanski | | Makefile: set -rpath | I committed Makefile bits to set the RPATH on dynamically linked executables
to find libmidas.so and ROOT shared libraries without setting
LD_LIBRARY_PATH, etc. K.O. |
1961
|
28 Jun 2020 |
Konstantin Olchanski | Info | Makefile update | I reworked the MIDAS Makefile to simplify things and to remove redundancy with functions
provided by cmake.
When you say "make", the list of options is printed.
The first and main options are "make cmake" and "make cclean" to run the cmake build.
This is my recommended way to build midas - the output of "make cmake" was tuned to provide
the information needed to debug build problems (all compiler commands, command line switches
and file paths are reported). (normal "cmake VERBOSE=1" is tuned for debugging of cmake and
for maximum obfuscation of problems building the actual project).
Build options are implemented through cmake variables:
options that can be added to "make cmake":
NO_LOCAL_ROUTINES=1 NO_CURL=1
NO_ROOT=1 NO_ODBC=1 NO_SQLITE=1 NO_MYSQL=1 NO_SSL=1 NO_MBEDTLS=1
NO_EXPORT_COMPILE_COMMANDS=1
for example "make cmake NO_ROOT=1" to disable auto-detection of ROOT.
Two more make targets create reduced builds of midas:
"make mini" builds a subset of midas suitable for building frontend programs. Big programs
like mlogger and mhttpd are excluded, optional components like CURL or SQLITE are not needed.
"make remoteonly" builds a subset of midas suitable for building remotely connected
frontends. Big parts of midas are excluded, many system-dependent functions are excluded,
etc. This is intended for embedded applications, such as fpga, uclinux, etc.
But wait, there is more. Here is the full list:
daqubuntu:midas$ make
Usage:
make cmake --- full build of midas
make cclean --- remove everything build by make cmake
options that can be added to "make cmake":
NO_LOCAL_ROUTINES=1 NO_CURL=1
NO_ROOT=1 NO_ODBC=1 NO_SQLITE=1 NO_MYSQL=1 NO_SSL=1 NO_MBEDTLS=1
NO_EXPORT_COMPILE_COMMANDS=1
make dox --- run doxygen, results are in ./html/index.html
make cleandox --- remove doxygen output
make htmllint --- run html check on resources/*.html
make test --- run midas self test
make mbedtls --- enable mhttpd support for https via the mbedtls https library
make update_mbedtls --- update mbedtls to latest version
make clean_mbedtls --- remove mbedtls from this midas build
make mtcpproxy --- build the https proxy to forward root-only port 443 to mhttpd https port 8443
make mini --- minimal build, results are in linux/{bin,lib}
make cleanmini --- remove everything build by make mini
make remoteonly --- minimal build, remote connetion only, results are in linux-remoteonly/{bin,lib}
make cleanremoteonly --- remove everything build by make remoteonly
make linux32 --- minimal x86 -m32 build, results are in linux-m32/{bin,lib}
make clean32 --- remove everything built by make linux32
make linux64 --- minimal x86 -m64 build, results are in linux-m64/{bin,lib}
make clean64 --- remove everything built by make linux64
make linuxarm --- minimal ARM cross-build, results are in linux-arm/{bin,lib}
make cleanarm --- remove everything built by make linuxarm
make clean --- run all 'clean' commands
daqubuntu:midas$
K.O. |
1963
|
15 Jul 2020 |
Stefan Ritt | Info | Makefile update | Please note that you can also compile midas in the standard cmake way with
$ mkdir build
$ cd build
$ cmake ..
$ make install
in the root midas directory. You might have to use "cmake3" on some systems.
Stefan |
2508
|
10 May 2023 |
Lukas Gerritzen | Suggestion | Make sequencer more compatible with mobile devices | When trying to select a run script on an iPad or other mobile device, you cannot enter subdirectories. This is caused by the following part:
if (script.substring(0, 1) === "[") {
// refuse to load script if the selected a subdirectory
return;
}
and the fact that the <option> elements are listening for double click events, which seem to be impossible on a mobile device.
The following modification allows browsing the directories without changing the double click behaviour on a desktop:
diff --git a/resources/load_script.html b/resources/load_script.html
index 41bfdccd..36caa57f 100644
--- a/resources/load_script.html
+++ b/resources/load_script.html
@@ -59,6 +59,28 @@
</div>
<script>
+ document.getElementById("msg_sel").onchange = function() {
+ script = this.value;
+ button = document.getElementById("load_button");
+ if (script.substring(0, 4) === "[..]") {
+ // Change button to go back
+ enable_button_by_id("load_button");
+ button.innerHTML = "Back";
+ button.onclick = up_subdir;
+ } else if (script.substring(0, 1) === "[") {
+ // Change button to load subdirectory
+ enable_button_by_id("load_button");
+ button.innerHTML = "Enter subdirectory";
+ button.onclick = load_subdir;
+ } else {
+ // Change button to load script
+ enable_button_by_id("load_button");
+ button = document.getElementById("load_button");
+ button.innerHTML = "Load script";
+ button.onclick = load_script;
+ }
+ }
+
function set_if_changed(id, value)
{
var e = document.getElementById(id);
This makes the code quoted above redundant, so the check can actually be omitted. |
2509
|
10 May 2023 |
Stefan Ritt | Suggestion | Make sequencer more compatible with mobile devices |
Lukas Gerritzen wrote: | When trying to select a run script on an iPad or other mobile device, you cannot enter subdirectories. This is caused by the following part:
|
We are working right now on a general file picker, which will also replace the file picker for the sequencer. So please wait until the new thing is out and then test it there.
Stefan |
777
|
11 Jul 2011 |
Konstantin Olchanski | Info | Make "STOP" run transition always succeed | Over the years, there have been some back-and-forth changes in what happens to run transitions when some
of the participants misbehave (do not respond to RPC calls, time out, crash, etc).
The very original behaviour was to ignore all errors. This resulted in user confusion when some clients
would start, some would not, data from frontends that missed the transition did not arrive, etc.
So it was changed to fail the transition if any client misbehaves.
This left mlogger (who is usually the first one to see the TR_START transition) in a funny state - output
file is open, etc, but there is no run active. This was fixed by adding a TR_STARTABORT transition to tell
mlogger, event builder & co that the just started run did not start after all.
Also at some point code was added to forcefully kill clients that do not respond to run transitions (do
not respond to RPC, timeout, etc).
Recently, it was observed how during unattended overnight operation of a MIDAS DAQ system, with the
logger set to "auto restart", some unnecessary clients misbehave during the run stop transition, and
prevent the run from stopping and restarting. The user comes in the morning and is unhappy that data
taking stopped some time during the night.
midas.c svn rev 5136 changes the TR_STOP transition to always succeed, even if some clients had
transition errors. If these clients are unnecessary for normal operation of the DAQ, the following run
"auto restart" will continue taking data. If those were important clients, data taking will continue the
best it can - it *is* unattended operation - nobody is looking - but users can always set up alarms for
checking that important clients are always running during data taking. (For very important clients, one
can set up alarms to send email, send SMS messages, etc).
K.O. |
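The policy described above (a misbehaving client fails a start, but never blocks a stop) can be summarised in a few lines of C. This is only an illustrative sketch: the enum, the CLIENT_OK constant and the function transition_result() are hypothetical names and do not reproduce the actual code in midas.c.

```c
#include <stddef.h>

/* Hypothetical names, chosen so they do not collide with real MIDAS symbols. */
enum sketch_transition { SKETCH_TR_START, SKETCH_TR_STOP };
#define CLIENT_OK 1

/* Combine per-client transition statuses into one overall result.
   The number of failed clients is always returned through n_failed
   so the caller can still report them. Returns 1 on success, 0 on failure. */
int transition_result(enum sketch_transition tr,
                      const int *client_status, size_t n, size_t *n_failed)
{
   size_t failed = 0;
   for (size_t i = 0; i < n; i++)
      if (client_status[i] != CLIENT_OK)
         failed++;

   if (n_failed != NULL)
      *n_failed = failed;

   if (tr == SKETCH_TR_STOP)
      return 1;          /* stop always succeeds, even with failed clients */

   return failed == 0;   /* any failed client aborts a start */
}
```

Keeping the failure count separate from the return value reflects the idea in the text: errors during a stop are still visible (for messages or alarms), they just no longer prevent the run from stopping and restarting.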
937
|
14 Nov 2013 |
Konstantin Olchanski | Bug Report | MacOS10.9 strlcpy() problem | On MacOS 10.9 MIDAS crashes in strlcpy() somewhere inside odb.c. We think this is because strlcpy()
in MacOS 10.9 was changed to abort() if input and output strings overlap. For overlapping memory one is
supposed to use memmove(). This is fixed in current midas; for older versions, you can try this patch:
konstantin-olchanskis-macbook:midas olchansk$ git diff
diff --git a/src/odb.c b/src/odb.c
index 1589dfa..762e2ed 100755
--- a/src/odb.c
+++ b/src/odb.c
@@ -6122,7 +6122,10 @@ INT db_paste(HNDLE hDB, HNDLE hKeyRoot, const char *buffer)
pc++;
while ((*pc == ' ' || *pc == ':') && *pc)
pc++;
- strlcpy(data_str, pc, sizeof(data_str));
+
+ //strlcpy(data_str, pc, sizeof(data_str)); // MacOS 10.9 does not permit strlcpy() of overlapping strings
+ assert(strlen(pc) < sizeof(data_str)); // "pc" points at a substring inside "data_str"
+ memmove(data_str, pc, strlen(pc)+1);
if (n_data > 1) {
data_str[0] = 0;
konstantin-olchanskis-macbook:midas olchansk$
As historical reference:
a) MacOS documentation says "behavior is undefined", which is no longer true, the behaviour is KABOOM!
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/strlcpy.3.html
b) the original strlcpy paper from OpenBSD does not contain the word "overlap"
http://www.courtesan.com/todd/papers/strlcpy.html
c) the OpenBSD man page says the same as Apple man page (behaviour undefined)
http://www.openbsd.org/cgi-bin/man.cgi?query=strlcpy
d) the linux kernel strlcpy() uses memcpy() and is probably unsafe for overlapping strings
http://lxr.free-electrons.com/source/lib/string.c#L149
e) midas strlcpy() looks to be safe for overlapping strings.
K.O. |
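The patch above amounts to shifting the tail of a buffer to its front with memmove(), which is defined for overlapping regions. A minimal standalone sketch of that pattern follows; the helper name shift_to_front() is hypothetical and not part of MIDAS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Move the NUL-terminated substring starting at pc to the start of buf.
   pc must point inside buf, so source and destination overlap and
   strcpy(), strlcpy() or memcpy() must not be used here. */
void shift_to_front(char *buf, size_t bufsize, const char *pc)
{
   assert(pc >= buf && pc < buf + bufsize);

   size_t n = strlen(pc) + 1; /* include the terminating NUL */
   assert(n <= bufsize);

   memmove(buf, pc, n);       /* memmove() handles overlapping memory */
}
```

For example, with a buffer holding "  : hello" and pc pointing at the "h", the buffer holds "hello" after the call.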
917
|
01 Oct 2013 |
Konstantin Olchanski | Info | MacOS select() problem | The following code found in mhttpd does not work on MacOS (BSD UNIX).
On Linux, the do-loop will finish after 2 seconds as expected. On MacOS (and other BSD systems), it will
loop forever.
The cause is the MIDAS watchdog alarm() signal that fires every 1 second and always interrupts the 2
second sleep of select(). The Linux select() updates its timeout argument to reflect time already slept, so
eventually we finish. The MacOS (BSD) select() does not update the timeout argument and select goes back
to sleep for another 2 seconds (to be again interrupted half-way through).
The POSIX standard (specification for select() & co) permits either behaviour. Compare "man select" on
MacOS and on Linux.
If the select() timeout were not 2 seconds, but 0.9 seconds; or if the MIDAS watchdog alarm fired every
2.1 seconds, this problem would also not exist.
I think there are several places in MIDAS with code like this. An audit is required.
{
FD_ZERO(&readfds);
FD_SET(_sock, &readfds);
timeout.tv_sec = 2;
timeout.tv_usec = 0;
do {
status = select(FD_SETSIZE, &readfds, NULL, NULL, &timeout);
/* if an alarm signal was cought, restart with reduced timeout */
} while (status == -1 && errno == EINTR);
}
K.O. |
922
|
25 Oct 2013 |
Konstantin Olchanski | Info | MacOS select() problem | > The following code found in mhttpd does not work on MacOS (BSD UNIX). ...
Because of this problem, on MacOS, run transitions can get stuck forever - most timeouts do not work. (Specifically, recv_string() never times out)
K.O. |
774
|
05 Jul 2011 |
Konstantin Olchanski | Bug Report | MacOS network socket timeouts non-functional | It turns out that because of differences in the select() syscall implementation between UNIX (MacOS,
maybe BSD) and Linux, network socket timeouts do not work on MacOS.
This affects timeouts during run transitions (transition calls to dead clients do not time out), maybe other
places.
I am looking into fixing this. The main difficulty is with UNIX select() not updating the timeout parameter
when it is interrupted by the MIDAS watchdog alarm signal. Linux select() subtracts the elapsed time from
the timeout value and this code from system.c works correctly: while (1) { status = select(..., &timeout); if
(status==0) break; } (value of timeout becomes smaller each time), while on MacOS it loops forever (value
of timeout does not change).
K.O. |
1177
|
11 May 2016 |
Thomas Lindner | Info | MacOS 10.11 (El Capitan) openssl compilation errors | I recently upgraded my macbook to MacOS 10.11. The compilation of MIDAS failed after the upgrade,
complaining about
gcc -c -g -O2 -Wall <snip> src/mongoose.c
src/mongoose.c:322:10: fatal error: 'openssl/ssl.h' file not found
It seems that MacOS has now fully removed openssl header files (they were deprecated for a while). There
seem to be some notes on that here
http://lists.apple.com/archives/macnetworkprog/2015/Jun/msg00025.html
Konstantin suggested installing open-source builds of openssl using MacPorts. I did that and MIDAS
compiled fine. I documented the procedure here:
https://midas.triumf.ca/MidasWiki/index.php/Installation/Compilation_problems#MacOS_10.11_.28El_Capitan.29_openssl_errors |
1178
|
12 May 2016 |
Stefan Ritt | Info | MacOS 10.11 (El Capitan) openssl compilation errors | > I recently upgraded my macbook to MacOS 10.11. The compilation of MIDAS failed after the upgrade,
> complaining about
>
> gcc -c -g -O2 -Wall <snip> src/mongoose.c
> src/mongoose.c:322:10: fatal error: 'openssl/ssl.h' file not found
>
> It seems that MacOS has now fully removed openssl header files (they were deprecated for a while). There
> seems to be some notes on that here
>
> http://lists.apple.com/archives/macnetworkprog/2015/Jun/msg00025.html
>
> Konstantin suggested installing open-source builds of openssl using MacPorts. I did that and MIDAS
> compiled fine. I documented the procedure here:
>
> https://midas.triumf.ca/MidasWiki/index.php/Installation/Compilation_problems#MacOS_10.11_.28El_Capitan.29_openssl_errors
The MIDAS Wiki page points to https://guide.macports.org/ which covers OSX up to 10.9. Installers for 10.10 and the current 10.11
(El Capitan) can be found here: https://www.macports.org/install.php
Stefan |
2169
|
19 May 2021 |
Francesco Renga | Suggestion | MYSQL logger | Dear all,
I'm trying to use the logging on a mysql DB. Following the instructions on
the Wiki, I recompiled MIDAS after installing mysql, and cmake with NEED_MYSQL=1
can find it:
-- MIDAS: Found MySQL version 8.0.23
Then, I compiled my frontend (cmake with no options + make) and ran it, but in the
ODB I cannot find the tree for mySQL. I have only:
Logger/Runlog/ASCII
while I would expect also:
Logger/Runlog/SQL
What could be missing? Should I maybe add something to the CMakeList file or run
cmake with some option?
Thank you,
Francesco |
2171
|
21 May 2021 |
Francesco Renga | Suggestion | MYSQL logger | I solved this, it was a failed "make clean" before recompiling. Now it works.
Sorry for the noise.
Francesco
> Dear all,
> I'm trying to use the logging on a mysql DB. Following the instructions on
> the Wiki, I recompiled MIDAS after installing mysql, and cmake with NEED_MYSQL=1
> can find it:
>
> -- MIDAS: Found MySQL version 8.0.23
>
> Then, I compiled my frontend (cmake with no options + make) and run it, but in the
> ODB I cannot find the tree for mySQL. I have only:
>
> Logger/Runlog/ASCII
>
> while I would expect also:
>
> Logger/Runlog/SQL
>
> What could be missing? Maybe should I add something in the CMakeList file or run
> cmake with some option?
>
> Thank you,
> Francesco |
|