Back Midas Rome Roody Rootana
  Midas DAQ System, Page 111 of 139  Not logged in ELOG logo
ID Date Authordown Topic Subject
  2470   17 Mar 2023 Konstantin OlchanskiInfoT2K/ND280 - Many warning from ten year old variables in ODB
Forwarded from the T2K/ND280 elog:

Author              : Nick Hastings
Subject             : Many warning from ten year old variables in ODB
Logbook URL         : http://elog.nd280.org/elog/FGD/2553

Midas does period checks that the variables in the ODB are ok.
One of these is a check to see if each variable was set with +/- 10
years. Since this experiment has been running for longer than 10
years there are *many* variables that fail this check.

As a result the midas.log and messages in mhttpd are spammed
with many warnings. Eg

Mon Feb 13 14:49:18 2023 [ODBEdit,ERROR] [odb.c:548:db_validate_key,ERROR] Warning: invalid access time, key 
"/System/Prompt", time 1288763123

These can be removed by simply setting the variable again with its current value.

So I wonder if it would be best to just do a full odbdump and then load all the values
back in. Comments from MIDAS experts would be appreciated. Eg:

odbedit -c 'save fgddaq.odb'
odbedit -c 'load fgddaq.odb'

Note this problem is currently seen on both the FGD DAQ and the global slow control MIDAS instances.
It may also be a problem on the INGRID GSC and the DAQs of other ND280 systems but I did not check.
  2474   21 Apr 2023 Konstantin OlchanskiForumSetup Midas with Caen vx2740 - ask for help
> I'm trying to setup Midas with the Caen vx2740 VME digitizer board.

welcome to the world of daq and midas! Ben already answered and he will help you with this specific hardware. (we work together)

> #0  0x00005555555c2ee1 in rpc_register_function (id=id@entry=18000, func=func@entry=0x5555555a2790 <jrpc_helper(int, void**)>)
>     at /home/astrocent/workspace/packages/midas/src/midas.cxx:11947

I look at this line in midas and I do not see any problems other than all functions that touch rpc_list are not thread safe,
and calling them at the same time as rpc calls are active will cause memory corruption and crash. This is not a problem
in most programs because rpc_register_function() is usually called once at the beginning of everything, before any RPCs
are received, sent or processed. I filed a bug against this problem. https://bitbucket.org/tmidas/midas/issues/362/rpc_list-is-not-thread-safe

> with is "jrpc"?

I implemented it years ago to allow web pages to call mhttpd (XHR/HTTP) to call user frontends (MIDAS RPC) to perform real-time actions,
i.e. to turn power supplied on or off. "j" stands for "json", but most experiments send very simple commands
and do not use json encoding.

K.O.
  2477   27 Apr 2023 Konstantin OlchanskiSuggestionMaximum ODB size
> > I agree, I think we can safely bump the limit from 100 Mbytes to 1 Gbyte, maybe 1.5 or 
> > 1.99 Gbytes. Above that we run into 32-bit/31-bit cleanliness problems.
>
> We just went in and changed: int odb_size_limit = INT_MAX;//100*1000*1000; in odb.cxx.
>

This is change is wrong. As I wrote, ODB is not 64-bit clean and it is not 32-bit clean. We think is is 31-bit clean, so maximum size would be slightly less than 2 Gbytes.

> And we could create ODBs with 1GB and 1.5 GB.

Congratulations. created != "it works". for proper test, you should fill it with 1.5 GB of stuff, save to json file, reload from json file, save to a different json file and compare that they have same contents (minus timestamps).

We could spend a lot of time making odb 32-bit clean and give you 4GB-max ODB, but would it be useful? For large ODB, "save to .json" already takes a long time ("save to .xml" is slower, "save to .odb" ditto, also buggy). We already have complaints that runs take forever to start because mlogger 
takes a long time to write the ODB save file.

P.S. 64-bit clean ODB will be binary incompatible, all internal pointers are 32-bit right now.

K.O.
  Draft   27 Apr 2023 Konstantin OlchanskiForumProblem with running midas odbxx frontends on a remote machine using the -h option
> H
> /home/mu3e/midas/include/odbxx.h:1102: Wrong key type in XML file
> Stack trace:
> 1   0x00000000000042D828 (null) + 4380712
> 2   0x00000000000048ED4D midas::odb::odb_from_xml(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 605
> 3   0x0000000000004999BD midas::odb::odb(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 317
> 4   0x000000000000495383 midas::odb::read_key(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 1459
> 5   0x0000000000004971E3 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool) + 259
> 6   0x000000000000497636 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool) + 502
> 7   0x00000000000049883B midas::odb::connect_and_fix_structure(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 171
> 8   0x0000000000004385EF setup_odb() + 8351
> 9   0x00000000000043B2E6 frontend_init() + 22
> 10  0x000000000000433304 main + 1540
> 11  0x0000007F8C6FE3724D __libc_start_main + 239
> 12  0x000000000000433F7A _start + 42
> 
> Aborted (core dumped)
> 
> 
> We have the same problem for all our frontends. When we want to start them locally they work. Starting them locally with ./frontend -h localhost also reproduces the error above.
> 
> The error can also be reproduced with the odbxx_test.cxx example in the midas repo by replacing line 22 in midas/examples/odbxx/odbxx_test.cxx (cm_connect_experiment(NULL, NULL, "test", NULL);) with cm_connect_experiment("localhost", "Mu3e", "test", NULL); (Put the name of the experiment instead of "Mu3e")
> 
> running odbxx_test locally gives us then the same error as our other frontend.
> 
> Thanks in advance,
> Martin
  2479   27 Apr 2023 Konstantin OlchanskiForumProblem with running midas odbxx frontends on a remote machine using the -h option
Looks like your MIDAS is built without debug information (-O2 -g), the stack trace does not have file names and line numbers. Please rebuild with debug information and report the stack trace. Thanks. K.O.

> Connect to experiment Mu3e on host 10.32.113.210...
> OK
> Init hardware...
> terminate called after throwing an instance of 'mexception'
>   what():  
> /home/mu3e/midas/include/odbxx.h:1102: Wrong key type in XML file
> Stack trace:
> 1   0x00000000000042D828 (null) + 4380712
> 2   0x00000000000048ED4D midas::odb::odb_from_xml(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 605
> 3   0x0000000000004999BD midas::odb::odb(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 317
> 4   0x000000000000495383 midas::odb::read_key(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 1459
> 5   0x0000000000004971E3 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool) + 259
> 6   0x000000000000497636 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool) + 502
> 7   0x00000000000049883B midas::odb::connect_and_fix_structure(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 171
> 8   0x0000000000004385EF setup_odb() + 8351
> 9   0x00000000000043B2E6 frontend_init() + 22
> 10  0x000000000000433304 main + 1540
> 11  0x0000007F8C6FE3724D __libc_start_main + 239
> 12  0x000000000000433F7A _start + 42
> 
> Aborted (core dumped)

K.O.
  2481   27 Apr 2023 Konstantin OlchanskiSuggestionMaximum ODB size
> You said the writing into .odb is buggy. Do you mean it’s buggy in general or only in this specific case?
> We save the ODB most of the time in the .odb format. 

I recommend JSON. Main advantage is you can read it using JSON decoder available for any language, no need to write custom code.

Other than that, the main issue is encoding of strings. For ODB this is key names and string values.

JSON was the first to standardize escape characters what can encode all valid UTF-8 UNICODE strings,
the system of escape characters is clean, easy to understand and easy to implement. https://www.json.org/json-en.html

XML is not as well defined as JSON, i.e. go and try to find the XML BNF grammar. I am not sure if the MIDAS XML encoder
and decoder is fully UTF-8 clean, and if some unlucky combinations of characters break string encoding or decoding. This
is usually tested using a fuzzer (generates all possible, unlucky and unlikely string values). Most suspicious
would be quotes, and square and angle brackets. If some character combinations break encoding or decoding, likely
this cannot be fixed in MIDAS without breaking backwards self-compatibility (will not read old ODB files correctly).

Same applies for the ODB format, except that it is even more ad-hoc. Again, any problems are hard to fix without
breaking backward self-compatibility.

In addition, in the past, the ODB and XML decoders had trouble with very long strings, this has been
fixed some time ago.

K.O.
  2482   27 Apr 2023 Konstantin OlchanskiSuggestionMaximum ODB size
my vote is to bump the ODB size limit to 1999*1000*1000 (not quite 2GB). but this needs to be tested. especially save and restore from ODB, XML and JSON files, including how long it takes to save and load a 1.9GB ODB. K.O.
  2488   28 Apr 2023 Konstantin OlchanskiSuggestionMaximum ODB size
> > Congratulations. created != "it works".
> 
> Two other tings to consider:
> 
> 1) The ODB shared memory is dumped into a binary file (".ODB.SHM") after the last client finished and read if the first client starts, to get it persistent. 
> So this could slow down starting and stopping, but only the first client, so I guess it's not an issue.
>

typical disk writing speed is 100-1000 Mbytes/sec, so writing 1 GB .ODB.SHM will take 1-10 seconds. NFS over 1gige network is 100 Mbytes/sec, so 10 seconds to 
write .ODB.SHM. embedded ARM write speed to SD flash can be as low as 10 Mbytes/sec, so up to 100 seconds.

> 
> 2) Traditionally, the ODB gets dumped to the .mid file at the beginning and end of every run, so that one know the exact configuration of the experiment
> for offline analysis. This can be turned off of course, but most experiments use it. If the ODB is dumped in any ASCII format, this can take quite long.
> Assume it takes 10 seconds at the beginning of each run, and we take a run every five minutes. Then we loose 48 mins of precious beam time every day.
> 

new default is to save as JSON, (as of my last measurement) JSON encoder is faster than the XML (and ODB?) encoder, by default result is compressed by GZIP-1 (66 
Mbytes/sec is my old benchmark, should remeasure on new DDR5 machines), compressed JSON is written .mid.gz file at disk speed (as above). Alternatively, use LZ4 
compression, runs roughly at memcpy() speed, less compression, written to .mid.lz4 at disk speed.

if data storage is ZFS, ZFS built-in LZ4 compression is now enabled by default, so result writing uncompressed .mid file (no compression of ODB dump), should be 
roughly same as when using MIDAS LZ4 compression and writing .mid.lz4.

bottom line, I need to remeasure gzip and lz4 compression speeds on new computers (DDR4 AMD 5000 series and DDR5 AMD 7000 series).

K.O.
  2489   28 Apr 2023 Konstantin OlchanskiSuggestionMaximum ODB size
> > Is this maybe related to what Stefan said about the run start - so that odbedit needs some time to load the bigger ODB?
> 
> At the run start mlogger writes the ODB to the .mid file. This needs conversion (binary ODB -> XML ASCII) which can take time.
> This does NOT depend on the ODB size, but on the ODB *content*.
>

Yes and no. They must be storing more than 100 Mbytes of stuff in ODB, if they are asking to bump ODB size from 100 Mbyte to 2 GByte ODB.

On the MIDAS, side, though, we have to plan for the worst case, if max ODB size 1.9 GB and it is full of data,
and mlogger (and odbedit save and load) take 10-30 seconds, then at least all timeouts (watchdog timeout, RPC timeout, etc)
must be increased accordingly.

K.O.
  2490   28 Apr 2023 Konstantin OlchanskiForumProblem with running midas odbxx frontends on a remote machine using the -h option
> As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp  (with cm_connect_experiment changed to "localhost")
> [test,ERROR] [system.cxx:5104:recv_tcp2,ERROR] unexpected connection closure
> [test,ERROR] [system.cxx:5158:ss_recv_net_command,ERROR] error receiving network command header, see messages
> [test,ERROR] [midas.cxx:13900:rpc_call,ERROR] routine "db_copy_xml": error, ss_recv_net_command() status 411, program abort

ok, cool. looks like we crashed the mserver. either run mserver attached to gdb or enable mserver core dump, we need it's stack trace,
the correct stack trace should be rooted in the handler for db_copy_xml.

but most likely odbxx is asking for more data than can be returned through the MIDAS RPC.

what is the ODB key passed to db_copy_xml() and how much data is in ODB at that key? (odbedit "du", right?).

K.O.
  2492   28 Apr 2023 Konstantin OlchanskiForumProblem with running midas odbxx frontends on a remote machine using the -h option
> > > As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp
> > ok, cool. looks like we crashed the mserver.
> Ok. Maybe i have to make this more clear. ANY odbxx access of a remote odb reproduces this error for us on multiple machines. 
> It does not matter how much data odbxx is asking for.
> midas commit 9d2ef471 the code from above runs fine

so, a regression. ouch.

if core dumps are turned off, you will not "see" the mserver crash, because the main mserver is still running. it's the mserver forked to 
serve your RPC connection that crashes.

> int main() {
>    cm_connect_experiment("localhost", "Mu3e", "test", NULL);
>    midas::odb o = {
>       {"Int32 Key", 42}
>    };
>    o.connect("/Test/Settings");
>    cm_disconnect_experiment();
>    return 1;
> }

to debug this, after cm_connect_experiment() one has to put ::sleep(1000000000); (not that big, obviously),
then while it is sleeping do "ps -efw | grep mserver", this will show the mserver for the test program,
connect to it with gdb, wait for ::sleep() to finish and o.connect() to crash, with luck gdb will show
the crash stack trace in the mserver.

so easy to debug? this is why back in the 1970-ies clever people invented core dumps, only to have
even more clever people in the 2020-ies turn them off and generally make debugging more difficult (attaching
gdb to a running program is also disabled-by-default in some recent linuxes).

rant-off.

to check if core dumps work, to "killall -7 mserver". to enable core dumps on ubuntu, see here:
https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu

last known-working point is:

commit 9d2ef471c4e4a5a325413e972862424549fa1ed5
Author: Ben Smith <bsmith@triumf.ca>
Date:   Wed Jul 13 14:45:28 2022 -0700

    Allow odbxx to handle connecting to "/" (avoid trying to read subkeys as "//Equipment" etc.

K.O.
  2515   12 May 2023 Konstantin OlchanskiInfoNew environment variable MIDAS_EXPNAME
> A new environment variable MIDAS_EXPNAME has been introduced [to be used together with 
MIDAS_DIR]

This is fixes an important buglet. If experiment uses MIDAS_DIR instead of exptab, at the time 
of connecting to ODB, we do not know the experiment name and use name "Default" to create ODB 
shared memory, instead of actual experiment name.

This creates an inconsistency, if some MIDAS programs in the same experiment use MIDAS_DIR while 
others use exptab (this would be unusual, but not impossible) they would connect to two 
different ODB shared memories, former using name "Default", latter using actual experiment name.

As an indication that something is not right, when stopping MIDAS programs, there is an error 
message about failure to delete shared ODB shared memory (because it was created using name 
"Default" and delete using the correct experiment name fails).

Also it can cause co-mingling between two different experiments, depending on the type of shared 
memory used by MIDAS (see $MIDAS_DIR/.SHM_TYPE.TXT):
POSIX - (usually not used) not affected (experiment name is not used)
POSIXv2 - (usually not used) affected (shm name is "$EXP_NAME_ODB")
POSIXv3 - used on MacOS - affected (shm name is "$UID_$EXP_NAME_ODB" so "$UID_Default_ODB" will 
collide)
POSIXv4 - used on Linux - not affected (shm name includes $MIDAS_DIR which is different for 
different experiments)

K.O.
  2516   16 May 2023 Konstantin OlchanskiBug Reportexcessive logging of http requests
Our default configuration of apache httpd logs every request. MIDAS custom web pages can easily make a huge number of RPC calls creating a 
huge log file and filling system disk to 100% capacity

this example has around 100 RPC requests per second. reasonable/unreasonable? available hardware can handle it (web browser, network 
httpd, mhttpd, etc), so we should try to get this to work. perhaps filter the apache httpd logs to exclude mjsonrpc requests? of course we 
can ask the affected experiment why they make so many RPC calls, is there a bug?

[14/May/2023:03:49:01 -0700] 142.90.111.176 TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 "POST /?mjsonrpc HTTP/1.1" 299
[14/May/2023:03:49:01 -0700] 142.90.111.176 TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 "POST /?mjsonrpc HTTP/1.1" 299
[14/May/2023:03:49:01 -0700] 142.90.111.176 TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 "POST /?mjsonrpc HTTP/1.1" 299
[14/May/2023:03:49:01 -0700] 142.90.111.176 TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 "POST /?mjsonrpc HTTP/1.1" 299
[14/May/2023:03:49:01 -0700] 142.90.111.176 TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 "POST /?mjsonrpc HTTP/1.1" 299

K.O.
  2517   16 May 2023 Konstantin OlchanskiBug Reportexcessive logging of http requests
> Our default configuration of apache httpd logs every request. MIDAS custom web pages can easily make a huge number of RPC calls creating a 
> huge log file and filling system disk to 100% capacity

perhaps use existing logrotate, add limit on file size (size) and limit of 2 old log files (rotate).

/etc/logrotate.d/httpd

/var/log/httpd/*log { 
    size 100M 
    rotate 2 
    missingok 
    notifempty 
    sharedscripts 
    delaycompress 
    postrotate 
        /bin/systemctl reload httpd.service > /dev/null 2>/dev/null || true 
    endscript 
} 

K.O.
  2525   09 Jun 2023 Konstantin OlchanskiSuggestionMaximum ODB size
> > 1) The ODB shared memory is dumped into a binary file (".ODB.SHM") after the last client finished ...

correction: ODB shared memory is saved to .ODB.SHM each time a client stops, this is db_close_database().

I have just run into a problem with this in the DRAGON experiment. At begin and end of run they run
a script that does a large number of "odbedit" calls to read stuff from ODB and it was taking a very long time.
Each odbedit invocation was taking about 1 second, starting odbedit is quick, stopping odbedit takes about 1 second.

It turns out each invocation of odbedit saves .ODB.SHM, ODB was 100 Mbytes size, home disk is an HDD (~100-200 Mbytes/sec writing speed), so yes, about 1 second to 
stop odbedit.

Solution was to reduce ODB size from 100 Mbytes to 10 Mbytes, odbedit now run quickly, begin and end of run scripts run quickly. problem solved.

K.O.

P.S. no, I am not the dragon experiment, no, I did not write those scripts, no, I will not rewrite them, persons who wrote them are long gone, no, the persons running 
dragon today will not be rewriting them.
  2526   09 Jun 2023 Konstantin OlchanskiInfoadded IPv6 support for mserver and MIDAS RPC
as of commit 71fb5e82f3e9a1501b4dc2430f9193ee5d176c82, MIDAS RPC and the mserver 
listen for connections both on IPv4 and IPv6. mserver clients and MIDAS RPC 
clients can connect to MIDAS using both IPv4 and IPv6. In the default 
configuration ("/Expt/Security/Enable non-localhost RPC" set to "n"), IPv4 
localhost is used, as before. Support for IPv6 is a by product from switching 
from obsolete non-thread-safe gethostbyname() and getaddrbyname() to modern 
getaddrinfo() and getnameinfo(). This fixes bug 357, observed crash of mhttpd 
inside gethostbyname(). K.O.
  2528   12 Jun 2023 Konstantin OlchanskiSuggestionMaximum ODB size
> > correction: ODB shared memory is saved to .ODB.SHM each time a client stops, this is db_close_database().
> 
> The original design of the midas shared memory (back in the 1990's) was that the ODB shared memory file gets
> only saved into the .ODB.SHM when the *last* client exits. This ensures to keep the ODB persistent when the
> shared memory gets deleted. I vaguely remember I put something in like:
> 
> db_close_database()
> ...
>   destroy_flag = (pheader->num_clients == 0);
> 
>   if (destroy_flag)
>      ss_shm_flush(pheader->name, pdb->shm_adr, pdb->shm_size, pdb->shm_handle);

I remember the same, but I tracked it down in git to the very first commit, and there is no if() there,
odb is saved to .ODB.SHM on every client shutdown, not just the last client. I guess we both misremebered.

What's more, ss_shm_flush() is done while holding the ODB semaphore, so all other midas programs that try to access
odb at the same time (including the mserver) will stall until write() and close() return. at least we do not fsync(),
and there is no waiting until data is committed to physical media.

$ git annotate 3bb04af4d^ src/odb.c
...
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	875)  destroy_flag = (pheader->num_clients == 0);
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	876)
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	877)  /* flush shared memory to disk */
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	878)  ss_flush_shm(pheader->name, pheader, sizeof(DATABASE_HEADER)+2*pheader->data_size);
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	879)
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	880)  /* unmap shared memory, delete it if we are the last */
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	881)  ss_close_shm(pheader->name, pheader,
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	882)               _database[hDB-1].shm_handle, destroy_flag);
...

K.O.
  2535   13 Jun 2023 Konstantin OlchanskiSuggestionMaximum ODB size
> > I remember the same, but I tracked it down in git to the very first commit, and there is no if() there,
> > odb is saved to .ODB.SHM on every client shutdown, not just the last client. I guess we both misremebered.

small problem. build an experiment, start taking data, observe how ODB is never saved to disk because the "last client" never stops. as bonus, crash 
the computer, observe how all changes to ODB are now lost. if mlogger is configured to save odb.json at the end of run, and to write ODB dumps at 
begin and end of every data file, you can recover some of the lost data.

for better effect, ODB should be dumped to disk at periodic intervals. but. current implementation writes odb to disk while holding the ODB 
semaphore, which means all ODB access stops for the duration, specifically, there will be gaps in the history because mlogger cannot read history 
data from ODB.

a better implementation could take the ODB lock, make a copy of ODB shared memory, release the ODB lock, complete writing to disk without holding the 
lock. protection is needed against 100 midas programs trying to do this all at the same time. computers with 0.5 GB RAM (many ARM FPGA SoCs) will be 
limited to ~100 Mbyte ODB). plus deal with memory allocation failures when taking a copy of a 2GB ODB.

in theory, the mmap() shared memory (already implemented in midas) does this automatically, but we lose control
over disk writes, we see some OSes write odb to disk "too often" and at wrong times, i.e. while we are in the middle
of creating or deleting something. current sequence of open(), atomic write() and close() ensures ODB.SHM always
contains a valid odb. (minus loss of OS and disk caches to crash or power loss).

K.O.
  2536   13 Jun 2023 Konstantin OlchanskiSuggestionMaximum ODB size
> > BTW, how do I resize the ODB.

ODB cannot be resized "online". Everything has to stop, save content to odb.json, get rid of old ODB.SHM, ensure ODB shared memory is destroyed (SysV or POSIX shared memory), 
create new ODB with new size, load odb.json. Feel free to punch this into chatgpt > odbresize.cxx, commit, test, push.

> I remember we discussed this some time ago, and concluded that odbedit needs a resize flag.

ODB cannot be resized online. ODB API has ODB clients holding ODB handles which are pointers (offsets) into ODB shared memory.

> Has this even been done?
> I guess this is still not done and the issue is still open: https://bitbucket.org/tmidas/midas/issues/329/need-odbresize
> I guess if we touch this maybe the problem with the wrong size should be also fixed: https://bitbucket.org/tmidas/midas/issues/328/odbinit-s-1024mb-creates-odb-with-wrong

please contribute 14 distraction-free days to my patreon. thanks in advance!

K.O.
  2538   13 Jun 2023 Konstantin OlchanskiSuggestionMaximum ODB size
> 
> > small problem. build an experiment, start taking data, observe how ODB is never saved to disk because the "last client" never stops. as bonus, crash 
> > the computer, observe how all changes to ODB are now lost. if mlogger is configured to save odb.json at the end of run, and to write ODB dumps at 
> > begin and end of every data file, you can recover some of the lost 
> 
> The new behavior is not much worse than before. Assume 10 programs running happily for days, computer crashes, all ODB changes lost. 
> So indeed a periodic flush without holding the lock might be best. Use a semaphore to prevent all programs flushing at the same time, or put
> the flush only in the logger after an end of run.

are you sure? when/how often does "last midas program finishes" happen? it does not happen on a system crash, not on power loss, not on "shutdown -r now" 
(I am pretty sure). In the experiments you run, how often do you shut down all programs (and check that you did not forget one somehow)?

sanity check. dragon experiment, very active, .ODB.SHM timestamp is 1 second old. not-very-active agmini, today is June 13th, timestamp of .ODB.SHM is June 
2nd. inactive TACTIC, timestamp of .ODB.SHM is May 16th.

so yes, not great, but in the new scheme, ODB.SHM timestamps would probably be from 2021 or 2020.

my vote is to undo this change, it is dangerous because it causes odb to be saved to ODB.SHM never.

K.O.
ELOG V3.1.4-2e1708b5