ELOG Midas

Back Midas Rome Roody Rootana

Midas DAQ System, Page 4 of 52

Not logged in

Find | Login | Help

Full | Summary | Threaded | Collapse | Expand

1028 Entries

Goto page Previous 1, 2, 3, 4, 5 ... 50, 51, 52 Next

04 Jun 2025, Mark Grimes, Bug Report, Memory leak in mhttpd binary RPC code

Hi,
During an evening of running we noticed that memory usage of mhttpd grew to close to 100Gb. We think we've traced this to the following issue when making RPC calls.

The brpc method allocates memory for the response at src/mjsonrpc.cxx#lines-3449.
It then makes the call at src/mjsonrpc.cxx#lines-3460, which may set `buf_length` to zero if the response was empty.
It then uses `MJsonNode::MakeArrayBuffer` to pass ownership of the memory to an `MJsonNode`, providing `buf_length` as the size.
When the `MJsonNode` is destructed at mjson.cxx#lines-657, it only calls `free` on the buffer if the size is greater than zero.

Hence, mhttpd will leak at least 1024 bytes for every binary RPC call that returns an empty response.
I tried to submit a pull request to fix this but I don't have permission to push to https://bitbucket.org/tmidas/mjson.git. Could somebody take a look?

Thanks,

Mark.

04 Jun 2025, Konstantin Olchanski, Bug Report, Memory leak in mhttpd binary RPC code

Noted. I will look at this asap. K.O.

[quote="Mark Grimes"]Hi,
During an evening of running we noticed that memory usage of mhttpd grew to 
close to 100Gb.  We think we've traced this to the following issue when making 
RPC calls.

[LIST]
[*] The brpc method allocates memory for the response at 
[URL=https://bitbucket.org/tmidas/midas/src/67db8627b9ae381e5e28800dfc4c350c5bd0
5e3f/src/mjsonrpc.cxx#lines-3449]src/mjsonrpc.cxx#lines-3449[/URL].
[*] It then makes the call at 
[URL=https://bitbucket.org/tmidas/midas/src/67db8627b9ae381e5e28800dfc4c350c5bd0
5e3f/src/mjsonrpc.cxx#lines-3460]src/mjsonrpc.cxx#lines-3460[/URL], which may 
set `buf_length` to zero if the response was empty.
[*] It then uses `MJsonNode::MakeArrayBuffer` to pass ownership of the memory to 
an `MJsonNode`, providing `buf_length` as the size.
[*] When the `MJsonNode` is destructed at 
[URL=https://bitbucket.org/tmidas/mjson/src/9d01b3f72722bbf7bcec32ae218fcc0825cc
9e7f/mjson.cxx#lines-657]mjson.cxx#lines-657[/URL], it only calls `free` on the 
buffer if the size is greater than zero.
[/LIST]

Hence, mhttpd will leak at least 1024 bytes for every binary RPC call that 
returns an empty response.
I tried to submit a pull request to fix this but I don't have permission to push 
to https://bitbucket.org/tmidas/mjson.git.  Could somebody take a look?

Thanks,

Mark.[/quote]

07 Jun 2025, Mark Grimes, Bug Report, Memory leak in mhttpd binary RPC code

Hi,

We applied an intermediate fix for this locally and it seems to have fixed our issue. The attached plot shows the percentage memory use on our machine with 128 Gb memory, as a rough proxy for mhttpd memory use. After applying our fix mhttpd seems to be happy using ~7% of the memory after being up for 2.5 days.

Our fix to mjson was:

diff --git a/mjson.cxx b/mjson.cxx

index 17ee268..2443510 100644

--- a/mjson.cxx

+++ b/mjson.cxx

@@ -654,8 +654,7 @@ MJsonNode::~MJsonNode() // dtor

delete subnodes[i];

subnodes.clear();

- if (arraybuffer_size > 0) {

- assert(arraybuffer_ptr != NULL);

+ if (arraybuffer_ptr != NULL) {

free(arraybuffer_ptr);

arraybuffer_size = 0;

arraybuffer_ptr = NULL;

We also applied the following in midas for good measure, although I don't think it contributed to the leak we were seeing:

diff --git a/src/mjsonrpc.cxx b/src/mjsonrpc.cxx

index 2201d228..38f0b99b 100644

--- a/src/mjsonrpc.cxx

+++ b/src/mjsonrpc.cxx

@@ -3454,6 +3454,7 @@ static MJsonNode* brpc(const MJsonNode* params)

status = cm_connect_client(name.c_str(), &hconn);

if (status != RPC_SUCCESS) {

+ free(buf);

return mjsonrpc_make_result("status", MJsonNode::MakeInt(status));

}

I hope this is useful to someone. As previously mentioned we make heavy use of binary RPC, so maybe other experiments don't run into the same problem.

Thanks,

Mark.

10 Jun 2025, Konstantin Olchanski, Bug Report, Memory leak in mhttpd binary RPC code

I confirm that MJSON_ARRAYBUFFER does not work correctly for zero-size buffers, 
buffer is leaked in the destructor and copied as NULL in MJsonNode::Copy().

I also confirm memory leak in mjsonrpc "brpc" error path (already fixed).

Affected by the MJSON_ARRAYBUFFER memory leak are "brpc" (where user code returns 
a zero-size data buffer) and "js_read_binary_file" (if reading from an empty 
file, return of "new char[0]" is never freed).

"receive_event" and "read_history" RPCs never use zero-size buffers and are not 
affected by this bug.

mjson commit c798c1f0a835f6cea3e505a87bbb4a12b701196c
midas commit 576f2216ba2575b8857070ce7397210555f864e5
rootana commit a0d9bb4d8459f1528f0882bced9f2ab778580295

Please post bug reports a plain-text so I can quote from them.

K.O.

15 Jun 2025, Mark Grimes, Bug Report, Memory leak in mhttpd binary RPC code

Many thanks for the fix.  We've applied and see better memory performance.  We still have to kill and restart 
mhttpd after a few days however.  I think the official fix is missing this part:

diff --git a/src/mjsonrpc.cxx b/src/mjsonrpc.cxx
index 2201d228..38f0b99b 100644
--- a/src/mjsonrpc.cxx
+++ b/src/mjsonrpc.cxx
@@ -3454,6 +3454,7 @@ static MJsonNode* brpc(const MJsonNode* params)
    status = cm_connect_client(name.c_str(), &hconn);
 
    if (status != RPC_SUCCESS) {
+      free(buf);
       return mjsonrpc_make_result("status", MJsonNode::MakeInt(status));
    }

When the other process returns a failure the memory block is also currently leaked.  I originally stated "...although I 
don't think it contributed to the leak we were seeing" but it seems this was false.

Thanks,

Mark.


> I confirm that MJSON_ARRAYBUFFER does not work correctly for zero-size buffers, 
> buffer is leaked in the destructor and copied as NULL in MJsonNode::Copy().
> 
> I also confirm memory leak in mjsonrpc "brpc" error path (already fixed).
> 
> Affected by the MJSON_ARRAYBUFFER memory leak are "brpc" (where user code returns 
> a zero-size data buffer) and "js_read_binary_file" (if reading from an empty 
> file, return of "new char[0]" is never freed).
> 
> "receive_event" and "read_history" RPCs never use zero-size buffers and are not 
> affected by this bug.
> 
> mjson commit c798c1f0a835f6cea3e505a87bbb4a12b701196c
> midas commit 576f2216ba2575b8857070ce7397210555f864e5
> rootana commit a0d9bb4d8459f1528f0882bced9f2ab778580295
> 
> Please post bug reports a plain-text so I can quote from them.
> 
> K.O.

23 Jun 2025, Stefan Ritt, Bug Report, Memory leak in mhttpd binary RPC code

Since this memory leak is quite obvious, I pushed the fix to develop.

Stefan

10 Jun 2025, Nik Berger, Bug Report, History variables with leading spaces

By accident we had history variables with leading spaces. The history schema check then decides that this is a new variable (the leading space is not read from the history file) and starts a new file. We found this because the run start became slow due to the many, many history files created. It would be nice to just get an error if one has a malformed variable name like this.

How to reproduce: Try to put a variable with a leading space in the name into the history, repeatedly start runs.
Sugested fix: Produce an error if a history variable has a leading space.

19 Jun 2025, Stefan Ritt, Bug Report, History variables with leading spaces

I added now code to the logger so it properly complains if there would be a leading space in a variable name.

Stefan

> By accident we had history variables with leading spaces. The history schema check then decides that this is a new variable (the leading space is not read from the history file) and starts a new file. We found this because the run start became slow due to the many, many history files created. It would be nice to just get an error if one has a malformed variable name like this.
> 
> How to reproduce: Try to put a variable with a leading space in the name into the history, repeatedly start runs.
> Sugested fix: Produce an error if a history variable has a leading space.

19 Jun 2025, Frederik Wauters, Bug Report, add history variables

I have encounter this a few times
* Make a new history panel
* Use the web GUI to add history variables
* When I am at the "add history variables" panel, there is not scroll option. So 
depending on the size and zoom of my screen, some variables further down the list 
can not be selected

tried Chrome and Firefox

11 Jun 2025, Stefan Ritt, Info, Frontend write cache size

We had issues with the frontend write cache size and the way it was implemented in the frontend 
framework mfe. Specifically, we had two equipments like in the experiment/examples/frontend.cxx, one 
trigger event and one periodic event. While the trigger event sets the cache size correctly in the 
equipment table, the periodic event (defined afterwards) overwrites the cache size to zero, causing slow 
data taking. 

The underlying problem is that one event buffer (usually "SYSTEM") can only have one cache size for all 
events in one frontend. To avoid mishaps like that, I remove the write cache size from the equipment table 
and instead defined a function which explicitly sets the cache size for a buffer. In your frontend_init() you 
can now call

  set_cache_size("SYSTEM", 10000000);

to set the cache size of the SYSTEM buffer. Note that 10 MiB is the minimum cache size. Especially for 
smaller events, this dramatically can speed up your maximal DAQ rate.

Full commit is here:

  https://bitbucket.org/tmidas/midas/commits/24fbf2c02037ae5f7db74d0cadab23cd4bfe3b13

Note that this only affects frontends using the mfe framework, NOT for those using the tmfe framework.

Stefan

11 Jun 2025, Stefan Ritt, Info, History configuration changed

Today the way the history system is configured has changed. Whenever one adds new equipment 
variables, the logger has to add that variable to the history system. Previously, this happened during 
startup of the logger and at run start. We have now a case in the Mu3e experiment where we have many 
variables and the history configuration takes about 15 seconds, which delays data taking considerably.

After discussion with KO we decided to remove the history re-configuration at run start. This speeds up 
the run start considerably, also for other experiments with many history variables. It means however that 
once you add/remove/rename any equipment variable going into the history system, you have to restart the 
logger for this to become active.

https://bitbucket.org/tmidas/midas/commits/c0a14c2d0166feb6b38c645947f2c5e0bef013d5

Stefan

10 Jun 2025, Amy Roberts, Info, use of modified image showing MIDAS data format?

Hello, I'm currently writing a paper about making a dataset publicly available 
from the University of Minnesota and it's a MIDAS dataset.

I'd like to use an image that shows the MIDAS data format (that's been slightly 
modified to fit in the paper) and am wondering (1) if I could get permission to do 
so and (2) what the preferred attribution would be.

10 Jun 2025, Stefan Ritt, Info, use of modified image showing MIDAS data format?

> Hello, I'm currently writing a paper about making a dataset publicly available 
> from the University of Minnesota and it's a MIDAS dataset.
> 
> I'd like to use an image that shows the MIDAS data format (that's been slightly 
> modified to fit in the paper) and am wondering (1) if I could get permission to do 
> so and (2) what the preferred attribution would be. 

Feel free to use whatever you like, the documentation is on the same open source license as midas itself.

Stefan

27 May 2025, Pavel Murat, Suggestion, handling of 2+ like-long messages

Dear MIDAS experts, 

currently, the MIDAS messaging system is optimized for one-line long messages, 
so the content of 2+liners shows up in the message log in the reverse order, 
with the first line on the bottom of the message. 

I wonder if printing the message content in the reverse order, starting from the last line, 
would make sense ? - that wouldn't affect one-line long messages, but could make longer 
messages more useful. 

--thanks, regards, Pasha

26 May 2025, Francesco Renga, Forum, Reading two devices in parallel

Dear experts,
      in the CYGNO experiment, we readout CMOS cameras for optical readout of GEM-TPCs. So far, we only developed the readout for a single camera. In the future, we will have multiple cameras to read out (up to 18 in the next phase of the experiment), and we are investigating how to optimize the readout by exploiting parallelization.

One idea is to start parallel threads within a single equipment. Alternatively, one could associate different equipment with each camera and run an Event Builder. Perhaps other solutions did not come to mind. Which one would you regard as the most effective and elegant?

Thank you very much,
           Francesco

26 May 2025, Stefan Ritt, Forum, Reading two devices in parallel

> Dear experts,
>       in the CYGNO experiment, we readout CMOS cameras for optical readout of GEM-TPCs. So far, we only developed the readout for a single camera. In the future, we will have multiple cameras to read out (up to 18 in the next phase of the experiment), and we are investigating how to optimize the readout by exploiting parallelization.
> 
> One idea is to start parallel threads within a single equipment. Alternatively, one could associate different equipment with each camera and run an Event Builder. Perhaps other solutions did not come to mind. Which one would you regard as the most effective and elegant?
> 
> Thank you very much,
>            Francesco

In principle both will work. It's kind of matter of taste. In the multi-threaded approach one has a single frontend to start and stop, and in the second case you have to start 18 individual frontends and make sure that they are running. 

For the multi-threaded frontend you have to ensure proper synchronization between the threads (like common run start/stop), and in the end you also have to do some event building, sending all 18 streams into a single buffer. As you know, multi-thread programming can be a bit of an art, using mutexes or semaphores, but it can be more flexible as the event builder which is a given piece of software.

Best,
Stefan

25 May 2025, Pavel Murat, Bug Report, subdirectory ordering in ODB browser ?

Dear MIDAS experts, 

I'm running into a minor but annoying issue with the subdirectory name ordering by the ODB browser. 
I have a straw-man hash map which includes ODB subdirectories named "000", "010", ... "300", 
and I'm yet to succeed to have them displayed in a "natural" order: the subdirectories with names 
starting from "0" always show up on the bottom of the list - see attached .png file. 

Neither interactive re-ordering nor manual ordering of the items in the input .json file helps. 

I have also attached a .json file which can be loaded with odbedit to reproduce the issue. 

Although I'm using a relatively recent - ~ 20 days old - commit, 'db1819ac', is it possible 
that this issue has already been sorted out ?

-- many thanks, regards, Pasha

24 May 2025, Pavel Murat, Info, ROOT scripting for MIDAS seems to work pretty much out of the box

Dear All, 

I'm pretty sure many know this already, however I found this feature by a an accident 
and want to share with those who don't know about it yet - seems very useful. 

- it looks that one can use ROOT scripting with rootcling and call from the 
  interactive ROOT prompt any function defined in midas.h and access ODB seemingly 
  WITHOUT DOING anything special 

- more surprisingly, that also works for odbxx, with one minor exception in handling 
  the 64-bit types - the proof is in attachment. The script test_odbxx.C loaded 
  interactively is Stefan's

 https://bitbucket.org/tmidas/midas/src/develop/examples/odbxx/odbxx_test.cxx 

with one minor change - the line 
 
   o[Int64 Key] = -1LL;

is replaced with

   int64_t x = -1LL;
   o["Int64 Key"] = x;

- apparently the interpeter has its limitations. 

My rootlogon.C file doesn't load any libraries, it only defines the appropriate 
include paths. So it seems that everything works pretty much out of the box. 

One issue has surfaced however. All that worked despite my experiment 
had its name="test_025", while the example specifies experiment="test". 
Is it possible that that only first 4 characters are being tested ? 

-- regards, Pasha

16 May 2025, Marius Koeppel, Bug Report, history_schema.cxx fails to build

Hi all,

we have a CI setup which fails since 06.05.2025 to build the history_schema.cxx. There was a major change in this code in the commits fe7f6a6 and 159d8d3.

image: rootproject/root:latest

pipelines:
  default:
    - step:
        name: 'Build and test'
        runs-on:
          - self.hosted
          - linux
        script:
          - apt-get update
          - DEBIAN_FRONTEND=noninteractive apt-get -y install python3-all python3-pip python3-pytest-dependency python3-pytest
          - DEBIAN_FRONTEND=noninteractive apt-get -y install gcc g++ cmake git python3-all libssl-dev libz-dev libcurl4-gnutls-dev sqlite3 libsqlite3-dev libboost-all-dev linux-headers-generic
          - gcc -v
          - cmake --version
          - git clone https://marius_koeppel@bitbucket.org/tmidas/midas.git
          - cd midas
          - git submodule update --init --recursive
          - mkdir build
          - cd build
          - cmake ..
          - make -j4 install


Error is:

/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:5991:10: error: �class HsSqlSchema� has no member named �table_name�; did you mean �fTableName�?

 5991 |       s->table_name = xtable_name;

      |          ^~~~~~~~~~

      |          fTableName

/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx: In member function �virtual int PgsqlHistory::read_column_names(HsSchemaVector*, const char*, const char*)�:

/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:6034:14: error: �class HsSqlSchema� has no member named �table_name�; did you mean �fTableName�?

 6034 |       if (s->table_name != table_name)

      |              ^~~~~~~~~~

      |              fTableName

/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:6065:16: error: �struct HsSchemaEntry� has no member named �fNumBytes�

 6065 |             se.fNumBytes = 0;

      |                ^~~~~~~~~

/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:6140:30: error: �__gnu_cxx::__alloc_traits<std::allocator<HsSchemaEntry>, HsSchemaEntry>::value_type� {aka �struct HsSchemaEntry�} has no member named �fNumBytes�

 6140 |             s->fVariables[j].fNumBytes = tid_size;

      |                              ^~~~~~~~~

At global scope:

cc1plus: note: unrecognized command-line option �-Wno-vla-cxx-extension� may have been intended to silence earlier diagnostics

make[2]: *** [CMakeFiles/objlib.dir/build.make:384: CMakeFiles/objlib.dir/src/history_schema.cxx.o] Error 1

make[2]: *** Waiting for unfinished jobs....

make[1]: *** [CMakeFiles/Makefile2:404: CMakeFiles/objlib.dir/all] Error 2

make: *** [Makefile:136: all] Error 2

16 May 2025, Konstantin Olchanski, Bug Report, history_schema.cxx fails to build

> we have a CI setup which fails since 06.05.2025 to build the history_schema.cxx.
> There was a major change in this code in the commits fe7f6a6 and 159d8d3.

Missing from this report is critical information: HAVE_PGSQL is set.

I will have to check why it is not set in my development account.

I will have to check why it is not set in our bitbucket build.

Thank you for reporting this problem.

K.O.

16 May 2025, Konstantin Olchanski, Bug Report, history_schema.cxx fails to build

> > we have a CI setup which fails since 06.05.2025 to build the history_schema.cxx.
> > There was a major change in this code in the commits fe7f6a6 and 159d8d3.
> 
> Missing from this report is critical information: HAVE_PGSQL is set.
> 
> I will have to check why it is not set in my development account.
> 

The following is needed to build MySQL and PgSQL support in MIDAS,
they were missing on my development machine. MySQL support was enabled
by accident because kde-bloat packages pull in the MySQL (not the MariaDB)
client and server. Fixed now, added to standard list of Ubuntu packages:
https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu#install_missing_packages

apt -y install mariadb-client libmariadb-dev ### mysql client for MIDAS
apt -y install postgresql-common libpq-dev ### postgresql client for MIDAS

>
> I will have to check why it is not set in our bitbucket build.
> 

Added MySQL and PgSQL to bitbucket Ubuntu-24 build (sqlite was already enabled).

>
> Thank you for reporting this problem.
> 

Fix committed. Sorry about this problem.

K.O.

05 May 2025, Konstantin Olchanski, Bug Report, abort and core dump in cm_disconnect_experiment()

I noticed that some programs like mhist, if they take too long, there is an abort and core dump at the very end. This is because they forgot to 
set/disable the watchdog timeout, and they got remove from odb and from the SYSMSG event buffer.

mhist is easy to fix, just add the missing call to disable the watchdog, but I also see a similar crash in the mserver which of course requires 
the watchdog.

In either case, the crash is in cm_disconnect_experiment() where we know we are shutting down and we know there is no useful information in the 
core dump.

I think I will fix it by adding a flag to bm_close_buffer() to bypass/avoid the crash from "we are already removed from this buffer".

Stack trace from mhist:

[mhist,ERROR] [midas.cxx:5977:bm_validate_client_index,ERROR] My client index 6 in buffer 'SYSMSG' is invalid: client name '', pid 0 should be my 
pid 3113263
[mhist,ERROR] [midas.cxx:5980:bm_validate_client_index,ERROR] Maybe this client was removed by a timeout. See midas.log. Cannot continue, 
aborting...
bm_validate_client_index: My client index 6 in buffer 'SYSMSG' is invalid: client name '', pid 0 should be my pid 3113263
bm_validate_client_index: Maybe this client was removed by a timeout. See midas.log. Cannot continue, aborting...

Program received signal SIGABRT, Aborted.
Download failed: Invalid argument.  Continuing without source file ./nptl/./nptl/pthread_kill.c.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
warning: 44	./nptl/pthread_kill.c: No such file or directory
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff71df27e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff71c28ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00005555555768b4 in bm_validate_client_index_locked (pbuf_guard=...) at /home/olchansk/git/midas/src/midas.cxx:5993
#6  0x000055555557ed7a in bm_get_my_client_locked (pbuf_guard=...) at /home/olchansk/git/midas/src/midas.cxx:6000
#7  bm_close_buffer (buffer_handle=1) at /home/olchansk/git/midas/src/midas.cxx:7162
#8  0x000055555557f101 in cm_msg_close_buffer () at /home/olchansk/git/midas/src/midas.cxx:490
#9  0x000055555558506b in cm_disconnect_experiment () at /home/olchansk/git/midas/src/midas.cxx:2904
#10 0x000055555556d2ad in main (argc=<optimized out>, argv=<optimized out>) at /home/olchansk/git/midas/progs/mhist.cxx:882
(gdb) 

Stack trace from mserver:

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=138048230684480, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007d8ddbc4e476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007d8ddbc347f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000059beb439dab0 in bm_validate_client_index_locked (pbuf_guard=...) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:5993
#6  0x000059beb43a859c in bm_get_my_client_locked (pbuf_guard=...) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:6000
#7  bm_close_buffer (buffer_handle=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7162
#8  0x000059beb43a89af in bm_close_all_buffers () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7256
#9  bm_close_all_buffers () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7243
#10 0x000059beb43afa20 in cm_disconnect_experiment () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:2905
#11 0x000059beb43afdd8 in rpc_check_channels () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:16317
#12 0x000059beb43b0cf5 in rpc_server_loop () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:15858
#13 0x000059beb4390982 in main (argc=9, argv=0x7ffc07e5bed8) at /home/dsdaqdev/packages_common/midas/progs/mserver.cxx:387

K.O.

06 May 2025, Stefan Ritt, Bug Report, abort and core dump in cm_disconnect_experiment()

I would be in favor of not curing the symptoms, but fixing the cause of the problem. I guess you put the watchdog disable into mhist, right? Usually mhist is called locally, so no mserver should be 
involved. If not, I would prefer to propagate the watchdog disable to the mserver side as well, if that's not been done already. Actually I never would disable the watchdog, but set it to a reasonable 
maximal value, like a few minutes or so. In that case, the client gets still removed if it crashes for some reason.

My five cents,
Stefan

05 May 2025, Konstantin Olchanski, Bug Fix, Bug fix in SQL history

A bug was introduced to the SQL history in 2022 that made renaming of variable names not work. This is now fixed.

break commit:
54bbc9ed5d65d8409e8c9fe60b024e99c9f34a85
fix commit:
159d8d3912c8c92da7d6d674321c8a26b7ba68d4

P.S.

This problem was caused by an unfortunate design of the c++ class system. If I want to add more data to an existing 
class, I write this:

class old_class {
int i,j,k;
}

class bigger_class: public old_class {
int additional_variable;
}

But if I have this:

struct x { int i,j; }

class y {
std::vector<x> array_of_x;
}

and I want to add "k" to "x", c++ has not way to do this. history code has this workaround:

class bigger_y: public y
{
std::vector<int> array_of_k;
}

int bigger_y:foo(int n) {
printf("%d %d %d\", array_of_x[n].i, array_of_x[n].j, array_of_k[n]);
}

problem is that it is not obvious that "array_of_x" and "array_of_k" are connected
and they can easily get out of sync (if elements are added or removed). this is the
bug that happened in the history code. I now added assert(array_of_x.size()==array_of_k.size())
to offer at least some protection going forward.

P.S. As final solution I think I want to completely separate file history and sql history code,
they have more things different than common.

K.O.

29 Apr 2025, Pavel Murat, Bug Report, ODBXX : ODB links in the path names ?

Dear MIDAS experts,

does the ODBXX interface to ODB currently ODB links in the path names? - From what I see so far, it currently fails to do so,
but I could be doing something else wrong... 

-- thanks, regards, Pasha

30 Apr 2025, Stefan Ritt, Bug Report, ODBXX : ODB links in the path names ?

Indeed this was missing from the very beginning. I added it, please report back if it's not working.

Stefan

12 May 2020, Stefan Ritt, Info, New ODB++ API

Since the beginning of the lockdown I have been working hard on a new object-oriented interface to the online database ODB. I have the code now in an initial state where it is ready for 
testing and commenting. The basic idea is that there is an object midas::odb, which represents a value or a sub-tree in the ODB. Reading, writing and watching is done through this 
object. To get started, the new API has to be included with

   #include <odbxx.hxx>

To create ODB values under a certain sub-directory, you can either create one key at a time like:

   midas::odb o;
   o.connect("/Test/Settings", true);   // this creates /Test/Settings
   o.set_auto_create(true);            // this turns on auto-creation
   o["Int32 Key"] = 1;                 // create all these keys with different types
   o["Double Key"] = 1.23;
   o["String Key"] = "Hello";

or you can create a whole sub-tree at once like:

  midas::odb o = {
    {"Int32 Key", 1},
    {"Double Key", 1.23},
    {"String Key", "Hello"},
    {"Subdir", {
      {"Another value", 1.2f}
    }
  };
  o.connect("/Test/Settings");

To read and write to the ODB, just read and write to the odb object

   int i = o["Int32 Key];
   o["Int32 Key"] = 42;
   std::cout << o << std::endl;

This works with basic types, strings, std::array and std::vector. Each read access to this object triggers an underlying read from the ODB, and each write access triggers a write to the 
ODB. To watch a value for change in the odb (the old db_watch() function), you can use now c++ lambdas like:

   o.watch([](midas::odb &o) {
      std::cout << "Value of key \"" + o.get_full_path() + "\" changed to " << o << std::endl;
   });

Attached is a full running example, which is now also part of the midas repository. I have tested most things, but would not yet use it in a production environment. Not 100% sure if there 
are any memory leaks. If someone could valgrind the test program, I would appreciate (currently does not work on my Mac).

Have fun!

Stefan

20 May 2020, Konstantin Olchanski, Info, New ODB++ API

>    midas::odb o;
>    o["foo"] = 1;

This is an excellent development.

ODB is a tree-structured database, JSON is a tree-structured data format,
and they seem to fit together like hand and glove. For programming
web pages, Javascript and JSON-style access to ODB seems to work really well.

And now with modern C++ we can have a similar API for working with ODB tree data,
as if it were Javascript JSON tree data.

Let's see how well it works in practice!

K.O.

20 May 2020, Stefan Ritt, Info, New ODB++ API

In meanwhile, there have been minor changes and improvements to the API:

Previously, we had:

>    midas::odb o;
>    o.connect("/Test/Settings", true);   // this creates /Test/Settings
>    o.set_auto_create(true);            // this turns on auto-creation
>    o["Int32 Key"] = 1;                 // create all these keys with different types
>    o["Double Key"] = 1.23;
>    o["String Key"] = "Hello";

Now, we only need:

      o.connect("/Test/Settings");
      o["Int32 Key"] = 1;                 // create all these keys with different types
      ...

no "true" needed any more. If the ODB tree does not exist, it gets created. Similarly, set_auto_create() can be dropped, it's on by default (thought this makes more sense). Also the iteration over subkeys has 
been changed slightly.

The full example attached has been updated accordingly. 

Best,
Stefan

20 May 2020, Pintaudi Giorgio, Info, New ODB++ API

All this is very good news. I really wish this were available some months ago: it would have helped me immensely. The old C API was clunky at best.
I really like the idea and looking forward to using it (even if at the moment I do not have the need to) ...

20 May 2020, Konstantin Olchanski, Info, New ODB++ API

> All this is very good news. I really wish this were available some months ago: it would have helped me immensely. The old C API was clunky at best.
> I really like the idea and looking forward to using it (even if at the moment I do not have the need to) ...

Yes, I have designed new C-style MIDAS ODB APIs twice now (VirtualOdb in ROOTANA and MVOdb in ROOTANA and MIDAS),
and I was never happy with the results. There is too many corner cases and odd behaviour. Let's see how
this C++ interface shakes out.

For use in analyzers, Stefan's C++ interface still need to be virtualized - right now it has only one implementation
with the MIDAS ODB backend. In analyzers, we need XML, JSON (and a NULL ODB) backends. The API looks
to be clean enough to add this, but I have not looked at the implementation yet. So "watch this space" as they say.

K.O.

30 Apr 2025, Stefan Ritt, Info, New ODB++ API

I had to change the ODBXX API: https://bitbucket.org/tmidas/midas/commits/273c4e434795453c0c6bceb46bac9a0d2d27db18

The old C API is case-insensitive, meaning db_find_key("name") returns a key "name" or "Name" or "NAME". We can discuss if this is good or bad, but that's how it is since 30 years.

I now realized the the ODBXX API keys is case sensitive, so a o["NAME"] does not return any key "name". Rather, it tries to create a new key which of course fails. I changed therefore
the ODBXX to become case-insensitive like the old C API.

Stefan

30 Apr 2025, Pavel Murat, Info, New ODB++ API

it is a very convenient interface! Does it support the ODB links in the path names ? -- thanks, regards, Pasha

08 Apr 2025, Lukas Mandokk, Info, MSL Syntax Highlighting Extension for VSCode (Release)

Hello everyone,

I just wanted to let you know, that I published a MSL Syntax Highlighting Extension for VSCode.
It is still in a quite early stage, so there might be some missing keywords and edge cases which are not fully handled. So in case you find any issues or have suggestions for improvements, I am happy to implement them. Also I only tested it with a custom theme (One Monokai), so it might look very different with the default theme and other ones.

The extension is called "MSL Syntax Highlighter" and can be found in the extension marketplace in VSCode. (vscode marketplace: https://marketplace.visualstudio.com/items?itemName=LukasMandok.msl-syntax-highlighter, github repo: https://github.com/LukasMandok/msl-syntax-highlighter)

One additional remark:
- To keep a consitent style with existing themes, one is a bit limited in regard to colors. For this reason a distinction betrween LOOP and IF Blocks is not really possible without writing a custom theme. A workaround would be to add the theming in the custom user settings (explained in the readme).

01 Apr 2025, Lukas Gerritzen, Suggestion, Sequencer ODBSET feature requests

I would like to request the following sequencer features if you find the ideas as sensible as I do:

A "Reload File" button
Support for patterns in ODBSET, e.g.:
- ```
ODBSET "/Path/value[1,3,5]", 1
```
- ```
ODBSET "/Path/value[1-5,7-9]", 1
```
- Arbitrary combinations of the above

Support for variable substitution:

SET GOODCHANNELS, "1-5,7,9"; ODBSET "/Path/value[$GOODCHANNELS]", 1

SET BADCHANNELS, "6,8"; ODBSET "/Path/value[!$BADCHANNELS]", 1

ODBSET "/Path/value[0-100, except $BADCHANNELS]", 1

To add some context: I am using the sequencer for a voltage scan of several thousand channels. However, a few dozen of them have shorts, so I cannot simply set all demands to the voltage step. Currently, this is solved with a manually-created ODB file for each individual voltage step, but as you can imagine, this is quite difficult to maintain.

I also encountered a small annoyance in the current workflow of editing sequencer files in the browser:

Load a file
Double-click it to edit it, acknowledge the "To edit the sequence it must be opened in an editor tab" dialog
A new tab opens
Edit something, click "Start", acknowledge the "Save and start?" dialog (which pops up even if no changes are made)
Run the script
Double-click to make more changes -> another tab opens

After a while, many tabs with the same file are open. I understand this may be considered "user error", but perhaps the sequencer could avoid opening redundant tabs for the same file, or prompt before doing so?

Thanks for considering these suggestions!

01 Apr 2025, Lukas Gerritzen, Suggestion, Sequencer ODBSET feature requests

While trying to simplify the existing spaghetti code, I encountered problems with type safety. Compare the following:

SET v, "54"
SET file, "MPPCHV_$v.odb"
ODBLOAD $file

-> successfully loads MPPCHV_54.odb

SET v, "54.2"
SET file, "MPPCHV_$v.odb"
ODBLOAD $file

-> Error reading file "[...]/MPPCHV_54.200000.odb"

The "54.2" appears to be stored as a float rather than a string. Maybe "54" was stored as an integer? I don't know how to verify this in odbedit.

Actually, I would be fine with setting the value as a float, as it allows arithmetic. In that case, I would appreciate something like a SPRINTF function in MSL:

SET v, 54.2
SPRINTF file, "MPPCHV_%f.odb", $v
ODBLOAD $file

Or, maybe a bit more modern, something akin to Python's f-strings

ODBLOAD f"MPPCHV_{v:.1f}.odb"

01 Apr 2025, Stefan Ritt, Suggestion, Sequencer ODBSET feature requests

A new sequencer which understands Python is in the works. There you can use all features from that language.

Stefan

01 Apr 2025, Stefan Ritt, Suggestion, Sequencer ODBSET feature requests

The extended ODBSET[x,y1-y2,z] could make sense to be implemented, since it then will match the alarm system which uses the same syntax.

The $GOODCHANNELS/$BADCHANNELS is however a very strange syntax which I haven't seen in any other computer language. It would take me probably several days to properly implement this, while it would take you much less time to explicitly use a few ODBSET statements to set the bad channels to zero.

For the file edit workflow, the author of the editor will have a look.

Stefan

Lukas Gerritzen wrote:

I would like to request the following sequencer features if you find the ideas as sensible as I do:

A "Reload File" button
Support for patterns in ODBSET, e.g.:
- ```
ODBSET "/Path/value[1,3,5]", 1
```
- ```
ODBSET "/Path/value[1-5,7-9]", 1
```
- Arbitrary combinations of the above

Support for variable substitution:

SET GOODCHANNELS, "1-5,7,9"; ODBSET "/Path/value[$GOODCHANNELS]", 1

SET BADCHANNELS, "6,8"; ODBSET "/Path/value[!$BADCHANNELS]", 1

ODBSET "/Path/value[0-100, except $BADCHANNELS]", 1

01 Apr 2025, Konstantin Olchanski, Suggestion, Sequencer ODBSET feature requests

> ODBSET "/Path/value[1,3,5]"
> ODBSET "/Path/value[1-5,7-9]"

we support this array index syntax in several places,
specifically, in javascript odb get and set mjsonrpc RPCs.

> SET GOODCHANNELS, "1-5,7,9"; ODBSET "/Path/value[$GOODCHANNELS]"
> SET BADCHANNELS, "6,8"; ODBSET "/Path/value[!$BADCHANNELS]"
> ODBSET "/Path/value[0-100, except $BADCHANNELS]"

this is very clever syntax, but I have not seen any programming
language actually implement it (not even perl).

there must be a good reason why nobody does this. probably we should not do it either.

but as Stefan said (and my opinion), the route of extending MIDAS sequencer
language until it becomes a superset of python, perl, tcl, bash, javascript
and algol is not a sustainable approach. I once looked at using LUA for this,
but I think basing off an full featured programming language like python
is better.

K.O.

01 Apr 2025, Pavel Murat, Suggestion, Sequencer ODBSET feature requests

 I once looked at using LUA for this,
> but I think basing off an full featured programming language like python
> is better.

if it came to a vote, my vote would go to Lua: it would allow to do everything needed, 
with much less external dependencies and with much less motivation to over-use the interpreter. 
The CMS experience was very teaching in this respect... 

-- my 2c, regards, Pasha

02 Apr 2025, Konstantin Olchanski, Suggestion, Sequencer ODBSET feature requests

> I once looked at using LUA for this
>
> > but I think basing off an full featured programming language like python
> > is better.
> 
> if it came to a vote, my vote would go to Lua: it would allow to do everything needed, 
> with much less external dependencies and with much less motivation to over-use the interpreter. 
> The CMS experience was very teaching in this respect... 

Unfortunately I am only slightly aware of Lua to say how nicve or how bad it is. And we are
not sure how well it supports the single-line-stepping that permits the nice graphical
visualization of Stefan's sequencer.

It looks like python has the single-line-stepping built-in as a standard feature
and python is a more popular and more versatile machine, so to me python looks
like a better choice compared to lua (obscure), perl ("nobody uses it anymore")
or bash (ugly syntax).

K.O.

03 Apr 2025, Stefan Ritt, Suggestion, Sequencer ODBSET feature requests

And there is one more argument: 

We have a Python expert in our development team who wrote already the Python-to-C bindings. That means when running a Python 
script, we can already start/stop runs, write/read to the ODB etc. We only have to get the single stepping going which seems feasible to 
me, since there are some libraries like inspect.currentframe() and traceback.extract_stack(). For single-stepping there are debug APIs 
like debugpy. With Lua we really would have to start from scratch.

Stefan

07 Apr 2025, Zaher Salman, Suggestion, Sequencer ODBSET feature requests

Lukas Gerritzen wrote:

I also encountered a small annoyance in the current workflow of editing sequencer files in the browser:

Load a file
Double-click it to edit it, acknowledge the "To edit the sequence it must be opened in an editor tab" dialog
A new tab opens
Edit something, click "Start", acknowledge the "Save and start?" dialog (which pops up even if no changes are made)
Run the script
Double-click to make more changes -> another tab opens

The original reason the restricting edits in the first tab is that it is used to reflect the state of the sequencer, i.e. the file that is currently loaded in the ODB.

Imagine two users are working in parallel on the same file, each preparing their own sequence. One finishes editing and starts the sequencer. How would the second person know that by now the file was changed and is running?

I am open to suggestions to minimize the number of clicks and/or other options to make the first tab editable while making it safe and visible to all other users. Maybe a lock mechanism in the ODB can help here.

Zaher

07 Apr 2025, Stefan Ritt, Suggestion, Sequencer ODBSET feature requests

If people are simultaneously editing scripts this is indeed an issue, which probably never can be resolved by technical means. It need communication between the users.

For the main script some ODB locking might look like:

- First person clicks on "Edit", system checks that file is not locked and sequencer is not running, then goes into edit mode
- When entering edit mode, the editor puts a lock in to the ODB, like "Scrip lock = pc1234".
- When another person clicks on "Edit", the system replies "File current being edited on pc1234"
- When the first person saves the file or closes the web browser, the lock gets removed.
- Since a browser can crash without removing a lock, we need some automatic lock recovery, like if the lock is there, the next users gets a message "file currently locked. Click "override" to "steal" the lock and edit the file".

All that is not 100% perfect, but will probably cover 99% of the cases.

There is still the problem on all other scripts. In principle we would need a lock for each file which is not so simple to implement (would need arrays of files and host names).

Another issue will arise if a user opens a file twice for editing. The second attempt will fail, but I believe this is what we want.

A hostname for the lock is the easiest we can get. Would be better to also have a user name, but since the midas API does not require a log in, we won't have a user name handy. At it would be too tedious to ask "if you want to edit this file, enter your username".

Just some thoughts.

Stefan

30 Jan 2025, Pavel Murat, Forum, converting non-MIDAS slow control data into MIDAS history format ?

Dear MIDAS experts,

I have a time series of slow control measurements in an ASCII format - 
data records in a format (run_number, time, temperature, voltage1, ..., voltageN), 
and, if possible, would like to convert them into a MIDAS history format. 

Making MIDAS events out of that data is easy, but is it possible to preserve 
the time stamps?  - Logically, this boils down to whether it is possible to have  
the event time set by a user frontend

-- as always - many thanks, regards, Pasha

31 Jan 2025, Pavel Murat, Forum, converting non-MIDAS slow control data into MIDAS history format ?

I think I found an answer to my question: a user-controlled event header does have a time stamp: 

https://daq00.triumf.ca/MidasWiki/index.php/Event_Structure#Event_Header

-- apologies for the spam, regards, Pasha

01 Feb 2025, Pavel Murat, Bug Report, MIDAS history system not using the event timestamps ?

> I have a time series of slow control measurements in an ASCII format - 
> data records in a format (run_number, time, temperature, voltage1, ..., voltageN), 
> and, if possible, would like to convert them into a MIDAS history format. 
> 
> Making MIDAS events out of that data is easy, but is it possible to preserve 
> the time stamps?  - Logically, this boils down to whether it is possible to have  
> the event time set by a user frontend

It looks that the original question was not as naive as I expected and may be pointing to a subtle bug. 
I have implemented a python frontend - essentially a clone of 

https://bitbucket.org/tmidas/midas/src/develop/python/midas/frontend.py

reading the old slow control data and setting the event.header.timestamp's to some dates from the year of 2022. 

When I run MIDAS and read the "old slow control events", one event in 10 seconds, 
the MIDAS Event Dump utility shows the data with the correct event timestamps, from the year of 2022. 

However the history plots show the event parameters with the timestamps from Feb 01 2025 and the adjacent 
data points separated by 10 sec.

Is it possible that the history system uses its own timestamp setting instead of using timestamps from the event headers? 
- Under normal circumstances, the two should be very close, and that could've kept the issue hidden... 

-- thanks, regards, Pasha

UPDATE:  I attached the frontend code and the input data file it is reading. The data file should reside in the local directory
- the frontend code doesn't have everything fully automated for the test, 
  -- an integer field "/Mu2e/Offline/Ops/LastTime" would need to be created manually
  -- the history plots would need to be declared manually

01 Apr 2025, Pavel Murat, Bug Report, MIDAS history system not using the event timestamps ?

Dear MIDAS experts, 

I confirm that when writing out history files corresponding to the slow control event data, 
MIDAS history system timestamps the data not with the event time coming from the event data, 
but with the current time determined by the program - 

https://bitbucket.org/tmidas/midas/src/293d27fad0c87c80c4ed7b94b5c40ba1e150bea4/progs/mlogger.cxx#lines-5321

where 'now' is defined as  

time_t now = time(NULL);

I'm looking for a way to timestamp the history data with the event time - that is important 
for HEP applications outside the DAQ domain. Yes, MIDAS infrastructure is very well suited for that, 
there could have a number of such applications, and experiments could significantly benefit from that.

So I'm wondering whether the implementation is a design choice made or it could be changed. 

The change itself and especially its validation may require a non-negligible amount of work - I'd be happy to contribute.

Any insight much appreciated. 

-- thanks, regards, Pasha

01 Apr 2025, Konstantin Olchanski, Bug Report, MIDAS history system not using the event timestamps ?

> I confirm that when writing out history files corresponding to the slow control event data, 
> MIDAS history system timestamps the data not with the event time coming from the event data, 
> but with the current time determined by [mlogger].

This is correct. The timestamp in the history file is the mlogger timestamp.

In theory we could use the ODB "last_written" timestamp, but in practice,
timestamps are 1 second granularity and the difference between the two
timestamps normally would be less than 1 second. (time to react to db_watch()).

But ODB last_written also is not the data timestamp. For remote connected clients
it includes the mserver communication delay.

What is the data timestamp, only the user knows - for some FPGA based equipments,
I can see the data timestamp being read from an FPGA register together with the data.

But back to earth.

For making history plots, 1 second granularity with a small (a few seconds) delay should be okey,
and I think the mserver timestamp is good enough.

For data analysis, you are reading history data from a history data file and you are
not constrained to using the MIDAS timestamp.

You can always include your "true" data timestamp as the first value in your data.

We do this in felaview for writing labview data to midas history in the ALPHA antihydrogen experiment at CERN.

This also anticipates your next request, can we have millisecond, microsecond, nanosecond history timestamps:
since you define your "true" data timestamp, you an make it anything you want. (I use "double" time in seconds,
64-bit IEEE-754 "double" has enough precision for microsecond granularity. FPGA based devices can have timestamps
with 10 ns or 8 ns granularity, in this case a uint64_t clock counter could be more appropriate).

K.O.

02 Apr 2025, Pavel Murat, Bug Report, MIDAS history system not using the event timestamps ?

> You can always include your "true" data timestamp as the first value in your data.

Are you saying that if the first data word of a history event were a timestamp, 
the MIDAS history system, when plotting the time dependencies, would use that timestamp 
instead of the mlogger timestamp?  

if that is true, what tells MIDAS that the first data word is the timestamp? 

I couldn't find a discussion of that on the page describing the history system - 

 https://daq00.triumf.ca/MidasWiki/index.php/History_System#Frontend_history_event

- perhaps I should be looking at a different page?

-- thanks again, regards, Pasha

02 Apr 2025, Konstantin Olchanski, Bug Report, MIDAS history system not using the event timestamps ?

> > You can always include your "true" data timestamp as the first value in your data.
> 
> Are you saying that if the first data word of a history event were a timestamp, 
> the MIDAS history system, when plotting the time dependencies, would use that timestamp 
> instead of the mlogger timestamp?
>

you are correct, midas knows nothing about what you put in the history data.

what I suggested is: if you want your true data timestamp recorded in the history,
you can put it into the history data yourself, and I suggested using the 1st value,
but you can also make it the last value or the 10th value, it is up to you.

for making history plots, the history timestamp is used, as you wrote and I confirmed,
this timestamp is generated by mlogger.

what is not clear to me is why this is a problem? do you see a big difference between the 
true data timestamp and the mlogger data timestamp? bigger than 1 second? (this would change 
the shape of "last 10 minutes" plots (600 seconds). bigger than 1 minute? (this would change 
the shape of "last 1 hour plots" (60 minutes, 3600 seconds).

that said, note that we currently store the timestamp as a DWORD 32-bit UNIX time value 
which will overflow in 2038 and which is quickly becoming incompatible with the ongoing 
switch to 64-bit time_t. Ubuntu-24 already build a large number of system libraries with 64-
bit time_t and building MIDAS with 32-bit time_t may soon become as difficult as building 
32-bit MIDAS for 32-bit i686 VME processors. we have to move with the times.

what it means is that the history system data format will have to be updated to 64-bit 
time_t and at the same time, we may try to change the timestamp from mlogger-generated to 
frontend-generated.

but it is still not clear to me how that helps you, because the frontend-generated timestamp 
is not the true data timestamp that you wanted. (and only you know what the true data 
timestamp is and where it comes from and how to tell it to MIDAS).

K.O.

01 Apr 2025, Konstantin Olchanski, Bug Report, ODB corruption

We see ODB corruption crashes in the DS20k vertical slice MIDAS instance.

Crash is memset() called by db_delete_key1() called by cm_connect_experiment().

I look at the source code and I see that ODB pkey and hkey validation is absent
from most iterators and it is possible for "bad" pkey to cause corruption. Many
other places in the ODB code use db_get_pkey() and db_validate_hkey() to prevent
invalid data from causing further corruption and breakage.

Also db_delete_key1() needs to be refactored and renamed db_delete_key_wlocked().

I will not do this immediately today, but hopefully next week or so.

Stack trace is attached, observe how free_data() was called on a completely invalid pkey,
bad pkey->type, bad pkey sizes, etc.

#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:250
250	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
(gdb) bt
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:250
#1  0x00005ad4102b4217 in memset (__len=<optimized out>, __ch=0, __dest=<optimized out>) at /usr/include/x86_64-linux-
gnu/bits/string_fortified.h:59
#2  free_data (pheader=pheader@entry=0x75aaea4f4000, address=0x75aaed4cea50, size=<optimized out>, caller=caller@entry=0x5ad4102ffb6c 
"db_delete_key1") at /home/dsdaqdev/packages_common/midas/src/odb.cxx:513
#3  0x00005ad4102b6a5b in free_data (caller=0x5ad4102ffb6c "db_delete_key1", size=<optimized out>, address=<optimized out>, 
pheader=0x75aaea4f4000) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:453
#4  db_delete_key1 (hDB=1, hKey=<optimized out>, level=<optimized out>, follow_links=0) at 
/home/dsdaqdev/packages_common/midas/src/odb.cxx:3789
#5  0x00005ad4102b6979 in db_delete_key1 (hDB=1, hKey=288672, level=0, follow_links=0) at 
/home/dsdaqdev/packages_common/midas/src/odb.cxx:3731
#6  0x00005ad4102cc923 in db_create_record (hDB=hDB@entry=1, hKey=hKey@entry=0, orig_key_name=orig_key_name@entry=0x7ffd75987280 
"/Programs/ODBEdit", init_str=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:12916
#7  0x00005ad4102cca73 in db_create_record (hDB=hDB@entry=1, hKey=hKey@entry=0, orig_key_name=orig_key_name@entry=0x7ffd75987280 
"/Programs/ODBEdit", init_str=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:12942
#8  0x00005ad4102a00ba in cm_set_client_info (hDB=1, hKeyClient=0x7ffd75987420, host_name=0x5ad412262ee0 "dsdaqgw.triumf.ca", 
client_name=0x7ffd759874c0 "ODBEdit", hw_type=<optimized out>, password=<optimized out>, watchdog_timeout=<optimized out>)
    at /usr/include/c++/11/bits/basic_string.h:194
#9  0x00005ad4102a902b in cm_connect_experiment1 (host_name=<optimized out>, host_name@entry=0x7ffd759876c0 "", 
default_exp_name=default_exp_name@entry=0x7ffd759876a0 "vslice", client_name=client_name@entry=0x5ad4102f28fa "ODBEdit", 
func=func@entry=0x0, 
    odb_size=odb_size@entry=1048576, watchdog_timeout=<optimized out>, watchdog_timeout@entry=10000) at 
/usr/include/c++/11/bits/basic_string.h:194
#10 0x00005ad41027e58d in main (argc=3, argv=0x7ffd759881e8) at /home/dsdaqdev/packages_common/midas/progs/odbedit.cxx:3025
(gdb) up
...
#4  db_delete_key1 (hDB=1, hKey=<optimized out>, level=<optimized out>, follow_links=0) at 
/home/dsdaqdev/packages_common/midas/src/odb.cxx:3789
3789	            free_data(pheader, (char *) pheader + pkey->data, pkey->total_size, "db_delete_key1");
(gdb) p pkey
$1 = (KEY *) 0x75aaea53b400
(gdb) p *pkey
$2 = {type = 1684370529, num_values = 0, name = '\000' <repeats 16 times>, "xQ\375\002\004\000\000\000\004\000\000\000\a\000\000", data = 0, 
total_size = 290944, item_size = 1743544378, access_mode = 0, notify_count = 0, next_key = 15, parent_keylist = 1, 
  last_written = 1953785965}
(gdb) 

K.O.

01 Apr 2025, Konstantin Olchanski, Bug Fix, ODB and event buffer - release semaphore before abort() and core dump

There is a long standing problem with ODB and event buffers. If they detect an 
internal data inconsistency and cannot continue running, they call abort() to 
dump core and stop.

Problem is in some code paths, they do this while holding the ODB or event 
buffer semaphore. (Linux kernel automatically releases SYSV semaphores after 
core dump is finished and program holding them is stopped).

If core dump takes longer than 10 seconds (for whatever reason, but we see this 
often enough), all other programs that wait for ODB or event buffer access, will 
also timeout and also crash (with core dump). Result is a core dump storm, at 
the end all MIDAS programs are crashed. (Luckily recovery is easy, simply 
restart everything).

Now I realize that in many situation, we do not need to hold the semaphore while 
dumping core - the content of ODB and event buffer shared memories is not 
important for debugging the crash - and it is safe to release the semaphore 
before calling abort().

This is now implemented for ODB and event buffers. Hopefully core dump storms 
will not happen again.

commit 96369c29deba1752fd3d25bed53e6594773d7e1a
release ODB semaphore before calling abort() to dump core. if core dump takes 
longer than 10 sec all other midas programs will timeout and crash.

commit 2506406813f1e7581572f0d5721d3761b7c8e8dd
unlock event buffer before calling abort() in bm_validate_client_index_locked(), 
refactor bm_get_my_client_locked()


K.O.

Goto page Previous 1, 2, 3, 4, 5 ... 50, 51, 52 Next

ELOG V3.1.6-083448f7