Back Midas Rome Roody Rootana
  Midas DAQ System, Page 139 of 157  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
    Reply  05 May 2025, Stefan Ritt, Bug Report, abort and core dump in cm_disconnect_experiment() 
I would be in favor of not curing the symptoms, but fixing the cause of the problem. I guess you put the watchdog disable into mhist, right? Usually mhist is called locally, so no mserver should be 
involved. If not, I would prefer to propagate the watchdog disable to the mserver side as well, if that's not been done already. Actually I never would disable the watchdog, but set it to a reasonable 
maximal value, like a few minutes or so. In that case, the client gets still removed if it crashes for some reason.

My five cents,
Stefan 
Entry  16 May 2025, Marius Koeppel, Bug Report, history_schema.cxx fails to build 
Hi all,

we have a CI setup which fails since 06.05.2025 to build the history_schema.cxx. There was a major change in this code in the commits fe7f6a6 and 159d8d3.

image: rootproject/root:latest

pipelines:
  default:
    - step:
        name: 'Build and test'
        runs-on:
          - self.hosted
          - linux
        script:
          - apt-get update
          - DEBIAN_FRONTEND=noninteractive apt-get -y install python3-all python3-pip python3-pytest-dependency python3-pytest
          - DEBIAN_FRONTEND=noninteractive apt-get -y install gcc g++ cmake git python3-all libssl-dev libz-dev libcurl4-gnutls-dev sqlite3 libsqlite3-dev libboost-all-dev linux-headers-generic
          - gcc -v
          - cmake --version
          - git clone https://marius_koeppel@bitbucket.org/tmidas/midas.git
          - cd midas
          - git submodule update --init --recursive
          - mkdir build
          - cd build
          - cmake ..
          - make -j4 install


Error is:

/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:5991:10: error: ‘class HsSqlSchema’ has no member named ‘table_name’; did you mean ‘fTableName’?

 5991 |       s->table_name = xtable_name;

      |          ^~~~~~~~~~

      |          fTableName

/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx: In member function ‘virtual int PgsqlHistory::read_column_names(HsSchemaVector*, const char*, const char*)’:

/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:6034:14: error: ‘class HsSqlSchema’ has no member named ‘table_name’; did you mean ‘fTableName’?

 6034 |       if (s->table_name != table_name)

      |              ^~~~~~~~~~

      |              fTableName

/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:6065:16: error: ‘struct HsSchemaEntry’ has no member named ‘fNumBytes’

 6065 |             se.fNumBytes = 0;

      |                ^~~~~~~~~

/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:6140:30: error: ‘__gnu_cxx::__alloc_traits<std::allocator<HsSchemaEntry>, HsSchemaEntry>::value_type’ {aka ‘struct HsSchemaEntry’} has no member named ‘fNumBytes’

 6140 |             s->fVariables[j].fNumBytes = tid_size;

      |                              ^~~~~~~~~

At global scope:

cc1plus: note: unrecognized command-line option ‘-Wno-vla-cxx-extension’ may have been intended to silence earlier diagnostics

make[2]: *** [CMakeFiles/objlib.dir/build.make:384: CMakeFiles/objlib.dir/src/history_schema.cxx.o] Error 1

make[2]: *** Waiting for unfinished jobs....

make[1]: *** [CMakeFiles/Makefile2:404: CMakeFiles/objlib.dir/all] Error 2

make: *** [Makefile:136: all] Error 2
    Reply  16 May 2025, Konstantin Olchanski, Bug Report, history_schema.cxx fails to build 
> we have a CI setup which fails since 06.05.2025 to build the history_schema.cxx.
> There was a major change in this code in the commits fe7f6a6 and 159d8d3.

Missing from this report is critical information: HAVE_PGSQL is set.

I will have to check why it is not set in my development account.

I will have to check why it is not set in our bitbucket build.

Thank you for reporting this problem.

K.O.
    Reply  16 May 2025, Konstantin Olchanski, Bug Report, history_schema.cxx fails to build 
> > we have a CI setup which fails since 06.05.2025 to build the history_schema.cxx.
> > There was a major change in this code in the commits fe7f6a6 and 159d8d3.
> 
> Missing from this report is critical information: HAVE_PGSQL is set.
> 
> I will have to check why it is not set in my development account.
> 

The following is needed to build MySQL and PgSQL support in MIDAS,
they were missing on my development machine. MySQL support was enabled
by accident because kde-bloat packages pull in the MySQL (not the MariaDB)
client and server. Fixed now, added to standard list of Ubuntu packages:
https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu#install_missing_packages

apt -y install mariadb-client libmariadb-dev ### mysql client for MIDAS
apt -y install postgresql-common libpq-dev ### postgresql client for MIDAS

>
> I will have to check why it is not set in our bitbucket build.
> 

Added MySQL and PgSQL to bitbucket Ubuntu-24 build (sqlite was already enabled).

>
> Thank you for reporting this problem.
> 

Fix committed. Sorry about this problem.

K.O.
Entry  25 May 2025, Pavel Murat, Bug Report, subdirectory ordering in ODB browser ? panel_map.jsonpanel_map.png
Dear MIDAS experts, 

I'm running into a minor but annoying issue with the subdirectory name ordering by the ODB browser. 
I have a straw-man hash map which includes ODB subdirectories named "000", "010", ... "300", 
and I'm yet to succeed to have them displayed in a "natural" order: the subdirectories with names 
starting from "0" always show up on the bottom of the list - see attached .png file. 

Neither interactive re-ordering nor manual ordering of the items in the input .json file helps. 

I have also attached a .json file which can be loaded with odbedit to reproduce the issue. 

Although I'm using a relatively recent - ~ 20 days old - commit, 'db1819ac', is it possible 
that this issue has already been sorted out ?

-- many thanks, regards, Pasha  
Entry  04 Jun 2025, Mark Grimes, Bug Report, Memory leak in mhttpd binary RPC code 
Hi,
During an evening of running we noticed that memory usage of mhttpd grew to close to 100Gb. We think we've traced this to the following issue when making RPC calls.

  • The brpc method allocates memory for the response at src/mjsonrpc.cxx#lines-3449.
  • It then makes the call at src/mjsonrpc.cxx#lines-3460, which may set `buf_length` to zero if the response was empty.
  • It then uses `MJsonNode::MakeArrayBuffer` to pass ownership of the memory to an `MJsonNode`, providing `buf_length` as the size.
  • When the `MJsonNode` is destructed at mjson.cxx#lines-657, it only calls `free` on the buffer if the size is greater than zero.

Hence, mhttpd will leak at least 1024 bytes for every binary RPC call that returns an empty response.
I tried to submit a pull request to fix this but I don't have permission to push to https://bitbucket.org/tmidas/mjson.git. Could somebody take a look?

Thanks,

Mark.
    Reply  04 Jun 2025, Konstantin Olchanski, Bug Report, Memory leak in mhttpd binary RPC code 
Noted. I will look at this asap. K.O.

[quote="Mark Grimes"]Hi,
During an evening of running we noticed that memory usage of mhttpd grew to 
close to 100Gb.  We think we've traced this to the following issue when making 
RPC calls.

[LIST]
[*] The brpc method allocates memory for the response at 
[URL=https://bitbucket.org/tmidas/midas/src/67db8627b9ae381e5e28800dfc4c350c5bd0
5e3f/src/mjsonrpc.cxx#lines-3449]src/mjsonrpc.cxx#lines-3449[/URL].
[*] It then makes the call at 
[URL=https://bitbucket.org/tmidas/midas/src/67db8627b9ae381e5e28800dfc4c350c5bd0
5e3f/src/mjsonrpc.cxx#lines-3460]src/mjsonrpc.cxx#lines-3460[/URL], which may 
set `buf_length` to zero if the response was empty.
[*] It then uses `MJsonNode::MakeArrayBuffer` to pass ownership of the memory to 
an `MJsonNode`, providing `buf_length` as the size.
[*] When the `MJsonNode` is destructed at 
[URL=https://bitbucket.org/tmidas/mjson/src/9d01b3f72722bbf7bcec32ae218fcc0825cc
9e7f/mjson.cxx#lines-657]mjson.cxx#lines-657[/URL], it only calls `free` on the 
buffer if the size is greater than zero.
[/LIST]

Hence, mhttpd will leak at least 1024 bytes for every binary RPC call that 
returns an empty response.
I tried to submit a pull request to fix this but I don't have permission to push 
to https://bitbucket.org/tmidas/mjson.git.  Could somebody take a look?

Thanks,

Mark.[/quote]
    Reply  07 Jun 2025, Mark Grimes, Bug Report, Memory leak in mhttpd binary RPC code msysmon-mu3ebe-20250601-042124-20250606-122124.png

Hi,

We applied an intermediate fix for this locally and it seems to have fixed our issue.  The attached plot shows the percentage memory use on our machine with 128 Gb memory, as a rough proxy for mhttpd memory use.  After applying our fix mhttpd seems to be happy using ~7% of the memory after being up for 2.5 days.

Our fix to mjson was:

diff --git a/mjson.cxx b/mjson.cxx

index 17ee268..2443510 100644

--- a/mjson.cxx

+++ b/mjson.cxx

@@ -654,8 +654,7 @@ MJsonNode::~MJsonNode() // dtor

       delete subnodes[i];

    subnodes.clear();

 

-   if (arraybuffer_size > 0) {

-      assert(arraybuffer_ptr != NULL);

+   if (arraybuffer_ptr != NULL) {

       free(arraybuffer_ptr);

       arraybuffer_size = 0;

       arraybuffer_ptr = NULL;

We also applied the following in midas for good measure, although I don't think it contributed to the leak we were seeing:

diff --git a/src/mjsonrpc.cxx b/src/mjsonrpc.cxx

index 2201d228..38f0b99b 100644

--- a/src/mjsonrpc.cxx

+++ b/src/mjsonrpc.cxx

@@ -3454,6 +3454,7 @@ static MJsonNode* brpc(const MJsonNode* params)

    status = cm_connect_client(name.c_str(), &hconn);

 

    if (status != RPC_SUCCESS) {

+      free(buf);

       return mjsonrpc_make_result("status", MJsonNode::MakeInt(status));

    }

I hope this is useful to someone.  As previously mentioned we make heavy use of binary RPC, so maybe other experiments don't run into the same problem.

Thanks,

Mark.

Entry  10 Jun 2025, Nik Berger, Bug Report, History variables with leading spaces 
By accident we had history variables with leading spaces. The history schema check then decides that this is a new variable (the leading space is not read from the history file) and starts a new file. We found this because the run start became slow due to the many, many history files created. It would be nice to just get an error if one has a malformed variable name like this.

How to reproduce: Try to put a variable with a leading space in the name into the history, repeatedly start runs.
Sugested fix: Produce an error if a history variable has a leading space.
    Reply  10 Jun 2025, Konstantin Olchanski, Bug Report, Memory leak in mhttpd binary RPC code 
I confirm that MJSON_ARRAYBUFFER does not work correctly for zero-size buffers, 
buffer is leaked in the destructor and copied as NULL in MJsonNode::Copy().

I also confirm memory leak in mjsonrpc "brpc" error path (already fixed).

Affected by the MJSON_ARRAYBUFFER memory leak are "brpc" (where user code returns 
a zero-size data buffer) and "js_read_binary_file" (if reading from an empty 
file, return of "new char[0]" is never freed).

"receive_event" and "read_history" RPCs never use zero-size buffers and are not 
affected by this bug.

mjson commit c798c1f0a835f6cea3e505a87bbb4a12b701196c
midas commit 576f2216ba2575b8857070ce7397210555f864e5
rootana commit a0d9bb4d8459f1528f0882bced9f2ab778580295

Please post bug reports a plain-text so I can quote from them.

K.O.
    Reply  15 Jun 2025, Mark Grimes, Bug Report, Memory leak in mhttpd binary RPC code 
Many thanks for the fix.  We've applied and see better memory performance.  We still have to kill and restart 
mhttpd after a few days however.  I think the official fix is missing this part:

diff --git a/src/mjsonrpc.cxx b/src/mjsonrpc.cxx
index 2201d228..38f0b99b 100644
--- a/src/mjsonrpc.cxx
+++ b/src/mjsonrpc.cxx
@@ -3454,6 +3454,7 @@ static MJsonNode* brpc(const MJsonNode* params)
    status = cm_connect_client(name.c_str(), &hconn);
 
    if (status != RPC_SUCCESS) {
+      free(buf);
       return mjsonrpc_make_result("status", MJsonNode::MakeInt(status));
    }

When the other process returns a failure the memory block is also currently leaked.  I originally stated "...although I 
don't think it contributed to the leak we were seeing" but it seems this was false.

Thanks,

Mark.


> I confirm that MJSON_ARRAYBUFFER does not work correctly for zero-size buffers, 
> buffer is leaked in the destructor and copied as NULL in MJsonNode::Copy().
> 
> I also confirm memory leak in mjsonrpc "brpc" error path (already fixed).
> 
> Affected by the MJSON_ARRAYBUFFER memory leak are "brpc" (where user code returns 
> a zero-size data buffer) and "js_read_binary_file" (if reading from an empty 
> file, return of "new char[0]" is never freed).
> 
> "receive_event" and "read_history" RPCs never use zero-size buffers and are not 
> affected by this bug.
> 
> mjson commit c798c1f0a835f6cea3e505a87bbb4a12b701196c
> midas commit 576f2216ba2575b8857070ce7397210555f864e5
> rootana commit a0d9bb4d8459f1528f0882bced9f2ab778580295
> 
> Please post bug reports a plain-text so I can quote from them.
> 
> K.O.
Entry  19 Jun 2025, Frederik Wauters, Bug Report, add history variables 
I have encounter this a few times
* Make a new history panel
* Use the web GUI to add history variables
* When I am at the "add history variables" panel, there is not scroll option. So 
depending on the size and zoom of my screen, some variables further down the list 
can not be selected

tried Chrome and Firefox
    Reply  19 Jun 2025, Stefan Ritt, Bug Report, History variables with leading spaces 
I added now code to the logger so it properly complains if there would be a leading space in a variable name.

Stefan

> By accident we had history variables with leading spaces. The history schema check then decides that this is a new variable (the leading space is not read from the history file) and starts a new file. We found this because the run start became slow due to the many, many history files created. It would be nice to just get an error if one has a malformed variable name like this.
> 
> How to reproduce: Try to put a variable with a leading space in the name into the history, repeatedly start runs.
> Sugested fix: Produce an error if a history variable has a leading space.
    Reply  23 Jun 2025, Stefan Ritt, Bug Report, Memory leak in mhttpd binary RPC code 
Since this memory leak is quite obvious, I pushed the fix to develop.

Stefan
Entry  04 Jul 2025, Mark Grimes, Bug Report, Memory leaks in mhttpd 
Something changed in our system and we started seeing memory leaks in mhttpd again.  I guess someone 
updated some front end or custom page code that interacted with mhttpd differently.
I found a few memory leaks in some (presumably) rarely seen corner cases and we now see steady 
memory usage.  The branch is fix/memory_leaks 
(https://bitbucket.org/tmidas/midas/branch/fix/memory_leaks) and I opened pull request #55 
(https://bitbucket.org/tmidas/midas/pull-requests/55).  I couldn't find a BitBucket account for you 
Konstantin to add as a reviewer, so it currently has none.

Thanks,

Mark.
    Reply  21 Jul 2025, Stefan Ritt, Bug Report, Default write cache size for new equipments breaks compatibility with older equipments 
> Perhaps have:
> 
> set_write_cache_size("SYSTEM", 0);
> set_write_cache_size("BUF1", bigsize);
> 
> with an internal std::map<std::string,size_t>; for write cache size for each named buffer

Ok, this is implemented now in mfed.cxx and called from examples/experiment/frontend.cxx

Stefan
Entry  17 Sep 2025, Mark Grimes, Bug Report, Midas no longer compiles on macOS 
Hi,
The current develop branch no longer compiles on macOS. I get lots of errors of the form

/Users/me/midas/src/history_schema.cxx:740:4: error: unknown type name 'off64_t'; did you mean 'off_t'?
  740 |    off64_t fDataOffset = 0;
      |    ^~~~~~~
      |    off_t
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.5.sd
k/usr/include/sys/_types/_off_t.h:31:33: note: 'off_t' declared here
   31 | typedef __darwin_off_t          off_t;
      |                                 ^

There are also similar errors about lseek64. This appears to have come in with commit 9a6ad2e dated 
23rd July, but I think it was merged into develop with commit 2beeca0 on 3rd of September.

Googling around it seems that off64_t is a GNU extension. I don't know of a cross platform solution but I'm 
happy to test if someone has a suggestion.

Thanks,

Mark.
    Reply  17 Sep 2025, Konstantin Olchanski, Bug Report, Midas no longer compiles on macOS 
> The current develop branch no longer compiles on macOS. I get lots of errors of the form
> /Users/me/midas/src/history_schema.cxx:740:4: error: unknown type name 'off64_t' ...

Confirmed. No idea why off64_t is missing on MacOS. I will try to fix it next week.

K.O.
Entry  06 Nov 2025, Konstantin Olchanski, Bug Report, broken scroll on midas web pages 
midas web pages that use overlays (dlgPanel, etc) are currently broken - if 
overlay does not fit in the visible window, it's bottom is truncated and control 
buttons like "create" and "cancel" are not visible, not clickable, page does not 
work.

when these pages were originally written, I am pretty sure these overlays were 
scrollable and this problem did not exist. I think that was broken recently, 
maybe withint the last year or so.

specific examples:

a) odb editor:

- open odb editor,
- click on "create odb link"
- click on "link target ...", a dialog overlay opens with a list of odb keys in 
the current directory
- select a directory with a large number of entries (i.e. "/Programs")
- alternatively, make browser window smaller
- observe the "ok" and "cancel" buttons are not visible, cannot be clicked
- definitely, there used be a scroll bar and one could scroll down to see these 
buttons.

b) history planel editor:

- open history plot,
- click on "configure this plot" icon,
- history editor opens,
- click "add active variables"
- select active event that has many variables
- observe that the list is cut off at the bottom, the very last variables are 
not visible
- alternatively, make the browser window smaller

I wrote this page and at the time this problem did not exist, there was a scroll 
bar and one could scroll up and down the list even if there were really many 
variables there.

Maybe this breakage is not from us, I see similar problems on other sites, so 
maybe browser behaviour changed recentlyshly.

I think Stefan write the dlgPanel code originally? I am not very familiar with 
it and I do not know if anybody changed it recently?

K.O.
    Reply  13 Nov 2025, Stefan Ritt, Bug Report, broken scroll on midas web pages 
I confirm the problem is there (at least under MacOSX Safari) and I will take care of it.

Stefan
ELOG V3.1.4-2e1708b5