MIDAS DAQ System ELOG
3012 | 01 Apr 2025 | Pavel Murat | Bug Report | MIDAS history system not using the event timestamps?
Dear MIDAS experts, 

I confirm that when writing out history files corresponding to the slow control event data, 
MIDAS history system timestamps the data not with the event time coming from the event data, 
but with the current time determined by the program - 

https://bitbucket.org/tmidas/midas/src/293d27fad0c87c80c4ed7b94b5c40ba1e150bea4/progs/mlogger.cxx#lines-5321

where 'now' is defined as  

time_t now = time(NULL);

I'm looking for a way to timestamp the history data with the event time - that is important
for HEP applications outside the DAQ domain. MIDAS infrastructure is very well suited for such
applications, there could be a number of them, and experiments could significantly benefit from that.

So I'm wondering whether this implementation is a deliberate design choice or whether it could be changed.

The change itself and especially its validation may require a non-negligible amount of work - I'd be happy to contribute.

Any insight much appreciated. 

-- thanks, regards, Pasha
3018 | 01 Apr 2025 | Konstantin Olchanski | Bug Report | ODB corruption
We see ODB corruption crashes in the DS20k vertical slice MIDAS instance.

Crash is memset() called by db_delete_key1() called by cm_connect_experiment().

I look at the source code and I see that ODB pkey and hkey validation is absent
from most iterators and it is possible for "bad" pkey to cause corruption. Many
other places in the ODB code use db_get_pkey() and db_validate_hkey() to prevent
invalid data from causing further corruption and breakage.
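
For illustration, the kind of sanity check these iterators could apply (a sketch only,
based on the KEY fields visible in the dump below; the real helpers in odb.cxx are
db_get_pkey() and db_validate_hkey()):

/* sketch: reject an obviously corrupted pkey before dereferencing it */
static bool pkey_looks_valid(const KEY *pkey)
{
   if (pkey == NULL)
      return false;
   if (pkey->type < 1 || pkey->type >= TID_LAST)    /* bad pkey->type, as in the dump below */
      return false;
   if (pkey->num_values <= 0)
      return false;
   if (pkey->total_size < 0 || pkey->item_size < 0) /* bad pkey sizes */
      return false;
   return true;
}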

Also db_delete_key1() needs to be refactored and renamed db_delete_key_wlocked().

I will not do this immediately today, but hopefully next week or so.

Stack trace is attached, observe how free_data() was called on a completely invalid pkey,
bad pkey->type, bad pkey sizes, etc.

#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:250
250	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
(gdb) bt
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:250
#1  0x00005ad4102b4217 in memset (__len=<optimized out>, __ch=0, __dest=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59
#2  free_data (pheader=pheader@entry=0x75aaea4f4000, address=0x75aaed4cea50, size=<optimized out>, caller=caller@entry=0x5ad4102ffb6c "db_delete_key1") at /home/dsdaqdev/packages_common/midas/src/odb.cxx:513
#3  0x00005ad4102b6a5b in free_data (caller=0x5ad4102ffb6c "db_delete_key1", size=<optimized out>, address=<optimized out>, pheader=0x75aaea4f4000) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:453
#4  db_delete_key1 (hDB=1, hKey=<optimized out>, level=<optimized out>, follow_links=0) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:3789
#5  0x00005ad4102b6979 in db_delete_key1 (hDB=1, hKey=288672, level=0, follow_links=0) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:3731
#6  0x00005ad4102cc923 in db_create_record (hDB=hDB@entry=1, hKey=hKey@entry=0, orig_key_name=orig_key_name@entry=0x7ffd75987280 "/Programs/ODBEdit", init_str=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:12916
#7  0x00005ad4102cca73 in db_create_record (hDB=hDB@entry=1, hKey=hKey@entry=0, orig_key_name=orig_key_name@entry=0x7ffd75987280 "/Programs/ODBEdit", init_str=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:12942
#8  0x00005ad4102a00ba in cm_set_client_info (hDB=1, hKeyClient=0x7ffd75987420, host_name=0x5ad412262ee0 "dsdaqgw.triumf.ca", client_name=0x7ffd759874c0 "ODBEdit", hw_type=<optimized out>, password=<optimized out>, watchdog_timeout=<optimized out>) at /usr/include/c++/11/bits/basic_string.h:194
#9  0x00005ad4102a902b in cm_connect_experiment1 (host_name=<optimized out>, host_name@entry=0x7ffd759876c0 "", default_exp_name=default_exp_name@entry=0x7ffd759876a0 "vslice", client_name=client_name@entry=0x5ad4102f28fa "ODBEdit", func=func@entry=0x0, odb_size=odb_size@entry=1048576, watchdog_timeout=<optimized out>, watchdog_timeout@entry=10000) at /usr/include/c++/11/bits/basic_string.h:194
#10 0x00005ad41027e58d in main (argc=3, argv=0x7ffd759881e8) at /home/dsdaqdev/packages_common/midas/progs/odbedit.cxx:3025
(gdb) up
...
#4  db_delete_key1 (hDB=1, hKey=<optimized out>, level=<optimized out>, follow_links=0) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:3789
3789	            free_data(pheader, (char *) pheader + pkey->data, pkey->total_size, "db_delete_key1");
(gdb) p pkey
$1 = (KEY *) 0x75aaea53b400
(gdb) p *pkey
$2 = {type = 1684370529, num_values = 0, name = '\000' <repeats 16 times>, "xQ\375\002\004\000\000\000\004\000\000\000\a\000\000", data = 0, total_size = 290944, item_size = 1743544378, access_mode = 0, notify_count = 0, next_key = 15, parent_keylist = 1, last_written = 1953785965}
(gdb) 

K.O.
3020 | 01 Apr 2025 | Konstantin Olchanski | Bug Report | MIDAS history system not using the event timestamps?
> I confirm that when writing out history files corresponding to the slow control event data, 
> MIDAS history system timestamps the data not with the event time coming from the event data, 
> but with the current time determined by [mlogger].

This is correct. The timestamp in the history file is the mlogger timestamp.

In theory we could use the ODB "last_written" timestamp, but in practice,
timestamps have 1-second granularity and the difference between the two
timestamps would normally be less than 1 second (the time to react to db_watch()).

But ODB last_written is also not the data timestamp. For remotely connected clients
it includes the mserver communication delay.
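
For reference, a sketch of how one would read last_written (the ODB path here is
hypothetical and error handling is omitted):

HNDLE hDB, hKey;
KEY key;
cm_get_experiment_database(&hDB, NULL);
db_find_key(hDB, 0, "/Equipment/Env/Variables/Temperature", &hKey); /* hypothetical path */
db_get_key(hDB, hKey, &key);
time_t last_written = (time_t) key.last_written; /* 1-second granularity UNIX time */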

Only the user knows what the true data timestamp is - for some FPGA-based equipment,
I can see the data timestamp being read from an FPGA register together with the data.

But back to earth.

For making history plots, 1 second granularity with a small (a few seconds) delay should be okay,
and I think the mlogger timestamp is good enough.

For data analysis, you are reading history data from a history data file and you are
not constrained to using the MIDAS timestamp.

You can always include your "true" data timestamp as the first value in your data.

We do this in felaview for writing labview data to midas history in the ALPHA antihydrogen experiment at CERN.

This also anticipates your next request: can we have millisecond, microsecond, nanosecond history timestamps?
Since you define your "true" data timestamp, you can make it anything you want. (I use "double" time in seconds;
a 64-bit IEEE-754 "double" has enough precision for microsecond granularity. FPGA-based devices can have timestamps
with 10 ns or 8 ns granularity; in this case a uint64_t clock counter could be more appropriate.)
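
As an illustration, a minimal frontend readout sketch along these lines (the bank name
"SLOW" and the read_my_sensor() helper are hypothetical):

/* slow-control readout that stores its own "true" timestamp as the first value */
#include <sys/time.h>
#include "midas.h"

INT read_slow_event(char *pevent, INT off)
{
   double *pdata;
   bk_init32(pevent);
   bk_create(pevent, "SLOW", TID_DOUBLE, (void **)&pdata);

   /* first value: data timestamp as IEEE-754 double, microsecond granularity */
   struct timeval tv;
   gettimeofday(&tv, NULL);
   *pdata++ = tv.tv_sec + tv.tv_usec * 1e-6;

   *pdata++ = read_my_sensor();   /* hypothetical: the actual slow-control value */

   bk_close(pevent, pdata);
   return bk_size(pevent);
}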

K.O.
3022 | 02 Apr 2025 | Pavel Murat | Bug Report | MIDAS history system not using the event timestamps?
> You can always include your "true" data timestamp as the first value in your data.

Are you saying that if the first data word of a history event were a timestamp, 
the MIDAS history system, when plotting the time dependencies, would use that timestamp 
instead of the mlogger timestamp?  

If that is true, what tells MIDAS that the first data word is the timestamp?

I couldn't find a discussion of that on the page describing the history system - 

 https://daq00.triumf.ca/MidasWiki/index.php/History_System#Frontend_history_event

- perhaps I should be looking at a different page?

-- thanks again, regards, Pasha 
3023 | 02 Apr 2025 | Konstantin Olchanski | Bug Report | MIDAS history system not using the event timestamps?
> > You can always include your "true" data timestamp as the first value in your data.
> 
> Are you saying that if the first data word of a history event were a timestamp, 
> the MIDAS history system, when plotting the time dependencies, would use that timestamp 
> instead of the mlogger timestamp?
>

you are correct, midas knows nothing about what you put in the history data.

what I suggested is: if you want your true data timestamp recorded in the history,
you can put it into the history data yourself, and I suggested using the 1st value,
but you can also make it the last value or the 10th value, it is up to you.

for making history plots, the history timestamp is used, as you wrote and I confirmed,
this timestamp is generated by mlogger.

what is not clear to me is why this is a problem? do you see a big difference between the 
true data timestamp and the mlogger data timestamp? bigger than 1 second? (this would change 
the shape of "last 10 minutes" plots (600 seconds)). bigger than 1 minute? (this would change 
the shape of "last 1 hour" plots (60 minutes, 3600 seconds)).

that said, note that we currently store the timestamp as a DWORD 32-bit UNIX time value 
which will overflow in 2038 and which is quickly becoming incompatible with the ongoing 
switch to 64-bit time_t. Ubuntu-24 already builds a large number of system libraries with 64-
bit time_t, and building MIDAS with 32-bit time_t may soon become as difficult as building 
32-bit MIDAS for 32-bit i686 VME processors. we have to move with the times.

what it means is that the history system data format will have to be updated to 64-bit 
time_t and at the same time, we may try to change the timestamp from mlogger-generated to 
frontend-generated.

but it is still not clear to me how that helps you, because the frontend-generated timestamp 
is not the true data timestamp that you wanted. (and only you know what the true data 
timestamp is and where it comes from and how to tell it to MIDAS).

K.O.
3030 | 29 Apr 2025 | Pavel Murat | Bug Report | ODBXX: ODB links in the path names?
Dear MIDAS experts,

does the ODBXX interface to ODB currently support ODB links in the path names? From what I see
so far, it fails to do so, but I could be doing something wrong...

-- thanks, regards, Pasha
3033 | 30 Apr 2025 | Stefan Ritt | Bug Report | ODBXX: ODB links in the path names?
Indeed this was missing from the very beginning. I added it, please report back if it's not working.
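
A minimal check along these lines (the paths are hypothetical; "/Alias/Env" is assumed
to be an ODB link pointing at "/Equipment/Env/Settings"):

#include <iostream>
#include "odbxx.h"

midas::odb o("/Alias/Env");          // path goes through an ODB link
std::cout << o["Gain"] << std::endl; // should now resolve to /Equipment/Env/Settings/Gain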

Stefan
3036 | 05 May 2025 | Konstantin Olchanski | Bug Report | abort and core dump in cm_disconnect_experiment()
I noticed that with some programs like mhist, if they take too long, there is an abort and core dump
at the very end. This happens because they forgot to set/disable the watchdog timeout and got removed
from ODB and from the SYSMSG event buffer.

mhist is easy to fix, just add the missing call to disable the watchdog, but I also see a similar crash in the mserver which of course requires 
the watchdog.
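
For reference, the missing call (a sketch; in MIDAS, call_watchdog=FALSE with a zero
timeout disables watchdog checking for this client):

cm_connect_experiment(host_name, exp_name, "mhist", NULL);
cm_set_watchdog_params(FALSE, 0);   /* do not time this client out */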

In either case, the crash is in cm_disconnect_experiment() where we know we are shutting down and we know there is no useful information in the 
core dump.

I think I will fix it by adding a flag to bm_close_buffer() to bypass/avoid the crash from "we are already removed from this buffer".
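
A sketch of the idea (the flag name is hypothetical; the actual change may look different):

/* hypothetical: tell bm_close_buffer() that we may already have been removed from this buffer */
bm_close_buffer(buffer_handle, /* ignore_missing_client = */ true);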

Stack trace from mhist:

[mhist,ERROR] [midas.cxx:5977:bm_validate_client_index,ERROR] My client index 6 in buffer 'SYSMSG' is invalid: client name '', pid 0 should be my pid 3113263
[mhist,ERROR] [midas.cxx:5980:bm_validate_client_index,ERROR] Maybe this client was removed by a timeout. See midas.log. Cannot continue, aborting...
bm_validate_client_index: My client index 6 in buffer 'SYSMSG' is invalid: client name '', pid 0 should be my pid 3113263
bm_validate_client_index: Maybe this client was removed by a timeout. See midas.log. Cannot continue, aborting...

Program received signal SIGABRT, Aborted.
Download failed: Invalid argument.  Continuing without source file ./nptl/./nptl/pthread_kill.c.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
warning: 44	./nptl/pthread_kill.c: No such file or directory
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff71df27e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff71c28ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00005555555768b4 in bm_validate_client_index_locked (pbuf_guard=...) at /home/olchansk/git/midas/src/midas.cxx:5993
#6  0x000055555557ed7a in bm_get_my_client_locked (pbuf_guard=...) at /home/olchansk/git/midas/src/midas.cxx:6000
#7  bm_close_buffer (buffer_handle=1) at /home/olchansk/git/midas/src/midas.cxx:7162
#8  0x000055555557f101 in cm_msg_close_buffer () at /home/olchansk/git/midas/src/midas.cxx:490
#9  0x000055555558506b in cm_disconnect_experiment () at /home/olchansk/git/midas/src/midas.cxx:2904
#10 0x000055555556d2ad in main (argc=<optimized out>, argv=<optimized out>) at /home/olchansk/git/midas/progs/mhist.cxx:882
(gdb) 

Stack trace from mserver:

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=138048230684480, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007d8ddbc4e476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007d8ddbc347f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000059beb439dab0 in bm_validate_client_index_locked (pbuf_guard=...) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:5993
#6  0x000059beb43a859c in bm_get_my_client_locked (pbuf_guard=...) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:6000
#7  bm_close_buffer (buffer_handle=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7162
#8  0x000059beb43a89af in bm_close_all_buffers () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7256
#9  bm_close_all_buffers () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7243
#10 0x000059beb43afa20 in cm_disconnect_experiment () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:2905
#11 0x000059beb43afdd8 in rpc_check_channels () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:16317
#12 0x000059beb43b0cf5 in rpc_server_loop () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:15858
#13 0x000059beb4390982 in main (argc=9, argv=0x7ffc07e5bed8) at /home/dsdaqdev/packages_common/midas/progs/mserver.cxx:387

K.O.
3037 | 05 May 2025 | Stefan Ritt | Bug Report | abort and core dump in cm_disconnect_experiment()
I would be in favor of not curing the symptoms but fixing the cause of the problem. I guess you put the watchdog disable into mhist, right? Usually mhist is called locally, so no mserver should be 
involved. If not, I would prefer to propagate the watchdog disable to the mserver side as well, if that's not been done already. Actually, I would never disable the watchdog, but rather set it to a 
reasonable maximal value, like a few minutes or so. In that case, the client still gets removed if it crashes for some reason.
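
A sketch of that suggestion (the five-minute value is just an example):

cm_set_watchdog_params(TRUE, 5 * 60 * 1000);   /* watchdog timeout in milliseconds */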

My five cents,
Stefan 
3039 | 16 May 2025 | Marius Koeppel | Bug Report | history_schema.cxx fails to build
Hi all,

we have a CI setup which has failed to build history_schema.cxx since 06.05.2025. There was a major change in this code in the commits fe7f6a6 and 159d8d3.

image: rootproject/root:latest

pipelines:
  default:
    - step:
        name: 'Build and test'
        runs-on:
          - self.hosted
          - linux
        script:
          - apt-get update
          - DEBIAN_FRONTEND=noninteractive apt-get -y install python3-all python3-pip python3-pytest-dependency python3-pytest
          - DEBIAN_FRONTEND=noninteractive apt-get -y install gcc g++ cmake git python3-all libssl-dev libz-dev libcurl4-gnutls-dev sqlite3 libsqlite3-dev libboost-all-dev linux-headers-generic
          - gcc -v
          - cmake --version
          - git clone https://marius_koeppel@bitbucket.org/tmidas/midas.git
          - cd midas
          - git submodule update --init --recursive
          - mkdir build
          - cd build
          - cmake ..
          - make -j4 install


Error is:

/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:5991:10: error: ‘class HsSqlSchema’ has no member named ‘table_name’; did you mean ‘fTableName’?
 5991 |       s->table_name = xtable_name;
      |          ^~~~~~~~~~
      |          fTableName
/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx: In member function ‘virtual int PgsqlHistory::read_column_names(HsSchemaVector*, const char*, const char*)’:
/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:6034:14: error: ‘class HsSqlSchema’ has no member named ‘table_name’; did you mean ‘fTableName’?
 6034 |       if (s->table_name != table_name)
      |              ^~~~~~~~~~
      |              fTableName
/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:6065:16: error: ‘struct HsSchemaEntry’ has no member named ‘fNumBytes’
 6065 |             se.fNumBytes = 0;
      |                ^~~~~~~~~
/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:6140:30: error: ‘__gnu_cxx::__alloc_traits<std::allocator<HsSchemaEntry>, HsSchemaEntry>::value_type’ {aka ‘struct HsSchemaEntry’} has no member named ‘fNumBytes’
 6140 |             s->fVariables[j].fNumBytes = tid_size;
      |                              ^~~~~~~~~
At global scope:
cc1plus: note: unrecognized command-line option ‘-Wno-vla-cxx-extension’ may have been intended to silence earlier diagnostics
make[2]: *** [CMakeFiles/objlib.dir/build.make:384: CMakeFiles/objlib.dir/src/history_schema.cxx.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:404: CMakeFiles/objlib.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
3040 | 16 May 2025 | Konstantin Olchanski | Bug Report | history_schema.cxx fails to build
> we have a CI setup which fails since 06.05.2025 to build the history_schema.cxx.
> There was a major change in this code in the commits fe7f6a6 and 159d8d3.

Missing from this report is critical information: HAVE_PGSQL is set.

I will have to check why it is not set in my development account.

I will have to check why it is not set in our bitbucket build.

Thank you for reporting this problem.

K.O.
3041 | 16 May 2025 | Konstantin Olchanski | Bug Report | history_schema.cxx fails to build
> > we have a CI setup which fails since 06.05.2025 to build the history_schema.cxx.
> > There was a major change in this code in the commits fe7f6a6 and 159d8d3.
> 
> Missing from this report is critical information: HAVE_PGSQL is set.
> 
> I will have to check why it is not set in my development account.
> 

The following packages are needed to build MySQL and PgSQL support in MIDAS;
they were missing on my development machine. MySQL support was enabled
by accident because kde-bloat packages pull in the MySQL (not the MariaDB)
client and server. Fixed now; added to the standard list of Ubuntu packages:
https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu#install_missing_packages

apt -y install mariadb-client libmariadb-dev ### mysql client for MIDAS
apt -y install postgresql-common libpq-dev ### postgresql client for MIDAS

>
> I will have to check why it is not set in our bitbucket build.
> 

Added MySQL and PgSQL to bitbucket Ubuntu-24 build (sqlite was already enabled).

>
> Thank you for reporting this problem.
> 

Fix committed. Sorry about this problem.

K.O.
3044 | 25 May 2025 | Pavel Murat | Bug Report | subdirectory ordering in ODB browser?
Dear MIDAS experts, 

I'm running into a minor but annoying issue with subdirectory name ordering in the ODB browser. 
I have a straw-man hash map which includes ODB subdirectories named "000", "010", ... "300", 
and I have yet to succeed in having them displayed in a "natural" order: the subdirectories with names 
starting with "0" always show up at the bottom of the list - see attached .png file. 

Neither interactive re-ordering nor manual ordering of the items in the input .json file helps. 

I have also attached a .json file which can be loaded with odbedit to reproduce the issue. 

Although I'm using a relatively recent - about 20 days old - commit, 'db1819ac', is it possible 
that this issue has already been sorted out?

-- many thanks, regards, Pasha  
Attachment 1: panel_map.json
{
  "/MIDAS version" : "2.1",
  "/MIDAS git revision" : "Sat May 3 17:22:36 2025 +0200 - midas-2025-01-a-192-gdb1819ac-dirty on branch HEAD",
  "/filename" : "PanelMap.json",
  "/ODB path" : "/Test/PanelMap",
  "000" : {
  },
  "020" : {
  },
  "030" : {
  },
  "040" : {
  },
  "090" : {
  },
  "100" : {
  },
  "200" : {
  }
}
Attachment 2: panel_map.png
3049 | 04 Jun 2025 | Mark Grimes | Bug Report | Memory leak in mhttpd binary RPC code
Hi,
During an evening of running we noticed that the memory usage of mhttpd grew to close to 100 GB. We think we've traced this to the following issue when making RPC calls.

  • The brpc method allocates memory for the response at src/mjsonrpc.cxx#lines-3449.
  • It then makes the call at src/mjsonrpc.cxx#lines-3460, which may set `buf_length` to zero if the response was empty.
  • It then uses `MJsonNode::MakeArrayBuffer` to pass ownership of the memory to an `MJsonNode`, providing `buf_length` as the size.
  • When the `MJsonNode` is destructed at mjson.cxx#lines-657, it only calls `free` on the buffer if the size is greater than zero.

Hence, mhttpd will leak at least 1024 bytes for every binary RPC call that returns an empty response.
I tried to submit a pull request to fix this but I don't have permission to push to https://bitbucket.org/tmidas/mjson.git. Could somebody take a look?
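
For illustration, the leak path in miniature (a sketch; the 1024-byte size is taken from
the report above, and the exact call sequence in mhttpd differs):

char *buf = (char *) malloc(1024);   /* response buffer allocated for the RPC reply */
int buf_length = 0;                  /* the remote client returned an empty response */
MJsonNode *node = MJsonNode::MakeArrayBuffer(buf, buf_length);
delete node;   /* the dtor frees the buffer only when size > 0, so buf leaks here */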

Thanks,

Mark.
3050 | 04 Jun 2025 | Konstantin Olchanski | Bug Report | Memory leak in mhttpd binary RPC code
Noted. I will look at this asap. K.O.

> Hi,
> During an evening of running we noticed that the memory usage of mhttpd grew to
> close to 100 GB. We think we've traced this to the following issue when making
> RPC calls.
>
> - The brpc method allocates memory for the response at
>   https://bitbucket.org/tmidas/midas/src/67db8627b9ae381e5e28800dfc4c350c5bd05e3f/src/mjsonrpc.cxx#lines-3449
> - It then makes the call at
>   https://bitbucket.org/tmidas/midas/src/67db8627b9ae381e5e28800dfc4c350c5bd05e3f/src/mjsonrpc.cxx#lines-3460,
>   which may set `buf_length` to zero if the response was empty.
> - It then uses `MJsonNode::MakeArrayBuffer` to pass ownership of the memory to
>   an `MJsonNode`, providing `buf_length` as the size.
> - When the `MJsonNode` is destructed at
>   https://bitbucket.org/tmidas/mjson/src/9d01b3f72722bbf7bcec32ae218fcc0825cc9e7f/mjson.cxx#lines-657,
>   it only calls `free` on the buffer if the size is greater than zero.
>
> Hence, mhttpd will leak at least 1024 bytes for every binary RPC call that
> returns an empty response.
> I tried to submit a pull request to fix this but I don't have permission to push
> to https://bitbucket.org/tmidas/mjson.git. Could somebody take a look?
>
> Thanks,
>
> Mark.
3051 | 07 Jun 2025 | Mark Grimes | Bug Report | Memory leak in mhttpd binary RPC code

Hi,

We applied an intermediate fix for this locally and it seems to have fixed our issue. The attached plot shows the percentage memory use on our machine with 128 GB of memory, as a rough proxy for mhttpd memory use. After applying our fix, mhttpd seems to be happy using ~7% of the memory after being up for 2.5 days.

Our fix to mjson was:

diff --git a/mjson.cxx b/mjson.cxx
index 17ee268..2443510 100644
--- a/mjson.cxx
+++ b/mjson.cxx
@@ -654,8 +654,7 @@ MJsonNode::~MJsonNode() // dtor
       delete subnodes[i];
    subnodes.clear();
 
-   if (arraybuffer_size > 0) {
-      assert(arraybuffer_ptr != NULL);
+   if (arraybuffer_ptr != NULL) {
       free(arraybuffer_ptr);
       arraybuffer_size = 0;
       arraybuffer_ptr = NULL;

We also applied the following in midas for good measure, although I don't think it contributed to the leak we were seeing:

diff --git a/src/mjsonrpc.cxx b/src/mjsonrpc.cxx
index 2201d228..38f0b99b 100644
--- a/src/mjsonrpc.cxx
+++ b/src/mjsonrpc.cxx
@@ -3454,6 +3454,7 @@ static MJsonNode* brpc(const MJsonNode* params)
    status = cm_connect_client(name.c_str(), &hconn);
 
    if (status != RPC_SUCCESS) {
+      free(buf);
       return mjsonrpc_make_result("status", MJsonNode::MakeInt(status));
    }

I hope this is useful to someone.  As previously mentioned we make heavy use of binary RPC, so maybe other experiments don't run into the same problem.

Thanks,

Mark.

Attachment 1: msysmon-mu3ebe-20250601-042124-20250606-122124.png
3054 | 10 Jun 2025 | Nik Berger | Bug Report | History variables with leading spaces
By accident we had history variables with leading spaces in their names. The history schema check then decides that each such variable is a new variable (the leading space is not read back from the history file) and starts a new file. We found this because run starts became slow due to the many, many history files created. It would be nice to get an error if one has a malformed variable name like this.

How to reproduce: put a variable with a leading space in its name into the history, then repeatedly start runs.
Suggested fix: produce an error if a history variable name has a leading space.
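
A sketch of such a check (the function name and message wording are hypothetical):

static bool validate_history_var_name(const std::string &name)
{
   if (!name.empty() && (name.front() == ' ' || name.back() == ' ')) {
      cm_msg(MERROR, "validate_history_var_name",
             "history variable name \"%s\" has leading or trailing spaces", name.c_str());
      return false;
   }
   return true;
}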
3055 | 10 Jun 2025 | Konstantin Olchanski | Bug Report | Memory leak in mhttpd binary RPC code
I confirm that MJSON_ARRAYBUFFER does not work correctly for zero-size buffers, 
buffer is leaked in the destructor and copied as NULL in MJsonNode::Copy().

I also confirm memory leak in mjsonrpc "brpc" error path (already fixed).

Affected by the MJSON_ARRAYBUFFER memory leak are "brpc" (where user code returns 
a zero-size data buffer) and "js_read_binary_file" (if reading from an empty 
file, return of "new char[0]" is never freed).

"receive_event" and "read_history" RPCs never use zero-size buffers and are not 
affected by this bug.

mjson commit c798c1f0a835f6cea3e505a87bbb4a12b701196c
midas commit 576f2216ba2575b8857070ce7397210555f864e5
rootana commit a0d9bb4d8459f1528f0882bced9f2ab778580295

Please post bug reports as plain text so I can quote from them.

K.O.
3058 | 15 Jun 2025 | Mark Grimes | Bug Report | Memory leak in mhttpd binary RPC code
Many thanks for the fix. We've applied it and see better memory performance. We still have to kill and restart 
mhttpd after a few days, however. I think the official fix is missing this part:

diff --git a/src/mjsonrpc.cxx b/src/mjsonrpc.cxx
index 2201d228..38f0b99b 100644
--- a/src/mjsonrpc.cxx
+++ b/src/mjsonrpc.cxx
@@ -3454,6 +3454,7 @@ static MJsonNode* brpc(const MJsonNode* params)
    status = cm_connect_client(name.c_str(), &hconn);
 
    if (status != RPC_SUCCESS) {
+      free(buf);
       return mjsonrpc_make_result("status", MJsonNode::MakeInt(status));
    }

When the other process returns a failure, the memory block is also currently leaked. I originally stated "...although I 
don't think it contributed to the leak we were seeing", but it seems this was false.

Thanks,

Mark.


> I confirm that MJSON_ARRAYBUFFER does not work correctly for zero-size buffers, 
> buffer is leaked in the destructor and copied as NULL in MJsonNode::Copy().
> 
> I also confirm memory leak in mjsonrpc "brpc" error path (already fixed).
> 
> Affected by the MJSON_ARRAYBUFFER memory leak are "brpc" (where user code returns 
> a zero-size data buffer) and "js_read_binary_file" (if reading from an empty 
> file, return of "new char[0]" is never freed).
> 
> "receive_event" and "read_history" RPCs never use zero-size buffers and are not 
> affected by this bug.
> 
> mjson commit c798c1f0a835f6cea3e505a87bbb4a12b701196c
> midas commit 576f2216ba2575b8857070ce7397210555f864e5
> rootana commit a0d9bb4d8459f1528f0882bced9f2ab778580295
> 
> Please post bug reports a plain-text so I can quote from them.
> 
> K.O.
3059 | 19 Jun 2025 | Frederik Wauters | Bug Report | add history variables
I have encountered this a few times:
* Make a new history panel
* Use the web GUI to add history variables
* When I am at the "add history variables" panel, there is no scroll option, so
depending on the size and zoom of my screen, some variables further down the list
cannot be selected

Tried Chrome and Firefox.