| ID | Date | Author | Topic | Subject |
|
3033
|
30 Apr 2025 |
Stefan Ritt | Bug Report | ODBXX : ODB links in the path names ? |
Indeed this was missing from the very beginning. I added it, please report back if it's not working.
Stefan |
|
3036
|
05 May 2025 |
Konstantin Olchanski | Bug Report | abort and core dump in cm_disconnect_experiment() |
I noticed that some programs, like mhist, abort and dump core at the very end if they take too long. This is because they forgot to
set/disable the watchdog timeout, and they got removed from odb and from the SYSMSG event buffer.
mhist is easy to fix, just add the missing call to disable the watchdog, but I also see a similar crash in the mserver which of course requires
the watchdog.
In either case, the crash is in cm_disconnect_experiment() where we know we are shutting down and we know there is no useful information in the
core dump.
I think I will fix it by adding a flag to bm_close_buffer() to bypass/avoid the crash from "we are already removed from this buffer".
Stack trace from mhist:
[mhist,ERROR] [midas.cxx:5977:bm_validate_client_index,ERROR] My client index 6 in buffer 'SYSMSG' is invalid: client name '', pid 0 should be my
pid 3113263
[mhist,ERROR] [midas.cxx:5980:bm_validate_client_index,ERROR] Maybe this client was removed by a timeout. See midas.log. Cannot continue,
aborting...
bm_validate_client_index: My client index 6 in buffer 'SYSMSG' is invalid: client name '', pid 0 should be my pid 3113263
bm_validate_client_index: Maybe this client was removed by a timeout. See midas.log. Cannot continue, aborting...
Program received signal SIGABRT, Aborted.
Download failed: Invalid argument. Continuing without source file ./nptl/./nptl/pthread_kill.c.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
warning: 44 ./nptl/pthread_kill.c: No such file or directory
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007ffff71df27e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007ffff71c28ff in __GI_abort () at ./stdlib/abort.c:79
#5 0x00005555555768b4 in bm_validate_client_index_locked (pbuf_guard=...) at /home/olchansk/git/midas/src/midas.cxx:5993
#6 0x000055555557ed7a in bm_get_my_client_locked (pbuf_guard=...) at /home/olchansk/git/midas/src/midas.cxx:6000
#7 bm_close_buffer (buffer_handle=1) at /home/olchansk/git/midas/src/midas.cxx:7162
#8 0x000055555557f101 in cm_msg_close_buffer () at /home/olchansk/git/midas/src/midas.cxx:490
#9 0x000055555558506b in cm_disconnect_experiment () at /home/olchansk/git/midas/src/midas.cxx:2904
#10 0x000055555556d2ad in main (argc=<optimized out>, argv=<optimized out>) at /home/olchansk/git/midas/progs/mhist.cxx:882
(gdb)
Stack trace from mserver:
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=138048230684480, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007d8ddbc4e476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007d8ddbc347f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x000059beb439dab0 in bm_validate_client_index_locked (pbuf_guard=...) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:5993
#6 0x000059beb43a859c in bm_get_my_client_locked (pbuf_guard=...) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:6000
#7 bm_close_buffer (buffer_handle=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7162
#8 0x000059beb43a89af in bm_close_all_buffers () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7256
#9 bm_close_all_buffers () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7243
#10 0x000059beb43afa20 in cm_disconnect_experiment () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:2905
#11 0x000059beb43afdd8 in rpc_check_channels () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:16317
#12 0x000059beb43b0cf5 in rpc_server_loop () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:15858
#13 0x000059beb4390982 in main (argc=9, argv=0x7ffc07e5bed8) at /home/dsdaqdev/packages_common/midas/progs/mserver.cxx:387
K.O. |
|
3037
|
05 May 2025 |
Stefan Ritt | Bug Report | abort and core dump in cm_disconnect_experiment() |
I would be in favor of not curing the symptoms, but fixing the cause of the problem. I guess you put the watchdog disable into mhist, right? Usually mhist is called locally, so no mserver should be
involved. If not, I would prefer to propagate the watchdog disable to the mserver side as well, if that's not been done already. Actually I never would disable the watchdog, but set it to a reasonable
maximal value, like a few minutes or so. In that case, the client gets still removed if it crashes for some reason.
My five cents,
Stefan |
|
3039
|
16 May 2025 |
Marius Koeppel | Bug Report | history_schema.cxx fails to build |
Hi all,
we have a CI setup which fails since 06.05.2025 to build the history_schema.cxx. There was a major change in this code in the commits fe7f6a6 and 159d8d3.
image: rootproject/root:latest
pipelines:
default:
- step:
name: 'Build and test'
runs-on:
- self.hosted
- linux
script:
- apt-get update
- DEBIAN_FRONTEND=noninteractive apt-get -y install python3-all python3-pip python3-pytest-dependency python3-pytest
- DEBIAN_FRONTEND=noninteractive apt-get -y install gcc g++ cmake git python3-all libssl-dev libz-dev libcurl4-gnutls-dev sqlite3 libsqlite3-dev libboost-all-dev linux-headers-generic
- gcc -v
- cmake --version
- git clone https://marius_koeppel@bitbucket.org/tmidas/midas.git
- cd midas
- git submodule update --init --recursive
- mkdir build
- cd build
- cmake ..
- make -j4 install
Error is:
/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:5991:10: error: ‘class HsSqlSchema’ has no member named ‘table_name’; did you mean ‘fTableName’?
5991 | s->table_name = xtable_name;
| ^~~~~~~~~~
| fTableName
/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx: In member function ‘virtual int PgsqlHistory::read_column_names(HsSchemaVector*, const char*, const char*)’:
/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:6034:14: error: ‘class HsSqlSchema’ has no member named ‘table_name’; did you mean ‘fTableName’?
6034 | if (s->table_name != table_name)
| ^~~~~~~~~~
| fTableName
/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:6065:16: error: ‘struct HsSchemaEntry’ has no member named ‘fNumBytes’
6065 | se.fNumBytes = 0;
| ^~~~~~~~~
/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:6140:30: error: ‘__gnu_cxx::__alloc_traits<std::allocator<HsSchemaEntry>, HsSchemaEntry>::value_type’ {aka ‘struct HsSchemaEntry’} has no member named ‘fNumBytes’
6140 | s->fVariables[j].fNumBytes = tid_size;
| ^~~~~~~~~
At global scope:
cc1plus: note: unrecognized command-line option ‘-Wno-vla-cxx-extension’ may have been intended to silence earlier diagnostics
make[2]: *** [CMakeFiles/objlib.dir/build.make:384: CMakeFiles/objlib.dir/src/history_schema.cxx.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:404: CMakeFiles/objlib.dir/all] Error 2
make: *** [Makefile:136: all] Error 2 |
|
3040
|
16 May 2025 |
Konstantin Olchanski | Bug Report | history_schema.cxx fails to build |
> we have a CI setup which fails since 06.05.2025 to build the history_schema.cxx.
> There was a major change in this code in the commits fe7f6a6 and 159d8d3.
Missing from this report is critical information: HAVE_PGSQL is set.
I will have to check why it is not set in my development account.
I will have to check why it is not set in our bitbucket build.
Thank you for reporting this problem.
K.O. |
|
3041
|
16 May 2025 |
Konstantin Olchanski | Bug Report | history_schema.cxx fails to build |
> > we have a CI setup which fails since 06.05.2025 to build the history_schema.cxx.
> > There was a major change in this code in the commits fe7f6a6 and 159d8d3.
>
> Missing from this report is critical information: HAVE_PGSQL is set.
>
> I will have to check why it is not set in my development account.
>
The following is needed to build MySQL and PgSQL support in MIDAS,
they were missing on my development machine. MySQL support was enabled
by accident because kde-bloat packages pull in the MySQL (not the MariaDB)
client and server. Fixed now, added to standard list of Ubuntu packages:
https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu#install_missing_packages
apt -y install mariadb-client libmariadb-dev ### mysql client for MIDAS
apt -y install postgresql-common libpq-dev ### postgresql client for MIDAS
>
> I will have to check why it is not set in our bitbucket build.
>
Added MySQL and PgSQL to bitbucket Ubuntu-24 build (sqlite was already enabled).
>
> Thank you for reporting this problem.
>
Fix committed. Sorry about this problem.
K.O. |
|
3044
|
25 May 2025 |
Pavel Murat | Bug Report | subdirectory ordering in ODB browser ? |
Dear MIDAS experts,
I'm running into a minor but annoying issue with how the ODB browser orders subdirectory names.
I have a straw-man hash map which includes ODB subdirectories named "000", "010", ... "300",
and I have yet to get them displayed in a "natural" order: the subdirectories with names
starting with "0" always show up at the bottom of the list - see attached .png file.
Neither interactive re-ordering nor manual ordering of the items in the input .json file helps.
I have also attached a .json file which can be loaded with odbedit to reproduce the issue.
Although I'm using a relatively recent - ~ 20 days old - commit, 'db1819ac', is it possible
that this issue has already been sorted out ?
-- many thanks, regards, Pasha |
| Attachment 1: panel_map.json
|
{
"/MIDAS version" : "2.1",
"/MIDAS git revision" : "Sat May 3 17:22:36 2025 +0200 - midas-2025-01-a-192-gdb1819ac-dirty on branch HEAD",
"/filename" : "PanelMap.json",
"/ODB path" : "/Test/PanelMap",
"000" : {
},
"020" : {
},
"030" : {
},
"040" : {
},
"090" : {
},
"100" : {
},
"200" : {
}
}
|
| Attachment 2: panel_map.png
|
|
|
3049
|
04 Jun 2025 |
Mark Grimes | Bug Report | Memory leak in mhttpd binary RPC code |
Hi,
During an evening of running we noticed that memory usage of mhttpd grew to close to 100Gb. We think we've traced this to the following issue when making RPC calls.
- The brpc method allocates memory for the response at src/mjsonrpc.cxx#lines-3449.
- It then makes the call at src/mjsonrpc.cxx#lines-3460, which may set `buf_length` to zero if the response was empty.
- It then uses `MJsonNode::MakeArrayBuffer` to pass ownership of the memory to an `MJsonNode`, providing `buf_length` as the size.
- When the `MJsonNode` is destructed at mjson.cxx#lines-657, it only calls `free` on the buffer if the size is greater than zero.
Hence, mhttpd will leak at least 1024 bytes for every binary RPC call that returns an empty response.
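The pattern described in the list above can be reproduced in isolation. The following is a minimal sketch, not the mjson code: `LeakyBuffer`, `FixedBuffer`, `tracked_alloc`, `tracked_free` and `live_after_empty_response` are hypothetical names, and a simple counter stands in for a real leak checker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical bookkeeping: number of buffers allocated but not yet freed.
static int g_live_buffers = 0;

static char* tracked_alloc(std::size_t n) {
   ++g_live_buffers;
   return static_cast<char*>(std::malloc(n));
}

static void tracked_free(char* p) {
   if (p != nullptr) {
      --g_live_buffers;
      std::free(p);
   }
}

// Mimics the reported behaviour: frees only when the recorded size is non-zero.
struct LeakyBuffer {
   char* ptr;
   std::size_t size;
   LeakyBuffer(char* p, std::size_t s) : ptr(p), size(s) {}
   ~LeakyBuffer() {
      if (size > 0)
         tracked_free(ptr);
   }
};

// Frees whenever a buffer is owned, regardless of the recorded size.
struct FixedBuffer {
   char* ptr;
   std::size_t size;
   FixedBuffer(char* p, std::size_t s) : ptr(p), size(s) {}
   ~FixedBuffer() { tracked_free(ptr); }
};

// Allocate 1024 bytes, hand them over with a reported length of zero
// (an "empty response"), and return how many buffers are still live
// after the owner has been destroyed.
template <typename Owner>
int live_after_empty_response() {
   g_live_buffers = 0;
   char* buf = tracked_alloc(1024);
   {
      Owner node(buf, 0); // destroyed at the end of this scope
   }
   int live = g_live_buffers;
   if (live > 0)
      tracked_free(buf); // clean up so the demo itself does not leak
   return live;
}
```

Calling `live_after_empty_response<LeakyBuffer>()` reports one buffer still live, while the `FixedBuffer` variant reports none, which is the difference between checking the size and checking the pointer.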
I tried to submit a pull request to fix this but I don't have permission to push to https://bitbucket.org/tmidas/mjson.git. Could somebody take a look?
Thanks,
Mark. |
|
3050
|
04 Jun 2025 |
Konstantin Olchanski | Bug Report | Memory leak in mhttpd binary RPC code |
Noted. I will look at this asap. K.O.
Mark Grimes wrote:
> Hi,
>
> During an evening of running we noticed that memory usage of mhttpd grew to
> close to 100Gb. We think we've traced this to the following issue when making
> RPC calls.
>
> - The brpc method allocates memory for the response at src/mjsonrpc.cxx#lines-3449
>   (https://bitbucket.org/tmidas/midas/src/67db8627b9ae381e5e28800dfc4c350c5bd05e3f/src/mjsonrpc.cxx#lines-3449).
> - It then makes the call at src/mjsonrpc.cxx#lines-3460
>   (https://bitbucket.org/tmidas/midas/src/67db8627b9ae381e5e28800dfc4c350c5bd05e3f/src/mjsonrpc.cxx#lines-3460),
>   which may set `buf_length` to zero if the response was empty.
> - It then uses `MJsonNode::MakeArrayBuffer` to pass ownership of the memory to
>   an `MJsonNode`, providing `buf_length` as the size.
> - When the `MJsonNode` is destructed at mjson.cxx#lines-657
>   (https://bitbucket.org/tmidas/mjson/src/9d01b3f72722bbf7bcec32ae218fcc0825cc9e7f/mjson.cxx#lines-657),
>   it only calls `free` on the buffer if the size is greater than zero.
>
> Hence, mhttpd will leak at least 1024 bytes for every binary RPC call that
> returns an empty response.
>
> I tried to submit a pull request to fix this but I don't have permission to push
> to https://bitbucket.org/tmidas/mjson.git. Could somebody take a look?
>
> Thanks,
>
> Mark. |
|
3051
|
07 Jun 2025 |
Mark Grimes | Bug Report | Memory leak in mhttpd binary RPC code |
Hi,
We applied an intermediate fix for this locally and it seems to have fixed our issue. The attached plot shows the percentage memory use on our machine with 128 Gb memory, as a rough proxy for mhttpd memory use. After applying our fix mhttpd seems to be happy using ~7% of the memory after being up for 2.5 days.
Our fix to mjson was:
diff --git a/mjson.cxx b/mjson.cxx
index 17ee268..2443510 100644
--- a/mjson.cxx
+++ b/mjson.cxx
@@ -654,8 +654,7 @@ MJsonNode::~MJsonNode() // dtor
delete subnodes[i];
subnodes.clear();
- if (arraybuffer_size > 0) {
- assert(arraybuffer_ptr != NULL);
+ if (arraybuffer_ptr != NULL) {
free(arraybuffer_ptr);
arraybuffer_size = 0;
arraybuffer_ptr = NULL;
We also applied the following in midas for good measure, although I don't think it contributed to the leak we were seeing:
diff --git a/src/mjsonrpc.cxx b/src/mjsonrpc.cxx
index 2201d228..38f0b99b 100644
--- a/src/mjsonrpc.cxx
+++ b/src/mjsonrpc.cxx
@@ -3454,6 +3454,7 @@ static MJsonNode* brpc(const MJsonNode* params)
status = cm_connect_client(name.c_str(), &hconn);
if (status != RPC_SUCCESS) {
+ free(buf);
return mjsonrpc_make_result("status", MJsonNode::MakeInt(status));
}
I hope this is useful to someone. As previously mentioned we make heavy use of binary RPC, so maybe other experiments don't run into the same problem.
Thanks,
Mark. |
| Attachment 1: msysmon-mu3ebe-20250601-042124-20250606-122124.png
|
|
|
3054
|
10 Jun 2025 |
Nik Berger | Bug Report | History variables with leading spaces |
By accident we had history variables with leading spaces. The history schema check then decides that this is a new variable (the leading space is not read from the history file) and starts a new file. We found this because the run start became slow due to the many, many history files created. It would be nice to just get an error if one has a malformed variable name like this.
How to reproduce: Try to put a variable with a leading space in the name into the history, repeatedly start runs.
Suggested fix: Produce an error if a history variable has a leading space. |
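A check of the kind suggested above could look roughly like the following. This is only a sketch under assumptions: `has_leading_space` and `validate_history_name` are hypothetical helpers, not functions from the MIDAS logger or history code, and only the space character is tested because that is the case reported here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if the name starts with a space character.
static bool has_leading_space(const std::string& name) {
   return !name.empty() && name.front() == ' ';
}

// Hypothetical validator: returns false and fills 'error' for a malformed
// name, returns true and clears 'error' otherwise.
static bool validate_history_name(const std::string& name, std::string& error) {
   if (has_leading_space(name)) {
      error = "history variable name \"" + name + "\" has a leading space";
      return false;
   }
   error.clear();
   return true;
}
```

Rejecting the name up front means the problem surfaces as one error message instead of as a growing number of history files.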
|
3055
|
10 Jun 2025 |
Konstantin Olchanski | Bug Report | Memory leak in mhttpd binary RPC code |
I confirm that MJSON_ARRAYBUFFER does not work correctly for zero-size buffers,
buffer is leaked in the destructor and copied as NULL in MJsonNode::Copy().
I also confirm memory leak in mjsonrpc "brpc" error path (already fixed).
Affected by the MJSON_ARRAYBUFFER memory leak are "brpc" (where user code returns
a zero-size data buffer) and "js_read_binary_file" (if reading from an empty
file, return of "new char[0]" is never freed).
"receive_event" and "read_history" RPCs never use zero-size buffers and are not
affected by this bug.
mjson commit c798c1f0a835f6cea3e505a87bbb4a12b701196c
midas commit 576f2216ba2575b8857070ce7397210555f864e5
rootana commit a0d9bb4d8459f1528f0882bced9f2ab778580295
Please post bug reports as plain text so I can quote from them.
K.O. |
|
3058
|
15 Jun 2025 |
Mark Grimes | Bug Report | Memory leak in mhttpd binary RPC code |
Many thanks for the fix. We've applied it and see better memory performance. However, we still have to kill and restart
mhttpd after a few days. I think the official fix is missing this part:
diff --git a/src/mjsonrpc.cxx b/src/mjsonrpc.cxx
index 2201d228..38f0b99b 100644
--- a/src/mjsonrpc.cxx
+++ b/src/mjsonrpc.cxx
@@ -3454,6 +3454,7 @@ static MJsonNode* brpc(const MJsonNode* params)
status = cm_connect_client(name.c_str(), &hconn);
if (status != RPC_SUCCESS) {
+ free(buf);
return mjsonrpc_make_result("status", MJsonNode::MakeInt(status));
}
When the other process returns a failure the memory block is also currently leaked. I originally stated "...although I
don't think it contributed to the leak we were seeing" but it seems this was false.
Thanks,
Mark.
> I confirm that MJSON_ARRAYBUFFER does not work correctly for zero-size buffers,
> buffer is leaked in the destructor and copied as NULL in MJsonNode::Copy().
>
> I also confirm memory leak in mjsonrpc "brpc" error path (already fixed).
>
> Affected by the MJSON_ARRAYBUFFER memory leak are "brpc" (where user code returns
> a zero-size data buffer) and "js_read_binary_file" (if reading from an empty
> file, return of "new char[0]" is never freed).
>
> "receive_event" and "read_history" RPCs never use zero-size buffers and are not
> affected by this bug.
>
> mjson commit c798c1f0a835f6cea3e505a87bbb4a12b701196c
> midas commit 576f2216ba2575b8857070ce7397210555f864e5
> rootana commit a0d9bb4d8459f1528f0882bced9f2ab778580295
>
> Please post bug reports as plain text so I can quote from them.
>
> K.O. |
|
3059
|
19 Jun 2025 |
Frederik Wauters | Bug Report | add history variables |
I have encountered this a few times:
* Make a new history panel
* Use the web GUI to add history variables
* When I am at the "add history variables" panel, there is no scroll option. So
depending on the size and zoom of my screen, some variables further down the list
cannot be selected
I tried both Chrome and Firefox. |
|
3060
|
19 Jun 2025 |
Stefan Ritt | Bug Report | History variables with leading spaces |
I have now added code to the logger so it properly complains if there is a leading space in a variable name.
Stefan
> By accident we had history variables with leading spaces. The history schema check then decides that this is a new variable (the leading space is not read from the history file) and starts a new file. We found this because the run start became slow due to the many, many history files created. It would be nice to just get an error if one has a malformed variable name like this.
>
> How to reproduce: Try to put a variable with a leading space in the name into the history, repeatedly start runs.
> Suggested fix: Produce an error if a history variable has a leading space. |
|
3061
|
23 Jun 2025 |
Stefan Ritt | Bug Report | Memory leak in mhttpd binary RPC code |
Since this memory leak is quite obvious, I pushed the fix to develop.
Stefan |
|
3062
|
04 Jul 2025 |
Mark Grimes | Bug Report | Memory leaks in mhttpd |
Something changed in our system and we started seeing memory leaks in mhttpd again. I guess someone
updated some front end or custom page code that interacted with mhttpd differently.
I found a few memory leaks in some (presumably) rarely seen corner cases and we now see steady
memory usage. The branch is fix/memory_leaks
(https://bitbucket.org/tmidas/midas/branch/fix/memory_leaks) and I opened pull request #55
(https://bitbucket.org/tmidas/midas/pull-requests/55). I couldn't find a BitBucket account for you
Konstantin to add as a reviewer, so it currently has none.
Thanks,
Mark. |
|
3064
|
21 Jul 2025 |
Stefan Ritt | Bug Report | Default write cache size for new equipments breaks compatibility with older equipments |
> Perhaps have:
>
> set_write_cache_size("SYSTEM", 0);
> set_write_cache_size("BUF1", bigsize);
>
> with an internal std::map<std::string,size_t>; for write cache size for each named buffer
Ok, this is implemented now in mfed.cxx and called from examples/experiment/frontend.cxx
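For readers who have not looked at the code, the idea quoted above (a per-buffer size kept in a `std::map`) can be sketched as follows. This is an illustration only and not the mfed.cxx implementation: the class `WriteCacheConfig`, its default-size constructor argument and the getter `get_write_cache_size` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical container for per-buffer write cache sizes.
class WriteCacheConfig {
public:
   explicit WriteCacheConfig(std::size_t default_size) : fDefault(default_size) {}

   // Record an explicit size for one named buffer; 0 disables the cache.
   void set_write_cache_size(const std::string& buffer_name, std::size_t size) {
      fSizes[buffer_name] = size;
   }

   // Return the explicit size if one was set, otherwise the default.
   std::size_t get_write_cache_size(const std::string& buffer_name) const {
      auto it = fSizes.find(buffer_name);
      return it == fSizes.end() ? fDefault : it->second;
   }

private:
   std::size_t fDefault;
   std::map<std::string, std::size_t> fSizes;
};
```

With this shape, older equipments that write to "SYSTEM" can keep a cache size of 0 while a new buffer such as "BUF1" gets a large one, and any buffer not mentioned falls back to the default.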
Stefan |
|
3073
|
17 Sep 2025 |
Mark Grimes | Bug Report | Midas no longer compiles on macOS |
Hi,
The current develop branch no longer compiles on macOS. I get lots of errors of the form
/Users/me/midas/src/history_schema.cxx:740:4: error: unknown type name 'off64_t'; did you mean 'off_t'?
740 | off64_t fDataOffset = 0;
| ^~~~~~~
| off_t
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.5.sdk/usr/include/sys/_types/_off_t.h:31:33: note: 'off_t' declared here
31 | typedef __darwin_off_t off_t;
| ^
There are also similar errors about lseek64. This appears to have come in with commit 9a6ad2e dated
23rd July, but I think it was merged into develop with commit 2beeca0 on 3rd of September.
Googling around it seems that off64_t is a GNU extension. I don't know of a cross platform solution but I'm
happy to test if someone has a suggestion.
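One commonly used portable approach, shown here only as a sketch and not necessarily what MIDAS will adopt, is to use plain `off_t` and `lseek` everywhere. On macOS `off_t` is already 64-bit, and with glibc defining `_FILE_OFFSET_BITS` as 64 before any include makes `off_t` 64-bit as well. The names `file_offset_t` and `file_size_by_seek` are hypothetical.

```cpp
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64 // glibc: request 64-bit off_t; must precede all includes
#endif

#include <cassert>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical alias used instead of the glibc-specific off64_t.
typedef off_t file_offset_t;

// Fail at compile time if the platform does not provide a 64-bit off_t.
static_assert(sizeof(file_offset_t) == 8, "off_t is expected to be 64-bit");

// Hypothetical helper: seek to the end of the file and return the offset,
// or -1 on error, using lseek instead of lseek64.
static file_offset_t file_size_by_seek(int fd) {
   return lseek(fd, 0, SEEK_END);
}
```

The `static_assert` turns a silent 32-bit offset into a build error, so a platform where the assumption does not hold is caught at compile time.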
Thanks,
Mark. |
|
3076
|
17 Sep 2025 |
Konstantin Olchanski | Bug Report | Midas no longer compiles on macOS |
> The current develop branch no longer compiles on macOS. I get lots of errors of the form
> /Users/me/midas/src/history_schema.cxx:740:4: error: unknown type name 'off64_t' ...
Confirmed. No idea why off64_t is missing on MacOS. I will try to fix it next week.
K.O. |