ID |
Date |
Author |
Topic |
Subject |
3051
|
07 Jun 2025 |
Mark Grimes | Bug Report | Memory leak in mhttpd binary RPC code |
Hi,
We applied an intermediate fix for this locally and it seems to have fixed our issue. The attached plot shows the percentage memory use on our machine with 128 Gb memory, as a rough proxy for mhttpd memory use. After applying our fix mhttpd seems to be happy using ~7% of the memory after being up for 2.5 days.
Our fix to mjson was:
diff --git a/mjson.cxx b/mjson.cxx
index 17ee268..2443510 100644
--- a/mjson.cxx
+++ b/mjson.cxx
@@ -654,8 +654,7 @@ MJsonNode::~MJsonNode() // dtor
delete subnodes[i];
subnodes.clear();
- if (arraybuffer_size > 0) {
- assert(arraybuffer_ptr != NULL);
+ if (arraybuffer_ptr != NULL) {
free(arraybuffer_ptr);
arraybuffer_size = 0;
arraybuffer_ptr = NULL;
We also applied the following in midas for good measure, although I don't think it contributed to the leak we were seeing:
diff --git a/src/mjsonrpc.cxx b/src/mjsonrpc.cxx
index 2201d228..38f0b99b 100644
--- a/src/mjsonrpc.cxx
+++ b/src/mjsonrpc.cxx
@@ -3454,6 +3454,7 @@ static MJsonNode* brpc(const MJsonNode* params)
status = cm_connect_client(name.c_str(), &hconn);
if (status != RPC_SUCCESS) {
+ free(buf);
return mjsonrpc_make_result("status", MJsonNode::MakeInt(status));
}
I hope this is useful to someone. As previously mentioned we make heavy use of binary RPC, so maybe other experiments don't run into the same problem.
Thanks,
Mark. |
Attachment 1: msysmon-mu3ebe-20250601-042124-20250606-122124.png
|
|
3054
|
10 Jun 2025 |
Nik Berger | Bug Report | History variables with leading spaces |
By accident we had history variables with leading spaces. The history schema check then decides that this is a new variable (the leading space is not read from the history file) and starts a new file. We found this because the run start became slow due to the many, many history files created. It would be nice to just get an error if one has a malformed variable name like this.
How to reproduce: Try to put a variable with a leading space in the name into the history, repeatedly start runs.
Sugested fix: Produce an error if a history variable has a leading space. |
3055
|
10 Jun 2025 |
Konstantin Olchanski | Bug Report | Memory leak in mhttpd binary RPC code |
I confirm that MJSON_ARRAYBUFFER does not work correctly for zero-size buffers,
buffer is leaked in the destructor and copied as NULL in MJsonNode::Copy().
I also confirm memory leak in mjsonrpc "brpc" error path (already fixed).
Affected by the MJSON_ARRAYBUFFER memory leak are "brpc" (where user code returns
a zero-size data buffer) and "js_read_binary_file" (if reading from an empty
file, return of "new char[0]" is never freed).
"receive_event" and "read_history" RPCs never use zero-size buffers and are not
affected by this bug.
mjson commit c798c1f0a835f6cea3e505a87bbb4a12b701196c
midas commit 576f2216ba2575b8857070ce7397210555f864e5
rootana commit a0d9bb4d8459f1528f0882bced9f2ab778580295
Please post bug reports a plain-text so I can quote from them.
K.O. |
3058
|
15 Jun 2025 |
Mark Grimes | Bug Report | Memory leak in mhttpd binary RPC code |
Many thanks for the fix. We've applied and see better memory performance. We still have to kill and restart
mhttpd after a few days however. I think the official fix is missing this part:
diff --git a/src/mjsonrpc.cxx b/src/mjsonrpc.cxx
index 2201d228..38f0b99b 100644
--- a/src/mjsonrpc.cxx
+++ b/src/mjsonrpc.cxx
@@ -3454,6 +3454,7 @@ static MJsonNode* brpc(const MJsonNode* params)
status = cm_connect_client(name.c_str(), &hconn);
if (status != RPC_SUCCESS) {
+ free(buf);
return mjsonrpc_make_result("status", MJsonNode::MakeInt(status));
}
When the other process returns a failure the memory block is also currently leaked. I originally stated "...although I
don't think it contributed to the leak we were seeing" but it seems this was false.
Thanks,
Mark.
> I confirm that MJSON_ARRAYBUFFER does not work correctly for zero-size buffers,
> buffer is leaked in the destructor and copied as NULL in MJsonNode::Copy().
>
> I also confirm memory leak in mjsonrpc "brpc" error path (already fixed).
>
> Affected by the MJSON_ARRAYBUFFER memory leak are "brpc" (where user code returns
> a zero-size data buffer) and "js_read_binary_file" (if reading from an empty
> file, return of "new char[0]" is never freed).
>
> "receive_event" and "read_history" RPCs never use zero-size buffers and are not
> affected by this bug.
>
> mjson commit c798c1f0a835f6cea3e505a87bbb4a12b701196c
> midas commit 576f2216ba2575b8857070ce7397210555f864e5
> rootana commit a0d9bb4d8459f1528f0882bced9f2ab778580295
>
> Please post bug reports a plain-text so I can quote from them.
>
> K.O. |
3059
|
19 Jun 2025 |
Frederik Wauters | Bug Report | add history variables |
I have encounter this a few times
* Make a new history panel
* Use the web GUI to add history variables
* When I am at the "add history variables" panel, there is not scroll option. So
depending on the size and zoom of my screen, some variables further down the list
can not be selected
tried Chrome and Firefox |
3060
|
19 Jun 2025 |
Stefan Ritt | Bug Report | History variables with leading spaces |
I added now code to the logger so it properly complains if there would be a leading space in a variable name.
Stefan
> By accident we had history variables with leading spaces. The history schema check then decides that this is a new variable (the leading space is not read from the history file) and starts a new file. We found this because the run start became slow due to the many, many history files created. It would be nice to just get an error if one has a malformed variable name like this.
>
> How to reproduce: Try to put a variable with a leading space in the name into the history, repeatedly start runs.
> Sugested fix: Produce an error if a history variable has a leading space. |
3061
|
23 Jun 2025 |
Stefan Ritt | Bug Report | Memory leak in mhttpd binary RPC code |
Since this memory leak is quite obvious, I pushed the fix to develop.
Stefan |
3062
|
04 Jul 2025 |
Mark Grimes | Bug Report | Memory leaks in mhttpd |
Something changed in our system and we started seeing memory leaks in mhttpd again. I guess someone
updated some front end or custom page code that interacted with mhttpd differently.
I found a few memory leaks in some (presumably) rarely seen corner cases and we now see steady
memory usage. The branch is fix/memory_leaks
(https://bitbucket.org/tmidas/midas/branch/fix/memory_leaks) and I opened pull request #55
(https://bitbucket.org/tmidas/midas/pull-requests/55). I couldn't find a BitBucket account for you
Konstantin to add as a reviewer, so it currently has none.
Thanks,
Mark. |
3064
|
21 Jul 2025 |
Stefan Ritt | Bug Report | Default write cache size for new equipments breaks compatibility with older equipments |
> Perhaps have:
>
> set_write_cache_size("SYSTEM", 0);
> set_write_cache_size("BUF1", bigsize);
>
> with an internal std::map<std::string,size_t>; for write cache size for each named buffer
Ok, this is implemented now in mfed.cxx and called from examples/experiment/frontend.cxx
Stefan |
170
|
22 Oct 2004 |
Konstantin Olchanski | Bug Fix | mhttpd message colouring |
I commited a fix to mhttpd logic that decides which messages should be shown in
"red" colour- before, any message with square brackets and colons would be
highlighted in red. Now only messages matching the pattern [...:...] are
highlighted. The decision logic was moved into a function message_red(). K.O. |
174
|
09 Nov 2004 |
Pierre-Andre Amaudruz | Bug Fix | New transition scheme |
Problem:
If cm_set_transition_sequence() is used for changing the sequence number, the
command odbedit> start/stop/resume/pause -v report the propre sequence but the
action on the client side is actually not performed!
Fix:
Local transition table updated in midas.c (1.226)
Note:
The transition number under /system/clients/<pid>/transition...
is used internally. Changing it won't have any effect on the client action
if sequence number is not registered. |
200
|
25 Feb 2005 |
Konstantin Olchanski | Bug Fix | fixed: double free in FORMAT_MIDAS ybos.c causing lazylogger crashes |
We stumbled upon and fixed a "double free" bug in src/ybos.c causing crashes in
lazylogger writing .mid files in the FORMAT_MIDAS format (why does it use
ybos.c? Pierre says- for generic file i/o). Why this code had ever worked before
remains a mystery. K.O. |
211
|
05 May 2005 |
Konstantin Olchanski | Bug Fix | fix: minor bit rot in the example experiment |
I fixed some minor bit rot in the example experiment: a few minor Makefile
problems, make the analyzer use the current histogram creation macros, etc. I
also added startup and shutdown scripts. These will be documented as we work
through them with our Summer student. K.O. |
212
|
02 Aug 2005 |
Konstantin Olchanski | Bug Fix | fix odb corruption when running analzer for the first time |
I have been plagued by ODB corruption when I run the analyzer for the first time
after setting up the new experiment. Some time ago, I traced this to
mana.c::book_ttree() and now I found and fixed the bug, fix now commited to
midas cvs. In book_ttree(), db_find("/Analyzer/Bank switches") was returning an
error and setting hkey to zero. Then we called db_open_record() with hkey==0,
which cased ODB corruption later on. The normal db_validate_hkey() did not catch
this because it considers hkey==0 to be valid (when most likely it is not). K.O. |
216
|
18 Aug 2005 |
Konstantin Olchanski | Bug Fix | fix race condition between clients on run start/stop, pause/resume |
It turns out that the new priority sequencing of run state transitions had a
flaw: the frontends, the analyzer and the logger all registered at priority 500
and were invoked in essentially a random order. For example the frontend could
get a begin-run transition before the logger and so start sending data before
the logger opened the output file. Same for the analyzer and same for the end of
run. Also the sequencing for pause/resume run and begin/end run was different
when the two pairs ought to have identical sequencing.
I now commited changes to mana.c and mlogger.c changing their transition sequencing:
start and resume:
200 - logger (mlogger.c, no change)
300 - analyzer (mana.c, was 500)
500 - frontends (mfe.c, no change)
stop and pause:
500 - frontends (mfe.c, no change)
700 - analyzer (mana.c, was 500)
800 - mlogger (mlogger.c, was 500)
P.S. However, even after this change, the TRIUMF ISAC/Dragon experiment still
see an anomaly in the analyzer, where it receives data events after the
end-of-run transition.
K.O. |
219
|
01 Sep 2005 |
Stefan Ritt | Bug Fix | fix race condition between clients on run start/stop, pause/resume |
> It turns out that the new priority sequencing of run state transitions had a
> flaw: the frontends, the analyzer and the logger all registered at priority 500
> and were invoked in essentially a random order. For example the frontend could
> get a begin-run transition before the logger and so start sending data before
> the logger opened the output file. Same for the analyzer and same for the end of
> run. Also the sequencing for pause/resume run and begin/end run was different
> when the two pairs ought to have identical sequencing.
>
> I now commited changes to mana.c and mlogger.c changing their transition sequencing:
>
> start and resume:
> 200 - logger (mlogger.c, no change)
> 300 - analyzer (mana.c, was 500)
> 500 - frontends (mfe.c, no change)
>
> stop and pause:
> 500 - frontends (mfe.c, no change)
> 700 - analyzer (mana.c, was 500)
> 800 - mlogger (mlogger.c, was 500)
>
> P.S. However, even after this change, the TRIUMF ISAC/Dragon experiment still
> see an anomaly in the analyzer, where it receives data events after the
> end-of-run transition.
>
> K.O.
Thanks for fixing that bug. It happend because during the implementatoin of the priority
sequencing we have up the pre/post tansition, which took care of the proper sequence
between the logger, frontend and analyzer. The way you modified the sequence is
absolutely correct. It is important to have >10 numbers "around" the frontends (like
450...550) in case one has an experiment with >10 frontends which need to make a
transition in a certain sequence (like the DANCE experiment in Los Alamos). |
229
|
17 Oct 2005 |
Exaos Lee | Bug Fix | "make install" error under MacOS X |
Under MacOS X, "make install" will cours an error like this:
...
install: darwin/bin/dio: No such file or directory
make: *** [install] Error 71
This can be fixed as the following diff:
404,405c404,405
< $(BIN_DIR)/mcnaf: $(UTL_DIR)/mcnaf.c $(DRV_DIR)/camac/camacrpc.c
< $(CC) $(CFLAGS) $(OSFLAGS) -o $@ $(UTL_DIR)/mcnaf.c $(DRV_DIR)/camac/camacrpc.c $(LIB) $(LIBS)
---
> $(BIN_DIR)/mcnaf: $(UTL_DIR)/mcnaf.c $(DRV_DIR)/bus/camacrpc.c
> $(CC) $(CFLAGS) $(OSFLAGS) -o $@ $(UTL_DIR)/mcnaf.c $(DRV_DIR)/bus/camacrpc.c $(LIB) $(LIBS)
438c438,439
< @for i in mserver mhttpd odbedit mlogger ; \
---
>
> @for i in mserver mhttpd odbedit mlogger dio ; \
444,447d444
< chmod +s $(SYSBIN_DIR)/mhttpd
<
< ifeq ($(OSTYPE),linux)
< install -v -m 755 $(BIN_DIR)/dio $(SYSBIN_DIR)
449c446
< endif
---
> chmod +s $(SYSBIN_DIR)/mhttpd
|
235
|
23 Nov 2005 |
Stefan Ritt | Bug Fix | Endian swapping in mana.c |
It was reported that following code in mana.c :
/* swap event header if in wrong format */
if (pevent->serial_number > 0x1000000) {
WORD_SWAP(&pevent->event_id);
WORD_SWAP(&pevent->trigger_mask);
DWORD_SWAP(&pevent->serial_number);
DWORD_SWAP(&pevent->time_stamp);
DWORD_SWAP(&pevent->data_size);
}
does not work correctly for events having a true serial number above 16777216 (=0x10000000). After some considerations, I concluded that there is no good way to determine automatically the endian format of midas events, without adding another field in the header, which would break the compatibility with all recorded data up to date. I therefore changed the above code to
/* swap event header if in wrong format */
#ifdef SWAP_EVENTS
WORD_SWAP(&pevent->event_id);
WORD_SWAP(&pevent->trigger_mask);
DWORD_SWAP(&pevent->serial_number);
DWORD_SWAP(&pevent->time_stamp);
DWORD_SWAP(&pevent->data_size);
#endif
So if one wants to analyze events with the midas analyzer on a PC system for example where the events come from a VxWorks system with the opposite endian encoding, one has to set the flag -DSWAP_EVENTS when compiling the analyzer for that type of analysis. |
252
|
07 May 2006 |
Konstantin Olchanski | Bug Fix | Update & add VME drivers |
I commited fixes for a few minor compilation errors in the VME drivers
(vmicvme.c, etc)
I also added new drivers for the v513 latch and v560 scaler that I wrote for
CERN-ALPHA.
(Maybe I should mention that we also have drivers for the SIS 3820 multiscaler,
the v895 VME discriminator and a few more modules. Will commit them as they mature).
K.O. |
256
|
18 May 2006 |
Konstantin Olchanski | Bug Fix | removed a few "//" comments to fix compilation on VxWorks |
Our VxWorks C compiler (gcc-2.8-something) does not like the "//" comments. Luckily, on VxWorks, we
only compile a small subset of midas, so there is no point in banning all "//" comments. But I did have to
convert a couple of them to /* commens */ in odb.c to make it compile. Changes to odb.c commited. K.O. |