ID | Date | Author | Topic | Subject

3007 | 28 Mar 2025 | Konstantin Olchanski | Info | mjsroot added

I need to look at histograms inside a ROOT file, but all the old ways of doing this no longer work. (In theory I can scp the ROOT file to the computer I am sitting in front of, but this assumes I have a working ROOT there. Anyhow, it is pointless to fight this; all modern packages are written to only work on the developer's laptop.)
- "root" + "new TBrowser" starts a web server and tries to open firefox (and fails)
- "root --web=off" + "new TBrowser" over an ssh X11 tunnel no longer works, ROOT X11 graphics refresh is broken
- the macos ROOT binary kit is built without X11 support, "root --web=off" does not work at all
- the root7-recommended "rootssh" prints an error message (and fails)
What does work well is JSROOT, which we use to look at manalyzer live histograms (through the apache and mhttpd web proxies). So I wrote mjsroot.exe. It opens a ROOT file and starts JSROOT to look at it (plus a bit of dancing around to make it actually work):
mjsroot.exe -R8082 root_output_files/output00371.root
To actually see the histograms:
a) if you are sitting in front of the same computer, open http://localhost:8082
b) if you are somewhere else, start an ssh tunnel: ssh daq13 -L8082:localhost:8082, then open http://localhost:8082
c) if daq13 is running mhttpd, set up an http proxy:
set ODB /webserver/proxy/mjsroot to http://localhost:8082
open https://daq13.triumf.ca/proxy/mjsroot/
also:
set ODB /alias/mjsroot to "/proxy/mjsroot/"
reload the MIDAS status page, observe "mjsroot" listed on the left-hand side, and open it.
K.O.
3008 | 28 Mar 2025 | Konstantin Olchanski | Bug Fix | midas cmake update

MIDAS git tag midas-2025-01-a introduced an incompatible change to "include midas-targets.cmake". Instead of "midas" one now has to say "midas::midas", as updated below. K.O.
>
> #
> # CMakeLists.txt for alpha-g frontends
> #
>
> cmake_minimum_required(VERSION 3.12)
> project(agdaq_frontends)
>
> include($ENV{MIDASSYS}/lib/midas-targets.cmake)
>
> add_compile_options("-O2")
> add_compile_options("-g")
> #add_compile_options("-std=c++11")
> add_compile_options(-Wall -Wformat=2 -Wno-format-nonliteral -Wno-strict-aliasing -Wuninitialized -Wno-unused-function)
> add_compile_options("-DTMFE_REV0")
> add_compile_options("-DOS_LINUX")
>
> add_executable(feevb feevb.cxx TsSync.cxx)
> target_link_libraries(feevb midas::midas)
>
> add_executable(fectrl fectrl.cxx GrifComm.cxx EsperComm.cxx JsonTo.cxx KOtcp.cxx $ENV{MIDASSYS}/src/tmfe_rev0.cxx)
> target_link_libraries(fectrl midas::midas)
>
> #end

3009 | 28 Mar 2025 | Konstantin Olchanski | Suggestion | improved find_package behaviour for Midas

I figured out the breakage, added a git tag to identify (roughly) where the cmake-incompatible change was made, and posted a note on how to fix it. Please reimburse me for the 2 hours I had to spend on this instead of doing useful work. K.O.
3010 | 28 Mar 2025 | Konstantin Olchanski | Bug Fix | manalyzer -R8082 --jsroot

When processing MIDAS files offline, JSROOT did not work: -Rxxx worked and the http connection would open, but it would not serve any histograms. This should now be fixed.
In addition, after processing all input MIDAS files, manalyzer would normally exit and JSROOT would abruptly stop. To look at the final results one had to open the ROOT files using some other method (roody, TBrowser, mjsroot, etc.).
I have now added a command line switch "--jsroot"; if supplied, after processing all input MIDAS files, manalyzer keeps running in JSROOT server mode (same as mjsroot).
"manalyzer -R8082 --jsroot run*.mid.lz4" now does something useful: open http://localhost:8082 (or an ssh tunnel or an mhttpd proxy per my mjsroot message) and watch histograms fill in real time; after the analysis finishes, keep looking at the final results until bored, then stop manalyzer using Ctrl-C. (We should add a "Stop JSROOT" button to the JSROOT main page.)
MIDAS commit 1d0d6448c3ec4ffd225b8d2030fe13e379fcd007
K.O.

3011 | 30 Mar 2025 | Konstantin Olchanski | Bug Fix | manalyzer improvements

Updated manalyzer:
- similar to the --jsroot switch, in online mode the ROOT output file now remains open after the run is stopped. Previously, after the run was stopped, all histograms etc. would disappear from JSROOT, making it hard to look at the full collected and analyzed data.
- there was a buglet in the multithreading code: if some module cannot analyze flow events as fast as we can read data from disk, the flow event queue of the first module thread would grow and grow without bound, potentially consuming lots of RAM. This is because the queue size check for the first module thread was disabled to avoid a deadlock. I have now added the queue size check to the main event loop (both offline mode and online mode), so this problem should be fixed; see the sketch after this list.
- also adjusted the default queue size from 100 to 1000 and queue-full wait sleep time from 100 us to 10 us.
- another buglet was in the flow event processing: per the README, module EndRun() should not generate flow events (they should be generated in PreEndRun() instead). Previously this was not enforced; now there is an error message about it and the offending flow events are deleted (they were not being processed anyway).
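The back-pressure idea, as a minimal sketch (illustrative only; the function and constant names are invented, the actual manalyzer code differs):

#include <unistd.h> // usleep()

const size_t kMaxQueueSize     = 1000; // default queue size, was 100
const int    kQueueFullSleepUs = 10;   // queue-full wait, was 100 us

// before handing the next flow event to the first module thread,
// wait for its queue to drain below the limit instead of letting it grow forever
void queue_with_backpressure(TAFlowEvent* flow)
{
   while (first_module_queue_size() >= kMaxQueueSize)
      usleep(kQueueFullSleepUs);
   push_to_first_module_queue(flow);
}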
K.O.

3017 | 01 Apr 2025 | Konstantin Olchanski | Bug Fix | ODB and event buffer - release semaphore before abort() and core dump

There is a long-standing problem with ODB and event buffers. If they detect an internal data inconsistency and cannot continue running, they call abort() to dump core and stop.
The problem is that in some code paths they do this while holding the ODB or event buffer semaphore. (The Linux kernel automatically releases SYSV semaphores after the core dump is finished and the program holding them has stopped.)
If the core dump takes longer than 10 seconds (for whatever reason, but we see this often enough), all other programs that wait for ODB or event buffer access will also time out and also crash (with core dumps). The result is a core dump storm; at the end, all MIDAS programs have crashed. (Luckily recovery is easy, simply restart everything.)
Now I realize that in many situations we do not need to hold the semaphore while dumping core - the content of the ODB and event buffer shared memories is not important for debugging the crash - so it is safe to release the semaphore before calling abort().
This is now implemented for ODB and event buffers. Hopefully core dump storms
will not happen again.
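The pattern, as a minimal self-contained sketch (std::mutex standing in for the SYSV semaphore; not the actual MIDAS code):

#include <cstdlib>
#include <mutex>

std::mutex shm_lock; // stand-in for the ODB or event buffer semaphore

[[noreturn]] void fatal_inconsistency(std::unique_lock<std::mutex>& guard)
{
   // the shared memory content is not needed to debug the crash,
   // so release the lock first: other clients will not time out and crash
   guard.unlock();
   abort(); // core dump proceeds without holding the lock
}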
commit 96369c29deba1752fd3d25bed53e6594773d7e1a
release ODB semaphore before calling abort() to dump core. if core dump takes
longer than 10 sec all other midas programs will timeout and crash.
commit 2506406813f1e7581572f0d5721d3761b7c8e8dd
unlock event buffer before calling abort() in bm_validate_client_index_locked(),
refactor bm_get_my_client_locked()
K.O.

3018 | 01 Apr 2025 | Konstantin Olchanski | Bug Report | ODB corruption

We see ODB corruption crashes in the DS20k vertical slice MIDAS instance. The crash is a memset() called by db_delete_key1() called by cm_connect_experiment().
Looking at the source code, I see that ODB pkey and hkey validation is absent from most iterators, so it is possible for a "bad" pkey to cause corruption. Many other places in the ODB code use db_get_pkey() and db_validate_hkey() to prevent invalid data from causing further corruption and breakage; an illustrative sketch of that defensive pattern is below.
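An illustrative example of such a check (simplified; the real db_get_pkey()/db_validate_hkey() checks are more thorough):

// refuse to operate on an obviously corrupted key before touching pkey->data;
// the pkey in the stack trace below fails this: type = 1684370529 is garbage
bool pkey_looks_valid(const KEY* pkey)
{
   if (!pkey)
      return false;
   if (pkey->type == 0 || pkey->type >= TID_LAST)
      return false; // not a valid MIDAS TID
   if (pkey->total_size < 0 || pkey->item_size < 0)
      return false;
   return true;
}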
Also, db_delete_key1() needs to be refactored and renamed db_delete_key_wlocked(). I will not do this immediately today, but hopefully next week or so.
The stack trace is attached; observe how free_data() was called on a completely invalid pkey: bad pkey->type, bad pkey sizes, etc.
#0 __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:250
250 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
(gdb) bt
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:250
#1  0x00005ad4102b4217 in memset (__len=<optimized out>, __ch=0, __dest=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59
#2  free_data (pheader=pheader@entry=0x75aaea4f4000, address=0x75aaed4cea50, size=<optimized out>, caller=caller@entry=0x5ad4102ffb6c "db_delete_key1") at /home/dsdaqdev/packages_common/midas/src/odb.cxx:513
#3  0x00005ad4102b6a5b in free_data (caller=0x5ad4102ffb6c "db_delete_key1", size=<optimized out>, address=<optimized out>, pheader=0x75aaea4f4000) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:453
#4  db_delete_key1 (hDB=1, hKey=<optimized out>, level=<optimized out>, follow_links=0) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:3789
#5  0x00005ad4102b6979 in db_delete_key1 (hDB=1, hKey=288672, level=0, follow_links=0) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:3731
#6  0x00005ad4102cc923 in db_create_record (hDB=hDB@entry=1, hKey=hKey@entry=0, orig_key_name=orig_key_name@entry=0x7ffd75987280 "/Programs/ODBEdit", init_str=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:12916
#7  0x00005ad4102cca73 in db_create_record (hDB=hDB@entry=1, hKey=hKey@entry=0, orig_key_name=orig_key_name@entry=0x7ffd75987280 "/Programs/ODBEdit", init_str=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:12942
#8  0x00005ad4102a00ba in cm_set_client_info (hDB=1, hKeyClient=0x7ffd75987420, host_name=0x5ad412262ee0 "dsdaqgw.triumf.ca", client_name=0x7ffd759874c0 "ODBEdit", hw_type=<optimized out>, password=<optimized out>, watchdog_timeout=<optimized out>) at /usr/include/c++/11/bits/basic_string.h:194
#9  0x00005ad4102a902b in cm_connect_experiment1 (host_name=<optimized out>, host_name@entry=0x7ffd759876c0 "", default_exp_name=default_exp_name@entry=0x7ffd759876a0 "vslice", client_name=client_name@entry=0x5ad4102f28fa "ODBEdit", func=func@entry=0x0, odb_size=odb_size@entry=1048576, watchdog_timeout=<optimized out>, watchdog_timeout@entry=10000) at /usr/include/c++/11/bits/basic_string.h:194
#10 0x00005ad41027e58d in main (argc=3, argv=0x7ffd759881e8) at /home/dsdaqdev/packages_common/midas/progs/odbedit.cxx:3025
(gdb) up
...
#4  db_delete_key1 (hDB=1, hKey=<optimized out>, level=<optimized out>, follow_links=0) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:3789
3789            free_data(pheader, (char *) pheader + pkey->data, pkey->total_size, "db_delete_key1");
(gdb) p pkey
$1 = (KEY *) 0x75aaea53b400
(gdb) p *pkey
$2 = {type = 1684370529, num_values = 0, name = '\000' <repeats 16 times>, "xQ\375\002\004\000\000\000\004\000\000\000\a\000\000", data = 0, total_size = 290944, item_size = 1743544378, access_mode = 0, notify_count = 0, next_key = 15, parent_keylist = 1, last_written = 1953785965}
(gdb)
K.O.

3019 | 01 Apr 2025 | Konstantin Olchanski | Suggestion | Sequencer ODBSET feature requests

> ODBSET "/Path/value[1,3,5]"
> ODBSET "/Path/value[1-5,7-9]"
we support this array index syntax in several places, specifically in the javascript odb get and set mjsonrpc RPCs.
> SET GOODCHANNELS, "1-5,7,9"; ODBSET "/Path/value[$GOODCHANNELS]"
> SET BADCHANNELS, "6,8"; ODBSET "/Path/value[!$BADCHANNELS]"
> ODBSET "/Path/value[0-100, except $BADCHANNELS]"
this is very clever syntax, but I have not seen any programming language actually implement it (not even perl). there must be a good reason why nobody does this; probably we should not do it either.
but as Stefan said (and it is my opinion too), the route of extending the MIDAS sequencer language until it becomes a superset of python, perl, tcl, bash, javascript and algol is not a sustainable approach. I once looked at using Lua for this, but I think basing it on a full-featured programming language like python is better.
K.O.

3020 | 01 Apr 2025 | Konstantin Olchanski | Bug Report | MIDAS history system not using the event timestamps ?

> I confirm that when writing out history files corresponding to the slow control event data,
> MIDAS history system timestamps the data not with the event time coming from the event data,
> but with the current time determined by [mlogger].
This is correct. The timestamp in the history file is the mlogger timestamp.
In theory we could use the ODB "last_written" timestamp, but in practice timestamps have 1-second granularity and the difference between the two timestamps would normally be less than 1 second (the time to react to db_watch()).
But ODB last_written is also not the data timestamp: for remotely connected clients it includes the mserver communication delay.
Only the user knows what the true data timestamp is - for some FPGA-based equipment I can see the data timestamp being read from an FPGA register together with the data.
But back to earth.
For making history plots, 1-second granularity with a small (a few seconds) delay should be okay, and I think the mlogger timestamp is good enough.
For data analysis, you are reading history data from a history data file and you are not constrained to using the MIDAS timestamp. You can always include your "true" data timestamp as the first value in your data; see the sketch below. We do this in felaview for writing labview data to the midas history in the ALPHA antihydrogen experiment at CERN.
This also anticipates your next request (can we have millisecond, microsecond, nanosecond history timestamps): since you define your "true" data timestamp, you can make it anything you want. (I use "double" time in seconds; a 64-bit IEEE-754 "double" has enough precision for microsecond granularity. FPGA-based devices can have timestamps with 10 ns or 8 ns granularity, in which case a uint64_t clock counter could be more appropriate.)
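For example, a sketch (the structure and field names are illustrative):

#include <cstdint>

// slow control data with the "true" timestamp as the first value
struct SlowData {
   double t;           // true data time in seconds since the UNIX epoch;
                       // IEEE-754 double keeps microsecond precision
   double temperature; // the actual measurements follow
   double pressure;
};
// for an FPGA device with 10 ns or 8 ns ticks, "uint64_t clock_counter"
// as the first value would be the more appropriate choice.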
K.O.

3023 | 02 Apr 2025 | Konstantin Olchanski | Bug Report | MIDAS history system not using the event timestamps ?

> > You can always include your "true" data timestamp as the first value in your data.
>
> Are you saying that if the first data word of a history event were a timestamp,
> the MIDAS history system, when plotting the time dependencies, would use that timestamp
> instead of the mlogger timestamp?
>
you are correct, midas knows nothing about what you put in the history data.
what I suggested is: if you want your true data timestamp recorded in the history, you can put it into the history data yourself. I suggested using the 1st value, but you can also make it the last value or the 10th value; it is up to you.
for making history plots, the history timestamp is used; as you wrote and I confirmed, this timestamp is generated by mlogger.
what is not clear to me is why this is a problem. do you see a big difference between the true data timestamp and the mlogger timestamp? bigger than 1 second? (this would change the shape of "last 10 minutes" plots (600 seconds)). bigger than 1 minute? (this would change the shape of "last 1 hour" plots (60 minutes, 3600 seconds)).
that said, note that we currently store the timestamp as a DWORD 32-bit UNIX time value, which will overflow in 2038 and which is quickly becoming incompatible with the ongoing switch to 64-bit time_t. Ubuntu-24 already builds a large number of system libraries with 64-bit time_t, and building MIDAS with 32-bit time_t may soon become as difficult as building 32-bit MIDAS for 32-bit i686 VME processors. we have to move with the times.
what this means is that the history system data format will have to be updated to 64-bit time_t, and at the same time we may try to change the timestamp from mlogger-generated to frontend-generated.
but it is still not clear to me how that helps you, because the frontend-generated timestamp is still not the true data timestamp that you wanted. (and only you know what the true data timestamp is, where it comes from, and how to tell it to MIDAS).
K.O.

3024 | 02 Apr 2025 | Konstantin Olchanski | Suggestion | Sequencer ODBSET feature requests

> I once looked at using LUA for this
>
> > but I think basing off an full featured programming language like python
> > is better.
>
> if it came to a vote, my vote would go to Lua: it would allow to do everything needed,
> with much less external dependencies and with much less motivation to over-use the interpreter.
> The CMS experience was very teaching in this respect...
Unfortunately, I am not familiar enough with Lua to say how nice or how bad it is. And we are not sure how well it supports the single-line stepping that permits the nice graphical visualization of Stefan's sequencer.
It looks like python has single-line stepping built in as a standard feature, and python is a more popular and more versatile language, so to me python looks like a better choice compared to lua (obscure), perl ("nobody uses it anymore") or bash (ugly syntax).
K.O.

3034 | 05 May 2025 | Konstantin Olchanski | Bug Fix | Bug fix in SQL history

A bug was introduced into the SQL history in 2022 that made renaming of variable names not work. This is now fixed.
break commit:
54bbc9ed5d65d8409e8c9fe60b024e99c9f34a85
fix commit:
159d8d3912c8c92da7d6d674321c8a26b7ba68d4
P.S.
This problem was caused by an unfortunate design of the C++ class system. If I want to add more data to an existing class, I write this:

class old_class {
   int i, j, k;
};

class bigger_class: public old_class {
   int additional_variable;
};

But if I have this:

#include <cstdio>
#include <vector>

struct x { int i, j; };

class y {
public:
   std::vector<x> array_of_x;
};

and I want to add "k" to "x", C++ has no way to do this. The history code has this workaround:

class bigger_y: public y {
public:
   std::vector<int> array_of_k;
   void foo(int n);
};

void bigger_y::foo(int n) {
   printf("%d %d %d\n", array_of_x[n].i, array_of_x[n].j, array_of_k[n]);
}

The problem is that it is not obvious that "array_of_x" and "array_of_k" are connected, and they can easily get out of sync (if elements are added or removed). This is the bug that happened in the history code. I have now added assert(array_of_x.size()==array_of_k.size()) to offer at least some protection going forward.
P.P.S. As a final solution, I think I want to completely separate the file history and SQL history code; they have more things different than in common.
K.O.

3035 | 05 May 2025 | Konstantin Olchanski | Info | db_delete_key(TRUE)

I was working on an ODB corruption crash inside db_delete_key() and I noticed that I had not tested db_delete_key() with follow_links set to TRUE. Then I noticed that nobody anywhere seems to use db_delete_key() with follow_links set to TRUE. Instead of testing it, can I just remove it?
This feature has existed since day 1 (1st commit) and it does something unexpected compared to the filesystem's "/bin/rm": as best I can tell, it removes the link *and* whatever the link points to. For people familiar with "/bin/rm" this is somewhat unexpected, and by my thinking, if nobody ever added such a feature to "/bin/rm", it is probably not considered generally useful or desirable. (I would call it dangerous: it removes not 1 but 2 files, and the 2nd file could be in some other directory far away from where we are.)
By this thinking, I should remove "follow_links" (actually just make it do nothing, to reduce the disturbance to other source code). db_delete_key() should work similar to /bin/rm, aka the unlink() syscall.
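For comparison, this is what /bin/rm aka unlink() does with a symlink (a small standalone illustration):

#include <unistd.h>

int main()
{
   symlink("/tmp/target.txt", "/tmp/link.txt"); // create link -> target
   unlink("/tmp/link.txt"); // removes only the link; /tmp/target.txt is untouched
   return 0;
}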
K.O.

3036 | 05 May 2025 | Konstantin Olchanski | Bug Report | abort and core dump in cm_disconnect_experiment()

I noticed that some programs like mhist, if they take too long, abort and dump core at the very end. This is because they forgot to set/disable the watchdog timeout, and so they got removed from ODB and from the SYSMSG event buffer.
mhist is easy to fix, just add the missing call to disable the watchdog, but I also see a similar crash in the mserver, which of course requires the watchdog.
In either case, the crash is in cm_disconnect_experiment(), where we know we are shutting down and we know there is no useful information in the core dump.
I think I will fix it by adding a flag to bm_close_buffer() to bypass/avoid the crash from "we are already removed from this buffer"; see the hypothetical sketch below.
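Something along these lines (a hypothetical sketch of the proposed flag; the parameter name is invented, this is not the current MIDAS API):

// if our client slot was already removed by a watchdog timeout,
// skip the cleanup silently instead of calling abort()
INT bm_close_buffer(INT buffer_handle, bool ignore_removed_client);

// cm_disconnect_experiment() would then call:
//    bm_close_buffer(buffer_handle, /*ignore_removed_client=*/true);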
Stack trace from mhist:
[mhist,ERROR] [midas.cxx:5977:bm_validate_client_index,ERROR] My client index 6 in buffer 'SYSMSG' is invalid: client name '', pid 0 should be my pid 3113263
[mhist,ERROR] [midas.cxx:5980:bm_validate_client_index,ERROR] Maybe this client was removed by a timeout. See midas.log. Cannot continue, aborting...
bm_validate_client_index: My client index 6 in buffer 'SYSMSG' is invalid: client name '', pid 0 should be my pid 3113263
bm_validate_client_index: Maybe this client was removed by a timeout. See midas.log. Cannot continue, aborting...
Program received signal SIGABRT, Aborted.
Download failed: Invalid argument. Continuing without source file ./nptl/./nptl/pthread_kill.c.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
warning: 44 ./nptl/pthread_kill.c: No such file or directory
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007ffff71df27e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007ffff71c28ff in __GI_abort () at ./stdlib/abort.c:79
#5 0x00005555555768b4 in bm_validate_client_index_locked (pbuf_guard=...) at /home/olchansk/git/midas/src/midas.cxx:5993
#6 0x000055555557ed7a in bm_get_my_client_locked (pbuf_guard=...) at /home/olchansk/git/midas/src/midas.cxx:6000
#7 bm_close_buffer (buffer_handle=1) at /home/olchansk/git/midas/src/midas.cxx:7162
#8 0x000055555557f101 in cm_msg_close_buffer () at /home/olchansk/git/midas/src/midas.cxx:490
#9 0x000055555558506b in cm_disconnect_experiment () at /home/olchansk/git/midas/src/midas.cxx:2904
#10 0x000055555556d2ad in main (argc=<optimized out>, argv=<optimized out>) at /home/olchansk/git/midas/progs/mhist.cxx:882
(gdb)
Stack trace from mserver:
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=138048230684480, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007d8ddbc4e476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007d8ddbc347f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x000059beb439dab0 in bm_validate_client_index_locked (pbuf_guard=...) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:5993
#6 0x000059beb43a859c in bm_get_my_client_locked (pbuf_guard=...) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:6000
#7 bm_close_buffer (buffer_handle=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7162
#8 0x000059beb43a89af in bm_close_all_buffers () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7256
#9 bm_close_all_buffers () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7243
#10 0x000059beb43afa20 in cm_disconnect_experiment () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:2905
#11 0x000059beb43afdd8 in rpc_check_channels () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:16317
#12 0x000059beb43b0cf5 in rpc_server_loop () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:15858
#13 0x000059beb4390982 in main (argc=9, argv=0x7ffc07e5bed8) at /home/dsdaqdev/packages_common/midas/progs/mserver.cxx:387
K.O.

3040 | 16 May 2025 | Konstantin Olchanski | Bug Report | history_schema.cxx fails to build

> we have a CI setup which fails since 06.05.2025 to build the history_schema.cxx.
> There was a major change in this code in the commits fe7f6a6 and 159d8d3.
Missing from this report is critical information: HAVE_PGSQL is set.
I will have to check why it is not set in my development account.
I will have to check why it is not set in our bitbucket build.
Thank you for reporting this problem.
K.O.

3041 | 16 May 2025 | Konstantin Olchanski | Bug Report | history_schema.cxx fails to build

> > we have a CI setup which fails since 06.05.2025 to build the history_schema.cxx.
> > There was a major change in this code in the commits fe7f6a6 and 159d8d3.
>
> Missing from this report is critical information: HAVE_PGSQL is set.
>
> I will have to check why it is not set in my development account.
>
The following packages are needed to build MySQL and PgSQL support into MIDAS; they were missing on my development machine. MySQL support had been enabled by accident because the kde-bloat packages pull in the MySQL (not the MariaDB) client and server. Fixed now, and added to the standard list of Ubuntu packages:
https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu#install_missing_packages
apt -y install mariadb-client libmariadb-dev ### mysql client for MIDAS
apt -y install postgresql-common libpq-dev ### postgresql client for MIDAS
>
> I will have to check why it is not set in our bitbucket build.
>
Added MySQL and PgSQL to bitbucket Ubuntu-24 build (sqlite was already enabled).
>
> Thank you for reporting this problem.
>
Fix committed. Sorry about this problem.
K.O.

3050 | 04 Jun 2025 | Konstantin Olchanski | Bug Report | Memory leak in mhttpd binary RPC code

Noted. I will look at this asap. K.O.
Mark Grimes wrote:

> Hi,
> During an evening of running we noticed that memory usage of mhttpd grew to close to 100Gb. We think we've traced this to the following issue when making RPC calls.
> - The brpc method allocates memory for the response at https://bitbucket.org/tmidas/midas/src/67db8627b9ae381e5e28800dfc4c350c5bd05e3f/src/mjsonrpc.cxx#lines-3449
> - It then makes the call at https://bitbucket.org/tmidas/midas/src/67db8627b9ae381e5e28800dfc4c350c5bd05e3f/src/mjsonrpc.cxx#lines-3460, which may set `buf_length` to zero if the response was empty.
> - It then uses `MJsonNode::MakeArrayBuffer` to pass ownership of the memory to an `MJsonNode`, providing `buf_length` as the size.
> - When the `MJsonNode` is destructed at https://bitbucket.org/tmidas/mjson/src/9d01b3f72722bbf7bcec32ae218fcc0825cc9e7f/mjson.cxx#lines-657, it only calls `free` on the buffer if the size is greater than zero.
> Hence, mhttpd will leak at least 1024 bytes for every binary RPC call that returns an empty response.
> I tried to submit a pull request to fix this but I don't have permission to push to https://bitbucket.org/tmidas/mjson.git. Could somebody take a look?
> Thanks,
> Mark.
3055 | 10 Jun 2025 | Konstantin Olchanski | Bug Report | Memory leak in mhttpd binary RPC code

I confirm that MJSON_ARRAYBUFFER does not work correctly for zero-size buffers: the buffer is leaked in the destructor and copied as NULL in MJsonNode::Copy().
I also confirm a memory leak in the mjsonrpc "brpc" error path (already fixed).
Affected by the MJSON_ARRAYBUFFER memory leak are "brpc" (where user code returns a zero-size data buffer) and "js_read_binary_file" (when reading from an empty file, the return of "new char[0]" is never freed).
The "receive_event" and "read_history" RPCs never use zero-size buffers and are not affected by this bug.
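The leak pattern, reconstructed as a simplified sketch (not the exact mjson/mjsonrpc code):

char* buf = (char*)malloc(1024); // response buffer allocated up front
size_t buf_length = 0;           // user code returned an empty response
MJsonNode* node = MJsonNode::MakeArrayBuffer(buf, buf_length); // node takes ownership
delete node; // destructor only calls free(buffer) if size > 0, so buf is leaked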
mjson commit c798c1f0a835f6cea3e505a87bbb4a12b701196c
midas commit 576f2216ba2575b8857070ce7397210555f864e5
rootana commit a0d9bb4d8459f1528f0882bced9f2ab778580295
Please post bug reports as plain text so I can quote from them.
K.O.

3065 | 23 Jul 2025 | Konstantin Olchanski | Suggestion | K.O.'s guide to new C/C++ data types

Over the last 10 years, the traditional C/C++ data types have been displaced by a hodgepodge of new data types that promise portability and generate useful (and not so useful) warnings. For example:
for (int i=0; i<array_of_10_elements.size(); i++)
is now a signed/unsigned comparison warning with a promise of a crash (in theory, even if "int" is 64 bit). "int" and "long" are dead; welcome "size_t", "off64_t" & co.
What to do, what to do? This is what I figured out:
1) for data returned from hardware: use uint16_t, uint32_t, uint64_t, uint128_t (u16, u32, u64 in the Linux kernel); they have well-defined widths to match hardware (FPGA, AXI, VME, etc.) data widths.
2) for variables used with strlen(), array.size(), etc.: use size_t, a data type wide enough to store the biggest data size possible on the hardware (32-bit on 32-bit machines, 64-bit on 64-bit machines). use with printf("%zu").
3) for return values of the read() and write() syscalls: use ssize_t and observe an inconsistency: read() and write() take a size_t count (32/64 bits) but return an ssize_t (31/63 bits), and the error check code cannot be written without having to defeat the C/C++ type system (a cast to size_t):
size_t s = 100;
void* ptr = malloc(s);
ssize_t rd = read(fd, ptr, s);
if (rd < 0)               { /* syscall error */ }
else if ((size_t)rd != s) { /* short read, important for TCP sockets */ }
else                      { /* good read */ }
use ssize_t with printf("%zd")
4) file access uses off64_t with lseek64() and ftruncate64(); this is a signed type (to avoid the cast in the error handling code) with a max file size of 2^63 (at current $/GB, storage for a file of max size costs $$$$$; you cannot have enough money to afford one). use with printf("%jd", (intmax_t)v): intmax_t is by definition big enough for all off64_t values and "%jd" is the corresponding printf() format. see the sketch after item 5.
5) there is no inconsistency between 32-bit size_t and 64-bit off64_t: on 32-bit systems you can only read files in small chunks, but you can lseek64() to any place in the file.
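A minimal sketch of (4), reusing the fd from the read() example above (Linux-specific; assumes the LFS lseek64() interface is available):

off64_t size = lseek64(fd, 0, SEEK_END); // seek to the end to learn the file size
if (size < 0) { /* syscall error; no cast needed, off64_t is signed */ }
printf("file size: %jd bytes\n", (intmax_t)size);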
BTW, 64-bit time_t has arrived with Ubuntu LTS 24.04; I will write about this some other time.

3066 | 24 Jul 2025 | Konstantin Olchanski | Suggestion | K.O.'s guide to new C/C++ data types

> for (int i=0; i<array_of_10_elements.size(); i++)
becomes
for (size_t i=0; i<array.size(); i++)
but for a reverse loop, replacing "int" with "size_t" introduces a bug:
for (size_t i=array.size()-1; i>=0; i--)
explodes: the last iteration should run with i set to 0, but then i-- wraps around to a very big positive value, the loop end condition (i>=0) is still true, and the loop never ends. (why is there no GCC warning that with "size_t i", "i>=0" is always true?)
a kludge solution is:
for (size_t i=array.size()-1; ; i--) {
do_stuff(i, array[i]);
if (i==0) break;
}
if you do not need the index variable, you can use a reverse iterator (which is missing from a few
container classes).
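For example, with std::vector (no index variable, so no wraparound hazard):

for (auto it = array.rbegin(); it != array.rend(); ++it)
   do_stuff(*it); // visits array.back() first, array.front() last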
K.O. |