Midas DAQ System
Entry  24 Jul 2025, Konstantin Olchanski, Bug Fix, support for large history files 
FILE history code (mhf_*.dat files) did not support reading history files bigger than about 2GB, this is now 
fixed on branch "feature/history_off64_t" (in final testing, to be merged ASAP).

History files were never meant to get bigger than about 100 MBytes, but it turns out large files can still 
happen:

1) files are rotated only when history is closed and reopened
2) we removed history close and open on run start
3) so files are rotated only when mlogger is restarted

In the old code, large files would still happen if some equipment writes a lot of data (I have a file from 
Stefan with history record size about 64 kbytes, written at 1/second, MIDAS handles this just fine) or if 
no runs are started and stopped for a long time.

There are reasons for keeping file size smaller:

a) I would like to use mmap() to read history files, and mmap() of a 100 Gbyte file on a 64 Gbyte RAM 
machine would not work very well.
b) I would like to implement compressed history files and decompression of a 100 Gbyte file takes much 
longer than decompression of a 100 Mbyte file. it is better if data is in smaller chunks.

(it is easy to write a utility program to break-up large history files into smaller chunks).

Why use mmap()? I note that the current code does 1 read() syscall per history record (it is much better to 
read data in bigger chunks) and does multiple seek()/read() syscalls to find the right place in the history 
file (plays silly buggers with the OS read-ahead and data caching). mmap() eliminates all syscalls and has 
the potential to speed things up quite a bit.

K.O.
Entry  23 Jul 2025, Konstantin Olchanski, Suggestion, K.O.'s guide to new C/C++ data types 
Over the last 10 years, the traditional C/C++ data types have been
displaced by a hodgepodge of new data types that promise portability
and generate useful (and not so useful) warnings, for example:

for (int i=0; i<array_of_10_elements.size(); i++)

is now a warning with a promise of crash (in theory, even if "int" is 64 bit).

"int" and "long" are dead, welcome "size_t", "off64_t" & co.

What to do, what to do? This is what I figured out:

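1) for data returned from hardware: use uint16_t, uint32_t, uint64_t (u16, u32, u64 in 
the Linux kernel), plus a 128-bit type where the compiler provides one (GCC and Clang spell it 
unsigned __int128, there is normally no uint128_t in <stdint.h>). they have well defined width 
to match hardware (FPGA, AXI, VME, etc) data widths.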

2) for variables used with strlen(), array.size(), etc: use size_t, a data type wide enough to 
store the biggest data size possible on this hardware (32-bit on 32-bit machines, 64-bit on 64-bit 
machines). use with printf("%zu").

3) for return values of read() and write() syscalls: use ssize_t and observe an inconsistency, 
read() and write() syscalls take size_t (32/64 bits), return ssize_t (31/63 bits) and the error 
check code cannot be written without having to defeat the C/C++ type system (a cast to size_t):

size_t s = 100;
void* ptr = malloc(s);
ssize_t rd = read(fd, ptr, s);
if (rd < 0) { /* syscall error */ }
else if ((size_t)rd != s) { /* short read, important for TCP sockets */ }
else { /* good read */ }

use ssize_t with printf("%zd")
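
A compilable version of the check above, as a sketch: the ReadResult enum and the function name read_once() are invented for this example. It shows the one cast to size_t and the "%zd" / "%zu" formats. Note that end-of-file (read() returning 0) is reported as a short read here.

```cpp
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

enum class ReadResult { Good, Short, Error };

// Do one read() of s bytes and classify the result.
ReadResult read_once(int fd, void* ptr, size_t s)
{
   ssize_t rd = read(fd, ptr, s);
   if (rd < 0) {
      perror("read"); // syscall error, errno is set
      return ReadResult::Error;
   } else if ((size_t)rd != s) { // the cast is safe, rd is known to be >= 0
      fprintf(stderr, "short read: got %zd of %zu bytes\n", rd, s);
      return ReadResult::Short;
   }
   return ReadResult::Good;
}
```

A real reader would normally loop on ReadResult::Short until all s bytes have arrived or the peer closes the connection.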

4) file access uses off64_t with lseek64() and ftruncate64(), this is a signed type (to avoid the 
cast in the error handling code) with max file size 2^63 (at $/GB, storage for a file of max size 
costs $$$$$, you cannot have enough money to afford one). use with printf("%jd", (intmax_t)v). 
intmax_t by definition is big enough for all off64_t values, "%jd" is the corresponding printf() 
format.

5) there is no inconsistency between 32-bit size_t and 64-bit off64_t, on 32-bit systems you can 
only read files in small chunks, but you can lseek64() to any place in the file.
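
A small sketch of items 4 and 5, assuming Linux with glibc (where g++ exposes off64_t and lseek64() by default). The helper names seek_to() and format_offset() are made up for this example.

```cpp
#ifndef _LARGEFILE64_SOURCE
#define _LARGEFILE64_SOURCE 1 // asks glibc for off64_t and lseek64()
#endif
#include <stdint.h>
#include <stdio.h>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Seek to an absolute 64-bit offset; returns the new position or -1 on error.
// The return type is signed, so the error check needs no cast.
off64_t seek_to(int fd, off64_t pos)
{
   return lseek64(fd, pos, SEEK_SET);
}

// Format an off64_t as suggested above: cast to intmax_t, print with "%jd".
std::string format_offset(off64_t v)
{
   char buf[32];
   snprintf(buf, sizeof(buf), "%jd", (intmax_t)v);
   return buf;
}
```

Seeking past 2 GB on an empty file is allowed and does not allocate disk space, so this is cheap to try on a scratch file.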

BTW, 64-bit time_t has arrived with Ubuntu LTS 24.04, I will write about this some other time.
    Reply  24 Jul 2025, Konstantin Olchanski, Suggestion, K.O.'s guide to new C/C++ data types 
> for (int i=0; i<array_of_10_elements.size(); i++)

becomes

for (size_t i=0; i<array.size(); i++)

but for a reverse loop, replacing "int" with "size_t" becomes a bug:

for (size_t i=array.size()-1; i>=0; i--)

explodes, last iteration should be with i set to 0, then i--
wraps it around to a very big positive value and the loop end condition
is still true (i>=0), the loop never ends. (why is there no GCC warning
that with "size_t i", "i>=0" is always true?)

a kludge solution is:

for (size_t i=array.size()-1; ; i--) {
do_stuff(i, array[i]);
if (i==0) break;
}

if you do not need the index variable, you can use a reverse iterator (which is missing from a few 
container classes).
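
Another way to write the reverse loop, as a sketch (function names are made up): test "i > 0" before decrementing. Unlike the kludge above, this also works for an empty array, where array.size()-1 would already wrap around. As far as I know, GCC's -Wtype-limits (part of -Wextra) is meant to flag the always-true "i>=0" comparison, but check this with your compiler version.

```cpp
#include <cstddef>
#include <vector>

// Reverse loop with an unsigned index: "i-- > 0" compares first, then
// decrements, so the body sees n-1, ..., 1, 0 and the loop ends without
// relying on wrap-around.
std::vector<int> reversed_by_index(const std::vector<int>& a)
{
   std::vector<int> out;
   for (size_t i = a.size(); i-- > 0; )
      out.push_back(a[i]);
   return out;
}

// Same result with reverse iterators, for when the index is not needed.
std::vector<int> reversed_by_iterator(const std::vector<int>& a)
{
   return std::vector<int>(a.rbegin(), a.rend());
}
```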

K.O.
Entry  19 Feb 2025, Lukas Gerritzen, Bug Report, Default write cache size for new equipments breaks compatibility with older equipments 
We have a frontend for slow control with a lot of legacy code. I wanted to add a new equipment using the
mdev_mscb class. It seems like the default write cache size is 10000000 B now, which produces error
messages like this:
12:51:20.154 2025/02/19 [SC Frontend,ERROR] [mfe.cxx:620:register_equipment,ERROR] Write cache size mismatch for buffer "SYSTEM": equipment "Environment" asked for 0, while eqiupment "LED" asked for 10000000
12:51:20.154 2025/02/19 [SC Frontend,ERROR] [mfe.cxx:620:register_equipment,ERROR] Write cache size mismatch for buffer "SYSTEM": equipment "LED" asked for 10000000, while eqiupment "Xenon" asked for 0

I can manually change the write cache size in /Equipment/LED/Common/Write cache size to 0. However, if I delete the LED tree in the ODB, then I get the same problems again. It would be nice if I could either choose the size as 0 in the frontend code, or if the defaults were compatible with our legacy code.

The commit that made the write cache size configurable seems to be from 2019: https://bitbucket.org/tmidas/midas/commits/3619ecc6ba1d29d74c16aa6571e40920018184c0
    Reply  24 Feb 2025, Stefan Ritt, Bug Report, Default write cache size for new equipments breaks compatibility with older equipments 
The commit that introduced the write cache size check is https://bitbucket.org/tmidas/midas/commits/3619ecc6ba1d29d74c16aa6571e40920018184c0

Unfortunately K.O. added the write cache size to the equipment list, but there is currently no way to change this programmatically from the user frontend code. The options I see are

1) Re-arrange the equipment settings so that the write cache size comes to the end of the list which the user initializes, like
   {"Trigger",               /* equipment name */
      {1, 0,                 /* event ID, trigger mask */
         "SYSTEM",           /* event buffer */
         EQ_POLLED,          /* equipment type */
         0,                  /* event source */
         "MIDAS",            /* format */
         TRUE,               /* enabled */
         RO_RUNNING |        /* read only when running */
         RO_ODB,             /* and update ODB */
         100,                /* poll for 100ms */
         0,                  /* stop run after this event limit */
         0,                  /* number of sub events */
         0,                  /* don't log history */
         "", "", "", "", "", 0, 0},
      read_trigger_event,    /* readout routine */
      10000000,              /* write cache size */
   },

2) Add a function
fe_set_write_case(int size);
which goes through the local equipment list and sets the cache size for all equipments to be the same.

I would appreciate some guidance from K.O. who introduced that code above.

/Stefan
       Reply  20 Mar 2025, Konstantin Olchanski, Bug Report, Default write cache size for new equipments breaks compatibility with older equipments 
I think I added the cache size correctly:

  {"Trigger",               /* equipment name */
      {1, 0,                 /* event ID, trigger mask */
         "SYSTEM",           /* event buffer */
         EQ_POLLED,          /* equipment type */
         0,                  /* event source */
         "MIDAS",            /* format */
         TRUE,               /* enabled */
         RO_RUNNING |        /* read only when running */
         RO_ODB,             /* and update ODB */
         100,                /* poll for 100ms */
         0,                  /* stop run after this event limit */
         0,                  /* number of sub events */
         0,                  /* don't log history */
         "", "", "", "", "", // frontend_host, name, file_name, status, status_color
         0, // hidden
         0  // write_cache_size <<--------------------- set this to zero -----------
      },
   }

K.O.
          Reply  20 Mar 2025, Konstantin Olchanski, Bug Report, Default write cache size for new equipments breaks compatibility with older equipments 
the main purpose of the event buffer write cache is to prevent high contention for the 
event buffer shared memory semaphore in the pathological case of very high rate of very 
small events.

there is a computation for this, I have posted it here several times, please search for 
it.

in a nutshell, you want the semaphore locking rate to be around 10/sec, 100/sec 
maximum. coupled with smallest event size and maximum practical rate (1 MHz), this 
yields the cache size.

for slow control events generated at 1 Hz, the write cache is not needed, 
write_cache_size value 0 is the correct setting.

for "typical" physics events generated at 1 kHz, write cache size should be set to fit 
10 events (100 Hz semaphore locking rate) to 100 events (10 Hz semaphore locking rate).
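
One way to read the arithmetic above, as a sketch and not necessarily the exact computation referred to: the cache has to hold the events produced between two semaphore locks. The function name and the 100-byte event size in the usage note are assumptions.

```cpp
#include <cstddef>

// Bytes of write cache needed so that the buffer semaphore is taken about
// lock_rate_hz times per second: events accumulated between two locks
// (event_rate_hz / lock_rate_hz) times the event size.
size_t write_cache_bytes(double event_rate_hz, size_t event_size_bytes, double lock_rate_hz)
{
   if (lock_rate_hz <= 0)
      return 0; // no meaningful answer, treat as "no cache"
   double events_per_lock = event_rate_hz / lock_rate_hz;
   return (size_t)(events_per_lock * (double)event_size_bytes);
}
```

For example, write_cache_bytes(1e6, 100, 10) gives 10000000 bytes: 1 MHz of assumed 100-byte events at 10 locks/sec. With 1 kHz events, 100 locks/sec corresponds to 10 events per lock and 10 locks/sec to 100 events per lock, as stated above.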

unfortunately, one cannot have two cache sizes for an event buffer, so typical frontends 
that generate physics data at 1 kHz and scalers and counters at 1 Hz must have a 
non-zero write cache size (or semaphore locking rate will be too high).

the other consideration, we do not want data to sit in the cache "too long", so the 
cache is flushed every 1 second or so.

all this cache stuff could be completely removed, deleted. result would be MIDAS that 
works ok for small data sizes and rates, but completely falls down at 10 GigE speeds and 
rates.

P.S. why is high semaphore locking rate bad? it turns out that UNIX and Linux semaphores 
are not "fair", they do not give equal share to all users, and (for example) an event 
buffer writer can "capture" the semaphore so the buffer reader (mlogger) will never get 
it, a pathologic situation (to help with this, there is also a "read cache"). Read this 
discussion: https://stackoverflow.com/questions/17825508/fairness-setting-in-semaphore-class

K.O.
             Reply  20 Mar 2025, Konstantin Olchanski, Bug Report, Default write cache size for new equipments breaks compatibility with older equipments 
> the main purpose of the event buffer write cache

how to control the write cache size:

1) in a frontend, all equipments should ask for the same write cache size, both mfe.c and 
tmfe frontends will complain about mismatch

2) tmfe c++ frontend, per tmfe.md, set fEqConfWriteCacheSize in the equipment constructor, in 
EqPreInitHandler() or EqInitHandler(), or set it in ODB. default value is 10 Mbytes or value 
of MIN_WRITE_CACHE_SIZE define. periodic cache flush period is 0.5 sec in 
fFeFlushWriteCachePeriodSec.

3) mfe.cxx frontend, set it in the equipment definition (number after "hidden"), set it in 
ODB, or change equipment[i].write_cache_size. Value 0 sets the cache size to 
MIN_WRITE_CACHE_SIZE, 10 Mbytes.

4) in bm_set_cache_size(), acceptable values are 0 (disable the cache), MIN_WRITE_CACHE_SIZE 
(10 Mbytes) or anything bigger. Attempt to set the cache smaller than 10 Mbytes will set it 
to 10 Mbytes and print an error message.
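
The rule in item 4 can be sketched as a standalone function. This is not the MIDAS source: the helper name is hypothetical and the constant is assumed from the "10 Mbytes" quoted above.

```cpp
#include <cstddef>
#include <cstdio>

// Assumed value: the text above says MIN_WRITE_CACHE_SIZE is 10 Mbytes.
static const size_t kMinWriteCacheSize = 10000000;

// Hypothetical helper mirroring the rule described for bm_set_cache_size():
// 0 disables the cache, anything below the minimum is raised to the minimum
// with an error message, larger values are kept as requested.
size_t effective_write_cache_size(size_t requested)
{
   if (requested == 0)
      return 0;
   if (requested < kMinWriteCacheSize) {
      fprintf(stderr, "write cache size %zu is too small, using %zu\n",
              requested, kMinWriteCacheSize);
      return kMinWriteCacheSize;
   }
   return requested;
}
```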

All this is kind of reasonable, as only two settings of write cache size are useful: 0 to 
disable it, and 10 Mbytes to limit semaphore locking rate to reasonable value for all event 
rate and size values practical on current computers.

In mfe.cxx it looks to be impossible to set the write cache size to 0 (disable it), but 
actually all you need is call "bm_set_cache_size(equipment[0].buffer_handle, 0, 0);" in 
frontend_init() (or is it in begin_of_run()?).

K.O.
                Reply  20 Mar 2025, Konstantin Olchanski, Bug Report, Default write cache size for new equipments breaks compatibility with older equipments 
> > the main purpose of the event buffer write cache
> how to control the write cache size:

OP provided insufficient information to say what went wrong for them, but do try this:

1) in ODB, for all equipments, set write_cache_size to 0
2) in the frontend equipment table, set write_cache_size to 0

That is how it is done in the example frontend: examples/experiment/frontend.cxx

If this configuration still produces an error, we may have a bug somewhere, so please let us know how it shakes out.

K.O.
                Reply  21 Mar 2025, Stefan Ritt, Bug Report, Default write cache size for new equipments breaks compatibility with older equipments 
 > All this is kind of reasonable, as only two settings of write cache size are useful: 0 to 
> disable it, and 10 Mbytes to limit semaphore locking rate to reasonable value for all event 
> rate and size values practical on current computers.

Indeed KO is correct that only 0 and 10MB make sense, and we cannot mix it. Having the cache setting in the equipment table is 
cumbersome. If you have 10 slow control equipment (cache size zero), you need to add many zeros at the end of 10 equipment 
definitions in the frontend. 

I would rather implement a function or variable similar to fEqConfWriteCacheSize in the tmfe framework also in the mfe.cxx 
framework, then we need only to add one line like

gEqConfWriteCacheSize = 0;

in the frontend.cxx file and this will be used for all equipments of that frontend. If nobody complains, I will do that in April when I'm 
back from Japan.

Stefan
                   Reply  25 Mar 2025, Konstantin Olchanski, Bug Report, Default write cache size for new equipments breaks compatibility with older equipments 
>  > All this is kind of reasonable, as only two settings of write cache size are useful: 0 to 
> > disable it, and 10 Mbytes to limit semaphore locking rate to reasonable value for all event 
> > rate and size values practical on current computers.
> 
> Indeed KO is correct that only 0 and 10MB make sense, and we cannot mix it. Having the cache setting in the equipment table is 
> cumbersome. If you have 10 slow control equipment (cache size zero), you need to add many zeros at the end of 10 equipment 
> definitions in the frontend. 
> 
> I would rather implement a function or variable similar to fEqConfWriteCacheSize in the tmfe framework also in the mfe.cxx 
> framework, then we need only to add one line llike
> 
> gEqConfWriteCacheSize = 0;
> 
> in the frontend.cxx file and this will be used for all equipments of that frontend. If nobody complains, I will do that in April when I'm 
> back from Japan.

Cache size is per-buffer. If different equipments write into different event buffers, should be possible to set different cache sizes.

Perhaps have:

set_write_cache_size("SYSTEM", 0);
set_write_cache_size("BUF1", bigsize);

with an internal std::map<std::string,size_t>; for write cache size for each named buffer
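
A minimal sketch of what such a per-buffer table could look like. Only set_write_cache_size() and the std::map come from the proposal; the lookup helper get_write_cache_size() and the global variable name are made up for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical registry: one write cache size per named event buffer.
static std::map<std::string, size_t> g_write_cache_size;

// Remember the requested write cache size for this buffer,
// replacing any earlier request for the same buffer.
void set_write_cache_size(const std::string& buffer_name, size_t size)
{
   g_write_cache_size[buffer_name] = size;
}

// Returns the size registered for this buffer, or default_size if none was set.
size_t get_write_cache_size(const std::string& buffer_name, size_t default_size)
{
   std::map<std::string, size_t>::const_iterator it = g_write_cache_size.find(buffer_name);
   if (it == g_write_cache_size.end())
      return default_size;
   return it->second;
}
```

Keying by buffer name means two equipments writing to the same buffer cannot ask for different sizes, which is the mismatch reported at the top of this thread.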

K.O.
                      Reply  21 Jul 2025, Stefan Ritt, Bug Report, Default write cache size for new equipments breaks compatibility with older equipments 
> Perhaps have:
> 
> set_write_cache_size("SYSTEM", 0);
> set_write_cache_size("BUF1", bigsize);
> 
> with an internal std::map<std::string,size_t>; for write cache size for each named buffer

Ok, this is implemented now in mfed.cxx and called from examples/experiment/frontend.cxx

Stefan
Entry  13 Jul 2025, Zaher Salman, Info, PySequencer 
As many of you already know, Ben introduced the new PySequencer, which allows running Python scripts from MIDAS. In the last couple of months we have been working on integrating it into the MIDAS pages. We think that it is now ready for general testing.

To use the PySequencer:
1- Enable it from /Experiment/Menu
2- Refresh the pages to see a new PySequencer menu item
3- Click on it to start writing and executing your python script.

The look and feel are identical to the msequencer pages (both use the same JavaScript code).

Please report problems and bugs here.

Known issues:
The first time you start the PySequencer program it may fail. To fix this copy:
$MIDASSYS/python/examples/pysequencer_script_basic.py
to
online/userfiles/sequencer/
and set /PySequencer/State/Filename to pysequencer_script_basic.py
Entry  04 Jul 2025, Mark Grimes, Bug Report, Memory leaks in mhttpd 
Something changed in our system and we started seeing memory leaks in mhttpd again.  I guess someone 
updated some front end or custom page code that interacted with mhttpd differently.
I found a few memory leaks in some (presumably) rarely seen corner cases and we now see steady 
memory usage.  The branch is fix/memory_leaks 
(https://bitbucket.org/tmidas/midas/branch/fix/memory_leaks) and I opened pull request #55 
(https://bitbucket.org/tmidas/midas/pull-requests/55).  I couldn't find a BitBucket account for you 
Konstantin to add as a reviewer, so it currently has none.

Thanks,

Mark.
Entry  04 Jun 2025, Mark Grimes, Bug Report, Memory leak in mhttpd binary RPC code 
Hi,
During an evening of running we noticed that memory usage of mhttpd grew to close to 100 GB. We think we've traced this to the following issue when making RPC calls.

  • The brpc method allocates memory for the response at src/mjsonrpc.cxx#lines-3449.
  • It then makes the call at src/mjsonrpc.cxx#lines-3460, which may set `buf_length` to zero if the response was empty.
  • It then uses `MJsonNode::MakeArrayBuffer` to pass ownership of the memory to an `MJsonNode`, providing `buf_length` as the size.
  • When the `MJsonNode` is destructed at mjson.cxx#lines-657, it only calls `free` on the buffer if the size is greater than zero.

Hence, mhttpd will leak at least 1024 bytes for every binary RPC call that returns an empty response.
I tried to submit a pull request to fix this but I don't have permission to push to https://bitbucket.org/tmidas/mjson.git. Could somebody take a look?
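
The ownership problem described above can be reproduced in a few lines. This is a simplified stand-in, not the real MJsonNode: BufferNode, the free_if_nonnull switch and the live-buffer counter exist only for this sketch. The destructor either uses the size check (size > 0) or checks the pointer itself.

```cpp
#include <cstddef>
#include <cstdlib>

static int g_live_buffers = 0; // buffers allocated but not yet freed

// Simplified stand-in for a node that takes ownership of a malloc()ed buffer.
struct BufferNode {
   char* ptr;
   size_t size;
   bool free_if_nonnull; // false: free only if size > 0, true: free if ptr != NULL

   BufferNode(char* p, size_t s, bool fixed) : ptr(p), size(s), free_if_nonnull(fixed) {}
   BufferNode(const BufferNode&) = delete;
   BufferNode& operator=(const BufferNode&) = delete;

   ~BufferNode()
   {
      bool owns = free_if_nonnull ? (ptr != NULL) : (size > 0);
      if (owns) {
         free(ptr);
         g_live_buffers--;
      }
   }
};

// Allocate a 1024-byte response buffer, report an empty response (size 0),
// hand it to a node and destroy the node. Returns the number of leaked buffers.
int leaked_after_empty_response(bool fixed)
{
   g_live_buffers = 0;
   char* buf = (char*)malloc(1024);
   if (buf == NULL)
      return -1;
   g_live_buffers++;
   {
      BufferNode node(buf, 0, fixed); // the RPC call set the length to zero
   } // node destroyed here
   int leaked = g_live_buffers;
   if (leaked > 0)
      free(buf); // clean up so the sketch itself does not leak
   return leaked;
}
```

leaked_after_empty_response(false) reports one leaked buffer, leaked_after_empty_response(true) reports none.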

Thanks,

Mark.
    Reply  04 Jun 2025, Konstantin Olchanski, Bug Report, Memory leak in mhttpd binary RPC code 
Noted. I will look at this asap. K.O.

Mark Grimes wrote:
> Hi,
> During an evening of running we noticed that memory usage of mhttpd grew to 
> close to 100 GB.  We think we've traced this to the following issue when making 
> RPC calls.
> 
>   • The brpc method allocates memory for the response at src/mjsonrpc.cxx#lines-3449
>     (https://bitbucket.org/tmidas/midas/src/67db8627b9ae381e5e28800dfc4c350c5bd05e3f/src/mjsonrpc.cxx#lines-3449).
>   • It then makes the call at src/mjsonrpc.cxx#lines-3460
>     (https://bitbucket.org/tmidas/midas/src/67db8627b9ae381e5e28800dfc4c350c5bd05e3f/src/mjsonrpc.cxx#lines-3460),
>     which may set `buf_length` to zero if the response was empty.
>   • It then uses `MJsonNode::MakeArrayBuffer` to pass ownership of the memory to 
>     an `MJsonNode`, providing `buf_length` as the size.
>   • When the `MJsonNode` is destructed at mjson.cxx#lines-657
>     (https://bitbucket.org/tmidas/mjson/src/9d01b3f72722bbf7bcec32ae218fcc0825cc9e7f/mjson.cxx#lines-657),
>     it only calls `free` on the buffer if the size is greater than zero.
> 
> Hence, mhttpd will leak at least 1024 bytes for every binary RPC call that 
> returns an empty response.
> I tried to submit a pull request to fix this but I don't have permission to push 
> to https://bitbucket.org/tmidas/mjson.git.  Could somebody take a look?
> 
> Thanks,
> 
> Mark.
       Reply  07 Jun 2025, Mark Grimes, Bug Report, Memory leak in mhttpd binary RPC code msysmon-mu3ebe-20250601-042124-20250606-122124.png

Hi,

We applied an intermediate fix for this locally and it seems to have fixed our issue.  The attached plot shows the percentage memory use on our machine with 128 GB of memory, as a rough proxy for mhttpd memory use.  After applying our fix mhttpd seems to be happy using ~7% of the memory after being up for 2.5 days.

Our fix to mjson was:

diff --git a/mjson.cxx b/mjson.cxx
index 17ee268..2443510 100644
--- a/mjson.cxx
+++ b/mjson.cxx
@@ -654,8 +654,7 @@ MJsonNode::~MJsonNode() // dtor
       delete subnodes[i];
    subnodes.clear();
 
-   if (arraybuffer_size > 0) {
-      assert(arraybuffer_ptr != NULL);
+   if (arraybuffer_ptr != NULL) {
       free(arraybuffer_ptr);
       arraybuffer_size = 0;
       arraybuffer_ptr = NULL;

We also applied the following in midas for good measure, although I don't think it contributed to the leak we were seeing:

diff --git a/src/mjsonrpc.cxx b/src/mjsonrpc.cxx
index 2201d228..38f0b99b 100644
--- a/src/mjsonrpc.cxx
+++ b/src/mjsonrpc.cxx
@@ -3454,6 +3454,7 @@ static MJsonNode* brpc(const MJsonNode* params)
    status = cm_connect_client(name.c_str(), &hconn);
 
    if (status != RPC_SUCCESS) {
+      free(buf);
       return mjsonrpc_make_result("status", MJsonNode::MakeInt(status));
    }

I hope this is useful to someone.  As previously mentioned we make heavy use of binary RPC, so maybe other experiments don't run into the same problem.

Thanks,

Mark.

          Reply  10 Jun 2025, Konstantin Olchanski, Bug Report, Memory leak in mhttpd binary RPC code 
I confirm that MJSON_ARRAYBUFFER does not work correctly for zero-size buffers, 
buffer is leaked in the destructor and copied as NULL in MJsonNode::Copy().

I also confirm memory leak in mjsonrpc "brpc" error path (already fixed).

Affected by the MJSON_ARRAYBUFFER memory leak are "brpc" (where user code returns 
a zero-size data buffer) and "js_read_binary_file" (if reading from an empty 
file, return of "new char[0]" is never freed).

"receive_event" and "read_history" RPCs never use zero-size buffers and are not 
affected by this bug.

mjson commit c798c1f0a835f6cea3e505a87bbb4a12b701196c
midas commit 576f2216ba2575b8857070ce7397210555f864e5
rootana commit a0d9bb4d8459f1528f0882bced9f2ab778580295

Please post bug reports as plain text so I can quote from them.

K.O.
             Reply  15 Jun 2025, Mark Grimes, Bug Report, Memory leak in mhttpd binary RPC code 
Many thanks for the fix.  We've applied and see better memory performance.  We still have to kill and restart 
mhttpd after a few days however.  I think the official fix is missing this part:

diff --git a/src/mjsonrpc.cxx b/src/mjsonrpc.cxx
index 2201d228..38f0b99b 100644
--- a/src/mjsonrpc.cxx
+++ b/src/mjsonrpc.cxx
@@ -3454,6 +3454,7 @@ static MJsonNode* brpc(const MJsonNode* params)
    status = cm_connect_client(name.c_str(), &hconn);
 
    if (status != RPC_SUCCESS) {
+      free(buf);
       return mjsonrpc_make_result("status", MJsonNode::MakeInt(status));
    }

When the other process returns a failure the memory block is also currently leaked.  I originally stated "...although I 
don't think it contributed to the leak we were seeing" but it seems this was false.

Thanks,

Mark.


> I confirm that MJSON_ARRAYBUFFER does not work correctly for zero-size buffers, 
> buffer is leaked in the destructor and copied as NULL in MJsonNode::Copy().
> 
> I also confirm memory leak in mjsonrpc "brpc" error path (already fixed).
> 
> Affected by the MJSON_ARRAYBUFFER memory leak are "brpc" (where user code returns 
> a zero-size data buffer) and "js_read_binary_file" (if reading from an empty 
> file, return of "new char[0]" is never freed).
> 
> "receive_event" and "read_history" RPCs never use zero-size buffers and are not 
> affected by this bug.
> 
> mjson commit c798c1f0a835f6cea3e505a87bbb4a12b701196c
> midas commit 576f2216ba2575b8857070ce7397210555f864e5
> rootana commit a0d9bb4d8459f1528f0882bced9f2ab778580295
> 
> Please post bug reports a plain-text so I can quote from them.
> 
> K.O.
                Reply  23 Jun 2025, Stefan Ritt, Bug Report, Memory leak in mhttpd binary RPC code 
Since this memory leak is quite obvious, I pushed the fix to develop.

Stefan
Entry  10 Jun 2025, Nik Berger, Bug Report, History variables with leading spaces 
By accident we had history variables with leading spaces. The history schema check then decides that this is a new variable (the leading space is not read from the history file) and starts a new file. We found this because the run start became slow due to the many, many history files created. It would be nice to just get an error if one has a malformed variable name like this.

How to reproduce: Try to put a variable with a leading space in the name into the history, repeatedly start runs.
Suggested fix: Produce an error if a history variable has a leading space.
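
The suggested check could be as small as the sketch below. The function name is made up, and this is not the code that went into the logger.

```cpp
#include <cctype>
#include <string>

// Returns true if the variable name starts with a whitespace character,
// which is the malformed case described above.
bool has_leading_space(const std::string& name)
{
   return !name.empty() && std::isspace((unsigned char)name[0]) != 0;
}
```

The caller would report an error for such a name instead of treating it as a new variable and starting a new history file.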
    Reply  19 Jun 2025, Stefan Ritt, Bug Report, History variables with leading spaces 
I added now code to the logger so it properly complains if there would be a leading space in a variable name.

Stefan

> By accident we had history variables with leading spaces. The history schema check then decides that this is a new variable (the leading space is not read from the history file) and starts a new file. We found this because the run start became slow due to the many, many history files created. It would be nice to just get an error if one has a malformed variable name like this.
> 
> How to reproduce: Try to put a variable with a leading space in the name into the history, repeatedly start runs.
> Sugested fix: Produce an error if a history variable has a leading space.
Entry  19 Jun 2025, Frederik Wauters, Bug Report, add history variables 
I have encountered this a few times:
* Make a new history panel
* Use the web GUI to add history variables
* When I am at the "add history variables" panel, there is no scroll option. So 
depending on the size and zoom of my screen, some variables further down the list 
cannot be selected

Tried Chrome and Firefox.
Entry  11 Jun 2025, Stefan Ritt, Info, Frontend write cache size 
We had issues with the frontend write cache size and the way it was implemented in the frontend 
framework mfe. Specifically, we had two equipments like in examples/experiment/frontend.cxx, one 
trigger event and one periodic event. While the trigger event sets the cache size correctly in the 
equipment table, the periodic event (defined afterwards) overwrites the cache size to zero, causing slow 
data taking. 

The underlying problem is that one event buffer (usually "SYSTEM") can only have one cache size for all 
events in one frontend. To avoid mishaps like that, I removed the write cache size from the equipment table 
and instead defined a function which explicitly sets the cache size for a buffer. In your frontend_init() you 
can now call

  set_cache_size("SYSTEM", 10000000);

to set the cache size of the SYSTEM buffer. Note that 10 MB (10000000 bytes) is the minimum cache size. Especially for 
smaller events, this can dramatically speed up your maximum DAQ rate.

Full commit is here:

  https://bitbucket.org/tmidas/midas/commits/24fbf2c02037ae5f7db74d0cadab23cd4bfe3b13

Note that this only affects frontends using the mfe framework, NOT for those using the tmfe framework.

Stefan
Entry  10 Jun 2025, Stefan Ritt, Info, History configuration changed 
Today the way the history system is configured has changed. Whenever one adds new equipment 
variables, the logger has to add that variable to the history system. Previously, this happened during 
startup of the logger and at run start. We have now a case in the Mu3e experiment where we have many 
variables and the history configuration takes about 15 seconds, which delays data taking considerably.

After discussion with KO we decided to remove the history re-configuration at run start. This speeds up 
the run start considerably, also for other experiments with many history variables. It means however that 
once you add/remove/rename any equipment variable going into the history system, you have to restart the 
logger for this to become active.

https://bitbucket.org/tmidas/midas/commits/c0a14c2d0166feb6b38c645947f2c5e0bef013d5

Stefan
Entry  10 Jun 2025, Amy Roberts, Info, use of modified image showing MIDAS data format? MIDAS_format.png
Hello, I'm currently writing a paper about making a dataset publicly available 
from the University of Minnesota and it's a MIDAS dataset.

I'd like to use an image that shows the MIDAS data format (that's been slightly 
modified to fit in the paper) and am wondering (1) if I could get permission to do 
so and (2) what the preferred attribution would be. 
    Reply  10 Jun 2025, Stefan Ritt, Info, use of modified image showing MIDAS data format? 
> Hello, I'm currently writing a paper about making a dataset publicly available 
> from the University of Minnesota and it's a MIDAS dataset.
> 
> I'd like to use an image that shows the MIDAS data format (that's been slightly 
> modified to fit in the paper) and am wondering (1) if I could get permission to do 
> so and (2) what the preferred attribution would be. 

Feel free to use whatever you like, the documentation is on the same open source license as midas itself.

Stefan
Entry  16 Apr 2025, Thomas Lindner, Info, MIDAS workshop (online) Sept 22-23, 2025 
Dear MIDAS enthusiasts,

We are planning a fifth MIDAS workshop, following on from previous successful 
workshops in 2015, 2017, 2019 and 2023.  The goals of the workshop include:

- Getting updates from MIDAS developers on new features, bug fixes and planned 
changes.
- Getting reports from MIDAS users on how they are using MIDAS and what problems 
they are facing.
- Making plans for future MIDAS changes and improvements

We are planning to have an online workshop on Sept 22-23, 2025 (it will coincide 
with a visit of Stefan to TRIUMF).  We are tentatively planning to have a four 
hour session on each day, with the sessions timed for morning in Vancouver and 
afternoon/evening in Europe.  Sorry, the sessions are likely to again not be well 
timed for our colleagues in Asia.  

We will provide exact times and more details closer to the date.  But I hope 
people can mark the dates in their calendars; we are keen to hear from as much of 
the MIDAS community as possible.  

Best Regards,
Thomas Lindner
    Reply  03 Jun 2025, Thomas Lindner, Info, MIDAS workshop (online) Sept 22-23, 2025 
Dear all,

We have set up an indico page for the MIDAS workshop on Sept 22-23.  The page is here

https://indico.psi.ch/event/17580/overview

As I mentioned, we are keen to hear reports from any users or developers; we want to hear  
how MIDAS is working for you and what improvements you would like to see.  If you or your 
experiment would like to give a talk about your MIDAS experiences then please submit an 
abstract through the indico page.  

Also, feel free to also register for the workshop (no fees).  Registration is not 
mandatory, but it would be useful for us to have an idea how many people will connect.

Thanks,
Thomas


> Dear MIDAS enthusiasts,
> 
> We are planning a fifth MIDAS workshop, following on from previous successful 
> workshops in 2015, 2017, 2019 and 2023.  The goals of the workshop include:
> 
> - Getting updates from MIDAS developers on new features, bug fixes and planned 
> changes.
> - Getting reports from MIDAS users on how they are using MIDAS and what problems 
> they are facing.
> - Making plans for future MIDAS changes and improvements
> 
> We are planning to have an online workshop on Sept 22-23, 2025 (it will coincide 
> with a visit of Stefan to TRIUMF).  We are tentatively planning to have a four 
> hour session on each day, with the sessions timed for morning in Vancouver and 
> afternoon/evening in Europe.  Sorry, the sessions are likely to again not be well 
> timed for our colleagues in Asia.  
> 
> We will provide exact times and more details closer to the date.  But I hope 
> people can mark the dates in their calendars; we are keen to hear from as much of 
> the MIDAS community as possible.  
> 
> Best Regards,
> Thomas Lindner
Entry  27 May 2025, Pavel Murat, Suggestion, handling of 2+ line-long messages
Dear MIDAS experts, 

currently, the MIDAS messaging system is optimized for one-line long messages, 
so the content of 2+liners shows up in the message log in the reverse order, 
with the first line on the bottom of the message. 

I wonder if printing the message content in the reverse order, starting from the last line, 
would make sense ? - that wouldn't affect one-line long messages, but could make longer 
messages more useful. 
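
To make the proposal concrete, here is a minimal sketch of the reordering step. It is not MIDAS code: the helper name lines_in_reverse is made up for illustration, and it assumes the message lines are separated by '\n'.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of MIDAS: return the lines of a message
// in reverse order (last line first).
std::vector<std::string> lines_in_reverse(const std::string& message)
{
   std::vector<std::string> lines;
   std::istringstream in(message);
   std::string line;
   while (std::getline(in, line))   // split on '\n'
      lines.push_back(line);
   std::reverse(lines.begin(), lines.end());
   return lines;
}

// Small self-check: a one-line message comes back unchanged.
inline void check_single_line_unchanged()
{
   std::vector<std::string> r = lines_in_reverse("one line");
   assert(r.size() == 1 && r[0] == "one line");
}
```

If the log display stacks the emitted lines newest-first, sending them in this order should put the first line of the message back on top.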

--thanks, regards, Pasha
Entry  26 May 2025, Francesco Renga, Forum, Reading two devices in parallel 
Dear experts,
      in the CYGNO experiment, we read out CMOS cameras for optical readout of GEM-TPCs. So far, we only developed the readout for a single camera. In the future, we will have multiple cameras to read out (up to 18 in the next phase of the experiment), and we are investigating how to optimize the readout by exploiting parallelization.

One idea is to start parallel threads within a single equipment. Alternatively, one could associate different equipment with each camera and run an Event Builder. Perhaps other solutions did not come to mind. Which one would you regard as the most effective and elegant?

Thank you very much,
           Francesco
    Reply  26 May 2025, Stefan Ritt, Forum, Reading two devices in parallel 
> Dear experts,
>       in the CYGNO experiment, we read out CMOS cameras for optical readout of GEM-TPCs. So far, we only developed the readout for a single camera. In the future, we will have multiple cameras to read out (up to 18 in the next phase of the experiment), and we are investigating how to optimize the readout by exploiting parallelization.
> 
> One idea is to start parallel threads within a single equipment. Alternatively, one could associate different equipment with each camera and run an Event Builder. Perhaps other solutions did not come to mind. Which one would you regard as the most effective and elegant?
> 
> Thank you very much,
>            Francesco

In principle both will work. It's kind of a matter of taste. In the multi-threaded approach one has a single frontend to start and stop, and in the second case you have to start 18 individual frontends and make sure that they are running. 

For the multi-threaded frontend you have to ensure proper synchronization between the threads (like common run start/stop), and in the end you also have to do some event building, sending all 18 streams into a single buffer. As you know, multi-thread programming can be a bit of an art, using mutexes or semaphores, but it can be more flexible than the event builder, which is a given piece of software.
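
As a rough illustration of the multi-threaded variant, the sketch below starts one reader thread per camera, releases them with a common "run start" flag, and merges all streams into one container protected by a mutex. It uses only the C++ standard library. The Frame struct and the read_cameras function are hypothetical stand-ins; a real frontend would read the camera hardware and write into a MIDAS event buffer instead.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for one camera frame; a real frontend would carry
// image data here.
struct Frame {
   int camera;
   int sequence;
};

// Run n_cameras reader threads, each producing frames_per_camera frames,
// and merge all streams into one shared container.
std::vector<Frame> read_cameras(int n_cameras, int frames_per_camera)
{
   assert(n_cameras > 0 && frames_per_camera >= 0);

   std::vector<Frame> merged;            // the single output stream
   std::mutex merged_mutex;              // protects 'merged'
   std::atomic<bool> run_started{false}; // common "run start" flag

   std::vector<std::thread> readers;
   for (int cam = 0; cam < n_cameras; cam++) {
      readers.emplace_back([&, cam]() {
         // Wait for the common run start so that all cameras begin together.
         while (!run_started.load())
            std::this_thread::yield();
         for (int seq = 0; seq < frames_per_camera; seq++) {
            Frame f{cam, seq};           // placeholder for a real camera read
            std::lock_guard<std::mutex> lock(merged_mutex);
            merged.push_back(f);
         }
      });
   }

   run_started.store(true);              // "begin of run" for every thread
   for (std::thread& t : readers)
      t.join();                          // "end of run": wait for all readers
   return merged;
}
```

Calling read_cameras(18, 5) returns 90 frames. Frames from different cameras are interleaved in no particular order, but each camera's own frames stay in sequence, which is the part a later event-building step would rely on.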

Best,
Stefan
Entry  25 May 2025, Pavel Murat, Bug Report, subdirectory ordering in ODB browser ? panel_map.json panel_map.png
Dear MIDAS experts, 

I'm running into a minor but annoying issue with how the ODB browser orders subdirectory names. 
I have a straw-man hash map which includes ODB subdirectories named "000", "010", ... "300", 
and I have yet to succeed in having them displayed in a "natural" order: the subdirectories with names 
starting with "0" always show up at the bottom of the list - see attached .png file. 

Neither interactive re-ordering nor manual ordering of the items in the input .json file helps. 

I have also attached a .json file which can be loaded with odbedit to reproduce the issue. 

Although I'm using a relatively recent - ~ 20 days old - commit, 'db1819ac', is it possible 
that this issue has already been sorted out ?

-- many thanks, regards, Pasha  
Entry  24 May 2025, Pavel Murat, Info, ROOT scripting for MIDAS seems to work pretty much out of the box log.txt
Dear All, 

I'm pretty sure many know this already, however I found this feature by accident 
and want to share with those who don't know about it yet - seems very useful. 

- it looks like one can use ROOT scripting with rootcling and call from the 
  interactive ROOT prompt any function defined in midas.h and access ODB seemingly 
  WITHOUT DOING anything special 

- more surprisingly, that also works for odbxx, with one minor exception in handling 
  the 64-bit types - the proof is in attachment. The script test_odbxx.C loaded 
  interactively is Stefan's

 https://bitbucket.org/tmidas/midas/src/develop/examples/odbxx/odbxx_test.cxx 

with one minor change - the line 
 
   o["Int64 Key"] = -1LL;

is replaced with

   int64_t x = -1LL;
   o["Int64 Key"] = x;

- apparently the interpreter has its limitations. 

My rootlogon.C file doesn't load any libraries, it only defines the appropriate 
include paths. So it seems that everything works pretty much out of the box. 

One issue has surfaced, however. All of that worked even though my experiment 
is named "test_025", while the example specifies experiment="test". 
Is it possible that only the first 4 characters are being tested ? 

-- regards, Pasha
Entry  19 May 2025, Jonas A. Krieger, Suggestion, manalyzer root output file with custom filename including run number 
Hi all,

Would it be possible to extend manalyzer to support custom .root file names that include the run number? 

As far as I understand, the current behavior is as follows:
The default filename is ./root_output_files/output%05d.root , which can be customized by the following two command line arguments.

-Doutputdirectory: Specify output root file directory
-Ooutputfile.root: Specify output root file filename

If an output file name is specified with -O, -D is ignored, so the full path should be provided to -O. 

I am aiming to write files where the filename contains sufficient information to be unique (e.g., experiment, year, and run number). However, if I specify it with -O, this would require restarting manalyzer after every run; a scenario that I would like to avoid if possible.

Please find a suggestion of how manalyzer could be extended to introduce this functionality through an additional command line argument at
https://bitbucket.org/krieger_j/manalyzer/commits/24f25bc8fe3f066ac1dc576349eabf04d174deec

The above code would allow the following call syntax: ' ./manalyzer.exe -O/data/experiment1_%06d.root --OutputNumbered '
But note that as is, it would fail if a user specifies an incompatible format such as -Ooutput%s.root . 

So a safer, but less flexible option might be to instead have the user provide only a prefix, and then attach %05d.root in the code.
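
A possible middle ground would be to keep the flexible template but validate it before use. The sketch below is not taken from manalyzer or from the commit linked above. The function names is_run_number_format and make_output_name are made up for illustration, and the accepted syntax (exactly one %d or %0Nd, nothing else) is an assumption about what would be wanted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical check: accept a template only if it contains exactly one
// conversion of the form %d or %<digits>d and no other '%' conversion.
// "%%" is treated as a literal percent sign.
bool is_run_number_format(const std::string& fmt)
{
   int conversions = 0;
   for (std::size_t i = 0; i < fmt.size(); i++) {
      if (fmt[i] != '%')
         continue;
      i++;
      if (i < fmt.size() && fmt[i] == '%')
         continue;                       // literal "%%"
      while (i < fmt.size() && std::isdigit((unsigned char)fmt[i]))
         i++;                            // optional flag/width digits, e.g. "06"
      if (i >= fmt.size() || fmt[i] != 'd')
         return false;                   // e.g. "%s", "%f" or a trailing '%'
      conversions++;
   }
   return conversions == 1;
}

// Build the output file name, or return an empty string if the template
// is rejected or the result does not fit into the buffer.
std::string make_output_name(const std::string& fmt, int run_number)
{
   if (!is_run_number_format(fmt))
      return "";
   char buf[1024];
   int n = std::snprintf(buf, sizeof(buf), fmt.c_str(), run_number);
   assert(n >= 0);
   if ((std::size_t)n >= sizeof(buf))
      return "";                         // name would be truncated
   return buf;
}
```

With this check, -O/data/experiment1_%06d.root would be accepted and -Ooutput%s.root would be rejected up front instead of failing inside the formatting call.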

Thank you for considering these suggestions!
Entry  16 May 2025, Marius Koeppel, Bug Report, history_schema.cxx fails to build 
Hi all,

we have a CI setup which has failed to build history_schema.cxx since 06.05.2025. There was a major change in this code in the commits fe7f6a6 and 159d8d3.

image: rootproject/root:latest

pipelines:
  default:
    - step:
        name: 'Build and test'
        runs-on:
          - self.hosted
          - linux
        script:
          - apt-get update
          - DEBIAN_FRONTEND=noninteractive apt-get -y install python3-all python3-pip python3-pytest-dependency python3-pytest
          - DEBIAN_FRONTEND=noninteractive apt-get -y install gcc g++ cmake git python3-all libssl-dev libz-dev libcurl4-gnutls-dev sqlite3 libsqlite3-dev libboost-all-dev linux-headers-generic
          - gcc -v
          - cmake --version
          - git clone https://marius_koeppel@bitbucket.org/tmidas/midas.git
          - cd midas
          - git submodule update --init --recursive
          - mkdir build
          - cd build
          - cmake ..
          - make -j4 install


Error is:

/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:5991:10: error: ‘class HsSqlSchema’ has no member named ‘table_name’; did you mean ‘fTableName’?
 5991 |       s->table_name = xtable_name;
      |          ^~~~~~~~~~
      |          fTableName
/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx: In member function ‘virtual int PgsqlHistory::read_column_names(HsSchemaVector*, const char*, const char*)’:
/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:6034:14: error: ‘class HsSqlSchema’ has no member named ‘table_name’; did you mean ‘fTableName’?
 6034 |       if (s->table_name != table_name)
      |              ^~~~~~~~~~
      |              fTableName
/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:6065:16: error: ‘struct HsSchemaEntry’ has no member named ‘fNumBytes’
 6065 |             se.fNumBytes = 0;
      |                ^~~~~~~~~
/opt/atlassian/pipelines/agent/build/midas/src/history_schema.cxx:6140:30: error: ‘__gnu_cxx::__alloc_traits<std::allocator<HsSchemaEntry>, HsSchemaEntry>::value_type’ {aka ‘struct HsSchemaEntry’} has no member named ‘fNumBytes’
 6140 |             s->fVariables[j].fNumBytes = tid_size;
      |                              ^~~~~~~~~
At global scope:
cc1plus: note: unrecognized command-line option ‘-Wno-vla-cxx-extension’ may have been intended to silence earlier diagnostics
make[2]: *** [CMakeFiles/objlib.dir/build.make:384: CMakeFiles/objlib.dir/src/history_schema.cxx.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:404: CMakeFiles/objlib.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
    Reply  16 May 2025, Konstantin Olchanski, Bug Report, history_schema.cxx fails to build 
> we have a CI setup which fails since 06.05.2025 to build the history_schema.cxx.
> There was a major change in this code in the commits fe7f6a6 and 159d8d3.

Missing from this report is critical information: HAVE_PGSQL is set.

I will have to check why it is not set in my development account.

I will have to check why it is not set in our bitbucket build.

Thank you for reporting this problem.

K.O.
       Reply  16 May 2025, Konstantin Olchanski, Bug Report, history_schema.cxx fails to build 
> > we have a CI setup which fails since 06.05.2025 to build the history_schema.cxx.
> > There was a major change in this code in the commits fe7f6a6 and 159d8d3.
> 
> Missing from this report is critical information: HAVE_PGSQL is set.
> 
> I will have to check why it is not set in my development account.
> 

The following packages are needed to build MySQL and PgSQL support in MIDAS;
they were missing on my development machine. MySQL support was enabled
by accident because kde-bloat packages pull in the MySQL (not the MariaDB)
client and server. Fixed now, added to standard list of Ubuntu packages:
https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu#install_missing_packages

apt -y install mariadb-client libmariadb-dev ### mysql client for MIDAS
apt -y install postgresql-common libpq-dev ### postgresql client for MIDAS

>
> I will have to check why it is not set in our bitbucket build.
> 

Added MySQL and PgSQL to bitbucket Ubuntu-24 build (sqlite was already enabled).

>
> Thank you for reporting this problem.
> 

Fix committed. Sorry about this problem.

K.O.
Entry  05 May 2025, Konstantin Olchanski, Info, db_delete_key(TRUE) 
I was working on an odb corruption crash inside db_delete_key() and I noticed 
that I did not test db_delete_key() with follow_links set to TRUE. Then I noticed 
that nobody anywhere seems to use db_delete_key() with follow_links set to TRUE. 

Instead of testing it, can I just remove it?

This feature existed since day 1 (1st commit) and it does something unexpected 
compared to filesystem "/bin/rm": as best I can tell, it removes the link 
*and* whatever the link points to. For people familiar with "/bin/rm", this is 
somewhat unexpected and by my thinking, if nobody ever added such a feature to 
"/bin/rm", it is probably not considered generally useful or desirable. (I would 
think it dangerous, it removes not 1 but 2 files, the 2nd file would be in some 
other directory far away from where we are).

By this thinking, I should remove "follow_links" (actually just make it do nothing, 
to reduce the disturbance to other source code). db_delete_key() should work 
similar to /bin/rm aka the unlink() syscall.
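
For reference, the filesystem behaviour I am comparing against can be checked with a few lines of standard C++17 (std::filesystem). This is only an illustration of the /bin/rm semantics, not MIDAS code; the function name and the scratch directory name are made up.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Create a file and a symlink to it in a scratch directory, remove the link,
// and report whether the link is gone while its target still exists.
bool target_survives_link_removal()
{
   fs::path dir = fs::temp_directory_path() / "symlink_rm_demo";
   fs::remove_all(dir);                  // start from a clean scratch directory
   fs::create_directories(dir);

   fs::path target = dir / "target.txt";
   fs::path link = dir / "link";
   {
      std::ofstream out(target);         // create the link target
      out << "data\n";
   }
   fs::create_symlink(target, link);
   assert(fs::is_symlink(link));

   fs::remove(link);                     // like unlink(): removes the link itself

   bool link_gone = !fs::exists(fs::symlink_status(link));
   bool target_ok = fs::exists(target);
   fs::remove_all(dir);                  // clean up
   return link_gone && target_ok;
}
```

The function returns true on a POSIX filesystem: removing the link leaves the file it points to in place, which is the behaviour db_delete_key() would have without "follow_links".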

K.O.
    Reply  05 May 2025, Stefan Ritt, Info, db_delete_key(TRUE) 
I would handle this actually like symbolic links are handled under linux. If you delete a symbolic link, the link gets 
deleted and NOT the file the link is pointing to.

So I conclude that the "follow links" is a misconception and should be removed.

Stefan
Entry  05 May 2025, Konstantin Olchanski, Bug Report, abort and core dump in cm_disconnect_experiment() 
I noticed that some programs like mhist abort and dump core at the very end if they take too long. This is because they forgot to 
set/disable the watchdog timeout, and they got removed from odb and from the SYSMSG event buffer.

mhist is easy to fix, just add the missing call to disable the watchdog, but I also see a similar crash in the mserver which of course requires 
the watchdog.

In either case, the crash is in cm_disconnect_experiment() where we know we are shutting down and we know there is no useful information in the 
core dump.

I think I will fix it by adding a flag to bm_close_buffer() to bypass/avoid the crash from "we are already removed from this buffer".

Stack trace from mhist:

[mhist,ERROR] [midas.cxx:5977:bm_validate_client_index,ERROR] My client index 6 in buffer 'SYSMSG' is invalid: client name '', pid 0 should be my 
pid 3113263
[mhist,ERROR] [midas.cxx:5980:bm_validate_client_index,ERROR] Maybe this client was removed by a timeout. See midas.log. Cannot continue, 
aborting...
bm_validate_client_index: My client index 6 in buffer 'SYSMSG' is invalid: client name '', pid 0 should be my pid 3113263
bm_validate_client_index: Maybe this client was removed by a timeout. See midas.log. Cannot continue, aborting...

Program received signal SIGABRT, Aborted.
Download failed: Invalid argument.  Continuing without source file ./nptl/./nptl/pthread_kill.c.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
warning: 44	./nptl/pthread_kill.c: No such file or directory
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff71df27e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff71c28ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00005555555768b4 in bm_validate_client_index_locked (pbuf_guard=...) at /home/olchansk/git/midas/src/midas.cxx:5993
#6  0x000055555557ed7a in bm_get_my_client_locked (pbuf_guard=...) at /home/olchansk/git/midas/src/midas.cxx:6000
#7  bm_close_buffer (buffer_handle=1) at /home/olchansk/git/midas/src/midas.cxx:7162
#8  0x000055555557f101 in cm_msg_close_buffer () at /home/olchansk/git/midas/src/midas.cxx:490
#9  0x000055555558506b in cm_disconnect_experiment () at /home/olchansk/git/midas/src/midas.cxx:2904
#10 0x000055555556d2ad in main (argc=<optimized out>, argv=<optimized out>) at /home/olchansk/git/midas/progs/mhist.cxx:882
(gdb) 

Stack trace from mserver:

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=138048230684480, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007d8ddbc4e476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007d8ddbc347f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000059beb439dab0 in bm_validate_client_index_locked (pbuf_guard=...) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:5993
#6  0x000059beb43a859c in bm_get_my_client_locked (pbuf_guard=...) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:6000
#7  bm_close_buffer (buffer_handle=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7162
#8  0x000059beb43a89af in bm_close_all_buffers () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7256
#9  bm_close_all_buffers () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7243
#10 0x000059beb43afa20 in cm_disconnect_experiment () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:2905
#11 0x000059beb43afdd8 in rpc_check_channels () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:16317
#12 0x000059beb43b0cf5 in rpc_server_loop () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:15858
#13 0x000059beb4390982 in main (argc=9, argv=0x7ffc07e5bed8) at /home/dsdaqdev/packages_common/midas/progs/mserver.cxx:387

K.O.
    Reply  05 May 2025, Stefan Ritt, Bug Report, abort and core dump in cm_disconnect_experiment() 
I would be in favor of not curing the symptoms, but fixing the cause of the problem. I guess you put the watchdog disable into mhist, right? Usually mhist is called locally, so no mserver should be 
involved. If not, I would prefer to propagate the watchdog disable to the mserver side as well, if that's not been done already. Actually I never would disable the watchdog, but set it to a reasonable 
maximal value, like a few minutes or so. In that case, the client gets still removed if it crashes for some reason.

My five cents,
Stefan 