Midas DAQ System
ID   Date   Author   Topic   Subject
  3237   25 Jun 2026   Yiwen Yang   Suggestion   Multithreaded deferred transitions
> I recommend against using them. Maybe you can explain what you do and I can suggest a way
> to avoid using the deferred transition.

I've only recently picked up the code and I'm not sure how it was envisioned when
initially designed, so here's my guess at how it's being used.

The deferred transition is registered by the FPN00 (main clock) process to make sure the logger has
finished logging all events before continuing to stop readout of other frontends. There is also a
timeout, so that if an event goes missing the run stop proceeds without getting stuck waiting for
the logger.

If there is a better way to implement this then I'm happy to
give it a go.

> P.S. I do not remember any use of deferred transition in the T2K/ND280 FGD, TPC and GSC frontends,
> maybe it is in some other subsystem or was introduced after my time.

This is all from the global DAQ code, so indeed none of the other subsystems' frontends use the
feature directly. But if the clock module frontend is started then run stops should be deferred.

Interestingly, I just checked and this feature is disabled in the FGD ODB, but it is enabled in
the TPC ODB.

Regards,
Yiwen.
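
For readers following the thread, the stop logic described above can be sketched roughly as follows. This is a standalone illustration, not the ND280 code: the function name `ready_to_stop`, the event counters and the timeout parameter are all hypothetical, and in MIDAS the real decision would sit in the callback registered for the deferred transition.

```python
import time

def ready_to_stop(events_triggered, events_logged, stop_requested_at, timeout_s, now=None):
    """Return True when the deferred run stop may proceed.

    Hypothetical helper: proceed once the logger has caught up, or once
    timeout_s seconds have passed since the stop was requested.
    """
    if now is None:
        now = time.monotonic()
    if events_logged >= events_triggered:
        return True   # logger has written everything
    if now - stop_requested_at >= timeout_s:
        return True   # give up waiting, an event probably went missing
    return False      # keep deferring the transition
```

The `now` argument is only there so the logic can be exercised without real waiting.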
  3236   25 Jun 2026   Konstantin Olchanski   Suggestion   Multithreaded deferred transitions
> Multithreaded transitions were introduced by KO in 2019. Please ask him to make deferred transitions work 
> again or simply use non-multithreaded transitions. 

Deferred transitions are the bane of MIDAS. I personally can never understand how they work
and what they do, even though I have studied and understood them many times by now.

I recommend against using them. Maybe you can explain what you do and I can suggest a way
to avoid using the deferred transition.

Some other people use deferred transitions, and it works for them,
in conjunction with normal transitions, which have been multithreaded
for years.

So it is unlikely that this is a new bug.

P.S. I do not remember any use of deferred transition in the T2K/ND280 FGD, TPC and GSC frontends,
maybe it is in some other subsystem or was introduced after my time.

K.O.
  3235   25 Jun 2026   Konstantin Olchanski   Bug Report   incompatible ODB XML dumps
> I fixed that by not requiring the handle explicitly ...
>
> [it was...] an uncaught exception. If you do not want the abort, catch the exception. 

Hi, Stefan! Thank you for fixing this!

The issue was not the exception, but the failure to load the XML ODB dump from an (immutable) data file.

This should now be fixed (TBC), so all good now.

K.O.
  3234   25 Jun 2026   Konstantin Olchanski   Forum   midas forum elog updated
I updated the midas forum elog to the latest version from git: 083448f7

Also investigated elogd failure to start on reboot,
it turned out to be a crasher bug, see
https://elog.psi.ch/elogs/Forum/69919

K.O.
  3233   11 Jun 2026   Stefan Ritt   Suggestion   Multithreaded deferred transitions
Multithreaded transitions were introduced by KO in 2019. Please ask him to make deferred transitions work 
again or simply use non-multithreaded transitions. 

Stefan
  3232   09 Jun 2026   Stefan Ritt   Bug Report   incompatible ODB XML dumps
I fixed that by not requiring the handle explicitly:

           if (mxml_get_attribute(node, "handle") != nullptr)
              o->set_hkey(std::stoi(std::string(mxml_get_attribute(node, "handle"))));

The reason for the handle is the following: If you attach a midas::odb object to a very large subtree of the ODB, this can take very long since each
ODB element requires a few RPC roundtrips. In Mu3e, this took up to one minute. 

To overcome the problem, we can initialize a midas::odb object remotely via an XML tree. The server creates a huge XML object, sends it over
a single RPC command, and the client re-creates the midas::odb tree from the XML object. Since each online midas::odb needs the ODB handle for
watch functions etc., the handle got added to the XML file.

For an XML file, this makes no sense of course, so now it's optional with the code change above.

P.S.: The odbxx code does not core dump, it just produces an error which looks like core dump. This is an uncaught exception. Normal exceptions
just abort the program without much information. The odbxx exceptions add a stack dump if available on that OS. That makes it easier to debug.
If you do not want the abort, catch the exception. 

Stefan
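
The "handle is optional" rule can also be illustrated outside odbxx. Below is a minimal sketch in Python using only the standard library; it is not the odbxx implementation, the helper name `read_handles` is made up for this example, and the two XML snippets are shortened versions of the dumps quoted in this thread.

```python
import xml.etree.ElementTree as ET

# Shortened old-style dump (no "handle" attribute)
OLD_DUMP = """<odb root="/">
  <dir name="Experiment">
    <key name="ODB timeout" type="INT32">10000</key>
  </dir>
</odb>"""

# Shortened current-style dump (with "handle" attribute)
NEW_DUMP = """<odb root="/">
  <dir name="System" handle="135320">
    <key name="Flush period" type="UINT32" handle="135496">60</key>
  </dir>
</odb>"""

def read_handles(xml_text):
    """Map each dir/key name to its handle, or None if the dump has no handle."""
    handles = {}
    for node in ET.fromstring(xml_text).iter():
        if node.tag in ("dir", "key"):
            raw = node.get("handle")   # None when the attribute is absent
            handles[node.get("name")] = int(raw) if raw is not None else None
    return handles
```

Both dump styles load without an exception; a caller that really needs a handle can check for `None` itself.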
  3231   01 Jun 2026   Yiwen Yang   Suggestion   Multithreaded deferred transitions
Hi,

On the DAQ system for T2K's ND280 near detector, we use deferred
transitions to make sure all triggered events were logged before issuing run
stops to frontends.

I've recently managed to update the frontends to use a
relatively modern version of MIDAS. I then noticed that run transitions are now
multithreaded by default when issued from e.g. mhttpd, but deferred transitions
called by cm_check_deferred_transition are still performed synchronously.

It would be nice to make run stops use multithreaded transitions as well. A naive
patch of adding the TR_MTHREAD flag does not work, since the client handling the
deferred transition attempts to communicate with itself instead of calling
cm_transition_call_direct.

After looking into the code a bit further, I noticed
that there is an intentional check against multithreaded transitions in the
logic for determining whether the client is the one calling the transition:

https://bitbucket.org/tmidas/midas/src/fd71f63c023b7e2d4a5c91e3121651b14bd9d27b/src/midas.cxx#lines-5009

Was there a particular concern that led to this check?


Regards,
Yiwen.
  3230   29 May 2026   Stefan Ritt   Info   ODBvalue timeout
> > > > 
> > > > How can the MSL code figure out if the wait succeeded or timed out?
> > > > 
> > > > Stefan
> > > 
> > > You get a message, something like:
> > > 17:52:12.293 2026/05/29 [Sequencer,INFO] WAIT ODBValue timeout after 10.0 seconds: /Equipment/Test/Variables/V < 1 not satisfied
> > > 
> > > Do we need something else?
> > > 
> > > Zaher
> > 
> > I mean how can the following code determine the timeout?
> 
> My intention with this was dealing with something like setting a cryostat temperature or any non-critical parameter. If it is not reached within a given timeout we give up and move on with the plan rather than sitting and wasting a whole night of beam. If your ODBvalue is "mission critical" then the wait command should not be used with a timeout. If you do use the timeout option then you will have to check in the following lines what is the state of your ODBvalue (very easy). To me this is the simplest and most useful way for our use case.

I was thinking more of a return value 0/1 of the wait function. If you change the condition, you only have to change it in one location. That is more like how normal C functions work.

Stefan 
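
To make the return-value idea concrete, here is a small sketch in Python. It is not sequencer code: `wait_for`, its parameters and the polling interval are assumptions chosen for illustration. The point is only that the caller gets True/False back and does not have to repeat the condition afterwards.

```python
import time

def wait_for(condition, timeout_s, poll_s=0.1, clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it is true or timeout_s elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A script would then branch on the result, e.g. `if not wait_for(cond, 60): ...`, with the condition written in exactly one place.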
  3229   29 May 2026   Zaher Salman   Info   ODBvalue timeout
> > > 
> > > How can the MSL code figure out if the wait succeeded or timed out?
> > > 
> > > Stefan
> > 
> > You get a message, something like:
> > 17:52:12.293 2026/05/29 [Sequencer,INFO] WAIT ODBValue timeout after 10.0 seconds: /Equipment/Test/Variables/V < 1 not satisfied
> > 
> > Do we need something else?
> > 
> > Zaher
> 
> I mean how can the following code determine the timeout?

My intention with this was dealing with something like setting a cryostat temperature or any non-critical parameter. If it is not reached within a given timeout we give up and move on with the plan rather than sitting and wasting a whole night of beam. If your ODBvalue is "mission critical" then the wait command should not be used with a timeout. If you do use the timeout option then you will have to check in the following lines what is the state of your ODBvalue (very easy). To me this is the simplest and most useful way for our use case.
  3228   29 May 2026   Stefan Ritt   Info   ODBvalue timeout
> > 
> > How can the MSL code figure out if the wait succeeded or timed out?
> > 
> > Stefan
> 
> You get a message, something like:
> 17:52:12.293 2026/05/29 [Sequencer,INFO] WAIT ODBValue timeout after 10.0 seconds: /Equipment/Test/Variables/V < 1 not satisfied
> 
> Do we need something else?
> 
> Zaher

I mean how can the following code determine the timeout?
  3227   29 May 2026   Zaher Salman   Info   ODBvalue timeout
> 
> How can the MSL code figure out if the wait succeeded or timed out?
> 
> Stefan

You get a message, something like:
17:52:12.293 2026/05/29 [Sequencer,INFO] WAIT ODBValue timeout after 10.0 seconds: /Equipment/Test/Variables/V < 1 not satisfied

Do we need something else?

Zaher
  3226   29 May 2026   Stefan Ritt   Info   ODBvalue timeout
> Dear all, I implemented an optional timeout for the wait ODBvalue command. The way it works is similar to the standard wait command:
> 
> WAIT ODBvalue, /Equipment/HV/Variables/Measured[3], <, 100, timeout, 60
> 
> where the "timeout" keyword starts a countdown in seconds. If the ODB condition is not met after 60 seconds the sequencer moves on to the next line.
> 
> To use this feature you must recompile the msequencer, delete /Sequencer/State and start the freshly compiled msequencer. This will add two ODBs to the /Sequencer/State: "Timeout value" (the countdown) and "Timeout limit" (the limit given in the wait command).
> 
> I suggest that we add something similar to the pysequencer using the same ODBs.

How can the MSL code figure out if the wait succeeded or timed out?

Stefan
  3225   29 May 2026   Zaher Salman   Info   ODBvalue timeout
Dear all, I implemented an optional timeout for the wait ODBvalue command. The way it works is similar to the standard wait command:

WAIT ODBvalue, /Equipment/HV/Variables/Measured[3], <, 100, timeout, 60

where the "timeout" keyword starts a countdown in seconds. If the ODB condition is not met after 60 seconds the sequencer moves on to the next line.

To use this feature you must recompile the msequencer, delete /Sequencer/State and start the freshly compiled msequencer. This will add two ODBs to the /Sequencer/State: "Timeout value" (the countdown) and "Timeout limit" (the limit given in the wait command).

I suggest that we add something similar to the pysequencer using the same ODBs.
  3224   21 May 2026   Konstantin Olchanski   Bug Report   incompatible ODB XML dumps
While testing manalyzer, I found that it dies from an exception on odbxx, error message is "/home/olchansk/packages/midas/include/odbxx.h:1231: No "handle" attribute found in XML data".

Indeed, my data file is very old and its XML ODB dump does not have the "handle" attribute:

daq00:midas$ more ~/git/midas/manalyzer/run9402bor.xml 

<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- created by MXML on Tue Aug 11 14:47:16 2020 -->
<odb root="/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://midas.psi.ch/odb.xsd">
  <dir name="Experiment">
    <key name="ODB timeout" type="INT32">10000</key>

While current MIDAS XML ODB dumps have it:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- created by MXML on Thu May 21 20:37:06 2026 -->
<odb root="/" filename="odb.xml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:noNamespaceSchemaLocation="/home/olchansk/packages/midas/odb.xsd">
  <dir name="System" handle="135320">
    <dir name="Flush" handle="135408">
      <key name="Flush period" type="UINT32" handle="135496">60</key>


And odbxx requires this attribute unconditionally:

            if (mxml_get_attribute(node, "handle") == nullptr)
               mthrow("No \"handle\" attribute found in XML data");
            o->set_hkey(std::stoi(std::string(mxml_get_attribute(node, "handle"))));

The "handle" attribute was added to XML ODB dumps in September 2024 (not sure to what purpose, JSON ODB dumps do not have a "handle" attribute):

git blame src/odb.cxx
...
dd23558fbd src/odb.cxx (Stefan Ritt 2024-09-20 15:30:00 +0200  9387) mxml_write_attribute(writer, "handle", std::to_string(hKey).c_str());

This change makes MIDAS data files written before this date un-analyzable (unless odbxx is turned off).

I can prevent manalyzer from crashing by catching the exception, but I think it is better if the odbxx code is updated to accept the pre-Sep-2024 ODB XML data 
format (these were valid XML ODB dumps when they were made, and users are stuck with them inside compressed binary MIDAS data files).

P.S. Please also check the odbxx code for other crashes on malformed XML ODB dumps. It should complain and fail to load the dump, but not core dump or abort. 
Malformed ODB dumps are not a theoretical situation: I am currently looking at MIDAS data files (mid.lz4) that have invalid JSON ODB dumps created from 
a corrupted ODB. Luckily the JSON parser handles this gracefully and does not crash manalyzer, so I can look at the data. I did have to go 10 runs into the past 
to find an uncorrupted ODB dump to reload a good ODB. Fixes to the JSON encoder and fixes for corrupt ODB are in progress.

K.O.
  3223   21 May 2026   Konstantin Olchanski   Info   manalyzer --save-odb
Due to my oversight, the code for extracting ODB dumps from MIDAS data files from rootana/old_analyzer/event_dump.cxx was missing in 
manalyzer.cxx.

This is now corrected, the new manalyzer command line flag is "--save-odb", to use it:

daq00:manalyzer$ ./manalyzer_test.exe --save-odb ~/git/midas/testexpt/run00002.mid.lz4
...
Saving begin of run ODB dump for run 2 from "/home/olchansk/git/midas/testexpt/run00002.mid.lz4" to "run2bor.json"
...
Saving end of run ODB dump for run 2 from "/home/olchansk/git/midas/testexpt/run00002.mid.lz4" to "run2eor.json"
...

manalyzer commit f4cbcb7426083edc9f74298965c90a3a91f461ab

K.O.
  3222   06 May 2026   Ben Smith   Bug Fix   numpy version compatibility
> There seems to be a version dependency with the numpy.bool

Thanks for reporting this Jonas! I've just updated the code to reference `np.bool_`, which is present in all versions. We use `np.bool_`
elsewhere (e.g. in midas.event), but I mistakenly used `np.bool` in the sequencer.

I just tried some sequencer tests with 1.26.0 and 2.2.6 and they seem happy now.

Cheers,
Ben
  3221   06 May 2026   Jonas A. Krieger   Suggestion   numpy version compatibility
There seems to be a version dependency on numpy.bool, e.g. as used here:
https://bitbucket.org/tmidas/midas/src/c6ef4aff5e7e652df79160141e570bed5f4d6a3b/python/midas/sequencer.py?at=develop#sequencer.py-1714

This type alias does not exist for versions from 1.24.0 up to (but not including) 2.0.0:
https://numpy.org/doc/stable/release/1.24.0-notes.html#np-str0-and-similar-are-now-deprecated

Would it be an option to specify midas-compatible numpy versions in the setup.py with extras_require?
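
To spell out which versions are affected, here is a small sketch that needs only the standard library. The helper name `has_np_bool_alias` is made up for this post, and it assumes a plain "major.minor.patch" version string.

```python
def has_np_bool_alias(numpy_version):
    """True if this NumPy version provides the `np.bool` alias.

    The alias was removed in 1.24.0 and is available again from 2.0.0.
    """
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    return (major, minor) < (1, 24) or major >= 2
```

For example, `has_np_bool_alias("1.26.0")` is False while `has_np_bool_alias("2.2.6")` is True, which is why code written against `np.bool` fails only in that middle range.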
  3220   27 Apr 2026   Pavel Murat   Bug Report   increasing the max number of hot links in ODB
> > Indeed, updating MIDAS clients on each and every RPI etc in a running experiment may be a real challenge.
> 
> actually, only local clients must be rebuilt, remote clients connecting to the mserver do not care about ODB 
> internal structure.

thanks! I see - local clients do know about the memory mapping, remote ones - don't  

> unfortunately, the "open records" structure is allocated at compile-time inside the ODB header,
> making any change to this would break binary compatibility.

right, I guess, what I had in mind would require the very first ODB record to be a format descriptor, 
and that would be a breaking change... Anyway, the practical part of the problem is addressed, 
so I just add here a link which contains an answer to the original posting (I found it only after the fact):
 
https://daq00.triumf.ca/MidasWiki/index.php/FAQ#Increasing_Number_of_Hot-links

-- thanks again, regards, Pavel
  3219   27 Apr 2026   Pavel Murat   Bug Report   increasing the max number of hot links in ODB
> > I wonder why one needs more than 256 hotlinks at all.
> 
> I confirm that ALPHA is running with MAX_OPEN_RECORDS changed from 256 to 2048,
> this is the only experiment I know of that had to increase any MIDAS ODB defaults.
> 
> The reason for this is mlogger, it opens an open record for each variable in each equipment.
> 
> This should be changed to 1 db_watch per equipment. We talked about it, but I guess we never did it.
> 
> I think this task just went almost to the top of my MIDAS to-do list.

I definitely had many more than 256 variables successfully monitored with MAX_OPEN_RECORDS=256.
Is it possible that mlogger creates a hotlink per monitoring event, not per variable ? 
- I think, that would make more sense in almost any scenario... 

-- thanks, regards, Pavel
  3218   27 Apr 2026   Pavel Murat   Bug Report   increasing the max number of hot links in ODB
> I wonder why one needs more than 256 hotlinks at all. Please note that with the odbxx "watch" API, you can hotlink a whole subdirectory, and get notified if ANY of the 
> underlying values or subdirectories change. In principle, one could have one hotlink to "/" and see all changes in the ODB (although that does not make sense and might slow 
> down ODB access a bit).

Thanks ! - I didn't know that. I did run into a number of hotlinks limit via mlogger which complained about not being able to create a hotlink 
to yet another event. Doubling the default value of MAX_OPEN_RECORDS solved the problem. 

I don't know the exact arithmetic defining the number of hotlinks in the system, but today's case is as follows: 
- 36 (linux servers) +18 (RPI) monitoring frontends managing one or several different equipment items each. 
- Each equipment item sends to ODB at least one monitoring event 
- in addition, each frontend created an individual hotlink for handling interactive commands 
- for MAX_OPEN_RECORDS=256, 4 equipment items per frontend easily make it into the dangerous zone. 

"Equipment items" also include the online processes running on the distributed computing farm processing the data .. 
(we are not using MIDAS event building capabilities)
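
As a rough back-of-the-envelope check of the figures listed above, assuming exactly one hotlink per monitoring event plus one command hotlink per frontend, and ignoring any hotlinks opened by other clients (the function name is just for illustration):

```python
MAX_OPEN_RECORDS = 256          # default value quoted in this thread

def estimated_hotlinks(frontends, equipment_per_frontend, command_links_per_frontend=1):
    """Rough count: one hotlink per monitoring event plus the per-frontend command link."""
    return frontends * (equipment_per_frontend + command_links_per_frontend)

frontends = 36 + 18             # Linux servers + RPIs
for items in (1, 2, 3, 4):
    total = estimated_hotlinks(frontends, items)
    print(items, total, "over limit" if total > MAX_OPEN_RECORDS else "ok")
```

Under these assumptions, 3 equipment items per frontend give 216 hotlinks and 4 give 270, which is past the default limit of 256.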
 
> 
> Try the odbxx_test.cpp example in MIDAS. In line 210 it puts a single hotlink to /Experiment. If you change anything under /Experiment, the program gets notified. By checking the 
> path of the changed ODB entry, it can figure out which of the subkeys have been changed:
> 
>    // watch ODB key for any change with lambda function
>    midas::odb ow("/Experiment");
>    ow.watch([](midas::odb &o) {
>       std::cout << "Value of key \"" + o.get_full_path() + "\" changed to " << o << std::endl;
>    });
> 
> 
> Maybe that would solve your problem without having to change the maximum number of hotlinks.

I'll see how much mileage one can get here, but so far it looks like it is the number of various monitoring events 
handled by the mlogger which drives the number of hotlinks 

-- thanks, regards, Pavel