ELOG Midas

Midas DAQ System, Page 3 of 152

MIDAS history system not using the event timestamps ?

> > You can always include your "true" data timestamp as the first value in your data.
> 
> Are you saying that if the first data word of a history event were a timestamp, 
> the MIDAS history system, when plotting the time dependencies, would use that timestamp 
> instead of the mlogger timestamp?
>

you are correct, midas knows nothing about what you put in the history data.

what I suggested is: if you want your true data timestamp recorded in the history,
you can put it into the history data yourself, and I suggested using the 1st value,
but you can also make it the last value or the 10th value, it is up to you.
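A minimal sketch of carrying your own data timestamp as the first value of a record. The layout (one float64 timestamp followed by three float32 readings) and the helper names are hypothetical; MIDAS attaches no meaning to the first value, so your analysis code has to know the convention.

```python
import struct
import time

# Hypothetical record layout: float64 data timestamp, then three float32 readings.
RECORD = struct.Struct("<d3f")


def pack_record(data_time, readings):
    """Serialize a record with the user-defined data timestamp first."""
    return RECORD.pack(data_time, *readings)


def unpack_record(payload):
    """Return (data_time, readings) from a packed record."""
    data_time, *readings = RECORD.unpack(payload)
    return data_time, readings


payload = pack_record(time.time(), [1.5, 2.5, 3.5])
data_time, readings = unpack_record(payload)
```

When reading the history file back for analysis, you would use `data_time` and ignore the mlogger timestamp.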

for making history plots, the history timestamp is used, as you wrote and I confirmed,
this timestamp is generated by mlogger.

what is not clear to me is why this is a problem? do you see a big difference between the 
true data timestamp and the mlogger data timestamp? bigger than 1 second? (this would change 
the shape of "last 10 minutes" plots, 600 seconds.) bigger than 1 minute? (this would change 
the shape of "last 1 hour" plots, 60 minutes or 3600 seconds.)

that said, note that we currently store the timestamp as a DWORD 32-bit UNIX time value 
which will overflow in 2038 and which is quickly becoming incompatible with the ongoing 
switch to 64-bit time_t. Ubuntu-24 already builds a large number of system libraries with 
64-bit time_t and building MIDAS with 32-bit time_t may soon become as difficult as building 
32-bit MIDAS for 32-bit i686 VME processors. we have to move with the times.
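For reference, a small sketch of where 32-bit UNIX time runs out. It only does the arithmetic: a signed 32-bit counter ends in 2038, an unsigned one would last until 2106. Which of the two applies in practice depends on how the stored value passes through time_t, which this sketch does not examine.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc_from_unix(seconds):
    """Convert UNIX seconds to an aware UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


last_signed_32 = utc_from_unix(2**31 - 1)    # largest signed 32-bit value
last_unsigned_32 = utc_from_unix(2**32 - 1)  # largest unsigned 32-bit value
```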

what it means is that the history system data format will have to be updated to 64-bit 
time_t and at the same time, we may try to change the timestamp from mlogger-generated to 
frontend-generated.

but it is still not clear to me how that helps you, because the frontend-generated timestamp 
is not the true data timestamp that you wanted. (and only you know what the true data 
timestamp is and where it comes from and how to tell it to MIDAS).

K.O.

MIDAS history system not using the event timestamps ?

> You can always include your "true" data timestamp as the first value in your data.

Are you saying that if the first data word of a history event were a timestamp, 
the MIDAS history system, when plotting the time dependencies, would use that timestamp 
instead of the mlogger timestamp?  

if that is true, what tells MIDAS that the first data word is the timestamp? 

I couldn't find a discussion of that on the page describing the history system - 

 https://daq00.triumf.ca/MidasWiki/index.php/History_System#Frontend_history_event

- perhaps I should be looking at a different page?

-- thanks again, regards, Pasha

Sequencer ODBSET feature requests

> I once looked at using LUA for this,
> but I think basing off an full featured programming language like python
> is better.

if it came to a vote, my vote would go to Lua: it would allow doing everything needed, 
with far fewer external dependencies and with much less temptation to over-use the interpreter. 
The CMS experience was very instructive in this respect... 

-- my 2c, regards, Pasha

MIDAS history system not using the event timestamps ?

> I confirm that when writing out history files corresponding to the slow control event data, 
> MIDAS history system timestamps the data not with the event time coming from the event data, 
> but with the current time determined by [mlogger].

This is correct. The timestamp in the history file is the mlogger timestamp.

In theory we could use the ODB "last_written" timestamp, but in practice,
timestamps are 1 second granularity and the difference between the two
timestamps normally would be less than 1 second. (time to react to db_watch()).

But ODB last_written also is not the data timestamp. For remote connected clients
it includes the mserver communication delay.

What is the data timestamp, only the user knows - for some FPGA based equipments,
I can see the data timestamp being read from an FPGA register together with the data.

But back to earth.

For making history plots, 1 second granularity with a small (a few seconds) delay should be okay,
and I think the mserver timestamp is good enough.

For data analysis, you are reading history data from a history data file and you are
not constrained to using the MIDAS timestamp.

You can always include your "true" data timestamp as the first value in your data.

We do this in felaview for writing labview data to midas history in the ALPHA antihydrogen experiment at CERN.

This also anticipates your next request, can we have millisecond, microsecond, nanosecond history timestamps:
since you define your "true" data timestamp, you can make it anything you want. (I use "double" time in seconds,
64-bit IEEE-754 "double" has enough precision for microsecond granularity. FPGA based devices can have timestamps
with 10 ns or 8 ns granularity, in this case a uint64_t clock counter could be more appropriate).
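A quick check of the precision claim, as a sketch: the spacing between adjacent doubles near today's UNIX time (taken here as roughly 1.7e9 s, an assumed round number) is below a microsecond but well above 8 ns. The last line estimates how long an unsigned 64-bit counter of 8 ns ticks would run before wrapping.

```python
import math

now = 1.7e9                 # assumed: approximate current UNIX time in seconds
spacing = math.ulp(now)     # gap between adjacent doubles near `now`, in seconds

tick = 8e-9                 # 8 ns FPGA clock period, from the text above
counter_years = 2**64 * tick / (365.25 * 86400)  # time until a uint64 tick counter wraps
```

`math.ulp` needs Python 3.9 or newer.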

K.O.

Sequencer ODBSET feature requests

> ODBSET "/Path/value[1,3,5]"
> ODBSET "/Path/value[1-5,7-9]"

we support this array index syntax in several places,
specifically, in javascript odb get and set mjsonrpc RPCs.
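To make the index-list syntax concrete, here is a minimal sketch of expanding such a specification into explicit indices. This is not the MIDAS parser, just an illustration; the function name is made up.

```python
def expand_indices(spec):
    """Expand an index list such as "1-5,7,9" into [1, 2, 3, 4, 5, 7, 9]."""
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if last < first:
                raise ValueError(f"descending range: {part!r}")
            indices.extend(range(first, last + 1))
        else:
            indices.append(int(part))
    return indices
```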

> SET GOODCHANNELS, "1-5,7,9"; ODBSET "/Path/value[$GOODCHANNELS]"
> SET BADCHANNELS, "6,8"; ODBSET "/Path/value[!$BADCHANNELS]"
> ODBSET "/Path/value[0-100, except $BADCHANNELS]"

this is very clever syntax, but I have not seen any programming
language actually implement it (not even perl).

there must be a good reason why nobody does this. probably we should not do it either.

but as Stefan said (and it is my opinion too), extending the MIDAS sequencer
language until it becomes a superset of python, perl, tcl, bash, javascript
and algol is not a sustainable approach. I once looked at using LUA for this,
but I think basing it on a full-featured programming language like python
is better.

K.O.

We see ODB corruption crashes in the DS20k vertical slice MIDAS instance.

Crash is memset() called by db_delete_key1() called by cm_connect_experiment().

I look at the source code and I see that ODB pkey and hkey validation is absent
from most iterators and it is possible for "bad" pkey to cause corruption. Many
other places in the ODB code use db_get_pkey() and db_validate_hkey() to prevent
invalid data from causing further corruption and breakage.

Also db_delete_key1() needs to be refactored and renamed db_delete_key_wlocked().

I will not do this immediately today, but hopefully next week or so.

Stack trace is attached, observe how free_data() was called on a completely invalid pkey,
bad pkey->type, bad pkey sizes, etc.

#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:250
250	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
(gdb) bt
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:250
#1  0x00005ad4102b4217 in memset (__len=<optimized out>, __ch=0, __dest=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59
#2  free_data (pheader=pheader@entry=0x75aaea4f4000, address=0x75aaed4cea50, size=<optimized out>, caller=caller@entry=0x5ad4102ffb6c "db_delete_key1") at /home/dsdaqdev/packages_common/midas/src/odb.cxx:513
#3  0x00005ad4102b6a5b in free_data (caller=0x5ad4102ffb6c "db_delete_key1", size=<optimized out>, address=<optimized out>, pheader=0x75aaea4f4000) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:453
#4  db_delete_key1 (hDB=1, hKey=<optimized out>, level=<optimized out>, follow_links=0) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:3789
#5  0x00005ad4102b6979 in db_delete_key1 (hDB=1, hKey=288672, level=0, follow_links=0) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:3731
#6  0x00005ad4102cc923 in db_create_record (hDB=hDB@entry=1, hKey=hKey@entry=0, orig_key_name=orig_key_name@entry=0x7ffd75987280 "/Programs/ODBEdit", init_str=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:12916
#7  0x00005ad4102cca73 in db_create_record (hDB=hDB@entry=1, hKey=hKey@entry=0, orig_key_name=orig_key_name@entry=0x7ffd75987280 "/Programs/ODBEdit", init_str=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:12942
#8  0x00005ad4102a00ba in cm_set_client_info (hDB=1, hKeyClient=0x7ffd75987420, host_name=0x5ad412262ee0 "dsdaqgw.triumf.ca", client_name=0x7ffd759874c0 "ODBEdit", hw_type=<optimized out>, password=<optimized out>, watchdog_timeout=<optimized out>) at /usr/include/c++/11/bits/basic_string.h:194
#9  0x00005ad4102a902b in cm_connect_experiment1 (host_name=<optimized out>, host_name@entry=0x7ffd759876c0 "", default_exp_name=default_exp_name@entry=0x7ffd759876a0 "vslice", client_name=client_name@entry=0x5ad4102f28fa "ODBEdit", func=func@entry=0x0, odb_size=odb_size@entry=1048576, watchdog_timeout=<optimized out>, watchdog_timeout@entry=10000) at /usr/include/c++/11/bits/basic_string.h:194
#10 0x00005ad41027e58d in main (argc=3, argv=0x7ffd759881e8) at /home/dsdaqdev/packages_common/midas/progs/odbedit.cxx:3025
(gdb) up
...
#4  db_delete_key1 (hDB=1, hKey=<optimized out>, level=<optimized out>, follow_links=0) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:3789
3789	            free_data(pheader, (char *) pheader + pkey->data, pkey->total_size, "db_delete_key1");
(gdb) p pkey
$1 = (KEY *) 0x75aaea53b400
(gdb) p *pkey
$2 = {type = 1684370529, num_values = 0, name = '\000' <repeats 16 times>, "xQ\375\002\004\000\000\000\004\000\000\000\a\000\000", data = 0, total_size = 290944, item_size = 1743544378, access_mode = 0, notify_count = 0, next_key = 15, parent_keylist = 1, last_written = 1953785965}
(gdb) 
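
One quick sanity check on a dump like this is to reinterpret the suspicious integer fields as raw little-endian bytes (x86_64). The values below are copied from the `p *pkey` output above; the helper names are made up. Two of the fields decode to printable ASCII, which is consistent with the key area having been overwritten by string data, though this is only a heuristic and not proof of the cause.

```python
import struct

# Field values copied from the gdb dump of *pkey above.
fields = {
    "type": 1684370529,
    "item_size": 1743544378,
    "last_written": 1953785965,
}


def as_le_bytes(value):
    """Return the 4 bytes of a 32-bit value as laid out in x86_64 memory."""
    return struct.pack("<I", value)


def looks_like_text(raw):
    """True if every byte is printable ASCII."""
    return all(32 <= b < 127 for b in raw)


decoded = {name: as_le_bytes(value) for name, value in fields.items()}
textual = [name for name, raw in decoded.items() if looks_like_text(raw)]
```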

K.O.

ODB and event buffer - release semaphore before abort() and core dump

There is a long standing problem with ODB and event buffers. If they detect an 
internal data inconsistency and cannot continue running, they call abort() to 
dump core and stop.

Problem is in some code paths, they do this while holding the ODB or event 
buffer semaphore. (Linux kernel automatically releases SYSV semaphores after 
core dump is finished and program holding them is stopped).

If core dump takes longer than 10 seconds (for whatever reason, but we see this 
often enough), all other programs that wait for ODB or event buffer access will 
also time out and also crash (with core dump). Result is a core dump storm, at 
the end all MIDAS programs are crashed. (Luckily recovery is easy, simply 
restart everything).

Now I realize that in many situations, we do not need to hold the semaphore while 
dumping core - the content of ODB and event buffer shared memories is not 
important for debugging the crash - and it is safe to release the semaphore 
before calling abort().

This is now implemented for ODB and event buffers. Hopefully core dump storms 
will not happen again.
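
The pattern, as a language-neutral sketch (this is not the MIDAS code; a `threading.Lock` stands in for the SYSV semaphore and `abort` is passed in as a callable so the example can run without killing the interpreter):

```python
import threading


def validate_or_abort(lock, is_consistent, abort):
    """Call with `lock` held.

    If the consistency check passes, return with the lock still held.
    Otherwise release the lock first, then call abort(), so that other
    waiters are not blocked while the (possibly slow) core dump is written.
    """
    if is_consistent():
        return
    lock.release()
    abort()


# Typical use (commented out because os.abort() terminates the process):
#   import os
#   lock.acquire()
#   validate_or_abort(lock, check_shared_memory, os.abort)
```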

commit 96369c29deba1752fd3d25bed53e6594773d7e1a
release ODB semaphore before calling abort() to dump core. if core dump takes 
longer than 10 sec all other midas programs will timeout and crash.

commit 2506406813f1e7581572f0d5721d3761b7c8e8dd
unlock event buffer before calling abort() in bm_validate_client_index_locked(), 
refactor bm_get_my_client_locked()


K.O.

Sequencer ODBSET feature requests

The extended ODBSET[x,y1-y2,z] could make sense to be implemented, since it then will match the alarm system which uses the same syntax.

The $GOODCHANNELS/$BADCHANNELS is however a very strange syntax which I haven't seen in any other computer language. It would take me probably several days to properly implement this, while it would take you much less time to explicitly use a few ODBSET statements to set the bad channels to zero.

For the file edit workflow, the author of the editor will have a look.

Stefan

Lukas Gerritzen wrote:

I would like to request the following sequencer features if you find the ideas as sensible as I do:

- A "Reload File" button
- Support for patterns in ODBSET, e.g.:
  - ODBSET "/Path/value[1,3,5]", 1
  - ODBSET "/Path/value[1-5,7-9]", 1
  - Arbitrary combinations of the above
- Support for variable substitution:
  - SET GOODCHANNELS, "1-5,7,9"; ODBSET "/Path/value[$GOODCHANNELS]", 1
  - SET BADCHANNELS, "6,8"; ODBSET "/Path/value[!$BADCHANNELS]", 1
  - ODBSET "/Path/value[0-100, except $BADCHANNELS]", 1

Sequencer ODBSET feature requests

A new sequencer which understands Python is in the works. There you can use all features from that language.

Stefan

Sequencer ODBSET feature requests

While trying to simplify the existing spaghetti code, I encountered problems with type safety. Compare the following:

SET v, "54"
SET file, "MPPCHV_$v.odb"
ODBLOAD $file

-> successfully loads MPPCHV_54.odb

SET v, "54.2"
SET file, "MPPCHV_$v.odb"
ODBLOAD $file

-> Error reading file "[...]/MPPCHV_54.200000.odb"

The "54.2" appears to be stored as a float rather than a string. Maybe "54" was stored as an integer? I don't know how to verify this in odbedit.

Actually, I would be fine with setting the value as a float, as it allows arithmetic. In that case, I would appreciate something like a SPRINTF function in MSL:

SET v, 54.2
SPRINTF file, "MPPCHV_%f.odb", $v
ODBLOAD $file

Or, maybe a bit more modern, something akin to Python's f-strings

ODBLOAD f"MPPCHV_{v:.1f}.odb"
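
For what it's worth, the "54.200000" in the error message is exactly what printf-style "%f" with default precision produces for 54.2, which fits the guess that the value is formatted as a float somewhere along the way (the sequencer internals are not examined here). A small illustration of the difference:

```python
v = 54.2

default_f = "MPPCHV_%f.odb" % v        # default %f precision is six decimals
one_decimal = "MPPCHV_%.1f.odb" % v    # explicit precision of one decimal
f_string = f"MPPCHV_{v:.1f}.odb"       # the f-string form from the request above
```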

Sequencer ODBSET feature requests

I would like to request the following sequencer features if you find the ideas as sensible as I do:

- A "Reload File" button
- Support for patterns in ODBSET, e.g.:
  - ODBSET "/Path/value[1,3,5]", 1
  - ODBSET "/Path/value[1-5,7-9]", 1
  - Arbitrary combinations of the above
- Support for variable substitution:
  - SET GOODCHANNELS, "1-5,7,9"; ODBSET "/Path/value[$GOODCHANNELS]", 1
  - SET BADCHANNELS, "6,8"; ODBSET "/Path/value[!$BADCHANNELS]", 1
  - ODBSET "/Path/value[0-100, except $BADCHANNELS]", 1

To add some context: I am using the sequencer for a voltage scan of several thousand channels. However, a few dozen of them have shorts, so I cannot simply set all demands to the voltage step. Currently, this is solved with a manually-created ODB file for each individual voltage step, but as you can imagine, this is quite difficult to maintain.
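
Until something like this exists, one possible stopgap is to generate the sequencer script instead of maintaining ODB files by hand. The sketch below emits one single-index ODBSET statement per good channel; the function name is made up, "/Path/value" is the placeholder path used above, and for several thousand channels the resulting script will be long.

```python
def odbset_lines(path, channels, bad, value):
    """Return one ODBSET statement per channel, skipping the bad ones."""
    skip = set(bad)
    return [f'ODBSET "{path}[{ch}]", {value}' for ch in channels if ch not in skip]


# Example: channels 0..9 at 54.2, leaving out channels 6 and 8.
script = "\n".join(odbset_lines("/Path/value", range(10), [6, 8], 54.2))
```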

I also encountered a small annoyance in the current workflow of editing sequencer files in the browser:

1. Load a file
2. Double-click it to edit it, acknowledge the "To edit the sequence it must be opened in an editor tab" dialog
3. A new tab opens
4. Edit something, click "Start", acknowledge the "Save and start?" dialog (which pops up even if no changes are made)
5. Run the script
6. Double-click to make more changes -> another tab opens

After a while, many tabs with the same file are open. I understand this may be considered "user error", but perhaps the sequencer could avoid opening redundant tabs for the same file, or prompt before doing so?

Thanks for considering these suggestions!

MIDAS history system not using the event timestamps ?

Dear MIDAS experts, 

I confirm that when writing out history files corresponding to the slow control event data, 
MIDAS history system timestamps the data not with the event time coming from the event data, 
but with the current time determined by the program - 

https://bitbucket.org/tmidas/midas/src/293d27fad0c87c80c4ed7b94b5c40ba1e150bea4/progs/mlogger.cxx#lines-5321

where 'now' is defined as  

time_t now = time(NULL);

I'm looking for a way to timestamp the history data with the event time - that is important 
for HEP applications outside the DAQ domain. Yes, MIDAS infrastructure is very well suited for that, 
there could be a number of such applications, and experiments could significantly benefit from that.

So I'm wondering whether the implementation is a deliberate design choice or whether it could be changed. 

The change itself and especially its validation may require a non-negligible amount of work - I'd be happy to contribute.

Any insight much appreciated. 

-- thanks, regards, Pasha

manalyzer improvements

updated manalyzer:

- similar to --jsroot switch, in online mode, the ROOT output file remains open after run is stopped. Previously, after run was 
stopped, all histograms & etc would disappear from JSROOT, making it hard to look at the full collected and analyzed data.

- there was a buglet in the multithreading code, if some module cannot analyze flow events as fast as we can read data from disk, 
the flow event queue of the first module thread would grow and grow and grow without bound, potentially consuming lots of RAM. This is 
because control of queue size for the first module thread was disabled to avoid a deadlock. I now added the queue size check to the 
main event loop (both offline mode and online mode) and this problem should now be fixed.

- also adjusted the default queue size from 100 to 1000 and queue-full wait sleep time from 100 us to 10 us.

- another buglet was in the flow event processing. per the README, module EndRun() should not generate flow events (instead, they 
should be generated in PreEndRun()). Previously this was not enforced, now there is an error message about this and the offending 
flow events are deleted. (they were not being processed anyway).
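
The queue size check described above amounts to simple backpressure: the producer polls and sleeps while the queue is full. A minimal sketch of that idea follows. It is not the manalyzer code; the class name is made up, and only the defaults (queue size 1000, 10 us wait) are taken from the text.

```python
import collections
import threading
import time


class BoundedFlowQueue:
    """FIFO whose push() waits (by polling) while the queue is full."""

    def __init__(self, max_size=1000, wait_s=10e-6):
        self.max_size = max_size
        self.wait_s = wait_s
        self._items = collections.deque()
        self._lock = threading.Lock()

    def __len__(self):
        with self._lock:
            return len(self._items)

    def push(self, item):
        """Enqueue `item`, sleeping wait_s between attempts while full."""
        while True:
            with self._lock:
                if len(self._items) < self.max_size:
                    self._items.append(item)
                    return
            time.sleep(self.wait_s)

    def pop(self):
        """Return the oldest item, or None if the queue is empty."""
        with self._lock:
            return self._items.popleft() if self._items else None
```

With this in place a slow consumer throttles the reader instead of letting the queue grow.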

K.O.

manalyzer -R8082 --jsroot

When processing MIDAS files offline, JSROOT did not work, -Rxxx worked, http 
connection would open, but would not serve any histograms. This should now be 
fixed.

In addition, normally, after processing all input MIDAS files, manalyzer would 
exit, JSROOT would abruptly stop. To look at final results one had to open the 
ROOT files using some other method (roody, TBrowser, mjsroot, etc).

I now added a command line switch "--jsroot", if supplied, after processing all 
input MIDAS files, manalyzer will keep running in the JSROOT server mode (same as 
mjsroot).

"manalyzer -R8082 --jsroot run*.mid.lz4" now does something useful: open 
http://localhost:8082 (or ssh tunnel or mhttpd proxy per my mjsroot message) and 
watch histograms fill in real time, after analysis finishes, keep looking at the 
final results until bored. stop manalyzer using Ctrl-C. (we should add a "Stop 
JSROOT" button to the JSROOT main page).

MIDAS commit 1d0d6448c3ec4ffd225b8d2030fe13e379fcd007

K.O.

improved find_package behaviour for Midas

I figured out the breakage, added a git tag to identify where the cmake incompatible change was made (roughly) 
and posted a note on how to fix it. Please reimburse me for the 2 hours I had to spend on this instead of doing 
useful work. K.O.

MIDAS git tag midas-2025-01-a introduced an incompatible change to "include midas-targets.cmake". Instead of "midas" one now has to 
say "midas::midas", as updated below. K.O.

> 
> #
> # CMakeLists.txt for alpha-g frontends
> #
> 
> cmake_minimum_required(VERSION 3.12)
> project(agdaq_frontends)
> 
> include($ENV{MIDASSYS}/lib/midas-targets.cmake)
> 
> add_compile_options("-O2")
> add_compile_options("-g")
> #add_compile_options("-std=c++11")
> add_compile_options(-Wall -Wformat=2 -Wno-format-nonliteral -Wno-strict-aliasing -Wuninitialized -Wno-unused-function)
> add_compile_options("-DTMFE_REV0")
> add_compile_options("-DOS_LINUX")
> 
> add_executable(feevb feevb.cxx TsSync.cxx)
> target_link_libraries(feevb midas::midas)
> 
> add_executable(fectrl fectrl.cxx GrifComm.cxx EsperComm.cxx JsonTo.cxx KOtcp.cxx $ENV{MIDASSYS}/src/tmfe_rev0.cxx)
> target_link_libraries(fectrl midas::midas)
> 
> #end

I need to look at histograms inside a ROOT file, but all the old ways for doing this no longer work. (in theory I can scp the ROOT file to 
the computer I am sitting in front of, but this assumes I have a working ROOT there. anyhow it is pointless to fight this, all modern 
packages are written to only work on the developer's laptop).

- root new TBrowser starts a web server, tries to open firefox (and fails)
- root --web=off new TBrowser using ssh X11 tunnel no longer works, ROOT X11 graphics refresh is broken
- macos root binary kit is built without X11 support, root --web=off does not work at all
- root7 recommended "rootssh" prints an error message (and fails)

What does work well is JSROOT which we use to look at manalyzer live histograms (through apache and mhttpd web proxies).

So I wrote mjsroot.exe. It opens a ROOT file and starts JSROOT to look at it (plus a bit of dancing around to make it actually work):

mjsroot.exe -R8082 root_output_files/output00371.root

To actually see the histograms:

a) if you are sitting in front of the same computer, open http://localhost:8082
b) if you are somewhere else, start an ssh tunnel: ssh daq13 -L8082:localhost:8082, open http://localhost:8082
c) if daq13 is running mhttpd, set up an http proxy:
   set ODB /webserver/proxy/mjsroot to http://localhost:8082
   open https://daq13.triumf.ca/proxy/mjsroot/
   also
   set ODB /alias/mjsroot to "/proxy/mjsroot/"
   reload MIDAS status page, observe that "mjsroot" is listed in the left-hand side, open it.

K.O.

TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file

I do not understand what you are doing. If you are offline, there is no TMFE singleton instance,
there is nothing TMFeRpcHandlerInterface to attach to, there is nobody to call TMFeRpcHandlerInterface methods.

Maybe what you are asking for is a mode where you analyze data from a file, but you want your analysis code
to think that it is online and that it is analyzing live data. This requires creating a fake TMFE singleton, attaching
TMFeRpcHandlerInterface to this fake TMFE singleton and using ProcessMidasOnlineTmfe(), driven by all this fake stuff:
a fake OpenBuffer() that actually opens a file, a fake ReceiveEvent() that actually reads from a file, fake callbacks
for begin and end run, etc.

That's a lot of work, but for what purpose? What is it about the existing offline and online modes that you do not like
and how all this fake stuff will make it better for you?

P.S. This is the 3rd version of my reply. Wrote and deleted 2 versions. I think I completely do not understand
what you are doing and you completely do not understand what I am saying. Communication is not happening.

P.P.S. Simplest if you show me your code (email, elog), I am quite good at reading code and divining what
people are trying to do. You do not have to show me any of your secret secret stuff.

K.O.

> This was exactly the question, should I expect it to run?  There's no point in the HandleBinaryRpc method offline, but there's an argument that the HandleBeginRun/HandleEndRun methods have a use.
> I have the answer and we have a workaround, thanks.
> 
> > then I do not understand the question. TMFeRpcHandlerInterface stuff is only used when running online and connected to MIDAS. How does it come into the 
> > picture when you analyze a data file offline? ProcessMidasOnlineTmfe() does not run, the RpcHandler object is not constructed.
> > 
> > maybe if you point me to your source code, I can see what you are doing?
> > 
> > K.O.

TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file

This was exactly the question, should I expect it to run?  There's no point in the HandleBinaryRpc method offline, but there's an argument that the HandleBeginRun/HandleEndRun methods have a use.
I have the answer and we have a workaround, thanks.

> then I do not understand the question. TMFeRpcHandlerInterface stuff is only used when running online and connected to MIDAS. How does it come into the 
> picture when you analyze a data file offline? ProcessMidasOnlineTmfe() does not run, the RpcHandler object is not constructed.
> 
> maybe if you point me to your source code, I can see what you are doing?
> 
> K.O.

Default write cache size for new equipments breaks compatibility with older equipments

> > All this is kind of reasonable, as only two settings of write cache size are useful: 0 to 
> > disable it, and 10 Mbytes to limit semaphore locking rate to reasonable value for all event 
> > rate and size values practical on current computers.
> 
> Indeed KO is correct that only 0 and 10MB make sense, and we cannot mix it. Having the cache setting in the equipment table is 
> cumbersome. If you have 10 slow control equipment (cache size zero), you need to add many zeros at the end of 10 equipment 
> definitions in the frontend. 
> 
> I would rather implement a function or variable similar to fEqConfWriteCacheSize in the tmfe framework also in the mfe.cxx 
> framework, then we need only to add one line like
> 
> gEqConfWriteCacheSize = 0;
> 
> in the frontend.cxx file and this will be used for all equipments of that frontend. If nobody complains, I will do that in April when I'm 
> back from Japan.

Cache size is per-buffer. If different equipments write into different event buffers, should be possible to set different cache sizes.

Perhaps have:

set_write_cache_size("SYSTEM", 0);
set_write_cache_size("BUF1", bigsize);

with an internal std::map<std::string,size_t> holding the write cache size for each named buffer
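
A rough sketch of that idea, written in Python for brevity (a dict plays the role of the std::map). The getter and the default value are assumptions for illustration: the default stands for the "10 Mbytes" setting quoted above, and the exact byte count is a guess.

```python
# Assumed default for buffers with no explicit setting ("10 Mbytes" in the text above).
DEFAULT_WRITE_CACHE_SIZE = 10_000_000

_write_cache_sizes = {}  # buffer name -> write cache size in bytes


def set_write_cache_size(buffer_name, size):
    """Record the write cache size for one named event buffer (0 disables it)."""
    if size < 0:
        raise ValueError("cache size must be >= 0")
    _write_cache_sizes[buffer_name] = size


def get_write_cache_size(buffer_name):
    """Return the configured size, or the default if none was set (hypothetical helper)."""
    return _write_cache_sizes.get(buffer_name, DEFAULT_WRITE_CACHE_SIZE)


set_write_cache_size("SYSTEM", 0)
```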

K.O.
