Back Midas Rome Roody Rootana
  Midas DAQ System, Page 110 of 139  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
    Reply  16 May 2022, Konstantin Olchanski, Info, RO_STOPPED with triggered events 
> > some old front-end are not running any more since they do use RO_ALWAYS together with 
> triggered events.
> 
> I confirm, if you have mfe.c frontends that have RO_ALWAYS, after you update MIDAS, 
> some of these frontends will fail to start.
> https://bitbucket.org/tmidas/midas/commits/1961af0d657e4f76ab9db17f9b70c0c492172b6d
> 
> tmfe c++ frontends do not have this restriction but by default only read data when run 
> is active (per-equipment fEqConfReadOnlyWhenRunning default is true).

As of commit 
https://bitbucket.org/tmidas/midas/commits/28d9c96bd6d4f65346ebcd6a04492ea764c90823 mfe.c 
frontends will no longer fail to start. an error will still be issued "Equipment \"%s\" 
contains RO_STOPPED or RO_ALWAYS. This can lead to undesired side-effect and should be 
removed."

BTW 1:

Some of our old frontends use EQ_MULTITHREAD to implement multithreaded periodic equipments. 
They do not generate any events when there is no run (some of them do not generate any 
events at all). Now they will start printing this error message, for no reason. (no we will 
not be rewriting them justy to get rid of this message. life is too short).

BTW 2:

the c++ tmfe frontend does not have any protections against these "undersired side-effects".

What are these undesired side effects and should we add protection against them?

K.O.
    Reply  16 May 2022, Konstantin Olchanski, Info, analysis of corner cases in event buffer write cache 
> for correct operation of bm_send_event() under all conditions we need to ...

to continue computation from last message:

default SYSTEM buffer size: 32 MiBytes
default max event size: 4 MiBytes

hard max buffer size: 2 Gbytes (code is only 31-bit-clean)
hard max event size: 2 Gbytes (code is only 31-bit-clean)

max event size currently: 32 Mbytes (same as buffer size)
max event size per (1) in previous post: 32*0.5..0.3 = 16..9 MiBytes

number of default-max-size events buffered: 32/4 = 8.
number of per (1) max-size events buffered: 2 or 3
number of current max-size events buffered: 0 (bad, frontend is serialized with mlogger)

default write cache size: 100 kbytes

max write cache size currently: buffer size / 4 = 32/4 = 8 MiBytes
max write cache size per (3) in previous post: buffer_size / 3 = 10 Mbytes
hard max write cache size per (3): 2 Gbytes/3 = 600 Mbytes

max size of cached events:

current: 100 kbytes (size as cache size)
per (2) in previous post: 0.1..0.3 * cache size = 10..30 kbytes
per (2), 1 Mbyte cahe: 0.1..0.3 * cache size = 100..300 kbytes
hard max size: 0.1..0.3 * hard_max_cache_size = 0.1..0.3 * 600 = 60..180 Mbytes.

max data rate before event buffer semaphore locking rate exceeds 100 Hz:

1 kbyte events, no write cache: 100 kbytes/sec
1 kbyte events, 100 kbyte cache: 100 events cached, cache flush rate 100 Hz -> 100*1kbyte*100Hz -> 10 Mbytes/sec
1 kbyte events, 1 Mbyte cache: 1000 events cached, cache flush rate 100 Hz -> 100 Mbytes/sec (1gige ethernet)
N kbyte events, 1 Mbyte cache: same thing (data rate is limited by cache flush rate 100 Hz)
100 kbyte events, 1 Mbyte cache, not cached per (2): 100kbyte*100Hz = 10 Mbytes/sec
300 kbyte events, 1 Mbyte cache, not cached per (2): 300kbyte*100Hz = 30 Mbytes/sec
N00 kbyte events: N0 Mbytes/sec (500->50, etc)
1 kbyte events, 10 Mbyte cache: 10000 events cached, cache flush rate 100 Hz -> 1000 Mbytes/sec (10gige ethernet)
N kbyte events, 10 Mbyte cache: same thing (data rate is limited by cache flush rate 100 Hz)
1000 kbyte events, 10 Mbyte cache, not cached per (2): 1000kbyte*100Hz = 100 Mbytes/sec
3000 kbyte events, 10 Mbyte cache, not cached per (2): 3000kbyte*100Hz = 300 Mbytes/sec
N000 kbyte events: N00 Mbytes/sec (4000->400, 5000->500, etc)
default max event size: 4 Mibytes*100Hz = 400 Mbytes/sec (exceeds 1gige ethernet)
hard max event size (divided by 10 to buffer 10 events): 200 Mbytes*100Hz -> 20 Gbytes/sec

max event rate before event buffer semaphore locking rate exceeds 100 Hz:

1 kbyte events, no write cache: 100 Hz (obviously)
1 kbyte events, 100 kbyte cache: 100 events cached, cache flush rate 100 Hz -> 10 kHz
1 kbyte events, 1 Mbyte cache: 1000 events cached, cache flush rate 100 Hz -> 100 kHz
N kbyte events, 1 Mbyte cache: 1000/N events cached, cache flush rate 100 Hz -> 100/N kHz
1 kbyte events, 10 Mbyte cache: 10000 events cached, cache flush rate 100 Hz -> 1000 kHz
N kbyte events, 10 Mbyte cache: 10000/N events cached, cache flush rate 100 Hz -> 1000/N kHz
100 kbyte events, not cached per (2): 100 Hz (obviously)
300 kbyte events, not cached per (2): 100 Hz (obviously)
default max event size: 100 Hz (obviously)

K.O.
    Reply  16 May 2022, Konstantin Olchanski, Info, analysis of corner cases in event buffer write cache 
> > for correct operation of bm_send_event() under all conditions we need to ...
> to continue computation from last message:

if I got my numbers right, for present-day hardware (1gige/10gige data rates, 100 Hz max locking rate), we should 
increase the default buffer write cache size from 100 kbytes to 10 Mbytes.

this cache size will permit processing of the full mix of small/big events
at the full mix of event rates without exceeding the 100 Hz semaphore locking rate.

with the 10 Mbyte write cache, default event buffer size should be 30-40 Mbytes (current size is 33 Mbytes, so does 
not need to change).

this computation is for 1 writer (1 reader, mlogger). it is a typical case for our experiments.

multiple writers can run into contention for event buffer space.

consider 10 writers want to flush their 10 Mbyte write cache all at the same time:

if buffer size is the default 33 Mbytes, the first 3 writers will have successful write cache flush,
but the other 7 will stall, there is no space in the buffer, we have to wait for mlogger to free
some (mlogger writing X Mbytes/sec will take Y milliseconds to liberate 10 Mbytes of space for the 4th writer
to successfully flush, writers 5..10 are still stalled).

but in a system with 10 writers writing at 10 Mbytes/sec (1 Hz default cache flush rate) is 100 Mbytes/sec
will likely have SYSTEM buffer size at least 200-300 Mbytes (to buffer 1-2 seconds of data against
any delays in writing to disk/network storage).

so there should be no problem in practice.

K.O.
    Reply  16 May 2022, Konstantin Olchanski, Bug Fix, mserver buffer overrun and crash 
> There is a memory allocation bug in the mserver.

Fix for this problem introduced a new problem, an infinite loop in bm_flush_cache, 
bitbucket bugs https://bitbucket.org/tmidas/midas/issues/339/infinite-loop-in-
mserver-due-to-mfes and https://bitbucket.org/tmidas/midas/issues/331/stuck-
semaphore-of-system-buffer

This is now fixed and the buffer write cache logic and size was rejigged
according to calculations in https://daq00.triumf.ca/elog-midas/Midas/2401

Event buffer write cache (as set via ODB Equipment/Common and via 
bm_set_cache_size()) now take 2 possible values:
0 - write cache is disabled and
MIN_WRITE_CACHE_SIZE - (10 Mbytes) minimum permitted cache size
bigger cache size values are permitted, up to buffer_size/3, but probably not useful 
if my calculations are right.
smaller cache size values are generally not useful, if my calculations are right.

mfe.c and tmfe c++ frontends updated to request the new write cache size by default.

if events are getting stuck in the write cache for too long, instead of reducing the 
cache size, one should increase frequency of bm_flush_cache() calls (1/sec by 
default).

commit 373bcc3ab7f83c3c7bf6c051c237de043a982502

K.O.
    Reply  17 May 2022, Konstantin Olchanski, Info, MIDAS switched to C++ 
> Hi, I have three naive questions about this:

all good questions, ask more of them.

>  - have you posted somewhere this guide about converting C frontends to C++?

yes, in this elog here I posted a guide for converting C mfe.c frontends to C++ and
a guide for converting mfe.c frontend to C++ TMFE frontend. please use the "find" function,
if you cannot find them, let me know, I will look for it for you.

>  - it was mentioned previously that there will be a 'tag the last "C" midas', which version is it?

correct. please run "git tag", tags before "midas-2019-05-cxx"is "C", after is "C++".

>  - it means that even a simple example like odb_test.c cannot be compile anymore? Even when using g++?
> g++ -I $HOME/daq/packages/midas/include/ -L $HOME/daq/packages/midas/lib/ odb_test.c -l midas
> is expected to fail or is just me glitching? Is it because of thread library differences?

yes, it is expected to fail, you have spaces after "-I", "-L" and "-l", incorrect g++ command syntax. after
correcting this, it may or may not work depending on what you have inside odb_test.c. I would be happy
to help you debug this, but please start a separate thread instead of necroposting into the C++ announcements.

K.O.
    Reply  18 Jul 2022, Konstantin Olchanski, Bug Report, RPC timeout for manalyzer over network 
> In ALPHA, I get RPC timeouts running a (reasonably heavy) analyzer on a remote machine (connected directly via a ~30 meter 10Gbe Ethernet cable) after ~5 minutes of running. If I run the analyser locally, I dont not see a timeout...

there is a subtle bug in the mserver. under rare conditions, ss_suspend() will recurse in an unexpected way
and mserver will go to sleep waiting for data from a udp socket (that will never arrive, so sleep forever).
remote client will see it as an rpc timeout. in my tests (and in ALPHA-g at CERN, as reported by Joseph),
I see this rare condition to happen about every 5 minutes. in normal use, this is the first time we become
aware of this problem, the best I can tell this bug was in the mserver since day one.

commit https://bitbucket.org/tmidas/midas/commits/fbd06ad9d665b1341bd58b0e28d6625877f3cbd0
to develop and
to release/midas-2022-05

The stack trace that shows the mserver hang/crash (sleep() is the stand-in for the sleep-forever socket read).

(gdb) bt
#0  0x00007f922c53f9e0 in __nanosleep_nocancel () from /lib64/libc.so.6
#1  0x00007f922c53f894 in sleep () from /lib64/libc.so.6
#2  0x0000000000451922 in ss_suspend (millisec=millisec@entry=100, msg=msg@entry=1) at /home/agmini/packages/midas/src/system.cxx:4433
#3  0x0000000000411d53 in bm_wait_for_more_events_locked (pbuf_guard=..., pc=pc@entry=0x7f920639b93c, timeout_msec=timeout_msec@entry=100, 
    unlock_read_cache=unlock_read_cache@entry=1) at /home/agmini/packages/midas/src/midas.cxx:9429
#4  0x00000000004238c3 in bm_fill_read_cache_locked (timeout_msec=100, pbuf_guard=...) at /home/agmini/packages/midas/src/midas.cxx:9003
#5  bm_read_buffer (pbuf=pbuf@entry=0xdf8b50, buffer_handle=buffer_handle@entry=2, bufptr=bufptr@entry=0x0, buf=buf@entry=0x7f9203d75020, 
    buf_size=buf_size@entry=0x7f920639aa20, vecptr=vecptr@entry=0x0, timeout_msec=timeout_msec@entry=100, convert_flags=0, 
    dispatch=dispatch@entry=0) at /home/agmini/packages/midas/src/midas.cxx:10279
#6  0x0000000000424161 in bm_receive_event (buffer_handle=2, destination=0x7f9203d75020, buf_size=0x7f920639aa20, timeout_msec=100)
    at /home/agmini/packages/midas/src/midas.cxx:10649
#7  0x0000000000406ae4 in rpc_server_dispatch (index=11111, prpc_param=0x7ffcad70b7a0) at /home/agmini/packages/midas/progs/mserver.cxx:575
#8  0x000000000041ce9c in rpc_execute (sock=10, buffer=buffer@entry=0xe11570 "g+", convert_flags=0)
    at /home/agmini/packages/midas/src/midas.cxx:15003
#9  0x000000000041d7a5 in rpc_server_receive_rpc (idx=idx@entry=0, sa=0xde6ba0) at /home/agmini/packages/midas/src/midas.cxx:15958
#10 0x0000000000451455 in ss_suspend (millisec=millisec@entry=1000, msg=msg@entry=0) at /home/agmini/packages/midas/src/system.cxx:4575
#11 0x000000000041deb2 in rpc_server_loop () at /home/agmini/packages/midas/src/midas.cxx:15907
#12 0x0000000000405266 in main (argc=9, argv=<optimized out>) at /home/agmini/packages/midas/progs/mserver.cxx:390
(gdb) 

K.O.
Entry  18 Jul 2022, Konstantin Olchanski, Release, midas-2022-05 
There is a release branch for midas-2022-05 and corresponding git tag midas-2022-
05-b. This branch is known to be stable and is working well for the ALPHA 
experiment at CERN. Latest update to this branch fixes two problems in the 
mserver (rpc timeout and a use-after-free internal error).

https://bitbucket.org/tmidas/midas/branch/release/midas-2022-05
K.O.
Entry  08 Aug 2022, Konstantin Olchanski, Info, midas on ubuntu LTS 22.04 
reporting that as of commit 78f707c0686d22f8329c7a1f1c46d7dccf35ceff, midas builds 
without errors or warnings on Ubuntu LTS 22.04, 20.04, CentOS-7 and MacOS 12.4. 
(except for some warnings from mscb and msc). K.O.
    Reply  08 Aug 2022, Konstantin Olchanski, Info, Information for midas updates though git 
> git pull --recurse-submodules
> git submodule update --init --recursive
> git config submodule.recurse true

does not work for me, macos 12.4 git 2.32.1.

after I set "submodule.recurse true", I still have to type "git submodule update --
init --recursive", without --recursive, mscb/mxml is empty and the build bombs.

P.S. the underlying issue is that the mxml submodule is now included twice 
(midas/mxml and midas/mscb/mxml) and there is nothing to enforce that both copies are 
the same. (No idea what happens if the two mxml's are different).
Entry  08 Aug 2022, Konstantin Olchanski, Info, odb disallow key names that start or end with spaces 
while testing the new odb editor, we ran into a number of problems with key names 
that start or end with spaces. we cannot think of any valid use case for such key 
names (subdirectories and variables) and we think they could only have been 
created by mistake. ODB now disallows such names. K.O.
    Reply  16 Aug 2022, Konstantin Olchanski, Bug Report, firefox hangs due to mhistory 
> > > Firefox is hanging/becoming unresponsive due to javascript code.
> 
> The URL (reachable only within PSI) is http://lem03.psi.ch:8081/?cmd=custom&page=Mudas

so malfunction is not in the midas history page, but in a custom page. I could help you debug it,
but you would have to provide the complete source code (javascript and html).

> Firefox is version 91.12.0esr (64-bit), but I had similar issues with chrome/chromium too.

my firefox is 103.something. when you say google-chrome has "similar issues",
I read it as "google-chrome does not show this same bug, but shows some other
bug somewhere else". (if I misread you, you have to write better).

but this gives you a front to attack your bugs. basically all browsers should render your
custom page exactly the same (unless you use some obscure or experimental feature, which I
recommend against).

so you tweak your page to identify the source of different rendering results, and try to eliminate it,
hopefully by the time you get your page render exactly the same everywhere, all the real bugs
have gotten shaken out, too. (this is similar to debugging a c++ program by compiling
it on linux, mac, windows, vax, raspbery pi, etc and checking that you get the same result everywhere).

> The hangs seem to happen randomly so I have not been able to reproduce it yet.

I find that javascript debuggers are not setup to debug hangs. I think debugger runs partially
inside the same javascript engine you are debugging, so both hang and debugging is impossible.

(latest google-chrome has another improvement, all pages from the same computer run in the same
javascript engine, so if one midas page stops (on exception or because I debug it), all midas pages
stop and I have to run two different browsers if I want to debug (i.e.) a history page crash
and look at odb at the same time. fun).

K.O.
Entry  19 Aug 2022, Konstantin Olchanski, Bug Fix, "Detected duplicate or non-monotonous data" in history files 
serious (but rare) bug was fixed in the history reader. unlucky experiment would see 
errors about "Detected duplicate or non-monotonous data" in some history file, fixed by 
removing/renaming the offending file. (reported by MEG experiment)

it turns out there was nothing wrong with the data files (good), but there
was a nasty bug in the history reader. it did not ensure that we read history
files in chronological order. under some conditions order of files could be
reversed, older files would be read after newer files and trip the built-in
protection against returning non-monotonically increasing history data to the user.

fixed commit 
https://bitbucket.org/tmidas/midas/commits/9893f85ebe33e96cc63f501a0f89e1f8932c894d

for more details, see https://bitbucket.org/tmidas/midas/issues/350/file-history-non-
monotonic-time

K.O.
    Reply  23 Aug 2022, Konstantin Olchanski, Bug Fix, "Detected duplicate or non-monotonous data" in history files 
> serious (but rare) bug was fixed in the history reader.

previous fix was incomplete. please update to git commit
https://bitbucket.org/tmidas/midas/commits/b343c3c98e4e6fd00a00cf686c74c7ccc6da0c63

K.O.
    Reply  17 Nov 2022, Konstantin Olchanski, Bug Fix, O_CREAT in open in split.cxx 
> > midas currently does not compile on linux
> > fix is to give open in midas/examples/lowlevel/split.cxx a default mode, e.g. 006600

I got more warnings from split.cxx, looked at the code and see so many problems that it is easier
to delete it than it is to fix it.

Check for end of file is done incorrectly (check for read() return 0, -1 or short read),
memory overrun if given file name is longer than 80 bytes, no check for valid event length
read from the file, and so on and so on.

A better example for reading and writing midas files is in midasio/test_midasio.cxx. Proper c++ coding, and can read compressed files.

K.O.
    Reply  17 Nov 2022, Konstantin Olchanski, Bug Fix, "Detected duplicate or non-monotonous data" in history files 
> > serious (but rare) bug was fixed in the history reader.
> previous fix was incomplete. please update to git commit
> https://bitbucket.org/tmidas/midas/commits/b343c3c98e4e6fd00a00cf686c74c7ccc6da0c63

a race condition between reading history file in mhttpd and writing history file in 
mlogger was accidentally introduced. mhttpd would file spurious errors about "timestamp 
is after last timestamp".

fixed, please update to git commit
https://bitbucket.org/tmidas/midas/commits/7a9f6e0c58ffddcacb9ee19934ce3e2033a805ef

fix race condition in history file reader - a race condition was added accidentally - 
first the reader remembers the history file size and the time of the last entry, then it 
goes to read the file and bombs if at the same time mlogger added more entries - their 
time is after the remembered time of last entry and error "timestamp is after last 
timestamp" is triggered.

K.O.
    Reply  06 Mar 2023, Konstantin Olchanski, Forum, pull request for PostgreSQL support 
> some minutes ago I published a PR for PostgreSQL support I developed
> at INFN-Napoli for Darkside experiment...
> 
> I don't know if you receive a notification about this PR and in doubt
> I wrote this message...

Hi, Gennaro, thank you for the very useful contribution. I saw the previous version 
of your pull request and everything looked quite good. But that pull request was 
for an older version of midas and it would not have applied cleanly to the current 
version. I will take a look at your updated pull request. In theory it should only 
add the Postgres class and modify a few other places in history_schema.cxx and have 
no changes to anything else. (if you need those changes, it should be a separate 
pull request).

Also I am curious what benefits and drawbacks of Postgres vs mysql/mariadb you have 
observed for storing and using midas history data.

K.O.
Entry  16 Mar 2023, Konstantin Olchanski, Forum, bitbucket issue spam cleaned 
midas bitbucket repository had a spam attack, about 40 spam messages were posted 
into the issues. I was able to delete them manually. No idea how they got past 
bitbucket spam filters and if they are spam or an attack against automated issue 
tracker tools or an attack against the repo owner (who is vulnerable as they rush 
in to deal with the spam). if this happens again, "anonymous issues" may have to 
be disabled, bitbucket login required. K.O.
    Reply  16 Mar 2023, Konstantin Olchanski, Forum, Having trouble with MIDAS setup 
> I'm not sure if this is the right forum for this query

this might be the right place, depending.

> I'm having a little bit of trouble with the setup of a Midas system ...
> inherited after the previous guy ...

a rather major problem, but a typical situation.

> There was a point at which it was working.

this is very good. if it worked before, there is good chance it will work again.

> And then there was an unrelated issue in the electrical system which, as a side effect,
> meant that the building lost power for a time, and the entire system had to be rebooted.
> No problem, I thought. I'll just reset and restart all of the software...
> ...and I can't seem to get it to work.

this happens often enough. several things are likely to happen:
- unexpected software updates, i.e. new linux kernel was installed but inactive, waiting for a reboot
- hardware failures, i.e. we usually see blown up power supplies. check that all VME crate voltages are okey. (ask me how).
- firmware corruption, i.e. we have seen VME modules lose their firmware after power outage, had to be reloaded by jtag

> I keep getting the error message "mvme_read_value: Could not perform read!: Bad address".

this is a generic error, it does not mean that software suddenly is trying to read from wrong address.

> I imagine that there is something that needs to be set, twiddled, tweaked, or turned on in the driver. The output of 'lsmod | grep vme' is:
> vmedriver             117742  0

this is not the vme driver we use at TRIUMF, so I am not familiar with it's errors. we use the vme_tsi148 driver and the vme_universe driver. (ask me about them).

> so presumably the driver is at least *present*, even if I have no idea how to twiddle anything on it.

could be the wrong version of the driver or the wrong version of the linux kernel. worth checking log files
to see if kernel and driver version numbers are the same.

> Could anyone perhaps suggest a way forward?

Yes. You will have to tell me much more about your system. You can do this publicly here or privately by email to olchansk@triumf.ca

To start, I need to know your VME setup, what is the crate, what is the VME processor, what OS you run, what VME driver you use, what VME modules you have installed.

K.O.
    Reply  16 Mar 2023, Konstantin Olchanski, Forum, bitbucket issue spam cleaned 
> midas bitbucket repository had a spam attack, about 40 spam messages were posted 
> into the issues. I was able to delete them manually. No idea how they got past 
> bitbucket spam filters and if they are spam or an attack against automated issue 
> tracker tools or an attack against the repo owner (who is vulnerable as they rush 
> in to deal with the spam). if this happens again, "anonymous issues" may have to 
> be disabled, bitbucket login required. K.O.

Two more spam messages, deleted. "Anonymous users can create issues" is now turned off.

K.O.
    Reply  16 Mar 2023, Konstantin Olchanski, Forum, bitbucket issue spam cleaned 
> > midas bitbucket repository had a spam attack, about 40 spam messages were posted 
> > into the issues. I was able to delete them manually. No idea how they got past 
> > bitbucket spam filters and if they are spam or an attack against automated issue 
> > tracker tools or an attack against the repo owner (who is vulnerable as they rush 
> > in to deal with the spam). if this happens again, "anonymous issues" may have to 
> > be disabled, bitbucket login required. K.O.
> 
> Two more spam messages, deleted. "Anonymous users can create issues" is now turned off.

Also, same for: rootana.
Also, empty issue trackers disabled: mvodb, midasio, mscb.

K.O.
ELOG V3.1.4-2e1708b5