ID | Date | Author | Topic | Subject
2436 | 23 Aug 2022 | Konstantin Olchanski | Bug Fix | "Detected duplicate or non-monotonous data" in history files
> serious (but rare) bug was fixed in the history reader.
previous fix was incomplete. please update to git commit
https://bitbucket.org/tmidas/midas/commits/b343c3c98e4e6fd00a00cf686c74c7ccc6da0c63
K.O. |
2449 | 17 Nov 2022 | Konstantin Olchanski | Bug Fix | O_CREAT in open in split.cxx
> > midas currently does not compile on linux
> > fix is to give open in midas/examples/lowlevel/split.cxx a default mode, e.g. 006600
I got more warnings from split.cxx, looked at the code and saw so many problems that it is easier
to delete it than to fix it.
The end-of-file check is done incorrectly (read() can return 0, -1 or a short read),
there is a memory overrun if the given file name is longer than 80 bytes, there is no check for a valid
event length read from the file, and so on and so on.
A better example for reading and writing midas files is in midasio/test_midasio.cxx. Proper C++ coding, and it can read compressed files.
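For reference, a minimal sketch of how such a read loop should handle the read() return values (0 at end of file, -1 on error, short reads in between); this is generic POSIX code, not the actual midasio implementation:

  // read exactly "size" bytes, handling EOF (0), error (-1) and short reads.
  // returns true on success, false on EOF or error.
  #include <unistd.h>
  #include <cerrno>
  #include <cstdio>

  static bool read_exactly(int fd, char* buf, size_t size)
  {
     size_t done = 0;
     while (done < size) {
        ssize_t rd = read(fd, buf + done, size - done);
        if (rd == 0)
           return false;          // end of file
        if (rd < 0) {
           if (errno == EINTR)
              continue;           // interrupted, retry
           perror("read");
           return false;          // read error
        }
        done += rd;               // short read: keep reading
     }
     return true;
  }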
K.O. |
2450 | 17 Nov 2022 | Konstantin Olchanski | Bug Fix | "Detected duplicate or non-monotonous data" in history files
> > serious (but rare) bug was fixed in the history reader.
> previous fix was incomplete. please update to git commit
> https://bitbucket.org/tmidas/midas/commits/b343c3c98e4e6fd00a00cf686c74c7ccc6da0c63
a race condition between reading the history file in mhttpd and writing the history file in
mlogger was accidentally introduced. mhttpd would report spurious errors about "timestamp
is after last timestamp".
fixed, please update to git commit
https://bitbucket.org/tmidas/midas/commits/7a9f6e0c58ffddcacb9ee19934ce3e2033a805ef
fix race condition in the history file reader - a race condition was added accidentally:
first the reader remembers the history file size and the time of the last entry, then it
goes to read the file and bombs if mlogger added more entries in the meantime - their
time is after the remembered time of the last entry, and the error "timestamp is after last
timestamp" is triggered.
K.O. |
2461 | 06 Mar 2023 | Konstantin Olchanski | Forum | pull request for PostgreSQL support
> some minutes ago I published a PR for PostgreSQL support I developed
> at INFN-Napoli for Darkside experiment...
>
> I don't know if you receive a notification about this PR and in doubt
> I wrote this message...
Hi, Gennaro, thank you for the very useful contribution. I saw the previous version
of your pull request and everything looked quite good. But that pull request was
for an older version of midas and it would not have applied cleanly to the current
version. I will take a look at your updated pull request. In theory it should only
add the Postgres class and modify a few other places in history_schema.cxx and have
no changes to anything else. (if you need those changes, it should be a separate
pull request).
Also I am curious what benefits and drawbacks of Postgres vs mysql/mariadb you have
observed for storing and using midas history data.
K.O. |
2466 | 16 Mar 2023 | Konstantin Olchanski | Forum | bitbucket issue spam cleaned
midas bitbucket repository had a spam attack, about 40 spam messages were posted
into the issues. I was able to delete them manually. No idea how they got past
bitbucket spam filters and if they are spam or an attack against automated issue
tracker tools or an attack against the repo owner (who is vulnerable as they rush
in to deal with the spam). if this happens again, "anonymous issues" may have to
be disabled, bitbucket login required. K.O. |
2467 | 16 Mar 2023 | Konstantin Olchanski | Forum | Having trouble with MIDAS setup
> I'm not sure if this is the right forum for this query
this might be the right place, depending.
> I'm having a little bit of trouble with the setup of a Midas system ...
> inherited after the previous guy ...
a rather major problem, but a typical situation.
> There was a point at which it was working.
this is very good. if it worked before, there is good chance it will work again.
> And then there was an unrelated issue in the electrical system which, as a side effect,
> meant that the building lost power for a time, and the entire system had to be rebooted.
> No problem, I thought. I'll just reset and restart all of the software...
> ...and I can't seem to get it to work.
this happens often enough. several things are likely to happen:
- unexpected software updates, i.e. new linux kernel was installed but inactive, waiting for a reboot
- hardware failures, i.e. we usually see blown up power supplies. check that all VME crate voltages are okay. (ask me how).
- firmware corruption, i.e. we have seen VME modules lose their firmware after power outage, had to be reloaded by jtag
> I keep getting the error message "mvme_read_value: Could not perform read!: Bad address".
this is a generic error, it does not mean that the software is suddenly trying to read from a wrong address.
> I imagine that there is something that needs to be set, twiddled, tweaked, or turned on in the driver. The output of 'lsmod | grep vme' is:
> vmedriver 117742 0
this is not the vme driver we use at TRIUMF, so I am not familiar with its errors. we use the vme_tsi148 driver and the vme_universe driver. (ask me about them).
> so presumably the driver is at least *present*, even if I have no idea how to twiddle anything on it.
could be the wrong version of the driver or the wrong version of the linux kernel. worth checking log files
to see if the kernel and driver versions match.
> Could anyone perhaps suggest a way forward?
Yes. You will have to tell me much more about your system. You can do this publicly here or privately by email to olchansk@triumf.ca
To start, I need to know your VME setup, what is the crate, what is the VME processor, what OS you run, what VME driver you use, what VME modules you have installed.
K.O. |
2468 | 16 Mar 2023 | Konstantin Olchanski | Forum | bitbucket issue spam cleaned
> midas bitbucket repository had a spam attack, about 40 spam messages were posted
> into the issues. I was able to delete them manually. No idea how they got past
> bitbucket spam filters and if they are spam or an attack against automated issue
> tracker tools or an attack against the repo owner (who is vulnerable as they rush
> in to deal with the spam). if this happens again, "anonymous issues" may have to
> be disabled, bitbucket login required. K.O.
Two more spam messages, deleted. "Anonymous users can create issues" is now turned off.
K.O. |
2469 | 16 Mar 2023 | Konstantin Olchanski | Forum | bitbucket issue spam cleaned
> > midas bitbucket repository had a spam attack, about 40 spam messages were posted
> > into the issues. I was able to delete them manually. No idea how they got past
> > bitbucket spam filters and if they are spam or an attack against automated issue
> > tracker tools or an attack against the repo owner (who is vulnerable as they rush
> > in to deal with the spam). if this happens again, "anonymous issues" may have to
> > be disabled, bitbucket login required. K.O.
>
> Two more spam messages, deleted. "Anonymous users can create issues" is now turned off.
Also, same for: rootana.
Also, empty issue trackers disabled: mvodb, midasio, mscb.
K.O. |
2470 | 17 Mar 2023 | Konstantin Olchanski | Info | T2K/ND280 - Many warning from ten year old variables in ODB
Forwarded from the T2K/ND280 elog:
Author : Nick Hastings
Subject : Many warning from ten year old variables in ODB
Logbook URL : http://elog.nd280.org/elog/FGD/2553
Midas does periodic checks that the variables in the ODB are ok.
One of these checks is that each variable was set within +/- 10
years. Since this experiment has been running for longer than 10
years, there are *many* variables that fail this check.
As a result, midas.log and the messages in mhttpd are spammed
with many warnings. Eg
Mon Feb 13 14:49:18 2023 [ODBEdit,ERROR] [odb.c:548:db_validate_key,ERROR] Warning: invalid access time, key
"/System/Prompt", time 1288763123
These can be removed by simply setting the variable again with its current value.
So I wonder if it would be best to just do a full odbdump and then load all the values
back in. Comments from MIDAS experts would be appreciated. Eg:
odbedit -c 'save fgddaq.odb'
odbedit -c 'load fgddaq.odb'
Note this problem is currently seen on both the FGD DAQ and the global slow control MIDAS instances.
It may also be a problem on the INGRID GSC and the DAQs of other ND280 systems but I did not check. |
2474 | 21 Apr 2023 | Konstantin Olchanski | Forum | Setup Midas with Caen vx2740 - ask for help
> I'm trying to setup Midas with the Caen vx2740 VME digitizer board.
welcome to the world of daq and midas! Ben already answered and he will help you with this specific hardware. (we work together)
> #0 0x00005555555c2ee1 in rpc_register_function (id=id@entry=18000, func=func@entry=0x5555555a2790 <jrpc_helper(int, void**)>)
> at /home/astrocent/workspace/packages/midas/src/midas.cxx:11947
I looked at this line in midas and I do not see any problems, other than that all functions that touch rpc_list are not thread safe,
and calling them while rpc calls are active will cause memory corruption and a crash. This is not a problem
in most programs because rpc_register_function() is usually called once at the beginning of everything, before any RPCs
are received, sent or processed. I filed a bug against this problem: https://bitbucket.org/tmidas/midas/issues/362/rpc_list-is-not-thread-safe
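The usual fix for this class of problem is to protect the shared table with a mutex on both the registration and the dispatch path; a generic sketch (not the actual MIDAS rpc_list code):

  #include <mutex>
  #include <map>
  #include <functional>

  // generic sketch: a registration table shared between the thread that
  // registers handlers and the threads that dispatch incoming RPCs must
  // be locked on both paths, otherwise concurrent access corrupts it.
  static std::mutex gRpcMutex;
  static std::map<int, std::function<int(void**)>> gRpcTable;

  void register_function(int id, std::function<int(void**)> fn)
  {
     std::lock_guard<std::mutex> lock(gRpcMutex);
     gRpcTable[id] = fn;              // safe against concurrent dispatch
  }

  int dispatch(int id, void** params)
  {
     std::function<int(void**)> fn;
     {
        std::lock_guard<std::mutex> lock(gRpcMutex);
        auto it = gRpcTable.find(id);
        if (it == gRpcTable.end())
           return -1;                 // unknown RPC id
        fn = it->second;              // copy the handler under the lock
     }
     return fn(params);               // call it outside the lock
  }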
> what is "jrpc"?
I implemented it years ago to allow web pages to call mhttpd (XHR/HTTP), which calls user frontends (MIDAS RPC) to perform real-time actions,
i.e. to turn power supplies on or off. "j" stands for "json", but most experiments send very simple commands
and do not use json encoding.
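For context, a frontend-side handler looks roughly like this. This is a sketch from memory, modeled on the jrpc_helper() visible in the stack trace above: the CSTRING/CINT parameter layout and the registration call should be verified against the MIDAS documentation and examples, and "turn_on" is a made-up command name.

  #include "midas.h"
  #include <cstring>
  #include <cstdio>

  // sketch of a frontend jrpc handler; parameter layout from memory,
  // verify before use.
  static INT my_jrpc(INT index, void *prpc_param[])
  {
     const char* cmd  = CSTRING(0);  // command string sent from the web page
     const char* args = CSTRING(1);  // argument string (often not JSON at all)
     char* buf        = CSTRING(2);  // reply buffer
     int bufsize      = CINT(3);     // reply buffer size

     if (strcmp(cmd, "turn_on") == 0) {
        // ... drive the hardware here, using args ...
        snprintf(buf, bufsize, "OK");
     } else {
        snprintf(buf, bufsize, "unknown command");
     }
     return RPC_SUCCESS;
  }

  // in frontend_init():
  //    cm_register_function(RPC_JRPC, my_jrpc);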
K.O. |
2477 | 27 Apr 2023 | Konstantin Olchanski | Suggestion | Maximum ODB size
> > I agree, I think we can safely bump the limit from 100 Mbytes to 1 Gbyte, maybe 1.5 or
> > 1.99 Gbytes. Above that we run into 32-bit/31-bit cleanliness problems.
>
> We just went in and changed: int odb_size_limit = INT_MAX;//100*1000*1000; in odb.cxx.
>
This change is wrong. As I wrote, ODB is not 64-bit clean and it is not 32-bit clean. We think it is 31-bit clean, so the maximum size would be slightly less than 2 Gbytes.
> And we could create ODBs with 1GB and 1.5 GB.
Congratulations. created != "it works". for a proper test, you should fill it with 1.5 GB of stuff, save it to a json file, reload it from the json file, save it to a different json file and compare that the two files have the same contents (minus timestamps).
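A sketch of such a round-trip test, shelling out to odbedit (file names are placeholders; my recollection is that odbedit picks the save format from the file extension - verify):

  #include <cstdlib>

  // round-trip test sketch for a large ODB: dump, reload, dump again,
  // compare. Timestamp fields will differ, so the diff output needs to
  // be read with that in mind (or filtered).
  int main()
  {
     std::system("odbedit -c 'save before.json'");  // dump the filled ODB
     std::system("odbedit -c 'load before.json'");  // reload it
     std::system("odbedit -c 'save after.json'");   // dump to a second file
     std::system("diff before.json after.json");    // compare the two dumps
     return 0;
  }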
We could spend a lot of time making odb 32-bit clean to give you a 4GB-max ODB, but would it be useful? For a large ODB, "save to .json" already takes a long time ("save to .xml" is slower, "save to .odb" ditto, also buggy). We already have complaints that runs take forever to start because mlogger
takes a long time to write the ODB save file.
P.S. 64-bit clean ODB will be binary incompatible, all internal pointers are 32-bit right now.
K.O. |
Draft | 27 Apr 2023 | Konstantin Olchanski | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option
> H
> /home/mu3e/midas/include/odbxx.h:1102: Wrong key type in XML file
> Stack trace:
> 1 0x00000000000042D828 (null) + 4380712
> 2 0x00000000000048ED4D midas::odb::odb_from_xml(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 605
> 3 0x0000000000004999BD midas::odb::odb(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 317
> 4 0x000000000000495383 midas::odb::read_key(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 1459
> 5 0x0000000000004971E3 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool) + 259
> 6 0x000000000000497636 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool) + 502
> 7 0x00000000000049883B midas::odb::connect_and_fix_structure(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 171
> 8 0x0000000000004385EF setup_odb() + 8351
> 9 0x00000000000043B2E6 frontend_init() + 22
> 10 0x000000000000433304 main + 1540
> 11 0x0000007F8C6FE3724D __libc_start_main + 239
> 12 0x000000000000433F7A _start + 42
>
> Aborted (core dumped)
>
>
> We have the same problem for all our frontends. When we want to start them locally they work. Starting them locally with ./frontend -h localhost also reproduces the error above.
>
> The error can also be reproduced with the odbxx_test.cxx example in the midas repo by replacing line 22 in midas/examples/odbxx/odbxx_test.cxx (cm_connect_experiment(NULL, NULL, "test", NULL);) with cm_connect_experiment("localhost", "Mu3e", "test", NULL); (Put the name of the experiment instead of "Mu3e")
>
> running odbxx_test locally gives us then the same error as our other frontend.
>
> Thanks in advance,
> Martin |
2479 | 27 Apr 2023 | Konstantin Olchanski | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option
Looks like your MIDAS is built without debug information (-O2 -g), the stack trace does not have file names and line numbers. Please rebuild with debug information and report the stack trace. Thanks. K.O.
> Connect to experiment Mu3e on host 10.32.113.210...
> OK
> Init hardware...
> terminate called after throwing an instance of 'mexception'
> what():
> /home/mu3e/midas/include/odbxx.h:1102: Wrong key type in XML file
> Stack trace:
> 1 0x00000000000042D828 (null) + 4380712
> 2 0x00000000000048ED4D midas::odb::odb_from_xml(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 605
> 3 0x0000000000004999BD midas::odb::odb(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 317
> 4 0x000000000000495383 midas::odb::read_key(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 1459
> 5 0x0000000000004971E3 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool) + 259
> 6 0x000000000000497636 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool) + 502
> 7 0x00000000000049883B midas::odb::connect_and_fix_structure(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 171
> 8 0x0000000000004385EF setup_odb() + 8351
> 9 0x00000000000043B2E6 frontend_init() + 22
> 10 0x000000000000433304 main + 1540
> 11 0x0000007F8C6FE3724D __libc_start_main + 239
> 12 0x000000000000433F7A _start + 42
>
> Aborted (core dumped)
K.O. |
2481 | 27 Apr 2023 | Konstantin Olchanski | Suggestion | Maximum ODB size
> You said the writing into .odb is buggy. Do you mean it’s buggy in general or only in this specific case?
> We save the ODB most of the time in the .odb format.
I recommend JSON. The main advantage is that you can read it using a JSON decoder available for any language, no need to write custom code.
Other than that, the main issue is the encoding of strings. For ODB this means key names and string values.
JSON was the first to standardize escape characters that can encode all valid UTF-8 UNICODE strings;
the system of escape characters is clean, easy to understand and easy to implement. https://www.json.org/json-en.html
XML is not as well defined as JSON, i.e. go and try to find the XML BNF grammar. I am not sure if the MIDAS XML encoder
and decoder is fully UTF-8 clean, and if some unlucky combinations of characters break string encoding or decoding. This
is usually tested using a fuzzer (generates all possible, unlucky and unlikely string values). Most suspicious
would be quotes, and square and angle brackets. If some character combinations break encoding or decoding, likely
this cannot be fixed in MIDAS without breaking backwards self-compatibility (will not read old ODB files correctly).
Same applies for the ODB format, except that it is even more ad-hoc. Again, any problems are hard to fix without
breaking backward self-compatibility.
In addition, in the past, the ODB and XML decoders had trouble with very long strings, this has been
fixed some time ago.
K.O. |
2482 | 27 Apr 2023 | Konstantin Olchanski | Suggestion | Maximum ODB size
my vote is to bump the ODB size limit to 1999*1000*1000 (not quite 2GB). but this needs to be tested. especially save and restore from ODB, XML and JSON files, including how long it takes to save and load a 1.9GB ODB. K.O. |
2488 | 28 Apr 2023 | Konstantin Olchanski | Suggestion | Maximum ODB size
> > Congratulations. created != "it works".
>
> Two other things to consider:
>
> 1) The ODB shared memory is dumped into a binary file (".ODB.SHM") after the last client finishes, and read when the first client starts, to make it persistent.
> So this could slow down starting and stopping, but only for the first client, so I guess it's not an issue.
>
typical disk writing speed is 100-1000 Mbytes/sec, so writing 1 GB .ODB.SHM will take 1-10 seconds. NFS over 1gige network is 100 Mbytes/sec, so 10 seconds to
write .ODB.SHM. embedded ARM write speed to SD flash can be as low as 10 Mbytes/sec, so up to 100 seconds.
>
> 2) Traditionally, the ODB gets dumped to the .mid file at the beginning and end of every run, so that one knows the exact configuration of the experiment
> for offline analysis. This can be turned off of course, but most experiments use it. If the ODB is dumped in any ASCII format, this can take quite long.
> Assume it takes 10 seconds at the beginning of each run, and we take a run every five minutes. Then we lose 48 mins of precious beam time every day.
>
the new default is to save as JSON; (as of my last measurement) the JSON encoder is faster than the XML (and ODB?) encoder; by default the result is compressed by GZIP-1 (66
Mbytes/sec is my old benchmark, should remeasure on new DDR5 machines); compressed JSON is written to the .mid.gz file at disk speed (as above). Alternatively, use LZ4
compression, which runs roughly at memcpy() speed, compresses less, and is written to .mid.lz4 at disk speed.
if data storage is ZFS, ZFS built-in LZ4 compression is now enabled by default, so writing an uncompressed .mid file (no compression of the ODB dump) should be
roughly the same as using MIDAS LZ4 compression and writing .mid.lz4.
bottom line, I need to remeasure gzip and lz4 compression speeds on new computers (DDR4 AMD 5000 series and DDR5 AMD 7000 series).
K.O. |
2489 | 28 Apr 2023 | Konstantin Olchanski | Suggestion | Maximum ODB size
> > Is this maybe related to what Stefan said about the run start - so that odbedit needs some time to load the bigger ODB?
>
> At the run start mlogger writes the ODB to the .mid file. This needs conversion (binary ODB -> XML ASCII) which can take time.
> This does NOT depend on the ODB size, but on the ODB *content*.
>
Yes and no. They must be storing more than 100 Mbytes of stuff in ODB if they are asking to bump the ODB size from 100 Mbytes to 2 Gbytes.
On the MIDAS side, though, we have to plan for the worst case: if the max ODB size is 1.9 GB and it is full of data,
and mlogger (and odbedit save and load) takes 10-30 seconds, then at least all timeouts (watchdog timeout, RPC timeout, etc)
must be increased accordingly.
K.O. |
2490 | 28 Apr 2023 | Konstantin Olchanski | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option
> As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp (with cm_connect_experiment changed to "localhost")
> [test,ERROR] [system.cxx:5104:recv_tcp2,ERROR] unexpected connection closure
> [test,ERROR] [system.cxx:5158:ss_recv_net_command,ERROR] error receiving network command header, see messages
> [test,ERROR] [midas.cxx:13900:rpc_call,ERROR] routine "db_copy_xml": error, ss_recv_net_command() status 411, program abort
ok, cool. looks like we crashed the mserver. either run mserver attached to gdb or enable mserver core dumps, we need its stack trace;
the correct stack trace should be rooted in the handler for db_copy_xml.
but most likely odbxx is asking for more data than can be returned through the MIDAS RPC.
what is the ODB key passed to db_copy_xml() and how much data is in ODB at that key? (odbedit "du", right?).
K.O. |
2492 | 28 Apr 2023 | Konstantin Olchanski | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option
> > > As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp
> > ok, cool. looks like we crashed the mserver.
> Ok. Maybe i have to make this more clear. ANY odbxx access of a remote odb reproduces this error for us on multiple machines.
> It does not matter how much data odbxx is asking for.
> midas commit 9d2ef471 the code from above runs fine
so, a regression. ouch.
if core dumps are turned off, you will not "see" the mserver crash, because the main mserver is still running. it's the mserver forked to
serve your RPC connection that crashes.
> int main() {
> cm_connect_experiment("localhost", "Mu3e", "test", NULL);
> midas::odb o = {
> {"Int32 Key", 42}
> };
> o.connect("/Test/Settings");
> cm_disconnect_experiment();
> return 1;
> }
to debug this, after cm_connect_experiment() one has to put ::sleep(1000000000); (not that big, obviously),
then while it is sleeping do "ps -efw | grep mserver", this will show the mserver for the test program,
connect to it with gdb, wait for ::sleep() to finish and o.connect() to crash, with luck gdb will show
the crash stack trace in the mserver.
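Concretely, the quoted test program with the suggested sleep added (60 seconds is arbitrary; includes omitted as in the original):

  // the test program from above with the sleep added, so there is time
  // to find and attach gdb to the forked mserver:
  int main() {
     cm_connect_experiment("localhost", "Mu3e", "test", NULL);
     ::sleep(60);                  // now: ps -efw | grep mserver, attach gdb
     midas::odb o = {
        {"Int32 Key", 42}
     };
     o.connect("/Test/Settings");  // expected to crash the mserver
     cm_disconnect_experiment();
     return 1;
  }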
so easy to debug? this is why back in the 1970s clever people invented core dumps, only to have
even more clever people in the 2020s turn them off and generally make debugging more difficult (attaching
gdb to a running program is also disabled-by-default in some recent linuxes).
rant-off.
to check if core dumps work, do "killall -7 mserver". to enable core dumps on ubuntu, see here:
https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu
last known-working point is:
commit 9d2ef471c4e4a5a325413e972862424549fa1ed5
Author: Ben Smith <bsmith@triumf.ca>
Date: Wed Jul 13 14:45:28 2022 -0700
Allow odbxx to handle connecting to "/" (avoid trying to read subkeys as "//Equipment" etc.
K.O. |
2515 | 12 May 2023 | Konstantin Olchanski | Info | New environment variable MIDAS_EXPNAME
> A new environment variable MIDAS_EXPNAME has been introduced [to be used together with MIDAS_DIR]
This fixes an important buglet. If an experiment uses MIDAS_DIR instead of exptab, at the time
of connecting to ODB we do not know the experiment name, and we use the name "Default" to create the ODB
shared memory, instead of the actual experiment name.
This creates an inconsistency: if some MIDAS programs in the same experiment use MIDAS_DIR while
others use exptab (this would be unusual, but not impossible), they would connect to two
different ODB shared memories, the former using the name "Default", the latter using the actual experiment name.
As an indication that something is not right, when stopping MIDAS programs there is an error
message about a failure to delete the ODB shared memory (because it was created using the name
"Default", and deleting it using the correct experiment name fails).
Also it can cause co-mingling between two different experiments, depending on the type of shared
memory used by MIDAS (see $MIDAS_DIR/.SHM_TYPE.TXT):
POSIX - (usually not used) not affected (experiment name is not used)
POSIXv2 - (usually not used) affected (shm name is "$EXP_NAME_ODB")
POSIXv3 - used on MacOS - affected (shm name is "$UID_$EXP_NAME_ODB", so "$UID_Default_ODB" will collide)
POSIXv4 - used on Linux - not affected (shm name includes $MIDAS_DIR, which is different for different experiments)
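For illustration, the naming rules above expressed as code; this is a sketch of the rules as just described, not the actual system.cxx implementation, and the POSIXv4 composition and the plain-POSIX name are simplified guesses:

  #include <string>
  #include <unistd.h>

  // sketch of the shm naming rules listed above; exp_name is "Default"
  // when the experiment name is unknown (the MIDAS_DIR-without-exptab case).
  std::string odb_shm_name(const std::string& scheme,
                           const std::string& exp_name,
                           const std::string& midas_dir)
  {
     if (scheme == "POSIXv2")
        return exp_name + "_ODB";                // "Default_ODB" can collide
     if (scheme == "POSIXv3")                    // used on MacOS
        return std::to_string(getuid()) + "_" + exp_name + "_ODB"; // "$UID_Default_ODB" can collide
     if (scheme == "POSIXv4")                    // used on Linux
        return midas_dir + "_ODB";               // simplified: unique per MIDAS_DIR
     return "ODB";                               // plain POSIX: name not based on experiment
  }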
K.O. |