29 Sep 2020, Amy Roberts, Forum, using python client to start and stop run
|
I'm using a python client to start and stop runs, and the following code *appears*
to set the MIDAS state to "Run"
client.odb_set("/Runinfo/State", 3)
However, it doesn't seem to do other things associated with a run, like start
accumulating events.
Is there a different way I should start the run from the python client?
Thanks! |
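A note for readers with the same question: writing "/Runinfo/State" only changes the stored value, while a real run start is a transition that notifies the other clients. Below is a minimal sketch, assuming your version of the MIDAS Python client provides start_run() and stop_run() on the client object (recent versions appear to, but please check yours). The helper names take_short_run and connect, and the client name "pyrunctl", are made up for illustration.

```python
import time


def take_short_run(client, seconds=1.0):
    """Start a run through a proper transition, wait, then stop it.

    `client` is assumed to be a connected midas.client.MidasClient that
    offers start_run() and stop_run(); check that your MIDAS version does.
    """
    client.start_run()   # assumed API: performs the start transition
    time.sleep(seconds)  # data taking happens here
    client.stop_run()    # assumed API: performs the stop transition


def connect(name="pyrunctl"):
    """Connect to the experiment (imported lazily so the sketch loads anywhere)."""
    import midas.client
    return midas.client.MidasClient(name)
```

Typical use would be client = connect(), then take_short_run(client, 10), then client.disconnect().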
24 Nov 2020, Amy Roberts, Suggestion, ODBSET wildcards with array keys in Sequencer files
|
I'm interested in using the matching feature for ODBSET explained on
https://midas.triumf.ca/MidasWiki/index.php/Sequencer for settings that are in an
array, like:
COMMENT "Ground the detectors"
ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)[?]" 0
Currently I get an error when I try to run this script. Is this expected? Would it
be possible to implement matching for array values?
Thanks! |
25 Nov 2020, Amy Roberts, Suggestion, ODBSET wildcards with array keys in Sequencer files
|
The following all fail with "Cannot find ODB key "<key>""
ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)[*]" 0
ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)[0-9]" 0
ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)[1]" 0
ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)*" 0
ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)" 0
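To make the request concrete, here is a toy model of the behaviour I have in mind. It is only a sketch, not the sequencer's implementation: the ODB is faked as a Python dict of path -> list, the path is matched segment by segment with shell-style wildcards, and the index may be "*", a single number, or a range "N-M". Treating a path with no index as "all elements" is my own assumption, and every function name here is hypothetical.

```python
import re
from fnmatch import fnmatchcase


def split_index(pattern):
    """Split 'path[index]' into (path, index); index is None if absent."""
    m = re.fullmatch(r"(.*)\[([^\[\]]*)\]", pattern)
    return (m.group(1), m.group(2)) if m else (pattern, None)


def expand_index(index, length):
    """Turn an index spec ('*', 'N', or 'N-M') into a list of positions.

    Anything else (for example '?') raises ValueError from int().
    """
    if index is None or index == "*":
        return list(range(length))
    if "-" in index:
        lo, hi = (int(x) for x in index.split("-"))
        return [i for i in range(lo, hi + 1) if i < length]
    i = int(index)
    return [i] if i < length else []


def path_matches(pattern, path):
    """Match segment by segment so '*' never crosses a '/'."""
    p, q = pattern.split("/"), path.split("/")
    return len(p) == len(q) and all(fnmatchcase(b, a) for a, b in zip(p, q))


def odbset(odb, pattern, value):
    """Set matching array elements in `odb` (a dict of path -> list)."""
    path_pat, index = split_index(pattern)
    hits = 0
    for path, array in odb.items():
        if path_matches(path_pat, path):
            for i in expand_index(index, len(array)):
                array[i] = value
                hits += 1
    return hits
```

Splitting the index off before matching keeps the "[...]" part from being read as a wildcard character class.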
> Hi,
> I guess the issue is in the "[?]" part of the command, the indexing is handled differently from the odb path and does not
> support "?".
> Are you trying to set only the first 9 channels?
> Could you try with "[*]" or "[0-9]" instead?
>
> Marco |
25 Nov 2020, Amy Roberts, Suggestion, ODBSET wildcards with array keys in Sequencer files
|
I think the issue may be the version of MIDAS I'm using. Mine is current as of February 4, 2020.
But since then there have been changes to the sequencer code, specifically parts that handle indexing.
I'll try this out with an updated version of MIDAS and report back if there are still any issues after updating.
> I created some keys in my ODB to try to match yours.
> The ODBSET commands you wrote are all working fine (of course with different results), except only for the "/Detectors/Det*/Settings/Charge/Bias (V)*" which I will have to
> look into.
> In any case the error message I'm getting is "could not match ay key" and not the one you are reporting.
>
> Now I'm a bit puzzled:
> Are you sure your ODB contains those keys?
> Are you testing the ODBSET inside a more complex sequencer or on its own?
>
> Maybe I can try to reproduce it using your ODB setup.
> Could you send an ODB dump of the "/Detectors" folder using the "save" command of odbedit ("cd /Detectors" and then "save detector.odb")?
>
> Best,
>
> Marco |
17 Dec 2020, Amy Roberts, Suggestion, Improving variable functionality in Sequencer?
|
We're using the sequencer to manage runs, and this typically looks something like:
1. save ODB keys to variables via ODBGET
2. set ODB keys to new values for a "pre-run" process
3. return ODB keys to the values saved in step 1
4. take data
The problem I'm running into is that the list of ODB keys to save is pretty
unwieldy. I'm wondering if there are sequencer features that exist or that I could
request that might make this easier.
For example, having a way to list ODB keys, save ODB directories, and load ODB
directories would be a much more concise way for me to write my script.
Another option might be to have some version of the ODBSET wildcards for ODBGET.
Although for this, setting the variable names might be tricky.
In any case, even being able to ODBGET an array and set that to one variable name
would be a big improvement. |
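As a point of comparison, the save / change / restore pattern described above is compact when written against the Python client instead of the sequencer. This is only a sketch under assumptions: it relies on the client offering odb_get(path) and odb_set(path, value), with arrays returned as whole lists, and the name odb_values_restored is made up here.

```python
from contextlib import contextmanager


@contextmanager
def odb_values_restored(client, paths):
    """Save the listed ODB keys, let the caller change them, then restore.

    `client` is assumed to offer odb_get(path) and odb_set(path, value),
    as the MIDAS Python client does; arrays come back as whole lists.
    """
    saved = {path: client.odb_get(path) for path in paths}
    try:
        yield saved
    finally:
        # Runs even if the pre-run step raises, so values are always put back.
        for path, value in saved.items():
            client.odb_set(path, value)
```

The pre-run settings would go inside a "with odb_values_restored(client, paths):" block, and data taking would follow after the block has put the original values back.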
05 Jan 2021, Amy Roberts, Suggestion, Improving variable functionality in Sequencer?
|
Hello, just wanted to re-ping on this question now that folks are starting to get back from
the holidays. |
14 Oct 2021, Amy Roberts, Suggestion, Adding (or improving discoverability) of TID for odbset
|
Creating an ODB key requires users to know the Type IDs that are defined in
https://bitbucket.org/tmidas/midas/src/develop/include/midas.h starting at line 320.
I can't find any information on the Midas Wiki about these values or how to find
them.
Am I missing something obvious? Is there a way to improve how to find these values?
Or is this not the best way to interact with the ODB? |
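Until this is documented, one workaround is to pull the values straight from the header. The sketch below assumes the Type IDs appear in midas.h as plain "#define TID_NAME <decimal number>" lines; the function names are hypothetical, and nothing here is part of MIDAS itself.

```python
import re

# Matches lines such as "#define TID_SOMETHING 7" (assumed header layout).
_TID_DEFINE = re.compile(r"^\s*#define\s+(TID_\w+)\s+(\d+)\b", re.MULTILINE)


def parse_tids(header_text):
    """Return {name: number} for every '#define TID_xxx N' in the text."""
    return {name: int(num) for name, num in _TID_DEFINE.findall(header_text)}


def names_by_number(tids):
    """Group names by number, since one ID may have several aliases."""
    grouped = {}
    for name, num in sorted(tids.items(), key=lambda kv: (kv[1], kv[0])):
        grouped.setdefault(num, []).append(name)
    return grouped
```

It would be called on the text of a local copy of include/midas.h, for example parse_tids(open(path_to_midas_h).read()).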
07 Oct 2024, Amy Roberts, Bug Report, Difficulty running MIDAS on Rocky 9.4
|
We're trying to install the SuperCDMS version of MIDAS on a Rocky 9.4 Virtual
Machine and are getting a persistent error when we run mserver. As far as I
know there are minimal changes between this and the MIDAS branch, but Ben Smith
may have more to say on this.
[lekhraj@sdfcdmsdaq online]$ mserver
mserver started interactively
[mserver,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer
because process pid 481051 does not exist
mserver will listen on TCP port 1175
[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore,
timeout 10000 ms, exiting...
[mserver,ERROR] [midas.cxx:2205:cm_check_connect,ERROR] cm_disconnect_experiment
not called at end of program
db_lock_database: Detected recursive call to db_{lock,unlock}_database() while
already inside db_{lock,unlock}_database(). Maybe this is a call from a signal
handler. Cannot continue, aborting...
Aborted (core dumped)
We thought perhaps we had a corrupted ODB file, so we removed the ODB file and
tried to create a new one (sized correctly for our experiment):
[lekhraj@sdfcdmsdaq online]$ odbedit -s 50000000
[ODBEdit,ERROR] [odb.cxx:2052:db_open_database,ERROR] Removed ODB client
'mserver', index 0 because process pid 481326 does not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC
hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC
hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/481326' for client 'mserver'
because it is not connected to ODB
[ODBEdit,INFO] Client 'mserver' on buffer 'SYSMSG' removed by bm_open_buffer
because process pid 481326 does not exist
[local:test:S]/>Bus error (core dumped) |
08 Oct 2024, Amy Roberts, Bug Report, Difficulty running MIDAS on Rocky 9.4
|
> > We're trying to install the SuperCDMS version of MIDAS on a Rocky 9.4 Virtual
> > Machine and are getting a persistent error when we run mserver. As far as I
> > know there are minimal changes between this and the MIDAS branch, but Ben Smith
> > may have more to say on this.
>
> For reference, "the SuperCDMS version of MIDAS" is just a fork that no longer has any meaningful differences vs the main MIDAS repo, but we only pull updates infrequently after testing a bunch. We last pulled from the develop branch in November 2023. But that should be irrelevant here as semaphore code hasn't been touched for a very long time.
>
> We're running Alma 9.4 on a machine at TRIUMF and the same version of midas works fine there (Amy, you may already have access to scdms-zeus). I believe Alma and Rocky should be basically identical for this.
>
> So the questions are:
> * Have you tried other midas programs, or only mserver? E.g. did odbedit and mhttpd work?
> * If other programs work, have you been running them all as the same user? In particular, if you ran one program as root and another as an unprivileged user, then you will likely get odd permissions issues.
> * What do you see if you run `ls -l /dev/shm` and `ls -l ~/packages/SuperCDMS_DAQ/MidasDAQ/online/.*SHM`? (Or wherever your online dir is for the 2nd one).
> * Did you follow the full instructions for recovering from a corrupt ODB? https://daq00.triumf.ca/MidasWiki/index.php/FAQ#How_to_recover_from_a_corrupted_ODB In particular the bit about running odbinit with the --cleanup flag?
Here's what happens when I try to run odbedit:
[lekhraj@sdfcdmsdaq setup]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2052:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 481823 does not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/481823' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 481823 does not exist
[ODBEdit,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, exiting...
[ODBEdit,ERROR] [midas.cxx:2205:cm_check_connect,ERROR] cm_disconnect_experiment not called at end of program
db_lock_database: Detected recursive call to db_{lock,unlock}_database() while already inside db_{lock,unlock}_database(). Maybe this is a call from a signal handler. Cannot continue, aborting...
Aborted (core dumped)
[lekhraj@sdfcdmsdaq setup]$
And mhttpd:
[lekhraj@sdfcdmsdaq setup]$ mhttpd
[mhttpd,ERROR] [odb.cxx:2052:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 601054 does not exists
[mhttpd,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[mhttpd,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[mhttpd,INFO] Corrected 1 ODB entries
[mhttpd,INFO] Deleted entry '/System/Clients/601054' for client 'ODBEdit' because it is not connected to ODB
[mhttpd,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 601054 does not exist
[mhttpd,INFO] ODB subtree /Runinfo corrected successfully
Password protection is off
Hostlist off, connections from anywhere will be accepted
Listening on "http://localhost:8080", passwords OFF, hostlist OFF
Listening on "http://[::1]:8080", passwords OFF, hostlist OFF
bm_lock_buffer: Lock buffer "SYSMSG" is taking longer than 1 second!
[mhttpd,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, exiting...
[mhttpd,ERROR] [midas.cxx:2205:cm_check_connect,ERROR] cm_disconnect_experiment not called at end of program
db_lock_database: Detected recursive call to db_{lock,unlock}_database() while already inside db_{lock,unlock}_database(). Maybe this is a call from a signal handler. Cannot continue, aborting...
Aborted (core dumped)
[lekhraj@sdfcdmsdaq setup]$
We have been running everything as a single user, the user who cloned the repositories and owns the directories.
We did follow the corrupted-ODB cleanup instructions.
[lekhraj@sdfcdmsdaq setup]$ ls -lh /dev/shm
total 1.3M
-rw------- 1 lekhraj dm 1.2M Oct 8 14:13 17468_test_ODB__sdf_home_l_lekhraj_packages_SuperCDMS_DAQ_MidasDAQ_online_
-rw------- 1 lekhraj dm 114K Oct 7 14:06 17468_test_SYSMSG__sdf_home_l_lekhraj_packages_SuperCDMS_DAQ_MidasDAQ_online_
[lekhraj@sdfcdmsdaq setup]$ ls -lh ~/packages/SuperCDMS_DAQ/MidasDAQ/online/.*SHM
-rw-r--r-- 1 lekhraj dm 0 Oct 3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.ALARM.SHM
-rw-r--r-- 1 lekhraj dm 0 Oct 3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.ELOG.SHM
-rw-r--r-- 1 lekhraj dm 0 Oct 3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.HISTORY.SHM
-rw-r--r-- 1 lekhraj dm 0 Oct 3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.LAZY.SHM
-rw-r--r-- 1 lekhraj dm 0 Oct 3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.MSG.SHM
-rw-r--r-- 1 lekhraj dm 1.2M Oct 8 14:12 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.ODB.SHM
-rw-r--r-- 1 lekhraj dm 0 Oct 3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.SYSMSG.SHM
-rw-r--r-- 1 lekhraj dm 0 Oct 3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.SYSTEM.SHM |
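For anyone repeating the ownership check that the listings above answer, it can be scripted. This is a small sketch using only the Python standard library on a POSIX system; the function names are made up, and it simply reports which entries in a directory are not owned by the current user.

```python
import os
import stat


def describe_entries(directory, uid=None):
    """List (name, owner_uid, mode_string, owned_by_uid) for each entry."""
    uid = os.geteuid() if uid is None else uid
    rows = []
    for name in sorted(os.listdir(directory)):
        st = os.lstat(os.path.join(directory, name))  # do not follow symlinks
        rows.append((name, st.st_uid, stat.filemode(st.st_mode), st.st_uid == uid))
    return rows


def foreign_entries(directory, uid=None):
    """Names of entries not owned by `uid` (default: the current user)."""
    return [name for name, _, _, mine in describe_entries(directory, uid) if not mine]
```

Running foreign_entries("/dev/shm") as the DAQ user should return an empty list if every shared-memory file there belongs to that user.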
08 Oct 2024, Amy Roberts, Bug Report, Difficulty running MIDAS on Rocky 9.4
|
> > We're trying to install the SuperCDMS version of MIDAS on a Rocky 9.4 Virtual
> > Machine and are getting a persistent error when we run mserver.
> >
> > [mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore,
> > timeout 10000 ms, exiting...
> > db_lock_database: Detected recursive call to db_{lock,unlock}_database() while
> > already inside db_{lock,unlock}_database(). Maybe this is a call from a signal
> > handler. Cannot continue, aborting...
> > Aborted (core dumped)
>
> This is super very bad. Since you have a core dump, please post the stack trace here (or email it to me).
>
> I probably cannot debug your private version of midas and I will recommend that you install and run vanilla midas
> mserver (just while we debug this problem).
>
> Let's look at the core dump stack trace first, but likely we see a problem with System-V semaphores and hopefully it
> is not some breakage due to Red Hat bogosity or due to something specific to running on a virtual machine.
>
> If indeed this is Linux-kernel level breakage of System-V semaphores, solution would be to start using Posix
> semaphores, something I wanted to do for a long time. We already switched MIDAS shared memory from System-V to Posix
> shared memory.
>
> If we are lucky it is just one more crasher bug in ODB. Let's see that core dump stack trace.
>
> K.O.
I've uploaded the current core dump at: https://gitlab.com/det-lab/coredumps#.
This was done using the "CDMS" version of MIDAS. I'll compile the current MIDAS repository just to be sure we're seeing
the same error and report back here! |
10 Oct 2024, Amy Roberts, Bug Report, Difficulty running MIDAS on Rocky 9.4
|
> > I've uploaded the current core dump at: https://gitlab.com/det-lab/coredumps#.
>
> I cannot read the core dump without the corresponding executable (and likely all its shared libraries).
>
> It is best if you run gdb and extract the stack traces on your end.
>
> In case you are not familiar with gdb:
>
> gdb mserver core # start gdb
> bt # stack trace of crashed thread
> info thr # get list of threads
> thr 1
> bt
> thr 2
> bt
> # etc, get stack trace of each thread, there should not be too many of them
>
> K.O.
Hi Konstantin, thanks for the instructions. I do appear to be missing some debug symbols, but the output
looks potentially useful:
[lekhraj@sdfcdmsdaq ~]$ gdb mserver
core.mserver.17468.b174bb74f2bb44f9a0905e78ec6b2677.601715.1728422354000000
GNU gdb (GDB) Rocky Linux 10.2-11.1.el9_3
...
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from mserver...
[New LWP 601715]
warning: Section `.reg-xstate/601715' in core file too small.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `mserver'.
Program terminated with signal SIGABRT, Aborted.
warning: Section `.reg-xstate/601715' in core file too small.
#0 0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-83.el9.12.x86_64 libgcc-11.4.1-
3.el9.x86_64 libstdc++-11.4.1-2.1.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 mysql-libs-8.0.36-1.el9_3.x86_64
openssl-libs-3.0.7-25.el9_3.x86_64 zlib-1.2.11-40.el9.x86_64
(gdb)
(gdb) bt
#0 0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
#2 0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
#3 0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
#4 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date
format",
hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#5 db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format",
subhKey=0x7ffcc536d348)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#6 0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
#7 0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>,
filename=0x7ffcc536d690,
linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
#8 0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore,
timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
#9 0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
#10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
#11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
#12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
#13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0,
hDB=1)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
subhKey=subhKey@entry=0x7ffcc536da04) at
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#15 0x0000000000455fd2 in al_check () at
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
#16 0x000000000041ff85 in cm_periodic_tasks ()
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
#17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
#18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
(gdb) info thr
Id Target Id Frame
* 1 Thread 0x7fbdec0b1740 (LWP 601715) 0x00007fbdeaca154c in __pthread_kill_implementation () from
/lib64/libc.so.6
(gdb) thr 1
[Switching to thread 1 (Thread 0x7fbdec0b1740 (LWP 601715))]
#0 0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) bt
#0 0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
#2 0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
#3 0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
#4 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date
format",
hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#5 db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format",
subhKey=0x7ffcc536d348)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#6 0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
#7 0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>,
filename=0x7ffcc536d690,
linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
#8 0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore,
timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
#9 0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
#10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
#11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
#12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
#13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0,
hDB=1)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
subhKey=subhKey@entry=0x7ffcc536da04) at
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#15 0x0000000000455fd2 in al_check () at
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
#16 0x000000000041ff85 in cm_periodic_tasks ()
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
#17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
#18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
(gdb) |
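The same backtraces can be collected without an interactive session, which also avoids the pager prompt. The sketch below relies on gdb's -batch and -ex options and its "thread apply all bt" command; the two Python function names are hypothetical wrappers of my own.

```python
import subprocess


def gdb_backtrace_command(executable, core):
    """Build a gdb command line that prints every thread's backtrace."""
    return ["gdb", "-batch",
            "-ex", "info threads",
            "-ex", "thread apply all bt",
            executable, core]


def collect_backtraces(executable, core):
    """Run gdb and return its combined output as text."""
    result = subprocess.run(gdb_backtrace_command(executable, core),
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            text=True, check=False)
    return result.stdout
```

Calling collect_backtraces("mserver", path_to_core) returns text that can be pasted into a report as-is.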
16 Oct 2024, Amy Roberts, Bug Report, Difficulty running MIDAS on Rocky 9.4
|
> Thank you for the stack trace, I fixed the buglet that caused midas programs to crash twice:
> once on failure to lock ODB, then exit() -> atexit() handlers -> cm_check_connect() -> crash on ODB lock
> failure in the cm_msg() code.
>
> Replaced exit(1) with abort(). Could have used kill(getpid(),SIGKILL) to avoid making a core dump, but what the
> heck...
>
> Of course this does nothing to the original bug where ODB was locked and nobody will ever unlock it (reboot will
> unlock it!).
>
> commit bdd1d7fdc093b5a8d54a1b8467002bb3cac3ac11
>
> K.O.
I checked out the modified version of Midas and recompiled, and am still getting a similar error when I try to run
odbedit:
[aroberts@sdfcdmsdaq midas]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid
1615051 does not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/1615051' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1615051 does not
exist
[ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
Aborted (core dumped)
I'm not sure what's causing the call to lock the database; all I'm doing is typing "odbedit" at the command prompt.
I should add that I followed the instructions for unlocking the ODB, but when I call "odbedit" this error still
appears:
[aroberts@sdfcdmsdaq midas]$ ipcs -s -t
------ Semaphore Operation/Change Times --------
semid owner last-op last-changed
4 aroberts Wed Oct 16 11:44:10 2024 Wed Oct 16 11:44:00 2024
[aroberts@sdfcdmsdaq midas]$ ipcrm sem 4
resource(s) deleted
[aroberts@sdfcdmsdaq midas]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid
1617050 does not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/1617050' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1617050 does not
exist
[ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
Aborted (core dumped) |
28 Oct 2024, Amy Roberts, Bug Report, Difficulty running MIDAS on Rocky 9.4
|
> Now for each timeout it will print detailed syscall and timing information, if time goes backwards, it should catch it.
It appears that time is moving forward:
[aroberts@sdfcdmsdaq build]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 1617119 does
not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/1617119' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1617119 does not exist
[local:amy_test:S]/>ss_semaphore_wait_for: semop/semtimedop(5) returned -1, errno 11 (Resource temporarily unavailable),
start time 0xd4fd98f6, now 0xd4fdc0ef, dt 0x000027f9, timeout 0x00002710 ms, SEMAPHORE TIMEOUT!
[ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
Aborted (core dumped) |
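The timing values in the SEMAPHORE TIMEOUT line above can be cross-checked by hand. The following is a minimal sketch, assuming the hex fields are millisecond counters (the log labels the timeout as ms) and that the counter is 32 bits wide; the variable names are illustrative and not taken from the MIDAS source.

```python
# Values copied from the ss_semaphore_wait_for message above.
start_ms = 0xD4FD98F6        # "start time"
now_ms = 0xD4FDC0EF          # "now"
reported_dt_ms = 0x000027F9  # "dt"
timeout_ms = 0x00002710      # "timeout"

# Assumption: 32-bit counter, so mask the difference to handle wraparound.
elapsed_ms = (now_ms - start_ms) & 0xFFFFFFFF

print(f"elapsed  = {elapsed_ms} ms")
print(f"reported = {reported_dt_ms} ms")
print(f"timeout  = {timeout_ms} ms")
print(f"overshoot past timeout = {elapsed_ms - timeout_ms} ms")
```

The elapsed time (10233 ms) matches the reported dt and is slightly longer than the 10000 ms timeout, which is consistent with time moving forward and the wait simply expiring.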
06 Nov 2024, Amy Roberts, Bug Report, Difficulty running MIDAS on Rocky 9.4
|
After following Konstantin's debugging suggestions, I thought I would try to replicate
the issue on my own computer. My hope was that I could provide instructions for
replicating the bug so that the MIDAS team could try debugging things more easily.
However, when I ran the current version of MIDAS in a Rocky 9.4 VM on my laptop (both
VMware and VirtualBox), mserver and odbedit ran just fine (!).
I'm currently trying to find out if there's a way to compare the VMs on my machine and
the machine that's being problematic. I'll report back if I learn anything. |
06 Jan 2020, Alireza Talebitaher, Forum, SSL_ERROR_NO_CYPHER_OVERLAP
|
Hello,
I am quite new to both Linux and MIDAS.
I have installed MIDAS on my desktop by following this link:
https://midas.triumf.ca/MidasWiki/index.php/Quickstart_Linux
In the last step, when I run the "mhttpd" command and try to open the link
https://localhost:8443 (replacing localhost with my host name, of course), it
fails to connect and shows this error: SSL_ERROR_NO_CYPHER_OVERLAP (please see
the attached file, which includes a screenshot of the error).
I have tried many ways to solve this problem. In Firefox, I went to Options / Privacy
and Security / Security and unchecked the option "Block dangerous and deceptive
content", but it does not help.
Looking forward to your help.
Thanks
Mehran |
07 Jan 2020, Alireza Talebitaher, Forum, SSL_ERROR_NO_CYPHER_OVERLAP
|
Hi Konstantin,
Thanks for your reply,
> What Linux? (on most linuxes, run "lsb_release -a")
> What version of midas? (run odbedit "ver" command)
I am using CentOS 8
> What version of firefox? (from the "about firefox" menu)
Firefox 71.0
Thanks
Mehran
> No you cannot fix it from inside firefox. The issue is that the overlap of encryption methods
> supported by your firefox and by your openssl library (used by mhttpd) is an empty set.
> No common language, so to say, communication is impossible.
>
> So either you have a very old openssl but very new firefox, or a very new openssl but very old
> firefox. Both very old or both very new can talk to each other, difficulties start with greater
> difference in age, as new (better) encryption methods are added and old (no-longer-secure)
> methods are banished.
>
> BTW, for good security we recommend using apache httpd as the https proxy (instead of built-in
> https support in mhttpd). (I am not sure what it says in the current documentation). (But apache
> httpd will use the same openssl library, so this may not solve your problem. Let's see what
> versions of software you are using, per questions above, first).
>
> K.O. |
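The "empty overlap" described in the quoted reply can be illustrated with a short sketch. It lists the cipher suites offered by the OpenSSL library that the local Python is linked against (which may or may not be the same OpenSSL that mhttpd uses) and intersects them with a client list. The client list here is a hypothetical placeholder, not Firefox's real list, and the helper names are made up for illustration.

```python
import ssl

def local_cipher_names():
    """Cipher suite names enabled in a default client-side context of the local OpenSSL."""
    ctx = ssl.create_default_context()
    return sorted(c["name"] for c in ctx.get_ciphers())

def cipher_overlap(server_ciphers, client_ciphers):
    """Suites both sides support; an empty set means no handshake is possible."""
    return set(server_ciphers) & set(client_ciphers)

if __name__ == "__main__":
    print(ssl.OPENSSL_VERSION)
    server = local_cipher_names()
    # Hypothetical client list, for illustration only.
    client = ["EXAMPLE-OLD-CIPHER-1", "EXAMPLE-OLD-CIPHER-2"]
    common = cipher_overlap(server, client)
    print("common suites:", sorted(common) if common else "none (no cipher overlap)")
```

With a real client list in place of the placeholder, an empty result would correspond to the situation the browser reports as SSL_ERROR_NO_CYPHER_OVERLAP.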
08 Jan 2020, Alireza Talebitaher, Forum, SSL_ERROR_NO_CYPHER_OVERLAP
|
Hi,
As the link suggests, I ran "yum install -y mod_ssl certwatch crypto-utils", but it complains:
No match for argument: certwatch
No match for argument: crypto-utils
You may want to look at this link: https://blog.cloudware.bg/en/whats-new-in-centos-linux-8/
What’s gone?
In with the new, out with the old. CentOS 8 also says goodbye to some features. The OS removes several security functionalities. Among them is the Clevis HTTP pin, Coolkey and crypto-utils.
Cent OS 8 comes with securetty disabled by default. The configuration file is no longer included. You can add it back, but you will have to do it yourself. Another change is that shadow-utils no longer allow all-numeric user and group names.
Thanks
Mehran
> Hi, I have not run midas on Centos-8 yet. Maybe there is a problem with the openssl library there. The Centos-7
> instructions for setting up apache httpd proxy are here, with luck they work on centos-8:
> https://daq.triumf.ca/DaqWiki/index.php/SLinstall#Configure_HTTPS_server_.28CentOS7.29
>
> K.O.
> |
06 Jun 2014, Alexey Kalinin, Forum, problem with writing data on disk
|
Hello,
Our experiment is based on the MIDAS 2.x DAQ.
I'm using several identical frontends (frontend-%d) that differ only in LAM source and
event ID, running on 2 computers (~3 frontends each).
Each receives about 10k events (Max_SIZE = 8*1024, but usually events are smaller than
sizeof(DWORD)*400) per 7 s.
With no mlogger running it works just fine, but when I start mlogger (on a third
computer, where mserver is running), the Ethernet statistics graph shows the first 2-3
spills going well, with one peak per 7 s; then it becomes erratic and everything crashes
(mlogger and frontends).
I tried increasing the SYSTEM buffer and restarting everything. What I saw was that the
logger writes only half of the total events received by the frontends; it stays running
for about 15 minutes. If I push the STOP button before the crash, mlogger continues
writing data to disk for a considerable period of time.
I will check the HDD for bad sectors, but maybe there is an easy way to fix this problem
and I did something wrong.
The frontend is structured roughly like this:
EQ_POLLED, poll for 500
frontend_loop {
   read big buffer with 10k events; bufferread = true;
}
poll_event {
   for (i = 0; i < count; i++) {
      if (bufferread) lam = 1;
      if (!test) return lam;
   }
   return 0;
}
read_trigger {
   bk_init32();
   // fill event with buffer until current word != 0xffffffff
   if (currentposition + 2 > buffer_size) bufferread = false;
}
|
Any help or suggestions would be appreciated.
Thanks, Alexey. |
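For scale, the numbers quoted in the post above imply an upper bound on the aggregate data rate. This is a rough back-of-the-envelope sketch: it assumes 6 frontends in total (2 computers with about 3 each), that every event has the quoted typical maximum of sizeof(DWORD)*400 bytes with a 4-byte DWORD, and it ignores event and bank headers.

```python
DWORD_BYTES = 4                   # assumption: DWORD is 32 bits
n_frontends = 2 * 3               # assumption: 2 computers x ~3 frontends
events_per_spill = 10_000         # "about 10k events" per frontend
event_bytes = DWORD_BYTES * 400   # "usually less than sizeof(DWORD)*400"
spill_period_s = 7                # one spill per 7 s

bytes_per_spill = n_frontends * events_per_spill * event_bytes
rate_mb_per_s = bytes_per_spill / spill_period_s / 1e6
rate_mbit_per_s = rate_mb_per_s * 8

print(f"{bytes_per_spill / 1e6:.0f} MB per spill")
print(f"{rate_mb_per_s:.1f} MB/s averaged over the spill period")
print(f"{rate_mbit_per_s:.0f} Mbit/s averaged over the spill period")
```

Under these assumptions the average is about 13.7 MB/s (roughly 110 Mbit/s) into the logger machine; if the real events are smaller than the bound, the rate drops proportionally.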
16 Jun 2014, Alexey Kalinin, Forum, problem with writing data on disk
|
Hello, once again.
What I found is that when I try to stop the run, mlogger keeps working and writing
data, which I'm sure is not right, because the frontends are in the stopped state
(for example, each of the 3 frontends got 50k events and mlogger shows 120k; after the
STOP button is pushed, the .mid file collects more than 150k to 300k events).
It continues writing until it crashes after the default 10 s waiting period. |
18 Jun 2014, Alexey Kalinin, Forum, problem with writing data on disk
|
Hello,
I'm getting depressed.
I removed everything from the computer running mserver and reinstalled the system and
MIDAS.
Then I tried to run the tutorial example.
Often the run did not stop when I pushed the STOP button (mlogger blocks it; stopping
from odbedit works).
After the START button is pushed for the first time, the number of events taken by the
frontend equals the number of events written by mlogger. In the next run (without
restarting mlogger), mlogger writes double the number of events taken by the frontend
(see attachment). Restarting mlogger fixes this double counting.
What did I do wrong? |
|