Back Midas Rome Roody Rootana
  Midas DAQ System, Page 143 of 144  Not logged in ELOG logo
ID Date Authordown Topic Subject
  2023   25 Nov 2020 Amy RobertsSuggestionODBSET wildcards with array keys in Sequencer files
The following all fail with "Cannot find ODB key "<key>""

ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)[*]" 0
ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)[0-9]" 0
ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)[1]" 0
ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)*" 0
ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)" 0


> Hi,
> I guess the issue is in the "[?]" part of the command, the indexing is handled differently from the odb path and does not 
> support "?".
> Are you trying to set only the first 9 channels?
> Could you try with "[*]" or "[0-9]" instead?
> 
> Marco
> 
> > I'm interested in using the matching feature for ODBSET explained on 
> > https://midas.triumf.ca/MidasWiki/index.php/Sequencer for settings that are in an 
> > array, like:
> > 
> > COMMENT "Ground the detectors"
> > ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)[?]" 0
> > 
> > Currently I get an error when I try to run this script.  Is this expected?  Would it 
> > be possible to implement matching for array values?
> > 
> > Thanks!
  2025   25 Nov 2020 Amy RobertsSuggestionODBSET wildcards with array keys in Sequencer files
I think the issue may be the version of MIDAS I'm using.  Mine is current as of February 4, 2020.  

But since then there have been changes to the sequencer code, specifically parts that handle indexing.

I'll try this out with an updated version of MIDAS and report back if there are still any issues after updating.

> I created some keys in my ODB to try to match yours.
> The ODBSET commands you wrote are all working fine (of course with different results), except only for the "/Detectors/Det*/Settings/Charge/Bias (V)*" which I will have to 
> look into.
> In any case the error message I'm getting is "could not match ay key" and not the one you are reporting.
> 
> Now I'm a bit puzzled:
> Are you sure your ODB contains those keys?
> Are you testing the ODBSET inside a more complex sequencer or on its own?
> 
> Maybe I can try to reproduce it using your ODB setup.
> Could you send an ODB dump of the "/Detectors" folder using the "save" command of odbedit ("cd /Detectors" and then "save detector.odb")?
> 
> Best,
> 
> Marco
> 
> 
> > The following all fail with "Cannot find ODB key "<key>""
> > 
> > ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)[*]" 0
> > ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)[0-9]" 0
> > ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)[1]" 0
> > ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)*" 0
> > ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)" 0
> > 
> > 
> > > Hi,
> > > I guess the issue is in the "[?]" part of the command, the indexing is handled differently from the odb path and does not 
> > > support "?".
> > > Are you trying to set only the first 9 channels?
> > > Could you try with "[*]" or "[0-9]" instead?
> > > 
> > > Marco
> > > 
> > > > I'm interested in using the matching feature for ODBSET explained on 
> > > > https://midas.triumf.ca/MidasWiki/index.php/Sequencer for settings that are in an 
> > > > array, like:
> > > > 
> > > > COMMENT "Ground the detectors"
> > > > ODBSET "/Detectors/Det*/Settings/Charge/Bias (V)[?]" 0
> > > > 
> > > > Currently I get an error when I try to run this script.  Is this expected?  Would it 
> > > > be possible to implement matching for array values?
> > > > 
> > > > Thanks!
  2061   17 Dec 2020 Amy RobertsSuggestionImproving variable functionality in Sequencer?
We're using the sequencer to manage runs, and this typically looks something like:

1. save ODB keys to variables via ODBGET
2. set ODB keys to new values for a "pre-run" process
3. return ODB keys to values created in line 1
4. take data

The problem I'm running into is that the list of ODB keys to save is pretty 
unwieldy.  I'm wondering if there are sequencer features that exist or that I could 
request that might make this easier.

For example, having a way to list ODB keys, save ODB directories, and load ODB 
directories would be much more concise way for me to write my script.

Another option might be to have some version of the ODBSET wildcards for ODBGET.  
Although for this, setting the variable names might be tricky.  

In any case, even being able to ODBGET an array and set that to one variable name 
would be a big improvement.
  2065   05 Jan 2021 Amy RobertsSuggestionImproving variable functionality in Sequencer?
Hello, just wanted to re-ping on this question now that folks are starting to get back from 
the holidays.
  2289   14 Oct 2021 Amy RobertsSuggestionAdding (or improving discoverability) of TID for odbset
Creating an ODB key requires users to know the Type ID that are defined in 
https://bitbucket.org/tmidas/midas/src/develop/include/midas.h starting at line 320.

I can't find any information on the Midas Wiki about these values or how to find 
them.

Am I missing something obvious?  Is there a way to improve how to find these values?  
Or is this not the best way to interact with the ODB?
  2861   07 Oct 2024 Amy RobertsBug ReportDifficulty running MIDAS on Rocky 9.4
We're trying to install the SuperCDMS version of MIDAS on a Rocky 9.4 Virtual 
Machine and are getting a persistent error when we run mserver.  As far as I 
know there are minimal changes between this and the MIDAS branch, but Ben Smith 
may have more to say on this.

[lekhraj@sdfcdmsdaq online]$ mserver
mserver started interactively
[mserver,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer 
because process pid 481051 does not exist
mserver will listen on TCP port 1175
[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
timeout 10000 ms, exiting...
[mserver,ERROR] [midas.cxx:2205:cm_check_connect,ERROR] cm_disconnect_experiment 
not called at end of program
db_lock_database: Detected recursive call to db_{lock,unlock}_database() while 
already inside db_{lock,unlock}_database(). Maybe this is a call from a signal 
handler. Cannot continue, aborting...
Aborted (core dumped)

We thought perhaps we had a corrupted ODB file, so we removed the ODB file and 
tried to create a new one (sized correctly for our experiment):

[lekhraj@sdfcdmsdaq online]$ odbedit -s 50000000
[ODBEdit,ERROR] [odb.cxx:2052:db_open_database,ERROR] Removed ODB client 
'mserver', index 0 because process pid 481326 does not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC 
hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC 
hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/481326' for client 'mserver' 
because it is not connected to ODB
[ODBEdit,INFO] Client 'mserver' on buffer 'SYSMSG' removed by bm_open_buffer 
because process pid 481326 does not exist
[local:test:S]/>Bus error (core dumped)
  2865   08 Oct 2024 Amy RobertsBug ReportDifficulty running MIDAS on Rocky 9.4
> > We're trying to install the SuperCDMS version of MIDAS on a Rocky 9.4 Virtual 
> > Machine and are getting a persistent error when we run mserver.  As far as I 
> > know there are minimal changes between this and the MIDAS branch, but Ben Smith 
> > may have more to say on this.
> 
> For reference, "the SuperCDMS version of MIDAS" is just a fork that no longer has any meaningful differences vs the main MIDAS repo, but we only pull updates infrequently after testing a bunch. We last pulled from the develop branch in November 2023. But that should be irrelevant here as semaphore code hasn't been touched for a very long time.
> 
> We're running Alma 9.4 on a machine at TRIUMF and the same version of midas works fine there (Amy, you may already have access to scdms-zeus). I believe Alma and Rocky should be basically identical for this.
> 
> So the questions are:
> * Have you tried other midas programs, or only mserver? E.g. did odbedit and mhttpd work?
> * If other programs work, have you been running them all as the same user? In particular, if you ran one program as root and another as an unprivileged user, then you will likely get odd permissions issues.
> * What do you see if you run `ls -l /dev/shm` and `ls -l ~/packages/SuperCDMS_DAQ/MidasDAQ/online/.*SHM`? (Or wherever your online dir is for the 2nd one).
> * Did you follow the full instructions for recovering from a corrupt ODB? https://daq00.triumf.ca/MidasWiki/index.php/FAQ#How_to_recover_from_a_corrupted_ODB   In particular the bit about running odbinit with the --cleanup flag?

Here's what happens when I try to run odbedit:

[lekhraj@sdfcdmsdaq setup]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2052:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 481823 does not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/481823' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 481823 does not exist
[ODBEdit,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, exiting...
[ODBEdit,ERROR] [midas.cxx:2205:cm_check_connect,ERROR] cm_disconnect_experiment not called at end of program
db_lock_database: Detected recursive call to db_{lock,unlock}_database() while already inside db_{lock,unlock}_database(). Maybe this is a call from a signal handler. Cannot continue, aborting...
Aborted (core dumped)
[lekhraj@sdfcdmsdaq setup]$

And mhttpd:

[lekhraj@sdfcdmsdaq setup]$ mhttpd
[mhttpd,ERROR] [odb.cxx:2052:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 601054 does not exists
[mhttpd,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[mhttpd,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[mhttpd,INFO] Corrected 1 ODB entries
[mhttpd,INFO] Deleted entry '/System/Clients/601054' for client 'ODBEdit' because it is not connected to ODB
[mhttpd,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 601054 does not exist
[mhttpd,INFO] ODB subtree /Runinfo corrected successfully
Password protection is off
Hostlist off, connections from anywhere will be accepted
Listening on "http://localhost:8080", passwords OFF, hostlist OFF
Listening on "http://[::1]:8080", passwords OFF, hostlist OFF
bm_lock_buffer: Lock buffer "SYSMSG" is taking longer than 1 second!
[mhttpd,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, exiting...
[mhttpd,ERROR] [midas.cxx:2205:cm_check_connect,ERROR] cm_disconnect_experiment not called at end of program
db_lock_database: Detected recursive call to db_{lock,unlock}_database() while already inside db_{lock,unlock}_database(). Maybe this is a call from a signal handler. Cannot continue, aborting...
Aborted (core dumped)
[lekhraj@sdfcdmsdaq setup]$

We have been running everything as a single user, the user who cloned the repositories and owns the directories.

We did follow the corrupted-ODB cleanup instructions.

[lekhraj@sdfcdmsdaq setup]$ ls -lh /dev/shm
total 1.3M
-rw------- 1 lekhraj dm 1.2M Oct  8 14:13 17468_test_ODB__sdf_home_l_lekhraj_packages_SuperCDMS_DAQ_MidasDAQ_online_
-rw------- 1 lekhraj dm 114K Oct  7 14:06 17468_test_SYSMSG__sdf_home_l_lekhraj_packages_SuperCDMS_DAQ_MidasDAQ_online_

[lekhraj@sdfcdmsdaq setup]$ ls -lh ~/packages/SuperCDMS_DAQ/MidasDAQ/online/.*SHM
-rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.ALARM.SHM
-rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.ELOG.SHM
-rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.HISTORY.SHM
-rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.LAZY.SHM
-rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.MSG.SHM
-rw-r--r-- 1 lekhraj dm 1.2M Oct  8 14:12 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.ODB.SHM
-rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.SYSMSG.SHM
-rw-r--r-- 1 lekhraj dm    0 Oct  3 08:46 /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/MidasDAQ/online/.SYSTEM.SHM
  2866   08 Oct 2024 Amy RobertsBug ReportDifficulty running MIDAS on Rocky 9.4
> > We're trying to install the SuperCDMS version of MIDAS on a Rocky 9.4 Virtual 
> > Machine and are getting a persistent error when we run mserver.
> >
> > [mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
> > timeout 10000 ms, exiting...
> > db_lock_database: Detected recursive call to db_{lock,unlock}_database() while 
> > already inside db_{lock,unlock}_database(). Maybe this is a call from a signal 
> > handler. Cannot continue, aborting...
> > Aborted (core dumped)
> 
> This is super very bad. Since you have a core dump, please post the stack trace here (or email it to me).
> 
> I probably cannot debug your private version of midas and I will recommend that you install and run vanilla midas 
> mserver (just while we debug this problem).
> 
> Let's look at the core dump stack trace first, but likely we see a problem with System-V semaphores and hopefully it 
> is not some breakage due to Red Hat bogosity or due to something specific to running on a virtual machine.
> 
> If indeed this is Linux-kernel level breakage of System-V semaphores, solution would be to start using Posix 
> semaphores, something I wanted to do for a long time. We already switched MIDAS shared memory from System-V to Posix 
> shared memory.
> 
> If we are lucky it is just one more crasher bug in ODB. Let's see that core dump stack trace.
> 
> K.O.

I've uploaded the current core dump at: https://gitlab.com/det-lab/coredumps#.

This was done using the "CDMS" version of MIDAS, I'll compile the current MIDAS repository just to be sure we're seeing 
the same error and report back here!
  2873   10 Oct 2024 Amy RobertsBug ReportDifficulty running MIDAS on Rocky 9.4
> > I've uploaded the current core dump at: https://gitlab.com/det-lab/coredumps#.
> 
> I cannot read the core dump without the corresponding executable (and likely all it's shared libraries).
> 
> It is best if you run gdb and extract the stack traces on your end.
> 
> In case you are not familiar with gdb:
> 
> gdb mserver core # start gdb
> bt # stack trace of crashed thread
> info thr # get list of threads
> thr 1
> bt
> thr 2
> bt
> # etc, get stack trace of each thread, there should not be too many of them
> 
> K.O.

Hi Konstantin, thanks for the instructions.  I do appear to be missing some debug symbols, but the output 
looks potentially useful:

[lekhraj@sdfcdmsdaq ~]$ gdb mserver 
core.mserver.17468.b174bb74f2bb44f9a0905e78ec6b2677.601715.1728422354000000
GNU gdb (GDB) Rocky Linux 10.2-11.1.el9_3
...
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from mserver...
[New LWP 601715]

warning: Section `.reg-xstate/601715' in core file too small.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `mserver'.
Program terminated with signal SIGABRT, Aborted.

warning: Section `.reg-xstate/601715' in core file too small.
#0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-83.el9.12.x86_64 libgcc-11.4.1-
3.el9.x86_64 libstdc++-11.4.1-2.1.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 mysql-libs-8.0.36-1.el9_3.x86_64 
openssl-libs-3.0.7-25.el9_3.x86_64 zlib-1.2.11-40.el9.x86_64
(gdb)
(gdb) bt
#0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
#2  0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
#3  0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
#4  0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date 
format",
    hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#5  db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format", 
subhKey=0x7ffcc536d348)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#6  0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
    key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
    s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
#7  0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>, 
filename=0x7ffcc536d690,
    linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
#8  0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
    message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
#9  0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
#10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
#11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
#12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
#13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0, 
hDB=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
    subhKey=subhKey@entry=0x7ffcc536da04) at 
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#15 0x0000000000455fd2 in al_check () at 
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
--Type <RET> for more, q to quit, c to continue without paging--
#16 0x000000000041ff85 in cm_periodic_tasks ()
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
#17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
#18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
(gdb) info thr
  Id   Target Id                          Frame
* 1    Thread 0x7fbdec0b1740 (LWP 601715) 0x00007fbdeaca154c in __pthread_kill_implementation () from 
/lib64/libc.so.6
(gdb) thr 1
[Switching to thread 1 (Thread 0x7fbdec0b1740 (LWP 601715))]
#0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
#2  0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
#3  0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
#4  0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date 
format",
    hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#5  db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format", 
subhKey=0x7ffcc536d348)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#6  0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
    key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
    s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
#7  0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>, 
filename=0x7ffcc536d690,
    linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
#8  0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
    message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
#9  0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
#10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
#11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
#12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
#13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0, 
hDB=1)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
#14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
    subhKey=subhKey@entry=0x7ffcc536da04) at 
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
#15 0x0000000000455fd2 in al_check () at 
/sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
#16 0x000000000041ff85 in cm_periodic_tasks ()
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
#17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
#18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
    at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
(gdb)
  2877   16 Oct 2024 Amy RobertsBug ReportDifficulty running MIDAS on Rocky 9.4
> Thank you for the stack trace, I fixed the buglet that cause midas programs to crash twice,
> once on failure to lock ODB, then call exit() -> atexit() handlers -> cm_check_connect() -> crash on ODB lock 
> failure is the cm_msg() codes.
> 
> Replaced exit(1) with abort(). Could have used kill(getpid(),SIGKILL) to avoid making a core dump, but what the 
> heck...
> 
> Of course this does nothing to the original bug where ODB was locked and nobody will ever unlock it (reboot will 
> unlock it!).
> 
> commit bdd1d7fdc093b5a8d54a1b8467002bb3cac3ac11
> 
> K.O.
> 
> 
> > > > I've uploaded the current core dump at: https://gitlab.com/det-lab/coredumps#.
> > > 
> > > I cannot read the core dump without the corresponding executable (and likely all it's shared libraries).
> > > 
> > > It is best if you run gdb and extract the stack traces on your end.
> > > 
> > > In case you are not familiar with gdb:
> > > 
> > > gdb mserver core # start gdb
> > > bt # stack trace of crashed thread
> > > info thr # get list of threads
> > > thr 1
> > > bt
> > > thr 2
> > > bt
> > > # etc, get stack trace of each thread, there should not be too many of them
> > > 
> > > K.O.
> > 
> > Hi Konstantin, thanks for the instructions.  I do appear to be missing some debug symbols, but the output 
> > looks potentially useful:
> > 
> > [lekhraj@sdfcdmsdaq ~]$ gdb mserver 
> > core.mserver.17468.b174bb74f2bb44f9a0905e78ec6b2677.601715.1728422354000000
> > GNU gdb (GDB) Rocky Linux 10.2-11.1.el9_3
> > ...
> > For help, type "help".
> > Type "apropos word" to search for commands related to "word"...
> > Reading symbols from mserver...
> > [New LWP 601715]
> > 
> > warning: Section `.reg-xstate/601715' in core file too small.
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> > Core was generated by `mserver'.
> > Program terminated with signal SIGABRT, Aborted.
> > 
> > warning: Section `.reg-xstate/601715' in core file too small.
> > #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> > Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-83.el9.12.x86_64 libgcc-11.4.1-
> > 3.el9.x86_64 libstdc++-11.4.1-2.1.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 mysql-libs-8.0.36-1.el9_3.x86_64 
> > openssl-libs-3.0.7-25.el9_3.x86_64 zlib-1.2.11-40.el9.x86_64
> > (gdb)
> > (gdb) bt
> > #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> > #1  0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
> > #2  0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
> > #3  0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
> > #4  0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date 
> > format",
> >     hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> > #5  db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format", 
> > subhKey=0x7ffcc536d348)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> > #6  0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
> >     key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
> >     s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
> > #7  0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>, 
> > filename=0x7ffcc536d690,
> >     linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
> > #8  0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
> >     message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
> > timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
> > #9  0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
> > #10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
> > #11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
> > #12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
> > #13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0, 
> > hDB=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> > #14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
> >     subhKey=subhKey@entry=0x7ffcc536da04) at 
> > /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> > #15 0x0000000000455fd2 in al_check () at 
> > /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
> > --Type <RET> for more, q to quit, c to continue without paging--
> > #16 0x000000000041ff85 in cm_periodic_tasks ()
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
> > #17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
> > #18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
> > (gdb) info thr
> >   Id   Target Id                          Frame
> > * 1    Thread 0x7fbdec0b1740 (LWP 601715) 0x00007fbdeaca154c in __pthread_kill_implementation () from 
> > /lib64/libc.so.6
> > (gdb) thr 1
> > [Switching to thread 1 (Thread 0x7fbdec0b1740 (LWP 601715))]
> > #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> > (gdb) bt
> > #0  0x00007fbdeaca154c in __pthread_kill_implementation () from /lib64/libc.so.6
> > #1  0x00007fbdeac54d06 in raise () from /lib64/libc.so.6
> > #2  0x00007fbdeac287f3 in abort () from /lib64/libc.so.6
> > #3  0x0000000000430ee4 in db_lock_database (hDB=hDB@entry=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2473
> > #4  0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536d348, key_name=0x4687a8 "/Logger/Message file date 
> > format",
> >     hKey=0, hDB=1) at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> > #5  db_find_key (hDB=1, hKey=0, key_name=0x4687a8 "/Logger/Message file date format", 
> > subhKey=0x7ffcc536d348)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> > #6  0x0000000000448297 in db_get_value_string (hdb=1, hKeyRoot=hKeyRoot@entry=0,
> >     key_name=key_name@entry=0x4687a8 "/Logger/Message file date format", index=index@entry=0,
> >     s=s@entry=0x7ffcc536d470, create=create@entry=1, create_string_length=0)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:13950
> > #7  0x000000000040a690 in cm_msg_get_logfile (fac=<optimized out>, t=<optimized out>, 
> > filename=0x7ffcc536d690,
> >     linkname=0x7ffcc536d6b0, linktarget=0x7ffcc536d6d0)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:573
> > #8  0x000000000041a307 in cm_msg_log (message_type=1, facility=0x46db0e "midas",
> >     message=0x7e4290 "[mserver,ERROR] [odb.cxx:2498:db_lock_database,ERROR] cannot lock ODB semaphore, 
> > timeout 10000 ms, exiting...") at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:685
> > #9  0x0000000000421fcd in cm_msg_flush_buffer () at /usr/include/c++/11/bits/basic_string.h:194
> > #10 0x00007fbdeac574dd in __run_exit_handlers () from /lib64/libc.so.6
> > #11 0x00007fbdeac57620 in exit () from /lib64/libc.so.6
> > #12 0x0000000000430f7a in db_lock_database (hDB=hDB@entry=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:2499
> > #13 0x0000000000437e9c in db_find_key (subhKey=0x7ffcc536da04, key_name=0x476a21 "/Alarms/Alarms", hKey=0, 
> > hDB=1)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4099
> > #14 db_find_key (hDB=1, hKey=hKey@entry=0, key_name=key_name@entry=0x476a21 "/Alarms/Alarms",
> >     subhKey=subhKey@entry=0x7ffcc536da04) at 
> > /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/odb.cxx:4075
> > #15 0x0000000000455fd2 in al_check () at 
> > /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/alarm.cxx:614
> > #16 0x000000000041ff85 in cm_periodic_tasks ()
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5596
> > #17 0x00000000004235c5 in cm_yield (millisec=millisec@entry=1000)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/src/midas.cxx:5676
> > #18 0x00000000004065c2 in main (argc=<optimized out>, argv=0x7ffcc536e628)
> >     at /sdf/home/l/lekhraj/packages/SuperCDMS_DAQ/midas_fork/progs/mserver.cxx:295
> > (gdb)

I checked out the modified version of Midas and recompiled, and am still getting a similar error when I try to run 
odbedit:

[aroberts@sdfcdmsdaq midas]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 
1615051 does not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/1615051' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1615051 does not 
exist
[ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
Aborted (core dumped)

I'm not sure what's causing the call to lock the database, all I'm doing is typing "odbedit" in the command prompt.

I should add that I followed the instructions for unlocking the ODB, but when I call "odbedit" this error still 
appears:

[aroberts@sdfcdmsdaq midas]$ ipcs -s -t

------ Semaphore Operation/Change Times --------
semid    owner      last-op                    last-changed
4        aroberts    Wed Oct 16 11:44:10 2024   Wed Oct 16 11:44:00 2024

[aroberts@sdfcdmsdaq midas]$ ipcrm sem 4
resource(s) deleted
[aroberts@sdfcdmsdaq midas]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 
1617050 does not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/1617050' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1617050 does not 
exist
[ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
Aborted (core dumped)
  2884   28 Oct 2024 Amy RobertsBug ReportDifficulty running MIDAS on Rocky 9.4
> Now for each timeout it will print detailed syscall and timing information, if time goes backwards, it should catch it.

It appears that time is moving forward:

[aroberts@sdfcdmsdaq build]$ odbedit
[ODBEdit,ERROR] [odb.cxx:2043:db_open_database,ERROR] Removed ODB client 'ODBEdit', index 0 because process pid 1617119 does 
not exists
[ODBEdit,INFO] Removed open record flag from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Removed exclusive access mode from "/Experiment/Security/RPC hosts/Allowed hosts"
[ODBEdit,INFO] Corrected 1 ODB entries
[ODBEdit,INFO] Deleted entry '/System/Clients/1617119' for client 'ODBEdit' because it is not connected to ODB
[ODBEdit,INFO] Client 'ODBEdit' on buffer 'SYSMSG' removed by bm_open_buffer because process pid 1617119 does not exist
[local:amy_test:S]/>ss_semaphore_wait_for: semop/semtimedop(5) returned -1, errno 11 (Resource temporarily unavailable), 
start time 0xd4fd98f6, now 0xd4fdc0ef, dt 0x000027f9, timeout 0x00002710 ms, SEMAPHORE TIMEOUT!
[ODBEdit,ERROR] [odb.cxx:2489:db_lock_database,ERROR] cannot lock ODB semaphore, timeout 10000 ms, aborting...
Aborted (core dumped)
  2889   06 Nov 2024 Amy RobertsBug ReportDifficulty running MIDAS on Rocky 9.4
After following Konstantin's debugging suggestions, I thought I would try to replicate 
the issue on my own computer.  My hope was that I could provide instructions for 
replicating the bug so that the MIDAS team could try debugging things more easily.

However, when I ran the current version of MIDAS in a Rocky 9.4 VM on my laptop (both 
VMWare and VirtualBox), mserver and odbedit ran just fine (!).

I'm currently trying to find out if there's a way to compare the VMs on my machine and 
the machine that's being problematic, I'll report back if I learn anything.
  1751   06 Jan 2020 Alireza TalebitaherForumSSL_ERROR_NO_CYPHER_OVERLAP
Hello,

I am quite new in both Linux and MIDAS.
I have install MIDAS on my desktop by going through this link: 
https://midas.triumf.ca/MidasWiki/index.php/Quickstart_Linux 

in the last step when I send "mhttpd" command and try to open the link 
https://localhost:8443 (of course, changing the localhost with my host name), it 
failed to connect and shows this error: SSL_ERROR_NO_CYPHER_OVERLAP (please see 
attached file includes a screenshot of the error).

I have tried many ways to solve this problem: In Firefox: going to option/privacy 
and security/ security and uncheck the option "Block dangerous and deceptive 
content". but it does not help.

Looking forward your help
Thanks
Mehran
Attachment 1: MIDAS_SSL_ERROR.png
MIDAS_SSL_ERROR.png
  1753   07 Jan 2020 Alireza TalebitaherForumSSL_ERROR_NO_CYPHER_OVERLAP
Hi Konstantin,
Thanks for your reply, 

> What Linux? (on most linuxes, run "lsb_release -a")
> What version of midas? (run odbedit "ver" command)
I am using CentOS 8

> What version of firefox? (from the "about firefox" menu)
Firefox 71.0

Thanks 
Mehran

> No you cannot fix it from inside firefox. The issue is that the overlap of encryption methods
> supported by your firefox and by your openssl library (used by mhttpd) is an empty set.
> No common language, so to say, communication is impossible.
> 
> So either you have a very old openssl but very new firefox, or a very new openssl but very old 
> firefox. Both very old or both very new can talk to each other, difficulties start with greater  
> difference in age, as new (better) encryption methods are added and old (no-longer-secure) 
> methods are banished.
> 
> BTW, for good security we recommend using apache httpd as the https proxy (instead of built-in 
> https support in mhttpd). (I am not sure what it says in the current documentation). (But apache 
> httpd will use the same openssl library, so this may not solve your problem. Let's see what 
> versions of software you are using, per questions above, first).
> 
> K.O.
  1755   08 Jan 2020 Alireza TalebitaherForumSSL_ERROR_NO_CYPHER_OVERLAP
Hi,
As, the link suggests, I perform "yum install -y mod_ssl certwatch crypto-utils" but it complains as:
No match for argument: certwatch
No match for argument: crypto-utils

You may have a look on this link: https://blog.cloudware.bg/en/whats-new-in-centos-linux-8/
What’s gone?
In with the new, out with the old. CentOS 8 also says goodbye to some features. The OS removes several security functionalities. Among them is the Clevis HTTP pin, Coolkey and crypto-utils.
Cent OS 8 comes with securetty disabled by default. The configuration file is no longer included. You can add it back, but you will have to do it yourself. Another change is that shadow-utils no longer allow all-numeric user and group names.

Thanks
Mehran

> Hi, I have not run midas on Centos-8 yet. Maybe there is a problem with the openssl library there. The Centos-7 
> instructions for setting up apache httpd proxy are here, with luck they work on centos-8:
> https://daq.triumf.ca/DaqWiki/index.php/SLinstall#Configure_HTTPS_server_.28CentOS7.29
> 
> K.O.
> 
  1006   06 Jun 2014 Alexey KalininForumproblem with writing data on disk
Hello,
Our experiment based on MIDAS 2.x DAQ.
I'm using several identical frontend-%d  with only lam source & event id changed, 
running on 2 computers(~3frontends per one).
Each recieve about 10k Events (Max_SIZE =8*1024, but usually it is less then 
sizeof(DWORD)*400) per 7sec.
With no mlogger running it works just fine, but when I'm starting mlogger (on 3-d 
computer with mserver running)... looking at ethernet stat graph first 2-3 spills 
goes well, with one peak per 7 sec, then it becomes junky and everithing crushed 
(mlogger and frontends).
I tried to increase SYSTEM buffer and restart everything. What I saw was Logger 
writes only half of recieved events from sum of frontends, it stays running for 
awhile ~15minutes. If I push STOP button  before crashing, mlogger continious 
writing data on disk enough priod of time.
I will try to look at disk usage for bad sectors @HDD, but may be there is an easy 
way to fix this problem and i did something wrong. 
structure of frontend has code like
EQ_POLLED , POLL for 500,

frontend_loop{
read big buffer with 10k events;bufferread=true;
}

poll_event{
for (i=0;i<count;i++){
 if (bufferread) lam=1;
 if (!test) return lam;
 }
return 0;
}

read_trigger{
bk_init32();
//fill event with buffer until current word!=0xffffffff
if (currentposition+2 >buffer_size) bufferread=false
}
|
Help needed, please. Suggestions.
Thanks, Alexey.
  1009   16 Jun 2014 Alexey KalininForumproblem with writing data on disk
Hello, once again.
What I found is when I tryed to stop the run, mlogger still working and writing some 
data, that i'm sure is not right, because frontend's are in stopped state
( for ex. every 3*frontend got 50k, mlogger showes 120k . Stop button pushed, but data 
in .mid file collect more then 150k~300k ev)
. And it continue writing until it crashes by the default waiting period 10s.
  1010   18 Jun 2014 Alexey KalininForumproblem with writing data on disk
Hello, 
I'm in deppression.
I removed Everything from computer with mserver and reinstall system and midas.
Then I tried to run tutorial example.
Often run did not stop by pushing STOP button (mlogger stuck it, odbedit stop 
works)
After first START button pushed number of event taken by frontend equals mlogger 
events 
written. Next run (without mlogger restarting) mlogger double the number of 
events taken by 
frontend.(see attachment).Restarting mlogger fix this double counting.
What i've did wrong?
Attachment 1: 39.png
39.png
  2307   02 Dec 2021 Alexey KalininBug Reportsome frontend kicked by cm_periodic_tasks
Hello,
We have a small experiment with MIDAS based DAQ.
Status page shows :
ES	ESFrontend@192.168.0.37	207	0.2	0.000
Trigger06	Sample Frontend06@192.168.0.37	1.297M	0.3	0.000
Trigger01	Sample Frontend01@192.168.0.37	1.297M	0.3	0.000
Trigger16	Sample Frontend16@192.168.0.37	1.297M	0.3	0.000
Trigger38	Sample Frontend38@192.168.0.37	1.297M	0.3	0.000
Trigger37	Sample Frontend37@192.168.0.37	1.297M	0.3	0.000
Trigger03	Sample Frontend03@192.168.0.38	1.297M	0.3	0.000
Trigger07	Sample Frontend07@192.168.0.38	1.297M	0.3	0.000
Trigger04	Sample Frontend04@192.168.0.38	59898	0.0	0.000
Trigger08	Sample Frontend08@192.168.0.38	59898	0.0	0.000
Trigger17	Sample Frontend17@192.168.0.38	59898	0.0	0.000


And SYSTEM buffers page shows:
ESFrontend	1968	198	47520	0	0x00000000	0		
193 ms
Sample Frontend06	1332547	1330826	379729872	0	0x00000000	
0		1.1 sec
Sample Frontend16	1332542	1330839	361988208	0	0x00000000	
0		94 ms
Sample Frontend37	1332530	1330841	337798408	0	0x00000000	
0		1.1 sec
Sample Frontend01	1332543	1330829	467136688	0	0x00000000	
0		34 ms
Sample Frontend38	1332528	1330830	291453608	0	0x00000000	
0		1.1 sec
Sample Frontend04	63254	61467	20882584	0	0x00000000	
0		208 ms
Sample Frontend08	63262	61476	27904056	0	0x00000000	
0		205 ms
Sample Frontend17	63271	61473	20433840	0	0x00000000	
0		213 ms
Sample Frontend03	1332549	1330818	386821728	0	0x00000000	
0		82 ms
Sample Frontend07	1332554	1330821	462210896	0	0x00000000	
0		37 ms
Logger	968742	0w+9500418r	0w+2718405736r	0	0x00000000	0	
GET_ALL Used 0 bytes 0.0%	303 ms
rootana	254561	0w+29856958r	0w+8718288352r	0	0x00000000	0		
762 ms


The problem is that eventually some of frontend closed with message 
:19:22:31.834 2021/12/02 [rootana,INFO] Client 'Sample Frontend38' on buffer 
'SYSMSG' removed by cm_periodic_tasks because process pid 9789 does not exist

in the meantime mserver loggging :
mserver started interactively
mserver will listen on TCP port 1175
double free or corruption (!prev)
double free or corruption (!prev)
free(): invalid next size (normal)
double free or corruption (!prev)


I can find some correlation between number of events/event size produced by 
frontend, cause its failed when its become big enough. 

frontend scheme is like this:

poll event time set to 0;

poll_event{
//if buffer not transferred return (continue cutting the main buffer)
//read main buffer from hardware
//buffer not transfered
}

read event{
// cut the main buffer to subevents (cut one event from main buffer) return;
//if (last subevent) {buffer transfered ;return}
}

What is strange to me that 2 frontends (1 per remote pc) causing this.

Also, I'm executing one FEcode with -i # flag , put setting eventid in 
frontend_init , and using SYSTEM buffer for all.

Is there something I'm missing?
Thanks. 
A.
  2337   11 Feb 2022 Alexey KalininBug Reportsome frontend kicked by cm_periodic_tasks
Thanks for the answer.
As soon as I can(possible in a month) I'll try suggestion below:

> One thing to try is set the write cache size to zero and see if your crash goes away. I see
> some indication of something rotten in the event buffer code if write cache is enabled. This
> is set in ODB "/Eq/XXX/Common/Write Cache Size", set it to zero. (beware recent confusion
> where odb settings have no effect depending on value of "equipment_common_overwrite").

I tried to change this ODB for one of the frontend via mhttpd/browser, and eventually it goes back 
to default value (1000 as I remember). but this frontend has the minimum rate 50DWORD/~10sec. and 
depending on cashe size it appears in mdump once per 31 events but all aff them . SO its different 
story, but m.b. it has the same solution to play with Write Cashe Size.    


double free message goes from mserver terminal. 
all of the frontends are remote.
I can't exclude crashes of frontend , but when I run ./frontend -i 1(2,3 etc) thet means that I run 
one code for all, and only several causes crash.also I found that crash in frontend happened while 
it do nothing with collected data (last event reached and new data is not ready), but it tries to 
watch for the ODB changes.I mean it crashes iside (while {odb_changes(value in watchdog)}),and I don't 
know what else happenned meanwhile with cahed buffer.

Future plans is to use event buider for frontends when data/signals will be perfectly reasonable 
i/e/ without broken events. for now i kinda worry about if one of frontends will skip one of the 
event inside its buffer.


Thanks for the way to dig into.
A.    

> > The problem is that eventually some of frontend closed with message 
> > :19:22:31.834 2021/12/02 [rootana,INFO] Client 'Sample Frontend38' on buffer 
> > 'SYSMSG' removed by cm_periodic_tasks because process pid 9789 does not exist
> 
> This messages means what it says. A client was registered with the SYSMSG buffer and this 
> client had pid 9789. At some point some other client (rootana, in this case) checked it and 
> process pid 9789 was no longer running. (it then proceeded to remove the registration).
> 
> There is 2 possibilities:
> - simplest: your frontend has crashed. best to debug this by running it inside gdb, wait for 
> the crash.
> - unlikely: reported pid is bogus, real pid of your frontend is different, the client 
> registration in SYSMSG is corrupted. this would indicate massive corruption of midas shared 
> memory buffers, not impossible if your frontend misbehaves and writes to random memory 
> addresses. ODB has protection against this (normally turned off, easy to enable, set ODB 
> "/experiment/protect odb" to yes), shared memory buffers do not have protection against this 
> (should be added?).
> 
> Do this. When you start your frontend, write down it's pid, when you see the crash message, 
> confirm pid number printed is the same. As additional test, run your frontend inside gdb, 
> after it crashes, you can print the stack trace, etc.
> 
> > 
> > in the meantime mserver loggging :
> > mserver started interactively
> > mserver will listen on TCP port 1175
> > double free or corruption (!prev)
> > double free or corruption (!prev)
> > free(): invalid next size (normal)
> > double free or corruption (!prev)
> > 
> 
> Are these "double free" messages coming from the mserver or from your frontend? (i.e. you run 
> them in different terminals, not all in the same terminal?).
> 
> If messages are coming from the mserver, this confirms possibility (1),
> except that for frontends connected remotely, the pid is the pid of the mserver,
> and what we see are crashes of mserver, not crashes of your frontend. These are much harder to 
> debug.
> 
> You will need to enable core dumps (ODB /Experiment/Enable core dumps set to "y"),
> confirm that core dumps work (i.e. "killall -SEGV mserver", observe core files are created
> in the directory where you started the mserver), reproduce the crash, run "gdb mserver 
> core.NNNN", run "bt" to print the stack trace, post the stack trace here (or email to me 
> directly).
> 
> >
> > I can find some correlation between number of events/event size produced by 
> > frontend, cause its failed when its become big enough. 
> > 
> 
> There is no limit on event size or event rate in midas, you should not see any crash
> regardless of what you do. (there is a limit of event size, because an event has
> to fit inside an event buffer and event buffer size is limited to 2 GB).
> 
> Obviously you hit a bug in mserver that makes it crash. Let's debug it.
> 
> One thing to try is set the write cache size to zero and see if your crash goes away. I see
> some indication of something rotten in the event buffer code if write cache is enabled. This
> is set in ODB "/Eq/XXX/Common/Write Cache Size", set it to zero. (beware recent confusion
> where odb settings have no effect depending on value of "equipment_common_overwrite").
> 
> >
> > frontend scheme is like this:
> > 
> 
> Best if you use the tmfe c++ frontend, event data handling is much simpler and we do not
> have to debug the convoluted old code in mfe.c.
> 
> K.O.
> 
> >
> > poll event time set to 0;
> > 
> > poll_event{
> > //if buffer not transferred return (continue cutting the main buffer)
> > //read main buffer from hardware
> > //buffer not transfered
> > }
> > 
> > read event{
> > // cut the main buffer to subevents (cut one event from main buffer) return;
> > //if (last subevent) {buffer transfered ;return}
> > }
> > 
> > What is strange to me that 2 frontends (1 per remote pc) causing this.
> > 
> > Also, I'm executing one FEcode with -i # flag , put setting eventid in 
> > frontend_init , and using SYSTEM buffer for all.
> > 
> > Is there something I'm missing?
> > Thanks. 
> > A.
ELOG V3.1.4-2e1708b5