Midas DAQ System ELOG, Page 17 of 142
  2528   12 Jun 2023   Konstantin Olchanski   Suggestion   Maximum ODB size
> > correction: ODB shared memory is saved to .ODB.SHM each time a client stops, this is db_close_database().
> 
> The original design of the midas shared memory (back in the 1990's) was that the ODB shared memory file gets
> saved into .ODB.SHM only when the *last* client exits. This ensures that the ODB stays persistent when the
> shared memory gets deleted. I vaguely remember putting in something like:
> 
> db_close_database()
> ...
>   destroy_flag = (pheader->num_clients == 0);
> 
>   if (destroy_flag)
>      ss_shm_flush(pheader->name, pdb->shm_adr, pdb->shm_size, pdb->shm_handle);

I remember the same, but I tracked it down in git to the very first commit, and there is no if() there:
the ODB is saved to .ODB.SHM on every client shutdown, not just for the last client. I guess we both misremembered.

What's more, ss_shm_flush() is done while holding the ODB semaphore, so all other midas programs that try to access
the ODB at the same time (including the mserver) will stall until write() and close() return. At least we do not fsync(),
and there is no waiting until the data is committed to physical media.

$ git annotate 3bb04af4d^ src/odb.c
...
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	875)  destroy_flag = (pheader->num_clients == 0);
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	876)
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	877)  /* flush shared memory to disk */
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	878)  ss_flush_shm(pheader->name, pheader, sizeof(DATABASE_HEADER)+2*pheader->data_size);
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	879)
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	880)  /* unmap shared memory, delete it if we are the last */
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	881)  ss_close_shm(pheader->name, pheader,
ef8320177	(Stefan Ritt	1998-10-08 13:46:02 +0000	882)               _database[hDB-1].shm_handle, destroy_flag);
...

K.O.
  2527   12 Jun 2023   Stefan Ritt   Suggestion   Maximum ODB size
> correction: ODB shared memory is saved to .ODB.SHM each time a client stops, this is db_close_database().

The original design of the midas shared memory (back in the 1990's) was that the ODB shared memory file gets
saved into .ODB.SHM only when the *last* client exits. This ensures that the ODB stays persistent when the
shared memory gets deleted. I vaguely remember putting in something like:

db_close_database()
...
  destroy_flag = (pheader->num_clients == 0);

  if (destroy_flag)
     ss_shm_flush(pheader->name, pdb->shm_adr, pdb->shm_size, pdb->shm_handle);
...

Now I see that the "if (destroy_flag)" is missing. Not sure if it was removed at some point, or if it actually never
was there. But I see no point in flushing the ODB when a client ends. We need the flushing only before the
shared memory gets deleted. If we have to ensure that the shared memory and the binary dump file stay in sync
(for example if all midas clients die at the same time), we could add some code to flush the ODB, say, once per minute,
but not attach it to db_close_database(). I know several experiments using "odbedit -c xxx" in vast quantities,
so all these experiments would then benefit.
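
A minimal sketch of what such a periodic flush could look like (this is not existing MIDAS code: the ss_shm_flush() arguments follow the call quoted above, DATABASE/DATABASE_HEADER are the internal ODB structures from the MIDAS headers, and the periodic hook itself is hypothetical):

#include <time.h>

#define ODB_FLUSH_PERIOD 60            /* seconds */

static time_t _last_odb_flush = 0;

/* call this from some periodic task, while holding the ODB semaphore,
   instead of flushing inside db_close_database() */
static void maybe_flush_odb(DATABASE_HEADER *pheader, DATABASE *pdb)
{
   time_t now = time(NULL);
   if (now - _last_odb_flush >= ODB_FLUSH_PERIOD) {
      ss_shm_flush(pheader->name, pdb->shm_adr, pdb->shm_size, pdb->shm_handle);
      _last_odb_flush = now;
   }
}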

Note: Mu3e at PSI also uses a 100 MB ODB, and they really need it.

Thoughts and opinions?

Best,
Stefan
  2526   09 Jun 2023   Konstantin Olchanski   Info   added IPv6 support for mserver and MIDAS RPC
As of commit 71fb5e82f3e9a1501b4dc2430f9193ee5d176c82, MIDAS RPC and the mserver 
listen for connections on both IPv4 and IPv6. mserver clients and MIDAS RPC 
clients can connect to MIDAS using both IPv4 and IPv6. In the default 
configuration ("/Expt/Security/Enable non-localhost RPC" set to "n"), IPv4 
localhost is used, as before. Support for IPv6 is a by-product of switching 
from the obsolete non-thread-safe gethostbyname() and gethostbyaddr() to the modern 
getaddrinfo() and getnameinfo(). This fixes bug 357, an observed crash of mhttpd 
inside gethostbyname(). K.O.
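
For reference, a generic sketch of the getaddrinfo()-based dual-stack listening technique described above (not the actual MIDAS code; the helper name listen_any() is made up):

#include <string.h>
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>

/* create a listening socket on the given port, IPv4 or IPv6,
   using whichever address getaddrinfo() returns and bind() accepts */
int listen_any(const char *port)
{
   struct addrinfo hints, *res, *r;
   int fd = -1;

   memset(&hints, 0, sizeof(hints));
   hints.ai_family   = AF_UNSPEC;    /* both IPv4 and IPv6 */
   hints.ai_socktype = SOCK_STREAM;
   hints.ai_flags    = AI_PASSIVE;   /* wildcard address for listening */

   if (getaddrinfo(NULL, port, &hints, &res) != 0)
      return -1;

   for (r = res; r != NULL; r = r->ai_next) {
      fd = socket(r->ai_family, r->ai_socktype, r->ai_protocol);
      if (fd < 0)
         continue;
      if (bind(fd, r->ai_addr, r->ai_addrlen) == 0 && listen(fd, 5) == 0)
         break;                      /* success */
      close(fd);
      fd = -1;
   }
   freeaddrinfo(res);
   return fd;                        /* -1 on failure */
}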
  2525   09 Jun 2023   Konstantin Olchanski   Suggestion   Maximum ODB size
> > 1) The ODB shared memory is dumped into a binary file (".ODB.SHM") after the last client finished ...

correction: ODB shared memory is saved to .ODB.SHM each time a client stops, this is db_close_database().

I have just run into a problem with this in the DRAGON experiment. At the beginning and end of each run they run
a script that makes a large number of "odbedit" calls to read stuff from ODB, and it was taking a very long time.
Each odbedit invocation was taking about 1 second: starting odbedit is quick, stopping odbedit takes about 1 second.

It turns out each invocation of odbedit saves .ODB.SHM; the ODB was 100 Mbytes in size and the home disk is an HDD (~100-200 Mbytes/sec write speed), so yes, about 1 second to 
stop odbedit.

The solution was to reduce the ODB size from 100 Mbytes to 10 Mbytes; odbedit now runs quickly and the begin- and end-of-run scripts run quickly. Problem solved.

K.O.

P.S. No, I am not on the DRAGON experiment; no, I did not write those scripts; no, I will not rewrite them; the persons who wrote them are long gone; and no, the persons running 
DRAGON today will not be rewriting them.
  2524   02 Jun 2023   Kou Oishi   Bug Report   Event builder fails at every 10 runs
Dear Ben,

Hello. Thank you for your attention to this problem!

> It sounds like you might be calling bm_request_event() when starting a run, but not calling bm_delete_request() when the run stops. So you end up "leaking" event requests and eventually reach the limit of 10 open requests.

I understand. Thanks for the description.

> In examples/eventbuilder/mevb.c the request deletion happens in source_unbooking(), which is called as part of the "run stopping" logic. I've just updated the midas repository so the example compiles correctly, and was able to start/stop 15 runs without crashing.
> 
> Can you check the end-of-run logic in your version to ensure you're calling bm_delete_request()?

I really appreciate your update.
Although I am away from the DAQ development at the moment, I will test it and report the result here as soon as possible.

Best regards,
Kou
  2522   01 Jun 2023   Thomas Lindner   Info   MIDAS Workshop 2023
Dear MIDAS users,

We would like to arrange another MIDAS workshop, following on from previous successful workshops in 2015, 2017 and 2019.  The 
goals of the workshop would include:

- Getting updates from MIDAS developers on new features and other changes.
- Getting reports from MIDAS users on how they are using MIDAS, what is working and what is not
- Making plans for future MIDAS changes and improvements

This would be a one-day virtual workshop, planned for about 4 hours in length.  The workshop will probably take place after another of 
Stefan's visits to TRIUMF.

If you would be interested in participating in such a workshop, please help us choose the date by filling out this doodle poll:

https://doodle.com/meeting/organize/id/dBPVMQJa

Please fill in the poll by June 9, if you are interested.  We will announce the date soon after that.

Thanks,
Thomas
  2521   31 May 2023   Ben Smith   Bug Report   Event builder fails at every 10 runs
> The event builder fails to initiate the 10th run since its startup, 
> 'BM_NO_MEMORY: too many requests,'

Hi Kou,

It sounds like you might be calling bm_request_event() when starting a run, but not calling bm_delete_request() when the run stops. So you end up "leaking" event requests and eventually reach the limit of 10 open requests.

In examples/eventbuilder/mevb.c the request deletion happens in source_unbooking(), which is called as part of the "run stopping" logic. I've just updated the midas repository so the example compiles correctly, and was able to start/stop 15 runs without crashing.

Can you check the end-of-run logic in your version to ensure you're calling bm_delete_request()?
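
For illustration, a minimal sketch of pairing the two calls in the booking/unbooking logic. The function names source_booking()/source_unbooking() follow the mevb.c naming mentioned above, but this is not the actual mevb.c code: the buffer name, the callback and the exact signatures are placeholders, while bm_open_buffer(), bm_request_event(), bm_delete_request() and bm_close_buffer() are the real MIDAS API:

#include "midas.h"

static HNDLE hBuf       = 0;
static HNDLE request_id = -1;

/* event callback, signature as expected by bm_request_event() */
static void process_event(HNDLE hbuf, HNDLE req_id, EVENT_HEADER *pheader, void *pevent)
{
   /* assemble the event fragment here */
}

/* called when the run starts: open the buffer and book the request */
INT source_booking(void)
{
   INT status = bm_open_buffer("BUF1", DEFAULT_BUFFER_SIZE, &hBuf);
   if (status != BM_SUCCESS && status != BM_CREATED)
      return status;
   return bm_request_event(hBuf, EVENTID_ALL, TRIGGER_ALL, GET_ALL,
                           &request_id, process_event);
}

/* called when the run stops: delete the request again, otherwise it
   leaks and the MAX_EVENT_REQUESTS limit is hit after 10 runs */
INT source_unbooking(void)
{
   if (request_id >= 0) {
      bm_delete_request(request_id);
      request_id = -1;
   }
   if (hBuf) {
      bm_close_buffer(hBuf);
      hBuf = 0;
   }
   return SUCCESS;
}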
  2520   24 May 2023   Gennaro Tortone   Forum   pull request for PostgreSQL support
Hi,
is there any news regarding this pull request? 
(https://bitbucket.org/tmidas/midas/pull-requests/30)

If you agree to merge it, I can resolve the conflicts that are now 
(after two months) listed...

Regards,
Gennaro

> 
> Hi,
> I have updated the PR with a new one that includes TimescaleDB support and some
> changes to mhistory.js to support downsampling queries...
> 
> Cheers,
> Gennaro
> 
> > > some minutes ago I published a PR for PostgreSQL support I developed
> > > at INFN-Napoli for Darkside experiment...
> > > 
> > > I don't know if you receive a notification about this PR and in doubt
> > > I wrote this message...
> > 
> > Hi, Gennaro, thank you for the very useful contribution. I saw the previous version 
> > of your pull request and everything looked quite good. But that pull request was 
> > for an older version of midas and it would not have applied cleanly to the current 
> > version. I will take a look at your updated pull request. In theory it should only 
> > add the Postgres class and modify a few other places in history_schema.cxx and have 
> > no changes to anything else. (if you need those changes, it should be a separate 
> > pull request).
> > 
> > Also I am curious what benefits and drawbacks of Postgres vs mysql/mariadb you have 
> > observed for storing and using midas history data.
> > 
> > K.O.
  2519   23 May 2023   Kou Oishi   Bug Report   Event builder fails at every 10 runs
Dear MIDAS experts,

Greetings! 
I am currently utilizing MIDAS for our experiment and I have encountered an issue with our event builder, which was developed based on the example code 'eventbuilder/mevb.cxx'. I'm uncertain whether this is a genuine bug or an inherent feature of MIDAS.

The event builder fails to initiate the 10th run since its startup, requiring us to relaunch it. Upon investigating the code, I have identified that this issue stems from line 8404 of mfe.cxx (the version's hash is db94df6fa79772c49888da9374e143067a1fff3a). According to the code, the 10-run limit is imposed by the variable MAX_EVENT_REQUESTS in midas.h. While I can increase this value as the code suggests, it does not provide a complete solution, as the same problem will inevitably resurface. This complication unnecessarily hampers our data collection during long observation periods.

Despite the code indicating 'BM_NO_MEMORY: too many requests,' this explanation does not seem logical to me. In fact, other standard frontends do not encounter this problem and can start new runs as required without requiring a frontend relaunch.

I apologize for not yet fully grasping the intricate implementation of midas.cxx and mfe.cxx. However, I would greatly appreciate any suggestions or insights you can offer to help resolve this issue.

Thank you in advance for your kind assistance.
  2518   16 May 2023   Stefan Ritt   Bug Report   excessive logging of http requests
Maybe you remember the problems we had with a custom page in Japan being loaded from TRIUMF. It took almost one minute since each RPC request took 
about 1 s round-trip. This got fixed by the modb* scheme, where the framework collects all ODB variables in a custom page and puts them 
into ONE RPC request (making the path an actual array of paths). That reduced the requests from 100 to 1 in the above example. Maybe the same 
could be done in your current case. Pulling one ODB variable at a time is not very efficient.

Stefan
  2517   16 May 2023   Konstantin Olchanski   Bug Report   excessive logging of http requests
> Our default configuration of apache httpd logs every request. MIDAS custom web pages can easily make a huge number of RPC calls creating a 
> huge log file and filling system disk to 100% capacity

Perhaps use the existing logrotate: add a limit on the log file size ("size") and keep only 2 old log files ("rotate").

/etc/logrotate.d/httpd

/var/log/httpd/*log { 
    size 100M 
    rotate 2 
    missingok 
    notifempty 
    sharedscripts 
    delaycompress 
    postrotate 
        /bin/systemctl reload httpd.service > /dev/null 2>/dev/null || true 
    endscript 
} 

K.O.
  2516   16 May 2023   Konstantin Olchanski   Bug Report   excessive logging of http requests
Our default configuration of apache httpd logs every request. MIDAS custom web pages can easily make a huge number of RPC calls, creating a 
huge log file and filling the system disk to 100% capacity.

This example has around 100 RPC requests per second. Reasonable or unreasonable? The available hardware can handle it (web browser, network, 
httpd, mhttpd, etc.), so we should try to get this to work. Perhaps filter the apache httpd logs to exclude mjsonrpc requests? Of course we 
can also ask the affected experiment why they make so many RPC calls: is there a bug?

[14/May/2023:03:49:01 -0700] 142.90.111.176 TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 "POST /?mjsonrpc HTTP/1.1" 299
[14/May/2023:03:49:01 -0700] 142.90.111.176 TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 "POST /?mjsonrpc HTTP/1.1" 299
[14/May/2023:03:49:01 -0700] 142.90.111.176 TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 "POST /?mjsonrpc HTTP/1.1" 299
[14/May/2023:03:49:01 -0700] 142.90.111.176 TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 "POST /?mjsonrpc HTTP/1.1" 299
[14/May/2023:03:49:01 -0700] 142.90.111.176 TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384 "POST /?mjsonrpc HTTP/1.1" 299

K.O.
  2515   12 May 2023   Konstantin Olchanski   Info   New environment variable MIDAS_EXPNAME
> A new environment variable MIDAS_EXPNAME has been introduced [to be used together with MIDAS_DIR]

This fixes an important buglet. If an experiment uses MIDAS_DIR instead of exptab, at the time 
of connecting to ODB we do not know the experiment name, and we use the name "Default" to create the ODB 
shared memory instead of the actual experiment name.

This creates an inconsistency: if some MIDAS programs in the same experiment use MIDAS_DIR while 
others use exptab (this would be unusual, but not impossible), they would connect to two 
different ODB shared memories, the former using the name "Default", the latter using the actual experiment name.

As an indication that something is not right, when stopping MIDAS programs there is an error 
message about a failure to delete the ODB shared memory (because it was created using the name 
"Default", and deleting it using the correct experiment name fails).

Also it can cause co-mingling between two different experiments, depending on the type of shared 
memory used by MIDAS (see $MIDAS_DIR/.SHM_TYPE.TXT):
POSIX   - (usually not used) not affected (experiment name is not used)
POSIXv2 - (usually not used) affected (shm name is "$EXP_NAME_ODB")
POSIXv3 - used on MacOS - affected (shm name is "$UID_$EXP_NAME_ODB", so "$UID_Default_ODB" will collide)
POSIXv4 - used on Linux - not affected (shm name includes $MIDAS_DIR, which is different for different experiments)

K.O.
  2514   12 May 2023   Stefan Ritt   Info   New environment variable MIDAS_EXPNAME
A new environment variable MIDAS_EXPNAME has been introduced. It must be
used in cases where people use MIDAS_DIR, and is then the equivalent of the
experiment name and directory usually given in the exptab file. This fixes
an issue with creating and deleting shared memory in midas, as described in

https://bitbucket.org/tmidas/midas/issues/363/deletion-of-shared-memory-fails

The documentation has been updated at

https://daq00.triumf.ca/MidasWiki/index.php/MIDAS_environment_variables#MIDAS_EXPNAME

/Stefan
  2513   11 May 2023   Stefan Ritt   Suggestion   Desktop notifications for messages
Ok, I implemented desktop notifications. In the MIDAS config page, you can now enable browser notifications for the different types of messages. Not sure this works perfectly, but it is a starting point. Please let me know if there is any issue.

Stefan
  2512   10 May 2023   Lukas Gerritzen   Suggestion   Desktop notifications for messages

Stefan Ritt wrote:

people using Edge/Chrome/Safari/Opera saying "it's not working on my specific browser on version x.y.z". So I'm only willing to add that feature if we are sure it's a standard things working in most environments.



The API looks pretty standard to me. Firefox, Chrome, Opera have been supporting it for about 9 years, Safari for almost 6. I didn't find out when Edge 14 was released, but they're at version 112 now.

Since browsers don't want to annoy their users, many don't allow websites to ask for permissions without user interaction. So the workflow would be something like: The user has to press a button "please ask for permission", then the browser opens a dialog "do you want to grant this website permission to show notifications?" and only then it works. So I don't think it's an annoying popup-mess, especially since system notifications don't capture the focus and typically vanish after a few seconds. If that feature is hidden behind a button on the config page, it shouldn't lead to surprises. Especially since users can always revoke that permission.
  2510   10 May 2023   Stefan Ritt   Suggestion   Desktop notifications for messages

Lukas Gerritzen wrote:
It would be nice to have MIDAS notifications pop up outside of the browser window.


There are certainly dozens of people who will say "I don't like pop-up windows all the time". So this has to come with a switch in the config page to turn it off. If there is a switch "allow pop-up windows", then we have the other fraction of people using Edge/Chrome/Safari/Opera saying "it's not working on my specific browser on version x.y.z". So I'm only willing to add that feature if we are sure it's a standard thing working in most environments.

Best,
Stefan
  2509   10 May 2023   Stefan Ritt   Suggestion   Make sequencer more compatible with mobile devices

Lukas Gerritzen wrote:
When trying to select a run script on an iPad or other mobile device, you cannot enter subdirectories. This is caused by the following part:


We are working right now on a general file picker, which will also replace the file picker for the sequencer. So please wait until the new thing is out and then test it there.

Stefan
  2508   10 May 2023   Lukas Gerritzen   Suggestion   Make sequencer more compatible with mobile devices
When trying to select a run script on an iPad or other mobile device, you cannot enter subdirectories. This is caused by the following part:
if (script.substring(0, 1) === "[") {
   // refuse to load script if the user selected a subdirectory
   return;
}

and the fact that the <option> elements are listening for double-click events, which seem to be impossible to trigger on a mobile device.

The following modification allows browsing the directories without changing the double-click behaviour on a desktop:
diff --git a/resources/load_script.html b/resources/load_script.html
index 41bfdccd..36caa57f 100644
--- a/resources/load_script.html
+++ b/resources/load_script.html
@@ -59,6 +59,28 @@
 </div>

 <script>
+   document.getElementById("msg_sel").onchange = function() {
+      script = this.value;
+      button = document.getElementById("load_button");
+      if (script.substring(0, 4) === "[..]") {
+         // Change button to go back
+         enable_button_by_id("load_button");
+         button.innerHTML = "Back";
+         button.onclick = up_subdir;
+      } else if (script.substring(0, 1) === "[") {
+         // Change button to load subdirectory
+         enable_button_by_id("load_button");
+         button.innerHTML = "Enter subdirectory";
+         button.onclick = load_subdir;
+      } else {
+         // Change button to load script
+         enable_button_by_id("load_button");
+         button = document.getElementById("load_button");
+         button.innerHTML = "Load script";
+         button.onclick = load_script;
+      }
+   }
+
 function set_if_changed(id, value)
 {
    var e = document.getElementById(id);

This makes the code quoted above redundant, so the check can actually be omitted.