Back Midas Rome Roody Rootana
  Midas DAQ System  Not logged in ELOG logo
Entry  10 Jun 2020, Ivo Schulthess, Forum, slow-control equipment crashes when running multi-threaded on a remote machine 
    Reply  10 Jun 2020, Konstantin Olchanski, Forum, slow-control equipment crashes when running multi-threaded on a remote machine 
       Reply  10 Jun 2020, Stefan Ritt, Forum, slow-control equipment crashes when running multi-threaded on a remote machine 
          Reply  12 Jun 2020, Ivo Schulthess, Forum, slow-control equipment crashes when running multi-threaded on a remote machine 
Message ID: 1944     Entry time: 10 Jun 2020     In reply to: 1942     Reply to this: 1945
Author: Konstantin Olchanski 
Topic: Forum 
Subject: slow-control equipment crashes when running multi-threaded on a remote machine 
Yes, it is supposed to crash. On a remote frontend, cm_get_path() cannot be used
(we are on a different computer, all filesystems maybe no the same!) and is actually not set and
triggers a trap if something tries to use it. (this is the crash you see).

The caller to cm_get_path() is a MIDAS semaphore function.

And I think there is a mistake here. It is unusual for the driver framework to use a semaphore. For multithreaded
protection inside the frontend, a mutex would normally be used. (and mutexes do not use cm_get_path(), so
all is good). But if a semaphore is used, than all frontends running on the same computer become
serialized across the locked section. This is the right thing to do if you have multiple frontends
sharing the same hardware, i.e. a /dev/ttyUSB serial line, but why would a generic framework function
do this. I am not sure, I will have to take a look at why there is a semaphore and what it is locking/protecting.

(In midas, semaphores are normally used to protect global memory, such as ODB, or global resources, such as alarms,
against concurrent access by multiple programs, but of course that does not work for remote frontends,
they are on a different computer! semaphores only work locally, do not work across the network!)

K.O.

> 
> scfe: /home/neutron/packages/midas/src/midas.cxx:1569: INT cm_get_path(char*, int): Assertion `_path_name.length() > 0' failed.
> 
> Running the frontend with GDB and set a breakpoint at the exit leads to the following: 
> 
> (gdb) where
> #0  0x00007ffff68d599f in raise () from /lib64/libc.so.6
> #1  0x00007ffff68bfcf5 in abort () from /lib64/libc.so.6
> #2  0x00007ffff68bfbc9 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
> #3  0x00007ffff68cde56 in __assert_fail () from /lib64/libc.so.6
> #4  0x000000000041efbf in cm_get_path (path=0x7fffffffd060 "P\373g", path_size=256)
>     at /home/neutron/packages/midas/src/midas.cxx:1563
> #5  cm_get_path (path=path@entry=0x7fffffffd060 "P\373g", path_size=path_size@entry=256)
>     at /home/neutron/packages/midas/src/midas.cxx:1563
> #6  0x0000000000453dd8 in ss_semaphore_create (name=name@entry=0x7fffffffd2c0 "DD_Input", 
>     semaphore_handle=semaphore_handle@entry=0x67f700 <multi_driver+96>)
>     at /home/neutron/packages/midas/src/system.cxx:2340
> #7  0x0000000000451d25 in device_driver (device_drv=0x67f6a0 <multi_driver>, cmd=<optimized out>)
>     at /home/neutron/packages/midas/src/device_driver.cxx:155
> #8  0x00000000004175f8 in multi_init(eqpmnt*) ()
> #9  0x00000000004185c8 in cd_multi(int, eqpmnt*) ()
> #10 0x000000000041c20c in initialize_equipment () at /home/neutron/packages/midas/src/mfe.cxx:827
> #11 0x000000000040da60 in main (argc=1, argv=0x7fffffffda48)
>     at /home/neutron/packages/midas/src/mfe.cxx:2757
> 
> I also tried to use the generic class driver which results in the same. I am not sure if this is a problem of the multi-threaded frontend running on a remote machine or is it 
> something of our system which is not properly set up. Anyway I am running out of ideas how to solve this and would appreciate any input. 
> 
> Thanks in advance,
> Ivo
ELOG V3.1.4-2e1708b5