Back Midas Rome Roody Rootana
  Midas DAQ System, Page 115 of 121  Not logged in ELOG logo
ID Date Author Topic Subjectup
  865   19 Feb 2013 Wes GohnForumsend_tcp error

Thank you for the help. As it turns out, the problem was due to the fact that we were compiling MIDAS on our 64 bit backend machine, but one of the frontend machines is 32 bit. The problem was resolved by compiling a 32 bit version of MIDAS in
addition to the 64 bit version.
  101   20 Nov 2003 Konstantin Olchanski set-uid-root midas programs
I see that MIDAS installs several set-uid-root programs into /usr/local/bin.
In this age and time of evil computer hackers, this is not a good idea and
we should Do Something (TM) about it. Here is my risk assessment:

[olchansk@midtis06 midas]$ ls -l /usr/local/bin | grep wsr
-rwsr-sr-x    1 root     root        25811 Nov 20 09:27 dio
-rwsr-sr-x    1 root     root       344553 Nov 20 09:27 mhttpd
-rwsr-sr-x    1 root     root        70736 Nov 20 09:27 webpaw

dio- is required to be setuid-root to gain I/O permissions. I looked at it a
few times, and it is probably safe, but I would like to get a second
opinion. Stephan, can you should it to your local security geeks?

mhttpd- definitely unsafe. It has more buffer overflows than I can shake a
stick at. Why is it suid-root anyway?

webpaw- what is it?!?

K.O.
  102   20 Nov 2003 Stefan Ritt set-uid-root midas programs
> dio- is required to be setuid-root to gain I/O permissions. I looked at it a
> few times, and it is probably safe, but I would like to get a second
> opinion. Stephan, can you should it to your local security geeks?
> 
> mhttpd- definitely unsafe. It has more buffer overflows than I can shake a
> stick at. Why is it suid-root anyway?
> 
> webpaw- what is it?!?

dio was written by Pierre. 

mhttpd and webpaw both are web servers. webpaw is used to display PAW 
pictures over the web. If you run these programs at a port <1024, and most 
people do run them at port 80 (at least at PSI), they need to be setuid-root. 
Unless you know a better way to do that...
  2115   02 Mar 2021 Konstantin OlchanskiInfoshortest possible sleep
since I am implementing a polled equipment, I was curious what is the smallest possible sleep time on current computers.

in current UNIX, there are 2 system calls available for sleeping: select() (with microsecond granularity) and nanosleep() (with nanosecond granularity).

So I wrote a little test program to check it out (progs/test_sleep).

First, Linux result using select(). Typical run on AMD 3700X CPU (4.1 GHz turbo boost) with Ubuntu LTS 20, linux kernel 5.8:

daq13:midas$ ./bin/test_sleep 
sleep      10 loops, 0.100000 sec per loop, 1.000000 sec total,  1003368.855 usec actual, 100336.885 usec actual per loop, oversleep 336.885 usec, 0.3%
sleep     100 loops, 0.010000 sec per loop, 1.000000 sec total,  1008512.020 usec actual, 10085.120 usec actual per loop, oversleep 85.120 usec, 0.9%
sleep    1000 loops, 0.001000 sec per loop, 1.000000 sec total,  1062137.842 usec actual, 1062.138 usec actual per loop, oversleep 62.138 usec, 6.2%
sleep   10000 loops, 0.000100 sec per loop, 1.000000 sec total,  1528650.999 usec actual, 152.865 usec actual per loop, oversleep 52.865 usec, 52.9%
sleep   99999 loops, 0.000010 sec per loop, 0.999990 sec total,  6250898.123 usec actual, 62.510 usec actual per loop, oversleep 52.510 usec, 525.1%
sleep 1000000 loops, 0.000001 sec per loop, 1.000000 sec total, 54056918.144 usec actual, 54.057 usec actual per loop, oversleep 53.057 usec, 5305.7%
sleep 1000000 loops, 0.000000 sec per loop, 0.100000 sec total,   210875.988 usec actual, 0.211 usec actual per loop, oversleep 0.111 usec, 110.9%
sleep 1000000 loops, 0.000000 sec per loop, 0.010000 sec total,   204804.897 usec actual, 0.205 usec actual per loop, oversleep 0.195 usec, 1948.0%
daq13:midas$ 

How to read this:

First line is 10 sleeps of 100 ms, for a total of 1 sec. this actually sleeps for a bit longer,
average over-sleep is 300 usec out of 100 ms is 0.3%.

Next few lines use progressively shorter sleep, 10 ms, 1 ms and 0.1 ms. over-sleep is consistently around 50-60 usec,
which I conclude to be this linux sleep granularity.

Last two lines try sleep for 0.1 usec and 0.01 usec, resulting in a zero-time sleep of select(),
so we just measure the average time cost of a linux syscall, around 200 ns in this machine.

Going to different machines:

Intel E-2236 (4.8 GHz tutboboost), Ubuntu LTS 20, linux kernel 5.8: over-sleep is 60 usec, zero-sleep is 400 ns.
Intel E-2226G (same, see arc.intel.com), CentOS-7, linux kernel 3.10: over-sleep is 60 usec, zero-sleep is 600 ns.
VME processor (2 GHz Intel T7400), Ubuntu 20, linux kernel 5.8: over-sleep is 60 usec, zero-sleep is 1700 ns.

This is pretty consistent, select() over-sleep is 60 usec on all hardware, zero-sleep tracks CPU GHz ratings.

Next, MacOS result, MacBookAir2020, MacOS 10.15.7, CPU 1.2 GHz i7-1060G7:

4ed0:midas olchansk$ ./bin/test_sleep 
sleep      10 loops, 0.100000 sec per loop, 1.000000 sec total,  1031108.856 usec actual, 103110.886 usec actual per loop, oversleep 3110.886 usec, 3.1%
sleep     100 loops, 0.010000 sec per loop, 1.000000 sec total,  1091104.984 usec actual, 10911.050 usec actual per loop, oversleep 911.050 usec, 9.1%
sleep    1000 loops, 0.001000 sec per loop, 1.000000 sec total,  1270800.829 usec actual, 1270.801 usec actual per loop, oversleep 270.801 usec, 27.1%
sleep   10000 loops, 0.000100 sec per loop, 1.000000 sec total,  1370345.116 usec actual, 137.035 usec actual per loop, oversleep 37.035 usec, 37.0%
sleep   99999 loops, 0.000010 sec per loop, 0.999990 sec total,  1706473.112 usec actual, 17.065 usec actual per loop, oversleep 7.065 usec, 70.6%
sleep 1000000 loops, 0.000001 sec per loop, 1.000000 sec total,  5150341.034 usec actual, 5.150 usec actual per loop, oversleep 4.150 usec, 415.0%
sleep 1000000 loops, 0.000000 sec per loop, 0.100000 sec total,   595654.011 usec actual, 0.596 usec actual per loop, oversleep 0.496 usec, 495.7%
sleep 1000000 loops, 0.000000 sec per loop, 0.010000 sec total,   591560.125 usec actual, 0.592 usec actual per loop, oversleep 0.582 usec, 5815.6%
4ed0:midas olchansk$ 

things are quite different here, OS is Mach microkernel with an oldish FreeBSD UNIX single-server (from NextSTEP),
so the sleep granularity is different, better than linux. zero-sleep still measures the syscall time, 600 ns on this machine.

Next we measure the same using the nansleep() syscall.

daq13:midas$ ./bin/test_sleep 
sleep      10 loops, 0.100000 sec per loop, 1.000000 sec total,  1004133.940 usec actual, 100413.394 usec actual per loop, oversleep 413.394 usec, 0.4%
sleep     100 loops, 0.010000 sec per loop, 1.000000 sec total,  1046117.067 usec actual, 10461.171 usec actual per loop, oversleep 461.171 usec, 4.6%
sleep    1000 loops, 0.001000 sec per loop, 1.000000 sec total,  1096894.979 usec actual, 1096.895 usec actual per loop, oversleep 96.895 usec, 9.7%
sleep   10000 loops, 0.000100 sec per loop, 1.000000 sec total,  1526744.843 usec actual, 152.674 usec actual per loop, oversleep 52.674 usec, 52.7%
sleep   99999 loops, 0.000010 sec per loop, 0.999990 sec total,  6250154.018 usec actual, 62.502 usec actual per loop, oversleep 52.502 usec, 525.0%
sleep 1000000 loops, 0.000001 sec per loop, 1.000000 sec total, 53344123.125 usec actual, 53.344 usec actual per loop, oversleep 52.344 usec, 5234.4%
sleep 1000000 loops, 0.000000 sec per loop, 0.100000 sec total, 52641665.936 usec actual, 52.642 usec actual per loop, oversleep 52.542 usec, 52541.7%
sleep 1000000 loops, 0.000000 sec per loop, 0.010000 sec total, 52637501.001 usec actual, 52.638 usec actual per loop, oversleep 52.628 usec, 526275.0%
daq13:midas$ 

Here everything is simple. sleep longer than 1000 usec works the same as select(), sleep for shorter than 100 usec sleeps for 52 usec, regardless of what 
we ask for.

MacOS does no better, long sleeps are same as select(), sleeps is 1 usec or less sleep for too long. no improvement over select().

4ed0:midas olchansk$ ./bin/test_sleep 
sleep      10 loops, 0.100000 sec per loop, 1.000000 sec total,  1023327.827 usec actual, 102332.783 usec actual per loop, oversleep 2332.783 usec, 2.3%
sleep     100 loops, 0.010000 sec per loop, 1.000000 sec total,  1130330.086 usec actual, 11303.301 usec actual per loop, oversleep 1303.301 usec, 13.0%
sleep    1000 loops, 0.001000 sec per loop, 1.000000 sec total,  1333846.807 usec actual, 1333.847 usec actual per loop, oversleep 333.847 usec, 33.4%
sleep   10000 loops, 0.000100 sec per loop, 1.000000 sec total,  1402330.160 usec actual, 140.233 usec actual per loop, oversleep 40.233 usec, 40.2%
sleep   99999 loops, 0.000010 sec per loop, 0.999990 sec total,  2034706.831 usec actual, 20.347 usec actual per loop, oversleep 10.347 usec, 103.5%
sleep 1000000 loops, 0.000001 sec per loop, 1.000000 sec total,  6646192.074 usec actual, 6.646 usec actual per loop, oversleep 5.646 usec, 564.6%
sleep 1000000 loops, 0.000000 sec per loop, 0.100000 sec total,  7556284.189 usec actual, 7.556 usec actual per loop, oversleep 7.456 usec, 7456.3%
sleep 1000000 loops, 0.000000 sec per loop, 0.010000 sec total, 15720005.035 usec actual, 15.720 usec actual per loop, oversleep 15.710 usec, 157100.1%
4ed0:midas olchansk$ 

On Linux, strace tells us that the actual syscall behind nanosleep() is this:
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=10000}, 0x7fffc159e200) = 0

Let's try it directly... result is the same.
Let's try it with CLOCK_MONOTONIC... result is the same.

The man page of clock_nanosleep() specifies that this syscall always suspends the calling thread,
so what we see here is the Linux scheduler tick size.

Bottom line.

On current linux, shortest sleep is around 100 usec both select() and nanosleep().
On MacOS, shortest sleep is down to 5 usec using select(), but I cannot tell if CPU sleeps or busy-loops.

select() is still the best syscall for sleeping.

K.O.
  2116   02 Mar 2021 Stefan RittInfoshortest possible sleep
Why do you need that? Periodic equipment typically runs ever ten seconds or so, meaning one can do this easily in a scheduler.

For polled equipment, you don't want to sleep at all. Because if you sleep, you might miss an event. That's why I put my poll in mfe.c into a for() loop. No 
sleep, maximum polling rate. I just double checked on my macbook air. 

- If poll is always false (no event available), the loop executes 50M times in 100ms (calibrated during startup of the frontend). That means one iteration 
takes 2ns (!). So if an event occurs, the readout is started with a 2ns overhead. No sleep can beat that. In a real world application, one has to add of course 
the VME access or so to poll for the event.

- If poll is always true, the framework generates about 700k events each second (returning jus a few bytes of event data).

So if one adds any sleep here, things can get only worse, so I don't see the point for that. Of course polling eats one kernel at 100%, but these days every 
CPU has more than one, even my 800 MHz Xilinx embedded ARM CPU (Zynq).

Best,
Stefan
  2117   03 Mar 2021 Konstantin OlchanskiInfoshortest possible sleep
> Why do you need that?

UNIX/POSIX advertises functions for sleeping in microseconds and nanoseconds,
for sure it is interesting to know what they actually do and what happens
when you ask them to sleep for 1 microsecond or 1 nanosecond.

To sleep or not to sleep that is a question.

But if I do decide to sleep, and I call the sleep function, I want to know what actually happens.

Now I do and I share it with all.

On current Linux, shortest sleep is around 60 usec. select() with sleep
shorter than that will not sleep at all, nanosleep() will always sleep for
the shortest amount.

P.S. For fans of interrupts ("because they are fast"), sleep waiting for interrupt
probably has same latency/granularity as above (60 usec), so if I drive a DMA engine
and I except the DMA transfer to complete under 60 usec, I should use a busy loop
to poll the "DMA done" bit instead of going to sleep and wait for the DMA interrupt.

K.O.
  158   13 Oct 2004 Konstantin OlchanskiBug Reportsilly odbedit "rename Display xxx/yyy"
odbedit command "rename Display xxx/yyy" creates a key named "xxx/yyy" (yes,
with a slash in the name) and this key cannot be deleted or renamed...
K.O.
  159   13 Oct 2004 Stefan RittBug Reportsilly odbedit "rename Display xxx/yyy"
> odbedit command "rename Display xxx/yyy" creates a key named "xxx/yyy" (yes,
> with a slash in the name) and this key cannot be deleted or renamed...
> K.O.

"rename" is "rename", not "mv" under Unix. If you want this functionality, put it
in and don't complain!
  758   10 May 2011 Jianglai LiuForumsimple example frontend for V1720
Hi,

Who has a good example of a frontend program using CAEN V1718 VME-USB bridge and
V1720 FADC? I am trying to set up the DAQ for such a simple system.

I put together a frontend which talks to the VME. However it gets stuck at
"Calibrating" in initialize_equipment().

I'd appreciate some help!

Thanks,
Jianglai
  759   10 May 2011 Stefan RittForumsimple example frontend for V1720

Jianglai Liu wrote:
Hi,

Who has a good example of a frontend program using CAEN V1718 VME-USB bridge and
V1720 FADC? I am trying to set up the DAQ for such a simple system.

I put together a frontend which talks to the VME. However it gets stuck at
"Calibrating" in initialize_equipment().

I'd appreciate some help!

Thanks,
Jianglai


During "Calibrating", the framework calls your poll_event() routine. You code there accesses for the first time the VME crate and probably gets stuck.
  760   10 May 2011 Pierre-Andre AmaudruzForumsimple example frontend for V1720

Jianglai Liu wrote:
Hi,

Who has a good example of a frontend program using CAEN V1718 VME-USB bridge and
V1720 FADC? I am trying to set up the DAQ for such a simple system.

I put together a frontend which talks to the VME. However it gets stuck at
"Calibrating" in initialize_equipment().

I'd appreciate some help!

Thanks,
Jianglai


Under the drivers/vme you can find code for the v1720.c (VME access) and ov1720.c
(A2818/A3818 PCIe optical link access). For testing the hardware, we use this code compiled and linked
with MAIN_ENABLE to confirm its functionality. You may want to do the same for your USB. Once this
is under control, the Midas frontend implementation using the same driver shouldn't give you trouble.
  761   18 May 2011 Jimmy NgaiForumsimple example frontend for V1720

Jianglai Liu wrote:
Hi,

Who has a good example of a frontend program using CAEN V1718 VME-USB bridge and
V1720 FADC? I am trying to set up the DAQ for such a simple system.

I put together a frontend which talks to the VME. However it gets stuck at
"Calibrating" in initialize_equipment().

I'd appreciate some help!

Thanks,
Jianglai


Hi Jianglai,

I don't have an exmaple of using V1718 with V1720, but I have been using V1718 with V792N for a long time.

You may find in the attachment an example frontend program and my drivers for V1718 and V792N written in MVMESTD format. They have to be linked with the CAENVMELib library and other essential MIDAS stuffs.

Regards,
Jimmy
  762   24 May 2011 Jianglai LiuForumsimple example frontend for V1720
Thanks all for the kind help. This did point me to the right direction. I was now able to make v1720.c as well as my MIDAS frontend (thanks to
Jimmy's example) talking to V1720, and read out the waveform bank.

However the readout values did not seem quite right. I fed in a PMT-like pulse of about 0.1 V and 50 ns wide, with an external trigger just in time.
However, the readout by both v1720.c stand-alone code, and my midas frontend seemed to be flat noise.

I tried to play with the post trigger value, as well as the DAC setting of V1720. None seemed to help.

BTW I tested my V1720 board functionality by using the CAEN windows software (CAENScope and WaveDump). They worked just fine.

Any suggestions? Attached is my modified v1720.c code.


Pierre-Andre Amaudruz wrote:

Jianglai Liu wrote:
Hi,

Who has a good example of a frontend program using CAEN V1718 VME-USB bridge and
V1720 FADC? I am trying to set up the DAQ for such a simple system.

I put together a frontend which talks to the VME. However it gets stuck at
"Calibrating" in initialize_equipment().

I'd appreciate some help!

Thanks,
Jianglai


Under the drivers/vme you can find code for the v1720.c (VME access) and ov1720.c
(A2818/A3818 PCIe optical link access). For testing the hardware, we use this code compiled and linked
with MAIN_ENABLE to confirm its functionality. You may want to do the same for your USB. Once this
is under control, the Midas frontend implementation using the same driver shouldn't give you trouble.
  825   10 Aug 2012 Carl BlaksleyForumsimple example frontend for V1720

Jimmy Ngai wrote:

Jianglai Liu wrote:
Hi,

Who has a good example of a frontend program using CAEN V1718 VME-USB bridge and
V1720 FADC? I am trying to set up the DAQ for such a simple system.

I put together a frontend which talks to the VME. However it gets stuck at
"Calibrating" in initialize_equipment().

I'd appreciate some help!

Thanks,
Jianglai


Hi Jianglai,

I don't have an exmaple of using V1718 with V1720, but I have been using V1718 with V792N for a long time.

You may find in the attachment an example frontend program and my drivers for V1718 and V792N written in MVMESTD format. They have to be linked with the CAENVMELib library and other essential MIDAS stuffs.

Regards,
Jimmy


Jimmy,

How exactly did you link the CAENVMElib with your frontend? That is the part which I can not seem to replicate using your example frontend!

Thanks,
-Carl
  826   12 Aug 2012 Jimmy NgaiForumsimple example frontend for V1720

Carl Blaksley wrote:

Jimmy Ngai wrote:

Jianglai Liu wrote:
Hi,

Who has a good example of a frontend program using CAEN V1718 VME-USB bridge and
V1720 FADC? I am trying to set up the DAQ for such a simple system.

I put together a frontend which talks to the VME. However it gets stuck at
"Calibrating" in initialize_equipment().

I'd appreciate some help!

Thanks,
Jianglai


Hi Jianglai,

I don't have an exmaple of using V1718 with V1720, but I have been using V1718 with V792N for a long time.

You may find in the attachment an example frontend program and my drivers for V1718 and V792N written in MVMESTD format. They have to be linked with the CAENVMELib library and other essential MIDAS stuffs.

Regards,
Jimmy


Jimmy,

How exactly did you link the CAENVMElib with your frontend? That is the part which I can not seem to replicate using your example frontend!

Thanks,
-Carl


Hi Carl,

Attached is a cut-down version of my original Makefile just for demonstrating how to link the CAENVMElib. I didn't test it for bugs. Please make sure the libCAENVME.so is in your library path.

Jimmy
  1942   10 Jun 2020 Ivo SchulthessForumslow-control equipment crashes when running multi-threaded on a remote machine
Dear all

To reduce the time needed by Midas between runs, we want to change some of our periodic equipment to multi-threaded slow-control equipment. To do that I wanted to start from 
the slowcont with the multi/hv class driver and the nulldev device driver and null bus driver. The example runs fine as it is on the local midas machine and also on remote 
machines. When adding the DF_MULTITHREAD flag to the device driver list, it does not run anymore on remote machines but aborts with the following assertion:

scfe: /home/neutron/packages/midas/src/midas.cxx:1569: INT cm_get_path(char*, int): Assertion `_path_name.length() > 0' failed.

Running the frontend with GDB and set a breakpoint at the exit leads to the following: 

(gdb) where
#0  0x00007ffff68d599f in raise () from /lib64/libc.so.6
#1  0x00007ffff68bfcf5 in abort () from /lib64/libc.so.6
#2  0x00007ffff68bfbc9 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007ffff68cde56 in __assert_fail () from /lib64/libc.so.6
#4  0x000000000041efbf in cm_get_path (path=0x7fffffffd060 "P\373g", path_size=256)
    at /home/neutron/packages/midas/src/midas.cxx:1563
#5  cm_get_path (path=path@entry=0x7fffffffd060 "P\373g", path_size=path_size@entry=256)
    at /home/neutron/packages/midas/src/midas.cxx:1563
#6  0x0000000000453dd8 in ss_semaphore_create (name=name@entry=0x7fffffffd2c0 "DD_Input", 
    semaphore_handle=semaphore_handle@entry=0x67f700 <multi_driver+96>)
    at /home/neutron/packages/midas/src/system.cxx:2340
#7  0x0000000000451d25 in device_driver (device_drv=0x67f6a0 <multi_driver>, cmd=<optimized out>)
    at /home/neutron/packages/midas/src/device_driver.cxx:155
#8  0x00000000004175f8 in multi_init(eqpmnt*) ()
#9  0x00000000004185c8 in cd_multi(int, eqpmnt*) ()
#10 0x000000000041c20c in initialize_equipment () at /home/neutron/packages/midas/src/mfe.cxx:827
#11 0x000000000040da60 in main (argc=1, argv=0x7fffffffda48)
    at /home/neutron/packages/midas/src/mfe.cxx:2757

I also tried to use the generic class driver which results in the same. I am not sure if this is a problem of the multi-threaded frontend running on a remote machine or is it 
something of our system which is not properly set up. Anyway I am running out of ideas how to solve this and would appreciate any input. 

Thanks in advance,
Ivo
  1944   10 Jun 2020 Konstantin OlchanskiForumslow-control equipment crashes when running multi-threaded on a remote machine
Yes, it is supposed to crash. On a remote frontend, cm_get_path() cannot be used
(we are on a different computer, all filesystems maybe no the same!) and is actually not set and
triggers a trap if something tries to use it. (this is the crash you see).

The caller to cm_get_path() is a MIDAS semaphore function.

And I think there is a mistake here. It is unusual for the driver framework to use a semaphore. For multithreaded
protection inside the frontend, a mutex would normally be used. (and mutexes do not use cm_get_path(), so
all is good). But if a semaphore is used, than all frontends running on the same computer become
serialized across the locked section. This is the right thing to do if you have multiple frontends
sharing the same hardware, i.e. a /dev/ttyUSB serial line, but why would a generic framework function
do this. I am not sure, I will have to take a look at why there is a semaphore and what it is locking/protecting.

(In midas, semaphores are normally used to protect global memory, such as ODB, or global resources, such as alarms,
against concurrent access by multiple programs, but of course that does not work for remote frontends,
they are on a different computer! semaphores only work locally, do not work across the network!)

K.O.

> 
> scfe: /home/neutron/packages/midas/src/midas.cxx:1569: INT cm_get_path(char*, int): Assertion `_path_name.length() > 0' failed.
> 
> Running the frontend with GDB and set a breakpoint at the exit leads to the following: 
> 
> (gdb) where
> #0  0x00007ffff68d599f in raise () from /lib64/libc.so.6
> #1  0x00007ffff68bfcf5 in abort () from /lib64/libc.so.6
> #2  0x00007ffff68bfbc9 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
> #3  0x00007ffff68cde56 in __assert_fail () from /lib64/libc.so.6
> #4  0x000000000041efbf in cm_get_path (path=0x7fffffffd060 "P\373g", path_size=256)
>     at /home/neutron/packages/midas/src/midas.cxx:1563
> #5  cm_get_path (path=path@entry=0x7fffffffd060 "P\373g", path_size=path_size@entry=256)
>     at /home/neutron/packages/midas/src/midas.cxx:1563
> #6  0x0000000000453dd8 in ss_semaphore_create (name=name@entry=0x7fffffffd2c0 "DD_Input", 
>     semaphore_handle=semaphore_handle@entry=0x67f700 <multi_driver+96>)
>     at /home/neutron/packages/midas/src/system.cxx:2340
> #7  0x0000000000451d25 in device_driver (device_drv=0x67f6a0 <multi_driver>, cmd=<optimized out>)
>     at /home/neutron/packages/midas/src/device_driver.cxx:155
> #8  0x00000000004175f8 in multi_init(eqpmnt*) ()
> #9  0x00000000004185c8 in cd_multi(int, eqpmnt*) ()
> #10 0x000000000041c20c in initialize_equipment () at /home/neutron/packages/midas/src/mfe.cxx:827
> #11 0x000000000040da60 in main (argc=1, argv=0x7fffffffda48)
>     at /home/neutron/packages/midas/src/mfe.cxx:2757
> 
> I also tried to use the generic class driver which results in the same. I am not sure if this is a problem of the multi-threaded frontend running on a remote machine or is it 
> something of our system which is not properly set up. Anyway I am running out of ideas how to solve this and would appreciate any input. 
> 
> Thanks in advance,
> Ivo
  1945   10 Jun 2020 Stefan RittForumslow-control equipment crashes when running multi-threaded on a remote machine
Few comments:

- As KO write, we might need semaphores also on a remote front-end, in case several programs share the same hardware. So it should work and cm_get_path() should not just exit

- When I wrote the multi-threaded device drivers, I did use semaphores instead of mutexes, but I forgot why. Might be that midas semaphores have a timeout and mutexes not, or 
something along those lines.

- I do need either semaphores or mutexes since in a multi-threaded slow-control font-end (too many dashes...) several threads have to access an internal data exchange buffer, which 
needs protection for multi-threaded environments.

So we can how either fix cm_get_path() or replace all semaphores in with mutexes in midas/src/device_driver.cxx. I have kind of a feeling that we should do both. And what about 
switching to c++ std::mutex instead of pthread mutexes?

Stefan
  1946   12 Jun 2020 Ivo SchulthessForumslow-control equipment crashes when running multi-threaded on a remote machine
Thanks you two once again for the very fast answers. I tested the example on the local machine and it works perfectly fine. In the meantime I also created two new drivers for our devices 
and everything works with them, the improvement in time is significant and I will create drivers for all our devices where possible. If they are in a working state I can also provide 
them to add to the Midas drivers. Of course if it would be possible to run the front-end also on our remote machines this would be even better. I am not experienced in any multi-threaded 
programming but if I can provide any help or input, please let me know. 

Have a great weekend,
Ivo
  1330   01 Dec 2017 Frederik WautersBug Reportsmall bug in mfe.c init
There is a small bug in the mfe.c initialization for the EQ_POLLED mode. There 
is a routine where the number of polls fitting in eq_info->period is counted:


         count = 1;
         do {
            if (display_period)
               printf(".");

            start_time = ss_millitime();

            poll_event(equipment[idx].info.source, (INT)count, TRUE);

            delta_time = ss_millitime() - start_time;

            ...

            if (delta_time > 0)
               count = count * eq_info->period / delta_time;
            else
               count *= 100;
            
            // avoid overflows
            if (count > 2147483647.0) {
               count = 2147483647.0;
               break;
            }
            
         } while (delta_time > eq_info->period * 1.2 || delta_time < eq_info-
>period * 0.8);

As "start_time = ss_millitime();" resets "delta_time" each time, only the 
"avoid overflows" addition saves the day. 

start_time = ss_millitime(); show be out of the loop.
ELOG V3.1.4-2e1708b5