    Reply  05 Sep 2024, Ben Smith, Forum, Python frontend rate limitations? 
> What limits the rate that poll_func is called in a python frontend? 

First the general advice: if you reduce the "period" of your equipment, then your function will get called more frequently. You can set it to 0 and we'll call it as often as possible. You can set this in the ODB at "/Equipment/Python Data Simulator/Common/Period"

If that's still not fast enough, then you can return a *list* of events from your readout_func. I've seen real-world cases of 25kHz+ of midas events generated in this fashion.
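
For concreteness, here is a minimal sketch of what batching could look like (the bank name, data and batch size are made up, and I'm assuming the usual midas.event.Event()/create_bank(name, type, data) API from the python frontend examples):

# assumes "import midas, midas.event" at the top of the frontend
def readout_func(self):
    # Return a batch of events instead of a single one; the framework
    # accepts a list and sends them all.
    events = []
    for _ in range(100):
        event = midas.event.Event()
        event.create_bank("DATA", midas.TID_INT, [1, 2, 3, 4])
        events.append(event)
    return events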


However in your case the limitation is likely that you're sending 1.25MB per event and we have a lot of data marshalling to do between the python and C++ layer. In particular it takes 15ms on my machine to just pack the data into a memory buffer (see timeit command below). I am sure there must be a faster way to do this packing, especially in the case where the bank contains a numpy array rather than a python list.

I'll add it to my to-do list to investigate improving the performance of medium-to-large events in the python code.


Cheers,
Ben


P.S. You may have a bug in your calculations (depending on how you did your testing). In poll_func I think you should be updating the stats every time the function is called, not just the times when you return True.


P.P.S. Command I used to test how slow it is to pack the data. One-time setup of creating the buffers, then multiple tests of the pack_into function:

python -m timeit -s "import struct;import ctypes;arr = [0]*1250001;buf = ctypes.create_string_buffer(10000000);fmt = \">1250000d\"" "struct.pack_into(fmt, buf, *arr)"
20 loops, best of 5: 15.3 msec per loop
    Reply  05 Sep 2024, Jack Carlton, Forum, Python frontend rate limitations? 
Thank you, this was very helpful.

> First the general advice: if you reduce the "period" of your equipment, then your function will get called more frequently. You can set it to 0 and we'll call it as often as possible. You can set this in the ODB at "/Equipment/Python Data Simulator/Common/Period"

Thanks, I thought that was just for periodic triggering (or at least that's how I've used it in C++ frontends). Changing this allowed me to get past the 100Hz event rate cap I described.

> If that's still not fast enough, then you can return a *list* of events from your readout_func. I've seen real-world cases of 25kHz+ of midas events generated in this fashion.
> 
> 
> However in your case the limitation is likely that you're sending 1.25MB per event and we have a lot of data marshalling to do between the python and C++ layer. In particular it takes 15ms on my machine to just pack the data into a memory buffer (see timeit command below). I am sure there must be a faster way to do this packing, especially in the case where the bank contains a numpy array rather than a python list.
> 
> I'll add it to my to-do list to investigate improving the performance of medium-to-large events in the python code.
> 
> 
> Cheers,
> Ben


> P.S. You may have a bug in your calculations (depending on how you did your testing). In poll_func I think you should be updating the stats every time the function is called, not just the times when you return True.
I had tested the way you described at first, then later changed it.

> P.P.S. Command I used to test how slow it is to pack the data. One-time setup of creating the buffers, then multiple tests of the pack_into function:
> 
> python -m timeit -s "import struct;import ctypes;arr = [0]*1250001;buf = ctypes.create_string_buffer(10000000);fmt = \">1250000d\"" "struct.pack_into(fmt, buf, *arr)"
> 20 loops, best of 5: 15.3 msec per loop
    Reply  05 Sep 2024, Stefan Ritt, Forum, Python frontend rate limitations? 
> First the general advice: if you reduce the "period" of your equipment, then your function will get called more frequently. 
> You can set it to 0 and we'll call it as often as possible. You can set this in the ODB at "/Equipment/Python Data Simulator/Common/Period"

Just for your general understanding: The "period" i the C framework works differently. It calls the poll function with a number, 
and then that number is used in the poll function like (simplified):

poll(INT count) {
   for (i=0 ; i<count ; i++)
      if (new_event())
         return TRUE;
   return FALSE;
}

This ensures that polling is done as quickly as possible, staying inside the poll() function itself rather than being
called from the framework in a loop (which would require a function call for each poll). The "count" is determined by
the framework during startup such that the execution time of the poll() routine equals the "period". For example, if the
period is 0.1, the count might be a few million, so that the poll routine returns immediately when a new event occurs or
when 100 ms have expired. During the polling the frontend is "dead", meaning it cannot react to run transitions, for
example. That's why most experiments use 0.1-0.5 seconds. This does NOT mean that you can only have 10 (or 2) events per
second; it means that the reaction time of the frontend is at most 0.1-0.5 seconds, which is acceptable in most cases.
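
Just to illustrate the idea (this is not the actual mfe.c code, only a rough sketch with made-up names, written in python for brevity): the calibration amounts to timing a trial poll and scaling the count to match the period.

import time

def calibrate_poll_count(poll, period_sec, trial_count=1000):
    # Time one call of poll() with a trial count (assuming no events arrive
    # during calibration), then scale so that one call lasts about period_sec.
    start = time.time()
    poll(trial_count)
    elapsed = time.time() - start
    if elapsed <= 0:
        return trial_count
    return max(1, int(trial_count * period_sec / elapsed))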

Due to this design, the C frontend is capable of producing millions of events per second. It took me a while in the early 1990s
to work out that scheme, sitting in the "R" trailer at TRIUMF (old guys will remember...).

Best,
Stefan
    Reply  06 Sep 2024, Jack Carlton, Forum, Python frontend rate limitations? 
Thanks for the responses, they were very helpful.

> First the general advice: if you reduce the "period" of your equipment, then your function will get called more frequently. You can set it to 0 and we'll
> call it as often as possible.

Thanks, this solves the event rate limitation I described. I didn't think to change this because the "period" did not affect the observed rate in C (and now 
I know why thanks to Stefan).

A couple more questions:

1. 
For me, 
python -m timeit -s "import struct;import ctypes;arr = [0]*1250001;buf = ctypes.create_string_buffer(10000000);fmt = \">1250000d\"" "struct.pack_into(fmt, 
buf, *arr)"
10 loops, best of 3: 43.7 msec per loop

which suggests my maximum data rate is about 1.25 MB * 1000/43.7 Hz = 23 MB/s (?). But I see data rates up to 60 MB/s with a python frontend. Am I 
misinterpreting the meaning of this result?


2. I can effectively bypass the rate limitations in python by running two concurrent frontends. For example, with one python frontend at best I can generate 
60 MB/s of data (setting "period" to 0 now); but with two frontends I can double this to 120 MB/s. This implies one python frontend is not bottlenecked by 
hardware limitations in my case.

Am I doing something wrong to artificially bottleneck my frontends? Perhaps there's a multi-threading solution I can implement to avoid needing multiple 
frontends?


Thanks,
Jack
    Reply  11 Sep 2024, Konstantin Olchanski, Forum, Python frontend rate limitations? 
> I'm trying to get a sense of the rate limitations of a python frontend.

1) python is single-threaded; for ultimate performance, a MIDAS frontend (or any DAQ
application) has to be multithreaded (see the sketch after this list):
a) a thread with a busy loop to read the data and place it into a FIFO
b) a thread to read data from the FIFO and send it to the SYSTEM buffer shared memory or to
the mserver
c) a thread to respond to begin-run, end-run, etc. RPCs
d) probably a thread to recycle memory from thread (b) back to thread (a) if per-event
malloc()/free() adds too much overhead
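
A bare-bones sketch of threads (a) and (b), using python's queue module purely to illustrate the structure (the hardware read and buffer write calls are placeholders; in practice, as argued below, this is done in C++):

import queue, threading

fifo = queue.Queue(maxsize=1000)

def reader_thread(read_hardware):
    # (a) busy loop: read data and place it into the FIFO
    while True:
        fifo.put(read_hardware())

def sender_thread(send_to_buffer):
    # (b) drain the FIFO and send to the SYSTEM buffer / mserver
    while True:
        send_to_buffer(fifo.get())

# threading.Thread(target=reader_thread, args=(my_read,), daemon=True).start()
# threading.Thread(target=sender_thread, args=(my_send,), daemon=True).start()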

2) data readout. A C++ AXI bus access is compiled into 1 instruction and results in 1 AXI
bus operation. The comparable python code likely has much more overhead, which slows you down.

3) event bank filling. A C++ for() loop is compiled into very compact machine code; a
python loop cannot be, because each array element can be of an arbitrary data type, which slows you down.

Bottom line: there is a reason high-speed data acquisition systems are written in C/C++, not
in shell, perl, tcl/tk, or (today's favourite) python.

> The C++ frontend is about 100 times faster in both data and event rates.

This is as expected. You can probably improve the python code to get to within about 10 times
slower than C++. But consider:

a) will it be "fast enough" for the task?
b) learning C++ versus optimizing the python to within "2-3-10x slower than C++" may involve a
similar amount of time and effort.

And you have not looked at the real-time properties of your frontend. You may discover
that it's actually faster than you think, but occasionally stops for a millisecond (or
two, or a hundred). Some applications are notorious for running memory garbage collection
just at the wrong time.

I am working on exactly this problem right now: I have a 1 GHz ARM CPU (Cyclone-V FPGA)
and I need to push data out at 100 Mbytes/sec while avoiding any bad real-time dropouts
that would cause the FPGA data FIFO to overflow. And I only have 2 CPU cores: 1 to read the
FPGA FIFO, 1 to run the TCP/IP stack and the ethernet driver. There is no way this can be
done with python.

K.O.
    Reply  11 Sep 2024, Konstantin Olchanski, Forum, Python frontend rate limitations? 
> 
> poll(INT count) {
>    for (i=0 ; i<count ; i++)
>       if (new_event())
>          return TRUE;
>    return FALSE;
> }

In the C++ frontend (tmfe.h) this loop usually runs in a separate thread, and I am now working on the linux magic to assign this thread maximum
uninterruptible priority. Otherwise, on my Cyclone-V FPGA SoC I see 1-10 msec dropouts, I think from taking ethernet interrupts.
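
For illustration, one way to request such a priority on Linux (shown in python; the C++ equivalent uses sched_setscheduler()/pthread_setschedparam(), and either way it needs root or CAP_SYS_NICE):

import os

# give the calling thread (pid 0) the highest real-time FIFO priority
max_prio = os.sched_get_priority_max(os.SCHED_FIFO)
os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(max_prio))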

K.O.
    Reply  11 Sep 2024, Konstantin Olchanski, Forum, Python frontend rate limitations? 
> > I'm trying to get a sense of the rate limitations of a python frontend.

forgot one more:

The C++ toolchain comes with extensive profiler tools aimed at answering the question "why is my
program so slow, where is it spending all the time?". Some of these tools go all the way to
the hardware level and report CPU cache misses, TLB flushes, context switches and any other
hardware events that interrupt or slow down computations. The programmer then uses this
information to restructure the code to avoid the worst slowdowns (i.e. avoid branch mis-
predictions, avoid cache misses, etc.).

I doubt the python toolchain will ever have profiler tools as good.

K.O.
    Reply  11 Sep 2024, Konstantin Olchanski, Forum, "Safe" abort of sequencer scripts 
> We often use the MIDAS sequencer to temporarily control detector settings, such as:
> 
> * <change some setting>
> * WAIT 60 seconds
> * <revert setting to original value>
> 
> The question arises of what happens if the sequencer scripts gets aborted during that wait, preventing the value from being reset.

Common problem. Go has an elegant solution to this using the "defer" keyword.

https://go.dev/tour/flowcontrol/12
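
The same "always revert" idea can be written in python with try/finally, just as an analogy (the sequencer itself runs MSL scripts, not python):

import time

def with_temporary_setting(apply_setting, revert_setting, wait_seconds):
    apply_setting()                  # <change some setting>
    try:
        time.sleep(wait_seconds)     # WAIT 60 seconds
    finally:
        revert_setting()             # runs even if the wait is aborted by an exception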

K.O.
    Reply  27 Sep 2024, Ben Smith, Forum, Python frontend rate limitations? 
> in your case the limitation is likely that you're sending 1.25MB per event and we have a lot of data marshalling to do between the python and C++ layer.
> 
> I'll add it to my to-do list to investigate improving the performance of medium-to-large events in the python code.

I've now added better support for numpy arrays in the python code that encodes a `midas.event.Event` object. If you use the "correct" numpy data type then you can get vastly improved performance as numpy already stores the data in memory in the format that we need.

In your example, if you change
        self.zero_buffer = [0] * self.total_data_size
to 
        self.zero_buffer = np.ndarray(self.total_data_size, np.int16)

then the max data rate of the frontend goes from 330MB/s to 7600MB/s on my laptop (a factor 20 improvement from one line of code!) 

To ensure you're using the optimal numpy dtype for your bank, you can reference a dict called `midas.tid_np_formats`. For example `midas.tid_np_formats[midas.TID_SHORT]` is equivalent to `np.int16`. If you use an int16 array and write it as a TID_SHORT bank, then we'll use the fast path. If there is a mismatch, we'll have to do type conversions and will end up on the slow path.
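
As a usage sketch (the bank name and array size are just examples, and I'm assuming the usual create_bank(name, type, data) call from the python frontend examples):

import numpy as np
import midas
import midas.event

# int16 matches TID_SHORT, so the encoder can take the fast path
data = np.zeros(1250000, dtype=midas.tid_np_formats[midas.TID_SHORT])
event = midas.event.Event()
event.create_bank("DATA", midas.TID_SHORT, data)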
Entry  13 Oct 2004, Konstantin Olchanski, Bug Report, db_paste: found string exceeding MAX_STRING_LENGTH 
I am updating TWIST to the latest MIDAS, and when I load a saved .odb file I get
these messages. The message text ought to say where the problem is and which strings it does not like.
K.O.



[twistonl@midtwist ~/online]$ odbedit
Please define environment variable 'MIDASSYS'
pointing to the midas installation directory.
[local:twist:S]/>load /twist/data_onl/current/run17548.odb
[odb.c:5600:db_paste] found string exceeding MAX_STRING_LENGTH
[odb.c:5600:db_paste] found string exceeding MAX_STRING_LENGTH
[odb.c:5600:db_paste] found string exceeding MAX_STRING_LENGTH
    Reply  13 Oct 2004, Stefan Ritt, Bug Report, db_paste: found string exceeding MAX_STRING_LENGTH 
Can you attach 

/twist/data_onl/current/run17548.odb

so I can reproduce the problem?
Entry  13 Oct 2004, Konstantin Olchanski, Bug Report, silly odbedit "rename Display xxx/yyy" 
odbedit command "rename Display xxx/yyy" creates a key named "xxx/yyy" (yes,
with a slash in the name) and this key cannot be deleted or renamed...
K.O.
    Reply  13 Oct 2004, Stefan Ritt, Bug Report, silly odbedit "rename Display xxx/yyy" 
> odbedit command "rename Display xxx/yyy" creates a key named "xxx/yyy" (yes,
> with a slash in the name) and this key cannot be deleted or renamed...
> K.O.

"rename" is "rename", not "mv" under Unix. If you want this functionality, put it
in and don't complain!
Entry  13 Oct 2004, Konstantin Olchanski, Bug Report, TWIST upgrade bombed... 
The upgrade of TWIST to the latest midas has bombed- we see mevb and mlogger
crashes during shared memory data buffer accesses. I am looking into it and I
will add information as I figure things out. K.O.
    Reply  13 Oct 2004, Pierre-Andre Amaudruz, Bug Report, TWIST upgrade bombed... 
> The upgrade of TWIST to the latest midas has bombed- we see mevb and mlogger
> crashes during shared memory data buffer accesses. I am looking into it and I
> will add information as I figure things out. K.O.

Since 1.9.5 the EventBuilder has been modified. Please consult the documentation
where the new mevb scheme is explained.
The mevb has been tested successfully with up to 16 frontends (15 different CPUs).
The data rate at the EventBuilder was measured at about 50MB/s without the
logger and ~30MB/s with the logger.
    Reply  13 Oct 2004, Konstantin Olchanski, Bug Report, TWIST upgrade bombed... 
> The upgrade of TWIST to the latest midas has bombed- we see mevb and mlogger
> crashes during shared memory data buffer accesses. I am looking into it and I
> will add information as I figure things out. K.O.

I traced the buffer memory corruption to a logic error in system.c::ss_shm_open(). If
a .SHM file exists, its size is used as the size of the sysv shared memory
segment, even if the requested shared memory size is bigger, but the caller of
ss_shm_open() still thinks it got all the requested memory. Eventually we try to use
the unallocated memory and crash. This is the proposed fix; I will commit it
after I retest the upgrade during the next few days.

[olchansk@send src]$ cvs diff -u system.c
olchansk@midas.psi.ch's password: 
Index: system.c
===================================================================
RCS file: /usr/local/cvsroot/midas/src/system.c,v
retrieving revision 1.83
diff -u -r1.83 system.c
--- system.c    4 Oct 2004 07:04:01 -0000       1.83
+++ system.c    14 Oct 2004 05:51:16 -0000
@@ -544,8 +544,14 @@
       } else {
          /* if file exists, retrieve its size */
          file_size = (INT) ss_file_size(file_name);
-         if (file_size > 0)
+         if (file_size > 0) {
+            if (file_size < size) {
+               cm_msg(MERROR, "ss_shm_open", "Shared memory segment \'%s\' size %d is smaller than requested size %d. Please remove it and try again",file_name,file_size,size);
+               return SS_NO_MEMORY;
+            }
+            
             size = file_size;
+         }
       }
 
       /* get the shared memory, create if not existing */

K.O.
    Reply  13 Oct 2004, Konstantin Olchanski, Bug Report, TWIST upgrade bombed... 
> > The upgrade of TWIST to the latest midas has bombed- we see mevb and mlogger
> > crashes during shared memory data buffer accesses. I am looking into it and I
> > will add information as I figure things out. K.O.
> 
> Since 1.9.5 the EventBuilder has been modified. Please consult the documentation
> where the new mevb scheme is explained.
> Test of the mevb with up to 16 frontends (15 different CPUs) has been tested
> successfully. Data rate at the EventBuilder were measured about 50MB/s without the
> logger and ~30MB/s with the logger.

It turns out that TWIST uses a private mevb.c. We will consider upgrading to the
standard one.

K.O.
    Reply  14 Oct 2004, Stefan Ritt, Bug Report, TWIST upgrade bombed... 
Agree.

Once you have made the modification, please check the following situation: Create a fresh
ODB with an increased size ("odbedit -s 2000000" for example). Then check that the
other clients "adopt" this increased size. Note that some experiments need a
bigger ODB, and I don't want to have them recompile all clients; that's why the
code in ss_shm_open() can attach to a *larger* shared memory. However, it should
not matter to the process, since the ODB (or SYSTEM) shared memory size is
stored in the pheader->key_size and pheader->data_size of each participating
process. So they should never write beyond the limits defined in that header.
The size passed to ss_shm_open() is only a "hint" if the shared memory does not exist,
and is not used anywhere later in the code.
Entry  14 Oct 2004, Konstantin Olchanski, Bug Report, lazylogger complains about zero-size files 
With the latest midas, I see this:

Thu Oct 14 19:31:17 2004 [Lazy_Tape] [lazylogger.c:1717:Lazy] lazy_file_exists
file run17567.ybs doesn't exists
Thu Oct 14 19:31:27 2004 [Lazy_Tape] [lazylogger.c:1717:Lazy] lazy_file_exists
file run17567.ybs doesn't exists

The file run17567.ybs has size zero:

-rw-r--r--    1 twistonl users      950272 Oct 13 19:29
/twist/data_onl/current/run17565.ybs
-rw-r--r--    1 twistonl users      950272 Oct 13 19:45
/twist/data_onl/current/run17566.ybs
-rw-r--r--    1 twistonl users           0 Oct 13 20:00
/twist/data_onl/current/run17567.ybs
-rw-r--r--    1 twistonl users      983040 Oct 13 20:03
/twist/data_onl/current/run17568.ybs
-rw-r--r--    1 twistonl users      950272 Oct 13 20:26
/twist/data_onl/current/run17569.ybs

I am not sure how to fix this lazylogger logic. Please help.

K.O.
    Reply  14 Oct 2004, Konstantin Olchanski, Bug Report, TWIST upgrade bombed... 
> The upgrade of TWIST to the latest midas has bombed- we see mevb and mlogger
> crashes during shared memory data buffer accesses. I am looking into it and I
> will add information as I figure things out. K.O.

On the second try, it looks like we are in business: the first try did not work
because of two mistakes:

1) I did not delete *all* old .SHM files (.ODB.SHM, .SYSTEM.SHM, .YBUF1.SHM,
.YBUF2.SHM). I deleted ODB.SHM, so odb worked, but forgot about the data buffers
SYSTEM.SHM & co and ended up with segmentation faults and core dumps in the buffer
management code caused by a mismatch of the old-midas buffers and new-midas code.
2) while debugging these core dumps, I made an error in my test code, so even
after I deleted the old data buffers, things still did not work. Talk about
over-debugging a problem...

K.O.