05 Sep 2024, Jack Carlton, Forum, Python frontend rate limitations?
|
Thank you, this was very helpful.
> First the general advice: if you reduce the "period" of your equipment, then your function will get called more frequently. You can set it to 0 and we'll call it as often as possible. You can set this in the ODB at "/Equipment/Python Data Simulator/Common/Period"
Thanks, I thought that was just for periodic triggering (or at least that's how I've used it in C++ frontends). Changing this allowed me to get past the 100Hz event rate cap I described.
> If that's still not fast enough, then you can return a *list* of events from your readout_func. I've seen real-world cases of 25kHz+ of midas events generated in this fashion.
>
>
> However in your case the limitation is likely that you're sending 1.25MB per event and we have a lot of data marshalling to do between the python and C++ layer. In particular it takes 15ms on my machine to just pack the data into a memory buffer (see timeit command below). I am sure there must be a faster way to do this packing, especially in the case where the bank contains a numpy array rather than a python list.
>
> I'll add it to my to-do list to investigate improving the performance of medium-to-large events in the python code.
>
>
> Cheers,
> Ben
> P.S. You may have a bug in your calculations (depending on how you did your testing). In poll_func I think you should be updating the stats every time the function is called, not just the times when you return True.
I had tested the way you described at first, then later changed
> P.P.S. Command I used to test how slow it is to pack the data. One-time setup of creating the buffers, then multiple tests of the pack_into function:
>
> python -m timeit -s "import struct;import ctypes;arr = [0]*1250001;buf = ctypes.create_string_buffer(10000000);fmt = \">1250000d\"" "struct.pack_into(fmt, buf, *arr)"
> 20 loops, best of 5: 15.3 msec per loop |
05 Sep 2024, Stefan Ritt, Forum, Python frontend rate limitations?
|
> First the general advice: if you reduce the "period" of your equipment, then your function will get called more frequently.
> You can set it to 0 and we'll call it as often as possible. You can set this in the ODB at "/Equipment/Python Data Simulator/Common/Period"
Just for your general understanding: The "period" i the C framework works differently. It calls the poll function with a number,
and then that number is used in the poll function like (simplified):
poll(INT count) {
for (i=0 ; i<count ; i++)
if (new_event())
return TRUE;
return FALSE;
}
This ensures that polling is done as quickly as possible, even staying in the same function (poll) rather than called from the
framework in a loop (which would require a function call to poll each time). The "count" is determined from the framework
during startup of the framework such that the execution time of the poll() routine equals the "period". Like if the period
is 0.1, the count might be a few millions, so that the poll routine returns immediately when a new event occurs or when
100ms have expired. During the polling the frontend is "dead" meaning it cannot react on run transitions for example. That's
why most experiments use 0.1-0.5 seconds. But this does then NOT mean that you can only have 10-2 events per second, but that
the reaction time if the frontend is at maximum 0.1-0.5 seconds which is acceptable most of the case.
Due to this design, the C frontend is capable of producing millions of events per second. It took me some while in the early 1990's
to work out that scheme sitting in the "R" trailer at TRIUMF (old guys will remember...).
Best,
Stefan |
06 Sep 2024, Jack Carlton, Forum, Python frontend rate limitations?
|
Thanks for the responses, they were very helpful.
>First the general advice: if you reduce the "period" of your equipment, then your function will get called more frequently. You can set it to 0 and we'll
call it as often as possible.
Thanks, this solves the event rate limitation I described. I didn't think to change this because the "period" did not affect the observed rate in C (and now
I know why thanks to Stefan).
A couple more questions:
1.
For me,
python -m timeit -s "import struct;import ctypes;arr = [0]*1250001;buf = ctypes.create_string_buffer(10000000);fmt = \">1250000d\"" "struct.pack_into(fmt,
buf, *arr)"
10 loops, best of 3: 43.7 msec per loop
which suggests my maximum data rate is about 1.25 MB * 1000/43.7 Hz = 23 MB/s (?). But I see data rates up to 60 MB/s with a python frontend. Am I
misinterpreting the meaning of this result?
2. I can effectively bypass the rate limitations in python by running two concurrent frontends. For example, with one python frontend at best I can generate
60 MB/s of data (setting "period" to 0 now); but with two frontends I can double this to 120 MB/s. This implies one python frontend is not bottlenecked by
hardware limitations in my case.
Am I doing something wrong to artificially bottleneck my frontends? Perhaps there's a multi-threading solution I can implement to avoid needing multiple
frontends?
Thanks,
Jack |
11 Sep 2024, Konstantin Olchanski, Forum, Python frontend rate limitations?
|
> I'm trying to get a sense of the rate limitations of a python frontend.
1) python is single-threaded, for ultimate performance, a MIDAS frontend (or any DAQ
application) has to be multithreaded:
a) thread with busy loop read the data and place it into a FIFO
b) thread to read data from FIFO and send it to SYSTEM buffer shared memory or to
mserver
c) thread to respond to begin-run, end-run, etc RPCs
d) probably a thread to recycle memory from thread (b) back to thread (a) if per-event
malloc()/free() adds too much overhead
2) data readout. C++ AXI bus access is compiled into 1 instruction and results in 1 AXI
bus operation. comparable for python likely has much more overhead, slows you down.
3) event bank filling. C++ for() loop is compiled into very compact machine code,
python loop cannot because each array element can be random data type, shows you down.
bottom line, there is a reason high speed data acquisitions are written in C/C++, not
in shell, perl, tcl/tk, or (today's favourite) python.
> The C++ frontend is about 100 times faster in both data and event rates.
This is as expected. You can probably improve python code to get closer to 10 times
slower than C++. But consider:
a) will it be "fast enough" for the task?
b) learning C++ and optimizing python to within "2-3-10x slower than C++" may involve a
similar amount of time and effort.
And you have not looked at the real-time properties of your frontend. You may discover
that it's actually faster than you think, but occasionally stops for a millisecond (or
two or hundred). some applications a notorious for running memory garbage collection
just at the wrong time.
I am working right now on exactly this problem, I have a 1 GHz ARM CPU (Cyclone-V FPGA)
and I need to push data out at 100 Mbytes/sec while avoiding and bad-real-time dropouts
that cause the FPGA data FIFO to overflow. And I only have 2 CPU cores, 1 to read the
FPGA FIFO, 1 to run the TCP/IP stack and the ethernet driver. No this can be done with
python.
K.O. |
11 Sep 2024, Konstantin Olchanski, Forum, Python frontend rate limitations?
|
>
> poll(INT count) {
> for (i=0 ; i<count ; i++)
> if (new_event())
> return TRUE;
> return FALSE;
> }
in the c++ frontend (tmfe.h) this loop usually runs in a separate thread, and I am now working on the linux magic to assign this thread maximum
uninterruptible priority. otherwise on my Cyclone-V FPGA SoC I see 1-10 msec dropouts, I think from taking ethernet interrupts.
K.O. |
11 Sep 2024, Konstantin Olchanski, Forum, Python frontend rate limitations?
|
> > I'm trying to get a sense of the rate limitations of a python frontend.
forgot one more:
c++ toolchain comes with extensive profiler tools aimed to answer the question "why is my
program so slow, where is it spending all the time?". some of these tools go all the way to
the hardware level and report CPU cache misses, TLB flushes, context switches and any other
hardware events that interrupt or slow down computations. programmer than uses this
information to restructure the code to avoid the worst slow downs (i.e. avoid branch mis-
predictions, avoid cache misses, etc).
I doubt the python toolchain will ever profiler tools as good.
K.O. |
11 Sep 2024, Konstantin Olchanski, Forum, "Safe" abort of sequencer scripts
|
> We often use the MIDAS sequencer to temporarily control detector settings, such as:
>
> * <change some setting>
> * WAIT 60 seconds
> * <revert setting to original value>
>
> The question arises of what happens if the sequencer scripts gets aborted during that wait, preventing the value from being reset.
Common problem. Go have an elegant solution using the "defer" keyword.
https://go.dev/tour/flowcontrol/12
K.O. |
27 Sep 2024, Ben Smith, Forum, Python frontend rate limitations?
|
> in your case the limitation is likely that you're sending 1.25MB per event and we have a lot of data marshalling to do between the python and C++ layer.
>
> I'll add it to my to-do list to investigate improving the performance of medium-to-large events in the python code.
I've now added better support for numpy arrays in the python code that encodes a `midas.event.Event` object. If you use the "correct" numpy data type then you can get vastly improved performance as numpy already stores the data in memory in the format that we need.
In your example, if you change
self.zero_buffer = [0] * self.total_data_size
to
self.zero_buffer = np.ndarray(self.total_data_size, np.int16)
then the max data rate of the frontend goes from 330MB/s to 7600MB/s on my laptop (a factor 20 improvement from one line of code!)
To ensure you're using the optimal numpy dtype for your bank, you can reference a dict called `midas.tid_np_formats`. For example `midas.tid_np_formats[midas.TID_SHORT]` is equivalent to `np.int16`. If you use an int16 array and write it as a TID_SHORT bank, then we'll use the fast path. If there is a mismatch, we'll have to do type conversions and will end up on the slow path. |
05 Nov 2024, Jack Carlton, Forum, How to properly write a client listens for events on a given buffer?
|
If there's some template for writing a client to access event data, that would be
very useful (and you can probably just ignore the context I gave below in that
case).
Some context:
Quite a while ago, I wrote the attached "data pipeline" client whose job was to
listen for events, copy their data, and pipe them to a python script. I believe I
just stole bits and pieces from mdump.cxx to accomplish this. Later I wrote the
attached wrapper class "MidasConnector.cpp" and a main.cpp to generalize
data_pipeline.cxx a bit. There were a lot of iterations to the code where I had the
below problems; so don't take the logic in the attached code as the exact code that
caused the issues below.
However, I'm unable to resolve a couple issues:
1. If a timeout is set, everything will work until that timeout is reached. Then
regardless of what kind of logic I tried to implement (retry receiving event,
disconnect and reconnect client, etc.) the client would refuse to receive more data.
2. When I ctrl-C main, it hangs; this is expected because it's stuck in a while
loop. But because I can't set a timeout I have to ctrl-C twice; this would
occasionally corrupt the ODB which was not ideal. I was able to get around this with
some impractical solution involving ncurses I believe.
Thanks,
Jack |
05 Nov 2024, Maia Henriksson-Ward, Forum, How to properly write a client listens for events on a given buffer?
|
> If there's some template for writing a client to access event data, that would be
> very useful (and you can probably just ignore the context I gave below in that
> case).
>
>
> Some context:
>
> Quite a while ago, I wrote the attached "data pipeline" client whose job was to
> listen for events, copy their data, and pipe them to a python script. I believe I
> just stole bits and pieces from mdump.cxx to accomplish this. Later I wrote the
> attached wrapper class "MidasConnector.cpp" and a main.cpp to generalize
> data_pipeline.cxx a bit. There were a lot of iterations to the code where I had the
> below problems; so don't take the logic in the attached code as the exact code that
> caused the issues below.
>
> However, I'm unable to resolve a couple issues:
>
> 1. If a timeout is set, everything will work until that timeout is reached. Then
> regardless of what kind of logic I tried to implement (retry receiving event,
> disconnect and reconnect client, etc.) the client would refuse to receive more data.
>
> 2. When I ctrl-C main, it hangs; this is expected because it's stuck in a while
> loop. But because I can't set a timeout I have to ctrl-C twice; this would
> occasionally corrupt the ODB which was not ideal. I was able to get around this with
> some impractical solution involving ncurses I believe.
>
>
> Thanks,
> Jack
midas/examples/lowlevel/consume.cxx might be what you're looking for, but I think all
you're missing is a call to cm_yield() in your loop, so your midas client doesn't get
killed when the timeout is reached (and also so you can act on shutdown requests from
midas)
Something like
int status = cm_yield(100);
if (status == SS_ABORT || status == RPC_SHUTDOWN)
break;
There might be a recommended way to handle the ctrl-c and disconnect from the ODB, but
off the top of my head I don't remember it.
Also check out Ben's new(ish) python library, midas/python/examples/event_receiver.py
might be a much easier solution. And you can use the context manager, which will take
care of safely disconnecting from midas after you ctrl-C. |
02 Dec 2024, Stefan Ritt, Forum, "Safe" abort of sequencer scripts
|
The atexit() function has been implemented in the current develop branch of midas, see
https://daq00.triumf.ca/MidasWiki/index.php/Sequencer#ATEXIT_subroutine
Stefan |
29 Dec 2024, Pavel Murat, Forum, time ordering of run transition calls to TMFeEquipment things
|
Dear MIDAS experts,
I have a question about "tmfe approach" to implementing MIDAS frontends. If I read the code correctly,
within this approach it is the TMFeEquipment things, not the TMFrontend's themselves,
which handle the run transitions - the TMFrontend class
https://bitbucket.org/tmidas/midas/src/423082fb67c7711813fcda61f7cd03784c398f49/include/tmfe.h#lines-306:378
simply doesn't have methods to handle those directly.
So how does a user control the sequence in which TMFeEquipment::HandleBeginRun functions of different
TMFeEquipment pieces are called at begin run? - there are two cases to consider: TMFeEquipment things
defined by the same TMFrontend and by different TMFrontend's.
Many thanks and happy holidays to everyone!
-- regards, Pasha
|
01 Jan 2025, Konstantin Olchanski, Forum, time ordering of run transition calls to TMFeEquipment things
|
> I have a question about "tmfe approach" to implementing MIDAS frontends. If I read the code correctly,
> within this approach it is the TMFeEquipment things, not the TMFrontend's themselves,
> which handle the run transitions - the TMFrontend class
that's correct and it is documented so in https://bitbucket.org/tmidas/midas/src/develop/tmfe.md
> So how does a user control the sequence in which TMFeEquipment::HandleBeginRun functions of different
> TMFeEquipment pieces are called at begin run? - there are two cases to consider: TMFeEquipment things
> defined by the same TMFrontend and by different TMFrontend's.
I am not sure what you are trying to do. It is always easier to suggest a solution to a specific problem.
But I will try to answer anyway:
1) "time ordering of run transitions" - of course midas transitions are ordered by transition sequence numbers
and the tmfe class provides methods to control this. ditto for the mfe.cxx frontends.
2) for one TMFrontend, the order of calling HandleBeginRun() is the order in which equipments were added to the
equipment using FeAddEquipment(). HandleEndRun() is called in reverse order. (I better check this).
3) to have multiple TMFrontends in one program would be unusual (mfe.cxx frontends completely do not support
this), but should work. Everything was coded to support this, but it was never tested in practice because we
cannot invent any useful use-case for it. HandleBeginRun() handlers are likely to be called in the frontends are
created. (I could check this and confirm it works, as long as you have a valid use-case for this configuration).
4) Frontend X has EquipmentA and EquipmentB, you want EqA::HandleBeginRun() to be called at run transition 200
and EqB::HandleBeginRun() to be called at run transition 400.
This is not directly supported by mfe.cxx frontends (the begin_run() handler is a global function) and I did not
directly implement it in the TMFE frontend.
But I think this would be a useful improvement. I will look into this.
Likely I will add per-equipment data members fEqConfBeginRunSeqNo, fEqConfEndRunSeqno, etc. Value 0 would
unregister the corresponding run transition handler. This would cleanup the code quite a bit, a bunch
of RegisterTranstionXXX functions could go away.
K.O. |
02 Jan 2025, Pavel Murat, Forum, time ordering of run transition calls to TMFeEquipment things
|
Hi K.O., your clarification is much appreciated!
"
> I am not sure what you are trying to do. It is always easier to suggest a solution to a specific problem.
I think, I owe you an explanation :) :
Consider ~ 40 nodes with two FPGAs (PCIE cards) per node, talking to the detector hardware.
One of those FPGAs, in addition to reading the data, performs the global timing synchronization.
The high-bandwidth data readout is not controlled by MIDAS, so all frontends perform only 'slow control'-type functions.
In MIDAS language, an FPGA implements two different units of slow control equipment:
one - configuring and controlling a single FPGA (equipment type A), and another one - synchronizing
multiple FPGAs (equipment type B). On one of the nodes, unit A and unit B share the FPGA card,
so they better be controlled by the same frontend.
For one, I need to make sure that all type A equipment units, managed by multiple frontends,
are initialized before the [single] type B unit which shares the frontend with the type A unit.
And, of course, the end of a run transition has to be handled in the opposite order - type B unit
shuts down first.
As 'periodic' actions for all registered pieces of equipment are performed in the same loop [thread],
registering the equipment in the needed order - first A, then B - should give a solution - thanks for making that clear.
>
> 1) "time ordering of run transitions" - of course midas transitions are ordered by transition sequence numbers
> and the tmfe class provides methods to control this. ditto for the mfe.cxx frontends.
>
> 2) for one TMFrontend, the order of calling HandleBeginRun() is the order in which equipments were added to the
> equipment using FeAddEquipment(). HandleEndRun() is called in reverse order. (I better check this).
the ordering of the rpc handler calls in tmfe's tr_stop/tr_pause/tr_resume functions is ok.
>
> 3) to have multiple TMFrontends in one program would be unusual (mfe.cxx frontends completely do not support
> this), but should work. Everything was coded to support this, but it was never tested in practice because we
> cannot invent any useful use-case for it. HandleBeginRun() handlers are likely to be called in the frontends are
> created. (I could check this and confirm it works, as long as you have a valid use-case for this configuration).
agreed, I don't think there is a good use case for that, so no need to spend time checking.
>
> 4) Frontend X has EquipmentA and EquipmentB, you want EqA::HandleBeginRun() to be called at run transition 200
> and EqB::HandleBeginRun() to be called at run transition 400.
>
> This is not directly supported by mfe.cxx frontends (the begin_run() handler is a global function) and I did not
> directly implement it in the TMFE frontend.
>
> But I think this would be a useful improvement. I will look into this.
In the simplest case, registering the equipment units in the right order is definitely the answer.
However a single FPGA can perform multiple logically independent tasks and thus represent
multiple logical units of equipment. Those units however are not independent: they share the hardware (FPGA)
and thus do depend on each other. Giving users a full control over the sequence in which those logical units
execute their run transitions is quite likely to be needed, for example, to work around peculiarities of the
custom-made kernel drivers.
>
> Likely I will add per-equipment data members fEqConfBeginRunSeqNo, fEqConfEndRunSeqno, etc. Value 0 would
> unregister the corresponding run transition handler. This would cleanup the code quite a bit, a bunch
> of RegisterTranstionXXX functions could go away.
this also makes sense. -- thanks again, regards, Pasha
>
> K.O. |
05 Jan 2025, Stefan Ritt, Forum, time ordering of run transition calls to TMFeEquipment things
|
Hi Pavel,
have you looked into
cm_set_transition_sequence()
which let's you define the sequence number for every midas client. You give any number between 1 and 1000 (default is 500 for frontends I believe).
The default value of 500 is defined in mfe.cxx:2641 where you have 500 for all four transitions, but it can be overwritten in the frontend_init function via
cm_set_transition_sequence(). Since you have separate values for the start and stop transition, you can get the different sequencing for both transitions as you need. Like set
all type A to 400, type B to 600 for TR_START, and set type A to 600 and type B to 400 for TR_STOP.
Since this works on the midas client level, it should also work for the tmfe.cxx framework. I'm however not sure if you have a similar default of 500 as in mfe.cxx:2641. But K.O.
should know.
Best,
Stefan |
13 Oct 2004, Konstantin Olchanski, Bug Report, db_paste: found string exceeding MAX_STRING_LENGTH
|
I am updating TWIST to the latest MIDAS and when I load a saved .odb file, I get
these messages. Their text ought to say where and what strings it does not like.
K.O.
[twistonl@midtwist ~/online]$ odbedit
Please define environment variable 'MIDASSYS'
pointing to the midas installation directory.
[local:twist:S]/>load /twist/data_onl/current/run17548.odb
[odb.c:5600:db_paste] found string exceeding MAX_STRING_LENGTH
[odb.c:5600:db_paste] found string exceeding MAX_STRING_LENGTH
[odb.c:5600:db_paste] found string exceeding MAX_STRING_LENGTH |
13 Oct 2004, Stefan Ritt, Bug Report, db_paste: found string exceeding MAX_STRING_LENGTH
|
Can you attach
/twist/data_onl/current/run17548.odb
so I can reproduce the problem? |
13 Oct 2004, Konstantin Olchanski, Bug Report, silly odbedit "rename Display xxx/yyy"
|
odbedit command "rename Display xxx/yyy" creates a key named "xxx/yyy" (yes,
with a slash in the name) and this key cannot be deleted or renamed...
K.O. |
13 Oct 2004, Stefan Ritt, Bug Report, silly odbedit "rename Display xxx/yyy"
|
> odbedit command "rename Display xxx/yyy" creates a key named "xxx/yyy" (yes,
> with a slash in the name) and this key cannot be deleted or renamed...
> K.O.
"rename" is "rename", not "mv" under Unix. If you want this functionality, put it
in and don't complain! |
13 Oct 2004, Konstantin Olchanski, Bug Report, TWIST upgrade bombed...
|
The upgrade of TWIST to the latest midas has bombed- we see mevb and mlogger
crashes during shared memory data buffer accesses. I am looking into it and I
will add information as I figure things out. K.O. |
|