Back Midas Rome Roody Rootana
  Midas DAQ System  Not logged in ELOG logo
Entry  15 Jun 2021, Konstantin Olchanski, Info, 1000 Mbytes/sec through midas achieved! 
    Reply  15 Jun 2021, Stefan Ritt, Info, 1000 Mbytes/sec through midas achieved! frontend.cxx
       Reply  16 Jun 2021, Marco Francesconi, Info, 1000 Mbytes/sec through midas achieved! 
          Reply  18 Jun 2021, Konstantin Olchanski, Info, 1000 Mbytes/sec through midas achieved! 
       Reply  18 Jun 2021, Konstantin Olchanski, Info, 1000 Mbytes/sec through midas achieved! 
Message ID: 2223     Entry time: 18 Jun 2021     In reply to: 2218
Author: Konstantin Olchanski 
Topic: Info 
Subject: 1000 Mbytes/sec through midas achieved! 
> ... MEG II ... 34 crates each with 32 DRS4 digitiser chips and a single 1 Gbps readout link through a Xilinx Zynq SoC.
>
> Zynq ... embedded ethernet MAC does not support jumbo frames (always read the fine prints in the manuals!)
> and the embedded Linux ethernet stack seems to struggle when we go beyond 250 Mbps of UDP traffic.

that's an ouch. we use the altera ethernet mac, and jumbo frames are supported, but the firmware data path
was originally written assuming 1500-byte packets and it is too much work to rewrite it for jumbo frames.

we send the data directly from the FPGA fabric to the ethernet, there is an avalon/axi bus multiplexer
to split the ethernet packets to the NIOS slow control CPU. not sure if such scheme is possible
for SoC FPGAs with embedded ARM CPUs.

and yes, a 1 GHz ARM CPU will not do 10gige. You see it yourself, measure your memcpy() speed. Where
typical PC will have dual-channel 128-bit wide memory (and the famous for it's low latency
Intel memory controller), ARM SoC will have at best 64-bit wide memory (some boards are only 32-bit wide!),
with DDR3 (not DDR4) severely under-clocked (i.e. DDR3-900, etc). This is why the new Apple ARM chips
are so interesting - can Apple ARM memory controller beat the Intel x86 memory controller?

> On the receiver side, we have the DAQ server with an Intel E5-2630 v4 CPU

that's the right gear for the job. quad-channel memory with nominal "Max Memory Bandwidth 68.3 GB/s",
10 CPU cores. My benchmark of memcpy() for the much older duad-channel memory i7-4820 with DDR3-1600 DIMMs
is 20 Gbytes/sec. waiting for ARM CPU with similar specs.

> and a 10 Gbit connection to the network using an Intel X710 Network card.
> In the past, we used also a "cheap" 10 Gbit card from Tehuti but the driver performance was so bad that it could not digest more than 5 Gbps of data.

yup, same here. use Intel ethernet exclusively, even for 1gige links.

> A major modification to Konstantin scheme is that we need to calibrate all WFMs online so that a software zero suppression

I implemented hardware zero suppression in the FPGA code. I think 1 GHz ARM CPU does not have the oomph for this.

> rb_get_wp() returns almost always DB_TIMEOUT

replace rb_xxx() with std::deque<std::vector<char>> (protected by a mutex, of course). lots of stuff in the mfe.c frontend
is obsolete in the same way. check out the newer tmfe frontends (tmfe.md, tmfe.h and tmfe examples).

> It is difficult to report three years of development in a single Elog

but quite successful at it. big thanks for your write-up. I think our info is quite useful for the next people.

K.O.
ELOG V3.1.4-2e1708b5