> In MEG II we also kind of achieved this rate.
>
> Instead of an expensive high-grade switch, we chose a cheap "Chinese" high-grade switch.
Right. We built this DAQ system about 3 years ago and the cheap Chinese switches arrived
on the market about 1 year after we purchased the big 96-port Juniper switch. Bad timing/good timing.
Actually I have a very nice 24-port 1gige switch ($2000 about 3 years ago). I could have
used 4 of them in parallel, but they were discontinued and replaced with a $5000 switch
(plus $3000 for a 10gige uplink). I think I got the very last one of the cheap switches.
But not all Chinese switches are equal. We have a Ubiquiti 10gige switch, and it does
not have working end-to-end ethernet flow control (yikes!).
BTW, for this project we could not use just any cheap switch: we need 64 fiber SFP ports
for connecting the on-TPC electronics. This narrows the market significantly, and 64 does
not match the industry-standard port counts of 8-16-24-48-96.
> MikroTik CRS354-48G-4S+2Q+RM 54 port
> MikroTik CRS326-24S-2Q+RM 26 Port
We have a hard time buying this stuff in Vancouver BC, Canada. Most of our regular suppliers
are US-based, and there is a technology trade war still going on between the US and China.
I guess we could buy direct on Alibaba, if not for the risk of scammers, scalpers and iffy shipping.
> both cost in the order of 500 US$
That tells you how much we overpay for US-based stuff. Not surprising, given how Cisco & co. can afford
to buy sports arenas, etc.
> We were astonished that they don't lose UDP packets when all inputs send a packet at the
> same time, and they have to pipe them to the single output one after the other,
> but apparently the switches have enough buffers.
You are probably seeing ethernet flow control in action. Look at the counters for ethernet pause frames
in your DAQ boards and in your main computer.
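On Linux you can see these with "ethtool -S ethX | grep -i pause" (counter names are
driver-dependent, some NICs do not export them at all). If you want to poll them from a
program, the same numbers come out of the ethtool ioctl. A minimal sketch, Linux-only,
not production code:

// pausedump.cxx - print NIC statistics counters whose name contains "pause"
// (roughly what "ethtool -S ethX | grep -i pause" shows). Linux only.
// Build: g++ -O2 -o pausedump pausedump.cxx
// Run:   sudo ./pausedump eth0
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(int argc, char* argv[])
{
   if (argc != 2) { fprintf(stderr, "usage: %s <interface>\n", argv[0]); return 1; }

   int fd = socket(AF_INET, SOCK_DGRAM, 0);
   if (fd < 0) { perror("socket"); return 1; }

   struct ifreq ifr;
   memset(&ifr, 0, sizeof(ifr));
   strncpy(ifr.ifr_name, argv[1], IFNAMSIZ - 1);

   // ask the driver how many statistics counters it exports
   struct ethtool_drvinfo drvinfo;
   memset(&drvinfo, 0, sizeof(drvinfo));
   drvinfo.cmd = ETHTOOL_GDRVINFO;
   ifr.ifr_data = (char*)&drvinfo;
   if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) { perror("ETHTOOL_GDRVINFO"); return 1; }
   unsigned n = drvinfo.n_stats;

   // fetch the counter names
   std::vector<char> sbuf(sizeof(struct ethtool_gstrings) + n * ETH_GSTRING_LEN);
   struct ethtool_gstrings* strings = (struct ethtool_gstrings*)sbuf.data();
   strings->cmd = ETHTOOL_GSTRINGS;
   strings->string_set = ETH_SS_STATS;
   strings->len = n;
   ifr.ifr_data = (char*)strings;
   if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) { perror("ETHTOOL_GSTRINGS"); return 1; }

   // fetch the counter values
   std::vector<char> vbuf(sizeof(struct ethtool_stats) + n * sizeof(__u64));
   struct ethtool_stats* stats = (struct ethtool_stats*)vbuf.data();
   stats->cmd = ETHTOOL_GSTATS;
   stats->n_stats = n;
   ifr.ifr_data = (char*)stats;
   if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) { perror("ETHTOOL_GSTATS"); return 1; }

   // print everything with "pause" in the name (rx_pause, tx_pause, etc.)
   for (unsigned i = 0; i < n; i++) {
      const char* s = (const char*)&strings->data[i * ETH_GSTRING_LEN];
      std::string name(s, strnlen(s, ETH_GSTRING_LEN));
      if (name.find("pause") != std::string::npos)
         printf("%-32s %llu\n", name.c_str(), (unsigned long long)stats->data[i]);
   }

   close(fd);
   return 0;
}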
> (which is usually NOT written in the data sheets).
True. When I looked into this, I found a paper by somebody at Berkeley on a special
technique to measure the size of such buffers.
(The big Juniper switch has only 8 Mbytes of buffer. The current wisdom for backbone networks
is to have as little buffering as possible).
> To avoid UDP packet loss for several events, we do traffic shaping by arming the trigger only when the previous event is
> completely received by the frontend. This eliminates all flow control and other complicated methods. Marco can tell you the
> details.
We do not do this (very bad!). When each trigger arrives, all 64+8 DAQ boards send a train of UDP packets
at maximum line speed (64+8 links at 1gige), all funneled into one 10gige link ((64+8)/10 = 7.2x oversubscription).
Before we got ethernet flow control to work properly, we had to throttle all the 1gige links down by about 60%
to get any complete events at all. That would not have been acceptable for physics data taking.
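For reference, the traffic shaping quoted above boils down to a software interlock of this
shape. This is a toy sketch with made-up names and stub functions, not MEG II or MIDAS code:

// toy_shaper.cxx - illustration of "re-arm the trigger only after the previous event
// has been completely received", so the boards never burst into the switch on top of
// each other. All names below are placeholders invented for this sketch.
// Build: g++ -O2 -o toy_shaper toy_shaper.cxx
#include <cstdio>
#include <thread>
#include <chrono>

const int NUM_BOARDS = 72;   // e.g. 64 on-TPC boards + 8 others

// stand-ins for the experiment's real trigger and readout code
static void arm_trigger()             { printf("trigger armed\n"); }
static bool wait_all_fragments(int)   {
   // pretend the fragment trains from all boards take about 1 ms to arrive
   std::this_thread::sleep_for(std::chrono::milliseconds(1));
   return true;
}
static void assemble_event(int event) { printf("event %d assembled from %d boards\n", event, NUM_BOARDS); }

int main()
{
   arm_trigger();                        // accept the first trigger
   for (int event = 0; event < 5; event++) {
      if (!wait_all_fragments(event))    // block until every board delivered its data
         printf("event %d incomplete\n", event);
      assemble_event(event);
      arm_trigger();                     // only now allow the next trigger: the
                                         // 72 x 1gige -> 1 x 10gige funnel is idle again
   }
   return 0;
}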
> Another interesting aspect: While we get the data into the frontend, we have problems in getting it through midas. Your
> bm_send_event_sg() is maybe a good approach which we should try. To benchmark the out-of-the-box midas, I run the dummy frontend
> attached on my MacBook Pro 2.4 GHz, 4 cores, 16 GB RAM, 1 TB SSD disk.
The dummy frontend is not very representative, because the limitation is memory bandwidth
and CPU load, and a real ethernet receiver uses quite a bit of both (interrupt processing,
DMA into memory, the implicit memcpy() inside the socket read()).
For example, typical memcpy() speeds are between 10 and 22 Gbytes/sec for current
generation CPUs and DRAM. Counting 10gige as roughly 1 Gbyte/sec of data, this translates
into a total budget of 10 to 22 memcpy() operations at 10gige speed. Subtract from this 1 memcpy() to DMA data from ethernet into memory
and 1 memcpy() to DMA data from memory to storage. Subtract from this 2 implicit
memcpy() for read() in the frontend and write() in mlogger (the Linux sendfile() syscall
was invented to cut these out). Subtract from this 1 memcpy() for instruction and incidental
data fetch (no interesting program fits into cache). Subtract from this the memory bandwidth
for running the rest of Linux (systemd, ssh, cron jobs, NFS, etc.). Hardly anything is
left when all is said and done. (Found it: the alphagdaq memcpy() runs at 14 Gbytes/sec,
so the total budget there is 14 memcpy() operations at 10gige speed.)
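Such memcpy() numbers come out of a trivial test of roughly this shape. This is my own
sketch, not the exact program run on alphagdaq; it is single-threaded, so it measures what
one frontend or mlogger thread can actually get, not the total DRAM bandwidth:

// membw.cxx - crude single-threaded memcpy() bandwidth test.
// Build: g++ -O2 -o membw membw.cxx
#include <cstdio>
#include <cstring>
#include <cstdlib>
#include <chrono>

int main()
{
   const size_t size = 256 * 1024 * 1024;    // 256 Mbytes, much bigger than any cache
   const int loops = 20;

   char* src = (char*)malloc(size);
   char* dst = (char*)malloc(size);
   if (!src || !dst) { fprintf(stderr, "malloc failed\n"); return 1; }
   memset(src, 0xAB, size);                  // touch the pages so they are really allocated
   memset(dst, 0x00, size);

   auto t0 = std::chrono::steady_clock::now();
   for (int i = 0; i < loops; i++) {
      src[i] = (char)i;                      // touch the source so the compiler cannot merge the copies
      memcpy(dst, src, size);
   }
   auto t1 = std::chrono::steady_clock::now();

   volatile char sink = dst[size / 2];       // keep the copies from being optimized away
   (void)sink;

   double sec = std::chrono::duration<double>(t1 - t0).count();
   double gb  = (double)size * loops / 1e9;
   printf("memcpy: %.1f Gbytes in %.3f sec = %.1f Gbytes/sec\n", gb, sec, gb / sec);

   free(src);
   free(dst);
   return 0;
}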
And the event builder eats up 2 CPU cores just to process the UDP packets at 10gige rate;
there is quite a bit of CPU-expensive data unpacking, inspection and processing
going on that cannot be cut out. (alphagdaq has 4 cores, "8 threads".)
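On the receive side, the usual Linux trick to keep the per-packet syscall cost under
control is to batch the reads with recvmmsg() and give the socket a big SO_RCVBUF.
A generic sketch of such a receive loop (not our actual event builder code; the port
number is made up):

// udprecv.cxx - batched UDP receive loop. Linux-specific (recvmmsg).
// Build: g++ -O2 -o udprecv udprecv.cxx
#include <cstdio>
#include <cstring>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

const int BATCH  = 64;       // datagrams per recvmmsg() call
const int MAXPKT = 9000;     // jumbo-frame sized buffers
const int PORT   = 50006;    // hypothetical port number

int main()
{
   int fd = socket(AF_INET, SOCK_DGRAM, 0);
   if (fd < 0) { perror("socket"); return 1; }

   // big kernel receive buffer to ride out bursts (net.core.rmem_max permitting)
   int rcvbuf = 64 * 1024 * 1024;
   setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

   struct sockaddr_in addr;
   memset(&addr, 0, sizeof(addr));
   addr.sin_family = AF_INET;
   addr.sin_addr.s_addr = htonl(INADDR_ANY);
   addr.sin_port = htons(PORT);
   if (bind(fd, (struct sockaddr*)&addr, sizeof(addr)) < 0) { perror("bind"); return 1; }

   // pre-build the scatter lists for a batch of datagrams
   static char buf[BATCH][MAXPKT];
   struct iovec   iov[BATCH];
   struct mmsghdr msgs[BATCH];
   memset(msgs, 0, sizeof(msgs));
   for (int i = 0; i < BATCH; i++) {
      iov[i].iov_base = buf[i];
      iov[i].iov_len  = MAXPKT;
      msgs[i].msg_hdr.msg_iov    = &iov[i];
      msgs[i].msg_hdr.msg_iovlen = 1;
   }

   uint64_t npkt = 0, nbytes = 0;
   while (true) {
      // one syscall returns up to BATCH datagrams instead of one
      int n = recvmmsg(fd, msgs, BATCH, 0, NULL);
      if (n < 0) { perror("recvmmsg"); break; }
      for (int i = 0; i < n; i++) {
         npkt++;
         nbytes += msgs[i].msg_len;
         // ... unpack/inspect buf[i] here, this is where the real CPU time goes ...
      }
      if (npkt % 1000000 < (uint64_t)n)
         printf("received %llu packets, %llu bytes\n",
                (unsigned long long)npkt, (unsigned long long)nbytes);
   }

   close(fd);
   return 0;
}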
K.O.
P.S. Waiting for rack-mounted machines with AMD "X" series processors... K.O.