15 Jun 2021, Konstantin Olchanski, Info, 1000 Mbytes/sec through midas achieved!
|
I am sure everybody else has 10gige and 40gige networks and are sending terabytes of data before breakfast.
Myself, I only have one computer with a 10gige network link and sufficient number of daq boards to fill
it with data. Here is my success story of getting all this data through MIDAS.
This is the anti-matter experiment ALPHA-g now under final assembly at CERN. The main particle detector is a long but
thin cylindrical TPC. It surrounds the magnetic bottle (particle trap) where we make and study anti-hydrogen. There are
64 daq boards to read the TPC cathode pads and 8 daq boards to read the anode wires and to form the trigger. Each daq
board can produce data at 80-90 Mbytes/sec (1gige links). Data is sent as UDP packets (no jumbo frames). Altera FPGA
firmware was done here at TRIUMF by Bryerton Shaw, Chris Pearson, Yair Lynn and myself.
Network interconnect is a 96-port Juniper switch with a 10gige uplink to the main daq computer (quad core Intel(R)
Xeon(R) CPU E3-1245 v6 @ 3.70GHz, 64 GBytes of DDR4 memory).
MIDAS data path is: UDP packet receiver frontend -> event builder -> mlogger -> disk -> lazylogger -> CERN EOS cloud
storage.
First chore was to get all the UDP packets into the main computer. "U" in UDP stands for "unreliable", and at first, UDP
packets have been disappearing pretty much anywhere they could. To fix this, in order:
- reading from the udp socket must be done in a dedicated thread (in the midas context, pauses to write statistics or
check alarms result in lost udp packets)
- udp socket buffer has to be very big
- maximum queue sizes must be enabled in the 10gige NIC
- ethernet flow control must be enabled on the 10gige link
- ethernet flow control must be enabled in the switch (to my surprise many switches do not have working end-to-end
ethernet flow control and lose UDP packets, ask me about this. our big juniper switch balked at first, but I got it
working eventually).
- ethernet flow control must be enabled on the 1gige links to each daq module
- ethernet flow control must be enabled in the FPGA firmware (it's a checkbox in qsys)
- FPGA firmware internally must have working back pressure and flow control (avalon and axi buses)
- ideally, this back-pressure should feed back to the trigger. ALPHA-g does not have this (it does not need it).
Next chore was to multithread the UDP receiver frontend and to multithread the event builder. Stock single-threaded
programs quickly max out with 100% CPU use and reach nowhere near 10gige data speeds.
Naive multithreading, with two threads, reader (read UDP packet, lock a mutex, put it into a deque, unlock, repeat) and
sender (lock a mutex, get a packet from deque, unlock, bm_send_event(), repeat) spends all it's time locking and
unlocking the mutex and goes nowhere fast (with 1500 byte packets, about 600 kHz of lock/unlock at 10gige speed).
So one has to do everything in batches: reader thread: accumulate 1000 udp packets in an std::vector, lock the mutex,
dump this batch into a deque, unlock, repeat; sender thread: lock mutex, get 1000 packets from the deque, unlock, stuff
the 1000 packets into 1 midas event, bm_send_event(), repeat.
It takes me 5 of these multithreaded udp reader frontends to keep up with a 10gige link without dropping any UDP packets.
My first implementation chewed up 500% CPU, that's all of it, there is only 4 CPU cores available, leaving nothing
for the event builder (and mlogger, and ...)
I had to:
a) switch from plain socket read() to socket recvmmsg() - 100000 udp packets per syscall vs 1 packet per syscall, and
b) switch from plain bm_send_event() to bm_send_event_sg() - using a scatter-gather list to avoid a memcpy() of each udp
packet into one big midas event.
Next is the event builder.
The event builder needs to read data from the 5 midas event buffers (one buffer per udp reader frontend, each midas event
contains 1000 udp packets as indovidual data banks), examine trigger timestamps inside each udp packet, collect udp
packets with matching timestamps into a physics event, bm_send_event() it to the SYSTEM buffer. rinse and repeat.
Initial single threaded implementation maxed out at about 100-200 Mbytes/sec with 100% busy CPU.
After trying several threading schemes, the final implementation has these threads:
- 5 threads to read the 5 event buffers, these threads also examine the udp packets, extract timestamps, etc
- 1 thread to sort udp packets by timestamp and to collect them into physics events
- 1 thread to bm_send_event() physics events to the SYSTEM buffer
- main thread and rpc handler thread (tmfe frontend)
(Again, to reduce lock contention, all data is passed between threads in large batches)
This got me up to about 800 Mbytes/sec. To get more, I had to switch the event builder from old plain bm_send_event() to
the scatter-gather bm_send_event_sg(), *and* I had to reduce CPU use by other programs, see steps (a) and (b) above.
So, at the end, success, full 10gige data rate from daq boards to the MIDAS SYSTEM buffer.
(But wait, what about the mlogger? In this experiment, we do not have a disk storage array to sink this
much data. But it is an already-solved problem. On the data storage machines I built for GRIFFIN - 8 SATA NAS HDDs using
raidz2 ZFS - the stock MIDAS mlogger can easily sink 1000 Mbytes/sec from SYSTEM buffer to disk).
Lessons learned:
- do not use UDP. dealing with packet loss will cost you a fortune in headache medicines and hair restorations.
- use jumbo frames. difference in per-packet overhead between 1500 byte and 9000 byte packets is almost a factor of 10.
- everything has to be done in bulk to reduce per-packet overheads. recvmmsg(), batched queue push/pop, etc
- avoid memory allocations (I has a per-packet std::string, replaced it with char[5])
- avoid memcpy(), use writev(), bm_send_event_sg() & co
K.O.
P.S. Let's counting the number of data copies in this system:
x udp reader frontend:
- ethernet NIC DMA into linux network buffers
- recvmmsg() memcpy() from linux network buffer to my memory
- bm_send_event_sg() memcpy() from my memory to the MIDAS shared memory event buffer
x event builder:
- bm_receive_event() memcpy() from MIDAS shared memory event buffer to my event buffer
- my memcpy() from my event buffer to my per-udp-packet buffers
- bm_send_event_sg() memcpy() from my per-udp-packet buffers to the MIDAS shared memory event buffer (SYSTEM)
x mlogger:
- bm_receive_event() memcpy() from MIDAS SYSTEM buffer
- memcpy() in the LZ4 data compressor
- write() syscall memcpy() to linux system disk buffer
- SATA interface DMA from linux system disk buffer to disk.
Would a monolithic massively multithreaded daq application be more efficient?
("udp receiver + event builder + logger"). Yes, about 4 memcpy() out of about 10 will go away.
Would I be able to write such a monolithic daq application?
I think not. Already, at 10gige data rates, for all practical purposes, it is impossible
to debug most problems, especially subtle trouble in multithreading (race conditions)
and in memory allocations. At best, I can sprinkle assert()s and look at core dumps.
So the good old divide-and-conquer approach is still required, MIDAS still rules.
K.O. |
15 Jun 2021, Stefan Ritt, Info, 1000 Mbytes/sec through midas achieved!
|
In MEG II we also kind of achieved this rate. Marco F. will post an entry soon to describe the details. There is only one thing
I want to mention, which is our network switch. Instead of an expensive high-grade switch, we chose a cheap "Chinese" high-grade
switch. We have "rack switches", which are collector switch for each rack receiving up to 10 x 1GBit inputs, and outputting 1 x
10 GBit to an "aggregation switch", which collects all 10 GBit lines form rack switches and forwards it with (currently a single
) 10 GBit line. For the rack switch we use a
MikroTik CRS354-48G-4S+2Q+RM 54 port
and for the aggregation switch
MikroTik CRS326-24S-2Q+RM 26 Port
both cost in the order of 500 US$. We were astonished that they don't loose UDP packets when all inputs send a packet at the
same time, and they have to pipe them to the single output one after the other, but apparently the switch have enough buffers
(which is usually NOT written in the data sheets).
To avoid UDP packet loss for several events, we do traffic shaping by arming the trigger only when the previous event is
completely received by the frontend. This eliminates all flow control and other complicated methods. Marco can tell you the
details.
Another interesting aspect: While we get the data into the frontend, we have problems in getting it through midas. Your
bm_send_event_sg() is maybe a good approach which we should try. To benchmark the out-of-the-box midas, I run the dummy frontend
attached on my MacBook Pro 2.4 GHz, 4 cores, 16 GB RAM, 1 TB SSD disk. I got
Event size: 7 MB
No logging: 900 events/s = 6.7 GBytes/s
Logging with LZ4 compression: 155 events/s = 1.2 GBytes/s
Logging without compression: 170 events/s = 1.3 GBytes/s
So with this simple approach I got already more than 1 GByte of "dummy data" through midas, indicating that the buffer
management is not so bad. I did use the plain mfe.c frontend framework, no bm_send_event_sg() (but mfe.c uses rpc_send_event() which is an
optimized version of bm_send_event()).
Best,
Stefan |
16 Jun 2021, Marco Francesconi, Info, 1000 Mbytes/sec through midas achieved!
|
As reported by Stefan, in MEG II we have very similar ethernet throughputs.
In total, we have 34 crates each with 32 DRS4 digitiser chips and a single 1 Gbps readout link through a Xilinx Zynq SoC.
The data arrives in push mode without any external intervention, the only throttling being an optional prescaling on the trigger rate.
We discovered the hard way that 1 Gbps throughput on Zynq is not trivial at all: the embedded ethernet MAC does not support jumbo frames (always read the fine prints in the manuals!) and the embedded Linux ethernet stack seems to struggle when we go beyond 250 Mbps of UDP traffic.
Anyhow, even with the reduced speed, the maximum throughput at network input is around 8.5 Gbps which passes through the Mikrotik switches mentioned by Stefan.
We had very bad experiences in the past with similar price-point switches, observing huge packet drops when the instantaneous switching capacity cannot cope with the traffic, but so far we are happy with the Mikrotik ones.
On the receiver side, we have the DAQ server with an Intel E5-2630 v4 CPU and a 10 Gbit connection to the network using an Intel X710 Network card.
In the past, we used also a "cheap" 10 Gbit card from Tehuti but the driver performance was so bad that it could not digest more than 5 Gbps of data.
The current frontend is based on the mfe.c scheme for historical reasons (the very first version dates back to 2015).
We opted for a monolithic multithread solution so we can reuse the underlying DAQ code for other experiments which may not have the complete Midas backend.
Just to mention them: one is the FOOT experiment (which afaik uses an adapted version of Altas DAQ) and the other is the LOLX experiment (for which we are going to ship to Canada soon a small 32 channel system using Midas).
A major modification to Konstantin scheme is that we need to calibrate all WFMs online so that a software zero suppression can be applied to reduce the final data size (that part is still to be implemented).
This requirement results in additional resource usage to parse the UDP content into floats and calibrate them.
Currently, we have 7 packet collector threads to digest the full packet flow (using recvmmsg), followed by an event building stage that uses 4 threads and 3 other threads for WFM calibration.
We have progressive packet numbers on each packet generated by the hardware and a set of flags marking the start and end of the event; combining the packet number difference between the start and end of the event and the total received packets for that event it is really easy to understand if packet drops are happening.
All the thread infrastructure was tested and we could digest the complete throughput, we still have to finalise the full 10 Gbit connection to Midas because the final system has been installed only recently (April).
We are using EQ_USER flag to push events into mfe.c buffers with up to 4 threads, but I was observing that above ~1.5 Gbps the rb_get_wp() returns almost always DB_TIMEOUT and I'm forced to drop the event.
This conflicts with the measurements reported by Stefan (we were discussing this yesterday), so we are still investigating the possible cause.
It is difficult to report three years of development in a single Elog, I hope I put all the relevant point here.
It looks to me that we opted for very complementary approaches for high throughput ethernet with Midas, and I think there are still a lot of details that could be worth reporting.
In case someone organises some kind of "virtual workshop" on this, I'm willing to participate.
Best,
Marco
> In MEG II we also kind of achieved this rate. Marco F. will post an entry soon to describe the details. There is only one thing
> I want to mention, which is our network switch. Instead of an expensive high-grade switch, we chose a cheap "Chinese" high-grade
> switch. We have "rack switches", which are collector switch for each rack receiving up to 10 x 1GBit inputs, and outputting 1 x
> 10 GBit to an "aggregation switch", which collects all 10 GBit lines form rack switches and forwards it with (currently a single
> ) 10 GBit line. For the rack switch we use a
>
> MikroTik CRS354-48G-4S+2Q+RM 54 port
>
> and for the aggregation switch
>
> MikroTik CRS326-24S-2Q+RM 26 Port
>
> both cost in the order of 500 US$. We were astonished that they don't loose UDP packets when all inputs send a packet at the
> same time, and they have to pipe them to the single output one after the other, but apparently the switch have enough buffers
> (which is usually NOT written in the data sheets).
>
> To avoid UDP packet loss for several events, we do traffic shaping by arming the trigger only when the previous event is
> completely received by the frontend. This eliminates all flow control and other complicated methods. Marco can tell you the
> details.
>
> Another interesting aspect: While we get the data into the frontend, we have problems in getting it through midas. Your
> bm_send_event_sg() is maybe a good approach which we should try. To benchmark the out-of-the-box midas, I run the dummy frontend
> attached on my MacBook Pro 2.4 GHz, 4 cores, 16 GB RAM, 1 TB SSD disk. I got
>
> Event size: 7 MB
>
> No logging: 900 events/s = 6.7 GBytes/s
>
> Logging with LZ4 compression: 155 events/s = 1.2 GBytes/s
>
> Logging without compression: 170 events/s = 1.3 GBytes/s
>
> So with this simple approach I got already more than 1 GByte of "dummy data" through midas, indicating that the buffer
> management is not so bad. I did use the plain mfe.c frontend framework, no bm_send_event_sg() (but mfe.c uses rpc_send_event() which is an
> optimized version of bm_send_event()).
>
> Best,
> Stefan |
18 Jun 2021, Konstantin Olchanski, Info, 1000 Mbytes/sec through midas achieved!
|
> ... MEG II ... 34 crates each with 32 DRS4 digitiser chips and a single 1 Gbps readout link through a Xilinx Zynq SoC.
>
> Zynq ... embedded ethernet MAC does not support jumbo frames (always read the fine prints in the manuals!)
> and the embedded Linux ethernet stack seems to struggle when we go beyond 250 Mbps of UDP traffic.
that's an ouch. we use the altera ethernet mac, and jumbo frames are supported, but the firmware data path
was originally written assuming 1500-byte packets and it is too much work to rewrite it for jumbo frames.
we send the data directly from the FPGA fabric to the ethernet, there is an avalon/axi bus multiplexer
to split the ethernet packets to the NIOS slow control CPU. not sure if such scheme is possible
for SoC FPGAs with embedded ARM CPUs.
and yes, a 1 GHz ARM CPU will not do 10gige. You see it yourself, measure your memcpy() speed. Where
typical PC will have dual-channel 128-bit wide memory (and the famous for it's low latency
Intel memory controller), ARM SoC will have at best 64-bit wide memory (some boards are only 32-bit wide!),
with DDR3 (not DDR4) severely under-clocked (i.e. DDR3-900, etc). This is why the new Apple ARM chips
are so interesting - can Apple ARM memory controller beat the Intel x86 memory controller?
> On the receiver side, we have the DAQ server with an Intel E5-2630 v4 CPU
that's the right gear for the job. quad-channel memory with nominal "Max Memory Bandwidth 68.3 GB/s",
10 CPU cores. My benchmark of memcpy() for the much older duad-channel memory i7-4820 with DDR3-1600 DIMMs
is 20 Gbytes/sec. waiting for ARM CPU with similar specs.
> and a 10 Gbit connection to the network using an Intel X710 Network card.
> In the past, we used also a "cheap" 10 Gbit card from Tehuti but the driver performance was so bad that it could not digest more than 5 Gbps of data.
yup, same here. use Intel ethernet exclusively, even for 1gige links.
> A major modification to Konstantin scheme is that we need to calibrate all WFMs online so that a software zero suppression
I implemented hardware zero suppression in the FPGA code. I think 1 GHz ARM CPU does not have the oomph for this.
> rb_get_wp() returns almost always DB_TIMEOUT
replace rb_xxx() with std::deque<std::vector<char>> (protected by a mutex, of course). lots of stuff in the mfe.c frontend
is obsolete in the same way. check out the newer tmfe frontends (tmfe.md, tmfe.h and tmfe examples).
> It is difficult to report three years of development in a single Elog
but quite successful at it. big thanks for your write-up. I think our info is quite useful for the next people.
K.O. |
18 Jun 2021, Konstantin Olchanski, Info, 1000 Mbytes/sec through midas achieved!
|
> In MEG II we also kind of achieved this rate.
>
> Instead of an expensive high-grade switch, we chose a cheap "Chinese" high-grade switch.
Right. We built this DAQ system about 3 years ago and the cheep Chineese switches arrived
on the market about 1 year after we purchased the big 96 port juniper switch. Bad timing/good timing.
Actually I have a very nice 24-port 1gige switch ($2000 about 3 years ago), I could have
used 4 of them in parallel, but they were discontinued and replaced with a $5000 switch
(+$3000 for a 10gige uplink. I think I got the last very last one cheap switch).
But not all Chineese switches are equal. We have an Ubiquity 10gige switch, and it does
not have working end-to-end ethernet flow control. (yikes!).
BTW, for this project we could not use just any cheap switch, we must have 64 fiber SFP ports
for connecting on-TPC electronics. This narrows the market significantly and it does
not match the industry standard port counts 8-16-24-48-96.
> MikroTik CRS354-48G-4S+2Q+RM 54 port
> MikroTik CRS326-24S-2Q+RM 26 Port
We have a hard time buying this stuff in Vancouver BC, Canada. Most of our regular suppliers
are US based and there is a technology trade war still going on between the US and China.
I guess we could buy direct on alibaba, but for the risk of scammers, scalpers and iffy shipping.
> both cost in the order of 500 US$
tell one how much we overpay for US based stuff. not surprising, with how Cisco & co can afford
to buy sports arenas, etc.
> We were astonished that they don't loose UDP packets when all inputs send a packet at the
> same time, and they have to pipe them to the single output one after the other,
> but apparently the switch have enough buffers.
You probably see ethernet flow control in action. Look at the counters for ethernet pause frames
in your daq boards and in your main computer.
> (which is usually NOT written in the data sheets).
True, when I looked into this, I found a paper by somebody in Berkley for special
technique to measure the size of such buffers.
(The big Juniper switch has only 8 Mbytes of buffer. The current wisdom for backbone networks
is to have as little buffering as possible).
> To avoid UDP packet loss for several events, we do traffic shaping by arming the trigger only when the previous event is
> completely received by the frontend. This eliminates all flow control and other complicated methods. Marco can tell you the
> details.
We do not do this. (very bad!). When each trigger arrives, all 64+8 DAQ boards send a train of UDP packets
at maximum line speed (64+8 at 1 gige) all funneled into one 10 gige ((64+8)/10 oversubscription).
Before we got ethernet flow control to work properly, we had to throttle all the 1gige links by about 60%
to get any complete events at all. This would not have been acceptable for physics data taking.
> Another interesting aspect: While we get the data into the frontend, we have problems in getting it through midas. Your
> bm_send_event_sg() is maybe a good approach which we should try. To benchmark the out-of-the-box midas, I run the dummy frontend
> attached on my MacBook Pro 2.4 GHz, 4 cores, 16 GB RAM, 1 TB SSD disk.
Dummy frontend is not very representative, because limitation is the memory bandwidth
and CPU load, and a real ethernet receiver has quite a bit of both (interrupt processing,
DMA into memory, implicit memcpy() inside the socket read()).
For example, typical memcpy() speeds are between 22 and 10 Gbytes/sec for current
generation CPUs and DRAM. This translates for a total budget of 22 and 10 memcpy()
at 10gige speeds. Subtract from this 1 memcpy() to DMA data from ethernet into memory
and 1 memcpy() to DMA data from memory to storage. Subtract from this 2 implicit
memcpy() for read() in the frontend and write() in mlogger. (the Linux sendfile() syscall
was invented to cut them out). Subtract from this 1 memcpy() for instruction and incidental
data fetch (no interesting program fits into cache). Subtract from this memory bandwidth
for running the rest of linux (systemd, ssh, cron jobs, NFS, etc). Hardly anything
left when all is said and done. (Found it, the alphagdaq memcpy() runs at 14 Gbytes/sec,
so total budget of 14 memcpy() at 10gige speeds).
And the event builder eats up 2 CPU cores to process the UDP packets at 10gige rate,
there is quite a bit of CPU-expensive data unpacking, inspection and processing
going on that cannot be cut out. (alphagdaq has 4 cores, "8 threads").
K.O.
P.S. Waiting for rack-mounted machines with AMD "X" series processors... K.O. |
28 May 2021, Joseph McKenna, Suggestion, Have a list of 'users responsible' in Alarms and Programs odb entries
|
There have been times in ALPHA that an alarm is triggered and the shift crew
are unclear who to contact if they aren't trained to fix the specific
failure mode.
I wish to add the property 'Users responsible' to the ODB for Alarms and
Programs.
I have drafted what this might look like in a new pull request:
https://bitbucket.org/tmidas/midas/pull-requests/22/add-users-responsible-
field-for-specific
It requires changing of several data structures, I think I have found all
instances of the definitions so the ODB should 'repair' any of the old
structures adding in users responsible.
If 'Users responsible' is set, MIDAS messages append them after the message
in brackets '()'. If used in conjunction with the MIDAS messenger
(mmessenger), the users responsible can be 'tagged' directly.
I.e, for slack, simply set the 'users responsible' to <@UserID|Nickname>,
for mattermost '@username', for discord '<@userid>'. Note that discord
doesn't allow you to tag by username, but numeric userid
I have expanded char array in 'al_trigger_class' to handle the potentially
longer MIDAS messages. Perhaps since I'm touching these lines I should
change these temporary containers to std::string (line 383 and 386 of
alarm.cxx)?
I have tested this quite a bit for my system, I am not sure how I can test
mjsonrpc. |
28 May 2021, Stefan Ritt, Suggestion, Have a list of 'users responsible' in Alarms and Programs odb entries
|
I think this is a good idea and I support it. We have a similar problem in MEG and
we solved that with external (bash) scripts called in case of alarms. One feature
there we have is that for some alarms, several people want to get notified. So
people can "subscribe" to certain alarms. The subscription are now handled inside
Slack which I like better, but maybe it would be good to have more than one "user
responsible". Like if one person is sleeping/traveling, it's good to have a
substitute. Can you make an array out of that? Or a comma-separated list?
Best,
Stefan |
28 May 2021, Joseph McKenna, Suggestion, Have a list of 'users responsible' in Alarms and Programs odb entries
|
> I think this is a good idea and I support it. We have a similar problem in MEG and
> we solved that with external (bash) scripts called in case of alarms. One feature
> there we have is that for some alarms, several people want to get notified. So
> people can "subscribe" to certain alarms. The subscription are now handled inside
> Slack which I like better, but maybe it would be good to have more than one "user
> responsible". Like if one person is sleeping/traveling, it's good to have a
> substitute. Can you make an array out of that? Or a comma-separated list?
>
> Best,
> Stefan
Presently there are 256 characters in the 'users responsible' field, so you can just
list many users (no space, space or comma whatever). Discord, slack and mattermost
don't care, they just parse the user tags.
I can still make this an array and pass a std::vector<std::string> into
al_trigger_class function? |
28 May 2021, Stefan Ritt, Suggestion, Have a list of 'users responsible' in Alarms and Programs odb entries
|
> I can still make this an array and pass a std::vector<std::string> into
> al_trigger_class function?
Maybe 256 chars are enough at the moment. If other people complain in the future, we can
re-visit.
Stefan |
28 May 2021, Joseph McKenna, Suggestion, Have a list of 'users responsible' in Alarms and Programs odb entries
|
> > I can still make this an array and pass a std::vector<std::string> into
> > al_trigger_class function?
>
> Maybe 256 chars are enough at the moment. If other people complain in the future, we can
> re-visit.
>
> Stefan
Thinking about it, an array of maybe 80 character would give enough space for a name, a tag
and phone number. Do I need to budget memory very strictly? Would 32 entries of 80
characters be too much? |
28 May 2021, Stefan Ritt, Suggestion, Have a list of 'users responsible' in Alarms and Programs odb entries
|
> > > I can still make this an array and pass a std::vector<std::string> into
> > > al_trigger_class function?
> >
> > Maybe 256 chars are enough at the moment. If other people complain in the future, we can
> > re-visit.
> >
> > Stefan
>
> Thinking about it, an array of maybe 80 character would give enough space for a name, a tag
> and phone number. Do I need to budget memory very strictly? Would 32 entries of 80
> characters be too much?
On that level memory is cheap.
Stefan |
28 May 2021, Joseph McKenna, Suggestion, Have a list of 'users responsible' in Alarms and Programs odb entries
|
I've updated the branch / pull request to use an array of 10 entries (80 chars each). 32 felt a
little overkill when I saw it on screen, but absolutely happy to set it to any number you
recommend.
The array gets flattened out when an alarm is triggered, currently the formatting produces
AlarmClass : AlarmMessage (Flattened List Of Users Responsible Array With Space Separators)
If experiments want to use Discord / Slack / Mattermost tags and or add phone numbers, that
should fit in 80 characters |
31 May 2021, Joseph McKenna, Suggestion, Have a list of 'users responsible' in Alarms and Programs odb entries
|
This list of responsible being attached to alarm message strings will be great for the
mmessenger, however, perhaps its going to generate very long messages for the speaker programs
(web interface and mlxspeaker ):
AlarmClass : AlarmMessage (ResponsibleUser1 ResponsibleUser2 ResponsibleUser3 ResponsibleUser4
... ResponsibleUser4)
especially if people put in user tags or emergency contact details...
Should we add a key word or character for the programs that create audio to parse that silence
the list of responsible users? I'd be tempted to use a single character but there is a risk
users might have that in a custom alarm message. Maybe something usual like the 'bel'
character? '|'?
Perhaps use the string 'Responsible:' or 'Users:' to trim out the Users Responsible list from
the message string?
AlarmClass : AlarmMessage Responsible:(ResponsibleUser1 ResponsibleUser2 ResponsibleUser3
ResponsibleUser4 ... ResponsibleUser4)
AlarmClass : AlarmMessage Users:(ResponsibleUser1 ResponsibleUser2 ResponsibleUser3
ResponsibleUser4 ... ResponsibleUser4) |
02 Jun 2021, Konstantin Olchanski, Suggestion, Have a list of 'users responsible' in Alarms and Programs odb entries
|
> This list of responsible being attached to alarm message strings ...
This is a great idea. But I think we do not need to artificially limit ourselves
to string and array lengths.
The code in alarm.c should be changes to use std::string and std::vector<std::string> (STRING_LIST
#define), db_get_record() should be replaced with individual ODB reads (that's what it does behind
the scenes, but in a non-type and -size safe way).
I think the web page code will work correctly, it does not care about string lengths.
K.O. |
09 Jun 2021, Joseph McKenna, Suggestion, Have a list of 'users responsible' in Alarms and Programs odb entries
|
> > This list of responsible being attached to alarm message strings ...
>
> This is a great idea. But I think we do not need to artificially limit ourselves
> to string and array lengths.
>
> The code in alarm.c should be changes to use std::string and std::vector<std::string> (STRING_LIST
> #define), db_get_record() should be replaced with individual ODB reads (that's what it does behind
> the scenes, but in a non-type and -size safe way).
>
> I think the web page code will work correctly, it does not care about string lengths.
>
> K.O.
Auto growing lists is an excellent plan. I am making decent progress and should have something to
report soon |
05 Apr 2021, Konstantin Olchanski, Info, blog - convert mfe frontend to tmfe c++ framework
|
notes from converting ALPHA-g chronobox frontend fechrono to tmfe c++ framework.
the chronobox device is a timestamp/low resolution tdc/scaler/generic TTL and ECL io
mainboard with an altera DE10_NANO plugin board. it has a cyclone-5 FPGA SOC running Raspbian linux.
FPGA communication is done by avalon-bus memory mapped registers, main data readout
is PIO from an FPGA 32-bit wide FIFO (no DMA yet).
- login to main computer (daq16)
- cd packages
- git clone https://bitbucket.org/tmidas/midas midas-develop
- cd midas-develop
- make mini ### creates linux-x86_64/{bin,lib}
- ssh agdaq@cb02 ### private network
- cd ~/packages/midas-develop
- make mini ### creates linux-linux-armv7l/{bin/lib}
- cd ~/online/chronobox_software
- cat fechrono.cxx ~/packages/midas-develop/progs/tmfe_example_everything.cxx > fechrono_tmfe.cxx
- edit fechrono_tmfe.cxx:
- rename "FeEverything" to "FeChrono"
- copy contents of frontend_init() to HandleFrontendInit()
- copy contents of frontend_exit() to HandleFrontendExit()
- replace get_frontend_index() with fFe->fFeIndex
- replace "return SUCCESS" with return TMFeOk()
- replace "return !SUCCESS" with return TMFeErrorMessage("boo!!!")
- this frontend has 3 indexed equipments, copy EqEverything 3 times, rename EqEverything to EqCbHist, EqCbms, EqCbFlow
- copy contents of begin_of_run() to EqCbHist::HandleBeginRun()
- copy contents of end_of_run() to EqCbHist::HandleEndRun()
- pause_run(), resume_run() are empty, delete all HandlePauseRun() and all HandleResumeRun()
- frontend_loop() is empty, delete
- poll_event() and interrupt_configure() are empty, delete
- delete all HandleStartAbortRun(), delete all calls to RegisterTransitionStartAbort();
- examine equipment[]:
- "cbhist%02d" - periodic, copy contents of read_cbhist() to EqCbHist::HandlePeriodic()
- "cbms%02d" - polled, copy contents of read_cbms_fifo() to EqCbms::HandlePollRead()
- "cbflow%02d" - periodic, copy contents of read_flow() to EqCbFlow::HandlePeriodic()
- delete unused HandlePoll(), HandlePollRead() and HandlePeriodic()
- replace bk_init32() with "size_t event_size = 100*1024; char* event = (char*)malloc(event_size); ComposeEvent(event,
event_size); BkInit(event, event_size);"
- replace bk_create(pevent) with BkOpen(event)
- replace bk_close(pevent, ...) with BkClose(event, ...)
- replace "return bk_size(pevent)" with "EqSendEvent(event); free(event);"
- remove unused example SendData()
- if there linker complains about references to "hDB", add "HNDLE hDB" is global scope, add "hDB = fMfe->fDB"
- replace set_equipment_status() with EqSetStatus()
- move equipment configuration from the equipment[] array to the equipment constructors
- remove unused HandleRpc()
- remove unused HandleBeginRun() and unused HandleEndRun()
- remove all example code from HandleInit(), breakup frontend_init() code into per-equipment HandleInit() functions
- EqCbms::HandlePoll() replace all example code with "return true"
- if desired, replace ODB functions from utils.cxx with MVOdb RI(), RD(), etc
- if desired, replace cm_msg() with Msg() and delete "const char* frontend_name"
- update FeChrono() constructor:
FeSetName("fechrono%02d");
FeAddEquipment(new EqCbHist("cbhist%02d", __FILE__));
FeAddEquipment(new EqCbms("cbms%02d", __FILE__));
FeAddEquipment(new EqCbFlow("cbflow%02d", __FILE__));
- build:
g++ -std=c++11 -Wall -Wuninitialized -g -Ialtera -Dsoc_cv_av -I/home/agdaq/packages/midas-develop/include -
I/home/agdaq/packages/midas-develop/mvodb -c fechrono_tmfe.cxx
g++ -o fechrono_tmfe.exe -std=c++11 -Wall -Wuninitialized -g -Ialtera -Dsoc_cv_av -I/home/agdaq/packages/midas-develop/include
-I/home/agdaq/packages/midas-develop/mvodb fechrono_tmfe.o utils.o cb.o /home/agdaq/packages/midas-develop/linux-
armv7l/lib/libmidas.a -lm -lz -lutil -lnsl -lpthread -lrt
- run:
- bombs on bm_set_cache_size(), reduce default cache size, old mserver cannot deal with the new default size, set
fEqConfWriteCacheSize = 100*1024;
- run:
- prints too many messages, comment out print "HandlePollRead!"
- run:
- good now!
success, was not too bad.
also:
- replace gHaveRun with fMfe->fStateRunning
- replace gRunNumber with fMfe->fRunNumber
see tmfe.md section "variables provided by the framework"
K.O. |
05 Apr 2021, Konstantin Olchanski, Info, blog - convert mfe frontend to tmfe c++ framework
|
Result is here:
https://bitbucket.org/expalpha/chronobox_software/src/master/fechrono_tmfe.cxx
Original code is in fechrono.cxx. Not super pretty, but representative of most mfe-based frontends
we see around here. A good example of why the old mfe.c structure no longer works so well for us.
After conversion to tmfe, we do not win a beauty contest yet, but the path for further
clean up and refactoring into better c++ is quite clear. (And it is very obvious where
the missing "event object" wants to be here)
K.O. |
15 Jun 2021, Konstantin Olchanski, Info, blog - convert tmfe_rev0 event builder to develop-branch tmfe c++ framework
|
Now we are converting the alpha-g event builder from rev0 tmfe (midas-2020-xx) to the new tmfe c++
framework in midas-develop. Earlier, I followed the steps outlined in this blog
to convert this event builder from mfe.c framework to rev0 tmfe.
- get latest midas-develop
- examine progs/tmfe_example_everything.cxx
- open feevb.cxx
- comment-out existing main() function
- from tmfe_example_everything.cxx, copy class FeEverything and main() to the bottom of feevb.cxx
- comment-out old main()
- make sure we include the correct #include "tmfe.h"
- rename example frontend class FeEverything to FeEvb
- rename feevb's "rpc handler" and "periodic handler" class EvbEq to EqEvb
- update class declaration and constructor of EqEvb from EqEverything in example_everything: EqEvb extends TMFeEquipment,
EqEvb constructor calls constructor of base class (c++ bogosity), keep the bits of the example that initialize the
equipment "common"
- in EqEvb, remove data members fMfe and fEq: fMfe is now inherited from the base class, fEq is now "this"
- in FeEvb constructor, wire-in the EqEvb constructor: FeSetName("feevb") and FeAddEquipment(new EqEvb("EVB",__FILE__))
- migrate function names:
- fEq->SendEvent() with EqSendEvent()
- fEq->SetStatus() with EqSetStatus()
- fEq->ZeroStatistics() with EqZeroStatistics() -- can be removed, taken care of in the framework
- fEq->WriteStatistics() with EqWriteStatistics() -- can be removed, taken care of in the framework
- (my feevb.o now compiles, but will not work, yet, keep going:)
- EqEvb - update prototypes of all HandleFoo() methods per example_everything.cxx or per tmfe.h: otherwise the framework
will not call them. c++ compiler will not warn about this!
- migrate old main():
- restore initialization of "common" and other things done in the old main():
- TMFeCommon was merged into TMFeEquipment, move common->Foo = ... to the EqEvb constructor, consult tmfe.h and tmfe.md
for current variable names.
- consider adding "fEqConfReadConfigFromOdb = false;" (see tmfe.md)
- if EqEvb has a method Init() called from old main(), change it's name to HandleInit() with correct arguments.
- split EqEvb constructor: leave initialization of "common" in the constructor, move all functions, etc into HandleInit()
- move fMfe->SetTransitionSequenceFoo() calls to HandleFrontendInit()
- move fMfe->DeregisterTransition{Pause,Resume}() to HandleFrontendInit()
- old main should be empty now
- remove linking tmfe_rev0.o from feevb Makefile, now it builds!
- try to run it!
- it works!
- done.
K.O. |
02 Jun 2021, Konstantin Olchanski, Info, bitbucket build truncated
|
I truncated the bitbucket build to only build on ubuntu LTS 20.04.
Somehow all the other build targets - centos-7, centos-8, ubuntu-18 - have
an obsolete version of cmake. I do not know where the bitbucket os images
get these obsolete versions of cmake - my centos-7 and centos-8 have much
more recent versions of cmake.
If somebody has time to figure it out, please go at it, I would like very
much to have centos-7 and centos-8 builds restored (with ROOT), also
to have a ubuntu LTS 20.04 build with ROOT. (For me, debugging bitbucket
builds is extremely time consuming).
Right now many midas cmake files require cmake 3.12 (released in late 2018).
I do not know why that particular version of cmake (I took the number
from the tutorials I used).
I do not know what is the actual version of cmake that MIDAS (and ROOTANA)
require/depend on.
I wish there were a tool that would look at a cmake file, examine all the
features it uses and report the lowest version of cmake that supports them.
K.O. |
12 May 2021, Pierre Gorel, Bug Report, History formula not correctly managed
|
OS: OSX 10.14.6 Mojave
MIDAS: Downloaded from repo on April 2021.
I have a slow control frontend doing the command/readout of a MPOD HV/LV. Since I am reading out the current that are in nA (after updating snmp), I wanted to multiply the number by 1e9.
I noticed the new "Formula" field (introduced in 2019 it seems) instead of the "Factor/Offset" I was used to. None of my entries seems to be accepted (after hitting save, when coming back thee field is empty).
Looking in ODB in "/History/Display/MPOD/HV (Current)/", the field "Formula" is a string of size 32 (even if I have multiple plots in that display). I noticed that the fields "Factor" and "Offset" are still existing and they are arrays with the correct size. However, changing the values does not seem to do anything.
Deleting "Formula" by hand and creating a new field as an array of string (of correct length) seems to do the trick: the formula is displayed in the History display config, and correctly used. |
02 Jun 2021, Konstantin Olchanski, Bug Report, History formula not correctly managed
|
> OS: OSX 10.14.6 Mojave
> MIDAS: Downloaded from repo on April 2021.
>
> I have a slow control frontend doing the command/readout of a MPOD HV/LV. Since I am reading out the current that are in nA (after updating snmp), I wanted to multiply the number by 1e9.
>
> I noticed the new "Formula" field (introduced in 2019 it seems) instead of the "Factor/Offset" I was used to. None of my entries seems to be accepted (after hitting save, when coming back thee field is empty).
>
> Looking in ODB in "/History/Display/MPOD/HV (Current)/", the field "Formula" is a string of size 32 (even if I have multiple plots in that display). I noticed that the fields "Factor" and "Offset" are still existing and they are arrays with the correct size. However, changing the values does not seem to do anything.
>
> Deleting "Formula" by hand and creating a new field as an array of string (of correct length) seems to do the trick: the formula is displayed in the History display config, and correctly used.
I see this, too. Problem is that the history plot code must be compatible with both
the old scheme (factor/offset) and the new scheme (formula). But something goes wrong somewhere.
https://bitbucket.org/tmidas/midas/issues/307/history-plot-config-incorrect-in-odb
Why?
- new code cannot to "3 year" plots, old code has no problem with it
- old experiments (alpha1, etc) have only the old-style history plot definitions,
and both old and new plotting code should be able to show them (there is nobody
to convert this old stuff to the "new way", but we still desire to be able to look at it!)
K.O. |
24 May 2021, Mathieu Guigue, Bug Report, Bug "is of type"
|
Hi,
I am running a simple FE executable that is supposed to define a PRAW DWORD bank.
The issue is that, right after the start of the run, the logger crashes without messages.
Then the FE reports this error, which is rather confusing.
```
12:59:29.140 2021/05/24 [feTestDatastruct,ERROR] [odb.cxx:6986:db_set_data1,ERROR] "/Equipment/Trigger/Variables/PRAW" is of type UINT32, not UINT32
``` |
02 Jun 2021, Konstantin Olchanski, Bug Report, Bug "is of type"
|
> Hi,
>
> I am running a simple FE executable that is supposed to define a PRAW DWORD bank.
> The issue is that, right after the start of the run, the logger crashes without messages.
> Then the FE reports this error, which is rather confusing.
> ```
> 12:59:29.140 2021/05/24 [feTestDatastruct,ERROR] [odb.cxx:6986:db_set_data1,ERROR] "/Equipment/Trigger/Variables/PRAW" is of type UINT32, not UINT32
> ```
I think this is fixed in latest midas. There was a typo in this message, the same tid was printed twice,
with result you report "mismatch UINT32 and UINT32", instead of "mismatch of UINT32 vs what is actually there".
This fixes the message, after that you have to manually fix the mismatch in the data type in ODB (delete old one, I guess).
K.O. |
26 May 2021, Marco Chiappini, Info, label ordering in history plot
|
Dear all,
is there any way to order the labels in the history plot legend? In the old
system there was the “order” column in the config panel, but I can not find it
in the new system. Thanks in advance for the support.
Best regards,
Marco Chiappini |
02 Jun 2021, Konstantin Olchanski, Info, label ordering in history plot
|
> is there any way to order the labels in the history plot legend? In the old
> system there was the “order” column in the config panel, but I can not find it
> in the new system. Thanks in advance for the support.
correct, for reasons unknown, the function to reorder and to delete individual
entries was removed from the history panel editor.
K.O. |
02 Jun 2021, Konstantin Olchanski, Info, label ordering in history plot
|
> > is there any way to order the labels in the history plot legend? In the old
> > system there was the “order” column in the config panel, but I can not find it
> > in the new system. Thanks in advance for the support.
>
> correct, for reasons unknown, the function to reorder and to delete individual
> entries was removed from the history panel editor.
>
> K.O.
https://bitbucket.org/tmidas/midas/issues/284/history-panel-editor-reordering-of
K.O. |
27 May 2021, Lukas Gerritzen, Bug Report, Wrong location for mysql.h on our Linux systems
|
Hi,
with the recent fix of the CMakeLists.txt, it seems like another bug surfaced.
In midas/progs/mlogger.cxx:48/49, the mysql header files are included without a
prefix. However, mysql.h and mysqld_error.h are in a subdirectory, so for our
systems, the lines should be
48 #include <mysql/mysql.h>
49 #include <mysql/mysqld_error.h>
This is the case with MariaDB 10.5.5 on OpenSuse Leap 15.2, MariaDB 10.5.5 on
Fedora Workstation 34 and MySQL 5.5.60 on Raspbian 10.
If this problem occurs for other Linux/MySQL versions as well, it should be
fixed in mlogger.cxx and midas/src/history_schema.cxx.
If this problem only occurs on some distributions or MySQL versions, it needs
some more differentiation than #ifdef OS_UNIX.
Also, this somehow seems familiar, wasn't there such a problem in the past? |
27 May 2021, Nick Hastings, Bug Report, Wrong location for mysql.h on our Linux systems
|
Hi,
> with the recent fix of the CMakeLists.txt, it seems like another bug
surfaced.
> In midas/progs/mlogger.cxx:48/49, the mysql header files are included without
a
> prefix. However, mysql.h and mysqld_error.h are in a subdirectory, so for our
> systems, the lines should be
> 48 #include <mysql/mysql.h>
> 49 #include <mysql/mysqld_error.h>
> This is the case with MariaDB 10.5.5 on OpenSuse Leap 15.2, MariaDB 10.5.5 on
> Fedora Workstation 34 and MySQL 5.5.60 on Raspbian 10.
>
> If this problem occurs for other Linux/MySQL versions as well, it should be
> fixed in mlogger.cxx and midas/src/history_schema.cxx.
> If this problem only occurs on some distributions or MySQL versions, it needs
> some more differentiation than #ifdef OS_UNIX.
What does "mariadb_config --cflags" or "mysql_config --cflags" return on
these systems? For mariadb 10.3.27 on Debian 10 it returns both paths:
% mariadb_config --cflags
-I/usr/include/mariadb -I/usr/include/mariadb/mysql
Note also that mysql.h and mysqld_error.h reside in /usr/include/mariadb *not*
/usr/include/mariadb/mysql so using "#include <mysql/mysql.h>" would not work.
On CentOS 7 with mariadb 5.5.68:
% mysql_config --include
-I/usr/include/mysql
% ls -l /usr/include/mysql/mysql*.h
-rw-r--r--. 1 root root 38516 May 6 2020 /usr/include/mysql/mysql.h
-r--r--r--. 1 root root 76949 Oct 2 2020 /usr/include/mysql/mysqld_ername.h
-r--r--r--. 1 root root 28805 Oct 2 2020 /usr/include/mysql/mysqld_error.h
-rw-r--r--. 1 root root 24717 May 6 2020 /usr/include/mysql/mysql_com.h
-rw-r--r--. 1 root root 1167 May 6 2020 /usr/include/mysql/mysql_embed.h
-rw-r--r--. 1 root root 2143 May 6 2020 /usr/include/mysql/mysql_time.h
-r--r--r--. 1 root root 938 Oct 2 2020 /usr/include/mysql/mysql_version.h
So this seems to be the correct setup for both Debian and RHEL. If this is to
be worked around in Midas I would think it would be better to do it at the
cmake level than by putting another #ifdef in the code.
Cheers,
Nick. |
02 Jun 2021, Konstantin Olchanski, Bug Report, Wrong location for mysql.h on our Linux systems
|
> % mariadb_config --cflags
> -I/usr/include/mariadb -I/usr/include/mariadb/mysql
I get similar, both .../include and .../include/mysql are in my include path,
so both #include "mysql/mysql.h" and #include "mysql.h" work.
I added a message to cmake to report the MySQL CFLAGS and libraries, so next time
this is a problem, we can see what happened from the cmake output:
4ed0:midas olchansk$ make cmake | grep MySQL
...
-- MIDAS: Found MySQL version 10.4.16
-- MIDAS: MySQL CFLAGS: -I/opt/local/include/mariadb-10.4/mysql;-I/opt/local/include/mariadb-
10.4/mysql/mysql and libs: -L/opt/local/lib/mariadb-10.4/mysql/ -lmariadb
K.O. |
27 May 2021, Joseph McKenna, Info, MIDAS Messenger - A program to send MIDAS messages to Discord, Slack and or Mattermost
|
I have created a simple program that parses the message buffer in MIDAS and
sends notifications by webhook to Discord, Slack and or Mattermost.
Active pull request can be found here:
https://bitbucket.org/tmidas/midas/pull-requests/21
Its written in python and CMake will install it in bin (if the Python3 binary
is found by cmake). The only dependency outside of the MIDAS python library is
'requests', full documentation are in the mmessenger.md |
28 May 2021, Joseph McKenna, Info, MIDAS Messenger - A program to forward MIDAS messages to Discord, Slack and or Mattermost merged
|
A simple program to forward MIDAS messages to Discord, Slack and or Mattermost
(Python 3 required)
Pull request accepted! Documentation can be found on the wiki
https://midas.triumf.ca/MidasWiki/index.php/Mmessenger |
02 Jun 2021, Konstantin Olchanski, Info, MIDAS Messenger - A program to forward MIDAS messages to Discord, Slack and or Mattermost merged
|
> A simple program to forward MIDAS messages to Discord, Slack and or Mattermost
>
> (Python 3 required)
>
> Pull request accepted! Documentation can be found on the wiki
>
> https://midas.triumf.ca/MidasWiki/index.php/Mmessenger
This sounds like a very useful and welcome addition to MIDAS.
But from documentation provided, I have clue how to activate it.
Perhaps it would help if you could write up the basic steps on how to go about it, i.e.
- go to discord
- push these buttons
- cut and paste this thingy from the web page to ODB
K.O. |
19 May 2021, Francesco Renga, Suggestion, MYSQL logger
|
Dear all,
I'm trying to use the logging on a mysql DB. Following the instructions on
the Wiki, I recompiled MIDAS after installing mysql, and cmake with NEED_MYSQL=1
can find it:
-- MIDAS: Found MySQL version 8.0.23
Then, I compiled my frontend (cmake with no options + make) and run it, but in the
ODB I cannot find the tree for mySQL. I have only:
Logger/Runlog/ASCII
while I would expect also:
Logger/Runlog/SQL
What could be missing? Maybe should I add something in the CMakeList file or run
cmake with some option?
Thank you,
Francesco |
21 May 2021, Francesco Renga, Suggestion, MYSQL logger
|
I solved this, it was a failed "make clean" before recompiling. Now it works.
Sorry for the noise.
Francesco
> Dear all,
> I'm trying to use the logging on a mysql DB. Following the instructions on
> the Wiki, I recompiled MIDAS after installing mysql, and cmake with NEED_MYSQL=1
> can find it:
>
> -- MIDAS: Found MySQL version 8.0.23
>
> Then, I compiled my frontend (cmake with no options + make) and run it, but in the
> ODB I cannot find the tree for mySQL. I have only:
>
> Logger/Runlog/ASCII
>
> while I would expect also:
>
> Logger/Runlog/SQL
>
> What could be missing? Maybe should I add something in the CMakeList file or run
> cmake with some option?
>
> Thank you,
> Francesco |
19 May 2021, Konstantin Olchanski, Info, update of event buffer code
|
a big update to the event buffer code was merged today.
two important bug fixes:
- a logic error in bm_receive_event() (actually bm_fill_read_cache_locked())
caused use of uninitialized variable to increment the read pointer and crash
with error "read pointed points to an invalid event")
- missing bm_unlock() in bm_flush_cache() caused double-locking of event buffer
caused a hang and a subsequent crash via the watchdog timeout.
several improvements:
- bm_receive_event_vec(std::vector<char>) with automatic memory allocation, one
does not need to worry about providing a large event buffer to receive event
data. For local connections MAX_EVENT_SIZE is no longer used, for remote
connections, a buffer of MAX_EVENT_SIZE is allocated automatically, this is a
limitation of the MIDAS RPC layer (it does not know how to allocate memory to
receive arbitrary large data)
(MAX_EVENT_SIZE is now only used in bm_receive_event_rpc()).
- rpc_send_event_sg() - thread safe method to send events to the mserver. it
takes an array of scatter-gather buffers, so a midas event does not have to be
in one continious buffer.
- bm_send_event_sg() - same for local connections.
- on top of bm_send_event_sg() we now have bm_send_event_vec(std::vector<char>)
and bm_send_event_vec(std::vector<vector<char>>). now we can move forward with
implementing a new "event object" (the TMEvent event object from midasio.h
already works with these new methods).
- remote connected bm_send_event() & co now always send events to the mserver
using the event socket. (before, bm_send_event() used RPC_BM_SEND_EVENT and
suffered from the RPC layer encoding/decoding overhead. mfe.c used
rpc_send_event() for remote connections)
- bm_send_event(), bm_receive_event() & co now take a timeout value (in
milliseconds) instead of an async_flag. The old async_flag values BM_WAIT and
BM_NO_WAIT continue working as expected (wait forever and do not wait at all,
respectively).
- following improvements are only for remote connections:
- in the case of event buffer congestion (event readers are slow, event buffers
are close to 100% full), the bm_flush_cache() RPC will no longer timeout due to
mserver being stuck waiting for free buffer space. (RPC is called with a 1000
msec timeout, infinite loop waiting for flush is done on the frontend side, the
RPC timeout will never fire)
- in the case of event buffer congestion, ODB RPC will no longer timeout.
(previously mserver was stuck waiting for free buffer space and did not process
any RPCs).
- at the end of run, last few events could be stuck in the event socket. now,
frontends can flush it using bm_flush_cache(0,BM_WAIT) (use zero for the buffer
handle). correct run transition should stop the trigger, stop generating new
events, call bm_flush_cache(0,BM_WAIT), call bm_flush_cache("SYSTEM",BM_WAIT)
and return success. (TMFE frontend already does this). Note that
bm_flush_cache(BM_WAIT) can be stuck for very long time waiting for the event
buffers to empty-out, so run transition RPC timeout is still possible.
K.O. |
07 May 2021, Zaher Salman, Bug Report, modbselect trigget hotlink
|
It seems that a modbselect triggers a "change" in an ODB which has a hot link. This happens onload (or whenever the custom page is reloaded) and otherwise it behaves as expected, i.e. no change unless the modbselect is actually changed. Is this the intended behaviour? can this be modified? |
10 May 2021, Stefan Ritt, Bug Report, modbselect trigget hotlink
|
Thanks for reporting that bug, I fixed it in the last commit.
Stefan |
06 May 2021, Ben Smith, Info, New feature in odbxx that works like db_check_record()
|
For those unfamiliar, odbxx is the interface that looks like a C++ map, but automatically syncs with the ODB - https://midas.triumf.ca/MidasWiki/index.php/Odbxx.
I've added a new feature that is similar to the existing odb::connect() function, but works like the old db_check_record(). The new odb::connect_and_fix_structure() function:
- keeps the existing value of any keys that are in both the ODB and your code
- creates any keys that are in your code but not yet in the ODB
- deletes any keys that are in the ODB but not your code
- updates the order of keys in the ODB to match your code
This will hopefully make it easier to automate ODB structure changes when you add/remove keys from a frontend.
The new feature is currently in the develop branch, and should be included in the next release. |
16 Feb 2021, Ruslan Podviianiuk, Forum, m is not defined error
|
Hello,
I see this mhttpd error starting MSL-script:
Uncaught (in promise) ReferenceError: m is not defined
at mhttpd_message (VM2848 mhttpd.js:2304)
at VM2848 mhttpd.js:2122
As I can see it does not affect work of MSL script but shows ReferenceError in
Midas sequencer (see picture).
Could please point me how to fix this error?
Thanks.
Ruslan |
25 Feb 2021, Konstantin Olchanski, Forum, m is not defined error
|
> I see this mhttpd error starting MSL-script:
> Uncaught (in promise) ReferenceError: m is not defined
> at mhttpd_message (VM2848 mhttpd.js:2304)
> at VM2848 mhttpd.js:2122
your line numbers do not line up with my copy of mhttpd.js. what version of midas
do you run?
please give me the output of odbedit "ver" command (GIT revision, looks like this:
IT revision: Wed Feb 3 11:47:02 2021 -0800 - midas-2020-08-a-84-g78d18b1c on
branch feature/midas-2020-12).
same info is in the midas "help" page (GIT revision).
to decipher the git revision string:
midas-2020-08-a-84-g78d18b1c means:
is commit 78d18b1c
which is 84 commits after git tag midas-2020-08-a
"on branch feature/midas-2020-12" confirms that I have the midas-2020-12 pre-
release version without having to do all the decoding above.
if you also have "-dirty" it means you changed something in the source code
and warranty is voided. (just joking! we can debug even modified midas source
code)
K.O. |
05 May 2021, Zaher Salman, Forum, m is not defined error
|
We had the same issue here, which comes from mhttpd.js line 2395 on the current git version. This seems to happen mostly when there is an alarm triggered or when there is an error message.
Anyway, the easiest solution for us was to define m at the beginning of mhttpd_message function
let m;
and replace line 2395 with
if (m !== undefined) {
> > I see this mhttpd error starting MSL-script:
> > Uncaught (in promise) ReferenceError: m is not defined
> > at mhttpd_message (VM2848 mhttpd.js:2304)
> > at VM2848 mhttpd.js:2122
>
> your line numbers do not line up with my copy of mhttpd.js. what version of midas
> do you run?
>
> please give me the output of odbedit "ver" command (GIT revision, looks like this:
> IT revision: Wed Feb 3 11:47:02 2021 -0800 - midas-2020-08-a-84-g78d18b1c on
> branch feature/midas-2020-12).
>
> same info is in the midas "help" page (GIT revision).
>
> to decipher the git revision string:
>
> midas-2020-08-a-84-g78d18b1c means:
> is commit 78d18b1c
> which is 84 commits after git tag midas-2020-08-a
>
> "on branch feature/midas-2020-12" confirms that I have the midas-2020-12 pre-
> release version without having to do all the decoding above.
>
> if you also have "-dirty" it means you changed something in the source code
> and warranty is voided. (just joking! we can debug even modified midas source
> code)
>
> K.O. |
06 May 2021, Stefan Ritt, Forum, m is not defined error
|
Thanks for reporting and pointing to the right location.
I fixed and committed it.
Best,
Stefan |
09 Apr 2021, Lars Martin, Suggestion, Time zone selection for web page
|
The new history as well as the clock in the web page header show the local time
of the user's computer running the browser.
Would it be possible to make it either always use the time zone of the Midas
server, or make it selectable from the config page?
It's not ideal trying to relate error messages from the midas.log to history
plots if the time stamps don't match. |
14 Apr 2021, Stefan Ritt, Suggestion, Time zone selection for web page
|
> The new history as well as the clock in the web page header show the local time
> of the user's computer running the browser.
> Would it be possible to make it either always use the time zone of the Midas
> server, or make it selectable from the config page?
> It's not ideal trying to relate error messages from the midas.log to history
> plots if the time stamps don't match.
I implemented a new row in the config page to select the time zone.
"Local": Time zone where the browser runs
"Server": Time zone where the midas server runs (you have to update mhttpd for that)
"UTC+X": Any other time zone
The setting affects both the status header and the history display.
I spent quite some time with "named" time zones like "PST" "EST" "CEST", but the
support for that is not that great in JavaScript, so I decided to go with simple
UTC+X. Hope that's ok.
Please give it a try and let me know if it's working for you.
Best,
Stefan |
29 Apr 2021, Pierre-Andre Amaudruz, Suggestion, Time zone selection for web page
|
> > The new history as well as the clock in the web page header show the local time
> > of the user's computer running the browser.
> > Would it be possible to make it either always use the time zone of the Midas
> > server, or make it selectable from the config page?
> > It's not ideal trying to relate error messages from the midas.log to history
> > plots if the time stamps don't match.
>
> I implemented a new row in the config page to select the time zone.
>
> "Local": Time zone where the browser runs
> "Server": Time zone where the midas server runs (you have to update mhttpd for that)
> "UTC+X": Any other time zone
>
> The setting affects both the status header and the history display.
>
> I spent quite some time with "named" time zones like "PST" "EST" "CEST", but the
> support for that is not that great in JavaScript, so I decided to go with simple
> UTC+X. Hope that's ok.
>
> Please give it a try and let me know if it's working for you.
>
> Best,
> Stefan
Hi Stefan,
This is great, the UTC+x is perfect, thank you.
PAA |
10 Mar 2021, Zaher Salman, Suggestion, embed modbvalue in SVG
|
Is it possible to embed modbvalue in an SVG for use within a custom page?
thanks. |
10 Mar 2021, Stefan Ritt, Suggestion, embed modbvalue in SVG
|
You can't really embed it, but you can overlay it. You tag the SVG with a
"relative" position and then move the modbvalue with an "absolute" position over
it:
<svg style="position:relative" width="400" height="100">
<rect width="300" height="100" style="fill:rgb(255,0,0);stroke-width:3;stroke:rgb(0,0,0)" />
<div class="modbvalue" style="position:absolute;top:50px;left:50px" data-odb-path="/Runinfo/Run number"></div>
</svg> |
26 Apr 2021, Zaher Salman, Suggestion, embed modbvalue in SVG
|
I found a way to embed modbvalue into a SVG:
<text x="100" y="100" font-size="30rem">
Run=<tspan class="modbvalue" data-odb-path="/Runinfo/Run number"></tspan>
</text>
This seems to behave better that the suggestion below.
> You can't really embed it, but you can overlay it. You tag the SVG with a
> "relative" position and then move the modbvalue with an "absolute" position over
> it:
>
> <svg style="position:relative" width="400" height="100">
> <rect width="300" height="100" style="fill:rgb(255,0,0);stroke-width:3;stroke:rgb(0,0,0)" />
> <div class="modbvalue" style="position:absolute;top:50px;left:50px" data-odb-path="/Runinfo/Run number"></div>
> </svg> |
25 Mar 2021, Lars Martin, Bug Report, Minor bug: Change all time axes together doesn't work with +- buttons
|
Version: release/midas-2020-12
In the new history display, the checkbox "Change all time axes together" works
well with the mouse-based zoom, but does not apply to the +- buttons. |
14 Apr 2021, Stefan Ritt, Bug Report, Minor bug: Change all time axes together doesn't work with +- buttons
|
> Version: release/midas-2020-12
>
> In the new history display, the checkbox "Change all time axes together" works
> well with the mouse-based zoom, but does not apply to the +- buttons.
Fixed in current commit.
Stefan |
23 Mar 2021, Lars Martin, Bug Report, Time shift in history CSV export
|
Version: release/midas-2020-12
I'm exporting the history data shown in elog:2132/1 to CSV, but when I look at the
CSV data, the step no longer occurs at the same time in both data sets (elog:2132/2) |
23 Mar 2021, Lars Martin, Bug Report, Time shift in history CSV export
|
History is from two separate equipments/frontends, but both have "Log history" set to 1. |
23 Mar 2021, Lars Martin, Bug Report, Time shift in history CSV export
|
Tried with export of two different time ranges, and the shift appears to remain the same,
about 4040 rows. |
24 Mar 2021, Stefan Ritt, Bug Report, Time shift in history CSV export
|
I confirm there is a problem. If variables are from the same equipment, they have the same
time stamps, like
t1 v1(t1) v2(t1)
t2 v1(t2) v2(t2)
t3 v1(t3) v2(t3)
when they are from different equipments, they have however different time stamps
t1 v1(t1)
t2 v2(t2)
t3 v1(t3)
t4 v2(t4)
The bug in the current code is that all variables use the time stamps of the first variable,
which is wrong in the case of different equipments, like
t1 v1(t1) v2(*t2*)
t3 v1(t3) v2(*t4*)
So I can change the code, but I'm not sure what would be the bast way. The easiest would be to
export one array per variable, like
t1 v1(t1)
t2 v1(t2)
...
t3 v2(t3)
t4 v2(t4)
...
Putting that into a single array would leave gaps, like
t1 v1(t1) [gap]
t2 [gap] v2(t2)
t3 v1(t3) [gap]
t4 [ga]] v2(t4)
plus this is programmatically more complicated, since I have to merge two arrays. So which
export format would you prefer?
Stefan |
24 Mar 2021, Lars Martin, Bug Report, Time shift in history CSV export
|
I think from my perspective the separate files are fine. I personally don't really like the format
with the gaps, so don't see an advantage in putting in the extra work.
I'm surprised the shift is this big, though, it was more than a whole hour in my case, is it the
time difference between when the frontends were started? |
14 Apr 2021, Stefan Ritt, Bug Report, Time shift in history CSV export
|
I finally found some time to fix this issue in the latest commit. Please update and check if it's
working for you.
Stefan |
03 Apr 2020, Stefan Ritt, Info, Change of TID_xxx data types
|
We have to request of a 64-bit integer data type to be included in MIDAS banks.
Since 64-bit integers are on some systems "long" and on other systems "long long",
I decided to create the two new data types
TID_INT64
TID_UINT64
which follows more the standard C++ tradition:
https://en.cppreference.com/w/cpp/types/integer
To be consistent, I renamed the old types:
TID_BYTE -> TID_UINT8
TID_SBYTE -> TID_INT8
TID_WORD -> TID_UINT16
TID_SHORT -> TID_INT16
TID_DWORD -> TID_UINT32
TID_INT -> TID_INT32
I left the old definitions in midas.h, so old code will still compile fine and be binary
compatible. But if you write new code, the new types are recommended.
If you save the ODB in ASCII format, the new types are used as stings as well, like
[/Experiment]
ODB timeout = INT32 : 10000
but the old types are still understood when you load an old ODB file.
I hope I didn't break anything, please report if you have any issue.
Stefan |
30 Mar 2021, Konstantin Olchanski, Info, INT64/UINT64/QWORD not permitted in ODB and history... Change of TID_xxx data types
|
> We have to request of a 64-bit integer data type to be included in MIDAS banks.
> Since 64-bit integers are on some systems "long" and on other systems "long long",
> I decided to create the two new data types
>
> TID_INT64
> TID_UINT64
>
These 64-bit data types do not work with ODB and they do not work with the MIDAS history.
As of commits on 30 March 2021, mlogger will refuse to write them to the history and
db_create_key() will refuse to create them in ODB.
Why these limitations:
a1) all reading of history is done using the "double" data type, IEEE-754 double precision
floating point numbers have around 53 bits of precision and are too small to represent all
possible values of 64-bit integers.
a2) SQL, SQLite and FILE history know nothing about reading and writing 64-bit integer data
types (this should be easy to fix, as long as MySQL/MariaDB and PostgresQL support it)
b1) in ODB, odbedit and mhttd web pages do not display INT64/UINT64/QWORD data
b2) ODB save and restore from odb, xml and json formats most likely does not work for these
data types
Fixing all this is possible, with a medium amount of work. As long as somebody needs it.
Display of INT64/UINT64/QWORD on history plots will probably forever be truncated to
"double" precision.
K.O. |
14 Apr 2021, Stefan Ritt, Info, INT64/UINT64/QWORD not permitted in ODB and history... Change of TID_xxx data types
|
> These 64-bit data types do not work with ODB and they do not work with the MIDAS history.
They were never meant to work with the history. They were primarily implemented to put large 64-
bit data words into midas banks. We did not yet have a request to put these values into the ODB.
Once such a request comes, we can address this.
Stefan |
04 Apr 2021, Konstantin Olchanski, Info, Change of TID_xxx data types
|
>
> To be consistent, I renamed the old types:
>
> TID_DWORD -> TID_UINT32
> TID_INT -> TID_INT32
>
this created an incompatibility with old XML save files,
old versions of midas cannot load new XML save files,
variable types have changed i.e. from "INT" to "INT32".
it would have been better if XML save files kept using the old names.
now packages that read midas XML files also need updating.
specifically, in ROOTANA:
- the old TVirtualOdb/XmlOdb.cxx (no longer used, deleted),
- mvodb/mxmlodb.cxx
K.O. |
12 Apr 2021, Isaac Labrie Boulay, Forum, Client gets immediately removed when using a script button.
|
Hi all,
I'm running into a curious problem when I try to run a program using my custom
script button. I have been using a script button to start my DAQ, this button
has always worked. It starts by exporting an absolute path to scripts and then
runs scripts, my frontend, my analyzer, and mlogger relative to this path.
I recently added a line of code to run a new script "logic_controller". If I run
the script_daq from my terminal (./start_daq), mhttpd accepts the client and the
program works as intended. But, if I use the script button, the logic_controller
program is immediately deleted by MIDAS. It can be seen appearing in the status
page clients list and then immediately gets deleted. This is a client that runs
on the local experiment host.
What might be the issue? What is the difference between running the script
through the terminal as opposed to running it through the mhttpd button?
I have added a picture of my simple script and the logic_controller code.
Any help would be greatly appreciated.
Cheers.
Isaac |
12 Apr 2021, Ben Smith, Forum, Client gets immediately removed when using a script button.
|
> if I use the script button, the logic_controller program is immediately deleted by MIDAS.
This is indeed very curious, and I can't reproduce it on my test experiment. Can you redirect stdout and stderr from the logic_controller program into a file, to see how far the program gets? If it gets to the while loop at the end, then it would be useful to add some debug statements to see what condition causes it to exit the loop.
Are there any relevant messages in the midas message log about the program being killed? What's the value of "/Programs/logic_controller/Watchdog timeout"? |
12 Apr 2021, Isaac Labrie Boulay, Forum, Client gets immediately removed when using a script button.
|
> > if I use the script button, the logic_controller program is immediately deleted by MIDAS.
>
> This is indeed very curious, and I can't reproduce it on my test experiment. Can you redirect stdout and stderr from the logic_controller program into a file, to see how far the program gets? If it gets to the while loop at the end, then it would be useful to add some debug statements to see what condition causes it to exit the loop.
I have redirected stdout and stderr into a text file and I have attached it to this entry. From what the stdout says, it seems that the lambda
function gets called 4 times before the program disconnects from the experiment. Somehow the status must become SS_ABORT or RPC_SHUTDOWN.
> Are there any relevant messages in the midas message log about the program being killed? What's the value of "/Programs/logic_controller/Watchdog timeout"?
There are no interesting messages in the midas.log and "/Programs/logic_controller/Watchdog timeout" is 10000 when I run the command from the terminal window.
What happens when you run it on your test experiment?
I'll try some more debugging.
Thanks for helping me out! Cheers.
Isaac |
12 Apr 2021, Ben Smith, Forum, Client gets immediately removed when using a script button.
|
I think it would be useful to find the minimal example that exhibits this behaviour.
What happens if your logic controller code is simply the 17 lines below? What happens if you create another script button that only starts the logic controller, not any of the other programs? etc. Gradually re-add features until you hit the problem (or scream in horror if it breaks with 17 lines of C++ and a 1 line shell script).
#include "midas.h"
#include "stdio.h"
int main() {
cm_connect_experiment("", "", "logic_controller", NULL);
do {
int status = cm_yield(100);
printf("cm_yield returned %d\n", status);
if (status == SS_ABORT || status == RPC_SHUTDOWN)
break;
} while (!ss_kbhit());
cm_disconnect_experiment();
return 0;
} |
13 Apr 2021, Isaac Labrie Boulay, Forum, Client gets immediately removed when using a script button.
|
> I think it would be useful to find the minimal example that exhibits this behaviour.
>
> What happens if your logic controller code is simply the 17 lines below? What happens if you create another script button that only starts the logic controller, not any of the other programs? etc. Gradually re-add features until you hit the problem (or scream in horror if it breaks with 17 lines of C++ and a 1 line shell script).
>
Hi Ben,
I have followed your suggestions and the program still stops immediately. My status as returned from "cm_yield(100)" is always 412 (SS_TIMEOUT) which is fine.
The issue is that, when run with the script button, the do-wile loop stops immediately because the !ss_kbhit() always evaluates to FALSE.
My temporary solution has been to let the loop run forever :)
Let me know what think. Thanks again!
Isaac
>
>
> #include "midas.h"
> #include "stdio.h"
>
> int main() {
> cm_connect_experiment("", "", "logic_controller", NULL);
>
> do {
> int status = cm_yield(100);
> printf("cm_yield returned %d\n", status);
> if (status == SS_ABORT || status == RPC_SHUTDOWN)
> break;
> } while (!ss_kbhit());
>
> cm_disconnect_experiment();
>
> return 0;
> } |
13 Apr 2021, Stefan Ritt, Forum, Client gets immediately removed when using a script button.
|
> I have followed your suggestions and the program still stops immediately. My status as returned from "cm_yield(100)" is always 412 (SS_TIMEOUT) which is fine.
> The issue is that, when run with the script button, the do-wile loop stops immediately because the !ss_kbhit() always evaluates to FALSE.
>
> My temporary solution has been to let the loop run forever :)
Ahh, could be that ss_kbhit() misbehaves if there is no keyboard, meaning that it is started in the background as a script.
We never had the issue before, since all "standard" midas programs like mlogger, mhttpd etc. also use ss_kbhit() and they
can be started in the background via the "-D" flag, but maybe the stdin is then handled differentlhy.
So just remove the ss_kbhit(), but keep the break, so that you can stop your program via the web page, like
#include "midas.h"
#include "stdio.h"
int main() {
cm_connect_experiment("", "", "logic_controller", NULL);
do {
int status = cm_yield(100);
printf("cm_yield returned %d\n", status);
if (status == SS_ABORT || status == RPC_SHUTDOWN)
break;
} while (TRUE);
cm_disconnect_experiment();
return 0;
} |
|