20 Nov 2013, Konstantin Olchanski, Bug Report, Too many bm_flush_cache() in mfe.c
|
I was looking at something in the mserver and noticed that for remote frontends, for every periodic event,
there are about 3 RPC calls to bm_flush_cache().
Sure enough, in mfe.c::send_event(), for every event sent, there are 2 calls to bm_flush_cache() (once for
the buffer we used, a second time for all buffers). Then, for good measure, the mfe idle loop calls
bm_flush_cache() for all buffers about once per second (even if no events were generated).
So what is going on here? To allow good performance when processing many small events,
the MIDAS event buffer code (bm_send_event()) buffers small events internally, and only after this internal
buffer is full, the accumulated events are flushed into the shared memory event buffer,
where they become visible to the mlogger, mdump and other consumers.
Because of this internal buffering, infrequent small size periodic events can become
stuck for quite a long time, confusing the user: "my frontend is sending events, how come I do not
see them in mdump?"
To avoid this, mfe.c manually flushes these internal event buffers by calling bm_flush_cache().
And I think that works just fine for frontends directly connected to the shared memory, one call to
bm_flush_cache() should be sufficient.
But for remote frontends connected through the mserver, it turns out there is a race condition between
sending the event data on one tcp connection and sending the bm_flush_cache() rpc request on another
tcp connection.
I see that the mserver always reads the rpc connection before the event connection, so bm_flush_cache()
is done *before* the event is written into the buffer by bm_send_event(). So the newly
sent event is stuck in the buffer until bm_flush_cache() for the *next* event shows up:
mfe.c: send_event1 -> flush -> ... wait until next event ... -> send_event2 -> flush
mserver: flush -> receive_event1 -> ... wait ... -> flush -> receive_event2 -> ... wait ...
mdump: ... nothing ... -> ... nothing ... -> event1 -> ... nothing ...
Enter the 2nd call to bm_flush_cache in mfe.c (flush all buffers) - now because mserver seems to be
alternating between reading the rpc connection and the event connection, the race condition looks like
this:
mfe.c: send_event -> flush -> flush
mserver: flush -> receive_event -> flush
mdump: ... -> event -> ...
So in this configuration, everything works correctly, the data is not stuck anywhere - but by accident, and
at the price of an extra rpc call.
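The two orderings above can be replayed with a toy model. This is only a sketch: the class and function names are invented for illustration and are not MIDAS code. It assumes, as described above, that a flush moves cached events into the shared buffer and that mdump only sees the shared buffer.

```python
class MserverModel:
    """Toy model of the write cache held on the mserver side (hypothetical)."""

    def __init__(self):
        self.cache = []    # events received but not yet flushed
        self.visible = []  # events in the shared memory buffer (what mdump sees)

    def receive_event(self, event):
        self.cache.append(event)

    def flush(self):
        self.visible.extend(self.cache)
        self.cache.clear()


def replay(operations):
    """Apply operations in the order the mserver processes them and
    return what mdump can see after each step."""
    server = MserverModel()
    snapshots = []
    for op, arg in operations:
        if op == "event":
            server.receive_event(arg)
        else:
            server.flush()
        snapshots.append(list(server.visible))
    return snapshots


# One flush per event, each flush processed before the event it was meant for.
one_flush = [("flush", None), ("event", "event1"),
             ("flush", None), ("event", "event2")]

# Two flushes per event: the second flush is processed after the event.
two_flushes = [("flush", None), ("event", "event1"), ("flush", None)]

for name, ops in (("one flush", one_flush), ("two flushes", two_flushes)):
    print(name, replay(ops))
```

With one flush per event, event1 only appears once the flush belonging to event2 is processed. With two flushes it appears right after the second flush.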
But what about the periodic 1/second bm_flush_cache() on all buffers? I think it does not quite work
either because the race condition is still there: we send an event, and the first flush may race it and only
the 2nd flush gets the job done, so the delay between sending the event and seeing it in mdump would be
around 1-2 seconds. (no more than 2 seconds, I think). Since users expect their events to show up "right
away", a 2 second delay is probably not very good.
Because periodic events are usually not high rate, the current situation (4 network transactions to send 1
event - 1x send event, 3x flush buffer) is probably acceptable. But this definitely limits the maximum
event rate to one event per 3x (2x?) the mserver rpc latency - without the rpc calls to bm_flush_cache() there
would be no such limit - the events themselves are sent through a pipelined tcp connection without
handshaking.
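As a rough illustration of that limit, assuming every flush is a blocking RPC round trip and taking a made-up latency of 1 ms (an assumption, not a measurement):

```python
def max_event_rate(rpc_latency_s, flushes_per_event):
    """Upper bound on events per second if each flush RPC blocks for one round trip."""
    return 1.0 / (rpc_latency_s * flushes_per_event)


ASSUMED_LATENCY_S = 0.001  # assumption: 1 ms round trip, not a measured value

for n in (3, 2, 1):
    rate = max_event_rate(ASSUMED_LATENCY_S, n)
    print(f"{n} flush RPC(s) per event: at most {rate:.0f} events/s")
```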
One solution to this would be to implement periodic bm_flush_cache() in the mserver, making all calls to
bm_flush_cache() in mfe.c unnecessary (unless it's a direct connection to shared memory).
Another solution could be to send events with a special flag telling the mserver to "flush the buffer right
away".
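A minimal sketch of the second idea, using a hypothetical `flush_now` flag on a toy server model (none of these names exist in MIDAS). Because the flag travels with the event on the same connection, its ordering relative to the event no longer depends on which connection the mserver reads first.

```python
class FlagAwareServer:
    """Toy model: the flush request is carried by the event itself (hypothetical)."""

    def __init__(self):
        self.cache = []    # events received but not yet flushed
        self.visible = []  # events consumers such as mdump can see

    def receive_event(self, event, flush_now=False):
        self.cache.append(event)
        if flush_now:
            # Flush everything cached so far, including this event.
            self.visible.extend(self.cache)
            self.cache.clear()


server = FlagAwareServer()
server.receive_event("polled1")                    # stays in the cache
server.receive_event("periodic1", flush_now=True)  # flushes both events
print(server.visible, server.cache)
```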
P.S. Look ma!!! A race condition with no threads!!!
K.O. |
21 Nov 2013, Stefan Ritt, Bug Report, Too many bm_flush_cache() in mfe.c
|
> And I think that works just fine for frontends directly connected to the shared memory, one call to
> bm_flush_cache() should be sufficient.
That's correct. What you want is once per second or so for polled events, and once per periodic event (which anyhow will typically come only every 10 seconds or so). If there are 3 calls
per event, this is certainly too much.
> But for remote frontends connected through the mserver, it turns out there is a race condition between
> sending the event data on one tcp connection and sending the bm_flush_cache() rpc request on another
> tcp connection.
>
> ...
>
> One solution to this would be to implement periodic bm_flush_cache() in the mserver, making all calls to
> bm_flush_cache() in mfe.c unnecessary (unless it's a direct connection to shared memory).
>
> Another solution could be to send events with a special flag telling the mserver to "flush the buffer right
> away".
That's a very good and useful observation. I never really thought about that.
Looking at your proposed solutions, I prefer the second one. mserver is just an interface for RPC calls, it should not do anything "by itself". This was a strategic decision at the beginning.
So sending a flag to punch through the cache on mserver seems to me to have fewer side effects. It will just break binary compatibility :-)
/Stefan |
01 Sep 2009, Jimmy Ngai, Forum, Timeout during run transition
|
Dear All,
I'm using SL5 and MIDAS rev 4528. Occasionally, when I stop a run in odbedit,
a timeout would occur:
[midas.c:9496:rpc_client_call,ERROR] rpc timeout after 121 sec, routine
= "rc_transition", host = "computerB", connection closed
Error: Unknown error 504 from client 'Frontend' on host computerB
This error seems to be random without any reason or pattern. After this error
occurs, I cannot start or stop any run. Sometimes restarting MIDAS gets
the system working again, but sometimes not.
Another transition timeout occurs after I change any ODB value using the web
interface:
[midas.c:8291:rpc_client_connect,ERROR] timeout on receive remote computer
info:
[midas.c:3642:cm_transition,ERROR] cannot connect to client "Frontend" on host
computerB, port 36255, status 503
Error: Cannot connect to client 'Frontend'
This error is reproducible: start run -> change ODB value within webpage ->
stop run -> timeout!
Any idea?
Thanks,
Jimmy |
03 Sep 2009, Stefan Ritt, Forum, Timeout during run transition
|
> Dear All,
>
> I'm using SL5 and MIDAS rev 4528. Occasionally, when I stop a run in odbedit,
> a timeout would occur:
> [midas.c:9496:rpc_client_call,ERROR] rpc timeout after 121 sec, routine
> = "rc_transition", host = "computerB", connection closed
> Error: Unknown error 504 from client 'Frontend' on host computerB
>
> This error seems to be random without any reason or pattern. After this error
> occurs, I cannot start or stop any run. Sometimes restarting MIDAS gets
> the system working again, but sometimes not.
>
> Another transition timeout occurs after I change any ODB value using the web
> interface:
> [midas.c:8291:rpc_client_connect,ERROR] timeout on receive remote computer
> info:
> [midas.c:3642:cm_transition,ERROR] cannot connect to client "Frontend" on host
> computerB, port 36255, status 503
> Error: Cannot connect to client 'Frontend'
>
> This error is reproducible: start run -> change ODB value within webpage ->
> stop run -> timeout!
A few hints for debugging:
- do the run stop via odbedit and the "-v" flag, like
[local:Online:R]/> stop -v
then you see which computer is contacted when.
- Then put some debugging code into your front-end end_of_run() routine at the
beginning and the end of that routine, so you see when it's executed and how long
this takes. If you do lots of things in your EOR routine, this could maybe cause a
timeout.
- Then make sure that cm_yield() in mfe.c is called periodically by putting some
debugging code there. This function checks for any network message, such as the
stop command from odbedit. If your trigger event readout has an endless loop, for
example, cm_yield() will never be called and any transition will time out.
- Make sure that not 100% CPU is used on your frontend. Some OSes have problems
handling incoming network connections if the CPU is completely used or if
input/output operations are too heavy.
- Stefan |
09 Apr 2021, Lars Martin, Suggestion, Time zone selection for web page
|
The new history as well as the clock in the web page header show the local time
of the user's computer running the browser.
Would it be possible to make it either always use the time zone of the Midas
server, or make it selectable from the config page?
It's not ideal trying to relate error messages from the midas.log to history
plots if the time stamps don't match. |
14 Apr 2021, Stefan Ritt, Suggestion, Time zone selection for web page
|
> The new history as well as the clock in the web page header show the local time
> of the user's computer running the browser.
> Would it be possible to make it either always use the time zone of the Midas
> server, or make it selectable from the config page?
> It's not ideal trying to relate error messages from the midas.log to history
> plots if the time stamps don't match.
I implemented a new row in the config page to select the time zone.
"Local": Time zone where the browser runs
"Server": Time zone where the midas server runs (you have to update mhttpd for that)
"UTC+X": Any other time zone
The setting affects both the status header and the history display.
I spent quite some time with "named" time zones like "PST" "EST" "CEST", but the
support for that is not that great in JavaScript, so I decided to go with simple
UTC+X. Hope that's ok.
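For illustration only, the fixed-offset idea can be sketched in Python (the actual implementation lives in the web page's JavaScript and is not shown here; the function name below is hypothetical). A fixed UTC+X offset needs no time zone database and no daylight saving rules.

```python
from datetime import datetime, timedelta, timezone


def format_in_utc_offset(unix_time, offset_hours):
    """Format a Unix timestamp in a fixed UTC+X zone (X may be negative)."""
    tz = timezone(timedelta(hours=offset_hours))
    return datetime.fromtimestamp(unix_time, tz).strftime("%Y-%m-%d %H:%M:%S")


stamp = 1618401600  # 2021-04-14 12:00:00 UTC
for offset in (0, 2, -7):
    print(f"UTC{offset:+d}: {format_in_utc_offset(stamp, offset)}")
```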
Please give it a try and let me know if it's working for you.
Best,
Stefan |
29 Apr 2021, Pierre-Andre Amaudruz, Suggestion, Time zone selection for web page
|
> > The new history as well as the clock in the web page header show the local time
> > of the user's computer running the browser.
> > Would it be possible to make it either always use the time zone of the Midas
> > server, or make it selectable from the config page?
> > It's not ideal trying to relate error messages from the midas.log to history
> > plots if the time stamps don't match.
>
> I implemented a new row in the config page to select the time zone.
>
> "Local": Time zone where the browser runs
> "Server": Time zone where the midas server runs (you have to update mhttpd for that)
> "UTC+X": Any other time zone
>
> The setting affects both the status header and the history display.
>
> I spent quite some time with "named" time zones like "PST" "EST" "CEST", but the
> support for that is not that great in JavaScript, so I decided to go with simple
> UTC+X. Hope that's ok.
>
> Please give it a try and let me know if it's working for you.
>
> Best,
> Stefan
Hi Stefan,
This is great, the UTC+x is perfect, thank you.
PAA |
23 Mar 2021, Lars Martin, Bug Report, Time shift in history CSV export
|
Version: release/midas-2020-12
I'm exporting the history data shown in elog:2132/1 to CSV, but when I look at the
CSV data, the step no longer occurs at the same time in both data sets (elog:2132/2) |
23 Mar 2021, Lars Martin, Bug Report, Time shift in history CSV export
|
History is from two separate equipments/frontends, but both have "Log history" set to 1. |
23 Mar 2021, Lars Martin, Bug Report, Time shift in history CSV export
|
Tried with export of two different time ranges, and the shift appears to remain the same,
about 4040 rows. |
24 Mar 2021, Stefan Ritt, Bug Report, Time shift in history CSV export
|
I confirm there is a problem. If variables are from the same equipment, they have the same
time stamps, like
t1 v1(t1) v2(t1)
t2 v1(t2) v2(t2)
t3 v1(t3) v2(t3)
when they are from different equipments, they have however different time stamps
t1 v1(t1)
t2 v2(t2)
t3 v1(t3)
t4 v2(t4)
The bug in the current code is that all variables use the time stamps of the first variable,
which is wrong in the case of different equipments, like
t1 v1(t1) v2(*t2*)
t3 v1(t3) v2(*t4*)
So I can change the code, but I'm not sure what would be the best way. The easiest would be to
export one array per variable, like
t1 v1(t1)
t2 v1(t2)
...
t3 v2(t3)
t4 v2(t4)
...
Putting that into a single array would leave gaps, like
t1 v1(t1) [gap]
t2 [gap] v2(t2)
t3 v1(t3) [gap]
t4 [gap] v2(t4)
plus this is programmatically more complicated, since I have to merge two arrays. So which
export format would you prefer?
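The two candidate layouts can be sketched as follows. This is an illustration with made-up data and hypothetical function names, not the mhttpd export code; gaps are represented as `None` and printed as empty CSV fields.

```python
def one_block_per_variable(series):
    """One (time, value) list per variable, each sorted by time."""
    return {name: sorted(points) for name, points in series.items()}


def merge_with_gaps(series):
    """One row per distinct time stamp; None marks a variable with no sample there."""
    names = list(series)
    times = sorted({t for points in series.values() for t, _ in points})
    lookup = {name: dict(points) for name, points in series.items()}
    return [[t] + [lookup[name].get(t) for name in names] for t in times]


# Two variables from different equipments, sampled at different times.
history = {
    "v1": [(1, 10.0), (3, 11.0)],
    "v2": [(2, 20.0), (4, 21.0)],
}

for row in merge_with_gaps(history):
    print(",".join("" if x is None else str(x) for x in row))
```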
Stefan |
24 Mar 2021, Lars Martin, Bug Report, Time shift in history CSV export
|
I think from my perspective the separate files are fine. I personally don't really like the format
with the gaps, so don't see an advantage in putting in the extra work.
I'm surprised the shift is this big, though, it was more than a whole hour in my case, is it the
time difference between when the frontends were started? |
14 Apr 2021, Stefan Ritt, Bug Report, Time shift in history CSV export
|
I finally found some time to fix this issue in the latest commit. Please update and check if it's
working for you.
Stefan |
15 Jun 2009, Jimmy Ngai, Forum, Time limit of each run
|
Dear All,
Can one set a time limit for each run? I can only find event limit in ODB.
Thanks.
Jimmy |
04 Aug 2009, Exaos Lee, Forum, The contents of the attachment
|
As requested from K.O., I paste the "00README.txt" as the following:
#-*- mode: outline -*-
#-*- encoding: utf-8 -*-
#AUTHOR: Exaos Lee <Exaos DOT Lee AT gmail DOT com>
* Directories
+--> 00README.txt : This file
|
+--> bustester : Directory contains utilities for VME bus testing
|
+--> modules : APIs to handle VME modules
|
+--> pyutil : Utilities in Python, including PyMVME
|
+--> sis3100 : Provide lib_sis3100mvme.a/so using with "mvmestd.h"
* Utilities in Python
** PyMVME module
The module "PyMVME" provides the following stuff:
a. class StdVME
-- contains standard VME information.
b. class MVME_INTERFACE
-- the C structure MVME_INTERFACE wrapped in Python
c. dict MVME_STATUS
-- the return information defined in "mvmestd.h"
d. the related useful aliases from "mvmestd.h"
-- including "mvme_addr_t", "mvme_locaddr_t", "mvme_size_t"
e. class MvmeDev
-- the major class which provides methods to access VME bus.
You may find examples of how to use module "PyMVME" from "find_caen.py" or
scripts in dir "test". All of the examples are using "lib_sis3100mvme.so".
You may find information later in this introduction.
** find_caen.py
The script to find VME modules from CAEN. Now, it is still in test status
and can only find ADCs, TDCs or QDCs.
* SIS3100 library to be used together with "mvmestd.h"
The directory "sis3100" contains sources to build libraries as the following:
a. lib_sis3100.a -- APIs declared in "sis3100_vme_calls.h"
b. lib_sis3100mvme.a -- APIs declared in "mvmestd.h". It also contains the
same APIs from lib_sis3100.a
If you want to use shared libraries, especially when you are using utilities
written in Python, you may rebuild the libraries as the following:
$ cd sis3100
$ make shared
* APIs to handle VME modules
** vadc_caen.h/c
Provides APIs to handle ADC-type modules from CAEN, including:
a. ADCs --- V785, V785N
b. TDCs --- V775, V775N
c. QDCs --- V792, V792N
* VME bus testers
Still under development.
|
04 Jun 2020, Hisataka YOSHIDA, Forum, Template of slow control frontend
|
I’m a beginner with MIDAS and am trying to develop a slow control front-end with the latest MIDAS.
I found scfe.cxx in the “example”, but it is not enough to serve as a reference for writing a front-end for my own devices
because it contains only the nulldevice and null bus driver case...
(I did succeed in running the HV front-end for the ISEG MPod, because there is a device driver for it...)
Can I get some frontend examples such as simple TCP/IP and/or RS232 devices?
Ideally, I would like to have examples of both a frontend and a device driver.
(If any device driver included in the package is similar, please tell me.)
Thanks a lot. |
04 Jun 2020, Pintaudi Giorgio, Forum, Template of slow control frontend
|
> I’m a beginner with MIDAS and am trying to develop a slow control front-end with the latest MIDAS.
> I found scfe.cxx in the “example”, but it is not enough to serve as a reference for writing a front-end for my own devices
> because it contains only the nulldevice and null bus driver case...
> (I did succeed in running the HV front-end for the ISEG MPod, because there is a device driver for it...)
>
> Can I get some frontend examples such as simple TCP/IP and/or RS232 devices?
> Hopefully, I would like to have examples of frontend and device driver.
> (if any device driver which is included in the package is similar, please tell me.)
>
> Thanks a lot.
Dear Yoshida-san,
my name is Giorgio and I am a Ph.D. student working on the T2K experiment.
I had to write many MIDAS frontends recently, so I think that my code could be of some help to you.
As you might already know, the MIDAS slow control system is structured into three layers/levels.
- The highest layer is the "class" layer that directly interfaces with the user and the ODB. It is called
"class" layer because it refers to a class of devices (for example all the high voltage power supplies,
etc...). The idea is that in the same experiment you can have many different models of power supplies but
all of them can be controlled with a single class driver.
- Then there is the "device" layer that implements the functions specific to the particular device.
- Finally, there is the "BUS" layer that directly communicates with the device. The BUS can be Ethernet
(TCP/IP), Serial (RS-232 / RS-422 / RS-485), USB, etc ...
You can read more about the MIDAS slow control system here:
https://midas.triumf.ca/MidasWiki/index.php/Slow_Control_System
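The layering can be sketched in a few lines. This is a toy model only: every class name, the text protocol, and the dict standing in for the ODB are invented for illustration, and real MIDAS drivers are written in C/C++ against the MIDAS driver interfaces.

```python
class LoopbackBus:
    """Bus layer: moves commands to the hardware. Stand-in for TCP/IP or RS-232."""

    def __init__(self):
        self.voltage = 0.0  # state of the simulated instrument

    def exchange(self, command):
        # Simulated instrument with a made-up text protocol.
        if command.startswith("SET "):
            self.voltage = float(command[4:])
            return "OK"
        if command == "GET":
            return f"{self.voltage:.1f}"
        return "ERR"


class ToySupplyDevice:
    """Device layer: knows the command syntax of one particular model."""

    def __init__(self, bus):
        self.bus = bus

    def set(self, value):
        return self.bus.exchange(f"SET {value}") == "OK"

    def get(self):
        return float(self.bus.exchange("GET"))


class HighVoltageClass:
    """Class layer: model-independent logic; a dict stands in for the ODB."""

    def __init__(self, devices):
        self.devices = devices
        self.odb = {"Demand": [0.0] * len(devices),
                    "Measured": [0.0] * len(devices)}

    def set_demand(self, index, value):
        self.odb["Demand"][index] = value
        self.devices[index].set(value)

    def read_all(self):
        for i, device in enumerate(self.devices):
            self.odb["Measured"][i] = device.get()
        return self.odb["Measured"]


hv = HighVoltageClass([ToySupplyDevice(LoopbackBus())])
hv.set_demand(0, 1500.0)
print(hv.read_all())
```

Swapping in a different power supply model would mean replacing only the device class, and changing the transport would mean replacing only the bus class.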
Anyway, you need to write code for all those layers. If you are lucky you can reuse some of the already
existing MIDAS code. Keep in mind that all the examples that you find in the MIDAS documentation and the
MIDAS source code are written in C (even if it is then compiled with g++). But, you can write a frontend in
C++ without any problem so choose whichever language you are most familiar with.
I am attaching an archive with some sample code directly taken from our experiment. It is just a small
fraction of the code not meant to be compilable. The code is disclosed with the GPL3 license, so you can use
it as you please but if you do, please cite my name and the WAGASCI-T2K experiment somewhere visible.
In the archive, you can find two example frontends with the respective drivers. The "Triggers" frontend is
written in C++ (or C+ if you consider that the mfe.cxx API is very C-like). The "WaterLevel" frontend is
written in plain C. The "Triggers" frontend controls our trigger board called CCC and the "WaterLevel"
frontend controls our water level sensors called PicoLog 1012. They share a custom implementation of the
TCP/IP bus. Anyway, this is not relevant to you. You may just want to take a look at the code structure.
Finally, recently there have been some very interesting developments regarding the ODB C++ API. I would
definitely take a look at that. I wish I had that when I was developing these frontends.
Good luck
--
Pintaudi Giorgio, Ph.D. student
Neutrino and Particle Physics Minamino Laboratory
Faculty of Science and Engineering, Yokohama National University
giorgio.pintaudi.kx@ynu.jp
TEL +81(0)45-339-4182 |
04 Jun 2020, Stefan Ritt, Forum, Template of slow control frontend
|
> I’m a beginner with MIDAS and am trying to develop a slow control front-end with the latest MIDAS.
> I found scfe.cxx in the “example”, but it is not enough to serve as a reference for writing a front-end for my own devices
> because it contains only the nulldevice and null bus driver case...
> (I did succeed in running the HV front-end for the ISEG MPod, because there is a device driver for it...)
>
> Can I get some frontend examples such as simple TCP/IP and/or RS232 devices?
> Hopefully, I would like to have examples of frontend and device driver.
> (if any device driver which is included in the package is similar, please tell me.)
Have you checked the documentation?
https://midas.triumf.ca/MidasWiki/index.php/Slow_Control_System
Basically you have to replace the nulldevice driver with a "real" driver. You find all existing drivers under
midas/drivers/device. If your favourite is not there, you have to write it. Use one which is close to the one
you need and modify it.
Best,
Stefan |
04 Jun 2020, Hisataka YOSHIDA, Forum, Template of slow control frontend
|
Dear Stefan,
Thank you for your quick reply.
> Have you checked the documentation?
>
> https://midas.triumf.ca/MidasWiki/index.php/Slow_Control_System
Yes, I have read the wiki, but it is not easy to figure out how to handle my individual case.
> Basically you have to replace the nulldevice driver with a "real" driver. You find all existing drivers under
> midas/drivers/device. If your favourite is not there, you have to write it. Use one which is close to the one
> you need and modify it.
Okay, I will try to write drivers for my own devices based on the existing drivers.
(Maybe I can find some device drivers which use TCP/IP or RS232.)
Best regards,
Hisataka Yoshida |
04 Jun 2020, Hisataka YOSHIDA, Forum, Template of slow control frontend
|
Dear Giorgio,
Thank you very much for your kind and quick reply!
I appreciate you giving me such a nice explanation, your experience, and great sample code (this is what I wanted!).
They are all useful for me. I will try to write my frontend code using your gift.
Thank you again!
Best regards,
Hisataka Yoshida |
|