14 Feb 2022, Stefan Ritt, Bug Fix, ODBINC/Sequencer Issue
|
Just post here a minimal script which produces the error, so that I can try myself.
... and make sure that you have the latest develop version of midas.
Stefan |
14 Feb 2022, Stefan Ritt, Bug Fix, ODBINC/Sequencer Issue
|
> I noticed that "Jacob Thorne" in the forum had the same issue as us in Novemeber last
> year. Indeed we have not installed any later versions of MIDAS since then so we will
> double check we have the latest version.
As you see from my reply to Jacob, the bug has been fixed in midas since then, so just
update.
Stefan |
15 Feb 2022, Stefan Ritt, Bug Fix, ODBINC/Sequencer Issue
|
> But the error still persists. Is there another way to update which we are missing?
The bug was definitively fixed in this modification:
https://bitbucket.org/tmidas/midas/commits/5f33f9f7f21bcaa474455ab72b15abc424bbebf2
You probably forgot to compile/install correctly after your pull. Of you start "odbedit" and do
a "ver" you see which git revision you are currently running. Make sure to get this output:
MIDAS version: 2.1
GIT revision: Fri Feb 11 08:56:02 2022 +0100 - midas-2020-08-a-509-g585faa96 on branch
develop
ODB version: 3
Stefan |
23 Feb 2022, Stefan Ritt, Info, Midas slow control event generation switched to 32-bit banks
|
The midas slow control system class drivers automatically read their equipment and generate events containing midas banks. So far these have been 16-bit banks using bk_init(). But now more and more experiments use large amount of channels, so the 16-bit address space is exceeded. Until last week, there was even no check that this happens, leading to unpredictable crashes.
Therefore I switched the bank generation in the drivers generic.cxx, hv.cxx and multi.cxx to 32-bit banks via bk_init32(). This should be in principle transparent, since the midas bank functions automatically detect the bank type during reading. But I thought I let everybody know just in case.
Stefan |
03 Mar 2022, Stefan Ritt, Bug Report, Writting MIDAS Events via FPGAs
|
> Starting the frontend and starting a run works fine -> seeing events with mdump and also on the web GUI.
> But when I stop the run and try to start the next run the frontend is sending no events anymore.
> It get stuck at line 221 (if (status == DB_TIMEOUT)).
> I tried to reduce the nEvents to 1 which helped in terms of DB_TIMEOUT but still I don't get any events after I did a stop / start cycle -> no events in mdump and no events counting up at the web GUI.
> If I kill the frontend in the terminal (ctrl+c) and restart it, while the run is still running, it starts to send events again.
This problem has (likely) been fixed in the current version. Please pull develop and try again. Was a recursive call to the event collection routine which is only triggered if you send events faster than
the logger can digest, so not many people see it.
Best,
Stefan |
16 Mar 2022, Stefan Ritt, Info, New midas sequencer version
|
A new version of the midas sequencer has been developed and now available in the
develop/seq_eval branch. Many thanks to Lewis Van Winkle and his TinyExpr library
(https://codeplea.com/tinyexpr), which has now been integrated into the sequencer
and allow arbitrary Math expressions. Here is a complete list of new features:
* Math is now possible in all expressions, such as "x = $i*3 + sin($y*pi)^2", or
in "ODBSET /Path/value[$i*2+1], 10"
* "SET <var>,<value>" can be written as "<var>=<value>", but the old syntax is
still possible.
* There are new functions ODBCREATE and ODBDLETE to create and delete ODB keys,
including arrays
* Variable arrays are now possible, like "a[5] = 0" and "MESSAGE $a[5]"
If the branch works for us in the next days and I don't get complaints from
others, I will merge the branch into develop next week.
Stefan |
21 Mar 2022, Stefan Ritt, Bug Report, Python ODB watch
|
What you describe is a well-known problem with the ODB. At PSI we have similar issues. There are
two approaches to solve it:
1) Write values one-by-one to the ODB, but do not trigger a watch update. In the sequencer, this
can be achieved with the ODBSET command (see https://daq00.triumf.ca/MidasWiki/index.php/Sequencer
and the last paragraph right of the ODBSET command). You use notify=0 for all set commands except
the last one where you use notify=1. On the C++ API, you can use db_set_data_index1() which has
this notify flag as the last parameter.
2) You add intelligence to your front-end. If you get a watchdog update, you do not apply this
directly to the hardware, but put it into a FIFO. Once you do not get any more update for a certain
period (like 1s is a good value), you empty the FIFO and apply all setting immediately.
Both methods have been used at PSI successfully, although 1) is much easier to implement, especially
if you use the midas sequencer.
Stefan |
22 Mar 2022, Stefan Ritt, Info, New midas sequencer version
|
After several days of testing in various experiments, the new sequencer has
been merged into the develop branch. One more feature was added. The path to
the ODB can now contain variables which are substituted with their values.
Instead writing
ODBSET /Equipment/XYZ/Setting/1/Switch, 1
ODBSET /Equipment/XYZ/Setting/2/Switch, 1
ODBSET /Equipment/XYZ/Setting/3/Switch, 1
one can now write
LOOP i, 3
ODBSET /Equipment/XYZ/Setting/$i/Switch, 1
ENDLOOP
Of course it is not possible for me to test any possible script. So if you
have issues with the new sequencer, please don't hesitate to report them
back to me.
Best,
Stefan |
22 Mar 2022, Stefan Ritt, Bug Fix, fix for event buffer corruption in bm_flush_cache()
|
Thanks Konstantin for your detailed description.
I wonder why we never saw this problem at PSI. Here is the reason: In multil-threaded environments, we never call bm_send_event() directly
from all threads (since in the old days nothing was thread safe in midas). Instead, we use a collector thread which gets all events via the
rb_xxx functions from the individual readout threads. This is well integrated into the mfe.cxx framework. Look at examples/mtfe/mfte.cxx.
Each thread does (simplified):
while (true) {
do {
status = rb_get_wp(&pevent);
} while (status == DB_TIMEOUT)
bm_compose_event_threadsafe(pevent, ..., &serial_number);
bk_init32(pevent+1);
... fill event ...
bk_close(pevent)
rb_increment_wp(sizeof(EVENT_HEADER) + pevent->data_size);
}
The framework now collects all these events in receive_trigger_event() which runs in the main thread:
for (i=0 ; i<n_thread ; i++) {
rb_get_rp(i, pevent);
if (pevent->serial_number == prev_serial+1)
break;
}
prev_serial = pevent->serial_number;
rpc_send_event(pevent);
rb_increment_rp(sizeof(EVENT_HEADER) + pevent->data_size);
This code ensures that all events are in the right sequence (before the serial numbers where mixed up) and that all events are sent only
from a single thread, so the write buffer can be used effectively without complicated multi-thread locks.
This solution works nicely at PSI since many years, maybe you should put some thought to use it in your tmfe framework in Alpha-g as well
instead of struggling with all your locks.
Stefan |
24 Mar 2022, Stefan Ritt, Bug Fix, mhttpd bug fixed
|
> 1st wget stops (by ctrl-C), socket is closed, mongoose frees it's mg_connection object
> (corresponding worker is still labouring, hmm... actually sleeping, and now has a stale nc pointer)
>
> 2nd wget starts, new socket is opened, mongoose allocates a new mg_connection object,
> but malloc() gives it back the same memory we just freed(), and the 1st wget's worker thread
> nc pointer is no longer stale, but points to 2nd wget's connection.
Why don't we CLEAR the memory (memset(object,0,sizeof(object)) before the free(), this way it cannot be
mistakenly re-used by the next thread.
Stefan |
24 Mar 2022, Stefan Ritt, Bug Fix, fix for event buffer corruption in bm_flush_cache()
|
> > ... instead of struggling with all your locks.
>
> it is better to have midas fully thread safe. ODB has been so for a long time,
> event buffer partially (except for this bug), now fully.
>
> without that the problem still exists, because in many frontends,
> bm_flush_buffer() is called from the main thread, and will race
> against the "bm_send_event() thread". Of course you can do
> everything on the main thread, but this opens you to RPC timeouts
> during run transitions (if you sleep in bm_wait_for_free_space()).
Just for the record: in the mfe.cxx framework both bm_send_event() and
bm_flush_buffer() are called from the main thread, as can be seen in the
midas/examples/mtfe/mtfe.cxx example.
But I agree that having all buffer operations thread safe is a clear benefit.
Stefan |
24 Mar 2022, Stefan Ritt, Bug Fix, mhttpd bug fixed
|
I see, now I understand.
As for the browser cache problem: This Chrome extension is your friend:
https://chrome.google.com/webstore/detail/clear-cache/cppjkneekbjaeellbfkmgnhonkkjfpdn?hl=en
I use it all the time I change the CSS or a JS file. Having the "Developer Tools" open in Chrome helps as well
(cache is then turned off). Firefox has similar extensions.
Stefan |
24 Mar 2022, Stefan Ritt, Bug Report, data missing in runXXXXXX.mid
|
One idea: we should have a look at mlogger::close_channels(). There the SYSTEM buffer is emptied through the cm_yield() call. Instrumenting this with some debugging code will enlighten us.
Another possible problem: If the frontend requested to be notified for a run stop AFTER the logger, then the problem might happen: Logger closes file, and THEN the frontend flushes events ending up in the SYSTEM buffer and being logged at the beginning of the next run. The mfe.cxx framework takes care of this by calling
cm_register_transition(TR_SOP, 500);
while the mlogger does
cm_register_transition(TR_STOP, tr_stop, 800);
and since 800 > 500 the logger will be called AFTER the frontend. If one use a framework different from mfe.cxx, this could however be different.
Stefan |
31 Mar 2022, Stefan Ritt, Suggestion, Maximum ODB size
|
Anybody some idea what the maximum ODB size can be? In the old days, the linux
kernels had a severe limit on shared memory of usually 8MB, but in the age of
64GB RAM being a standard, we should be able to grow bigger. Tried
odbinit -s 1024MB --cleanup
which went through without complain, even put that value in to .ODB_SIZE.TXT, but
when I started odbedit doing "mem", I only see a size of 1MB. Probably somewhere
deep inside we have a limit which prevents the user to create very large ODBs,
but this should be mentioned more prominently in odbinit. Like "size too large,
maximum allowd is xxx MB".
Stefan |
13 Apr 2022, Stefan Ritt, Info, ODB JSON support
|
> Per xkcd, there is a new json standard "json5". In addition to other things, numeric
> values NaN, +Infinity and -Infinity are encoded as literals NaN, Infinity and -Infinity (without quotes):
> https://spec.json5.org/#numbers
Just for curiosity: Is this implemented by the midas json library now? |
15 Apr 2022, Stefan Ritt, Info, New midas sequencer version
|
I prepared some slides about the new features of the sequencer and post it here so
people can have a quick look at get some inspiration.
Stefan |
02 May 2022, Stefan Ritt, Info, added web page for "mdump"
|
Here are some of my thoughts:
- I volunteer to write the JavaScript midas bank decoder. Just a couple of pure javascript functions, no
midasio.cxx library needed.
- If different javascript connections "steal" events from each other, I would not be concerned. Actually I
would rather like that all connections see the SAME event. So mhttpd keeps one event, serves it to all
links, so displays are consistent. If a browser wants to see the "next" event, it send the old serial
number and days "please send next event AFTER serial number". If the serial number is larger than the
event in the buffer, mhttpd fetches a new event and puts it into its buffer.
- Since javascript connections are connectionless, I would rather pass event_id and trigger_mask with each
request. Then mhttpd can retrieve events until event_id and trigger_mask match, then serve that event.
Since reading events from a midas buffer is fast (many 10'000s of events per second), the won't be much of
a delay.
- GET_ALL does not make sense for browsers, you don't want to slow down any frontend. If someone wants to
do histogramming in the browser, then GET_SOME (which is kind of GET_OLD) would make sense, but most of
the cases we have some single event display, and there a GET_RECENT is most appropriate. |
04 May 2022, Stefan Ritt, Info, added web pages for "show odb clients" and "show open records"
|
Concerning the "scl" page, we are currently having a discussion. At the moment, one can
see midas clients in three different places:
1) the main status page at the bottom, only names and hosts are there
2) the programs page, where one can also start/stop program
3) now the new page "Show ODB clients" in the ODB editor page, which shows also the
alive status, PID and timeout
I'm thinking that three locations are two too much, so we are considering to merge the
tree pages into one. That would mean that 1) goes away, and the "Programs" page will
show more information. We have some rare cases that programs are removed from
/System/Clients in the ODB but still attached to the ODB. For those "zombies" we would
add a "hard kill" function.
I would like to hear feedback from the midas community before we proceed with the
plans. Anybody desperately in need of the programs shown on the status page?
Best,
Stefan |
06 May 2022, Stefan Ritt, Info, Increased timeout for program shut down
|
We had the problem in our lab that a frontend took about 6 seconds to gracefully
shut down, mainly it needed to park some motors. I found that the shutdown command
had a hard-coded timeout of 5 seconds, after which the frontend gets killed, and
cannot finish the park operation. I change the code so that the client timeout
stored in the ODB is taken instead of the hard-coded 5 seconds. This allows each
client to fine-tune its timeout, to allow graceful shutdown, but also not let the
user wait too long if the client gets stuck and needs a hard kill.
The default timeout for mfe.cxx based frontends has been changed to 10 seconds
now, but in the frontend_init function this can be changed by the user code
easily.
I hope this char does not trigger any bad side effects, but if it does, please
report here.
Stefan |
08 May 2022, Stefan Ritt, Info, RO_STOPPED with triggered events
|
We had issues in one of our experiment that people used RO_STOPPED in the
equipment list together with triggered events (EQ_USER). If events are sent when
a run is stopped, this leads to many unexpected results, so I added a check in
the mfe.cxx code which prevents RO_STOPPED (or RO_ALWAYS which includes
RO_STOPPED) together with EQ_TRIGGERED, EQ_INTERRUPT, EQ_MULTITHREAD and EQ_USER
type of events.
I got now complaints that some old front-end are not running any more since they
do use RO_ALWAYS together with triggered events. Can the author of these frontend
please tell me the rationale why this is needed, then I can maybe add a better
fix for that.
Stefan |
|