ID |
Date |
Author |
Topic |
Subject |
2272
|
24 Aug 2021 |
Stefan Ritt | Bug Fix | changes in history plots |
One addition I would be in favour of is to remove the "Order" and replace it with drag&drop handles, because this is what people are more
used to today. Only the old guys like us remember the /etc/init.d/xx_yy scheme where one uses an integer number in the file name to
determine an order.
See for example: https://jsbin.com/hijetos/edit?js,output
But instead of relying on a foreign library, I would rather implement that myself, since I need the same thing later for the to-be-
implemented ODB editor (next year? next lockdown?)
Stefan |
2277
|
19 Sep 2021 |
Stefan Ritt | Bug Fix | Chat working again |
Not sure how many people are using it, but the Chat facility in midas was broken
for some time now and got fixed today again.
Just for your information: Chat can be used like WhatsApp & Co, and connects all
people who access a midas experiment through their browser. It's good to
communicate between shift crew members located at different places. One advantage
is that the chat messages can get 'spoken' by the text-to-speech engine of your
browser, so it can be used to "wake up" shifters. Can be configured through the
"Config" page.
Stefan |
Attachment 1: Screenshot_2021-09-19_at_21.27.19_.png
|
|
2330
|
08 Feb 2022 |
jago aartsen | Bug Fix | ODBINC/Sequencer Issue |
Hi all,
I am having some issues getting the ODBINC command to work within the MIDAS
sequencer. I am trying to increment one of the ODB values but it is returning a
mismatch data-type size error (see attached image).
All the ODB variables are MIDAS data-type FLOAT and should all be 32-bit values,
but for some reason MIDAS is thinking they are 4-bit values. I have tried creating
new ODB keys of type INT32, UINT32 and DOUBLE but they all give the same error.
If anybody has any suggestions I would really appreciate some help:)
Thanks
|
Attachment 1: mslerror.PNG
|
|
2331
|
08 Feb 2022 |
Konstantin Olchanski | Bug Fix | ODBINC/Sequencer Issue |
Please post the output of odbedit "ls -l" for /eq/ar.../variables. (you posted the
variable name as an image, and I cannot cut-and-paste the odb path!). BTW data size 4 is
correct, 4 bytes for INT32/UINT32/FLOAT. For DOUBLE it should be 8. For you it prints 32
and this is wrong, we need to see the output of "ls -l".
K.O. |
2332
|
09 Feb 2022 |
jago aartsen | Bug Fix | ODBINC/Sequencer Issue |
> Please post the output of odbedit "ls -l" for /eq/ar.../variables. (you posted the
> variable name as an image, and I cannot cut-and-paste the odb path!). BTW data size 4 is
> correct, 4 bytes for INT32/UINT32/FLOAT. For DOUBLE it should be 8. For you it prints 32
> and this is wrong, we need to see the output of "ls -l".
> K.O.
Hi,
Thanks for getting back to me regarding this. The output of "ls -l" is:
[local:mu3eMSci:S]/>cd Equipment/ArduinoTestStation/Variables
[local:mu3eMSci:S]Variables>ls -l
Key name Type #Val Size Last Opn Mode Value
---------------------------------------------------------------------------
_T_ FLOAT 1 4 1h 0 RWD 20.93
_F_ FLOAT 1 4 1h 0 RWD 12.8
_P_ FLOAT 1 4 1h 0 RWD 56
_S_ FLOAT 1 4 1h 0 RWD 5
_H_ FLOAT 1 4 60h 0 RWD 44.74
_B_ FLOAT 1 4 60h 0 RWD 18.54
_A_ FLOAT 1 4 1h 0 RWD 14.41
_RH_ FLOAT 1 4 1h 0 RWD 41.81
_AT_ FLOAT 1 4 1h 0 RWD 20.46
SP INT16 1 2 1h 0 RWD 10
Many Thanks
Jago |
2333
|
09 Feb 2022 |
Konstantin Olchanski | Bug Fix | ODBINC/Sequencer Issue |
>
> [local:mu3eMSci:S]/>cd Equipment/ArduinoTestStation/Variables
> [local:mu3eMSci:S]Variables>ls -l
> Key name Type #Val Size Last Opn Mode Value
> ---------------------------------------------------------------------------
> _T_ FLOAT 1 4 1h 0 RWD 20.93
> _F_ FLOAT 1 4 1h 0 RWD 12.8
> _P_ FLOAT 1 4 1h 0 RWD 56
> _S_ FLOAT 1 4 1h 0 RWD 5
> _H_ FLOAT 1 4 60h 0 RWD 44.74
> _B_ FLOAT 1 4 60h 0 RWD 18.54
> _A_ FLOAT 1 4 1h 0 RWD 14.41
> _RH_ FLOAT 1 4 1h 0 RWD 41.81
> _AT_ FLOAT 1 4 1h 0 RWD 20.46
> SP INT16 1 2 1h 0 RWD 10
>
This looks okey, so we still have no explanation for your error. Please post your sequencer
script?
K.O. |
2335
|
10 Feb 2022 |
Stefan Ritt | Bug Fix | ODBINC/Sequencer Issue |
I tried following script:
ODBSET /Equipment/ArduinoTestStation/Variables/_S_, 10
LOOP 10
WAIT seconds, 3
ODBINC /Equipment/ArduinoTestStation/Variables/_S_
ENDLOOP
and it worked as expected. So I conclude the problem must be in your script. Probably a typo in
the ODB path pointing to a 32-byte string instead to a 4-byte float.
Stefan |
2338
|
14 Feb 2022 |
jago aartsen | Bug Fix | ODBINC/Sequencer Issue |
> I tried following script:
>
> ODBSET /Equipment/ArduinoTestStation/Variables/_S_, 10
>
> LOOP 10
> WAIT seconds, 3
> ODBINC /Equipment/ArduinoTestStation/Variables/_S_
> ENDLOOP
>
> and it worked as expected. So I conclude the problem must be in your script. Probably a typo in
> the ODB path pointing to a 32-byte string instead to a 4-byte float.
>
> Stefan
Hi Stefan,
Cheers for the reply. I believe the syntax we are using is correct. I have tried copying the script
you used above and it results in the same error as before. Perhaps something is going wrong
elsewhere - I will take another look today.
Jago |
2339
|
14 Feb 2022 |
jago aartsen | Bug Fix | ODBINC/Sequencer Issue |
> >
> > [local:mu3eMSci:S]/>cd Equipment/ArduinoTestStation/Variables
> > [local:mu3eMSci:S]Variables>ls -l
> > Key name Type #Val Size Last Opn Mode Value
> > ---------------------------------------------------------------------------
> > _T_ FLOAT 1 4 1h 0 RWD 20.93
> > _F_ FLOAT 1 4 1h 0 RWD 12.8
> > _P_ FLOAT 1 4 1h 0 RWD 56
> > _S_ FLOAT 1 4 1h 0 RWD 5
> > _H_ FLOAT 1 4 60h 0 RWD 44.74
> > _B_ FLOAT 1 4 60h 0 RWD 18.54
> > _A_ FLOAT 1 4 1h 0 RWD 14.41
> > _RH_ FLOAT 1 4 1h 0 RWD 41.81
> > _AT_ FLOAT 1 4 1h 0 RWD 20.46
> > SP INT16 1 2 1h 0 RWD 10
> >
>
> This looks okey, so we still have no explanation for your error. Please post your sequencer
> script?
>
> K.O.
Hey, thanks for getting back to me
We are fairly confident the syntax is correct. Having tried the test script posted by Stefan:
> ODBSET /Equipment/ArduinoTestStation/Variables/_S_, 10
>
> LOOP 10
> WAIT seconds, 3
> ODBINC
The same error is returned:/
We will take another look today.
Jago |
2340
|
14 Feb 2022 |
Stefan Ritt | Bug Fix | ODBINC/Sequencer Issue |
Just post here a minimal script which produces the error, so that I can try myself.
... and make sure that you have the latest develop version of midas.
Stefan |
2341
|
14 Feb 2022 |
jago aartsen | Bug Fix | ODBINC/Sequencer Issue |
> Just post here a minimal script which produces the error, so that I can try myself.
>
> ... and make sure that you have the latest develop version of midas.
>
> Stefan
Here is the simplest script which produces the error:
WAIT seconds, 3
ODBINC /Equipment/ArduinoTestStation/Variables/_S_
I noticed that "Jacob Thorne" in the forum had the same issue as us in Novemeber last
year. Indeed we have not installed any later versions of MIDAS since then so we will
double check we have the latest version.
Jago |
2342
|
14 Feb 2022 |
Stefan Ritt | Bug Fix | ODBINC/Sequencer Issue |
> I noticed that "Jacob Thorne" in the forum had the same issue as us in Novemeber last
> year. Indeed we have not installed any later versions of MIDAS since then so we will
> double check we have the latest version.
As you see from my reply to Jacob, the bug has been fixed in midas since then, so just
update.
Stefan |
2343
|
14 Feb 2022 |
jago aartsen | Bug Fix | ODBINC/Sequencer Issue |
> > I noticed that "Jacob Thorne" in the forum had the same issue as us in Novemeber last
> > year. Indeed we have not installed any later versions of MIDAS since then so we will
> > double check we have the latest version.
>
> As you see from my reply to Jacob, the bug has been fixed in midas since then, so just
> update.
>
> Stefan
We have tried updating using both:
git submodule update --init --recursive
and:
git pull --recurse-submodules
But the error still persists. Is there another way to update which we are missing?
Cheers
Jago |
2344
|
15 Feb 2022 |
Stefan Ritt | Bug Fix | ODBINC/Sequencer Issue |
> But the error still persists. Is there another way to update which we are missing?
The bug was definitively fixed in this modification:
https://bitbucket.org/tmidas/midas/commits/5f33f9f7f21bcaa474455ab72b15abc424bbebf2
You probably forgot to compile/install correctly after your pull. Of you start "odbedit" and do
a "ver" you see which git revision you are currently running. Make sure to get this output:
MIDAS version: 2.1
GIT revision: Fri Feb 11 08:56:02 2022 +0100 - midas-2020-08-a-509-g585faa96 on branch
develop
ODB version: 3
Stefan |
Draft
|
15 Feb 2022 |
jago aartsen | Bug Fix | ODBINC/Sequencer Issue |
> > But the error still persists. Is there another way to update which we are missing?
>
> The bug was definitively fixed in this modification:
>
> https://bitbucket.org/tmidas/midas/commits/5f33f9f7f21bcaa474455ab72b15abc424bbebf2
>
> You probably forgot to compile/install correctly after your pull. Of you start "odbedit" and do
> a "ver" you see which git revision you are currently running. Make sure to get this output:
>
> MIDAS version: 2.1
> GIT revision: Fri Feb 11 08:56:02 2022 +0100 - midas-2020-08-a-509-g585faa96 on branch
> develop
> ODB version: 3
>
>
> Stefan
Hey Stefan,
We are running the GIT revision midas-2020-08-a-509-g585faa96:
[local:mu3eMSci:S]/>ver
MIDAS version: 2.1
GIT revision: Tue Feb 15 16:31:07 2022 +0000 - midas-2020-08-a-521-ge43ea7c5 on branch develop
ODB version: 3
which is still giving the error unfortunately. |
2346
|
16 Feb 2022 |
jago aartsen | Bug Fix | ODBINC/Sequencer Issue |
> > But the error still persists. Is there another way to update which we are missing?
>
> The bug was definitively fixed in this modification:
>
> https://bitbucket.org/tmidas/midas/commits/5f33f9f7f21bcaa474455ab72b15abc424bbebf2
>
> You probably forgot to compile/install correctly after your pull. Of you start "odbedit" and do
> a "ver" you see which git revision you are currently running. Make sure to get this output:
>
> MIDAS version: 2.1
> GIT revision: Fri Feb 11 08:56:02 2022 +0100 - midas-2020-08-a-509-g585faa96 on branch
> develop
> ODB version: 3
>
>
> Stefan
We we're having some problems compiling but have got it sorted now - thanks for your help:)
Jago |
2354
|
15 Mar 2022 |
Konstantin Olchanski | Bug Fix | mhttpd ipv6 bind should be fixed now |
Something changed after my initial implementation of ipv6 in mhttpd
and listening to ipv6 http/https connections was broken.
It turns out, I do not need to listen to both ipv4 and ipv6 sockets,
it is sufficient to listen to just ipv6. ipv4 connections will also
magically work. see linux kernel "bindv6only" sysctl setting: https://sysctl-
explorer.net/net/ipv6/bindv6only/
The specific bug in mhttpd was to bind to ipv4 socket first, subsequent bind() to ipv6 socket
fails with error "Address already in use", which is silent, not reported by the mongoose library.
For reasons unknown, this does not happen to bind() to "localhost" aka ipv6 "::1".
Apparently other web servers (apache, nginx) are/were also affected by this problem.
https://chrisjean.com/fix-nginx-emerg-bind-to-80-failed-98-address-already-in-use/
First fix was to bind to ipv6 first (success) and to ipv4 second (fails). Second fix
committed to git is to only listen to ipv6.
This works both on MacOS and on Linux. Linux reports the listener socket is "tcp6", MacOS reports
the listener socket as "tcp46":
4ed0:javascript1 olchansk$ netstat -an | grep 808 | grep LISTEN
tcp46 0 0 *.8081 *.* LISTEN
tcp6 0 0 ::1.8080 *.* LISTEN
tcp4 0 0 127.0.0.1.8080 *.* LISTEN
4ed0:javascript1 olchansk$
K.O. |
2359
|
22 Mar 2022 |
Konstantin Olchanski | Bug Fix | fix for event buffer corruption in bm_flush_cache() |
multithreaded frontends have an unusual event buffer corruption if the write
cache is enabled. For a long time now I had to disable the write cache on
all multithreaded frontends in alpha-g, I was hitting this bug quite often.
(somehow I do not see this problem reported on bitbucket!)
last week I reworked the multithread locking of event buffers, in hope
that this bug will turn up, but nope, all mutexes and locking look okey,
except for a number of unrelated problems (races against bm_close_buffer()
were the most troublesome to fix).
but finally found the trouble.
first, some background.
because multiprocess locking is expensive, frontends that generate
a large number of small events can use the write cache to reduce
this overhead. instead of locking the shared memory event buffer for
each event, events are accumulated in the write cache, and periodic
calls to bm_flush_buffer() flush them to shared memory. For best effect,
one should increase the size of the write cache until lock rate is around
10/second.
it turns out introduction of multithreading broke bm_flush_cache().
it does this:
- int ask_free = pbuf->wp; // how much data we have in the write cache now
- call bm_wait_for_free_space(ask_free); // ensure we have this much free shared
memory space
- copy pbuf->wp worth if events to shared memory
looks okey at first sight. this is what happens to trigger the bug:
- int ask_free = pbuf->wp; // ok
- call bm_wait_for_free_space(ask_free); // ok, but if shared memory is full, it
will go to sleep waiting for free space
- in the mean time, another thread calls bm_send_event(), this adds more data to
the write cache, moves pbuf->wp
- bm_wait_for_free_space() eventually returns
- copy pbuf->wp worth of data to shared memo KABOOM! shared memory corruption!
we just overwrote some unlucky event in shared memory: we only have "ask_free"
free bytes available, but pbuf->wp moved and now has more data,
and it does not fit, and there is no check against it.
of course in the single threaded world this bug did not exist, there was no
other thread to call bm_send_event() while bm_flush_cache() is sleeping.
the obvious fix is to ask for more free space if cached data does not fit.
this is now implemented on the branch feature/buffer_mutex. after a bit more
tested I will merge it into develop.
so that's it?
not so fast. there was more going on. as described, the bug will only happen
when shared memory event buffer is full. (i.e. rarely or never). It turns
out the old version of thread locking code was defective and permitted
a race between bm_send_event() and bm_send_event() in another thread:
thread 1: while (1) { bm_send_event(very small event); }
thread 2:
-> bm_send_event(very big event)
-> no space in the cache for the very big event, call bm_flush_cache()
-> bm_flush_cache() asks bm_wait_for_free_space() to make space for cached data
-> this was done with write cache mutex released (mistake!)
-> at the same time bm_send_event(very small event) added 1 more small event to
the cache
-> back in bm_flush_cache() write cache mutex is locked correctly, we copy
cached data to shared memory and again KABOOM because we now have more data than
we asked free space for.
So in the original implementation, corruption was possible even when share
memory event buffer was pretty much empty.
The reworked locking code closed that loop hole - bm_flush_cache() is now
called with write cache locked, and bm_send_event() from another thread
cannot confuse things, unless shared memory buffer is full and we go to
sleep inside bm_wait_for_free_space(). And this is now fixed, too.
K.O. |
2360
|
22 Mar 2022 |
Stefan Ritt | Bug Fix | fix for event buffer corruption in bm_flush_cache() |
Thanks Konstantin for your detailed description.
I wonder why we never saw this problem at PSI. Here is the reason: In multil-threaded environments, we never call bm_send_event() directly
from all threads (since in the old days nothing was thread safe in midas). Instead, we use a collector thread which gets all events via the
rb_xxx functions from the individual readout threads. This is well integrated into the mfe.cxx framework. Look at examples/mtfe/mfte.cxx.
Each thread does (simplified):
while (true) {
do {
status = rb_get_wp(&pevent);
} while (status == DB_TIMEOUT)
bm_compose_event_threadsafe(pevent, ..., &serial_number);
bk_init32(pevent+1);
... fill event ...
bk_close(pevent)
rb_increment_wp(sizeof(EVENT_HEADER) + pevent->data_size);
}
The framework now collects all these events in receive_trigger_event() which runs in the main thread:
for (i=0 ; i<n_thread ; i++) {
rb_get_rp(i, pevent);
if (pevent->serial_number == prev_serial+1)
break;
}
prev_serial = pevent->serial_number;
rpc_send_event(pevent);
rb_increment_rp(sizeof(EVENT_HEADER) + pevent->data_size);
This code ensures that all events are in the right sequence (before the serial numbers where mixed up) and that all events are sent only
from a single thread, so the write buffer can be used effectively without complicated multi-thread locks.
This solution works nicely at PSI since many years, maybe you should put some thought to use it in your tmfe framework in Alpha-g as well
instead of struggling with all your locks.
Stefan |
2361
|
23 Mar 2022 |
Ivo Schulthess | Bug Fix | fix for event buffer corruption in bm_flush_cache() |
Thanks for the investigation. Back in 2020, we had some issues of losing data between the system buffer and the logger writing them to disk (https://daq00.triumf.ca/elog-midas/Midas/1966). This was polled equipment but we had a multithreaded FE running at the same time. Could this be related to the same problem?
Best, Ivo |