Back Midas Rome Roody Rootana
  Midas DAQ System, Page 136 of 145  Not logged in ELOG logo
ID Date Author Topicdown Subject
  2272   24 Aug 2021 Stefan RittBug Fixchanges in history plots
One addition I would be in favour of is to remove the "Order" and replace it with drag&drop handles, because this is what people are more 
used to today. Only the old guys like us remember the /etc/init.d/xx_yy scheme where one uses an integer number in the file name to 
determine an order. 

See for example: https://jsbin.com/hijetos/edit?js,output

But instead of relying on a foreign library, I would rather implement that myself, since I need the same thing later for the to-be-
implemented ODB editor (next year? next lockdown?)

Stefan
  2277   19 Sep 2021 Stefan RittBug FixChat working again
Not sure how many people are using it, but the Chat facility in midas was broken 
for some time now and got fixed today again.

Just for your information: Chat can be used like WhatsApp & Co, and connects all 
people who access a midas experiment through their browser. It's good to 
communicate between shift crew members located at different places. One advantage 
is that the chat messages can get 'spoken' by the text-to-speech engine of your 
browser, so it can be used to "wake up" shifters. Can be configured through the 
"Config" page.

Stefan
Attachment 1: Screenshot_2021-09-19_at_21.27.19_.png
Screenshot_2021-09-19_at_21.27.19_.png
  2330   08 Feb 2022 jago aartsenBug FixODBINC/Sequencer Issue
Hi all,

I am having some issues getting the ODBINC command to work within the MIDAS 
sequencer. I am trying to increment one of the ODB values but it is returning a 
mismatch data-type size error (see attached image).

All the ODB variables are MIDAS data-type FLOAT and should all be 32-bit values, 
but for some reason MIDAS is thinking they are 4-bit values. I have tried creating 
new ODB keys of type INT32, UINT32 and DOUBLE but they all give the same error.

If anybody has any suggestions I would really appreciate some help:)

Thanks



 
Attachment 1: mslerror.PNG
mslerror.PNG
  2331   08 Feb 2022 Konstantin OlchanskiBug FixODBINC/Sequencer Issue
Please post the output of odbedit "ls -l" for /eq/ar.../variables. (you posted the 
variable name as an image, and I cannot cut-and-paste the odb path!). BTW data size 4 is 
correct, 4 bytes for INT32/UINT32/FLOAT. For DOUBLE it should be 8. For you it prints 32 
and this is wrong, we need to see the output of "ls -l".
K.O.
  2332   09 Feb 2022 jago aartsenBug FixODBINC/Sequencer Issue
> Please post the output of odbedit "ls -l" for /eq/ar.../variables. (you posted the 
> variable name as an image, and I cannot cut-and-paste the odb path!). BTW data size 4 is 
> correct, 4 bytes for INT32/UINT32/FLOAT. For DOUBLE it should be 8. For you it prints 32 
> and this is wrong, we need to see the output of "ls -l".
> K.O.

Hi,

Thanks for getting back to me regarding this. The output of "ls -l" is:

[local:mu3eMSci:S]/>cd Equipment/ArduinoTestStation/Variables
[local:mu3eMSci:S]Variables>ls -l
Key name                        Type    #Val  Size  Last Opn Mode Value
---------------------------------------------------------------------------
_T_                             FLOAT   1     4     1h   0   RWD  20.93
_F_                             FLOAT   1     4     1h   0   RWD  12.8
_P_                             FLOAT   1     4     1h   0   RWD  56
_S_                             FLOAT   1     4     1h   0   RWD  5
_H_                             FLOAT   1     4     60h  0   RWD  44.74
_B_                             FLOAT   1     4     60h  0   RWD  18.54
_A_                             FLOAT   1     4     1h   0   RWD  14.41
_RH_                            FLOAT   1     4     1h   0   RWD  41.81
_AT_                            FLOAT   1     4     1h   0   RWD  20.46
SP                              INT16   1     2     1h   0   RWD  10

Many Thanks
Jago
  2333   09 Feb 2022 Konstantin OlchanskiBug FixODBINC/Sequencer Issue
> 
> [local:mu3eMSci:S]/>cd Equipment/ArduinoTestStation/Variables
> [local:mu3eMSci:S]Variables>ls -l
> Key name                        Type    #Val  Size  Last Opn Mode Value
> ---------------------------------------------------------------------------
> _T_                             FLOAT   1     4     1h   0   RWD  20.93
> _F_                             FLOAT   1     4     1h   0   RWD  12.8
> _P_                             FLOAT   1     4     1h   0   RWD  56
> _S_                             FLOAT   1     4     1h   0   RWD  5
> _H_                             FLOAT   1     4     60h  0   RWD  44.74
> _B_                             FLOAT   1     4     60h  0   RWD  18.54
> _A_                             FLOAT   1     4     1h   0   RWD  14.41
> _RH_                            FLOAT   1     4     1h   0   RWD  41.81
> _AT_                            FLOAT   1     4     1h   0   RWD  20.46
> SP                              INT16   1     2     1h   0   RWD  10
> 

This looks okey, so we still have no explanation for your error. Please post your sequencer 
script?

K.O.
  2335   10 Feb 2022 Stefan RittBug FixODBINC/Sequencer Issue
I tried following script:

ODBSET /Equipment/ArduinoTestStation/Variables/_S_, 10

LOOP 10
  WAIT seconds, 3
  ODBINC /Equipment/ArduinoTestStation/Variables/_S_
ENDLOOP

and it worked as expected. So I conclude the problem must be in your script. Probably a typo in 
the ODB path pointing to a 32-byte string instead to a 4-byte float.

Stefan  
  2338   14 Feb 2022 jago aartsenBug FixODBINC/Sequencer Issue
> I tried following script:
> 
> ODBSET /Equipment/ArduinoTestStation/Variables/_S_, 10
> 
> LOOP 10
>   WAIT seconds, 3
>   ODBINC /Equipment/ArduinoTestStation/Variables/_S_
> ENDLOOP
> 
> and it worked as expected. So I conclude the problem must be in your script. Probably a typo in 
> the ODB path pointing to a 32-byte string instead to a 4-byte float.
> 
> Stefan  

Hi Stefan,

Cheers for the reply. I believe the syntax we are using is correct. I have tried copying the script 
you used above and it results in the same error as before. Perhaps something is going wrong 
elsewhere - I will take another look today.

Jago
  2339   14 Feb 2022 jago aartsenBug FixODBINC/Sequencer Issue
> > 
> > [local:mu3eMSci:S]/>cd Equipment/ArduinoTestStation/Variables
> > [local:mu3eMSci:S]Variables>ls -l
> > Key name                        Type    #Val  Size  Last Opn Mode Value
> > ---------------------------------------------------------------------------
> > _T_                             FLOAT   1     4     1h   0   RWD  20.93
> > _F_                             FLOAT   1     4     1h   0   RWD  12.8
> > _P_                             FLOAT   1     4     1h   0   RWD  56
> > _S_                             FLOAT   1     4     1h   0   RWD  5
> > _H_                             FLOAT   1     4     60h  0   RWD  44.74
> > _B_                             FLOAT   1     4     60h  0   RWD  18.54
> > _A_                             FLOAT   1     4     1h   0   RWD  14.41
> > _RH_                            FLOAT   1     4     1h   0   RWD  41.81
> > _AT_                            FLOAT   1     4     1h   0   RWD  20.46
> > SP                              INT16   1     2     1h   0   RWD  10
> > 
> 
> This looks okey, so we still have no explanation for your error. Please post your sequencer 
> script?
> 
> K.O.

Hey, thanks for getting back to me

We are fairly confident the syntax is correct. Having tried the test script posted by Stefan:

> ODBSET /Equipment/ArduinoTestStation/Variables/_S_, 10
> 
> LOOP 10
>   WAIT seconds, 3
>   ODBINC 

The same error is returned:/

We will take another look today.

Jago
  2340   14 Feb 2022 Stefan RittBug FixODBINC/Sequencer Issue
Just post here a minimal script which produces the error, so that I can try myself.

... and make sure that you have the latest develop version of midas.

Stefan
  2341   14 Feb 2022 jago aartsenBug FixODBINC/Sequencer Issue
> Just post here a minimal script which produces the error, so that I can try myself.
> 
> ... and make sure that you have the latest develop version of midas.
> 
> Stefan

Here is the simplest script which produces the error:

WAIT seconds, 3
ODBINC /Equipment/ArduinoTestStation/Variables/_S_

I noticed that "Jacob Thorne"  in the forum had the same issue as us in Novemeber last 
year. Indeed we have not installed any later versions of MIDAS since then so we will 
double check we have the latest version.

Jago
  2342   14 Feb 2022 Stefan RittBug FixODBINC/Sequencer Issue
> I noticed that "Jacob Thorne"  in the forum had the same issue as us in Novemeber last 
> year. Indeed we have not installed any later versions of MIDAS since then so we will 
> double check we have the latest version.

As you see from my reply to Jacob, the bug has been fixed in midas since then, so just 
update.

Stefan
  2343   14 Feb 2022 jago aartsenBug FixODBINC/Sequencer Issue
> > I noticed that "Jacob Thorne"  in the forum had the same issue as us in Novemeber last 
> > year. Indeed we have not installed any later versions of MIDAS since then so we will 
> > double check we have the latest version.
> 
> As you see from my reply to Jacob, the bug has been fixed in midas since then, so just 
> update.
> 
> Stefan

We have tried updating using both:

git submodule update --init --recursive

and:

git pull --recurse-submodules

But the error still persists. Is there another way to update which we are missing?

Cheers
Jago
  2344   15 Feb 2022 Stefan RittBug FixODBINC/Sequencer Issue
> But the error still persists. Is there another way to update which we are missing?

The bug was definitively fixed in this modification:

https://bitbucket.org/tmidas/midas/commits/5f33f9f7f21bcaa474455ab72b15abc424bbebf2

You probably forgot to compile/install correctly after your pull. Of you start "odbedit" and do 
a "ver" you see which git revision you are currently running. Make sure to get this output:

MIDAS version:      2.1
GIT revision:       Fri Feb 11 08:56:02 2022 +0100 - midas-2020-08-a-509-g585faa96 on branch 
develop
ODB version:        3


Stefan
  Draft   15 Feb 2022 jago aartsenBug FixODBINC/Sequencer Issue
> > But the error still persists. Is there another way to update which we are missing?
> 
> The bug was definitively fixed in this modification:
> 
> https://bitbucket.org/tmidas/midas/commits/5f33f9f7f21bcaa474455ab72b15abc424bbebf2
> 
> You probably forgot to compile/install correctly after your pull. Of you start "odbedit" and do 
> a "ver" you see which git revision you are currently running. Make sure to get this output:
> 
> MIDAS version:      2.1
> GIT revision:       Fri Feb 11 08:56:02 2022 +0100 - midas-2020-08-a-509-g585faa96 on branch 
> develop
> ODB version:        3
> 
> 
> Stefan

Hey Stefan,

We are running the GIT revision midas-2020-08-a-509-g585faa96:

[local:mu3eMSci:S]/>ver
MIDAS version:      2.1
GIT revision:       Tue Feb 15 16:31:07 2022 +0000 - midas-2020-08-a-521-ge43ea7c5 on branch develop
ODB version:        3


which is still giving the error unfortunately. 
  2346   16 Feb 2022 jago aartsenBug FixODBINC/Sequencer Issue
> > But the error still persists. Is there another way to update which we are missing?
> 
> The bug was definitively fixed in this modification:
> 
> https://bitbucket.org/tmidas/midas/commits/5f33f9f7f21bcaa474455ab72b15abc424bbebf2
> 
> You probably forgot to compile/install correctly after your pull. Of you start "odbedit" and do 
> a "ver" you see which git revision you are currently running. Make sure to get this output:
> 
> MIDAS version:      2.1
> GIT revision:       Fri Feb 11 08:56:02 2022 +0100 - midas-2020-08-a-509-g585faa96 on branch 
> develop
> ODB version:        3
> 
> 
> Stefan

We we're having some problems compiling but have got it sorted now - thanks for your help:)

Jago
  2354   15 Mar 2022 Konstantin OlchanskiBug Fixmhttpd ipv6 bind should be fixed now
Something changed after my initial implementation of ipv6 in mhttpd
and listening to ipv6 http/https connections was broken.

It turns out, I do not need to listen to both ipv4 and ipv6 sockets,
it is sufficient to listen to just ipv6. ipv4 connections will also
magically work. see linux kernel "bindv6only" sysctl setting: https://sysctl-
explorer.net/net/ipv6/bindv6only/

The specific bug in mhttpd was to bind to ipv4 socket first, subsequent bind() to ipv6 socket 
fails with error "Address already in use", which is silent, not reported by the mongoose library. 
For reasons unknown, this does not happen to bind() to "localhost" aka ipv6 "::1".

Apparently other web servers (apache, nginx) are/were also affected by this problem. 
https://chrisjean.com/fix-nginx-emerg-bind-to-80-failed-98-address-already-in-use/

First fix was to bind to ipv6 first (success) and to ipv4 second (fails). Second fix
committed to git is to only listen to ipv6.

This works both on MacOS and on Linux. Linux reports the listener socket is "tcp6", MacOS reports 
the listener socket as "tcp46":

4ed0:javascript1 olchansk$ netstat -an | grep 808 | grep LISTEN
tcp46      0      0  *.8081                 *.*                    LISTEN     
tcp6       0      0  ::1.8080               *.*                    LISTEN     
tcp4       0      0  127.0.0.1.8080         *.*                    LISTEN     
4ed0:javascript1 olchansk$ 

K.O.
  2359   22 Mar 2022 Konstantin OlchanskiBug Fixfix for event buffer corruption in bm_flush_cache()
multithreaded frontends have an unusual event buffer corruption if the write 
cache is enabled. For a long time now I had to disable the write cache on
all multithreaded frontends in alpha-g, I was hitting this bug quite often.
(somehow I do not see this problem reported on bitbucket!)

last week I reworked the multithread locking of event buffers, in hope
that this bug will turn up, but nope, all mutexes and locking look okey,
except for a number of unrelated problems (races against bm_close_buffer()
were the most troublesome to fix).

but finally found the trouble.

first, some background.

because multiprocess locking is expensive, frontends that generate
a large number of small events can use the write cache to reduce
this overhead. instead of locking the shared memory event buffer for
each event, events are accumulated in the write cache, and periodic
calls to bm_flush_buffer() flush them to shared memory. For best effect,
one should increase the size of the write cache until lock rate is around
10/second.

it turns out introduction of multithreading broke bm_flush_cache().

it does this:

- int ask_free = pbuf->wp; // how much data we have in the write cache now
- call bm_wait_for_free_space(ask_free); // ensure we have this much free shared 
memory space
- copy pbuf->wp worth if events to shared memory

looks okey at first sight. this is what happens to trigger the bug:

- int ask_free = pbuf->wp; // ok
- call bm_wait_for_free_space(ask_free); // ok, but if shared memory is full, it 
will go to sleep waiting for free space
- in the mean time, another thread calls bm_send_event(), this adds more data to 
the write cache, moves pbuf->wp
- bm_wait_for_free_space() eventually returns
- copy pbuf->wp worth of data to shared memo KABOOM! shared memory corruption!

we just overwrote some unlucky event in shared memory: we only have "ask_free"
free bytes available, but pbuf->wp moved and now has more data,
and it does not fit, and there is no check against it.

of course in the single threaded world this bug did not exist, there was no 
other thread to call bm_send_event() while bm_flush_cache() is sleeping.

the obvious fix is to ask for more free space if cached data does not fit.

this is now implemented on the branch feature/buffer_mutex. after a bit more 
tested I will merge it into develop.

so that's it?

not so fast. there was more going on. as described, the bug will only happen
when shared memory event buffer is full. (i.e. rarely or never). It turns
out the old version of thread locking code was defective and permitted
a race between bm_send_event() and bm_send_event() in another thread:

thread 1: while (1) { bm_send_event(very small event); }
thread 2:
-> bm_send_event(very big event)
-> no space in the cache for the very big event, call bm_flush_cache()
-> bm_flush_cache() asks bm_wait_for_free_space() to make space for cached data
-> this was done with write cache mutex released (mistake!)
-> at the same time bm_send_event(very small event) added 1 more small event to 
the cache
-> back in bm_flush_cache() write cache mutex is locked correctly, we copy 
cached data to shared memory and again KABOOM because we now have more data than 
we asked free space for.

So in the original implementation, corruption was possible even when share 
memory event buffer was pretty much empty.

The reworked locking code closed that loop hole - bm_flush_cache() is now
called with write cache locked, and bm_send_event() from another thread
cannot confuse things, unless shared memory buffer is full and we go to
sleep inside bm_wait_for_free_space(). And this is now fixed, too.

K.O.
  2360   22 Mar 2022 Stefan RittBug Fixfix for event buffer corruption in bm_flush_cache()
Thanks Konstantin for your detailed description. 

I wonder why we never saw this problem at PSI. Here is the reason: In multil-threaded environments, we never call bm_send_event() directly 
from all threads (since in the old days nothing was thread safe in midas). Instead, we use a collector thread which gets all events via the 
rb_xxx functions from the individual readout threads. This is well integrated into the mfe.cxx framework. Look at examples/mtfe/mfte.cxx. 
Each thread does (simplified):

while (true) {
  do {
     status = rb_get_wp(&pevent);
  } while (status == DB_TIMEOUT)

  bm_compose_event_threadsafe(pevent, ..., &serial_number);
  bk_init32(pevent+1);
  ... fill event ...
  bk_close(pevent)

  rb_increment_wp(sizeof(EVENT_HEADER) + pevent->data_size);
}

The framework now collects all these events in receive_trigger_event() which runs in the main thread:

  for (i=0 ; i<n_thread ; i++) {
     rb_get_rp(i, pevent);
     if (pevent->serial_number == prev_serial+1)
        break;
  }
  
  prev_serial = pevent->serial_number;
  rpc_send_event(pevent);
  rb_increment_rp(sizeof(EVENT_HEADER) + pevent->data_size);

This code ensures that all events are in the right sequence (before the serial numbers where mixed up) and that all events are sent only 
from a single thread, so the write buffer can be used effectively without complicated multi-thread locks.

This solution works nicely at PSI since many years, maybe you should put some thought to use it in your tmfe framework in Alpha-g as well 
instead of struggling with all your locks.

Stefan
  2361   23 Mar 2022 Ivo SchulthessBug Fixfix for event buffer corruption in bm_flush_cache()
Thanks for the investigation. Back in 2020, we had some issues of losing data between the system buffer and the logger writing them to disk (https://daq00.triumf.ca/elog-midas/Midas/1966). This was polled equipment but we had a multithreaded FE running at the same time. Could this be related to the same problem?

Best, Ivo
ELOG V3.1.4-2e1708b5