Back Midas Rome Roody Rootana
  Midas DAQ System, Page 4 of 120  Not logged in ELOG logo
ID Date Author Topic Subject
  2354   15 Mar 2022 Konstantin OlchanskiBug Fixmhttpd ipv6 bind should be fixed now
Something changed after my initial implementation of ipv6 in mhttpd
and listening to ipv6 http/https connections was broken.

It turns out, I do not need to listen to both ipv4 and ipv6 sockets,
it is sufficient to listen to just ipv6. ipv4 connections will also
magically work. see linux kernel "bindv6only" sysctl setting: https://sysctl-
explorer.net/net/ipv6/bindv6only/

The specific bug in mhttpd was to bind to ipv4 socket first, subsequent bind() to ipv6 socket 
fails with error "Address already in use", which is silent, not reported by the mongoose library. 
For reasons unknown, this does not happen to bind() to "localhost" aka ipv6 "::1".

Apparently other web servers (apache, nginx) are/were also affected by this problem. 
https://chrisjean.com/fix-nginx-emerg-bind-to-80-failed-98-address-already-in-use/

First fix was to bind to ipv6 first (success) and to ipv4 second (fails). Second fix
committed to git is to only listen to ipv6.

This works both on MacOS and on Linux. Linux reports the listener socket is "tcp6", MacOS reports 
the listener socket as "tcp46":

4ed0:javascript1 olchansk$ netstat -an | grep 808 | grep LISTEN
tcp46      0      0  *.8081                 *.*                    LISTEN     
tcp6       0      0  ::1.8080               *.*                    LISTEN     
tcp4       0      0  127.0.0.1.8080         *.*                    LISTEN     
4ed0:javascript1 olchansk$ 

K.O.
  2353   10 Mar 2022 Gennaro TortoneBug ReportPython ODB watch
Hi,

I have an issue with ODB watch on MIDAS Python library;

I wrote a simple frontend that read/write FPGA registers through 
ODB keys (simplified version at link below):

https://gist.github.com/gtortone/cd035a9ac4ea7a78ea9cd931e80e2c75

Everything works fine but there is a boolean array
in Settings (Enable ADC sampling) that I need to "toggle" 
(19 bit to 0 and 19 bit to 1). This operation is handled by
detailed_settings_changed_func that write the value of
toggled bit to FPGA.

The issue is that if I quickly toggle the boolean array by
odbedit:

set "/Equipment/odbtest/Settings/Enable ADC sampling[0-18]" 0
set "/Equipment/odbtest/Settings/Enable ADC sampling[0-18]" 1

I see in the Python script the following list of callbacks:

detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[0] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[1] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[2] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[3] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[4] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[5] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[6] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[7] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[8] - new value 1    ***
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[9] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[10] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[11] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[12] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[13] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[14] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[15] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[16] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[17] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[18] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[0] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[1] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[2] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[3] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[4] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[5] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[6] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[7] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[8] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[9] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[10] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[11] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[12] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[13] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[14] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[15] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[16] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[17] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[18] - new value 1

It seems that the second write operation "overlaps" the first one...

The same behavior is not observed using a 'watch' in odbedit...

I can overcame this problem using the value of register as ODB key to avoid 
array of boolean... but I report this issue as "possible" bug/limitation on Python ODB watch;

Cheers,
Gennaro
  2352   07 Mar 2022 Marius KoeppelBug ReportWritting MIDAS Events via FPGAs
> This problem has (likely) been fixed in the current version. Please pull develop and try again. Was a recursive call to the event collection routine which is only triggered if you send events faster than 
> the logger can digest, so not many people see it.

I just pulled the current version (d945fa9) but the problem as explained in 2347 stays the same.

Best,
Marius
  2351   03 Mar 2022 Konstantin OlchanskiInfomanalyzer updated
manalyzer was updated to latest version. mostly multi-threading improvements from 
Joseph and myself. K.O.
  2350   03 Mar 2022 Konstantin OlchanskiInfozlib required, lz4 internal
as of commit 8eb18e4ae9c57a8a802219b90d4dc218eb8fdefb, the gzip compression
library is required, not optional.

this fixes midas and manalyzer mis-build if the system gzip library
is accidentally not installed. (is there any situation where
gzip library is not installed on purpose?)

midas internal lz4 compression library was renamed to mlz4 to avoid collision
against system lz4 library (where present). lz4 files from midasio are now
used, lz4 files in midas/include and midas/src are removed.

I see that on recent versions of ubuntu we could switch to the system version 
of the lz4 library. however, on centos-7 systems it is usually not present
and it still is a supported and widely used platform, so we stay
with the midas-internal library for now.

K.O.
  2349   03 Mar 2022 Stefan RittBug ReportWritting MIDAS Events via FPGAs
> Starting the frontend and starting a run works fine -> seeing events with mdump and also on the web GUI. 
> But when I stop the run and try to start the next run the frontend is sending no events anymore.
> It get stuck at line 221 (if (status == DB_TIMEOUT)).
> I tried to reduce the nEvents to 1 which helped in terms of DB_TIMEOUT but still I don't get any events after I did a stop / start cycle -> no events in mdump and no events counting up at the web GUI.
> If I kill the frontend in the terminal (ctrl+c) and restart it, while the run is still running, it starts to send events again.

This problem has (likely) been fixed in the current version. Please pull develop and try again. Was a recursive call to the event collection routine which is only triggered if you send events faster than 
the logger can digest, so not many people see it.

Best,
Stefan
  2348   23 Feb 2022 Stefan RittInfoMidas slow control event generation switched to 32-bit banks
The midas slow control system class drivers automatically read their equipment and generate events containing midas banks. So far these have been 16-bit banks using bk_init(). But now more and more experiments use large amount of channels, so the 16-bit address space is exceeded. Until last week, there was even no check that this happens, leading to unpredictable crashes.

Therefore I switched the bank generation in the drivers generic.cxx, hv.cxx and multi.cxx to 32-bit banks via bk_init32(). This should be in principle transparent, since the midas bank functions automatically detect the bank type during reading. But I thought I let everybody know just in case.

Stefan
  2347   16 Feb 2022 Marius KoeppelBug ReportWritting MIDAS Events via FPGAs
I just came back to this and started to use the dummy frontend.
Unfortunately, I have a problem during run cycles: 

Starting the frontend and starting a run works fine -> seeing events with mdump and also on the web GUI. 
But when I stop the run and try to start the next run the frontend is sending no events anymore.
It get stuck at line 221 (if (status == DB_TIMEOUT)).
I tried to reduce the nEvents to 1 which helped in terms of DB_TIMEOUT but still I don't get any events after I did a stop / start cycle -> no events in mdump and no events counting up at the web GUI.
If I kill the frontend in the terminal (ctrl+c) and restart it, while the run is still running, it starts to send events again.

Cheers,
Marius
  2346   16 Feb 2022 jago aartsenBug FixODBINC/Sequencer Issue
> > But the error still persists. Is there another way to update which we are missing?
> 
> The bug was definitively fixed in this modification:
> 
> https://bitbucket.org/tmidas/midas/commits/5f33f9f7f21bcaa474455ab72b15abc424bbebf2
> 
> You probably forgot to compile/install correctly after your pull. Of you start "odbedit" and do 
> a "ver" you see which git revision you are currently running. Make sure to get this output:
> 
> MIDAS version:      2.1
> GIT revision:       Fri Feb 11 08:56:02 2022 +0100 - midas-2020-08-a-509-g585faa96 on branch 
> develop
> ODB version:        3
> 
> 
> Stefan

We we're having some problems compiling but have got it sorted now - thanks for your help:)

Jago
  Draft   15 Feb 2022 jago aartsenBug FixODBINC/Sequencer Issue
> > But the error still persists. Is there another way to update which we are missing?
> 
> The bug was definitively fixed in this modification:
> 
> https://bitbucket.org/tmidas/midas/commits/5f33f9f7f21bcaa474455ab72b15abc424bbebf2
> 
> You probably forgot to compile/install correctly after your pull. Of you start "odbedit" and do 
> a "ver" you see which git revision you are currently running. Make sure to get this output:
> 
> MIDAS version:      2.1
> GIT revision:       Fri Feb 11 08:56:02 2022 +0100 - midas-2020-08-a-509-g585faa96 on branch 
> develop
> ODB version:        3
> 
> 
> Stefan

Hey Stefan,

We are running the GIT revision midas-2020-08-a-509-g585faa96:

[local:mu3eMSci:S]/>ver
MIDAS version:      2.1
GIT revision:       Tue Feb 15 16:31:07 2022 +0000 - midas-2020-08-a-521-ge43ea7c5 on branch develop
ODB version:        3


which is still giving the error unfortunately. 
  2344   15 Feb 2022 Stefan RittBug FixODBINC/Sequencer Issue
> But the error still persists. Is there another way to update which we are missing?

The bug was definitively fixed in this modification:

https://bitbucket.org/tmidas/midas/commits/5f33f9f7f21bcaa474455ab72b15abc424bbebf2

You probably forgot to compile/install correctly after your pull. Of you start "odbedit" and do 
a "ver" you see which git revision you are currently running. Make sure to get this output:

MIDAS version:      2.1
GIT revision:       Fri Feb 11 08:56:02 2022 +0100 - midas-2020-08-a-509-g585faa96 on branch 
develop
ODB version:        3


Stefan
  2343   14 Feb 2022 jago aartsenBug FixODBINC/Sequencer Issue
> > I noticed that "Jacob Thorne"  in the forum had the same issue as us in Novemeber last 
> > year. Indeed we have not installed any later versions of MIDAS since then so we will 
> > double check we have the latest version.
> 
> As you see from my reply to Jacob, the bug has been fixed in midas since then, so just 
> update.
> 
> Stefan

We have tried updating using both:

git submodule update --init --recursive

and:

git pull --recurse-submodules

But the error still persists. Is there another way to update which we are missing?

Cheers
Jago
  2342   14 Feb 2022 Stefan RittBug FixODBINC/Sequencer Issue
> I noticed that "Jacob Thorne"  in the forum had the same issue as us in Novemeber last 
> year. Indeed we have not installed any later versions of MIDAS since then so we will 
> double check we have the latest version.

As you see from my reply to Jacob, the bug has been fixed in midas since then, so just 
update.

Stefan
  2341   14 Feb 2022 jago aartsenBug FixODBINC/Sequencer Issue
> Just post here a minimal script which produces the error, so that I can try myself.
> 
> ... and make sure that you have the latest develop version of midas.
> 
> Stefan

Here is the simplest script which produces the error:

WAIT seconds, 3
ODBINC /Equipment/ArduinoTestStation/Variables/_S_

I noticed that "Jacob Thorne"  in the forum had the same issue as us in Novemeber last 
year. Indeed we have not installed any later versions of MIDAS since then so we will 
double check we have the latest version.

Jago
  2340   14 Feb 2022 Stefan RittBug FixODBINC/Sequencer Issue
Just post here a minimal script which produces the error, so that I can try myself.

... and make sure that you have the latest develop version of midas.

Stefan
  2339   14 Feb 2022 jago aartsenBug FixODBINC/Sequencer Issue
> > 
> > [local:mu3eMSci:S]/>cd Equipment/ArduinoTestStation/Variables
> > [local:mu3eMSci:S]Variables>ls -l
> > Key name                        Type    #Val  Size  Last Opn Mode Value
> > ---------------------------------------------------------------------------
> > _T_                             FLOAT   1     4     1h   0   RWD  20.93
> > _F_                             FLOAT   1     4     1h   0   RWD  12.8
> > _P_                             FLOAT   1     4     1h   0   RWD  56
> > _S_                             FLOAT   1     4     1h   0   RWD  5
> > _H_                             FLOAT   1     4     60h  0   RWD  44.74
> > _B_                             FLOAT   1     4     60h  0   RWD  18.54
> > _A_                             FLOAT   1     4     1h   0   RWD  14.41
> > _RH_                            FLOAT   1     4     1h   0   RWD  41.81
> > _AT_                            FLOAT   1     4     1h   0   RWD  20.46
> > SP                              INT16   1     2     1h   0   RWD  10
> > 
> 
> This looks okey, so we still have no explanation for your error. Please post your sequencer 
> script?
> 
> K.O.

Hey, thanks for getting back to me

We are fairly confident the syntax is correct. Having tried the test script posted by Stefan:

> ODBSET /Equipment/ArduinoTestStation/Variables/_S_, 10
> 
> LOOP 10
>   WAIT seconds, 3
>   ODBINC 

The same error is returned:/

We will take another look today.

Jago
  2338   14 Feb 2022 jago aartsenBug FixODBINC/Sequencer Issue
> I tried following script:
> 
> ODBSET /Equipment/ArduinoTestStation/Variables/_S_, 10
> 
> LOOP 10
>   WAIT seconds, 3
>   ODBINC /Equipment/ArduinoTestStation/Variables/_S_
> ENDLOOP
> 
> and it worked as expected. So I conclude the problem must be in your script. Probably a typo in 
> the ODB path pointing to a 32-byte string instead to a 4-byte float.
> 
> Stefan  

Hi Stefan,

Cheers for the reply. I believe the syntax we are using is correct. I have tried copying the script 
you used above and it results in the same error as before. Perhaps something is going wrong 
elsewhere - I will take another look today.

Jago
  2337   11 Feb 2022 Alexey KalininBug Reportsome frontend kicked by cm_periodic_tasks
Thanks for the answer.
As soon as I can(possible in a month) I'll try suggestion below:

> One thing to try is set the write cache size to zero and see if your crash goes away. I see
> some indication of something rotten in the event buffer code if write cache is enabled. This
> is set in ODB "/Eq/XXX/Common/Write Cache Size", set it to zero. (beware recent confusion
> where odb settings have no effect depending on value of "equipment_common_overwrite").

I tried to change this ODB for one of the frontend via mhttpd/browser, and eventually it goes back 
to default value (1000 as I remember). but this frontend has the minimum rate 50DWORD/~10sec. and 
depending on cashe size it appears in mdump once per 31 events but all aff them . SO its different 
story, but m.b. it has the same solution to play with Write Cashe Size.    


double free message goes from mserver terminal. 
all of the frontends are remote.
I can't exclude crashes of frontend , but when I run ./frontend -i 1(2,3 etc) thet means that I run 
one code for all, and only several causes crash.also I found that crash in frontend happened while 
it do nothing with collected data (last event reached and new data is not ready), but it tries to 
watch for the ODB changes.I mean it crashes iside (while {odb_changes(value in watchdog)}),and I don't 
know what else happenned meanwhile with cahed buffer.

Future plans is to use event buider for frontends when data/signals will be perfectly reasonable 
i/e/ without broken events. for now i kinda worry about if one of frontends will skip one of the 
event inside its buffer.


Thanks for the way to dig into.
A.    

> > The problem is that eventually some of frontend closed with message 
> > :19:22:31.834 2021/12/02 [rootana,INFO] Client 'Sample Frontend38' on buffer 
> > 'SYSMSG' removed by cm_periodic_tasks because process pid 9789 does not exist
> 
> This messages means what it says. A client was registered with the SYSMSG buffer and this 
> client had pid 9789. At some point some other client (rootana, in this case) checked it and 
> process pid 9789 was no longer running. (it then proceeded to remove the registration).
> 
> There is 2 possibilities:
> - simplest: your frontend has crashed. best to debug this by running it inside gdb, wait for 
> the crash.
> - unlikely: reported pid is bogus, real pid of your frontend is different, the client 
> registration in SYSMSG is corrupted. this would indicate massive corruption of midas shared 
> memory buffers, not impossible if your frontend misbehaves and writes to random memory 
> addresses. ODB has protection against this (normally turned off, easy to enable, set ODB 
> "/experiment/protect odb" to yes), shared memory buffers do not have protection against this 
> (should be added?).
> 
> Do this. When you start your frontend, write down it's pid, when you see the crash message, 
> confirm pid number printed is the same. As additional test, run your frontend inside gdb, 
> after it crashes, you can print the stack trace, etc.
> 
> > 
> > in the meantime mserver loggging :
> > mserver started interactively
> > mserver will listen on TCP port 1175
> > double free or corruption (!prev)
> > double free or corruption (!prev)
> > free(): invalid next size (normal)
> > double free or corruption (!prev)
> > 
> 
> Are these "double free" messages coming from the mserver or from your frontend? (i.e. you run 
> them in different terminals, not all in the same terminal?).
> 
> If messages are coming from the mserver, this confirms possibility (1),
> except that for frontends connected remotely, the pid is the pid of the mserver,
> and what we see are crashes of mserver, not crashes of your frontend. These are much harder to 
> debug.
> 
> You will need to enable core dumps (ODB /Experiment/Enable core dumps set to "y"),
> confirm that core dumps work (i.e. "killall -SEGV mserver", observe core files are created
> in the directory where you started the mserver), reproduce the crash, run "gdb mserver 
> core.NNNN", run "bt" to print the stack trace, post the stack trace here (or email to me 
> directly).
> 
> >
> > I can find some correlation between number of events/event size produced by 
> > frontend, cause its failed when its become big enough. 
> > 
> 
> There is no limit on event size or event rate in midas, you should not see any crash
> regardless of what you do. (there is a limit of event size, because an event has
> to fit inside an event buffer and event buffer size is limited to 2 GB).
> 
> Obviously you hit a bug in mserver that makes it crash. Let's debug it.
> 
> One thing to try is set the write cache size to zero and see if your crash goes away. I see
> some indication of something rotten in the event buffer code if write cache is enabled. This
> is set in ODB "/Eq/XXX/Common/Write Cache Size", set it to zero. (beware recent confusion
> where odb settings have no effect depending on value of "equipment_common_overwrite").
> 
> >
> > frontend scheme is like this:
> > 
> 
> Best if you use the tmfe c++ frontend, event data handling is much simpler and we do not
> have to debug the convoluted old code in mfe.c.
> 
> K.O.
> 
> >
> > poll event time set to 0;
> > 
> > poll_event{
> > //if buffer not transferred return (continue cutting the main buffer)
> > //read main buffer from hardware
> > //buffer not transfered
> > }
> > 
> > read event{
> > // cut the main buffer to subevents (cut one event from main buffer) return;
> > //if (last subevent) {buffer transfered ;return}
> > }
> > 
> > What is strange to me that 2 frontends (1 per remote pc) causing this.
> > 
> > Also, I'm executing one FEcode with -i # flag , put setting eventid in 
> > frontend_init , and using SYSTEM buffer for all.
> > 
> > Is there something I'm missing?
> > Thanks. 
> > A.
  2336   10 Feb 2022 Stefan RittBug ReportHistory plots deceiving users into thinking data is still logging
The problem has been fixed on commit 825935dc on Oct. 2021 and runs fine since then at PSI. If TRIUMF people 
agree, we can close that issue and proceed.

Stefan
  2335   10 Feb 2022 Stefan RittBug FixODBINC/Sequencer Issue
I tried following script:

ODBSET /Equipment/ArduinoTestStation/Variables/_S_, 10

LOOP 10
  WAIT seconds, 3
  ODBINC /Equipment/ArduinoTestStation/Variables/_S_
ENDLOOP

and it worked as expected. So I conclude the problem must be in your script. Probably a typo in 
the ODB path pointing to a 32-byte string instead to a 4-byte float.

Stefan  
ELOG V3.1.4-2e1708b5