Entry  10 Mar 2022, Gennaro Tortone, Bug Report, Python ODB watch 
Hi,

I have an issue with ODB watch on MIDAS Python library;

I wrote a simple frontend that reads/writes FPGA registers through 
ODB keys (simplified version at the link below):

https://gist.github.com/gtortone/cd035a9ac4ea7a78ea9cd931e80e2c75

Everything works fine, but there is a boolean array
in Settings (Enable ADC sampling) that I need to "toggle" 
(19 bits to 0, then 19 bits to 1). This operation is handled by
detailed_settings_changed_func, which writes the value of the
toggled bit to the FPGA.

The issue is that if I quickly toggle the boolean array by
odbedit:

set "/Equipment/odbtest/Settings/Enable ADC sampling[0-18]" 0
set "/Equipment/odbtest/Settings/Enable ADC sampling[0-18]" 1

I see in the Python script the following list of callbacks:

detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[0] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[1] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[2] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[3] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[4] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[5] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[6] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[7] - new value 0
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[8] - new value 1    ***
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[9] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[10] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[11] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[12] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[13] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[14] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[15] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[16] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[17] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[18] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[0] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[1] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[2] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[3] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[4] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[5] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[6] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[7] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[8] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[9] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[10] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[11] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[12] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[13] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[14] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[15] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[16] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[17] - new value 1
detailed_settings_changed_func: /Equipment/odbtest/Settings/Enable ADC sampling[18] - new value 1

It seems that the second write operation "overlaps" the first one...

The same behavior is not observed using a 'watch' in odbedit...

I can overcome this problem by using the register value as the ODB key, avoiding the 
array of booleans... but I report this issue as a "possible" bug/limitation of the Python ODB watch;

Cheers,
Gennaro
    Reply  16 Mar 2022, Ben Smith, Bug Report, Python ODB watch 
> It seems that the second write operation "overlaps" the first one...

Hi Gennaro,

In principle the same issue can happen in C++ code, but it is much less likely as the callbacks get executed more quickly (partly due to C++/python in general, and partly because the python code does some extra work to make the interface more user-friendly). The C++ code at the end of this message adds a 100ms sleep to the callback and can result in output like this when you do quick edits of "Test[0-19]" in odbedit.

Element 1 is 0
Element 2 is 0
Element 3 is 0
Element 4 is 0
Element 5 is 0
Element 6 is 0
Element 7 is 1
Element 8 is 1
Element 9 is 1
etc...


I agree that this can be a really nasty source of bugs if you need to react to every change. I'll add a warning to the python docstrings, but I can't think of a way to make this more robust at the midas level - I think we'd need some sort of ODB "snapshot" system...









#include "midas.h"

void watch_fn(HNDLE hDB, HNDLE hKey, int index, void *info) {
  DWORD data = 0;
  INT buf_size = sizeof(data);
  db_get_data_index(hDB, hKey, &data, &buf_size, index, TID_DWORD);
  printf("Element %d is %u\n", index, data);
  ss_sleep(100);
}

int main() {
  HNDLE hDB, hClient, hTestKey;

  std::string host, expt;
  cm_get_environment(&host, &expt);
  cm_connect_experiment(host.c_str(), expt.c_str(), "test_odb", nullptr);
  cm_get_experiment_database(&hDB, &hClient);

  static const DWORD numValues = 20;
  DWORD data[numValues] = {};
  db_set_value(hDB, 0, "Test", data, sizeof(DWORD) * numValues, numValues, TID_DWORD);
  db_find_key(hDB, 0, "Test", &hTestKey);
  db_watch(hDB, hTestKey, watch_fn, nullptr);

  printf("Press any key to exit loop...\n");

  while (!ss_kbhit()) {
    cm_yield(1);
  }

  db_unwatch_all();
  db_delete_key(hDB, hTestKey, FALSE);
  cm_disconnect_experiment();

  return 0;
}
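
As an aside, a generic workaround sketch (not tested here; the array size and the hardware call are placeholders): if the frontend only needs to end up matching the current ODB state, the callback can ignore the per-index value and re-read the whole array with db_get_data(), so missed intermediate callbacks no longer matter.

void watch_full_state(HNDLE hDB, HNDLE hKey, int index, void *info) {
  BOOL enable[19];                                    // matches "Enable ADC sampling[19]"
  INT size = sizeof(enable);
  db_get_data(hDB, hKey, enable, &size, TID_BOOL);    // snapshot of the whole array
  for (int i = 0; i < 19; i++) {
    // apply_to_fpga(i, enable[i]);                   // hypothetical hardware call
  }
}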
       Reply  21 Mar 2022, Stefan Ritt, Bug Report, Python ODB watch 
What you describe is a well-known problem with the ODB. At PSI we have similar issues. There are
two approaches to solve it:

1) Write values one-by-one to the ODB, but do not trigger a watch update for each of them. In the
sequencer, this can be achieved with the ODBSET command (see https://daq00.triumf.ca/MidasWiki/index.php/Sequencer 
and the last paragraph to the right of the ODBSET command). You use notify=0 for all set commands except
the last one, where you use notify=1. In the C++ API, you can use db_set_data_index1(), which has
this notify flag as the last parameter (see the sketch after this list).

2) You add intelligence to your front-end. If you get a watch update, you do not apply it
directly to the hardware, but put it into a FIFO. Once you have not received any further updates for a
certain period (1 s is a good value), you empty the FIFO and apply all settings at once.
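
A minimal C++ sketch of approach 1 (my sketch, assuming db_set_data_index1() takes the same
arguments as db_set_data_index() plus the notify flag at the end):

#include "midas.h"

// Write a boolean array element by element, but fire the watch callbacks
// only once, on the last element (notify = TRUE).
void set_bool_array_single_notify(HNDLE hDB, HNDLE hKey, const BOOL *values, int n)
{
   for (int i = 0; i < n; i++) {
      BOOL notify = (i == n - 1);
      db_set_data_index1(hDB, hKey, &values[i], sizeof(BOOL), i, TID_BOOL, notify);
   }
}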

Both methods have been used at PSI successfully, although 1) is much easier to implement, especially
if you use the midas sequencer.

Stefan
Entry  03 Mar 2022, Konstantin Olchanski, Info, manalyzer updated 
manalyzer was updated to latest version. mostly multi-threading improvements from 
Joseph and myself. K.O.
Entry  03 Mar 2022, Konstantin Olchanski, Info, zlib required, lz4 internal 
as of commit 8eb18e4ae9c57a8a802219b90d4dc218eb8fdefb, the gzip compression
library is required, not optional.

this fixes midas and manalyzer mis-build if the system gzip library
is accidentally not installed. (is there any situation where
gzip library is not installed on purpose?)

midas internal lz4 compression library was renamed to mlz4 to avoid collision
against system lz4 library (where present). lz4 files from midasio are now
used, lz4 files in midas/include and midas/src are removed.

I see that on recent versions of ubuntu we could switch to the system version 
of the lz4 library. however, on centos-7 systems it is usually not present
and it still is a supported and widely used platform, so we stay
with the midas-internal library for now.

K.O.
Entry  23 Feb 2022, Stefan Ritt, Info, Midas slow control event generation switched to 32-bit banks 
The midas slow control system class drivers automatically read their equipment and generate events containing midas banks. So far these have been 16-bit banks created with bk_init(). But now more and more experiments use large numbers of channels, so the 16-bit address space is exceeded. Until last week there was not even a check for this, leading to unpredictable crashes.

Therefore I switched the bank generation in the drivers generic.cxx, hv.cxx and multi.cxx to 32-bit banks via bk_init32(). This should in principle be transparent, since the midas bank functions automatically detect the bank type during reading. But I thought I'd let everybody know just in case.
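
For reference, a minimal sketch of what the change looks like in a readout routine (placeholder bank name and data, not the actual class-driver code); readers detect the bank format automatically:

#include "midas.h"

INT read_slow_event(char *pevent, INT off)
{
   float *pdata;
   bk_init32(pevent);                          // was bk_init(pevent), with its 16-bit size field
   bk_create(pevent, "DEMO", TID_FLOAT, (void **)&pdata);
   *pdata++ = 123.4f;                          // ... fill the channel values here ...
   bk_close(pevent, pdata);
   return bk_size(pevent);
}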

Stefan
Entry  08 Feb 2022, jago aartsen, Bug Fix, ODBINC/Sequencer Issue mslerror.PNG
Hi all,

I am having some issues getting the ODBINC command to work within the MIDAS 
sequencer. I am trying to increment one of the ODB values but it returns a 
data-type size mismatch error (see attached image).

All the ODB variables are of MIDAS data type FLOAT and should all be 32-bit values, 
but for some reason MIDAS thinks they are 4-bit values. I have tried creating 
new ODB keys of type INT32, UINT32 and DOUBLE, but they all give the same error.

If anybody has any suggestions I would really appreciate some help:)

Thanks



 
    Reply  08 Feb 2022, Konstantin Olchanski, Bug Fix, ODBINC/Sequencer Issue 
Please post the output of odbedit "ls -l" for /eq/ar.../variables. (you posted the 
variable name as an image, and I cannot cut-and-paste the odb path!). BTW data size 4 is 
correct, 4 bytes for INT32/UINT32/FLOAT. For DOUBLE it should be 8. For you it prints 32 
and this is wrong, we need to see the output of "ls -l".
K.O.
       Reply  09 Feb 2022, jago aartsen, Bug Fix, ODBINC/Sequencer Issue 
> Please post the output of odbedit "ls -l" for /eq/ar.../variables. (you posted the 
> variable name as an image, and I cannot cut-and-paste the odb path!). BTW data size 4 is 
> correct, 4 bytes for INT32/UINT32/FLOAT. For DOUBLE it should be 8. For you it prints 32 
> and this is wrong, we need to see the output of "ls -l".
> K.O.

Hi,

Thanks for getting back to me regarding this. The output of "ls -l" is:

[local:mu3eMSci:S]/>cd Equipment/ArduinoTestStation/Variables
[local:mu3eMSci:S]Variables>ls -l
Key name                        Type    #Val  Size  Last Opn Mode Value
---------------------------------------------------------------------------
_T_                             FLOAT   1     4     1h   0   RWD  20.93
_F_                             FLOAT   1     4     1h   0   RWD  12.8
_P_                             FLOAT   1     4     1h   0   RWD  56
_S_                             FLOAT   1     4     1h   0   RWD  5
_H_                             FLOAT   1     4     60h  0   RWD  44.74
_B_                             FLOAT   1     4     60h  0   RWD  18.54
_A_                             FLOAT   1     4     1h   0   RWD  14.41
_RH_                            FLOAT   1     4     1h   0   RWD  41.81
_AT_                            FLOAT   1     4     1h   0   RWD  20.46
SP                              INT16   1     2     1h   0   RWD  10

Many Thanks
Jago
          Reply  09 Feb 2022, Konstantin Olchanski, Bug Fix, ODBINC/Sequencer Issue 
> 
> [local:mu3eMSci:S]/>cd Equipment/ArduinoTestStation/Variables
> [local:mu3eMSci:S]Variables>ls -l
> Key name                        Type    #Val  Size  Last Opn Mode Value
> ---------------------------------------------------------------------------
> _T_                             FLOAT   1     4     1h   0   RWD  20.93
> _F_                             FLOAT   1     4     1h   0   RWD  12.8
> _P_                             FLOAT   1     4     1h   0   RWD  56
> _S_                             FLOAT   1     4     1h   0   RWD  5
> _H_                             FLOAT   1     4     60h  0   RWD  44.74
> _B_                             FLOAT   1     4     60h  0   RWD  18.54
> _A_                             FLOAT   1     4     1h   0   RWD  14.41
> _RH_                            FLOAT   1     4     1h   0   RWD  41.81
> _AT_                            FLOAT   1     4     1h   0   RWD  20.46
> SP                              INT16   1     2     1h   0   RWD  10
> 

This looks okey, so we still have no explanation for your error. Please post your sequencer 
script?

K.O.
             Reply  10 Feb 2022, Stefan Ritt, Bug Fix, ODBINC/Sequencer Issue 
I tried following script:

ODBSET /Equipment/ArduinoTestStation/Variables/_S_, 10

LOOP 10
  WAIT seconds, 3
  ODBINC /Equipment/ArduinoTestStation/Variables/_S_
ENDLOOP

and it worked as expected. So I conclude the problem must be in your script. Probably a typo in 
the ODB path, pointing to a 32-byte string instead of to a 4-byte float.

Stefan  
                Reply  14 Feb 2022, jago aartsen, Bug Fix, ODBINC/Sequencer Issue 
> I tried following script:
> 
> ODBSET /Equipment/ArduinoTestStation/Variables/_S_, 10
> 
> LOOP 10
>   WAIT seconds, 3
>   ODBINC /Equipment/ArduinoTestStation/Variables/_S_
> ENDLOOP
> 
> and it worked as expected. So I conclude the problem must be in your script. Probably a typo in 
> the ODB path pointing to a 32-byte string instead to a 4-byte float.
> 
> Stefan  

Hi Stefan,

Cheers for the reply. I believe the syntax we are using is correct. I have tried copying the script 
you used above and it results in the same error as before. Perhaps something is going wrong 
elsewhere - I will take another look today.

Jago
                   Reply  14 Feb 2022, Stefan Ritt, Bug Fix, ODBINC/Sequencer Issue 
Just post here a minimal script which produces the error, so that I can try myself.

... and make sure that you have the latest develop version of midas.

Stefan
                      Reply  14 Feb 2022, jago aartsen, Bug Fix, ODBINC/Sequencer Issue 
> Just post here a minimal script which produces the error, so that I can try myself.
> 
> ... and make sure that you have the latest develop version of midas.
> 
> Stefan

Here is the simplest script which produces the error:

WAIT seconds, 3
ODBINC /Equipment/ArduinoTestStation/Variables/_S_

I noticed that "Jacob Thorne" in the forum had the same issue as us in November last 
year. Indeed we have not installed any later versions of MIDAS since then, so we will 
double check we have the latest version.

Jago
                         Reply  14 Feb 2022, Stefan Ritt, Bug Fix, ODBINC/Sequencer Issue 
> I noticed that "Jacob Thorne"  in the forum had the same issue as us in Novemeber last 
> year. Indeed we have not installed any later versions of MIDAS since then so we will 
> double check we have the latest version.

As you see from my reply to Jacob, the bug has been fixed in midas since then, so just 
update.

Stefan
                            Reply  14 Feb 2022, jago aartsen, Bug Fix, ODBINC/Sequencer Issue 
> > I noticed that "Jacob Thorne"  in the forum had the same issue as us in Novemeber last 
> > year. Indeed we have not installed any later versions of MIDAS since then so we will 
> > double check we have the latest version.
> 
> As you see from my reply to Jacob, the bug has been fixed in midas since then, so just 
> update.
> 
> Stefan

We have tried updating using both:

git submodule update --init --recursive

and:

git pull --recurse-submodules

But the error still persists. Is there another way to update which we are missing?

Cheers
Jago
                               Reply  15 Feb 2022, Stefan Ritt, Bug Fix, ODBINC/Sequencer Issue 
> But the error still persists. Is there another way to update which we are missing?

The bug was definitively fixed in this modification:

https://bitbucket.org/tmidas/midas/commits/5f33f9f7f21bcaa474455ab72b15abc424bbebf2

You probably forgot to compile/install correctly after your pull. If you start "odbedit" and do 
a "ver" you see which git revision you are currently running. Make sure to get this output:

MIDAS version:      2.1
GIT revision:       Fri Feb 11 08:56:02 2022 +0100 - midas-2020-08-a-509-g585faa96 on branch 
develop
ODB version:        3


Stefan
                                  Reply  16 Feb 2022, jago aartsen, Bug Fix, ODBINC/Sequencer Issue 
> > But the error still persists. Is there another way to update which we are missing?
> 
> The bug was definitively fixed in this modification:
> 
> https://bitbucket.org/tmidas/midas/commits/5f33f9f7f21bcaa474455ab72b15abc424bbebf2
> 
> You probably forgot to compile/install correctly after your pull. Of you start "odbedit" and do 
> a "ver" you see which git revision you are currently running. Make sure to get this output:
> 
> MIDAS version:      2.1
> GIT revision:       Fri Feb 11 08:56:02 2022 +0100 - midas-2020-08-a-509-g585faa96 on branch 
> develop
> ODB version:        3
> 
> 
> Stefan

We were having some problems compiling but have got it sorted now - thanks for your help :)

Jago
             Reply  14 Feb 2022, jago aartsen, Bug Fix, ODBINC/Sequencer Issue 
> > 
> > [local:mu3eMSci:S]/>cd Equipment/ArduinoTestStation/Variables
> > [local:mu3eMSci:S]Variables>ls -l
> > Key name                        Type    #Val  Size  Last Opn Mode Value
> > ---------------------------------------------------------------------------
> > _T_                             FLOAT   1     4     1h   0   RWD  20.93
> > _F_                             FLOAT   1     4     1h   0   RWD  12.8
> > _P_                             FLOAT   1     4     1h   0   RWD  56
> > _S_                             FLOAT   1     4     1h   0   RWD  5
> > _H_                             FLOAT   1     4     60h  0   RWD  44.74
> > _B_                             FLOAT   1     4     60h  0   RWD  18.54
> > _A_                             FLOAT   1     4     1h   0   RWD  14.41
> > _RH_                            FLOAT   1     4     1h   0   RWD  41.81
> > _AT_                            FLOAT   1     4     1h   0   RWD  20.46
> > SP                              INT16   1     2     1h   0   RWD  10
> > 
> 
> This looks okey, so we still have no explanation for your error. Please post your sequencer 
> script?
> 
> K.O.

Hey, thanks for getting back to me

We are fairly confident the syntax is correct. Having tried the test script posted by Stefan:

> ODBSET /Equipment/ArduinoTestStation/Variables/_S_, 10
> 
> LOOP 10
>   WAIT seconds, 3
>   ODBINC 

The same error is returned:/

We will take another look today.

Jago
Entry  02 Dec 2021, Alexey Kalinin, Bug Report, some frontend kicked by cm_periodic_tasks 
Hello,
We have a small experiment with a MIDAS-based DAQ.
The Status page shows:
ES	ESFrontend@192.168.0.37	207	0.2	0.000
Trigger06	Sample Frontend06@192.168.0.37	1.297M	0.3	0.000
Trigger01	Sample Frontend01@192.168.0.37	1.297M	0.3	0.000
Trigger16	Sample Frontend16@192.168.0.37	1.297M	0.3	0.000
Trigger38	Sample Frontend38@192.168.0.37	1.297M	0.3	0.000
Trigger37	Sample Frontend37@192.168.0.37	1.297M	0.3	0.000
Trigger03	Sample Frontend03@192.168.0.38	1.297M	0.3	0.000
Trigger07	Sample Frontend07@192.168.0.38	1.297M	0.3	0.000
Trigger04	Sample Frontend04@192.168.0.38	59898	0.0	0.000
Trigger08	Sample Frontend08@192.168.0.38	59898	0.0	0.000
Trigger17	Sample Frontend17@192.168.0.38	59898	0.0	0.000


And SYSTEM buffers page shows:
ESFrontend  1968  198  47520  0  0x00000000  0  193 ms
Sample Frontend06  1332547  1330826  379729872  0  0x00000000  0  1.1 sec
Sample Frontend16  1332542  1330839  361988208  0  0x00000000  0  94 ms
Sample Frontend37  1332530  1330841  337798408  0  0x00000000  0  1.1 sec
Sample Frontend01  1332543  1330829  467136688  0  0x00000000  0  34 ms
Sample Frontend38  1332528  1330830  291453608  0  0x00000000  0  1.1 sec
Sample Frontend04  63254  61467  20882584  0  0x00000000  0  208 ms
Sample Frontend08  63262  61476  27904056  0  0x00000000  0  205 ms
Sample Frontend17  63271  61473  20433840  0  0x00000000  0  213 ms
Sample Frontend03  1332549  1330818  386821728  0  0x00000000  0  82 ms
Sample Frontend07  1332554  1330821  462210896  0  0x00000000  0  37 ms
Logger  968742  0w+9500418r  0w+2718405736r  0  0x00000000  0  GET_ALL Used 0 bytes 0.0%  303 ms
rootana  254561  0w+29856958r  0w+8718288352r  0  0x00000000  0  762 ms


The problem is that eventually some of the frontends are closed with the message:
19:22:31.834 2021/12/02 [rootana,INFO] Client 'Sample Frontend38' on buffer 
'SYSMSG' removed by cm_periodic_tasks because process pid 9789 does not exist

in the meantime the mserver is logging:
mserver started interactively
mserver will listen on TCP port 1175
double free or corruption (!prev)
double free or corruption (!prev)
free(): invalid next size (normal)
double free or corruption (!prev)


I can find some correlation with the number of events / event size produced by the 
frontend, because it fails when they become big enough. 

frontend scheme is like this:

poll event time set to 0;

poll_event{
//if buffer not transferred return (continue cutting the main buffer)
//read main buffer from hardware
//buffer not transfered
}

read event{
// cut the main buffer to subevents (cut one event from main buffer) return;
//if (last subevent) {buffer transfered ;return}
}

What is strange to me is that 2 frontends (1 per remote PC) are causing this.

Also, I'm executing one frontend code with the -i # flag, setting the event ID in 
frontend_init, and using the SYSTEM buffer for all.

Is there something I'm missing?
Thanks. 
A.
    Reply  26 Jan 2022, Konstantin Olchanski, Bug Report, some frontend kicked by cm_periodic_tasks 
> The problem is that eventually some of frontend closed with message 
> :19:22:31.834 2021/12/02 [rootana,INFO] Client 'Sample Frontend38' on buffer 
> 'SYSMSG' removed by cm_periodic_tasks because process pid 9789 does not exist

This message means what it says. A client was registered with the SYSMSG buffer and this 
client had pid 9789. At some point some other client (rootana, in this case) checked it and 
process pid 9789 was no longer running. (it then proceeded to remove the registration).

There are 2 possibilities:
- simplest: your frontend has crashed. best to debug this by running it inside gdb, wait for 
the crash.
- unlikely: reported pid is bogus, real pid of your frontend is different, the client 
registration in SYSMSG is corrupted. this would indicate massive corruption of midas shared 
memory buffers, not impossible if your frontend misbehaves and writes to random memory 
addresses. ODB has protection against this (normally turned off, easy to enable, set ODB 
"/experiment/protect odb" to yes), shared memory buffers do not have protection against this 
(should be added?).

Do this. When you start your frontend, write down its pid; when you see the crash message, 
confirm the pid number printed is the same. As an additional test, run your frontend inside gdb; 
after it crashes, you can print the stack trace, etc.

> 
> in the meantime mserver loggging :
> mserver started interactively
> mserver will listen on TCP port 1175
> double free or corruption (!prev)
> double free or corruption (!prev)
> free(): invalid next size (normal)
> double free or corruption (!prev)
> 

Are these "double free" messages coming from the mserver or from your frontend? (i.e. you run 
them in different terminals, not all in the same terminal?).

If messages are coming from the mserver, this confirms possibility (1),
except that for frontends connected remotely, the pid is the pid of the mserver,
and what we see are crashes of mserver, not crashes of your frontend. These are much harder to 
debug.

You will need to enable core dumps (ODB /Experiment/Enable core dumps set to "y"),
confirm that core dumps work (i.e. "killall -SEGV mserver", observe core files are created
in the directory where you started the mserver), reproduce the crash, run "gdb mserver 
core.NNNN", run "bt" to print the stack trace, post the stack trace here (or email to me 
directly).

>
> I can find some correlation between number of events/event size produced by 
> frontend, cause its failed when its become big enough. 
> 

There is no limit on event size or event rate in midas, you should not see any crash
regardless of what you do. (there is a limit of event size, because an event has
to fit inside an event buffer and event buffer size is limited to 2 GB).

Obviously you hit a bug in mserver that makes it crash. Let's debug it.

One thing to try is set the write cache size to zero and see if your crash goes away. I see
some indication of something rotten in the event buffer code if write cache is enabled. This
is set in ODB "/Eq/XXX/Common/Write Cache Size", set it to zero. (beware recent confusion
where odb settings have no effect depending on value of "equipment_common_overwrite").

>
> frontend scheme is like this:
> 

Best if you use the tmfe c++ frontend, event data handling is much simpler and we do not
have to debug the convoluted old code in mfe.c.

K.O.

>
> poll event time set to 0;
> 
> poll_event{
> //if buffer not transferred return (continue cutting the main buffer)
> //read main buffer from hardware
> //buffer not transfered
> }
> 
> read event{
> // cut the main buffer to subevents (cut one event from main buffer) return;
> //if (last subevent) {buffer transfered ;return}
> }
> 
> What is strange to me that 2 frontends (1 per remote pc) causing this.
> 
> Also, I'm executing one FEcode with -i # flag , put setting eventid in 
> frontend_init , and using SYSTEM buffer for all.
> 
> Is there something I'm missing?
> Thanks. 
> A.
       Reply  11 Feb 2022, Alexey Kalinin, Bug Report, some frontend kicked by cm_periodic_tasks 
Thanks for the answer.
As soon as I can (possibly in a month) I'll try the suggestions below:

> One thing to try is set the write cache size to zero and see if your crash goes away. I see
> some indication of something rotten in the event buffer code if write cache is enabled. This
> is set in ODB "/Eq/XXX/Common/Write Cache Size", set it to zero. (beware recent confusion
> where odb settings have no effect depending on value of "equipment_common_overwrite").

I tried to change this ODB setting for one of the frontends via mhttpd/browser, but eventually it goes back 
to the default value (1000, as I remember). However, this frontend has the minimum rate, 50 DWORDs per ~10 sec, 
and depending on the cache size its data appears in mdump only once per 31 events, but all of them are there. 
So it is a different story, but maybe it has the same solution of playing with the Write Cache Size.    


The double free messages come from the mserver terminal. 
All of the frontends are remote.
I can't exclude crashes of the frontend, but when I run ./frontend -i 1 (2, 3, etc.) that means I run 
one code for all of them, and only several of them cause the crash. I also found that the crash in the frontend 
happened while it was doing nothing with the collected data (last event reached and new data not yet ready), 
but while it was watching for ODB changes. I mean it crashes inside (while {odb_changes(value in watchdog)}), 
and I don't know what else happened meanwhile with the cached buffer.

The future plan is to use an event builder for the frontends once the data/signals are perfectly reasonable, 
i.e. without broken events. For now I am somewhat worried about whether one of the frontends will skip one of 
the events inside its buffer.


Thanks for showing me how to dig into this.
A.    

> > The problem is that eventually some of frontend closed with message 
> > :19:22:31.834 2021/12/02 [rootana,INFO] Client 'Sample Frontend38' on buffer 
> > 'SYSMSG' removed by cm_periodic_tasks because process pid 9789 does not exist
> 
> This messages means what it says. A client was registered with the SYSMSG buffer and this 
> client had pid 9789. At some point some other client (rootana, in this case) checked it and 
> process pid 9789 was no longer running. (it then proceeded to remove the registration).
> 
> There is 2 possibilities:
> - simplest: your frontend has crashed. best to debug this by running it inside gdb, wait for 
> the crash.
> - unlikely: reported pid is bogus, real pid of your frontend is different, the client 
> registration in SYSMSG is corrupted. this would indicate massive corruption of midas shared 
> memory buffers, not impossible if your frontend misbehaves and writes to random memory 
> addresses. ODB has protection against this (normally turned off, easy to enable, set ODB 
> "/experiment/protect odb" to yes), shared memory buffers do not have protection against this 
> (should be added?).
> 
> Do this. When you start your frontend, write down it's pid, when you see the crash message, 
> confirm pid number printed is the same. As additional test, run your frontend inside gdb, 
> after it crashes, you can print the stack trace, etc.
> 
> > 
> > in the meantime mserver loggging :
> > mserver started interactively
> > mserver will listen on TCP port 1175
> > double free or corruption (!prev)
> > double free or corruption (!prev)
> > free(): invalid next size (normal)
> > double free or corruption (!prev)
> > 
> 
> Are these "double free" messages coming from the mserver or from your frontend? (i.e. you run 
> them in different terminals, not all in the same terminal?).
> 
> If messages are coming from the mserver, this confirms possibility (1),
> except that for frontends connected remotely, the pid is the pid of the mserver,
> and what we see are crashes of mserver, not crashes of your frontend. These are much harder to 
> debug.
> 
> You will need to enable core dumps (ODB /Experiment/Enable core dumps set to "y"),
> confirm that core dumps work (i.e. "killall -SEGV mserver", observe core files are created
> in the directory where you started the mserver), reproduce the crash, run "gdb mserver 
> core.NNNN", run "bt" to print the stack trace, post the stack trace here (or email to me 
> directly).
> 
> >
> > I can find some correlation between number of events/event size produced by 
> > frontend, cause its failed when its become big enough. 
> > 
> 
> There is no limit on event size or event rate in midas, you should not see any crash
> regardless of what you do. (there is a limit of event size, because an event has
> to fit inside an event buffer and event buffer size is limited to 2 GB).
> 
> Obviously you hit a bug in mserver that makes it crash. Let's debug it.
> 
> One thing to try is set the write cache size to zero and see if your crash goes away. I see
> some indication of something rotten in the event buffer code if write cache is enabled. This
> is set in ODB "/Eq/XXX/Common/Write Cache Size", set it to zero. (beware recent confusion
> where odb settings have no effect depending on value of "equipment_common_overwrite").
> 
> >
> > frontend scheme is like this:
> > 
> 
> Best if you use the tmfe c++ frontend, event data handling is much simpler and we do not
> have to debug the convoluted old code in mfe.c.
> 
> K.O.
> 
> >
> > poll event time set to 0;
> > 
> > poll_event{
> > //if buffer not transferred return (continue cutting the main buffer)
> > //read main buffer from hardware
> > //buffer not transfered
> > }
> > 
> > read event{
> > // cut the main buffer to subevents (cut one event from main buffer) return;
> > //if (last subevent) {buffer transfered ;return}
> > }
> > 
> > What is strange to me that 2 frontends (1 per remote pc) causing this.
> > 
> > Also, I'm executing one FEcode with -i # flag , put setting eventid in 
> > frontend_init , and using SYSTEM buffer for all.
> > 
> > Is there something I'm missing?
> > Thanks. 
> > A.
Entry  28 May 2021, Joseph McKenna, Bug Report, History plots deceiving users into thinking data is still logging flatline.pngflatline.png
I have been trying to fix this myself but my javascript isn't strong... The 'new' history plot renderer fills in missing data with the last ODB value, even when this value is very old; elog:2180/1 shows this. The data logging stopped, but the history plot can fool users into thinking data is still logging (the export button also generates CSVs with entries every 10 seconds).

Grepping through the history files behind the scenes, I found only one match for an example variable from this plot, so it looks like there are no entries after March 24th (although I may be mistaken, I've not studied the history file data structure in detail), i.e. this is an artifact of mhistory.js rather than the mlogger... Have I missed something simple?

Would it be possible to not draw the line if there have been no data points for a significant time? Or maybe render a dashed line that doesn't export to CSV? Thanks in advance.

Edit: I see certificate errors on this forum and I think it's preventing me from uploading an image... inlining it into the text here:
    Reply  28 May 2021, Stefan Ritt, Bug Report, History plots deceiving users into thinking data is still logging 

This is a known problem and I'm working on it. See the discussion at: 

https://bitbucket.org/tmidas/midas/issues/305/log_history_periodic-doesnt-account-for

Stefan

       Reply  02 Jun 2021, Konstantin Olchanski, Bug Report, History plots deceiving users into thinking data is still logging 
https://bitbucket.org/tmidas/midas/issues/305/log_history_periodic-doesnt-account-for

this problem is a blocker for the next midas release.

the best I can tell, current development version of midas writes history data incorrectly,
but I do not have time to look at it at this moment.

I recommend that people use the latest released version, midas-2020-12. (this is what we have on alphag and 
should have in alpha2).

midas-2020-12 uses mlogger from midas-2020-08.

If I cannot find time to figure out what is going on in the mlogger,
the next release may have to be done the same way (with mlogger from midas-2020-08).

K.O.
          Reply  10 Feb 2022, Stefan Ritt, Bug Report, History plots deceiving users into thinking data is still logging 
The problem was fixed in commit 825935dc in Oct. 2021 and it has been running fine since then at PSI. If TRIUMF people 
agree, we can close that issue and proceed.

Stefan
Entry  30 Sep 2021, Francesco Renga, Forum, OPC client within MIDAS 
Dear all,
     I need to integrate communication with an OPC UA server into my MIDAS 
project. My plan is to develop an OPC UA client as a "device" in 
midas/drivers/device.

Two questions:

1) Is anybody aware of some similar effort for some other project, so that I can 
get some example?

2) What could be the more appropriate driver's class to be used? generic.cxx? 
multi.cxx?

Thank you for your help,
           Francesco
    Reply  10 Feb 2022, Francesco Renga, Forum, OPC client within MIDAS opc.cxxopc.h
Dear all,
        I finally succeeded in getting a working driver for the communication with an OPC 
UA server. It is based on the open62541 library and I use it in combination with the 
generic.h driver class. This is still a crude implementation, but let me post it here, 
maybe it can be useful to somebody else.

BTW, if there is somebody more skilled than me with OPC UA and MIDAS drivers, who is 
willing to give suggestions for improving the implementation, it would be extremely 
appreciated.

Best Regards,
      Francesco



> Dear all,
>      I need to integrate in my MIDAS project the communication with an OPC UA 
> server. My plan is to develop an OPC UA client as a "device" in 
> midas/drivers/device.
> 
> Two questions:
> 
> 1) Is anybody aware of some similar effort for some other project, so that I can 
> get some example?
> 
> 2) What could be the more appropriate driver's class to be used? generic.cxx? 
> multi.cxx?
> 
> Thank you for your help,
>            Francesco
Entry  07 Feb 2022, Konstantin Olchanski, Forum, MidasWiki moved from ladd00 to daq00.triumf.ca and updated to MediaWiki 1.35 
MidasWiki moved from ladd00 (obsolete SL6) to daq00.triumf.ca (Ubuntu LTS 20.04) 
and updated from obsolete MediaWiki LTS 1.27.7 to MediaWiki LTS 1.35, supported 
until mid-2023, see https://www.mediawiki.org/wiki/Version_lifecycle

Old URL https://midas.triumf.ca and https://midas.triumf.ca/MidasWiki/...
redirect to new URL https://daq00.triumf.ca/MidasWiki/index.php/Main_Page

All old links and bookmarks should continue to work (via redirect).

To report problems with this MediaWiki instance and to request
any changes in configuration or installed extensions, please reply to this
message here.

K.O.
Entry  26 Jan 2022, Frederik Wauters, Forum, .gz files 
I adapted our analyzer to compile against the manalyzer included in the midas repo.

All our data files are .mid.gz, which now fail to process :(

frederik@frederik-ThinkPad-T550:~/new_daq/build/analyzer$ ./analyzer -e100 -s100 ../../run_backup_11783.mid.gz 
...
...n
Registered modules: 1
file[0]: ../../run_backup_11783.mid.gz
Setting up the analysis!
TMReadEvent: error: short read 0 instead of -1193512213

This happens in the TMEvent* TMReadEvent(TMReaderInterface* reader) function in the midasio.cxx file.

Reading the unzipped files works. But we have always processed our .gz files directly, for the unzipping we would need ~2x disk space.

Am I doing something wrong? I see that there is some activity on lz4 in the midasio repo, is gunzip next?
    Reply  26 Jan 2022, Konstantin Olchanski, Forum, .gz files 
> I adapted our analyzer to compile against the manalyzer included in the midas repo.
> TMReadEvent: error: short read 0 instead of -1193512213

I think this problem is fixed in the latest version of midasio and manalyzer, but this update
was not pulled into midas yet. (Canada is in the middle of a covid wave since December).

What happens is you do not have the gzip library installed on your computer and
your analyzer is built without support for gzip.

The fix is done the hard way, the gzip library is no longer optional, but required.

You do not say what linux you use, so I cannot give exact instructions, but for:
ubuntu: apt -y install libz-dev
centos7: installed by default
centos8: installed by default
debian11/raspbian: same as ubuntu

K.O.
       Reply  31 Jan 2022, Frederik Wauters, Forum, .gz files 
> > I adapted our analyzer to compile against the manalyzer included in the midas repo.
> > TMReadEvent: error: short read 0 instead of -1193512213
> 
> I think this problem is fixed in the latest version of midasio and manalyzer, but this update
> was not pulled into midas yet. (Canada is in the middle of a covid wave since December).
> 
> What happens is you do not have the gzip library installed on your computer and
> your analyzer is built without support for gzip.
> 
> The fix is done the hard way, the gzip library is no longer optional, but required.
> 
> You do not say what linux you use, so I cannot give exact instructions, but for:
> ubuntu: apt -y install libz-dev
> centos7: installed by default
> centos8: installed by default
> debian11/raspbian: same as ubuntu
> 
> K.O.

My libz under ubuntu

-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.11") 
-- MIDAS: Found ZLIB version 1.2.11

I got both the manalyzer example and mine going with
* the latest midas dev
* + the latest manalyzer (cf6c233)
* + almost the latest midasio (568a617; otherwise I get a linking error:

./libmidas.a(midasio.cxx.o): In function `Lz4Error(int)':
midasio.cxx:(.text+0x359): undefined reference to `MLZ4F_getErrorName(unsigned long)'

So this works, I will assume that in the near future this all will come together in the standard midas release.

thanks  
Entry  29 Jan 2022, Isaac Labrie Boulay, Forum, MIDAS and GRIF-16 digitizer (Standalone Mode). 
Hi all,

I was sent a version of the frontend for the TIGRESS Detector lab setup so that 
I can test detectors using a GRIF-16 digitizer in standalone mode.

I followed the GRIF-16 wiki (https://grsi.wiki.triumf.ca/wiki/GRIF-16#One-
level_operation) to set up the GRIF-16 through the web page. The digitized data is 
supposed to come in on my UDP port 8800, but it is never received by the 
frontend.

Here's the readout scheme:
// readout sequence ...
// poll_event() true (if still have data in buffer or testmsg() true)
// -> read_trigger_event() -> read_grifc_event() - re-buffers into midas events
//                         -> grifc_eventread()  - returns single grif fragment
//                         -> grifc_dataread()   - returns single net-pkt 


Here's poll_event():
INT poll_event(INT source, INT count, BOOL test)
{
   int i, have_data=0;

   for(i=0; i<count; i++){
      if( data_available ){ break; }
      have_data = ( testmsg(data_socket, 0) > 0 );
      if( have_data && !test ){ break; }
   }
   return( (have_data || data_available) && !test );
}

This being said, testmsg() always indicates that no data is available, and "data_available" is only set 
to TRUE when there is leftover data after a GRIF-C reading (I'm obviously not 
using a GRIF-C).

I know that when the GRIF-16 is in standalone mode, MIDAS does not change the GRIF-16's 
settings based on the ODB; they have to be set through the GRIF-16 web page. Is the 
user frontend code even responsible for the GRIF-16 data readout in standalone 
mode? If not, could it just be that my UDP offloader is incorrectly setup?

Here are its current settings:

SETTINGS/UDP
- Offloader: ON
- Dst IP: my IP
- Dst Port: 8800 (DATA_PORT)

SETTINGS/MIDAS
- Use MIDAS: OFF
- MIDAS Hostname: my hostname
- MIDAS IP: same as Dst IP from UDP settings
- Dst Port: 8080 (I'm assuming that this is the mhttpd port)

Again, the frontend runs but I get 0 events. What might I be missing?

Thanks for helping me out!

Isaac
Entry  22 Oct 2021, Francesco Renga, Forum, mhttpd error 
Dear all,
      I am trying to make the MIDAS web server for my DAQ project accessible from other machines. In the ODB, I activated the necessary flags:

[local:CYGNUS_RD:S]/WebServer>ls
Enable localhost port           y
localhost port                  8080
localhost port passwords        n
Enable insecure port            y
insecure port                   8081
insecure port passwords         y
insecure port host list         y
Enable https port               y
https port                      8443
https port passwords            y
https port host list            n
Host list
                                localhost
Enable IPv6                     y
Proxy                           
mime.types

Following the instructions on the Wiki I enabled the SSL support. When running mhttpd, I get these messages:

Mongoose web server will use HTTP Digest authentication with realm "CYGNUS_RD" and password file "/home/cygno/DAQ/online/htpasswd.txt"
Mongoose web server will use the hostlist, connections will be accepted only from: localhost
Mongoose web server listening on http address "localhost:8080", passwords OFF, hostlist OFF
Mongoose web server listening on http address "[::1]:8080", passwords OFF, hostlist OFF
Mongoose web server listening on http address "8081", passwords enabled, hostlist enabled
[mhttpd,ERROR] [mhttpd.cxx:19166:mongoose_listen,ERROR] Cannot mg_bind address "[::0]:8081"
Mongoose web server will use https certificate file "/home/cygno/DAQ/online/ssl_cert.pem"
Mongoose web server listening on https address "8443", passwords enabled, hostlist OFF
[mhttpd,ERROR] [mhttpd.cxx:19166:mongoose_listen,ERROR] Cannot mg_bind address "[::0]:8443"

and the server is not accessible from other machines. Any suggestions on how to solve or better investigate this problem?

Thank you very much,
         Francesco
    Reply  22 Oct 2021, Stefan Ritt, Forum, mhttpd error 
> Enable IPv6                     y

Probably the IPv6 problem, see here elog:2269

I asked to turn off IPv6 by default, or at least mention this in the documentation,
but unfortunately nothing happened.

Stefan
       Reply  25 Oct 2021, Francesco Renga, Forum, mhttpd error 
It worked, thank you very much!

Francesco

> > Enable IPv6                     y
> 
> Probably the IPv6 problem, see here elog:2269
> 
> I asked to turn off IPv6 by default, or at least mention this in the documentation,
> but unfortunately nothing happened.
> 
> Stefan
       Reply  26 Jan 2022, Konstantin Olchanski, Forum, mhttpd error 
> > Enable IPv6                     y
> 
> Probably the IPv6 problem, see here elog:2269
> 
> I asked to turn off IPv6 by default, or at least mention this in the documentation,
> but unfortunately nothing happened.

But IPv4 and IPv6 code is completely separate, if IPv6 bind fails, IPv4 should still 
work.

This is all very strange.

It does not help that the OP does not say in which way things do not work,
"the server is not accessible from other machines" is not an error message
reported by any browser, and we do not know what URL he is using
to access mhttpd - http: or https:

Also he is enabling the "insecure" port 8081, I am pretty sure the documentation
is pretty clear, either use the secure https port or the insecure port,
but not both at the same time.

In any case, I see the current version of mongoose has removed support
for password files, so all this stuff will likely be reworked
and in the end mhttpd will only listen on localhost ports. To make it "accessible
to other machines", one will have to use the apache https proxy (or mtpcproxy from 
midas).

K.O.
Entry  29 Oct 2021, Kushal Kapoor, Bug Report, Unknown Error 319 from client Screenshot_2021-10-26_114015.png
I’m trying to run MIDAS using a frontend code/client named “fetiglab”. The run stops 
after 2-3 seconds with an error saying “Unknown error 319 from client ‘fetiglab’ on 
localhost”.

The frontend code compiled without any errors and MIDAS reads the frontend 
successfully; this only happens when I start a new run in MIDAS. Here are a few 
more details from the terminal:

11:46:32 [fetiglab,ERROR] [odb.cxx:11268:db_get_record,ERROR] struct size 
mismatch for "/" (expected size: 1, size in ODB: 41920)

11:46:32 [Logger,INFO] Deleting previous file 
"/home/rcmp/online3/run00621_000.root"

11:46:32 [ODBEdit,ERROR] [midas.cxx:5073:cm_transition,ERROR] transition START 
aborted: client "fetiglab" returned status 319

11:46:32 [ODBEdit,ERROR] [midas.cxx:5246:cm_transition,ERROR] Could not start a 
run: cm_transition() status 319, message 'Unknown error 319 from client 
'fetiglab' on host "localhost"'

TR_STARTABORT transition: cleanup after failure to start a run

I’ve also enclosed a screenshot of the same; any suggestions would be highly 
appreciated. Thanks.
    Reply  26 Jan 2022, Konstantin Olchanski, Bug Report, Unknown Error 319 from client 
> I’m trying to run MIDAS using a frontend code/client named “fetiglab”. Run stops 
> after 2/3sec with an error saying “Unknown error 319 from client “fetiglab” on 
> localhost.

actually run never starts.

> 11:46:32 [fetiglab,ERROR] [odb.cxx:11268:db_get_record,ERROR] struct size 
> mismatch for "/" (expected size: 1, size in ODB: 41920)

this is the error that causes run start to fail. for reasons unknown
your frontend is trying to do a db_get_record() from "/" (ODB root top directory).

if this is an mfe.c frontend, I do not think I have ever seen it do something
like this.

so, a puzzle.

K.O.
Entry  01 Dec 2021, Lars Martin, Bug Report, Off-by-one in sequencer documentation 
The documentation for the sequencer loop says:

<quote>
LOOP [name ,] n ... ENDLOOP	To execute a loop n times. For infinite loops, "infinite" 
can be specified as n. Optionally, the loop variable running from 0...(n-1) can be accessed 
inside the loop via $name.
</quote>

In fact the loop variable runs from 1...n, as can be seen by running this exciting 
sequencer code:

1 COMMENT "Figuring out MSL"
2 
3 LOOP n,4
4   MESSAGE $n,1
5 ENDLOOP
    Reply  02 Dec 2021, Stefan Ritt, Bug Report, Off-by-one in sequencer documentation 
> The documentation for the sequencer loop says:
> 
> <quote>
> LOOP [name ,] n ... ENDLOOP	To execute a loop n times. For infinite loops, "infinite" 
> can be specified as n. Optionally, the loop variable running from 0...(n-1) can be accessed 
> inside the loop via $name.
> </quote>
> 
> In fact the loop variable runs from 1...n, as can be seen by running this exciting 
> sequencer code:
> 
> 1 COMMENT "Figuring out MSL"
> 2 
> 3 LOOP n,4
> 4   MESSAGE $n,1
> 5 ENDLOOP

Indeed you're right. The loop variable runs from 1...n. I fixed that in the documentation.

Stefan
       Reply  26 Jan 2022, Konstantin Olchanski, Bug Report, Off-by-one in sequencer documentation 
> > 3 LOOP n,4
> > 4   MESSAGE $n,1
> > 5 ENDLOOP
> 
> Indeed you're right. The loop variable runs from 1...n. I fixed that in the documentation.

Shades/ghosts of FORTRAN. c/c++/perl/python loops loop from 0 to n-1.

K.O.
          Reply  26 Jan 2022, Stefan Ritt, Bug Report, Off-by-one in sequencer documentation 
> Shades/ghosts of FORTRAN. c/c++/perl/python loops loop from 0 to n-1.

   for (i=1 ; i<=10 ; i++);     ;-)
             Reply  26 Jan 2022, Konstantin Olchanski, Bug Report, Off-by-one in sequencer documentation 
> > Shades/ghosts of FORTRAN. c/c++/perl/python loops loop from 0 to n-1.
> 
>    for (i=1 ; i<=10 ; i++);     ;-)

Similar code made big news just recently: (scroll down to the example main() program)

https://blog.qualys.com/vulnerabilities-threat-research/2022/01/25/pwnkit-local-privilege-escalation-vulnerability-discovered-in-polkits-pkexec-cve-2021-4034

I forget if the FORTRAN rules were "loop once" or "never loop" or if it was different
between Fortran-4, fortran-77, DEC extensions and IBM extension, or if it was a compiler switch.

We should check that we do something reasonable with such loops to zero:

LOOP n,0
   MESSAGE $n,1
ENDLOOP

P.S. Yup. "man g77" option "-fonetrip".

K.O.
Entry  09 Nov 2021, Francesco Renga, Forum, Issue in data writing speed 
Dear all,
       I've a frontend writing a quite big bunch of data into a MIDAS bank (16bit output from a 4MP photo camera). 
I'm experiencing a writing speed problem that I don't understand. When the photo camera is triggered at a low rate (< 2 Hz) 
writing into the bank takes a very short time for each event (indeed, what I measure is the time to write and go back 
into the polling function). If I increase the rate to 4 Hz, I see that writing the first two events takes a short time, 
but the third event takes a very long time (hundreds of ms), then again the fourth and fifth events are very fast, and 
the sixth is very slow. If I further increase the rate, every other event is very slow. The problem is not in the readout 
of the camera, because if I just remove the bank writing and keep the camera readout, the problem disappears. Can you 
explain this behavior? Is there any way to improve it?

Below you can also find the code I use to copy the data from the camera buffer into the bank. If you have any suggestion 
to improve it, it would be really appreciated.

Thank you very much,
          Francesco



  const char* pSrc = (const char*)bufframe.buf;

  for(int y = 0; y < bufframe.height; y++ ){

    //Copy one row
    const unsigned short* pDst = (const unsigned short*)pSrc;

    //go through the row
    for(int x = 0; x < bufframe.width; x++ ){

      WORD tmpData = *pDst++; 

      *pdata++ = tmpData;

    }

    pSrc += bufframe.rowbytes;

  }
 
    Reply  10 Nov 2021, Stefan Ritt, Forum, Issue in data writing speed 
Midas uses various buffers (in the frontend, at the server side before the SYSTEM buffer, the SYSTEM buffer itself, and on the 
logger before writing to disk). All these buffers are in RAM and have fast access, so you can fill them pretty quickly. When
they are full, the logger writes to disk, which is slower. So I believe at 2 Hz your disk can keep up with your writing 
speed, but at 4 Hz (4 Mpixel x 2 bytes x 4 Hz = 32 MB/sec) your disk starts slowing down the writing process. Now 32 MB/s is 
pretty slow for a disk, so I presume you have turned compression on, which takes quite some time.

To verify this, disable logging. Then disable compression and keep logging. Then report back here again.

> Dear all,
>        I've a frontend writing a quite big bunch of data into a MIDAS bank (16bit output from a 4MP photo camera). 
> I'm experiencing a writing speed problem that I don't understand. When the photo camera is triggered at a low rate (< 2 Hz) 
> writing into the bank takes a very short time for each event (indeed, what I measure is the time to write and go back 
> into the polling function). If I increase the rate to 4 Hz, I see that writing the first two events takes a sort time, 
> but the third event takes a very long time (hundreds of ms), then again the fourth and fifth events are very fast, and 
> the sixth is very slow. If I further increase the rate, every other event is very slow. The problem is not in the readout 
> of the camera, because if I just remove the bank writing and keep the camera readout, the problem disappears. Can you 
> explain this behavior? Is there any way to improve it?
> 
> Below you can also find the code I use to copy the data from the camera buffer into the bank. If you have any suggestion 
> to improve it, it would be really appreciated.
> 
> Thank you very much,
>           Francesco
> 
> 
> 
>   const char* pSrc = (const char*)bufframe.buf;
> 
>   for(int y = 0; y < bufframe.height; y++ ){
> 
>     //Copy one row
>     const unsigned short* pDst = (const unsigned short*)pSrc;
> 
>     //go through the row
>     for(int x = 0; x < bufframe.width; x++ ){
> 
>       WORD tmpData = *pDst++; 
> 
>       *pdata++ = tmpData;
> 
>     }
> 
>     pSrc += bufframe.rowbytes;
> 
>   }
>  
    Reply  26 Jan 2022, Konstantin Olchanski, Forum, Issue in data writing speed 
Francesco, when you say "writing an event is slow", do you mean it in the frontend
or in the output data file?

Stefan is quite right about the data file, it can take seconds between generating
an event in the frontend and seeing it written to the data file. (if compression
buffers are too big, an event can sit there forever, until pushed out by next events
or by run stop).

But maybe you see this on the frontend side.

What you are looking at is "real time" performance of the frontend and of the linux kernel.

The mfe.c frontend has many problems with real time performance, it can stall and take a long
time between calls to read_event(), for many reasons.

There are ways around that, but it is simpler to switch to the tmfe c++ frontend
that was designed for good real time performance.

In the tmfe frontend, if you use the polled equipment and enable the poll thread,
your frontend will be limited only by the linux kernel real time performance (i.e.
on a single-core CPU, other programs will delay execution of your frontend
and you will see it as long delays (usec, millisec) between calls to your read_event()).

Next limit to real time performance (common to mfe.c and tmfe frontends) is the writing
of event data to the midas shared event buffer. One has to lock the shared memory semaphore
and this has to wait until other users of the event buffer finish their reading
or writing and unlock it. Arbitrary amount of time (usec, millisec, sec) can pass.
(there is also problems with "fairness" of the linux semaphores, a different story, again).

Making things more interesting, midas event buffers implement a write cache (default size 100 kbytes),
events smaller than the cache are quickly accumulated (no need to lock the shared memory semaphore),
then flushed to shared memory when the cache is full. This is done to reduce the number
of shared memory semaphore locks per event, in the case of very high rate of very small events.

Solution to all this is to use 2 threads: read the data from hardware in one thread and write the data to midas
in a different thread. Between the threads would be an event fifo (circular buffer in mfe.c,
std::deque<EVENT> in tmfe c++ frontends).
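
As a rough illustration (a generic C++ sketch, not the actual mfe.c or tmfe code; the hardware readout and the midas send call are placeholders), the two-thread pattern looks like this:

#include <condition_variable>
#include <deque>
#include <mutex>
#include <vector>

struct Event { std::vector<char> data; };      // placeholder event type

static std::deque<Event>       fifo;
static std::mutex              fifo_mutex;
static std::condition_variable fifo_cv;

// Reader thread: talks only to the hardware, never blocks on the midas buffer.
void reader_thread()
{
   for (;;) {
      Event e;                                  // e.data = read_camera();  (hypothetical)
      std::lock_guard<std::mutex> lock(fifo_mutex);
      fifo.push_back(std::move(e));
      fifo_cv.notify_one();
   }
}

// Writer thread: absorbs any delay from the shared event buffer semaphore.
void writer_thread()
{
   for (;;) {
      std::unique_lock<std::mutex> lock(fifo_mutex);
      fifo_cv.wait(lock, [] { return !fifo.empty(); });
      Event e = std::move(fifo.front());
      fifo.pop_front();
      lock.unlock();
      // bm_send_event(...) or the equipment's SendEvent() would go here
   }
}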

For remote connected frontends, things are a bit different. Event data is written directly
into the TCP socket and as long as socket buffers are big enough, there is no real-time delays,
unless SYSTEM buffer is very congested and mserver does not read the TCP socket quickly enough.
So depending on event size, data rate and tcp socket buffer size, the extra 2nd thread
may not be necessary and poll thread real time performance may be good enough.

I hope this clarifies the situation somewhat.

K.O.

> Dear all,
>        I've a frontend writing a quite big bunch of data into a MIDAS bank (16bit output from a 4MP photo camera). 
> I'm experiencing a writing speed problem that I don't understand. When the photo camera is triggered at a low rate (< 2 Hz) 
> writing into the bank takes a very short time for each event (indeed, what I measure is the time to write and go back 
> into the polling function). If I increase the rate to 4 Hz, I see that writing the first two events takes a short time, 
> but the third event takes a very long time (hundreds of ms), then again the fourth and fifth events are very fast, and 
> the sixth is very slow. If I further increase the rate, every other event is very slow. The problem is not in the readout 
> of the camera, because if I just remove the bank writing and keep the camera readout, the problem disappears. Can you 
> explain this behavior? Is there any way to improve it?
> 
> Below you can also find the code I use to copy the data from the camera buffer into the bank. If you have any suggestion 
> to improve it, it would be really appreciated.
> 
> Thank you very much,
>           Francesco
> 
> 
> 
>   const char* pSrc = (const char*)bufframe.buf;
> 
>   for(int y = 0; y < bufframe.height; y++ ){
> 
>     //Copy one row
>     const unsigned short* pDst = (const unsigned short*)pSrc;
> 
>     //go through the row
>     for(int x = 0; x < bufframe.width; x++ ){
> 
>       WORD tmpData = *pDst++; 
> 
>       *pdata++ = tmpData;
> 
>     }
> 
>     pSrc += bufframe.rowbytes;
> 
>   }
>  
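
As an aside on the copy loop quoted above: if pdata points to contiguous 16-bit (WORD) storage
for the whole frame and bufframe.rowbytes is at least width*sizeof(WORD), each row can be copied
in a single call. This is only a sketch, reusing the variable names from the quoted code and not
tested against the actual camera SDK; as discussed above, the copy itself is unlikely to be the
cause of the delays, but it keeps the readout code shorter:

   #include <cstring>

   const char* pSrc = (const char*)bufframe.buf;
   const size_t rowsize = (size_t)bufframe.width * sizeof(WORD);

   for (int y = 0; y < bufframe.height; y++) {
      std::memcpy(pdata, pSrc, rowsize);   // copy one full row of 16-bit pixels
      pdata += bufframe.width;             // advance the destination by one row
      pSrc  += bufframe.rowbytes;          // rowbytes may include row padding
   }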
       Reply  26 Jan 2022, Konstantin Olchanski, Forum, Issue in data writing speed 
> Francesco, when you say "writing an event is slow", do you mean it in the frontend
> or in the output data file?

Another explanation just occurred to me. We do not know your event size and we do not
know the size of your SYSTEM buffer. But if you have an unlucky combination,
this can happen:

Consider this: the event size is 6 Mbytes and the buffer size is 8 Mbytes, enough space for only 1 event.

The first event is written quickly (the buffer is empty).
The second event will be delayed: there is not enough free space in the buffer, so we have
to wait for mlogger to finish reading the first event.

The same thing happens if the event size is 3 Mbytes: the first 2 events will write quickly,
but writing the 3rd event will be delayed until mlogger does its thing.

The mlogger reads the SYSTEM buffer "fast" and "quickly", but it can be delayed
for a number of reasons, e.g. handling a history event, a delay writing to disk,
a delay writing to network-connected storage, etc.

In general, it is best to size the SYSTEM buffer to hold about 1 second's worth
of data (at the average event size and average rate). If your event size is 4 Mbytes and you
record them at 10/sec, the SYSTEM buffer should be at least 40 Mbytes big. (This is
set in ODB /Experiment/Buffer Sizes. MIDAS event buffer size is limited to 2 GBytes.)
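
For example, from the command line (odbedit "set" syntax; the exact key name may differ in your
ODB, and the new size typically only takes effect once the buffer's shared memory is recreated):

   odbedit -c 'set "/Experiment/Buffer Sizes/SYSTEM" 41943040'    # 40 Mbytes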

K.O.
Entry  09 Nov 2021, Hunter Lowe, Forum, MityCAMAC Login  
Hello all,

I've recently acquired a MityCAMAC system that was built at TRIUMF and I'm 
having issues accessing it over ethernet.

The system: Ubuntu VM inside Windows 10 machine.

I've tried reconfiguring the network settings for the VM but nmap and arp/ip 
commands have yielded me no results in finding the crate controller. 

I was getting help from Pierre Amaudruz but I think he is now busy for some 
time. I have the mac address of the crate controller and its name. The 
controller seems to initialize fine inside of the CAMAC crate. The windows side 
of the workstation also tells me that an unknown network is in fact connected.

I suspect I either need to do something with an ssh key (which I thought we 
accomplished but maybe not), or perhaps the domain name in the controller needs 
to be changed. 

If anybody has experience working with MityARM I would appreciate any advice I 
could get.

Best,
Hunter Lowe
UNBC Graduate Physics
    Reply  11 Nov 2021, Thomas Lindner, Forum, MityCAMAC Login  
Hi Hunter 

This sounds like a TRIUMF-specific problem,
not a MIDAS problem. Please email me directly
and we can try to solve it.

Thomas Lindner
TRIUMF DAQ


> Hello all,
> 
> I've recently acquired a MityCAMAC system that was built at TRIUMF and I'm 
> having issues accessing it over ethernet.
> 
> The system: Ubuntu VM inside Windows 10 machine.
> 
> I've tried reconfiguring the network settings for the VM but nmap and arp/ip 
> commands have yielded me no results in finding the crate controller. 
> 
> I was getting help from Pierre Amaudruz but I think he is now busy for some 
> time. I have the mac address of the crate controller and its name. The 
> controller seems to initialize fine inside of the CAMAC crate. The windows side 
> of the workstation also tells me that an unknown network is in fact connected.
> 
> I suspect I either need to do something with an ssh key (which I thought we 
> accomplished but maybe not), or perhaps the domain name in the controller needs 
> to be changed. 
> 
> If anybody has experience working with MityARM I would appreciate any advice I 
> could get.
> 
> Best,
> Hunter Lowe
> UNBC Graduate Physics
       Reply  26 Jan 2022, Konstantin Olchanski, Info, MityCAMAC Login  
For those curious about CAMAC controllers, this one was built around 2014 to 
replace the aging CAMAC A1/A2 controllers (parallel and serial) in the TRIUMF 
cyclotron controls system (around 50 CAMAC crates). It implements both the main
and the auxiliary controller modes (single-width and double-width modules).

The design predates Altera Cyclone-5 SoC and has separate
ARM processor (TI 335x) and Cyclone-4 FPGA connected by GPMC bus.

The ARM processor boots a Linux kernel and CentOS-7 userland from an SD card;
the FPGA boots from its own EPCS flash.

A user program running on the ARM processor (e.g. a MIDAS frontend)
initiates CAMAC operations and the FPGA executes them. Quite simple.

K.O.
Entry  16 Dec 2021, Zaher Salman, Forum, Device driver for modbus 
Dear all, does anyone have an example of a device driver using modbus or modbus tcp to communicate with a device, and is willing to share it? Thanks.
    Reply  26 Jan 2022, Konstantin Olchanski, Forum, Device driver for modbus 
> Dear all, does anyone have an example of a device driver using modbus or modbus tcp to communicate with a device, and is willing to share it? Thanks.

I have not seen any modbus devices recently, so all my code and examples are quite old.

Basic modbus/tcp communication driver is in the midas repo:

daq00:midas$ find . | grep -i modbus
./drivers/divers/ModbusTcp.cxx
./drivers/divers/ModbusTcp.h
daq00:midas$ 

This driver worked for communication to a modbus PLC (T2K/ND280/TPC experiment in Japan).

An example program to use this driver and test modbus communication is here:
https://bitbucket.org/expalpha/agdaq/src/master/src/modbus.cxx

Because, in the end, we do not have any modbus devices in any recent experiment,
I do not have an example of using this driver in a midas frontend. Sorry.
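
For reference, a minimal Modbus/TCP "read holding registers" (function 03) exchange looks like
the sketch below. This is generic socket code, not the ModbusTcp class mentioned above, and all
names are made up for illustration; a real implementation should loop on read() until the full
response arrives and check the transaction id:

   #include <arpa/inet.h>
   #include <netinet/in.h>
   #include <sys/socket.h>
   #include <unistd.h>
   #include <cstdint>

   // Read 'count' holding registers starting at 'addr' from host:502, unit id 1.
   // Returns the number of registers read, or -1 on error.
   int modbus_read_holding(const char* host, uint16_t addr, uint16_t count, uint16_t* out)
   {
      int fd = socket(AF_INET, SOCK_STREAM, 0);
      if (fd < 0) return -1;

      sockaddr_in sa{};
      sa.sin_family = AF_INET;
      sa.sin_port   = htons(502);                  // standard Modbus/TCP port
      inet_pton(AF_INET, host, &sa.sin_addr);
      if (connect(fd, (sockaddr*)&sa, sizeof(sa)) < 0) { close(fd); return -1; }

      // MBAP header (7 bytes) + PDU: function 03, start address, register count
      uint8_t req[12] = {
         0x00, 0x01,                               // transaction id
         0x00, 0x00,                               // protocol id (always 0)
         0x00, 0x06,                               // length: unit id + 5-byte PDU
         0x01,                                     // unit id
         0x03,                                     // function code 03
         uint8_t(addr  >> 8), uint8_t(addr  & 0xFF),
         uint8_t(count >> 8), uint8_t(count & 0xFF)
      };
      if (write(fd, req, sizeof(req)) != (ssize_t)sizeof(req)) { close(fd); return -1; }

      // Response: MBAP (7) + function (1) + byte count (1) + 2 bytes per register
      uint8_t resp[7 + 2 + 2*125];
      ssize_t n = read(fd, resp, sizeof(resp));
      close(fd);
      if (n < 9 || resp[7] != 0x03) return -1;     // short read or exception reply

      int nreg = resp[8] / 2;
      for (int i = 0; i < nreg && i < count; i++)
         out[i] = uint16_t((resp[9 + 2*i] << 8) | resp[10 + 2*i]);
      return nreg;
   }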

K.O.
Entry  19 Nov 2021, Jacob Thorne, Forum, Sequencer error with ODB Inc 
Hi,

I am having problems with the midas sequencer; here is my code:

1  COMMENT "Example to move a Standa stage"
2  RUNDESCRIPTION "Example movement sequence - each run is one position of a single stage
3  
4  PARAM numRuns
5  PARAM sequenceNumber
6  PARAM RunNum
7  
8  PARAM positionT2
9  PARAM deltapositionT2
10 
11 ODBSet "/Runinfo/Run number", $RunNum
12 ODBSet "/Runinfo/Sequence number", $sequenceNumber
13 
14 ODBSet "/Equipment/Neutron Detector/Settings/Detector/Type of Measurement", 2
15 ODBSet "/Equipment/Neutron Detector/Settings/Detector/Number of Time Bins", 10
16 ODBSet "/Equipment/Neutron Detector/Settings/Detector/Number of Sweeps", 1
17 ODBSet "/Equipment/Neutron Detector/Settings/Detector/Dwell Time", 100000
18 
19 ODBSet "/Equipment/MTSC/Settings/Devices/Stage 2 Translation/Device Driver/Set Position", $positionT2
20 
21 LOOP $numRuns
22  WAIT ODBvalue, "/Equipment/MTSC/Settings/Devices/Stage 2 Translation/Ready", ==, 1
23  TRANSITION START
24  WAIT ODBvalue, "/Equipment/Neutron Detector/Statistics/Events sent", >=, 1
25  WAIT ODBvalue, "/Runinfo/State", ==, 1
26  WAIT ODBvalue, "/Runinfo/Transition in progress", ==, 0
27  TRANSITION STOP
28  ODBInc "/Equipment/MTSC/Settings/Devices/Stage 2 Translation/Device Driver/Set Position", $deltapositionT2
29 
30 ENDLOOP
31 
32 ODBSet "/Runinfo/Sequence number", 0

The issue comes with line 28: the ODBInc does not work; regardless of what number I put in, I get the following error:

[Sequencer,ERROR] [odb.cxx:7046:db_set_data_index1,ERROR] "/Equipment/MTSC/Settings/Devices/Stage 2 Translation/Device Driver/Set Position" invalid element data size 32, expected 4

I don't see why this should happen; the format is correct and the number that I input is an int.

Sorry if this is a basic question.

Jacob
    Reply  02 Dec 2021, Stefan Ritt, Forum, Sequencer error with ODB Inc 
Thanks for reporting that bug. Indeed there was a problem in the sequencer code which I fixed now. Please try the updated develop branch.

Stefan
Entry  29 Oct 2021, Frederik Wauters, Bug Report, midas::odb::iterator + operator 
I have a 16-element array odb key

{"FIR Energy", {
            {"Energy Gap Value", std::array<uint32_t,16>(10) },

I can get the maximum of this array like 


uint32_t max_value = *std::max_element(values.begin(),values.end());

but when I need the maximum of a sub range

uint32_t max_value = *std::max_element(values.begin(),values.begin()+4);

I get

/home/labor/new_daq/frontends/SIS3316Module.cpp:584:62: error: no match for ‘operator+’ (operand types are ‘midas::odb::iterator’ and ‘int’)
  584 |   max_value = *std::max_element(values.begin(),values.begin()+4);
      |                                                ~~~~~~~~~~~~~~^~
      |                                                            |  |
      |                                                            |  int
      |               

As the + operator is overloaded for midas::odb::iterator, I expected this to work.

(and yes, I can find the max element by accessing the elements one by one)
    Reply  29 Oct 2021, Frederik Wauters, Bug Report, midas::odb::iterator + operator | work around 
ok, so retrieving it as a std::array (as it was defined) does not work

    std::array<uint32_t,16> avalues = settings["FIR Energy"]["Energy Gap Value"];

but retrieving it as a std::vector does, and then I have a standard C++ iterator which I can use with std algorithms

    std::vector<uint32_t> values = settings["FIR Energy"]["Energy Gap Value"];
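
Putting the two together (same key names as above, just a sketch), the sub-range maximum then
compiles, because std::vector iterators are random access:

    #include <algorithm>
    #include <vector>

    std::vector<uint32_t> values = settings["FIR Energy"]["Energy Gap Value"];
    uint32_t max_value = *std::max_element(values.begin(), values.begin() + 4);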

> I have a 16-element array odb key
> 
> {"FIR Energy", {
>             {"Energy Gap Value", std::array<uint32_t,16>(10) },
> 
> I can get the maximum of this array like 
> 
> 
> uint32_t max_value = *std::max_element(values.begin(),values.end());
> 
> but when I need the maximum of a sub range
> 
> uint32_t max_value = *std::max_element(values.begin(),values.begin()+4);
> 
> I get
> 
> /home/labor/new_daq/frontends/SIS3316Module.cpp:584:62: error: no match for ‘operator+’ (operand types are ‘midas::odb::iterator’ and ‘int’)
>   584 |   max_value = *std::max_element(values.begin(),values.begin()+4);
>       |                                                ~~~~~~~~~~~~~~^~
>       |                                                            |  |
>       |                                                            |  int
>       |               
> 
> As the + operator is overloaded for midas::odb::iterator, I expected this to work.
> 
> (and yes, I can find the max element by accessing the elements one by one)
Entry  25 Oct 2021, Francesco Renga, Forum, Logger crash 
Hello,
     I'm experiencing crashes of the mlogger program on the time scale of a couple 
of days. The only messages from MIDAS are:

05:34:47.336 2021/10/24 [mhttpd,INFO] Client 'Logger' (PID 14281) on database 
'ODB' removed by db_cleanup called by cm_periodic_tasks (idle 10.2s,TO 10s)
05:34:47.335 2021/10/24 [mhttpd,INFO] Client 'Logger' on buffer 'SYSMSG' removed 
by cm_periodic_tasks (idle 10.2s, timeout 10s)

Any suggestion to further investigate this issue?

Thank you very much,
         Francesco
    Reply  25 Oct 2021, Stefan Ritt, Forum, Logger crash 
The short term solution would be to increase the logger timeout in the ODB under

/Programs/Logger/Watchdog timeout

and set it to 60000 (one minute). But that is curing just the symptoms. It would be 
interesting to understand the cause of this error. Probably the logger takes more than 10 
seconds to start or stop the run. The reason could be that the history grew too big (which is 
what we have right now in MEG II), or some disk problems. But that needs detailed debugging on 
the logger side.
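
For example (odbedit "set" syntax, value in milliseconds):

   odbedit -c 'set "/Programs/Logger/Watchdog timeout" 60000'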

Stefan