Back Midas Rome Roody Rootana
  Midas DAQ System, Page 12 of 47  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
Entry  12 Apr 2021, Isaac Labrie Boulay, Forum, Client gets immediately removed when using a script button. logicCtrl.cppstart_daq.PNG
Hi all,

I'm running into a curious problem when I try to run a program using my custom 
script button. I have been using a script button to start my DAQ, this button 
has always worked. It starts by exporting an absolute path to scripts and then 
runs scripts, my frontend, my analyzer, and mlogger relative to this path.

I recently added a line of code to run a new script "logic_controller". If I run 
the script_daq from my terminal (./start_daq), mhttpd accepts the client and the 
program works as intended. But, if I use the script button, the logic_controller 
program is immediately deleted by MIDAS. It can be seen appearing in the status 
page clients list and then immediately gets deleted. This is a client that runs 
on the local experiment host.

What might be the issue? What is the difference between running the script 
through the terminal as opposed to running it through the mhttpd button?

I have added a picture of my simple script and the logic_controller code.

Any help would be greatly appreciated.

Cheers.

Isaac
    Reply  12 Apr 2021, Ben Smith, Forum, Client gets immediately removed when using a script button. 
> if I use the script button, the logic_controller program is immediately deleted by MIDAS.

This is indeed very curious, and I can't reproduce it on my test experiment. Can you redirect stdout and stderr from the logic_controller program into a file, to see how far the program gets? If it gets to the while loop at the end, then it would be useful to add some debug statements to see what condition causes it to exit the loop.

Are there any relevant messages in the midas message log about the program being killed? What's the value of "/Programs/logic_controller/Watchdog timeout"? 
       Reply  12 Apr 2021, Isaac Labrie Boulay, Forum, Client gets immediately removed when using a script button. debug_logic_controller.txt
> > if I use the script button, the logic_controller program is immediately deleted by MIDAS.
> 
> This is indeed very curious, and I can't reproduce it on my test experiment. Can you redirect stdout and stderr from the logic_controller program into a file, to see how far the program gets? If it gets to the while loop at the end, then it would be useful to add some debug statements to see what condition causes it to exit the loop.

I have redirected stdout and stderr into a text file and I have attached it to this entry. From what the stdout says, it seems that the lambda
function gets called 4 times before the program disconnects from the experiment. Somehow the status must become SS_ABORT or RPC_SHUTDOWN.

> Are there any relevant messages in the midas message log about the program being killed? What's the value of "/Programs/logic_controller/Watchdog timeout"? 

There are no interesting messages in the midas.log and "/Programs/logic_controller/Watchdog timeout" is 10000 when I run the command from the terminal window.
What happens when you run it on your test experiment?

I'll try some more debugging.

Thanks for helping me out! Cheers.

Isaac
          Reply  12 Apr 2021, Ben Smith, Forum, Client gets immediately removed when using a script button. 
I think it would be useful to find the minimal example that exhibits this behaviour.

What happens if your logic controller code is simply the 17 lines below? What happens if you create another script button that only starts the logic controller, not any of the other programs? etc. Gradually re-add features until you hit the problem (or scream in horror if it breaks with 17 lines of C++ and a 1 line shell script).



#include "midas.h"
#include "stdio.h"

int main() {
   cm_connect_experiment("", "", "logic_controller", NULL);

   do {
     int status = cm_yield(100);
     printf("cm_yield returned %d\n", status);
     if (status == SS_ABORT || status == RPC_SHUTDOWN)
       break;
   } while (!ss_kbhit());

   cm_disconnect_experiment();

   return 0;
}
             Reply  13 Apr 2021, Isaac Labrie Boulay, Forum, Client gets immediately removed when using a script button. 
> I think it would be useful to find the minimal example that exhibits this behaviour.
> 
> What happens if your logic controller code is simply the 17 lines below? What happens if you create another script button that only starts the logic controller, not any of the other programs? etc. Gradually re-add features until you hit the problem (or scream in horror if it breaks with 17 lines of C++ and a 1 line shell script).
> 

Hi Ben,

I have followed your suggestions and the program still stops immediately. My status as returned from "cm_yield(100)" is always 412 (SS_TIMEOUT) which is fine. 
The issue is that, when run with the script button, the do-wile loop stops immediately because the !ss_kbhit() always evaluates to FALSE.

My temporary solution has been to let the loop run forever :)

Let me know what think. Thanks again!

Isaac

> 
> 
> #include "midas.h"
> #include "stdio.h"
> 
> int main() {
>    cm_connect_experiment("", "", "logic_controller", NULL);
> 
>    do {
>      int status = cm_yield(100);
>      printf("cm_yield returned %d\n", status);
>      if (status == SS_ABORT || status == RPC_SHUTDOWN)
>        break;
>    } while (!ss_kbhit());
> 
>    cm_disconnect_experiment();
> 
>    return 0;
> }
                Reply  13 Apr 2021, Stefan Ritt, Forum, Client gets immediately removed when using a script button. 
> I have followed your suggestions and the program still stops immediately. My status as returned from "cm_yield(100)" is always 412 (SS_TIMEOUT) which is fine. 
> The issue is that, when run with the script button, the do-wile loop stops immediately because the !ss_kbhit() always evaluates to FALSE.
> 
> My temporary solution has been to let the loop run forever :)

Ahh, could be that ss_kbhit() misbehaves if there is no keyboard, meaning that it is started in the background as a script. 
We never had the issue before, since all "standard" midas programs like mlogger, mhttpd etc. also use ss_kbhit() and they 
can be started in the background via the "-D" flag, but maybe the stdin is then handled differentlhy. 

So just remove the ss_kbhit(), but keep the break, so that you can stop your program via the web page, like

#include "midas.h"
#include "stdio.h"

int main() {
  cm_connect_experiment("", "", "logic_controller", NULL);

  do {
    int status = cm_yield(100);
    printf("cm_yield returned %d\n", status);
    if (status == SS_ABORT || status == RPC_SHUTDOWN)
      break;
  } while (TRUE);

  cm_disconnect_experiment();

  return 0;
}
Entry  04 Apr 2021, Konstantin Olchanski, Info, bk_init32a data format 
In April 4th 2020 Stefan added a new data format that fixes the well known problem with alternating banks being 
misaligned against 64-bit addresses. (cannot find announcement on this forum. midas commit 
https://bitbucket.org/tmidas/midas/commits/541732ea265edba63f18367c7c9b8c02abbfc96e)

This brings the number of midas data formats to 3:

bk_init: bank_header_flags set to 0x0000001 (BANK_FORMAT_VERSION)
bk_init32: bank_header_flags set to 0x0000011 (BANK_FORMAT_VERSION | BANK_FORMAT_32BIT)
bk_init32a: bank_header_flags set to 0x0000031 (BANK_FORMAT_VERSION | BANK_FORMAT_32BIT | BANK_FORMAT_64BIT_ALIGNED;

TMEvent (midasio and manalyzer) support for "bk_init32a" format added today (commit 
https://bitbucket.org/tmidas/midasio/commits/61b7f07bc412ea45ed974bead8b6f1a9f2f90868)

TMidasEvent (rootana) support for "bk_init32a" format added today (commit 
https://bitbucket.org/tmidas/rootana/commits/3f43e6d30daf3323106a707f6a7ca2c8efb8859f)

ROOTANA should be able to handle bk_init32a() data now.

TMFE MIDAS c++ frontend switched from bk_init32() to bk_init32a() format (midas commit 
https://bitbucket.org/tmidas/midas/commits/982c9c2f8b1e329891a782bcc061d4c819266fcc)

K.O.
    Reply  13 Apr 2021, Konstantin Olchanski, Info, bk_init32a data format 
Until commit a4043ceacdf241a2a98aeca5edf40613a6c0f575 today, mdump mostly did not work with bank32a data.
K.O.


> In April 4th 2020 Stefan added a new data format that fixes the well known problem with alternating banks being 
> misaligned against 64-bit addresses. (cannot find announcement on this forum. midas commit 
> https://bitbucket.org/tmidas/midas/commits/541732ea265edba63f18367c7c9b8c02abbfc96e)
> 
> This brings the number of midas data formats to 3:
> 
> bk_init: bank_header_flags set to 0x0000001 (BANK_FORMAT_VERSION)
> bk_init32: bank_header_flags set to 0x0000011 (BANK_FORMAT_VERSION | BANK_FORMAT_32BIT)
> bk_init32a: bank_header_flags set to 0x0000031 (BANK_FORMAT_VERSION | BANK_FORMAT_32BIT | BANK_FORMAT_64BIT_ALIGNED;
> 
> TMEvent (midasio and manalyzer) support for "bk_init32a" format added today (commit 
> https://bitbucket.org/tmidas/midasio/commits/61b7f07bc412ea45ed974bead8b6f1a9f2f90868)
> 
> TMidasEvent (rootana) support for "bk_init32a" format added today (commit 
> https://bitbucket.org/tmidas/rootana/commits/3f43e6d30daf3323106a707f6a7ca2c8efb8859f)
> 
> ROOTANA should be able to handle bk_init32a() data now.
> 
> TMFE MIDAS c++ frontend switched from bk_init32() to bk_init32a() format (midas commit 
> https://bitbucket.org/tmidas/midas/commits/982c9c2f8b1e329891a782bcc061d4c819266fcc)
> 
> K.O.
Entry  22 Sep 2020, Frederik Wauters, Forum, INT INT32 in experim.h 
For my analyzer I generate the experim.h file from the odb.

Before midas commit 13c3b2b this generates structs with INT data types. compiles fine with my analysis code (using the old mana.cpp)

newer midas versions generate INT32, ... types. I get a 

‘INT32’ does not name a type   

although I include midas.h 

how to fix this?
    Reply  22 Sep 2020, Konstantin Olchanski, Forum, INT INT32 in experim.h 
> For my analyzer I generate the experim.h file from the odb.
> 
> Before midas commit 13c3b2b this generates structs with INT data types. compiles fine with my analysis code (using the old mana.cpp)
> 
> newer midas versions generate INT32, ... types. I get a 
> 
> ‘INT32’ does not name a type   
> 
> although I include midas.h 
> 
> how to fix this?

You could run experim.h through "sed" to replace the "wrong" data types with the correct data types.

You can also #define the "wrong" data types before doing #include experim.h.

I put your bug report into our bug tracker, but for myself I am very busy
with the alpha-g experiment and cannot promise to fix this quickly.

https://bitbucket.org/tmidas/midas/issues/289/int32-types-in-experimh

Here is an example to substitute things using "sed" (it can also do "in-place" editing, "man sed" and google sed examples)
sed "sZshm_unlink(.*)Zshm_unlink(SHM)Zg"

K.O.
       Reply  09 Mar 2021, Andreas Suter, Forum, INT INT32 in experim.h 
> > For my analyzer I generate the experim.h file from the odb.
This issue is still open. Shouldn't midas.h provide the 'new' data types as typedefs like  

typedef int INT32;

etc. Of course you would need to deal with all the supported targets and wrap it accordingly.

A.S.

> > 
> > Before midas commit 13c3b2b this generates structs with INT data types. compiles fine with my analysis code (using the old mana.cpp)
> > 
> > newer midas versions generate INT32, ... types. I get a 
> > 
> > ‘INT32’ does not name a type   
> > 
> > although I include midas.h 
> > 
> > how to fix this?
> 
> You could run experim.h through "sed" to replace the "wrong" data types with the correct data types.
> 
> You can also #define the "wrong" data types before doing #include experim.h.
> 
> I put your bug report into our bug tracker, but for myself I am very busy
> with the alpha-g experiment and cannot promise to fix this quickly.
> 
> https://bitbucket.org/tmidas/midas/issues/289/int32-types-in-experimh
> 
> Here is an example to substitute things using "sed" (it can also do "in-place" editing, "man sed" and google sed examples)
> sed "sZshm_unlink(.*)Zshm_unlink(SHM)Zg"
> 
> K.O.
          Reply  10 Mar 2021, Stefan Ritt, Forum, INT INT32 in experim.h 
Ok, I added

/* define integer types with explicit widths */
#ifndef NO_INT_TYPES_DEFINE
typedef unsigned char      UINT8;
typedef char               INT8;
typedef unsigned short     UINT16;
typedef short              INT16;
typedef unsigned int       UINT32;
typedef int                INT32;
typedef unsigned long long UINT64;
typedef long long          INT64;
#endif

to cover all new types. If there is a collision with user defined types, compile your program with -DNO_INT_TYPES_DEFINE and you remove the 
above definition. I hope there are no other conflicts.

Stefan
             Reply  15 Mar 2021, Frederik Wauters, Forum, INT INT32 in experim.h 
works!

> Ok, I added
> 
> /* define integer types with explicit widths */
> #ifndef NO_INT_TYPES_DEFINE
> typedef unsigned char      UINT8;
> typedef char               INT8;
> typedef unsigned short     UINT16;
> typedef short              INT16;
> typedef unsigned int       UINT32;
> typedef int                INT32;
> typedef unsigned long long UINT64;
> typedef long long          INT64;
> #endif
> 
> to cover all new types. If there is a collision with user defined types, compile your program with -DNO_INT_TYPES_DEFINE and you remove the 
> above definition. I hope there are no other conflicts.
> 
> Stefan
                Reply  30 Mar 2021, Konstantin Olchanski, Forum, INT INT32 in experim.h 
> > 
> > /* define integer types with explicit widths */
> > #ifndef NO_INT_TYPES_DEFINE
> > typedef unsigned char      UINT8;
> > typedef char               INT8;
> > typedef unsigned short     UINT16;
> > typedef short              INT16;
> > typedef unsigned int       UINT32;
> > typedef int                INT32;
> > typedef unsigned long long UINT64;
> > typedef long long          INT64;
> > #endif
> > 

NIH at work. In C and C++ the standard fixed bit length data types are available
in #include <stdint.h> as uint8_t, uint16_t, uint32_t, uint64_t & co.

BTW, the definition of UINT32 as "unsigned int" is technically incorrect, on 16-bit machines
"int" is 16-bit wide and on some 64-bit machines "int" is 64-bit wide.

K.O.
Entry  05 Mar 2021, Svetlana Chesnevskaya, Bug Report, New MIDAS old frontend incompatibility error.log
Hello!

Could you help me solve the problem of compatibility between our frontend (created in 2017) and the fresh MIDAS? The old MIDAS (2017) worked well, then we did not use it.
While compiling the frontend, I get a lot of warnings and a few compilation errors.

Any help will be greatly appreciated.

Thanks in advance.
With the best regards,
Svetlana
Entry  01 Mar 2021, Marius Koeppel, Forum, Using JSROOT.openFile with Midas 
Hi everyone,

I am currently trying to access a ROOT file produced by manalyzer. By calling JSROOT.openFile("MIDAS_DOMAIN/outputRUN.root"). I can download the rootfile via MIDAS_DOMAIN/outputRUN.root. Using JSROOT.openFile results in an 501 error, 
since the request feature is not provided. Using a simple API and uploading outputRUN.root there worked fine (when the run finised). 

Is there a way to use JSROOT.openFile with the current analyzed root file in Midas (so during the run)? I know that one can access histograms of the THttpServer via JSON but I need to get the full root tree.

Cheers,
Marius
    Reply  03 Mar 2021, Konstantin Olchanski, Forum, Using JSROOT.openFile with Midas 
> 
> I am currently trying to access a ROOT file produced by manalyzer. By calling JSROOT.openFile("MIDAS_DOMAIN/outputRUN.root"). I can download the rootfile via MIDAS_DOMAIN/outputRUN.root. Using JSROOT.openFile results in an 501 error, 
> since the request feature is not provided. Using a simple API and uploading outputRUN.root there worked fine (when the run finised). 
> 
> Is there a way to use JSROOT.openFile with the current analyzed root file in Midas (so during the run)? I know that one can access histograms of the THttpServer via JSON but I need to get the full root tree.
> 

Good questions. Right now in manalyzer I do not do anything more than starting the ROOT web server (so whatever
they support should work) and providing two "standard" location: one in the output file for histograms
and other permanent output and one in memory for transient objects, such as waveform plots, etc.

At some point I would like to provide a function to "get" TAFlowEvent objects so you can do things
like event displays in javascript. But I need a c++ to json serializer and standard c++ does not have it.
So I will have to use the clang serializer (also used by ROOT) and it will take me a few days
to figure it out.

Back to "openFile".

If you figure out the missing bits that need to be added to our code,
please post them here or submit them as a pull request or a bug report in bitbucket.

Also it would be good if you can provide a code example of "openFile" working elsewhere
but not with manalyzer, if I can run it, maybe I can figure out what's missing. But lacking
some example code, there is nothing for me to hack at.

K.O.
       Reply  04 Mar 2021, Marius Koeppel, Forum, Using JSROOT.openFile with Midas test.htmlexample_root.root
Thank you for the answer :)

> At some point I would like to provide a function to "get" TAFlowEvent objects so you can do things
> like event displays in javascript. But I need a c++ to json serializer and standard c++ does not have it.
> So I will have to use the clang serializer (also used by ROOT) and it will take me a few days
> to figure it out.

That sounds exactly what I was searching for. Because I wanted to create an interface between rootana and 
an event display build in javascript. Since all I tried did not really worked with the current rootana
I have now a "solution". I use the MIDAS python client, read directly events from the MIDAS buffer and provided
the events in a JSON format with the python flask API. Since the rendering of the event display is the bottleneck 
and I only need a view events to display this solution worked really well for me. Maybe having such a JSON API of 
the event buffer in MIDAS directly would also work for most of the event display applications or other simple javescript 
applications (my opinion).

> If you figure out the missing bits that need to be added to our code,
> please post them here or submit them as a pull request or a bug report in bitbucket.

One of the problems I had was the CORS domain of the THttpServer. In manalyzer:1894 you do 
sprintf(str, "http:127.0.0.1:%d", httpPort); but there are additional options for the THttpServer (like "?cors=DOMAIN").
So maybe a flag while starting manalyzer passing such options would be nice. I will create a pull request passing them later.

> Also it would be good if you can provide a code example of "openFile" working elsewhere
> but not with manalyzer, if I can run it, maybe I can figure out what's missing. But lacking
> some example code, there is nothing for me to hack at.

The problem is that it was not even running on simple THttpServer using interactive root:

    serv = new THttpServer("http:8088?cors=*");
    TFile *_file0 = TFile::Open("example_root.root")
    serv->Register("File", _file0);

So I tried just saving the file in the $MIDAS_DIR and tried to use mserver with JSROOT.openFile. I attached the html file and 
a test root file.

Cheers,
Marius
          Reply  04 Mar 2021, Konstantin Olchanski, Forum, Using JSROOT.openFile with Midas 
well, if this is something in ROOT, perhaps you can pursue it with the ROOT crowd,
they are quite friendly.

on my side, if all you need is to pull event data banks, this is easy to add
in mhttpd.

the jsonrpc request will look something like this:

get_event {
"buffer":"system",
"get_type":"GET_LATEST", (or whatever bm_receive_event() can do)
"include_banks":["AAAA","BBBB"],
"exclude_banks":["CCCC","DDDD"]
}

and return something like this:

event {
"header":{"event_id":1,...},
"banks":{
"AAAA":[1,2,3,4],
"BBBB":NULL (you asked for it, so you always get it, but it is NULL if bank does not exist)
}
}

would this work for what you are doing?

(this is not good enough if data has to be pre-digested by c++ analysis in rootana)

K.O.
             Reply  04 Mar 2021, Stefan Ritt, Forum, Using JSROOT.openFile with Midas 
I also need midas events going back to the browser for single event display, so put +1 for me.

Please also consider to use JavaScript typed arrays instead of JSON. For large midas banks, type 
arrays are 5-10 times faster than JSON encoding/decoding.

Best,
Stefan
             Reply  04 Mar 2021, Marius Koeppel, Forum, Using JSROOT.openFile with Midas 
> would this work for what you are doing?

Yes, having such a function would be perfect for the applications I have a the moment.

> (this is not good enough if data has to be pre-digested by c++ analysis in rootana)

Also agree, if one wants to have a more sophisticated applications it is definitely needed to preprocess the data.

Cheers,
Marius
Entry  25 Feb 2021, Isaac Labrie Boulay, Bug Report, Undefined client causing issues in transition. error_message.PNGundefined_client.PNG
Hi all,

I'm currently experiencing an issue during run transitions. It comes in the form 
of an alert saying "TypeError: Cannot read property 'length' of undefined" 
whenever I'm in the "transition" window on mhttpd. I have attached an image of 
what the transition window looks like when this happens. 

By the looks of it and by peering at the lines in transition.html where the 
error occurs, it's pretty obvious that there is some strange undefined client 
that the web page tries to access.

I don't know how to find what this client is. Is there a way to see it in the 
ODB? 

The issues happens in show_client() of transition.html (called by callback()). 
Here's the trace:

Uncaught (in promise) TypeError: Cannot read property 'length' of undefined
    at show_client (?cmd=Transition:227)
    at callback (?cmd=Transition:420)
    at ?cmd=Transition:430

Any help would be very appreciated!

Thanks so much.

Isaac
    Reply  25 Feb 2021, Konstantin Olchanski, Bug Report, Undefined client causing issues in transition. 
Clearly something goes wrong with the STARTABORT transition. Actually from your 
sceenshot, it is not clear why the STARTABORT transition was initiated.

Usually it is called after some client fails the "start run" transition to inform 
other clients that the run did not start after all. (mlogger uses this to close the 
output file, etc).

But in the screenshot, we do not see any client fail the transition (only rootana1 
was called, and it returned "green").

So, a puzzle. One possibility is that the transition code gets so confused
that it does not record correct transition data to ODB, then the web page
gets even more confused.

One way to see what happens, is to run the odbedit command "start now -v".

Can you try that? And attach all it's output here?

K.O.
       Reply  26 Feb 2021, Isaac Labrie Boulay, Bug Report, Undefined client causing issues in transition. start_now_-v_(1).PNGstop.PNG
> Clearly something goes wrong with the STARTABORT transition. Actually from your 
> sceenshot, it is not clear why the STARTABORT transition was initiated.
> 
> Usually it is called after some client fails the "start run" transition to inform 
> other clients that the run did not start after all. (mlogger uses this to close the 
> output file, etc).
> 
> But in the screenshot, we do not see any client fail the transition (only rootana1 
> was called, and it returned "green").
> 
> So, a puzzle. One possibility is that the transition code gets so confused
> that it does not record correct transition data to ODB, then the web page
> gets even more confused.
> 
> One way to see what happens, is to run the odbedit command "start now -v".
> 
> Can you try that? And attach all its output here?
> 
> K.O.

Thanks for getting back to me right away. I've attached two screenshots. The first one 
is the output after running "start now -v" (everything seemed to work nicely there), the 
second output is after using odbedit to stop the run with "stop". Notice that the DAQ 
never stops because it gets stuck in between transitions (You can see the run status 
being "stopping run" with the cancel transition button).

Thanks.

Isaac
          Reply  26 Feb 2021, Konstantin Olchanski, Bug Report, Undefined client causing issues in transition. 
So there is no error on run start anymore? To debug the stuck run stop, please use "stop -v" 
to see where it got stuck. You can also play with the RPC timeouts (the connect timeout and 
the response timeout), to make it get "unstuck" quicker. Definitely it should not be stuck 
forever, it should timeout at maximum of "rpc timeout * number of clients". K.O.
             Reply  26 Feb 2021, Isaac Labrie Boulay, Bug Report, Undefined client causing issues in transition. 
> So there is no error on run start anymore? To debug the stuck run stop, please use "stop -v" 
> to see where it got stuck. You can also play with the RPC timeouts (the connect timeout and 
> the response timeout), to make it get "unstuck" quicker. Definitely it should not be stuck 
> forever, it should timeout at maximum of "rpc timeout * number of clients". K.O.

You're right it does not stay stuck forever, it eventually gets unstuck. I forgot to mention this. 
I will try to play with these timeout parameters. It does not get stuck if I run my DAQ using the 
odbedit commands (start/stop). I don't know if this is relevant information that could help us 
identify the problem.

Thanks for all your help as always!

Isaac
                Reply  03 Mar 2021, Konstantin Olchanski, Bug Report, Undefined client causing issues in transition. 
> It does not get stuck if I run my DAQ using the odbedit commands (start/stop).

Interesting. Run start/stop from odbedit works but from mhttpd gets stuck.

I think they do not run the transition quite the same way. mhttpd uses the multithreaded transition.

So we can debug this using the "mtransition" program. Try:

- mtransition -v -d 1 START/STOP -- this should be same as odbedit
- mtransition -m -v -d 1 START/STOP -- this should be same as mhttpd

The "-v" and "-d 1" flags should cause lots of output, for failed transitions,
cut-and-paste it all into this elog here, should give us plenty of meat to debug.

K.O.
Entry  02 Mar 2021, Konstantin Olchanski, Info, shortest possible sleep 
since I am implementing a polled equipment, I was curious what is the smallest possible sleep time on current computers.

in current UNIX, there are 2 system calls available for sleeping: select() (with microsecond granularity) and nanosleep() (with nanosecond granularity).

So I wrote a little test program to check it out (progs/test_sleep).

First, Linux result using select(). Typical run on AMD 3700X CPU (4.1 GHz turbo boost) with Ubuntu LTS 20, linux kernel 5.8:

daq13:midas$ ./bin/test_sleep 
sleep      10 loops, 0.100000 sec per loop, 1.000000 sec total,  1003368.855 usec actual, 100336.885 usec actual per loop, oversleep 336.885 usec, 0.3%
sleep     100 loops, 0.010000 sec per loop, 1.000000 sec total,  1008512.020 usec actual, 10085.120 usec actual per loop, oversleep 85.120 usec, 0.9%
sleep    1000 loops, 0.001000 sec per loop, 1.000000 sec total,  1062137.842 usec actual, 1062.138 usec actual per loop, oversleep 62.138 usec, 6.2%
sleep   10000 loops, 0.000100 sec per loop, 1.000000 sec total,  1528650.999 usec actual, 152.865 usec actual per loop, oversleep 52.865 usec, 52.9%
sleep   99999 loops, 0.000010 sec per loop, 0.999990 sec total,  6250898.123 usec actual, 62.510 usec actual per loop, oversleep 52.510 usec, 525.1%
sleep 1000000 loops, 0.000001 sec per loop, 1.000000 sec total, 54056918.144 usec actual, 54.057 usec actual per loop, oversleep 53.057 usec, 5305.7%
sleep 1000000 loops, 0.000000 sec per loop, 0.100000 sec total,   210875.988 usec actual, 0.211 usec actual per loop, oversleep 0.111 usec, 110.9%
sleep 1000000 loops, 0.000000 sec per loop, 0.010000 sec total,   204804.897 usec actual, 0.205 usec actual per loop, oversleep 0.195 usec, 1948.0%
daq13:midas$ 

How to read this:

First line is 10 sleeps of 100 ms, for a total of 1 sec. this actually sleeps for a bit longer,
average over-sleep is 300 usec out of 100 ms is 0.3%.

Next few lines use progressively shorter sleep, 10 ms, 1 ms and 0.1 ms. over-sleep is consistently around 50-60 usec,
which I conclude to be this linux sleep granularity.

Last two lines try sleep for 0.1 usec and 0.01 usec, resulting in a zero-time sleep of select(),
so we just measure the average time cost of a linux syscall, around 200 ns in this machine.

Going to different machines:

Intel E-2236 (4.8 GHz tutboboost), Ubuntu LTS 20, linux kernel 5.8: over-sleep is 60 usec, zero-sleep is 400 ns.
Intel E-2226G (same, see arc.intel.com), CentOS-7, linux kernel 3.10: over-sleep is 60 usec, zero-sleep is 600 ns.
VME processor (2 GHz Intel T7400), Ubuntu 20, linux kernel 5.8: over-sleep is 60 usec, zero-sleep is 1700 ns.

This is pretty consistent, select() over-sleep is 60 usec on all hardware, zero-sleep tracks CPU GHz ratings.

Next, MacOS result, MacBookAir2020, MacOS 10.15.7, CPU 1.2 GHz i7-1060G7:

4ed0:midas olchansk$ ./bin/test_sleep 
sleep      10 loops, 0.100000 sec per loop, 1.000000 sec total,  1031108.856 usec actual, 103110.886 usec actual per loop, oversleep 3110.886 usec, 3.1%
sleep     100 loops, 0.010000 sec per loop, 1.000000 sec total,  1091104.984 usec actual, 10911.050 usec actual per loop, oversleep 911.050 usec, 9.1%
sleep    1000 loops, 0.001000 sec per loop, 1.000000 sec total,  1270800.829 usec actual, 1270.801 usec actual per loop, oversleep 270.801 usec, 27.1%
sleep   10000 loops, 0.000100 sec per loop, 1.000000 sec total,  1370345.116 usec actual, 137.035 usec actual per loop, oversleep 37.035 usec, 37.0%
sleep   99999 loops, 0.000010 sec per loop, 0.999990 sec total,  1706473.112 usec actual, 17.065 usec actual per loop, oversleep 7.065 usec, 70.6%
sleep 1000000 loops, 0.000001 sec per loop, 1.000000 sec total,  5150341.034 usec actual, 5.150 usec actual per loop, oversleep 4.150 usec, 415.0%
sleep 1000000 loops, 0.000000 sec per loop, 0.100000 sec total,   595654.011 usec actual, 0.596 usec actual per loop, oversleep 0.496 usec, 495.7%
sleep 1000000 loops, 0.000000 sec per loop, 0.010000 sec total,   591560.125 usec actual, 0.592 usec actual per loop, oversleep 0.582 usec, 5815.6%
4ed0:midas olchansk$ 

things are quite different here, OS is Mach microkernel with an oldish FreeBSD UNIX single-server (from NextSTEP),
so the sleep granularity is different, better than linux. zero-sleep still measures the syscall time, 600 ns on this machine.

Next we measure the same using the nansleep() syscall.

daq13:midas$ ./bin/test_sleep 
sleep      10 loops, 0.100000 sec per loop, 1.000000 sec total,  1004133.940 usec actual, 100413.394 usec actual per loop, oversleep 413.394 usec, 0.4%
sleep     100 loops, 0.010000 sec per loop, 1.000000 sec total,  1046117.067 usec actual, 10461.171 usec actual per loop, oversleep 461.171 usec, 4.6%
sleep    1000 loops, 0.001000 sec per loop, 1.000000 sec total,  1096894.979 usec actual, 1096.895 usec actual per loop, oversleep 96.895 usec, 9.7%
sleep   10000 loops, 0.000100 sec per loop, 1.000000 sec total,  1526744.843 usec actual, 152.674 usec actual per loop, oversleep 52.674 usec, 52.7%
sleep   99999 loops, 0.000010 sec per loop, 0.999990 sec total,  6250154.018 usec actual, 62.502 usec actual per loop, oversleep 52.502 usec, 525.0%
sleep 1000000 loops, 0.000001 sec per loop, 1.000000 sec total, 53344123.125 usec actual, 53.344 usec actual per loop, oversleep 52.344 usec, 5234.4%
sleep 1000000 loops, 0.000000 sec per loop, 0.100000 sec total, 52641665.936 usec actual, 52.642 usec actual per loop, oversleep 52.542 usec, 52541.7%
sleep 1000000 loops, 0.000000 sec per loop, 0.010000 sec total, 52637501.001 usec actual, 52.638 usec actual per loop, oversleep 52.628 usec, 526275.0%
daq13:midas$ 

Here everything is simple. sleep longer than 1000 usec works the same as select(), sleep for shorter than 100 usec sleeps for 52 usec, regardless of what 
we ask for.

MacOS does no better, long sleeps are same as select(), sleeps is 1 usec or less sleep for too long. no improvement over select().

4ed0:midas olchansk$ ./bin/test_sleep 
sleep      10 loops, 0.100000 sec per loop, 1.000000 sec total,  1023327.827 usec actual, 102332.783 usec actual per loop, oversleep 2332.783 usec, 2.3%
sleep     100 loops, 0.010000 sec per loop, 1.000000 sec total,  1130330.086 usec actual, 11303.301 usec actual per loop, oversleep 1303.301 usec, 13.0%
sleep    1000 loops, 0.001000 sec per loop, 1.000000 sec total,  1333846.807 usec actual, 1333.847 usec actual per loop, oversleep 333.847 usec, 33.4%
sleep   10000 loops, 0.000100 sec per loop, 1.000000 sec total,  1402330.160 usec actual, 140.233 usec actual per loop, oversleep 40.233 usec, 40.2%
sleep   99999 loops, 0.000010 sec per loop, 0.999990 sec total,  2034706.831 usec actual, 20.347 usec actual per loop, oversleep 10.347 usec, 103.5%
sleep 1000000 loops, 0.000001 sec per loop, 1.000000 sec total,  6646192.074 usec actual, 6.646 usec actual per loop, oversleep 5.646 usec, 564.6%
sleep 1000000 loops, 0.000000 sec per loop, 0.100000 sec total,  7556284.189 usec actual, 7.556 usec actual per loop, oversleep 7.456 usec, 7456.3%
sleep 1000000 loops, 0.000000 sec per loop, 0.010000 sec total, 15720005.035 usec actual, 15.720 usec actual per loop, oversleep 15.710 usec, 157100.1%
4ed0:midas olchansk$ 

On Linux, strace tells us that the actual syscall behind nanosleep() is this:
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=10000}, 0x7fffc159e200) = 0

Let's try it directly... result is the same.
Let's try it with CLOCK_MONOTONIC... result is the same.

The man page of clock_nanosleep() specifies that this syscall always suspends the calling thread,
so what we see here is the Linux scheduler tick size.

Bottom line.

On current linux, shortest sleep is around 100 usec both select() and nanosleep().
On MacOS, shortest sleep is down to 5 usec using select(), but I cannot tell if CPU sleeps or busy-loops.

select() is still the best syscall for sleeping.

K.O.
    Reply  02 Mar 2021, Stefan Ritt, Info, shortest possible sleep 
Why do you need that? Periodic equipment typically runs ever ten seconds or so, meaning one can do this easily in a scheduler.

For polled equipment, you don't want to sleep at all. Because if you sleep, you might miss an event. That's why I put my poll in mfe.c into a for() loop. No 
sleep, maximum polling rate. I just double checked on my macbook air. 

- If poll is always false (no event available), the loop executes 50M times in 100ms (calibrated during startup of the frontend). That means one iteration 
takes 2ns (!). So if an event occurs, the readout is started with a 2ns overhead. No sleep can beat that. In a real world application, one has to add of course 
the VME access or so to poll for the event.

- If poll is always true, the framework generates about 700k events each second (returning jus a few bytes of event data).

So if one adds any sleep here, things can get only worse, so I don't see the point for that. Of course polling eats one kernel at 100%, but these days every 
CPU has more than one, even my 800 MHz Xilinx embedded ARM CPU (Zynq).

Best,
Stefan
       Reply  03 Mar 2021, Konstantin Olchanski, Info, shortest possible sleep 
> Why do you need that?

UNIX/POSIX advertises functions for sleeping in microseconds and nanoseconds,
for sure it is interesting to know what they actually do and what happens
when you ask them to sleep for 1 microsecond or 1 nanosecond.

To sleep or not to sleep that is a question.

But if I do decide to sleep, and I call the sleep function, I want to know what actually happens.

Now I do and I share it with all.

On current Linux, shortest sleep is around 60 usec. select() with sleep
shorter than that will not sleep at all, nanosleep() will always sleep for
the shortest amount.

P.S. For fans of interrupts ("because they are fast"), sleep waiting for interrupt
probably has same latency/granularity as above (60 usec), so if I drive a DMA engine
and I except the DMA transfer to complete under 60 usec, I should use a busy loop
to poll the "DMA done" bit instead of going to sleep and wait for the DMA interrupt.

K.O.
Entry  25 Feb 2021, Lars Martin, Bug Report, tmfe_main.cxx missing include <signal.h> 
The most recent commit (b43aef648c2f8a7e710a327d0b322751ae44afea) throws this 
compiler error:
src/tmfe_main.cxx:39:11: error: 'SIGPIPE' was not declared in this scope
    signal(SIGPIPE, SIG_IGN);

It's fixed by adding #include <signal.h> to that file.
    Reply  25 Feb 2021, Konstantin Olchanski, Bug Report, tmfe_main.cxx missing include <signal.h> 
> The most recent commit (b43aef648c2f8a7e710a327d0b322751ae44afea) throws this 
> compiler error:
> src/tmfe_main.cxx:39:11: error: 'SIGPIPE' was not declared in this scope
>     signal(SIGPIPE, SIG_IGN);
> 
> It's fixed by adding #include <signal.h> to that file.

"but it works just fine on my mac!"

anyhow, thank you for reporting this problem, it already fixed. the bitbucket auto-
build also caught it. I also boogered up "make remoteonly", also fixed now.

BTW, for production use I recommend midas from the "release" branches, unless one 
needs a bug fix or new feature from the development branch.

K.O.
       Reply  26 Feb 2021, Lars Martin, Bug Report, tmfe_main.cxx missing include <signal.h> 
> BTW, for production use I recommend midas from the "release" branches, unless one 
> needs a bug fix or new feature from the development branch.

Fair point. I would suggest adding that recommendation to the wiki instructions. I 
forget to add that step otherwise.
Entry  10 Feb 2021, Isaac Labrie Boulay, Forum, Javascript error during run transitions. 
Hi all,

I am encountering a Javascript error (TypeError: client.error is undefined) when 
I transition between run states. Does anybody have an idea of what my problem 
might be? I have pasted an example of what MIDAS logs during such sequences.

Thanks for all the help!

Isaac


09:24:08.611 2021/02/10 [mhttpd,INFO] Executing script 
"~/ANIS_20210106/scripts/start_daq.sh" from ODB "/Script/Start DAQ"

09:24:13.833 2021/02/10 [Logger,LOG] Program Logger on host localhost started

09:24:28.598 2021/02/10 [fevme,LOG] Program fevme on host localhost started

09:24:33.951 2021/02/10 [mhttpd,INFO] Run #234 started

09:26:30.970 2021/02/10 [mhttpd,ERROR] [midas.cxx:4260:cm_transition_call,ERROR] 
Client "Logger" transition 2 aborted while waiting for client "fevme": 
"/Runinfo/Transition in progress" was cleared

09:26:31.015 2021/02/10 [mhttpd,ERROR] [midas.cxx:5120:cm_transition,ERROR] 
transition STOP aborted: "/Runinfo/Transition in progress" was cleared

09:27:27.270 2021/02/10 [mhttpd,ERROR] 
[system.cxx:4937:ss_recv_net_command,ERROR] timeout receiving network command 
header

09:27:27.270 2021/02/10 [mhttpd,ERROR] [midas.cxx:12262:rpc_client_call,ERROR] 
call to "fevme" on "localhost" RPC "rc_transition": timeout waiting for reply
    Reply  10 Feb 2021, Konstantin Olchanski, Forum, Javascript error during run transitions. 
> I am encountering a Javascript error (TypeError: client.error is undefined) when 
> I transition between run states. Does anybody have an idea of what my problem 
> might be? I have pasted an example of what MIDAS logs during such sequences.


Not enough information. Can you do this:

a) for the javascript error, if you get it every time, open the javascript debugger 
and capture the stack trace? or at least the file name, function name and line number 
where the javascript exception is thrown?

b) for the run start failure, start the run from odbedit "start now -v" or from 
"mtransition -v -d 1 START" (or "stop" as the case may be). capture the output, email 
to me directly or put in this elog here.

K.O.


> 
> Thanks for all the help!
> 
> Isaac
> 
> 
> 09:24:08.611 2021/02/10 [mhttpd,INFO] Executing script 
> "~/ANIS_20210106/scripts/start_daq.sh" from ODB "/Script/Start DAQ"
> 
> 09:24:13.833 2021/02/10 [Logger,LOG] Program Logger on host localhost started
> 
> 09:24:28.598 2021/02/10 [fevme,LOG] Program fevme on host localhost started
> 
> 09:24:33.951 2021/02/10 [mhttpd,INFO] Run #234 started
> 
> 09:26:30.970 2021/02/10 [mhttpd,ERROR] [midas.cxx:4260:cm_transition_call,ERROR] 
> Client "Logger" transition 2 aborted while waiting for client "fevme": 
> "/Runinfo/Transition in progress" was cleared
> 
> 09:26:31.015 2021/02/10 [mhttpd,ERROR] [midas.cxx:5120:cm_transition,ERROR] 
> transition STOP aborted: "/Runinfo/Transition in progress" was cleared
> 
> 09:27:27.270 2021/02/10 [mhttpd,ERROR] 
> [system.cxx:4937:ss_recv_net_command,ERROR] timeout receiving network command 
> header
> 
> 09:27:27.270 2021/02/10 [mhttpd,ERROR] [midas.cxx:12262:rpc_client_call,ERROR] 
> call to "fevme" on "localhost" RPC "rc_transition": timeout waiting for reply
       Reply  11 Feb 2021, Isaac Labrie Boulay, Forum, Javascript error during run transitions. start_now_-v.PNGCall_Stack_for_JavaScript_Error.PNG
> > I am encountering a Javascript error (TypeError: client.error is undefined) when 
> > I transition between run states. Does anybody have an idea of what my problem 
> > might be? I have pasted an example of what MIDAS logs during such sequences.
> 
> 
> Not enough information. Can you do this:
> 
> a) for the javascript error, if you get it every time, open the javascript debugger 
> and capture the stack trace? or at least the file name, function name and line number 
> where the javascript exception is thrown?

I've attached a screenshot of the call stack showing the file names and line numbers.

> b) for the run start failure, start the run from odbedit "start now -v" or from 
> "mtransition -v -d 1 START" (or "stop" as the case may be). capture the output, email 
> to me directly or put in this elog here.

I have also attached a screen capture of the output.

Thanks for your help as always.

Isaac

> K.O.
> 
> 
> > 
> > Thanks for all the help!
> > 
> > Isaac
> > 
> > 
> > 09:24:08.611 2021/02/10 [mhttpd,INFO] Executing script 
> > "~/ANIS_20210106/scripts/start_daq.sh" from ODB "/Script/Start DAQ"
> > 
> > 09:24:13.833 2021/02/10 [Logger,LOG] Program Logger on host localhost started
> > 
> > 09:24:28.598 2021/02/10 [fevme,LOG] Program fevme on host localhost started
> > 
> > 09:24:33.951 2021/02/10 [mhttpd,INFO] Run #234 started
> > 
> > 09:26:30.970 2021/02/10 [mhttpd,ERROR] [midas.cxx:4260:cm_transition_call,ERROR] 
> > Client "Logger" transition 2 aborted while waiting for client "fevme": 
> > "/Runinfo/Transition in progress" was cleared
> > 
> > 09:26:31.015 2021/02/10 [mhttpd,ERROR] [midas.cxx:5120:cm_transition,ERROR] 
> > transition STOP aborted: "/Runinfo/Transition in progress" was cleared
> > 
> > 09:27:27.270 2021/02/10 [mhttpd,ERROR] 
> > [system.cxx:4937:ss_recv_net_command,ERROR] timeout receiving network command 
> > header
> > 
> > 09:27:27.270 2021/02/10 [mhttpd,ERROR] [midas.cxx:12262:rpc_client_call,ERROR] 
> > call to "fevme" on "localhost" RPC "rc_transition": timeout waiting for reply
          Reply  25 Feb 2021, Konstantin Olchanski, Forum, Javascript error during run transitions. 
> 
> I have also attached a screen capture of the output.
> 

so the error is gone?

K.O.
Entry  18 Feb 2021, Pintaudi Giorgio, Bug Report, Unexpected end-of-file 
Hello!
Sometimes when I mess around with the history plots I get the following error:

[mhttpd,ERROR] [history.cxx:97:xread,ERROR] Error: Unexpected end-of-file when 
reading file "/home/wagasci-ana/Data/online/210219.hst"

I have tried the following without success:

- Remove the MIDAS history files
- Restart mhttpd and mlogger

I do not know what triggers the error but when it triggers the above message is 
printed hundres of times a second, completely spamming the message log.

It happened again today after I set the label of a frontend too long making 
mlogger crash. After fixing the label length, the above message appeared and it 
does not seem to go away.
    Reply  18 Feb 2021, Pintaudi Giorgio, Bug Report, Unexpected end-of-file Screenshot_from_2021-02-19_15-41-23.png
It appears that the issue is trigger by a nonexisting Event and Variable as shown 
in the attached picture. This issue can arise when restoring the ODB from a 
previous version or importing ODB values from other MIDAS instances.
It might be useful if the error message were more clear about the source of the 
problem.

> Hello!
> Sometimes when I mess around with the history plots I get the following error:
> 
> [mhttpd,ERROR] [history.cxx:97:xread,ERROR] Error: Unexpected end-of-file when 
> reading file "/home/wagasci-ana/Data/online/210219.hst"
> 
> I have tried the following without success:
> 
> - Remove the MIDAS history files
> - Restart mhttpd and mlogger
> 
> I do not know what triggers the error but when it triggers the above message is 
> printed hundres of times a second, completely spamming the message log.
> 
> It happened again today after I set the label of a frontend too long making 
> mlogger crash. After fixing the label length, the above message appeared and it 
> does not seem to go away.
       Reply  25 Feb 2021, Konstantin Olchanski, Bug Report, Unexpected end-of-file 
> > [mhttpd,ERROR] [history.cxx:97:xread,ERROR] Error: Unexpected end-of-file when 
> > reading file "/home/wagasci-ana/Data/online/210219.hst"

I am puzzled. We can try two things:

a) look inside the "bad" hst file, maybe we can see something. run "mhdump -L 
/home/wagasci-ana/Data/online/210219.hst". If there is anything wrong with the file, it 
will be probably at the end. You can also try to run it without "-L".

b) switch from "midas" history (.hst files) to "FILE" history (mh*.dat files), the 
"FILE" history code is newer and the file format is more robust, with luck it may 
survive whatever trouble is happening in your experiment. This is controlled in ODB 
/Logger/History/XXX/Active (set to "y/n").

c) the output of "mlogger -v" may give us some clue, it usually complains if something 
is not right with definitions of history data.

K.O.
Entry  25 Feb 2021, Lars Martin, Forum, TMFePollHandlerInterface timing 
Am I right in thinking that the TMFE HandlePoll function is calle once per 
PollMidas()? And what is the difference to HandleRead()?
    Reply  25 Feb 2021, Konstantin Olchanski, Forum, TMFePollHandlerInterface timing 
> Am I right in thinking that the TMFE HandlePoll function is calle once per 
> PollMidas()? And what is the difference to HandleRead()?

Actually, polled equipment is not implemented yet in TMFE, as you noted, the 
internal scheduler needs to be reworked.

Anyhow, I think with modern c++ and with threads, both "periodic" and "polled" 
equipments are not strictly necessary.

Periodic equipment is effectively this:

in a thread:
while (1) {
do stuff, read data, send events
sleep
}

Polled equipment is effectively this:

in a thread:
while (1) {
if (poll()) { read data, send events }
else { sleep for a little bit }
}

Example of such code is the "bulk" equipment in progs/fetest.cxx.

But to implement the same in a single threaded environment (eliminates
problems with data locking, race conditions, etc) and to provide additional
structure to the user code, the plan is to implement polled equipment in TMFE
frontends. (periodic equipment is already implemented).

K.O.
Entry  24 Feb 2021, Zaher Salman, Bug Report, history reload 
I have a history that is embedded in a custom page using

<div class="mjshistory" data-group="SampleCryo" data-panel="SampleTemp" data-scale="30m" style="'+size+' position: relative;left: 640px;top: -205px;"></div>

this works fine when I load the page but seems to cause a timeout when reloading (F5) the page. It used to work fine last year but since a midas update this year it does not work. 

When I manually stop the script when firefox reports that it is slowing down the browse I get the following in the console:

Script terminated by timeout at:
binarySearch@http://xxx.psi.ch:8081/mhistory.js:1051:11
MhistoryGraph.prototype.findMinMax@http://xxx.psi.ch:8081/mhistory.js:1583:28
MhistoryGraph.prototype.loadInitialData/<@http://xxx.psi.ch:8081/mhistory.js:780:15

any ideas what may be causing this?

thanks.

Another hint to the problem, the custom page is accessible via 
http://xxx.psi.ch:8081/?cmd=custom&page=SampleCryo&
once loaded the address changes to where A and B change values as time passes (I guess B-A=30m).
http://xxx.psi.ch:8081/?cmd=custom&page=SampleCryo&&A=1614173369&B=1614175169
    Reply  25 Feb 2021, Stefan Ritt, Bug Report, history reload 
I have to reproduce the problem. Can you please send me the full link by direct email. As you know, I'm also at PSI.

Stefan
Entry  12 Feb 2021, Konstantin Olchanski, Bug Report, mlogger history snafu 
there is a problem with mlogger between commits xxx (17 Nov 2020) and a762bb8 (12 feb 2021). because of 
confusion between seconds and milliseconds, FILE (mhf*.dat files) and SQL history are recording with 
incorrect timestamps.

- traditional MIDAS history (*.hst files) does not have this problem (because of a buglet)
- midas-2020-12 release does not have this problem (it has mlogger from midas-2020-08 release)

there are some additional changes in mlogger that we are sorting out, when ready, we will make a new 
release of midas.

K.O.
Entry  10 Feb 2021, Konstantin Olchanski, Release, midas-2020-12-a 
midas-2020-12-a is here, see https://midas.triumf.ca/MidasWiki/index.php/Changelog#2020-12

notable change from previous midas releases:

Use of ODB "Common" by mfe.c frontends has changed. New preferred behaviour
is to have the values defined in the equipment structure in the source code
to always overwrite values in ODB /Equipment/Foo/Common, except for the value
of "Common/enabled" (equipment_common_overwrite set to TRUE).

All mfe.c frontends will need to be modified for this change:

- for old behaviour (use ODB "Common"), add: BOOL equipment_common_overwrite = false;
- for new behaviour (use equipment values in the source code), add: BOOL equipment_common_overwrite = true;

The TMFE C++ frontend does not implement this change yet, it uses all "Common" values from ODB
and there is no way to overwrite things like the MIDAS event buffer name from C++ code. This may
change with the next version.

notable updates since midas-2020-08:

- new ODB variable /Experiment/Enable sound can be used to globally prevent mhttpd from playing sounds.
- Lazylogger now supports writing data over SFTP.
- odbvalue elements on custom pages now support an onload() callback as well as onchange(). Most elements now also 
support a data-validate callback.
- custom pages can now tie a select drop-down box to an ODB value using modbselect.
- ability to choose whether the code or the current ODB values take precendence for the "Common" settings of an 
equipment when starting a frontend. See elog thread 2014 for more details, and the "Upgrade guide" below for 
instructions.
- minor improvements to mdump program - support for 64-bit data types and ability to load larger events if needed.
- minor improvements to History plots and Buffers webpage.
- bug fixes

plans for next development: major update of mlogger to simplify channel 
configuration in odb, improvements to mhttpd multithreading, new history plot 
configuration page, more c++ification.

To obtain this release, either checkout the top of branch release/midas-2020-12 
(recommended) or checkout the tag midas-2020-12-a.

K.O.
Entry  25 Jan 2021, Thomas Lindner, Suggestion, mhttpd browser caching 
I have a more subtle point about the new ODB key for using an external elog I mentioned in [1].  I was very confused after changing the ODB "External Elog" because mhttpd still wasn't using my external elog URL.  I started trying to debug mhttpd.cxx, but found a lot of bits of mhttpd didn't seem to be getting called.  I eventually realized that my browser had been caching the responses for some (though not all) of the MIDAS navigation buttons.  Clearing my browser cache fixed the problem and allowed me to use the MIDAS button to the external ELOG.  This caching happens on my macbook for both Firefox 84.0.2 and Safari 13.1.

Many of the requests to mhttpd end up going to send_fp(), where we explicitly set the cache time to 24 hours.

   // send HTTP cache control headers                                                                                               
   time_t now = time(NULL);
   now += (int) (3600 * 24);
   struct tm* gmt = gmtime(&now);
   const char* format = "%A, %d-%b-%y %H:%M:%S GMT";
   char str[256];
   strftime(str, sizeof(str), format, gmt);
   r->rsprintf("Expires: %s\r\n", str);

Some other MIDAS buttons don't seem to be cached by the browser; for instance the response for the 'OldHistory' button doesn't get cached.

Should we remove the cache instruction for at least some of the buttons?  At least for the elog button where we want the link direction to get switched by an ODB key the caching seems a bad idea.

[1] https://midas.triumf.ca/elog/Midas/2078
    Reply  25 Jan 2021, Stefan Ritt, Suggestion, mhttpd browser caching 
Let me first explain a bit why caching is there. Once we had the case that someone from 
TRIUMF opened a midas custom page at T2K. It took about one minute (!) to load the page. 

When we looked at it, we found that the custom page pulled about 100 items with individual
HTTP requests from Japan, each taking about one second for the roundtrip. Then we redesigned
the custom page communication so that many ODB entries could be retrieved in one operation,
which improved the loading time from 100s to about 2s.

With the buttons we will have to make the same compromise. If we do not cache anything,
loading the midas status page over the Pacific takes many seconds. If we cache all, any
change on the midas side will not be reflected on the web page. So there is a compromise
to be made. I thought I designed it such that the side menu is cached locally, but when
the user presses "reload", then the full menu is fetched from the server. Of course one
has to remember this, so changing the ELOG URL or other things on the menu require a
reload (or wait a certain time for the cache to expire). So try again if that's working
for you. If not, I can visit it again and check if there is any bug.

If we go the route to disable the cache, better try this to T2K and see what you get before
we commit ourselves to that. Last time TRIUMF people were complaining a lot about long
load times.

Best,
Stefan
       Reply  25 Jan 2021, Thomas Lindner, Suggestion, mhttpd browser caching 
I tried reloading the pages.  If I reloaded the actual elog page 

https://server.triumf.ca/?cmd=Elog

then it bypassed the cache and got the correct updated page from mhttpd.

However, if when I reloaded the status page

https://server.triumf.ca/?cmd=Status

and then clicked the Elog button then I just got the cached (old) page.  Admittedly reloading the status page doesn't make so much sense (once I thought about it), but it is what I tried first (I'm good at modelling unexpected user behaviour); so there is some risk that the user will try reloading the wrong page and will be stuck not getting the external elog page (until 24 hours runs out).

Anyway, I will update the documentation to note that you need to reload the elog page after changing this variable.  That's probably an adequate solution.

I certainly don't suggest getting rid of caching entirely.  I was trying to think whether there was a set of pages where it would make sense to disable the cache (like the elog page).  But maybe that will just cause more problems.


> Let me first explain a bit why caching is there. Once we had the case that someone from 
> TRIUMF opened a midas custom page at T2K. It took about one minute (!) to load the page. 
> 
> When we looked at it, we found that the custom page pulled about 100 items with individual
> HTTP requests from Japan, each taking about one second for the roundtrip. Then we redesigned
> the custom page communication so that many ODB entries could be retrieved in one operation,
> which improved the loading time from 100s to about 2s.
> 
> With the buttons we will have to make the same compromise. If we do not cache anything,
> loading the midas status page over the Pacific takes many seconds. If we cache all, any
> change on the midas side will not be reflected on the web page. So there is a compromise
> to be made. I thought I designed it such that the side menu is cached locally, but when
> the user presses "reload", then the full menu is fetched from the server. Of course one
> has to remember this, so changing the ELOG URL or other things on the menu require a
> reload (or wait a certain time for the cache to expire). So try again if that's working
> for you. If not, I can visit it again and check if there is any bug.
> 
> If we go the route to disable the cache, better try this to T2K and see what you get before
> we commit ourselves to that. Last time TRIUMF people were complaining a lot about long
> load times.
> 
> Best,
> Stefan
    Reply  08 Feb 2021, Konstantin Olchanski, Suggestion, mhttpd browser caching 
>    r->rsprintf("Expires: %s\r\n", str);

The best I can tell, none of this works in current browsers. with google-chrome,
I see it cache pretty much everything regardless of "expires", "no cache", etc
and anything else I tried.

Things like shift-<reload>, etc used to work to refresh the cache, but not any more.

So, I too, see confusing side-effects of caching, where I change something in ODB,
but "nothing happens". Then I scratch my head for 30 minutes until I remember
to open the javascript debugger where shift-<reload> (or is it ctrl-<reload>) actually works.

It seems that the only reliable way to bypass the browser cache is to add
a tag with a random number to the URL ("&ts=currenttime").

This is for HTTP GET requests. HTTP POST does not seem to be cached, so I do not worry
about this nonsense for json-rpc requests.

Perhaps we should do this random number trick for all user actions. User can
press buttons only so fast, we should be able to sustain the rate. Anything
loaded automatically or from a timer, we should allow caching.

BTW, things like midas.js are also cached, and it is common to see problems
after updating midas, where status.html is newly loaded, but midas.js is an old
stale version from cache.

Messy.

K.O.
       Reply  08 Feb 2021, Stefan Ritt, Suggestion, mhttpd browser caching 
> It seems that the only reliable way to bypass the browser cache is to add
> a tag with a random number to the URL ("&ts=currenttime").

Indeed that's the only reliable way to avoid caching across browsers. An alternative is

("&r=" + Math.random())

to add a random number.


> BTW, things like midas.js are also cached, and it is common to see problems
> after updating midas, where status.html is newly loaded, but midas.js is an old
> stale version from cache.

Reloading JavaScript file NOT from the cache is really tricky these days. I added a
special Google Chrome extension to clear my browser cache, which works reliably:

https://chrome.google.com/webstore/detail/clear-cache/cppjkneekbjaeellbfkmgnhonkkjfpdn

Stefan
Entry  13 Jan 2021, Isaac Labrie Boulay, Forum, poll_event() is very slow. 
Hi all,

I'm currently trying to see if I can speed up polling in a frontend I'm testing. 
Currently it seems like I can't get 'lam's to happen faster than 120 times/second. 
There must be a way to make this faster. From what I understand, changing the poll 
time (500ms by default) won't affect the frequency of polling just the 'lam' 
period.

Any suggestions?

Thanks for your help!

Isaac

Hi,

What is the actual readout time, event size?
Do you have multiple equipment and of what type if any?

PAA
    Reply  13 Jan 2021, Konstantin Olchanski, Forum, poll_event() is very slow. 
> 
> I'm currently trying to see if I can speed up polling in a frontend I'm testing. 
> Currently it seems like I can't get 'lam's to happen faster than 120 times/second. 
> There must be a way to make this faster. From what I understand, changing the poll 
> time (500ms by default) won't affect the frequency of polling just the 'lam' 
> period.
> 
> Any suggestions?
> 

You could switch from the traditional midas mfe.c frontend to the C++ TMFE frontend,
where all this "lam" and "poll" business is removed.

At the moment, there are two example programs using the C++ TMFE frontend,
single threaded (progs/fetest_tmfe.cxx) and multithreaed (progs/fetest_tmfe_thread.cxx).

K.O.
       Reply  15 Jan 2021, Isaac Labrie Boulay, Forum, poll_event() is very slow. 
> > 
> > I'm currently trying to see if I can speed up polling in a frontend I'm testing. 
> > Currently it seems like I can't get 'lam's to happen faster than 120 times/second. 
> > There must be a way to make this faster. From what I understand, changing the poll 
> > time (500ms by default) won't affect the frequency of polling just the 'lam' 
> > period.
> > 
> > Any suggestions?
> > 
> 
> You could switch from the traditional midas mfe.c frontend to the C++ TMFE frontend,
> where all this "lam" and "poll" business is removed.
> 
> At the moment, there are two example programs using the C++ TMFE frontend,
> single threaded (progs/fetest_tmfe.cxx) and multithreaed (progs/fetest_tmfe_thread.cxx).
> 
> K.O.

Ok. I did not know that there was a C++ OOD frontend example in MIDAS. I'll take a look at 
it. Is there any documentation on it works?

Thanks for the support!

Isaac
    Reply  13 Jan 2021, Stefan Ritt, Forum, poll_event() is very slow. 
Something must be wrong on your side. If you take the example frontend under

midas/examples/experiment/frontend.cxx

and let it run to produce dummy events, you get about 90 Hz. This is because we have a

  ss_sleep(10);

in the read_trigger_event() routine to throttle things down. If you remove that sleep, 
you get an event rate of about 500'000 Hz. So the framework is really quick.

Probably your routine which looks for a 'lam' takes really long and should be fixed.

Stefan
       Reply  14 Jan 2021, Pintaudi Giorgio, Forum, poll_event() is very slow. 
> Something must be wrong on your side. If you take the example frontend under
> 
> midas/examples/experiment/frontend.cxx
> 
> and let it run to produce dummy events, you get about 90 Hz. This is because we have a
> 
>   ss_sleep(10);
> 
> in the read_trigger_event() routine to throttle things down. If you remove that sleep, 
> you get an event rate of about 500'000 Hz. So the framework is really quick.
> 
> Probably your routine which looks for a 'lam' takes really long and should be fixed.
> 
> Stefan

Sorry if I am going off-topic but, because the ss_sleep function was mentioned here, I 
would like to take the chance and report an issue that I am having.

In all my slow control frontends, the CPU usage for each frontend is close to 100%. This 
means that each frontend is monopolizing a single core. When I did some profiling, I 
noticed that 99% of the time is spent inside the ss_sleep function. Now, I would expect 
that the ss_sleep function should not require any CPU usage at all or very little.

So my two questions are:
Is this a bug or a feature?
Would you able to check/reproduce this behavior or do you need additional info from my 
side?
       Reply  14 Jan 2021, Isaac Labrie Boulay, Forum, poll_event() is very slow. 
> Something must be wrong on your side. If you take the example frontend under
> 
> midas/examples/experiment/frontend.cxx
> 
> and let it run to produce dummy events, you get about 90 Hz. This is because we have a
> 
>   ss_sleep(10);
> 
> in the read_trigger_event() routine to throttle things down. If you remove that sleep, 
> you get an event rate of about 500'000 Hz. So the framework is really quick.
> 
> Probably your routine which looks for a 'lam' takes really long and should be fixed.
> 
> Stefan

Hi Stefan,

I should mention that I was using midas/examples/Triumf/c++/fevme.cxx. I was trying to see 
the max speed so I had the 'lam' always = 1 with nothing else to add overhead in the 
poll_event(). I was getting <200 Hz. I am assuming that this is a bug. There is no 
ss_sleep() in that function.

Thanks for your quick response!

Isaac
          Reply  08 Feb 2021, Konstantin Olchanski, Forum, poll_event() is very slow. 
> I should mention that I was using midas/examples/Triumf/c++/fevme.cxx

this is correct, the fevme frontend is written to do 100% CPU-busy polling.

there is several reasons for this:
- on our VME processors, we have 2 core CPUs, 1st core can poll the VME bus, 2nd core can run 
mfe.c and the ethernet transmitter.
- interrupts are expensive to use (in latency and in cpu use) because kernel handler has to call 
use handler, return back etc
- sub-millisecond sleep used to be expensive and unreliable (on 1-2GHz "core 1" and "core 2" 
CPUs running SL6 and SL7 era linux). As I understand, current linux and current 3+GHz CPUs can 
do reliable microsecond sleep.

K.O.
Entry  21 Jan 2021, Thomas Lindner, Info, Using external ELOG with newer mhttpd 
A warning, in case others have the same problem I had.

In the past you could configure mhttpd so that the 'Elog' button would redirect to an external ELOG server; to do this you only needed to create and set the ODB variable '/Elog/URL' to the URL of your external ELOG server.

But with the newer MIDAS you need to set two ODB variables:

* "/Elog/URL" needs to be set to the URL of the external ELOG.
* "/Elog/External Elog" needs to be set to 'y'

I hadn't noticed this and was confused why my Elog button wasn't working after upgrading MIDAS.

MIDAS documentation was updated to reflect this change:

https://midas.triumf.ca/MidasWiki/index.php/Electronic_Logbook_(ELOG)
Entry  09 Dec 2020, Frederik Wauters, Forum, history and variables confusion 
I have a fe, with 2 "equipments" (2 different types of LV supplies).

Equipment/../Setting has a "Names" key, with the actual channel names (ch1, ch2, ...) of the devices.

Equipment/../Variables has channel states, voltage, etc. 

I also write a separate midas bank for each supply.

When I turn the "Log History" on, 2 things happen which cause troubles:
1. It writes the data of both bank to both the /Equipment/(Device1/Device2)/Variables .
2. I have e.g. 4 channels. In the banks I write current and voltages, so 8 numbers. When I turn on the logger I get an "Array size mismatch" between names and the midas bank size.

The only way around this is to disable the history logging in the equipment, and set "virtual" History events?
    Reply  09 Dec 2020, Stefan Ritt, Forum, history and variables confusion 
First, the writing of banks is completely independent of the history system. Banks go to the log file only, 
while the history is only linked to the "Variables" section in the ODB.

Second, it's advisable to group similar equipment into one. Like if you have five power supplies powering
and experiment, you don't want to have five equipments Supply1, Supply2, ..., but only one equipment
"Power Supplies". In the frontend belonging to that equipment, you define a DEVICE_DRIVER list with
one entry for each power supply. If you interact with an mscb device, there are some helper functions
which simplify the definition of the equipment and which I can send you privately. So your device
driver looks a bit like the one attached.

If you cannot do that and absolutely want separate equipments, please post a complete ODB subtree of your
settings, and I can try to reproduce your problem.

Stefan

======================

DEVICE_DRIVER power_driver[] = {
   {"Power Supply 1", mscbdev, 0, NULL, DF_INPUT | DF_MULTITHREAD},
   {"Power Supply 2", mscbdev, 0, NULL, DF_INPUT | DF_MULTITHREAD},
   {"Power Supply 3", mscbdev, 0, NULL, DF_INPUT | DF_MULTITHREAD},
   {""}
};

...

INT frontend_init()
{
   mscb_define("mscbxxx.psi.ch", "Power Supplies", "Power Supply 1", power_driver, 1, 0, "Output 1", 0.1);
   mscb_define("mscbxxx.psi.ch", "Power Supplies", "Power Supply 1", power_driver, 1, 1, "Output 2", 0.1);
   ...
}

/*-- Function to define MSCB variables in a convenient way ---------*/

void mscb_define(const char *submaster, const char *equipment, const char *devname, 
                 DEVICE_DRIVER *driver, int address, unsigned char var_index, 
                 const char *name, double threshold)
{
   int i, dev_index, chn_index, chn_total;
   char str[256];
   float f_threshold;
   HNDLE hDB;

   cm_get_experiment_database(&hDB, NULL);

   if (submaster && submaster[0]) {
      sprintf(str, "/Equipment/%s/Settings/Devices/%s/Device", equipment, devname);
      db_set_value(hDB, 0, str, submaster, 32, 1, TID_STRING);
      sprintf(str, "/Equipment/%s/Settings/Devices/%s/Pwd", equipment, devname);
      db_set_value(hDB, 0, str, "meg", 32, 1, TID_STRING);
   }

   /* find device in device driver */
   for (dev_index=0 ; driver[dev_index].name[0] ; dev_index++)
      if (equal_ustring(driver[dev_index].name, devname))
         break;

   if (!driver[dev_index].name[0]) {
      cm_msg(MERROR, "mscb_define", "Device \"%s\" not present in device driver list", devname);
      return;
   }

   /* count total number of channels */
   for (i=chn_total=0 ; i<=dev_index ; i++)
      if (((driver[dev_index].flags & DF_INPUT) > 0 && (driver[i].flags & DF_INPUT)) ||
          ((driver[dev_index].flags & DF_OUTPUT) > 0 && (driver[i].flags & DF_OUTPUT)))
      chn_total += driver[i].channels;

   chn_index = driver[dev_index].channels;
   sprintf(str, "/Equipment/%s/Settings/Devices/%s/MSCB Address", equipment, devname);
   db_set_value_index(hDB, 0, str, &address, sizeof(int), chn_index, TID_INT, TRUE);
   sprintf(str, "/Equipment/%s/Settings/Devices/%s/MSCB Index", equipment, devname);
   db_set_value_index(hDB, 0, str, &var_index, sizeof(char), chn_index, TID_BYTE, TRUE);

   if (threshold != -1 && (driver[dev_index].flags & DF_INPUT) > 0) {
     sprintf(str, "/Equipment/%s/Settings/Update Threshold", equipment);
     f_threshold = (float) threshold;
     db_set_value_index(hDB, 0, str, &f_threshold, sizeof(float), chn_total, TID_FLOAT, TRUE);
   }

   if (name && name[0]) {
      sprintf(str, "/Equipment/%s/Settings/Names %s", equipment, devname);
      db_set_value_index(hDB, 0, str, name, 32, chn_total, TID_STRING, TRUE);
   }

   /* increment number of channels for this driver */
   driver[dev_index].channels++;
}
       Reply  10 Dec 2020, Frederik Wauters, Forum, history and variables confusion genesys.odb
I wanted to have a c++ style driver, e.g. a instance of a "PowerSupply" class. This was not compatible with the list of DEVICE_DRIVER structs, with needs a C function entry point with variable arguments. 

Anyways, I attach my odb. I believe the issue stands regardless of the specific design choice here. Setting the History Log flag copies the banks created to the "Variables" of every equipments initialized, leading to a mismatch between the names array, and the variables. Can be solved by not using FE history events, but Virtual, but the flag in the Equipment is confusing.  

Bank creation in readout function:

for(const auto& d: drivers)
{
...
...
  bk_create(pevent,bk_name, TID_FLOAT, (void **)&pdata);
...			
  std::vector<float> voltage = d->GetVoltage();
  std::vector<float> current = d->GetCurrent();
  for(channels)
  {
    *pdata++ = voltage.at(iChannel);
    *pdata++ = current.at(iChannel);
  }
  bk_close(pevent, pdata);
}
          Reply  11 Dec 2020, Frederik Wauters, Forum, history and variables confusion 
1. ok, so calling the same readout functions from different equipments is just a bad idea, my bad, no blame for Midas to write data from both bank to both odb trees ...

2. One needs the same amount of bank entries as the size of settings/names[] . Otherwise the "History Log" flag does not work. So just don`t us "names" but "channel names" or something. 

" Second, it's advisable to group similar equipment into one. Like if you have five power supplies powering
and experiment, you don't want to have five equipments Supply1, Supply2, ..., but only one equipment
"Power Supplies".  "

It would be nice if this also works with c++ style drivers, i.e. a instance of a class. I don`t now how one would give an entry point to the "DEVICE_DRIVER" struct then.



> I wanted to have a c++ style driver, e.g. a instance of a "PowerSupply" class. This was not compatible with the list of DEVICE_DRIVER structs, with needs a C function entry point with variable arguments. 
> 
> Anyways, I attach my odb. I believe the issue stands regardless of the specific design choice here. Setting the History Log flag copies the banks created to the "Variables" of every equipments initialized, leading to a mismatch between the names array, and the variables. Can be solved by not using FE history events, but Virtual, but the flag in the Equipment is confusing.  
> 
> Bank creation in readout function:
> 
> for(const auto& d: drivers)
> {
> ...
> ...
>   bk_create(pevent,bk_name, TID_FLOAT, (void **)&pdata);
> ...			
>   std::vector<float> voltage = d->GetVoltage();
>   std::vector<float> current = d->GetCurrent();
>   for(channels)
>   {
>     *pdata++ = voltage.at(iChannel);
>     *pdata++ = current.at(iChannel);
>   }
>   bk_close(pevent, pdata);
> }
             Reply  15 Dec 2020, Konstantin Olchanski, Forum, history and variables confusion 
I think you are facing several problems:

a) mlogger does not clearly explain what history names will be used for which entries
in /eq/xxx/variables. "mlogger -v" almost does it, but we also need
"mlogger -v -n" to "show what you will do, but do not do it yet".

b) the mfe.c and the device class driver structure is very dated, tries to "do c++ in c". If it works for you,
certainly use it, but if it confuses you (as it confuses me), it probably only takes a few lines of c++
to replace the whole thing (minus the actual device drivers, which is the meat of it).

It think today, you face a choice:
- invest some time to understand the old device driver framework
- invest some time to do it "by hand" in c++, write your own device drivers (or use 3rd party drivers or snarf the "c" drivers from midas). Use the TMFE C++ frontend class if you go this route.

I would estimate that both choices are about the same amount of work.

K.O.


> 1. ok, so calling the same readout functions from different equipments is just a bad idea, my bad, no blame for Midas to write data from both bank to both odb trees ...
> 
> 2. One needs the same amount of bank entries as the size of settings/names[] . Otherwise the "History Log" flag does not work. So just don`t us "names" but "channel names" or something. 
> 
> " Second, it's advisable to group similar equipment into one. Like if you have five power supplies powering
> and experiment, you don't want to have five equipments Supply1, Supply2, ..., but only one equipment
> "Power Supplies".  "
> 
> It would be nice if this also works with c++ style drivers, i.e. a instance of a class. I don`t now how one would give an entry point to the "DEVICE_DRIVER" struct then.
> 
> 
> 
> > I wanted to have a c++ style driver, e.g. a instance of a "PowerSupply" class. This was not compatible with the list of DEVICE_DRIVER structs, with needs a C function entry point with variable arguments. 
> > 
> > Anyways, I attach my odb. I believe the issue stands regardless of the specific design choice here. Setting the History Log flag copies the banks created to the "Variables" of every equipments initialized, leading to a mismatch between the names array, and the variables. Can be solved by not using FE history events, but Virtual, but the flag in the Equipment is confusing.  
> > 
> > Bank creation in readout function:
> > 
> > for(const auto& d: drivers)
> > {
> > ...
> > ...
> >   bk_create(pevent,bk_name, TID_FLOAT, (void **)&pdata);
> > ...			
> >   std::vector<float> voltage = d->GetVoltage();
> >   std::vector<float> current = d->GetCurrent();
> >   for(channels)
> >   {
> >     *pdata++ = voltage.at(iChannel);
> >     *pdata++ = current.at(iChannel);
> >   }
> >   bk_close(pevent, pdata);
> > }
                Reply  08 Jan 2021, Stefan Ritt, Forum, history and variables confusion 
We kind of agreed to rewrite the slow control system in C++. Each device will have its own driver derived from a common base class implementing the general communication. The reason we need a "system" and not only a "hand-written" driver is because we want:

- glue many device drivers together for a single equipment
- have a dedicated readout thread for every device, in order not to block other devices
- have a common error reporting scheme working with several threads
- being able to disable/enable individual devices without changing the history system each time
- having a common naming scheme for all devices (like "enforce" /Equipment/<name>/Settings/Names xxx) which is needed by the history system
- ...

Will see when we have time for that.

Stefan
Entry  06 Jan 2021, Isaac Labrie Boulay, Info, Recovering a corrupted ODB using odbinit. 
Hi all,

I am currently trying to recover my corrupted ODB using odbinit and I am still 
getting issues after doing 'odbinit --cleanup' and trying to reload the saved 
ODB (last.json). Here is the output:


************************************************
(odbinit cleanup) Note* the ERROR in system.cxx
************************************************


[caendaq@cu332 ANIS]$ odbinit --cleanup
Checking environment... experiment name is "ANIS", remote hostname is ""
Checking command line... experiment "ANIS", cleanup 1, dry_run 0, create_exptab                                
0, create_env 0
Checking MIDASSYS....../home/caendaq/packages/midas
Checking exptab... experiments defined in exptab file "/home/caendaq/ANIS/exptab                               
":
0: "ANIS" <-- selected experiment
 

Checking exptab... selected experiment "ANIS", experiment directory "/home/caend                               
aq/ANIS/"
 

Checking experiment directory "/home/caendaq/ANIS/"
Found existing ODB save file: "/home/caendaq/ANIS/.ODB.SHM"
 

Checking shared memory...
Deleting old ODB shared memory...
[system.cxx:1052:ss_shm_delete,ERROR] shm_unlink(/1001_ANIS_ODB__home_caendaq_AN                               
IS_) errno 2 (No such file or directory)
Good: no ODB shared memory
Deleting old ODB semaphore...
Deleting old ODB semaphore... create status 1, delete status 1
Preserving old ODB save file /home/caendaq/ANIS/.ODB.SHM" to "/home/caendaq/ANIS                               
/.ODB.SHM.1609951022"
 

Checking ODB size...
Requested ODB size is 0 bytes (0.00B)
ODB size file is "/home/caendaq/ANIS//.ODB_SIZE.TXT"
Saved ODB size from "/home/caendaq/ANIS//.ODB_SIZE.TXT" is 1048576 bytes (1.05MB                               
)
We will initialize ODB for experiment "ANIS" on host "" with size 1048576 bytes                                
(1.05MB)

 
Creating ODB...
Creating ODB... db_open_database() status 302
Saving ODB...
Saving ODB... db_close_database() status 1
Connecting to experiment...
 

Connected to ODB for experiment "ANIS" on host "" with size 1048576 bytes (1.05M                               
B)
Checking experiment name... status 1, found "ANIS"
Disconnecting from experiment...
 

Done
 

****************************************
(Loading the last copy of my ODB)
*************************************



[caendaq@cu332 data]$ odbedit
[local:ANIS:S]/>load last.json
[ODBEdit,INFO] Reloading RPC hosts access control list via hotlink callback
[ODBEdit,INFO] Reloading RPC hosts access control list via hotlink callback
[ODBEdit,INFO] Reloading RPC hosts access control list via hotlink callback
[ODBEdit,INFO] Reloading RPC hosts access control list via hotlink callback
[ODBEdit,INFO] Reloading RPC hosts access control list via hotlink callback
[ODBEdit,INFO] Reloading RPC hosts access control list via hotlink callback
[ODBEdit,INFO] Reloading RPC hosts access control list via hotlink callback
[ODBEdit,INFO] Reloading RPC hosts access control list via hotlink callback
[ODBEdit,INFO] Reloading RPC hosts access control list via hotlink callback
[ODBEdit,INFO] Reloading RPC hosts access control list via hotlink callback
11:38:12 [ODBEdit,INFO] Reloading RPC hosts access control list via hotlink 
callback
11:38:12 [ODBEdit,INFO] Reloading RPC hosts access control list via hotlink 
callback
11:38:12 [ODBEdit,INFO] Reloading RPC hosts access control list via hotlink 
callback
11:38:12 [ODBEdit,INFO] Reloading RPC hosts access control list via hotlink 
callback
11:38:12 [ODBEdit,INFO] Reloading RPC hosts access control list via hotlink 
callback
11:38:12 [ODBEdit,INFO] Reloading RPC hosts access control list via hotlink 
callback
11:38:12 [ODBEdit,INFO] Reloading RPC hosts access control list via hotlink 
callback
11:38:12 [ODBEdit,INFO] Reloading RPC hosts access control list via hotlink 
callback
11:38:12 [ODBEdit,INFO] Reloading RPC hosts access control list via hotlink 
callback
11:38:12 [ODBEdit,INFO] Reloading RPC hosts access control list via hotlink 
callback
 


**********************************************
(Now trying to run my frontend and analyzer)
*********************************************
 


[caendaq@cu332 ANIS]$ ./start_daq.sh

mlogger: no process found
fevme: no process found
manalyzer.exe: no process found
manalyzer_example_cxx.exe: no process found
roody: no process found
[ODBEdit,ERROR] [midas.cxx:6616:bm_open_buffer,ERROR] Buffer "SYSTEM" is 
corrupted, mismatch of buffer name in shared memory ""
 

11:38:30 [ODBEdit,ERROR] [midas.cxx:6616:bm_open_buffer,ERROR] Buffer "SYSTEM" 
is corrupted, mismatch of buffer name in shared memory ""
Becoming a daemon...
Becoming a daemon...
Please point your web browser to http://localhost:8081
To look at live histograms, run: roody -Hlocalhost
Or run: mozilla http://localhost:8081
[caendaq@cu332 ANIS]$ Frontend name          :     fevme
Event buffer size      :     1048576
User max event size    :     204800
User max frag. size    :     1048576
# of events per buffer :     5
 

Connect to experiment ANIS...

OK

[fevme,ERROR] [midas.cxx:6616:bm_open_buffer,ERROR] Buffer "SYSTEM" is 
corrupted, mismatch of buffer name in shared memory ""
[fevme,ERROR] [mfe.cxx:596:register_equipment,ERROR] Cannot open event buffer 
"SYSTEM" size 33554432, bm_open_buffer() status 219

Has anyone ever encountered these issues?

Thanks for your time.

Isaac
Entry  05 Jan 2021, Isaac Labrie Boulay, Bug Report, Logger: Disk nearly full. 
Hi all,

I've ran into a problem where my experiment gets interrupted with a message from 
the logger saying that my disk is nearly full. This does not make sense to me 
because I have deleted almost all the data files from my data directory. I'm 
guessing that somewhere the ODB perceives that the directory is full when in 
reality its not.

Here is the exact message:

[ODBEdit,INFO] Run #252 stopped

09:22:19 [Logger,TALK] disk nearly full, stopping the run

09:22:19 [Logger,ERROR] [mlogger.cxx:4475:log_write,ERROR] Disk '/home/caendaq/A       
NIS/data/run00252.mid.lz4' is almost full: 81 MiBytes free out of 922497 MiBytes       
, stopping the run

Does any body have a solution for this? Thanks so much.

Isaac
    Reply  06 Jan 2021, Stefan Ritt, Bug Report, Logger: Disk nearly full. 
The logger simple requests the disk free space level from the operating system in the same 
way as the "df" command does. Can you do a "df" on your system? I have seen that some file 
systems free up space not immediately if you delete files, but some times later (like 24h).

Stefan
       Reply  06 Jan 2021, Isaac Labrie Boulay, Bug Report, Logger: Disk nearly full. 
> The logger simple requests the disk free space level from the operating system in the same 
> way as the "df" command does. Can you do a "df" on your system? I have seen that some file 
> systems free up space not immediately if you delete files, but some times later (like 24h).
> 
> Stefan

Thanks Stefan. Yes the files were still held open by some processes. It's solved now.

Cheers.

Isaac
ELOG V3.1.4-2e1708b5