Back Midas Rome Roody Rootana
  Midas DAQ System, Page 6 of 145  Not logged in ELOG logo
    Reply  10 Jul 2016, Zhe Wang, Suggestion, Frontend crush on high event rate frontend.c
Dear friends,

In case anyone need the source code, it is attached. 
We use optic fiber to connect to a VME controler, which talks to V1751 via VME bus.

--
Zhe Wang

> Dear friends,
> 
> I may add a little more information.
> For polling event, we check the data-ready register for the status of the digitizer.
> In the readout routine, we create a bank, readout the data and write it out.
> 
> We commented out or made some replacement for each part of the subroutines to figure our where exactly goes wrong.
> for example, replace the readout from the digitizer with a random generation of some fake events.
> By replacing the readout by a random generation, the program runs fine and reach a very high event rates.
> 
> Any suggestions or ideas from experts?
> 
> Thank you very much.
> 
> --
> Best regards,
> Zhe Wang
> 
> 
> > Dear friends,
> > 
> > We have some questions on using midas.
> > We use a Caen digitizer V1751 to take waveforms.
> > When testing with caen provided programs, we roughly know it can work fine at 1000 Hz event rate, and 30 M/s data can be written to disk.
> > The test with Midas, however, is a little confusing. We use CAENDigitizer library with Midas. First, it works, data were taken, and there seems no error.
> > The only problem is we cannot go to a higher event rate, for example we can only work on a rate of 40 Hz, and only 3 M/s data recording. Otherwise it will crush.
> > 
> > We may miss something really simple. Would you please give some suggestions? for example, other people's discussions or documents?
> > 
> > Thank you very much.
    Reply  13 Jul 2016, Zhe Wang, Suggestion, Frontend crush on high event rate frontend.c
Somehow I don't understand why people's reply is only in my mail box.
So I pasted them here. I hope they don't mind and these information may be useful for others.

The following is some discussion.
==========================================================================================
> In read_trigger_event(), you creating a secondary bank with time in
> second. For your information, this time in second is already written in
> the event header. You can retrieve the time using macros from the
> midas.h   time = TIME_STAMP(pevent)

Removed.

>
> In frontend_init() you loop over NFADC (1) and call for each loop
> frontend_config() after opening the device on that card. In
> frontend_config() you redo a loop over NFADC, meaning that in case of
> more than one card you will find the second one not open on the first
> frontend_config (ok for one card though).
>

Corrected.

> In frontend_config() what is the return sCAEN from MallocReadoutBuffer()?
> What is the size of the requested allocated buffer?

The return size of allocated buffer is 134936.

>
> What is the value of the sCAEN from the ReadData() function in
> read_trigger_event()?

It is always 0 for success until it crashes.
However, even for the event it crashes, it also appears as 0.

>
> I didn't check all the config parameters!
>
> What is the value of count in the poll_event(). It is true if the test
> in poll_event() is too short, it cause timing corruption during
> calibration. 

Do you mean Midas timing calibration for poll_event() before all finally start up?
We havn't observed corruption at this stage.

> This never happen during CAMAC time... to be fixed!
> The alternative is to include a ss_sleep(1) instead of the prescale.
> a 1ms delay between every poll is short enough to ensure your 1KHz trigger.

We tried ss_sleep(1) in poll_event(), and it doesn't help.
We also tried add a ss_sleep(10) in the read_trigger_event().
This may work. But we can only reach 100 Hz and 1 MB/s rate. Still low.

>
> How long do you spend in the read_trigger_event()? To be measured.

We add some timers in this part of the program.
The time spent on CAEN_DGTZ_ReadData is about 100 us.
To sleep 1 ms in read_trigger_event may delay the crush, but just one minute.
To sleep 10 ms works.

>
> I still don't understand your setup as you mention using optic fiber to
> access the VME controller? do you have a A3818 or similar to the
> controller? If so why don't you connect directly the optic to the VX1751
> and prevent the use of the VME backplane?

Our connect is:
A2818 (PCI) - fiber - V2718 (Bridge) - VME - V1751
We probably need to configure other vme boards through VME at the same time,
however, these boards don't have a fiber connection.

We also tested direct fiber connect for V1751 today.
But it crashes with the same symptom.
==========================================================================================
    Reply  13 Jul 2016, Zhe Wang, Suggestion, Frontend crush on high event rate 
Suggestion from John and my reply.

> We have achieved very high rates, but only with some care.

> The biggest issue was to make sure when you compile the CAEN driver for the A3818 board that you turn on the MIDAS switch.  Without that problems occur with some 
> probability given by the number of bytes processed - which translates into very soon if you have a high rate.  (The underlying cause is that both MIDAS and the A3818
> use unix Alarm signals, but the CAEN folks have a compile option to turn this off.)

> We use as little as possible of the CAENDigitizerLibrary - instead we program the registers directly on the board.

> There is still some kind of memory leak which we have not yet tracked down, so every few hours we shut down the frontend then restart it. 

We use A2818 (PCI) - fiber - V2718 (Bridge) - VME - V1751.
I actually didn't find a MIDAS switch in the Makefile.
    Reply  13 Jul 2016, Zhe Wang, Suggestion, Frontend crush on high event rate 

More suggestions from John and my reply.

> we also don't use the VME back plane - it's just too slow - mixing VME commands to plain modules and digitizer modules is unreliable....

> We use CAEN fiberoptic version 2 to talk to the digitizers directly, we have upto 12 digitizers, and can use all channels for several hours, and can fill to about 75% 
of the A3818 bandwidth... 

So far we are limitted to 30 MB/s, if tested with CAEN examples, for example, the wavedump program by CAEN.
I think is kind of the limit by IDE hard drive.
Unfortunately we are still far from that limit, only ~ 1 MB/s now.  :(
Entry  09 Sep 2016, Amy Roberts, Suggestion, AJAX jmsg "get messages since t" ability - add to docs? 
I recently needed to watch the Midas messages for a particular error - and 
thus needed a command to "get all the messages since a time t".

The documentation (https://midas.triumf.ca/MidasWiki/index.php/AJAX#jmsg) 
documents a way to "get the most recent n messages" - but when I dug into the 
code, I was delighted to find that the existing Midas code also supports the 
"get all messages since t" query.

For the "get all messages since t" query, the parameter t should be the unix 
timestamp in seconds, and the parameter n should be zero: curl -X GET 
"http://localhost:8081/?cmd=jmsg&n=0&t=1473437918".

Pretty useful!  Perhaps this should be added to the AJAX documentation?
    Reply  09 Sep 2016, Suzannah Daviel, Suggestion, AJAX jmsg "get messages since t" ability - add to docs? 
> I recently needed to watch the Midas messages for a particular error - and 
> thus needed a command to "get all the messages since a time t".
> 
> The documentation (https://midas.triumf.ca/MidasWiki/index.php/AJAX#jmsg) 
> documents a way to "get the most recent n messages" - but when I dug into the 
> code, I was delighted to find that the existing Midas code also supports the 
> "get all messages since t" query.
> 
> For the "get all messages since t" query, the parameter t should be the unix 
> timestamp in seconds, and the parameter n should be zero: curl -X GET 
> "http://localhost:8081/?cmd=jmsg&n=0&t=1473437918".
> 
> Pretty useful!  Perhaps this should be added to the AJAX documentation?

Thank you - I have added it to the documentation.
    Reply  30 Sep 2016, Konstantin Olchanski, Suggestion, AJAX jmsg "get messages since t" ability - add to docs? 
> I recently needed to watch the Midas messages for a particular error - and 
> thus needed a command to "get all the messages since a time t".
> 
> The documentation (https://midas.triumf.ca/MidasWiki/index.php/AJAX#jmsg) 
> documents a way to "get the most recent n messages" - but when I dug into the 
> code, I was delighted to find that the existing Midas code also supports the 
> "get all messages since t" query.
> 
> For the "get all messages since t" query, the parameter t should be the unix 
> timestamp in seconds, and the parameter n should be zero: curl -X GET 
> "http://localhost:8081/?cmd=jmsg&n=0&t=1473437918".
> 
> Pretty useful!  Perhaps this should be added to the AJAX documentation?

The "jmsg" methods are obsolete - please use the JSON-RPC method "cm_msg_retrieve" as shown in resources/example.html. It takes all the same parameters as the midas.h 
cm_msg_retrieve() function, see the snipped from example.html below.

To see the full list of JSON-RPC methods, go to the "help" page and press the button for "json-rpc schema in text table format".

The entry for "cm_msg_retrieve" has this:

------------------------------------------------------------------------------------
cm_msg_retrieve?      | Retrieve midas messages using cm_msg_retrieve2()
                      | ------------------------------------------------------------
                      | params   | facility?           | string         | message facility, default is "midas"
                      |          | min_messages?       | integer        | get at least this many messages, default is 1
                      |          | time?               | number         | start from given timestamp, value 0 means give me newest messages, default is 0
                      | ------------------------------------------------------------
                      | result   | num_messages        | integer        | number of messages returned
                      |          | messages            | string         | messages separated by \n
                      |          | status              | integer        | return status of cm_msg_retrieve2()
------------------------------------------------------------------------------------

Snippet from resources/example.html: (to add "time" parameter, put "time":12345 next to "min_messages").

<input type=button value='Get last 10 midas messages'
          onClick='mjsonrpc_call("cm_msg_retrieve", { "min_messages": 10 })
                   .then(function(rpc) {
                   document.getElementById("cm_msg_retrieve_num_messages").innerHTML = JSON.stringify(rpc.result.num_messages);
                   document.getElementById("cm_msg_retrieve_messages").innerHTML = JSON.stringify(rpc.result.messages);
                   //mjsonrpc_debug_alert(rpc);
                   })
                   .catch(function(error) {
                   mjsonrpc_error_alert(error);
                   });'></input>
    Reply  30 Sep 2016, Konstantin Olchanski, Suggestion, Frontend crush on high event rate 
> 
> More suggestions from John and my reply.
> 
> > we also don't use the VME back plane - it's just too slow - mixing VME commands to plain modules and digitizer modules is unreliable....
> 
> > We use CAEN fiberoptic version 2 to talk to the digitizers directly, we have upto 12 digitizers, and can use all channels for several hours, and can fill to about 75% 
> of the A3818 bandwidth... 
> 
> So far we are limitted to 30 MB/s, if tested with CAEN examples, for example, the wavedump program by CAEN.
> I think is kind of the limit by IDE hard drive.
> Unfortunately we are still far from that limit, only ~ 1 MB/s now.  :(
>

From writing MIDAS frontends for many years, I am starting to form an opinion that this type of problem is undebuggable
in the current midas frontend framework - it is impossible to separate problems in vendor-supplied libraries and linux kernel modules
from problems with midas (i.e. incorrectly created data banks, too-small event buffers getting full) from problems with
bad interaction (collision over the SIGALARM handlers).

I am pondering on a new scheme for midas frontend writing. Perhaps such a new scheme should have a "no midas" mode where you can
compile and link a midas frontend "without midas", leaving you to debug just your code and the vendor code and their interactions.

K.O.
    Reply  05 Oct 2016, Lee Pool, Suggestion, checksums for midas data files 
Hi

> On one side, such checksums help me confirm that uncompressed data contents is the same as original 
> data (compression/decompression is okey).
> 

> I can write the computed checksums into midas.log, or into runNNN.crc32, runNNN.sha256, etc files. (or 
> both).
> 

Just a thought on my side. I have been using a checksum, on data produced  by our experiments via mlogger, the runxxxx.mid.gz, in 
the same manner you proposed and I see now implemented. 

I have a slight, objection, if I may call it that, to how the checksum is saved to disk, in 
run00007.mid.gz.sha256 as an example.


$ cat ~/Data/run00007.mid.gz.sha256
f315af7caf6ca204cc082132862cb4227d77066cb60c6e2b1039d6dc5b04d1ee 650597 Data/run00007.mid.gz


It seems a little misleading to have the gzip'd filename paired with the checksum of the uncompressed content.

May I suggest that the pairing should be ,

f315af7caf6ca204cc082132862cb4227d77066cb60c6e2b1039d6dc5b04d1ee  run00007.mid as an example.

As I find, this information will sit in an archive, database in my case for a long period, and it might
be confusing later on, when verification of the checksum is required.
    Reply  13 Oct 2016, Konstantin Olchanski, Suggestion, checksums for midas data files 
Confirmed, this is a bug in mlogger. It should be creating *2* files, one with the before-compression checksum and one with the after-compression checksum. At 
least both checksums are written to midas.log, so you can grep them from there. K.O.

> Hi
> 
> > On one side, such checksums help me confirm that uncompressed data contents is the same as original 
> > data (compression/decompression is okey).
> > 
> 
> > I can write the computed checksums into midas.log, or into runNNN.crc32, runNNN.sha256, etc files. (or 
> > both).
> > 
> 
> Just a thought on my side. I have been using a checksum, on data produced  by our experiments via mlogger, the runxxxx.mid.gz, in 
> the same manner you proposed and I see now implemented. 
> 
> I have a slight, objection, if I may call it that, to how the checksum is saved to disk, in 
> run00007.mid.gz.sha256 as an example.
> 
> 
> $ cat ~/Data/run00007.mid.gz.sha256
> f315af7caf6ca204cc082132862cb4227d77066cb60c6e2b1039d6dc5b04d1ee 650597 Data/run00007.mid.gz
> 
> 
> It seems a little misleading to have the gzip'd filename paired with the checksum of the uncompressed content.
> 
> May I suggest that the pairing should be ,
> 
> f315af7caf6ca204cc082132862cb4227d77066cb60c6e2b1039d6dc5b04d1ee  run00007.mid as an example.
> 
> As I find, this information will sit in an archive, database in my case for a long period, and it might
> be confusing later on, when verification of the checksum is required.
Entry  27 Feb 2017, William Moore, Suggestion, analyzer failing to load ODB parameters 
Hi,

I am attempting to compile and run analysis code on a completely different,
unconnected system than the DAQ computer for the experiment. The analyzer was
developed previously and my goal is to get it running and then update it to
achieve my needs. Before compiling the analyzer, I load a backup ODB file in
odbedit, and compile experim.h. I then compile the analyzer with that experim.h
file. When I run the analyzer I get the following output:

> MIDAS version 2.1ROOT version 5.34/36Root server listening on port 9090...
> Running analyzer offline. Stop with "!"
> Configuration file "/somedir/switches.odb" loaded
> [Analyzer,INFO] Set run number 1290 in ODB
> Load ODB from run 1290...[Analyzer,INFO] cannot load value "Client Notify":
write protected
> [Analyzer,INFO] cannot load value "Prompt": write protected
.
.
.
> [Analyzer,INFO] cannot load value "LANSCE-ops": write protected
> MIDAS version 2.1ROOT version 5.34/36OK
> Configuration file "/somedir/switches.odb" loaded
> Data_Raw/run01290.mid.gz:16355  Data_Analyzed/run01290.root:15208  events, 0.43s

I have confirmed all files being used have read/write access to all users. The
analyzer does populate a .root output file with filled histograms, however not
all histograms are filled. I believe this is because histograms that relied on
an ODB paramater that failed to load did not populate. Any idea as to what I am
doing wrong or how I could resolve this issue are greatly appreciated.

Thanks,
William Moore
    Reply  13 Mar 2017, Konstantin Olchanski, Suggestion, checksums for midas data files 
> Confirmed, this is a bug in mlogger. It should be creating *2* files, one with the before-compression checksum and one with the after-compression checksum. At 
> least both checksums are written to midas.log, so you can grep them from there. K.O.

This should be fixed now. Thank you for nudging me.

K.O.



> 
> > Hi
> > 
> > > On one side, such checksums help me confirm that uncompressed data contents is the same as original 
> > > data (compression/decompression is okey).
> > > 
> > 
> > > I can write the computed checksums into midas.log, or into runNNN.crc32, runNNN.sha256, etc files. (or 
> > > both).
> > > 
> > 
> > Just a thought on my side. I have been using a checksum, on data produced  by our experiments via mlogger, the runxxxx.mid.gz, in 
> > the same manner you proposed and I see now implemented. 
> > 
> > I have a slight, objection, if I may call it that, to how the checksum is saved to disk, in 
> > run00007.mid.gz.sha256 as an example.
> > 
> > 
> > $ cat ~/Data/run00007.mid.gz.sha256
> > f315af7caf6ca204cc082132862cb4227d77066cb60c6e2b1039d6dc5b04d1ee 650597 Data/run00007.mid.gz
> > 
> > 
> > It seems a little misleading to have the gzip'd filename paired with the checksum of the uncompressed content.
> > 
> > May I suggest that the pairing should be ,
> > 
> > f315af7caf6ca204cc082132862cb4227d77066cb60c6e2b1039d6dc5b04d1ee  run00007.mid as an example.
> > 
> > As I find, this information will sit in an archive, database in my case for a long period, and it might
> > be confusing later on, when verification of the checksum is required.
Entry  05 Apr 2017, Andreas Suter, Suggestion, nicer header?! 
We use the customHeader to display some useful information. Currently I do not
like its style. What about to make it more alike the footer?

I just changed in resources/mhttpd.css

diff --git a/resources/mhttpd.css b/resources/mhttpd.css
index fb0070d..f3264c8 100644
--- a/resources/mhttpd.css
+++ b/resources/mhttpd.css
@@ -280,6 +280,15 @@ table.headerTable td{
        border: none;
 }
 
+div.headerDiv{
+       background-color: #6F6F6F;
+       text-align: center;
+       padding:1em;
+       color:#EEEEEE;
+       border-bottom:1px solid #000000;
+       height:3em;
+}
+
 div.footerDiv{
        background-color: #808080;
        text-align: center;

and

diff --git a/resources/mhttpd.js b/resources/mhttpd.js
index de8bc6c..972c261 100644
--- a/resources/mhttpd.js
+++ b/resources/mhttpd.js
@@ -172,7 +172,7 @@ function mhttpd_goto_page(page) {
 
 function mhttpd_navigation_bar(current_page, path)
 {
-   document.write("<div id=\"customHeader\">\n");
+   document.write("<div class=\"headerDiv\" id=\"customHeader\">\n");
    document.write("</div>\n");
 
    document.write("<div class=\"mnavcss\">\n");

What do you think?
    Reply  05 Apr 2017, Stefan Ritt, Suggestion, nicer header?! 
In my opinion this makes sense. If KO agrees, you should commit your change.

Stefan

> We use the customHeader to display some useful information. Currently I do not
> like its style. What about to make it more alike the footer?
> 
> I just changed in resources/mhttpd.css
> 
> diff --git a/resources/mhttpd.css b/resources/mhttpd.css
> index fb0070d..f3264c8 100644
> --- a/resources/mhttpd.css
> +++ b/resources/mhttpd.css
> @@ -280,6 +280,15 @@ table.headerTable td{
>         border: none;
>  }
>  
> +div.headerDiv{
> +       background-color: #6F6F6F;
> +       text-align: center;
> +       padding:1em;
> +       color:#EEEEEE;
> +       border-bottom:1px solid #000000;
> +       height:3em;
> +}
> +
>  div.footerDiv{
>         background-color: #808080;
>         text-align: center;
> 
> and
> 
> diff --git a/resources/mhttpd.js b/resources/mhttpd.js
> index de8bc6c..972c261 100644
> --- a/resources/mhttpd.js
> +++ b/resources/mhttpd.js
> @@ -172,7 +172,7 @@ function mhttpd_goto_page(page) {
>  
>  function mhttpd_navigation_bar(current_page, path)
>  {
> -   document.write("<div id=\"customHeader\">\n");
> +   document.write("<div class=\"headerDiv\" id=\"customHeader\">\n");
>     document.write("</div>\n");
>  
>     document.write("<div class=\"mnavcss\">\n");
> 
> What do you think?
    Reply  15 Apr 2017, Konstantin Olchanski, Suggestion, nicer header?! 
> In my opinion this makes sense. If KO agrees, you should commit your change.

Please go ahead (sorry for slow reply). I have no idea what this change does. A screenshot of "before" 
and "after" would be nice. The reason I ask is:

note that I am getting rid of the css hell in mhttpd.css. all the new pages will be using the simplified css 
rules in midas.css.

the main change is: the new css rules only change the appearance of html elements that request the 
"midas look" and one can still use the normal html formatting if desired. The old css changed all (and I 
do mean *all*) html elements, making it impossible to write custom web pages using common examples 
from the web - the insane formatting from mhttpd.css was applied to everything indiscriminantly, i.e. h1, 
h2, h3 all look the same.

K.O.


> 
> Stefan
> 
> > We use the customHeader to display some useful information. Currently I do not
> > like its style. What about to make it more alike the footer?
> > 
> > I just changed in resources/mhttpd.css
> > 
> > diff --git a/resources/mhttpd.css b/resources/mhttpd.css
> > index fb0070d..f3264c8 100644
> > --- a/resources/mhttpd.css
> > +++ b/resources/mhttpd.css
> > @@ -280,6 +280,15 @@ table.headerTable td{
> >         border: none;
> >  }
> >  
> > +div.headerDiv{
> > +       background-color: #6F6F6F;
> > +       text-align: center;
> > +       padding:1em;
> > +       color:#EEEEEE;
> > +       border-bottom:1px solid #000000;
> > +       height:3em;
> > +}
> > +
> >  div.footerDiv{
> >         background-color: #808080;
> >         text-align: center;
> > 
> > and
> > 
> > diff --git a/resources/mhttpd.js b/resources/mhttpd.js
> > index de8bc6c..972c261 100644
> > --- a/resources/mhttpd.js
> > +++ b/resources/mhttpd.js
> > @@ -172,7 +172,7 @@ function mhttpd_goto_page(page) {
> >  
> >  function mhttpd_navigation_bar(current_page, path)
> >  {
> > -   document.write("<div id=\"customHeader\">\n");
> > +   document.write("<div class=\"headerDiv\" id=\"customHeader\">\n");
> >     document.write("</div>\n");
> >  
> >     document.write("<div class=\"mnavcss\">\n");
> > 
> > What do you think?
Entry  27 Jul 2017, Wes Gohn, Suggestion, Increasing Max Number of Frontends 
Below are the steps we used to increase the maximum number of frontends that we could run.

In midas.h

#define MAX_CLIENTS            64

changed to

#define MAX_CLIENTS            128

In msystem.h:

#define MAX_RPC_CONNECTION     64

changed to

#define MAX_RPC_CONNECTION     128

In odb.c:

assert(sizeof(BUFFER_HEADER) == 16444); 

GUESS: 256*64+60 = 16444, so change 64 to 128

changed to:                                                                                                                         

assert(sizeof(BUFFER_HEADER) == 32828); //256*128+60

 

DATABASE_HEADER = 64 + 64*DATABASE_CLIENT = 64 + 64*8256 = 528448

changed to:

DATABASE_HEADER = 64 + 128*DATABASE_CLIENT = 64 + 128*8256 = 1056832.
    Reply  10 Aug 2017, Stefan Ritt, Suggestion, Increasing Max Number of Frontends 
The sizeof checks were originally invented by KO to check for binary compatibility between processes attached to the same ODB and event buffers. So if a 
compiler generates different structure sizes due to different padding, one would see that immediately. I wonder however if the absolute numbers make sense 
here. We could replace the 16444 by

NAME_LENGTH + 7*sizeof(INT) + MAX_CLIENTS *(NAME_LENGTH+13*sizeof(INT)+sizeof(float)+2*sizeof(DWORD)+MAX_EVENT_REQUESTS*4*sizeof(INT))

which makes this value automatically scale when one changes MAX_CLIENTS.

People of course have to be aware that if one changes MAX_CLIENTS, then all programs connected to the same ODB or event buffer need to be re-compiled 
and the ODB needs to be re-created from an ASCII file, but at least this would avoid tedious manual calculations.

Any opinion?

Stefan


> Below are the steps we used to increase the maximum number of frontends that we could run.
> 
> In midas.h
> 
> #define MAX_CLIENTS            64
> 
> changed to
> 
> #define MAX_CLIENTS            128
> 
> In msystem.h:
> 
> #define MAX_RPC_CONNECTION     64
> 
> changed to
> 
> #define MAX_RPC_CONNECTION     128
> 
> In odb.c:
> 
> assert(sizeof(BUFFER_HEADER) == 16444); 
> 
> GUESS: 256*64+60 = 16444, so change 64 to 128
> 
> changed to:                                                                                                                         
> 
> assert(sizeof(BUFFER_HEADER) == 32828); //256*128+60
> 
>  
> 
> DATABASE_HEADER = 64 + 64*DATABASE_CLIENT = 64 + 64*8256 = 528448
> 
> changed to:
> 
> DATABASE_HEADER = 64 + 128*DATABASE_CLIENT = 64 + 128*8256 = 1056832.
    Reply  12 Aug 2017, Konstantin Olchanski, Suggestion, Increasing Max Number of Frontends 
The checks for byte sizes of critical data structures have been added to ensure (enforce) binary compatibility
of midas with itself on different platforms (32-bit and 64-bit intel, on PPC, on ARM, etc).

This has worked well in the past and helped avoid problems and subtle bugs in the transition
from 32-bit to 64-bit machines a few years ago. Of course now 32-bit machines are back
as ARM CPUs and FPGA synthetic CPUs.

Replacing the checks with "computed" values will defeat this purpose because the values may be computed
differently on different machines.

Specifically as proposed by Stefan, sizeof(int) can change depending on the target machine and depending
on the compiler settings.

Of course this needs to be balanced against flexibility to adjust important settings like MAX_CLIENTS and MAX_EVENT_REQUESTS.

I would say the present system is just fine. You can change MAX_CLIENTS, rebuild MIDAS and it will not run (assert failure) giving
you an indication that you are doing something non-trivial that will cause problems if you do it without thinking about it.

For example, one may think nothing of changing midas.h and recompiling MIDAS. But having to change odb.c
may ring the little bell to tell you that you *also* have to rebuild *all* of your frontends. Even one unrebuilt frontend
will corrupt all shared memory and crash everything.

I guess one other way to look at this is as a balance between something a few people do rarely against
a function that protects everybody all the time.

That said, I think the checks should be reworked, instead of an assert failure they should give the error message
and tell the user exactly what number to adjust in the size test. Also some checks are obsolete, there is no longer
need to check the size of many ODB structures (equipment, etc). Once we are done with the db_get_record() rework,
only checks for data structures in shared memory shall remain.

As the bottom line, to change MAX_CLIENTS, you already have to edit midas.h, asking you to also edit odb.c does
not add much to the burden.

P.S. We are thinking how to make all these values dynamically changable, but basically it requires rolling out
a new binary-incompatible version of MIDAS with added bugs. Maybe some day.

K.O.


> The sizeof checks were originally invented by KO to check for binary compatibility between processes attached to the same ODB and event buffers. So if a 
> compiler generates different structure sizes due to different padding, one would see that immediately. I wonder however if the absolute numbers make sense 
> here. We could replace the 16444 by
> 
> NAME_LENGTH + 7*sizeof(INT) + MAX_CLIENTS *(NAME_LENGTH+13*sizeof(INT)+sizeof(float)+2*sizeof(DWORD)+MAX_EVENT_REQUESTS*4*sizeof(INT))
> 
> which makes this value automatically scale when one changes MAX_CLIENTS.
> 
> People of course have to be aware that if one changes MAX_CLIENTS, then all programs connected to the same ODB or event buffer need to be re-compiled 
> and the ODB needs to be re-created from an ASCII file, but at least this would avoid tedious manual calculations.
> 
> Any opinion?
> 
> Stefan
> 
> 
> > Below are the steps we used to increase the maximum number of frontends that we could run.
> > 
> > In midas.h
> > 
> > #define MAX_CLIENTS            64
> > 
> > changed to
> > 
> > #define MAX_CLIENTS            128
> > 
> > In msystem.h:
> > 
> > #define MAX_RPC_CONNECTION     64
> > 
> > changed to
> > 
> > #define MAX_RPC_CONNECTION     128
> > 
> > In odb.c:
> > 
> > assert(sizeof(BUFFER_HEADER) == 16444); 
> > 
> > GUESS: 256*64+60 = 16444, so change 64 to 128
> > 
> > changed to:                                                                                                                         
> > 
> > assert(sizeof(BUFFER_HEADER) == 32828); //256*128+60
> > 
> >  
> > 
> > DATABASE_HEADER = 64 + 64*DATABASE_CLIENT = 64 + 64*8256 = 528448
> > 
> > changed to:
> > 
> > DATABASE_HEADER = 64 + 128*DATABASE_CLIENT = 64 + 128*8256 = 1056832.
    Reply  13 Aug 2017, Stefan Ritt, Suggestion, Increasing Max Number of Frontends 
I agree that the binary compatibility checks are crucial. But I kind of find it strange if one gets an assert failure some where if one tries to change MAX_CLIENTS. It is then not straight 
forward to relate both things and understand the consequences. That's why I put a comment next to the definition of MAX_CLIENTS saying:

/* note that if you change any of the following items, the ODB and the event shared memory buffers 
   become binary incopatible and one has to recompile ALL programs which are locally connected to the 
   ODB and to event buffers */

I think this is more descriptive than just a failing assert. 

If you look carefully in my proposal below, you will see that I rather used

sizeof(INT)

and not 

sizeof(int)

since as KO stated correctly sizeof(int) can change between different architectures. The derived type INT (all uppercase) has been carefully designed to have 32 bits on all architectures. So 
it will NOT change between them. If it does change, then we have a principal problem and many more things will break down. We should therefore have something like

if (sizeof(INT) != 4) then severe_error_and_stop_all_programs()

Now given that sizeof(INT) is everywhere the same, we can use it in the test

sizeof(BUFFER_HEADER) == NAME_LENGTH + 7*sizeof(INT) + MAX_CLIENTS *(NAME_LENGTH+13*sizeof(INT)+sizeof(float)+2*sizeof(DWORD)+MAX_EVENT_REQUESTS*4*sizeof(INT))

which then basically tests the structure byte alignment and padding. The comment above should warn users to change MAX_CLIENTS without thinking. 

Another strategy would be to put sizeof(BUFFER_HEADER) as the first two byes of the structure itself. We can the dynamically test the size of each bm_open_buffer(), and if the local size 
differs from the one saved in the buffer header, the program refuses to start, so we know exactly which program should have to be recompiled. The downside of this would be that the 
header structure has to be changed and we break binary compatibility with all existing programs. But maybe we should do this step once and be safe in the future.

Stefan


> The checks for byte sizes of critical data structures have been added to ensure (enforce) binary compatibility
> of midas with itself on different platforms (32-bit and 64-bit intel, on PPC, on ARM, etc).
> 
> This has worked well in the past and helped avoid problems and subtle bugs in the transition
> from 32-bit to 64-bit machines a few years ago. Of course now 32-bit machines are back
> as ARM CPUs and FPGA synthetic CPUs.
> 
> Replacing the checks with "computed" values will defeat this purpose because the values may be computed
> differently on different machines.
> 
> Specifically as proposed by Stefan, sizeof(int) can change depending on the target machine and depending
> on the compiler settings.
> 
> Of course this needs to be balanced against flexibility to adjust important settings like MAX_CLIENTS and MAX_EVENT_REQUESTS.
> 
> I would say the present system is just fine. You can change MAX_CLIENTS, rebuild MIDAS and it will not run (assert failure) giving
> you an indication that you are doing something non-trivial that will cause problems if you do it without thinking about it.
> 
> For example, one may think nothing of changing midas.h and recompiling MIDAS. But having to change odb.c
> may ring the little bell to tell you that you *also* have to rebuild *all* of your frontends. Even one unrebuilt frontend
> will corrupt all shared memory and crash everything.
> 
> I guess one other way to look at this is as a balance between something a few people do rarely against
> a function that protects everybody all the time.
> 
> That said, I think the checks should be reworked, instead of an assert failure they should give the error message
> and tell the user exactly what number to adjust in the size test. Also some checks are obsolete, there is no longer
> need to check the size of many ODB structures (equipment, etc). Once we are done with the db_get_record() rework,
> only checks for data structures in shared memory shall remain.
> 
> As the bottom line, to change MAX_CLIENTS, you already have to edit midas.h, asking you to also edit odb.c does
> not add much to the burden.
> 
> P.S. We are thinking how to make all these values dynamically changable, but basically it requires rolling out
> a new binary-incompatible version of MIDAS with added bugs. Maybe some day.
> 
> K.O.
> 
> 
> > The sizeof checks were originally invented by KO to check for binary compatibility between processes attached to the same ODB and event buffers. So if a 
> > compiler generates different structure sizes due to different padding, one would see that immediately. I wonder however if the absolute numbers make sense 
> > here. We could replace the 16444 by
> > 
> > NAME_LENGTH + 7*sizeof(INT) + MAX_CLIENTS *(NAME_LENGTH+13*sizeof(INT)+sizeof(float)+2*sizeof(DWORD)+MAX_EVENT_REQUESTS*4*sizeof(INT))
> > 
> > which makes this value automatically scale when one changes MAX_CLIENTS.
> > 
> > People of course have to be aware that if one changes MAX_CLIENTS, then all programs connected to the same ODB or event buffer need to be re-compiled 
> > and the ODB needs to be re-created from an ASCII file, but at least this would avoid tedious manual calculations.
> > 
> > Any opinion?
> > 
> > Stefan
> > 
> > 
> > > Below are the steps we used to increase the maximum number of frontends that we could run.
> > > 
> > > In midas.h
> > > 
> > > #define MAX_CLIENTS            64
> > > 
> > > changed to
> > > 
> > > #define MAX_CLIENTS            128
> > > 
> > > In msystem.h:
> > > 
> > > #define MAX_RPC_CONNECTION     64
> > > 
> > > changed to
> > > 
> > > #define MAX_RPC_CONNECTION     128
> > > 
> > > In odb.c:
> > > 
> > > assert(sizeof(BUFFER_HEADER) == 16444); 
> > > 
> > > GUESS: 256*64+60 = 16444, so change 64 to 128
> > > 
> > > changed to:                                                                                                                         
> > > 
> > > assert(sizeof(BUFFER_HEADER) == 32828); //256*128+60
> > > 
> > >  
> > > 
> > > DATABASE_HEADER = 64 + 64*DATABASE_CLIENT = 64 + 64*8256 = 528448
> > > 
> > > changed to:
> > > 
> > > DATABASE_HEADER = 64 + 128*DATABASE_CLIENT = 64 + 128*8256 = 1056832.
    Reply  13 Aug 2017, Konstantin Olchanski, Suggestion, Increasing Max Number of Frontends 
> if (sizeof(INT) != 4) then severe_error_and_stop_all_programs()

Quick reply.

Today, for fixed size data types one should use uint32_t & co, see
stdint.h
https://en.wikipedia.org/wiki/C_data_types#stdint.h
https://en.wikipedia.org/wiki/C99 (scroll down and click to open "implementation -> compiler support"

The other popular convention is "u32" used by the Linux kernel, you will see it in the linux kernel drivers.

If I remember right, WORD and DWORD grow legs from the 16-bit Motorolla 68xxx processors,
VxWorks and the VME bus. At some point the data buses were 16-bit wide and that we the WORD.

(I do not think UNIX ever used the WORD/DWORD names, i.e. MacOS has int32_t and u_int32_t).

K.O.
ELOG V3.1.4-2e1708b5