Back Midas Rome Roody Rootana
  Midas DAQ System  Not logged in ELOG logo
Entry  27 Jul 2017, Wes Gohn, Suggestion, Increasing Max Number of Frontends 
    Reply  10 Aug 2017, Stefan Ritt, Suggestion, Increasing Max Number of Frontends 
       Reply  12 Aug 2017, Konstantin Olchanski, Suggestion, Increasing Max Number of Frontends 
          Reply  13 Aug 2017, Stefan Ritt, Suggestion, Increasing Max Number of Frontends 
             Reply  13 Aug 2017, Konstantin Olchanski, Suggestion, Increasing Max Number of Frontends 
                Reply  13 Aug 2017, Stefan Ritt, Suggestion, Increasing Max Number of Frontends 
Message ID: 1314     Entry time: 13 Aug 2017     In reply to: 1313     Reply to this: 1315
Author: Stefan Ritt 
Topic: Suggestion 
Subject: Increasing Max Number of Frontends 
I agree that the binary compatibility checks are crucial. But I kind of find it strange if one gets an assert failure some where if one tries to change MAX_CLIENTS. It is then not straight 
forward to relate both things and understand the consequences. That's why I put a comment next to the definition of MAX_CLIENTS saying:

/* note that if you change any of the following items, the ODB and the event shared memory buffers 
   become binary incopatible and one has to recompile ALL programs which are locally connected to the 
   ODB and to event buffers */

I think this is more descriptive than just a failing assert. 

If you look carefully in my proposal below, you will see that I rather used

sizeof(INT)

and not 

sizeof(int)

since as KO stated correctly sizeof(int) can change between different architectures. The derived type INT (all uppercase) has been carefully designed to have 32 bits on all architectures. So 
it will NOT change between them. If it does change, then we have a principal problem and many more things will break down. We should therefore have something like

if (sizeof(INT) != 4) then severe_error_and_stop_all_programs()

Now given that sizeof(INT) is everywhere the same, we can use it in the test

sizeof(BUFFER_HEADER) == NAME_LENGTH + 7*sizeof(INT) + MAX_CLIENTS *(NAME_LENGTH+13*sizeof(INT)+sizeof(float)+2*sizeof(DWORD)+MAX_EVENT_REQUESTS*4*sizeof(INT))

which then basically tests the structure byte alignment and padding. The comment above should warn users to change MAX_CLIENTS without thinking. 

Another strategy would be to put sizeof(BUFFER_HEADER) as the first two byes of the structure itself. We can the dynamically test the size of each bm_open_buffer(), and if the local size 
differs from the one saved in the buffer header, the program refuses to start, so we know exactly which program should have to be recompiled. The downside of this would be that the 
header structure has to be changed and we break binary compatibility with all existing programs. But maybe we should do this step once and be safe in the future.

Stefan


> The checks for byte sizes of critical data structures have been added to ensure (enforce) binary compatibility
> of midas with itself on different platforms (32-bit and 64-bit intel, on PPC, on ARM, etc).
> 
> This has worked well in the past and helped avoid problems and subtle bugs in the transition
> from 32-bit to 64-bit machines a few years ago. Of course now 32-bit machines are back
> as ARM CPUs and FPGA synthetic CPUs.
> 
> Replacing the checks with "computed" values will defeat this purpose because the values may be computed
> differently on different machines.
> 
> Specifically as proposed by Stefan, sizeof(int) can change depending on the target machine and depending
> on the compiler settings.
> 
> Of course this needs to be balanced against flexibility to adjust important settings like MAX_CLIENTS and MAX_EVENT_REQUESTS.
> 
> I would say the present system is just fine. You can change MAX_CLIENTS, rebuild MIDAS and it will not run (assert failure) giving
> you an indication that you are doing something non-trivial that will cause problems if you do it without thinking about it.
> 
> For example, one may think nothing of changing midas.h and recompiling MIDAS. But having to change odb.c
> may ring the little bell to tell you that you *also* have to rebuild *all* of your frontends. Even one unrebuilt frontend
> will corrupt all shared memory and crash everything.
> 
> I guess one other way to look at this is as a balance between something a few people do rarely against
> a function that protects everybody all the time.
> 
> That said, I think the checks should be reworked, instead of an assert failure they should give the error message
> and tell the user exactly what number to adjust in the size test. Also some checks are obsolete, there is no longer
> need to check the size of many ODB structures (equipment, etc). Once we are done with the db_get_record() rework,
> only checks for data structures in shared memory shall remain.
> 
> As the bottom line, to change MAX_CLIENTS, you already have to edit midas.h, asking you to also edit odb.c does
> not add much to the burden.
> 
> P.S. We are thinking how to make all these values dynamically changable, but basically it requires rolling out
> a new binary-incompatible version of MIDAS with added bugs. Maybe some day.
> 
> K.O.
> 
> 
> > The sizeof checks were originally invented by KO to check for binary compatibility between processes attached to the same ODB and event buffers. So if a 
> > compiler generates different structure sizes due to different padding, one would see that immediately. I wonder however if the absolute numbers make sense 
> > here. We could replace the 16444 by
> > 
> > NAME_LENGTH + 7*sizeof(INT) + MAX_CLIENTS *(NAME_LENGTH+13*sizeof(INT)+sizeof(float)+2*sizeof(DWORD)+MAX_EVENT_REQUESTS*4*sizeof(INT))
> > 
> > which makes this value automatically scale when one changes MAX_CLIENTS.
> > 
> > People of course have to be aware that if one changes MAX_CLIENTS, then all programs connected to the same ODB or event buffer need to be re-compiled 
> > and the ODB needs to be re-created from an ASCII file, but at least this would avoid tedious manual calculations.
> > 
> > Any opinion?
> > 
> > Stefan
> > 
> > 
> > > Below are the steps we used to increase the maximum number of frontends that we could run.
> > > 
> > > In midas.h
> > > 
> > > #define MAX_CLIENTS            64
> > > 
> > > changed to
> > > 
> > > #define MAX_CLIENTS            128
> > > 
> > > In msystem.h:
> > > 
> > > #define MAX_RPC_CONNECTION     64
> > > 
> > > changed to
> > > 
> > > #define MAX_RPC_CONNECTION     128
> > > 
> > > In odb.c:
> > > 
> > > assert(sizeof(BUFFER_HEADER) == 16444); 
> > > 
> > > GUESS: 256*64+60 = 16444, so change 64 to 128
> > > 
> > > changed to:                                                                                                                         
> > > 
> > > assert(sizeof(BUFFER_HEADER) == 32828); //256*128+60
> > > 
> > >  
> > > 
> > > DATABASE_HEADER = 64 + 64*DATABASE_CLIENT = 64 + 64*8256 = 528448
> > > 
> > > changed to:
> > > 
> > > DATABASE_HEADER = 64 + 128*DATABASE_CLIENT = 64 + 128*8256 = 1056832.
ELOG V3.1.4-2e1708b5