Back Midas Rome Roody Rootana
  Midas DAQ System  Not logged in ELOG logo
Entry  01 May 2020, Joseph McKenna, Forum, Taking MIDAS beyond 64 clients 
    Reply  01 May 2020, Stefan Ritt, Forum, Taking MIDAS beyond 64 clients 
       Reply  01 May 2020, Pierre Gorel, Forum, Taking MIDAS beyond 64 clients 
          Reply  02 May 2020, Stefan Ritt, Forum, Taking MIDAS beyond 64 clients 
             Reply  02 May 2020, Joseph McKenna, Forum, Taking MIDAS beyond 64 clients 
                Reply  02 May 2020, Stefan Ritt, Forum, Taking MIDAS beyond 64 clients 
    Reply  02 May 2020, Konstantin Olchanski, Forum, Taking MIDAS beyond 64 clients 
       Reply  02 May 2020, Konstantin Olchanski, Forum, Taking MIDAS beyond 64 clients 
          Reply  02 May 2020, Konstantin Olchanski, Forum, Taking MIDAS beyond 64 clients 
Message ID: 1892     Entry time: 01 May 2020     In reply to: 1891     Reply to this: 1893
Author: Stefan Ritt 
Topic: Forum 
Subject: Taking MIDAS beyond 64 clients 
Hi Joseph,

here some thoughts from my side:

- Breaking ODB compatibility in the master/develop midas branch is very bad, since almost all experiments worldwide are affected if they just do blindly a pull and want to recompile and rerun. Currently, 
even during our Corona crisis, still some experiments are running and monitored remotely.

- On the other hand, if we have to break compatibility, now is maybe a good time since most accelerators worldwide are off. But before doing so, I would like to get feedback from the main experiments 
around the world (MEG, T2K, g-2, DEAP besides ALPHA).

- Having a maximum of 64 clients was originally decided when memory was scarce. In the early days one had just a couple of megabytes of share memory. Now this is not an issue any more, but I see 
another problem. The main status page gives a nice overview of the experiment. This only works because there is a limited number of midas clients and equipments. If we blow up to 1000+, the status 
page would be rather long and we have to scroll up and down forever. In such a scenario one would have at least to redesign the status and program pages. To start your experiment, you would have to 
click 1000 times to start each front-end, also not very practicable.

- Having 100's or 1000's of front-ends calls rather for a hierarchical design, like the LHC experiments have. That would be a major change of midas and cannot be done quickly. It would also result in 
much slower run start/stops.

- If you see limitations with your LabVIEW PCs, have you considered multi-threading on your front-ends? Note that the standard midas slow control system supports multithreaded devices 
(DF_MULTITHREAD). In MEG, we use about 800 microcontrollers via the MSCB protocol. They are grouped together and each group is a multithreaded device in the midas slow control lingo, meaning the 
group gets its own thread for control and readout in the midas frontend. This way, one group cannot slow down all other groups. There is one front-end for all groups, which can be started/stopped with 
a single click, it shows up just as one line in the status page, and still it's pretty fast. Have you considered such a scheme? So your LabVIEW PCs would not be individual front-ends, but just make a 
network connection to the midas front-end which then manages all LabVEIW PCs. The midas slow control system allows to define custom commands (besides the usual read/write command for slow 
control data), so you could maybe integrate all you need into that scheme.

Best,
Stefan

> 
> Hi all,
> I have been experimenting with a frontend solution for my experiment 
> (ALPHA). The intention to replace how we log data from PCs running LabVIEW. 
> I am at the proof of concept stage. So far I have some promising 
> performance, able to handle 10-100x more data in my test setup (current 
> limitations now are just network bandwith, MIDAS is impressively efficient). 
> ==========================================================================
> Our experiment has many PCs using LabVIEW which all log to MIDAS, the 
> experiment has grown such that we need some sort of load balancing in our 
> frontend.
> The concept was to have a 'supervisor frontend' and an array of 'worker 
> frontend' processes. 
> -A LabVIEW client would connect to the supervisor, then be referred to a 
> worker frontend for data logging. 
> -The supervisor could start a 'worker frontend' process as the demand 
> required.
> To increase accountability within the experiment, I intend to have a 'worker 
> frontend' per PC connecting. Then any rouge behavior would be clear from the 
> MIDAS frontpage.
> Presently there around 20-30 of these LabVIEW PCs, but given how the group 
> is growing, I want to be sure that my data logging solution will be viable 
> for the next 5-10 years. With the increased use of single board computers, I 
> chose the target of benchmarking upto 1000 worker frontends... but I quickly 
> hit the '64 MAX CLIENTS' and '64 RPC CONNECTION' limit. Ok...
> branching and updating these limits:
> https://bitbucket.org/tmidas/midas/branch/experimental-beyond_64_clients
> I have two commits. 
> 1. update the memory layout assertions and use MAX_CLIENTS as a variable
> https://bitbucket.org/tmidas/midas/commits/302ce33c77860825730ce48849cb810cf
> 366df96?at=experimental-beyond_64_clients
> 2. Change the MAX_CLIENTS and MAX_RPC_CONNECTION
> https://bitbucket.org/tmidas/midas/commits/f15642eea16102636b4a15c8411330969
> 6ce3df1?at=experimental-beyond_64_clients
> Unintended side effects:
> I break compatibility of existing ODB files... the database layout has 
> changed and I read my old ODB as corrupt. In my test setup I can start from 
> scratch but this would be horrible for any existing experiment.
> Edit: I noticed 'make testdiff' is failing... also fails lok
> Early performance results:
> In early tests, ~700 PCs logging 10 unique arrays of 10 doubles into 
> Equipment variables in the ODB seems to perform well... All transactions 
> from client PCs are finished within a couple of ms or less
> ==========================================================================
> Questions:
> Does the community here have strong opinions about increasing the 
> MAX_CLIENTS and MAX_RPC_CONNECTION limits? 
> Am I looking at this problem in a naive way?
> 
> Potential solutions other than increasing the MAX_CLIENTS limit:
> -Make worker threads inside the supervisor (not a separate process), I am 
> using TMFE, so I can dynamically create equipment. I have not yet taken a 
> deep dive into how any multithreading is implemented
> -One could have a round robin system to load balance between a limited pool 
> of 'worker frontend' proccesses. I don't like this solution as I want to 
> able to clearly see which client PCs have been setup to log too much data
> ==========================================================================
ELOG V3.1.4-2e1708b5