  1894   02 May 2020   Stefan Ritt   Forum   Taking MIDAS beyond 64 clients
TRIUMF stayed quiet, probably they have other things to do.

I allowed myself to move the maximum number of clients back to its original value, so as not to break running experiments.
This does not mean that the increase is a bad idea; we just have to introduce it carefully. Let's discuss it
more thoroughly here before we make a decision in that direction.

Best regards,
Stefan
  1895   02 May 2020   Joseph McKenna   Forum   Taking MIDAS beyond 64 clients
Thank you very much for the feedback.

I am satisfied with not changing the 64 client limit. I will look at rewriting my frontend to spawn threads rather than
processes. The load of my frontend is low, so I do not anticipate issues with a threaded implementation.

In this threaded scenario, it will be a considerable amount of time before ALPHA bumps into the 64 client limit.

If it avoids confusion, I am happy for my experimental branch 'experimental-beyond_64_clients' to be deleted.

Perhaps an item for future discussion would be for the odbinit program to be able to 'upgrade' the ODB and enable some backwards
compatibility.

Thanks again
Joseph
  1896   02 May 2020   Stefan Ritt   Forum   Taking MIDAS beyond 64 clients
> Perhaps an item for future discussion would be for the odbinit program to be able to 'upgrade' the ODB and enable some backwards 
> compatibility.

We had this discussion already a few times. There is an ODB version number (DATABASE_VERSION 3 in midas.h) which is intended for that. If we break the
binary compatibility, programs should complain "ODB version has changed, please run ...", then odbinit (written by KO) should have a well-defined
procedure to upgrade existing ODBs by re-creating them, but keeping all old contents. This should be tested on a few systems.
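
To make the intent concrete, the check could look roughly like this (a sketch only; DATABASE_VERSION is the real constant from midas.h, but the header field and the status code shown here are illustrative):

/* on opening the ODB, compare the version stamped in the shared header
   against the version this client was compiled with */
if (pheader->version != DATABASE_VERSION) {
   cm_msg(MERROR, "db_open_database",
          "ODB version has changed (found %d, expected %d), please run odbinit",
          pheader->version, DATABASE_VERSION);
   return DB_VERSION_MISMATCH;   /* illustrative status code */
}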

Stefan
  1897   02 May 2020   Konstantin Olchanski   Forum   Taking MIDAS beyond 64 clients
> 
> Does the community here have strong opinions about increasing the 
> MAX_CLIENTS and MAX_RPC_CONNECTION limits? 
> Am I looking at this problem in a naive way?
> 

I think MAX_CLIENTS set at 64 is on the low side for today.

And in the past, we did have experiments that did not work without increasing MAX_CLIENTS. I 
think T2K/ND280 needed MAX_CLIENTS bumped to about 100 (200?).

If ALPHA needs MAX_CLIENTS bigger than the default 64, nothing stops the experiment
from changing this number in the local copy of MIDAS.

It is not necessary to change it in the central repository for everybody.
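
For the record, that local change is a two-line edit (the numbers below are just an example):

/* midas.h, in the experiment's local copy of MIDAS */
#define MAX_CLIENTS          128   /* default is 64 */
#define MAX_RPC_CONNECTION   128   /* bumped to match */

after which MIDAS and all clients must be rebuilt against the same header, since these constants are baked into the shared memory layout.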

K.O.
  1898   02 May 2020   Konstantin Olchanski   Forum   Taking MIDAS beyond 64 clients
> > 
> > Does the community here have strong opinions about increasing the 
> > MAX_CLIENTS and MAX_RPC_CONNECTION limits? 
> > Am I looking at this problem in a naive way?
> > 

The issue is binary compatibility.

MIDAS has been binary compatible with itself for a long time, 20 years now, easily.

If we are to give this up, we must gain more than we lose.

On the technical level, bumping MAX_CLIENTS from 64 to 100 gains us nothing. Tomorrow an experiment
will come along asking for 101 clients. Any number you pick is too small for somebody. And MIDAS
already has a solution for this: edit midas.h, hit make, done.

If we are to break binary compatibility, we should go big. Remove these limits completely!

Move the MAX_CLIENTS & co fixed-size arrays out of the headers in the ODB and in the event buffers, and put
them where they can be resized as needed.
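
The direction, roughly (a sketch of the idea, not the actual MIDAS structures; all names here are hypothetical):

/* instead of compiling MAX_CLIENTS into the shared header, record the
   table size in the header and size the client table at creation time */
typedef struct {
   int max_clients;           /* chosen when the ODB/buffer is created */
   int client_table_offset;   /* client records follow the header */
   /* ... */
} RESIZABLE_HEADER;           /* hypothetical */

Every client then reads max_clients from the header it attached to, so changing the limit never requires recompiling clients.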

That's a binary-compatibility breaking solution I would vote for.

K.O.
  1899   02 May 2020   Konstantin Olchanski   Forum   Taking MIDAS beyond 64 clients
> > > 
> > > Does the community here have strong opinions about increasing the 
> > > MAX_CLIENTS and MAX_RPC_CONNECTION limits? 
> > > Am I looking at this problem in a naive way?
> > > 

The issue is: how do you organize an experiment? How many frontends should it have?

There are two extremes:

- collect all data in 1 frontend (and today, with C++ threads and C++ ring buffers, this is trivial; see the sketch below)
- instantiate 1 frontend for each data source. (For example, the ALPHA-g detector has 8 ADCs and 64 PWBs, plus some
small fish. Actually it is worse than that: each ADC looks like 48 individual data sources and each PWB looks like 4 data sources,
so this would be 8*48+4*64=640 data sources, easily 640 frontends, plus the small fish.)
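
The threaded single-frontend pattern is indeed simple in plain C++. A minimal sketch (generic C++, not the MIDAS mfe or ring-buffer API; the reader body is a placeholder):

#include <thread>
#include <mutex>
#include <queue>
#include <vector>

std::mutex mtx;
std::queue<std::vector<char>> events;   // stand-in for a MIDAS ring buffer

// one reader thread per data source, all funnelling into one queue
void reader(int source_id)
{
   for (;;) {
      std::vector<char> event;          // fill from source 'source_id' here
      std::lock_guard<std::mutex> lock(mtx);
      events.push(std::move(event));
   }
}

int main()
{
   std::vector<std::thread> readers;
   for (int i = 0; i < 640; i++)        // 640 sources, but only 1 frontend process
      readers.emplace_back(reader, i);
   // the main thread would pop from 'events' and send them to the event buffer
   for (auto &t : readers)
      t.join();
}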

Which way is best? Every experiment is different, but consider simple things:

640 frontends writing into 1 event buffer will probably cause large contention for the event buffer lock. bad.
640 frontends running on a 4 core CPU will probably cause unhappiness in the OS. bad.
starting and stopping 640 frontends requires some scripting, monitoring that they all still run, etc. extra work. bad.
640 frontends on the midas status page? your cell phone web browser will explode. bad.

What I am saying is: arbitrary limits are good for you. They make you think about what is going on before you throw
resources at the problem.

K.O.
  163   13 Oct 2004   Konstantin Olchanski   Bug Report   TWIST upgrade bombed...
The upgrade of TWIST to the latest midas has bombed - we see mevb and mlogger
crashes during shared memory data buffer accesses. I am looking into it and I
will add information as I figure things out. K.O.
  164   13 Oct 2004   Pierre-Andre Amaudruz   Bug Report   TWIST upgrade bombed...
> The upgrade of TWIST to the latest midas has bombed - we see mevb and mlogger
> crashes during shared memory data buffer accesses. I am looking into it and I
> will add information as I figure things out. K.O.

Since 1.9.5 the EventBuilder has been modified. Please consult the documentation,
where the new mevb scheme is explained.
The mevb has been tested successfully with up to 16 frontends (on 15 different CPUs).
Data rates at the EventBuilder were measured at about 50MB/s without the
logger and ~30MB/s with the logger.
  165   13 Oct 2004   Konstantin Olchanski   Bug Report   TWIST upgrade bombed...
> The upgrade of TWIST to the latest midas has bombed - we see mevb and mlogger
> crashes during shared memory data buffer accesses. I am looking into it and I
> will add information as I figure things out. K.O.

I traced the buffer memory corruption to a logic error in system.c::ss_shm_open(). If
a .SHM file exists, its size is used as the size of the SysV shared memory
segment, even if the requested shared memory size is bigger, while the caller of
ss_shm_open() thinks it got all the requested memory. Eventually we try to use
the unallocated memory and crash. Below is the proposed fix; I will commit it
after I retest the upgrade during the next few days.

[olchansk@send src]$ cvs diff -u system.c
olchansk@midas.psi.ch's password: 
Index: system.c
===================================================================
RCS file: /usr/local/cvsroot/midas/src/system.c,v
retrieving revision 1.83
diff -u -r1.83 system.c
--- system.c    4 Oct 2004 07:04:01 -0000       1.83
+++ system.c    14 Oct 2004 05:51:16 -0000
@@ -544,8 +544,14 @@
       } else {
          /* if file exists, retrieve its size */
          file_size = (INT) ss_file_size(file_name);
-         if (file_size > 0)
+         if (file_size > 0) {
+            if (file_size < size) {
+               cm_msg(MERROR, "ss_shm_open", "Shared memory segment \'%s\' size %d is smaller than requested size %d. Please remove it and try again", file_name, file_size, size);
+               return SS_NO_MEMORY;
+            }
+            
             size = file_size;
+         }
       }
 
       /* get the shared memory, create if not existing */

K.O.
  166   13 Oct 2004   Konstantin Olchanski   Bug Report   TWIST upgrade bombed...
> > The upgrade of TWIST to the latest midas has bombed - we see mevb and mlogger
> > crashes during shared memory data buffer accesses. I am looking into it and I
> > will add information as I figure things out. K.O.
> 
> Since 1.9.5 the EventBuilder has been modified. Please consult the documentation,
> where the new mevb scheme is explained.
> The mevb has been tested successfully with up to 16 frontends (on 15 different CPUs).
> Data rates at the EventBuilder were measured at about 50MB/s without the
> logger and ~30MB/s with the logger.

It turns out that TWIST uses a private mevb.c. We will consider upgrading to the
standard one.

K.O.
  167   14 Oct 2004   Stefan Ritt   Bug Report   TWIST upgrade bombed...
Agree.

Once you have made the modification, please check the following situation: create a fresh
ODB with increased size ("odbedit -s 2000000" for example). Then check that the
other clients "adopt" this increased size. Note that some experiments need a
bigger ODB, and I don't want them to have to recompile all clients; that's why the
code in ss_shm_open() can attach to a *larger* shared memory. However, it should
not matter to the process, since the ODB (or SYSTEM) shared memory size is
stored in the pheader->key_size and pheader->data_size of each participating
process. So they should never write beyond the limits defined in that header.
The size passed to ss_shm_open() is only a "hint" if the shared memory does not exist,
and is nowhere used later in the code.
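
To illustrate the point (a sketch only; the field names follow this post, see midas.h for the real DATABASE_HEADER definition):

/* a client derives the usable ODB size from the shared header it
   attached to, not from the size hint it passed to ss_shm_open() */
DATABASE_HEADER *pheader = (DATABASE_HEADER *) shm_adr;   /* shm_adr: address returned by ss_shm_open() */
INT odb_size = sizeof(DATABASE_HEADER) + pheader->key_size + pheader->data_size;
/* all key and data accesses must stay below odb_size */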
  169   14 Oct 2004   Konstantin Olchanski   Bug Report   TWIST upgrade bombed...
> The upgrade of TWIST to the latest midas has bombed - we see mevb and mlogger
> crashes during shared memory data buffer accesses. I am looking into it and I
> will add information as I figure things out. K.O.

On the second try, it looks like we are in business - the first try did not work
because of two mistakes:

1) I did not delete *all* of the old .SHM files (.ODB.SHM, .SYSTEM.SHM, .YBUF1.SHM,
.YBUF2.SHM). I deleted ODB.SHM, so the ODB worked, but I forgot about the data buffers
SYSTEM.SHM & co and ended up with segmentation faults and core dumps in the buffer
management code, caused by a mismatch between the old-midas buffers and the new-midas code.
2) While debugging these core dumps, I made an error in my test code, so even
after I deleted the old data buffers, things still did not work. Talk about
over-debugging a problem...

K.O.
  597   24 Jun 2009   Konstantin Olchanski   Bug Report   TR_STARTABORT transition, mlogger duplicate event problem
> > > We have seen on several daq systems this problem: we start a run and observe that the number of 
> > > events written by mlogger to the output file is double the number of events actually collected. Upon 
> > > inspection of the output file, we see that every event is written twice. Restarting the run usually fixes 
> > > this problem.
> > 
> > mlogger.c fixed svn rev 4497. (from tr_start(), call tr_stop() if somehow it was not called already by end-run transition).
> 
> There is a new problem: after an unsuccessful run start, the next run start bombs with the error "output file runNNN.mid already exists". One way around this is to 
> manually remove the useless data file, another is to bump up the run number. Better solution is to automatically erase the output file created by unsuccessful run 
> starts.

Stefan suggested implementing a new transition, TR_STARTABORT, issued if TR_START fails. mlogger can use it to clean up open files, etc., similar to TR_STOP.

This is now implemented. In mlogger, TR_STARTABORT is similar to TR_STOP, but deletes open output files and does not save end-of-run information into databases, etc. mfe.c does not handle this transition yet, but I
plan to add it - to fix the observed situations where the run failed to start, but some equipment does not know about it and continues to generate events and send data.

svn rev 4514
K.O.
  599   25 Jun 2009   Stefan Ritt   Bug Report   TR_STARTABORT transition, mlogger duplicate event problem
> Stefan suggested implementing a new transition, TR_STARTABORT, issued if TR_START fails. mlogger can use it to clean up open files, etc., similar to TR_STOP.
> 
> This is now implemented. In mlogger, TR_STARTABORT is similar to TR_STOP, but deletes open output files and does not save end-of-run information into databases, etc. mfe.c does not handle this transition yet, but I 
> plan to add it - to fix the observed situations where the run failed to start, but some equipment does not know about it and continues to generate events and send data.
> 
> svn rev 4514
> K.O.

There is one problem with TR_STARTABORT: if you combine old and new clients, they will crash, since the old clients don't know anything about TR_STARTABORT. The way to prevent this is to increase the Midas version from
2.0.0 to 2.1.0. Then you will get a warning if you mix clients. Please test this and commit the change if it works.
  601   25 Jun 2009   Konstantin Olchanski   Bug Report   TR_STARTABORT transition, mlogger duplicate event problem
> > Stefan suggested implementing a new transition, TR_STARTABORT, issued if TR_START fails. mlogger can use it to clean up open files, etc., similar to TR_STOP.
> > 
> > This is now implemented. In mlogger, TR_STARTABORT is similar to TR_STOP, but deletes open output files and does not save end-of-run information into databases, etc. mfe.c does not handle this transition yet, but I 
> > plan to add it - to fix the observed situations where the run failed to start, but some equipment does not know about it and continues to generate events and send data.
> > 
> > svn rev 4514
> > K.O.
> 
> There is one problem with TR_STARTABORT: if you combine old and new clients, they will crash, since the old clients don't know anything about TR_STARTABORT. The way to prevent this is to increase the Midas version from 
> 2.0.0 to 2.1.0. Then you will get a warning if you mix clients. Please test this and commit the change if it works.

Are you sure? Only clients that register themselves to receive the TR_STARTABORT transition (via cm_register_transition()) will receive this transition.

As of now, the only client that registers and receives this transition is mlogger.

I also confirm that old clients that know nothing about TR_STARTABORT are *not* sent this transition. (this is tested).
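
For reference, opting in looks roughly like this (a sketch: cm_register_transition() and TR_STARTABORT are the real MIDAS names, but the handler body and the sequence number 500 are illustrative):

INT tr_startabort(INT run_number, char *error)
{
   /* delete any partially written output files, as mlogger does */
   return CM_SUCCESS;
}

/* during client initialization, after connecting to the experiment: */
cm_register_transition(TR_STARTABORT, tr_startabort, 500);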

K.O.
  602   25 Jun 2009   Stefan Ritt   Bug Report   TR_STARTABORT transition, mlogger duplicate event problem
> Are you sure? Only clients that register themselves to receive the TR_STARTABORT transition (via cm_register_transition()) will receive this transition.
> 
> As of now, the only client that registers and receives this transition is mlogger.
> 
> I also confirm that old clients that know nothing about TR_STARTABORT are *not* sent this transition. (this is tested).

Ok, then we are fine.
  883   06 May 2013   Konstantin Olchanski   Info   TRIUMF MIDAS page moved to DAQWiki
The MIDAS web page at TRIUMF (http://midas.triumf.ca) moved from the daq-plone site to the DAQWiki 
(MediaWiki) site. Links were updated, checked and corrected:
https://www.triumf.info/wiki/DAQwiki/index.php/MIDAS

Included is the link to our MIDAS installation instructions. These are more complete than the
instructions in the MIDAS documentation:
https://www.triumf.info/wiki/DAQwiki/index.php/Setup_MIDAS_experiment
K.O.
  2942   12 Feb 2025   Mark Grimes   Forum   TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file
Hi,
I have a manalyzer that uses a derived class of TMFeRpcHandlerInterface to communicate information to 
Midas during online running.  At the end of each run it saves out custom data in the 
TMFeRpcHandlerInterface::HandleEndRun override. This works really well.
However, when I run offline on a Midas output file the HandleEndRun method is never called and my data is 
never saved.  Is this intentional?  I understand that there is no point for the HandleBinaryRpc method offline, 
but the other methods (HandleEndRun, HandleBeginRun etc) could serve a purpose.  Or is it a conscious 
choice to ignore all of TMFeRpcHandlerInterface when offline?

Thanks,

Mark.
  2945   26 Feb 2025   Thomas Lindner   Forum   TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file
Hi,

Sorry, we have been slammed with a couple of projects on the TRIUMF side in the past weeks and haven't found time for a
response. I am hopeful that we will be able to answer this question (and the cache size question) within the next 10
days.

Again apologies,
Thomas

> Hi,
> I have a manalyzer that uses a derived class of TMFeRpcHandlerInterface to communicate information to 
> Midas during online running.  At the end of each run it saves out custom data in the 
> TMFeRpcHandlerInterface::HandleEndRun override. This works really well.
> However, when I run offline on a Midas output file the HandleEndRun method is never called and my data is 
> never saved.  Is this intentional?  I understand that there is no point for the HandleBinaryRpc method offline, 
> but the other methods (HandleEndRun, HandleBeginRun etc) could serve a purpose.  Or is it a conscious 
> choice to ignore all of TMFeRpcHandlerInterface when offline?
> 
> Thanks,
> 
> Mark.