ID | Date | Author | Topic | Subject |
1894 | 02 May 2020 | Stefan Ritt | Forum | Taking MIDAS beyond 64 clients |
TRIUMF stayed quiet; probably they have other things to do.
I allowed myself to move the maximum number of clients back to its original value, in order not to break running experiments.
This does not mean that the increase is a bad idea; we just have to be careful not to break running experiments. Let's discuss it
more thoroughly here before we make a decision in that direction.
Best regards,
Stefan |
1895 | 02 May 2020 | Joseph McKenna | Forum | Taking MIDAS beyond 64 clients |
Thank you very much for the feedback.
I am satisfied with not changing the 64 client limit. I will look at re-writing my frontend to spawn threads rather than
processes. The load of my frontend is low, so I do not anticipate issues with a threaded implementation.
In this threaded scenario, it will be a reasonable amount of time until ALPHA bumps into the 64 client limit.
If it avoids confusion, I am happy for my experimental branch 'experimental-beyond_64_clients' to be deleted.
Perhaps an item for future discussion would be for the odbinit program to be able to 'upgrade' the ODB and enable some backwards
compatibility.
Thanks again
Joseph |
1896 | 02 May 2020 | Stefan Ritt | Forum | Taking MIDAS beyond 64 clients |
> Perhaps an item for future discussion would be for the odbinit program to be able to 'upgrade' the ODB and enable some backwards
> compatibility.
We had this discussion already a few times. There is an ODB version number (DATABASE_VERSION 3 in midas.h) which is intended for that. If we break the
binary compatibility, programs should complain "ODB version has changed, please run ...", then odbinit (written by KO) should have a well-defined
procedure to upgrade existing ODBs by re-creating them, but keeping all old contents. This should be tested on a few systems.
Stefan |
1897 | 02 May 2020 | Konstantin Olchanski | Forum | Taking MIDAS beyond 64 clients |
>
> Does the community here have strong opinions about increasing the
> MAX_CLIENTS and MAX_RPC_CONNECTION limits?
> Am I looking at this problem in a naive way?
>
I think MAX_CLIENTS set at 64 is on the low side for today.
And in the past, we did have experiments that did not work without increasing MAX_CLIENTS. I
think T2K/ND280 needed MAX_CLIENTS bumped to about 100 (200?).
If ALPHA needs MAX_CLIENTS bigger than the default 64, nothing stops the experiment
from changing this number in the local copy of MIDAS.
It is not necessary to change it in the central repository for everybody.
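For illustration, the local change amounts to something like the following sketch of midas.h (not the verbatim file; the exact default values and neighbouring lines vary between MIDAS versions):

/* midas.h (sketch, not the verbatim file) */
#define MAX_CLIENTS        64   /* raise in the local copy, e.g. to 128, then rebuild with "make" */
#define MAX_RPC_CONNECTION 64   /* per the question above, may need the same treatment */

Because this changes the binary layout of the shared memory, the ODB and buffer .SHM files of a running experiment typically have to be recreated after such a change; that is the binary-compatibility point discussed in the following messages.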
K.O. |
1898 | 02 May 2020 | Konstantin Olchanski | Forum | Taking MIDAS beyond 64 clients |
> >
> > Does the community here have strong opinions about increasing the
> > MAX_CLIENTS and MAX_RPC_CONNECTION limits?
> > Am I looking at this problem in a naive way?
> >
The issue is binary compatibility.
MIDAS has been binary compatible with itself for a long time, 20 years now, easily.
If we are to give this up, we must gain more than we lose.
On the technical level, bumping MAX_CLIENTS from 64 to 100 gives us nothing. Tomorrow an experiment
will come along asking for 101 clients. Any number you pick will be too small for somebody. And MIDAS
already has a solution for this: edit midas.h, hit make, done.
If we are to break binary compatibility, we should go big. Remove these limits completely!
Move the MAX_CLIENTS & co fixed-size arrays out of the headers of the ODB and the event buffers, and put
them where they can be resized as needed.
That's a binary-compatibility breaking solution I would vote for.
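As a rough sketch of that direction (hypothetical types; the real ODB and event buffer headers look different), compare a compile-time client table with one whose size is only recorded in the header and fixed when the shared memory is created:

/* Sketch with hypothetical types; not the actual midas headers. */
#include <stdio.h>

#define MAX_CLIENTS 64

/* today: the client table is a fixed-size array inside the shared-memory header,
   so its size is baked into the binary layout at compile time */
typedef struct {
   int num_clients;
   struct { char name[32]; int pid; } clients[MAX_CLIENTS];
} fixed_header_t;

/* the "remove the limit" direction: the header only records how big the table is
   and where it lives, so it can be sized (and resized) when the ODB or buffer is created */
typedef struct {
   int num_clients;
   int max_clients;          /* chosen at creation time, not at compile time */
   int client_table_offset;  /* byte offset of the client table inside the shared memory */
} resizable_header_t;

int main(void)
{
   printf("fixed header: %zu bytes, resizable header: %zu bytes\n",
          sizeof(fixed_header_t), sizeof(resizable_header_t));
   return 0;
}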
K.O. |
1899 | 02 May 2020 | Konstantin Olchanski | Forum | Taking MIDAS beyond 64 clients |
> > >
> > > Does the community here have strong opinions about increasing the
> > > MAX_CLIENTS and MAX_RPC_CONNECTION limits?
> > > Am I looking at this problem in a naive way?
> > >
The issue is: how to organize an experiment? How many frontends should I have?
There are two extremes:
- collect all data in 1 frontend (and today, with c++ threads and c++ ring buffers, this is trivial)
- instantiate 1 frontend for each data source (for example, the ALPHA-g detector has 8 ADCs and 64 PWBs plus some
small fish; or rather, each ADC looks like 48 individual data sources and each PWB looks like 4 data sources,
so this would be 8*48+4*64=640 data sources, easily 640 frontends, plus small fish).
Which way is best? Every experiment is different, but consider simple things:
- 640 frontends writing into 1 event buffer will probably cause large contention for the event buffer lock. Bad.
- 640 frontends running on a 4-core CPU will probably cause unhappiness in the OS. Bad.
- starting and stopping 640 frontends requires some scripting, monitoring that they all still run, etc. Extra work. Bad.
- 640 frontends on the midas status page? Your cell phone web browser will explode. Bad.
What I am saying is: arbitrary limits are good for you. They make you think about what is going on before throwing
resources at the problem.
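To make the first extreme concrete, here is a minimal self-contained sketch (illustrative only; not mfe.c and not ALPHA-g code) of one process reading several data sources with one thread each and merging everything through a single queue. A real frontend would use the midas ring buffer and event APIs instead of this toy queue.

// Sketch: one process, one thread per data source, one shared queue.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Event { int source; int serial; };

std::mutex mtx;
std::condition_variable cv;
std::queue<Event> events;

void reader(int source, int nevents)
{
   for (int i = 0; i < nevents; i++) {
      // ... read one event from the hardware here ...
      std::lock_guard<std::mutex> lock(mtx);
      events.push({source, i});
      cv.notify_one();
   }
}

int main()
{
   const int nsources = 8;    // stand-in for the many ADC/PWB data sources
   const int nevents  = 100;
   std::vector<std::thread> readers;
   for (int s = 0; s < nsources; s++)
      readers.emplace_back(reader, s, nevents);

   int received = 0;
   while (received < nsources * nevents) {
      std::unique_lock<std::mutex> lock(mtx);
      cv.wait(lock, [] { return !events.empty(); });
      events.pop();              // here one would assemble and send a midas event
      lock.unlock();
      received++;
   }
   std::printf("received %d events from %d sources in one process\n", received, nsources);

   for (auto& t : readers)
      t.join();
   return 0;
}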
K.O. |
163 | 13 Oct 2004 | Konstantin Olchanski | Bug Report | TWIST upgrade bombed... |
The upgrade of TWIST to the latest midas has bombed- we see mevb and mlogger
crashes during shared memory data buffer accesses. I am looking into it and I
will add information as I figure things out. K.O. |
164 | 13 Oct 2004 | Pierre-Andre Amaudruz | Bug Report | TWIST upgrade bombed... |
> The upgrade of TWIST to the latest midas has bombed- we see mevb and mlogger
> crashes during shared memory data buffer accesses. I am looking into it and I
> will add information as I figure things out. K.O.
Since 1.9.5 the EventBuilder has been modified. Please consult the documentation
where the new mevb scheme is explained.
The mevb has been tested successfully with up to 16 frontends (on 15 different CPUs). Data rates at the EventBuilder
were measured at about 50 MB/s without the logger and ~30 MB/s with the logger. |
165 | 13 Oct 2004 | Konstantin Olchanski | Bug Report | TWIST upgrade bombed... |
> The upgrade of TWIST to the latest midas has bombed- we see mevb and mlogger
> crashes during shared memory data buffer accesses. I am looking into it and I
> will add information as I figure things out. K.O.
I traced buffer memory corruption to a logic error in system.c::ss_shm_open(). If
a .SHM file exists, its size is used as the size of the SysV shared memory
segment, even if the requested shared memory size is bigger, but the caller of
ss_shm_open() thinks it got all the requested memory. Eventually we try to use
the unallocated memory and crash. This is the proposed fix and I will commit it
after I retest the upgrade during the next few days.
[olchansk@send src]$ cvs diff -u system.c
Index: system.c
===================================================================
RCS file: /usr/local/cvsroot/midas/src/system.c,v
retrieving revision 1.83
diff -u -r1.83 system.c
--- system.c    4 Oct 2004 07:04:01 -0000    1.83
+++ system.c    14 Oct 2004 05:51:16 -0000
@@ -544,8 +544,14 @@
    } else {
       /* if file exists, retrieve its size */
       file_size = (INT) ss_file_size(file_name);
-      if (file_size > 0)
+      if (file_size > 0) {
+         if (file_size < size) {
+            cm_msg(MERROR, "ss_shm_open", "Shared memory segment \'%s\' size %d is smaller than requested size %d. Please remove it and try again", file_name, file_size, size);
+            return SS_NO_MEMORY;
+         }
+
          size = file_size;
+      }
    }
    /* get the shared memory, create if not existing */
K.O. |
166 | 13 Oct 2004 | Konstantin Olchanski | Bug Report | TWIST upgrade bombed... |
> > The upgrade of TWIST to the latest midas has bombed- we see mevb and mlogger
> > crashes during shared memory data buffer accesses. I am looking into it and I
> > will add information as I figure things out. K.O.
>
> Since 1.9.5 the EventBuilder has been modified. Please consult the documentation
> where the new mevb scheme is explained.
> Test of the mevb with up to 16 frontends (15 different CPUs) has been tested
> successfully. Data rate at the EventBuilder were measured about 50MB/s without the
> logger and ~30MB/s with the logger.
It turns out that TWIST uses a private mevb.c. We will consider upgrading to the
standard one.
K.O. |
167 | 14 Oct 2004 | Stefan Ritt | Bug Report | TWIST upgrade bombed... |
Agree.
Once you have made the modification, please check the following situation: create a fresh
ODB with an increased size ("odbedit -s 2000000" for example). Then check that the
other clients "adopt" this increased size. Note that some experiments need a
bigger ODB, and I don't want them to have to recompile all clients; that's why the
code in ss_shm_open() can attach to a *larger* shared memory. However, it should
not matter to the process, since the ODB (or SYSTEM) shared memory size is
stored in pheader->key_size and pheader->data_size of each participating
process. So they should never write beyond the limits defined in that header.
The size passed to ss_shm_open() is only a "hint" if the shared memory does not exist,
and is not used anywhere later in the code.
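A small self-contained sketch of that principle (hypothetical struct and function names, not the real midas code): an attached process takes the ODB limits from the header stored inside the shared memory, so a client compiled with a smaller default still honours the larger size chosen at creation time.

#include <stdio.h>

#define DEFAULT_ODB_SIZE 1000000              /* what this client was compiled with */

typedef struct {
   int key_size;                              /* sizes chosen when the ODB was created, */
   int data_size;                             /* e.g. with "odbedit -s 2000000"         */
} odb_header_t;

/* a write is allowed only inside the region advertised by the header */
static int write_allowed(const odb_header_t *h, long offset, long nbytes)
{
   return offset >= 0 && offset + nbytes <= (long) h->key_size + (long) h->data_size;
}

int main(void)
{
   odb_header_t hdr = {1000000, 1000000};     /* ODB created larger than the client default */
   printf("write at 1500000 allowed: %d\n", write_allowed(&hdr, 1500000, 4096)); /* 1: inside header limits */
   printf("write at 2100000 allowed: %d\n", write_allowed(&hdr, 2100000, 4096)); /* 0: beyond header limits */
   /* a client that wrongly trusted its compile-time constant would refuse the first write: */
   printf("inside compile-time default: %d\n", 1500000 + 4096 <= DEFAULT_ODB_SIZE); /* 0 */
   return 0;
}

(Sketch only; in midas itself these limits live in the header at the start of the ODB shared memory, as described above.) |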
169 | 14 Oct 2004 | Konstantin Olchanski | Bug Report | TWIST upgrade bombed... |
> The upgrade of TWIST to the latest midas has bombed- we see mevb and mlogger
> crashes during shared memory data buffer accesses. I am looking into it and I
> will add information as I figure things out. K.O.
On the second try, it looks like we are in business. The first try did not work
because of two mistakes:
1) I did not delete *all* old .SHM files (.ODB.SHM, .SYSTEM.SHM, .YBUF1.SHM,
.YBUF2.SHM). I deleted ODB.SHM, so odb worked, but forgot about the data buffers
SYSTEM.SHM & co and ended up with segmentation faults and core dumps in the buffer
management code caused by a mismatch of the old-midas buffers and new-midas code.
2) while debugging these core dumps, I made an error in my test code, so even
after I deleted the old data buffers, things still did not work. Talk about
over-debugging a problem...
K.O. |
597 | 24 Jun 2009 | Konstantin Olchanski | Bug Report | TR_STARTABORT transition, mlogger duplicate event problem |
> > > We have seen on several daq systems this problem: we start a run and observe that the number of
> > > events written by mlogger to the output file is double the number of events actually collected. Upon
> > > inspection of the output file, we see that every event is written twice. Restarting the run usually fixes
> > > this problem.
> >
> > mlogger.c fixed svn rev 4497. (from tr_start(), call tr_stop() if somehow it was not called already by end-run transition).
>
> There is a new problem: after an unsuccessful run start, the next run start bombs with the error "output file runNNN.mid already exists". One way around this is to
> manually remove the useless data file, another is to bump up the run number. Better solution is to automatically erase the output file created by unsuccessful run
> starts.
Stefan suggested implementing a new transition, TR_STARTABORT, issued if TR_START fails. mlogger can use it to clean up open files, etc., similar to TR_STOP.
This is now implemented. In mlogger, TR_STARTABORT is similar to TR_STOP, but it deletes open output files and does not save end-of-run information into databases, etc. mfe.c does not handle this transition yet, but I
plan to add it, to fix the observed situations where the run failed to start but some equipment does not know about it and continues to generate events and send data.
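For reference, a minimal sketch of how a client subscribes to the new transition (the handler body and the sequence number 500 are illustrative; the registration call and handler signature follow the usual midas pattern):

/* sketch: a client that wants TR_STARTABORT must register for it explicitly */
#include "midas.h"

static INT tr_startabort(INT run_number, char *error)
{
   (void) error; /* unused in this sketch */
   /* clean up whatever was opened for the aborted run start,
      e.g. close and delete the partially written output file */
   cm_msg(MINFO, "tr_startabort", "run %d aborted during start, cleaning up", run_number);
   return CM_SUCCESS;
}

static void register_handlers(void)
{
   /* clients that never make this call are not sent TR_STARTABORT at all,
      which is why old binaries are unaffected (see further down the thread) */
   cm_register_transition(TR_STARTABORT, tr_startabort, 500);
}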
svn rev 4514
K.O. |
599 | 25 Jun 2009 | Stefan Ritt | Bug Report | TR_STARTABORT transition, mlogger duplicate event problem |
> Stefan suggested implementing a new transition, TR_STARTABORT, issued if TR_START fails. mlogger can use it to cleanup open files, etc, similar to TR_STOP.
>
> This is now implemented. In mlogger, TR_STARTABORT is similar to TR_STOP, but deletes open output files and does not save end-of-run information into databases, etc. mfe.c does not handle this transition yet, but I
> plan to add it - to fix the observed situations where the run failed to start, but some equipment does not know about it and continues to generate events and send data.
>
> svn rev 4514
> K.O.
There is one problem with the TR_STARTABORT: If you combine old and new clients they will crash, since the old clients don't know anything about TR_STARTABORT. The way to prevent this is to increase the Midas version from
2.0.0 to 2.1.0. Then you will get a warning if you mix clients. Please test this and commit the change if it works. |
601 | 25 Jun 2009 | Konstantin Olchanski | Bug Report | TR_STARTABORT transition, mlogger duplicate event problem |
> > Stefan suggested implementing a new transition, TR_STARTABORT, issued if TR_START fails. mlogger can use it to cleanup open files, etc, similar to TR_STOP.
> >
> > This is now implemented. In mlogger, TR_STARTABORT is similar to TR_STOP, but deletes open output files and does not save end-of-run information into databases, etc. mfe.c does not handle this transition yet, but I
> > plan to add it - to fix the observed situations where the run failed to start, but some equipment does not know about it and continues to generate events and send data.
> >
> > svn rev 4514
> > K.O.
>
> There is one problem with the TR_STARTABORT: If you combine old and new clients they will crash, since the old clients don't know anything about TR_STARTABORT. The way to prevent this is to increase the Midas version from
> 2.0.0 to 2.1.0. Then you will get a warning if you mix clients. Please test this and commit the change if it works.
Are you sure? Only clients that register themselves to receive the TR_STARTABORT transition (via cm_register_transition()) will receive this transition.
As of now, the only client that registers and receives this transition is mlogger.
I also confirm that old clients that know nothing about TR_STARTABORT are *not* sent this transition. (this is tested).
K.O. |
602 | 25 Jun 2009 | Stefan Ritt | Bug Report | TR_STARTABORT transition, mlogger duplicate event problem |
> Are you sure? Only clients that register themselves to receive the TR_STARTABORT transition (via cm_register_transition()) will receive this transition.
>
> As of now, the only client that registers and receives this transition is mlogger.
>
> I also confirm that old clients that know nothing about TR_STARTABORT are *not* sent this transition. (this is tested).
Ok, then we are fine. |
883 | 06 May 2013 | Konstantin Olchanski | Info | TRIUMF MIDAS page moved to DAQWiki |
The MIDAS web page at TRIUMF (http://midas.triumf.ca) moved from the daq-plone site to the DAQWiki
(MediaWiki) site. Links were updated, checked and corrected:
https://www.triumf.info/wiki/DAQwiki/index.php/MIDAS
Included is the link to our MIDAS installation instructions. These are more complete than the
instructions in the MIDAS documentation:
https://www.triumf.info/wiki/DAQwiki/index.php/Setup_MIDAS_experiment
K.O. |
2942 | 12 Feb 2025 | Mark Grimes | Forum | TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file |
Hi,
I have a manalyzer that uses a derived class of TMFeRpcHandlerInterface to communicate information to
Midas during online running. At the end of each run it saves out custom data in the
TMFeRpcHandlerInterface::HandleEndRun override. This works really well.
However, when I run offline on a Midas output file, the HandleEndRun method is never called and my data is
never saved. Is this intentional? I understand that there is no point in the HandleBinaryRpc method offline,
but the other methods (HandleEndRun, HandleBeginRun, etc.) could serve a purpose. Or is it a conscious
choice to ignore all of TMFeRpcHandlerInterface when offline?
Thanks,
Mark. |
2945 | 26 Feb 2025 | Thomas Lindner | Forum | TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file |
Hi,
Sorry, we have been slammed with a couple of projects on the TRIUMF side over the past weeks and haven't found time for a
response. I am hopeful that we will be able to answer this question (and the cache size question) within the next 10
days.
Again apologies,
Thomas
> Hi,
> I have a manalyzer that uses a derived class of TMFeRpcHandlerInterface to communicate information to
> Midas during online running. At the end of each run it saves out custom data in the
> TMFeRpcHandlerInterface::HandleEndRun override. This works really well.
> However, when I run offline on a Midas output file the HandleEndRun method is never called and my data is
> never saved. Is this intentional? I understand that there is no point for the HandleBinaryRpc method offline,
> but the other methods (HandleEndRun, HandleBeginRun etc) could serve a purpose. Or is it a conscious
> choice to ignore all of TMFeRpcHandlerInterface when offline?
>
> Thanks,
>
> Mark. |