13 Oct 2004, Konstantin Olchanski, Bug Report, TWIST upgrade bombed...
|
> > The upgrade of TWIST to the latest midas has bombed- we see mevb and mlogger
> > crashes during shared memory data buffer accesses. I am looking into it and I
> > will add information as I figure things out. K.O.
>
> Since 1.9.5 the EventBuilder has been modified. Please consult the documentation
> where the new mevb scheme is explained.
> The mevb has been tested successfully with up to 16 frontends (15 different CPUs).
> The data rate at the EventBuilder was measured at about 50MB/s without the
> logger and ~30MB/s with the logger.
It turns out that TWIST uses a private mevb.c. We will consider upgrading to the
standard one.
K.O. |
14 Oct 2004, Stefan Ritt, Bug Report, TWIST upgrade bombed...
|
Agree.
Once you have made the modification, please check the following situation: Create a fresh
ODB with an increased size ("odbedit -s 2000000" for example). Then check that the
other clients "adopt" this increased size. Note that some experiments need a
bigger ODB, and I don't want to have them recompile all clients; that's why the
code in ss_shm_open() can attach to a *larger* shared memory. However, it should
not matter to the process, since the ODB (or SYSTEM) shared memory size is
stored in the pheader->key_size and pheader->data_size of each participating
process. So they should never write beyond the limits defined in that header.
The size passed to ss_shm_open() is only a "hint" used when the shared memory does not
yet exist, and is not used anywhere later in the code. |
14 Oct 2004, Konstantin Olchanski, Bug Report, TWIST upgrade bombed...
|
> The upgrade of TWIST to the latest midas has bombed- we see mevb and mlogger
> crashes during shared memory data buffer accesses. I am looking into it and I
> will add information as I figure things out. K.O.
On the second try, it looks like we are in business. The first try did not work
because of two mistakes:
1) I did not delete *all* old .SHM files (.ODB.SHM, .SYSTEM.SHM, .YBUF1.SHM,
.YBUF2.SHM). I deleted ODB.SHM, so odb worked, but forgot about the data buffers
SYSTEM.SHM & co and ended up with segmentation faults and core dumps in the buffer
management code caused by a mismatch of the old-midas buffers and new-midas code.
2) while debugging these core dumps, I made an error in my test code, so even
after I deleted the old data buffers, things still did not work. Talk about
over-debugging a problem...
K.O. |
24 Jun 2009, Konstantin Olchanski, Bug Report, TR_STARTABORT transition, mlogger duplicate event problem
|
> > > We have seen on several daq systems this problem: we start a run and observe that the number of
> > > events written by mlogger to the output file is double the number of events actually collected. Upon
> > > inspection of the output file, we see that every event is written twice. Restarting the run usually fixes
> > > this problem.
> >
> > mlogger.c fixed svn rev 4497. (from tr_start(), call tr_stop() if somehow it was not called already by end-run transition).
>
> There is a new problem: after an unsuccessful run start, the next run start bombs with the error "output file runNNN.mid already exists". One way around this is to
> manually remove the useless data file, another is to bump up the run number. Better solution is to automatically erase the output file created by unsuccessful run
> starts.
Stefan suggested implementing a new transition, TR_STARTABORT, issued if TR_START fails. mlogger can use it to clean up open files, etc., similar to TR_STOP.
This is now implemented. In mlogger, TR_STARTABORT is similar to TR_STOP, but deletes open output files and does not save end-of-run information into databases, etc. mfe.c does not handle this transition yet, but I
plan to add it, to fix the observed situations where the run fails to start, but some equipment does not know about it and continues to generate events and send data.
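For reference, a minimal sketch of how a client could register for the new transition, assuming the standard cm_register_transition() handler signature (the handler name and the cleanup comment are illustrative, not the actual mlogger code):

#include "midas.h"

/* sketch only: a handler that cleans up after a failed run start */
static INT tr_startabort(INT run_number, char *error)
{
   /* close and delete any partially written output for this run here */
   return CM_SUCCESS;
}

/* call this once after cm_connect_experiment() */
static void register_startabort_handler()
{
   cm_register_transition(TR_STARTABORT, tr_startabort, 500);
}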
svn rev 4514
K.O. |
25 Jun 2009, Stefan Ritt, Bug Report, TR_STARTABORT transition, mlogger duplicate event problem
|
> Stefan suggested implementing a new transition, TR_STARTABORT, issued if TR_START fails. mlogger can use it to cleanup open files, etc, similar to TR_STOP.
>
> This is now implemented. In mlogger, TR_STARTABORT is similar to TR_STOP, but deletes open output files and does not save end-of-run information into databases, etc. mfe.c does not handle this transition yet, but I
> plan to add it - to fix the observed situations where the run failed to start, but some equipment does not know about it and continues to generate events and send data.
>
> svn rev 4514
> K.O.
There is one problem with the TR_STARTABORT: If you combine old and new clients they will crash, since the old clients don't know anything about TR_STARTABORT. The way to prevent this is to increase the Midas version from
2.0.0 to 2.1.0. Then you will get a warning if you mix clients. Please test this and commit the change if it works. |
25 Jun 2009, Konstantin Olchanski, Bug Report, TR_STARTABORT transition, mlogger duplicate event problem
|
> > Stefan suggested implementing a new transition, TR_STARTABORT, issued if TR_START fails. mlogger can use it to cleanup open files, etc, similar to TR_STOP.
> >
> > This is now implemented. In mlogger, TR_STARTABORT is similar to TR_STOP, but deletes open output files and does not save end-of-run information into databases, etc. mfe.c does not handle this transition yet, but I
> > plan to add it - to fix the observed situations where the run failed to start, but some equipment does not know about it and continues to generate events and send data.
> >
> > svn rev 4514
> > K.O.
>
> There is one problem with the TR_STARTABORT: If you combine old and new clients they will crash, since the old clients don't know anything about TR_STARTABORT. The way to prevent this is to increase the Midas version from
> 2.0.0 to 2.1.0. Then you will get a warning if you mix clients. Please test this and commit the change if it works.
Are you sure? Only clients that register themselves to receive the TR_STARTABORT transition (via cm_register_transition()) will receive this transition.
As of now, the only client that registers and receives this transition is mlogger.
I also confirm that old clients that know nothing about TR_STARTABORT are *not* sent this transition. (this is tested).
K.O. |
25 Jun 2009, Stefan Ritt, Bug Report, TR_STARTABORT transition, mlogger duplicate event problem
|
> Are you sure? Only clients that register themselves to receive the TR_STARTABORT transition (via cm_register_transition()) will receive this transition.
>
> As of now, the only client that registers and receives this transition is mlogger.
>
> I also confirm that old clients that know nothing about TR_STARTABORT are *not* sent this transition. (this is tested).
Ok, then we are fine. |
06 May 2013, Konstantin Olchanski, Info, TRIUMF MIDAS page moved to DAQWiki
|
The MIDAS web page at TRIUMF (http://midas.triumf.ca) moved from the daq-plone site to the DAQWiki
(MediaWiki) site. Links were updated, checked and corrected:
https://www.triumf.info/wiki/DAQwiki/index.php/MIDAS
Included is the link to our MIDAS installation instructions. These are more complete than the
instructions in the MIDAS documentation:
https://www.triumf.info/wiki/DAQwiki/index.php/Setup_MIDAS_experiment
K.O. |
12 Feb 2025, Mark Grimes, Forum, TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file
|
Hi,
I have a manalyzer that uses a derived class of TMFeRpcHandlerInterface to communicate information to
Midas during online running. At the end of each run it saves out custom data in the
TMFeRpcHandlerInterface::HandleEndRun override. This works really well.
However, when I run offline on a Midas output file the HandleEndRun method is never called and my data is
never saved. Is this intentional? I understand that there is no point for the HandleBinaryRpc method offline,
but the other methods (HandleEndRun, HandleBeginRun etc) could serve a purpose. Or is it a conscious
choice to ignore all of TMFeRpcHandlerInterface when offline?
Thanks,
Mark. |
26 Feb 2025, Thomas Lindner, Forum, TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file
|
Hi,
Sorry, we have been slammed with a couple projects on the TRIUMF side in the past weeks and haven't found time for a
response. I am hopeful that we will be able to answer this question (and the cache size question) within the next 10
days.
Again apologies,
Thomas
> Hi,
> I have a manalyzer that uses a derived class of TMFeRpcHandlerInterface to communicate information to
> Midas during online running. At the end of each run it saves out custom data in the
> TMFeRpcHandlerInterface::HandleEndRun override. This works really well.
> However, when I run offline on a Midas output file the HandleEndRun method is never called and my data is
> never saved. Is this intentional? I understand that there is no point for the HandleBinaryRpc method offline,
> but the other methods (HandleEndRun, HandleBeginRun etc) could serve a purpose. Or is it a conscious
> choice to ignore all of TMFeRpcHandlerInterface when offline?
>
> Thanks,
>
> Mark. |
20 Mar 2025, Konstantin Olchanski, Forum, TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file
|
> I have a manalyzer that uses a derived class of TMFeRpcHandlerInterface to communicate information to
> Midas during online running. At the end of each run it saves out custom data in the
> TMFeRpcHandlerInterface::HandleEndRun override. This works really well.
> However, when I run offline on a Midas output file the HandleEndRun method is never called and my data is
> never saved. Is this intentional? I understand that there is no point for the HandleBinaryRpc method offline,
> but the other methods (HandleEndRun, HandleBeginRun etc) could serve a purpose. Or is it a conscious
> choice to ignore all of TMFeRpcHandlerInterface when offline?
Apologies for the delayed response.
I saw the question, completely did not understand it, and only now got around to figuring out what is going on.
According to manalyzer/README.md, section "manalyzer module and object life time", BeginRun() and EndRun() are always
called, offline and online. What you see would be a bug that we do not see in our environment. I confirm this by
running manalyzer in demo mode: ./bin/manalyzer_test.exe --demo -e10 -t
No, wait: you say you use HandleBeginRun() and HandleEndRun(). That is not right; they are not part of the manalyzer
API, and indeed they are only used when running online.
The correct solution is to use BeginRun() and EndRun() instead of HandleBeginRun() and HandleEndRun().
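For illustration, a minimal sketch of a manalyzer module that saves its data in EndRun(), which runs both online and offline (the module name and the printf placeholders are made up; the sketch assumes the standard TARunObject/TAFactory API declared in manalyzer.h):

#include <cstdio>
#include "manalyzer.h"
#include "midasio.h"

class SaveModule: public TARunObject
{
public:
   SaveModule(TARunInfo* runinfo) : TARunObject(runinfo) { }

   void BeginRun(TARunInfo* runinfo) override
   {
      printf("SaveModule: begin of run %d\n", runinfo->fRunNo);
   }

   void EndRun(TARunInfo* runinfo) override
   {
      // called both online and offline: save your custom data here
      printf("SaveModule: end of run %d, saving data\n", runinfo->fRunNo);
   }

   TAFlowEvent* Analyze(TARunInfo* runinfo, TMEvent* event, TAFlags* flags, TAFlowEvent* flow) override
   {
      return flow; // pass events through unchanged
   }
};

class SaveModuleFactory: public TAFactory
{
public:
   TARunObject* NewRunObject(TARunInfo* runinfo) override { return new SaveModule(runinfo); }
};

static TARegister tar(new SaveModuleFactory);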
You could also save your data in the module destructor (although the usual recommendation is to use the
destructor only for unavoidable things, like freeing memory, etc.).
K.O. |
25 Mar 2025, Mark Grimes, Forum, TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file
|
Hi,
The question was about the TMFeRpcHandlerInterface, not the TARunObject interface. Derived classes of TARunObject do indeed work as expected in our
environment. We have worked around the issue by using an implementation of TARunObject as well as the (separate) implementation of
TMFeRpcHandlerInterface.
Thanks,
Mark.
> > I have a manalyzer that uses a derived class of TMFeRpcHandlerInterface to communicate information to
> > Midas during online running. At the end of each run it saves out custom data in the
> > TMFeRpcHandlerInterface::HandleEndRun override. This works really well.
> > However, when I run offline on a Midas output file the HandleEndRun method is never called and my data is
> > never saved. Is this intentional? I understand that there is no point for the HandleBinaryRpc method offline,
> > but the other methods (HandleEndRun, HandleBeginRun etc) could serve a purpose. Or is it a conscious
> > choice to ignore all of TMFeRpcHandlerInterface when offline?
>
> apologies for delayed response.
>
> I saw the question, completely did not understand it, only now got around to figure out what is going on.
>
> according to manalyzer/README.md, section "manalyzer module and object life time", BeginRun() and EndRun() is called
> always. Offline and online. What you see would be a bug that we do not see in our environment. I confirm this by
> running manalyzer in a demo mode: ./bin/manalyzer_test.exe --demo -e10 -t
>
> no, wait, you say you use HandleBeginRun() and HandleEndRun(). this is not right, they are not part of the manalyzer
> API and they indeed are only used when running online.
>
> correct solution would be to use BeginRun() and EndRun() instead of HandleBeginRun() and HandleEndRun().
>
> you could also save your data in the module destructor (although good programming recommendation is to use
> destructor only for unavoidable things, like freeing memory, etc).
>
> K.O. |
25 Mar 2025, Konstantin Olchanski, Forum, TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file
|
> The question was about the TMFeRpcHandlerInterface, not the TARunObject interface. Derived classes of TARunObject do indeed work as expected in our
> environment. We have worked around the issue by using an implementation of TARunObject as well as the (separate) implementation of
> TMFeRpcHandlerInterface.
Then I do not understand the question. TMFeRpcHandlerInterface stuff is only used when running online and connected to MIDAS. How does it come into the
picture when you analyze a data file offline? ProcessMidasOnlineTmfe() does not run, and the RpcHandler object is not constructed.
Maybe if you point me to your source code, I can see what you are doing?
K.O. |
26 Mar 2025, Mark Grimes, Forum, TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file
|
This was exactly the question: should I expect it to run? There's no point in the HandleBinaryRpc method offline, but there's an argument that the HandleBeginRun/HandleEndRun methods have a use.
I have the answer and we have a workaround, thanks.
> then I do not understand the question. TMFeRpcHandlerInterface stuff is only used when running online and connected to MIDAS. How does it come into the
> picture when you analyze a data file offline? ProcessMidasOnlineTmfe() does not run, the RpcHandler object is not constructed.
>
> maybe if you point me to your source code, I can see what you are doing?
>
> K.O. |
28 Mar 2025, Konstantin Olchanski, Forum, TMFeRpcHandlerInterface::HandleEndRun when running offline on a Midas file
|
I do not understand what you are doing. If you are offline, there is no TMFE singleton instance,
there is nothing for TMFeRpcHandlerInterface to attach to, and there is nobody to call the TMFeRpcHandlerInterface methods.
Maybe what you are asking for is a mode where you analyze data from a file, but you want your analysis code
to think that it is online and that it is analyzing live data. This requires creating a fake TMFE singleton, attaching
TMFeRpcHandlerInterface to this fake TMFE singleton and using ProcessMidasOnlineTmfe(), driven by all this fake stuff:
a fake OpenBuffer() that actually opens a file, a fake ReceiveEvent() that actually reads from a file, fake callbacks
for begin and end run, etc.
That's a lot of work, but for what purpose? What is it about the existing offline and online modes that you do not like,
and how would all this fake stuff make it better for you?
P.S. This is a 3rd version of my reply. I wrote and deleted 2 versions. I think I completely do not understand
what you are doing and you completely do not understand what I am saying. Communication is not happening.
P.P.S. It is simplest if you show me your code (email, elog); I am quite good at reading code and divining what
people are trying to do. You do not have to show me any of your secret stuff.
K.O.
> This was exactly the question, should I expect it to run? There's no point in the HandleBinaryRpc method offline, but there's an argument that the HandleBeginRun/HandleEndRun methods have a use.
> I have the answer and we have a workaround, thanks.
>
> > then I do not understand the question. TMFeRpcHandlerInterface stuff is only used when running online and connected to MIDAS. How does it come into the
> > picture when you analyze a data file offline? ProcessMidasOnlineTmfe() does not run, the RpcHandler object is not constructed.
> >
> > maybe if you point me to your source code, I can see what you are doing?
> >
> > K.O. |
25 Feb 2021, Lars Martin, Forum, TMFePollHandlerInterface timing
|
Am I right in thinking that the TMFE HandlePoll function is called once per
PollMidas()? And what is the difference to HandleRead()? |
25 Feb 2021, Konstantin Olchanski, Forum, TMFePollHandlerInterface timing
|
> Am I right in thinking that the TMFE HandlePoll function is called once per
> PollMidas()? And what is the difference to HandleRead()?
Actually, polled equipment is not implemented yet in TMFE; as you noted, the
internal scheduler needs to be reworked.
Anyhow, I think that with modern C++ and with threads, both "periodic" and "polled"
equipments are not strictly necessary.
Periodic equipment is effectively this:

in a thread:
   while (1) {
      do stuff, read data, send events
      sleep
   }

Polled equipment is effectively this:

in a thread:
   while (1) {
      if (poll()) { read data, send events }
      else { sleep for a little bit }
   }
An example of such code is the "bulk" equipment in progs/fetest.cxx.
But to implement the same in a single-threaded environment (which eliminates
problems with data locking, race conditions, etc.) and to provide additional
structure to the user code, the plan is to implement polled equipment in TMFE
frontends (periodic equipment is already implemented).
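For illustration, the threaded "polled" pattern above might look roughly like this (a sketch only; poll_hardware() and read_and_send_event() are made-up placeholders for the experiment-specific parts, not TMFE API):

#include <atomic>
#include <chrono>
#include <thread>

// made-up placeholders for the experiment-specific parts:
bool poll_hardware();        // returns true when data is ready
void read_and_send_event();  // read the data, compose and send a MIDAS event

std::atomic<bool> gRunning{true};  // cleared by the frontend at shutdown

void poll_thread()
{
   while (gRunning) {
      if (poll_hardware()) {
         read_and_send_event();
      } else {
         std::this_thread::sleep_for(std::chrono::microseconds(100)); // "sleep for a little bit"
      }
   }
}

// started from frontend init and joined at exit, e.g.:
// std::thread t(poll_thread); ... gRunning = false; t.join();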
K.O. |
26 Nov 2024, Nick Hastings, Bug Report, TMFE::Sleep() errors
|
Hello,
I've noticed that SC FEs that use the TMFE class with midas-2022-05-c often report errors when calling TMFE::Sleep().
The error is :
[tmfe.cxx:1033:TMFE::Sleep,ERROR] select() returned -1, errno 22 (Invalid argument).
This seems to happen in two different ways:
1. Error being reported repeatedly
2. Occasional single errors being reported
When the first of these presents, we typically restart the FE to "solve" the problem.
Case 2. is typically ignored.
The code in question is:
void TMFE::Sleep(double time)
{
   int status;
   fd_set fdset;
   struct timeval timeout;

   FD_ZERO(&fdset);

   timeout.tv_sec = time;
   timeout.tv_usec = (time-timeout.tv_sec)*1000000.0;

   while (1) {
      status = select(1, &fdset, NULL, NULL, &timeout);
#ifdef EINTR
      if (status < 0 && errno == EINTR) {
         continue;
      }
#endif
      break;
   }

   if (status < 0) {
      TMFE::Instance()->Msg(MERROR, "TMFE::Sleep", "select() returned %d, errno %d (%s)", status, errno, strerror(errno));
   }
}
So it looks like either the file descriptor set or the timeval struct must have a problem.
From some reading it seems that invalid timeval structs are often caused by one or both
of tv_sec or tv_usec not being set. In the code above we can see that both appear to be
correctly set initially.
From the select() man page I see:
RETURN VALUE
On success, select() and pselect() return the number of file descriptors contained in
the three returned descriptor sets (that is, the total number of bits that are set in
readfds, writefds, exceptfds). The return value may be zero if the timeout expired
before any file descriptors became ready.
On error, -1 is returned, and errno is set to indicate the error; the file descriptor
sets are unmodified, and timeout becomes undefined.
The second paragraph quoted from the man page above would indicate to me that perhaps the
timeout needs to be reset inside the if block, e.g.:

   if (status < 0 && errno == EINTR) {
      timeout.tv_sec = time;
      timeout.tv_usec = (time-timeout.tv_sec)*1000000.0;
      continue;
   }
Please note that I've only just briefly looked at this and was hoping someone more
familiar with using select() as a way to sleep() might be better able to understand
what is happening.
I wonder also, now that midas requires stricter/newer C++ standards, whether there may be
some more straightforward way to sleep that is sufficiently robust and portable.
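(One candidate might be std::this_thread::sleep_for from standard C++11, which avoids building a timeval by hand; a sketch only, not what midas currently does:)

#include <chrono>
#include <thread>

void SleepSeconds(double time)
{
   if (time > 0)
      std::this_thread::sleep_for(std::chrono::duration<double>(time));
}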
Thanks,
Nick. |
26 Nov 2024, Maia Henriksson-Ward, Bug Report, TMFE::Sleep() errors
|
> Hello,
>
> I've noticed that SC FEs that use the TMFE class with midas-2022-05-c often report errors when calling TMFE:Sleep().
> The error is :
>
> [tmfe.cxx:1033:TMFE::Sleep,ERROR] select() returned -1, errno 22 (Invalid argument).
>
> This seems to happen in two different ways:
>
> 1. Error being reported repeatedly
> 2. Occasional single errors being reported
>
> When the first of these presents, we typically restart the FE to "solve" the problem.
> Case 2. is typically ignored.
>
> The code in question is:
>
> void TMFE::Sleep(double time)
> {
> int status;
> fd_set fdset;
> struct timeval timeout;
>
> FD_ZERO(&fdset);
>
> timeout.tv_sec = time;
> timeout.tv_usec = (time-timeout.tv_sec)*1000000.0;
>
> while (1) {
> status = select(1, &fdset, NULL, NULL, &timeout);
> #ifdef EINTR
> if (status < 0 && errno == EINTR) {
> continue;
> }
> #endif
> break;
> }
>
> if (status < 0) {
> TMFE::Instance()->Msg(MERROR, "TMFE::Sleep", "select() returned %d, errno %d (%s)", status, errno, strerror(errno));
> }
> }
>
> So it looks like either file descriptor of the timeval struct must have a problem.
> From some reading it seems that invalid timeval structs are often caused by one or both
> of tv_sec or tv_usec not being set. In the code above we can see that both appear to be
> correctly set initially.
>
> From the select() man page I see:
>
> RETURN VALUE
> On success, select() and pselect() return the number of file descriptors contained in
> the three returned descriptor sets (that is, the total number of bits that are set in
> readfds, writefds, exceptfds). The return value may be zero if the timeout expired
> before any file descriptors became ready.
>
> On error, -1 is returned, and errno is set to indicate the error; the file descriptor
> sets are unmodified, and timeout becomes undefined.
>
> The second paragraph quoted from the man page above would indicate to me that perhaps the
> timeout needs to be reset inside the if block. eg:
>
> if (status < 0 && errno == EINTR) {
> timeout.tv_sec = time;
> timeout.tv_usec = (time-timeout.tv_sec)*1000000.0;
> continue;
> }
>
> Please note that I've only just briefly looked at this and was hoping someone more
> familiar with using select() as a way to sleep() might be better able to understand
> what is happening.
>
> I wonder also if now that midas requires stricter/newer c++ standards if there maybe
> some more straightforward method to sleep that is sufficiently robust and portable.
>
> Thanks,
>
> Nick.
I had the same error a few months ago, though I wasn't using a tagged release. It happened because I was calling TMFE::Sleep()
with a negative time. If your issues were caused by the same thing, note that TMFE::Sleep() can handle negative times since commit
591f78f (https://bitbucket.org/tmidas/midas/commits/591f78f52893d5ffd64bf4e52a1daac537ebd672).
Early in my debugging, I did come to the same conclusions you did, and actually tried a solution similar to the one you suggested.
This was a few months ago and I didn't write down what happened, but I believe it didn't work because in my case the errno was
something other than EINTR, and/or the timeval was still an invalid argument because the timeout was still negative. I
never followed it up because I was able to fix my problem by fixing my frontend. |
27 Nov 2024, Konstantin Olchanski, Bug Report, TMFE::Sleep() errors
|
> [tmfe.cxx:1033:TMFE::Sleep,ERROR] select() returned -1, errno 22 (Invalid argument).
The very original copy of this function had an error and was spewing out this error quite often;
that was a missing handler for EINTR.
Now it looks like we are missing a handler for EINVAL.
Most likely Sleep() is called with a funny sleep time value that fills struct timeval with
values select() does not like.
I see Ben added a check for negative sleep times, and this is good.
I think I will do these changes:
a) add an error message for negative sleep times; I think the user should never call ::Sleep() with negative or zero sleep times, and
if they do it is a bug and they should fix it. The error message will tell them so.
b) add a handler for EINVAL, which will report the requested sleep time and the values in struct timeval.
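A rough sketch of what (a) and (b) might look like on top of the current code (the message wording and exact checks are illustrative, not the committed fix):

void TMFE::Sleep(double time)
{
   if (time <= 0) {
      // (a) negative or zero sleep time is a bug in the caller
      TMFE::Instance()->Msg(MERROR, "TMFE::Sleep", "invalid sleep time %f, should be positive", time);
      return;
   }

   fd_set fdset;
   FD_ZERO(&fdset);

   struct timeval timeout;
   timeout.tv_sec = time;
   timeout.tv_usec = (time - timeout.tv_sec)*1000000.0;

   int status;
   while (1) {
      status = select(1, &fdset, NULL, NULL, &timeout);
#ifdef EINTR
      if (status < 0 && errno == EINTR)
         continue;
#endif
      break;
   }

   if (status < 0) {
      // (b) on EINVAL (or any other error) report the requested sleep time and the timeval contents
      TMFE::Instance()->Msg(MERROR, "TMFE::Sleep",
                            "select() returned %d, errno %d (%s), sleep time %f, tv_sec %d, tv_usec %d",
                            status, errno, strerror(errno), time, (int)timeout.tv_sec, (int)timeout.tv_usec);
   }
}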
K.O. |
|