  Midas DAQ System
ID   Date   Author   Topic   Subject
  192   20 Jan 2005   Konstantin Olchanski   Bug Report   Persistency problem with h1_book() & co
The current h1_book() macros (and the previous example analyzer code) have an
odd persistency problem: for example, the user wants to change some histogram
limits, edits the h1_book() calls, rebuilds and restarts the analyzer, starts a
new run, and observes that all histograms are filled using the old limits, his
changes "did not take". The user panics, I get paged during the Holy Lunch Hour,
everybody is unhappy.

This is what I think happens:

1) analyzer starts
2) LoadRootHistograms() loads old histograms from file
3) user code calls h1_book()
4) h1_book template in midas.h does this (roughly):
      hist = (TH1X *) gManaHistosFolder->FindObjectAny(name);
      if (hist == NULL) {
         hist = new TH1X(name, title, bins, min, max);
5) since the histogram already exists (loaded from the file, with the old
limits), the TH1X constructor is not called at all, new histogram limits are
utterly ignored.

A possible solution is to unconditionally create the ROOT objects, like I do in
the example code posted at http://dasdevpc.triumf.ca:9080/Midas/191. That code
produces an annoying warning from ROOT about possible memory leaks. This could
be fixed by adding a two-liner to "find and delete" the object before it is
created, tripling the number of user code lines per histogram (find & delete,
then create). Highly ugly.

midas.h macros (h1_book & co) can be fixed by adding checks for histogram limits
and such, but I would much prefer a generic solution/convention that would work
for arbitrary ROOT objects without MIDAS-specific wrappers (think TProfile,
TGraph, etc...).
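
The "check the histogram limits" idea can be sketched without ROOT. The block below is only an illustration in plain C: the Hist struct, the registry array and book_hist() are hypothetical stand-ins for TH1X, gManaHistosFolder and h1_book(), not the actual midas.h code. An existing histogram is reused only if its binning matches the request; otherwise it is re-initialised in place, so no second object with the same name is ever created.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a ROOT histogram; not the real TH1 class. */
typedef struct {
   char name[32];
   int bins;
   double min, max;
   int in_use;
} Hist;

#define MAX_HISTS 16
static Hist registry[MAX_HISTS];   /* plays the role of gManaHistosFolder */

static Hist *find_hist(const char *name)
{
   for (int i = 0; i < MAX_HISTS; i++)
      if (registry[i].in_use && strcmp(registry[i].name, name) == 0)
         return &registry[i];
   return NULL;
}

/* Book a histogram; reuse the existing one only if its binning matches. */
Hist *book_hist(const char *name, int bins, double min, double max)
{
   assert(strlen(name) < sizeof(registry[0].name));

   Hist *h = find_hist(name);
   if (h != NULL && h->bins == bins && h->min == min && h->max == max)
      return h;                  /* same limits: keep what was loaded */

   if (h == NULL) {              /* not found: take the first free slot */
      for (int i = 0; i < MAX_HISTS && h == NULL; i++)
         if (!registry[i].in_use)
            h = &registry[i];
      if (h == NULL)
         return NULL;            /* registry full */
   }

   /* new histogram, or stale limits: (re)initialise the slot in place */
   memset(h, 0, sizeof(*h));
   strcpy(h->name, name);
   h->bins = bins;
   h->min = min;
   h->max = max;
   h->in_use = 1;
   return h;
}
```

With this convention, booking "adc" twice with the same limits returns the same object, while booking it again with new limits replaces the old definition instead of silently keeping it. The same compare-then-replace pattern would have to be written per object type for TProfile, TGraph and so on, which is exactly the cost the post is worried about.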

Any suggestions?

K.O.
  193   21 Jan 2005   John M O'Donnell   Bug Report   Persistency problem with h1_book() & co
> The current h1_book() macros (and the previous example analyzer code) have an
> odd persistency problem: for example, the user wants to change some histogram
> limits, edits the h1_book() calls, rebuilds and restarts the analyzer, starts a
> new run, and observes that all histograms are filled using the old limits, his
> changes "did not take". The user panics, I get paged during the Holy Lunch Hour,
> everybody is unhappy.
> 
> This is what I think happens:
> 
> 1) analyzer starts
> 2) LoadRootHistograms() loads old histograms from file

I can't get onto cvs@midas.psi.ch right now
(cvs update
cvs@midas.psi.ch's password: 
Permission denied, please try again.)

but when I changed LoadRootHistograms a few days ago I left it as:

    } else if (obj->InheritsFrom( "TH1")) {

      // still don't know how to do TH1s

so h1_book() is creating the first and only copy of the histograms.
I am able to create new histogram limits.
I don't get the memory leak problems.

However I have seen the memory leak problems before, and they are real.
They must be dealt with either by (1) first deleting the old histogram
or (2) ensuring that histogram names are unique in the whole application
(different modules/folders cannot use the same histogram names).

I will return to this once I can do a cvs update for midas.

John.

> 3) user code calls h1_book()
> 4) h1_book template in midas.h does this (roughly):
>       hist = (TH1X *) gManaHistosFolder->FindObjectAny(name);
>       if (hist == NULL) {
>          hist = new TH1X(name, title, bins, min, max);
> 5) since the histogram already exists (loaded from the file, with the old
> limits), the TH1X constructor is not called at all, new histogram limits are
> utterly ignored.
> 
> A possible solution is to unconditionally create the ROOT objects, like I do in
> the example code posted at http://dasdevpc.triumf.ca:9080/Midas/191. That code
> produces an annoying warning from ROOT about possible memory leaks. This could
> be fixed by adding a two-liner to "find and delete" the object before it is
> created, tripling the number of user code lines per histogram (find & delete,
> then create). Highly ugly.
> 
> midas.h macros (h1_book & co) can be fixed by adding checks for histogram limits
> and such, but I would much prefer a generic solution/convention that would work
> for arbitrary ROOT objects without MIDAS-specific wrappers (think TProfile,
> TGraph, etc...).
> 
> Any suggestions?
> 
> K.O.
  194   21 Jan 2005   Stefan Ritt   Bug Report   Persistency problem with h1_book() & co
> I can't get onto cvs@midas.psi.ch right now
> (cvs update
> cvs@midas.psi.ch's password: 
> Permission denied, please try again.)

I had to upgrade midas.psi.ch today to Scientific Linux 3.03. Most things are working again, but
I could not restore the anonymous CVS account. I have to wait until next week, when the experts
are back. I will let you know when it is working again.

- Stefan
  195   25 Jan 2005   Stefan Ritt   Bug Report   Persistency problem with h1_book() & co
> > I can't get onto cvs@midas.psi.ch right now
> > (cvs update
> > cvs@midas.psi.ch's password: 
> > Permission denied, please try again.)

cvs@midas.psi.ch should be up and running again.
  196   25 Jan 2005   John M O'Donnell   Bug Report   Persistency problem with h1_book() & co
So now that cvs is reachable again I have confirmed that
the code segment
 
     } else if (obj->InheritsFrom( "TH1")) {
 
       // still don't know how to do TH1s

is indeed still present.
If you want me to look at this some more, you need to provide some code to exhibit the problem.

John.

> > The current h1_book() macros (and the previous example analyzer code) have an
> > odd persistency problem: for example, the user wants to change some histogram
> > limits, edits the h1_book() calls, rebuilds and restarts the analyzer, starts a
> > new run, and observes that all histograms are filled using the old limits, his
> > changes "did not take". The user panics, I get paged during the Holy Lunch Hour,
> > everybody is unhappy.
> > 
> > This is what I think happens:
> > 
> > 1) analyzer starts
> > 2) LoadRootHistograms() loads old histograms from file
> 
> I can't get onto cvs@midas.psi.ch right now
> (cvs update
> cvs@midas.psi.ch's password: 
> Permission denied, please try again.)
> 
> but when I changed LoadRootHistograms a few days ago I left it as:
> 
>     } else if (obj->InheritsFrom( "TH1")) {
> 
>       // still don't know how to do TH1s
> 
> so h1_book() is creating the first and only copy of the histograms.
> I am able to create new histogram limits.
> I don't get the memory leak problems.
> 
> However I have seen the memory leak problems before, and they are real.
> They must be dealt with either by (1) first deleting the old histogram
> or (2) ensuring that histogram names are unique in the whole application
> (different modules/folders cannot use the same histogram names).
> 
> I will return to this once I can do a cvs update for midas.
> 
> John.
> 
> > 3) user code calls h1_book()
> > 4) h1_book template in midas.h does this (roughly):
> >       hist = (TH1X *) gManaHistosFolder->FindObjectAny(name);
> >       if (hist == NULL) {
> >          hist = new TH1X(name, title, bins, min, max);
> > 5) since the histogram already exists (loaded from the file, with the old
> > limits), the TH1X constructor is not called at all, new histogram limits are
> > utterly ignored.
> > 
> > A possible solution is to unconditionally create the ROOT objects, like I do in
> > the example code posted at http://dasdevpc.triumf.ca:9080/Midas/191. That code
> > produces an annoying warning from ROOT about possible memory leaks. This could
> > be fixed by adding a two-liner to "find and delete" the object before it is
> > created, tripling the number of user code lines per histogram (find & delete,
> > then create). Highly ugly.
> > 
> > midas.h macros (h1_book & co) can be fixed by adding checks for histogram limits
> > and such, but I would much prefer a generic solution/convention that would work
> > for arbitrary ROOT objects without MIDAS-specific wrappers (think TProfile,
> > TGraph, etc...).
> > 
> > Any suggestions?
> > 
> > K.O.
  198   25 Jan 2005   John M O'Donnell   Bug Report   histograms not saved in replay mode
is there a reason why histograms are not saved after a replay?

   /* save histos if requested */
   if (out_info.histo_dump && clp.online) {
                              ^^^^^^^^^^

perhaps the && should be ||?
  199   26 Jan 2005   Stefan Ritt   Bug Report   histograms not saved in replay mode
> is there a reason why histograms are not saved after a replay?
> 
>    /* save histos if requested */
>    if (out_info.histo_dump && clp.online) {
>                               ^^^^^^^^^^
> 
> perhaps the && should be ||?

The original reason for that is that when running online, you want some histos for
monitoring after each run. When running offline, you specify a ROOT output file via
"-o xxx.root", which contains trees AND histos. So the histos would be there twice
if you removed the "clp.online" from above.

Having "-o xxx.root" is IMHO a cleaner way, since you might want to analyze a run
in different ways (like using different calibrations). So what you do is specify
different "-o cal00123.root", "-o final00123.root" and so on, while with the
mechanism in eor() you always get the same file name. So try using "-o xxx.root"
and see if that fits your needs.
  206   05 Apr 2005   Donald Arseneau   Bug Report   pointers and segfault in yb_any_file_rclose
I'm getting segfaults in yb_any_file_rclose (closing a file opened with
yb_any_file_ropen with type MIDAS).

I think there are bugs with freeing from uninitialized pointers my.pmagta,
my.pyh, and my.pylrl (which are only set when opening a YBOS file).  These
should be set to NULL in yb_any_file_ropen (case MIDAS).  Likewise, the MIDAS
format pointers my.pmp and my.pmrd should be NULLed for YBOS opens. 

It might be wise to also initialize the pointers in the "my" structure to null.
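
A minimal sketch of that initialization, in plain C. Only the pointer names (pmagta, pyh, pylrl, pmp, pmrd) come from the report above; the ReaderState struct, the void * types and the function names are assumptions made for illustration and are not the actual ybos.c code. The point is that once every pointer starts out NULL, the close routine can free all of them unconditionally, because free(NULL) is a no-op.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical subset of the "my" structure; real member types may differ. */
typedef struct {
   void *pmagta, *pyh, *pylrl;   /* buffers used only for YBOS files  */
   void *pmp, *pmrd;             /* buffers used only for MIDAS files */
} ReaderState;

static void free_and_null(void **p)
{
   free(*p);     /* free(NULL) does nothing */
   *p = NULL;    /* a second close cannot double-free */
}

/* Open in "MIDAS" mode: only the MIDAS buffers are allocated. */
void reader_open_midas(ReaderState *my, size_t size)
{
   assert(my != NULL);
   *my = (ReaderState){0};       /* every pointer starts out NULL */
   my->pmp = malloc(size);
   my->pmrd = malloc(size);
}

/* Close: safe regardless of which format was opened. */
void reader_close(ReaderState *my)
{
   assert(my != NULL);
   free_and_null(&my->pmagta);
   free_and_null(&my->pyh);
   free_and_null(&my->pylrl);
   free_and_null(&my->pmp);
   free_and_null(&my->pmrd);
}
```

Resetting each pointer to NULL after freeing it also makes a repeated close harmless, which may matter if a double free is part of the problem.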

--Donald              
  207   21 Apr 2005   Konstantin Olchanski   Bug Report   pointers and segfault in yb_any_file_rclose
> I'm getting segfaults in yb_any_file_rclose (closing a file opened with
> yb_any_file_ropen with type MIDAS).
> 
> I think there are bugs with freeing from uninitialized pointers my.pmagta,
> my.pyh, and my.pylrl (which are only set when opening a YBOS file).  These
> should be set to NULL in yb_any_file_ropen (case MIDAS).  Likewise, the MIDAS
> format pointers my.pmp and my.pmrd should be NULLed for YBOS opens. 
> 
> It might be wise to also initialize the pointers in the "my" structure to null.

Do you see this crash even after my fix to (another?) double free?

K.O.
  237   14 Dec 2005   Konstantin Olchanski   Bug Report   misc problems
I would like to document a few problems I ran into while setting up a new
experiment (two USB interfaces to Alice TPC electronics, plus maybe a USB
interface to CAMAC). I am using a midas cvs checkout from last October, so I am
not sure if these problems exist in the very latest code. I have fixes for all
of them and I will commit them after some more testing and after I figure out
how to commit into this new svn thingy.

- mxml: writing xml into an in-memory buffer probably produces invalid xml
because one of the mxml functions always writes "/>" into writer->fh, which is 0
for in-memory writers, so the "/>" tag goes to the console instead of the xml
data stream.

- hs_write_event() closes fd 0 (standard input), which confuses ss_getch(),
which makes mlogger not work (at least on my machine). I traced this down to the
history file file descriptors being initialized to zero and hs_write_event()
closing files without checking that it ever opened them.

- mevb: event builder did not work with a single frontend (a two-liner fix, once
Pierre showed me where to look. Why? My second TPC-USB interface did not yet
arrive and I wanted to test my frontend code. Yes, it had enough bugs to prevent
the event builder from working).

- mevb: consumes 100% CPU. Fix: add a delay in the main busy-loop.

- mlogger ROOT tree output does not work for data banks coming through the event
builder: mlogger looks for the bank definition under the event_id of mevb, in 
/equipment/evb/variables, which is empty, as the data banks are under
/equipment/frontendNN/variables. This may be hard to fix: bank "TPCA" may be
under "fe01", "TPCB" under "fe02" and mlogger knows nothing about any of this.
Fix: go back to .mid files.
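
The hs_write_event() item above comes down to a common pattern, sketched here in plain C. The HistFiles struct and both function names are hypothetical and are not the MIDAS history code: descriptors are initialized to -1 rather than 0, and a file is closed only if it was actually opened, so fd 0 (standard input) is never closed by accident.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical history-file state; the field names are illustrative only. */
typedef struct {
   int fd_data;    /* -1 means "not open" */
   int fd_index;   /* -1 means "not open" */
} HistFiles;

void hist_files_init(HistFiles *hf)
{
   assert(hf != NULL);
   hf->fd_data = -1;    /* 0 would be a valid descriptor: stdin */
   hf->fd_index = -1;
}

/* Close a descriptor only if it was opened.
   Returns 1 if close() was called, 0 if there was nothing to close. */
int close_if_open(int *fd)
{
   assert(fd != NULL);
   if (*fd < 0)
      return 0;
   close(*fd);
   *fd = -1;            /* mark as closed again */
   return 1;
}
```

A structure that is merely zero-filled would have passed 0 to close() here, which is the failure described above.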

K.O.
  242   23 Dec 2005   Konstantin Olchanski   Bug Report   minor changes to run transition code
> Minor changes to run transitions code:
> - fail transition if cannot connect to one of the clients

This change introduced a problem:
1) a run is happily taking data
2) a frontend crashes
3) the web interface cannot stop the run (cannot contact the crashed frontend)
until the frontend is removed by the timeout (10-60 seconds?).

I am now considering allowing the run to end even if some clients cannot be
contacted. The begin, pause and resume transitions would continue to fail if
clients cannot be contacted.
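
The proposed policy can be written down as a small decision function. This is only a sketch in plain C: the RunTransition enum and transition_may_proceed() are hypothetical names, not the MIDAS TR_* constants or the cm_transition() code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transition codes; not the MIDAS TR_* constants. */
typedef enum { RUN_START, RUN_STOP, RUN_PAUSE, RUN_RESUME } RunTransition;

/* reachable[i] is nonzero if client i answered the connection attempt.
   Returns 1 if the transition may proceed, 0 if it must fail. */
int transition_may_proceed(RunTransition tr, const int *reachable, size_t n)
{
   assert(reachable != NULL || n == 0);
   for (size_t i = 0; i < n; i++) {
      if (!reachable[i] && tr != RUN_STOP)
         return 0;   /* start, pause and resume need every client */
   }
   return 1;         /* stop goes ahead even if some clients are dead */
}
```

With one crashed client out of three, a stop request is accepted while start, pause and resume are still refused.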

K.O.
  243   24 Dec 2005   Stefan Ritt   Bug Report   minor changes to run transition code
> I am now considering allowing the run to end even if some clients cannot be
> contacted. The begin, pause and resume transitions would continue to fail if
> clients cannot be contacted.

Sounds like a good idea.

- Stefan
  245   30 Dec 2005   Konstantin Olchanski   Bug Report   mhttpd "edit on start" broken for arrays
If a variable under "/experiment/edit on start/" is an array, it is correctly
offered for editing on the "start run page", but then all elements in the array
end up set to the value of the first element.

This appears to be an error in mhttpd.c:interprete(), in the "start dialog"
section. The non-working version in CVS reads:

               for (j = 0; j < key.num_values; j++) {
                  size = key.item_size;
                  sprintf(str, "x%d", n++);
                  db_sscanf(getparam(str), data, &size, j, key.type);
                  db_set_data_index(hDB, hsubkey, data, size + 1, j, key.type);
               }

the fix that works for me reads:
                  db_sscanf(getparam(str), data, &size, 0, key.type);

(notice: the argument "j" is replaced with "0").

The way I understand this, all array elements are encoded into individual HTTP
thingy strings, named sequentially x0, x1, ... and when we parse the values out
of them, the array index should never show up.
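
The symptom can be reproduced with a small mock in plain C. The two helpers below are not db_sscanf() and db_set_data_index(); they only model the behaviour assumed here, namely that the parser stores its result at data[index] while the setter copies from the start of the buffer. Under that assumption, parsing with index j leaves data[0] holding the first value for every later element.

```c
#include <assert.h>
#include <stdlib.h>

/* Mock parser: stores the parsed value at data[index] (assumed behaviour). */
static void mock_sscanf(const char *str, int *data, int index)
{
   data[index] = atoi(str);
}

/* Mock setter: copies the FIRST element of data into array[index]. */
static void mock_set_index(int *array, const int *data, int index)
{
   array[index] = data[0];
}

/* Store n form parameters into array; use_fix selects the corrected call. */
void store_params(int *array, const char *params[], int n, int use_fix)
{
   int data[16] = {0};
   assert(n >= 0 && n <= 16);
   for (int j = 0; j < n; j++) {
      /* buggy version parses into data[j], fixed version into data[0] */
      mock_sscanf(params[j], data, use_fix ? 0 : j);
      mock_set_index(array, data, j);
   }
}
```

Feeding the strings "10", "20", "30" through the buggy path yields 10, 10, 10, matching the report; the fixed path yields 10, 20, 30.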

(Stefan, if you can, please commit a fix to svn).

K.O.
  246   03 Jan 2006   Stefan Ritt   Bug Report   mhttpd "edit on start" broken for arrays
> If a variable under "/experiment/edit on start/" is an array, it is correctly
> offered for editing on the "start run page", but then all elements in the array
> end up set to the value of the first element.

You are right. This bug was there from the beginning; you are just the first one
to try "edit on start" with an array. I applied your fix and committed it as SVN
revision 3013.

Stefan
  253   07 May 2006   Konstantin Olchanski   Bug Report   cm_register_transition gyrations
I am debugging a Rome-based DAQ system setup by Pierre A. (the system does not
work because of bugs in Rome).

One problem I see is with my copy of cm_register_transition() in midas.c. Rome
calls it with a NULL function to register a "queued" transition, but the
cm_register_transition() code has changed around (rev 3051) to make NULL mean
"unregister" a transition (this broke the queued transitions used by Rome), then
it got changed back (rev 3085). Of course, I was stuck with the broken version,
so Rome did not work at all, and it cost me real wall time to get to the bottom
of all this, only to discover that this problem is already fixed. So-

I would greatly appreciate it if, in the future, changes (and bug fixes) to the
MIDAS API were announced on this mailing list here.

K.O.
  254   08 May 2006   Stefan Ritt   Bug Report   cm_register_transition gyrations
> I am debugging a Rome-based DAQ system setup by Pierre A. (the system does not
> work because of bugs in Rome).
> 
> One problem I see is with my copy of cm_register_transition() in midas.c. Rome
> calls it with a NULL function to register a "queued" transition, but the
> cm_register_transition() code has changed around (rev 3051) to make NULL mean
> "unregister" a transition (this broke the queued transitions used by Rome), then
> it got changed back (rev 3085). Of course, I was stuck with the broken version,
> so Rome did not work at all, and it cost me real wall time to get to the bottom
> of all this, only to discover that this problem is already fixed. So-
> 
> I would greatly appreciate it if, in the future, changes (and bug fixes) to the
> MIDAS API were announced on this mailing list here.
> 
> K.O.

Yes, you are right, and I apologize. The fact is that I was not aware that anybody else was
already using ROME in online mode. Nevertheless, let me at least explain the reason for
that change:

Some experiments at PSI run a slow control frontend, which talks to pretty slow
hardware, and thus can be unresponsive for many seconds. Since each frontend by
default registers for the start and stop transitions, this frontend delayed the
start/stop of each run. To solve this problem in the short run, the frontend should not
register for the transitions. Originally I implemented this by using the NULL function
pointer, until we figured out that ROME uses this to register (not de-register)
together with the cm_query_transition() function. Therefore a new function
cm_deregister_transition() was implemented and is now used by the slow frontends.

In the long run this will be solved by implementing multi-threaded frontends which
get one thread for each equipment and therefore do not block any transition anymore.
  255   11 May 2006   Konstantin Olchanski   Bug Report   MIDAS and Fedora 4
Fellow Midasites- we are receiving reports that current Midas sources do not compile on Fedora 4 (and 5?) 
with errors "invalid lvalue in assignment". It looks like the new compilers reject what looks to my eye like 
perfectly valid C code that we have been writing since the beginning of C. Any suggestions on the best fix? 
K.O.
  261   30 May 2006   Konstantin Olchanski   Bug Report   badness with vxworks/ppc
It appears that the latest version of MIDAS malfunctions on PowerPC/VxWorks
machines, below are two problem reports. As reported, previous versions of MIDAS
work fine, I guess that reduces the probability of it being buggy user code. At
least one of the problems feels like a missing endian conversion somewhere, but
I am not aware of any recent changes in the MIDAS RPC code... We will be trying
to debug both problems, but any insight would be greatly appreciated.

K.O.


From suz@triumf.ca  Tue May 30 16:58:16 2006
Date: Tue, 30 May 2006 16:58:16 -0700 (PDT)
From: Suzannah Daviel <suz@triumf.ca>
To: konstantin olchanski <olchansk@triumf.ca>
Subject: rpc problems

Hi Konstantin,

Herewith a description of the problems,

Suzannah

Problem on system A:
--------------------

After upgrading the Linux operating system from RH9 to SL4, and installing
the latest Midas software, the first time a manual trigger is issued, the VxWorks
frontend (running on a PPC) crashes:


Output on PPC console:

trigger histo event from status page

rpc_client_accept: starting with sock:11

program
Exception current instruction address: 0x01ac7388
Machine Status Register: 0x0008b030
Condition Register: 0x24000082
Task: 0x1b47908 "mfe"



The histo event is usually large so is fragmented. It is sent out by a
manual trigger and at end of run. When the run is ended (before an event
request using a manual trigger so program has not yet crashed) the histo
event is sent successfully.

After returning to the previous version of Midas but still running SL4,
this problem disappeared.




Problem on system B:
--------------------

Again, SL9 was installed, and the Midas software updated to the latest.
When sending a periodic (non-fragmented) event, after a while, one of the
parameters appears to become corrupted, and a lot of rpc_call error
messages appear. These continue while data is still successfully sent out
until the run is ended.



Tue May  9 05:20:29 2006 [Mdarc] *** data saved in file /is01_data/bnmr/dlog/2006/040377.msr_v5 at Tue May  9 05:20:29 2006 (SN=5) ***
Tue May  9 05:21:30 2006 [Mdarc] *** data saved in file /is01_data/bnmr/dlog/2006/040377.msr_v6 at Tue May  9 05:21:30 2006 (SN=6) ***
Tue May  9 05:22:31 2006 [Mdarc] *** data saved in file /is01_data/bnmr/dlog/2006/040377.msr_v7 at Tue May  9 05:22:31 2006 (SN=7) ***

Tue May  9 05:23:12 2006 [feBNMR] [midas.c:9325:rpc_call] parameters (1099059848) too large for network buffer (524344); param_size=1099059808
Tue May  9 05:23:12 2006 [feBNMR] [midas.c:9325:rpc_call] parameters (1099059848) too large for network buffer (524344); param_size=1099059808
........................................
Tue May  9 05:23:31 2006 [feBNMR] [midas.c:9325:rpc_call] parameters (1099059848) too large for network buffer (524344); param_size=1099059808
Tue May  9 05:23:32 2006 [feBNMR] [midas.c:9325:rpc_call] parameters (1099059848) too large for network buffer (524344); param_size=1099059808

Tue May  9 05:23:32 2006 [Mdarc] *** data saved in file /is01_data/bnmr/dlog/2006/040377.msr_v8 at Tue May  9 05:23:32 2006 (SN=8) ***

Tue May  9 05:23:32 2006 [feBNMR] [midas.c:9325:rpc_call] parameters (1099059848) too large for network buffer (524344); param_size=1099059808
Tue May  9 05:23:33 2006 [feBNMR] [midas.c:9325:rpc_call] parameters (1099059848) too large for network buffer (524344); param_size=1099059808
etc.

Another example showing that the corrupted parameter varies in size:

Thu Apr 13 19:00:00 2006 [mhttpd] Run #30005 started
Thu Apr 13 19:00:08 2006 [Mdarc] *** Saved data file /is01_data/bnmr/dlog/2006/030005.msr_v1 at Thu Apr 13 19:00:08 2006 ***
Thu Apr 13 19:01:10 2006 [Mdarc] *** Saved data file /is01_data/bnmr/dlog/2006/030005.msr_v2 at Thu Apr 13 19:01:10 2006 ***
Thu Apr 13 19:02:14 2006 [Mdarc] *** Saved data file /is01_data/bnmr/dlog/2006/030005.msr_v3 at Thu Apr 13 19:02:14 2006 ***
Thu Apr 13 19:03:20 2006 [Mdarc] *** Saved data file /is01_data/bnmr/dlog/2006/030005.msr_v4 at Thu Apr 13 19:03:20 2006 ***
Thu Apr 13 19:04:22 2006 [Mdarc] *** Saved data file /is01_data/bnmr/dlog/2006/030005.msr_v5 at Thu Apr 13 19:04:22 2006 ***
Thu Apr 13 19:05:12 2006 [feBNMR] [midas.c:9323:rpc_call] parameters (1077739560) too large for network buffer (524344)
Thu Apr 13 19:05:13 2006 [feBNMR] [midas.c:9323:rpc_call] parameters (1077739560) too large for network buffer (524344)
etc.
  266   08 Jun 2006   Konstantin Olchanski   Bug Report   Midas does not build on Fedora 5
Fresh svn checkout of MIDAS does not build on Fedora 5, I get this error:

cc -c -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib
-DINCLUDE_FTPLIB   -D_LARGEFILE64_SOURCE -DHAVE_ROOT -pthread
-I/triumfcs/trshare/olchansk/root/root_v5.10.00_SL40/include -m32 -DOS_LINUX
-fPIC -Wno-unused-function -o linux/lib/odb.o src/odb.c
src/odb.c: In function 'db_open_database':
src/odb.c:805: warning: dereferencing type-punned pointer will break
strict-aliasing rules
src/odb.c: In function 'db_lock_database':
src/odb.c:1350: warning: dereferencing type-punned pointer will break
strict-aliasing rules
cc: Internal error: Segmentation fault (program cc1)
Please submit a full bug report.
See <URL:http://bugzilla.redhat.com/bugzilla> for instructions.
make: *** [linux/lib/odb.o] Error 1

If I compile odb.c without "-O2", the rest of MIDAS builds without any more errors.

The observed warnings are (I do not know what they mean):
warning: dereferencing type-punned pointer will break strict-aliasing rules
warning: missing sentinel in function call (Cannot do without sentinels, eh?)
warning: pointer targets in passing argument 3 of 'getsockname' differ in signedness
warning: non-local variable '<anonymous struct> out_info' uses anonymous type

The "invalid lvalue" errors seem to have been successfully vanquished.

K.O.
  275   24 Jul 2006   Art Olin   Bug Report   Elog attachments
Hi. When I attach the file below, Mix+Positronorig.xls, to an elog, and then open it or download it to disk, the file, 060..., is severely truncated.
-rw-r--r-- 1 alpha users 17408 Jul 24 11:25 Mix+Positronorig.xls
-rw-r--r-- 1 alpha users 1 Jul 24 11:04 060724_100544_Mix+Positron Cabling 20060723.xls

It's something to do with long filenames or special characters in filenames. Worked OK when I renamed the original file to M1.xls.
Attachment 1: Mix+Positronorig.xls
Attachment 2: Mix+Positron.xls