Midas DAQ System ELOG
  375   14 May 2007   Carl Metelko   Forum   Splitting data transfer and control onto different networks
Hi,
   thanks for the advice. We do have dual core Xeons so we'll try running
most things on the server. Unless it proves to be a problem we'll run all
MIDAS signals on one network and NFS etc on the other.

I do have one more query about running a system like Konstantin's.
What we would like to do is have a 'mirror' server serving multiple
online monitoring machines, so that the load on the server is constant no matter
the demands on the mirror.

Is there a way to set this up? Or would it be best to have a remote analyser
making short (1min) root files shared with the online monitoring? 
  382   07 Jun 2007   Randolf Pohl   Forum   crash when analyzing multiple runs offline
Hello,

I am having a problem with the root-based analyzer. It crashes when I try to 
analyze multiple runs OFFLINE using the "-i run%05d.mid -o result%05d.root -r 
1 2" feature.

I can reproduce the problem with the example experiment which comes with the 
MIDAS distribution:
Running the analyzer ONLINE works fine: One can start and stop runs one after 
the other, roody shows the histograms being reset and then filled again and 
such.

But OFFLINE, the analyzer crashes when trying to analyze the SECOND run in a 
sequence. So
./analyzer -i run%05d.mid -o result%05d.root -r 1 1   works (only run 1)
./analyzer -i run%05d.mid -o result%05d.root -r 1 3   dies on run 2
Output attached (I added printf's to the "init"-modules, but that's irrelevant 
here)


My own analyzer shows the same effect. There I got the impression that the segfault 
happens on the first attempt to Fill/Reset/SetName etc. on a histogram in the 2nd 
run. But with the midas example it looks like the analyzer finishes filling 
histos even for run 2, and then dies in eor.

Can you reproduce the problem?

I run MIDAS on an Intel Quadcore, 64 bit SuSE Linux 10.2.
pohl@lamb2:~/midas/examples/root> gcc --version
gcc (GCC) 4.1.2 20061115 (prerelease) (SUSE Linux)

(maybe 4.1.2 "PRERELEASE" is the problem? See message ID 344)

I am using midas rev. 3674 (April 19, 2007), but I got the impression there 
has not since been a change relevant to this problem. Please correct me if I 
am wrong; then I would try it with Rev HEAD.
(My version already includes the fix to the x86_64 segfault problem of message 
ID 337.)


Best regards,

Randolf
Attachment 1: crash.out
pohl@lamb:~/midas/examples/root> ./analyzer -e exa_root -i run%05d.mid -o /tmp/pohl/test%05d.root -r 1 3

analyzer_init

Root server listening on port 9090...
adc_calib_init
adc_summing_init
scaler_init
Running analyzer offline. Stop with "!"
Set run number 1 in ODB
Load ODB from run 1...
analyzer_init

OK
run00001.mid:777  /tmp/pohl/test00001.root:775  events, 0.00s
Set run number 2 in ODB
Load ODB from run 2...
analyzer_init

OK
run00002.mid:7227  /tmp/pohl/test00002.root:7225  events, 0.04s

 *** Break *** segmentation violation
Using host libthread_db library "/lib64/libthread_db.so.1".
Attaching to program: /proc/10558/exe, process 10558
[Thread debugging using libthread_db enabled]
[New Thread 47688414288800 (LWP 10558)]
[New Thread 1082132800 (LWP 10559)]
0x00002b5f52afae1f in waitpid () from /lib64/libc.so.6
Thread 2 (Thread 1082132800 (LWP 10559)):
#0  0x00002b5f5234d0ab in __accept_nocancel () from /lib64/libpthread.so.0
#1  0x00002b5f4e4cc510 in TUnixSystem::AcceptConnection ()
   from /usr/local/root/lib/root/libCore.so.5.14
#2  0x00002b5f4e4c1592 in TServerSocket::Accept () from /usr/local/root/lib/root/libCore.so.5.14
#3  0x000000000041075f in root_socket_server (arg=<value optimized out>) at src/mana.c:5453
#4  0x00002b5f5188428a in TThread::Function () from /usr/local/root/lib/root/libThread.so.5.14
#5  0x00002b5f5234609e in start_thread () from /lib64/libpthread.so.0
#6  0x00002b5f52b294cd in clone () from /lib64/libc.so.6
#7  0x0000000000000000 in ?? ()

Thread 1 (Thread 47688414288800 (LWP 10558)):
#0  0x00002b5f52afae1f in waitpid () from /lib64/libc.so.6
#1  0x00002b5f52aa3491 in do_system () from /lib64/libc.so.6
#2  0x00002b5f52aa3817 in system () from /lib64/libc.so.6
#3  0x00002b5f4e4d0851 in TUnixSystem::StackTrace ()
   from /usr/local/root/lib/root/libCore.so.5.14
#4  0x00002b5f4e4cfa4a in TUnixSystem::DispatchSignals ()
   from /usr/local/root/lib/root/libCore.so.5.14
#5  <signal handler called>
#6  0x00002b5f52ad5ee5 in free () from /lib64/libc.so.6
#7  0x000000000040c89b in CloseRootOutputFile () at src/mana.c:1489
#8  0x0000000000410b45 in eor (run_number=<value optimized out>, error=<value optimized out>)
    at src/mana.c:1981
#9  0x0000000000412d9b in analyze_run (run_number=2, 
    input_file_name=0x7fff5cafd020 "run00002.mid", output_file_name=<value optimized out>)
    at src/mana.c:4471
#10 0x00000000004130b4 in loop_runs_offline () at src/mana.c:4518
#11 0x0000000000413e05 in main (argc=<value optimized out>, argv=<value optimized out>)
    at src/mana.c:5757
#0  0x00002b5f52afae1f in waitpid () from /lib64/libc.so.6
[midas.c:1592:] cm_disconnect_experiment not called at end of program
  385   08 Jun 2007   Stefan Ritt   Forum   crash when analyzing multiple runs offline
Unfortunately I don't have time right now to debug the problem, but I could see
roughly what it could be. The analyzer crashes inside CloseRootOutputFile:

#5  <signal handler called>
#6  0x00002b5f52ad5ee5 in free () from /lib64/libc.so.6
#7  0x000000000040c89b in CloseRootOutputFile () at src/mana.c:1489

in the line 

    free(tree_struct.event_tree[i].branch);

If a "free" crashes, it might indicate that the memory beyond the allocated space
got corrupted. The branch gets allocated in book_ttree(), once for each
analyze_request[i]. The branch gets filled in write_event_ttree():

      /* fill tree both online and offline */
      if (!exclude_all)
         et->tree->Fill();

Maybe one should put printf debugging statements in these places to see what's
going on.
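
A minimal sketch of such a statement (this prints exactly the counter discussed in the follow-up below):

    /* in book_ttree(), after booking the new tree */
    printf("book_ttree: tree_struct.n_tree = %d\n", tree_struct.n_tree);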
  386   09 Jun 2007   Randolf Pohl   Forum   crash when analyzing multiple runs offline
Hello Stefan,

tree_struct.n_tree keeps counting up from run to run (in book_ttree). This should 
presumably not be the case, since CloseRootOutputFile() frees the trees at eor().

------------------- output ---------------------------
lamb@lamb2:~/midas/root_3705> ./analyzer -e 
exa_root -i /tmp/midas/examples/root/run%05d.mid -o /tmp/midas/run%05d.root -r 1 2
Root server listening on port 9090...
Running analyzer offline. Stop with "!"
book_ttree: tree_struct.n_tree = 1
book_ttree: tree_struct.n_tree = 2
Set run number 1 in ODB
Load ODB from run 1...OK
/tmp/midas/examples/root/run00001.mid:2722  /tmp/midas/run00001.root:2720  events, 
0.21s
book_ttree: tree_struct.n_tree = 3     <<---- !!!!
book_ttree: tree_struct.n_tree = 4
Set run number 2 in ODB
Load ODB from run 2...OK
/tmp/midas/examples/root/run00002.mid:2347  /tmp/midas/run00002.root:2345  events, 
0.18s

 *** Break *** segmentation violation
----------------- \output ----------------------------

Adding this one line fixes the segfault problem for the root example expt.

----------------- code -------------------------
lamb@lamb2:/data/software/midas/midas_3705/src/src> svn diff mana.c
Index: mana.c
===================================================================
--- mana.c      (revision 3705)
+++ mana.c      (working copy)
@@ -1496,6 +1496,7 @@
    /* delete event tree */
    free(tree_struct.event_tree);
    tree_struct.event_tree = NULL;
+   tree_struct.n_tree = 0;
 
    // go to ROOT root directory
    gROOT->cd();
---------------- \code ---------------------------

Please check if this gives the intended behaviour. I am not very familiar with the 
midas internals.

Unfortunately my own analyzer's segfault problem is not solved by this patch. I 
guess I have to keep searching for a bug on my side.....  :-)


Cheers,

Randolf
  387   10 Jun 2007   Stefan Ritt   Forum   crash when analyzing multiple runs offline
> tree_struct.n_tree keeps counting up from run to run (in book_ttree). This should 
> presumably not be the case, since CloseRootOutputFile() frees the trees at eor().

Yes, this is indeed a bug. I applied your change and committed the new code.
  388   11 Jun 2007   Randolf Pohl   Forum   crash when analyzing multiple runs offline
Hello again,

just for the record, in case somebody else runs into the same problem...

I have hunted down "my" segfault problem to the fact that I book histograms not 
in <module>_init, but in <module>_bor. I have to do so, because only in bor do I 
know which histograms to book, as this information comes from the ODB (booking 
only histograms for CAMAC modules which were set to "read" in the ODB). The core 
dump happens on the first access (->Fill, ->SetName,...) of one of these histos 
in the 2nd run analyzed offline ("./analyzer -r n m").

In mana.c:bor (line 1854) it is stated that "all ROOT objects created by user module 
bor() functions go to the output file", and then a gManaOutputFile->cd() is done.
Consequently, the histograms vanish after the file is closed, hence the 
segfault when trying to access them in the 2nd run. (I keep track of existing 
histograms, only booking the missing histos in bor.)

The problem goes away with "gROOT->cd()" in <module>_bor, before fiddling with 
TFolders and booking the histogram.


I do not, however, really understand the intention of having histos booked in bor() go 
only to the file, whereas histos booked in init() go to memory. Could you please 
comment briefly? Maybe I missed the most important point. And what about online 
mode, should this work?


Thanks a lot in advance,

Randolf
  389   11 Jun 2007   Stefan Ritt   Forum   crash when analyzing multiple runs offline
> I have hunted down "my" segfault problem to the fact that I book histograms not 
> in <module>_init, but in <module>_bor. I have to do so, because only in bor do I 
> know which histograms to book, as this information comes from the ODB (booking 
> only histograms for CAMAC modules which were set to "read" in the ODB). The core 
> dump happens on the first access (->Fill, ->SetName,...) of one of these histos 
> in the 2nd run analyzed offline ("./analyzer -r n m").
> 
> In mana.c:bor (line 1854) it is stated that "all ROOT objects created by user module 
> bor() functions go to the output file", and then a gManaOutputFile->cd() is done.
> Consequently, the histograms vanish after the file is closed, hence the 
> segfault when trying to access them in the 2nd run. (I keep track of existing 
> histograms, only booking the missing histos in bor.)
> 
> The problem goes away with "gROOT->cd()" in <module>_bor, before fiddling with 
> TFolders and booking the histogram.

ROOT has the strange concept of a "current working directory", coming from the fact that
ROOT was written by Fortran and PAW people, who were used to having directories and
subdirectories with a persistent state (not really object-oriented style). So one can
set the "current working directory" to the root (=memory) with gROOT->cd(), or to a
subdirectory which will later be written into a file with gManaOutputFile->cd(). If
you do the first one, the histograms are created only in memory, while in the latter
case they are also created in memory, but will later be written into the output file
in the routine CloseRootOutputFile(). So if you do a gROOT->cd() in <module>_bor,
these histograms will not be written to file. So I guess your solution is not a real
solution.
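
As a small standalone illustration of this current-directory rule (plain ROOT, not MIDAS code), e.g. in a ROOT session:

    TFile *f = TFile::Open("test.root", "RECREATE");   // f becomes the current directory
    TH1F *hfile = new TH1F("hfile", "attached to the file", 100, 0, 1);
    gROOT->cd();                                        // switch the current directory back to memory
    TH1F *hmem  = new TH1F("hmem", "memory only", 100, 0, 1);
    f->Write();   // writes hfile, but not hmem
    f->Close();   // hfile is deleted together with the file object, hmem stays in memory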

> I do not, however, really understand the intention of having histos booked in bor() go 
> only to the file, whereas histos booked in init() go to memory. Could you please 
> comment briefly? Maybe I missed the most important point. And what about online 
> mode, should this work?

The root output file is opened in bor() and closed in eor(). For a histo to go to the
file, it must be booked after opening the file, that is after bor() in mana.c and
therefore after the gManaOutputFile->cd().

I agree with you that the current scheme is not satisfactory. When running online, you
want to keep the histos between the runs. When running offline, you delete and
re-create them for each run. It would be better to create all histos, online and
offline, under gROOT, and just copy them to gManaOutputFile before writing them. I have
to admit that this root code was never really used in a production environment for
offline analysis, so there might be some issues here and there. Some people write
root files directly in the logger, and then do a root-only analysis (without the midas
analyzer). Unfortunately I'm busy these days and cannot write any code right
now. But if you feel like something should be modified in mana.c, please send it to me
and I can incorporate it into the standard code.
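
A rough sketch of that idea (not actual mana.c code; all names here are made up for illustration):

    #include "TFile.h"
    #include "TH1D.h"
    #include "TROOT.h"
    #include <vector>

    std::vector<TH1*> gMyHistos;          // histograms kept alive across runs

    void book_histos()
    {
       gROOT->cd();                       // current directory = memory, independent of any open file
       gMyHistos.push_back(new TH1D("hAdc", "ADC sum", 500, 0, 4000));
    }

    // at the end of each run, copy the in-memory histograms into the per-run output file
    void write_histos(TFile *outputFile)
    {
       outputFile->cd();
       for (size_t i = 0; i < gMyHistos.size(); i++)
          gMyHistos[i]->Write();          // writes a snapshot; the in-memory histogram survives
    }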
  390   12 Jun 2007   Randolf Pohl   Forum   crash when analyzing multiple runs offline
Hi

> So I guess your solution is not a real solution.

I was not precise enough about what I do. This way the histograms persist in memory, but 
they are also written to every file:

e.g. in module "trig_tdc":

  TDirectory *savedir = gDirectory;  // will restore this afterwards
  gROOT->cd();     // go to ROOT memory (gROOT), not the file

  // make sure we are in the right "analyzer module folder"
  TDC_Folder = (TFolder *) gROOT->FindObjectAny("trig_tdc");
  gHistoFolderStack->Add((TObject *) TDC_Folder);

  ...(loop over all TDCs, figure out which histos exist, and which need to be booked)

  open_subfolder("raw4208");
  hrTDC = h1_book(....);   // create histo in memory, but it shows up in the file, too.
  close_subfolder(); //raw4208

  // restore gHistoFolderStack (we added a folder when entering routine)
  gHistoFolderStack->Remove(gHistoFolderStack->Last());

  // restore current directory
  savedir->cd();

When deleting histos I do:

     gManaHistosFolder->RecursiveRemove(*pHisto);
    (*pHisto)->Delete();
    (*pHisto) = NULL;  // for my book-keeping of existing histos.

You don't have to clear the histos explicitly between runs. gManaHistosFolder does this 
magic for you.

> But if you feel like something should be modified in mana.c, please send it to me
> and I can incorporate it into the standard code.

No, the code is fine. I just wanted to explain my problem and a solution to it, because 
I thought that somebody might run into the same problem, too. 

Ciao,

Randolf
  395   12 Jul 2007   Konstantin Olchanski   Forum   Midas on a x86_64 - incompatible with x86_32
> We run 64-bit MIDAS on RHEL4 with 64-bit ROOT and everything generally works,
> except for compatibility problems with 32-bit MIDAS.
> 
> The big problem is that 64-bit and 32-bit ODB turned out to be incompatible ...

I have now identified 3 data structures that change size when compiled with "-m64":

EVENT_REQUEST: stores a pointer to a function. Pointer size is 4 bytes with -m32 and 8 bytes with -m64.
This structure is part of an array inside BUFFER_HEADER, resulting in a sizable size mismatch between 32
bit and 64 bit shared memory data buffers.

The fix is simple: the function pointer is not used anywhere. Replacing it with a "DWORD unused_filler"
makes -m32 and -m64 data buffers compatible. (But it breaks compatibility with previously compiled -m64 midas.)

CHN_SETTINGS and CHN_STATISTICS: apparently, -m32 and -m64 GCC have different packing rules, and in -m64
mode, 4 bytes of padding are added to these data structures. This size mismatch appears to be benign,
but will result in "size mismatch" complaints from ODB.

The fix is simple: adding "__attribute__ ((__packed__))" to the definition of the data structure makes
-m64 identical to -m32.
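
For illustration, the effect of the attribute can be seen with a few lines of standalone GCC code (not the actual MIDAS structures):

    #include <stdio.h>

    typedef struct { int a; double x; } plain;                                /* -m64 inserts 4 padding bytes after 'a' */
    typedef struct __attribute__((__packed__)) { int a; double x; } packed;   /* no padding */

    int main(void)
    {
       printf("sizeof(plain)  = %d\n", (int) sizeof(plain));    /* 16 with -m64, 12 with -m32 */
       printf("sizeof(packed) = %d\n", (int) sizeof(packed));   /* 12 with both */
       return 0;
    }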

The "svn diff" of changes involved is attached below.

The biggest problem here is that making 32-bit ODB and 64-bit ODB compatible requires breaking one or
the other (My proposed changes break the 64-bit version. Alternatively, one could add explicit padding
to these data structures and break the 32-bit ODB).

I think it is important to make 32-bit and 64-bit code compatible: at TRIUMF we have to use a mixed
environment because our latest host computers all run 64-bit Linux while all our VME processors and all
older machines can only run 32-bit code; this incompatibility causes us weekly headaches.

Any thoughts?

K.O.

(this output of svn diff is doctored for clarity)

ladd00:midas$ svn diff
Index: include/midas.h
===================================================================
--- include/midas.h     (revision 3744)
+++ include/midas.h     (working copy)
-   void (*dispatch) (HNDLE, HNDLE, EVENT_HEADER *, void *);
+   INT unused; // was void (*dispatch) (HNDLE, HNDLE, EVENT_HEADER *, void *);
 } EVENT_REQUEST;
 
--- include/msystem.h   (revision 3744)
+++ include/msystem.h   (working copy)

+#define PACKED __attribute__ ((__packed__))  <--- this goes into midas.h inside the #ifdef "we use GCC"
 
-typedef struct {
+typedef struct PACKED { ... CHN_SETTINGS
 
-typedef struct {
+typedef struct PACKED { ... CHN_STATISTICS
  396   13 Jul 2007   Stefan Ritt   Forum   Midas on a x86_64 - incompatible with x86_32
> The biggest problem here is that making 32-bit ODB and 64-bit ODB compatible requires breaking one or
> the other (My proposed changes break the 64-bit version. Alternatively, one could add explicit padding
> to these data structures and break the 32-bit ODB).
> 
> I think it is important to make 32-bit and 64-bit code compatible: at TRIUMF we have to use a mixed
> environment because our latest host computers all run 64-bit Linux while all our VME processors and all
> older machines can only run 32-bit code; this incompatibility causes us weekly headaches.
> 
> Any thoughts?

I agree to make 32-bit and 64-bit compatible. In the long run, everything will be 64-bit, so I would suggest
breaking the 32-bit ODB and adding some padding there where needed, probably with some conditional compiling.
This keeps the native 64-bit packing, which will probably be somewhat optimized for 64-bit
architectures and might therefore be a bit faster in the long run, when most systems are 64-bit. After this
has been implemented and well tested, I would go with an official announcement of the 32-bit break in the ODB,
and release a new version, so people can update from a TAR file if necessary. Existing ODBs can be converted
to the new format by exporting them in XML form and importing them again after the upgrade.
  399   12 Aug 2007   Konstantin Olchanski   Forum   Midas on a x86_64 - incompatible with x86_32
> I agree to make 32-bit and 64-bit compatible. In the long run, everything will be 64-bit, so I would suggest
> breaking the 32-bit ODB and adding some padding there where needed, probably with some conditional compiling.

I now have the patches to implement this. Changes turned out to be minimal:

1) midas.h: remove unused field "dispatch" from EVENT_REQUEST and bump DATABASE_VERSION from 2 to 3
2) msystem.h: add 32-bit padding to CHN_STATISTICS and CHN_SETTINGS

(Pedantic note: the C/C++ languages permit compilers to arbitrarily pad data members inside structures, and one is
not supposed to rely on the specific layout of "struct"s; it could change from day to day depending on
compiler vendor, version, 32/64 bit, optimization level, etc. This is quite silly, but I guess it was the only way
"they" could agree on a standard.)

In practice, compilers are well behaved and one can follow simple rules to stay out of trouble:
1) if all data members are of the same size -> no padding
2) do not use "double" (64-bit) or "short" (16-bit), and make all char[] arrays divisible by 4 -> the size of
everything is 32-bit, see rule 1
3) if you have to use "short", they have to come in pairs to keep everything else aligned to 32-bit
4) if you have to use "double" (or uint64_t), keep them aligned to 64-bit, i.e. struct { int a,b,c; double x; } is
*bad* (4-byte padding may be added between c and x), while struct { int a,b,c,d; double x; } is good (illustrated below).
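
A tiny standalone check of rule 4 (GCC on Linux; the sizes in the comments are the typical results):

    #include <stdio.h>

    struct bad  { int a, b, c;    double x; };   /* -m64: 4 bytes of padding inserted before x */
    struct good { int a, b, c, d; double x; };   /* x already 64-bit aligned, no hidden padding */

    int main(void)
    {
       printf("sizeof(struct bad)  = %d\n", (int) sizeof(struct bad));    /* 24 with -m64, 20 with -m32 */
       printf("sizeof(struct good) = %d\n", (int) sizeof(struct good));   /* 24 with both */
       return 0;
    }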

Below are is "svn diff include/midas.h include/msystem.h". These changes have been tested on SL4 32-bit and
64-bit, SL5 32/64, F7 32/64 and SL4/ICC (Intel compiler) 32 bit and 64 bit.

The testing was done by adding checks on the sizes of all structs kept in ODB, e.g.
   assert(sizeof(CHN_SETTINGS        ) ==    640); // ODB v3 with padding
   assert(sizeof(CHN_STATISTICS      ) ==     32); // ODB v3 with padding
   ... etc ...

K.O.

ladd03:midas$ svn diff include/midas.h include/msystem.h
Index: include/midas.h
===================================================================
--- include/midas.h     (revision 3798)
+++ include/midas.h     (working copy)
@@ -38,7 +38,7 @@
  *  @{  */

 /* has to be changed whenever binary ODB format changes */
-#define DATABASE_VERSION 2
+#define DATABASE_VERSION 3

 /* MIDAS version number which will be incremented for every release */
 #define MIDAS_VERSION "2.0.0"
@@ -810,8 +810,6 @@
    short int event_id;           /**< event ID                        */
    short int trigger_mask;       /**< trigger mask                    */
    INT sampling_type;            /**< GET_ALL, GET_SOME, GET_FARM     */
-                                 /**< dispatch function */
-   void (*dispatch) (HNDLE, HNDLE, EVENT_HEADER *, void *);
 } EVENT_REQUEST;

 typedef struct {
Index: include/msystem.h
===================================================================
--- include/msystem.h   (revision 3798)
+++ include/msystem.h   (working copy)
@@ -454,6 +454,7 @@
    INT event_id;
    INT trigger_mask;
    DWORD event_limit;
+   INT pad; // FIXME 64-bit "double" should be 64-bit aligned
    double byte_limit;
    double tape_capacity;
    char subdir_format[32];
@@ -465,6 +466,7 @@
    double bytes_written;
    double bytes_written_total;
    INT files_written;
+   INT pad; // FIXME pad data structure to be 64-bit aligned
 } CHN_STATISTICS;

 typedef struct {
ladd03:midas$
  400   20 Aug 2007   Konstantin Olchanski   Forum   Midas on a x86_64 - incompatible with x86_32
> > I agree to make 32-bit and 64-bit compatible. In the long run, everything will be 64-bit, so I would suggest
> > breaking the 32-bit ODB and adding some padding there where needed, probably with some conditional compiling.
> 
> I now have the patches to implement this. Changes turned out to be minimal:
> 
> 1) midas.h: remove unused field "dispatch" from EVENT_REQUEST and bump DATABASE_VERSION from 2 to 3
> 2) msystem.h: add 32-bit padding to CHN_STATISTICS and CHN_SETTINGS

The padding of CHN_STATISTICS and CHN_SETTINGS is not working right - somehow mhttpd and mlogger keep recreating the
data in ODB and erasing the padding fields. I am looking into this.

K.O.
  403   29 Aug 2007   Konstantin Olchanski   Forum   ODBv3, second try - Midas on a x86_64 - incompatible with x86_32
> > > I agree to make 32-bit and 64-bit compatible. In the long run, everything will be 64-bit, so I would suggest
> > > breaking the 32-bit ODB and adding some padding there where needed, probably with some conditional compiling.
> > 1) midas.h: remove unused field "dispatch" from EVENT_REQUEST and bump DATABASE_VERSION from 2 to 3
> > 2) msystem.h: add 32-bit padding to CHN_STATISTICS and CHN_SETTINGS

I am now trying a different solution to fix the issue of CHN_STATISTICS and CHN_SETTINGS changing size.

1) midas.h: (same as before) remove unused field "dispatch" from EVENT_REQUEST and bump DATABASE_VERSION from 2 to 3
2) msystem.h: in CHN_STATISTICS and CHN_SETTINGS change type of "event_limit" and "files_written" from int to "double".

Below are the latest ODBv3 meta patches:

ladd03:midas$ svn diff
Index: include/midas.h
===================================================================
--- include/midas.h     (revision 3844)
+++ include/midas.h     (working copy)
 /* has to be changed whenever binary ODB format changes */
-#define DATABASE_VERSION 2
+#define DATABASE_VERSION 3
.........
    short int trigger_mask;       /**< trigger mask                    */
    INT sampling_type;            /**< GET_ALL, GET_SOME, GET_FARM     */
-                                 /**< dispatch function */
-   void (*dispatch) (HNDLE, HNDLE, EVENT_HEADER *, void *);
 } EVENT_REQUEST;

Index: include/msystem.h
===================================================================
--- include/msystem.h   (revision 3845)
+++ include/msystem.h   (working copy)
-"Event limit = DWORD : 0",\
+"Event limit = DOUBLE : 0",\
..................
-"Files written = INT : 0",\
+"Files written = DOUBLE : 0",\
..................
-   DWORD event_limit;
+   double event_limit;
..................
-   INT files_written;
+   double files_written;

K.O.
  412   17 Oct 2007   Randolf Pohl   Forum   Adding MIDAS .root-files
Dear MIDAS users,

I want to add several .root-files produced by the MIDAS analyzer, in a fast 
and convenient way. ROOT's hadd fails because it does not know how to treat 
TFolders. I guess this problem is not unique to me, so I hope that somebody of 
you might already have found a solution.

Why don't I just run "analyzer -r 1 10000"?
We have taken lots of runs under (rapidly) varying conditions, so it would be 
lots of "-r". And the analysis is quite involved, so rerunning all data takes 
about one hour on a fast PC making this quite painful.
Therefore, I would like to rerun all data only once, and then add the result 
files depending on different criteria.

Of course, I tried to write a script that does the adding. But somehow it is 
incredibly slow. And I am not the Master Of C++, too.

Is there any deeper reason for MIDAS using TFolders, not TDirectorys? ROOT's 
hadd can treat TDirectory. Can I simply patch "my" MIDAS? Is there general 
interest in a change like this? (Does anyone have experience with the speed of 
hadd?)

Looking forward to comments from the Forum.

Cheers,

Randolf
  413   17 Oct 2007   Randolf Pohl   Forum   Multi-core CPUs
Dear Forum,

I have this beautiful Intel Quadcore with fast disks, but MIDAS obviously 
only makes use of one CPU at a time. Has any of you already done some work 
on making MIDAS parallel? Event-based data analysis should be the best 
candidate for this.

Has anybody done this with PVM? There is some PVM-related stuff in the MIDAS 
sources, but I got the impression this works only with HBOOK, not with ROOT. 
Or am I wrong?
But then PVM is probably also not the most efficient thing on ONE machine 
with multiple CPUs, right? And finally, with PVM we're back to 
adding .root-files efficiently (see my previous post).

Any thoughts?

Cheers,

Randolf
  414   17 Oct 2007   Stefan Ritt   Forum   Multi-core CPUs
> I have this beautiful Intel Quadcore with fast disks, but MIDAS obviously 
> only makes use of one CPU at a time. Has any of you already done some work 
> on making MIDAS parallel? Event-based data analysis should be the best 
> candidate for this.

There are ring buffer routines rb_xxx for distributed event analysis, but this is
currently only implemented in the front-end framework. These routines are pretty
simple, and their integration into the analyzer should not be very difficult.
Unfortunately I don't have time for that right now. We do our analysis such that we
analyze four different runs in parallel on a quadcore machine.
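
To give an idea of the scheme, here is a generic producer/consumer sketch with standard C++ threads (this is only an illustration, not the actual rb_xxx API):

    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    // one shared event queue, N worker threads analyzing events in parallel
    struct EventQueue {
       std::queue<std::vector<char> > events;
       std::mutex mtx;
       std::condition_variable cv;
       bool done;

       EventQueue() : done(false) {}

       void push(const std::vector<char> &ev) {
          { std::lock_guard<std::mutex> lock(mtx); events.push(ev); }
          cv.notify_one();
       }
       bool pop(std::vector<char> &ev) {
          std::unique_lock<std::mutex> lock(mtx);
          while (events.empty() && !done) cv.wait(lock);
          if (events.empty()) return false;        // finished and drained
          ev = events.front(); events.pop();
          return true;
       }
    };

    static void analyze(EventQueue *q, int id) {
       std::vector<char> ev;
       while (q->pop(ev))
          std::printf("worker %d analyzed %d bytes\n", id, (int) ev.size());
    }

    int main() {
       EventQueue q;
       std::vector<std::thread> workers;
       for (int i = 0; i < 4; i++) workers.push_back(std::thread(analyze, &q, i));

       for (int n = 0; n < 1000; n++)               // stand-in for reading MIDAS events
          q.push(std::vector<char>(128, 0));

       { std::lock_guard<std::mutex> lock(q.mtx); q.done = true; }
       q.cv.notify_all();
       for (size_t i = 0; i < workers.size(); i++) workers[i].join();
       return 0;
    }

Here the producer stands in for the part that reads events from the data stream, and each worker stands in for an analysis thread.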

- Stefan
  415   17 Oct 2007   John M O'Donnell   Forum   Adding MIDAS .root-files
The following program handles regular directories in a file, or folders (ugh).
Most histograms are added bin by bin.

For scaler events it is convenient to see the counts as a function of time (a la
scaler history plots in mhttpd).  If the histogram looks like a scaler plot versus
time, then new bins are added on to the end (or into the middle!) of the histogram.

All different versions of cuts are kept.

TTrees are not explicitly supported, so they probably don't do the right thing...

John.

> Dear MIDAS users,
> 
> I want to add several .root-files produced by the MIDAS analyzer, in a fast 
> and convenient way. ROOT's hadd fails because it does not know how to treat 
> TFolders. I guess this problem is not unique to me, so I hope that somebody of 
> you might already have found a solution.
> 
> Why don't I just run "analyzer -r 1 10000"?
> We have taken lots of runs under (rapidly) varying conditions, so it would be 
> lots of "-r". And the analysis is quite involved, so rerunning all data takes 
> about one hour on a fast PC making this quite painful.
> Therefore, I would like to rerun all data only once, and then add the result 
> files depending on different criteria.
> 
> Of course, I tried to write a script that does the adding. But somehow it is 
> incredibly slow. And I am not the Master Of C++, too.
> 
> Is there any deeper reason for MIDAS using TFolders, not TDirectorys? ROOT's 
> hadd can treat TDirectory. Can I simply patch "my" MIDAS? Is there general 
> interest in a change like this? (Does anyone have experience with the speed of 
> hadd?)
> 
> Looking forward to comments from the Forum.
> 
> Cheers,
> 
> Randolf
Attachment 1: histoAdd.cxx
#include <iostream>
#include <vector>
#include <iterator>
#include <cstring>
using namespace std;

#include "TROOT.h"
#include "TFile.h"
#include "TString.h"
#include "TDirectory.h"
#include "TObject.h"
#include "TClass.h"
#include "TKey.h"
#include "TH1.h"
#include "TAxis.h"
#include "TMath.h"
#include "TCutG.h"
#include "TFolder.h"

bool verbose (false);
TFolder *histosFolder (0);

//==============================================================================

void addObject (TObject *o, const TString &prefix, TFolder *opFolder=0);

/** called for each file to do the addition.
  * loops over each object in the file, and
  * uses addObject to dispatch the actuall addition.
  */

void add (TFile *file, const TString &prefix) {

//------------------------------------------------------------------------------

  TString dirName (file->GetName());
  if (verbose) cout << " scanning TFile" << endl;
  TString newPrefix (prefix + dirName + "/");

  TIter i (file->GetListOfKeys());
  while (TKey *k = static_cast<TKey *>( i())) {

    TObject *o (file->Get( k->GetName()));
    addObject( o, newPrefix);
  }

  return;
}

//==============================================================================

/** Most histograms are added bin by bin, but if simpleAdd == false,
  * the xaxis values are assumed different, in which case we look up
  * appropriate bin numbers, and use a new extended xaxis if needed.
  *
  * Use simpleAdd=false to accumulate scaler rate histograms.
  */
void add (const TH1 *newh, TH1 *&hsum, TFolder *opFolder) {

//------------------------------------------------------------------------------

  const bool simpleAdd (newh->GetXaxis()->GetTimeDisplay() ? false : true);

  if (!hsum) {
    hsum = (TH1 *)newh->Clone();
    TString title = "histoAdd: ";
    title += hsum->GetTitle();
    if (opFolder) hsum->SetDirectory( 0);
  }
  else if (simpleAdd) hsum->Add( newh);
  else { // extend axis - for 1D histos with equal sized bins

    size_t nBinsSum = hsum->GetNbinsX();
    size_t nBinsNew = newh->GetNbinsX();
    vector<Double_t>bin_contents;
    vector<Double_t>histo_edges;
    Int_t holder_bins;

    /* foundBin is either the overflow bin (if the 2 histograms don't overlap)
     * or it is the bin number in histoA that has the same low edge as the 
     * lowest bin edge in histoB. The only time that histograms can overlap 
     * is when the older scaler.cxx is used to create the histograms with 
     * fixed bin sizes. 
     */
    Int_t foundBin = hsum->FindBin( newh->GetBinLowEdge(1) );

    histo_edges.resize(foundBin);
    bin_contents.resize(foundBin);
    
    for( int i = 1; i <= foundBin; i++ ) {
      histo_edges[i-1] = hsum->GetBinLowEdge(i);
      bin_contents[i-1] = hsum->GetBinContent(i);
    }

    if(foundBin < nBinsSum)  { 
      //the histos overlap or we have already made holder bins
      holder_bins = 0;
    }
    else        {
      //create a "place holder" histo
      Int_t width = 10;
      holder_bins = (int)((newh->GetXaxis()->GetXmin() 
                    - hsum->GetXaxis()->GetXmax())/width);

      if( holder_bins < width ) holder_bins = width;

      TH1F *bin_holder = new TH1F("bin_holder", "bin_holder", holder_bins, 
                           hsum->GetXaxis()->GetXmax(), 
                           newh->GetXaxis()->GetXmin() );

      histo_edges.resize( foundBin + holder_bins );
      bin_contents.resize( foundBin + holder_bins );

      for( int i = 0; i < holder_bins; i++ )      {
        histo_edges[foundBin+i] = bin_holder->GetBinLowEdge(i+2);
        bin_contents[foundBin+i] = 0;
      }
      delete bin_holder;
    } //end else
  
    histo_edges.resize(  foundBin + holder_bins + nBinsNew+1 ); 
    bin_contents.resize( foundBin + holder_bins + nBinsNew+1 );

    for( int i = 0; i <= nBinsNew; i++ )   {
      histo_edges[i+foundBin+holder_bins] = newh->GetBinLowEdge(i+1);
      bin_contents[i+foundBin+holder_bins] = newh->GetBinContent(i+1);
    }

    hsum->SetBins( histo_edges.size()-1, &histo_edges[0] );

    for ( int i=1; i<histo_edges.size(); ++i) {
      hsum->SetBinContent( i, bin_contents[i-1]);
    }

    if (opFolder) {
      //opFolder->Remove( hsum);
      //opFolder->Add( exth);
      hsum->SetDirectory( 0);
    }
  }

  if (verbose) {
    if (simpleAdd) cout << " adding counts";
    else cout << " tagging on bins";
    cout << endl;
  }

  return;
}

//==============================================================================

/** Most cuts are written out just once, but if a cut is different from
  * the most recently written version of a cut with the same name, then
  * another copy of the cut is written out.  Thus if a cut changes during
  * a series of runs, all versions of the cut will be present in the
  * summed file.
  */
void add (const TCutG *o, TCutG *&oOut, TFolder *opFolder) {

//------------------------------------------------------------------------------

  const char *name (o->GetName());
  bool write (false);

  if (!oOut) write = true;
  else {

    Int_t n (o->GetN());
    if (n != oOut->GetN()) write = true;

    else {
      double x1, x2, y1, y2;
      for (Int_t i=0; i<n; ++i) {

        o   ->GetPoint( i, x1, y1);
        oOut->GetPoint( i, x2, y2);

        if ((x1 != x2) || (y1 != y2)) write = true;
      }
    }
  }

  if (write) {

    if (verbose) {
      if (oOut) cout << " changed";
      cout << " TCutG" << endl;
    }

    if (!opFolder) {

      oOut = const_cast<TCutG *>( o);
      o->Write( name, TObject::kSingleKey);

    } else {

      TCutG *clone = static_cast<TCutG *>( o->Clone());
      if (oOut) opFolder->Add( clone);
      oOut = clone;
    }
  } else if (verbose) cout << endl;

  return;
}

//==============================================================================

/** for most objects, we keep just the first version.
  */
void add (const TObject *o, TObject *&oSum, TFolder *opFolder) {

//------------------------------------------------------------------------------

  const char *name (o->GetName());

  if (!oSum) {
    if (verbose) cout << " saving TObject" << endl;
    if (!opFolder) o->Write( name, TObject::kSingleKey);
  } else {
    if (verbose) cout << endl;
  }

  return;
}

//==============================================================================

/** create the new directory and then start adding its contents
  */
void add (TDirectory *dir, TDirectory *&sumDir,
          TFolder *opFolder, const TString &prefix) {

//------------------------------------------------------------------------------

  TDirectory *currentDir (gDirectory);
  TString dirName (dir->GetName());
  if (verbose) cout << " scanning TDirectory" << endl;
  TString newPrefix (prefix + dirName + "/");

  if (!sumDir) sumDir = gDirectory->mkdir( dirName);
  sumDir->cd();

  TIter i (dir->GetListOfKeys());
  while (TKey *k = static_cast<TKey *>( i())) {

    TObject *o (dir->Get( k->GetName()));
    addObject( o, newPrefix, opFolder);
  }

  currentDir->cd();

  return;
}

//==============================================================================

/** create a new folder and then start adding its contents
  */
void add (const TFolder *folder, TFolder *&sumFolder,
          TFolder *parentFolder, const TString &prefix) {

//------------------------------------------------------------------------------

  if (verbose) cout << " scanning TFolder" << endl;
  const char *name (folder->GetName());
  TString newPrefix (prefix + name + "/");

  if (!sumFolder) sumFolder = new TFolder (name, name);
  if (!histosFolder) histosFolder = sumFolder;

  TIter i (folder->GetListOfFolders());
  while (TObject *o = i()) addObject( o, newPrefix, sumFolder);

  return;
}

//==============================================================================

int main (int argc, char **argv) {

//------------------------------------------------------------------------------

  if (argc < 3) {
    cerr << argv[0] << ": out_root_file in_root_file1 in_root_file2 ..." << endl;
    return 1;
  }

  TROOT root ("histoadd", "histoadd");
  root.SetBatch();

  TString opFileName (argv[1]);
  TFile *opFile (TFile::Open( opFileName, "RECREATE"));

  if (!opFile) {
    cerr << argv[0] << ": unable to open file: " << argv[1] << endl;
    return 1;
  }
  --argc;
  ++argv;
... 95 more lines ...
  417   21 Nov 2007   Konstantin Olchanski   Forum   ODBv3, second try - Midas on a x86_64 - incompatible with x86_32
These changes to make 32-bit and 64-bit ODB binary compatible with each other are now commited to midas svn, revision 4080.

Starting with this revision, ODB version changes from 2 to 3, breaking binary compatibility with previous releases.

Before upgrading to this revision, save your ODB as an XML file, *and* try to reload it, to catch any potential problems with parsing of the XML file.

Part of this commit is a set of checks on the sizes of important midas data structures stored in ODB shared memory - if the compiled size does not match the expected 
value, binary compatibility is broken and the program will abort, to avoid further corruption of ODB shared memory. This feature is only enabled on Linux and 
it is expected to trigger only on compiler malfunctions (wrong generated data size) and on accidental or intentional changes to important data structures in 
midas, to warn the user that they broke ODB binary compatibility.

K.O.

> > > > I agree to make 32-bit and 64-bit compatible. In the long run, everything will be 64-bit, so I would suggest
> > > > breaking the 32-bit ODB and adding some padding there where needed, probably with some conditional compiling.
> > > 1) midas.h: remove unused field "dispatch" from EVENT_REQUEST and bump DATABASE_VERSION from 2 to 3
> > > 2) msystem.h: add 32-bit padding to CHN_STATISTICS and CHN_SETTINGS
> 
> I am now trying a different solution to fix the issue of CHN_STATISTICS and CHN_SETTINGS changing size.
> 
> 1) midas.h: (same as before) remove unused field "dispatch" from EVENT_REQUEST and bump DATABASE_VERSION from 2 to 3
> 2) msystem.h: in CHN_STATISTICS and CHN_SETTINGS change type of "event_limit" and "files_written" from int to "double".
> 
> Below are the latest ODBv3 meta patches:
> 
> ladd03:midas$ svn diff
> Index: include/midas.h
> ===================================================================
> --- include/midas.h     (revision 3844)
> +++ include/midas.h     (working copy)
>  /* has to be changed whenever binary ODB format changes */
> -#define DATABASE_VERSION 2
> +#define DATABASE_VERSION 3
> .........
>     short int trigger_mask;       /**< trigger mask                    */
>     INT sampling_type;            /**< GET_ALL, GET_SOME, GET_FARM     */
> -                                 /**< dispatch function */
> -   void (*dispatch) (HNDLE, HNDLE, EVENT_HEADER *, void *);
>  } EVENT_REQUEST;
> 
> Index: include/msystem.h
> ===================================================================
> --- include/msystem.h   (revision 3845)
> +++ include/msystem.h   (working copy)
> -"Event limit = DWORD : 0",\
> +"Event limit = DOUBLE : 0",\
> ..................
> -"Files written = INT : 0",\
> +"Files written = DOUBLE : 0",\
> ..................
> -   DWORD event_limit;
> +   double event_limit;
> ..................
> -   INT files_written;
> +   double files_written;
> 
> K.O.
  420   04 Feb 2008   Robert Pattie   Forum   analyzer crashes at high rates
I'm using midas to read data from a waveform digitizer at event rates of
10-30 kHz. To accomplish this the digitizer is read via block transfers and the
raw data put into a single MIDAS event.  Thus a MIDAS event could contain up to
250 physical events and at maximum 350 kBytes.  In the analyzer modules I had
been analyzing the first physics event contained in a MIDAS event with no
problem.  Recently I tried to analyze all the physical events.  At low rates,
100 Hz-1 kHz, this was no problem, with 1-5 physical events in a MIDAS event.  At
higher rates, 10-20 kHz, where there are about 40 physical events per MIDAS event,
the analyzer keeps up for a few seconds and then seg faults with "'shared object
read from target memory' has disappeared; keeping its symbols".  Any suggestions as
to why the analyzer is crashing would be very helpful.

Thanks,

Robert     
  421   05 Feb 2008   Stefan Ritt   Forum   analyzer crashes at high rates
> I'm using midas to read data from a waveform digitizer at event rates of
> 10-30 kHz. To accomplish this the digitizer is read via block transfers and the
> raw data put into a single MIDAS event.  Thus a MIDAS event could contain up to
> 250 physical events and at maximum 350 kBytes.  In the analyzer modules I had
> been analyzing the first physics event contained in a MIDAS event with no
> problem.  Recently I tried to analyze all the physical events.  At low rates,
> 100 Hz-1 kHz, this was no problem, with 1-5 physical events in a MIDAS event.  At
> higher rates, 10-20 kHz, where there are about 40 physical events per MIDAS event,
> the analyzer keeps up for a few seconds and then seg faults with "'shared object
> read from target memory' has disappeared; keeping its symbols".  Any suggestions as
> to why the analyzer is crashing would be very helpful.

I personally have never seen this error message. The analyzer is designed such that
it produces "back pressure" if the data rate is higher than the analysis rate and
you have "request all events" on. The only things I can imagine are the following two
issues:

- At higher rates, where you have more than 40 physical events per MIDAS event, there
is some bug in your analysis code which gets exploited only in that case. Maybe some
temporary array which is only 35 entries long, or something like this (a sketch of this kind of bug follows below).

- The back pressure mentioned above will slow down the frontend. If your computer
busy logic is not working correctly, you might get more triggers than you can
acquire. Maybe then the data gets screwed up and the analyzer chokes on it.
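
As an illustration of the first point, a generic sketch (not Robert's actual code) of the kind of bug meant here:

    #include <stdio.h>

    #define MAX_PHYS_EVENTS 35                  /* enough at 1 kHz, too small at 20 kHz */

    static double amplitude[MAX_PHYS_EVENTS];

    static int unpack(int n_physical)           /* at high rate this can reach ~250 */
    {
       int i;
       if (n_physical > MAX_PHYS_EVENTS) {      /* without this check, the loop below */
          fprintf(stderr, "too many sub-events: %d\n", n_physical);  /* writes past the array */
          n_physical = MAX_PHYS_EVENTS;
       }
       for (i = 0; i < n_physical; i++)
          amplitude[i] = 0.0;
       return n_physical;
    }

    int main(void) { return unpack(250) == MAX_PHYS_EVENTS ? 0 : 1; }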

Finding the exact reason is not simple. For sure you have to run the analyzer inside
the debugger, to see exactly where the segfault happens. You then maybe have to
produce some dummy data in the frontend (like always sending the same event) to
disentangle some possible trigger problems from other problems.

Best regards,

  Stefan