    Reply  11 Jan 2007, Steve Hardy, Forum, Shared memory problems 
Thanks for your help.  I tried again and it got me back to the initial problem I had.
 The frontend will start, and the analyzer starts (complains about there not being a
last.root, but other than that it's fine), and then when starting mlogger, I get:

[odb.c:860:db_validate_db] Warning: database corruption, first_free_key 0x0001A404
[odb.c:3666:db_get_key] invalid key handle
[midas.c:1970:cm_check_client] cannot delete client info
[odb.c:3666:db_get_key] invalid key handle
[midas.c:1970:cm_check_client] cannot delete client info
[odb.c:3666:db_get_key] invalid key handle


And it continues to shoot out error messages about invalid key handles until I kill
it.  Then trying to start the frontend again fails until I remove the .ODB.SHM file. 
Any other ideas?

> > Hello,
> > 
> > Just did a fresh install of MIDAS from the SVN repository under CentOS and
> > everything compiles fine, but when I go to run the frontend (using dio), I get
> > the following error message:
> > 
> > Connect to experiment ...[odb.c:868:db_open_database] Different database format:
> >  Shared memory is 14, program is 2
> > [midas.c:1763:cm_connect_experiment1] cannot open database
> > 
> > 
> > Any ideas on what the problem could be, or how to fix it?  
> 
> You have an old .ODB.SHM from a previous version in your directory (note the '.' in
> front, so you need an 'ls -alg' to see it). Delete that file and try again.
    Reply  11 Jan 2007, Stefan Ritt, Forum, Shared memory problems 
That sounds like you are mixing versions: you have an old executable (maybe your mlogger)
that was linked against the old midas version, but you create the ODB with the new
odbedit or frontend. The new version complains if it finds an ODB from a previous version
(the error you reported first), but an old program does not have that version check, so
it finds a different binary ODB structure and crashes.

> Thanks for your help.  I tried again and it got me back to the initial problem I had.
>  The frontend will start, and the analyzer starts (complains about there not being a
> last.root, but other than that it's fine), and then when starting mlogger, I get:
> 
> [odb.c:860:db_validate_db] Warning: database corruption, first_free_key 0x0001A404
> [odb.c:3666:db_get_key] invalid key handle
> [midas.c:1970:cm_check_client] cannot delete client info
> [odb.c:3666:db_get_key] invalid key handle
> [midas.c:1970:cm_check_client] cannot delete client info
> [odb.c:3666:db_get_key] invalid key handle
> 
> 
> And it continues to shoot out error messages about invalid key handles until I kill
> it.  Then trying to start the frontend again fails until I remove the .ODB.SHM file. 
> Any other ideas?
> 
> > > Hello,
> > > 
> > > Just did a fresh install of MIDAS from the SVN repository under CentOS and
> > > everything compiles fine, but when I go to run the frontend (using dio), I get
> > > the following error message:
> > > 
> > > Connect to experiment ...[odb.c:868:db_open_database] Different database format:
> > >  Shared memory is 14, program is 2
> > > [midas.c:1763:cm_connect_experiment1] cannot open database
> > > 
> > > 
> > > Any ideas on what the problem could be, or how to fix it?  
> > 
> > You have an old .ODB.SHM from a previous version in your directory (note the '.' in
> > front, so you need an 'ls -alg' to see it). Delete that file and try again.
Entry  22 Jan 2007, Carl Metelko, Forum, Midas on a x86_64 
Hi,
   has anyone managed to get midas to work on an x86_64 processor? I followed the
instructions for the 64-bit Opteron but I am getting runtime errors when trying
the examples.

 When running example/basic/odb_test I get errors like

[odb.c:6818:db_get_record] struct size mismatch for "/Alarms/Alarms/Demo ODB" (464 instead of 452)
[odb.c:6818:db_get_record] struct size mismatch for "/Alarms/Alarms/Demo ODB" (464 instead of 452)
[midas.c:16576:al_check] Cannot get alarm record

Any ideas what is wrong?
    Reply  22 Jan 2007, Konstantin Olchanski, Forum, Midas on a x86_64 
>    has anyone managed to get midas to work on an x86_64 processor? I followed the
> instructions for the 64-bit Opteron but I am getting runtime errors when trying
> the examples.


We run 64-bit MIDAS on RHEL4 with 64-bit ROOT and everything generally works,
except for compatibility problems with 32-bit MIDAS.

Everything should work if you ensure that on your 64-bit machine everything is
compiled 64-bit (including the mserver - we always forget to install the correct version
to /usr/local/bin). 32-bit MIDAS programs running on other machines
can talk to 64-bit MIDAS via the mserver.

The big problem is that 64-bit and 32-bit ODB turned out to be incompatible - several data
fields have different sizes - and we have not yet decided how to fix this. Any fix will involve
breaking the binary ODB for one of the two platforms (we could break both, just to be fair, heh!)
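
To see why the layouts differ, here is a hypothetical illustration - not actual ODB code -
of how a record containing pointer-sized or long fields changes size between 32-bit and
64-bit builds:

    #include <stdio.h>

    struct example_record {
       int   handle;    /* 4 bytes on both platforms        */
       void *data;      /* 4 bytes on 32-bit, 8 on 64-bit   */
       long  timestamp; /* 4 bytes on 32-bit Linux, 8 on 64 */
    };

    int main(void)
    {
       /* typically prints 12 when compiled with -m32 and 24 with -m64 */
       printf("sizeof(struct example_record) = %zu\n", sizeof(struct example_record));
       return 0;
    }

A binary ODB written with one layout cannot be read with the other, hence the size
mismatches once 32-bit and 64-bit programs are mixed.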

>  When running example/basic/odb_test I get errors like
> [odb.c:6818:db_get_record] struct size mismatch for "/Alarms/Alarms/Demo ODB" (464 instead of 452)

Yes, data size mismatch errors indicate that you mixed 32-bit and 64-bit MIDAS. Recompile everything
as 64-bit, remove all the dot-ODB files, remove all the shared memory segments (ipcrm),
and then everything should work.

K.O.
    Reply  26 Jan 2007, Carl Metelko, Forum, Midas on a x86_64 
I upgraded from 1.9.5 to the latest version on SVN and it works fine.
Entry  26 Jan 2007, Carl Metelko, Forum, Front end electronics broadcast data over ethernet, can midas read this in 
Hi,
   the system I'm building will have data read into the frontend nodes
via Ethernet (optical). Is this possible?
Entry  27 Feb 2007, Piotr Zolnierczuk, Forum, event builder scalability 
Hi there:
I have a question: is there anybody out there running MIDAS with an event builder
that assembles events from more than just a few front ends (say on the order of
0x10 or more)?
Any experiences with scalability?

Cheers
 Piotr
    Reply  27 Feb 2007, Stefan Ritt, Forum, event builder scalability 
> Hi there:
> I have a question: is there anybody out there running MIDAS with an event builder
> that assembles events from more than just a few front ends (say on the order of
> 0x10 or more)?
> Any experiences with scalability?

At the MEG experiment at PSI we run with 5 front-ends (later 8), each running at
about 10 MB/sec. This gives an overall rate of 50 MB/sec without any problem. The
CPU load on the backend (2.6 GHz dual Xeon) is 30% for the event builder and 26%
for the logger. The DANCE experiment at Los Alamos runs 17 front-ends if I'm not
mistaken (John?).
    Reply  27 Feb 2007, John M O'Donnell, Forum, event builder scalability 
At Los Alamos, we have 15+1 frontends - the 15 between them read about 2 or 3
TB/hour and reduce it to 1 to 5 GB/hour which is then sent to the mevb on a 17th
computer.  The 16th frontend handles deadtime issues and scalers (small data rate).

The frontends are 1 GHz Pentium III machines, and the backend is a 2.8 GHz dual-CPU machine
with hyperthreading. The interconnect is 100 Mb Ethernet from the frontends to the switch,
and 1 Gb Ethernet from the switch to the backend.

Our bottleneck is (a) the CompactPCI backplane reading data from the waveform digitizers
into the frontend CPUs and (b) CPU power on the frontend CPUs to analyze the waveforms.

John
    Reply  27 Feb 2007, Stefan Ritt, Forum, event builder scalability 
> Our bottleneck is (a) the CompactPCI backplane reading data from the waveform digitizers
> into the frontend CPUs and (b) CPU power on the frontend CPUs to analyze the waveforms.

I forgot to mention that our front-ends at MEG are 2.8 GHz dual Xeon with Hyperthreading.
This gives 4 "virtual" CPU cores, which are really necessary for waveform calibration and
analysis. It makes use of the new multi-threading feature in the midas front-end. I actually
run 7 threads (one VME readout, 4 calibration threads, one encoding thread and the
main thread sending data to the backend). This speeds up data taking by a factor of four
compared to a single thread. So if one plans for waveform analysis in the frontend to
reduce the data, I would recommend a box with dual quad cores.
    Reply  02 Mar 2007, Kevin Lynch, Forum, event builder scalability 
> Hi there:
> I have a question: is there anybody out there running MIDAS with an event builder
> that assembles events from more than just a few front ends (say on the order of
> 0x10 or more)?
> Any experiences with scalability?
> 
> Cheers
>  Piotr

Mulan (which you hopefully remember with great fondness :-) is currently running
around ten frontends, six of which produce data at any rate.  If I'm remembering
correctly, the event builder handles about 30-40MB/s.  You could probably ping Tim
Gorringe or his current postdoc Volodya Tishenko (tishenko@pa.uky.edu) if you want
more details.  Volodya solved a significant number of throughput-related
bottlenecks in the year leading up to our 2006 run.
    Reply  03 Mar 2007, Piotr Zolnierczuk, Forum, event builder scalability 
Hi all,
thank you for all responses. 

It seems that there's no problem running MIDAS with an event builder assembling
data from ~10 front-ends. How about ~100? One possible solution is to have a
multi-tiered architecture.

The reason I am asking is that we are in the process of designing an Ethernet-based
DAQ system with front-ends running on embedded computers (Linux/ARM
CPU/Xilinx FPGA), and MIDAS is one of my options as a DAQ framework.
I am open to advice/suggestions.

Thanks again
  Piotr
    Reply  03 Mar 2007, Stefan Ritt, Forum, event builder scalability 
> It seems that there's no problem running MIDAS with an event builder assembling
> data from ~10 front-ends. How about ~100? One possible solution is to have a
> multi-tiered architecture. 
> 
> The reason I am asking is that we are in the process of designing an Ethernet-based
> DAQ system with front-ends running on embedded computers (Linux/ARM
> CPU/Xilinx FPGA), and MIDAS is one of my options as a DAQ framework.
> I am open to advice/suggestions.

The event builder is a standalone application, not part of the "midas core". It
receives data from N producers and combines the fragments into events based on
their serial number, as a dedicated process. If it becomes a bottleneck, it
can simply be redesigned and optimized. I have recently had good experience with
multi-threaded applications running on multi-core CPUs. Implementing your
multi-tiered architecture as a multi-threaded event builder, where each of ten
threads receives data from ten front-ends, combines them and passes them to a
"collector thread", would make sense to me. Between the threads you can pass data
at many GB/sec, compared to an Ethernet-based architecture. I recently
implemented the rb_xxx functions inside midas.c, which let you pass data between
threads on a zero-copy basis.
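
As a minimal sketch of that zero-copy scheme: one producer thread writes fragments
into a ring buffer and the collector thread consumes them in place. The function names
follow the rb_xxx API in midas.c/midas.h, but the exact signatures, return codes
(SUCCESS) and sizes used here are assumptions to be checked against your MIDAS
version - this is an illustration, not tested code.

    #include <string.h>
    #include "midas.h"

    #define RB_SIZE        (10 * 1024 * 1024)  /* total ring buffer size        */
    #define MAX_EVENT_SIZE (64 * 1024)         /* largest single fragment       */
    #define FRAGMENT_SIZE  (16 * 1024)         /* dummy fragment for the sketch */

    static int rbh;                            /* handle shared by both threads */

    void producer(void)                        /* e.g. one thread per front-end */
    {
       void *wp;
       /* obtain a write pointer, waiting up to 100 ms for free space */
       if (rb_get_wp(rbh, &wp, 100) == SUCCESS) {
          memset(wp, 0, FRAGMENT_SIZE);        /* stand-in for a network read   */
          rb_increment_wp(rbh, FRAGMENT_SIZE); /* commit the data, no copying   */
       }
    }

    void collector(void)                       /* the "collector thread"        */
    {
       void *rp;
       /* obtain a read pointer, waiting up to 100 ms for data */
       if (rb_get_rp(rbh, &rp, 100) == SUCCESS) {
          /* ... assemble the fragment at rp into a full event ... */
          rb_increment_rp(rbh, FRAGMENT_SIZE); /* release the slot              */
       }
    }

    void setup(void)
    {
       rb_create(RB_SIZE, MAX_EVENT_SIZE, &rbh);
    }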

Inside the core functions of midas there are no limitations whatsoever. All
counters etc. are 32-bit, so you can run 2^32 data consumers etc. You will first
hit the OS process limit. What I'm more concerned about is your network bandwidth. If
you run 100 front-ends each with more than 1 MB/sec, you will hit the 1 GBit limit
of your network card. If you add more network interfaces, you will hit the disk
I/O limit, which is around 100-200 MB/sec even on larger RAID1 disk arrays (unless
you do data compression during event building). 

Another limit I see is the run transition. On each start/stop of a run, the
process which wants to start/stop the run has to contact all producers via a TCP
connection. Opening 100 TCP connections will take maybe 10-30 seconds, which is not
very convenient. A multi-threaded approach would help, but this is not (yet)
implemented, so maybe you would have to do it yourself.

Another approach would be to put the event building "in front of midas". All
your front-ends run a specific protocol outside of midas. They send their data to
a collecting process which acts as a single front-end to midas. So in the midas
framework you see only a single front-end, which gets its data not from hardware,
but from 100 other nodes. This way you can optimize the protocol between your
front-end nodes and the collector process for your application. Run transitions
can be done through multicast UDP messages, for example, which will even work with
1000 front-ends. But you have to implement that yourself.
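
For the multicast run-transition idea, a sketch of the sending side is shown below.
The group address, port and message format are made up for illustration; the receiving
front-ends would join the same multicast group and parse the message. None of this is
part of midas - you would implement it yourself, as described above.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* broadcast a run transition to all front-ends with one UDP datagram */
    int announce_run_transition(const char *command, int run_number)
    {
       int sock = socket(AF_INET, SOCK_DGRAM, 0);
       if (sock < 0)
          return -1;

       struct sockaddr_in group;
       memset(&group, 0, sizeof(group));
       group.sin_family      = AF_INET;
       group.sin_addr.s_addr = inet_addr("239.0.0.1"); /* example multicast group */
       group.sin_port        = htons(5000);            /* example port            */

       char msg[64];
       snprintf(msg, sizeof(msg), "%s %d", command, run_number);

       /* one datagram reaches every subscribed node - no per-node TCP connect */
       ssize_t n = sendto(sock, msg, strlen(msg), 0,
                          (struct sockaddr *) &group, sizeof(group));
       close(sock);
       return n < 0 ? -1 : 0;
    }

    /* usage: announce_run_transition("START", 1234); */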

I would start with the first approach: take the out-of-the-box midas and see how
far you get. If you have access to a normal linux cluster, you can simply run ten
dummy front-ends on each of ten nodes, thus simulating 100 front-ends, and see how
far you get. If the event builder is the bottleneck, do an optimization or
redesign. If the run transitions become your bottleneck, switch to method two. In
both cases you can utilize the downstream part of midas, like the logger, the
history system, etc., so you would still gain a lot compared to a design from scratch.

Best regards,

  Stefan
Entry  10 Apr 2007, Dan Gastler, Forum, Interrupt code for VME? 
Hello, 
   Is there any example code for using midas for interrupt driven data
collection over VME? I am using a Struck SIS3100 PCI/VME setup to connect to my
VME crate.  Thanks,
  -Dan
Entry  09 May 2007, Carl Metelko, Forum, Splitting data transfer and control onto different networks 
Hi,
   I'm setting up a system with two networks, with the intention of having
control info (odb, alarm) on 192.168.0.x
and the frontend readout on 192.168.1.x.

Is there any easy way of doing this?
I'm also trying to separate processes onto different machines. Is there
any way to not have mserver, mhttpd and (mlogger, mevt) all run on the same
machine?
Thanks,
       Carl Metelko
    Reply  09 May 2007, Stefan Ritt, Forum, Splitting data transfer and control onto different networks 
Hi Carl,

so far I have not experienced any problems running the ODB & alarms on the same link as
the readout, since the data usually goes frontend->backend and all other messages go
backend->frontend. So before you do something complicated, try it first the
easy way and check whether you have problems at all. So far I don't know of anybody who
has separated the network interfaces, so I have no description for that.

You can however separate processes. The easiest is to buy a multi-core machine. If
you want to use separate computers, however, note that receiving events over the
network is not very optimized. So you should run the mserver connected to the frontend,
the event builder and mlogger on the same machine. mhttpd can easily live on
another machine, and it does not consume much CPU anyway (unless you
plot long history trends). Running mserver, the event builder and mlogger on the
same machine (dual Xeon mainboard) easily gave me 50 MB/sec (actually disk
limited), and neither CPU was near 100%. If you put any receiving process (like
the event builder, mlogger or the analyzer) on a separate machine, you might see
a bottleneck on the event-receiving side of maybe 10 MB/sec or so (never really
tried recently).

Best regards,

  Stefan

> Hi,
>    I'm setting up a system with two networks, with the intention of having
> control info (odb, alarm) on the 192.168.0.x
> and the frontend readout on 192.168.1.x
> 
> Is there any easy way of doing this?
> I'm also trying to separate processes onto different machines, is there
> any way to not have mserver,mhttpd and (mlogger,mevt) all run on the same
> machine?
> Thanks,
>        Carl Metelko
    Reply  09 May 2007, Konstantin Olchanski, Forum, Splitting data transfer and control onto different networks 
> I'm setting up a system with two networks, with the intention of having
> control info (odb, alarm) on the 192.168.0.x
> and the frontend readout on 192.168.1.x

We have some experience with this at TRIUMF - in the TWIST experiment we run with the main
data-generating frontends on a private network - it is a supported configuration and it works fine.

We ran into one problem after adding some code to the frontends for stopping the run upon detecting 
some data errors - stopping runs requires sending RPC transactions to every midas client, so we had to 
add static network routes for routing packets between midas nodes on the private network and midas 
nodes on the normal network.

> I'm also trying to separate processes onto different machines, is there
> any way to not have mserver,mhttpd and (mlogger,mevt) all run on the same machine?

mserver runs on the machine with the ODB shared memory by definition (think of it as "nfs server").

mhttpd typically runs on the machine with the ODB shared memory and until recently it had no code for 
connecting to the mserver. I recently fixed some of it, and now you can run mhttpd in "history mode" 
through the mserver. This is useful for offloading the generation of history plots to another cpu or 
another machine. In our case, we run the "history mhttpd" on the machine that holds the history files.

mlogger could be made to run remotely via the mserver, but presently it will refuse to do so, as it has 
some code that requires direct access to midas shared memory. If data has to be written to a remote 
filesystem, the consensus is that it is more efficient to run mserver locally and let the OS handle remote 
filesystem access (NFS, etc).

All other midas programs should be able to run remotely via the mserver.

K.O.
    Reply  14 May 2007, Carl Metelko, Forum, Splitting data transfer and control onto different networks 
Hi,
   thanks for the advice. We do have dual-core Xeons so we'll try running
most things on the server. Unless it proves to be a problem, we'll run all
MIDAS signals on one network and NFS etc. on the other.

I do have one more query about running systems like Konstantin's.
What we would like to do is have a 'mirror' server serving multiple
online monitoring machines so that the load on the server is constant no matter
the demands on the mirror.

Is there a way to set this up? Or would it be best to have a remote analyser
making short (1min) root files shared with the online monitoring? 
Entry  07 Jun 2007, Randolf Pohl, Forum, crash when analyzing multiple runs offline crash.out
Hello,

I am having a problem with the root-based analyzer. It crashes when I try to 
analyze multiple runs OFFLINE using the "-i run%05d.mid -o result%05d.root -r 
1 2" feature.

I can reproduce the problem with the example experiment which comes with the 
MIDAS distribution:
Running the analyzer ONLINE works fine: One can start and stop runs one after 
the other, roody shows the histograms being reset and then filled again and 
such.

But OFFLINE, the analyzer crashes when trying to analyze the SECOND run in a 
sequence. So
./analyzer -i run%05d.mid -o result%05d.root -r 1 1   works (only run 1)
./analyzer -i run%05d.mid -o result%05d.root -r 1 3   dies on run 2
Output attached (I added printf's to the "init"-modules, but that's irrelevant 
here)


My own analyzer shows the same effect. There I got the impression the segfault 
happens on the first attempt to Fill/Reset/SetName etc. a histogram in the 2nd 
run. But with the midas example it looks like the analyzer finishes filling 
histos even for run 2, but then dies in eor.

Can you reproduce the problem?

I run MIDAS on an Intel Quadcore, 64 bit SuSE Linux 10.2.
pohl@lamb2:~/midas/examples/root> gcc --version
gcc (GCC) 4.1.2 20061115 (prerelease) (SUSE Linux)

(maybe 4.1.2 "PRERELEASE" is the problem? See message ID 344)

I am using midas rev. 3674 (April 19, 2007), but I got the impression there
has not since been a change relevant to this problem. Please correct me if I
am wrong; then I would try it with Rev HEAD.
(My version includes already the fix to the x86_64 segfault problem of message 
ID 337)


Best regards,

Randolf
    Reply  08 Jun 2007, Stefan Ritt, Forum, crash when analyzing multiple runs offline 
Unfortunately I don't have time right now to debug the problem, but I could see
roughly what it could be. The analyzer crashes inside CloseRootOutputFile:

#5  <signal handler called>
#6  0x00002b5f52ad5ee5 in free () from /lib64/libc.so.6
#7  0x000000000040c89b in CloseRootOutputFile () at src/mana.c:1489

in the line 

    free(tree_struct.event_tree[i].branch);

If a "free" crashes, it might indicate that the memory beyond the allocated space
got corrupted. The branch gets allocated in book_ttree(), once for each
analyze_request[i]. The branch gets filled in write_event_ttree():

      /* fill tree both online and offline */
      if (!exclude_all)
         et->tree->Fill();

Maybe one should put printf debugging statements in these places to see what's
going on.
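
The suspicion that something writes past the end of the allocated branch array can be
illustrated with a tiny self-contained program (this is not MIDAS code, just a
demonstration of the failure mode): a buffer overrun corrupts the allocator's
bookkeeping, and the damage only becomes visible later, inside free(), exactly as in
the backtrace above.

    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
       int n = 16;
       double *branch = malloc(n * sizeof(double));

       /* bug: writes several elements past the end of the allocated block */
       memset(branch, 0, (n + 4) * sizeof(double));

       /* the overrun itself goes unnoticed; the corrupted heap metadata is
        * only detected here - compare free(tree_struct.event_tree[i].branch) */
       free(branch);
       return 0;
    }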