22 Jan 2007, Konstantin Olchanski, Forum, Midas on a x86_64
|
> has anyone managed to get midas to work on an x86_64 processor? I followed the
> instructions for the 64-bit Opteron but I am getting runtime errors when trying
> the examples.
We run 64-bit MIDAS on RHEL4 with 64-bit ROOT and everything generally works,
except for compatibility problems with 32-bit MIDAS.
Everything should work if you ensure that on your 64-bit machine everything is
compiled 64-bit (including the mserver - we always forget to install the correct version
to /usr/local/bin). 32-bit MIDAS programs running on other machines
can talk to 64-bit MIDAS via the mserver.
The big problem is that 64-bit and 32-bit ODB turned out to be incompatible - several data
fields have different sizes - and we have not yet decided how to fix this. Any fix will involve
breaking the binary ODB for one of the two platforms (we could break both, just to be fair, heh!)
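To see why 32-bit and 64-bit builds can disagree on struct sizes, here is a generic
sketch (the real ODB records are defined in midas.h; this is not one of them): a plain
C struct containing a "long" or a pointer changes size between IA32 and x86_64, and
the alignment padding moves as well.
----------------- code (sketch) ------------------------
/* Hypothetical record, NOT from midas.h - just illustrates the size change */
#include <stdio.h>

struct example_rec {
   int    run_number;   /* 4 bytes on both platforms          */
   long   num_events;   /* 4 bytes on IA32, 8 bytes on x86_64 */
   void  *cache;        /* 4 bytes on IA32, 8 bytes on x86_64 */
   double rate;         /* 8 bytes, but its offset shifts     */
};

int main(void)
{
   /* typically prints 20 on a 32-bit Linux build and 32 on x86_64 */
   printf("sizeof(struct example_rec) = %d\n", (int) sizeof(struct example_rec));
   return 0;
}
---------------- \code ---------------------------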
> When running example/basic/odb_test I am getting errors like
> [odb.c:6818:db_get_record] struct size mismatch for "/Alarms/Alarms/Demo ODB" (464 instead of 452)
Yes, data size mismatch errors indicate that you mixed 32-bit and 64-bit MIDAS. Recompile everything
as 64-bit, remove all the dot-ODB files, remove all the shared memory segments (ipcrm),
then everything should work.
K.O. |
26 Jan 2007, Carl Metelko, Forum, Midas on a x86_64
|
I upgraded from 1.9.5 to the latest on SVN and it works fine |
26 Jan 2007, Carl Metelko, Forum, Front end electronics broadcast data over ethernet, can midas read this in
|
Hi,
the system I'm building will have data read into the frontend nodes
via ethernet (optical). Is this possible? |
27 Feb 2007, Piotr Zolnierczuk, Forum, event builder scalability
|
Hi there:
I have a question: is there anybody out there running MIDAS with an event builder
that assembles events from more than just a few front ends (say on the order of
0x10 or more)?
Any experiences with scalability?
Cheers
Piotr |
27 Feb 2007, Stefan Ritt, Forum, event builder scalability
|
> Hi there:
> I have a question: is there anybody out there running MIDAS with an event builder
> that assembles events from more than just a few front ends (say on the order of
> 0x10 or more)?
> Any experiences with scalability?
At the MEG experiment at PSI we run with 5 front-ends (later 8), each running at
about 10 MB/sec. This gives an overall rate of 50MB/sec without any problem. The
CPU load on the backend (2.6 GHz dual Xeon) is 30% for the event builder and 26%
for the logger. The DANCE experiment at Los Alamos runs 17 front-ends if I'm not
mistaken (John?). |
27 Feb 2007, John M O'Donnell, Forum, event builder scalability
|
At Los Alamos, we have 15+1 frontends - the 15 between them read about 2 or 3
TB/hour and reduce it to 1 to 5 GB/hour which is then sent to the mevb on a 17th
computer. The 16th frontend handles deadtime issues and scalers (small data rate).
Frontends are 1 GHz Pentium 3s, and the backend is a 2.8 GHz dual CPU with hyperthreading.
Interconnect is 100Mb ethernet from frontends to switch, and 1Gb ethernet from
switch to backend.
Our bottleneck is (a) the compactPCI backplane reading data from the waveform digitizers
to the frontend CPUs and (b) CPU power on the frontend CPUs to analyze the waveforms.
John |
27 Feb 2007, Stefan Ritt, Forum, event builder scalability
|
> Our bottleneck is (a) the compactPCI backplane reading data from the waveform digitizers
> to the frontend CPUs and (b) CPU power on the frontend CPUs to analyze the waveforms.
I forgot to mention that our front-ends at MEG are 2.8 GHz dual Xeons with Hyperthreading.
This gives four "virtual" CPU cores, which are really necessary for waveform calibration and
analysis. It makes use of the new multi-threading feature in the midas front-end. I actually
run 7 threads (one VME readout thread, 4 calibration threads, one encoding thread and the
main thread sending data to the backend). This speeds up data taking by a factor of four
compared to a single thread. So if one plans for waveform analysis in the frontend to
reduce the data, I would recommend a box with dual quad cores. |
02 Mar 2007, Kevin Lynch, Forum, event builder scalability
|
> Hi there:
> I have a question: is there anybody out there running MIDAS with an event builder
> that assembles events from more than just a few front ends (say on the order of
> 0x10 or more)?
> Any experiences with scalability?
>
> Cheers
> Piotr
Mulan (which you hopefully remember with great fondness :-) is currently running
around ten frontends, six of which produce data at any rate. If I'm remembering
correctly, the event builder handles about 30-40MB/s. You could probably ping Tim
Gorringe or his current postdoc Volodya Tishenko (tishenko@pa.uky.edu) if you want
more details. Volodya solved a significant number of throughput-related
bottlenecks in the year leading up to our 2006 run. |
03 Mar 2007, Piotr Zolnierczuk, Forum, event builder scalability
|
Hi all,
thank you for all responses.
It seems that there's no problem running MIDAS with an event builder assembling
data from ~10 front-ends. How about ~100? One possible solution is to have a
multi-tiered architecture.
The reason I am asking is that we are in the process of designing an Ethernet
based DAQ system with front-ends running on embedded computers (Linux/ARM
CPU/Xilinx FPGA) and MIDAS is one of my options as a DAQ framework.
I am open to advice/suggestions.
Thanks again
Piotr |
03 Mar 2007, Stefan Ritt, Forum, event builder scalability
|
> It seems that there's no problem running MIDAS with an event builder assembling
> data from ~10 front-ends. How about ~100? One possible solution is to have a
> multi-tiered architecture.
>
> The reason I am asking is that we are in the process of designing an Ethernet
> based DAQ system with front-ends running on embedded computers (Linux/ARM
> CPU/Xilinx FPGA) and MIDAS is one of my options as a DAQ framework.
> I am open to advice/suggestions.
The event builder is a standalone application, not part of the "midas core". It
receives data from N producers and combines the fragments into events based on
their serial numbers in a dedicated process. If it becomes a bottleneck, it
can simply be redesigned and optimized. I have recently had good experience with
multi-threaded applications running on multi-core CPUs. Implementing your
multi-tiered architecture as a multi-threaded event builder, where each of ten
threads receives data from ten front-ends, combines them and passes them to the
"collector thread" would make sense to me. Between the threads you can pass data
with many GB/sec, compared to an ethernet-based architecture. I have recently
implemented the rb_xxx functions inside midas.c, which let you pass data between
threads on a zero-copy basis.
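As a rough illustration of what I mean (the rb_xxx names and signatures below are
quoted from memory - please check midas.h of your revision for the exact prototypes
and return codes), a producer thread and a consumer thread could exchange data like this:
----------------- code (sketch) ------------------------
/* Sketch only: rb_xxx signatures quoted from memory, check midas.h.
   The "event" is just an int here; a real readout thread would write its
   data directly into the wp pointer (that is where the zero copy comes from). */
#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include "midas.h"

static int rbh;   /* ring buffer handle */

static void *producer(void *arg)
{
   for (int i = 0; i < 100; i++) {
      void *wp;
      if (rb_get_wp(rbh, &wp, 100) == DB_SUCCESS) { /* wait max 100 ms for space */
         memcpy(wp, &i, sizeof(i));
         rb_increment_wp(rbh, sizeof(i));           /* commit the "event" */
      }
   }
   return NULL;
}

static void *consumer(void *arg)
{
   for (int n = 0; n < 100; ) {
      void *rp;
      if (rb_get_rp(rbh, &rp, 100) == DB_SUCCESS) { /* wait max 100 ms for data */
         printf("got event %d\n", *(int *) rp);
         rb_increment_rp(rbh, sizeof(int));         /* release the slot */
         n++;
      }
   }
   return NULL;
}

int main(void)
{
   pthread_t tp, tc;
   rb_create(1024 * 1024, sizeof(int), &rbh);  /* 1 MB buffer, max event size */
   pthread_create(&tp, NULL, producer, NULL);
   pthread_create(&tc, NULL, consumer, NULL);
   pthread_join(tp, NULL);
   pthread_join(tc, NULL);
   rb_delete(rbh);
   return 0;
}
---------------- \code ---------------------------
A multi-threaded event builder would use one such ring buffer per receiving thread,
with the collector thread acting as the consumer of all of them.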
Inside the core functions of midas there are no limitations whatsoever. All
counters etc. are 32-bit, so you can run 2^32 data consumers etc. You will first
hit the OS process limit. What I'm more concerned about is your network bandwidth. If
you run 100 front-ends each producing more than 1MB/sec, you would hit the 1GBit limit
of your network card. If you add more network interfaces, you will hit the disk
I/O limit which is around 100-200MB/sec even on larger RAID1 disk arrays (unless
you do data compression during event building).
Another limit I see is the run transition. On each start/stop of a run, the
process which wants to start/stop the run has to contact all producers via a TCP
connection. Opening 100 TCP connections will take maybe 10-30 seconds, which is not
very convenient. A multi-threaded approach will help, but this is not (yet)
implemented, maybe you would have to do it yourself.
Another approach would be that you put the event building "in front of midas". All
your front-ends run a specific protocol outside of midas. They send their data to
a collecting process which acts as a single front-end to midas. So in the midas
framework you see only a single front-end, which gets its data not from hardware,
but from 100 other nodes. This way you can optimize the protocol between your
front-end nodes and the collector process for your application. Run transitions
can be done through multicast UDP messages for example, which will even work with
1000 front-ends. But you have to implement that yourself.
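For the multicast idea, the sender side of such a run transition would only need
something along these lines (plain sockets, not part of midas; the group address
239.0.0.1, the port 7070 and the message format are arbitrary examples, and each
front-end would join the group with an IP_ADD_MEMBERSHIP setsockopt):
----------------- code (sketch) ------------------------
/* Sketch of a multicast "run transition" sender - not part of midas.
   Group 239.0.0.1 and port 7070 are arbitrary; every front-end that has
   joined this group (setsockopt IP_ADD_MEMBERSHIP) receives the one datagram. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
   int sock = socket(AF_INET, SOCK_DGRAM, 0);
   struct sockaddr_in group;

   memset(&group, 0, sizeof(group));
   group.sin_family = AF_INET;
   group.sin_addr.s_addr = inet_addr("239.0.0.1");
   group.sin_port = htons(7070);

   /* one send() reaches 10, 100 or 1000 front-ends alike */
   const char msg[] = "START_RUN 1234";
   if (sendto(sock, msg, sizeof(msg), 0,
              (struct sockaddr *) &group, sizeof(group)) < 0)
      perror("sendto");

   close(sock);
   return 0;
}
---------------- \code ---------------------------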
I would start with the first approach: take the out-of-the-box midas and see how
far you get. If you have access to a normal Linux cluster, you can simply run ten
dummy front-ends on each of ten nodes, thus simulating 100 front-ends, and see how
far you get. If the event builder is the bottleneck, do an optimization or a
redesign. If the run transitions become your bottleneck, switch to the second method. In
both cases you can utilize the downstream part of midas, like the logger, the
history system, etc., so you would still gain a lot compared to a design from scratch.
Best regards,
Stefan |
10 Apr 2007, Dan Gastler, Forum, Interrupt code for VME?
|
Hello,
Is there any example code for using midas for interrupt-driven data
collection over VME? I am using a Struck SIS3100 PCI/VME setup to connect to my
VME crate. Thanks,
-Dan |
09 May 2007, Carl Metelko, Forum, Splitting data transfer and control onto different networks
|
Hi,
I'm setting up a system with two networks, with the intention of having
control info (odb, alarm) on the 192.168.0.x
and the frontend readout on 192.168.1.x
Is there any easy way of doing this?
I'm also trying to separate processes onto different machines; is there
any way to not have mserver, mhttpd and (mlogger, mevt) all run on the same
machine?
Thanks,
Carl Metelko |
09 May 2007, Stefan Ritt, Forum, Splitting data transfer and control onto different networks
|
Hi Carl,
so far I have not experienced any problems running the ODB & alarms on the same link as
the readout, since the data usually goes frontend->backend and most other messages go
from backend->frontend. So before you do something complicated, try it the easy way first
and check whether you have problems at all. So far I don't know of anybody who has
separated the network interfaces, so I have no description for that.
You can however separate processes. The easiest way is to buy a multi-core machine. If
you want to use separate computers, however, note that receiving events over the
network is not very optimized. So you should run the mserver connected to the frontend,
the event builder and mlogger on the same machine. mhttpd can easily live on
another machine, and there is not much CPU consumption from it (unless you
plot long history trends). Running the mserver, the event builder and mlogger on the
same machine (dual Xeon mainboard) easily gave me 50 MB/sec (actually disk
limited), and neither CPU was near 100%. If you put any receiving process (like
the event builder, mlogger or the analyzer) on a separate machine, you might see
a bottleneck on the event-receiving side of maybe 10MB/sec or so (never really
tried recently).
Best regards,
Stefan
> Hi,
> I'm setting up a system with two networks, with the intention of having
> control info (odb, alarm) on the 192.168.0.x
> and the frontend readout on 192.168.1.x
>
> Is there any easy way of doing this?
> I'm also trying to separate processes onto different machines; is there
> any way to not have mserver, mhttpd and (mlogger, mevt) all run on the same
> machine?
> Thanks,
> Carl Metelko |
09 May 2007, Konstantin Olchanski, Forum, Splitting data transfer and control onto different networks
|
> I'm setting up a system with two networks, with the intention of having
> control info (odb, alarm) on the 192.168.0.x
> and the frontend readout on 192.168.1.x
We have some experience with this at TRIUMF - we run the TWIST experiment with the main
data-generating frontends on a private network - it is a supported configuration and it works fine.
We ran into one problem after adding some code to the frontends for stopping the run upon detecting
some data errors - stopping runs requires sending RPC transactions to every midas client, so we had to
add static network routes for routing packets between midas nodes on the private network and midas
nodes on the normal network.
> I'm also trying to separate processes onto different machines; is there
> any way to not have mserver, mhttpd and (mlogger, mevt) all run on the same machine?
mserver runs on the machine with the ODB shared memory by definition (think of it as "nfs server").
mhttpd typically runs on the machine with the ODB shared memory and until recently it had no code for
connecting to the mserver. I recently fixed some of it, and now you can run mhttpd in "history mode"
through the mserver. This is useful for offloading the generation of history plots to another cpu or
another machine. In our case, we run the "history mhttpd" on the machine that holds the history files.
mlogger could be made to run remotely via the mserver, but presently it will refuse to do so, as it has
some code that requires direct access to midas shared memory. If data has to be written to a remote
filesystem, the consensus is that it is more efficient to run mserver locally and let the OS handle remote
filesystem access (NFS, etc).
All other midas programs should be able to run remotely via the mserver.
K.O. |
14 May 2007, Carl Metelko, Forum, Splitting data transfer and control onto different networks
|
Hi,
thanks for the advice. We do have dual-core Xeons, so we'll try running
most things on the server. Unless it proves to be a problem, we'll run all
MIDAS traffic on one network and NFS etc. on the other.
I do have one more query about running systems like Konstantin's.
What we would like to do is have a 'mirror' server serving multiple
online monitoring machines, so that the load on the server is constant no matter
what the demands on the mirror are.
Is there a way to set this up? Or would it be best to have a remote analyser
making short (1min) root files shared with the online monitoring? |
07 Jun 2007, Randolf Pohl, Forum, crash when analyzing multiple runs offline
|
Hello,
I am having a problem with the root-based analyzer. It crashes when I try to
analyze multiple runs OFFLINE using the "-i run%05d.mid -o result%05d.root -r
1 2" feature.
I can reproduce the problem with the example experiment which comes with the
MIDAS distribution:
Running the analyzer ONLINE works fine: One can start and stop runs one after
the other, roody shows the histograms being reset and then filled again and
such.
But OFFLINE, the analyzer crashes when trying to analyze the SECOND run in a
sequence. So
./analyzer -i run%05d.mid -o result%05d.root -r 1 1 works (only run 1)
./analyzer -i run%05d.mid -o result%05d.root -r 1 3 dies on run 2
Output attached (I added printf's to the "init"-modules, but that's irrelevant
here)
My own analyzer shows the same effect. There I got the impression the segfault
happens on the first attempt to Fill/Reset/SetName etc. a histogram in the 2nd
run. But with the midas example it looks like the analyzer finishes filling
histos even for run 2, but then dies in eor.
Can you reproduce the problem?
I run MIDAS on an Intel Quadcore, 64 bit SuSE Linux 10.2.
pohl@lamb2:~/midas/examples/root> gcc --version
gcc (GCC) 4.1.2 20061115 (prerelease) (SUSE Linux)
(maybe 4.1.2 "PRERELEASE" is the problem? See message ID 344)
I am using midas rev. 3674 (April 19, 2007), but I got the impression there
has since not been a change relevant to this problem. Please correct me if I
am wrong, then I would try it with Rev HEAD.
(My version includes already the fix to the x86_64 segfault problem of message
ID 337)
Best regards,
Randolf |
08 Jun 2007, Stefan Ritt, Forum, crash when analyzing multiple runs offline
|
Unfortunately I don't have time right now to debug the problem, but I could see
roughly what it could be. The analyzer crashes inside CloseRootOutputFile:
#5 <signal handler called>
#6 0x00002b5f52ad5ee5 in free () from /lib64/libc.so.6
#7 0x000000000040c89b in CloseRootOutputFile () at src/mana.c:1489
in the line
free(tree_struct.event_tree[i].branch);
If a "free" crashes, it might indicate that the memory beyond the allocated space
got corrupted. The branch gets allocated in book_ttree(), once for each
analyze_request[i]. The branch gets filled in write_event_ttree():
/* fill tree both online and offline */
if (!exclude_all)
et->tree->Fill();
Maybe one should put printf debugging statements in these places to see what's
going on. |
09 Jun 2007, Randolf Pohl, Forum, crash when analyzing multiple runs offline
|
Hello Stefan,
tree_struct.n_tree keeps counting up from run to run (in book_ttree). This should
presumably not be the case, since CloseRootOutputFile() frees the trees at eor().
------------------- output ---------------------------
lamb@lamb2:~/midas/root_3705> ./analyzer -e
exa_root -i /tmp/midas/examples/root/run%05d.mid -o /tmp/midas/run%05d.root -r 1 2
Root server listening on port 9090...
Running analyzer offline. Stop with "!"
book_ttree: tree_struct.n_tree = 1
book_ttree: tree_struct.n_tree = 2
Set run number 1 in ODB
Load ODB from run 1...OK
/tmp/midas/examples/root/run00001.mid:2722 /tmp/midas/run00001.root:2720 events,
0.21s
book_ttree: tree_struct.n_tree = 3 <<---- !!!!
book_ttree: tree_struct.n_tree = 4
Set run number 2 in ODB
Load ODB from run 2...OK
/tmp/midas/examples/root/run00002.mid:2347 /tmp/midas/run00002.root:2345 events,
0.18s
*** Break *** segmentation violation
----------------- \output ----------------------------
Adding this one line fixes the segfault problem for the root example expt.
----------------- code -------------------------
lamb@lamb2:/data/software/midas/midas_3705/src/src> svn diff mana.c
Index: mana.c
===================================================================
--- mana.c (revision 3705)
+++ mana.c (working copy)
@@ -1496,6 +1496,7 @@
/* delete event tree */
free(tree_struct.event_tree);
tree_struct.event_tree = NULL;
+ tree_struct.n_tree = 0;
// go to ROOT root directory
gROOT->cd();
---------------- \code ---------------------------
Please check if this gives the intended behaviour. I am not very familiar with the
midas internals.
Unfortunately my own analyzer's segfault problem is not solved by this patch. I
guess I have to keep searching for a bug on my side..... :-)
Cheers,
Randolf |
10 Jun 2007, Stefan Ritt, Forum, crash when analyzing multiple runs offline
|
> tree_struct.n_tree keeps counting up from run to run (in book_ttree). This should
> presumably not be the case, since CloseRootOutputFile() frees the trees at eor().
Yes, this is indeed a bug. I applied your change and committed the new code. |
11 Jun 2007, Randolf Pohl, Forum, crash when analyzing multiple runs offline
|
Hello again,
just for the record, in case somebody else runs into the same problem...
I have hunted down "my" segfault problem to the fact that I book histograms not
in <module>_init, but in <module>_bor. I have to do so, because only in bor do I
know which histograms to book, as this information comes from the ODB (booking
only histograms for CAMAC modules which were set to "read" in the ODB). The core
dump happens on the first access (->Fill, ->SetName,...) of one of these histos
in the 2nd run analyzed offline ("./analyzer -r n m").
In mana.c:bor (line 1854) it is stated that "all ROOT objects created by user module
bor() functions go to the output file", and then a gManaOutputFile->cd() is done.
Consequently, the histograms vanish after the file is closed, hence the
segfault when trying to access them in the 2nd run. (I keep track of existing
histograms, only booking the missing histos in bor.)
The problem goes away with "gROOT->cd()" in <module>_bor, before fiddling with
TFolders and booking the histogram.
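In code, the workaround looks roughly like this (module and histogram names are made
up, and the ODB lookup is only indicated; I assume the usual <module>_bor(INT run_number)
entry point of the analyzer modules):
----------------- code (sketch) ------------------------
/* Sketch of booking per-run histograms in bor() - names are made up.
   gROOT->cd() makes the histograms live in memory instead of being owned
   by the output file, so they survive CloseRootOutputFile() at eor(). */
#include <cstdio>
#include <TROOT.h>
#include <TH1F.h>
#include "midas.h"

static TH1F *hAdc[32];            /* one histogram per enabled CAMAC channel */

INT module_bor(INT run_number)
{
   gROOT->cd();                   /* book into memory, not into the output file */

   for (int ch = 0; ch < 32; ch++) {
      /* here one would check the ODB whether this channel is set to "read" */
      if (hAdc[ch] == NULL) {
         char name[32];
         sprintf(name, "adc%02d", ch);
         hAdc[ch] = new TH1F(name, name, 4096, 0, 4096);
      } else
         hAdc[ch]->Reset();       /* histogram already exists from a previous run */
   }
   return SUCCESS;
}
---------------- \code ---------------------------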
I do, however, not really understand the intention why histos booked in bor() go
only to the file, whereas histos booked in init() go to memory. Could you please
comment briefly? Maybe I missed the most important point. And what about online
mode - should this work?
Thanks a lot in advance,
Randolf |
|