Back Midas Rome Roody Rootana
  Midas DAQ System, Page 17 of 150  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
    Reply  26 Feb 2007, Stefan Ritt, Info, RFC- support for writing to removable hard disk storage 
In the MEG experiment, we simply installed 100TB of RAID disks and don't need to change anything Wink

But seriously, you are right that such a system might be beneficial. I propose to extend the current logger code to switch disks. In the current tr_start() funciton in mlogger, the code checks for "subdir_format" to create separate subdirectories like once per week. One could extend this code in the following way:

- Add an array of strings and name it "Path", such as

/dev/sda1/datadir/
/dev/sdb1/datadir/

- On each stop of the run, check if the current disk has enough space for one more run. Take either the "Byte limit" of that channel, or the actual size of the last run and multiply it by two or so. If the disk is "almost full", switch to the next array element in "Path". Append the file name, such as "/dev/sda1/datadir/run1234.mid" and put this into "Current filename" as a feedback for the user. Now write to the new disk/file.

- Add as string like "Execute on switch", which gets called after you switched to the next disk. This shell script can then handle the un-mounting of the full disk, notify the user etc. This is similar to the "/Programs/Execute on start run" in the ODB, but it gets only called if you switch the disk.
Entry  26 Feb 2007, Stefan Ritt, Info, Usage of event channel for improved throughput 
Starting from SVN revision 3642, sending events from the front-end has been revised.

Since long time ago, there is a special TCP socket established between any front-end and the mserver which can be used to bypass the midas RPC layer completely and purely send events. There was a #define USE_EVENT_CHANNEL but to my knowledge nobody used it.

While optimizing data throughput for the MEG experiment, I revisited this mechanism and got it finally working. Here are some benchmark tests made with the produce program on two dual-CPU machines running on Gigabit Ethernet:

Using normal RPC socket:

event size    speed [MB/sec] CPU usage front-end  CPU usage server
==================================================================
    40          3            22                   100                
  1000         44            25                   100
100000        101            14                    50

Using new event socket:

event size    speed [MB/sec] CPU usage front-end  CPU usage server
==================================================================
    40         12            100                   34                
  1000         99            58                    59
100000        101            14                    43

As can be seen, the CPU load on the server drops significantly for smaller events since the processing time per event is reduced. If the transfer was limited by the server, the throughput goes up significantly. For large events the bottleneck on the server side is the memcpy of events, so no big improvement is visible. The saved CPU time however can be used to analyze more events for example.

The event socket is now enabled by default in the front-end by setting
rpc_mode = 1

in mfe.c and should be checked carefully in various experiments. There is a small chance that events get stuck in the buffer cache on the server side at the end of the run, in which case they would show up as the first events of the next run. I know that this problem happened in some experiment before, but that must have been unrelated to the rpc_mode. So please check again and report any problem with the new rpc_mode.
Entry  26 Feb 2007, Stefan Ritt, Info, Fragmented polled events 
Fragmented polled events have been implemented in SVN revision 3625.
Fragmentation is a method of breaking down large (>MB) events into smaller
pieces and send them through the shared memory buffers, reassembling them at the
output. In the past this was only possible for periodic events (such as large
histograms read out once every few seconds), but now this is also possible for
polled events.
    Reply  27 Feb 2007, Stefan Ritt, Forum, event builder scalability 
> Hi there:
> I have a question if there's anybody out there running MIDAS with event builder
> that assembles events from more that just a few front ends (say on the order of
> 0x10 or more)?
> Any experiences with scalability?

At the MEG experiment at PSI we run with 5 front-ends (later 8), each running at
about 10 MB/sec. This gives an overall rate of 50MB/sec without any problem. The
CPU load on the backend (2.6 GHz dual Xenon) is 30% for the event builder and 26%
for the logger. The DANCE experiment at Los Alamos runs 17 front-ends if I'm not
mistaken (John?).
    Reply  27 Feb 2007, Stefan Ritt, Forum, event builder scalability 
> Our bottle neck is (a) compactPCI backplane reading data from waveform digitizers
> to the frontend CPUs and (b) CPU power on the frontend CPUs to analyzer the waveforms.

I forgot to mention that our front-ends at MEG are 2.8 GHz dual Xenon with Hyperthreading.
This gives "virtual" 4 CPU cores which are really necessary for waveform calibration and
analysis. It makes use of the new multi-threading feature in the midas front-end. I run
actually 7 threads (one VME readout, 4 calibration threads, one encoding thread and the
main thread sending data to the backend. This speeds up data taking by a factor of four
compared to a single thread. So if one plans for waveform analysis in the frontend to
reduce the data, I would recommend a box with dual quad cores.
    Reply  03 Mar 2007, Stefan Ritt, Forum, event builder scalability 
> It seems that there's no problem running MIDAS with event builder assembling
> data from ~10 front-ends. How about ~100? One possible solution is to have a
> multi-tiered architecture. 
> 
> The reason I am asking is that we are in the process of designing an Ethernet
> based DAQ system with front-ends running on embedded computers (Linux/ARM
> CPU/Xilinix FPGA) and MIDAS is one of my options as a DAQ framework.
> I am open for advice/suggestions.

The event builder is a standalone application not part of the "midas core". It
receives data from N producers and combines the fragments into events based on
their serial number as a dedicated process. If it would become a bottleneck, it
can simply be redesigned and optimized. I made currently good experience with
multi-threaded applications running on multi-core CPUs. Implementing your
multi-tiered architecture as a multi-threaded event builder, where each of ten
threads receives data from ten front-ends, combines them and passes them to the
"collector thread" would make sense to me. Between the threads you can pass data
with many GB/sec, as compared to an ethernet-based architecture. I currently
implemented the rb_xxx functions inside midas.c which lets you pass data between
threads on a zero-copy basis.

Inside the core functions of midas there is no limitations whatsoever. All
counters etc. are 32-bit, so you can run 2^32 data consumers etc. You will first
hit the OS process limit. What I'm more concerned is your network bandwidth. If
you run 100 front-ends each with more than 1MB/sec, you would hit the 1GBit limit
of your network card. If you put more network interfaces, you will hit the disk
I/O limit which is around 100-200MB/sec even on larger RAID1 disk arrays (unless
you do data compression during event building). 

Another limit I see is the run transition. On each start/stop of a run, the
process which wants to start/stop the run has to contact all producers via a TCP
connection. Opening 100 TCP connection will take maybe 10-30 seconds, which is not
very convenient. A multi-threaded approach will help, but this is not (yet)
implemented, maybe you would have to do it yourself.

Another approach would be that you put the event building "in front of midas". All
your front-ends run a specific protocol outside of midas. They send their data to
a collecting process which acts as a single front-end to midas. So in the midas
framework you see only a single front-end, which gets it's data not from hardware,
but from 100 other nodes. This way you can optimize the protocol between your
front-end nodes and the collector process for your application. Run transitions
can be done through multicast UDP messages for example, which will even work with
1000 front-ends. But you have to implement that yourself.

I would start with the first approach: Taking the out-of-the box midas, see how
far I get. If you have access to a normal linux cluster, you can simply run ten
dummy front-ends on each of ten nodes, thus simulating 100 front-ends and see how
far you get. If the event builder is the bottle neck, do an optimization or
redesign. If the run transitions become your bottle neck, switch to method two. In
both ways you can utilize the downstream part of midas, like the logger, the
history system, etc. so you would still gain a lot compared to a design from scratch.

Best regards,

  Stefan
    Reply  15 Mar 2007, Stefan Ritt, Info, mhdump: a standalone MIDAS history dump utility 
> I hope people find this program useful. If you have any feedback (patches, bug
> reports, requests for improvements), please post them as replies to this forum
> message.

I wouldn't mind putting this into the midas distribution. Put it under utils/, add
an entry to the Makefile, and fix that warning:


mhdump.cxx: In function `int readHstFile(FILE*)':
mhdump.cxx:161: warning: comparison between signed and unsigned integer expressions
Entry  03 Apr 2007, Stefan Ritt, Info, Switch to Visual C++ 2005 under Windows 
I had to switch to Visual C++ 2005 under Windows. This required the upgrade of
all project files under \midas\nt\ and fixing a few warnings, since the new
compiler is more picky. 

Note that in order to use most C RTL funcitons, you have to define two
preprocessor statements:

#define _CRT_SECURE_NO_DEPRECATE
#define _CRT_NONSTDC_NO_DEPRECATE 

either at the beginning of a file (before you include stdio.h), or via the
project property page under C/C++ / Preprocessor / Preprocessor Definitions,
where you also have the WIN32 and the _CONSOLE definitions. I adapted all
project files in the distribution, but for all local projects this has to be
done additionally.
    Reply  03 Apr 2007, Stefan Ritt, Bug Fix, SIGABT of "mlogger" and possible fix 

Exaos Lee wrote:
Version: svn 3658
Code: mlogger.c
Problem: After executation of "mlogger", a "SIGABT" appears.
Compiler: GCC 4.1.2, under Ubuntu Linux 7.04 AMD64
Possible fix:
Change the code in "mlogger.c" from
   /* append argument "-b" for batch mode without graphics */
   rargv[rargc] = (char *) malloc(3);
   rargv[rargc++] = "-b";

   TApplication theApp("mlogger", &rargc, rargv);

   /* free argument memory */
   free(rargv[0]);
   free(rargv[1]);
   free(rargv);
to
   /* append argument "-b" for batch mode without graphics */
   rargv[rargc] = (char *) malloc(3);
   rargv[rargc++] = "-b";

   TApplication theApp("mlogger", &rargc, rargv);

   /* free argument memory */
   free(rargv[0]);
   /*free(rargv[1]);*/
   free(rargv);

I think, it might be the problem of 'rargv[rargc++]="-b"'.


Actually the line
rargv[rargc] = (char *) malloc(3);

needs also to be removed, since rargv[1] points to "-b" which is some static memory and does not need any allocation. I committed the change.
    Reply  09 May 2007, Stefan Ritt, Forum, Splitting data transfer and control onto different networks 
Hi Carl,

so far I did not experience any problems of running odb&alarm on the same link as
the readout, since the data goes usually frontend->backend, and all other messages
from backend->frontend. So before you do something complicated, try it first the
easy way and check if you have problems at all. So far I don't know anybody who
did separate the network interfaces so I have not description for that.

You can however separate processes. The easiest is to buy a multi-core machine. If
you want to use however separate computers, note that receiving events over the
network is not very optimized. So you should run mserver connected to the frontend
, the event builder and mlogger on the same machine. mhttpd can easily live on
another machine, but there is not much CPU consumption from that (unless you don't
plot long history trends). Running mserver, the event builder and mlogger on the
same machine (dual Xenon mainboard) gave me easily 50 MB/sec (actually disk
limited), and not both CPUs were near 100%. If you put any receiving process (like
the event builder or mlogger or the analyzer) on a separate machine, you might see
a bottlened on the event receiving side of maybe 10MB/sec or so (never really
tried recently).

Best regards,

  Stefan

> Hi,
>    I'm setting up a system with two networks with the intension of having
> control info (odb, alarm) on the 192.168.0.x
> and the frontend readout on 192.168.1.x
> 
> Is there any easy way of doing this?
> I'm also trying to separate processes onto different machines, is there
> any way to not have mserver,mhttpd and (mlogger,mevt) all run on the same
> machine?
> Thanks,
>        Carl Metelko
    Reply  22 May 2007, Stefan Ritt, Bug Report, analyzer_init called by odb_load 
The reason to call analyzer_init in odb_load is the following:

Assume you run the analyzer offline, analyzing many files in series. Then assume
that you have /Experiment/Run Parameters, which is actively used by the analyzer
(like beam settings etc.). In this case you do a db_open_record() to map
/Experiment/Run Parameters to the exp_param C structure. For this mapping to work,
the ODB structure and the C structure have to be exactly the same. Now assume that
you changed your run parameters over time, like you added some comment later. Now
you want to analyzer several runs, some before and some after the modification.
Both sets have a different structure in /Experiment/Run Parameters, which is a
problem, since the compiled analyzer can only have a single C structure. My "poor"
solution was to call analyzer_init after each loading of the ODB from the *.mid
file. The db_create_record() call matches the C structure to the ODB structure by
modifying the ODB structure if necessary. So if you added one parameter later, this
(modified) structure gets loaded by odb_load, but then it gets adjusted in
analyzer_init().

I understand now that this case might not happen so often, and you are more
bothered by the fact that analyzer_init gets called several time. There must
however be a hook for offline analysis that the user code can correct the ODB
structure. So I propose to add a flag to analyzer_init, such as

INT analyzer_init(BOOL bFirst)
{
}

If bFirst equals TRUE, the function got called from mana_init(), if FALSE, it got
called from odb_load. Then you can put code like

INT analyzer_init(BOOL bFirst)
{
   if (bFirst) {
      p = malloc()
      ...
   }
}

If you agree, I will modify the code and commit the change.

- Stefan
    Reply  22 May 2007, Stefan Ritt, Bug Report, analyzer_init called by odb_load 
> Thanks for the quick reply, Stefan.
> 
> Please don't change anything in the code unless you find it really important.
I guess 
> changing the analyzer_init prototype will break a lot of code out there?
> 
> In fact, I think I do understand this behavior now.
> And even without your suggested fix there is a simple workaround: I add a static 
> variable to my analyzer_init.cxx file, and do something similar to your bFirst
fix.
> 
> In conclusion, commit your fix if it does not harm others. Postpone this
commit to a 
> future new version of midas which breaks a lot of things anyway...
> 
> A last question, for me to understand: Why not call db_open_record in 
> ana_begin_of_run then?

I fully agree with you that db_open_record would better go into ana_begin_of_run
(and
analyzer_init not being called in odb_load), and I fully agree with you that
changing the
code would break many experiments. ;-)

So I guess we leave it as it is right now as you suggested.
    Reply  08 Jun 2007, Stefan Ritt, Suggestion, RFC- ACLs for midas rpc, mserver, mhttpd access 
First I have a general question: mserver is started through xinetd, and xinetd has
the options "only_from" and "no_access". This is equivalent to the tcp_wrapper
functionality. Why not using this? It's possible without changing anything in midas.
Or am I missing anything?

If that does not work for some reason, here are some thought from my side:

- We don't have much of a problem with malicious hackers, but with institute-wide
security checking. Hackers are only interested in mechanisms where they can obtain
control over thousands of machines (like breaking ssh etc.). The few midas machines
are not a good target for them. But even at PSI there are security scans, which try
to connect to various ports and can crash systems, so I agree that something needs
to be done.

- Whatever we do, it should be consistent on linux and windows and should not rely
on external packages, since I don't want to get into dependencies there.

- I see that both having the security information in the ODB or having them in
external files can be advantageous. There is certainly the aspect of restoring old
ODBs, or keeping several experiments (ODB) on one machine consistent. On the other
hand storing data in the ODB might me liked by people who are familiar with this
concept, and want to change things though mhttpd for example.

- Having said all that, it would make sense to me to write a simple central routine
access_allowed(), which takes the IP address of a remote client wanting to connect,
and return true or false. This routine should read /etc/hosts.allow, /etc/hosts.deny
and interprete it, but only the section for midas, and maybe only a subset of the
functionality there (we probably don't need NIS netgroup names, external files and
spawn commands there). If the files /etc/hosts.x do not contain anything about midas
or are not preset (Windows!), the routine should look in the ODB under
/experiment/security/mserver/hosts.allow and /experiment/security/mserver/hosts.deny
and use that information instead of the files.

- We probably need different mechanisms for mserver and for mhttpd. The mserver
clients are usually only a few programs like the front-ends, while one may want to
control an experiment over mhttpd from much more machines. So we should establish a
second ACL for mhttpd. The already present "/experiment/security/allowed hosts" for
mhttpd should be converted into "/Experiment/Security/mhttpd/hosts.allow" and the
function access_allowed() should be used to interprete that, so that we only need to
write it once.
    Reply  08 Jun 2007, Stefan Ritt, Forum, crash when analyzing multiple runs offline 
Unfortunately I don't have time right now to debug the problem, but I could see
roughly what it could be. The analyzer crashes inside CloseRootOutputFile:

#5  <signal handler called>
#6  0x00002b5f52ad5ee5 in free () from /lib64/libc.so.6
#7  0x000000000040c89b in CloseRootOutputFile () at src/mana.c:1489

in the line 

    free(tree_struct.event_tree[i].branch);

If a "free" crashes, it might indicate that the memory beyond the allocated space
got corrupted. The branch gets allocated in book_ttree(), once for each
analyze_request[i]. The branch gets filled in write_event_ttree():

      /* fill tree both online and offline */
      if (!exclude_all)
         et->tree->Fill();

Maybe one should put printf debugging statements in these places to see what's
going on.
    Reply  10 Jun 2007, Stefan Ritt, Forum, crash when analyzing multiple runs offline 
> tree_struct.n_tree keeps counting up from run to run (in book_ttree). This should 
> presumably not be the case, since CloseRootOutputFile() frees the trees at eor().

Yes this indeed a bug. I applied your change and committed the new code.
    Reply  11 Jun 2007, Stefan Ritt, Forum, crash when analyzing multiple runs offline 
> I have hunted down "my" segfault problem to the fact that I book histograms not 
> in <module>_init, but in <module>_bor. I have to do so, because only in bor do I 
> know which histograms to book, as this information comes from the ODB (booking 
> only histograms for CAMAC modules which were set to "read" in the ODB). The core 
> dump happens on the first access (->Fill, ->SetName,...) of one of these histos 
> in the 2nd run analyzed offline ("./analyzer -r n m").
> 
> In mana.c:bor (line 1854) is stated that "all ROOT objects created by user module 
> bor() functions go to the output file", and then does a gManaOutputFile->cd();
> Consequently, the histograms vanish after the file is closed, therefore the 
> segfault when trying to access them in the 2nd run. (I keep track of existing 
> histograms, only booking the missing histos in bor.)
> 
> The problem goes away with "gROOT->cd()" in <module>_bor, before fiddling with 
> TFolders and booking the histogram.

ROOT has the strange concept of "current working directory", coming from the fact that
ROOT was written by Fortran and PAW people, being used to have directories and
subdirectories with a persistent state (not really object-oriented style). So one can
set the "current working directory" to the root (=memory) with gROOT->cd() and to a
subdirectory which will later be written into a file with gManaOutputFile->cd(). If
you do the first one, the histograms are created only in memory, while in the later
case they are also created in memory, but will later be written into the output file
in the routine CloseRootOutputFile(). So if you do a gROOT->cd() in <module>_bor,
these histograms will not be written to file. So I guess your solution is not a real
solution.

> I do, however, not really understand the intention why histos booked in bor() go 
> to only the file, whereas histos booked in init() go to memory. Could you please 
> comment briefly? Maybe I missed the most important point. And what about online 
> mode, should this work?

The root output file is opened in bor() and closed in eor(). For a histo to go to the
file, it must be booked after opening the file, that is after bor() in mana.c and
therefore after the gManaOutputFile->cd().

I agree with you that the current scheme is not satisfactory. When running online, you
want to keep the histos between the runs. When running offline, you delete and
re-create them for each run. It would be better to create all histos online and
offline under gROOT, and just copy them to gManaOutputFile before writing them. I have
to admit that this root code was never really used in a productive environment for
offline analysis, so there might be some issues here and there. Some people write
directly root files in the logger, and then do a root-only (without the midas
analyzer) analysis. Unfortunately I'm busy these days and cannot write any code right
now. But if you feel like something should be modified in mana.c, please send it to me
and I can incorporate it into the standard code.
    Reply  02 Jul 2007, Stefan Ritt, Bug Fix, mscb, musbstd fixed on Linux, MacOS 

KO wrote:
There supposed to be no changes to the Windows code, but I cannot test on Windows, so if somebody does and finds breakage, please let me know.


I can confirm that revision 3713 still works under Windows.
    Reply  13 Jul 2007, Stefan Ritt, Forum, Midas on a x86_64 - incompatible with x86_32 
> The biggest problem here is that making 32-bit ODB and 64-bit ODB compatible requires breaking one or
> the other (My proposed changes break the 64-bit version. Alternatively, one could add explicit padding
> to these data structures and break the 32-bit ODB).
> 
> I think it is important to make 32-bit and 64-bit code compatible: at TRIUMF we have to use a mixed
> environment because out latest host computers all run 64-bit Linux while all our VME processors and all
> older machines can only run 32-bit code; this incompatibility causes us weekly headaches.
> 
> Any thoughts?

I agree to make 32-bit and 64-bit compatible. In the long run, everything will be 64-bit, so I would suggest
in breaking the 32-bit ODB, add some padding there where needed, probably with some conditional compiling.
This ensures to keep the native 64-bit packing, which probably will be somehow optimized for 64-bit
architectures and therefore might be a bit faster in the long run, when most systems are 64-bit. After this
has been implemented and well tested, I would go with an official announcement of the 32-bit break in the ODB,
and release a new version, so people can update from a TAR file if necessary. Existing ODB's can be converted
to the new format by exporting them in XML form and importing them again after the upgrade.
Entry  26 Jul 2007, Stefan Ritt, Info, Change of pointer type in mvmestd.h 
I had to change the pointer type of mvme_read and mvme_write to (void *) instead
to (mvme_locaddr_t *) to avoid warnings under 64-bit linux. Please adjust your
VME drivers if necessary.

- Stefan
    Reply  03 Sep 2007, Stefan Ritt, Bug Report, how to handle end of run? 
> I am having problems with handling the end-of-run situation in my midas
> frontend. I have a device that continuously sends data (over USB) and I read
> this data in my "read_event" function.
> 
> Everything is good until the end-of-run, at which time this happens:
> 0) mfe.c calls my read_event() to read the data (loop until the end-of-run
> transition)
> 1) mfe.c calls my end_of_run()
> 2) here, I tell the device "please stop sending data"
> 3) all seems good, but wait!!!
> 4) there is all this data generated between step 0 and step 2 still sitting
> inside the device and it has nowhere to go: the run is ended, the output file is
> closed, my read_event() will never be called ever again (well, until the next run).
> 
> It seems to me mfe.c needs to have one more function, something like
> "pre_end_of_run()" that works like this:
> 0) mfe.c calls my read_event() to read the data (loop until the end-of-run
> transition)
> 1) mfe.c calls pre_end_of_run(), here I tell the device to stop sending data
> 2) mfe.c calls read_event() for the very last time, to give me the opportunity
> to read and send away any data I still may have.
> 3) mfe.c calls the end_of_run(). The run is truely finished.
> 
> Any thoughts?

You can achieve the desired functionality without changing mfe.c:

0) mfe.c calls read_event
1) mfe.c calls end_of_run. Your end_of_run tells the device to stop data and flushes
the remaining data. At this point you have to re-make actually a part of the mfe.c
functionality, but basically you need a bm_compose_event() and a bm_send_event(), so
just a few lines of code. If you want to have the final event number right in your
equipment, you also need to update eq->events_sent accordingly. 

Given the fact that 99% of the experiments do not need this functionality, I propose
that we keep mfe.c and you add the few lines of code into your user part of the
specific frontend.

Stefan
ELOG V3.1.4-2e1708b5