Midas DAQ System
Entry  24 Jan 2013, Konstantin Olchanski, Info, Compression benchmarks 
In the DEAP experiment, the normal MIDAS mlogger gzip compression is not fast enough for some data 
taking modes, so I am doing tests of other compression programs. Here are the results.

Executive summary:

Fastest compression is no compression (cat at 1800 Mbytes/sec - memcpy speed), next best are:
"lzf" at 300 Mbytes/sec and "lzop" at 250 Mbytes/sec with 50% compression
"gzip -1" at around 70 Mbytes/sec with around 70% compression
"bzip2" at around 12 Mbytes/sec with around 80% compression
"pbzip2", as advertised, scales bzip2 compression linearly with the number of CPUs up to 46 Mbytes/sec (4 
real CPUs), then more slowly up to a maximum of 60 Mbytes/sec (8 hyper-threaded CPUs).

This confirms that our original choice of "gzip -1" method for compression using zlib inside mlogger is 
still a good choice. bzip2 can gain an additional 10% compression at the cost of 6 times more CPU 
utilization. lzo/lzf can do 50% compression at GigE network speed and at "normal" disk speed.

I think these numbers make a good case for adding lzo/lzf compression to mlogger.

Comments about the data:

- time measured is the "elapsed" time of the compression program. it excludes the time spent flushing 
the compressed output file to disk.
- the relevant number is the first rate number (input data rate)
- test machine has 32GB of RAM, so all I/O is cached, disk speed does not affect these results
- "cat" gives a measure of overall machine "speed" (but test file is too small to give precise measurement)
- "gzip -1" is the recommended MIDAS mlogger compression setting
- "pbzip2 -p8" uses 8 "hyper-threaded" CPUs, but machine only has 4 "real" CPU cores

<pre>
cat                 : time   0.2s, size    431379371    431379371, comp   0%, rate 1797M/s 1797M/s
cat                 : time   0.6s, size   1013573981   1013573981, comp   0%, rate 1809M/s 1809M/s
cat                 : time   1.1s, size   2027241617   2027241617, comp   0%, rate 1826M/s 1826M/s

gzip -1             : time   6.4s, size    431379371    141008293, comp  67%, rate  67M/s  22M/s
gzip                : time  30.3s, size    431379371    131017324, comp  70%, rate  14M/s   4M/s
gzip -9             : time  94.2s, size    431379371    133071189, comp  69%, rate   4M/s   1M/s

gzip -1             : time  15.2s, size   1013573981    347820209, comp  66%, rate  66M/s  22M/s
gzip -1             : time  29.4s, size   2027241617    638495283, comp  69%, rate  68M/s  21M/s

bzip2 -1            : time  34.4s, size    431379371     91905771, comp  79%, rate  12M/s   2M/s
bzip2               : time  33.9s, size    431379371     86144682, comp  80%, rate  12M/s   2M/s
bzip2 -9            : time  34.2s, size    431379371     86144682, comp  80%, rate  12M/s   2M/s

pbzip2 -p1          : time  34.9s, size    431379371     86152857, comp  80%, rate  12M/s   2M/s (1 CPU)
pbzip2 -p1 -1       : time  34.6s, size    431379371     91935441, comp  79%, rate  12M/s   2M/s
pbzip2 -p1 -9       : time  34.8s, size    431379371     86152857, comp  80%, rate  12M/s   2M/s

pbzip2 -p2          : time  17.6s, size    431379371     86152857, comp  80%, rate  24M/s   4M/s (2 CPU)
pbzip2 -p3          : time  11.9s, size    431379371     86152857, comp  80%, rate  36M/s   7M/s (3 CPU)
pbzip2 -p4          : time   9.3s, size    431379371     86152857, comp  80%, rate  46M/s   9M/s (4 CPU)
pbzip2 -p4          : time  45.3s, size   2027241617    384406870, comp  81%, rate  44M/s   8M/s
pbzip2 -p8          : time  33.3s, size   2027241617    384406870, comp  81%, rate  60M/s  11M/s

lzop -1             : time   1.6s, size    431379371    213416336, comp  51%, rate 261M/s 129M/s
lzop                : time   1.7s, size    431379371    213328371, comp  51%, rate 249M/s 123M/s
lzop                : time   4.3s, size   1013573981    515317099, comp  49%, rate 234M/s 119M/s
lzop                : time   7.3s, size   2027241617    978374154, comp  52%, rate 277M/s 133M/s
lzop -9             : time 176.6s, size    431379371    157985635, comp  63%, rate   2M/s   0M/s

lzf                 : time   1.4s, size    431379371    210789363, comp  51%, rate 299M/s 146M/s
lzf                 : time   3.6s, size   1013573981    523007102, comp  48%, rate 282M/s 145M/s
lzf                 : time   6.7s, size   2027241617    972953255, comp  52%, rate 303M/s 145M/s

lzma -0             : time  27s, size    431379371    112406964, comp  74%, rate  15M/s   4M/s
lzma -1             : time  35s, size    431379371    111235594, comp  74%, rate  12M/s   3M/s
lzma: > 5 min, killed

xz -0               : time  28s, size    431379371    112424452, comp  74%, rate  15M/s   4M/s
xz -1               : time  35s, size    431379371    111252916, comp  74%, rate  12M/s   3M/s
xz: > 5 min, killed
</pre>

Columns are:
compression program
time: elapsed time of the compression program (excludes the time to flush output file to disk)
size: size of input file, size of output file
comp: compression ratio (0%=no compression, 100%=file compresses into nothing)
rate: input data rate (size of input file divided by elapsed time), output data rate (size of output file 
divided by elapsed time)
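
As a worked check of the column definitions above, here is a small Python sketch (not part of the original Perl script; the function name summarize is made up for illustration). It recomputes the comp and rate columns from the two file sizes and the elapsed time of the first "gzip -1" row. The table prints the time rounded to 0.1s, so rates recomputed this way can differ from the printed ones in the last digit.

```python
def summarize(size_in, size_out, elapsed_s):
    """Return (comp in %, input rate, output rate); rates in 1e6 bytes/sec."""
    comp = 100.0 * (size_in - size_out) / size_in
    rate_in = size_in / elapsed_s / 1e6
    rate_out = size_out / elapsed_s / 1e6
    return comp, rate_in, rate_out


if __name__ == "__main__":
    # First "gzip -1" row: 431379371 bytes in, 141008293 bytes out, 6.4 s.
    comp, rate_in, rate_out = summarize(431379371, 141008293, 6.4)
    # Same format as the Perl script: %d truncates the rates.
    print("comp %3.0f%%, rate %3dM/s %3dM/s" % (comp, rate_in, rate_out))
```

For this row the sketch gives 67%, 67M/s and 22M/s, in agreement with the table.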

Machine used for testing (from /proc/cpuinfo):
Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz
quad core cpu with hyper-threading (8 CPU total)
32 GB quad-channel DDR3-1600.

Script used for testing:

#!/usr/bin/perl -w

my $x = join(" ", @ARGV);

my $in  = "test.mid";
my $out = "test.mid.out";
my $tout = "test.time";

my $cmd = "/usr/bin/time -o $tout -f \"%e\" /usr/bin/time $x < $in > $out";

print $cmd,"\n";

my $t0 = time();
system $cmd;
my $t1 = time();

my $c = `cat $tout`;
print "Elapsed time: $c";

my $t = $c;

#system "/bin/ls -l $in $out";

my $sin  = -s $in;
my $sout = -s $out;

my $xt = $t1-$t0;
$xt = 1 if $xt<1;

print "Total time: $xt\n";

print sprintf("%-20s: time %5.1fs, size %12d %12d, comp %3.0f%%, rate %3dM/s %3dM/s", $x, $t, $sin, 
$sout, 100*($sin-$sout)/$sin, ($sin/$t)/1e6, ($sout/$t)/1e6), "\n";

exit 0;
# end

Typical output:

[deap@deap00 pet]$ ./r.perl lzf    
/usr/bin/time -o test.time -f "%e" /usr/bin/time lzf < test.mid > test.mid.out
1.27user 0.15system 0:01.44elapsed 99%CPU (0avgtext+0avgdata 2800maxresident)k
0inputs+411704outputs (0major+268minor)pagefaults 0swaps
Elapsed time: 1.44
Total time: 3
lzf                 : time   1.4s, size    431379371    210789363, comp  51%, rate 299M/s 146M/s

K.O.
    Reply  06 Feb 2013, Stefan Ritt, Info, Compression benchmarks 
I redid the tests from Konstantin for our MEG experiment at PSI. The event structure is different, so it
is interesting how the two different experiments compare. We have an event size of 2.4 MB and a trigger
rate of ~10 Hz, so we produce a raw data rate of 24 MB/sec. A typical run contains 2000 events, so has a 
size of 5 GB. Here are the results:


cat                 : time   7.8s, size   4960156030   4960156030, comp   0%, rate 639M/s 639M/s

gzip -1             : time 147.2s, size   4960156030   2468073901, comp  50%, rate  33M/s  16M/s

pbzip2 -p1          : time 679.6s, size   4960156030   1738127829, comp  65%, rate   7M/s   2M/s (1 CPU)
pbzip2 -p8          : time  96.1s, size   4960156030   1738127829, comp  65%, rate  51M/s  18M/s (8 CPU)


As one can see, our compression ratio is poorer (due to the quasi-random noise in our waveforms), but the
difference between gzip -1 and pbzip2 is larger (15% instead of 10% for DEAP). The single-CPU version of
pbzip2 cannot sustain our DAQ rate of 24 MB/sec, but the parallel version can. Actually we have a somewhat old
dual-core dual-CPU 2.5 GHz Xeon box, and make 8 hyper-threading CPUs out of the total 4 cores.
Interestingly the compression rate scales by a factor of 7.3 for 8 virtual cores, so hyper-threading does its job.
So we take all our data with pbzip2 compression. The additional 15% as compared with gzip does 
not sound like much, but we produce 250 TB/year of raw data. So gzip gives us 132 TB/year and pbzip2 gives 
us 98 TB/year, and we save quite some disks.
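
The scaling figure can be checked against the table above. A minimal Python sketch (the helper names are hypothetical): from the elapsed times, 679.6 s / 96.1 s gives a speedup of about 7.07, while the rounded rates (51 / 7) give about 7.3, which may be where the quoted factor comes from.

```python
def speedup(t_one_cpu, t_n_cpu):
    """Ratio of elapsed times: how much faster the parallel run is."""
    return t_one_cpu / t_n_cpu


def efficiency(t_one_cpu, t_n_cpu, n_cpu):
    """Speedup divided by the number of (virtual) CPUs used."""
    return speedup(t_one_cpu, t_n_cpu) / n_cpu


if __name__ == "__main__":
    # pbzip2 -p1 took 679.6 s, pbzip2 -p8 took 96.1 s on the same file.
    print("speedup from elapsed times: %.2f" % speedup(679.6, 96.1))
    print("speedup from rounded rates: %.2f" % (51 / 7))
    print("efficiency per virtual CPU: %.2f" % efficiency(679.6, 96.1, 8))
```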

Note that you can run bzip2 (as all the other methods) already now with the current logger, if you specify
an external compression program in the ODB using the pipe functionality:


<pre>
[local:MEG:S]/>cd Logger/Channels/0/Settings/
[local:MEG:S]Settings>ls
Active                          y
Type                            Disk
Filename                        |pbzip2>/megdata/run%06d.mid.bz2
Format                          MIDAS
Compression                     0
ODB dump                        y
Log messages                    0
Buffer                          SYSTEM
Event ID                        -1
Trigger mask                    -1
Event limit                     0
Byte limit                      0
Subrun Byte limit               0
Tape capacity                   0
Subdir format                   
Current filename                /megdata/run197090.mid.bz2
</pre>
Entry  28 Jan 2013, Robert Pattie, Forum, analyzer cannot connect to the statistics database 
I've managed to put the analyzer into a state where it cannot connect to the 
statistics database.  The error message suggests another analyzer is connected.  
I've recompiled MIDAS and the user code, restarted the computer etc., and the 
analyzer still cannot connect.  If I run "odbedit -c clean", I can start the analyzer, 
but get the same error when exiting or starting a run.  I've commented out all the
user code in analyzer.c and its associated analyzer modules, and the read event
code in the frontend, and nothing resolves this issue.  Any suggestions?

The output from attempting to run the analyzer is:

Connect to experiment nnbarxwnr...[odb.c:1013:db_open_database,ERROR] Removed ODB
client 'Analyzer', index 0 because process pid 31982 does not exists
Deleted entry '/System/Clients/31982' for client 'Analyzer' because it is not
connected to ODB
OK
Root server listening on port 9090...
Loading previous online histos from ./data/last.root
ss_mutex_wait_for: pthread_mutex_lock() returned errno 22 (Invalid argument),
aborting...


When attempting to clean up the Analyzer tree in the ODB I receive the message:
"deletion of key not allowed."

It appears that running the analyzer sets the permissions of the Statistics tree of
my analyzer module into RWDE.  

Adding the following lines to my start up script eliminates the above problem:
odbedit -c clean
odbedit -c "chmod 7 Analyzer/"
odbedit -c "rm /Analyzer/fADCs/Statistics"

Now when starting a run the analyzer crashes with this error:
analyzer: src/midas.c:11443: rpc_execute: Assertion `return_buffer' failed.
Aborted (core dumped)

and the messages in the odb are :

[system.c:4295:recv_tcp,ERROR] header: recv returned 0, n_received = 0, unexpected
connection closure
[midas.c:10042:rpc_client_call,ERROR] recv_tcp() failed, routine = "rc_transition",
host = "LANL-FADC-DAQ"
[midas.c:4130:cm_transition,ERROR] Could not start a run: cm_transition() status 503,
message 'Unknown error 503 from client 'Analyzer' on host LANL-FADC-DAQ'
Deleted entry '/System/Clients/1001' for client 'Analyzer' because process pid 1001
does not exists
[midas.c:8893:rpc_client_check,ERROR] Connection broken to "Analyzer" on host
LANL-FADC-DAQ
Run #180 start aborted
Error: Unknown error 503 from client 'Analyzer' on host LANL-FADC-DAQ

20:05:02 [Logger,INFO] Deleting previous file "./data/run00180.mid"

20:05:02 [ODBEdit,ERROR] [system.c:4295:recv_tcp,ERROR] header: recv returned 0,
n_received = 0, unexpected connection closure

20:05:02 [ODBEdit,ERROR] [midas.c:10042:rpc_client_call,ERROR] recv_tcp() failed,
routine = "rc_transition", host = "LANL-FADC-DAQ"

20:05:02 [ODBEdit,ERROR] [midas.c:4130:cm_transition,ERROR] Could not start a run:
cm_transition() status 503, message 'Unknown error 503 from client 'Analyzer' on host
LANL-FADC-DAQ'

20:05:02 [ODBEdit,INFO] Deleted entry '/System/Clients/1001' for client 'Analyzer'
because process pid 1001 does not exists

20:05:02 [ODBEdit,ERROR] [midas.c:8893:rpc_client_check,ERROR] Connection broken to
"Analyzer" on host LANL-FADC-DAQ

20:05:02 [ODBEdit,INFO] Run #180 start aborted
20:05:03 [mdump,INFO] Client 'Analyzer' on buffer 'SYSTEM' removed by cm_watchdog
because process pid 1001 does not exist
20:05:11 [mhttpd,INFO] Client 'Analyzer' (PID 1001) on database 'ODB' removed by
cm_watchdog (idle 10.1s,TO 10s)


Thanks,
Robert Pattie
    Reply  01 Feb 2013, Randolf Pohl, Forum, analyzer cannot connect to the statistics database 
The simplest thing is probably to delete all files .[A-Z]*.SHM in the odb directory (the
one you specified in /etc/exptab).
This wipes the ODB, shared memory and all the other obscure stuff, giving you a clean,
fresh start.
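
A more cautious variant of the same cleanup is to move the files aside instead of deleting them, so they can be put back if needed. The Python sketch below does that. It is only an illustration, not a MIDAS tool: the function name and both directory arguments are hypothetical, and it assumes that all MIDAS programs have been stopped first.

```python
import glob
import os
import shutil


def set_aside_shm_files(odb_dir, backup_dir):
    """Move every .[A-Z]*.SHM file from odb_dir into backup_dir.

    Returns the sorted list of file names that were moved.
    """
    os.makedirs(backup_dir, exist_ok=True)
    moved = []
    # Same shell pattern as in the post; because the pattern itself starts
    # with a dot, glob also considers hidden files.
    for path in glob.glob(os.path.join(odb_dir, ".[A-Z]*.SHM")):
        name = os.path.basename(path)
        shutil.move(path, os.path.join(backup_dir, name))
        moved.append(name)
    return sorted(moved)


if __name__ == "__main__":
    import sys
    # usage: python set_aside.py <odb_dir> <backup_dir>
    print(set_aside_shm_files(sys.argv[1], sys.argv[2]))
```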

Of course it wipes all the valuable stuff, too. That's why it's handy to sometimes open
odbedit and "save odb_<yyyymmdd>.odb". You can reload the thing after such a fatal 
"rm .[A-Z]*.SHM" 
       Reply  01 Feb 2013, Stefan Ritt, Forum, analyzer cannot connect to the statistics database 
> The simplest thing is probably to delete all files .[A-Z]*.SHM in the odb directory (the
> one you specified in /etc/exptab).
> This wipes the ODB, shared memory and all the other obscure stuff, giving you a clean,
> fresh start.
> 
> Of course it wipes all the valuable stuff, too. That's why it's handy to sometimes open
> odbedit and "save odb_<yyyymmdd>.odb". You can reload the thing after such a fatal 
> "rm .[A-Z]*.SHM" 

Thanks Randolf for helping out, I was not in the office this week.

In addition to deleting the *SHM files, it's sometimes necessary to delete the shared memory. You do this with the 
command line tools

ipcs -m
ipcrm -m <shmid>


/Stefan
Entry  09 Jan 2013, wenliang li, Bug Report, Outputting ADC and TDC data into ROOT tree with the MIDAS SVN Revision:5347. 
Dear Midas Experts

I am Wenliang Li, a graduate student from the University of Regina. Our group has
encountered some difficulty outputting ADC and TDC data into a ROOT tree with
the MIDAS SVN Revision: 5347.

Our Linux Distribution: Scientific Linux release 6.0 (Carbon)
ROOT Version:           ROOT 5.28
gcc version:            g++ (GCC) 4.4.4 20100726 (Red Hat 4.4.4-13)
kernel version:         2.6.32-279.19.1.el6.i686


I am using the given example $MIDASSYS/examples/experiment to generate some
data, and the issue is that the analyzer refuses to turn on the ADC0 and TDC0
bank switches. 

If the ADC and TDC banks are switched off, the analyzer will successfully output
the histograms but not the ROOT tree, and the Trigger and Scaler root trees are
completely empty.

With the same example experiment: $MIDASSYS/examples/experiment, this issue does
not occur on MIDAS SVN Revision: 4309.


The output error messages in the analyzer window are shown if the ADC and TDC
banks are switched to 1:

*************************
Connect to experiment ...OK
Root server listening on port 9090...
Loading previous online histos from /home/billlee/experiment/test_exp/last.root
Running analyzer online. Stop with "!"
Error in <TTree::Branch>: The pointer specified for ADC0 is not of a class known
to ROOT and (null) is not a known class
ROOT TTree rebooked
Error in <TTree::Branch>: The pointer specified for ADC0 is not of a class known
to ROOT and (null) is not a known class
Error in <TTree::Branch>: The pointer specified for TDC0 is not of a class known
to ROOT and (null) is not a known class
ROOT TTree rebooked
***********************
***************************



If I analyze the data with the TDC and ADC bank switches set to 1:
$ analyzer -i runXXXXX.mid -o runXXXXX.root

I get the following error messages:


************************************************************************
************************************************************************


Root server listening on port 9090...
Running analyzer offline. Stop with "!"
Error in <TTree::Branch>: The pointer specified for ADC0 is not of a class known
to ROOT and (null) is not a known class
Error in <TTree::Branch>: The pointer specified for TDC0 is not of a class known
to ROOT and (null) is not a known class
Set run number 1 in ODB
Load ODB from run 1...OK

 *** Break *** segmentation violation



===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================

Thread 2 (Thread 0x7f46c6853700 (LWP 10808)):
#0  0x0000003b63a0e84d in accept () from /lib64/libpthread.so.0
#1  0x0000003b64e370f4 in TUnixSystem::AcceptConnection(int) () from
/usr/lib64/root/libCore.so.5.28
#2  0x0000003b6647849c in TServerSocket::Accept(unsigned char) () from
/usr/lib64/root/libNet.so.5.28
#3  0x000000000040c50e in root_socket_server (arg=<value optimized out>) at
src/mana.c:5275
#4  0x00007f46c8dc513a in TThread::Function(void*) () from
/usr/lib64/root/libThread.so.5.28
#5  0x0000003b63a07851 in start_thread () from /lib64/libpthread.so.0
#6  0x0000003b62ee811d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f46c8b94720 (LWP 10800)):
#0  0x0000003b62eabfdd in waitpid () from /lib64/libc.so.6
#1  0x0000003b62e3e899 in do_system () from /lib64/libc.so.6
#2  0x0000003b62e3ebd0 in system () from /lib64/libc.so.6
#3  0x0000003b64e3da31 in TUnixSystem::StackTrace() () from
/usr/lib64/root/libCore.so.5.28
#4  0x0000003b64e3d3f3 in TUnixSystem::DispatchSignals(ESignals) () from
/usr/lib64/root/libCore.so.5.28
#5  <signal handler called>
#6  0x000000000041245f in TIter (file=<value optimized out>,
pevent=0x7f46c5281010, par=0x665180) at /usr/include/root/TCollection.h:148
#7  write_event_ttree (file=<value optimized out>, pevent=0x7f46c5281010,
par=0x665180) at src/mana.c:2872
#8  0x0000000000412a4c in process_event (par=0x665180, pevent=0x7f46c5281010) at
src/mana.c:3195
#9  0x0000000000412e42 in analyze_run (run_number=1,
input_file_name=0x7fff4d738340 "run00001.mid", output_file_name=<value optimized
out>) at src/mana.c:4178
#10 0x0000000000413372 in loop_runs_offline () at src/mana.c:4366
#11 0x0000000000413ba5 in main (argc=<value optimized out>, argv=<value
optimized out>) at src/mana.c:5579
===========================================================


The lines below might hint at the cause of the crash.
If they do not help you then please submit a bug report at
http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#6  0x000000000041245f in TIter (file=<value optimized out>,
pevent=0x7f46c5281010, par=0x665180) at /usr/include/root/TCollection.h:148
#7  write_event_ttree (file=<value optimized out>, pevent=0x7f46c5281010,
par=0x665180) at src/mana.c:2872
#8  0x0000000000412a4c in process_event (par=0x665180, pevent=0x7f46c5281010) at
src/mana.c:3195
#9  0x0000000000412e42 in analyze_run (run_number=1,
input_file_name=0x7fff4d738340 "run00001.mid", output_file_name=<value optimized
out>) at src/mana.c:4178
#10 0x0000000000413372 in loop_runs_offline () at src/mana.c:4366
#11 0x0000000000413ba5 in main (argc=<value optimized out>, argv=<value
optimized out>) at src/mana.c:5579
===========================================================


[midas.c:1973:,ERROR] cm_disconnect_experiment not called at end of program

**********************************************************************************************
**********************************************************************************************







I wonder if there is any program syntax change between MIDAS Version 4309 and
5347, and whether there is any simple working setup example which can output a ROOT tree
with the newest version of MIDAS?
 
In the end, I would like to thank the continuous effort from Triumf and PSI on
developing MIDAS, it is a pleasure to work with.

Many thanks
Bill 
    Reply  09 Jan 2013, Stefan Ritt, Bug Report, Outputting ADC and TDC data into ROOT tree with the MIDAS SVN Revision:5347. 
Dear Bill,

the Midas analyzer "mana.c" is currently not maintained. At PSI we use the ROME framework (which might be too complicated for a 
small experiment) and at TRIUMF the ROOTANA framework is used:

http://ladd00.triumf.ca/~olchansk/rootana/

You might be better off switching to that one.

Best regards,
Stefan
Entry  04 Jan 2013, Nabin Poudyal, Suggestion, how to start using midas 
Please tell me how to choose the value of a "key" like DCM, pulser period,
presamples, or upper thresholds to run an experiment. Where can I find the related
information? 
Entry  14 Dec 2012, Robert Casperson, Bug Report, MIDAS does not function correctly on F17 
When building MIDAS on Fedora 17 64-bit, the default zlib 1.2.5 shared library
is linked to.  When recording data, the "/Logger/Channels/*/Statistics/Bytes
written" value does not get set correctly beyond the first few seconds of the
run.  Occasionally, it appears to not get set at all, and mlogger aborts the run.

Installing zlib 1.2.3 in static form to /usr/local/lib (the default location),
and changing the NEED_ZLIB section of the MIDAS Makefile to the following seems
to function as a workaround:

ifdef NEED_ZLIB
CFLAGS   += -DHAVE_ZLIB
LIBS     += /usr/local/lib/libz.a
endif

Several Fedora 17 libraries expect zlib 1.2.5 specifically, so it seems safest
to not replace the default zlib shared library.

Some extra details are that the VME CPU is an XVB602, and the most recent GE-IP
drivers are being used for VME communication.  Fedora 17 was chosen to avoid a
bug with the VGA output in Fedora 13-16.
    Reply  20 Dec 2012, Stefan Ritt, Bug Report, MIDAS does not function correctly on F17 
It is not so easy to find out from zlib how many bytes have actually been written. I used an undocumented function, 
which breaks down on 64-bit systems.

I have now rewritten the code in mlogger.cxx to use lseek() to actually "measure" the output file and set the values 
correctly. I tried it on a few systems but am not 100% sure that it works everywhere. Can you please double check?

The fix is in SVN revision 5347.
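
To illustrate the idea (this is not the mlogger.cxx code, just a minimal Python sketch with hypothetical names): instead of asking the compression library how much it has written, one can flush the output and ask the operating system for the current position of the output file descriptor.

```python
import gzip
import os


def write_and_measure(path, payload):
    """Write payload gzip-compressed to path; return bytes on disk via lseek()."""
    with open(path, "wb") as f:
        gz = gzip.GzipFile(fileobj=f, mode="wb", compresslevel=1)
        gz.write(payload)
        gz.close()   # finishes the gzip stream, leaves f open
        f.flush()    # push Python-side buffers down to the file descriptor
        # The descriptor position now equals the number of bytes written.
        return os.lseek(f.fileno(), 0, os.SEEK_CUR)


if __name__ == "__main__":
    n = write_and_measure("lseek_demo.gz", b"MIDAS event data " * 100000)
    print("compressed bytes written:", n)
```

The value comes from the file itself, so it does not depend on any internals of the compression library.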

/Stefan
Entry  18 Dec 2012, xelap, Forum, midas installation on SL6.3 
I tried to run make in the zlib folder and got this:
cc -O -o example example.o -L. -lz
/usr/bin/ld: errno: TLS definition in /lib/libc.so.6 section .tbss mismatches
non-TLS reference in ./libz.a(gzio.o)
/lib/libc.so.6: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [example] Error 1

Am I missing any package that needs to be installed?
Thanks in advance,
Xelap
Entry  14 Dec 2012, Vinzenz Bildstein, Suggestion, Midas + Elog with SSL 
I've been trying to set up midas to create an automatic elog entry at the end of
each run and I've run into a problem. I've set up an elog on our server which
uses SSL and it seems that the melog provided by midas to create logbook entries
doesn't support SSL.

My solution to this was to copy the crypt.c from the elog package to the
computer running midas and change melog.c and the makefile to use SSL if a flag
-s is used. Does this seem like a sensible solution or did I overlook the obvious
and/or right way to do this?
    Reply  14 Dec 2012, Stefan Ritt, Suggestion, Midas + Elog with SSL 
> I've been trying to set up midas to create an automatic elog entry at the end of
> each run and I've run into a problem. I've setup an elog on our server which
> uses SSL and it seems that the melog provided by midas to create logbook entries
> doesn't know any SSL.
> 
> My solution to this was to copy the crypt.c from the elog package to the
> computer running midas and changed melog.c and the makefile to use SSL if a flag
> -s is used. Does this seem like a sensible solution or did I oversee the obvious
> and/or right way to do this?

Indeed melog.c is an old version of the elog.c utility in the elog package, which has not been maintained for a 
long time. Can't you just use the recent elog.c utility from the elog package?
       Reply  17 Dec 2012, Vinzenz Bildstein, Suggestion, Midas + Elog with SSL 
> > I've been trying to set up midas to create an automatic elog entry at the end of
> > each run and I've run into a problem. I've setup an elog on our server which
> > uses SSL and it seems that the melog provided by midas to create logbook entries
> > doesn't know any SSL.
> > 
> > My solution to this was to copy the crypt.c from the elog package to the
> > computer running midas and changed melog.c and the makefile to use SSL if a flag
> > -s is used. Does this seem like a sensible solution or did I oversee the obvious
> > and/or right way to do this?
> 
> Indeed melog.c is an old version of the elog.c utility in the elog package, which has not been maintained since a 
> long time. Can't you just use the recent elog.c utility from the elog package?

Well, that's essentially what I did, I just didn't want to install the whole elog package on the midas server. Whether
the utility is called elog or melog doesn't really matter. I just wanted to make sure that this is the right way to do
it. 

Thanks!
Entry  12 Dec 2012, Shaun Mead, Bug Report, ss_thread_kill() kills entire program 
Hi, I'm having some trouble getting ss_thread_kill() to work properly. It seems 
to kill the entire program instead of just the thread. Here is a test program to 
show the error:

_________________________________
#include <stdio.h>
#include <stdlib.h>
#include "midas.h"
#include "msystem.h"

INT f(void *param)
{
  for (int x = 0; x < 100; x++)
    sleep(1);
  return 0;
}

int main()
{
  printf("creating thread\n");
  midas_thread_t thr = ss_thread_create(f, NULL);
  sleep(2);
  printf("killing thread\n");
  ss_thread_kill(thr);
  printf("success\n");
  return 0;
}
_________________________________

Makefile:
_________________________________
FLAGS=-g -Wall -DLINUX -DOS_LINUX -I/home/deap/packages/midas/include 
LIBS=-L/home/deap/packages/midas/linux-m64/lib -lmidas -lpthread -lrt -lutil

main.exe: main.cpp 
	g++ $(FLAGS) -o $@ $^ $(LIBS)

_________________________________

Output when run:

_________________________________

[deap@deap04 multithread]$ ./main.exe 
creating thread
killing thread
Killed
[deap@deap04 multithread]$ 
_________________________________

The last "Killed" indicates that the whole program got killed, when it should 
actually just kill the thread and then print "success".

I noticed the function in system.c uses pthread_kill(). Some Google searches 
show me that it may be better to use pthread_cancel() (see 
http://stackoverflow.com/questions/3438536/when-to-use-pthread-cancel-and-not-pthread-kill ).


Shaun
    Reply  13 Dec 2012, Stefan Ritt, Bug Report, ss_thread_kill() kills entire program 
The Linux thread functionality was introduced by Konstantin, so he might have a better idea about that.

What I usually do is a graceful thread shutdown just by a flag. Like

int stop_thread = 0;

INT f(void *param)
{
  for (int x = 0; x < 100; x++) {
    sleep(1);
    if (stop_thread) {
      // clean up things here...
      return 0;
    }
  }
  return 0;
}

int main()
{
 printf("creating thread\n");
 midas_thread_t thr = ss_thread_create(f, NULL);
 sleep(2);
 printf("killing thread\n");
 stop_thread = 1;
 sleep(2);
 printf("success\n");
 return 0;
}


This way I have a chance to clean up things in the thread, which otherwise I would not be able to.
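
The same pattern can be sketched in Python (this does not use the MIDAS ss_thread API, and all names are hypothetical). A threading.Event plays the role of the stop_thread flag, and the main thread joins the worker instead of sleeping for a fixed time.

```python
import threading
import time


def worker(stop, log):
    """Loop until asked to stop, then clean up and return."""
    for _ in range(100):
        # wait() sleeps for up to 0.1 s and returns True once stop is set
        if stop.wait(timeout=0.1):
            log.append("cleaned up")   # clean up things here...
            return
    log.append("finished")


def run_demo():
    stop = threading.Event()
    log = []
    thr = threading.Thread(target=worker, args=(stop, log))
    thr.start()
    time.sleep(0.3)
    stop.set()              # ask the thread to stop
    thr.join(timeout=2.0)   # wait until it has really finished
    log.append("alive" if thr.is_alive() else "joined")
    return log


if __name__ == "__main__":
    print(run_demo())
```

Joining the thread makes sure the cleanup has run before the program exits.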
    Reply  13 Dec 2012, Konstantin Olchanski, Bug Report, ss_thread_kill() kills entire program 
> Hi, I'm having some trouble getting ss_thread_kill() to work properly. It seems 
> to kill the entire program instead of just the thread.

You cannot kill a thread. It's not a well defined operation. Most OSes do have the 
technical possibility to kill threads, but if you use them, you will not like the 
results. For a taste of small trouble, if a thread is holding a lock and you kill 
it, whose job is it to release the lock?

The best you can do is to ask the thread to gracefully shut itself down (i.e. by 
using global variable flags).

P.S. I did not implement the ss_thread stuff, I do not know what ss_thread_kill() 
does, but I recommend that you do not use it.

P.P.S. Programming using threads is complicated, I recommend that you read at least 
some literature on the topic before using threads. At the least you must understand 
the common pitfalls and mistakes. At the least, you must know about deadlocks, 
livelocks, race conditions and semaphore priority inversions.

K.O.
       Reply  13 Dec 2012, Shaun Mead, Bug Report, ss_thread_kill() kills entire program 
> > Hi, I'm having some trouble getting ss_thread_kill() to work properly. It seems 
> > to kill the entire program instead of just the thread.
> 
> You cannot kill a thread. It's not a well defined operation. Most OSes do have the 
> technical possibility to kill threads, but if you use them, you will not like the 
> results. For a taste of small trouble, if a thread is holding a lock and you kill 
> it, who's job is it to release the lock?
> 
> The best you can do is to ask the thread to gracefully shutdown itself. (I.e. by 
> using global variable flags).
> 
> P.S. I did not implement the ss_thread stuff, I do not know what ss_thread_kill() 
> does, but I recommend that you do not use it.
> 
> P.P.S. Programming using threads is complicated, I recommend that you read at least 
> some literature on the topic before using threads. At the least you must understand 
> the common pitfalls and mistakes. At the least, you must know about deadlocks, 
> livelocks, race conditions and semaphore priority inversions.
> 
> K.O.

Yes, but unfortunately what I was attempting to do was use a library function that I
can't alter. It sometimes gets stuck and I wanted a way to kill it. Anyway I ended up
not doing this at all in c++; I was able to do what I needed in python.
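
One way to handle a call that may hang, not necessarily the one used here, is to run it in a child process. Unlike a thread, a process can be killed by the operating system without leaving locks held inside the parent, although files or shared memory touched by the child may still be left in an inconsistent state. A minimal Python sketch with hypothetical names, where a short Python snippet stands in for the library call:

```python
import subprocess
import sys


def run_with_timeout(code, timeout_s):
    """Run a Python snippet in a child process.

    Returns True if it finished in time, False if it had to be killed.
    A non-zero exit status raises subprocess.CalledProcessError.
    """
    try:
        # On timeout, subprocess.run() kills the child and waits for it.
        subprocess.run([sys.executable, "-c", code],
                       timeout=timeout_s, check=True)
        return True
    except subprocess.TimeoutExpired:
        return False


if __name__ == "__main__":
    print(run_with_timeout("print('quick call')", 10.0))
    print(run_with_timeout("import time; time.sleep(60)", 1.0))
```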

Shaun
Entry  30 Aug 2012, Raquel Castillo, Forum, MIDAS in Windows 
Hi,

I need to install MIDAS on a Windows system (Microsoft Windows Server 2003). 
The computer has the Microsoft Visual C++ 2010 Express version.
I have downloaded the MIDAS packages using the tarball mechanism. I have created 
the environment variables without problems and I have created the file
%SystemRoot%\system32\exptab 
But when I try to build MIDAS with
nmake -f makefile.nt
I get the following error:
Microsoft (R) Program Maintenance Utility Version 10.00.30319.01
Copyright (C) Microsoft Corporation.  All rights reserved.

NMAKE : fatal error U1073: don't know how to make 'src/mhttpd.c'
Stop.

I don't understand this problem. Can anybody help me, please?

Thanks in advance!!!
    Reply  31 Aug 2012, Pierre-Andre Amaudruz, Forum, MIDAS in Windows 
Hi Raquel,

The makefile.nt has been corrected.
Obviously Midas on Windows has not been updated for quite a while.
mhttpd.c has been converted to c++ (mhttpd.cxx) as well as a couple of other 
applications.

Please give a try,  PAA
       Reply  23 Oct 2012, Raquel Castillo, Forum, MIDAS in Windows MIDAS_odbedit.bmp
Hi Pierre-André, 

sorry for the long delay, other things kept me away from this computer.
Thanks a lot for correcting makefile.nt and the other applications!

Now I have tried again, downloading the MIDAS packages with the tarball mechanism as
before, and it seems that the previous problems are solved. Only one small
problem remains, and it is related to odbedit.

I attach here the figure with the error that is reported by the computer. Is it
possible that there is another file that needs to be updated? Can you help me with that?

Thanks a lot in advance!!!!



> Hi Raquel,
> 
> The makefile.nt has been corrected.
> Obviously Midas on Windows has not been updated for quite a while.
> mhttpd.c has been converted to c++ (mhttpd.cxx) as well as a couple of other 
> applications.
> 
> Please give a try,  PAA
> 
> > Hi,
> > 
> > I need to install MIDAS on a Windows system (Microsoft Windows Server 2003). 
> > The computer has the Microsoft Visual C++ 2010 Express version.
> > I have downloaded the MIDAS packages using the tarball mechanism. I have create 
> > the environment variables without problems and I have create the file           
> > %SystemRoot%\system32\exptab 
> > But when I try to build MIDAS and I do 
> > nmake -f makefile.nt
> > I have the following problem:
> > Microsoft (R) Program Maintenance Utility Version 10.00.30319.01
> > Copyright (C) Microsoft Corporation.  All rights reserved.
> > 
> > NMAKE : fatal error U1073: don't know how to make 'src/mhttpd.c'
> > Stop.
> > 
> > I don't understand this problem. Can anybody help me, please?
> > 
> > Thanks in advance!!!
Entry  27 Sep 2012, Randolf Pohl, Bug Fix, [PATCH] mana.c compile fix, gz files diff.mana
Hi,

I had to apply the attached patch to convince SuSE Linux 12.2 to compile mana.c
gcc version is "(SUSE Linux) 4.6.2"

Problem is that gz{write,close, etc.} expect a 1st argument of type gzFile (see
zlib.h), whereas out_file is FILE*. In fact, out_file is a cast to FILE*, even
in the case when we work on a gzfile (HAVE_ZLIB).

Could you please confirm that the patch is correct, and possibly apply it to trunk?

I haven't checked if mana works as advertised now.

Cheers,


Randolf
    Reply  09 Oct 2012, Stefan Ritt, Bug Fix, [PATCH] mana.c compile fix, gz files 
> Hi,
> 
> I had to apply the attached patch to convince SuSE Linux 12.2 to compile mana.c
> gcc version is "(SUSE Linux) 4.6.2"
> 
> Problem is that gz{write,close, etc.} expect a 1st argument of type gzFile (see
> zlib.h), whereas out_file is FILE*. In fact, out_file is a cast to FILE*, even
> in the case when we work on a gzfile (HAVE_ZLIB).
> 
> Could you please confirm that the patch is correct, and possibly apply it to trunk?
> 
> I haven't checked if mana works as advertised now.
> 
> Cheers,
> 
> 
> Randolf

I applied your patch to the trunk.

Best,
Stefan
Entry  16 Aug 2012, Cheng-Ju Lin, Bug Report, launching roody kills the analyzer 
Hi All,

I've installed midas (Rev:5294) on SLC6.3 (64bit), along with recent trunk versions of rootana and roody. 
All the packages compiled OK. The example code in $MIDASSYS/examples/experiment also runs OK 
provided that I don't launch roody. If I try to launch roody, then it immediately crashes the analyzer with 
the following trace:

#6 root_server_thread (arg=0x7f54fc001150) at src/mana.c:5154
#7 0x0000003219a1e13a in TThread::Function(void*) () from /usr/lib64/root/libThread.so.5.28
#8 0x0000003dd1207851 in start_thread () from /lib64/libpthread.so.0
#9 0x0000003dd0ee76dd in clone () from /lib64/libc.so.6

The line src/mana.c:5154 points to the following:

TObject *obj;
            if (strncmp(request + 10, "Any", 3) == 0)
               obj = folder->FindObjectAny(request + 14);
            else
               obj = folder->FindObject(request + 11);    // LINE 5154


Any suggestions on what may be going on here?  Thanks.


Cheng-Ju
    Reply  16 Aug 2012, Cheng-Ju Lin, Bug Fix, launching roody kills the analyzer 
OK, I've found the solution in the roody forum.  The solution for 64bit machine is to replace
   uint32_t p =0;
   with
   uintptr_t p =0;

in the roody header file roody/include/DataSourceTNetFolder.h

Cheng-Ju



> Hi All,
> 
> I've installed midas (Rev:5294) on SLC6.3 (64bit), along with recent trunk versions of rootana and roody. 
> All the packages compiled OK. The example code in $MIDASSYS/examples/experiment also runs OK 
> provided that I don't launch roody. If I try to launch roody, then it immediately crashes the analyzer with 
> the following trace:
> 
> #6 root_server_thread (arg=ox7f54fc001150) at src/mana.c:5154
> #7 0x0000003219a1e13a in TThread::Function(void*) () from /usr/lib64/root/libThread.so.5.28
> #8 0x0000003dd1207851 in start_thread () from /lib64/libpthread.so.0
> #9 0x0000003dd0ee76dd in clone () from /lib64/libc.so.6
> 
> The line src/mana.c:5154 points to the following:
> 
> TObject *obj;
>             if (strncmp(request + 10, "Any", 3) == 0)
>                obj = folder->FindObjectAny(request + 14);
>             else
>                obj = folder->FindObject(request + 11);    // LINE 5154
> 
> 
> Any suggestions on what may be going on here?  Thanks.
> 
> 
> Cheng-Ju
    Reply  17 Aug 2012, Konstantin Olchanski, Bug Report, launching roody kills the analyzer 
> I've installed midas (Rev:5294) on SLC6.3 (64bit), along with recent trunk versions of rootana and roody. 
>
> #6 root_server_thread (arg=ox7f54fc001150) at src/mana.c:5154

You are connecting to mana, the old midas analyzer. The code for connecting to it is still present in roody,
but I cannot support the matching server code in mana.c - it is 2 revolutions behind the current state of
the ROOT object server (look in ROOTANA - the NetDirectory stuff and the latest is the XmlServer stuff).

I can offer 2 solutions - switch from mana.c to a ROOTANA based analyzer or graft the XmlServer code
into your analyzer (it is very simple - you need to create an XmlServer object and tell it which ROOT
containers you want to make visible to ROODY).

I guess you can also debug the old midas server code inside mana.c...

K.O.
       Reply  17 Aug 2012, Cheng-Ju Lin, Bug Report, launching roody kills the analyzer 
Hi Konstantin,

Many thanks for your feedback.  I was able to keep the analyzer from exiting when launching roody by making some changes in the roody code. 
This at least allows me to keep moving forward. I will look into your suggestion of converting to ROOTANA based analyzer as well.

Regards,

Cheng-Ju


> > I've installed midas (Rev:5294) on SLC6.3 (64bit), along with recent trunk versions of rootana and roody. 
> >
> > #6 root_server_thread (arg=ox7f54fc001150) at src/mana.c:5154
> 
> You are connecting to mana, the old midas analyzer. The code for connecting to it is still present in roody,
> but I cannot support the matching server code in mana.c - it is 2 revolutions behind the current state of
> the ROOT object server (look in ROOTANA - the NetDirectory stuff and the latest is the XmlServer stuff).
> 
> I can offer 2 solutions - switch from mana.c to a ROOTANA based analyzer or graft the XmlServer code
> into your analyzer (it is very simple - you need to create an XmlServer object and tell it which ROOT
> containers you want to make visible to ROODY).
> 
> I guess you can also debug the old midas server code inside mana.c...
> 
> K.O.
          Reply  26 Sep 2012, Konstantin Olchanski, Bug Report, launching roody kills the analyzer 
> > 
> > I guess you can also debug the old midas server code inside mana.c...
> > 

I ended up doing this. (After receiving some discussion by email).

Remembered that this is an old problem with the old midasServer network
protocol in mana.c - if mana.c is compiled 32-bit, it sends 32-bit pointers, if compiled 64-bit
it sends 64-bit pointers. On the receiving end (in roody), the ROOT TMessage object does not
provide any easy way to tell between them (i.e. object length is reported as 12 or 16 for the two cases).

To make things more interesting, the midasServer code in ROOTANA always sends 32-bit "pointers",
(which are not pointers but 32-bit integer cookies).

I use the ROOTANA midasServer to test ROODY (I have no working mana.c analyzers available),
and ROODY expects to receive 32-bit "pointers", so the two are consistent.

But if I compile my midasServer to send/receive 64-bit "pointers" (cookies), I reproduce this crash. What I can reproduce I can "fix".

If I change the code in ROODY to receive and return 64-bit "pointers" (cookies), both the 32-bit and the 64-bit midasServer seem to work okay.

This is committed as roody svn rev 248. (https://ladd00.triumf.ca/svn/roody/trunk)

It is the same fix as suggested by Cheng-Ju Stephen Lin [cjslin@lbl.gov].

I hope this helps (or breaks the ROODY midasServer connection for everybody. I hope not).

K.O.
Entry  10 Sep 2012, Shaun Mead, Info, MIDAS button to display image 
Hi,

I've written a python script that reads some data from a file and generates a
.png image. I want to have a button on my MIDAS status page that:

- executes the script and waits for it to finish,
- then displays the image

How can I do that? I tried using the sequencer to just execute the script every
30 seconds, but I can't get it to work, and it would be better to only execute
the script on demand anyway. 

I also am having trouble getting image display to work. I have the ODB keys set:

[local:oven1:S]/Custom>ls
Temperature Map&                /home/deap/ovendaq/online/index.html
Images

[local:oven1:S]/Custom>ls Images/temps.png/           
Background                      /home/deap/ovendaq/online/temps.png

And the HTML file is just this:
<img src="temps.png">

But the image won't display. It shows a "broken" picture, and when I try to view
it directly it says: Invalid custom page: Page not found in ODB.

Any help would be appreciated...

Thanks
Shaun
    Reply  11 Sep 2012, Stefan Ritt, Info, MIDAS button to display image Screen_Shot_2012-09-11_at_14.36.56_.png
> Hi,
> 
> I've written a python script that reads some data from a file and generates a
> .png image. I want to have a button on my MIDAS status page that:
> 
> - executes the script and waits for it to finish,
> - then displays the image
> 
> How can I do that? I tried using the sequencer to just execute the script every
> 30 seconds, but I can't get it to work, and it would be better to only execute
> the script on demand anyway. 
> 
> I also am having trouble getting image display to work. I have the ODB keys set:
> 
> [local:oven1:S]/Custom>ls
> Temperature Map&                /home/deap/ovendaq/online/index.html
> Images
> 
> [local:oven1:S]/Custom>ls Images/temps.png/           
> Background                      /home/deap/ovendaq/online/temps.png
> 
> And the HTML file is just this:
> <img src="temps.png">
> 
> But the image won't display. It shows a "broken" picture, and when I try to view
> it directly it says: Invalid custom page: Page not found in ODB.
> 
> Any help would be appreciated...
> 
> Thanks
> Shaun


If you use the "custom" image system, you need to use GIF images. mhttpd can dynamically create GIF 
images with a background image and overlaid labels, bar graphs etc. mhttpd contains a GIF library to do 
that in memory, but no PNG library.

Actually I would recommend that you not use a script to create an image, but use the custom image system 
to display temperatures. In the attachment you see a page from our experiment which contains a 
background image (the greyish boxes), labels (white temperature boxes), bar graphs (blue level boxes) 
and history pages (left side). This is all dynamically created inside mhttpd using the custom page system 
without any external script. All you have to do is to get the temperatures and levels inside the ODB via the 
slow control system. If you want, I can send you the full code for that page.

Cheers,
Stefan
Entry  06 Sep 2012, shaun, Bug Report, "cannot find recent history file" 
Hi, when attempting to access a history window the following message is repeated
over and over in the MIDAS message log:

Thu Sep 6 11:37:16 2012 [mhttpd,ERROR] [history.c:886:hs_count_events,ERROR]
cannot find recent history file
Thu Sep 6 11:38:16 2012 [mhttpd,ERROR] [history.c:886:hs_count_events,ERROR]
cannot find recent history file
Thu Sep 6 11:38:16 2012 [mhttpd,ERROR] [history.c:886:hs_count_events,ERROR]
cannot find recent history file
Thu Sep 6 11:39:16 2012 [mhttpd,ERROR] [history.c:886:hs_count_events,ERROR]
cannot find recent history file
Thu Sep 6 11:39:16 2012 [mhttpd,ERROR] [history.c:886:hs_count_events,ERROR]
cannot find recent history file

It appears to be related to attempting to display a history graph that includes
some time periods that have no recorded history data. When I zoom in so that the
whole graph has data the error message goes away.

The graph displays fine either way, so this error message seems useless. Is
there a way to suppress it?

Thanks
Shaun
Entry  05 Sep 2012, Stefan Ritt, Info, New pipe compression implemented in mlogger 
A new pipe compression has been implemented in mlogger thanks to Fedor Ignatov from BINP 
Novosibirsk. The way it works is that the logger writes into a pipe instead of directly into a file. The pipe can 
then be connected to any compression program without the need to compile against any additional C 
library.

To use it, enter as the filename for example

|bzip2>run%05d.mid     (note the pipe '|' in front of the bzip2)

This way the data stream is run through the bzip2 program, which is known to have a better compression 
ratio than gzip. Furthermore, the parallel version of bzip2 can be used, which spreads over all available 
CPU cores and speeds up compression almost linearly with the number of cores. This parallel version, 
called pbzip2, can be found here:

http://compression.ca/pbzip2/
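
The pipe mechanism described above can be sketched with the POSIX popen() call. This is a minimal illustration under stated assumptions, not the mlogger implementation: open_output() and close_output() are hypothetical helper names, and the sketch does not expand the run number in the file name.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen() and pclose() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the mlogger code: if 'spec' starts with '|',
   treat the rest as a shell command and open a pipe to it; otherwise
   open 'spec' as an ordinary file. *is_pipe tells the caller how the
   stream has to be closed. */
FILE *open_output(const char *spec, int *is_pipe)
{
   assert(spec != NULL && is_pipe != NULL);
   if (spec[0] == '|') {
      *is_pipe = 1;
      return popen(spec + 1, "w");   /* the command is run by the shell */
   }
   *is_pipe = 0;
   return fopen(spec, "wb");
}

/* Close a stream returned by open_output(); returns 0 on success. */
int close_output(FILE *f, int is_pipe)
{
   assert(f != NULL);
   if (is_pipe)
      return pclose(f) == 0 ? 0 : -1;   /* waits for the command to exit */
   return fclose(f) == 0 ? 0 : -1;
}
```

With this sketch, open_output("|bzip2>run00118.mid", &is_pipe) would hand everything written to the stream to bzip2, and the shell redirection inside the command creates the output file. The data writer itself needs no knowledge of the compression program.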

It can be easily compiled and installed. Using this method in the MEG experiment at PSI, we can compress 
our waveform data to 37% of its original size (49% with gzip), and on 8 cores we get a compression rate 
of about 40 MBytes/sec (23 MBytes/sec with gzip on a single core).

The disadvantage of that method is that one cannot see the compression ratio online, but this is not a big 
deal I guess. The new version has been committed as rev. 5324. 

/Stefan
Entry  10 May 2011, Jianglai Liu, Forum, simple example frontend for V1720  
Hi,

Who has a good example of a frontend program using CAEN V1718 VME-USB bridge and
V1720 FADC? I am trying to set up the DAQ for such a simple system.

I put together a frontend which talks to the VME. However it gets stuck at
"Calibrating" in initialize_equipment().

I'd appreciate some help!

Thanks,
Jianglai
    Reply  10 May 2011, Stefan Ritt, Forum, simple example frontend for V1720  

Jianglai Liu wrote:
Hi,

Who has a good example of a frontend program using CAEN V1718 VME-USB bridge and
V1720 FADC? I am trying to set up the DAQ for such a simple system.

I put together a frontend which talks to the VME. However it gets stuck at
"Calibrating" in initialize_equipment().

I'd appreciate some help!

Thanks,
Jianglai


During "Calibrating", the framework calls your poll_event() routine. Your code there accesses the VME crate for the first time and probably gets stuck.
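
A minimal sketch of a poll_event() that cannot get stuck during calibration is given here, following the shape of the MIDAS example frontend. The typedefs are stand-ins so that the sketch compiles on its own, and hardware_has_event() with its fake_event_ready flag is a hypothetical placeholder for a non-blocking status read of the digitizer.

```c
#include <assert.h>

/* Stand-ins for the MIDAS types so that the sketch compiles on its own. */
typedef int INT;
typedef int BOOL;
#define TRUE  1
#define FALSE 0

/* Hypothetical placeholder for the hardware: a real frontend would do a
   non-blocking read of the "event ready" status here. It must return
   promptly even when the crate does not respond. */
int fake_event_ready = 0;

static BOOL hardware_has_event(void)
{
   return fake_event_ready ? TRUE : FALSE;
}

/* Called by the framework. With test == TRUE (calibration) the loop is
   only timed, so the routine must never report an event in that case. */
INT poll_event(INT source, INT count, BOOL test)
{
   INT i;

   (void) source;                 /* unused in this sketch */
   assert(count >= 0);
   for (i = 0; i < count; i++) {
      if (hardware_has_event() && !test)
         return TRUE;
   }
   return FALSE;
}
```

If the frontend hangs at "Calibrating", the call that replaces hardware_has_event() is the first place to look, since it is the first VME access the framework triggers.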
    Reply  10 May 2011, Pierre-Andre Amaudruz, Forum, simple example frontend for V1720  

Jianglai Liu wrote:
Hi,

Who has a good example of a frontend program using CAEN V1718 VME-USB bridge and
V1720 FADC? I am trying to set up the DAQ for such a simple system.

I put together a frontend which talks to the VME. However it gets stuck at
"Calibrating" in initialize_equipment().

I'd appreciate some help!

Thanks,
Jianglai


Under the drivers/vme you can find code for the v1720.c (VME access) and ov1720.c
(A2818/A3818 PCIe optical link access). For testing the hardware, we use this code compiled and linked
with MAIN_ENABLE to confirm its functionality. You may want to do the same for your USB. Once this
is under control, the Midas frontend implementation using the same driver shouldn't give you trouble.
       Reply  24 May 2011, Jianglai Liu, Forum, simple example frontend for V1720  v1720.c
Thanks all for the kind help. This did point me in the right direction. I am now able to get v1720.c as well as my MIDAS frontend (thanks to
Jimmy's example) talking to the V1720, and to read out the waveform bank.

However the readout values did not seem quite right. I fed in a PMT-like pulse of about 0.1 V and 50 ns wide, with an external trigger just in time.
However, the readout by both v1720.c stand-alone code, and my midas frontend seemed to be flat noise.

I tried to play with the post trigger value, as well as the DAC setting of V1720. None seemed to help.

BTW I tested my V1720 board functionality by using the CAEN windows software (CAENScope and WaveDump). They worked just fine.

Any suggestions? Attached is my modified v1720.c code.


Pierre-Andre Amaudruz wrote:

Jianglai Liu wrote:
Hi,

Who has a good example of a frontend program using CAEN V1718 VME-USB bridge and
V1720 FADC? I am trying to set up the DAQ for such a simple system.

I put together a frontend which talks to the VME. However it gets stuck at
"Calibrating" in initialize_equipment().

I'd appreciate some help!

Thanks,
Jianglai


Under the drivers/vme you can find code for the v1720.c (VME access) and ov1720.c
(A2818/A3818 PCIe optical link access). For testing the hardware, we use this code compiled and linked
with MAIN_ENABLE to confirm its functionality. You may want to do the same for your USB. Once this
is under control, the Midas frontend implementation using the same driver shouldn't give you trouble.
    Reply  18 May 2011, Jimmy Ngai, Forum, simple example frontend for V1720  frontend.c v1718.h v1718.c v792n.h v792n.c

Jianglai Liu wrote:
Hi,

Who has a good example of a frontend program using CAEN V1718 VME-USB bridge and
V1720 FADC? I am trying to set up the DAQ for such a simple system.

I put together a frontend which talks to the VME. However it gets stuck at
"Calibrating" in initialize_equipment().

I'd appreciate some help!

Thanks,
Jianglai


Hi Jianglai,

I don't have an example of using V1718 with V1720, but I have been using V1718 with V792N for a long time.

You may find in the attachment an example frontend program and my drivers for V1718 and V792N written in MVMESTD format. They have to be linked with the CAENVMELib library and other essential MIDAS stuffs.

Regards,
Jimmy
       Reply  10 Aug 2012, Carl Blaksley, Forum, simple example frontend for V1720  

Jimmy Ngai wrote:

Jianglai Liu wrote:
Hi,

Who has a good example of a frontend program using CAEN V1718 VME-USB bridge and
V1720 FADC? I am trying to set up the DAQ for such a simple system.

I put together a frontend which talks to the VME. However it gets stuck at
"Calibrating" in initialize_equipment().

I'd appreciate some help!

Thanks,
Jianglai


Hi Jianglai,

I don't have an exmaple of using V1718 with V1720, but I have been using V1718 with V792N for a long time.

You may find in the attachment an example frontend program and my drivers for V1718 and V792N written in MVMESTD format. They have to be linked with the CAENVMELib library and other essential MIDAS stuffs.

Regards,
Jimmy


Jimmy,

How exactly did you link the CAENVMElib with your frontend? That is the part which I can not seem to replicate using your example frontend!

Thanks,
-Carl
          Reply  12 Aug 2012, Jimmy Ngai, Forum, simple example frontend for V1720  Makefile

Carl Blaksley wrote:

Jimmy Ngai wrote:

Jianglai Liu wrote:
Hi,

Who has a good example of a frontend program using CAEN V1718 VME-USB bridge and
V1720 FADC? I am trying to set up the DAQ for such a simple system.

I put together a frontend which talks to the VME. However it gets stuck at
"Calibrating" in initialize_equipment().

I'd appreciate some help!

Thanks,
Jianglai


Hi Jianglai,

I don't have an exmaple of using V1718 with V1720, but I have been using V1718 with V792N for a long time.

You may find in the attachment an example frontend program and my drivers for V1718 and V792N written in MVMESTD format. They have to be linked with the CAENVMELib library and other essential MIDAS stuffs.

Regards,
Jimmy


Jimmy,

How exactly did you link the CAENVMElib with your frontend? That is the part which I can not seem to replicate using your example frontend!

Thanks,
-Carl


Hi Carl,

Attached is a cut-down version of my original Makefile just for demonstrating how to link the CAENVMElib. I didn't test it for bugs. Please make sure the libCAENVME.so is in your library path.

Jimmy
Entry  10 Aug 2012, Carl Blaksley, Forum, Problem with CAMAC controlled by CES8210 and read out by CAEN V1718 VME controller 
Hello all,

I am trying to put together a system to read out several CAMAC ADCs. The CAMAC crate is
read by a CES8210 CAMAC-to-VME controller. The VME is then interfaced to a
computer through a CAEN V1718 USB control module. Has anyone gotten the latter to
work?

Previous users seemed to indicate that they had here:

https://ladd00.triumf.ca/elog/Midas/493

but I am having trouble getting this example frontend to compile. What should be set as
the driver in the makefile, for example? If I put v1718 there then I receive
numerous errors from the CAENVMElib files.

If someone else has gotten the V1718 running, I would be grateful for their
insight. 

Thanks, 
-Carl
Entry  27 Jul 2012, Cheng-Ju Lin, Info, MIDAS under Scientific Linux 6 
Hi All,

I was wondering if anyone has attempted to install MIDAS under Scientific Linux 6?  I am planning to install 
Scientific Linux on one of the PCs in our lab to run MIDAS. I would like to know if anyone has been 
successful in getting MIDAS to run under SL6.  Thanks.

Cheng-Ju
    Reply  31 Jul 2012, Pierre-Andre Amaudruz, Info, MIDAS under Scientific Linux 6 
Hi Cheng-Ju,

Midas will install and run under SL6. We're presently running SL6.2.
Cheers, PAA

> Hi All,
> 
> I was wondering if anyone has attempted to install MIDAS under Scientific Linux 6?  I am planning to install 
> Scientific Linux on one of the PCs in our lab to run MIDAS. I would like to know if anyone has been 
> successful in getting MIDAS to run under SL6.  Thanks.
> 
> Cheng-Ju
Entry  04 Jul 2012, Konstantin Olchanski, Bug Report, Crash after recursive use of rpc_execute() 
I am looking at a MIDAS kaboom when running out of space on the data disk - everything was freezing 
up, even the VME frontend crashed sometimes.

The freeze was traced to ROOT use in mlogger - it turns out that ROOT intercepts many signal handlers, 
including SIGSEGV - but instead of crashing the program as God intended, ROOT SEGV handler just hangs, 
and the rest of MIDAS hangs with it. One solution is to always build mlogger without ROOT support - 
does anybody use this feature anymore? Or reset the signal handlers back to the default setting somehow.

Freeze fixed, now I see a crash (seg fault) inside mlogger, in the newly introduced memmove() function 
inside the MIDAS RPC code rpc_execute(). memmove() replaced memcpy() in the same place and I am 
surprised we did not see this crash with memcpy().

The crash is caused by crazy arguments passed to memmove() - looks like corrupted RPC arguments 
data.

Then I realized that I see a recursive call to rpc_execute(): rpc_execute() calls tr_stop() calls cm_yield() calls 
ss_suspend() calls rpc_execute(). The second rpc_execute() completes successfully, but leaves corrupted 
data for the original rpc_execute(), which happily crashes. At the moment of the crash, the recursive call to 
rpc_execute() is no longer visible.

Note that rpc_execute() cannot be called recursively - it is not re-entrant as it uses a global buffer for RPC 
argument processing. (global tls_buffer structure).
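
The failure mode can be reproduced in a few lines. This is a sketch with hypothetical stand-ins (shared_buffer, execute(), nested_handler()), not the MIDAS code: a function that keeps its argument in one global buffer finds the buffer overwritten if the handler it calls re-enters the same function.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins, not MIDAS code. */
static char shared_buffer[32];   /* one global buffer shared by all calls */

typedef void (*handler_t)(void);

/* Copies 'arg' into the shared buffer, runs the handler, then reports
   whether the buffer still holds 'arg': 1 = intact, 0 = overwritten. */
int execute(const char *arg, handler_t handler)
{
   assert(strlen(arg) < sizeof(shared_buffer));
   strcpy(shared_buffer, arg);
   if (handler != NULL)
      handler();                 /* may re-enter execute() */
   return strcmp(shared_buffer, arg) == 0;
}

/* A handler that re-enters execute(), in the same way that tr_stop()
   ends up back in rpc_execute() through cm_yield() and ss_suspend(). */
void nested_handler(void)
{
   execute("inner request", NULL);
}
```

Here execute("outer request", NULL) returns 1, while execute("outer request", nested_handler) returns 0: the inner call finishes without any error, and only the outer call sees the damage afterwards, which matches the observation that the recursive call is no longer on the stack when the crash happens.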

Here is the mlogger stack trace:

#0  0x00000032a8032885 in raise () from /lib64/libc.so.6
#1  0x00000032a8034065 in abort () from /lib64/libc.so.6
#2  0x00000032a802b9fe in __assert_fail_base () from /lib64/libc.so.6
#3  0x00000032a802bac0 in __assert_fail () from /lib64/libc.so.6
#4  0x000000000041d3e6 in rpc_execute (sock=14, buffer=0x7ffff73fc010 "\340.", convert_flags=0) at src/midas.c:11478
#5  0x0000000000429e41 in rpc_server_receive (idx=1, sock=<value optimized out>, check=<value optimized out>) at src/midas.c:12955
#6  0x0000000000433fcd in ss_suspend (millisec=0, msg=0) at src/system.c:3927
#7  0x0000000000429b12 in cm_yield (millisec=100) at src/midas.c:4268
#8  0x00000000004137c0 in close_channels (run_number=118, p_tape_flag=0x7fffffffcd34) at src/mlogger.cxx:3705
#9  0x000000000041390e in tr_stop (run_number=118, error=<value optimized out>) at src/mlogger.cxx:4148
#10 0x000000000041cd42 in rpc_execute (sock=12, buffer=0x7ffff73fc010 "\340.", convert_flags=0) at src/midas.c:11626
#11 0x0000000000429e41 in rpc_server_receive (idx=0, sock=<value optimized out>, check=<value optimized out>) at src/midas.c:12955
#12 0x0000000000433fcd in ss_suspend (millisec=0, msg=0) at src/system.c:3927
#13 0x0000000000429b12 in cm_yield (millisec=1000) at src/midas.c:4268
#14 0x0000000000416c50 in main (argc=<value optimized out>, argv=<value optimized out>) at src/mlogger.cxx:4431


K.O.
    Reply  04 Jul 2012, Konstantin Olchanski, Bug Report, Crash after recursive use of rpc_execute() 
>  ... I see a recursive call to rpc_execute(): rpc_execute() calls tr_stop() calls cm_yield() calls 
> ss_suspend() calls rpc_execute()
> ... rpc_execute() cannot be called recursively - it is not re-entrant as it uses a global buffer

It turns out that rpc_server_receive() also needs protection against recursive calls - it also uses
a global buffer to receive network data.

My solution is to protect rpc_server_receive() against recursive calls by detecting recursion and returning SS_SUCCESS (to ss_suspend()).

I was worried that this would cause a tight loop inside ss_suspend() but in practice, it looks like ss_suspend() tries to call
us about once per second. I am happy with this solution. Here is the diff:


@@ -12813,7 +12815,7 @@
 
 
 /********************************************************************/
-INT rpc_server_receive(INT idx, int sock, BOOL check)
+INT rpc_server_receive1(INT idx, int sock, BOOL check)
 /********************************************************************\
 
   Routine: rpc_server_receive
@@ -13047,7 +13049,28 @@
    return status;
 }
 
+/********************************************************************/
+INT rpc_server_receive(INT idx, int sock, BOOL check)
+{
+  static int level = 0;
+  int status;
 
+  // Provide protection against recursive calls to rpc_server_receive() and rpc_execute()
+  // via rpc_execute() calls tr_stop() calls cm_yield() calls ss_suspend() calls rpc_execute()
+
+  if (level != 0) {
+    //printf("*** enter rpc_server_receive level %d, idx %d sock %d %d -- protection against recursive use!\n", level, idx, sock, check);
+    return SS_SUCCESS;
+  }
+
+  level++;
+  //printf(">>> enter rpc_server_receive level %d, idx %d sock %d %d\n", level, idx, sock, check);
+  status = rpc_server_receive1(idx, sock, check);
+  //printf("<<< exit rpc_server_receive level %d, idx %d sock %d %d, status %d\n", level, idx, sock, check, status);
+  level--;
+  return status;
+}
+
 /********************************************************************/
 INT rpc_server_shutdown(void)
 /********************************************************************\


ladd02:trinat~/packages/midas>svn info src/midas.c
Path: src/midas.c
Name: midas.c
URL: svn+ssh://svn@savannah.psi.ch/repos/meg/midas/trunk/src/midas.c
Repository Root: svn+ssh://svn@savannah.psi.ch/repos/meg/midas
Repository UUID: 050218f5-8902-0410-8d0e-8a15d521e4f2
Revision: 5297
Node Kind: file
Schedule: normal
Last Changed Author: olchanski
Last Changed Rev: 5294
Last Changed Date: 2012-06-15 10:45:35 -0700 (Fri, 15 Jun 2012)
Text Last Updated: 2012-06-29 17:05:14 -0700 (Fri, 29 Jun 2012)
Checksum: 8d7907bd60723e401a3fceba7cd2ba29

K.O.
    Reply  13 Jul 2012, Stefan Ritt, Bug Report, Crash after recursive use of rpc_execute() 
> Then I realized that I see a recursive call to rpc_execute(): rpc_execute() calls tr_stop() calls cm_yield() calls 
> ss_suspend() calls rpc_execute(). The second rpc_execute successfully completes, but leave corrupted 
> data for the original rpc_execute(), which happily crashes. At the moment of the crash, recursive call to 
> rpc_execute() is no longer visible.

This is really strange. I did not protect rpc_execute against recursive calls since this should not happen. rpc_server_receive() is linked to rpc_call() on the client side. So there cannot be 
several rpc_call() since there I do the recursive checking (also multi-thread checking) via a mutex. See line 10142 in midas.c. So there CANNOT be recursive calls to rpc_execute() because 
there cannot be recursive calls to rpc_server_receive(). But apparently there are, according to your stack trace.

So even if your patch works fine, I would like to know where the recursive calls to rpc_server_receive() come from. Since we have one subprocess of mserver for each client, there should only 
be one client connected to each mserver process, and the client is protected via the mutex in rpc_call(). Can you please debug this? I would like to understand what is going on there. Maybe 
there is a deeper underlying problem, which we had better solve, otherwise it might fall back on us in the future.

For debugging, you have to see what commands rpc_call() sends and what rpc_server_receive() gets, maybe by writing this into a common file together with a time stamp.

SR
Entry  20 Jun 2012, Konstantin Olchanski, Info, lazylogger write to HADOOP HDFS 
I tried using the lazylogger "Disk" method to write into a HADOOP HDFS clustered filesystem and found a 
number of problems. I ended up replacing the lazylogger lazy_copy() function that still uses former YBOS 
code with a new lazy_disk_copy() function that uses generic fread/fwrite. Also fixed the situation where 
lazylogger cannot cleanly stop from the mhttpd "programs/stop" button while it is busy writing (the fix 
works only for the "Disk" method).
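
A generic fread/fwrite copy loop of the kind mentioned above might look like the following. This is a sketch under stated assumptions, not the actual lazy_disk_copy() from lazylogger: copy_file() is a hypothetical name and the 64 KiB chunk size is an arbitrary choice.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not the lazylogger code: copy src_path to
   dst_path in chunks. Returns the number of bytes copied, or -1 on
   any error. */
long copy_file(const char *src_path, const char *dst_path)
{
   char chunk[65536];
   long total = 0;
   size_t n;
   FILE *src, *dst;

   assert(src_path != NULL && dst_path != NULL);
   src = fopen(src_path, "rb");
   if (src == NULL)
      return -1;
   dst = fopen(dst_path, "wb");
   if (dst == NULL) {
      fclose(src);
      return -1;
   }
   while ((n = fread(chunk, 1, sizeof(chunk), src)) > 0) {
      if (fwrite(chunk, 1, n, dst) != n) {
         total = -1;             /* short write, e.g. destination full */
         break;
      }
      total += (long) n;
   }
   if (ferror(src))
      total = -1;                /* read error on the source file */
   if (fclose(dst) != 0)
      total = -1;                /* buffered data could not be flushed */
   fclose(src);
   return total;
}
```

Because only standard C I/O is used, the same loop works on any mounted filesystem, which is the property that matters for a destination such as /hdfs/users/trinat/data.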

(Note that one can also use the "Script" method for writing into HDFS)

Anyhow, the new lazylogger writes into HDFS just fine and I expect that it would also work for writing into 
DCACHE using PNFS (if ever we get the SL6 PNFS working with our DCACHE servers).

Writing into our test HDFS cluster runs at about 20 MiBytes/sec for 1GB files with replication set to 3.

svn rev 5295
K.O.
    Reply  29 Jun 2012, Konstantin Olchanski, Info, lazylogger write to HADOOP HDFS 
> Anyhow, the new lazylogger writes into HDFS just fine and I expect that it would also work for writing into 
> DCACHE using PNFS (if ever we get the SL6 PNFS working with our DCACHE servers).
> 
> Writing into our test HDFS cluster runs at about 20 MiBytes/sec for 1GB files with replication set to 3.

Minor update to lazylogger and mlogger:

lazylogger default timeout 60 sec is too short for writing into HDFS - changed to 10 min.
mlogger checks for free space were insufficient and it would fill the output disk to 100% full before stopping 
the run. Now for disks bigger than 100GB, it will stop the run if there is less than 1GB of free space. (100% 
disk full would break the history and the elog if they happen to be on the same disk).

Also I note that mlogger.cxx rev 5297 includes a fix for a performance bug introduced about 6 months ago (mlogger 
would query free disk space after writing each event; depending on your filesystem configuration and the event 
rate, this bug was observed to severely reduce the midas disk writing performance).

svn rev 5296, 5297
K.O.

P.S. I use these lazylogger settings for writing to HDFS. Write speed varies around 10-20-30 Mbytes/sec (4-node 
cluster, 3 replicas of each file).

[local:trinat_detfac:S]Settings>pwd
/Lazy/HDFS/Settings
[local:trinat_detfac:S]Settings>ls -l
Key name                        Type    #Val  Size  Last Opn Mode Value
---------------------------------------------------------------------------
Period                          INT     1     4     7m   0   RWD  10
Maintain free space (%)         INT     1     4     7m   0   RWD  20
Stay behind                     INT     1     4     7m   0   RWD  0
Alarm Class                     STRING  1     32    7m   0   RWD  
Running condition               STRING  1     128   7m   0   RWD  ALWAYS
Data dir                        STRING  1     256   7m   0   RWD  /home/trinat/online/data
Data format                     STRING  1     8     7m   0   RWD  MIDAS
Filename format                 STRING  1     128   7m   0   RWD  run*
Backup type                     STRING  1     8     7m   0   RWD  Disk
Execute after rewind            STRING  1     64    7m   0   RWD  
Path                            STRING  1     128   7m   0   RWD  /hdfs/users/trinat/data
Capacity (Bytes)                FLOAT   1     4     7m   0   RWD  5e+09
List label                      STRING  1     128   7m   0   RWD  HDFS
Execute before writing file     STRING  1     64    7m   0   RWD  
Execute after writing file      STRING  1     64    7m   0   RWD  
Modulo.Position                 STRING  1     8     7m   0   RWD  
Tape Data Append                BOOL    1     4     7m   0   RWD  y

K.O.
Entry  20 Jun 2012, Konstantin Olchanski, Info, midas vme benchmarks lxdaq09cpu.gif lxdaq09net.gif ladd02cpu.gif ladd02net.gif
I am recording here the results from a test VME system using two VF48 waveform digitizers and a 64-bit 
dual-core VME processor (V7865). VF48 data suppression is off, VF48 modules set to read 48 channels, 
1000 ADC samples each. mlogger data compression is enabled (gzip -1).

Event rate is about 200/sec
VME Data rate is about 40 Mbytes/sec
System is 100% busy (estimate)
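
As a rough cross-check of the quoted rates, the event size can be estimated from the VF48 settings. This is a sketch under one loud assumption: 2 bytes per ADC sample, which is not stated in this entry. Headers and other per-event overhead are ignored.

```python
BYTES_PER_SAMPLE = 2  # assumption: not stated in this entry


def event_size_bytes(modules, channels=48, samples=1000,
                     bytes_per_sample=BYTES_PER_SAMPLE):
    """Raw waveform payload of one event, without headers or overhead."""
    return modules * channels * samples * bytes_per_sample


def data_rate_mbytes(modules, events_per_sec):
    """Payload data rate in Mbytes/sec (1 Mbyte = 1e6 bytes)."""
    return event_size_bytes(modules) * events_per_sec / 1e6


two_module_rate = data_rate_mbytes(modules=2, events_per_sec=200)
print(event_size_bytes(2), "bytes/event,", two_module_rate, "Mbytes/sec")
```

Under that assumption two modules at 200 events/sec come out a little below the quoted "about 40 Mbytes/sec".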

System utilization of host computer (dual-core 2.2GHz, dual-channel DDR333 RAM):

(note high CPU use by mlogger for gzip compression of midas files)

top - 12:23:45 up 68 days, 20:28,  3 users,  load average: 1.39, 1.22, 1.04
Tasks: 193 total,   3 running, 190 sleeping,   0 stopped,   0 zombie
Cpu(s): 32.1%us,  6.2%sy,  0.0%ni, 54.4%id,  2.7%wa,  0.1%hi,  4.5%si,  0.0%st
Mem:   3925556k total,  3797440k used,   128116k free,     1780k buffers
Swap: 32766900k total,        8k used, 32766892k free,  2970224k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                   
 5169 trinat    20   0  246m 108m  97m R 64.3  2.8  29:36.86 mlogger                                    
 5771 trinat    20   0  119m  98m  97m R 14.9  2.6 139:34.03 mserver                                    
 6083 root      20   0     0    0    0 S  2.0  0.0   0:35.85 flush-9:3                                  
 1097 root      20   0     0    0    0 S  0.9  0.0  86:06.38 md3_raid1        

System utilization of VME processor (dual-core 2.16 GHz, single-channel DDR2 RAM):

(note the more than 100% CPU use of multithreaded fevme)

top - 12:24:49 up 70 days, 19:14,  2 users,  load average: 1.19, 1.05, 1.01
Tasks: 103 total,   1 running, 101 sleeping,   1 stopped,   0 zombie
Cpu(s):  6.3%us, 45.1%sy,  0.0%ni, 47.7%id,  0.0%wa,  0.2%hi,  0.6%si,  0.0%st
Mem:   1019436k total,   866672k used,   152764k free,     3576k buffers
Swap:        0k total,        0k used,        0k free,    20976k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                   
19740 trinat    20   0  177m 108m  984 S 104.5 10.9   1229:00 fevme_gef.exe                             
 1172 ganglia   20   0  416m  99m 1652 S  0.7 10.0   1101:59 gmond                                      
32353 olchansk  20   0 19240 1416 1096 R  0.2  0.1   0:00.05 top                                        
  146 root      15  -5     0    0    0 S  0.1  0.0  42:52.98 kslowd001       

Attached are the CPU and network ganglia plots from lxdaq09 (VME) and ladd02 (host).

The regular bursts of "network out" on ladd02 are lazylogger writing mid.gz files to HADOOP HDFS.

K.O.
    Reply  20 Jun 2012, Konstantin Olchanski, Info, midas vme benchmarks 
> I am recording here the results from a test VME system using two VF48 waveform digitizers

Note 1: data compression is about 89% (hence "data to disk" rate is much smaller than the "data from VME" rate)
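
To put a number on Note 1, here is a small sketch that assumes "89% compression" means the compressed file is 89% smaller than the input (so 11% of the data reaches the disk). The function name is made up for illustration.

```python
def rate_to_disk(input_rate, compression_percent):
    """Output rate after compression, in the same units as input_rate.

    Assumes compression_percent is the size reduction, i.e. 0% means the
    output is as large as the input.
    """
    return input_rate * (1.0 - compression_percent / 100.0)


disk_rate = rate_to_disk(40.0, 89.0)  # 40 Mbytes/sec from VME, 89% compression
print(round(disk_rate, 1), "Mbytes/sec to disk")
```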

Note 2: switch from VME MBLT64 block transfer to 2eVME block transfer:
- raises the VME data rate from 40 to 48 M/s
- event rate from 220/sec to 260/sec
- mlogger CPU use from 64% to about 80%

This is consistent with the measured VME block transfer rates for the VF48 module: MBLT64 is about 40 M/s, 2eVME is about 50 M/s (could be 
80 M/s if no clock cycles were lost to sync VME signals with the VF48 clocks). 2eSST is implemented but cannot be used: the VF48 cannot drive the 
VME BERR and RETRY signals. (Evil standards, grumble, grumble, grumble.)

K.O.
       Reply  24 Jun 2012, Konstantin Olchanski, Info, midas vme benchmarks Scalers_(1).gif
> > I am recording here the results from a test VME system using two VF48 waveform digitizers

(I now have 4 VF48 waveform digitizers, so the event rates are half of those reported before. Data rate
is up to 51 M/s - event size has doubled, per-event overhead is the same, so the effective data rate goes 
up).

This message demonstrates the effects of tuning the MIDAS system for high rate data taking.

Attached is the history plot of the event rate counters which show the real-time performance of the MIDAS 
system with better detail compared to the average event rate reported on the MIDAS status page. For an 
ideal real-time system, the event rate should be a constant, without any drop-outs.

Seen on the plot:

run 75: the periodic dropouts in the event rate correspond to the lazylogger writing data into HADOOP 
HDFS. Clearly the host computer cannot keep up with both data taking and data archiving at the same 
time. (see the output of "top" "with HDFS" and "without HDFS" below)

run 76: SYSTEM buffer size increased from 100Mbytes to 300Mbytes. Maybe there is an improvement.

run 77-78: "event_buffer_size" inside the multithreaded (EQ_MULTITHREAD) VME frontend increased from 
100Mbytes to 300Mbytes. (6 seconds of data at 50M/s). Much better, yes?

Conclusion: for improved real-time performance, there should be sufficient buffering between the VME 
frontend readout thread and the mlogger data compression thread.

For benchmark hardware, at 50M/s, 4 seconds of buffer space (100M in the SYSTEM buffer and 100M in 
the frontend) is not enough. 12 seconds of buffer space (300+300) is much better. (Or buy a faster 
backend computer).
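
The "seconds of buffer space" figures above are simply total buffer size divided by data rate. A small sketch of that arithmetic (the function name is made up for illustration):

```python
def buffer_seconds(system_mbytes, frontend_mbytes, rate_mbytes_per_sec):
    """How long the two buffers together can absorb data at the given rate."""
    return (system_mbytes + frontend_mbytes) / rate_mbytes_per_sec


before_tuning = buffer_seconds(100, 100, 50)  # SYSTEM 100M + frontend 100M
after_tuning = buffer_seconds(300, 300, 50)   # SYSTEM 300M + frontend 300M
print(before_tuning, "sec before,", after_tuning, "sec after")
```

This only counts buffer capacity; it says nothing about how long the backend actually stalls during an HDFS write.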


P.S. HDFS data rate as measured by lazylogger is around 20M/s for CDH3 HADOOP and around 30M/s for 
CDH4 HADOOP.

P.S. Observe the ever present unexplained event rate fluctuations between 130-140 event/sec.


K.O.


---- "top" output during normal data taking, notice mlogger data compression consumes 99% CPU at 51 
M/s data rate.

top - 08:55:22 up 72 days, 17:00,  5 users,  load average: 2.47, 2.32, 2.27
Tasks: 206 total,   2 running, 204 sleeping,   0 stopped,   0 zombie
Cpu(s): 52.2%us,  6.1%sy,  0.0%ni, 34.4%id,  0.8%wa,  0.1%hi,  6.2%si,  0.0%st
Mem:   3925556k total,  3064928k used,   860628k free,     3788k buffers
Swap: 32766900k total,   200704k used, 32566196k free,  2061048k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                
 5826 trinat    20   0  437m 291m 287m R 97.6  7.6 636:39.63 mlogger                                                 
27617 trinat    20   0  310m 288m 288m S 24.6  7.5   6:59.28 mserver                                                 
 1806 ganglia   20   0  415m  62m 1488 S  0.9  1.6 668:43.55 gmond       


--- "top" output during lazylogger/HDFS activity. Observe high CPU use by lazylogger and fuse_dfs (the 
HADOOP HDFS client). Observe that CPU use adds up to 167% out of 200% available.

top - 08:57:16 up 72 days, 17:01,  5 users,  load average: 2.65, 2.35, 2.29
Tasks: 206 total,   2 running, 204 sleeping,   0 stopped,   0 zombie
Cpu(s): 57.6%us, 23.1%sy,  0.0%ni,  8.1%id,  0.0%wa,  0.4%hi, 10.7%si,  0.0%st
Mem:   3925556k total,  3642136k used,   283420k free,     4316k buffers
Swap: 32766900k total,   200692k used, 32566208k free,  2597752k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                
 5826 trinat    20   0  437m 291m 287m R 68.7  7.6 638:24.07 mlogger                                                 
23450 root      20   0 1849m 200m 4472 S 64.4  5.2  75:35.64 fuse_dfs                                                
27617 trinat    20   0  310m 288m 288m S 18.5  7.5   7:22.06 mserver                                                 
26723 trinat    20   0 38720  11m 1172 S 17.9  0.3  22:37.38 lazylogger                                              
 7268 trinat    20   0 1007m  35m 4004 D  1.3  0.9 187:14.52 nautilus                                                
 1097 root      20   0     0    0    0 S  0.8  0.0 101:45.55 md3_raid1   
          Reply  25 Jun 2012, Stefan Ritt, Info, midas vme benchmarks 
> P.S. Observe the ever present unexplained event rate fluctuations between 130-140 event/sec.

An important aspect of optimizing your system is to keep the network traffic under control. I use GBit Ethernet between FE and BE, and make sure the switch 
can accommodate all accumulated network traffic through its backplane. This way I do not have any TCP retransmits which kill you. Like if a single low-level 
ethernet packet is lost due to collision, the TCP stack retransmits it. Depending on the local settings, this can be after a timeout of one (!) second, which 
punches already a hole in your data rate. On the MSCB system actually I use UDP packets, where I schedule the retransmit myself. For a LAN, 10-100ms timeout 
is there enough. The one second is optimized for a WAN (like between two continents) where this is fine, but it is not what you want on a LAN system. Also 
make sure that the outgoing traffic (lazylogger) uses a different network card than the incoming traffic. I found that this also helps a lot.

- Stefan
             Reply  25 Jun 2012, Konstantin Olchanski, Info, midas vme benchmarks 
> > P.S. Observe the ever present unexplained event rate fluctuations between 130-140 event/sec.
> 
> An important aspect of optimizing your system is to keep the network traffic under control. I use GBit Ethernet between FE and BE, and make sure the switch 
> can accommodate all accumulated network traffic through its backplane. This way I do not have any TCP retransmits which kill you. Like if a single low-level 
> ethernet packet is lost due to collision, the TCP stack retransmits it. Depending on the local settings, this can be after a timeout of one (!) second, which 
> punches already a hole in your data rate. On the MSCB system actually I use UDP packets, where I schedule the retransmit myself. For a LAN, 10-100ms timeout 
> is there enough. The one second is optimized for a WAN (like between two continents) where this is fine, but it is not what you want on a LAN system. Also 
> make sure that the outgoing traffic (lazylogger) uses a different network card than the incoming traffic. I found that this also helps a lot.
> 

In typical applications at TRIUMF we do not set up a private network for the data traffic - data from VME to backend computer
and data from backend computer to DCACHE all go through the TRIUMF network.

This is justified by the required data rates - the highest data rate experiment running right now is PIENU - running
at about 10 M/s sustained, nominally April through December. (This is 20% of the data rate of the present benchmark).

The next highest data rate experiment is T2K/ND280 in Japan running at about 20 M/s (neutrino beam, data rate
is dominated by calibration events).

All other experiments at TRIUMF run at lower data rates (low intensity light ion beams), but we are planning for an experiment
that will run at 300 M/s sustained over 1 week of scheduled beam time.

But we do have the technical capability to separate data traffic from the TRIUMF network - the VME processors and
the backend computers all have dual GigE NICs.

(I did not say so, but obviously the present benchmark at 50 M/s VME to backend and 20-30 M/s from backend to HDFS is a GigE network).

(I am not monitoring the TCP loss and retransmit rates at present time)
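
For anyone who does want to watch the retransmit rate, a minimal sketch: on Linux the TCP counters are exposed in /proc/net/snmp as a "Tcp:" header line followed by a "Tcp:" value line, and the fields OutSegs and RetransSegs can be read from there. The function names below are made up for illustration; this is not part of MIDAS.

```python
def tcp_retransmit_counters(snmp_text):
    """Return (OutSegs, RetransSegs) from the text of /proc/net/snmp."""
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("no Tcp: header/value pair found")
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    counters = dict(zip(names, values))
    return int(counters["OutSegs"]), int(counters["RetransSegs"])


def read_tcp_retransmit_counters(path="/proc/net/snmp"):
    """Read the counters from the running Linux kernel."""
    with open(path) as f:
        return tcp_retransmit_counters(f.read())
```

The counters are cumulative since boot, so the useful number is the difference between two readings taken some seconds apart, for example before and after a lazylogger burst.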

(The network switch between VME and backend is the "cheapest available" rackmountable 8-port GigE switch. The network between
the backend and the HDFS nodes is mostly Nortel 48-port GigE edge switches with single-GigE uplinks to the core router).

K.O.
          Reply  26 Jun 2012, Konstantin Olchanski, Info, midas vme benchmarks canvas.pdf
> > > I am recording here the results from a test VME system using four VF48 waveform digitizers

Now we look at the detail of the event readout, or if you want, the real-time properties of the MIDAS 
multithreaded VME frontend program.

The benchmark system includes a TRIUMF-made VME-NIMIO32 VME trigger module which records the 
time of the trigger and provides a 20 MHz timestamp register. The frontend program is instrumented to 
save the trigger time and readout timing data into a special "trigger" bank ("VTR0"). The ROOTANA-based 
MIDAS analyzer is used to analyze this data and to make these plots.

Timing data is recorded like this:

NIM trigger signal ---> latched into the IO32 trigger time register (VTR0 "trigger time")
...
int read_event(pevent, etc) {
   VTR0 "trigger time" = io32->latched_trigger_time();
   VTR0 "readout start time" = io32->timestamp();
   read the VF48 data
   io32->release_busy();
   VTR0 "readout end time" = io32->timestamp();
}

From the VTR0 time data, we compute these values:

1) "trigger latency" = "readout start time" - "trigger time" --- the time it takes us to "see" the trigger
2) "readout time" = "readout end time" - "readout start time" --- the time it takes to read the VF48 data
3) "busy time" = "readout end time" - "trigger time" --- time during which the "DAQ busy" trigger veto is 
active.
also computed is
4) "time between events" = "trigger time" - "time of previous trigger"
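
The four quantities above can be sketched in a few lines, assuming the three VTR0 values are raw counts of the 20 MHz timestamp (20 ticks per usec). The 32-bit counter width used for wrap-around handling is an assumption, and the function names are made up for illustration; this is not the ROOTANA analyzer code.

```python
TICKS_PER_US = 20.0  # 20 MHz timestamp register
COUNTER_BITS = 32    # assumption: register width is not stated in this entry


def ticks_to_us(later, earlier, bits=COUNTER_BITS):
    """Elapsed usec between two counter readings, allowing one wrap-around."""
    return ((later - earlier) % (1 << bits)) / TICKS_PER_US


def timing(trigger, start, end, prev_trigger):
    """Compute the four timing quantities (all in usec) for one event."""
    return {
        "trigger latency": ticks_to_us(start, trigger),
        "readout time": ticks_to_us(end, start),
        "busy time": ticks_to_us(end, trigger),
        "time between events": ticks_to_us(trigger, prev_trigger),
    }


# Made-up counter values, chosen to resemble the numbers discussed in this thread
example = timing(trigger=200_000, start=200_100, end=350_100, prev_trigger=0)
print(example)
```

Note that "busy time" is by construction the sum of "trigger latency" and "readout time".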

And plot them on the attached graphs:

1) "trigger latency" - we see average trigger latency is 5 usec with hardly any events taking more than 10 
usec (notice the log Y scale!). Also notice that there are 35 events that took longer than 100 usec (0.7% out 
of 5000 events).

So how "real time" is this? For "hard real time" the trigger latency should never exceed some maximum, 
which is determined by formal analysis or experimentally (in which case it will carry an experimental error 
bar - "response time is always less than X usec with probability 99.9...%" - the better system will have 
smaller X and more nines). Since I did not record the maximum latency, I can only claim that the 
"response time is always less than 1 sec, I am pretty sure of it".

For "soft real time" systems, such as subatomic particle physics DAQ systems, one is permitted to exceed 
that maximum response time, but "not too often". Such systems are characterized by the quantities 
derived from the present plot (mean response time, frequency of exceeding some deadlines, etc). The 
quality of a soft real time system is usually judged by non-DAQ criteria (i.e. if the DAQ for the T2K/ND280 
experiment does not respond within 20 msec, a neutrino beam spill can be lost and the experiment is 
required to report the number of lost spills to the weekly facility management meeting).

Can the trigger latency be improved by using interrupts instead of polling? Remember that on most 
hardware, the VME and PCI bus access time is around 1 usec and trigger latency of 5-10 usec corresponds 
to roughly 5-10 reads of a PCI or VME register. So there is not much room for speed up. Consider that an 
interrupt handler has to perform at least 2-3 PCI register reads (to determine the source of the interrupt 
and to clear the interrupt condition), it has to wake up the right process and do a rather slow CPU context 
switch, maybe do a cross-CPU interrupt (if VME interrupts are routed to the wrong CPU core). All this 
takes time. Then the Linux kernel interrupt latency comes into play. All this is overhead absent in pure-
polling implementations. (Yes, burning a CPU core to poll for data is wasteful, but is there any other use 
for this CPU core? With a dual-core CPU, the 1st core polls for data, the 2nd core runs mfe.c, the TCP/IP 
stack and the ethernet transmitter.)

2) "readout time" - between 7 and 8 msec, corresponding to the 50 Mbytes/sec VME block transfer rate. 
No events taking more than 10 msec. (Could claim hard real time performance here).

3) "busy time" - for the simple benchmark system it is a boring sum of plots (1) and (2). The mean busy 
time ("dead time") goes straight into the formula for computing cross-sections (if that is what you do).

4) "time between events" - provides an independent measurement of dead time - one can see that no 
event takes less than 7 msec to process and 27 events took longer than 10 msec (0.65% out of 4154 
events). If the trigger were cosmic rays instead of a pulser, this plot would also measure the cosmic ray 
event rate - one would see the exponential distribution of the time between events of a Poisson process 
(linear on a log scale, with the slope being the cosmic event rate).
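
To illustrate the cosmic ray remark: for a random (Poisson) trigger the time between events follows exp(-rate*t), so the log of the histogram is a straight line with slope -rate, and the rate can also be estimated as 1 / (mean interval). The sketch below uses simulated intervals with a made-up 5 Hz rate, not measured data, and it ignores the 7 msec dead time, which in a real plot would remove the shortest intervals.

```python
import random


def simulate_intervals(rate_hz, n, seed=1):
    """Draw n time intervals (sec) between events of a Poisson process."""
    rng = random.Random(seed)
    return [rng.expovariate(rate_hz) for _ in range(n)]


def estimate_rate(intervals):
    """Maximum-likelihood rate estimate: number of intervals / total time."""
    return len(intervals) / sum(intervals)


intervals = simulate_intervals(rate_hz=5.0, n=20000)  # hypothetical 5 Hz rate
estimated = estimate_rate(intervals)
print("estimated rate: %.2f Hz" % estimated)
```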


K.O.
             Reply  26 Jun 2012, Konstantin Olchanski, Info, midas vme benchmarks Scalers.gif ladd02-cpu.png ladd02-net.png canvas-1000-100Hz.pdf
> > > > I am recording here the results from a test VME system using four VF48 waveform digitizers

Last message from this series. After all the tuning, I reduce the trigger rate from 120 Hz to 100 Hz to see
what happens when the backend computer is not overloaded and has some spare capacity.

event rate: 100 Hz (down from 120 Hz)
data rate: 37 Mbytes/sec (down from 50 M/s)
mlogger cpu use: 65% (down from 99%)

Attached:

1) trigger rate event plot: now the rate is a solid 100 Hz without dropouts
2) CPU and Network plots from ganglia: the spikes are lazylogger saving mid.gz files to HDFS storage
3) time structure plots:
a) trigger latency: mean 5 us, most below 10 us, 59 events (0.046%) longer than 100 us, longest latency observed is 7000 us (bottom left graph).
b) readout time is 7000-8000 us (same as before - VME data rate is independent of the trigger rate)
c) busy time: mean 7.2 ms, 12 events (0.0094%) longer than 10 ms, longest busy time ever observed is 17 ms (bottom middle graph)
d) time between events is 10 ms (100 Hz pulser trigger), 1 event was missed about 10 times (spike at 20 ms) (0.0085%), more than 1 event was never missed (no 
spike at 30 ms, 40 ms, etc).


CPU use on the backend computer:

top - 16:30:59 up 75 days, 35 min,  6 users,  load average: 0.98, 0.99, 1.01
Tasks: 206 total,   3 running, 203 sleeping,   0 stopped,   0 zombie
Cpu(s): 39.3%us,  8.2%sy,  0.0%ni, 39.4%id,  5.7%wa,  0.3%hi,  7.2%si,  0.0%st
Mem:   3925556k total,  3404192k used,   521364k free,     8792k buffers
Swap: 32766900k total,   296304k used, 32470596k free,  2477268k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 5826 trinat    20   0  441m 292m 287m R 65.8  7.6   2215:16 mlogger            
26756 trinat    20   0  310m 288m 288m S 16.8  7.5  34:32.03 mserver            
29005 olchansk  20   0  206m  39m  17m R 14.7  1.0  26:19.42 ana_vf48.exe       
 7878 olchansk  20   0   99m 3988  740 S  7.7  0.1  27:06.34 sshd               
29012 trinat    20   0  314m 288m 288m S  2.8  7.5   4:22.14 mserver            
23317 root      20   0     0    0    0 S  1.4  0.0  24:21.52 flush-9:3     


K.O.
    Reply  21 Jun 2012, Stefan Ritt, Info, midas vme benchmarks Screen_Shot_2012-06-21_at_10.14.09_.png
Just for completeness: Attached is the VME transfer speed I get with the SIS3100/SIS1100 interface using 
2eVME transfer. This curve can be explained exactly with an overhead of 125 us per DMA transfer and a 
continuous link speed of 83 MB/sec.
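
The model described here (a fixed overhead per DMA transfer plus a constant link speed) can be written down in a few lines. This is a sketch of that model only, not a fit to the attached curve, and it assumes 1 MB = 1e6 bytes.

```python
OVERHEAD_S = 125e-6        # 125 us overhead per DMA transfer
LINK_BYTES_PER_S = 83e6    # 83 MB/sec link speed, assuming 1 MB = 1e6 bytes


def effective_rate(nbytes):
    """Effective transfer rate (bytes/sec) for one DMA transfer of nbytes."""
    return nbytes / (OVERHEAD_S + nbytes / LINK_BYTES_PER_S)


# Transfer size at which the overhead costs as much time as the data itself,
# so the effective rate is half of the link speed.
half_speed_size = OVERHEAD_S * LINK_BYTES_PER_S

for size in (1_000, 10_000, 100_000, 1_000_000):
    print(size, "bytes:", round(effective_rate(size) / 1e6, 1), "MB/sec")
```

In this model transfers much smaller than about 10 kbytes are dominated by the per-transfer overhead, and the rate approaches the link speed only for large blocks.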
       Reply  21 Jun 2012, Konstantin Olchanski, Info, midas vme benchmarks 
> Just for completeness: Attached is the VME transfer speed I get with the SIS3100/SIS1100 interface using 
> 2eVME transfer. This curve can be explained exactly with an overhead of 125 us per DMA transfer and a 
> continuous link speed of 83 MB/sec.

What VME module is on the other end?

K.O.
          Reply  22 Jun 2012, Stefan Ritt, Info, midas vme benchmarks 
> > Just for completeness: Attached is the VME transfer speed I get with the SIS3100/SIS1100 interface using 
> > 2eVME transfer. This curve can be explained exactly with an overhead of 125 us per DMA transfer and a 
> > continuous link speed of 83 MB/sec.
> 
> What VME module is on the other end?
> 
> K.O.

The PSI-built DRS4 board, where we implemented the 2eVME protocol in the Virtex II FPGA. The same speed can be obtained with the commercial 
VME memory module CI-VME64 from Chrislin Industries (see http://www.controlled.com/vme/chinp1.html).

Stefan
             Reply  24 Jun 2012, Konstantin Olchanski, Info, midas vme benchmarks 
> > > Just for completeness: Attached is the VME transfer speed I get with the SIS3100/SIS1100 interface using 
> > > 2eVME transfer. This curve can be explained exactly with an overhead of 125 us per DMA transfer and a 
> > > continuous link speed of 83 MB/sec.
>
> [with ...]  the PSI-built DRS4 board, where we implemented the 2eVME protocol in the Virtex II FPGA.

This is an interesting hardware benchmark. Do you also have benchmarks of the MIDAS system using the DRS4 (measurements
of end-to-end data rates, maximum event rate, maximum trigger rate, any tuning of the frontend program
and of the MIDAS experiment to achieve those rates, etc)?

K.O.