Back Midas Rome Roody Rootana
  Midas DAQ System, Page 28 of 142  Not logged in ELOG logo
ID Date Author Topicdown Subject
  814   25 Jun 2012 Stefan RittInfomidas vme benchmarks
> P.S. Observe the ever present unexplained event rate fluctuations between 130-140 event/sec.

An important aspect of optimizing your system is to keep the network traffic under control. I use GBit Ethernet between FE and BE, and make sure the switch 
can accomodate all accumulated network traffic through its backplane. This way I do not have any TCP retransmits which kill you. Like if a single low-level 
ethernet packet is lost due to collision, the TCP stack retransmits it. Depending on the local settings, this can be after a timeout of one (!) second, which 
punches already a hole in your data rate. On the MSCB system actually I use UDP packets, where I schedule the retransmit myself. For a LAN, 10-100ms timeout 
is there enough. The one second is optimized for a WAN (like between two continents) where this is fine, but it is not what you want on a LAN system. Also 
make sure that the outgoing traffic (lazylogger) uses a different network card than the incoming traffic. I found that this also helps a lot.

- Stefan
  815   25 Jun 2012 Konstantin OlchanskiInfomidas vme benchmarks
> > P.S. Observe the ever present unexplained event rate fluctuations between 130-140 event/sec.
> 
> An important aspect of optimizing your system is to keep the network traffic under control. I use GBit Ethernet between FE and BE, and make sure the switch 
> can accomodate all accumulated network traffic through its backplane. This way I do not have any TCP retransmits which kill you. Like if a single low-level 
> ethernet packet is lost due to collision, the TCP stack retransmits it. Depending on the local settings, this can be after a timeout of one (!) second, which 
> punches already a hole in your data rate. On the MSCB system actually I use UDP packets, where I schedule the retransmit myself. For a LAN, 10-100ms timeout 
> is there enough. The one second is optimized for a WAN (like between two continents) where this is fine, but it is not what you want on a LAN system. Also 
> make sure that the outgoing traffic (lazylogger) uses a different network card than the incoming traffic. I found that this also helps a lot.
> 

In typical applications at TRIUMF we do not setup a private network for the data traffic - data from VME to backend computer
and data from backend computer to DCACHE all go through the TRIUMF network.

This is justified by the required data rates - the highest data rate experiment running right now is PIENU - running
at about 10 M/s sustained, nominally April through December. (This is 20% of the data rate of the present benchmark).

The next highest data rate experiment is T2K/ND280 in Japan running at about 20 M/s (neutrino beam, data rate
is dominated by calibration events).

All other experiments at TRIUMF run at lower data rates (low intensity light ion beams), but we are planning for an experiment
that will run at 300 M/s sustained over 1 week of scheduled beam time.

But we do have the technical capability to separate data traffic from the TRIUMF network - the VME processors and
the backend computers all have dual GigE NICs.

(I did not say so, but obviously the present benchmark at 50 M/s VME to backend and 20-30 M/s from backend to HDFS is a GigE network).

(I am not monitoring the TCP loss and retransmit rates at present time)

(The network switch between VME and backend is a "the cheapest available" rackmountable 8-port GigE switch. The network between
the backend and the HDFS nodes is mostly Nortel 48-port GigE edge switches with single-GigE uplinks to the core router).

K.O.
  816   26 Jun 2012 Konstantin OlchanskiInfomidas vme benchmarks
> > > I am recording here the results from a test VME system using four VF48 waveform digitizers

Now we look at the detail of the event readout, or if you want, the real-time properties of the MIDAS 
multithreaded VME frontend program.

The benchmark system includes a TRIUMF-made VME-NIMIO32 VME trigger module which records the 
time of the trigger and provides a 20 MHz timestamp register. The frontend program is instrumented to 
save the trigger time and readout timing data into a special "trigger" bank ("VTR0"). The ROOTANA-based 
MIDAS analyzer is used to analyze this data and to make these plots.

Timing data is recorded like this:

NIM trigger signal ---> latched into the IO32 trigger time register (VTR0 "trigger time")
...
int read_event(pevent, etc) {
VTR0 "trigger time" = io32->latched_trigger_time();
VTR0 "readout start time" = io32->timestamp();
read the VF48 data
io32->release_busy();
VTR0 "readout end time" = io32->timestamp();
}

From the VTR0 time data, we compute these values:

1) "trigger latency" = "readout start time" - "trigger time" --- the time it takes us to "see" the trigger
2) "readout time" = "readout end time" - "readout start time" --- the time it takes to read the VF48 data
3) "busy time" = "readout end time" - "trigger time" --- time during which the "DAQ busy" trigger veto is 
active.
also computed is
4) "time between events" = "trigger time" - "time of previous trigger"

And plot them on the attached graphs:

1) "trigger latency" - we see average trigger latency is 5 usec with hardly any events taking more than 10 
usec (notice the log Y scale!). Also notice that there are 35 events that took longer that 100 usec (0.7% out 
of 5000 events).

So how "real time" is this? For "hard real time" the trigger latency should never exceed some maximum, 
which is determined by formal analysis or experimentally (in which case it will carry an experimental error 
bar - "response time is always less than X usec with probability 99.9...%" - the better system will have 
smaller X and more nines). Since I did not record the maximum latency, I can only claim that the 
"response time is always less than 1 sec, I am pretty sure of it".

For "soft real time" systems, such as subatomic particle physics DAQ systems, one is permitted to exceed 
that maximum response time, but "not too often". Such systems are characterized by the quantities 
derived from the present plot (mean response time, frequency of exceeding some deadlines, etc). The 
quality of a soft real time system is usually judged by non-DAQ criteria (i.e. if the DAQ for the T2K/ND280 
experiment does not respond within 20 msec, a neutrino beam spill an be lost and the experiment is 
required to report the number of lost spills to the weekly facility management meeting).

Can the trigger latency be improved by using interrupts instead of polling? Remember that on most 
hardware, the VME and PCI bus access time is around 1 usec and trigger latency of 5-10 usec corresponds 
to roughly 5-10 reads of a PCI or VME register. So there is not much room for speed up. Consider that an 
interrupt handler has to perform at least 2-3 PCI register reads (to determine the source of the interrupt 
and to clear the interrupt condition), it has to wake up the right process and do a rather slow CPU context 
switch, maybe do a cross-CPU interrupt (if VME interrupts are routed to the wrong CPU core). All this 
takes time. Then the Linux kernel interrupt latency comes into play. All this is overhead absent in pure-
polling implementations. (Yes, burning a CPU core to poll for data is wasteful, but is there any other use 
for this CPU core? With a dual-core CPU, the 1st core polls for data, the 2nd core runs mfe.c, the TCP/IP 
stack and the ethernet transmitter.)

2) "readout time" - between 7 and 8 msec, corresponding to the 50 Mbytes/sec VME block transfer rate. 
No events taking more than 10 msec. (Could claim hard real time performance here).

3) "busy time" - for the simple benchmark system it is a boring sum of plots (1) and (2). The mean busy 
time ("dead time") goes straight into the formula for computing cross-sections (if that is what you do).

4) "time between events" - provides an independent measurement of dead time - one can see that no 
event takes less than 7 msec to process and 27 events took longer than 10 msec (0.65% out of 4154 
events). If the trigger were cosmic rays instead of a pulser, this plot would also measure the cosmic ray 
event rate - one would see the exponential shape of the Poisson distribution (linear on Log scale, with the 
slope being the cosmic event rate).


K.O.
Attachment 1: canvas.pdf
canvas.pdf
  817   26 Jun 2012 Konstantin OlchanskiInfomidas vme benchmarks
> > > > I am recording here the results from a test VME system using four VF48 
waveform digitizers

Last message from this series. After all the tuning, I reduce the trigger rate 
from 120 Hz to 100 Hz to see
what happens when the backend computer is not overloaded and has some spare 
capacity.

event rate: 100 Hz (down from 120 Hz)
data rate: 37 Mbytes/sec (down from 50 M/s)
mlogger cpu use: 65% (down from 99%)

Attached:

1) trigger rate event plot: now the rate is solid 100 Hz without dropouts
2) CPU and Network plots frog ganglia: the spikes is lazylogger saving mid.gz 
files to HDFS storage
3) time structure plots:
a) trigger latency: mean 5 us, most below 10 us, 59 events (0.046%) longer than 
100 us, (bottom left graph) 7000 us is longest latency observed.
b) readout time is 7000-8000 us (same as before - VME data rate is independant 
from the trigger rate)
c) busy time: mean 7.2 us, 12 events (0.0094%) longer than 10 ms, longest busy 
time ever observed is 17 ms (bottom middle graph)
d) time between events is 10 ms (100 Hz pulser trigger), 1 event was missed 
about 10 times (spike at 20 ms) (0.0085%), more than 1 event missed never (no 
spike at 30 ms, 40 ms, etc).


CPU use on the backend computer:

top - 16:30:59 up 75 days, 35 min,  6 users,  load average: 0.98, 0.99, 1.01
Tasks: 206 total,   3 running, 203 sleeping,   0 stopped,   0 zombie
Cpu(s): 39.3%us,  8.2%sy,  0.0%ni, 39.4%id,  5.7%wa,  0.3%hi,  7.2%si,  0.0%st
Mem:   3925556k total,  3404192k used,   521364k free,     8792k buffers
Swap: 32766900k total,   296304k used, 32470596k free,  2477268k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 5826 trinat    20   0  441m 292m 287m R 65.8  7.6   2215:16 mlogger            
26756 trinat    20   0  310m 288m 288m S 16.8  7.5  34:32.03 mserver            
29005 olchansk  20   0  206m  39m  17m R 14.7  1.0  26:19.42 ana_vf48.exe       
 7878 olchansk  20   0   99m 3988  740 S  7.7  0.1  27:06.34 sshd               
29012 trinat    20   0  314m 288m 288m S  2.8  7.5   4:22.14 mserver            
23317 root      20   0     0    0    0 S  1.4  0.0  24:21.52 flush-9:3     


K.O.
Attachment 1: Scalers.gif
Scalers.gif
Attachment 2: ladd02-cpu.png
ladd02-cpu.png
Attachment 3: ladd02-net.png
ladd02-net.png
Attachment 4: canvas-1000-100Hz.pdf
canvas-1000-100Hz.pdf
  818   29 Jun 2012 Konstantin OlchanskiInfolazylogger write to HADOOP HDFS
> Anyhow, the new lazylogger writes into HDFS just fine and I expect that it would also work for writing into 
> DCACHE using PNFS (if ever we get the SL6 PNFS working with our DCACHE servers).
> 
> Writing into our test HDFS cluster runs at about 20 MiBytes/sec for 1GB files with replication set to 3.

Minor update to lazylogger and mlogger:

lazylogger default timeout 60 sec is too short for writing into HDFS - changed to 10 min.
mlogger checks for free space were insufficient and it would fill the output disk to 100% full before stopping 
the run. Now for disks bigger than 100GB, it will stop the run if there is less than 1GB of free space. (100% 
disk full would break the history and the elog if they happen to be on the same disk).

Also I note that mlogger.cxx rev 5297 includes a fix for a performance bug introduced about 6 month ago (mlogger 
would query free disk space after writing each event - depending on your filesystem configuration and the event 
rate, this bug was observed to extremely severely reduce the midas disk writing performance).

svn rev 5296, 5297
K.O.

P.S. I use these lazylogger settings for writing to HDFS. Write speed varies around 10-20-30 Mbytes/sec (4-node 
cluster, 3 replicas of each file).

[local:trinat_detfac:S]Settings>pwd
/Lazy/HDFS/Settings
[local:trinat_detfac:S]Settings>ls -l
Key name                        Type    #Val  Size  Last Opn Mode Value
---------------------------------------------------------------------------
Period                          INT     1     4     7m   0   RWD  10
Maintain free space (%)         INT     1     4     7m   0   RWD  20
Stay behind                     INT     1     4     7m   0   RWD  0
Alarm Class                     STRING  1     32    7m   0   RWD  
Running condition               STRING  1     128   7m   0   RWD  ALWAYS
Data dir                        STRING  1     256   7m   0   RWD  /home/trinat/online/data
Data format                     STRING  1     8     7m   0   RWD  MIDAS
Filename format                 STRING  1     128   7m   0   RWD  run*
Backup type                     STRING  1     8     7m   0   RWD  Disk
Execute after rewind            STRING  1     64    7m   0   RWD  
Path                            STRING  1     128   7m   0   RWD  /hdfs/users/trinat/data
Capacity (Bytes)                FLOAT   1     4     7m   0   RWD  5e+09
List label                      STRING  1     128   7m   0   RWD  HDFS
Execute before writing file     STRING  1     64    7m   0   RWD  
Execute after writing file      STRING  1     64    7m   0   RWD  
Modulo.Position                 STRING  1     8     7m   0   RWD  
Tape Data Append                BOOL    1     4     7m   0   RWD  y

K.O.
  822   27 Jul 2012 Cheng-Ju LinInfoMIDAS under Scientific Linux 6
Hi All,

I was wondering if anyone has attempted to install MIDAS under Scientific Linux 6?  I am planning to install 
Scientific Linux on one of the PCs in our lab to run MIDAS. I would like to know if anyone has been 
successful in getting MIDAS to run under SL6.  Thanks.

Cheng-Ju
  823   31 Jul 2012 Pierre-Andre AmaudruzInfoMIDAS under Scientific Linux 6
Hi Cheng-Ju,

Midas will install and run under SL6. We're presently running SL6.2.
Cheers, PAA

> Hi All,
> 
> I was wondering if anyone has attempted to install MIDAS under Scientific Linux 6?  I am planning to install 
> Scientific Linux on one of the PCs in our lab to run MIDAS. I would like to know if anyone has been 
> successful in getting MIDAS to run under SL6.  Thanks.
> 
> Cheng-Ju
  833   05 Sep 2012 Stefan RittInfoNew pipe compression implemented in mlogger
A new pipe compression has been implemented in mlogger thanks to Fedor Ignatov from BINP 
Novosibirsk. The way it works that the logger write into a pipe instead directly into a file. The pipe can 
then be connected to any compression program without the need to copile against any additional C 
library.

To use is, enter as the filename for example

|bzip2>run%05d.mid     (note the pipe '|' in front of the bzip2)

This way the data stream is run through the bzip2 program, which is known to have better compression 
ratio than gzip. Furthermore, the parallel version of bzip2 can be used, which spreads over all available 
CPU cures and speeds up compression almost linearly with the number of cores. This parallel version 
called pbzip2 can be found here:

http://compression.ca/pbzip2/

It can be easily compiled and installed. Using this method in the MEG experiment at PSI, we can compress 
our waveform data to 37% or it's original size (49% with gzip), and on 8 cores we get a compression rate 
of about 40 MBytes/sec (23 MBytes with gzip on a single core).

The disadvantage of that method is that one cannot see the compression ratio online, but this is not a big 
deal I guess. The new version has been committed as rev. 5324. 

/Stefan
  835   10 Sep 2012 Shaun MeadInfoMIDAS button to display image
Hi,

I've written a python script that reads some data from a file and generates a
.png image. I want to have a button on my MIDAS status page that:

- executes the script and waits for it to finish,
- then displays the image

How can I do that? I tried using the sequencer to just execute the script every
30 seconds, but I can't get it to work, and it would be better to only execute
the script on demand anyway. 

I also am having trouble getting image display to work. I have the ODB keys set:

[local:oven1:S]/Custom>ls
Temperature Map&                /home/deap/ovendaq/online/index.html
Images

[local:oven1:S]/Custom>ls Images/temps.png/           
Background                      /home/deap/ovendaq/online/temps.png

And the HTML file is just this:
<img src="temps.png">

But the image won't display. It shows a "broken" picture, and when I try to view
it directly it says: Invalid custom page: Page not found in ODB.

Any help would be appreciated...

Thanks
Shaun
  836   11 Sep 2012 Stefan RittInfoMIDAS button to display image
> Hi,
> 
> I've written a python script that reads some data from a file and generates a
> .png image. I want to have a button on my MIDAS status page that:
> 
> - executes the script and waits for it to finish,
> - then displays the image
> 
> How can I do that? I tried using the sequencer to just execute the script every
> 30 seconds, but I can't get it to work, and it would be better to only execute
> the script on demand anyway. 
> 
> I also am having trouble getting image display to work. I have the ODB keys set:
> 
> [local:oven1:S]/Custom>ls
> Temperature Map&                /home/deap/ovendaq/online/index.html
> Images
> 
> [local:oven1:S]/Custom>ls Images/temps.png/           
> Background                      /home/deap/ovendaq/online/temps.png
> 
> And the HTML file is just this:
> <img src="temps.png">
> 
> But the image won't display. It shows a "broken" picture, and when I try to view
> it directly it says: Invalid custom page: Page not found in ODB.
> 
> Any help would be appreciated...
> 
> Thanks
> Shaun


If you use the "custom" image system, you need to use GIF images. mhttpd can dynamically create GIF 
images, 
with a background image and overlaid labels, bar graphs etc. But mhttpd just contains a GIF library to do 
that 
in memory, but no PNG library.

Actually I would recommend you not to use a script to create an image, but use the custom image system 
to 
display temperatures. In the attachment you see an page from our experiment which contains a 
background image (the greyish boxes), labels (white temperature boxes), bar graphs (blue level boxes) 
and history pages (left side). This is all dynamically created inside mhttpd using the custom page system 
without any external script. All you have to do is to get the temperatures and levels inside the ODB via the 
slow control system. If you want, I can send you the full code for that page.

Cheers,
Stefan
Attachment 1: Screen_Shot_2012-09-11_at_14.36.56_.png
Screen_Shot_2012-09-11_at_14.36.56_.png
  854   24 Jan 2013 Konstantin OlchanskiInfoCompression benchmarks
In the DEAP experiment, the normal MIDAS mlogger gzip compression  is not fast enough for some data 
taking modes, so I am doing tests of other compression programs. Here is the results.

Executive summary:

fastest compression is no compression (cat at 1800 Mbytes/sec - memcpy speed), next best are:
"lzf" at 300 Mbytes/sec and  "lzop" at 250 Mbytes/sec with 50% compression
"gzip -1" at around 70 Mbytes/sec with around 70% compression
"bzip2" at around 12 Mbytes/sec with around 80% compression
"pbzip2", as advertised, scales bzip2 compression linearly with the number of CPUs to 46 Mbytes/sec (4 
real CPUs), then slower to a maximum 60 Mbytes/sec (8 hyper-threaded CPUs).

This confirms that our original choice of "gzip -1" method for compression using zlib inside mlogger is 
still a good choice. bzip2 can gain an additional 10% compression at the cost of 6 times more CPU 
utilization. lzo/lzf can do 50% compression at GigE network speed and at "normal" disk speed.

I think these numbers make a good case for adding lzo/lzf compression to mlogger.

Comments about the data:

- time measured is the "elapsed" time of the compression program. it excludes the time spent flushing 
the compressed output file to disk.
- the relevant number is the first rate number (input data rate)
- test machine has 32GB of RAM, so all I/O is cached, disk speed does not affect these results
- "cat" gives a measure of overall machine "speed" (but test file is too small to give precise measurement)
- "gzip -1" is the recommended MIDAS mlogger compression setting
- "pbzip2 -p8" uses 8 "hyper-threaded" CPUs, but machine only has 4 "real" CPU cores

<pre>
cat                 : time   0.2s, size    431379371    431379371, comp   0%, rate 1797M/s 1797M/s
cat                 : time   0.6s, size   1013573981   1013573981, comp   0%, rate 1809M/s 1809M/s
cat                 : time   1.1s, size   2027241617   2027241617, comp   0%, rate 1826M/s 1826M/s

gzip -1             : time   6.4s, size    431379371    141008293, comp  67%, rate  67M/s  22M/s
gzip                : time  30.3s, size    431379371    131017324, comp  70%, rate  14M/s   4M/s
gzip -9             : time  94.2s, size    431379371    133071189, comp  69%, rate   4M/s   1M/s

gzip -1             : time  15.2s, size   1013573981    347820209, comp  66%, rate  66M/s  22M/s
gzip -1             : time  29.4s, size   2027241617    638495283, comp  69%, rate  68M/s  21M/s

bzip2 -1            : time  34.4s, size    431379371     91905771, comp  79%, rate  12M/s   2M/s
bzip2               : time  33.9s, size    431379371     86144682, comp  80%, rate  12M/s   2M/s
bzip2 -9            : time  34.2s, size    431379371     86144682, comp  80%, rate  12M/s   2M/s

pbzip2 -p1          : time  34.9s, size    431379371     86152857, comp  80%, rate  12M/s   2M/s (1 CPU)
pbzip2 -p1 -1       : time  34.6s, size    431379371     91935441, comp  79%, rate  12M/s   2M/s
pbzip2 -p1 -9       : time  34.8s, size    431379371     86152857, comp  80%, rate  12M/s   2M/s

pbzip2 -p2          : time  17.6s, size    431379371     86152857, comp  80%, rate  24M/s   4M/s (2 CPU)
pbzip2 -p3          : time  11.9s, size    431379371     86152857, comp  80%, rate  36M/s   7M/s (3 CPU)
pbzip2 -p4          : time   9.3s, size    431379371     86152857, comp  80%, rate  46M/s   9M/s (4 CPU)
pbzip2 -p4          : time  45.3s, size   2027241617    384406870, comp  81%, rate  44M/s   8M/s
pbzip2 -p8          : time  33.3s, size   2027241617    384406870, comp  81%, rate  60M/s  11M/s

lzop -1             : time   1.6s, size    431379371    213416336, comp  51%, rate 261M/s 129M/s
lzop                : time   1.7s, size    431379371    213328371, comp  51%, rate 249M/s 123M/s
lzop                : time   4.3s, size   1013573981    515317099, comp  49%, rate 234M/s 119M/s
lzop                : time   7.3s, size   2027241617    978374154, comp  52%, rate 277M/s 133M/s
lzop -9             : time 176.6s, size    431379371    157985635, comp  63%, rate   2M/s   0M/s

lzf                 : time   1.4s, size    431379371    210789363, comp  51%, rate 299M/s 146M/s
lzf                 : time   3.6s, size   1013573981    523007102, comp  48%, rate 282M/s 145M/s
lzf                 : time   6.7s, size   2027241617    972953255, comp  52%, rate 303M/s 145M/s

lzma -0             : time  27s, size    431379371    112406964, comp  74%, rate  15M/s   4M/s
lzma -1             : time  35s, size    431379371    111235594, comp  74%, rate  12M/s   3M/s
lzma: > 5 min, killed

xz -0               : time  28s, size    431379371    112424452, comp  74%, rate  15M/s   4M/s
xz -1               : time  35s, size    431379371    111252916, comp  74%, rate  12M/s   3M/s
xz: > 5 min, killed
</pre>

Columns are:
compression program
time: elapsed time of the compression program (excludes the time to flush output file to disk)
size: size of input file, size of output file
comp: compression ration (0%=no compression, 100%=file compresses into nothing)
rate: input data rate (size of input file divided by elapsed time), output data rate (size of output file 
divided by elapsed time)

Machine used for testing (from /proc/cpuinfo):
Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz
quad core cpu with hyper-threading (8 CPU total)
32 GB quad-channel DDR3-1600.

Script used for testing:

#!/usr/bin/perl -w

my $x = join(" ", @ARGV);

my $in  = "test.mid";
my $out = "test.mid.out";
my $tout = "test.time";

my $cmd = "/usr/bin/time -o $tout -f \"%e\" /usr/bin/time $x < test.mid > test.mid.out";

print $cmd,"\n";

my $t0 = time();
system $cmd;
my $t1 = time();

my $c = `cat $tout`;
print "Elapsed time: $c";

my $t = $c;

#system "/bin/ls -l $in $out";

my $sin  = -s $in;
my $sout = -s $out;

my $xt = $t1-$t0;
$xt = 1 if $xt<1;

print "Total time: $xt\n";

print sprintf("%-20s: time %5.1fs, size %12d %12d, comp %3.0f%%, rate %3dM/s %3dM/s", $x, $t, $sin, 
$sout, 100*($sin-$sout)/$sin, ($sin/$t)/1e6, ($sout/$t)/1e6), "\n";

exit 0;
# end

Typical output:

[deap@deap00 pet]$ ./r.perl lzf    
/usr/bin/time -o test.time -f "%e" /usr/bin/time lzf < test.mid > test.mid.out
1.27user 0.15system 0:01.44elapsed 99%CPU (0avgtext+0avgdata 2800maxresident)k
0inputs+411704outputs (0major+268minor)pagefaults 0swaps
Elapsed time: 1.44
Total time: 3
lzf                 : time   1.4s, size    431379371    210789363, comp  51%, rate 299M/s 146M/s

K.O.
  858   06 Feb 2013 Stefan RittInfoCompression benchmarks
I redid the tests from Konstantin for our MEG experiment at PSI. The event structure is different, so it
is interesting how the two different experiments compare. We have an event size of 2.4 MB and a trigger
rate of ~10 Hz, so we produce a raw data rate of 24 MB/sec. A typical run contains 2000 events, so has a 
size of 5 GB. Here are the results:


cat                 : time   7.8s, size   4960156030   4960156030, comp   0%, rate 639M/s 639M/s

gzip -1             : time 147.2s, size   4960156030   2468073901, comp  50%, rate  33M/s  16M/s

pbzip2 -p1          : time 679.6s, size   4960156030   1738127829, comp  65%, rate   7M/s   2M/s (1 CPU)
pbzip2 -p8          : time  96.1s, size   4960156030   1738127829, comp  65%, rate  51M/s  18M/s (8 CPU)


As one can see, our compression ratio is poorer (due to the quasi random noise in our waveforms), but the
difference between gzip -1 and pbzip2 is larger (15% instead 10% for DEAP). The single CPU version of
pbzip cannot sustain our DAQ rate of 24 MB, but the parallel version can. Actually we have a somehow old
dual-core dual-CPU board 2.5 GHz Xenon box, and make 8 hyper-threading CPUs out of the total 4 cores.
Interestingly the compression rate scales with 7.3 for 8 virtual cores, so hyper-threading does its job.
So we take all our data with the pbzip2 compression. The additional 15% as compared with gzip does 
not sound much, but we produce raw 250 TB/year. So gzip gives us 132 TB/year and pbzip2 gives 
us 98 TB/year, and we save quite some disks.

Note that you can run bzip2 (as all the other methods) already now with the current logger, if you specify
an external compression program in the ODB using the pipe functionality:


local:MEG:S]/>cd Logger/Channels/0/Settings/
[local:MEG:S]Settings>ls
Active                          y
Type                            Disk
Filename                        |pbzip2>/megdata/run%06d.mid.bz2
Format                          MIDAS
Compression                     0
ODB dump                        y
Log messages                    0
Buffer                          SYSTEM
Event ID                        -1
Trigger mask                    -1
Event limit                     0
Byte limit                      0
Subrun Byte limit               0
Tape capacity                   0
Subdir format                   
Current filename                /megdata/run197090.mid.bz2
</pre>
  863   13 Feb 2013 Konstantin OlchanskiInfoReview of github and bitbucket
I have done a review of github and bitbucket as candidates for hosting GIT repositories for collaborative 
DAQ-type projects. Here is my impressions.

1. GIT as a software management tool seems to be a reasonable choice for DAQ-type projects. "master" 
repositories can be hosted at places like github or self-hosted (in the simplest case, only 
http://host/~user web access is required to host a git repository), for each "daq project" aka "experiment" 
one would "clone" the master repository, perform any local modifications as required, with full local 
version control, and when desired feed the changes back to the master repository as direct commits (git 
push), as patches posted to github ("pull requests") or patches emailed to the maintainers (git format-
patch).

2. Modern requirements for hosting a DAQ-type project include:
a) code repository (GIT, etc) with reasonably easy user access control (i.e. commit privileges should be 
assigned by the project administrators directly, regardless of who is on the payroll at which lab or who is 
a registered user of CERN or who is in some LDAP database managed by some IT departement 
somewhere).
b) a wiki for documentation, with similar user access control requirements.
c) a mailing list, forum or bug tracking system for communication and "community building"
d) an ability to web host large static files (schematics, datasheets, firmware files, etc)
e) reasonable web-based tools for browsing the files, looking at diffs, "cvs annotate/git blame", etc.

3. Both github and bitbucket satisfy most of these requirements in similar ways:

a) GIT repositories:
aa) access using git, ssh and https with password protection. ssh keys can be uploaded to the server, 
permitting automatic commits from scripts and cron jobs.
bb) anonymous checkout possible (cannot be disabled)
cc) user management is simple: participants have to self-register, confirm their email address, the project 
administrator to gives them commit access to specific git repositories (and wikis).
dd) for the case of multiple project administrators, one creates "teams" of participants. In this 
configuration the repositories are owned by the "team" and all designated "team administrators" have 
equal administrative access to the project.

b) Wiki:
aa) both github and bitbucket provide rudimentary wikis, with wiki pages stored in secondary git 
repositories (*NOT* as a branch or subdirectory of the main repo).
bb) github supports "markdown" and "mediawiki" syntax
cc) bitbucket supports "markdown" and "creole" syntax (all documentation and examples use the "creole" 
syntax).
dd) there does not seem to be any way to set the "project standard" syntax - both wikis have the "new 
page" editor default to the "markdown" syntax.
ee) compared to mediawiki (wikipedia, triumf daq wiki) and even plone, both github and bitbucket wikis 
lack important features:
1) cannot edit individual sections of a page, only the whole page at once, bad if you have long pages.
2) cannot upload images (and other documents) directly through the web editor/interface. Both wikis 
require that you clone the wiki git repository, commit image and other files locally and push the wiki git 
repo into the server (hopefully without any collisions), only then you can use the images and documents 
in the wiki.
3) there is no "preview" function for images - in mediawiki I can have small size automatically generated 
"preview" images on the wiki page, when I click on them I get the full size image. (Even "elog" can do this!)
ff) to be extra helpful, the wiki git repository is invisible to the normal git repository graphical tools for 
looking at revisions, branches, diffs, etc. While github has a special web page listing all existing wiki 
pages, bitbucket does not have such a page, so you better write down the filenames on a piece of paper.

c) mailing list/forum/bug tracking:
aa) both github and bitbucket implement reasonable bug tracking systems (but in both systems I do not 
see any button to export the bug database - all data is stuck inside the hosting provider. Perhaps there is 
a "hidden button" somewhere).
bb) bitbucket sends quite reasonable email notifications
cc) github is silent, I do not see any email notifications at all about anything. Maybe github thinks I do not 
want to see notices about my own activities, good of it to make such decisions for me.

d) hosting of large files: both git and wiki functions can host arbitrary files (compared to mediawiki only 
accepting some file types, i.e. Quartus pof files are rejected).

e) web based tools: thumbs up to both! web interfaces are slick and responsive, easy to use.

Conclusions:

Both github and bitbucket provide similar full-featured git repository hosting, user management and bug 
tracking.

Both provide very rudimentary wiki systems. Compared to full featured wikis (i.e. mediawiki), this is like 
going back to SCCS for code management (from before RCS, before CVS, before SVN). Disappointing. A 
deal breaker if my vote counts.

K.O.
  864   14 Feb 2013 Stefan RittInfoReview of github and bitbucket
Let me add my five cents:

We use bitbucket now since two months at PSI, and are very happy with it.

Pros:

- We like the GIT flow model (http://nvie.com/posts/a-successful-git-branching-model/). You can at the same time do hot fixes, have a "distribution 
version", and keep a development branch, where you can try new things without compromising the distribution.
- Nice and fast Web interface, especially the "blame" is lightning fast compared to SVN/CVS
- GIT is non-centralized, so your local clone of a repository contains everything. If bitbucket is down/asks for money, you can continue with your local 
repository and clone it to some other hosting service, or host it yourself
- SourceTree (http://www.sourcetreeapp.com/) is a nice GUI for Mac lovers. 
- Easy user management
- Free for academic use

Con:

- Wiki is limited as KO wrote, so it should not be used as a "full" wiki to replace Plone for example, just to annotate your project
- SVN revision number is gone. This is on purpose since it does not make sense any more if you keep several parallel branches (merging becomes a 
nightmare), so one has to use either the (random) commit-ID or start tagging again.

So I conclusion, I would say that it's time to switch MIDAS to GIT. We'll probably do that in July when I will be at TRIUMF.

/Stefan
  866   08 Mar 2013 Konstantin OlchanskiInfoODB /Experiment/MAX_EVENT_SIZE
Somebody pointed out an error in the MIDAS documentation regarding maximum event size 
supported by MIDAS and the MAX_EVENT_SIZE #define in midas.h.

Since MIDAS svn rev 4801 (August 2010), one can create events with size bigger than 
MAX_EVENT_SIZE in midas.h (without having to recompile MIDAS):

To do so, one must increase:
- the value of ODB /Experiment/MAX_EVENT_SIZE
- the size of the SYSTEM shared memory event buffer (and any buffers used by the event builder, 
etc)
- max_event_size & co in your frontend.

Actual limits on the bank size and event size are written up here:
https://ladd00.triumf.ca/elog/Midas/757

The bottom line is that the maximum event size is limited by the size of the SYSTEM buffer which is 
limited by the physical memory of your computer. No recompilation of MIDAS necessary.

K.O.
  867   01 Apr 2013 Randolf PohlInfoReview of github and bitbucket
And my 2ct:

Go for git!

I've been using git since 2007 or so, after cvs and svn. Git has some killer features which I can't miss any more:

* No central repo. Have all the history with you on the train.
* Branching and merging, with stable branches and feature branches.
  Happy hacking while my students do analysis on a stable version.
  Or multiple development branches for several features.
  And merging really works, including fixing up merge conflicts.
* "git bisect" for finding which commit introduced a (reproducible) bug.
* "gitk --all"

I use git for everything: Software, tex, even (Ooffice) Word documents.

Go for git. :-)

Randolf
  868   02 Apr 2013 Konstantin OlchanskiInfoReview of github and bitbucket
Hi, thanks for your positive feedback. I have been using git for small private projects for a few years now
and I like it. It is similar to the old SCCS days - good version control without having to setup servers,
accounts, doodads, etc.

> * No central repo. Have all the history with you on the train.
> * Branching and merging, with stable branches and feature branches.
>   Happy hacking while my students do analysis on a stable version.
>   Or multiple development branches for several features.

This is the part that worries me the most. Without a "central" "authoritative" repository,
in just a few quick days, everybody will have their own incompatible version of midas.

I guess I am okey with your private midas diverging from mainstream, but when *I* end up
with 10 different incompatible versions just in *my* repository, can that be good?

>   And merging really works, including fixing up merge conflicts.

But somebody still has to do it. With a central repository, the problem takes care of
itself - each developer has to do their own merging - with svn, you cannot commit
to the head without merging the head into your code first. But with git, I can just throw
my changes int some branch out there hoping that somebody else would do the merging.
But guess what, there aint anybody home but us chickens. We do not have a mad finn here
to enforce discipline and keep us in shape...

As an example, look at the HADOOP/HDFS code development, they have at least 3 "mainstream"
branches going, neither has all the features combined together and each branch has bugs with
the fixes in a different branch. What a way to run a railroad.

> * "git bisect" for finding which commit introduced a (reproducible) bug.
> * "gitk --all"
>
> Go for git. :-)

Absolutely. For me, as soon as I can wrap my head around this business of "who does all the merging".

K.O.
  869   02 Apr 2013 Randolf PohlInfoReview of github and bitbucket
Hi Konstantin,

> > * No central repo. Have all the history with you on the train.
> > * Branching and merging, with stable branches and feature branches.
> >   Happy hacking while my students do analysis on a stable version.
> >   Or multiple development branches for several features.
> 
> This is the part that worries me the most. Without a "central" "authoritative" repository,
> in just a few quick days, everybody will have their own incompatible version of midas.

No! This is probably one of the biggest misunderstandings of the git workflow.

You can of course _define_ one central repo: This is the one that you and Stefan decide to be "the source" (as
Linus does for the kernel). It's like the central svn repo: Only Stefan and you can push to it, and everybody
else will pull from it. Why should I pull MIDAS from some obscure source, when your "public" repo is available.

Look at the Linux Kernel: Linus' version is authoritative, even though everybody and his best friend has his
own kernel repo.

So, the main workflow does not change a lot: You collect patches, commit them, and "push" them to the central
repo. All users "pull" from this central repo. This is very much what svn offers.

> 
> I guess I am okey with your private midas diverging from mainstream, but when *I* end up
> with 10 different incompatible versions just in *my* repository, can that be good?

See above: _You_ define what the central repo is.

But: I _bet_ you will very soon have 10 versions in your personal repo, because _you choose_ to do so. It's
just SO much easier. The non-linear history with many branches is a _feature_. I can't live without it any more:


Looking at my MIDAS analyzer:

I have a "public" repo in /pub/git/lamb.git. This is where I publish my analyzer versions. All my collaborators
pull from this.

Then I have my personal repo in ~/src/lamb. 
This is where I develop. When I think something is ready for the public, I merge this branch into the public repo. 

Whenever I start to work on a new feature, I create a branch in my _local_ repo (~/src/lamb).  I can fiddle and
play, not affecting anybody else, because it never sees the public repo.
OK, collaborator A finds a bug. I switch to my local copy of the public version, fix the bug, and push the fix
to the publix repo. Then I go back to my (local) feature branch, merge the bug fix, and continue hacking.
Only when the feature is ready, I push it to the public repo.

Things get moe interesting as you work on several features simultaneously. You have e.g. 3 topic branches:
(a) is nearly ready, and you want a bunch of people to test it.
    push branch "feature (a)" to the public repo and tell the people which branch to pull.
(b) is WIP, you hack on it without affecting (a).
(c) is bug fixes which may or may not affect (a) or (b).
And so on.

You will soon discover the beauty of several parallel branches.

Plus, git merges are SO simple that you never think about "how to merge"

> 
> >   And merging really works, including fixing up merge conflicts.
> 
> But somebody still has to do it. With a central repository, the problem takes care of
> itself - each developer has to do their own merging - with svn, you cannot commit
> to the head without merging the head into your code first. But with git, I can just throw
> my changes int some branch out there hoping that somebody else would do the merging.
> But guess what, there aint anybody home but us chickens. We do not have a mad finn here
> to enforce discipline and keep us in shape...

See above: You will have the exact same workflow in git, if you like.




> As an example, look at the HADOOP/HDFS code development, they have at least 3 "mainstream"
> branches going, neither has all the features combined together and each branch has bugs with
> the fixes in a different branch. What a way to run a railroad.

I haven't look at this. All I can say: Branches are one of the best features.

> 
> > * "git bisect" for finding which commit introduced a (reproducible) bug.
> > * "gitk --all"
> >
> > Go for git. :-)
> 
> Absolutely. For me, as soon as I can wrap my head around this business of "who does all the merging".

Easy: YOU do it.

Keep going as in svn: Collect patches, and send them out.

And then, try "git checkout -b my_first_branch", hack, hack, hack,
"git merge master".

Best,

Randolf


> 
> K.O.
  870   03 Apr 2013 Stefan RittInfoReview of github and bitbucket
> * "git bisect" for finding which commit introduced a (reproducible) bug.

I did not know this command, so I read about it. This IS WONDERFUL! I had once (actually with MSCB) the case that a bug was introduced i the last 100 
revisions, but I did not know in which. So I checked out -1, -2, -3 revisions, then thought a bit, then tried -99, -98, then had the bright idea to try -50, then 
slowly converged. Later I realised that I should have done a binary search, like -50, if ok try -25, if bad try -37, and so on to iteratively find the offending 
commit. Finding that there is a command it git which does this automatically is great news.

Stefan
  871   03 Apr 2013 Randolf PohlInfoReview of github and bitbucket
> > * "git bisect" for finding which commit introduced a (reproducible) bug.
> 
> I did not know this command, so I read about it. This IS WONDERFUL! I had once (actually with MSCB) the case that a bug was introduced i the last 100 
> revisions, but I did not know in which. So I checked out -1, -2, -3 revisions, then thought a bit, then tried -99, -98, then had the bright idea to try -50, then 
> slowly converged. Later I realised that I should have done a binary search, like -50, if ok try -25, if bad try -37, and so on to iteratively find the offending 
> commit. Finding that there is a command it git which does this automatically is great news.

even more so considering the nonlinear history (due to branching) in a regular git repo.
ELOG V3.1.4-2e1708b5