    Reply  28 Sep 2006, Stefan Ritt, Suggestion, Increase of maximum event size 

K.O. wrote:
Now, we have per-buffer tunable size (see message
https://ladd00.triumf.ca/elog/Midas/283) and in the long run, I would prefer the
compiled-in limit to go away: already all memory is allocated dynamically and
the MAX_EVENT_SIZE is only useful as a kind of sanity check against frontend
misconfiguration or against malformed events.

If MAX_EVENT_SIZE goes away, the maximum event size becomes limited by the
largest SysV shared memory segment permitted by Linux (via sysctl kernel.shmmax).

To go beyond the limit on SysV shared memory, one can use mmap()-based shared
memory: this is limited by available RAM+swap (and disk space for the
.SYSTEM.SHM file). The current MIDAS system.c has an experimental implementation of
mmap() shared memory, but AFAIK it has not been used in any production system yet.


MAX_EVENT_SIZE is also used for the RPC layer, since the receiving buffer must hold at
least one event. It is right that this can and should be made dynamic. Concerning
the shared memory, there is the problem that its size cannot be increased while any program is
running and attached to the shared memory, so it can only be defined at startup by the
first program creating the shared memory.

The sanity check in the frontend is done against max_event_size defined in frontend.c, which can be smaller than MAX_EVENT_SIZE (some front-ends have limited memory).
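
To illustrate the kind of check meant here (a minimal sketch, not the actual mfe.c code; it only assumes the max_event_size variable that every frontend.c defines):

#include "midas.h"

extern INT max_event_size;      /* defined in frontend.c */

/* accept an event only if it fits into the frontend's own limit */
int event_size_ok(const EVENT_HEADER *pevent)
{
   return (INT) (pevent->data_size + sizeof(EVENT_HEADER)) <= max_event_size;
}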

So I agree that this issue may need revision, maybe something for my next visit ;-)
    Reply  28 Sep 2006, Stefan Ritt, Bug Report, mhttpd elog corruption via double-edit 
> I do not know how to "properly" fix this bug without changing the indexing
> scheme to something similar to what is used by elogd- message numbers instead of
> file indices. In the existing scheme, message editing also breaks URLs shown in
> the email notifications (they contain file indices that point to the wrong
> places after messages are moved around by editing) and "reply threading" links.

Well, the development of elogd with its message numbers was actually stimulated by
the problem you mentioned. After that all those problems went away. Another
incarnation of that problem is if you edit an mhttpd log file manually. Afterwards
the file offsets are different and the system gets corrupted. To fix this properly,
one would have to backport the el_xxx functions from elogd to mhttpd, or, even
simpler, remove the elog functionality in mhttpd and "force" everybody to use elogd
(after doing elconv to convert the files into the new format).
    Reply  16 Oct 2006, Stefan Ritt, Bug Fix, "make install" error on MacOS 10.4.7, svn 3366 
> While executing "make install" under MacOS 10.4.7, you may encounter errors about "dio". It is the 
> problem of "Makefile". I did some change to it and attach the diff file here.

I committed your patch. Thank you.
    Reply  16 Oct 2006, Stefan Ritt, Bug Fix, Build error with mana.c while using CERNLIB, svn 3366 
Committed, thanks.
    Reply  08 Jan 2007, Stefan Ritt, Suggestion, Access to out_info from mana.c 
I changed out_info into a global structure definition ANA_OUTPUT_INFO and put it into
midas.h, so it can be accessed easily from the user analyzer source code.
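
With that change, a user analyzer can do something like the following (a sketch only; it assumes the variable in mana.c keeps its name out_info and that the structure has a filename member):

#include <stdio.h>
#include "midas.h"

extern ANA_OUTPUT_INFO out_info;        /* defined in mana.c */

void print_output_file(void)
{
   printf("Analyzer output file: %s\n", out_info.filename);
}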

> Would it be relevant to transform out_info into a *non-static* variable of a type
> defined by a *named* struct?
> Currently,  programs that  try to access out_info cannot do it anymore; and they
> typically copy the struct definition from mana.c, which is not robust against future
> changes in mana.c.
> 
> If mana.c could be changed in the way described above, that would be great . 
> Otherwise, is it safe to patch it myself for local use?  or is there a better way of
> accessing out_info from mana.c?
> 
> As always, any help would be much appreciated :)
> 
> EOL
> 
> > Hello,
> > 
> > Is it possible to access out_info (defined in mana.c) from another program?
> > 
> > In fact, out_info is now defined as an (anonymous) "static struct" in mana.c,
> > which it seems to me precludes any direct use in another program.  Is there an
> > indirect way of getting ahold of out_info?  or of the information it contains?
> > 
> > out_info used to be defined as a *non-static* struct, and the code I'm currently
> > modifying used to compile seamlessly: it now stops the compilation during
> > linking time, as out_info is now static and the program I have to compile
> > contains an "extern struct {} out_info".
> > 
> > Any help would be much appreciated!  I searched in vain in this forum for
> > details about out_info and I really need to access the information it contains!
> > 
> > EOL (a pure MIDAS novice)
    Reply  11 Jan 2007, Stefan Ritt, Forum, Shared memory problems 
> Hello,
> 
> Just did a fresh install of MIDAS from the SVN repository under CentOS and
> everything compiles fine, but when I go to run the frontend (using dio), I get
> the following error message:
> 
> Connect to experiment ...[odb.c:868:db_open_database] Different database format:
>  Shared memory is 14, program is 2
> [midas.c:1763:cm_connect_experiment1] cannot open database
> 
> 
> Any ideas on what the problem could be, or how to fix it?  

You have an old .ODB.SHM from a previous version in your directory (note the '.' in
front, so you need 'ls -alg' to see it). Delete that file and try again.
    Reply  11 Jan 2007, Stefan Ritt, Forum, Shared memory problems 
That sounds like you are mixing versions: you have an old executable (maybe your mlogger) which
has been linked against the old midas version, but you create the ODB with the new
odbedit or frontend. The new version complains if it finds an ODB from a previous version
(the error you reported first), but an old program does not have that version check, so
it finds a different binary ODB structure and crashes.

> Thanks for your help.  I tried again and it got me back to the initial problem I had.
>  The frontend will start, and the analyzer starts (complains about there not being a
> last.root, but other than that it's fine), and then when starting mlogger, I get:
> 
> [odb.c:860:db_validate_db] Warning: database corruption, first_free_key 0x0001A404
> [odb.c:3666:db_get_key] invalid key handle
> [midas.c:1970:cm_check_client] cannot delete client info
> [odb.c:3666:db_get_key] invalid key handle
> [midas.c:1970:cm_check_client] cannot delete client info
> [odb.c:3666:db_get_key] invalid key handle
> 
> 
> And it continues to shoot out error messages about invalid key handles until I kill
> it.  Then trying to start the frontend again fails until I remove the .ODB.SHM file. 
> Any other ideas?
> 
> > > Hello,
> > > 
> > > Just did a fresh install of MIDAS from the SVN repository under CentOS and
> > > everything compiles fine, but when I go to run the frontend (using dio), I get
> > > the following error message:
> > > 
> > > Connect to experiment ...[odb.c:868:db_open_database] Different database format:
> > >  Shared memory is 14, program is 2
> > > [midas.c:1763:cm_connect_experiment1] cannot open database
> > > 
> > > 
> > > Any ideas on what the problem could be, or how to fix it?  
> > 
> > You have an old .ODB.SHM from a previous version in your directoy (note the '.' in
> > front, so you need a 'ls -alg' to see it). Delete that file and try again.
    Reply  22 Jan 2007, Stefan Ritt, Bug Report, buffer bugs 

Denis Bilenko wrote:
1. Blocking calls to the midas API aren't usable when the client is connected through the mserver. This is true at least for bm_receive_event, but it seems to be a more general problem: a midas application has to call cm_yield within 10 seconds (or whatever timeout is set) to remain alive.
That is not the case when RPC is not used.


The 10 seconds timeout you see comes from the RPC layer. If you call bm_receive_event and it blocks, then the client will consider an RPC timeout after 10 seconds. It has nothing to do with cm_yield(). Calling a blocking function via a server connection is not a good idea anyhow, since this process then cannot respond to anything else, like run transitions. That's why I never used it and that's why I had not noticed that behaviour. I did change it however such that bm_receive_event, if called without the ASYNC flag, disables the RPC timeout for this call and restores it afterwards. This is now in midas.c revision 3502. You can try this easily with midas/examples/lowlevel/produce and consume.
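
For reference, a blocking receive looks like this (a minimal sketch; buffer creation and bm_request_event setup are omitted, see midas/examples/lowlevel/consume.c for the full version):

#include "midas.h"

int receive_one_event(HNDLE hBuf)
{
   char event[10000];
   INT size = sizeof(event);

   /* without the ASYNC flag this call blocks until an event arrives;
      since revision 3502 the RPC timeout is suspended during the call */
   return bm_receive_event(hBuf, event, &size, SYNC);
}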


Denis Bilenko wrote:
2. On Windows, two processes on the same machine can send/receive events to each other only if they both use midas locally (through shared memory) or they both use midas via RPC (through the mserver), but not if they connect in different ways.


I just tried again and it did work. I used produce/consume. If you enter just <return> for the host name, these programs connect locally. So I tried both producer locally, consumer remote, and vice versa, and both worked. I did however use consume with the callback functionality. I did not try your Python programs. If you find out that produce/consume does work and your Python programs don't, then adapt your Python programs to resemble produce/consume.


Denis Bilenko wrote:
3. Receiving/sending the same events from the same process was possible in 1.9.5-1, but not in the current version (revision 3501, mxml revision 45). Is this an intended behavior change?


Yes. It was introduced in revision 3186 on July 28th, 2006. It fixed a problem where the buffer level was always shown as 100% full, even if there were no other clients registered. By ignoring the own process, the buffer level now correctly shows the "contents" of a buffer from 0..100%. It also gave a small speed improvement. If you want to send events to your own process, you have to do it at the calling level: if you call bm_send_event(), you also call process_event (or however your event receiving routine is called) manually, as sketched below. This is also much faster than going through the buffer.
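
A minimal sketch of what I mean (assuming the receiving routine has the usual dispatcher signature from consume.c):

#include "midas.h"

void process_event(HNDLE hBuf, HNDLE request_id, EVENT_HEADER *pheader, void *pevent);

int send_and_consume_locally(HNDLE hBuf, EVENT_HEADER *pheader)
{
   INT status;

   /* send the event to the buffer for all other clients */
   status = bm_send_event(hBuf, pheader, pheader->data_size + sizeof(EVENT_HEADER), SYNC);
   if (status != BM_SUCCESS)
      return status;

   /* own process: deliver the event directly, bypassing the buffer */
   process_event(hBuf, 0, pheader, pheader + 1);

   return BM_SUCCESS;
}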
    Reply  23 Jan 2007, Stefan Ritt, Bug Report, buffer bugs 

Denis Bilenko wrote:
1 & 3 - thanks for the fix and the explanation, as for 2 - I've tried consume and produce
and still has a problem


Acknowledged. I could reproduce it with the information you supplied, thank you very much. Also the data rate is slower than what I expect. I will investigate and fix this, but it could take some time.
    Reply  24 Jan 2007, Stefan Ritt, Bug Report, buffer bugs 
I tried again and could not reproduce the problem. Last time I was probably confused by some old mserver.exe executable I had lying around. I updated to the most recent version (3516) and did a full rebuild with nmake -f makefile.nt in C:\midas. Last time I was also confused about the low rate, but that was caused by an mserver.exe executable which was not compiled with optimization. For small event sizes (such as 10 bytes) there is a big difference between optimized and non-optimized code. So I got:


First Console wrote:
ID of event to produce: 1
Host to connect: localhost
Event size: 10
Level:   0.0 %, Rate: 0.46 MB/sec
flush
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.42 MB/sec
Level:   0.0 %, Rate: 0.42 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.44 MB/sec
Level:   0.0 %, Rate: 0.42 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
flush
Level:   0.0 %, Rate: 0.44 MB/sec
Level:   0.0 %, Rate: 0.44 MB/sec
Level:   0.0 %, Rate: 0.40 MB/sec
Level:   0.0 %, Rate: 0.42 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.44 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
Level:   0.0 %, Rate: 0.43 MB/sec
flush


and


Second Console wrote:
C:\midas\NT\bin>.\consume
ID of event to request: 1
Host to connect:
Get all events (0/1): 1
Receive via callback ([y]/n):
[consume.c:73:process_event] Serial number mismatch: Ser: 1169666, OldSer: 0, ID: 1, size: 10
Level:   0.0 %, Rate: 0.00 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.42 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.42 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   2.4 %, Rate: 0.35 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.50 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.41 MB/sec, ser mismatches: 1
Level:   0.0 %, Rate: 0.40 MB/sec, ser mismatches: 1
Received break. Aborting...


Actually sending remotely and receiving locally is a very common thing. Most experiments use that. They have a remote frontend, and the logger and analyzer run locally. If that did not work, all these experiments would have a problem. So I can only encourage you to try again; make sure to update and recompile the executables. Maybe delete any old *.SHM files. Maybe try on another PC or under Linux.
Entry  30 Jan 2007, Stefan Ritt, Bug Report, Large files under Windows XP 
Hello,

We have problems analyzing large files under Windows XP. For small file sizes,
everything is ok. We have events of 2.8 MB each, and we can read ~30 events per
second. But if the file gets larger than typically 600-800 MB, then access
becomes very slow, about 1 event per second. This is not the case under Linux,
where it stays at 30 Hz (~90 MB/sec). 

Looking at the low-level file access, it is obvious that this has nothing to do
with midas; the problem can be reproduced with a simple program reading chunks
of 3 MB from a 1 GB file. The Windows XP file system is NTFS, default formatting.
Has anyone else observed a similar problem, or does anyone have some
suggestions? Unfortunately many people here want to analyze midas data under
Windows...
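
For reference, the kind of test program meant above looks like this (a sketch, not the exact code we used):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[])
{
   const size_t chunk = 3 * 1024 * 1024;        /* 3 MB chunks */
   char *buf;
   FILE *f;
   size_t n, total = 0;
   clock_t t0;

   if (argc < 2) {
      fprintf(stderr, "usage: %s <file>\n", argv[0]);
      return 1;
   }
   buf = (char *) malloc(chunk);
   f = fopen(argv[1], "rb");
   if (buf == NULL || f == NULL)
      return 1;

   t0 = clock();
   while ((n = fread(buf, 1, chunk, f)) > 0)
      total += n;
   printf("read %.0f MB in %.1f s\n", total / 1e6,
          (double) (clock() - t0) / CLOCKS_PER_SEC);

   fclose(f);
   free(buf);
   return 0;
}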

Stefan Ritt
    Reply  06 Feb 2007, Stefan Ritt, Bug Report, wrong version in include/midas.h? 
> The present .../include/midas.h contains
> [alpha@laddvme06 ~/online]$ grep 1.9.5 /home/alpha/packages/midas/include/*
> /home/alpha/packages/midas/include/midas.h:#define MIDAS_VERSION "1.9.5"
> 
> All MIDAS utilities (odbedit ver) presently report version 1.9.5, even for svn
> trunk, and this may confuse people as to what version of midas they are using,
> and may complicate reporting of bugs.
> 
> Perhaps the trunk version should say something like "svn-22233344" (the svn
> revision number)? The present "1.9.5" is wrong...

Fully agree. I added a svn_revision string into midas.h, which gets reported now
by "odbedit ver". Unfortunately this reflects only changes in midas.c. If one
changes odb.c for example, the svn revision in midas.c does not get modified by
the SVN system. In addition I changed the present version 1.9.5 to 2.0.0. I made
the tar and zip files. After some internal testing, it will be announced
officially in a few days.
    Reply  06 Feb 2007, Stefan Ritt, Bug Fix, Problem solved by Re-define _syscall0(...) 

Exaos Lee wrote:
Maybe it's not the perfect way, but it works. :-)


I changed it to:

#ifdef OS_UNIX
   /* requires <sys/syscall.h> and <unistd.h> */
   return syscall(SYS_gettid);
#endif                          /* OS_UNIX */

without any #define.

Does this work for you?

- Stefan
    Reply  06 Feb 2007, Stefan Ritt, Bug Report, segmentation violation of analyzer on a x86_64 
> Hello,
> 
> When I  connect to analyzer on a x86_64 processor(with Roody),  
> a analyzer break with segmentation violation in the root_server_thread  function.
> Same code are working fine on a 32bit processor.
> As I found the problem are in exchanging of pointers between analyzer and client.
> Before to send a pointer, it is saved a pointer in int (size=4, instead of 8) at
> this place:
> Index: src/mana.c
> ===================================================================
> --- src/mana.c  (revision 3498)
> +++ src/mana.c  (working copy)
> @@ -5386,7 +5386,7 @@
> 
>              //write pointer
>              message->Reset(kMESS_ANY);
> -            int p = (POINTER_T) obj;
> +            POINTER_T p = (POINTER_T) obj;
>              *message << p;
>              sock->Send(*message);
> 
> 
> Sincerely Yours,
> Fedor Ignatov 

Do I understand you right? With your patch it works even on 64 bit, right? Or do you
mean there is still a segmentation violation? Anyhow I committed your patch since the
"int" is clearly incorrect.

- Stefan
    Reply  26 Feb 2007, Stefan Ritt, Info, RFC- history system improvements 
I agree with what you propose. I'm pretty sure you are right that you will get a significant improvement in readout
speed of the history system. So far there has been no big request for improving the history system, since the
performance in the experiments I was involved in was good. In MEG for example, we have ~20 MB of history data per day, and all
plots even going back some months can be made in a couple of seconds. Have a look for example at

http://midas.psi.ch/megon00/HS/PCS/Pressures.gif?hscale=1843200&hoffset=-5068800

This plot stretches over two weeks and involves ~500 MB of history data, and is prepared in a couple of seconds.
The key question here is how big the disk cache of the OS is. The above plot does not read all 500 MB, but skips
many data points in order to obtain ~1000 data points (one per pixel) for the requested period. To find these
data points, it reads and scans the history index files (yymmdd.idx), which are only a few percent of the
yymmdd.hst data files. The index file contains only the time stamp, the event id and the location of the event in
the *.hst file. Scanning the index file is as efficient as scanning a history file with a single variable. Now
comes the access of the history file. For ~1000 data points, 1000 locations have to be read. This requires
reading in the FAT table for the history file and accessing the sector clusters containing the data. In the worst
case one has to read 1000 clusters. With a cluster size of 2 kB this will be 2 MB of data, something which can be
read very quickly. On the MEG system I observe that the first history plot takes about 5 seconds, while all
consecutive plots take about 1 second. This indicates that the FAT information is cached by the OS. This depends
of course, as you indicated correctly, on how much memory is available for disk caching, how many processes are
running, etc., and this will finally determine how fast your history access will be.

So if you implement your proposed new scheme, please consider the following:

- Scanning a single variable file is about the same as scanning the current index file. You save however the
access to the data file. If you plot several variables together, you have to access several "single variable
files", so your access time scales with the number of variables. In the current system, it's likely that
different variables from the same event are located in the same cluster. So you have to read the history file
once for each variable, but after the first variable the sectors of interest are very likely cached by the OS. So
I would estimate that the break-even point is about 2-3 variables. I mean if you read more than three variables,
your proposed method might get slower than the current one. This is of course not the case if there are very many
events in the history file. In that case the index file might be much bigger, since it gets a new entry if *any*
variable in an event changes. If all index files together are bigger than your disk cache, the system will become
slow (and I guess that's what you see). In MEG, the index file is about 1MB per day, so a few weeks fit easily
into the disk cache.

- In order not to get too much data, the history system needs fine tuning. Each slow control system class driver
has an "update threshold", which is used to determine if a variable has "changed". For some noisy channels, it
might be worth setting the threshold at 3 sigma of the noise level (RMS). This can reduce your history data
dramatically. For some equipment, you might even consider defining a minimum update period. This is done via
"/Equipment/<name>/Common/Log history" (see the sketch at the end of this post). If that variable is set to 10,
the time between two consecutive history records is at least 10 seconds. For some temperatures, for example, it
might make sense to set this even to one minute or so, depending on how fast your temperatures change.

- If you implement a per-variable history, you probably have to use the per-event hot link in the ODB. Otherwise
you would exceed the number of hot links MAX_OPEN_RECORDS which is currently 256. If you then get a hot link
update, you have to check manually in log_history() in mlogger.c which variable(s) have changed.

- Before you actually go and implement the full system, I would write some small test code to "simulate" the new
scheme. Write some dummy files with the full data you expect in the ALPHA experiment and see what the improvement
is under realistic conditions. Only if you see a big improvement is it worth implementing the full code. Test this
on various machines to get a better overview. Maybe it's worth testing different file systems and cluster sizes as
well.

- If there is an improvement, I'm more than happy to replace the current history code in midas. It might however
not be clean to have a heterogeneous history system, where some files are in the old format and some in the new.
It might be better to write a little conversion routine which converts the old format into the new one, even
omitting records where single variables did not change. This conversion could even be put into the standard
mlogger code and executed automatically if the logger is started first and finds some old data files.

Even if the speed improvement is not so big, one will certainly win a lot on disk file size (like if only one
variable out of 100 changes). This will probably make it worth implementing anyhow.
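
As referenced above, here is a minimal sketch of setting "Log history" for one equipment from a C client ("Trigger" is just a placeholder equipment name):

#include "midas.h"

void set_min_history_period(void)
{
   HNDLE hDB;
   INT period = 10;     /* at least 10 seconds between history records */

   cm_get_experiment_database(&hDB, NULL);
   db_set_value(hDB, 0, "/Equipment/Trigger/Common/Log history",
                &period, sizeof(period), 1, TID_INT);
}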
    Reply  26 Feb 2007, Stefan Ritt, Info, RFC- support for writing to removable hard disk storage 
In the MEG experiment, we simply installed 100 TB of RAID disks and don't need to change anything ;-)

But seriously, you are right that such a system might be beneficial. I propose to extend the current logger code to switch disks. In the current tr_start() function in mlogger, the code checks for "subdir_format" to create separate subdirectories, e.g. once per week. One could extend this code in the following way:

- Add an array of strings and name it "Path", such as

/dev/sda1/datadir/
/dev/sdb1/datadir/

- On each stop of the run, check if the current disk has enough space for one more run. Take either the "Byte limit" of that channel, or the actual size of the last run and multiply it by two or so. If the disk is "almost full", switch to the next array element in "Path". Append the file name, such as "/dev/sda1/datadir/run1234.mid" and put this into "Current filename" as a feedback for the user. Now write to the new disk/file.

- Add a string like "Execute on switch", which gets called after you switch to the next disk. This shell script can then handle the un-mounting of the full disk, notify the user, etc. This is similar to the "/Programs/Execute on start run" in the ODB, but it gets called only when you switch the disk.
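
To make the idea concrete, here is a rough sketch of the switching logic (hypothetical code, not part of mlogger; 'needed' would be the byte limit of the channel or twice the size of the last run):

#include <sys/statvfs.h>

#define N_PATH 2

static const char *path[N_PATH] = {
   "/dev/sda1/datadir/",
   "/dev/sdb1/datadir/"
};
static int current_path = 0;

/* return the directory to use for the next run */
const char *next_run_path(double needed)
{
   struct statvfs st;
   double free_bytes;

   if (statvfs(path[current_path], &st) == 0) {
      free_bytes = (double) st.f_bavail * st.f_frsize;
      if (free_bytes < needed && current_path + 1 < N_PATH) {
         current_path++;        /* disk almost full: switch to the next one */
         /* here the "Execute on switch" script would be called */
      }
   }
   return path[current_path];
}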
Entry  26 Feb 2007, Stefan Ritt, Info, Usage of event channel for improved throughput 
Starting from SVN revision 3642, sending events from the front-end has been revised.

For a long time there has been a special TCP socket established between any front-end and the mserver which can be used to bypass the midas RPC layer completely and send only events. There was a #define USE_EVENT_CHANNEL, but to my knowledge nobody used it.

While optimizing data throughput for the MEG experiment, I revisited this mechanism and finally got it working. Here are some benchmark tests made with the produce program on two dual-CPU machines connected via Gigabit Ethernet:

Using normal RPC socket:

event size    speed [MB/sec] CPU usage front-end  CPU usage server
==================================================================
    40          3            22                   100                
  1000         44            25                   100
100000        101            14                    50

Using new event socket:

event size    speed [MB/sec] CPU usage front-end  CPU usage server
==================================================================
    40         12            100                   34                
  1000         99            58                    59
100000        101            14                    43

As can be seen, the CPU load on the server drops significantly for smaller events since the processing time per event is reduced. If the transfer was limited by the server, the throughput goes up significantly. For large events the bottleneck on the server side is the memcpy of events, so no big improvement is visible. The saved CPU time however can be used to analyze more events for example.

The event socket is now enabled by default in the front-end by setting
rpc_mode = 1

in mfe.c and should be checked carefully in various experiments. There is a small chance that events get stuck in the buffer cache on the server side at the end of the run, in which case they would show up as the first events of the next run. I know that this problem happened in some experiment before, but that must have been unrelated to the rpc_mode. So please check again and report any problem with the new rpc_mode.
Entry  26 Feb 2007, Stefan Ritt, Info, Fragmented polled events 
Fragmented polled events have been implemented in SVN revision 3625.
Fragmentation is a method of breaking down large (>MB) events into smaller
pieces and sending them through the shared memory buffers, reassembling them at the output. In the past this was
output. In the past this was only possible for periodic events (such as large
histograms read out once every few seconds), but now this is also possible for
polled events.
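
As an illustration, such an equipment could be declared roughly like this in the frontend (a sketch only; names, rates and the readout routine are placeholders, and the frontend also has to define max_event_size_frag for the full fragmented event size):

EQUIPMENT equipment[] = {

   {"Large Event",              /* equipment name */
    {1, 0,                      /* event ID, trigger mask */
     "SYSTEM",                  /* event buffer */
     EQ_POLLED | EQ_FRAGMENTED, /* polled equipment with fragmentation */
     0,                         /* event source */
     "MIDAS",                   /* format */
     TRUE,                      /* enabled */
     RO_RUNNING,                /* read only when running */
     100,                       /* poll for 100 ms */
     0,                         /* stop run after this event limit */
     0,                         /* number of sub events */
     0,                         /* don't log history */
     "", "", "",},
    read_large_event,           /* readout routine */
    },

   {""}
};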
    Reply  27 Feb 2007, Stefan Ritt, Forum, event builder scalability 
> Hi there:
> I have a question if there's anybody out there running MIDAS with event builder
> that assembles events from more that just a few front ends (say on the order of
> 0x10 or more)?
> Any experiences with scalability?

At the MEG experiment at PSI we run with 5 front-ends (later 8), each running at
about 10 MB/sec. This gives an overall rate of 50MB/sec without any problem. The
CPU load on the backend (2.6 GHz dual Xeon) is 30% for the event builder and 26%
for the logger. The DANCE experiment at Los Alamos runs 17 front-ends if I'm not
mistaken (John?).
    Reply  27 Feb 2007, Stefan Ritt, Forum, event builder scalability 
> Our bottleneck is (a) the compactPCI backplane reading data from waveform digitizers
> to the frontend CPUs and (b) CPU power on the frontend CPUs to analyze the waveforms.

I forgot to mention that our front-ends at MEG are 2.8 GHz dual Xeon with Hyperthreading.
This gives "virtual" 4 CPU cores which are really necessary for waveform calibration and
analysis. It makes use of the new multi-threading feature in the midas front-end. I actually
run 7 threads (one VME readout, 4 calibration threads, one encoding thread and the
main thread sending data to the backend). This speeds up data taking by a factor of four
compared to a single thread. So if one plans for waveform analysis in the frontend to
reduce the data, I would recommend a box with dual quad cores.