Back Midas Rome Roody Rootana
  Midas DAQ System, Page 18 of 150  Not logged in ELOG logo
ID Date Authordown Topic Subject
  397   26 Jul 2007 Stefan RittInfoChange of pointer type in mvmestd.h
I had to change the pointer type of mvme_read and mvme_write to (void *) instead
to (mvme_locaddr_t *) to avoid warnings under 64-bit linux. Please adjust your
VME drivers if necessary.

- Stefan
  405   03 Sep 2007 Stefan RittBug Reporthow to handle end of run?
> I am having problems with handling the end-of-run situation in my midas
> frontend. I have a device that continuously sends data (over USB) and I read
> this data in my "read_event" function.
> 
> Everything is good until the end-of-run, at which time this happens:
> 0) mfe.c calls my read_event() to read the data (loop until the end-of-run
> transition)
> 1) mfe.c calls my end_of_run()
> 2) here, I tell the device "please stop sending data"
> 3) all seems good, but wait!!!
> 4) there is all this data generated between step 0 and step 2 still sitting
> inside the device and it has nowhere to go: the run is ended, the output file is
> closed, my read_event() will never be called ever again (well, until the next run).
> 
> It seems to me mfe.c needs to have one more function, something like
> "pre_end_of_run()" that works like this:
> 0) mfe.c calls my read_event() to read the data (loop until the end-of-run
> transition)
> 1) mfe.c calls pre_end_of_run(), here I tell the device to stop sending data
> 2) mfe.c calls read_event() for the very last time, to give me the opportunity
> to read and send away any data I still may have.
> 3) mfe.c calls the end_of_run(). The run is truely finished.
> 
> Any thoughts?

You can achieve the desired functionality without changing mfe.c:

0) mfe.c calls read_event
1) mfe.c calls end_of_run. Your end_of_run tells the device to stop data and flushes
the remaining data. At this point you have to re-make actually a part of the mfe.c
functionality, but basically you need a bm_compose_event() and a bm_send_event(), so
just a few lines of code. If you want to have the final event number right in your
equipment, you also need to update eq->events_sent accordingly. 

Given the fact that 99% of the experiments do not need this functionality, I propose
that we keep mfe.c and you add the few lines of code into your user part of the
specific frontend.

Stefan
  406   06 Sep 2007 Stefan RittInfoIntroduction of MIDAS_MAX_EVENT_SIZE
We had the problem that different experiments used different MAX_EVENT_SIZE
values (the MEG experiment actually 10 MB!). If each experiment changes the
value in midas.h and accidentally commits it, other experiments are affected.
Therefore I modified midas.h and the Makefile to accept a new environment
variable MIDAS_MAX_EVENT_SIZE. If this value is set, the Makefile passes it's
value to midas.h where it supersedes the default value which is currently at 4 MB.

PAA: Can you pleas add this to the documentation at the right spot? Thanks.
  409   08 Oct 2007 Stefan RittBug ReportError in data format- ending blocks on 32bit boundary x86_64
> Hi,
>     I found that midas banks can be given an extra 32 bits of zeros when
> trying to keep to 32bit boundary on my x86_64. 
> 
> This can be fixed by changing (in midas.h)
> #define ALIGN8(x)  (((x)+7) & ~7)
> to
> #define ALIGN8(x)  (((x)+3) & ~3)
> 
> Is there any bad consequences doing this?

Yes. ALIGN8 means 'align to 8-byte boundary' (64-bit), and if you change that, you
break the code at various locations. Furthermore, 8-byte aligned access is faster
on x86_64 than 4-byte aligned access, so you will get a performance penalty. If
course if you have very many small banks, the zero padding can cause some
overhead, but in that case you could combine some data into a single bank.
  410   11 Oct 2007 Stefan RittBug Report_syscall0 not available on gcc 4.1.1
Dear Stephan,

I am writting on behalf of the LiBeRACE collaboration
at Berkeley/Livermore.

We are trying to use midas (2.0.0) for our acquisition system.
However we had some difficulties to compile it on LINUX Fedora
Core 6 with gcc 4.1.1
I tried to trace back the problem and I found that _syscall0 in
system.c is actually an obsolete call (since gcc 4.x apparently).
Playing with assembly language being behond my competence, I would 
like to know if you ever came across this situation recently and
if you have any suggestion(s).

With my best regards
Julien GIBELIN


------------------------------------------------------
GIBELIN Julien

Lawrence Berkeley National Laboratory
Nuclear Science Division
One Cyclotron Rd.
MS 88R0192
BERKELEY, CA 94720-8101

Tel: +1 (510) 495-2695
Fax: +1 (510) 486-7983
------------------------------------------------------
  411   11 Oct 2007 Stefan RittBug Report_syscall0 not available on gcc 4.1.1
> Dear Stephan,
> 
> I am writting on behalf of the LiBeRACE collaboration
> at Berkeley/Livermore.
> 
> We are trying to use midas (2.0.0) for our acquisition system.
> However we had some difficulties to compile it on LINUX Fedora
> Core 6 with gcc 4.1.1
> I tried to trace back the problem and I found that _syscall0 in
> system.c is actually an obsolete call (since gcc 4.x apparently).
> Playing with assembly language being behond my competence, I would 
> like to know if you ever came across this situation recently and
> if you have any suggestion(s).

The '_syscall0' function call was replaced by 'syscall' in SVN revision 3583. I
would recommend that you switch to the current SVN version (see
http://ladd00.triumf.ca/~daqweb/doc/midas/html/quickstart.html on how to obtain
the SVN version). If the problem still persists, please let us know.

- Stefan
  414   17 Oct 2007 Stefan RittForumMulti-core CPUs
> I have this beautiful Intel Quadcore with fast disks, but MIDAS does obviously 
> only make use of one CPU at a time. Has anyboy of you already done some work 
> on making MIDAS parallel? Event-based data analysis should be the best 
> candidate for this.

There are ring buffer routines rb_xxx for distributed event analysis, but this is
currently only implemented in the front-end framework. These routines are pretty
simple, and their integration into the analyzer should not be very difficult.
Unfortunately I don't have time for that right now. We do our analysis such that we
analyze four different runs in parallel on a quadcore machine.

- Stefan
  418   27 Nov 2007 Stefan RittInfoODB links to array elements implemented
In revision 4090 I implemented ODB links to individual array elements. Now you
can have for example:

Key name                        Type    #Val  Size  Last Opn Mode Value
---------------------------------------------------------------------------
array                           INT     10    4     2m   0   RWD
                                        [0]             0
                                        [1]             0
                                        [2]             123
                                        [3]             0
                                        [4]             0
                                        [5]             0
                                        [6]             0
                                        [7]             0
                                        [8]             0
                                        [9]             0
element2 -> /array[2]           INT     1     4     3m   0   RWD  123

In this case, the link "element2" points to the third element of "array", but is
treated like a single value. This links are very useful for example for the
"Edit on start" parameters, which can now point to individual array elements.
The same is true for the "Links BOR" when the logger writes to a MySQL database.

This modification required major modifications in the ODB. I have carefully
tested the example experiment from the distribution to verify that everything is
fine, but I'm not 100% sure that I covered all possible situations. So if you
update to revision 4090+ and you observe some strange behavior related to links
in the ODB, please report.

There are following two new functions related to this change: 

  db_get_link()
  db_get_link_data()

They are counterparts of db_get_key() and db_get_data(), respectively, but
without following links in the ODB. These functions are probably not of much use
outside odbedit and mhttpd, which are supposed to display links explicitly. Most
user applications want to follow links without even knowing that these are links.
  419   07 Jan 2008 Stefan RittInfoRoll-back for history sytem added
The midas history system always had the problem that the database can get
corrupted if the disk gets full where the history records (*.hst & *.idx) are
stored. This can happen if a history event can only be written partially on the
almost full disk. If later some space is freed up (by deleting other files), the
writing continues at the old position, leaving the partial event in the data
base. In that case the whole history data of the current day cannot be read
because it is corrupted.

To solve the problem, a roll-back system has been implemented in the
hs_write_event() function. If an event cannot be written fully, the history file
is restored to the old state, so the partial event is removed from the end of
the file via truncation. This way only the data which could not be written to
the disk is missing in the history file, but the other data from that day is
still valid and readable. The change has been committed in revision 4107.
  421   05 Feb 2008 Stefan RittForumanalyzer crashes at high rates
> I'm using midas to read data from a waveform digitizer at event rates of
> 10-30kHz. To accomplish this the digitizer is read via Block transfers and the
> raw data put into a single MIDAS event.  Thus a MIDAS event could contain upto
> 250 physical events and at maximum 350kBytes.  In the analyzer modules I had
> been analyzing the first physics event contained in a MIDAS event with no
> problem.  Recently I tried to analyze all the physical events.  At low rates,
> 100hz-1khz, this was no problem, 1-5 physical events in a MIDAS event.  At
> higher rates 10-20kHz, where there are about 40physical events per MIDAS event,
>  the analyzer keeps up for a  few seconds then seg faults with " 'shared object
> read from target memory' has disappear; keeping it symbols".  Any suggestions as
> to why the analyzer is crashing would be very helpful.

I personally have never seen this error message. The analyzer is designed such that
it produces "back pressure" if the data rate is higher than the analysis rate and
you have "request all events" on. The only thing I can image are the following two
issues:

- At higher rate where you have more than 40 physical events per MIDAS event, there
is some bug in your analysis code which gets exploited only in that case. Maybe some
temporary array which is only 35 entries long or something like this.

- The back pressure mentioned above will slow down the frontend. If your computer
busy logic is not working correctly, you might get more triggers than you can
acquire. Maybe then the data gets screwed up and the analyzer chokes on it.

Finding the exact reason is not simple. For sure you have to run the analyzer inside
the debugger, to see exactly where the segfault happens. You then maybe have to
produce some dummy data in the frontend (like always sending the same event) to
disentangle some possible trigger problems from other problems.

Best regards,

  Stefan
  422   05 Feb 2008 Stefan RittInfoImplementation of relative paths in mhttpd
A major change was made to mhttpd, changing all internal URLs to relative paths.
This allows proxy access to mhttpd via an apache server for example, which might
be needed to securely access an experiment from outside the lab through a
firewall. Following setting can be places into the Apache configuration,
assuming the experiment runs on machine "online1.your.domain", and apache on a
publically available machine "www.your.domain":

Redirect permanent /online1 http://www.your.domain/online1
ProxyPass /online1/ http://online1.your.domain/

<Location "/online1">
  AuthType Basic
  AuthName ...
  AuthUserFile ...
  Require user ...
</Location>

If the the URL http://www.your.domain/online1 is accessed, it gets redirected
(after optional authentication) to http://online1.your.domain. If you click on
the mhttpd history page for example, mhttpd would normally redirect this to 

http://online1.your.domain/HS/

but this is not correct since you want to go through the proxy www.your.domain.
The new relative redirection inside mhttpd now redirects the history page
correctly to

http://www.your.domain/onlin1/HS/

I had to change many places inside mhttpd to make this work, and I'm not 100%
sure if I covered all occurrences. So if you upgrade to mhttpd revision 4115 and
observe some error accessing some pages, please report it to me.

- Stefan
  425   06 Feb 2008 Stefan RittForumrpc timeout, related to event_size and watch dog? need help
Most likely you changed the maximal event size in midas.h, but you did not re-compile all programs. The maximal event size goes into the size of the shared memory buffer, so all participating programs have to have the same setting, especially the mserver program. So do the following:

- update to the latest midas version, which is revision 4116
- modify in your midas.h only MAX_EVENT_SIZE. The other settings you modified might have bad side effects. If you increase the RPC timeout, the error will still happen, just later. It comes from the fact that you sent too big events the the server (or the logger), which refuses to take the big events or simply crashes, so the RPC call never returns and after the timeout you get the error.
- recompile all midas programs, don't forget the mserver program
- run the standard demo frontend from the distribution

I tried the above and it just worked fine for me.
  427   06 Feb 2008 Stefan RittForumrpc timeout, related to event_size and watch dog? need help
First of all, I would appreciate if you do not post your entry ten times. Each time you edit it, you produce an email notification going to everybody, so people might get annoyed to receive too many emails from you. Think what you want to write and then post once.

Second, I told you to use the frontend from the distribution, but you used your own code. Since I successfully ran the demo frontend with the large event size, the origin of your problem must be "in between". So start with the demo frontend, try it, then modify its buffer size in frontend.c, then try again. When I told to to recompile midas, I meant you should also recompile your front-end each time you change midas.h. The mserver is automatically recompiled when you recompile and install midas (just check the /usr/local/bin/mserver date and time to confirm that it got updated during your last "make install"). Then add things from your specific front-end program step by step to see at which step the problem occurs the first time. This gives you some hint where the real cause lies.
  431   13 Feb 2008 Stefan RittInfoRoll-back for history sytem added
> But to make things more interesting we had another history outage this week - we
> happen to write history files to an NFS server (not recommened! do not do this!) and
> when the NFS server had a glitch, history files got corrupted - because during the
> glitch NFS was not available, I think this roll-back feature would not have helped.

Actually I put our history data on a separate file system, on a separate disk controlled
by a separate RAID controller! If you write bulk data with the logger, and want to read
history files at the same time with mhttpd, you get a bottleneck if both data are at the
same physical disk. Separating this (and even the controller) speeded things up
dramatically.

The rollback will not work for NFS, since it requires truncating the file if an event
gets only partially written. While on a full file system you always can *delete* data,
this does not work if NFS is down. This explains the behavior.

> Anyhow, I now have a patch to allow hs_read() to "skip the bad spots" in the history
> files. (hs_gen_index() also needs a patch).
> 
> In the nutshell, if invalid history data is detected, the code continues to read the
> data one byte at a time, looking for valid event_id markers (etc).
> 
> The code looks sane by inspection, and if nobody objects, I would like to commit it
> in the next few days.

Great. I was thinking of something like this myself. Having a quick look at your code
looks good. The best of course would be if we would have some "magic number" for
re-synchronizating the data stream, but that would blow up the file length. So searching
for the right event id is good, but will not work 100%. Also the check

  if (irec.time < last_irec_time)

to see if the history is broken is very weak. If you take random data, it will be true
50% and false 50%. If one makes however a check

  if ((irec.time - last_irec_time) > 3600*24)

this would work correctly with random data in >99% of all cases (3600*24/2^32). Maybe
you should change that.
  432   14 Feb 2008 Stefan RittInfomhttpd history display updates
You misspelled one ODB entry:

Line 9014:
            sprintf(str, "/History/Display/%s/Label", path);

Line 9028:

            sprintf(str, "/History/Display/%s/Labels", path);
                                                ---^

I wonder how you could have tested that code for 1/2 year without noticing this error.
I fixed and committed it.
 
  439   19 Feb 2008 Stefan RittBug Fix"make install" error on MacOS 10.4.7, svn 3366
> I forgot to mention that, the following (and similar) lines:
>            install -v -D -m 755 $$file $(SYSBIN_DIR)/`basename $$file` ; \
> are changed into
>            install -v -d -m 755 $$file $(SYSBIN_DIR)/`basename $$file` ; \
> 
> since -D is an illegal option for install. I am not sure whether -D in Linux means the same thing for -d in MacOSX install. 

-D under linux means:

       -D     create all leading components of DEST except the last, then
              copy SOURCE to DEST; useful in the 1st format

This means if you install the first time, and eithe SYSBIN_DIR or `basename is not existing, it will be created on-the-fly from
the install program. If OSX does not support this, you somehow have to crate these subdirectories manually.
  442   21 Feb 2008 Stefan RittBug Reportmhttpd safari 3.0.4 redirect problem
>     /* start command */
>     if (*getparam("Start")) {
>        /* for NT: close reply socket before starting subprocess */
> -      redirect2("?cmd=programs");
> +      redirect2("/?cmd=programs");

The second version won't work if mhttpd is run under an Apache proxy. Assume the proxy redirects

http://proxy.ca/midas

to

http://daq.ca:8080

If you now do a redirect to "/?cmd=programs", you will end up at

http://proxy.ca/?cmd=programs

which is now what you want. I tried to put a "./?cmd=programs", and that bings you to

http://proxy.ca/midas/./?cmd=programs

which is correctly redirected to

http://daq.ca:8080/?cmd=programs

I tried with the windows version (ughhh) of Safari and it worked for me. So give it a try, the change is committed.

> ODB corruption happens here:
>  
>        sprintf(str, "/Programs/%s/Start command", name);
> -      db_get_value(hDB, 0, str, command, &size, TID_STRING, TRUE);
> +      db_get_value(hDB, 0, str, command, &size, TID_STRING, FALSE);
>        if (command[0]) {
>           ss_system(command);
> 
> It looks like db_get_value() would corrupt ODB if given funny "str". When Safari explodes,
> funny strings are generated.

What happes is an endless redirect from xxxx -> xxxx?cmd=Programs. So in the end you have

http://url.ca?cmd=programs?cmd=programs?cmd=programs?cmd=programs....

and in the end you get a stack overflow, which busts all.

> The simple fix is to replace "TRUE" with "FALSE", then at least db_get_value() does not try to make bogus 
> entries in ODB.

I changed both butting FALSE there and adding

   if (strchr(name, '?'))
      *strchr(name, '?') = 0;

which keeps the URL short.

So for me it looks fine at the moment, but I cannot guarantee that everything works, so keep an eye open on that.
  453   07 Mar 2008 Stefan RittBug Reportarray overflows and other bugs
> I have just compiled MIDAS svn 4132 on a fresh SuSE 10.3 x86_64 system and gcc 
> found a bunch of bugs, I guess.

Ahh, great! gcc is getting more and more clever. Each time gcc is updated, it finds
a few new issues.

Indeed some are real bugs, and I will work down the list as time permits. I see
however no immediate thread (you are not using fragmented events, a transition 12
never occurs, etc.). Issue #4 from your list has to be checked by Pierre-Andre. 
  456   09 Mar 2008 Stefan RittSuggestionNew Makefile for building MIDAS
> I rewrote the Makefile for MIDAS in order to make it tidy. I tested it on my box
> and it works here.
> 1. The full file is seperated to several parts
>   a. initialized setup
>   b. environment setup
>   c. specify OS-specific flags
>   d. processing environment for building flags
>   e. targets
> 2. The file is less than 400 lines now. The original one is more than 500 lines.
> 3. The modified one is easy for debuging.
> 
> I tried to learn "autoconf" and "automake" in order to make building MIDAS more
> compatible for various platforms. But I havn't enough time now. Hope somebody
> can help it. The attached file is original named "Makefile.in" for using "autoconf".

I think it is a good idea to cleanup the Makefile. It grew over many years and certainly
had some inconsistencies. We did however not use "autoconf" since it is not of much use.
It is meant for systems where small differences between different Unix flavors are
covered by this system, but the midas source code is supposed not only to run on Unix,
but also on vxWorks and Windows. As you can imagine, the differences are much more
severe and a simple makefile generator cannot cover the details. Furthermore, under
Windows there is no such thing like autoconf. So all the work to make the source code
compile on all systems has been put into system.c using conditional compiling. So
putting another abstraction layer on this would maybe more complicate things than
simplify it. I will test your Makefile, and I also ask the guys at TRIUMF to do so. Once
we conclude that it works fine, we can replace the original Makefile from the distribution.
  457   10 Mar 2008 Stefan RittSuggestionRFC- ACLs for midas rpc, mserver, mhttpd access
> When accepting connection from a remote host, the remote IP address is converted to a
> hostname using gethostbyaddr(). If ODB directory "/experiment/security/mhttpd hosts",
> exists, access is permitted if there is an entry for the this hostname. "localhost" is
> always permitted.

While your "positive list" will certainly work, it is much more inflexible than a more
general hosts.allow/hosts.deny with wildcards. Assume some experiment decides it wants to
be controlled from all inside CERN. With hosts.allow/deny you could do

host.deny *
host.allow *.cern.ch

to have everything ending with "cern.ch" allowed. Otherwise it would be a nightmare finding
all possible terminals at CERN and add them manually. If you are considering modifying your
committed code to this scheme, you could have a look at my elog package, where exactly this
is implemented. You could copy/paste it from there.

After you finished, also talk to Pierre about documenting this in doxygen (or do it yourself).
ELOG V3.1.4-2e1708b5