ID | Date | Author | Topic | Subject |
384 | 08 Jun 2007 | Stefan Ritt | Suggestion | RFC- ACLs for midas rpc, mserver, mhttpd access |
First I have a general question: mserver is started through xinetd, and xinetd has
the options "only_from" and "no_access". This is equivalent to the tcp_wrapper
functionality. Why not use this? It's possible without changing anything in midas.
Or am I missing something?
If that does not work for some reason, here are some thoughts from my side:
- We don't have much of a problem with malicious hackers, but with institute-wide
security checking. Hackers are only interested in mechanisms where they can obtain
control over thousands of machines (like breaking ssh etc.). The few midas machines
are not a good target for them. But even at PSI there are security scans, which try
to connect to various ports and can crash systems, so I agree that something needs
to be done.
- Whatever we do, it should be consistent on linux and windows and should not rely
on external packages, since I don't want to get into dependencies there.
- I see that having the security information either in the ODB or in
external files can be advantageous. There is certainly the aspect of restoring old
ODBs, or keeping several experiments (ODB) on one machine consistent. On the other
hand, storing data in the ODB might be liked by people who are familiar with this
concept and want to change things through mhttpd, for example.
- Having said all that, it would make sense to me to write a simple central routine
access_allowed(), which takes the IP address of a remote client wanting to connect
and returns true or false (a minimal sketch is appended after this entry). This
routine should read /etc/hosts.allow and /etc/hosts.deny and interpret them, but only
the section for midas, and maybe only a subset of the functionality there (we probably
don't need NIS netgroup names, external files and spawn commands there). If the files
/etc/hosts.x do not contain anything about midas or are not present (Windows!), the
routine should look in the ODB under /experiment/security/mserver/hosts.allow and
/experiment/security/mserver/hosts.deny and use that information instead of the files.
- We probably need different mechanisms for mserver and for mhttpd. The mserver
clients are usually only a few programs like the front-ends, while one may want to
control an experiment over mhttpd from much more machines. So we should establish a
second ACL for mhttpd. The already present "/experiment/security/allowed hosts" for
mhttpd should be converted into "/Experiment/Security/mhttpd/hosts.allow" and the
function access_allowed() should be used to interpret it, so that we only need to
write it once. |
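To make this concrete, here is a minimal sketch of such an access_allowed() routine in plain C. It is not existing midas code: the parsing of the "midas:" lines is greatly simplified, and check_acl_odb() is only a placeholder for the ODB lookup described above.

#include <stdio.h>
#include <string.h>

/* Placeholder for the ODB lookup of
   /experiment/security/mserver/hosts.allow|deny - not existing midas code */
static int check_acl_odb(const char *ip)
{
   (void) ip;
   return 0;                    /* deny by default in this sketch           */
}

/* Scan one hosts.allow/hosts.deny style file, looking only at "midas:"
   lines. Returns 1 if the IP matches, 0 otherwise; *found tells whether a
   midas section was present at all.                                        */
static int match_acl_file(const char *fname, const char *ip, int *found)
{
   char line[256];
   FILE *f = fopen(fname, "r");

   *found = 0;
   if (f == NULL)
      return 0;                 /* file not present (e.g. Windows)          */

   while (fgets(line, sizeof(line), f))
      if (strncmp(line, "midas:", 6) == 0) {
         *found = 1;
         if (strstr(line + 6, ip) || strstr(line + 6, "ALL")) {
            fclose(f);
            return 1;
         }
      }

   fclose(f);
   return 0;
}

int access_allowed(const char *ip)
{
   int found_allow, found_deny;

   if (match_acl_file("/etc/hosts.allow", ip, &found_allow))
      return 1;                 /* explicitly allowed                       */
   if (match_acl_file("/etc/hosts.deny", ip, &found_deny))
      return 0;                 /* explicitly denied                        */

   if (!found_allow && !found_deny)
      return check_acl_odb(ip); /* no midas section in either file: use ODB */

   return 1;                    /* midas sections exist but no match:
                                   tcp_wrapper default is to allow          */
}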
385 | 08 Jun 2007 | Stefan Ritt | Forum | crash when analyzing multiple runs offline |
Unfortunately I don't have time right now to debug the problem, but I could see
roughly what it could be. The analyzer crashes inside CloseRootOutputFile:
#5 <signal handler called>
#6 0x00002b5f52ad5ee5 in free () from /lib64/libc.so.6
#7 0x000000000040c89b in CloseRootOutputFile () at src/mana.c:1489
in the line
free(tree_struct.event_tree[i].branch);
If a "free" crashes, it might indicate that the memory beyond the allocated space
got corrupted. The branch gets allocated in book_ttree(), once for each
analyze_request[i]. The branch gets filled in write_event_ttree():
/* fill tree both online and offline */
if (!exclude_all)
et->tree->Fill();
Maybe one should put printf debugging statements in these places to see what's
going on. |
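For illustration, the kind of printf statements meant here, using the names from the mana.c excerpts above (the loop index i and the exact placement are assumptions; stdio.h is already included by mana.c):

/* in book_ttree(), right after the branch array is allocated: */
printf("book_ttree: n_tree=%d branch=%p\n",
       tree_struct.n_tree, (void *) tree_struct.event_tree[i].branch);

/* in CloseRootOutputFile(), just before the free() at mana.c:1489: */
printf("CloseRootOutputFile: i=%d branch=%p\n",
       i, (void *) tree_struct.event_tree[i].branch);
free(tree_struct.event_tree[i].branch);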
387 | 10 Jun 2007 | Stefan Ritt | Forum | crash when analyzing multiple runs offline |
> tree_struct.n_tree keeps counting up from run to run (in book_ttree). This should
> presumably not be the case, since CloseRootOutputFile() frees the trees at eor().
Yes, this is indeed a bug. I applied your change and committed the new code. |
389 | 11 Jun 2007 | Stefan Ritt | Forum | crash when analyzing multiple runs offline |
> I have hunted down "my" segfault problem to the fact that I book histograms not
> in <module>_init, but in <module>_bor. I have to do so, because only in bor do I
> know which histograms to book, as this information comes from the ODB (booking
> only histograms for CAMAC modules which were set to "read" in the ODB). The core
> dump happens on the first access (->Fill, ->SetName,...) of one of these histos
> in the 2nd run analyzed offline ("./analyzer -r n m").
>
> In mana.c:bor (line 1854) is stated that "all ROOT objects created by user module
> bor() functions go to the output file", and then does a gManaOutputFile->cd();
> Consequently, the histograms vanish after the file is closed, therefore the
> segfault when trying to access them in the 2nd run. (I keep track of existing
> histograms, only booking the missing histos in bor.)
>
> The problem goes away with "gROOT->cd()" in <module>_bor, before fiddling with
> TFolders and booking the histogram.
ROOT has the strange concept of "current working directory", coming from the fact that
ROOT was written by Fortran and PAW people, who were used to having directories and
subdirectories with a persistent state (not really object-oriented style). So one can
set the "current working directory" to the root (=memory) with gROOT->cd() and to a
subdirectory which will later be written into a file with gManaOutputFile->cd(). If
you do the first one, the histograms are created only in memory, while in the latter
case they are also created in memory, but will later be written into the output file
in the routine CloseRootOutputFile(). So if you do a gROOT->cd() in <module>_bor,
these histograms will not be written to file. So I guess your solution is not a real
solution.
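A minimal sketch of the two variants inside a user module's bor() function (module_bor stands for your <module>_bor; histogram names and binning are arbitrary, and gManaOutputFile is the TFile opened by mana.c as described above):

#include "midas.h"
#include <TROOT.h>
#include <TFile.h>
#include <TH1D.h>

extern TFile *gManaOutputFile;   /* opened by mana.c before calling bor()      */

INT module_bor(INT run_number)
{
   /* mana.c has already done gManaOutputFile->cd(), so this histogram is
      owned by the output file and is deleted when the file is closed in
      eor(): a pointer kept across runs then dangles (the segfault above)       */
   new TH1D("h_file", "written to file, invalid after eor()", 100, 0, 100);

   /* booking under gROOT keeps the histogram in memory across runs, but
      CloseRootOutputFile() will not write it to the output file                */
   gROOT->cd();
   new TH1D("h_mem", "memory only, survives eor(), not written", 100, 0, 100);

   gManaOutputFile->cd();        /* restore the directory mana.c expects        */
   return SUCCESS;
}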
> I do, however, not really understand the intention why histos booked in bor() go
> to only the file, whereas histos booked in init() go to memory. Could you please
> comment briefly? Maybe I missed the most important point. And what about online
> mode, should this work?
The root output file is opened in bor() and closed in eor(). For a histo to go to the
file, it must be booked after opening the file, that is after bor() in mana.c and
therefore after the gManaOutputFile->cd().
I agree with you that the current scheme is not satisfactory. When running online, you
want to keep the histos between the runs. When running offline, you delete and
re-create them for each run. It would be better to create all histos online and
offline under gROOT, and just copy them to gManaOutputFile before writing them. I have
to admit that this root code was never really used in a productive environment for
offline analysis, so there might be some issues here and there. Some people write
root files directly in the logger, and then do a root-only analysis (without the
midas analyzer). Unfortunately I'm busy these days and cannot write any code right
now. But if you feel like something should be modified in mana.c, please send it to me
and I can incorporate it into the standard code. |
392 | 02 Jul 2007 | Stefan Ritt | Bug Fix | mscb, musbstd fixed on Linux, MacOS |
KO wrote: "There are supposed to be no changes to the Windows code, but I cannot test on Windows, so if somebody does and finds breakage, please let me know."
I can confirm that revision 3713 still works under Windows. |
396 | 13 Jul 2007 | Stefan Ritt | Forum | Midas on a x86_64 - incompatible with x86_32 |
> The biggest problem here is that making 32-bit ODB and 64-bit ODB compatible requires breaking one or
> the other (My proposed changes break the 64-bit version. Alternatively, one could add explicit padding
> to these data structures and break the 32-bit ODB).
>
> I think it is important to make 32-bit and 64-bit code compatible: at TRIUMF we have to use a mixed
> environment because our latest host computers all run 64-bit Linux while all our VME processors and all
> older machines can only run 32-bit code; this incompatibility causes us weekly headaches.
>
> Any thoughts?
I agree to make 32-bit and 64-bit compatible. In the long run, everything will be 64-bit, so I would suggest
breaking the 32-bit ODB and adding some padding where needed, probably with some conditional compilation.
This keeps the native 64-bit packing, which will probably be better optimized for 64-bit
architectures and therefore might be a bit faster in the long run, when most systems are 64-bit. After this
has been implemented and well tested, I would go with an official announcement of the 32-bit break in the ODB,
and release a new version, so people can update from a TAR file if necessary. Existing ODB's can be converted
to the new format by exporting them in XML form and importing them again after the upgrade. |
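A hypothetical illustration of the kind of padding meant here (EXAMPLE_HEADER and the build flag ODB_32BIT_BUILD are made up; this is not an actual ODB structure):

typedef struct {
   int    run_number;      /* 4 bytes, offset 0                                */
#ifdef ODB_32BIT_BUILD     /* hypothetical flag set when compiling 32-bit code */
   int    _pad0;           /* a 64-bit compiler inserts these 4 bytes of       */
                           /* padding implicitly; adding them explicitly in    */
                           /* the 32-bit build puts 'start_time' at offset 8   */
                           /* in both cases                                    */
#endif
   double start_time;      /* 8-byte aligned on x86_64, only 4-byte on ia32    */
} EXAMPLE_HEADER;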
397 | 26 Jul 2007 | Stefan Ritt | Info | Change of pointer type in mvmestd.h |
I had to change the pointer type of mvme_read and mvme_write to (void *) instead
of (mvme_locaddr_t *) to avoid warnings under 64-bit Linux. Please adjust your
VME drivers if necessary.
- Stefan |
405 | 03 Sep 2007 | Stefan Ritt | Bug Report | how to handle end of run? |
> I am having problems with handling the end-of-run situation in my midas
> frontend. I have a device that continuously sends data (over USB) and I read
> this data in my "read_event" function.
>
> Everything is good until the end-of-run, at which time this happens:
> 0) mfe.c calls my read_event() to read the data (loop until the end-of-run
> transition)
> 1) mfe.c calls my end_of_run()
> 2) here, I tell the device "please stop sending data"
> 3) all seems good, but wait!!!
> 4) there is all this data generated between step 0 and step 2 still sitting
> inside the device and it has nowhere to go: the run is ended, the output file is
> closed, my read_event() will never be called ever again (well, until the next run).
>
> It seems to me mfe.c needs to have one more function, something like
> "pre_end_of_run()" that works like this:
> 0) mfe.c calls my read_event() to read the data (loop until the end-of-run
> transition)
> 1) mfe.c calls pre_end_of_run(), here I tell the device to stop sending data
> 2) mfe.c calls read_event() for the very last time, to give me the opportunity
> to read and send away any data I still may have.
> 3) mfe.c calls the end_of_run(). The run is truly finished.
>
> Any thoughts?
You can achieve the desired functionality without changing mfe.c:
0) mfe.c calls read_event
1) mfe.c calls end_of_run. Your end_of_run tells the device to stop sending data and
flushes the remaining data. At this point you actually have to re-implement a small
part of the mfe.c functionality, but basically you need a bm_compose_event() and a bm_send_event(), so
just a few lines of code. If you want to have the final event number right in your
equipment, you also need to update eq->events_sent accordingly.
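A sketch of what such an end_of_run() could look like. The device-specific calls (stop_device(), device_data_pending(), read_device_data()) and the buffer handle hBuf are placeholders, and the exact bm_*() / bk_*() prototypes should be checked against your midas.h:

#include "midas.h"

extern EQUIPMENT equipment[];        /* defined in your frontend                */
extern INT hBuf;                     /* placeholder: handle of the event buffer
                                        your read_event() normally sends to     */
/* placeholders for your device-specific code */
void stop_device(void);
int  device_data_pending(void);
int  read_device_data(DWORD *pdata); /* returns number of DWORDs copied         */

INT end_of_run(INT run_number, char *error)
{
   static char   event[sizeof(EVENT_HEADER) + MAX_EVENT_SIZE];
   EVENT_HEADER *pevent = (EVENT_HEADER *) event;
   DWORD        *pdata;

   stop_device();                    /* tell the device to stop sending data    */

   while (device_data_pending()) {   /* drain what is still inside the device   */
      bm_compose_event(pevent, 1, 0, 0, equipment[0].serial_number++);
      bk_init(pevent + 1);
      bk_create(pevent + 1, "RAW0", TID_DWORD, &pdata);
      pdata += read_device_data(pdata);
      bk_close(pevent + 1, pdata);
      pevent->data_size = bk_size(pevent + 1);

      bm_send_event(hBuf, pevent, pevent->data_size + sizeof(EVENT_HEADER), SYNC);
      equipment[0].events_sent++;
   }
   return SUCCESS;
}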
Given the fact that 99% of the experiments do not need this functionality, I propose
that we keep mfe.c and you add the few lines of code into your user part of the
specific frontend.
Stefan |
406 | 06 Sep 2007 | Stefan Ritt | Info | Introduction of MIDAS_MAX_EVENT_SIZE |
We had the problem that different experiments used different MAX_EVENT_SIZE
values (the MEG experiment actually 10 MB!). If each experiment changes the
value in midas.h and accidentally commits it, other experiments are affected.
Therefore I modified midas.h and the Makefile to accept a new environment
variable MIDAS_MAX_EVENT_SIZE. If this value is set, the Makefile passes its
value to midas.h, where it supersedes the default value, which is currently 4 MB.
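On the midas.h side the mechanism presumably looks something like the following (the exact default and macro spelling may differ; the Makefile is assumed to turn the environment variable into a compiler definition such as -DMAX_EVENT_SIZE=$(MIDAS_MAX_EVENT_SIZE)):

/* in midas.h - only used if the Makefile did not already define it */
#ifndef MAX_EVENT_SIZE
#define MAX_EVENT_SIZE (4 * 1024 * 1024)   /* 4 MB default */
#endif

A typical use would then be to set, for example, MIDAS_MAX_EVENT_SIZE=10000000 in the environment before running make.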
PAA: Can you please add this to the documentation at the right spot? Thanks. |
409 | 08 Oct 2007 | Stefan Ritt | Bug Report | Error in data format - ending blocks on 32bit boundary x86_64 |
> Hi,
> I found that midas banks can be given an extra 32 bits of zeros when
> trying to keep to 32bit boundary on my x86_64.
>
> This can be fixed by changing (in midas.h)
> #define ALIGN8(x) (((x)+7) & ~7)
> to
> #define ALIGN8(x) (((x)+3) & ~3)
>
> Are there any bad consequences of doing this?
Yes. ALIGN8 means 'align to 8-byte boundary' (64-bit), and if you change that, you
break the code at various locations. Furthermore, 8-byte aligned access is faster
on x86_64 than 4-byte aligned access, so you will get a performance penalty. Of
course, if you have very many small banks, the zero padding can cause some
overhead, but in that case you could combine some data into a single bank. |
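For illustration, the two macros from the exchange above applied to a 10-byte bank (the proposed variant is shown here under the name ALIGN4 only for clarity):

#define ALIGN8(x) (((x)+7) & ~7)   /* current: round up to the next multiple of 8 */
#define ALIGN4(x) (((x)+3) & ~3)   /* proposed: round up to the next multiple of 4 */

/* ALIGN8(10) -> 16 : 6 bytes of zero padding, banks stay 8-byte aligned
   ALIGN4(10) -> 12 : only 2 bytes of padding, but banks are no longer
                      guaranteed to start on an 8-byte boundary               */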
410 | 11 Oct 2007 | Stefan Ritt | Bug Report | _syscall0 not available on gcc 4.1.1 |
Dear Stephan,
I am writing on behalf of the LiBeRACE collaboration
at Berkeley/Livermore.
We are trying to use midas (2.0.0) for our acquisition system.
However we had some difficulties to compile it on LINUX Fedora
Core 6 with gcc 4.1.1
I tried to trace back the problem and I found that _syscall0 in
system.c is actually an obsolete call (since gcc 4.x apparently).
Playing with assembly language being beyond my competence, I would
like to know if you ever came across this situation recently and
if you have any suggestion(s).
With my best regards
Julien GIBELIN
------------------------------------------------------
GIBELIN Julien
Lawrence Berkeley National Laboratory
Nuclear Science Division
One Cyclotron Rd.
MS 88R0192
BERKELEY, CA 94720-8101
Tel: +1 (510) 495-2695
Fax: +1 (510) 486-7983
------------------------------------------------------ |
411 | 11 Oct 2007 | Stefan Ritt | Bug Report | _syscall0 not available on gcc 4.1.1 |
> Dear Stephan,
>
> I am writing on behalf of the LiBeRACE collaboration
> at Berkeley/Livermore.
>
> We are trying to use midas (2.0.0) for our acquisition system.
> However we had some difficulties to compile it on LINUX Fedora
> Core 6 with gcc 4.1.1
> I tried to trace back the problem and I found that _syscall0 in
> system.c is actually an obsolete call (since gcc 4.x apparently).
> Playing with assembly language being beyond my competence, I would
> like to know if you ever came across this situation recently and
> if you have any suggestion(s).
The '_syscall0' function call was replaced by 'syscall' in SVN revision 3583. I
would recommend that you switch to the current SVN version (see
http://ladd00.triumf.ca/~daqweb/doc/midas/html/quickstart.html on how to obtain
the SVN version). If the problem still persists, please let us know.
- Stefan |
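For reference, the change amounts to replacing the obsolete _syscall0 macro with the generic syscall() wrapper from libc. A self-contained illustration (the function name my_gettid is just for the example; the actual code lives in system.c):

#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>

/* old style, no longer provided by glibc headers with gcc 4.x:
      _syscall0(pid_t, gettid)                                      */

/* new style: */
static pid_t my_gettid(void)
{
   return (pid_t) syscall(SYS_gettid);
}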
414 | 17 Oct 2007 | Stefan Ritt | Forum | Multi-core CPUs |
> I have this beautiful Intel Quadcore with fast disks, but MIDAS does obviously
> only make use of one CPU at a time. Has anybody of you already done some work
> on making MIDAS parallel? Event-based data analysis should be the best
> candidate for this.
There are ring buffer routines rb_xxx for distributed event analysis, but this is
currently only implemented in the front-end framework. These routines are pretty
simple, and their integration into the analyzer should not be very difficult.
Unfortunately I don't have time for that right now. We do our analysis such that we
analyze four different runs in parallel on a quadcore machine.
- Stefan |
418 | 27 Nov 2007 | Stefan Ritt | Info | ODB links to array elements implemented |
In revision 4090 I implemented ODB links to individual array elements. Now you
can have for example:
Key name Type #Val Size Last Opn Mode Value
---------------------------------------------------------------------------
array INT 10 4 2m 0 RWD
[0] 0
[1] 0
[2] 123
[3] 0
[4] 0
[5] 0
[6] 0
[7] 0
[8] 0
[9] 0
element2 -> /array[2] INT 1 4 3m 0 RWD 123
In this case, the link "element2" points to the third element of "array", but is
treated like a single value. These links are very useful, for example, for the
"Edit on start" parameters, which can now point to individual array elements.
The same is true for the "Links BOR" when the logger writes to a MySQL database.
This modification required major modifications in the ODB. I have carefully
tested the example experiment from the distribution to verify that everything is
fine, but I'm not 100% sure that I covered all possible situations. So if you
update to revision 4090+ and you observe some strange behavior related to links
in the ODB, please report.
The following two new functions are related to this change:
db_get_link()
db_get_link_data()
They are counterparts of db_get_key() and db_get_data(), respectively, but
without following links in the ODB. These functions are probably not of much use
outside odbedit and mhttpd, which are supposed to display links explicitly. Most
user applications want to follow links without even knowing that these are links. |
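A short sketch of how the new calls relate to the old ones, using the "element2" link from the example above (inside a client that has already called cm_connect_experiment(); the usual midas ODB calling conventions are assumed, and db_get_link_data() is the analogous non-following counterpart of db_get_data()):

HNDLE hDB, hKey;
KEY   key;
INT   value, size = sizeof(value);

cm_get_experiment_database(&hDB, NULL);

/* db_find_link() returns a handle to the link entry itself,
   while db_find_key() would already follow it to /array[2]       */
db_find_link(hDB, 0, "element2", &hKey);

db_get_key(hDB, hKey, &key);    /* follows the link: describes /array[2] (TID_INT) */
db_get_link(hDB, hKey, &key);   /* new: describes the link entry itself            */

/* follows the link and reads the value of /array[2], i.e. 123    */
db_get_data(hDB, hKey, &value, &size, TID_INT);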
419 | 07 Jan 2008 | Stefan Ritt | Info | Roll-back for history system added |
The midas history system always had the problem that the database can get
corrupted if the disk where the history records (*.hst & *.idx) are stored gets
full. This can happen if a history event can only be written partially on the
almost full disk. If later some space is freed up (by deleting other files), the
writing continues at the old position, leaving the partial event in the
database. In that case the whole history data of the current day cannot be read
because it is corrupted.
To solve the problem, a roll-back system has been implemented in the
hs_write_event() function. If an event cannot be written fully, the history file
is restored to the old state, so the partial event is removed from the end of
the file via truncation. This way only the data which could not be written to
the disk is missing in the history file, but the other data from that day is
still valid and readable. The change has been committed in revision 4107. |
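The idea in generic POSIX terms (illustration only, not the actual hs_write_event() code):

#include <unistd.h>
#include <sys/types.h>

/* Write one history record; if the disk is full and only part of it could
   be written, truncate the file back to its previous size so no partial
   record is left behind.                                                  */
int write_record_with_rollback(int fd, const void *rec, size_t n)
{
   off_t start = lseek(fd, 0, SEEK_END);     /* file size before the write */

   if (write(fd, rec, n) != (ssize_t) n) {
      ftruncate(fd, start);                  /* roll back the partial record */
      return -1;                             /* caller can retry later     */
   }
   return 0;
}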
421 | 05 Feb 2008 | Stefan Ritt | Forum | analyzer crashes at high rates |
> I'm using midas to read data from a waveform digitizer at event rates of
> 10-30kHz. To accomplish this the digitizer is read via Block transfers and the
> raw data put into a single MIDAS event. Thus a MIDAS event could contain upto
> 250 physical events and at maximum 350kBytes. In the analyzer modules I had
> been analyzing the first physics event contained in a MIDAS event with no
> problem. Recently I tried to analyze all the physical events. At low rates,
> 100hz-1khz, this was no problem, 1-5 physical events in a MIDAS event. At
> higher rates 10-20kHz, where there are about 40 physical events per MIDAS event,
> the analyzer keeps up for a few seconds then seg faults with " 'shared object
> read from target memory' has disappear; keeping it symbols". Any suggestions as
> to why the analyzer is crashing would be very helpful.
I personally have never seen this error message. The analyzer is designed such that
it produces "back pressure" if the data rate is higher than the analysis rate and
you have "request all events" on. The only thing I can image are the following two
issues:
- At higher rate where you have more than 40 physical events per MIDAS event, there
is some bug in your analysis code which gets exploited only in that case. Maybe some
temporary array which is only 35 entries long or something like this.
- The back pressure mentioned above will slow down the frontend. If your computer
busy logic is not working correctly, you might get more triggers than you can
acquire. Maybe then the data gets screwed up and the analyzer chokes on it.
Finding the exact reason is not simple. For sure you have to run the analyzer inside
the debugger, to see exactly where the segfault happens. You then maybe have to
produce some dummy data in the frontend (like always sending the same event) to
disentangle some possible trigger problems from other problems.
Best regards,
Stefan |
422 | 05 Feb 2008 | Stefan Ritt | Info | Implementation of relative paths in mhttpd |
A major change was made to mhttpd, changing all internal URLs to relative paths.
This allows proxy access to mhttpd via an apache server for example, which might
be needed to securely access an experiment from outside the lab through a
firewall. The following settings can be placed into the Apache configuration,
assuming the experiment runs on machine "online1.your.domain", and Apache on a
publicly available machine "www.your.domain":
Redirect permanent /online1 http://www.your.domain/online1
ProxyPass /online1/ http://online1.your.domain/
<Location "/online1">
AuthType Basic
AuthName ...
AuthUserFile ...
Require user ...
</Location>
If the URL http://www.your.domain/online1 is accessed, it gets redirected
(after optional authentication) to http://online1.your.domain. If you click on
the mhttpd history page for example, mhttpd would normally redirect this to
http://online1.your.domain/HS/
but this is not correct since you want to go through the proxy www.your.domain.
The new relative redirection inside mhttpd now redirects the history page
correctly to
http://www.your.domain/online1/HS/
I had to change many places inside mhttpd to make this work, and I'm not 100%
sure if I covered all occurrences. So if you upgrade to mhttpd revision 4115 and
observe some error accessing some pages, please report it to me.
- Stefan |
425 | 06 Feb 2008 | Stefan Ritt | Forum | rpc timeout, related to event_size and watch dog? need help |
Most likely you changed the maximal event size in midas.h, but you did not re-compile all programs. The maximal event size goes into the size of the shared memory buffer, so all participating programs have to have the same setting, especially the mserver program. So do the following:
- update to the latest midas version, which is revision 4116
- modify only MAX_EVENT_SIZE in your midas.h. The other settings you modified might have bad side effects. If you increase the RPC timeout, the error will still happen, just later. It comes from the fact that you send events that are too big to the server (or the logger), which refuses to take the big events or simply crashes, so the RPC call never returns and after the timeout you get the error.
- recompile all midas programs, don't forget the mserver program
- run the standard demo frontend from the distribution
I tried the above and it just worked fine for me. |
427 | 06 Feb 2008 | Stefan Ritt | Forum | rpc timeout, related to event_size and watch dog? need help |
First of all, I would appreciate it if you did not post your entry ten times. Each time you edit it, you produce an email notification going to everybody, so people might get annoyed at receiving too many emails from you. Think about what you want to write and then post once.
Second, I told you to use the frontend from the distribution, but you used your own code. Since I successfully ran the demo frontend with the large event size, the origin of your problem must be "in between". So start with the demo frontend, try it, then modify its buffer size in frontend.c, then try again. When I told you to recompile midas, I meant you should also recompile your front-end each time you change midas.h. The mserver is automatically recompiled when you recompile and install midas (just check the /usr/local/bin/mserver date and time to confirm that it got updated during your last "make install"). Then add things from your specific front-end program step by step to see at which step the problem first occurs. This gives you some hint where the real cause lies. |
431 | 13 Feb 2008 | Stefan Ritt | Info | Roll-back for history system added |
> But to make things more interesting we had another history outage this week - we
> happen to write history files to an NFS server (not recommended! do not do this!) and
> when the NFS server had a glitch, history files got corrupted - because during the
> glitch NFS was not available, I think this roll-back feature would not have helped.
Actually I put our history data on a separate file system, on a separate disk controlled
by a separate RAID controller! If you write bulk data with the logger, and want to read
history files at the same time with mhttpd, you get a bottleneck if both data are at the
same physical disk. Separating this (and even the controller) sped things up
dramatically.
The rollback will not work for NFS, since it requires truncating the file if an event
gets only partially written. While on a full file system you can always *delete* data,
this does not work if NFS is down. This explains the behavior.
> Anyhow, I now have a patch to allow hs_read() to "skip the bad spots" in the history
> files. (hs_gen_index() also needs a patch).
>
> In the nutshell, if invalid history data is detected, the code continues to read the
> data one byte at a time, looking for valid event_id markers (etc).
>
> The code looks sane by inspection, and if nobody objects, I would like to commit it
> in the next few days.
Great. I was thinking of something like this myself. From a quick look, your code
looks good. The best would of course be if we had some "magic number" for
re-synchronizing the data stream, but that would blow up the file length. So searching
for the right event id is good, but will not work 100%. Also the check
if (irec.time < last_irec_time)
to see if the history is broken is very weak. If you take random data, it will be true
50% and false 50%. If one however makes the check
if ((irec.time - last_irec_time) > 3600*24)
this would work correctly with random data in >99% of all cases (3600*24/2^32). Maybe
you should change that. |
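(For reference: with random data, the chance that two successive 32-bit time words differ by less than 24 hours is 3600*24/2^32 = 86400/4294967296, i.e. about 2e-5, so the stricter check would let corrupted data slip through in only roughly 0.002% of cases.)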