ID |
Date |
Author |
Topic |
Subject |
398
|
12 Aug 2007 |
Konstantin Olchanski | Info | Change of pointer type in mvmestd.h | > I had to change the pointer type of mvme_read and mvme_write to (void *) instead
> to (mvme_locaddr_t *) to avoid warnings under 64-bit linux. Please adjust your
> VME drivers if necessary.
Updated: vmicvme.c (VMIVME-7750/7805) and gefvme.c (GEFANUC V7865)
K.O. |
397
|
26 Jul 2007 |
Stefan Ritt | Info | Change of pointer type in mvmestd.h | I had to change the pointer type of mvme_read and mvme_write to (void *) instead
to (mvme_locaddr_t *) to avoid warnings under 64-bit linux. Please adjust your
VME drivers if necessary.
- Stefan |
396
|
13 Jul 2007 |
Stefan Ritt | Forum | Midas on a x86_64 - incompatible with x86_32 | > The biggest problem here is that making 32-bit ODB and 64-bit ODB compatible requires breaking one or
> the other (My proposed changes break the 64-bit version. Alternatively, one could add explicit padding
> to these data structures and break the 32-bit ODB).
>
> I think it is important to make 32-bit and 64-bit code compatible: at TRIUMF we have to use a mixed
> environment because out latest host computers all run 64-bit Linux while all our VME processors and all
> older machines can only run 32-bit code; this incompatibility causes us weekly headaches.
>
> Any thoughts?
I agree to make 32-bit and 64-bit compatible. In the long run, everything will be 64-bit, so I would suggest
in breaking the 32-bit ODB, add some padding there where needed, probably with some conditional compiling.
This ensures to keep the native 64-bit packing, which probably will be somehow optimized for 64-bit
architectures and therefore might be a bit faster in the long run, when most systems are 64-bit. After this
has been implemented and well tested, I would go with an official announcement of the 32-bit break in the ODB,
and release a new version, so people can update from a TAR file if necessary. Existing ODB's can be converted
to the new format by exporting them in XML form and importing them again after the upgrade. |
395
|
12 Jul 2007 |
Konstantin Olchanski | Forum | Midas on a x86_64 - incompatible with x86_32 | > We run 64-bit MIDAS on RHEL4 with 64-bit ROOT and everything generally works,
> except for compatibility problems with 32-bit MIDAS.
>
> The big problem is that 64-bit and 32-bit ODB turned out to be incompatible ...
I have now identified 3 data structures that change size when compiled with "-m64":
EVENT_REQUEST: stores a pointer to a function. Pointer size is 4 bytes with -m32 and 8 bytes with -m64.
This structure is part of an array inside BUFFER_HEADER, resulting in a sizable size mismatch between 32
bit and 64 bit shared memory data buffers.
The fix is simple: the function pointer is not used anywhere. Replace is with a "DWORD unused_filler"
makes -m32 and -m64 data buffers compatible. (But breaks compatibility with previous -m64 compiled midas).
CHN_SETTINGS and CHN_STATISTICS: apparently, -m32 and -m64 GCC has different packing rules and in -m64
mode, 4 bytes of padding are added to these data structures. Size size mismatch appears to be benign,
but will result in "size mismatch" complaints from ODB.
The fix is simple: adding "__attribute__ ((__packed__))" to the definition of the data structure makes
-m64 identical to -m32.
The "svn diff" of changes involved is attached below.
The biggest problem here is that making 32-bit ODB and 64-bit ODB compatible requires breaking one or
the other (My proposed changes break the 64-bit version. Alternatively, one could add explicit padding
to these data structures and break the 32-bit ODB).
I think it is important to make 32-bit and 64-bit code compatible: at TRIUMF we have to use a mixed
environment because out latest host computers all run 64-bit Linux while all our VME processors and all
older machines can only run 32-bit code; this incompatibility causes us weekly headaches.
Any thoughts?
K.O.
(this output of svn diff is doctored for clarity)
ladd00:midas$ svn diff
Index: include/midas.h
===================================================================
--- include/midas.h (revision 3744)
+++ include/midas.h (working copy)
- void (*dispatch) (HNDLE, HNDLE, EVENT_HEADER *, void *);
+ INT unused; // was void (*dispatch) (HNDLE, HNDLE, EVENT_HEADER *, void *);
} EVENT_REQUEST;
--- include/msystem.h (revision 3744)
+++ include/msystem.h (working copy)
+#define PACKED __attribute__ ((__packed__)) <--- this goes into midas.h inside the #ifdef "we use GCC"
-typedef struct {
+typedef struct PACKED { ... CHN_SETTINGS
-typedef struct {
+typedef struct PACKED { ... CHN_STATISTICS |
394
|
06 Jul 2007 |
Konstantin Olchanski | Bug Fix | mscb, musbstd fixed on Linux, MacOS | > I commited a few minor changes to musbstd and mscb code...
>
> The basic functions work with the MSCB USB master, but I still need to
> investigate some cases where the connection hangs and usb communications do not
> work until the USB cable is unplugged and plugged back in. I see this problem
> both on MacOS and Linux.
I think I fixed the hangs we see on linux and macos - at the end all I had to do is
issue a usb reset to make mscb communicate again.
Also tested on Linux FC6 and SL4.5.
K.O. |
393
|
03 Jul 2007 |
Ryu Sawada | Info | RHEL5/SL5 success! | > P.S. For the record, the compiler produces two sets of warnings:
> - warning: dereferencing type-punned pointer will break strict-aliasing rules
> (I do not understand the meaning of the second warning. type-punned pointer, huh?)
This is because strict aliasing rule is broken in this code.
In ISO C++99 standard, it is illegal to create two pointers of different types referring to the same address.
Even a code breaks the rule, it compiles, but the result is undefined.
For example following code gives different results depends on -O2 is used or not, because -O2 includes -fstrict-aliasing option.
When -fstrict-aliasing is used, compiler can optimize the code assuming the strict aliasing rule.
#include <stdio.h>
int main(){
int ii = 1;
float* ff = (float*)ⅈ
*ff = 2;
printf("%d\n", ii);
return 0;
}
GCC warns this kind of code with a message like,warning: dereferencing type-punned pointer will break strict-aliasing rules
The behavior differs also depending on compilers. GCC3 does not warn, while GCC4 warns. (GCC3 is the default on SL4, while GCC4 is
the default on SL5)
And results are different. GCC3 gives 0, while GCC4 gives 1.
#include <stdio.h>
typedef struct xxx {int ii;} XX;
int main(){
XX a;
a.ii = 1;
*(short*)&a.ii = 0;
printf("%d\n", a.ii);
return 0;
}
More dangerous thing is that compilers do not always warn about it. For example, following code compiles without warnings even
when you use -Wall (including -Wstrict-aliasing). But the result changes depending on compile flags or compiler versions.#include <stdio.h>
#include <string.h>
#include <malloc.h>
int main(){
int *ii = (int*)malloc(8);
ii[0] = 1;
ii[1] = 2;
float* ff = (float*)ii;
ff[0] = 3;
ff[1] = 4;
printf("%d %d\n", ii[0], ii[1]);
return 0;
}
A safer way is disabling -fstrict-aliasing compile flag. For example, you may change compile flag for midas like "-O2 -fno-strict-
aliasing".
Disadvantage is that there is a possibility that the speed is decreased.
The best way is modifying code to be in the strict aliasing rule.
Best regards |
392
|
02 Jul 2007 |
Stefan Ritt | Bug Fix | mscb, musbstd fixed on Linux, MacOS |
KO wrote: | There supposed to be no changes to the Windows code, but I cannot test on Windows, so if somebody does and finds breakage, please let me know. |
I can confirm that revision 3713 still works under Windows. |
391
|
29 Jun 2007 |
Konstantin Olchanski | Bug Fix | mscb, musbstd fixed on Linux, MacOS | I commited a few minor changes to musbstd and mscb code to make them work on
MacOSX (tested on 10.3.9) and Linux (tested on Fedora 6).
The basic functions work with the MSCB USB master, but I still need to
investigate some cases where the connection hangs and usb communications do not
work until the USB cable is unplugged and plugged back in. I see this problem
both on MacOS and Linux.
Important changes:
1) mscb_select_device() does not work on both Linux and MacOS and is disabled.
Please run "msc -d usb0".
2) on Linux, the Makefile should define -DOS_LINUX and -DHAVE_LIBUSB;
on MacOS, the Makefile should define -DOS_LINUX and -DOS_DARWIN. (This is
because MacOS is treated as a funny type of Linux).
3) when doing USB communications, one has to use the correct endpoint numbers,
which seem to be system dependant and for now, I hard code them in mscb.c for
the tested systems.
There supposed to be no changes to the Windows code, but I cannot test on
Windows, so if somebody does and finds breakage, please let me know.
K.O. |
390
|
12 Jun 2007 |
Randolf Pohl | Forum | crash when analyzing multiple runs offline | Hi
> So I guess your solution is not a real solution.
I was not precise enough on what I do. This way the histograms persist in memory, but
they are also written to every file:
e.g. in module "trig_tdc":
TDirectory *savedir = gDirectory; // will restore this afterwards
gROOT->cd(); // go to file
// make sure we are in the right "analyzer module folder"
TDC_Folder = (TFolder *) gROOT->FindObjectAny("trig_tdc");
gHistoFolderStack->Add((TObject *) TDC_Folder);
...(loop over all TDCs, figure out which histos exist, and which need to be booked)
open_subfolder("raw4208");
hrTDC = h1_book(....); // create histo in memory, but it shows up in the file, too.
close_subfolder(); //raw4208
// restore gHistoFolderStack (we added a folder when entering routine)
gHistoFolderStack->Remove(gHistoFolderStack->Last());
// restore current directory
savedir->cd();
When deleting histos I do:
gManaHistosFolder->RecursiveRemove(*pHisto);
(*pHisto)->Delete();
(*pHisto) = NULL; // for my book-keeping of existing histos.
You don't have to clear the histos explicitly between runs. gManaHistosFolder does this
magic to you.
> But if you feel like something should be modified in mana.c, please send it to me
> and I can incorporate it into the standard code.
No, the code is fine. I just wanted to explain my problem and a solution to it, because
I thought that somebody might run into the same problem, too.
Ciao,
Randolf |
389
|
11 Jun 2007 |
Stefan Ritt | Forum | crash when analyzing multiple runs offline | > I have hunted down "my" segfault problem to the fact that I book histograms not
> in <module>_init, but in <module>_bor. I have to do so, because only in bor do I
> know which histograms to book, as this information comes from the ODB (booking
> only histograms for CAMAC modules which were set to "read" in the ODB). The core
> dump happens on the first access (->Fill, ->SetName,...) of one of these histos
> in the 2nd run analyzed offline ("./analyzer -r n m").
>
> In mana.c:bor (line 1854) is stated that "all ROOT objects created by user module
> bor() functions go to the output file", and then does a gManaOutputFile->cd();
> Consequently, the histograms vanish after the file is closed, therefore the
> segfault when trying to access them in the 2nd run. (I keep track of existing
> histograms, only booking the missing histos in bor.)
>
> The problem goes away with "gROOT->cd()" in <module>_bor, before fiddling with
> TFolders and booking the histogram.
ROOT has the strange concept of "current working directory", coming from the fact that
ROOT was written by Fortran and PAW people, being used to have directories and
subdirectories with a persistent state (not really object-oriented style). So one can
set the "current working directory" to the root (=memory) with gROOT->cd() and to a
subdirectory which will later be written into a file with gManaOutputFile->cd(). If
you do the first one, the histograms are created only in memory, while in the later
case they are also created in memory, but will later be written into the output file
in the routine CloseRootOutputFile(). So if you do a gROOT->cd() in <module>_bor,
these histograms will not be written to file. So I guess your solution is not a real
solution.
> I do, however, not really understand the intention why histos booked in bor() go
> to only the file, whereas histos booked in init() go to memory. Could you please
> comment briefly? Maybe I missed the most important point. And what about online
> mode, should this work?
The root output file is opened in bor() and closed in eor(). For a histo to go to the
file, it must be booked after opening the file, that is after bor() in mana.c and
therefore after the gManaOutputFile->cd().
I agree with you that the current scheme is not satisfactory. When running online, you
want to keep the histos between the runs. When running offline, you delete and
re-create them for each run. It would be better to create all histos online and
offline under gROOT, and just copy them to gManaOutputFile before writing them. I have
to admit that this root code was never really used in a productive environment for
offline analysis, so there might be some issues here and there. Some people write
directly root files in the logger, and then do a root-only (without the midas
analyzer) analysis. Unfortunately I'm busy these days and cannot write any code right
now. But if you feel like something should be modified in mana.c, please send it to me
and I can incorporate it into the standard code. |
388
|
11 Jun 2007 |
Randolf Pohl | Forum | crash when analyzing multiple runs offline | Hello again,
just for the record, in case somebody else runs into the same problem...
I have hunted down "my" segfault problem to the fact that I book histograms not
in <module>_init, but in <module>_bor. I have to do so, because only in bor do I
know which histograms to book, as this information comes from the ODB (booking
only histograms for CAMAC modules which were set to "read" in the ODB). The core
dump happens on the first access (->Fill, ->SetName,...) of one of these histos
in the 2nd run analyzed offline ("./analyzer -r n m").
In mana.c:bor (line 1854) is stated that "all ROOT objects created by user module
bor() functions go to the output file", and then does a gManaOutputFile->cd();
Consequently, the histograms vanish after the file is closed, therefore the
segfault when trying to access them in the 2nd run. (I keep track of existing
histograms, only booking the missing histos in bor.)
The problem goes away with "gROOT->cd()" in <module>_bor, before fiddling with
TFolders and booking the histogram.
I do, however, not really understand the intention why histos booked in bor() go
to only the file, whereas histos booked in init() go to memory. Could you please
comment briefly? Maybe I missed the most important point. And what about online
mode, should this work?
Thanks a lot in advance,
Randolf |
387
|
10 Jun 2007 |
Stefan Ritt | Forum | crash when analyzing multiple runs offline | > tree_struct.n_tree keeps counting up from run to run (in book_ttree). This should
> presumably not be the case, since CloseRootOutputFile() frees the trees at eor().
Yes this indeed a bug. I applied your change and committed the new code. |
386
|
09 Jun 2007 |
Randolf Pohl | Forum | crash when analyzing multiple runs offline | Hello Stefan,
tree_struct.n_tree keeps counting up from run to run (in book_ttree). This should
presumably not be the case, since CloseRootOutputFile() frees the trees at eor().
------------------- output ---------------------------
lamb@lamb2:~/midas/root_3705> ./analyzer -e
exa_root -i /tmp/midas/examples/root/run%05d.mid -o /tmp/midas/run%05d.root -r 1 2
Root server listening on port 9090...
Running analyzer offline. Stop with "!"
book_ttree: tree_struct.n_tree = 1
book_ttree: tree_struct.n_tree = 2
Set run number 1 in ODB
Load ODB from run 1...OK
/tmp/midas/examples/root/run00001.mid:2722 /tmp/midas/run00001.root:2720 events,
0.21s
book_ttree: tree_struct.n_tree = 3 <<---- !!!!
book_ttree: tree_struct.n_tree = 4
Set run number 2 in ODB
Load ODB from run 2...OK
/tmp/midas/examples/root/run00002.mid:2347 /tmp/midas/run00002.root:2345 events,
0.18s
*** Break *** segmentation violation
----------------- \output ----------------------------
Adding this one line fixes the segfault problem for the root example expt.
----------------- code -------------------------
lamb@lamb2:/data/software/midas/midas_3705/src/src> svn diff mana.c
Index: mana.c
===================================================================
--- mana.c (revision 3705)
+++ mana.c (working copy)
@@ -1496,6 +1496,7 @@
/* delete event tree */
free(tree_struct.event_tree);
tree_struct.event_tree = NULL;
+ tree_struct.n_tree = 0;
// go to ROOT root directory
gROOT->cd();
---------------- \code ---------------------------
Please check if this gives the intended behaviour. I am not very familiar with the
midas internals.
Unfortunately my own analyzer's segfault problem is not solved by this patch. I
guess I have to keep searching for a bug on my side..... :-)
Cheers,
Randolf |
385
|
08 Jun 2007 |
Stefan Ritt | Forum | crash when analyzing multiple runs offline | Unfortunately I don't have time right now to debug the problem, but I could see
roughly what it could be. The analyzer crashes inside CloseRootOutputFile:
#5 <signal handler called>
#6 0x00002b5f52ad5ee5 in free () from /lib64/libc.so.6
#7 0x000000000040c89b in CloseRootOutputFile () at src/mana.c:1489
in the line
free(tree_struct.event_tree[i].branch);
If a "free" crashes, it might indicate that the memory beyond the allocated space
got corrupted. The branch gets allocated in book_ttree(), once for each
analyze_request[i]. The branch gets filled in write_event_ttree():
/* fill tree both online and offline */
if (!exclude_all)
et->tree->Fill();
Maybe one should put printf debugging statements in these places to see what's
going on. |
384
|
08 Jun 2007 |
Stefan Ritt | Suggestion | RFC- ACLs for midas rpc, mserver, mhttpd access | First I have a general question: mserver is started through xinetd, and xinetd has
the options "only_from" and "no_access". This is equivalent to the tcp_wrapper
functionality. Why not using this? It's possible without changing anything in midas.
Or am I missing anything?
If that does not work for some reason, here are some thought from my side:
- We don't have much of a problem with malicious hackers, but with institute-wide
security checking. Hackers are only interested in mechanisms where they can obtain
control over thousands of machines (like breaking ssh etc.). The few midas machines
are not a good target for them. But even at PSI there are security scans, which try
to connect to various ports and can crash systems, so I agree that something needs
to be done.
- Whatever we do, it should be consistent on linux and windows and should not rely
on external packages, since I don't want to get into dependencies there.
- I see that both having the security information in the ODB or having them in
external files can be advantageous. There is certainly the aspect of restoring old
ODBs, or keeping several experiments (ODB) on one machine consistent. On the other
hand storing data in the ODB might me liked by people who are familiar with this
concept, and want to change things though mhttpd for example.
- Having said all that, it would make sense to me to write a simple central routine
access_allowed(), which takes the IP address of a remote client wanting to connect,
and return true or false. This routine should read /etc/hosts.allow, /etc/hosts.deny
and interprete it, but only the section for midas, and maybe only a subset of the
functionality there (we probably don't need NIS netgroup names, external files and
spawn commands there). If the files /etc/hosts.x do not contain anything about midas
or are not preset (Windows!), the routine should look in the ODB under
/experiment/security/mserver/hosts.allow and /experiment/security/mserver/hosts.deny
and use that information instead of the files.
- We probably need different mechanisms for mserver and for mhttpd. The mserver
clients are usually only a few programs like the front-ends, while one may want to
control an experiment over mhttpd from much more machines. So we should establish a
second ACL for mhttpd. The already present "/experiment/security/allowed hosts" for
mhttpd should be converted into "/Experiment/Security/mhttpd/hosts.allow" and the
function access_allowed() should be used to interprete that, so that we only need to
write it once. |
383
|
07 Jun 2007 |
John M O'Donnell | Suggestion | RFC- ACLs for midas rpc, mserver, mhttpd access | I am in favor of tcp_wrappers.
tcp_wrappers is well understood.
It works well in combination with a firewall.
mhttpd hangs when our security folks scan us. We are not allowed to block them
with a firewall, but we can use tcpwrappers.
Would it make sense to put the same mechanism on mserver?
the man page for libtcpwrappers.a (taken from the tcpwrappers7.6 tar ball) is
attached. And the output after running it through nroff -man.
The odb is too fragile for security. It is not understood well enough by many
experimenters.
As you can see I am in favor of tcp_wrappers. This is mainly because it is part
of an existing and tested security model. I don't know about the windows
world, but as you can also see, I vote for using something that is already part
of the windows security model. Here's an example of how well the integrated
security model works:
if an person is part of an experiment I make sure they can ssh to the
experiment's computer
the same rules could provide them with web access
Second is that when a change is needed to the security model then it is easy to
keep it current. What if somebody restores an old ODB? What if they setup a
small test with a new ODB?
If mhttpd used tcp_wrappers, then all our machines here at LANL would already be
configured! No need for
users to do any root access (though those that need it have it anyway).
John. |
382
|
07 Jun 2007 |
Randolf Pohl | Forum | crash when analyzing multiple runs offline | Hello,
I am having a problem with the root-based analyzer. It crashes when I try to
analyze multiple runs OFFLINE using the "-i run%05d.mid -o result%05d.root -r
1 2" feature.
I can reproduce the problem with the example experiment which comes with the
MIDAS distribution:
Running the analyzer ONLINE works fine: One can start and stop runs one after
the other, roody shows the histograms being reset and then filled again and
such.
But OFFLINE, the analyzer crashes when trying to analyze the SECOND run in a
sequence. So
./analyzer -i run%05d.mid -o result%05d.root -r 1 1 works (only run 1)
./analyzer -i run%05d.mid -o result%05d.root -r 1 3 dies on run 2
Output attached (I added printf's to the "init"-modules, but that's irrelevant
here)
My own analyzer shows the same effect. There I got the impression the segfault
happens on the first attempt to Fill/Reset/SetName etc. a histogram in the 2nd
run. But with the midas example it looks like the analyzer finishes filling
histos even for run 2, but then dies in eor.
Can you reproduce the problem?
I run MIDAS on an Intel Quadcore, 64 bit SuSE Linux 10.2.
pohl@lamb2:~/midas/examples/root> gcc --version
gcc (GCC) 4.1.2 20061115 (prerelease) (SUSE Linux)
(maybe 4.1.2 "PRERELEASE" is the problem? See message ID 344)
I am using midas rev. 3674 (April 19, 2007), but I got the impression there
has since not been a change relevant to this problem. Please correct me if I
am wrong, then I would try it with Rev HEAD.
(My version includes already the fix to the x86_64 segfault problem of message
ID 337)
Best regards,
Randolf |
381
|
07 Jun 2007 |
Konstantin Olchanski | Suggestion | RFC- ACLs for midas rpc, mserver, mhttpd access | Running MIDAS at CERN is proving more challenging than I expected. The network environement is not
as benign as I am used to (i.e. at TRIUMF) and our machines are being constantly probed by something/
somebody.
This already caused failures in the mserver (fixed in midas svn) and I would like to resolve this problem
once and for all. The age of "nice networks" is over.
The case of the mserver and for the midas rpc servers (every midas applications listens for midas rpc
requests, i.e. run transitions) is simple. The list of machines running midas applications is known ahead
of time, so we can put them all into a list of permitted machines and deny rpc connections to anybody
else. I propose we keep this list of permitted mserver clients in "/experiment/security/mserver hosts".
(The already existing "/experiment/security/allowed hosts" mechanism is insufficient: it does not
prevent the mserver from accepting connections from hostile machines, and talking to them, for
example giving them the list of available experiments. There is a fair amount of code involved and I do
not presume to certify any of it as hack-proof or even as crash-proof.)
For mhttpd http:// access control, I thought of using tcp_wrappers, but C-API documentation does not
exist (I looked), the example code in tcpd.c is way too complicated, editing the ACL /etc/hosts.allow
unnecessarily requires root privileges and non of it would work on Windows.
So I am favouring a home-made hostname or ip-address filter, similar to /etc/hosts.allow, with ACL
stored, for example, in "/experiment/security/mhttpd hosts".
Any thoughts?
K.O. |
380
|
22 May 2007 |
Stefan Ritt | Bug Report | analyzer_init called by odb_load | > Thanks for the quick reply, Stefan.
>
> Please don't change anything in the code unless you find it really important.
I guess
> changing the analyzer_init prototype will break a lot of code out there?
>
> In fact, I think I do understand this behavior now.
> And even without your suggested fix there is a simple workaround: I add a static
> variable to my analyzer_init.cxx file, and do something similar to your bFirst
fix.
>
> In conclusion, commit your fix if it does not harm others. Postpone this
commit to a
> future new version of midas which breaks a lot of things anyway...
>
> A last question, for me to understand: Why not call db_open_record in
> ana_begin_of_run then?
I fully agree with you that db_open_record would better go into ana_begin_of_run
(and
analyzer_init not being called in odb_load), and I fully agree with you that
changing the
code would break many experiments. ;-)
So I guess we leave it as it is right now as you suggested. |
379
|
22 May 2007 |
Randolf Pohl | Bug Report | analyzer_init called by odb_load | Thanks for the quick reply, Stefan.
Please don't change anything in the code unless you find it really important. I guess
changing the analyzer_init prototype will break a lot of code out there?
In fact, I think I do understand this behavior now.
And even without your suggested fix there is a simple workaround: I add a static
variable to my analyzer_init.cxx file, and do something similar to your bFirst fix.
In conclusion, commit your fix if it does not harm others. Postpone this commit to a
future new version of midas which breaks a lot of things anyway...
A last question, for me to understand: Why not call db_open_record in
ana_begin_of_run then?
Cheers,
Randolf |
|