16 Jul 2008, Stefan Ritt, Info, Implementation of db_set_link_data() and db_set_link_data_index()
|
The current implementation of ODB links has the problem that once a link is
created, it cannot be changed any more through odbedit. This is because each
"set" command works on the destination of the link instead of the link. The same
happens when one loads a *.odb file. To overcome this problem, two new functions
db_set_link_data() and db_set_link_data_index() have been implemented. They
resemble their counterparts db_set_data() and db_set_data_index(), but they can
be used to directly modify a link instead of the link target. I use these
functions now in odbedit and db_paste() so that the above described problems are
fixed now. I do not expect any side effect of this, but if people experience
problems with db_paste(), please let me know. |
04 Jul 2008, Stefan Ritt, Info, Improved alarm conditions implemented
|
I implemented improved alarm conditions in the alarm system. Now one can write
conditions like
/Equipment/HV/Variables/Input[*] < 100
or
/Equipment/HV/Variables/Input[2-3] < 100
to check all values from an array or a certain range. If one array element
fulfills the alarm condition, the alarm is trigger. In addition, bit-wise alarm
conditions are possible
/Equipment/Environment/Variables/Input[0] & 8
is triggered if bit #2 is set in Input[0].
The changes are committed to SVN revision 4242. |
01 Jul 2008, Jimmy Ngai, Forum, CAEN V792N QDC with MIDAS
|
Dear All,
I have a problem when testing the V792N 16 CH QDC with the V1718 VME-USB
Bridge on Scientific Linux 5.1 i386 (kernel 2.6.18-53.1.21.e15).
The problem is that the V792N does not response normally after a few minutes
of continuous polling and readout of data. It seems like the V792N is hanged
and a hardware reset of the VME system is required to bring it working again.
If I do not poll for DREADY first and directly read the Output Buffer
continuously, the system can work properly.
I have worked on this problem many days but I cannot find any clues to solve
it. I have tried to use the CAENVMEDemo program (with some modifications) to
do the same thing (polling and readout) and it works fine. CAEN technical
support also doesn't know why the VME system is hanged. I think it might be a
problem of MIDAS itself. I have tried with MIDAS revision 4132 and the trunk
version, but the problem is still there. Is there any parameter in MIDAS
(buffer size etc?) which may cause this problem? I have attached my frontend
code and drivers for your reference.
Thank you for your kind attention.
Best Regards,
Jimmy |
11 Jun 2008, Andreas Suter, Suggestion, mlogger is flooding the message queue
|
The current versions of mlogger SVN 4215 is flooding our message system with
stuff like
> Tue Jun 10 16:42:01 2008 [Logger,INFO] Configured history with 22 events
> Tue Jun 10 16:42:14 2008 [Logger,INFO] Configured history with 22 events
> Tue Jun 10 16:42:26 2008 [Logger,INFO] Configured history with 22 events
This is fatal to us and blowing up the midas.log like hell. I would prefer if
one could flag these kind of messages (ODB /Logger/..), i.e. enable and disable
it. At the moment I have to comment it out in the source code since we cannot
work with it.
Cheers,
Andreas |
11 Jun 2008, Konstantin Olchanski, Suggestion, mlogger is flooding the message queue
|
> The current versions of mlogger SVN 4215 is flooding our message system with
> stuff like
>
> > Tue Jun 10 16:42:01 2008 [Logger,INFO] Configured history with 22 events
> > Tue Jun 10 16:42:14 2008 [Logger,INFO] Configured history with 22 events
> > Tue Jun 10 16:42:26 2008 [Logger,INFO] Configured history with 22 events
>
> This is fatal to us and blowing up the midas.log like hell. I would prefer if
> one could flag these kind of messages (ODB /Logger/..), i.e. enable and disable
> it. At the moment I have to comment it out in the source code since we cannot
> work with it.
I just sent the attached message to Stefan - please read it.
Before we take any action, we need to understand why history is being
reconfigured every 10 seconds (according to your logfile snippet).
Are you starting a new run every 10 seconds?
If that is what you do and that is your intent, I guess it is atypical usage of
MIDAS and the message from the mlogger is offensive and should be removed/disabled.
If something else is going on, we need to understand it before we sweep trouble
under the carpet by disabling this message.
K.O.
Stefan - there is more bad news - the message is produced when the history
is being reconfigured. This only is supposed to happen when the mlogger
starts or at the begin of run.
So these messages are just a tip of an iceberg of some other trouble.
The logic of when history is reconfigured I did not change. So likely
the trouble existed before, but you did not know about it.
We can kill the message, but why is the history being reconfigured
at a rate that "floods the log file"? That cannot possibly be good.
K.O. |
16 Jun 2008, Konstantin Olchanski, Suggestion, mlogger is flooding the message queue
|
> The current versions of mlogger SVN 4215 is flooding our message system with
> stuff like
>
> > Tue Jun 10 16:42:01 2008 [Logger,INFO] Configured history with 22 events
> > Tue Jun 10 16:42:14 2008 [Logger,INFO] Configured history with 22 events
> > Tue Jun 10 16:42:26 2008 [Logger,INFO] Configured history with 22 events
Problem confirmed on the M11 DAQ system at TRIUMF. We definitely do nothing funny
there, so what is going on? Will investigate.
K.O. |
16 Jun 2008, Stefan Ritt, Suggestion, mlogger is flooding the message queue
|
> > The current versions of mlogger SVN 4215 is flooding our message system with
> > stuff like
> >
> > > Tue Jun 10 16:42:01 2008 [Logger,INFO] Configured history with 22 events
> > > Tue Jun 10 16:42:14 2008 [Logger,INFO] Configured history with 22 events
> > > Tue Jun 10 16:42:26 2008 [Logger,INFO] Configured history with 22 events
>
> Problem confirmed on the M11 DAQ system at TRIUMF. We definitely do nothing funny
> there, so what is going on? Will investigate.
The only place I see where this could happen is in mlogger.c, lines 3064ff:
/* check if event size has changed */
db_get_record_size(hDB, hKey, 0, &size);
if (size != hist_log[i].buffer_size) {
close_history();
open_history();
return;
}
The record size corresponds to /Equipment/<name>/Variables. If this array changes in
size, it will trigger the re-definition of the history. So please have a look there
and check why the record size changes. |
16 Jun 2008, Stefan Ritt, Bug Fix, "Missing event" problem fixed in front-end framework
|
Since the very beginning midas had the problem that the last event of a run was
sometimes missing in the data. While for most experiments this is not an issue,
it starts to hurt on experiments using event building (front-end 1 and front-end
2 in the example below). A missing event can screw up the event builder on the
next begin of run, where the "missing event" would show up as the first event of
the new run, triggering an event mismatch error in the event builder.
After some analysis, we identified the problem as follows. Assume FE1 controls
the trigger, while FE2 generates the second event fragment.
1) Stop is requested to FE1
2) tr_stop gets called on FE1
3) tr_stop calls end_of_run() use code
4) end_of_run() disables the trigger
5) FE1 finishes stop transition
6) Stop is requested to FE2
7) FE2 finishes stop transition
What can now happen is the following: An additional event occurs between 2) and
4). This event triggers ADCs and TDCs, and is then stored in the front-end
hardware. FE2 sees this event, since it has not yet done the stop transition,
and reads it out. FE1 is however already in the end_of_run() routine, and simply
disables the trigger, without reading this last event, and thus causing the
event mismatch at the beginning of the next run.
To fix the problem, the framework in mfe.c was changed:
1) Stop is requested to FE1
2) tr_stop gets called on FE1
3) tr_stop calls end_of_run() use code
4) end_of_run() disables the trigger
4b) tr_stop calls check_polled_events()
5) FE1 finishes stop transition
6) Stop is requested to FE2
7) FE2 finishes stop transition
The new routine check_polled_events checks if there is any more event in the
hardware by calling the user polling routine. If there is one more event, calls
the user readout routine and sends it to the back-end before concluding the run
transition.
This modification solved our problem at the MEG experiment at PSI, but it might
be good that all experiments using event building update midas to revision 4225.
I do not expect any bad side effect, but one never knows. So if there are new
problems caused by this modification, please report.
- Stefan |
05 Jun 2008, Jimmy Ngai, Forum, CAEN VME-USE Bridge with MIDAS
|
Hi All,
Is there any example code for using MIDAS with the CAEN VME-USB Bridge V1718?
Thanks.
Regards,
Jimmy |
07 Jun 2008, Jimmy Ngai, Forum, CAEN VME-USE Bridge with MIDAS
|
Hi All,
I am testing the libraries provided by CAEN with the sample softwares in the
bundle CD. The Windows sample program works fine, but I cannot get started with
the Linux sample program. When I run CAENVMEDemo in Scientific Linux 5.1, it
gives me a message "Error opening the device". I have followed the instructions
in CAENVMElibReadme.txt:
- compile and load the device driver v1718.ko
- install the library libCAENVME.so
Does anyone have any experience of using V1718 in Scientific Linux? Thanks.
Regards,
Jimmy
> Hi All,
>
> Is there any example code for using MIDAS with the CAEN VME-USB Bridge V1718?
> Thanks.
>
> Regards,
> Jimmy |
20 May 2008, Konstantin Olchanski, Bug Report, pending problems and fixes from triumf
|
Here is the list of known problems I am aware of and of fixes not yet committed
to midas svn:
1) added variable /equiment/foo/common/PerVariableHistory breaks stuff (mostly
mhttpd). It is not clear how this problem escaped my pre-commit checks. This
per-equipment variable enables the per-variable history for the given equipment.
Local consensus is that this variable should not be in "common" and should not
be in "settings". Probably in "/history"? Or have only one variable to enable
this for all equipments at once (like we do in ALPHA).
2) writing compressed midas files (foo.mid.gz) crashes the mlogger when file
size reaches 2 GBytes. This problem could be new in SL5.1.
3) when a midas client becomes unresponsive, runs cannot be stopped using the
"stop" button in mhttpd. This is because cm_transition() loops over all attached
clients, but never removes clients that are known to be dead. Proposed fix is to
call cm_check_client() for each client before calling their rpc transition handler.
4) the discussed before fix for reading broken history files (skip bad data).
5) mhttpd history "export" button needs to be fixed (by request from ALPHA). At
present it either does not return all exiting data or crashes mhttpd. (no fix)
6) mhttpd ODB editor in "set value" page, the "cancel" button is broken (needs
to be corrected for "relative URL"). (no fix)
7) mhttpd needs AJAX-style methods for reading and writing ODB. (no fix)
K.O. |
28 May 2008, Konstantin Olchanski, Bug Report, pending problems and fixes from triumf
|
> Here is the list of known problems I am aware of and of fixes not yet committed
> to midas svn:
>
> 1) added variable /equiment/foo/common/PerVariableHistory
corrected in svn revision 4203, read
http://savannah.psi.ch/viewcvs/trunk/src/mlogger.c?root=midas&rev=4203&sortby=rev&view=log
> 2) writing compressed midas files (foo.mid.gz) crashes the mlogger when file
> size reaches 2 GBytes. This problem could be new in SL5.1.
(no change)
> 3) when a midas client becomes unresponsive, runs cannot be stopped using the
> "stop" button in mhttpd. This is because cm_transition() loops over all attached
> clients, but never removes clients that are known to be dead. Proposed fix is to
> call cm_check_client() for each client before calling their rpc transition handler.
Fixed in SVN revision 4198, read
http://savannah.psi.ch/viewcvs/trunk/src/midas.c?root=midas&rev=4201&sortby=rev&view=log
> 4) the discussed before fix for reading broken history files (skip bad data).
Fixed in SVN revision 4202, read https://ladd00.triumf.ca/elog/Midas/482
> 5) mhttpd history "export" button needs to be fixed (by request from ALPHA). At
> present it either does not return all exiting data or crashes mhttpd. (no fix)
(no change)
> 6) mhttpd ODB editor in "set value" page, the "cancel" button is broken (needs
> to be corrected for "relative URL").
Apply this patch to src/mhttpd.c
@@ -11156,10 +11190,7 @@
sprintf(str, "SC/%s/%s", eq_name, group);
redirect(str);
} else {
- strlcpy(str, path, sizeof(str));
- if (strrchr(str, '/'))
- strlcpy(str, strrchr(str, '/')+1, sizeof(str));
- redirect(str);
+ redirect("./");
}
> 7) mhttpd needs AJAX-style methods for reading and writing ODB. (no fix)
(no change)
K.O. |
29 May 2008, Konstantin Olchanski, Bug Report, pending problems and fixes from triumf
|
> > Here is the list of known problems I am aware of and of fixes not yet committed to midas svn:
> > 1) added variable /equiment/foo/common/PerVariableHistory
>
> corrected in svn revision 4203, read
> http://savannah.psi.ch/viewcvs/trunk/src/mlogger.c?root=midas&rev=4203&sortby=rev&view=log
Was still broken - all should work in revision 4207.
> > 2) writing compressed midas files (foo.mid.gz) crashes the mlogger when file
> > size reaches 2 GBytes. This problem could be new in SL5.1.
It turns out that on SL5 and SL5.1 (and others?) the 32-bit version of ZLIB opens the
compressed output file without the O_LARGEFILE flag, this limits the file size to 2 GB.
Fixed by opening the file ourselves, then attach compression stream using gzdopen().
Revision 4207. (Not tested on Windows - may be broken!)
> > 5) mhttpd history "export" button needs to be fixed (by request from ALPHA). At
> > present it either does not return all exiting data or crashes mhttpd. (no fix)
>
> (no change)
>
> > 6) mhttpd ODB editor in "set value" page, the "cancel" button is broken (needs
> > to be corrected for "relative URL").
>
> Apply this patch to src/mhttpd.c
>
> @@ -11156,10 +11190,7 @@
> sprintf(str, "SC/%s/%s", eq_name, group);
> redirect(str);
> } else {
> - strlcpy(str, path, sizeof(str));
> - if (strrchr(str, '/'))
> - strlcpy(str, strrchr(str, '/')+1, sizeof(str));
> - redirect(str);
> + redirect("./");
> }
>
> > 7) mhttpd needs AJAX-style methods for reading and writing ODB. (no fix)
>
> (no change)
>
> K.O. |
07 Jan 2008, Stefan Ritt, Info, Roll-back for history sytem added
|
The midas history system always had the problem that the database can get
corrupted if the disk gets full where the history records (*.hst & *.idx) are
stored. This can happen if a history event can only be written partially on the
almost full disk. If later some space is freed up (by deleting other files), the
writing continues at the old position, leaving the partial event in the data
base. In that case the whole history data of the current day cannot be read
because it is corrupted.
To solve the problem, a roll-back system has been implemented in the
hs_write_event() function. If an event cannot be written fully, the history file
is restored to the old state, so the partial event is removed from the end of
the file via truncation. This way only the data which could not be written to
the disk is missing in the history file, but the other data from that day is
still valid and readable. The change has been committed in revision 4107. |
13 Feb 2008, Konstantin Olchanski, Info, Roll-back for history sytem added
|
> The midas history system always had the problem that the database can get
> corrupted if the disk gets full where the history records (*.hst & *.idx) are
> stored.
Stefan - big thanks for fixing this problem - it is one of those cases "how come I
did not think of do it!".
This change should fix the last remaining problem with history at CERN - we seem to
be unable to avoid running out of disk space once in a while (run away scripts, fat
fingers, etc) and history got corrupted every time.
But to make things more interesting we had another history outage this week - we
happen to write history files to an NFS server (not recommened! do not do this!) and
when the NFS server had a glitch, history files got corrupted - because during the
glitch NFS was not available, I think this roll-back feature would not have helped.
Anyhow, I now have a patch to allow hs_read() to "skip the bad spots" in the history
files. (hs_gen_index() also needs a patch).
In the nutshell, if invalid history data is detected, the code continues to read the
data one byte at a time, looking for valid event_id markers (etc).
The code looks sane by inspection, and if nobody objects, I would like to commit it
in the next few days.
Here is the diff against src/history.c rev 4114
Index: history.c
===================================================================
--- history.c (revision 4118)
+++ history.c (working copy)
@@ -129,6 +129,7 @@
HIST_RECORD rec;
INDEX_RECORD irec;
DEF_RECORD def_rec;
+ int recovering = 0;
printf("Recovering index files...\n");
@@ -171,7 +172,7 @@
/* skip tags */
lseek(fh, rec.data_size, SEEK_CUR);
- } else {
+ } else if (rec.record_type == RT_DATA) {
/* write index record */
irec.event_id = rec.event_id;
irec.time = rec.time;
@@ -180,6 +181,15 @@
/* skip data */
lseek(fh, rec.data_size, SEEK_CUR);
+ } else {
+
+ if (!recovering)
+ cm_msg(MERROR, "hs_gen_index", "broken history file %d, trying to
recover", (int)ltime);
+
+ recovering = 1;
+ lseek(fh, -sizeof(rec)+1, SEEK_CUR);
+
+ continue;
}
} while (TRUE);
@@ -220,6 +230,7 @@
time_t lt;
int fh, fhd, fhi;
struct tm *tms;
+ int idxsize = 0;
if (*ltime == 0)
*ltime = ss_time();
@@ -250,12 +261,15 @@
hs_open_file(*ltime, "idf", O_RDONLY, &fhd);
hs_open_file(*ltime, "idx", O_RDONLY, &fhi);
+ if (fhi >= 0)
+ idxsize = lseek(fhi, 0, SEEK_END);
+
close(fh);
close(fhd);
close(fhi);
/* generate them if not */
- if (fhd < 0 || fhi < 0)
+ if (fhd < 0 || fhi < 0 || idxsize == 0)
hs_gen_index(*ltime);
return HS_SUCCESS;
@@ -1480,12 +1494,33 @@
i = -1;
M_FREE(cache);
cache = NULL;
- } else
+ } else {
+
+ try_again:
+
i = sizeof(irec);
-
- if (cp < cache_size) {
memcpy(&irec, cache + cp, sizeof(irec));
cp += sizeof(irec);
+
+ /* if history file is broken ... */
+ if (irec.time < last_irec_time) {
+ //printf("time %d -> %d, cache_size %d, cp %d\n", last_irec_time, irec.time,
cache_size, cp);
+
+ //printf("Seeking next record...\n");
+
+ while (cp < cache_size)
+ {
+ DWORD* evidp = (DWORD*)(cache + cp);
+ if (*evidp == event_id) {
+ //printf("Found at cp %d\n", cp);
+ goto try_again;
+ }
+
+ cp++;
+ }
+
+ i = -1;
+ }
}
} else
i = read(fhi, (char *) &irec, sizeof(irec));
K.O. |
13 Feb 2008, Stefan Ritt, Info, Roll-back for history sytem added
|
> But to make things more interesting we had another history outage this week - we
> happen to write history files to an NFS server (not recommened! do not do this!) and
> when the NFS server had a glitch, history files got corrupted - because during the
> glitch NFS was not available, I think this roll-back feature would not have helped.
Actually I put our history data on a separate file system, on a separate disk controlled
by a separate RAID controller! If you write bulk data with the logger, and want to read
history files at the same time with mhttpd, you get a bottleneck if both data are at the
same physical disk. Separating this (and even the controller) speeded things up
dramatically.
The rollback will not work for NFS, since it requires truncating the file if an event
gets only partially written. While on a full file system you always can *delete* data,
this does not work if NFS is down. This explains the behavior.
> Anyhow, I now have a patch to allow hs_read() to "skip the bad spots" in the history
> files. (hs_gen_index() also needs a patch).
>
> In the nutshell, if invalid history data is detected, the code continues to read the
> data one byte at a time, looking for valid event_id markers (etc).
>
> The code looks sane by inspection, and if nobody objects, I would like to commit it
> in the next few days.
Great. I was thinking of something like this myself. Having a quick look at your code
looks good. The best of course would be if we would have some "magic number" for
re-synchronizating the data stream, but that would blow up the file length. So searching
for the right event id is good, but will not work 100%. Also the check
if (irec.time < last_irec_time)
to see if the history is broken is very weak. If you take random data, it will be true
50% and false 50%. If one makes however a check
if ((irec.time - last_irec_time) > 3600*24)
this would work correctly with random data in >99% of all cases (3600*24/2^32). Maybe
you should change that. |
28 May 2008, Konstantin Olchanski, Info, Roll-back for history sytem added
|
> > But to make things more interesting we had another history outage this week...
> > Anyhow, I now have a patch to allow hs_read() to "skip the bad spots" in history files.
>
> [Stefan suggested]
>
> if ((irec.time - last_irec_time) > 3600*24)
Yes, your stronger check works quite nicely. The whole patch is now committed into SVN,
revision 4202.
This is how it all works:
0) teach hs_gen_index() to skip over bad data. This is important because hs_read() only
looks at data records listed in the index file: if bad data is omitted from the index,
hs_read() will never see it and we do not need to worry about it in hs_read().
0a) because hs_gen_index() does not check validity of time stamps, we still need to check
them in hs_read().
1) in hs_read(), if we detect bad data (invalid headers, bad time stamps, etc), we
regenerate the index files - this removes a while class of bad data. We also look at time
stamps carefully and ignore records where time goes backwards (usually bad data) and ignore
records with time in the future beyound the end of the current history file (each history
file only contains 24*60*60 seconds = 1 day's worth of data).
While certainly not bullet-proof, these changes should make it easier to deal with
corruption of history files.
K.O. |
20 May 2008, Konstantin Olchanski, Bug Report, pending problems and fixes from triumf
|
Here is the list of known problems I am aware of and of fixes not yet committed
to midas svn: |
30 Apr 2008, Konstantin Olchanski, Info, triumf elog updated to elog-2.7.3-1.i386.rpm
|
FYI - in conjunction with replacement of ladd00.triumf.ca, this MIDAS ELOG has been updated to the latest
version 2.7.3-2058. Please report any problems or anomalies. K.O. |
02 Apr 2008, Konstantin Olchanski, Info, add "const" attributes to db_xxx() functions
|
Now that we use more and more C++, lack of "const" attribute on most midas functions is causing some
problems. I am now ready to commit changes to midas.h and odb.c that add the const attributes to ODB
access functions db_xxx(), i.e.
INT db_rename_key(HNDLE hDB, HNDLE hKey, char *name)
becomes
INT db_rename_key(HNDLE hDB, HNDLE hKey, const char *name)
If we proceed with this conversion, and it does not cause major havoc, I can continue and "const"ify the
rest of midas.h. I note that the mxml functions appear to already have the correct "const" declarations.
P.S. Adding the "const" attribute caught a few places where we were modifying a "char*" string passed by
the caller. This is undesirable if we are passed a string literal, i.e. db_rename_key(...,"foo"), and it is a
complete disaster in conjunction with C++ strings, i.e. db_rename_key(...,foo.c_str())
K.O. |
02 Apr 2008, Stefan Ritt, Info, add "const" attributes to db_xxx() functions
|
> Now that we use more and more C++, lack of "const" attribute on most midas functions is causing some
> problems. I am now ready to commit changes to midas.h and odb.c that add the const attributes to ODB
> access functions db_xxx(), i.e.
> INT db_rename_key(HNDLE hDB, HNDLE hKey, char *name)
> becomes
> INT db_rename_key(HNDLE hDB, HNDLE hKey, const char *name)
>
> If we proceed with this conversion, and it does not cause major havoc, I can continue and "const"ify the
> rest of midas.h. I note that the mxml functions appear to already have the correct "const" declarations.
>
> P.S. Adding the "const" attribute caught a few places where we were modifying a "char*" string passed by
> the caller. This is undesirable if we are passed a string literal, i.e. db_rename_key(...,"foo"), and it is a
> complete disaster in conjunction with C++ strings, i.e. db_rename_key(...,foo.c_str())
I fully approve your idea. You are absolutely right that it also will help to prevent errors such as modifying
fixed strings. I was just too lazy to do that, because it requires some additional code like:
func(const char *p)
{
char str[256];
strlcpy(str, p, sizeof(str));
strlcat(str, ...)
}
So if you do it, it's great! |
03 Apr 2008, Konstantin Olchanski, Info, add "const" attributes to db_xxx() functions
|
> > I am now ready to commit changes to midas.h and odb.c that add the const attributes to ODB
> > access functions db_xxx(), i.e.
> > INT db_rename_key(HNDLE hDB, HNDLE hKey, char *name)
> > becomes
> > INT db_rename_key(HNDLE hDB, HNDLE hKey, const char *name)
>
> I fully approve your idea.
Committed revision 4172.
K.O. |
02 Apr 2008, Konstantin Olchanski, Info, add "const" attributes to db_xxx() functions
|
Now that we use more and more C++, lack of "const" attribute on most midas functions is causing some
problems. I am now ready to commit changes to midas.h and odb.c that add the const attributes to ODB
access functions db_xxx(), i.e.
INT db_rename_key(HNDLE hDB, HNDLE hKey, char *name)
becomes
INT db_rename_key(HNDLE hDB, HNDLE hKey, const char *name)
If we proceed with this conversion, and it does not cause major havoc, I can continue and "const"ify the
rest of midas.h. I note that the mxml functions appear to already have the correct "const" declarations.
P.S. Adding the "const" attribute caught a few places where we were modifying a "char*" string passed by
the caller. This is undesirable if we are passed a string literal, i.e. db_rename_key(...,"foo"), and it is a
complete disaster in conjunction with C++ strings, i.e. db_rename_key(...,foo.c_str())
K.O. |
23 Mar 2008, Konstantin Olchanski, Info, History SQL database poll: MySQL, PgSQL, ODBC?
|
I would like to hear from potential users on which SQL database would be
preferable for storage of MIDAS history data.
My current preference is to use the ODBC interface, leaving the choice of
database engine to the user. While ODBC is not pretty, it appears to be adequate
for the job, permits "funny" databases (i.e. flat files) and I already have
prototype implementations for reading (mhttpd) and writing (mhdump/mlogger)
history data using ODBC.
In practice, MySQL and PgSQL are the main two viable choices for using with the
MIDAS history system. We tested both (no change in code - just tell ODBC which
driver to use) and both provide comparable performance and disk space use. We
were glad to see that the disk space use by both SQL databases is very
efficient, only slightly worse than uncompressed MIDAS history files.
At TRIUMF, for T2K/ND280, we now decided to use MySQL - it provides a better
match to MIDAS data types (has 1-byte and 2-byte integers, etc) and appears to
have working database replication (required for our use).
With mlogger already including support for MySQL, and MySQL being a better match
for MIDAS data, this gives them a slight edge and I think it would be reasonable
choice to only implement support for MySQL.
So I see 3 alternatives:
1) use ODBC (my preference)
2) use MySQL exclusively
3) implement a "midas odbc layer" supporting either MySQL or PgSQL.
Before jumping either way, I would like to hear from you folks.
K.O. |
09 Mar 2008, Exaos Lee, Suggestion, New Makefile for building MIDAS
|
I rewrote the Makefile for MIDAS in order to make it tidy. I tested it on my box
and it works here.
1. The full file is seperated to several parts
a. initialized setup
b. environment setup
c. specify OS-specific flags
d. processing environment for building flags
e. targets
2. The file is less than 400 lines now. The original one is more than 500 lines.
3. The modified one is easy for debuging.
I tried to learn "autoconf" and "automake" in order to make building MIDAS more
compatible for various platforms. But I havn't enough time now. Hope somebody
can help it. The attached file is original named "Makefile.in" for using "autoconf".
:-) |
09 Mar 2008, Stefan Ritt, Suggestion, New Makefile for building MIDAS
|
> I rewrote the Makefile for MIDAS in order to make it tidy. I tested it on my box
> and it works here.
> 1. The full file is seperated to several parts
> a. initialized setup
> b. environment setup
> c. specify OS-specific flags
> d. processing environment for building flags
> e. targets
> 2. The file is less than 400 lines now. The original one is more than 500 lines.
> 3. The modified one is easy for debuging.
>
> I tried to learn "autoconf" and "automake" in order to make building MIDAS more
> compatible for various platforms. But I havn't enough time now. Hope somebody
> can help it. The attached file is original named "Makefile.in" for using "autoconf".
I think it is a good idea to cleanup the Makefile. It grew over many years and certainly
had some inconsistencies. We did however not use "autoconf" since it is not of much use.
It is meant for systems where small differences between different Unix flavors are
covered by this system, but the midas source code is supposed not only to run on Unix,
but also on vxWorks and Windows. As you can imagine, the differences are much more
severe and a simple makefile generator cannot cover the details. Furthermore, under
Windows there is no such thing like autoconf. So all the work to make the source code
compile on all systems has been put into system.c using conditional compiling. So
putting another abstraction layer on this would maybe more complicate things than
simplify it. I will test your Makefile, and I also ask the guys at TRIUMF to do so. Once
we conclude that it works fine, we can replace the original Makefile from the distribution. |
10 Mar 2008, Stefan Ritt, Suggestion, New Makefile for building MIDAS
|
> I rewrote the Makefile for MIDAS in order to make it tidy. I tested it on my box
> and it works here.
> 1. The full file is seperated to several parts
> a. initialized setup
> b. environment setup
> c. specify OS-specific flags
> d. processing environment for building flags
> e. targets
> 2. The file is less than 400 lines now. The original one is more than 500 lines.
> 3. The modified one is easy for debuging.
>
> I tried to learn "autoconf" and "automake" in order to make building MIDAS more
> compatible for various platforms. But I havn't enough time now. Hope somebody
> can help it. The attached file is original named "Makefile.in" for using "autoconf".
The Makefile is missing -lzip:
[ritt@pc5082 ~/midas]$ make -f Makefile-by-EL
Making directory linux ...
Making directory linux/lib ...
Making directory linux/bin ...
g++ -g -O3 -Wall -Wuninitialized -DINCLUDE_FTPLIB -D_LARGEFILE64_SOURCE -DOS_LINUX -fPIC
-Wno-unused-function -DHAVE_ZLIB -Iinclude -Idrivers -I../mxml -Llinux/lib -o
linux/bin/mlogger src/mlogger.c linux/lib/libmidas.a -lutil -lpthread
/tmp/cceHnAKe.o(.text+0x83c): In function `midas_flush_buffer(LOG_CHN*)':
src/mlogger.c:984: undefined reference to `gzwrite'
/tmp/cceHnAKe.o(.text+0xb24): In function `midas_log_open(LOG_CHN*, int)':
src/mlogger.c:1132: undefined reference to `gzopen'
/tmp/cceHnAKe.o(.text+0xb46):src/mlogger.c:1140: undefined reference to `gzsetparams'
/tmp/cceHnAKe.o(.text+0xe2a): In function `midas_log_close(LOG_CHN*, int)':
src/mlogger.c:1208: undefined reference to `gzflush'
/tmp/cceHnAKe.o(.text+0xe40):src/mlogger.c:1210: undefined reference to `gzclose'
collect2: ld returned 1 exit status
make: *** [linux/bin/mlogger] Error 1
[ritt@pc5082 ~/midas]$ |
10 Mar 2008, Exaos Lee, Suggestion, New Makefile for building MIDAS
|
> The Makefile is missing -lzip:
Sorry, spelling error.
The "LIBS +=" should be replaced by "LDFLAGS +=" |
10 Mar 2008, Konstantin Olchanski, Suggestion, New Makefile for building MIDAS
|
> I rewrote the Makefile for MIDAS in order to make it tidy.
Not that the current Makefile is too pretty (I have seen worse), but it works and it is fairly compact for a project of
this complexity, it handles a large number of operating systems and build options very efficiently.
I think you found that out with your rewriting exercise - your version of the Makefile contains all the same code,
just rearranged to suite your taste, with existing bugs preserved and new bugs added.
> I tested it on my box and it works here.
As they say, the devil is in the details. I notice some subtle changes in your Makefile that make me go "what?":
1) the command for building the midas shared library used to be "ld -shared", in your version, "-shared" is gone.
But check with the GCC manual, today's recommended command is probably "gcc -shared".
2) mhdump is now linked with ROOT, but I wrote it recently enough to remember that it does not use ROOT
3) hand-crafted dependancies have been replaced with generic "almost every .o depends on every .h", which is
incorrect. The "almost every .o" part bothers me.
4) "make clean" runs "rm -rf" - plain scary.
5) "$(shell ...)" is overused
I think by the end all these little details are sorted out and all the quirks are put back in, your Makefile will look no
better than the current Makefile.
> 2. The file is less than 400 lines now. The original one is more than 500 lines.
It looks like your savings came from removing comments, removing hand-crafted dependancy lists and replacing
fairly verbose "make install" targets (which we do not use anyway) with your own much simpler scripts.
All the juicy bits needed to actually build all the code appear to take about as much space as before.
Also the original mistake of recompiling programs when they only need relinking was not fixed. (For example,
when libmidas is updated, to update mhttpd, the current Makefile needlessly recompiles mhttpd.c. Better use
would be to compile mhttpd.c into mhttpd.o, then only a relink is needed).
> I tried to learn "autoconf" and "automake" in order to make building MIDAS more
> compatible for various platforms. But I havn't enough time now. Hope somebody
> can help it. The attached file is original named "Makefile.in" for using "autoconf".
Most experience with autoconf/automake is all negative. The promise was "never debug your Makefile ever
again!", delivered was "debug the configure script instead!". In practice, with autoconf/automake, you try to run
configure, kludge it until it stops crashing, then tweak the incomprehensible Makefiles it produces until the code
compiles.
K.O. |
10 Mar 2008, Exaos Lee, Suggestion, New Makefile for building MIDAS
|
> Most experience with autoconf/automake is all negative. The promise was "never debug your Makefile ever
> again!", delivered was "debug the configure script instead!". In practice, with autoconf/automake, you try to run
> configure, kludge it until it stops crashing, then tweak the incomprehensible Makefiles it produces until the code
> compiles.
>
> K.O.
I admit that the new one is fit to my flavor. For a common user, I think, a simple procedure of configure/make/install
is better than changing the Makefile manually because many users are lack of knowledges about Makefile. That's why
I want to learn autotools. The configure script is generated automatically by "autoconf", so you needn't to debug it.
For the developer, you need to debug the configure.ac/in files for generating the configure script. For a common user,
he/she only needs to run it. In fact, some more complex projects like ROOT use AUTOTOOLS and they don't include
the original files which are needed for generating the "configure" script. I prefer the MIDAS project includes such a
script to make the compiling simpler and easier instead of changing the Makefile manually. |
12 Mar 2008, Konstantin Olchanski, Suggestion, New Makefile for building MIDAS
|
> > Most experience with autoconf/automake is all negative. The promise was "never debug your Makefile ever
> > again!", delivered was "debug the configure script instead!".
>
> I admit that the new one is fit to my flavor. For a common user, I think, a simple procedure of configure/make/install
> is better than changing the Makefile manually because many users are lack of knowledges about Makefile. That's why
> I want to learn autotools.
The reality is that you will never deliver a Makefile/Configure script that works for everybody in every case - users will always have a need to tweak the build
process to suit their needs. In this situation, "Makefile" is a much better language and "make" is a much better tool for users to deal with - much simpler, better
documented and better understood compared to autotools (*nobody* understands autotools; also compare the size of the midas Makefile with the size of a
typical configure script).
Anyhow, we will be cross-compiling midas to run on a PowerPC processor inside a Virtex4 FPGA. Go handle that with configure scripts.
K.O. |
10 Mar 2008, Exaos Lee, Suggestion, "Makefile-by-EL" updated
|
> Not that the current Makefile is too pretty (I have seen worse), but it
works and it is fairly compact for a project of
> this complexity, it handles a large number of operating systems and build
options very efficiently.
>
> I think you found that out with your rewriting exercise - your version of
the Makefile contains all the same code,
> just rearranged to suite your taste, with existing bugs preserved and new
bugs added.
I derived the new Makefile from the original one so that feathers and bugs are
also included.
I havn't experiences on platforms other than Linux and MacOS, so I cannot
recognize bugs on
other platforms if they exists in the original one. And if there are bugs,
hope users can figure
them out.
>
> As they say, the devil is in the details. I notice some subtle changes in
your Makefile that make me go "what?":
>
> 1) the command for building the midas shared library used to be "ld -
shared", in your version, "-shared" is gone.
> But check with the GCC manual, today's recommended command is probably "gcc -
shared".
Fixed.
> 2) mhdump is now linked with ROOT, but I wrote it recently enough to
remember that it does not use ROOT
The building dependence on ROOT of mhdump may be eliminated by changing the
specific target.
> 3) hand-crafted dependancies have been replaced with generic "almost
every .o depends on every .h", which is
> incorrect. The "almost every .o" part bothers me.
Fixed now.
> 4) "make clean" runs "rm -rf" - plain scary.
Fixed.
> 5) "$(shell ...)" is overused
Replaced with GNU make internal methods.
> I think by the end all these little details are sorted out and all the
quirks are put back in, your Makefile will look no
> better than the current Makefile.
I realized it now. But anyway, it looks tidy to me now. I still hope to use
AUTOTOOLS with MIDAS.
> > 2. The file is less than 400 lines now. The original one is more than 500
lines.
The new one is about 430 lines. Hmmm, it reaches the original one which is
more than 600 lines.
>
> It looks like your savings came from removing comments, removing hand-
crafted dependancy lists and replacing
> fairly verbose "make install" targets (which we do not use anyway) with your
own much simpler scripts.
>
> All the juicy bits needed to actually build all the code appear to take
about as much space as before.
>
> Also the original mistake of recompiling programs when they only need
relinking was not fixed. (For example,
> when libmidas is updated, to update mhttpd, the current Makefile needlessly
recompiles mhttpd.c. Better use
> would be to compile mhttpd.c into mhttpd.o, then only a relink is needed).
Fixed.
>
> Most experience with autoconf/automake is all negative. The promise
was "never debug your Makefile ever
> again!", delivered was "debug the configure script instead!". In practice,
with autoconf/automake, you try to run
> configure, kludge it until it stops crashing, then tweak the
incomprehensible Makefiles it produces until the code
> compiles.
>
> K.O.
====================================================
Maybe BUGS or FEATURES in this new one:
1. The shared libmidas.so and the static libmidas.a are built sperately.
The "libmidas.a" is
always built whether "NEED_SHLIB" is set or not. And all executables are built
staticly default.
I commented this in the Makefile-by-EL. Hence, if you want to use libmidas.so
with PyMIDAS and
do not want to encounter "Segmentation fault" while executing the utilies
linked dynamicly, you
may try this one.
2. I found that "minife" is failed to be built, so I remove it from the
example list.
3. Some bugs while building on MacOS Tiger 10.4.11 PPC are commented out in
the Makefile. These
bugs are still exists in the original one.
4. Using "VPATH" instead of adding pathnames.
5. Using "UTILS_SUID" to handle utilities which need SUID mode. And
the "UTILS_SUID_NEED" may be
defined in the OS-specific field, so you need not to use OS-specific commands
in the "install"
target.
6. Using "tr" with "uname" in order to delete some extra "ifeq
($(OSTYPE),...)".
7. Other things, please see the file.
Anyway, easier building is my purpose. :-) |
10 Mar 2008, Exaos Lee, Suggestion, "Makefile-by-EL" updated
|
Sorry, this line:EXECS += $(EXAMPLES:%/$(BIN_DIR)/%) should be replaced byEXECS += $(EXAMPLES:%=$(BIN_DIR)/%) |
11 Mar 2008, Stefan Ritt, Suggestion, "Makefile-by-EL" updated
|
The linking of mhttpd misses a "-lm":
cc -g -O3 -Wall -Wuninitialized -DINCLUDE_FTPLIB -D_LARGEFILE64_SOURCE -DOS_LINUX
-fPIC -Wno-unused-function -DHAVE_ZLIB -Iinclude -Idrivers -I../mxml -o
linux/bin/mhttpd linux/lib/mhttpd.o linux/lib/mgd.o linux/lib/libmidas.a -lutil
-lpthread -lz
linux/lib/mhttpd.o(.text+0xe08f): In function `show_custom_gif':
src/mhttpd.c:5058: undefined reference to `log'
linux/lib/mhttpd.o(.text+0xe0a8):src/mhttpd.c:5058: undefined reference to `log'
The header of the makefile should contain a short description, the author(s), an
$Id:$ tag for SVN, some explanation what "icc", "ifort" means, a note about the
CFLAGS and a clear statement what can be modified by the user and why and what not. |
11 Mar 2008, Exaos Lee, Suggestion, "Makefile-by-EL" updated
|
> The linking of mhttpd misses a "-lm":
>
> cc -g -O3 -Wall -Wuninitialized -DINCLUDE_FTPLIB -D_LARGEFILE64_SOURCE -DOS_LINUX
> -fPIC -Wno-unused-function -DHAVE_ZLIB -Iinclude -Idrivers -I../mxml -o
> linux/bin/mhttpd linux/lib/mhttpd.o linux/lib/mgd.o linux/lib/libmidas.a -lutil
> -lpthread -lz
> linux/lib/mhttpd.o(.text+0xe08f): In function `show_custom_gif':
> src/mhttpd.c:5058: undefined reference to `log'
> linux/lib/mhttpd.o(.text+0xe0a8):src/mhttpd.c:5058: undefined reference to `log'
>
Strange. I tested it on Debian Linux 4.0r2 AMD64 with gcc 4.1.2, MIDAS SVN 4124. It worked fine.
Anyway, it can be fixed by addling "-lm" to the initial"LDFLAGS".
> The header of the makefile should contain a short description, the author(s), an
> $Id:$ tag for SVN, some explanation what "icc", "ifort" means, a note about the
> CFLAGS and a clear statement what can be modified by the user and why and what not.
OK. I will comment it in detail. |
07 Mar 2008, Randolf Pohl, Bug Report, array overflows and other bugs
|
Hi,
I have just compiled MIDAS svn 4132 on a fresh SuSE 10.3 x86_64 system and gcc
found a bunch of bugs, I guess.
> uname -a
Linux pc 2.6.22.17-0.1-default #1 SMP 2008/02/10 20:01:04 UTC x86_64 x86_64
x86_64 GNU/Linux
> gcc --version
gcc (GCC) 4.2.1 (SUSE Linux)
I see loads of warnings during compile, most of which I know from earlier
compiles:
* warning: dereferencing type-punned pointer will break strict-aliasing rules
* warning: pointer targets in passing argument 3 of 'getsockname' differ in
signedness
But then there is a new one (in fact lots of this one), and brief
inspection suggests this is a true bug with the possibility of truly
nasty consequences.
(1)=========================
src/midas.c:7398: warning: array subscript is above array bounds
Inspection of midas.c:
if (i == MAX_DEFRAG_EVENTS) {
/* no buffer available -> no first fragment received */
7398: free(defrag_buffer[i].pevent);
memset(&defrag_buffer[i].event_id, 0, sizeof(EVENT_DEFRAG_BUFFER));
cm_msg(MERROR, "bm_defragement_event",
"Received fragment without first fragment (ID %d) Ser#:%d",
pevent->event_id & 0x0FFF, pevent->serial_number);
return;
}
And midas.c line 7297:
EVENT_DEFRAG_BUFFER defrag_buffer[MAX_DEFRAG_EVENTS];
So, if(i==MAX_DEFRAG_EVENTS) free(defrag_buffer[i]).
I guess this is an off-by-one bug.
(2)==========================
src/midas.c:2958: warning: array subscript is above array bounds
for (i = 0; i < 13; i++)
2958 if (trans_name[i].transition == transition)
break;
Holy cow, hard-coded "13" in the code! Should be a #define, shouldn't it?
Now look at midas.c lines 94ff:
struct {
int transition;
char name[32];
} trans_name[] = {
{
TR_START, "START",}, {
TR_STOP, "STOP",}, {
TR_PAUSE, "PAUSE",}, {
TR_RESUME, "RESUME",}, {
TR_DEFERRED, "DEFERRED",}, {
0, "",},};
There is no trans_name[12].
The trans_name[12] shows up in line 2894 and 2790, too.
(3)=============================
mfe.c:
src/mfe.c:412: warning: array subscript is above array bounds
src/mfe.c:311: warning: array subscript is above array bounds
src/mfe.c:340: warning: array subscript is above array bounds
412: device_drv->dd(CMD_GET_DEMAND, device_drv->dd_info, i,
&device_drv->mt_buffer->channel[i].array[CMD_GET_DEMAND]);
for (cmd = CMD_GET_FIRST; cmd <= CMD_GET_LAST; cmd++) {
(..)
311: device_drv->mt_buffer->channel[current_channel].array[cmd] = value;
for (cmd = CMD_GET_FIRST; cmd <= CMD_GET_LAST; cmd++) {
(..)
340: device_drv->mt_buffer->channel[i].array[cmd] = value;
CMD_GET_DEMAND is in include/midas.h:
#define CMD_GET_DEMAND CMD_GET_DIRECT // = 20
I haven't even tried to understand mfe.c, nor did I read it.
But I suspect the thing should always be something like
....array[cmd-CMD_GET_FIRST]
so array[] is indexed from [0], not from an arbitrary number that
depends on the number of commands you insert before line 698 in
midas.h. But please could the author of this check this very carefully?
(4)=========================
src/lazylogger.c:1957: warning: array subscript is below array bounds
if ((channel < 0) && (lazyinfo[channel].hKey != 0))
That is lazyinfo[something below zero].
(5)=============================
More warnings an expert might want to have a look at:
* warning: deprecated conversion from string constant to 'char*'
* src/fal.c:106: warning: non-local variable '<anonymous struct> out_info'
uses anonymous type
* src/fal.c:3064: warning: non-local variable '<anonymous struct> eb' uses
anonymous type
I attach the full output of make.
Could someone knowledgeable please have a look at these warnings and fix them?
They make me a bit nervous when thinking about data integrity, and
there are now so many that they actually start to hide serious stuff
like the ones I presented.
Oh, I got rid of the "dereferencing type-punned pointer" thing by adding
"-fno-strict-aliasing" as a compiler flag. Was suggested on the Web. Seemed to
have worked during data taking (the data look reasonable :-). Is that a
possible fix/workaround?
Cheers,
Randolf |
07 Mar 2008, Stefan Ritt, Bug Report, array overflows and other bugs
|
> I have just compiled MIDAS svn 4132 on a fresh SuSE 10.3 x86_64 system and gcc
> found a bunch of bugs, I guess.
Ahh, great! gcc is getting more and more clever. Each time gcc is updated, it finds
a few new issues.
Indeed some are real bugs, and I will work down the list as time permits. I see
however no immediate thread (you are not using fragmented events, a transition 12
never occurs, etc.). Issue #4 from your list has to be checked by Pierre-Andre. |
10 Mar 2008, Stefan Ritt, Bug Report, array overflows and other bugs
|
There were some trivial and some non-trivial issues. Glad the compiled picked up on
this!
> I see loads of warnings during compile, most of which I know from earlier
> compiles:
> * warning: dereferencing type-punned pointer will break strict-aliasing rules
> * warning: pointer targets in passing argument 3 of 'getsockname' differ in
> signedness
I ignore these for the moment until I have a gcc 4.2 myself (we use Scientific
Linux 5 which has gcc 4.1 for the moment). As Randolph pointed out correctly you
can make gcc shut up by a proper flag there. The warnings have no influence on the
stability of midas.
> (1)=========================
> src/midas.c:7398: warning: array subscript is above array bounds
> Inspection of midas.c:
>
> if (i == MAX_DEFRAG_EVENTS) {
> /* no buffer available -> no first fragment received */
> 7398: free(defrag_buffer[i].pevent);
> memset(&defrag_buffer[i].event_id, 0, sizeof(EVENT_DEFRAG_BUFFER));
> cm_msg(MERROR, "bm_defragement_event",
> "Received fragment without first fragment (ID %d) Ser#:%d",
> pevent->event_id & 0x0FFF, pevent->serial_number);
> return;
> }
The free() was just wrong at that place, I removed it.
> (2)==========================
> src/midas.c:2958: warning: array subscript is above array bounds
>
> for (i = 0; i < 13; i++)
> 2958 if (trans_name[i].transition == transition)
> break;
Fixed that by
for (i=0 ;; i++)
if (trans_name[i].name[0] == 0 || trans_name[i].transition == transition)
break;
Since trans_name[i].name = "" indicates the end of the list.
> (3)=============================
> mfe.c:
> src/mfe.c:412: warning: array subscript is above array bounds
> src/mfe.c:311: warning: array subscript is above array bounds
> src/mfe.c:340: warning: array subscript is above array bounds
>
> 412: device_drv->dd(CMD_GET_DEMAND, device_drv->dd_info, i,
> &device_drv->mt_buffer->channel[i].array[CMD_GET_DEMAND]);
The code at 412 was wrong there, the demand value is queried later by the device
driver directly. For the other two occurences (311 and 340) I had to really
increase the array size by one. This issue can cause segfaults if you have a slow
control front-end which uses multithreading (not many people use it except me).
> (4)=========================
> src/lazylogger.c:1957: warning: array subscript is below array bounds
>
> if ((channel < 0) && (lazyinfo[channel].hKey != 0))
>
> That is lazyinfo[something below zero].
This has to be fixed by Pierre. I guess an or instead of an and would do it, but
I'm not 100% sure.
> (5)=============================
> More warnings an expert might want to have a look at:
>
> * warning: deprecated conversion from string constant to 'char*'
>
> * src/fal.c:106: warning: non-local variable '<anonymous struct> out_info'
> uses anonymous type
> * src/fal.c:3064: warning: non-local variable '<anonymous struct> eb' uses
> anonymous type
>
> I attach the full output of make.
> Could someone knowledgeable please have a look at these warnings and fix them?
Uahhh. Especially the "const char*" vs. "char*" is in principle right, but will
cause a major rework. Probably hundreds of occations have to be fixed. Many strings
must be declared const, others not. It will help the programmer to find some errors
during compile which would later show up only during runtime (like writing into a
fixed string), but I only will go through that when I have gcc 4.2 installed
myself, and have two free days to work on this ;-)
> They make me a bit nervous when thinking about data integrity, and
> there are now so many that they actually start to hide serious stuff
> like the ones I presented.
Except the slow control stuff (which only is an issue for multithreaded frontends)
none of the above things will have an influence on the data integrity. But I agree
that they should be fixed.
- Stefan |
07 Jun 2007, Konstantin Olchanski, Suggestion, RFC- ACLs for midas rpc, mserver, mhttpd access
|
Running MIDAS at CERN is proving more challenging than I expected. The network environement is not
as benign as I am used to (i.e. at TRIUMF) and our machines are being constantly probed by something/
somebody.
This already caused failures in the mserver (fixed in midas svn) and I would like to resolve this problem
once and for all. The age of "nice networks" is over.
The case of the mserver and for the midas rpc servers (every midas applications listens for midas rpc
requests, i.e. run transitions) is simple. The list of machines running midas applications is known ahead
of time, so we can put them all into a list of permitted machines and deny rpc connections to anybody
else. I propose we keep this list of permitted mserver clients in "/experiment/security/mserver hosts".
(The already existing "/experiment/security/allowed hosts" mechanism is insufficient: it does not
prevent the mserver from accepting connections from hostile machines, and talking to them, for
example giving them the list of available experiments. There is a fair amount of code involved and I do
not presume to certify any of it as hack-proof or even as crash-proof.)
For mhttpd http:// access control, I thought of using tcp_wrappers, but C-API documentation does not
exist (I looked), the example code in tcpd.c is way too complicated, editing the ACL /etc/hosts.allow
unnecessarily requires root privileges and non of it would work on Windows.
So I am favouring a home-made hostname or ip-address filter, similar to /etc/hosts.allow, with ACL
stored, for example, in "/experiment/security/mhttpd hosts".
Any thoughts?
K.O. |
07 Jun 2007, John M O'Donnell, Suggestion, RFC- ACLs for midas rpc, mserver, mhttpd access
|
I am in favor of tcp_wrappers.
tcp_wrappers is well understood.
It works well in combination with a firewall.
mhttpd hangs when our security folks scan us. We are not allowed to block them
with a firewall, but we can use tcpwrappers.
Would it make sense to put the same mechanism on mserver?
the man page for libtcpwrappers.a (taken from the tcpwrappers7.6 tar ball) is
attached. And the output after running it through nroff -man.
The odb is too fragile for security. It is not understood well enough by many
experimenters.
As you can see I am in favor of tcp_wrappers. This is mainly because it is part
of an existing and tested security model. I don't know about the windows
world, but as you can also see, I vote for using something that is already part
of the windows security model. Here's an example of how well the integrated
security model works:
if an person is part of an experiment I make sure they can ssh to the
experiment's computer
the same rules could provide them with web access
Second is that when a change is needed to the security model then it is easy to
keep it current. What if somebody restores an old ODB? What if they setup a
small test with a new ODB?
If mhttpd used tcp_wrappers, then all our machines here at LANL would already be
configured! No need for
users to do any root access (though those that need it have it anyway).
John. |
08 Jun 2007, Stefan Ritt, Suggestion, RFC- ACLs for midas rpc, mserver, mhttpd access
|
First I have a general question: mserver is started through xinetd, and xinetd has
the options "only_from" and "no_access". This is equivalent to the tcp_wrapper
functionality. Why not using this? It's possible without changing anything in midas.
Or am I missing anything?
If that does not work for some reason, here are some thought from my side:
- We don't have much of a problem with malicious hackers, but with institute-wide
security checking. Hackers are only interested in mechanisms where they can obtain
control over thousands of machines (like breaking ssh etc.). The few midas machines
are not a good target for them. But even at PSI there are security scans, which try
to connect to various ports and can crash systems, so I agree that something needs
to be done.
- Whatever we do, it should be consistent on linux and windows and should not rely
on external packages, since I don't want to get into dependencies there.
- I see that both having the security information in the ODB or having them in
external files can be advantageous. There is certainly the aspect of restoring old
ODBs, or keeping several experiments (ODB) on one machine consistent. On the other
hand storing data in the ODB might me liked by people who are familiar with this
concept, and want to change things though mhttpd for example.
- Having said all that, it would make sense to me to write a simple central routine
access_allowed(), which takes the IP address of a remote client wanting to connect,
and return true or false. This routine should read /etc/hosts.allow, /etc/hosts.deny
and interprete it, but only the section for midas, and maybe only a subset of the
functionality there (we probably don't need NIS netgroup names, external files and
spawn commands there). If the files /etc/hosts.x do not contain anything about midas
or are not preset (Windows!), the routine should look in the ODB under
/experiment/security/mserver/hosts.allow and /experiment/security/mserver/hosts.deny
and use that information instead of the files.
- We probably need different mechanisms for mserver and for mhttpd. The mserver
clients are usually only a few programs like the front-ends, while one may want to
control an experiment over mhttpd from much more machines. So we should establish a
second ACL for mhttpd. The already present "/experiment/security/allowed hosts" for
mhttpd should be converted into "/Experiment/Security/mhttpd/hosts.allow" and the
function access_allowed() should be used to interprete that, so that we only need to
write it once. |
07 Mar 2008, Konstantin Olchanski, Suggestion, RFC- ACLs for midas rpc, mserver, mhttpd access
|
The mhttpd host-based access control list as used by ALPHA at CERN is now committed to
SVN (revision 4135).
When accepting connection from a remote host, the remote IP address is converted to a
hostname using gethostbyaddr(). If ODB directory "/experiment/security/mhttpd hosts",
exists, access is permitted if there is an entry for the this hostname. "localhost" is
always permitted.
In other words:
1) To enable the mhttpd access control list, create an ODB directory
"/experiment/security/mhttpd hosts".
2) From this moment, only access from "localhost" is permitted.
3) All connections from remote hosts are rejected with an error written into the midas
log file: Rejecting http connection from 'ladd05.triumf.ca'.
4) To permit access from remote hosts, take the hostname from this error message and
create an entry in "mhttpd hosts": odbedit -> cd "/Experiment/Security/mhttpd hosts" ->
create INT ladd05.triumf.ca
The idea behind this is that mhttpd is running behind an SSL proxy (or an SSH tunnel)
and only accepts connections from this proxy and perhaps from selected machines in the
experiment counting room.
P.S. I considered using tcp_wrappers, but this package does not seem to contain any
simple-to-use function "bool areTheyPermitted(const char* remoteHostname);".
P.P.S. The ODB path name is in variance from Stefan's email. I committed this code
before rereading it, please let me know if I should change the ODB paths.
P.P.P.S. I will now proceed with implementing similar code for the mserver/midas rpc.
Again, the use case is very simple: all machines permitted access to the mserver are
known in advance and can be listed in the access list. All unknown machines should be
rejected.
K.O. |
10 Mar 2008, Stefan Ritt, Suggestion, RFC- ACLs for midas rpc, mserver, mhttpd access
|
> When accepting connection from a remote host, the remote IP address is converted to a
> hostname using gethostbyaddr(). If ODB directory "/experiment/security/mhttpd hosts",
> exists, access is permitted if there is an entry for the this hostname. "localhost" is
> always permitted.
While your "positive list" will certainly work, it is much more inflexible than a more
general hosts.allow/hosts.deny with wildcards. Assume some experiment decides it wants to
be controlled from all inside CERN. With hosts.allow/deny you could do
host.deny *
host.allow *.cern.ch
to have everything ending with "cern.ch" allowed. Otherwise it would be a nightmare finding
all possible terminals at CERN and add them manually. If you are considering modifying your
committed code to this scheme, you could have a look at my elog package, where exactly this
is implemented. You could copy/paste it from there.
After you finished, also talk to Pierre about documenting this in doxygen (or do it yourself). |
10 Mar 2008, Konstantin Olchanski, Suggestion, RFC- ACLs for midas rpc, mserver, mhttpd access
|
> While your "positive list" will certainly work, it is much more inflexible than a more
> general hosts.allow/hosts.deny with wildcards. Assume some experiment decides it wants to
> be controlled from all inside CERN. With hosts.allow/deny you could do
I was going to bring this up later, but since mhttpd does not pass security audits, I believe
the only way it should be run in the modern computing environement is behind
a password-protected SSL proxy. In this case, the allow/deny list is very simple: deny all,
allow localhost (assuming httpd runs on the same host as mhttpd).
Speaking about CERN, "deny all; allow *.cern.ch" is the "default" setting, enforced by the CERN firewall. Our problem is with
random "*.cern.ch" computers poking at our DAQ and crashing the mserver. Plus we do not want our competition to access our
DAQ system, so "allow *.cern.ch" is a no go.
But since hosts.allow/hosts.deny is a superset of what I want, and since we can reuse existing code from elogd, I guess I have
no ground to object your suggestion.
I will do the mserver/mrpc this way, then retrofit it into mhttpd. (But have to commit mlogger history changes first!!!).
K.O. |
10 Mar 2008, Stefan Ritt, Suggestion, RFC- ACLs for midas rpc, mserver, mhttpd access
|
> I was going to bring this up later, but since mhttpd does not pass security audits, I believe
> the only way it should be run in the modern computing environement is behind
> a password-protected SSL proxy.
I recently built in native SSL into elogd.c and found it was very simple. We could do the same for mhttpd.
> Speaking about CERN, "deny all; allow *.cern.ch" is the "default" setting, enforced by the CERN firewall. Our problem is with
> random "*.cern.ch" computers poking at our DAQ and crashing the mserver. Plus we do not want our competition to access our
> DAQ system, so "allow *.cern.ch" is a no go.
I understand your point. But I want to tell you that there are other experiments, which want domain based access. For example at
PSI some experiments want allowed access from the experimental hall, which is the subdomain 129.129.140.* (there is not so much
competition here ;-) but not from other PSI subdomains. So you would need "deny all; allow 129.129.140.*; allow 129.129.228.*" for
example.
> I will do the mserver/mrpc this way, then retrofit it into mhttpd. (But have to commit mlogger history changes first!!!).
Agree. |
02 Mar 2008, Exaos Lee, Suggestion, Bash Script for handling an experiment code
|
I rearanged the files in "examples/experiment" as the attached "mtest_exp.zip". I re-write the start/stop script as the attached "daq.sh". The script "daq.sh" can be re-used for many experiments. The user only needs to provide an script "daq_env.sh" as the following containing the settings for the experiment environment.
#!/bin/sh
[ ! "$MIDASSYS" ] && MIDASSYS=/opt/MIDAS.PSI/Version/Current
[ ! "$HTTPPORT" ] && HTTPPORT=8080
[ ! "$SRVHOST" ] && SRVHOST=localhost
LOGGER=${MIDASSYS}/bin/mlogger
EXPPATH=/home/das/online/test
CODEPATH=${EXPPATH}/code
LOGGER=${MIDASSYS}/bin/mlogger
PROG_FE=${CODEPATH}/frontend
PROG_ANA=${CODEPATH}/analyzer
if [ ! "$MIDAS_EXPTAB" ]; then
MIDAS_DIR=${EXPPATH}
else
MIDAS_EXPT_NAME="test"
fi
I hope this can be helpful. There seem to be some problems such as:
1. When several experiments are defined, the $LOGGER may be not the one used for this exp.
2. The "pidof" may be not in some platforms, so this script is limited.
Hope anybody can help me to improve it for general purpose. All my best! |
05 Feb 2008, Denis Bilenko, Info, pymidas 0.6.0 released - python bindings for Midas
|
Hi!
I have released pymidas - Python binding to Midas.
It includes support for Online Database, Buffer, event
construction and parsing.
We have used it for a couple years now here at CMD. (http://cmd.inp.nsk.su)
One of principal DAQ applications here (Slow Control Frontend) is
written in Python using pymidas.
http://cmd.inp.nsk.su/~bilenko/projects/pymidas/pymidas.html |
18 Feb 2008, Exaos Lee, Bug Report, Great! But I failed to run it. :(
|
I encountered the error message as the following:
Traceback (most recent call last):
File "runtest.py", line 42, in <module>
import midas
File "/opt/MIDAS.PSI/Resources/PyMIDAS/pymidas/midas/__init__.py", line 140, in
<module>
cmidas = ctypes.cdll.LoadLibrary('libmidas.so')
File "/usr/lib/python2.5/site-packages/PIL/__init__.py", line 431, in LoadLibrary
File "/usr/lib/python2.5/site-packages/PIL/__init__.py", line 348, in __init__
OSError: /opt/MIDAS.PSI/Versions/Current/lib/libmidas.so: undefined symbol:
cam16i_rq
Compiling the MIDAS library using NEED_SHLIB=1 causes the same "undefined
reference" error. But it can be fixed by adding "-shared" to CFLAGS in the
Makefile. Though the libmidas.so can be successfully created, the above error is
still there. Can anybody help me?
Environment:
Platform: Ubuntu Linux 7.10 with gcc 4.1
MIDAS version: 2.0.0, svn-4106
Python version: 2.5.1 |
26 Feb 2008, Denis Bilenko, Bug Report, NEED_SHLIB=1 is broken
|
I have the exact same problem with midas rev. 4129.
`make NEED_SHLIB=1` doesn't work.
To fix it apply this patch to Makefile
Index: Makefile
===================================================================
--- Makefile (revision 4129)
+++ Makefile (working copy)
@@ -270,7 +270,7 @@
OBJS = $(LIB_DIR)/midas.o $(LIB_DIR)/system.o $(LIB_DIR)/mrpc.o \
$(LIB_DIR)/odb.o $(LIB_DIR)/ybos.o $(LIB_DIR)/ftplib.o \
- $(LIB_DIR)/mxml.o $(LIB_DIR)/cnaf_callback.o \
+ $(LIB_DIR)/mxml.o \
$(LIB_DIR)/history.o $(LIB_DIR)/alarm.o $(LIB_DIR)/elog.o
ifdef NEED_STRLCPY
i.e. remove cnaf_callback.o which causes the link errors.
I propose that libmidas.so is built by default, so when something breaks it won't go unnoticed. |
27 Feb 2008, Konstantin Olchanski, Bug Report, NEED_SHLIB=1 is broken
|
--- Makefile (revision 4129)
+++ Makefile (working copy)
- $(LIB_DIR)/mxml.o $(LIB_DIR)/cnaf_callback.o \
+ $(LIB_DIR)/mxml.o \
> i.e. remove cnaf_callback.o which causes the link errors.
Hi, Denis - I confirm that cnaf_callback.c is only used by MIDAS frontends that implement CAMAC
functions and that it should not required for building the MIDAS library. I am now looking at removing
it from libmidas.
> I propose that libmidas.so is built by default, so when something breaks it won't go unnoticed
We have been through this before and decided that shared libraries are bad and we do not want to use
them. The option for building libmidas.so was preserved, though.
Not to refight old wars, on reason against using shared libraries was version skew - one could never be
sure what version of midas is being used - depending on the PATH, LD_LIBRARY_PATH, rpath settings,
etc. There were other reasons, perhaps practical, perhaps with the mserver.
The main problem with "just build it", is that then the rest of midas will link against it bringing back all
the problems we solved by going away from using shared libraries.
So back to your proposal about building libmidas.so - can you look and see if you can do the Python
bindings with a statically linked midas library?
I know it is possible with Perl bindings - perl creates it's own shared library containing perl api glue
linked against a foreign static library libfoo.a , so in theory, the shared library is not needed.
But perhaps, Python do things differently...
K.O. |
29 Feb 2008, Denis Bilenko, Bug Report, NEED_SHLIB=1 is broken
|
Having libmidas.so is absolutely necessary for pymidas to work. If there was no such
option in Makefile pymidas users would have to build it themselves.
What I proposed though is that you change Makefile so it builds libmidas.so in
addition to (not instead of) static library. So if someone prefer to build
binaries statically they
may continue to do so. On other hand when someone needs a shared library they won't
discover that it can't be easily built. |
27 Feb 2008, Konstantin Olchanski, Bug Report, mhttpd: cannot attach history to elog
|
From "history" pages, the "create elog" button stopped working - it takes us to the elog entry form, but
then, the "submit" button does not create any elog entries, instead dumps us into an invalid history
display. This is using the internal elog.
This change in mhttpd.c::show_elog_new() makes it work again:
- ("<body><form method=\"POST\" action=\"./\" enctype=\"multipart/form-data\">\n");
+ ("<body><form method=\"POST\" action=\"/EL/\" enctype=\"multipart/form-data\">\n");
Problem and fix confirmed with Linux/firefox and MacOS/firefox and Safari.
K.O. |
28 Feb 2008, Konstantin Olchanski, Bug Report, mhttpd: cannot attach history to elog
|
> From "history" pages, the "create elog" button stopped working - it takes us to the elog entry form, but
> then, the "submit" button does not create any elog entries, instead dumps us into an invalid history
> display. This is using the internal elog.
>
> This change in mhttpd.c::show_elog_new() makes it work again:
> - ("<body><form method=\"POST\" action=\"./\" enctype=\"multipart/form-data\">\n");
> + ("<body><form method=\"POST\" action=\"/EL/\" enctype=\"multipart/form-data\">\n");
This was a problem with relative URLs and it is now fixed. Svn revision 4131, fixes: delete elog, make elog from odb, make elog from history.
K.O. |
18 Aug 2005, Konstantin Olchanski, Info, CAMAC register_cnaf_callback()
|
Some time ago, the "remote CAMAC" functionality in mfe.c was made conditional on
HAVE_CAMAC. This flag is not set by default so remote camac calls silently do
not work, unless midas is compiled in a special way. I am too lazy to compile
midas differently depending on what hardware I use, so I split
register_cnaf_callback() into a separate file and made it easy to call directly
from the user front end.
I left the HAVE_CAMAC bits in mfe.c so people who use that would see no change.
Affected files:
Makefile (add cnaf_callback.o)
midas.h (add void register_cnaf_callback(int debug);
mfe.c (move the rpc code to cnaf_callback.c, call register_cnaf_callback())
cnaf_callback.c (new file)
K.O. |
01 Sep 2005, Stefan Ritt, Info, CAMAC register_cnaf_callback()
|
> Some time ago, the "remote CAMAC" functionality in mfe.c was made conditional on
> HAVE_CAMAC. This flag is not set by default so remote camac calls silently do
> not work, unless midas is compiled in a special way. I am too lazy to compile
> midas differently depending on what hardware I use, so I split
> register_cnaf_callback() into a separate file and made it easy to call directly
> from the user front end.
>
> I left the HAVE_CAMAC bits in mfe.c so people who use that would see no change.
>
> Affected files:
> Makefile (add cnaf_callback.o)
> midas.h (add void register_cnaf_callback(int debug);
> mfe.c (move the rpc code to cnaf_callback.c, call register_cnaf_callback())
> cnaf_callback.c (new file)
>
> K.O.
That's a good idea. The frontend framework should be independent of the used
hardware (CAMAC or VME or whatever). I event went further and removed the HAVE_CAMAC
completely. This means that people have to add the call to register_cnaf_callback()
explicitly into the frontend user init routine. I think this inconvenience is not a
big deal because even before that people had to add the cnaf_callback.c file
explicitely into their Makefile. So they have to be aware of that change, and then
it's not a big deal to modify the init routine as well. But this way we have mfe.c
completely independen of the DAQ hardware which is how it should be.
To make things a bit easier, I modified the midas\examples\experiment\fronted.c to
contain this call, so people should be guided by that. I also added cnaf_callback.c
to the Makefile of the example frontend. |
27 Feb 2008, Konstantin Olchanski, Info, CAMAC register_cnaf_callback() - removed from libmidas
|
> > Affected files:
> > Makefile (add cnaf_callback.o)
> That's a good idea.
> To make things a bit easier, I modified the midas\examples\experiment\fronted.c to
> contain this call, so people should be guided by that. I also added cnaf_callback.c
> to the Makefile of the example frontend.
A request was made to remove cnaf_callback.o from libmidas as it creates a unwanted dependency on the CAMAC
hardware driver when libmidas.so is used in programs that do not use CAMAC.
After looking around, it appears that removing cnaf_callback.o from libmidas would not break anything critical,
other than CAMAC frontends that would fail to link with an obvious and easy to fix error.
I am leaving cnaf_callback.o in the Makefile - so it will be built and placed in linux/lib/cnaf_callback.o for anybody
who wants to use it.
svn revision 4130.
K.O. |
|