10 Mar 2008, Stefan Ritt, Suggestion, New Makefile for building MIDAS
|
> I rewrote the Makefile for MIDAS in order to make it tidy. I tested it on my box
> and it works here.
> 1. The full file is seperated to several parts
> a. initialized setup
> b. environment setup
> c. specify OS-specific flags
> d. processing environment for building flags
> e. targets
> 2. The file is less than 400 lines now. The original one is more than 500 lines.
> 3. The modified one is easy for debuging.
>
> I tried to learn "autoconf" and "automake" in order to make building MIDAS more
> compatible for various platforms. But I havn't enough time now. Hope somebody
> can help it. The attached file is original named "Makefile.in" for using "autoconf".
The Makefile is missing -lzip:
[ritt@pc5082 ~/midas]$ make -f Makefile-by-EL
Making directory linux ...
Making directory linux/lib ...
Making directory linux/bin ...
g++ -g -O3 -Wall -Wuninitialized -DINCLUDE_FTPLIB -D_LARGEFILE64_SOURCE -DOS_LINUX -fPIC
-Wno-unused-function -DHAVE_ZLIB -Iinclude -Idrivers -I../mxml -Llinux/lib -o
linux/bin/mlogger src/mlogger.c linux/lib/libmidas.a -lutil -lpthread
/tmp/cceHnAKe.o(.text+0x83c): In function `midas_flush_buffer(LOG_CHN*)':
src/mlogger.c:984: undefined reference to `gzwrite'
/tmp/cceHnAKe.o(.text+0xb24): In function `midas_log_open(LOG_CHN*, int)':
src/mlogger.c:1132: undefined reference to `gzopen'
/tmp/cceHnAKe.o(.text+0xb46):src/mlogger.c:1140: undefined reference to `gzsetparams'
/tmp/cceHnAKe.o(.text+0xe2a): In function `midas_log_close(LOG_CHN*, int)':
src/mlogger.c:1208: undefined reference to `gzflush'
/tmp/cceHnAKe.o(.text+0xe40):src/mlogger.c:1210: undefined reference to `gzclose'
collect2: ld returned 1 exit status
make: *** [linux/bin/mlogger] Error 1
[ritt@pc5082 ~/midas]$ |
10 Mar 2008, Stefan Ritt, Suggestion, RFC- ACLs for midas rpc, mserver, mhttpd access
|
> I was going to bring this up later, but since mhttpd does not pass security audits, I believe
> the only way it should be run in the modern computing environement is behind
> a password-protected SSL proxy.
I recently built in native SSL into elogd.c and found it was very simple. We could do the same for mhttpd.
> Speaking about CERN, "deny all; allow *.cern.ch" is the "default" setting, enforced by the CERN firewall. Our problem is with
> random "*.cern.ch" computers poking at our DAQ and crashing the mserver. Plus we do not want our competition to access our
> DAQ system, so "allow *.cern.ch" is a no go.
I understand your point. But I want to tell you that there are other experiments, which want domain based access. For example at
PSI some experiments want allowed access from the experimental hall, which is the subdomain 129.129.140.* (there is not so much
competition here ;-) but not from other PSI subdomains. So you would need "deny all; allow 129.129.140.*; allow 129.129.228.*" for
example.
> I will do the mserver/mrpc this way, then retrofit it into mhttpd. (But have to commit mlogger history changes first!!!).
Agree. |
10 Mar 2008, Stefan Ritt, Bug Report, array overflows and other bugs
|
There were some trivial and some non-trivial issues. Glad the compiled picked up on
this!
> I see loads of warnings during compile, most of which I know from earlier
> compiles:
> * warning: dereferencing type-punned pointer will break strict-aliasing rules
> * warning: pointer targets in passing argument 3 of 'getsockname' differ in
> signedness
I ignore these for the moment until I have a gcc 4.2 myself (we use Scientific
Linux 5 which has gcc 4.1 for the moment). As Randolph pointed out correctly you
can make gcc shut up by a proper flag there. The warnings have no influence on the
stability of midas.
> (1)=========================
> src/midas.c:7398: warning: array subscript is above array bounds
> Inspection of midas.c:
>
> if (i == MAX_DEFRAG_EVENTS) {
> /* no buffer available -> no first fragment received */
> 7398: free(defrag_buffer[i].pevent);
> memset(&defrag_buffer[i].event_id, 0, sizeof(EVENT_DEFRAG_BUFFER));
> cm_msg(MERROR, "bm_defragement_event",
> "Received fragment without first fragment (ID %d) Ser#:%d",
> pevent->event_id & 0x0FFF, pevent->serial_number);
> return;
> }
The free() was just wrong at that place, I removed it.
> (2)==========================
> src/midas.c:2958: warning: array subscript is above array bounds
>
> for (i = 0; i < 13; i++)
> 2958 if (trans_name[i].transition == transition)
> break;
Fixed that by
for (i=0 ;; i++)
if (trans_name[i].name[0] == 0 || trans_name[i].transition == transition)
break;
Since trans_name[i].name = "" indicates the end of the list.
> (3)=============================
> mfe.c:
> src/mfe.c:412: warning: array subscript is above array bounds
> src/mfe.c:311: warning: array subscript is above array bounds
> src/mfe.c:340: warning: array subscript is above array bounds
>
> 412: device_drv->dd(CMD_GET_DEMAND, device_drv->dd_info, i,
> &device_drv->mt_buffer->channel[i].array[CMD_GET_DEMAND]);
The code at 412 was wrong there, the demand value is queried later by the device
driver directly. For the other two occurences (311 and 340) I had to really
increase the array size by one. This issue can cause segfaults if you have a slow
control front-end which uses multithreading (not many people use it except me).
> (4)=========================
> src/lazylogger.c:1957: warning: array subscript is below array bounds
>
> if ((channel < 0) && (lazyinfo[channel].hKey != 0))
>
> That is lazyinfo[something below zero].
This has to be fixed by Pierre. I guess an or instead of an and would do it, but
I'm not 100% sure.
> (5)=============================
> More warnings an expert might want to have a look at:
>
> * warning: deprecated conversion from string constant to 'char*'
>
> * src/fal.c:106: warning: non-local variable '<anonymous struct> out_info'
> uses anonymous type
> * src/fal.c:3064: warning: non-local variable '<anonymous struct> eb' uses
> anonymous type
>
> I attach the full output of make.
> Could someone knowledgeable please have a look at these warnings and fix them?
Uahhh. Especially the "const char*" vs. "char*" is in principle right, but will
cause a major rework. Probably hundreds of occations have to be fixed. Many strings
must be declared const, others not. It will help the programmer to find some errors
during compile which would later show up only during runtime (like writing into a
fixed string), but I only will go through that when I have gcc 4.2 installed
myself, and have two free days to work on this ;-)
> They make me a bit nervous when thinking about data integrity, and
> there are now so many that they actually start to hide serious stuff
> like the ones I presented.
Except the slow control stuff (which only is an issue for multithreaded frontends)
none of the above things will have an influence on the data integrity. But I agree
that they should be fixed.
- Stefan |
11 Mar 2008, Stefan Ritt, Suggestion, "Makefile-by-EL" updated
|
The linking of mhttpd misses a "-lm":
cc -g -O3 -Wall -Wuninitialized -DINCLUDE_FTPLIB -D_LARGEFILE64_SOURCE -DOS_LINUX
-fPIC -Wno-unused-function -DHAVE_ZLIB -Iinclude -Idrivers -I../mxml -o
linux/bin/mhttpd linux/lib/mhttpd.o linux/lib/mgd.o linux/lib/libmidas.a -lutil
-lpthread -lz
linux/lib/mhttpd.o(.text+0xe08f): In function `show_custom_gif':
src/mhttpd.c:5058: undefined reference to `log'
linux/lib/mhttpd.o(.text+0xe0a8):src/mhttpd.c:5058: undefined reference to `log'
The header of the makefile should contain a short description, the author(s), an
$Id:$ tag for SVN, some explanation what "icc", "ifort" means, a note about the
CFLAGS and a clear statement what can be modified by the user and why and what not. |
25 Mar 2008, Stefan Ritt, Info, Per-variable history implementation in the mlogger
|
Before approving the code, two conditions have to be fulfilled:
1) The code has to work on PSI experiments
2) The code must work without any SQL database
Concerning point 1), you correctly mentioned that the event numbering does not work
if there are more than 1000 variables per event. What I do not want is that there
will be a special T2K midas version and a special PSI version. This would make
maintenance horrible in the future. One could make the formula variable with id =
ev_id*n+var_n, where n is not fixed to 1000, but variable (stored in the ODB). The
down side would be that if you analyze your history files offline (outside the
experiment) you have to know a priori n in order to read back the data. If you have
990 variables, then you add 20, then you modify n from 1000 to 1500, then you would
screw up yourself since you cannot read the old data any more.
Taking all this into account, I see no clean way to fix this except to modify the
database format (which you change anyhow "somehow" going to per-variable mode). Use a
32-bit ID for the event (16-bit) and the variable (16-bit). This will increase the
overhead, but only marginally, since there is already a 32-bit time stamp. But this
method would then work for all experiments at all times. I suspect that even in T2K
you will come at some point to a configuration where you have move than n variables
per event, whatever n is. So even you would benefit.
Concernign ponit 2), I like your ODBC approach. I never used it, but if you tell me
it works on all supported OSes it's fine with me, but make sure it compiles under
Windows (with the help of Pierre). One thing I would make sure however is that it
runs by default without setting up a database. There are many experiments out there
which do not need a SQL database, and it would be a hassle for them all to set up a
database, just to continue running. So by default I would use either the current flat
file system, and then per configuration enable ODBC, with bindings to MySQL pgSQL and
maybe SQLite3.
Cheers,
Stefan |
02 Apr 2008, Stefan Ritt, Info, add "const" attributes to db_xxx() functions
|
> Now that we use more and more C++, lack of "const" attribute on most midas functions is causing some
> problems. I am now ready to commit changes to midas.h and odb.c that add the const attributes to ODB
> access functions db_xxx(), i.e.
> INT db_rename_key(HNDLE hDB, HNDLE hKey, char *name)
> becomes
> INT db_rename_key(HNDLE hDB, HNDLE hKey, const char *name)
>
> If we proceed with this conversion, and it does not cause major havoc, I can continue and "const"ify the
> rest of midas.h. I note that the mxml functions appear to already have the correct "const" declarations.
>
> P.S. Adding the "const" attribute caught a few places where we were modifying a "char*" string passed by
> the caller. This is undesirable if we are passed a string literal, i.e. db_rename_key(...,"foo"), and it is a
> complete disaster in conjunction with C++ strings, i.e. db_rename_key(...,foo.c_str())
I fully approve your idea. You are absolutely right that it also will help to prevent errors such as modifying
fixed strings. I was just too lazy to do that, because it requires some additional code like:
func(const char *p)
{
char str[256];
strlcpy(str, p, sizeof(str));
strlcat(str, ...)
}
So if you do it, it's great! |
16 Jun 2008, Stefan Ritt, Bug Fix, "Missing event" problem fixed in front-end framework
|
Since the very beginning midas had the problem that the last event of a run was
sometimes missing in the data. While for most experiments this is not an issue,
it starts to hurt on experiments using event building (front-end 1 and front-end
2 in the example below). A missing event can screw up the event builder on the
next begin of run, where the "missing event" would show up as the first event of
the new run, triggering an event mismatch error in the event builder.
After some analysis, we identified the problem as follows. Assume FE1 controls
the trigger, while FE2 generates the second event fragment.
1) Stop is requested to FE1
2) tr_stop gets called on FE1
3) tr_stop calls end_of_run() use code
4) end_of_run() disables the trigger
5) FE1 finishes stop transition
6) Stop is requested to FE2
7) FE2 finishes stop transition
What can now happen is the following: An additional event occurs between 2) and
4). This event triggers ADCs and TDCs, and is then stored in the front-end
hardware. FE2 sees this event, since it has not yet done the stop transition,
and reads it out. FE1 is however already in the end_of_run() routine, and simply
disables the trigger, without reading this last event, and thus causing the
event mismatch at the beginning of the next run.
To fix the problem, the framework in mfe.c was changed:
1) Stop is requested to FE1
2) tr_stop gets called on FE1
3) tr_stop calls end_of_run() use code
4) end_of_run() disables the trigger
4b) tr_stop calls check_polled_events()
5) FE1 finishes stop transition
6) Stop is requested to FE2
7) FE2 finishes stop transition
The new routine check_polled_events checks if there is any more event in the
hardware by calling the user polling routine. If there is one more event, calls
the user readout routine and sends it to the back-end before concluding the run
transition.
This modification solved our problem at the MEG experiment at PSI, but it might
be good that all experiments using event building update midas to revision 4225.
I do not expect any bad side effect, but one never knows. So if there are new
problems caused by this modification, please report.
- Stefan |
16 Jun 2008, Stefan Ritt, Suggestion, mlogger is flooding the message queue
|
> > The current versions of mlogger SVN 4215 is flooding our message system with
> > stuff like
> >
> > > Tue Jun 10 16:42:01 2008 [Logger,INFO] Configured history with 22 events
> > > Tue Jun 10 16:42:14 2008 [Logger,INFO] Configured history with 22 events
> > > Tue Jun 10 16:42:26 2008 [Logger,INFO] Configured history with 22 events
>
> Problem confirmed on the M11 DAQ system at TRIUMF. We definitely do nothing funny
> there, so what is going on? Will investigate.
The only place I see where this could happen is in mlogger.c, lines 3064ff:
/* check if event size has changed */
db_get_record_size(hDB, hKey, 0, &size);
if (size != hist_log[i].buffer_size) {
close_history();
open_history();
return;
}
The record size corresponds to /Equipment/<name>/Variables. If this array changes in
size, it will trigger the re-definition of the history. So please have a look there
and check why the record size changes. |
17 Jun 2008, Stefan Ritt, Info, Improvement of custom pages
|
Some improvement of custom pages have been implemented. The idea behind is that
a custom page would contain a large background image containing indicators but
also controls. While indicators (values, bars) are already available, the field
of controls have been improved.
Edit boxes floating on top of a graphic
---------------------------------------
The first option has been there from the beginning, but was never documented. It
makes it possible to put an edit box right on top of a graphic by means of a CSS
style tag. The custom page code could look like this:
<div style="position:absolute; top:100px; left:50px;">
<odb src="/Runinfo/run number" edit=1>
</div>
<img src="cusgom.gif">
The "div" tag surrounding the "odb" tag places this directly on top of the
"custom.gif" image, where it can be clicked to be edited.
Password protection of an edit box
----------------------------------
Being able to control an experiment through a web interface of course rises the
question about safety. This is not so much about external access (for which we
have other protection schemes like host lists etc.) but it's about accidental
access by the normal shift crew. If a single click on a web page opens a
critical valve, this might be a problem. In order to restrict access to some
"experts", an additional password can be chosen for all or some controls on a
custom page. This is done by a new option in the "odb" tag and by adding a small
JavaScript function into the custom page:
<script type="text/javascript">
<!--
function promptpwd(path)
{
pwd = prompt('Please enter password', '');
document.cookie = "cpwd=" + pwd;
location.href = path;
}
//-->
</script>
...
<odb src="/..." edit=1 pwd="CustomPwd">
...
If the "pwd" option is present in the "odb" tag, mhttpd establishes a call to
the promptpwd() function if one click on the value. The password is then asked
from the user and submitted as a cookie. mhttpd then check this password against
the ODB entry
/Custom/Pwd/CustomPwd
and shows an error if they don't match. By using an explicit name ("CustomPwd"
in the above example) one can use a single password for all controls on a page,
or one could use several passwords on the same page. Like a shift crew password
for the less severe controls (/Custom/Pwd/ShiftPwd), and an "expert" password
(/Custom/Pwd/ExpertPwd) for the critical things. This password is of course not
secure in the sense that it's placed in plain text into the ODB, it's more to
prevent accidental modifications of things.
Area map to toggle values
-------------------------
Sometimes it's desirable to toggle a value, like the state of a valve. This can
be done now with a new function like this:
<map name="Custom1">
<area shape="rect" coords="40,200,100,300" alt="Main Valve"
href="Custom1?cmd=Toggle&odb=/Equipment/Environment/Variables/Output[2]">
</map>
<img src="cusgom.gif" usemap="#Custom1">
This defines a clickable map on top of the custom image. The area(s) should
match with some areas on the image like the box of a valve. By clicking on it,
the supplied path to the ODB is used (in this case
"/Equpiment/Environment/Variables[2]") and it's value is toggled (set to 0 if it
is 1, set to 0 if it is 1). If the valve value is then used in the image via a
"fill" statement to change the color of the valve, it can turn green or red
depending on it's state.
Are map with password check
---------------------------
The above area map can be combined with the password check. To do so, one needs:
<area shape="rect" coords="40,200,100,300" alt="Main Valve"
href="#"
onClick="promptpwd('Custom1?cmd=toggle&pnam=CustomPwd?odb=/Equipment/Environment/Variables/Output[2]')">
in combination with the JavaScript from above.
An example of the are map technology is shown in the attachment. This page from the MEG experiment at PSI
shows a complex gas system. The valves are represented as green circles. If they are clicked, they close
and become red (after the user successfully supplied the correct password). |
04 Jul 2008, Stefan Ritt, Info, Improved alarm conditions implemented
|
I implemented improved alarm conditions in the alarm system. Now one can write
conditions like
/Equipment/HV/Variables/Input[*] < 100
or
/Equipment/HV/Variables/Input[2-3] < 100
to check all values from an array or a certain range. If one array element
fulfills the alarm condition, the alarm is trigger. In addition, bit-wise alarm
conditions are possible
/Equipment/Environment/Variables/Input[0] & 8
is triggered if bit #2 is set in Input[0].
The changes are committed to SVN revision 4242. |
16 Jul 2008, Stefan Ritt, Info, Implementation of db_set_link_data() and db_set_link_data_index()
|
The current implementation of ODB links has the problem that once a link is
created, it cannot be changed any more through odbedit. This is because each
"set" command works on the destination of the link instead of the link. The same
happens when one loads a *.odb file. To overcome this problem, two new functions
db_set_link_data() and db_set_link_data_index() have been implemented. They
resemble their counterparts db_set_data() and db_set_data_index(), but they can
be used to directly modify a link instead of the link target. I use these
functions now in odbedit and db_paste() so that the above described problems are
fixed now. I do not expect any side effect of this, but if people experience
problems with db_paste(), please let me know. |
31 Jul 2008, Stefan Ritt, Info, Improvement of custom pages
|
Even more improvements have been implemented into custom pages recently, containing a complete JavaScript library for ODB communication. This JavaScript library relies on certain new commands built into mhttpd, and is therefore hardcoded into mhttpd. It can be seen by entering
http://<your mhttpd host>/mhttpd.js
To include it in your custom page, put following statement inside the <head>...</head> tag:
<script type="text/javascript" src="../mhttpd.js"></script>
It contains several functions:
Display of cursor location
When writing custom pages with large background images and labels placed on that image, it is hard to figure out X and Y coordinates of the labels. This can now be simplified by adding a new tag to the background image like
<img id="refimg" src="...">
If the "refimg" tag is present, the cursor changes into a crosshair and it's absolute and relative locations in respect to the reference image are shown in the status bar:

To make this work under Firefox, the user has to explicitly allow for status bar changes. To do so, enter about:config in the address bar. In the filter bar, enter status. Then locate dom.disable_window_status_change and set it to false.
Retrieving ODB values
Retrieving individual or array values from the ODB through the AJAX interface is now very simple. Just call:
ODBGet(<path>);
to obtain a value. If <path> points to an array in the ODB, an individual value can be retrieved by using an index, like
ODBGet('/Equipment/Environment/Variables/Input[3]');
or the complete array can be obtained with
ODBGet('/Equipment/Environment/Variables/Input[*]');
The function then returns a JavaScript array which can be used like
var a = ODBGet('/Equipment/Environment/Variables/Input[*]');
for (i=0 ; i<a.length ; i++)
alert(a[i]);
This functionality together with the window.setInterval() function can be used to update parts of the web page periodically such as:
window.setInterval("Refresh()", 10000);
function Refresh() {
document.getElementById("run_number").innerHTML = ODBGet('/Runinfo/Run number');
}
This function updates the current run number every 10 seconds in the background. The custom page has to contain an element with id="run_number", such as
<td id="run_number"></td>
The formatting of any number uses the internal default. If this should be changed, the format can directly appended in the ODB path such as:
ODBGet('/Equipment/Environment/Variables/Input[3]&format=%1.2lf');
the format %1.2lf is then directly passed to the sprintf() function.
Retrieving System Messages
A similar function ODBGetMsg(<n>) has been defined. It retrieves the last <n> system messages, which can then be displayed in some message area. If n=1 a single string is returned, if n>1 an array of strings is returned similar to ODB arrays.
Setting ODB values
Individual ODB values can be set in the background with
ODBSet(<path>,<value>);
or
ODBSet(<path>,<value>,<password_name>);
The password_name has the same meaning as described in elog:492. It must be defined under /Custom/Pwd/<password_name>. The function ODBSet can be used for example when one clicks on an checkbox for example:
<input type="checkbox" onClick="ODBSet('/Logger/Write data',this.checked?'1':'0')">
If used as above, the state of the checkbox must be initialized when the page is loaded. This can be either done with some JavaScript code called on initialization, which then uses ODBGet() as described above. Alternatively, the <odb> tag can be used like:
<odb src="/Logger/Write data" type="checkbox" edit="2" onclick="ODBSet('/Logger/Write data',this.checked?'1':'0')">
The special code edit="2" instructs mhttpd not to put any JavaScript code into the checkbox tag, since setting this value in the ODB is now handled by the user-supplied ODBSet() code. With edit="1" the internal JavaScript is activated, which uses the old form submission for sending the value to the ODB. |
17 Sep 2008, Stefan Ritt, Info, New flag for auto restart
|
A new ODB flag has been introduced. When the logger is configured for automatic
stop and restart (/Logger/Auto restart = y), the restart delay was hard-wired
to 20 sec., which might be too long or short for some experiments. Therefore a
new parameter "/Logger/Auto restart delay" has been introduced which can be
used to accommodate different delays. A non-zero delay is necessary for
experiments where some lengthy activities occur during the stop of a run, like
an analyzer writing many histograms to disk. |
18 Sep 2008, Stefan Ritt, Info, Potential problems in multi-threaded slow control front-end
|
We had recently some problems at our experiment which I would like to share
with the community. This affects however only experiments which have a slow
control front-end in multi-threaded mode.
The problem is related with the fact that the midas API is not thread safe, so
a device driver or bus driver from the slow control system may not call any ODB
function. We found several drivers (mainly psi_separator.c, psi_beamline.c etc)
which use inside read/write function the midas PAI function cm_msg() to report
any error. While this is ok for the init section (which is executed in the main
frontend thread) this is not ok for the read/write function inside the driver.
If this is done anyhow, it can happen that the main thread locks the ODB (via
db_lock_database()) and the thread interrupts that call and locks the ODB
again. In rare cases this can cause a stale lock on the ODB. This blocks all
other programs to access the ODB and the experiment will die loudly. It is hard
to identify, since error messages cannot be produced any more, and remote
programs (not affected by the lock) just show a rpc timeout.
I fixed all drivers now in our experiment which solved the problem for us, but
I urge other people to double check their device drivers as well.
In case of problems, there is a thread ID check in
db_lock_database()/db_unlock_database() which can be activated by supplying
-DCHECK_THREAD_ID
in the compile command line. If then these functions are called from different
threads, the program aborts with an assertion failure, which can then be
debugged.
There is also a stack history system implemented with new functions
ss_stack_xxxx. Using this system, one can check which functions called
db_lock_database() *before* an error occurs. Using this system, I identified
the malicious drivers. Maybe this system can also be used in other error
debugging scenarios. |
19 Sep 2008, Stefan Ritt, Info, Lazylogger logging changed
|
I modified the logging behavior of lazylogger. Originally, it was writing
messages (run copied, removed, ...) both into midas.log and
lazy_log_update.log. Since we have many files, it kind of clutters up the
logging files. I think it is a good idea to have a separate file (which I
changed not to "lazy.log" instead of "lazy_log_update.log" which I guess was a
bug), so I put the logging into the main file under a conditional compile:
#ifdef WRITE_MIDAS_LOG
cm_msg(MINFO, "lazy_log_update", str);
#endif
so it can be turned on again by adding -DWRITE_MIDAS_LOG to the compile line.
If other experiments have different needs, one could make the logging behavior
controllable through the ODB. In that case, I would suggest a single parameter
"Logging file" which can be either "midas.log" for the normal logging or
"lazy.log" for logging into the extra file. I guess having the messages twice
on the system is not needed by any experiment.
- Stefan |
11 Oct 2008, Stefan Ritt, Bug Report, mhttpd "messages" broken
|
> mhttpd "messages" page stopped working after svn revision 4327 because of uninitialized variable
> "filename2" in midas.c:cm_message_retrieve(). Attached patch fixes the problem for me.
> K.O.
>
>
> --- src/midas.c (revision 4342)
> +++ src/midas.c (working copy)
> @@ -978,6 +978,8 @@
> size = sizeof(filename);
> db_get_value(hDB, 0, "/Logger/Message file", filename, &size, TID_STRING, TRUE);
>
> + strlcpy(filename2, filename, sizeof(filename2));
> +
> if (strchr(filename, '%')) {
> /* replace strings such as midas_%y%m%d.mid with current date */
> tzset();
Ups, was my fault, sorry. I committed your change. |
13 Oct 2008, Stefan Ritt, Info, mhttpd multi-experiment support removed
|
Previously, one mhttpd server could sever several experiments at the same time.
This caused however sometimes problems and was hard to maintain. Starting from
SVN revision 4348, I removed the multi-experiment support, which I believe is
now a much cleaner implementation. So if several experiments are defined on a
computer, each one need a separate mhttpd process listening on a different
port. The experiment name can now be supplied on the command line to mhttpd
like for any other midas program. I have tested this so far at two experiments
at PSI, but this does not cover all possibilities. What I did not try was
experiments with web passwords and odb passwords. If there is any problem after
upgrading to 4348, please report. |
18 Oct 2008, Stefan Ritt, Info, mlogger async transitions, etc
|
> I suspect mlogger uses ASYNC transactions exactly to avoid
> this type of deadlock (mlogger used ASYNC transactions since svn revision 2, the
> beginning of time).
That's exactly the case. If you would have asked me, I would have told you
immediately, but it is also good that you re-confirmed the deadlock behavior with
the SYNC flag. I didn't check this for the last ten years or so.
Making the buffers bigger is only a partial solution. Assume that the disk gets
slow for some reason, then any buffer will fill up and you get the dead lock.
The only real solution is to put the logic into a separate thread. So the thread
does all the RPC communication with the clients, while the main logger thread logs
data as usual in parallel. The problem is that the RPC layer is not yet completely
tested to be thread safe. I put some mutex and you correctly realized that these
are system wide, but you want a local mutex just for the logger process. You need
also some basic communication between the "run stop thread" and the "logger main
thread". Maybe Pierre remembers that once there was the problem that the logger did
not know when all events "came down the pipe" and could close the file. He added
some delay which helped most of the time. But if we would have some communication
from the "run stop thread" telling the main thread that all programs except the
logger have stopped the run, then the logger only has to empty the local system
buffer and knows 100% that everything is done.
In the MEG experiment we have the same problem. We need a certain sequence
(basically because we have 9 front-ends and one event builder, which has to be
called after the front-ends). We realized quickly that the logger cannot stop the
run, so we wrote a little tool "RunSubmit", which is a run sequence with scripting
facility. So you write a XML file, telling RunSubmit to start 10 runs, each with
5000 events. RunSubmit now watches the run statistics and stops the run. Since it's
outside the logger process, there is no dead lock. Unfortunately RunSubmit was
written by one of our students and contains some MEG specific code. Otherwise it
could be committed to the distribution.
So I feel that a separate thread for run stop (and maybe even start) would be a
good thing, but I'm not sure when I will have time to address this issue.
- Stefan |
22 Oct 2008, Stefan Ritt, Forum, Mixed CAMAC/VME frontend, SIS3100
|
> Dear MIDAS-addicts,
>
> I would like to hear your opinion on this:
> We've until now used CAMAC with Hytec 1331 controllers. We're using Yale FADCs
> whose readout takes ages in CAMAC (2048 samples take 2 milliseconds to be
> read). We've got 20+ FADC channels (we usually read only 2-3)
>
> Now we've had the brilliant idea to replace the Yale FADCs with some VME
> digitizer and we now plan to buy a Struck SIS 1100/3100 PCI-VME controller,
> plus 4 pc. CAEN 1720 8ch 12bit, 250MHz WFD.
>
> (1) Can anybody comment on this choice? Good experiences/problems?
>
> We are still using the CAMAC stuff for all other modules (TDCs, ADCs,
> scalers). So my plan is to have ONE frontend who reads both the CAMAC modules
> and the VME modules.
>
> (2) Is it possible to build and run a dual-controller frontend for both CAMAC
> and VME? Does anybody have experience with that? Or is it a stupid idea?
>
> I'd appreciate any hints.
>
> [Edit: We're using Linux]
>
> Thanks a lot,
>
> Randolf
Dear Randolf,
I used some time ago several HYTEC 1331 controllers together with the Struck
SIS3100. Since the HYTEC is IO-mapped and the SIS3100 is memory mapped, there was
no problem in running them in parallel. Note however that there will soon be an
improved version of the SIS3100 with improved speed, and also CAEN plans a WFD
with 32 channels, 6 GSPS, 12 bit, using the DRS chip for the next year. I don't
know if you need that, but just that you know.
Best regards,
Stefan |
28 Oct 2008, Stefan Ritt, Bug Report, Inconsistent handling of odb and evet buffer timeouts
|
> In midas.c there are several places where client last activity time stamps are checked against the
> watchdog timeout and the clients are declared dead if they fail to update their activity time stamps.
> ODB time stamps and data buffer time stamps appear to be handled in a similar manner.
>
> Most checks are done like this:
>
> now = ss_millitime();
> if (client->watchdog > 0 <----- check that the watchdog is enabled
> && now > client->last_activity <---- check for crazy time stamps from the future
> && now - client->last_activity > client->watchdog_timeout) <--- normal timeout
> remove_client(client);
>
> But in a few places, the extra checks are missing:
>
> now = ss_millitime();
> if (now - client->last_activity > client->watchdog_timeout)
> remove_client(client);
>
> Is this an oversight from when additional checks were added?
> Should I make all checks read like the first one?
>
> K.O.
This is on purpose. Inside cm_watchdog(), the system check for client->watchdog > 0. If the watchdog
timeout is zero, the client is not removed. This feature is used if you debug a program. If you come to a
breakpoint and sit there for a while, you might be declared dead and the application is removed from the
ODB, meaning that you cannot continue debugging (on the next ODB access the application asserts). This
can be avoided by setting the watchdog to zero, which is implemented in most applications by supplying
"-d" on the command line. Now assume you debug a program, so you set the watchdog timeout to zero, but in
the debugging session you decide to quit. Since the watchdog timeout is zero, you will never be removed
from the ODB. Therefore, the code inside cm_cleanup() doe NOT check client->watchdog > 0. Therefore, a
"cleanup" inside odbedit will even remove clients having the timeout set to zero.
Now there might be more clever ways to accomplish that, but that's how it is implemented right now. |
|