10 May 2013, Konstantin Olchanski, Bug Fix, Fixed: crash if alarm "write elog message" is enabled
|
If the MIDAS Alarm property "write elog message" is enabled, an uninitialized variable "tag" is passed to
el_submit() and, depending on your luck, can cause a crash. "tag" is supposed to be, and now is, a
NUL-terminated string. The only other uses of el_submit() are in mhttpd.cxx and mserver.c, where it is called
correctly.
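A minimal sketch of this class of bug and its fix, not the actual alarm.c code: the function names and the buffer size below are made up for illustration. The point is only that a string buffer must be given a terminating NUL before it is handed to any function that reads it as a string.

```c
#include <string.h>

/* Hypothetical stand-in for a callee that reads its argument as a C string. */
static size_t example_tag_length(const char *tag)
{
   return strlen(tag); /* undefined behaviour if "tag" is not NUL-terminated */
}

/* Initialize the buffer before passing it on; returns the length the callee sees. */
static size_t example_submit_with_empty_tag(void)
{
   char tag[80];  /* size chosen for the example only */
   tag[0] = 0;    /* "tag" is now a valid, empty, NUL-terminated string */
   return example_tag_length(tag);
}
```

Without the `tag[0] = 0;` line the callee would read whatever happened to be on the stack, which matches the "depending on your luck" behaviour described above.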
alarm.c svn rev 5361
K.O. |
15 Jan 2014, Konstantin Olchanski, Bug Fix, Fixed spurious symlinks to midas.log
|
In some experiments (e.g. DEAP), we see spurious symlinks to midas.log scattered just about everywhere. I
have now traced this to an uninitialized variable in cm_msg_log(), and it should be fixed now. K.O. |
30 Oct 2003, Stefan Ritt, , Fixed several potential problems for ODB corruption
|
I just realized that db_set_value, db_set_data, db_set_num_values and
db_merge_data do not check for num_values == 0. With such a parameter the
ODB can become corrupted, since zero length ODB entries are not allowed. I
fixed the corresponding places in odb.c and committed the changes. Everyone
with ODB corruption problems should update that code. |
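As an illustration only, a guard of the kind described might look like the sketch below. The function name, its signature and the status codes are placeholders and are not taken from odb.c.

```c
#include <stddef.h>

/* Placeholder status codes for this sketch. */
enum { EXAMPLE_SET_OK = 1, EXAMPLE_SET_INVALID_PARAM = 2 };

/* Hypothetical setter: refuses to create a zero-length entry. */
static int example_set_values(const void *data, int data_size, int num_values)
{
   (void) data;
   (void) data_size;

   if (num_values == 0)
      return EXAMPLE_SET_INVALID_PARAM; /* zero-length entries are not allowed */

   /* ... a real implementation would copy the data here ... */
   return EXAMPLE_SET_OK;
}
```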
18 May 2006, Stefan Ritt, Bug Fix, Fixed problems with reload of custom pages
|
We had a problem with custom pages and reloading of them. If they contain an ODB field which is editable, one can change the ODB value through the custom page. The URL then contains a "?cmd=Set&value=x&index=x" section, which stays in the browser's address bar after the ODB value has been updated. If the value changes later by some other means in the ODB, and one presses "reload" in the browser, the above URL gets executed again and the value gets changed back which is not wanted.
The problem has been fixed such that mhttpd redirects the browser after setting a variable to the URL not containing the "Set" command from above. |
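A rough sketch of how a redirect target could be derived from such a URL, assuming it is enough to drop everything from the first "?" onwards. The helper name and the example path are hypothetical; this is not the mhttpd code.

```c
#include <string.h>

/* Copy "url" into "out" without its query string ("?cmd=Set&...").
   Returns 0 on success, -1 if "out" is too small. */
static int example_redirect_target(const char *url, char *out, size_t out_size)
{
   size_t len = strcspn(url, "?"); /* length up to the first '?' or the end */

   if (len + 1 > out_size)
      return -1;

   memcpy(out, url, len);
   out[len] = 0;
   return 0;
}
```

For a made-up URL such as "/CS/mypage?cmd=Set&value=1&index=0" this yields "/CS/mypage", so a browser reload after the redirect no longer repeats the "Set" command.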
07 May 2009, Konstantin Olchanski, Bug Fix, Fixed mlogger run start and stop
|
Fixed problems with mlogger starting and stopping runs.
Basic difficulty was with the mlogger using ASYNC transitions, which did not implement proper
transition sequencing according to transition sequence numbers. Basically all clients were called at the
same time, regardless of how long they took to process the transitions.
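To make "sequencing according to transition sequence numbers" concrete, here is a small sketch that orders a client list by sequence number. The struct and function names are invented for the example and do not reflect how cm_transition() stores its client list.

```c
#include <stdlib.h>

/* Hypothetical client record for this sketch. */
typedef struct {
   const char *name;
   int sequence_number;
} example_client;

static int example_compare_by_sequence(const void *a, const void *b)
{
   const example_client *ca = a;
   const example_client *cb = b;
   return (ca->sequence_number > cb->sequence_number) -
          (ca->sequence_number < cb->sequence_number);
}

/* Sort clients so they can be called one after another, lowest number first. */
static void example_order_clients(example_client *clients, size_t n)
{
   qsort(clients, n, sizeof(clients[0]), example_compare_by_sequence);
}
```

A sequenced transition would then call the clients in this order and wait for each one to finish, instead of calling them all at once.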
Switching from ASYNC to SYNC transitions introduces a deadlock between mlogger (not reading data
from SYSTEM buffer while inside cm_transition) and any program trying to write into the SYSTEM buffer
(buffer is full, does not listen for transition requests while waiting for mlogger, which tries to call its
transition handler).
Then we invented the mtransition helper program. In the original implementation for t2k it was spawned
directly from the mlogger to stop the run (avoiding the deadlock). Then cm_transition(DETACH) was
introduced, but the mlogger start/stop/restart run logic became broken. One problem was that when the
auto restart delay is zero, mtransition tries to restart the run before the previous run is stopped (instead,
mlogger should restart the run from its tr_stop() handler). Another problem was with the auto restart
delay counting from the time when we start stopping the run: because stopping the run can take an
unpredictable time, depending on what various frontends have to do, it is impossible to have a
predictable delay between runs (again this is fixed by restarting the run from mlogger.c::tr_stop()).
All this has been straightened out by svn revision 4484. Basically the old run stop/restart logic was
restored in mlogger.c, using cm_transition(DETACH) to avoid the deadlocks.
To remind all, these are the present controls for transitions initiated by mlogger:
/experiment/transition debug flag - set to "2" to capture transition sequences into midas.log
/experiment/transition timeout and transition connect timeout - one can change default timeouts as
needed to accommodate non cooperative frontends.
/logger/async transitions - do not use mtransition - do ASYNC transitions, as before.
/logger/auto restart delay - delay between stopping the run (mlogger.c::tr_stop) and starting the next
run.
svn rev 4484
K.O. |
02 Nov 2017, Konstantin Olchanski, Bug Fix, Fixed mlogger memory corruption, updated mxml
|
In the agdaq system I see memory corruption in the mlogger. There were at least two bugs: one
memory allocation error in mxml and one incorrect memset() in mlogger.cxx. The mxml bug is fixed
in the mxml repository, and the mlogger.cxx bug is fixed in the midas-2017-10 branch.
I suggest that everyone update mxml to the latest version (without waiting for the new midas release):
https://bitbucket.org/tmidas/mxml/commits/branch/master
K.O. |
27 Nov 2008, Konstantin Olchanski, Info, Fixed mlogger crash, was Per-variable history implementation in the mlogger
|
> revision 4142+4143 are minor fixes, refactoring (switch the code to use helper
> functions) and implementation of history for structured banks
The implementation of "history for structured banks" had a bug - tags inside
structured banks were counted incorrectly, leading to memory overwrites and mlogger
crash in open_history().
This problem is now fixed (plus added assert() checks to crash out if an overwrite of the
tags[] array is detected).
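A minimal sketch of this kind of assert() check, using an invented fixed-size array rather than the real mlogger data structures:

```c
#include <assert.h>

#define EXAMPLE_MAX_TAGS 4 /* array size chosen for the example only */

/* Hypothetical tag list for this sketch. */
typedef struct {
   int n_tags;
   int tags[EXAMPLE_MAX_TAGS];
} example_tag_list;

/* Append a tag; abort immediately instead of writing past the end of tags[]. */
static void example_add_tag(example_tag_list *list, int tag)
{
   assert(list->n_tags < EXAMPLE_MAX_TAGS);
   list->tags[list->n_tags++] = tag;
}
```

The assert turns a silent memory overwrite into an immediate, easy-to-locate crash, provided the program is not built with NDEBUG defined.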
svn revision 4398.
K.O. |
25 May 2006, Stefan Ritt, Bug Fix, Fixed compiler warnings with gcc 3.4.4
|
I fixed a couple of compiler warnings which came up with the new gcc 3.4.4. Seems like the compiler gets more and more picky. There are still warnings left in ybos.c and in mcnaf.c, which I leave to the original author |
25 May 2006, Pierre-Andre Amaudruz, Bug Fix, Fixed compiler warnings with gcc 3.4.4
|
Stefan Ritt wrote: | I fixed a couple of compiler warnings which came up with the new gcc 3.4.4. Seems like the compiler gets more and more picky. There are still warnings left in ybos.c and in mcnaf.c, which I leave to the original author |
Pierre-A. Amaudruz wrote: | >ybos.c, cnaf_callback.c, mcnaf.c, mana.c have been corrected too. |
|
23 Sep 2010, Konstantin Olchanski, Info, Fixed ODB corruption by javascript ODBGet(nonexistant)
|
Prior to odb.c rev 4829 and mhttpd.c rev 4830 committed a few minutes ago, HTML javascript
ODBGet("/non_existant_odb_entry") caused ODB corruption requiring ODB reload from backup file.
It turns out that ODBGet() tries to create ODB entries if they do not already exist, but because ODBGet() was
called without the "type", "length", etc arguments, the mhttpd "jset" command was issued with "type" set to
zero. This resulted in a db_create_key() call with "type" set to zero which created an invalid ODB entry.
odb.c rev 4829 adds a check for "type<=0" (check for "type>=TID_LAST" was already there).
In addition, mhttpd.c rev 4830 adds a "jset" check for type==0.
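The range check described above can be sketched as follows. The upper bound here is a placeholder constant defined for the example; the real TID_LAST comes from midas.h and its value is not assumed.

```c
/* Placeholder upper bound standing in for TID_LAST. */
#define EXAMPLE_TID_LAST 17

/* Returns non-zero if "type" is a usable type id: 0 < type < EXAMPLE_TID_LAST. */
static int example_type_is_valid(int type)
{
   return type > 0 && type < EXAMPLE_TID_LAST;
}
```

With this check in front of the key creation, a request that arrives with "type" set to zero is rejected instead of producing an invalid entry.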
K.O. |
31 Jul 2006, Konstantin Olchanski, Bug Fix, Fix user memory corruption in ODB
|
We have been seeing consistent user memory corruption while setting up a new
experiment. This has been traced to a user memory overwrite in ODB db_set_data()
function and this problem is now fixed. This error was triggered by our frontend
code constantly changing the size of a MIDAS data bank that was also written
into ODB via the RO_ODB option. K.O. |
27 Nov 2008, Konstantin Olchanski, Bug Fix, Fix ss_file_size() on 32-bit Linux
|
It turns out that on 32-bit Linux, ss_file_size() returns the wrong answer for
files bigger than 2 GB (4GB?). The Linux stat() system call returns an error
(which is ignored) and bogus file size data (returned to the caller).
On 64-bit Linux (compiled with -m64), stat() appears to return correct data.
Related functions ss_disk_size() and ss_disk_free() return correct answers on
both 32-bit and 64-bit Linux (biggest disk I tried was 5.5 TB).
I now fixed this problem by using the stat64() system call for "#ifdef OS_LINUX".
I also changed ss_file_size(), ss_disk_size() and ss_disk_free() to return -1 if
the system call returns an error. I also added a test program
utils/test_ss_file_size.c.
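For illustration, here is a sketch of a file size function that returns -1 on error. Note that it uses a different mechanism from the one in this post: instead of calling stat64() directly, it defines _FILE_OFFSET_BITS as 64 so that plain stat() uses a 64-bit file offset on 32-bit Linux with glibc. The function name is made up.

```c
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64 /* must come before any system header */
#endif

#include <sys/types.h>
#include <sys/stat.h>

/* Return the size of "path" in bytes, or -1 if stat() fails. */
static double example_file_size(const char *path)
{
   struct stat st;

   if (stat(path, &st) != 0)
      return -1; /* report the error instead of returning bogus data */

   return (double) st.st_size;
}
```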
svn revision 4397.
K.O. |
01 Dec 2008, Stefan Ritt, Bug Fix, Fix ss_file_size() on 32-bit Linux
|
> I also changed ss_file_size(), ss_disk_size() and ss_disk_free() to return -1 if
> the system call returns an error. I also added a test program
> utils/test_ss_file_size.c.
The test program gave under 64-bit SL5:
For [(null)], file size: -1, disk size: -0.001, disk free -0.001
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `/bin/ls -ld (null)'
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `/bin/df -k (null)'
Anyhow I guess that this test program just accidentally slipped into the repository.
Test programs for the developers should not be in the repository since they are of
not much use for the average user. If I had added every test I made as an
individual test program, we would by now have tons of test programs making the whole
distribution pretty bulky, which nobody would know how to use now. So I removed the
test program again. If people do not agree, I suggest making a central "main" test
program which combines all tests. I know there are also some C structure alignment
tests etc., which then could all be combined into a single, well documented, test
program. |
02 Dec 2008, Stefan Ritt, Bug Fix, Fix ss_file_size() on 32-bit Linux
|
> I now fixed this problem by using the stat64() system call for "#ifdef OS_LINUX".
That does not work if _LARGEFILE64_SOURCE is not defined. In that case, the compiler
complains that stat64 is undefined. Since many Makefiles for front-ends out there do
not have _LARGEFILE64_SOURCE defined, I changed system.c so that stat64 is only used
if that flag is defined:
#ifdef _LARGEFILE64_SOURE
struct stat64 stat_buf;
int status;
/* allocate buffer with file size */
status = stat64(path, &stat_buf);
if (status != 0)
return -1;
return (double) stat_buf.st_size;
#else
... |
02 Dec 2008, Konstantin Olchanski, Bug Fix, Fix ss_file_size() on 32-bit Linux
|
> > I now fixed this problem by using the stat64() system call for "#ifdef OS_LINUX".
> That does not work if _LARGEFILE64_SOURCE is not defined.
> #ifdef _LARGEFILE64_SOURE
> struct stat64 stat_buf;
This does not work (observe the typo in the #ifdef). But you cannot know this because
you already deleted the test program I wrote and committed to svn exactly to detect and
prevent this kind of breakage (plus to give the Solaris, BSD and other weirdo users
some way to check that ss_file_size() works on their systems).
K.O. |
02 Dec 2008, Stefan Ritt, Bug Fix, Fix ss_file_size() on 32-bit Linux
|
K.O. wrote: | This does not work (observe the typo in the #ifdef). |
Sorry for that, I fixed and committed it.
K.O. wrote: | But you cannot know this because you already deleted the test program I wrote and committed to svn exactly to detect and prevent this kind of breakage (plus to give the Solaris, BSD and other weirdo users some way to check that ss_file_size() works on their systems). |
Well, you figured it out even without the test program in the distribution! But I'm sure no other user would have known how to use your test program to diagnose this problem. So 99% of the users would scratch their head about this undocumented program and get confused. I believe we two are responsible that the midas kernel functions work correctly and the average user should not have to bother with it. I agree that it's handy for you to have this little test program in the distribution, so you can run it everywhere you install midas. But for me it would be handy to have files with, let's say, nature's constants, particle decay life times, list of ASCII codes, and so on. But it would clutter up the distribution and the disadvantage of annoying users would be bigger than my personal benefit, so I don't do it.
If you absolutely want to keep a certain test functionality, you can add it into a "central" test program, write some help and documentation for it, educate users how to use it and how to report any errors back to you. Maybe some printout like "all tests ok" and some specific comment if a test fails would be helpful for the normal user. This test program could then also contain other tests like C structure alignment (which sometimes is a problem), some mutex tests and whatever we collected along the road. An alternative would be to add this into a "test" command inside odbedit. |
10 May 2007, Konstantin Olchanski, Bug Fix, Fix error reporting from cm_transition()
|
For some time now, error reporting from cm_transition() was broken.
A typical symptom: when starting a run from mhttpd, if a transition error occurred, the run did not
start (good) but the user was presented with a message "Success" in big letters (confusing the user).
Part of the problem was caused by user-written frontends that return an empty error string. Code in
cm_transition() now detects this and shows the numeric value of the error status returned by the frontend.
This is fixed in revision 3681.
The error string "Success" is now returned only when cm_transition() was successful, and other error
reporting inside this function was cleaned up.
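A sketch of the fallback described above, with an invented helper name and message wording (the actual text produced by cm_transition() is not reproduced here):

```c
#include <stdio.h>

/* Build a user-visible error message: use the frontend's text if there is one,
   otherwise fall back to the numeric status. */
static void example_describe_error(int status, const char *frontend_msg,
                                   char *out, size_t out_size)
{
   if (frontend_msg != NULL && frontend_msg[0] != 0)
      snprintf(out, out_size, "%s", frontend_msg);
   else
      snprintf(out, out_size, "transition failed, status %d", status);
}
```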
K.O. |
03 Jun 2009, Konstantin Olchanski, Bug Fix, Fix db_open_record() error return
|
The odb hot-link function db_open_record() did not return an error when the system limit for hotlinks is
exceeded and no more hot links could be added (silent failure). This is now fixed.
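The general pattern, a fixed-size table that reports an error when it is full instead of failing silently, can be sketched like this. The table layout, limit and status codes are invented for the example and are not those of odb.c.

```c
#define EXAMPLE_MAX_LINKS 2 /* tiny limit, for the example only */

/* Placeholder status codes for this sketch. */
enum { EXAMPLE_LINK_OK = 1, EXAMPLE_LINK_NO_SLOT = 2 };

typedef struct {
   int n_links;
   int keys[EXAMPLE_MAX_LINKS];
} example_link_table;

/* Register a link; tell the caller when the table is already full. */
static int example_add_link(example_link_table *t, int key)
{
   if (t->n_links >= EXAMPLE_MAX_LINKS)
      return EXAMPLE_LINK_NO_SLOT; /* no silent failure */

   t->keys[t->n_links++] = key;
   return EXAMPLE_LINK_OK;
}
```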
odb.c svn rev 4500
K.O. |
07 Aug 2006, Konstantin Olchanski, Bug Fix, Fix crash in mfe.c
|
Some time ago, I accidentally introduced a bug in mfe.c: if there is data
congestion in the system, mfe.c can exit with the error "bm_flush_cache(ASYNC)
error 209" because it did not expect the valid return value BM_ASYNC_RETURN
(209) from bm_flush_cache(ASYNC). This error has now been fixed. K.O. |
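A sketch of a status check that treats the asynchronous return value as non-fatal. The value 209 is the one quoted above; the success code and all names are placeholders defined for the example, not taken from midas.h.

```c
/* 209 is the value quoted in the post; the success code is a placeholder. */
enum { EXAMPLE_BM_SUCCESS = 1, EXAMPLE_BM_ASYNC_RETURN = 209 };

/* Returns non-zero only for statuses that should stop the frontend. */
static int example_flush_status_is_fatal(int status)
{
   return status != EXAMPLE_BM_SUCCESS && status != EXAMPLE_BM_ASYNC_RETURN;
}
```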
04 Sep 2006, Konstantin Olchanski, Bug Fix, Fix MIDAS on MacOS 10.4.7
|
I committed minor fixes for building MIDAS on MacOS 10.4.7:
1) there is no linux/unistd.h
2) gcc 4.0.0 does not like "struct { ... } var;" although "struct Foo { ... } var;" is fine
3) there is no "_syscall0(...)" macro
4) there is no "gettid()", I used pthread_self() instead.
K.O.
P.S. ss_gettid() returns "int" instead of "midas_thread_t" (pthread_t, really). On MacOS 10.4.7 at least,
pthread_t appears to be a pointer, not an int. Is that right? |
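On the P.S.: POSIX does not require pthread_t to be an integer type, so portable code should not cast it to int or compare it with "==". A small sketch of the portable comparison, with a made-up helper name:

```c
#include <pthread.h>

/* Returns non-zero if "tid" identifies the calling thread. */
static int example_is_current_thread(pthread_t tid)
{
   return pthread_equal(tid, pthread_self()) != 0;
}
```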