Midas DAQ System, Page 117 of 136
ID Date Author Topic Subject
  948   15 Jan 2014 Konstantin Olchanski Bug Fix Fixed spurious symlinks to midas.log
In some experiments (e.g. DEAP), we see spurious symlinks to midas.log scattered just about everywhere. I
have now traced this to an uninitialized variable in cm_msg_log(), and it should be fixed now. K.O.
  118   30 Oct 2003 Stefan Ritt Fixed several potential problems for ODB corruption
I just realized that db_set_value, db_set_data, db_set_num_values and 
db_merge_data do not check for num_values == 0. With such a parameter the 
ODB can become corrupted, since zero length ODB entries are not allowed. I 
fixed the corresponding places in odb.c and committed the changes. Everyone 
with ODB corruption problems should update that code.
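
The kind of guard described above can be sketched as follows. This is only an illustration, not the code committed to odb.c: the function name and the status codes are hypothetical and do not correspond to real MIDAS symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch; not the real MIDAS values. */
enum { SKETCH_OK = 1, SKETCH_INVALID_PARAM = 2 };

/* Reject arguments that would create a zero-length entry. */
static int sketch_check_set_args(const void *data, int data_size, int num_values)
{
   if (num_values <= 0)
      return SKETCH_INVALID_PARAM;   /* zero-length entries are not allowed */
   if (data == NULL || data_size <= 0)
      return SKETCH_INVALID_PARAM;   /* nothing to store */
   return SKETCH_OK;
}
```

A caller would run this check before touching the database and return the error status unchanged if the check fails.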
  257   18 May 2006 Stefan Ritt Bug Fix Fixed problems with reload of custom pages
We had a problem with custom pages and reloading of them. If they contain an ODB field which is editable, one can change the ODB value through the custom page. The URL then contains a "?cmd=Set&value=x&index=x" section, which stays in the browser's address bar after the ODB value has been updated. If the value changes later by some other means in the ODB, and one presses "reload" in the browser, the above URL gets executed again and the value gets changed back which is not wanted.

The problem has been fixed such that mhttpd redirects the browser after setting a variable to the URL not containing the "Set" command from above.
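
As a rough illustration of the redirect idea, the sketch below derives a redirect target by dropping the query string from the requested URL. It is a simplification and not the mhttpd code: the function name and the example path are hypothetical, and it removes the whole query string rather than only the "Set" part.

```c
#include <assert.h>
#include <string.h>

/* Copy `url` into `out` without its query string (everything from '?').
   Returns 0 on success, -1 if `out` is too small. */
static int sketch_redirect_target(const char *url, char *out, size_t out_size)
{
   size_t len = strcspn(url, "?");   /* length of the part before '?' */

   if (len + 1 > out_size)
      return -1;
   memcpy(out, url, len);
   out[len] = '\0';
   return 0;
}
```

After the value has been written, the server would answer with a redirect to the stripped URL, so a later browser reload no longer repeats the write.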
  573   07 May 2009 Konstantin Olchanski Bug Fix Fixed mlogger run start and stop
Fixed problems with mlogger starting and stopping runs.

Basic difficulty was with the mlogger using ASYNC transitions, which did not implement proper 
transition sequencing according to transition sequence numbers. Basically all clients were called at the 
same time, regardless of how long they took to process the transitions.

Switching from ASYNC to SYNC transitions introduces a deadlock between mlogger (not reading data 
from SYSTEM buffer while inside cm_transition) and any program trying to write into the SYSTEM buffer 
(buffer is full, does not listen for transition requests while waiting for mlogger which tries to call its 
transition handler).

Then we invented the mtransition helper program. In the original implementation for t2k it was spawned 
directly from the mlogger to stop the run (avoiding the deadlock). Then cm_transition(DETACH) was 
introduced, but the mlogger start/stop/restart run logic became broken. One problem was that when the 
auto restart delay is zero, mtransition tries to restart the run before the previous run is stopped (instead, 
mlogger should restart the run from its tr_stop() handler). Another problem was that the auto restart 
delay was counted from the time when we start stopping the run. Because stopping the run can take an 
unpredictable time, depending on what the various frontends have to do, it is impossible to have a 
predictable delay between runs (again this is fixed by restarting the run from mlogger.c::tr_stop()).

All this has been straightened out by svn revision 4484. Basically the old run stop/restart logic was 
restored in mlogger.c, using cm_transition(DETACH) to avoid the deadlocks.

To remind all, these are the present controls for transitions initiated by mlogger:

/experiment/transition debug flag - set to "2" to capture transition sequences into midas.log
/experiment/transition timeout and transition connect timeout - one can change default timeouts as 
needed to accommodate non cooperative frontends.
/logger/async transitions - do not use mtransition - do ASYNC transitions, as before.
/logger/auto restart delay - delay between stopping the run (mlogger.c::tr_stop) and starting the next 
run.

svn rev 4484
K.O.
  1319   02 Nov 2017 Konstantin Olchanski Bug Fix Fixed mlogger memory corruption, updated mxml
In the agdaq system I see memory corruption in the mlogger. There were at least two bugs: one 
memory allocation error in mxml and one incorrect memset() in mlogger.cxx. The mxml bug is fixed 
in the mxml repository, mlogger.cxx bug is fixed in the midas-2017-10 branch.

I suggest that everyone update mxml to the latest version (without waiting for the new midas release):
https://bitbucket.org/tmidas/mxml/commits/branch/master

K.O.
  535   27 Nov 2008 Konstantin Olchanski Info Fixed mlogger crash, was Per-variable history implementation in the mlogger
> revision 4142+4143 are minor fixes, refactoring (switch the code to use helper
> functions) and implementation of history for structured banks

The implementation of "history for structured banks" had a bug - tags inside
structured banks were counted incorrectly, leading to memory overwrites and mlogger
crash in open_history().

This problem is now fixed (plus added assert() checks to crash out if an overwrite of
the tags[] array is detected).
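
A minimal sketch of this style of check, assuming a fixed-size tag array. The types, the capacity and the function name are hypothetical and are not taken from the mlogger source.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_TAGS 4     /* hypothetical capacity */

struct sketch_tag { char name[32]; };

struct sketch_tag_list {
   struct sketch_tag tags[SKETCH_MAX_TAGS];
   int n_tags;
};

/* Append a tag; returns 0 on success, -1 if the array is already full. */
static int sketch_add_tag(struct sketch_tag_list *list, const char *name)
{
   /* crash out early if the counter has already run past the array */
   assert(list->n_tags >= 0 && list->n_tags <= SKETCH_MAX_TAGS);

   if (list->n_tags == SKETCH_MAX_TAGS)
      return -1;

   strncpy(list->tags[list->n_tags].name, name, sizeof(list->tags[0].name) - 1);
   list->tags[list->n_tags].name[sizeof(list->tags[0].name) - 1] = '\0';
   list->n_tags++;
   return 0;
}
```

The assert() catches a corrupted counter, while the explicit capacity check turns an ordinary "array full" condition into an error return instead of a memory overwrite.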

svn revision 4398.
K.O.
  259   25 May 2006 Stefan Ritt Bug Fix Fixed compiler warnings with gcc 3.4.4
I fixed a couple of compiler warnings which came up with the new gcc 3.4.4. Seems like the compiler gets more and more picky. There are still warnings left in ybos.c and in mcnaf.c, which I leave to the original author ;-)
  260   25 May 2006 Pierre-Andre Amaudruz Bug Fix Fixed compiler warnings with gcc 3.4.4

Stefan Ritt wrote:
I fixed a couple of compiler warnings which came up with the new gcc 3.4.4. Seems like the compiler gets more and more picky. There are still warnings left in ybos.c and in mcnaf.c, which I leave to the original author ;-)



Pierre-A. Amaudruz wrote:
>ybos.c, cnaf_callback.c, mcnaf.c, mana.c have been corrected too.
  722   23 Sep 2010 Konstantin Olchanski Info Fixed ODB corruption by javascript ODBGet(nonexistent)
Prior to odb.c rev 4829 and mhttpd.c rev 4830 committed a few minutes ago, HTML javascript 
ODBGet("/non_existant_odb_entry") caused ODB corruption requiring ODB reload from backup file.

It turns out that ODBGet() tries to create ODB entries if they do not already exist, but because ODBGet() was 
called without the "type", "length", etc arguments, the mhttpd "jset" command was issued with "type" set to 
zero. This resulted in a db_create_key() call with "type" set to zero which created an invalid ODB entry. 
odb.c rev 4829 adds a check for "type<=0" (check for "type>=TID_LAST" was already there).

In addition, mhttpd.c rev 4830 adds a "jset" check for type==0.
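
The range check described above amounts to something like the following sketch. The upper bound here is a placeholder (the real TID_LAST is defined in midas.h) and the function name is hypothetical.

```c
#include <assert.h>

/* Placeholder upper bound for this sketch; the real TID_LAST comes from midas.h. */
#define SKETCH_TID_LAST 17

/* A type id is usable only if it lies strictly between 0 and the upper bound. */
static int sketch_type_is_valid(int type)
{
   return type > 0 && type < SKETCH_TID_LAST;
}
```

With such a check in place, a request that arrives with "type" set to zero is rejected before any entry is created.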
K.O.
  282   31 Jul 2006 Konstantin Olchanski Bug Fix Fix user memory corruption in ODB
We have been seeing consistent user memory corruption while setting up a new
experiment. This has been traced to a user memory overwrite in ODB db_set_data()
function and this problem is now fixed. This error was triggered by our frontend
code constantly changing the size of a MIDAS data bank that was also written
into ODB via the RO_ODB option. K.O.
  532   27 Nov 2008 Konstantin Olchanski Bug Fix Fix ss_file_size() on 32-bit Linux
It turns out that on 32-bit Linux, ss_file_size() returns the wrong answer for
files bigger than 2 GB (4GB?). The Linux stat() system call returns an error
(which is ignored) and bogus file size data (returned to the caller).

On 64-bit Linux (compiled with -m64), stat() appears to return correct data.

Related functions ss_disk_size() and ss_disk_free() return correct answers on
both 32-bit and 64-bit Linux (biggest disk I tried was 5.5 TB).

I now fixed this problem by using the stat64() system call for "#ifdef OS_LINUX".

I also changed ss_file_size(), ss_disk_size() and ss_disk_free() to return -1 if
the system call returns an error. I also added a test program
utils/test_ss_file_size.c.
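
For illustration, a file-size helper that returns -1 on error could look like the sketch below. Note that it uses a different technique from the stat64() call mentioned above: it defines _FILE_OFFSET_BITS=64 before any system header, which asks glibc to make plain stat() large-file aware on 32-bit builds. The function name is hypothetical.

```c
#define _FILE_OFFSET_BITS 64   /* must come before any system header */
#include <assert.h>
#include <sys/stat.h>

/* Returns the file size in bytes, or -1 if stat() fails. */
static double sketch_file_size(const char *path)
{
   struct stat st;

   if (stat(path, &st) != 0)
      return -1;               /* report the error instead of bogus data */
   return (double) st.st_size;
}
```

Callers then need to test for a negative return value before using the result as a size.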

svn revision 4397.
K.O.
  536   01 Dec 2008 Stefan Ritt Bug Fix Fix ss_file_size() on 32-bit Linux
> I also changed ss_file_size(), ss_disk_size() and ss_disk_free() to return -1 if
> the system call returns an error. I also added a test program
> utils/test_ss_file_size.c.

The test program gave under 64-bit SL5:

For [(null)], file size: -1, disk size: -0.001, disk free -0.001
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `/bin/ls -ld (null)'
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `/bin/df -k (null)'

Anyhow, I guess that this test program just accidentally slipped into the repository.
Test programs for the developers should not be in the repository since they are not
of much use to the average user. If I had added every test I made as an
individual test program, we would by now have tons of test programs making the whole
distribution pretty bulky, which nobody would know how to use now. So I removed the
test program again. If people do not agree, I suggest making a central "main" test
program which combines all tests. I know there are also some C structure alignment
tests etc., which then could all be combined into a single, well-documented test
program.
  538   02 Dec 2008 Stefan Ritt Bug Fix Fix ss_file_size() on 32-bit Linux
> I now fixed this problem by using the stat64() system call for "#ifdef OS_LINUX".

That does not work if _LARGEFILE64_SOURCE is not defined. In that case, the compiler 
complains that stat64 is undefined. Since many Makefiles for front-ends out there do 
not have _LARGEFILE64_SOURCE defined, I changed system.c so that stat64 is only used 
if that flag is defined:

#ifdef _LARGEFILE64_SOURE
   struct stat64 stat_buf;
   int status;

   /* allocate buffer with file size */
   status = stat64(path, &stat_buf);
   if (status != 0)
      return -1;
   return (double) stat_buf.st_size;
#else
   ...
  539   02 Dec 2008 Konstantin Olchanski Bug Fix Fix ss_file_size() on 32-bit Linux
> > I now fixed this problem by using the stat64() system call for "#ifdef OS_LINUX".
> That does not work if _LARGEFILE64_SOURCE is not defined.
> #ifdef _LARGEFILE64_SOURE
>    struct stat64 stat_buf;

This does not work (observe the typo in the #ifdef). But you cannot know this because
you already deleted the test program I wrote and committed to svn exactly to detect and
prevent this kind of breakage (plus to give the Solaris, BSD and other weirdo users
some way to check that ss_file_size() works on their systems).

K.O.
  540   02 Dec 2008 Stefan Ritt Bug Fix Fix ss_file_size() on 32-bit Linux

K.O. wrote:
This does not work (observe the typo in the #ifdef).


Sorry for that, I fixed and committed it.


K.O. wrote:
But you cannot know this because you already deleted the test program I wrote and committed to svn exactly to detect and prevent this kind of breakage (plus to give the Solaris, BSD and other weirdo users some way to check that ss_file_size() works on their systems).


Well, you figured it out even without the test program in the distribution! But I'm sure no other user would have known how to use your test program to diagnose this problem. So 99% of the users would scratch their head about this undocumented program and get confused. I believe we two are responsible that the midas kernel functions work correctly and the average user should not have to bother with it. I agree that it's handy for you to have this little test program in the distribution, so you can run it everywhere you install midas. But for me it would be handy to have files with, let's say, nature's constants, particle decay life times, list of ASCII codes, and so on. But it would clutter up the distribution and the disadvantage of annoying users would be bigger than my personal benefit, so I don't do it.

If you absolutely want to keep a certain test functionality, you can add it into a "central" test program, write some help and documentation for it, educate users how to use it and how to report any errors back to you. Maybe some printout like "all tests ok" and some specific comment if a test fails would be helpful for the normal user. This test program could then also contain other tests like C structure alignment (which sometimes is a problem), some mutex tests and whatever we collected along the road. An alternative would be to add this into a "test" command inside odbedit.
  373   10 May 2007 Konstantin Olchanski Bug Fix Fix error reporting from cm_transition()
For some time now, error reporting from cm_transition() was broken.

A typical symptom: when starting a run from mhttpd, if a transition error occurred, the run did not 
start (good) but the user was presented with the message "Success" in big letters (confusing the user).

Part of the problem was caused by user-written frontends that return an empty error string. Code in 
cm_transition() now detects this and shows the numeric value of the error status returned by the frontend.

This is fixed in revision 3681.

The error string "Success" is now returned only when cm_transition() was successful, and other error 
reporting inside this function was cleaned up.
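
The fallback for empty error strings can be sketched roughly as below. This is an illustration only: the function name and the exact message text are hypothetical and are not copied from cm_transition().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the message shown to the user: prefer the text supplied by the
   client, fall back to the numeric status if that text is empty. */
static void sketch_format_error(int status, const char *client_msg,
                                char *out, size_t out_size)
{
   if (client_msg != NULL && client_msg[0] != '\0')
      snprintf(out, out_size, "%s", client_msg);
   else
      snprintf(out, out_size, "Unknown error %d", status);
}
```

The point of the design is that a failed transition always produces some non-empty text other than "Success".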

K.O.
  587   03 Jun 2009 Konstantin Olchanski Bug Fix Fix db_open_record() error return
The odb hot-link function db_open_record() did not return an error when the system limit for hotlinks is 
exceeded and no more hot links could be added (silent failure). This is now fixed.
odb.c svn rev 4500
K.O.
  291   07 Aug 2006 Konstantin Olchanski Bug Fix Fix crash in mfe.c
Some time ago, I accidentally introduced a bug in mfe.c: if there is data
congestion in the system, mfe.c can exit with the error "bm_flush_cache(ASYNC)
error 209" because it did not expect the valid return value BM_ASYNC_RETURN
(209) from bm_flush_cache(ASYNC). This error has now been fixed. K.O.
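
In outline, the fix means treating the asynchronous return code as a valid outcome. The sketch below uses local constants: 209 is the value quoted above, while the success value of 1 and all names are assumptions made for this illustration.

```c
#include <assert.h>

/* Local stand-ins for this sketch; 209 is the value quoted in the post,
   the success value is an assumption. */
enum { SKETCH_BM_SUCCESS = 1, SKETCH_BM_ASYNC_RETURN = 209 };

/* Returns 1 if the frontend loop may continue, 0 if status is a real error. */
static int sketch_flush_status_ok(int status)
{
   return status == SKETCH_BM_SUCCESS || status == SKETCH_BM_ASYNC_RETURN;
}
```

Under congestion the caller would simply retry the flush later instead of exiting.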
  299   04 Sep 2006 Konstantin Olchanski Bug Fix Fix MIDAS on MacOS 10.4.7
I committed minor fixes for building MIDAS on MacOS 10.4.7:
1) there is no linux/unistd.h
2) gcc 4.0.0 does not like "struct { ... } var;" although "struct Foo { ... } var;" is fine
3) there is no "_syscall0(...)" macro
4) there is no "gettid()", I used pthread_self() instead.
K.O.

P.S. ss_gettid() returns "int" instead of "midas_thread_t" (pthread_t, really). On MacOS 10.4.7 at least, 
pthread_t appears to be a pointer, not an int. Is that right?
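
POSIX leaves pthread_t opaque, so it may indeed be a pointer or a structure, and portable code should not cast it to an int. A minimal sketch of treating the thread id as opaque follows; the type and function names are hypothetical, not the MIDAS API.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical opaque thread id type for this sketch. */
typedef pthread_t sketch_thread_t;

/* Return the id of the calling thread without converting it to an integer. */
static sketch_thread_t sketch_gettid(void)
{
   return pthread_self();
}

/* Compare two thread ids; pthread_equal() is the portable way to do this. */
static int sketch_same_thread(sketch_thread_t a, sketch_thread_t b)
{
   return pthread_equal(a, b) != 0;
}
```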
  67   14 Jan 2004 Konstantin Olchanski First try- midas on darwin/macosx
While watching "The Wizard of Oz", the greatest movie ever made, I took a shot at building 
midas on my macosx computer. After stumbling on a few small and on a few hard problems, I 
built almost everything. However, odb does not work- some further debugging is in order.

Anyway, the easy problems are:
- a few missing header files: pty.h, sys/vfs.h, malloc.h
- a few missing features in system.c (stime(), "get tape position")
- /usr/include/string.h already has strlcpy() & co.
- dbg_malloc() has inconsistent prototypes (size_t vs unsigned int)
- for reasons unknown, PVM is #defined. This flushed a bug in mana.c

A few hard problems:
- namespace pollution by Apple- they #define ALIGN in system headers, colliding with ALIGN 
in midas.h. I was amazed that the two are almost identical, but MIDAS ALIGN aligns to 8 
bytes, while Apple does 4 bytes. ALIGN is used all over the place and I am not sure how to 
reconcile this.
- "timezone" in mhttpd.c. On linux, it's an "int", on darwin, it's a function. What gives?
- building libmidas.a requires running ranlib
- building libmidas.so requires unknown macosx specific magic.
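
One possible way around the ALIGN collision is to give the 8-byte version its own name, so that it no longer depends on what the system headers define. The macro below is a sketch under that assumption; the name is hypothetical and this is not the change contained in the attached diff.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of 8. A distinct name avoids clashing
   with any ALIGN macro that system headers may define. */
#define SKETCH_ALIGN8(x) (((x) + 7) & ~((size_t) 7))
```

For example, SKETCH_ALIGN8(9) evaluates to 16 and SKETCH_ALIGN8(8) stays at 8.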

For your enjoyment, the "cvs diff" is attached. The resulting code is known to not work.

K.O.
Attachment 1: xxx
? .ALARM.SHM
? .ELOG.SHM
? .ODB.SHM
? .SYSMSG.SHM
? darwin
? midas.log
? xx
? xxx
Index: Makefile
===================================================================
RCS file: /usr/local/cvsroot/midas/Makefile,v
retrieving revision 1.50
diff -r1.50 Makefile
0a1
> 
218a220,224
> #
> # Uncomment the next line to build the midas shared library
> #
> NEED_SHLIB=1
> 
268a275,290
> # MacOSX/Darwin is just a funny Linux
> #
> ifeq ($(OSTYPE),Darwin)
> OSTYPE = darwin
> endif
> 
> ifeq ($(OSTYPE),darwin)
> OS_DIR = darwin
> OSFLAGS = -DOS_LINUX -DOS_DARWIN -DHAVE_STRLCPY -fPIC -Wno-unused-function
> LIBS = -lpthread
> SPECIFIC_OS_PRG = $(BIN_DIR)/mlxspeaker
> NEED_RANLIB=1
> NEED_SHLIB=
> endif
> 
> #-----------------------
340a363,364
> LIB    =$(LIBNAME)
> ifdef NEED_SHLIB
342,344c366,367
< LIB =   -lmidas
< # Uncomment this for static linking of midas executables
< #LIB =   $(LIBNAME)
---
> LIB   = $(SHLIB)
> endif
351c374
< 	$(LIB_DIR)/fal.o $(PROGS)
---
>  	$(LIB_DIR)/fal.o $(PROGS)
431a455,457
> ifdef NEED_RANLIB
> 	ranlib $@
> endif
432a459
> ifdef NEED_SHLIB
435a463
> endif
Index: include/midas.h
===================================================================
RCS file: /usr/local/cvsroot/midas/include/midas.h,v
retrieving revision 1.126
diff -r1.126 midas.h
464c464
< #if defined(OS_LINUX) || defined(OS_OSF1) || defined(OS_ULTRIX) || defined(OS_FREEBSD) || defined(OS_SOLARIS) || defined(OS_IRIX)
---
> #if defined(OS_LINUX) || defined(OS_OSF1) || defined(OS_ULTRIX) || defined(OS_FREEBSD) || defined(OS_SOLARIS) || defined(OS_IRIX) || defined(OS_DARWIN)
534a535,544
> #endif
> 
> /* need system-dependant thread type */
> #if defined(OS_WINNT)
> typedef HANDLE midas_thread_t;
> #elif defined(OS_UNIX)
> #include <pthread.h>
> typedef pthread_t midas_thread_t;
> #else
> typedef INT midas_thread_t;
Index: include/midasinc.h
===================================================================
RCS file: /usr/local/cvsroot/midas/include/midasinc.h,v
retrieving revision 1.11
diff -r1.11 midasinc.h
50a51
> #include <assert.h>
157d157
< #include <sys/mount.h>
163a164,165
> #ifdef OS_DARWIN
> #else
164a167
> #endif
166a170,172
> #ifdef OS_DARWIN
> #include <util.h>
> #else
167a174
> #endif
Index: include/msystem.h
===================================================================
RCS file: /usr/local/cvsroot/midas/include/msystem.h,v
retrieving revision 1.37
diff -r1.37 msystem.h
719,720c719,720
<    INT EXPRT ss_thread_create(INT(*func) (void *), void *param);
<    INT EXPRT ss_thread_kill(INT thread_id);
---
>    midas_thread_t EXPRT ss_thread_create(INT(*func) (void *), void *param);
>    INT EXPRT ss_thread_kill(midas_thread_t thread_id);
721a722
>    INT ss_timezone(void);
Index: src/mhttpd.c
===================================================================
RCS file: /usr/local/cvsroot/midas/src/mhttpd.c,v
retrieving revision 1.262
diff -r1.262 mhttpd.c
6983c6983
<    x_act = (int) floor((double) (xmin - timezone) / label_dx) * label_dx + timezone;
---
>    x_act = (int) floor((double) (xmin - ss_timezone()) / label_dx) * label_dx + ss_timezone();
6995,6996c6995,6996
<          if ((x_act - timezone) % major_dx == 0) {
<             if ((x_act - timezone) % label_dx == 0) {
---
>          if ((x_act - ss_timezone()) % major_dx == 0) {
>             if ((x_act - ss_timezone()) % label_dx == 0) {
Index: src/system.c
===================================================================
RCS file: /usr/local/cvsroot/midas/src/system.c,v
retrieving revision 1.78
diff -r1.78 system.c
306a307,310
> #ifdef OS_UNIX
> #include <sys/mount.h>
> #endif
> 
895c899
<     INT              thread handle
---
>     INT thread handle
914c918
<    return (int) hThread;
---
>    return hThread;
1653c1657
< thread_id = ss_thread_spawn((void *) taskWatch, &tsWatch);
---
> midas_thread_t thread_id = ss_thread_create((void *) taskWatch, &tsWatch);
1662c1666
< thread_id = ss_thread_spawn((void *) taskWatch, pDevice);
---
> midas_thread_t thread_id = ss_thread_create((void *) taskWatch, pDevice);
1673c1677
< INT ss_thread_create(INT(*thread_func) (void *), void *param)
---
> midas_thread_t ss_thread_create(INT(*thread_func) (void *), void *param)
1675c1679
< #ifdef OS_WINNT
---
> #if defined(OS_WINNT)
1689,1690c1693
< #endif                          /* OS_WINNT */
< #ifdef OS_MSDOS
---
> #elif defined(OS_MSDOS)
1694,1695c1697
< #endif                          /* OS_MSDOS */
< #ifdef OS_VMS
---
> #elif defined(OS_VMS)
1699c1701
< #endif                          /* OS_VMS */
---
> #elif defined(OS_VXWORKS)
1701d1702
< #ifdef OS_VXWORKS
1719d1719
< #endif                          /* OS_VXWORKS */
1721c1721,1722
< #ifdef OS_UNIX
---
> #elif defined(OS_UNIX)
> 
1728c1729,1730
< #endif                          /* OS_UNIX */
---
> 
> #endif
1738c1740
< thread_id = ss_thread_create((void *) taskWatch, pDevice);
---
> midas_thread_t thread_id = ss_thread_create((void *) taskWatch, pDevice);
1749c1751
< INT ss_thread_kill(INT thread_id)
---
> INT ss_thread_kill(midas_thread_t thread_id)
1751c1753
< #ifdef OS_WINNT
---
> #if defined(OS_WINNT)
1755c1757
<    status = TerminateThread((HANDLE) thread_id, 0);
---
>    status = TerminateThread(thread_id, 0);
1759,1760c1761
< #endif                          /* OS_WINNT */
< #ifdef OS_MSDOS
---
> #elif defined(OS_MSDOS)
1764,1765c1765
< #endif                          /* OS_MSDOS */
< #ifdef OS_VMS
---
> #elif defined(OS_VMS)
1769c1769
< #endif                          /* OS_VMS */
---
> #elif defined(OS_VXWORKS)
1771d1770
< #ifdef OS_VXWORKS
1773d1771
< 
1775d1772
< 
1777d1773
< #endif                          /* OS_VXWORKS */
1779,1782c1775
< #ifdef OS_UNIX
<    INT status;
< 
<    status = pthread_kill((pthread_t) thread_id, SIGKILL);
---
> #elif defined(OS_UNIX)
1783a1777,1778
>    INT status;
>    status = pthread_kill(thread_id, SIGKILL);
1785c1780,1781
< #endif                          /* OS_UNIX */
---
> 
> #endif
2339c2335
< #ifdef OS_WINNT
---
> #if defined(OS_WINNT)
2356,2357c2352,2358
< #endif
< #ifdef OS_UNIX
---
> #elif defined(OS_DARWIN)
> 
>    assert(!"ss_settime() is not supported");
>    /* not reached */
>    return SS_NO_DRIVER;
> 
> #elif defined(OS_UNIX)
2361,2362c2362
< #endif
< #ifdef OS_VXWORKS
---
> #elif defined(OS_VXWORKS)
2411a2412,2438
> INT ss_timezone()
> /********************************************************************\
> 
>   Routine: ss_timezone
> 
>   Purpose: Returns what?!?
> 
>   Input:
>     none
> 
>   Output:
>     what the heck does it return?!?
> 
>   Function value:
>     INT what is it?!?
> 
> \********************************************************************/
> {
> #ifdef OS_DARWIN
>   return 0;
> #else
>   return timezone; /* on Linux, comes from "#include <time.h>". What is it ?!? */
> #endif
> }
> 
> 
> /*------------------------------------------------------------------*/
4850c4877
<    INT status;
---
> #if defined(OS_DARWIN)
4852c4879,4883
< #ifdef OS_UNIX
---
>    return 0;
> 
... 14 more lines ...