ID |
Date |
Author |
Topic |
Subject |
550
|
13 Jan 2009 |
Derek Escontrias | Forum | mlogger problem | > > Hi,
> >
> > I am running Scientific Linux with kernel 2.6.9-34.EL and I have
> > glibc-2.3.4-2.25. When I run mlogger, I receive the error:
> >
> > *** glibc detected *** free(): invalid pointer: 0x0073e93e ***
> > Aborted
> >
> > Any ideas?
>
> Not much. Try to clean up the ODB (delete the .ODB.SHM file, remove all shared
> memory via ipcrm) and run again. I run under kernel 2.6.18 and glibc 2.5 and this
> problem does not occur. If you cannot fix it, try to run mlogger inside gdb and
> make a stack trace to see who called the free().
Sorry for being vague. I cleaned up the ODB, but it doesn't seem to be the
problem. Here is a sample run of mlogger and gdb:
/**************************************************************
/**************************************************************
/**************************************************************
[root@tsunami AL_Test]# mlogger -v -d
*** glibc detected *** free(): invalid pointer: 0x007f793e ***
Aborted (core dumped)
[root@tsunami AL_Test]#
[root@tsunami AL_Test]#
[root@tsunami AL_Test]#
[root@tsunami AL_Test]#
[root@tsunami AL_Test]#
[root@tsunami AL_Test]#
[root@tsunami AL_Test]# gdb mlogger core.23213
GNU gdb Red Hat Linux (6.3.0.0-1.143.el4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library
"/lib/tls/libthread_db.so.1".
Core was generated by `mlogger -v -d'.
Program terminated with signal 6, Aborted.
Reading symbols from /home/dayabay/Software/Root/lib/libCore.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libCore.so
Reading symbols from /home/dayabay/Software/Root/lib/libCint.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libCint.so
Reading symbols from /home/dayabay/Software/Root/lib/libRIO.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libRIO.so
Reading symbols from /home/dayabay/Software/Root/lib/libNet.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libNet.so
Reading symbols from /home/dayabay/Software/Root/lib/libHist.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libHist.so
Reading symbols from /home/dayabay/Software/Root/lib/libGraf.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libGraf.so
Reading symbols from /home/dayabay/Software/Root/lib/libGraf3d.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libGraf3d.so
Reading symbols from /home/dayabay/Software/Root/lib/libGpad.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libGpad.so
Reading symbols from /home/dayabay/Software/Root/lib/libTree.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libTree.so
Reading symbols from /home/dayabay/Software/Root/lib/libRint.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libRint.so
Reading symbols from /home/dayabay/Software/Root/lib/libPostscript.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libPostscript.so
Reading symbols from /home/dayabay/Software/Root/lib/libMatrix.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libMatrix.so
Reading symbols from /home/dayabay/Software/Root/lib/libPhysics.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libPhysics.so
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libutil.so.1...done.
Loaded symbols for /lib/libutil.so.1
Reading symbols from /lib/tls/libpthread.so.0...done.
Loaded symbols for /lib/tls/libpthread.so.0
Reading symbols from /usr/lib/libstdc++.so.6...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/libpcre.so.0...done.
Loaded symbols for /lib/libpcre.so.0
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /usr/lib/libfreetype.so.6...done.
Loaded symbols for /usr/lib/libfreetype.so.6
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
#0 0x002e37a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb)
(gdb)
(gdb)
(gdb) where
#0 0x002e37a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x016d68b5 in raise () from /lib/tls/libc.so.6
#2 0x016d8329 in abort () from /lib/tls/libc.so.6
#3 0x0170a40a in __libc_message () from /lib/tls/libc.so.6
#4 0x01710a08 in _int_free () from /lib/tls/libc.so.6
#5 0x01710fda in free () from /lib/tls/libc.so.6
#6 0x08057108 in main (argc=3, argv=0xbff94f14) at src/mlogger.c:3473
(gdb)
/**************************************************************
/**************************************************************
/**************************************************************
I am running Midas 2.0.0 and here is a section of my mlogger.c:
/**************************************************************
/**************************************************************
/**************************************************************
/********************************************************************\
Name: mlogger.c
Created by: Stefan Ritt
Contents: MIDAS logger program
$Id: mlogger.c 3476 2006-12-20 09:00:26Z ritt $
\********************************************************************/
// stuff...
/*------------------------ main ------------------------------------*/
int main(int argc, char *argv[])
{
INT status, msg, i, size, run_number, ch = 0, state;
char host_name[HOST_NAME_LENGTH], exp_name[NAME_LENGTH], dir[256];
BOOL debug, daemon, save_mode;
DWORD last_time_kb = 0;
DWORD last_time_stat = 0;
HNDLE hktemp;
#ifdef HAVE_ROOT
char **rargv;
int rargc;
/* copy first argument */
rargc = 0;
rargv = (char **) malloc(sizeof(char *) * 2);
rargv[rargc] = (char *) malloc(strlen(argv[rargc]) + 1);
strcpy(rargv[rargc], argv[rargc]);
rargc++;
/* append argument "-b" for batch mode without graphics */
rargv[rargc] = (char *) malloc(3);
rargv[rargc++] = "-b";
TApplication theApp("mlogger", &rargc, rargv);
/* free argument memory */
free(rargv[0]);
free(rargv[1]); // Line: 3473
free(rargv);
#endif
// etc...
/**************************************************************
/**************************************************************
/**************************************************************
I'll play with it some, but I wanted to post this info first. |
549
|
13 Jan 2009 |
Stefan Ritt | Forum | mlogger problem | > Hi,
>
> I am running Scientific Linux with kernel 2.6.9-34.EL and I have
> glibc-2.3.4-2.25. When I run mlogger, I receive the error:
>
> *** glibc detected *** free(): invalid pointer: 0x0073e93e ***
> Aborted
>
> Any ideas?
Not much. Try to clean up the ODB (delete the .ODB.SHM file, remove all shared
memory via ipcrm) and run again. I run under kernel 2.6.18 and glibc 2.5 and this
problem does not occur. If you cannot fix it, try to run mlogger inside gdb and
make a stack trace to see who called the free(). |
548
|
09 Jan 2009 |
Derek Escontrias | Forum | mlogger problem | Hi,
I am running Scientific Linux with kernel 2.6.9-34.EL and I have
glibc-2.3.4-2.25. When I run mlogger, I receive the error:
*** glibc detected *** free(): invalid pointer: 0x0073e93e ***
Aborted
Any ideas? |
547
|
01 Jan 2009 |
Konstantin Olchanski | Info | Custom page which executes custom function | > How can I add a button at the top of the "Status" webpage which will show a
> page similar to the "CNAF" one after I click on it? and how can I make a
> custom page similar to "CNAF" which allow me to call some custom funtions? I
> want to make a page which is particularly for doing calibration.
I was going to say that you can do this by using the MIDAS "hot-link" function.
In your equipment program, you create a string /eq/xxx/Settings/Command, and hot-link
it to the function you want to be called. (See midas function db_open_record() for details
and examples). (To test it, you put a call to printf("Hello world!\n") into your handler function,
then change the value of "command" using odbedit or the mhttpd odb editor
and observe that your function gets called and that it receives the correct value of "command").
Then on your custom web page you create 2 buttons "aaa" and "bbb" attached to javascript
ODBset("/eq/xxx/Settings/Command","aaa") and "bbb" respectively. When you push the button,
the specified string is written into ODB, and your hot-link handler function is called with the contents
of "command", which you can then look at to find out which web button was pushed.
But after looking at the hot-link data paths (see https://ladd00.triumf.ca/elog/Midas/546), I see 2
problems that make the above scheme unreliable and maybe unusable in some applications:
1) the data path contains one UDP communication and it is well known that UDP datagrams can be (and
are) lost with low or high probability, depending on not-well-understood external factors.
The effect is that the hot-link fails to "fire": odb contents is changed but your function is not called.
2) there is a timing problem with multiple odb writes: the odb lock is dropped before the "hot-link" gets
to see the new contents of odb: db_data_set()->lock odb->change data->send notification->unlock
odb->xxx->notification received by client->read the data->call user function. If something else is
written into odb during "xxx" above, the client may never see the data written by the first odb write. For
local clients, the delay between "send notification" and "notification is received by client" is not bounded in
time (can be arbitrary long, depending on the system load, etc). For remote clients, there is an additional
delay as the udp datagram is received by the local mserver and is forwarded to the remote client through
a tcp rpc connection (another source of unbounded delay).
The effect is that if buttons "aaa" and "bbb" are pushed quickly one right after the other, while your
function will be called 2 times (if neither udp packet is dropped), you may never see the value of "aaa"
as is it will be overwritten by "bbb" by the time you receive the first notification.
Probability of malfunction increases with code written like this: { ODBset("command", "open door");
ODBset("command", "walk through doorway"); }. You may see the "open door" command sometimes
mysteriously disappear...
The net effect is that sometimes you will push the button but nothing will happen. This may be okey,
depending on your application and depending on how often it happens in practice on your specific system
If you are lucky, you may never see either of the 2 problems listed above ad hot-links will work for you
perfectly. At TRIUMF, in the past, we have seen hot-links misbehave in the TWIST experiment, and now I
think I understand why (because of the 2 problems described above).
K.O. |
546
|
01 Jan 2009 |
Konstantin Olchanski | Info | odb "hot link" magic explored | Here are my notes on the MIDAS ODB "hot link" function. Perhaps others can find them useful.
Using db_open_record(key,function), the user can tell MIDAS to call the specified user function when
the specified ODB key is modified by any other MIDAS program. This function works both locally
(shared memory odb access) and remotely (odb access through mserver tcp rpc). For example, the
MIDAS "history" mechanism is implemented in the mlogger by "hot-linking" ODB
"/equipment/xxx/Variables".
First, the relevant data structures defined in midas.h and msystem.h (ODB database headers, etc)
(in midas.h)
#define NAME_LENGTH 32 /**< length of names, mult.of 8! */
#define MAX_CLIENTS 64 /**< client processes per buf/db */
#define MAX_OPEN_RECORDS 256 /**< number of open DB records */
(in msystem.h)
DATABASE buf <--- local, private to each client)
DATABASE_HEADER* database_header <--- odb in shared memory
char name[NAME_LENGTH]
DATABASE_CLIENT client[MAX_CLIENTS]
char name[NAME_LENGTH]
OPEN_RECORD open_record[MAX_OPEN_RECORDS]
handle
access_mode
flags
(the above means that each midas client has access to the list of all open records through
buf->database_header.client[i].open_record[j])
Second, the data path through db_set_data & co: (other odb "write" functions work the same way)
db_set_data(key)
lock db
update odb <--- memcpy(), really
db_notify_clients(key)
unlock db
return
db_notify_clients(key)
loop: <--- data for this key changed and so data for all keys containing it
also changed, and we need to notify anybody who has an open record
on the parents of this key. need to loop over parents of this key (follow "..")
if (key->notify_count)
foreach client
foreach open_record
if (open_record.handle == key)
ss_resume(client->port, "O hDB hKey")
key = key.parent
goto loop;
ss_resume(port, message)
idx = ss_suspend_get_index() <--- magic here
send udp message ("O hDB hKey") to localhost:port <-- notifications sent only to local host!
note 1: I do not completely understand the ss_suspend_xxx() stuff. The best I can tell
is it creates a number of udp sockets bound to the local host and at least one udp rpc
receive socket ultimately connected to the cm_dispatch_rpc() function.
note 2: More magic here: database_header->client[i].port appears to be the udp rpc server
port of the mserver, while ODB /Clients/xxx/Port is the tcp rpc server port
of the client itself, on the remote host
note 3: the following is for remote odb clients connected through the mserver. For local
clients, cm_dispatch_rpc() calls the local db_update_record() as shown at the very end.
note 4: this uses udp rpc. If the udp datagram is lost inside the os kernel (it looks like these udp/rpc
datagrams never go out to the network), "hot-link" silently fails: code below is not executed. Some
OSes (namely, Linux) are known to lose udp datagrams with high probability under certain
not very well understood conditions.
local mserver receives the udp datagram
...
cm_dispatch_ipc()
if (message=="O hDB hKey")
decode message (hDB, hKey)
db_update_record(hDB, hKey)
send tcp rpc with args(MSG_ODB, hDB, hKey)
(note- unlike udp rpc, tcp rpc are never "lost")
remote client receives tcp rpc:
rpc_client_dispatch()
recv_tcp(net_buffer)
if (net_buffer.routine_id == MSG_ODB)
db_update_record(hDB, hKey)
db_update_record(hDB, hKey)
if remote delivery, see cm_dispatch_ipc() above
<--- local delivery
foreach (_recordlist)
if (recordlist.handle == hKey)
if (!recordlist.access_mode&MODE_WRITE)
db_get_record(hDB,hKey,recordlist.data,recordlist.size)
recordlist.dispatcher(hDB,hKey,recordlist.info); <-- user-supplied handler
Note: the dispatcher() above is the function supplied by the user in db_open_record().
K.O. |
545
|
22 Dec 2008 |
Stefan Ritt | Bug Report | Overflow on "cm_msg" command generates segfault | > The following error has been reported to me by T2K colleagues:
>
> When using "odbedit -c "msg my_message", the following behavior
> has been observed depending on the length "n" of the message.
>
> 1) n < 100 All is well
> 2) 100 <= n < 245 Log not written but exit code = 0
> 3) 245 <= n < 280 Error: "Experiment not defined" and exit code = 1
> 4) 280 <= n Error: "Cannot connect to remote host" and exit code = 1
>
> Also, when logging from compiled C code - when messages reach some magic length
> the MIDAS client sending them segfaults.
>
> Please fix
Uhhh, who wants this long messages? You should consider to split this into several
smaller messages. Anyhow, having the above behavior is not good, so I fixed it in
SVN revision 4422. I increased the maximum length to 1000 characters. Above that,
the message gets truncated. If you need even more, we can make it a #define.
The second problem you describe (logging from compiled C code) I could not
reproduce, so maybe it was related to the first one. Please try again and report
if it persists. |
544
|
21 Dec 2008 |
Konstantin Olchanski | Bug Fix | mhttpd minor bug fixes and improvements | Committed minor bug fixes and improvements to mhttpd:
1) when generating history plots, use type "double" instead of "float" because "float" does not have enough
significant digits to plot values of large integer numbers. For example, serial numbers of T2K FGD FEB
cards are large integers, i.e. 99000001, 99000002, etc, but when we plot them with offset "-99000000",
the plots show "0" for all cards because when these numbers are converted to "float", they are truncated to
about 5 digits and the least significant digit (the only one of interest, the "1", "2", etc) is lost. Switching to
type "double" makes the plots come out with correct values.
2) fixed breakage of "/History/URL" ODB setting used to offload generation of history plots to a separate
mhttpd process, greatly improving responsiveness of the main mhttpd.
3) fixed memory leak in processing the new javascript requests (jset, jget & co).
svn revisions 4415-4417
K.O. |
543
|
17 Dec 2008 |
Renee Poutissou | Bug Report | Overflow on "cm_msg" command generates segfault | The following error has been reported to me by T2K colleagues:
When using "odbedit -c "msg my_message", the following behavior
has been observed depending on the length "n" of the message.
1) n < 100 All is well
2) 100 <= n < 245 Log not written but exit code = 0
3) 245 <= n < 280 Error: "Experiment not defined" and exit code = 1
4) 280 <= n Error: "Cannot connect to remote host" and exit code = 1
Also, when logging from compiled C code - when messages reach some magic length
the MIDAS client sending them segfaults.
Please fix |
542
|
14 Dec 2008 |
Stefan Ritt | Info | Custom page which executes custom function | > How can I add a button at the top of the "Status" webpage which will show a
> page similar to the "CNAF" one after I click on it? and how can I make a
> custom page similar to "CNAF" which allow me to call some custom funtions? I
> want to make a page which is particularly for doing calibration.
The CNAF page calls directly functions through the RPC layer of midas, which is
not possible from custom pages. All you can do is to execute a scrip on the
server side, which then causes some action. For details please consult the
documentation. |
541
|
12 Dec 2008 |
Jimmy Ngai | Info | Custom page which executes custom function | Dear All,
How can I add a button at the top of the "Status" webpage which will show a
page similar to the "CNAF" one after I click on it? and how can I make a
custom page similar to "CNAF" which allow me to call some custom funtions? I
want to make a page which is particularly for doing calibration.
Thank you for your attention!
Best Regards,
Jimmy Ngai |
540
|
02 Dec 2008 |
Stefan Ritt | Bug Fix | Fix ss_file_size() on 32-bit Linux |
K.O. wrote: | This does not work (observe the typoe in the #ifdef). |
Sorry for that, I fixed and committed it.
K.O. wrote: | But you cannot know this because you already deleted the test program I wrote and committed to svn exactly to detect and prevent this kind of breakage (+ plus to give the Solaris, BSD and other wierdo users some way to check that ss_file_size() works on their systems).. |
Well, you figured it out even without the test program in the distribution! But I'm sure no other user would have known how to use your test program to diagnose this problem. So 99% of the users would scratch their head about this undocumented program and get confused. I believe we two are responsible that the midas kernel functions work correctly and the average user should not have to bother with it. I agree that it's handy for you to have this little test program in the distribution, so you can run it everywhere you install midas. But for me it would be handy to have files with, let's say, nature's constants, particle decay life times, list of ASCII codes, and so on. But it would clutter up the distribution and the disadvantage of annoying users would be bigger than my personal benefit, so I don't do it.
If you absolutely want to keep a certain test functionality, you can add it into a "central" test program, write some help and documentation for it, educate users how to use it and how to report any errors back to you. Maybe some printout like "all tests ok" and some specific comment if a test fails would be helpful for the normal user. This test program could then also contain other tests like C structure alignment (which sometimes is a problem), some mutex tests and whatever we collected along the road. An alternative would be to add this into a "test" command inside odbedit. |
539
|
02 Dec 2008 |
Konstantin Olchanski | Bug Fix | Fix ss_file_size() on 32-bit Linux | > > I now fixed this problem by using the stat64() system call for "#ifdef OS_LINUX".
> That does not work if _LARGEFILE64_SOURCE is not defined.
> #ifdef _LARGEFILE64_SOURE
> struct stat64 stat_buf;
This does not work (observe the typoe in the #ifdef). But you cannot know this because
you already deleted the test program I wrote and committed to svn exactly to detect and
prevent this kind of breakage (+ plus to give the Solaris, BSD and other wierdo users
some way to check that ss_file_size() works on their systems).
K.O. |
538
|
02 Dec 2008 |
Stefan Ritt | Bug Fix | Fix ss_file_size() on 32-bit Linux | > I now fixed this problem by using the stat64() system call for "#ifdef OS_LINUX".
That does not work if _LARGEFILE64_SOURCE is not defined. In that case, the compiler
complains that stat64 is undefined. Since many Makefiles for front-ends out there do
not have _LARGEFILE64_SOURCE defined, I changed system.c so that stat64 is only used
if that flag is defined:
#ifdef _LARGEFILE64_SOURE
struct stat64 stat_buf;
int status;
/* allocate buffer with file size */
status = stat64(path, &stat_buf);
if (status != 0)
return -1;
return (double) stat_buf.st_size;
#else
... |
537
|
01 Dec 2008 |
Randolf Pohl | Bug Report | gcc warning in melog.c for midas 4401 | Hi all,
I have just compiled midas 4401 using SuSE 11.0.
gcc is some odd SuSE version:
gcc version 4.3.1 20080507 (prerelease) [gcc-4_3-branch revision 135036] (SUSE
Linux)
Anyway, gcc stumbled over melog.c. I don't see the reason myself, but my
experience is that gcc is usually right when complaining about "array subscript
is above array bounds". So, just in case somebody knowlegeable wants to have a
look at this....
Cheers,
Randolf
The gcc output:
[...]
cc -g -O3 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib
-DINCLUDE_FTPLIB -D_LARGEFILE64_SOURCE -DHAVE_MYSQL -I/usr/include/mysql
-DHAVE_ROOT -pthread -m64 -I/usr/local/root/root_v5.20.00/include/root
-DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o linux/bin/melog
utils/melog.c linux/lib/libmidas.a -lutil -lpthread -lz
utils/melog.c: In function 'submit_elog':
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
cc -g -O3 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib
-DINCLUDE_FTPLIB -D_LARGEFILE64_SOURCE -DHAVE_MYSQL -I/usr/include/mysql
-DHAVE_ROOT -pthread -m64 -I/usr/local/root/root_v5.20.00/include/root
-DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o linux/bin/mlxspeaker
utils/mlxspeaker.c linux/lib/libmidas.a -lutil -lpthread -lz |
536
|
01 Dec 2008 |
Stefan Ritt | Bug Fix | Fix ss_file_size() on 32-bit Linux | > I also changed ss_file_size(), ss_disk_size() and ss_disk_free() to return -1 if
> the system call returns an error. I also added a test program
> utils/test_ss_file_size.c.
The test program gave under 64-bit SL5:
For [(null)], file size: -1, disk size: -0.001, disk free -0.001
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `/bin/ls -ld (null)'
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `/bin/df -k (null)'
Anyhow I guess that this test program just accidentally slipped into the repository.
Test programs for the developers should not be in the repository since they are of
not much use for the average user. If I would have added every test I made as an
individual test program, we would by now have tons of test programs making the whole
distribution pretty bulky, which nobody would know how to use now. So I removed the
test program again. If people do not agree, I suggest to make a central "main" test
program which combines all tests. I know there are also some C structure alignment
tests etc., which then could all be combined into a single, well documented, test
program. |
535
|
27 Nov 2008 |
Konstantin Olchanski | Info | Fixed mlogger crash, was Per-variable history implementation in the mlogger | > revision 4142+4143 are minor fixes, refactoring (switch the code to use helper
> functions) and implementation of history for structured banks
The implementation of "history for structured banks" had a bug - tags inside
structured banks were counted incorrectly, leading to memory overwrites and mlogger
crash in open_history().
This is problem is now fixed (plus added assert() checks to crash-out if overwrite of
tags[] array is detected).
svn revision 4398.
K.O. |
534
|
27 Nov 2008 |
Konstantin Olchanski | Info | lazylogger updated | lazylogger was updated to improve handling of the list of runs still on disk
(odb /Lazy/xxx/List).
Previously, each and every run was listed in the List arrays. With modern
Terabyte-sized data disks, many many days worth of runs tend to remain on disk
and these List arrays were getting too big, inflating the size of ODB dumps
written by mlogger into the output data file and slowing down starting and
stopping of runs considerably.
Now, the runs are listed as ranges of "first run" - "last run", (see example below).
This significantly reduces the size of the "List" arrays and makes lazylogger
usable for the ALPHA experiment at CERN and for T2K/ND280 prototype DAQ at
TRIUMF (writing to Castor and Dcache respectively, using the newly added
"Script" method).
The new List format is fully compatible with the old format and you can update
and run the new lazylogger without changing anything in ODB. New runs will be
added to the List arrays in the new format and data in the old format will
eventually go away as old runs are removed from disk.
svn revision 4394.
K.O.
Example: this reads like this:
range from 7100 to 7154
range from 7157 to 7161 (7155-7156 are missing)
range from 7163 to 7168 (7162 is missing)
runs 7170, 7173, 7176
range from 7179 to 7182
and so forth.
ODB /Lazy/Dcache/List
007100
[0] 7100 (0x1BBC)
[1] -7154 (0xFFFFE40E)
[2] 7157 (0x1BF5)
[3] -7161 (0xFFFFE407)
[4] 7163 (0x1BFB)
[5] -7168 (0xFFFFE400)
[6] 7170 (0x1C02)
[7] 7173 (0x1C05)
[8] 7176 (0x1C08)
[9] 7179 (0x1C0B)
[10] -7182 (0xFFFFE3F2)
[11] 7184 (0x1C10)
[12] 7188 (0x1C14)
[13] -7199 (0xFFFFE3E1)
007200
[0] 7200 (0x1C20)
[1] -7225 (0xFFFFE3C7) |
533
|
27 Nov 2008 |
Konstantin Olchanski | Bug Report | lazylogger complains about zero-size files | I now have a better understanding of this: lazylogger uses ss_file_size() to find
out if a file exists or not. This function used to return 0 (probably) for
non-existant files (there was no check for error status from stat() system call,
so the return value for non-existant files was never well defined).
With ss_file_size() returning 0 for nonexistant files, 0-size files clearly cause
problems to lazylogger.
Now, since svn revision 4397, ss_file_size() returns -1 for non-existant files,
but lazylogger still needs to be tought about this.
The problem "lazylogger does not like 0-size files" remains for now.
K.O.
> With latest midas, I see this:
>
> Thu Oct 14 19:31:17 2004 [Lazy_Tape] [lazylogger.c:1717:Lazy] lazy_file_exists
> file run17567.ybs doesn't exists
> Thu Oct 14 19:31:27 2004 [Lazy_Tape] [lazylogger.c:1717:Lazy] lazy_file_exists
> file run17567.ybs doesn't exists
>
> The file run17567.ybs has size zero:
>
> -rw-r--r-- 1 twistonl users 950272 Oct 13 19:29
> /twist/data_onl/current/run17565.ybs
> -rw-r--r-- 1 twistonl users 950272 Oct 13 19:45
> /twist/data_onl/current/run17566.ybs
> -rw-r--r-- 1 twistonl users 0 Oct 13 20:00
> /twist/data_onl/current/run17567.ybs
> -rw-r--r-- 1 twistonl users 983040 Oct 13 20:03
> /twist/data_onl/current/run17568.ybs
> -rw-r--r-- 1 twistonl users 950272 Oct 13 20:26
> /twist/data_onl/current/run17569.ybs
>
> I am not sure how to fix this lazylogger logic. Please help.
>
> K.O. |
532
|
27 Nov 2008 |
Konstantin Olchanski | Bug Fix | Fix ss_file_size() on 32-bit Linux | It turns out that on 32-bit Linux, ss_file_size() returns the wrong answer for
files bigger than 2 GB (4GB?). The Linux stat() system call returns an error
(which is ignored) and bogus file size data (returned to the caller).
On 64-bit Linux (compiled with -m64), stat() appears to return correct data.
Related functions ss_disk_size() and ss_disk_free() return correct answers on
both 32-bit and 64-bit Linux (biggest disk I tried was 5.5 TB).
I now fixed this problem by using the stat64() system call for "#ifdef OS_LINUX".
I also changed ss_file_size(), ss_disk_size() and ss_disk_free() to return -1 if
the system call returns an error. I also added a test program
utils/test_ss_file_size.c.
svn revision 4397.
K.O. |
531
|
26 Nov 2008 |
Stefan Ritt | Info | Send email alert in alarm system | > We have a temperature/humidity sensor in MIDAS now and will add a liquid level
> sensor to MIDAS soon. We want the operators to get alerted ASAP when the
> laboratory environment or the liquid level reached some critical levels. Can
> MIDAS send email alerts or SMS alerts to cell phones when the alarms are
> triggered? If yes, how can I config it?
Sure that's possible, that's why MIDAS contains an alarm system. To use it, define
an ODB alarm on your liquid level, like
/Alarms/Alarms/Liquid Level
Active y
Triggered 0 (0x0)
Type 3 (0x3)
Check interval 60 (0x3C)
Checked last 1227690148 (0x492D10A4)
Time triggered first (empty)
Time triggered last (empty)
Condition /Equipment/Environment/Variables/Input[0] < 10
Alarm Class Level Alarm
Alarm Message Liquid Level is only %s
The Condition if course might be different in your case, just select the correct
variable from your equipment. In this case, the alarm triggers an alarm of class
"Level Alarm". Now you define this alarm class:
/Alarms/Classes/Level Alarm
Write system message y
Write Elog message n
System message interval 600 (0x258)
System message last 0 (0x0)
Execute command /home/midas/level_alarm '%s'
Execute interval 1800 (0x708)
Execute last 0 (0x0)
Stop run n
Display BGColor red
Display FGColor black
The key here is to call a script "level_alarm", which can send emails. Use
something like:
#/bin/csh
echo $1 | mail -s \"Level Alarm\" your.name@domain.edu
odbedit -c 'msg 2 level_alarm \"Alarm was sent to your.name@domain.edu\"'
The second command just generates a midas system message for confirmation. Most
cell phones (depends on the provider) have an email address. If you send an email
there, it gets translated into a SMS message.
The script file above can of course be more complicated. We use a perl script
which parses an address list, so everyone can register by adding his/her email
address to that list. The script collects also some other slow control variables
(like pressure, temperature) and combines this into the SMS message.
For very sensitive systems, having an alarm via SMS is not everything, since the
alarm system could be down (computer crash or whatever). In this case we use
'negative alarms' or however you might call it. The system sends every 30 minutes
an SMS with the current levels etc. If the SMS is missing for some time, it might
be an indication that something in the midas system is wrong and one can go there
and investigate. |
|