Midas DAQ System, Page 121 of 130
ID Date Author Topic Subject
  189   22 Dec 2004   Stefan Ritt   Suggestion   What to do with invalid data in the history system?
While we were dealing with the NaNs in the history system over the past week, a
question came up at PSI about how to handle invalid history data.

Assume you have several devices going into one history equipment, and one device
has a problem, such that it cannot be read. In the past, the device driver
system returned zero, which was written to the history file. While this is ok in
some cases, it might not be in others, where zero is maybe a valid measurement.
Furthermore, it might confuse some regulation loops.

An alternative is to keep the last correctly measured value for as long as the
device has its problem. However, the values then written to the history system
look valid although they are not. So what about explicitly writing NaNs to the
history system? The display routine could omit the NaNs, leaving blank regions
where no valid measurement is available, or it could explicitly mark the region
as invalid. Konstantin, do you know how to write NaN explicitly to a float
variable? And what do the others think about these possibilities?

- Stefan
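
A minimal sketch of the NaN option, assuming a C99 compiler: the NAN macro from <math.h> can be assigned to a float, and isnan() detects it on the reading side. The helper names history_value() and count_drawable() are hypothetical and not part of MIDAS.

```c
#include <math.h>    /* NAN and isnan() are provided by C99 */
#include <stddef.h>

/* Hypothetical helper: the value a driver would hand to the history
   system. A failed read is stored as NaN rather than as zero. */
float history_value(int read_ok, float reading)
{
    return read_ok ? reading : NAN;
}

/* Hypothetical helper: count the samples a display routine would draw,
   skipping NaN entries so that they leave a gap in the plot. */
size_t count_drawable(const float *samples, size_t n)
{
    size_t drawable = 0;
    for (size_t i = 0; i < n; i++) {
        if (!isnan(samples[i]))
            drawable++;
    }
    return drawable;
}
```

With this scheme a genuine measurement of zero stays distinguishable from a failed read, which is the case the zero-on-error convention cannot express.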
  188   22 Dec 2004   Stefan Ritt   Forum   cm_msg
> Could someone please explain to me how cm_msg, cm_msg1, etc. all work.  The
> documentation is very terse.  
> 
> I want to set up a fairly significant set of debugging and error messages for a
> new frontend.  I need to get these messages to a logging file.  I also would
> like to get the error messages to the user through whatever interface Midas
> normally uses for error reporting.  

For errors, use

  cm_msg(MERROR, "routine_name", "Your error message, code=%d", i);

This produces an error message which is logged to midas.log and distributed to all
clients which have called cm_msg_register(). For example, odbedit will just print
that message. The arguments after the routine name follow the printf() syntax,
so you can use format specifiers and variable arguments as you would with printf().
The first argument is the message type; MDEBUG, for example, is only distributed
but not logged.

For a more detailed list of message types, please refer to

http://midas.triumf.ca/doc/html/AppendixE.html#midas_macro
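
To illustrate what "same syntax as printf()" means for a message routine, here is a minimal sketch of a printf-style function that forwards its variable arguments. format_msg() is a hypothetical stand-in written for this example; it is not MIDAS code and says nothing about how cm_msg() is implemented.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a printf-style message routine (not MIDAS code).
   Writes "[routine] " followed by the formatted message into `out`.
   Returns the message length, or -1 if it does not fit into `size` bytes. */
int format_msg(char *out, size_t size, const char *routine, const char *format, ...)
{
    va_list args;
    int used = snprintf(out, size, "[%s] ", routine);
    if (used < 0 || (size_t)used >= size)
        return -1;

    /* Forward the caller's variable arguments to the printf machinery. */
    va_start(args, format);
    int n = vsnprintf(out + used, size - (size_t)used, format, args);
    va_end(args);
    if (n < 0 || (size_t)n >= size - (size_t)used)
        return -1;

    return used + n;
}
```

A call such as format_msg(buf, sizeof buf, "read_adc", "Your error message, code=%d", 7) then takes the same format string and arguments as the cm_msg() call shown above.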
  187   16 Dec 2004   Jan Wouters   Forum   cm_msg
Could someone please explain to me how cm_msg, cm_msg1, etc. all work.  The
documentation is very terse.  

I want to set up a fairly significant set of debugging and error messages for a
new frontend.  I need to get these messages to a logging file.  I also would
like to get the error messages to the user through whatever interface Midas
normally uses for error reporting.  

Jan
  186   16 Dec 2004   Konstantin Olchanski   Info   "cd /" in ss_daemon_init(), was: Commit local TWIST modifications
> > - system.c: do not chdir("/") in ss_daemon_init()- it prevents us from ever
> >   getting core dumps from midas daemons.
> 
> The chdir("/") is from one of the unix text books. They say you HAVE to do it. If you start a
> daemon on an NFS file system, you cannot unmount that file system as long as the daemon is
> running.

Right, I remember this NFS problem from a while back.

This problem does not exist in the current crop of Linux systems (since Red Hat 7.3 at least) - they
either kill off all user programs or use "umount -f" and "umount -l".

"umount -l" works in any case to unmount a "busy" filesystem.

For systems where the NFS problem does still exist, one should do this: "mlogger -D" becomes "(cd /; mlogger -D)".

So I suspect that the "cd /" advice from the unix programming book is no longer as necessary
as it used to be. (Perhaps better advice would have been to "cd /tmp", so we could still get
core dumps from non-root daemons).

K.O.
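
A minimal sketch of the "cd /tmp" idea in C, assuming the daemon picks its working directory through one small helper. daemon_chdir() is a hypothetical name, not the ss_daemon_init() code: it tries a directory where core dumps can be written and falls back to the textbook "/" if that fails.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not the MIDAS ss_daemon_init() code).
   Tries `core_dir`, e.g. "/tmp", so that a non-root daemon can still
   write a core dump; falls back to "/" if that directory is unusable.
   Returns 0 on success, -1 if even "/" cannot be entered. */
int daemon_chdir(const char *core_dir)
{
    if (core_dir != NULL && chdir(core_dir) == 0)
        return 0;
    return chdir("/");
}
```

Either choice moves the daemon off the file system it was started from, so an NFS mount is not kept busy by the daemon's working directory.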
  185   15 Dec 2004   Pierre-Andre Amaudruz   Forum   Where's the definition of "H1_BOOK()"
> When I compile the experiment example of 1.9.5, the following problem occurs:
> 
> adccalib.c: In function `INT adc_calib_init()':
> adccalib.c:114: `H1_BOOK' undeclared (first use this function)
> adccalib.c:114: (Each undeclared identifier is reported only once for each
>    function it appears in.)
> make: *** [adccalib.o] Error 1
> 
> my ROOT is 4.01 and Zlib is 1.2.2

We're in the process of fixing this problem properly. In the meantime, please
add the definition -DUSE_ROOT to the analyzer makefile at the line:
...
ROOTCFLAGS += -DHAVE_ROOT -DUSE_ROOT
  184   15 Dec 2004      Forum   Where's the definition of "H1_BOOK()"
When I compile the experiment example of 1.9.5, the following problem occurs:

adccalib.c: In function `INT adc_calib_init()':
adccalib.c:114: `H1_BOOK' undeclared (first use this function)
adccalib.c:114: (Each undeclared identifier is reported only once for each
   function it appears in.)
make: *** [adccalib.o] Error 1

my ROOT is 4.01 and Zlib is 1.2.2
  183   15 Dec 2004   Stefan Ritt   Forum   Frontend index
> What is the api call to determine the index of the frontend when specifying the
> -i parameter during execution of the frontend? 

INT get_frontend_index();

- Stefan
  182   15 Dec 2004   Stefan Ritt   Info   Commit local TWIST modifications
> - system.c: do not chdir("/") in ss_daemon_init()- it prevents us from ever
>   getting core dumps from midas daemons. The old behaviour is trivially
>   restored by "cd /" before starting the daemon; or by "limit coredumpsize 0".

The chdir("/") is from one of the unix text books. They say you HAVE to do it. If you start a
daemon on an NFS file system, you cannot unmount that file system as long as the daemon is
running. I'm sure the same code is inside most other daemons (apache, ...). So if we go away
from that standard, we have to be aware of the consequences.
  181   14 Dec 2004   Jan Wouters   Forum   Frontend index
What is the api call to determine the index of the frontend when specifying the
-i parameter during execution of the frontend? 
  180   14 Dec 2004   Konstantin Olchanski   Info   mhttpd: Commit local TWIST modifications
> > I am committing MIDAS modifications accumulated...

mhttpd changes:

- Renee's improvements on http transaction logging
- Implement "minimum" and "maximum" clamping for history graphs. Unfortunately
  there is no GUI code for changing the "minimum" and "maximum" settings,
  other than directly frobbing the odb.
- When making history graphs, detect NaNs in the history data.
(- status page code for the TWIST event builder (precursor of the standard
   event builder) stays uncommitted).

K.O.
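
A minimal sketch of clamping a history sample for plotting while leaving NaNs recognisable, assuming the graph code handles one value at a time. clamp_sample() is a hypothetical helper, not the mhttpd implementation.

```c
#include <math.h>   /* isnan(), NAN */

/* Hypothetical helper (not the mhttpd code): limit one history sample to
   the range [min, max] before it is drawn. NaN is returned unchanged so
   that the caller can still detect it and leave a gap in the graph. */
double clamp_sample(double value, double min, double max)
{
    if (isnan(value))
        return value;
    if (value < min)
        return min;
    if (value > max)
        return max;
    return value;
}
```

The explicit isnan() test comes first because every ordered comparison with NaN is false, so a NaN would otherwise fall through both range checks unnoticed.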
  179   14 Dec 2004   Konstantin Olchanski   Info   Commit local TWIST modifications
> I am committing MIDAS modifications accumulated during the last few months of running TWIST:

More:
- mfe.c: in error messages "cannot find statistics record", also print
  the name of the record we are looking for.
- mlogger.c: in warning message "Write operation took N ms", report the name
  of the offending data stream.
- system.c: do not chdir("/") in ss_daemon_init()- it prevents us from ever
  getting core dumps from midas daemons. The old behaviour is trivially
  restored by "cd /" before starting the daemon; or by "limit coredumpsize 0".
- odb.c: db_validate_db() detect and break infinite looping on free list corruption.

K.O.
  178   14 Dec 2004   Konstantin Olchanski   Info   Commit local TWIST modifications
I am committing MIDAS modifications accumulated during the last few months of running TWIST:
1) system.c::ss_shm_open() fail if trying to map a file that is smaller than we expect.
2) midas.c::bm_lock_buffer(), el_submit(), el_delete_message(): do not wait for mutexes forever, use a
5-minute timeout. If we can't get the lock, cm_msg()/abort().
The above helps dealing with complete midas freezes. I also have code to keep track of "who locked
the mutex *and* is still holding it?!?" but it is way too ugly to commit. I wish we had a "lockedByPid"
entry for all lockable objects.
K.O.
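
A minimal sketch of a lock with a timeout and a "lockedByPid" field, written with POSIX threads rather than the MIDAS mutex layer. The tracked_lock type and its functions are hypothetical, and the sketch returns an error code where the code described above would call cm_msg() and abort().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical lock record: a mutex plus the pid of its current holder. */
typedef struct {
    pthread_mutex_t mutex;
    pid_t locked_by_pid;   /* 0 when nobody holds the lock */
} tracked_lock;

void tracked_lock_init(tracked_lock *lock)
{
    pthread_mutex_init(&lock->mutex, NULL);
    lock->locked_by_pid = 0;
}

/* Try to take the lock, giving up after timeout_ms milliseconds.
   Returns 0 on success or an errno value such as ETIMEDOUT. */
int tracked_lock_acquire(tracked_lock *lock, long timeout_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);   /* timedlock takes an absolute time */
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    int status = pthread_mutex_timedlock(&lock->mutex, &deadline);
    if (status == 0)
        lock->locked_by_pid = getpid();         /* remember who holds it */
    return status;
}

void tracked_lock_release(tracked_lock *lock)
{
    lock->locked_by_pid = 0;
    pthread_mutex_unlock(&lock->mutex);
}
```

On a timeout the caller can report locked_by_pid before aborting, as a diagnostic hint only, since the field is read without holding the lock. Sharing such a lock between processes would additionally need a process-shared mutex placed in shared memory, which this sketch does not set up.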
 
  177   14 Dec 2004   Konstantin Olchanski   Forum   use of assert in mhttpd
>    We've had mhttpd aborting regularly since upgrading from midas-1.9.3.  This
> happens during elog queries, and is due to an elog file that was incorrectly
> modified by hand.

(Sorry for the delayed reply; for reasons unknown, I did not get an email notice when this was posted.)

Yes, I agree, error handling in midas elog code is insufficient (note missing error checks for
read() and lseek() system calls). Anything but "perfect" elog files would cause funny errors and
malfunctions.

>  The modification to the file occurred 6 months ago.
>    el_retrieve(midas.c:15683) now has several assert statements, one of which
> aborts the program on reading the bad entry.

I added those to fix problems with "broken last NN days" and with infinite looping in the elog code
that we observed in TWIST.

You are welcome to replace the assert() statements with proper error handling. I used to have some code
that could report the filename of the bad elog file. Can we also report the exact file location for broken
files?

Please send me the diff, I will commit it to midas cvs.

>    Why is assert used, instead of an error return from the function (if
> necessary), and maybe an error message in the log file?  Assert statements are
> often removed, using NDEBUG, for normal use.

I use assert() in several ways:

0) I want a core dump each time X happens. (This is the only reasonable action when facing memory/stack
corruption. The problems in the elog code were stack corruption).
1) "I am too lazy to write proper error handling code" so I just crash and burn. This includes the
case where "proper error handling" would be "too invasive".
2) the error is too bad (or too deep) and there is no reasonable way to recover. Print an error message
and dump core (for later analysis). I sometimes use "cm_msg(); abort()". (assert is "printf("error"); abort()")

Please refer to the literature for philosophical discussions on the uses of assert() (Argh! Stefan will have my
head again!), but I will mention that I find "abort() early, abort() often" very effective. BTW, this technique
is heavily used in the Linux kernel (oops(), bug(), panic()) with some good effect, too.

>    The problem elog entry had one character removed, so end-of-file came before
> the end of the message.  This could probably occur without the file being
> altered, if the disk containing the elog fills.

Yes, I think you are right. In TWIST, we have seen disk-full conditions break both elog and history.

K.O.
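
A minimal sketch of the "proper error handling" alternative for the truncated-entry case, assuming the reader wants a fixed number of bytes per entry. read_exact() is a hypothetical helper, not code from midas.c: it checks every read() result and reports a premature end-of-file as an error instead of asserting.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not code from midas.c): read exactly `size` bytes
   from `fd`. Returns 0 on success, -1 on a read error or on end-of-file
   before the requested number of bytes has arrived. */
int read_exact(int fd, void *buffer, size_t size)
{
    char *dest = buffer;
    size_t done = 0;

    while (done < size) {
        ssize_t n = read(fd, dest + done, size - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            return -1;         /* genuine read error */
        }
        if (n == 0)
            return -1;         /* end-of-file before the entry was complete */
        done += (size_t)n;
    }
    return 0;
}
```

The caller can turn a -1 into a log message naming the file and offset and then skip the entry, which covers both a hand-edited file and one cut short by a full disk.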
  176   25 Nov 2004   chris pearson   Forum   use of assert in mhttpd
   We've had mhttpd aborting regularly since upgrading from midas-1.9.3.  This
happens during elog queries, and is due to an elog file that was incorrectly
modified by hand.  The modification to the file occurred 6 months ago.
   el_retrieve(midas.c:15683) now has several assert statements, one of which
aborts the program on reading the bad entry.

   Why is assert used, instead of an error return from the function (if
necessary), and maybe an error message in the log file?  Assert statements are
often removed, using NDEBUG, for normal use.

Chris

   The problem elog entry had one character removed, so end-of-file came before
the end of the message.  This could probably occur without the file being
altered, if the disk containing the elog fills.
  175   24 Nov 2004   chris pearson   Info   midas on 64bit opteron
   Midas, version 1.9.5 of 7th October, was installed, with a few changes, on a
64 bit opteron computer, running linux.  For this processor, as for the alpha
processor, long integers and addresses are 64 bits.  We added a new flag in the
Makefile,

250a251
> ARCH   = $(shell uname -m)
377a379,381
> ifeq ($(ARCH),x86_64)
> OSFLAGS := $(OSFLAGS) -DX86_64
> endif

and extended the alpha-specific definitions, of DWORD and PTYPE, in midas.h to
include this case,

549c549
< #ifdef __alpha
---
> #if defined(__alpha) || defined(X86_64)
598c598
< #ifdef __alpha
---
> #if defined(__alpha) || defined(X86_64)

Apart from this, there are a large number of cases where pointers are cast to
integers, without using the PTYPE definition.  These all need to be changed by
hand, although these conversions should probably be removed anyway - in almost
all cases they are unnecessary, as just differences are being calculated.

There were also a number of warnings, which we ignored, where printf format
strings specified long integers, but the argument was not a long integer.  Casts
should probably be added in all cases where the type of the argument can vary
depending on the machine.

A midas analyser was made, which was able to successfully replay some data, but
this was all that was tested.

Chris
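
A minimal sketch of the two portability points above, assuming a C99 compiler: pointer differences are computed directly as ptrdiff_t rather than by casting pointers to integers, and printf arguments are cast to the type the format string names. byte_offset() and format_offset() are hypothetical helpers, not MIDAS functions.

```c
#include <stddef.h>   /* ptrdiff_t, size_t */
#include <stdio.h>

/* Offset of `p` from `base` in bytes, computed without converting either
   pointer to an integer. Both must point into the same object. */
ptrdiff_t byte_offset(const void *base, const void *p)
{
    return (const char *)p - (const char *)base;
}

/* Format an offset and a counter whose width may differ between machines.
   %td matches ptrdiff_t; the cast makes %lu correct on every platform. */
int format_offset(char *out, size_t size, ptrdiff_t offset, unsigned int count)
{
    return snprintf(out, size, "offset=%td count=%lu",
                    offset, (unsigned long)count);
}
```

Because the subtraction happens on pointers, the result is correct whether addresses are 32 or 64 bits wide, and no PTYPE-style integer type is needed for this case.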
  174   09 Nov 2004   Pierre-Andre Amaudruz   Bug Fix   New transition scheme
Problem:
If cm_set_transition_sequence() is used to change the sequence number, the
command odbedit> start/stop/resume/pause -v reports the proper sequence, but the
action on the client side is not actually performed!

Fix:
Local transition table updated in midas.c (1.226)

Note:
The transition number under /system/clients/<pid>/transition...
is used internally. Changing it won't have any effect on the client action
if the sequence number is not registered.
  173   04 Nov 2004   Stefan Ritt   Forum   Frontend code and the ODB
Hi Jan,

I usually keep under /Experiment/Run Parameters only those settings which are kind of "global" and thus of
interest to frontend *and* analyzer, like a run mode (data/calibration/cosmic/...). Settings more specific to a
frontend I keep under /Equipment/<name>/Settings where <name> is the equipment name the specific frontend
produces. In your case each frontend will then get its own tree (related to each fragment). Please note that
both discussed trees can contain a whole tree with subdirectories, which lets you organize your data better.

Best regards, Stefan.
  172   04 Nov 2004   Jan Wouters   Forum   Frontend code and the ODB
I would like to know whether all parameters used by the frontend code have to be in the
"Experiment/Run Parameters" section.  This section can become big and difficult to maintain,
because it is one single big section of experim.h (EXP_PARAM_DEFINED).  I have parameters the
various frontends read at the beginning of each run, which set the hardware settings of
various devices.  I would like to place these in a section all their own, organized by
device.  Is this doable?
  171   02 Nov 2004   Renee Poutissou   Info   Event Builder info in mhttpd Status page
Information about the Event Builder statistics has been removed from the 
Status page in mhttpd.  I heard from Pierre that this information might 
be redundant when using the new Event Builder format??? 
For the TWIST experiment, we are running and cannot change on the fly
to a new format Event Builder.  It is very important for us to show the users
the rates and statistics coming out of the EventBuilder.  I had  to put this
piece of code back in mhttpd.  
Can I put it back in the distribution? Or do I have to add a special TWIST flag?
Or do I have to keep reinserting this every time there is an update to mhttpd.c?
At the moment, TWIST is generating a couple of updates per week to mhttpd.c.
  170   22 Oct 2004   Konstantin Olchanski   Bug Fix   mhttpd message colouring
I committed a fix to the mhttpd logic that decides which messages should be shown
in "red" colour. Before, any message with square brackets and colons would be
highlighted in red. Now only messages matching the pattern [...:...] are
highlighted. The decision logic was moved into a function message_red(). K.O.
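
A minimal sketch of such a pattern check, assuming "[...:...]" means an opening bracket, then a colon, then a closing bracket, with the colon inside the bracket pair. has_bracket_colon_pattern() is a hypothetical function written for this example; it is not the message_red() code from mhttpd.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical check (not the mhttpd message_red() code): returns 1 if
   `message` contains a bracket pair with a colon inside it, else 0. */
int has_bracket_colon_pattern(const char *message)
{
    const char *open = strchr(message, '[');

    while (open != NULL) {
        const char *close = strchr(open + 1, ']');
        if (close == NULL)
            return 0;                       /* no closing bracket remains */

        /* Look for a colon strictly between the two brackets. */
        if (memchr(open + 1, ':', (size_t)(close - open - 1)) != NULL)
            return 1;                       /* found "[...:...]" */

        open = strchr(close + 1, '[');      /* try the next bracket pair */
    }
    return 0;
}
```

Under this assumption a message such as "[mfe.c:123] error" matches, while "time 12:30 [ok]" does not, even though it contains both brackets and a colon.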