Midas DAQ System
Entry  15 Jul 2004, Stefan Ritt, , Severe bug in 1.9.4 
Hello midas'ers,

Today I discovered a severe bug in the routine bm_check_buffers(), which
causes the logger to crash when it stops a run due to a reached event limit.
The funny thing is that this bug has been there since the beginning, but only
recent versions of gcc and libc reveal it.

Since I consider this severe, I fixed it and updated 1.9.4 just now. I did
not go with 1.9.4-1, but maybe in future we should consider patch levels.

So everybody who uses 1.9.4 and has problems with crashing loggers,
please update to the 1.9.4 from today (July 15th, 2004).

- Stefan
    Reply  21 Jul 2004, Stefan Ritt, , Introduction of environment variable MIDASSYS 
> Where should MIDAS be installed?

I personally don't have any preference, as long as it's in accordance with "the standard"
(whatever this is). Maybe one should add a flag to the makefile to specify the
installation directory, either /opt or /usr/local, so people then have the choice. I have
seen that in other packages. As for the RPM, I leave the final proposal to the person
writing the spec file (Paul? Piotr? Konstantin?). We should then commonly agree on the
location based on that proposal. The person supplying the RPM will "officially" become the
RPM maintainer and be responsible for maintaining it.

> installed in /sbin/, /bin/, or /usr/. System administrators can build packages from
> source and install them into the /usr/local/ directory. However, third-party packages
> of add-on software must be installed in /opt/<package>/, where <package> is the name
> that describes a software suite.

Well, midas is kind of in the middle. On one hand it's a third-party package (-> /opt),
but it requires some compilation to allow meaningful work (frontend, analyzer). So maybe
the RPM should go to /opt, and if compiled from the TAR ball it should go to /usr/local?
But that means if someone has to maintain a large basis of midas machines, he/she has to
always search two locations. On the other hand one can always do a "cd $MIDASSYS" ...

- Stefan
    Reply  07 Sep 2004, Stefan Ritt, , mlogger crash if using mserver. 
I ran into that problem myself recently, so it's the right time to fix it (;-).

We have two options: 

a) Make the logger work remotely, even if it's suboptimal and 
b) Make the logger refuse to run remotely. 

I have no case where I need to run the logger remotely, so I would opt for b).
This would mean removing the "-h" command line switch and the evaluation of
MIDAS_SERVER_HOST, or just supplying an empty host string to
cm_connect_experiment().

Let me know if you agree, I can then remove the "-h" option. The patch you
suggested I would apply in addition.

- Stefan
    Reply  16 Sep 2004, Stefan Ritt, , midas odb locking 
> I will add a timeout of 10 minutes, then shutdown the ODB client with an error message.

I added timeout handling to db_lock_database(). It was already present in
ss_mutex_wait_for(), so it was just a matter of passing the status up the call stack.
ODBEdit stops if it cannot obtain a lock after 5 minutes.
    Reply  21 Sep 2004, Stefan Ritt, , ODB-EPICS gateway 
The easiest way to achieve this is to write a new class driver, probably derived
from the multi.c class driver. One just has to replace all "output" with "write"
(or better "ODB2EPICS") and all "input" with "EPICS2ODB". The multi class driver
already handles a factor/offset for each channel (which could be 1/0 of course),
a threshold to update the ODB/EPICS only when a value changes significantly,
retrieval of labels from the bus driver (EPICS labels -> ODB settings), automatic
event generation and error handling. So it would be a good starting point.
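
The threshold idea mentioned above can be sketched in a few lines of C. This is
only an illustration of "forward a value only when it changed significantly";
the struct and function names are hypothetical and are not taken from multi.c,
whose actual bookkeeping may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-channel state; field names are illustrative only. */
typedef struct {
    float last_sent;   /* value most recently forwarded */
    float threshold;   /* minimum change that triggers an update */
    int   valid;       /* 0 until the first value has been forwarded */
} channel_state;

/* Returns 1 and records the value if it should be forwarded, else 0. */
int channel_should_update(channel_state *ch, float value)
{
    float diff;

    assert(ch != NULL);

    /* The very first value is always forwarded. */
    if (!ch->valid) {
        ch->last_sent = value;
        ch->valid = 1;
        return 1;
    }

    diff = value - ch->last_sent;
    if (diff < 0)
        diff = -diff;

    if (diff > ch->threshold) {
        ch->last_sent = value;
        return 1;
    }
    return 0;
}
```

With a threshold of 0.5, a channel that last sent 10.0 would ignore 10.2 but
forward 10.6.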

What one gets from the class driver in the ODB is:

  /equipment/<name>/
     variables/
        Input[]     <--- read from the bus driver (float)
        Output[]    <--- written to the bus driver (float)
     settings/
        Names Input[]        <--- human readable names
        Names Output[]       <--- human readable names
        Update Threshold[]
        Input Offset[]
        Input Factor[]
        Output Offset[]
        Output Factor[]
        Devices/
           Input/
              DD/   <--- parameters for Device Driver
                 ... Epics addresses, flags etc.
              BD/   <--- parameters for Bus Driver
           Output/
        
So if one uses the standard mfe.c code together with the multi.c class driver
and epics_ca.c device driver, all that is left is the following:

- replace cd_gen.c by multi.c in the examples/epics directory
- break down the already existing flags into enable epics/write/events
- maybe add the EPICS read period

The last two things should be done in the epics_ca.c device driver, so one can
use the multi.c class driver without any change. Event generation and error
handling then comes for free.
Entry  29 Sep 2004, Stefan Ritt, Info, Increased number of clients in midas.h, important! 
In response to several requests, a number of limits, such as the maximum number of ODB clients,
have been increased in midas.h and committed to CVS. It is important to note that clients compiled
with the old limits cannot coexist with clients compiled with the new limits. You will get
ODB corruption notifications and everything will crash, and you will wonder where this comes from.

So once you CVS update midas.h, revision 1.139, please make sure to recompile *ALL* your
midas applications with the new midas.h.

Stefan
Entry  03 Oct 2004, Stefan Ritt, Info, Introduction of new transition scheme 
A new transition scheme has been implemented and committed. Previously, one had the
possibility to register for PRE/POST transitions, which was necessary in order to first
stop the frontends, then stop the logger to close the data file. While this scheme
has proven successful for a long time, it was now concluded that three levels
(PRESTOP/STOP/POSTSTOP for example) are not sufficient in some cases. Therefore,
a true sequence-based scheme has been introduced.

The PRE/POST transitions have been removed and an extra parameter "sequence_number"
has been added to cm_register_transition. If clients register with different
sequence numbers, their RPC transition function is executed according to their
sequence number, smaller numbers being executed prior to larger numbers.

The frontends register at sequence number 500 for example, while the logger
registers with 200 for start and 800 for stop, making sure it's called after the
frontend(s) when stopping a run. The default numbers can be changed from within
the user code with the new function cm_set_transition_sequence(). This way, it is
for example possible to have all frontends being called in a certain sequence
when starting and stopping runs.

The modification will (hopefully) not have any influence on existing experiments,
as long as they don't call cm_register_transition directly. If they do, however, one has
to add the additional parameter to this function.
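
The ordering rule can be illustrated with a small stand-alone C sketch. It does
not use the midas API; the registry struct and the function name are hypothetical,
and only the numbers 500 (frontend) and 800 (logger on stop) come from the text
above. The third client and its number are made up for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical registry entry; the real bookkeeping inside midas differs. */
typedef struct {
    const char *client;
    int sequence_number;
} transition_entry;

/* qsort comparator: ascending sequence number. */
static int compare_sequence(const void *a, const void *b)
{
    const transition_entry *ea = a;
    const transition_entry *eb = b;
    return (ea->sequence_number > eb->sequence_number) -
           (ea->sequence_number < eb->sequence_number);
}

/* Puts the entries into the order in which they would be called:
   smaller sequence numbers first. qsort is not stable, so the relative
   order of clients with equal numbers is unspecified in this sketch. */
void order_for_transition(transition_entry *entries, size_t n)
{
    assert(entries != NULL || n == 0);
    if (n > 1)
        qsort(entries, n, sizeof entries[0], compare_sequence);
}
```

For a stop transition with the logger at 800, a frontend at 500 and a
hypothetical analyzer at 700, the resulting call order is frontend, analyzer,
logger.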
    Reply  03 Oct 2004, Stefan Ritt, Info, Increased number of clients in midas.h, important! 
> Stefan, to avoid confusion from crashes caused by incompatible ODBs would it be possible to add a "version number" to ODB,
> together with a check and an error message saying "oops... incompatible ODB, please rebuild your programs"? We tend to have
> different versions of midas floating around and users have old executables stashed away, and all this makes it rather
> difficult to manually keep track on what ODB is compatible with what midas.

I fully agree that a version number in ODB is a good thing, and I certainly will put one there, but this won't help for old
applications. If I add new code which checks in cm_connect_experiment() if the version number matches, this will only help for new
applications connecting to old ODBs. If old applications (prior to invention of the version number) connect to a new ODB, they still
will crash.

However, we are planning to make a new release 1.9.5 soon (next week), so we can tell people not to "mix" 1.9.5 with pre-1.9.5
programs.
    Reply  04 Oct 2004, Stefan Ritt, Info, Increased number of clients in midas.h, important! 
> Right. We cannot fix the past, but we should fix the future. BTW, "do not mix versions" is hard to enforce and mismatches did, do and
> will happen

For remote connections (through mserver), there is already a version check. If the minor version differs, you get a warning; if the major
versions differ (1.>>9<<.4), the client won't start. So at least for remote connections you get a clue.
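
As a rough illustration of such a check (not the actual mserver code), the
following C sketch compares two "x.y.z" version strings. It assumes that "major"
means the middle number, as marked above, that "minor" means the last number, and
that a differing first number or an unparsable string is also treated as a
refusal. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { VERSION_OK, VERSION_WARN, VERSION_REFUSE } version_result;

/* Compares two version strings of the form "1.9.4". */
version_result compare_versions(const char *local, const char *remote)
{
    int l[3], r[3];

    assert(local != NULL && remote != NULL);

    /* Unparsable versions are refused (an assumption of this sketch). */
    if (sscanf(local, "%d.%d.%d", &l[0], &l[1], &l[2]) != 3 ||
        sscanf(remote, "%d.%d.%d", &r[0], &r[1], &r[2]) != 3)
        return VERSION_REFUSE;

    /* First or middle number differs: the client must not start. */
    if (l[0] != r[0] || l[1] != r[1])
        return VERSION_REFUSE;

    /* Only the last number differs: warn, but continue. */
    if (l[2] != r[2])
        return VERSION_WARN;

    return VERSION_OK;
}
```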

> For one thing, looking at a given midas-using executable, how do I tell what version of midas it has inside?

There is a function cm_get_version() returning the version. As for the executable, all you can do is a

strings <executable> | grep 1.9
    Reply  13 Oct 2004, Stefan Ritt, Bug Report, db_paste: found string exceeding MAX_STRING_LENGTH 
Can you attach 

/twist/data_onl/current/run17548.odb

so I can reproduce the problem?
    Reply  13 Oct 2004, Stefan Ritt, Bug Report, silly odbedit "rename Display xxx/yyy" 
> odbedit command "rename Display xxx/yyy" creates a key named "xxx/yyy" (yes,
> with a slash in the name) and this key cannot be deleted or renamed...
> K.O.

"rename" is "rename", not "mv" under Unix. If you want this functionality, put it
in and don't complain!
    Reply  13 Oct 2004, Stefan Ritt, Suggestion, No al_clear_alarm()? 
> We have al_trigger_alarm(), but no matching al_clear_alarm(), and I need it to
> clear my alarm once the alarm condition no longer exists. Any objections if I
> add this function? K.O.

The idea is that once an alarm has been triggered, it stays until the user
acknowledges it, even if the alarm condition has disappeared. Through mhttpd,
the user can press the "Reset" button, which then executes al_reset_alarm().
However, it is possible to call al_reset_alarm() directly from user code to
achieve the same thing.
    Reply  14 Oct 2004, Stefan Ritt, Bug Report, TWIST upgrade bombed... 
Agree.

Once you have made the modification, please check the following situation: Create a fresh
ODB with increased size ("odbedit -s 2000000" for example). Then check that the
other clients "adopt" this increased size. Note that some experiments need a
bigger ODB, and I don't want to have them recompile all clients, that's why the
code in ss_shm_open() can attach to a *larger* shared memory. However, it should
not matter to the process, since the ODB (or SYSTEM) shared memory size is
stored in the pheader->key_size and pheader->data_size of each participating
process. So they should never write beyond the limits defined in that header.
The size to ss_shm_open() is only a "hint" if the shared memory does not exist,
and is nowhere later used in the code.
    Reply  04 Nov 2004, Stefan Ritt, Forum, Frontend code and the ODB 
Hi Jan,

I usually keep under /Experiment/Run Parameters only those settings which are kind of "global" and thus of
interest to frontend *and* analyzer, like a run mode (data/calibration/cosmic/...). Settings more specific to a
frontend I keep under /Equipment/<name>/Settings where <name> is the equipment name the specific frontend
produces. In your case each frontend will then get its own tree (related to each fragment). Please note that
both discussed trees can contain a whole tree with subdirectories, which lets you organize your data better.

Best regards, Stefan.
    Reply  15 Dec 2004, Stefan Ritt, Info, Commit local TWIST modifications 
> - system.c: do not chdir("/") in ss_daemon_init()- it prevents us from ever
>   getting core dumps from midas daemons. The old behaviour is trivially
>   restored by "cd /" before starting the daemon; or by "limit coredumpsize 0".

The chdir("/") is from one of the unix text books. They say you HAVE to do it. If you start a
daemon on an NFS file system, you cannot unmount that file system as long as the daemon is
running. I'm sure the same code is inside most other daemons (apache, ...). So if we go away
from that standard, we have to be aware of the consequences.
    Reply  15 Dec 2004, Stefan Ritt, Forum, Frontend index 
> What is the api call to determine the index of the frontend when specifying the
> -i parameter during execution of the frontend? 

INT get_frontend_index();

- Stefan
    Reply  22 Dec 2004, Stefan Ritt, Forum, cm_msg 
> Could someone please explain to me how cm_msg, cm_msg1, etc. all work.  The
> documentation is very terse.  
> 
> I want to setup a fairly significant set of debugging, and error messages for a
> new frontend.  I need to get these messages to a logging file.  I also would
> like to get the error messages to the user through whatever interface Midas
> normally uses for error reporting.  

For errors, use

  cm_msg(MERROR, "routine_name", "Your error message, code=%d", i);

This produces an error message which is logged to midas.log, and distributed to all
clients which have called cm_msg_register(). For example odbedit will just print
that message. The syntax of the second half of cm_msg is the same as for printf(),
so you can add format specifiers and variable arguments as you do for printf(). The
first argument is the message type (MDEBUG for example is only distributed but not
logged). 

For a more detailed list of message types, please refer to

http://midas.triumf.ca/doc/html/AppendixE.html#midas_macro
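
To show what "same syntax as printf()" means in practice, here is a small
stand-alone C sketch of a printf-style formatting function. It is a stand-in
written for illustration, not the cm_msg implementation: the function name
format_message and the "[routine] " prefix are assumptions of this sketch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Writes "[routine] <formatted text>" into buf and returns its length.
   Output is truncated if it does not fit into size bytes. */
size_t format_message(char *buf, size_t size, const char *routine,
                      const char *format, ...)
{
    va_list args;
    size_t used;

    assert(buf != NULL && size > 0);

    /* Prefix with the routine name, as one would pass it to cm_msg. */
    snprintf(buf, size, "[%s] ", routine);
    used = strlen(buf);

    /* Expand the format specifiers exactly as printf() would. */
    va_start(args, format);
    vsnprintf(buf + used, size - used, format, args);
    va_end(args);

    return strlen(buf);
}
```

Calling format_message(buf, sizeof buf, "routine_name", "Your error message,
code=%d", 42) leaves "[routine_name] Your error message, code=42" in buf.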
Entry  22 Dec 2004, Stefan Ritt, Suggestion, What to do with invalid data in the history system? 
Dealing with the NaN's in the history system in the past week, a question came
up at PSI about how to deal with invalid history data.

Assume you have several devices going into one history equipment, and one device
has a problem, such that it cannot be read. In the past, the device driver
system returned zero, which was written to the history file. While this is ok in
some cases, it might not be in others, where zero is maybe a valid measurement.
Furthermore, it might confuse some regulation loops.

An alternative is to keep the last correctly measured value. As long as the
device has its problem, the value is kept. However, values are then written to the
history system which look valid, although they are not. So what about
explicitly writing NaNs to the history system? For the display routine the NaNs
could be omitted, leaving blank regions where no valid measurement is available.
Or one could explicitly mark the region as invalid. Konstantin, do you know how
to write NaN explicitly to a float variable? And what do the others think about
these possibilities?
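
One possible answer to the NaN question, assuming a C99 compiler: <math.h>
provides the NAN macro and isnan(). The helper names below are hypothetical and
only show the idea.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Marks a float variable as "no valid measurement". */
void mark_invalid(float *value)
{
    assert(value != NULL);
    *value = NAN;   /* C99 macro: a quiet NaN of type float */
}

/* Returns 1 if the value is NaN, 0 for any real number including zero. */
int is_invalid(float value)
{
    return isnan(value) ? 1 : 0;
}
```

Since a NaN never compares equal to anything, including itself, zero stays
available as an ordinary measurement.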

- Stefan
    Reply  23 Dec 2004, Stefan Ritt, Suggestion, What to do with invalid data in the history system? hist.gif
I have implemented a preliminary version of NaN handling in the history system. It
works such that if a device driver returns a read error status, the class driver
writes a NaN (Not-a-Number) into the corresponding variable via the new function
ss_nan(). The "mhist" utility directly displays these as "nan" (Linux) or
"-1.#IND00" (Windows), indicating the error status. The history display via mhttpd
just skips these values (see elog:/1). I think this is better than showing just zero
values, because in most cases zero is a valid measurement and could confuse people.
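
The two halves of this scheme can be sketched as follows. This is not the
multi.c or mhttpd code: the status codes and function names are hypothetical,
and the C99 NAN macro is used in place of ss_nan(), whose implementation is not
shown here.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical status codes, not the real midas driver codes. */
enum { READ_OK = 0, READ_ERROR = 1 };

/* Class-driver side: keep the value on success, store NaN on a read error. */
void store_reading(float *slot, int status, float value)
{
    assert(slot != NULL);
    *slot = (status == READ_OK) ? value : NAN;
}

/* Display side: count the points that would actually be drawn,
   skipping every NaN entry. */
size_t count_plottable(const float *history, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        if (!isnan(history[i]))
            count++;
    return count;
}
```

A history of three readings in which the second read failed yields two
plottable points, and a genuine zero reading is still drawn.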

Of course it is not enough just having "gaps" in the history display, so it's
important that the corresponding device driver issues an error message, which could
even trigger an alarm.

I have tested this under Windows, but only compiled under Linux. The only class
driver I modified so far is "multi.c". People should have a look, make some tests,
and let me know if this is a good thing, or if we should change it somehow.

- Stefan
    Reply  21 Jan 2005, Stefan Ritt, Bug Report, Persistency problem with h1_book() & co 
> I can't get onto cvs@midas.psi.ch right now
> (cvs update
> cvs@midas.psi.ch's password: 
> Permission denied, please try again.)

I had to upgrade midas.psi.ch today to Scientific Linux 3.03. Most things are working again, but
I failed to set up the anonymous CVS account. I have to wait until next week when the experts are
there. I will let you know when it's working again.

- Stefan