Entry  31 Oct 2003, Konstantin Olchanski, , mana.c without ROOT and HBOOK 
Stephan, why did you prohibit building mana.c without ROOT and HBOOK
support? I think such a configuration is valid and should be allowed.

Also, this prohibition broke the Midas Makefile: it now bombs while building
mana.c. The Makefile is set up to build hmana.c with HBOOK support,
rmana.c with ROOT support (if ROOTSYS is set) and mana.c without HBOOK and
ROOT support (currently bombs on the #error in mana.c).

K.O.
    Reply  01 Nov 2003, Stefan Ritt, , mana.c without ROOT and HBOOK 
> Stephan, why did you prohibit building mana.c without ROOT and HBOOK
> support? I think such a configuration is valid and should be allowed.

Oops, sorry, my fault. I forgot that people use mana.c without ROOT and 
HBOOK. The reason I made the change was that people forgot the -DHAVE_HBOOK 
in their makefile. In that case, no HBOOK init is done in mana.c and the 
first histogram booking in the user code crashes HBOOK.

So please take the #error statement out of mana.c (I'm away in two hours for 
one week), but think about preventing the above-mentioned problem. I don't 
know of any way for the makefile or mana.c to figure out if there is any HF1 
call in the user code. Actually HF1 should return a "proper" error message 
rather than just crashing.

One possibility is that we put an additional layer on top of the histogram 
booking/filling, as sketched below. These macros are converted to their HBOOK or ROOT 
equivalents depending on HAVE_HBOOK/HAVE_ROOT. If neither is 
present, the histogram booking macro can produce a runtime error. This has 
the additional advantage that users can switch from HBOOK to ROOT without 
changing their user code.
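
For illustration, such a macro layer could look roughly like the following (a
minimal sketch; the macro names H1_BOOK/H1_FILL and the helper functions in the
ROOT branch are invented for this example and are not part of mana.c):

/* Hedged sketch of a histogram booking/filling layer.  The user code only
   ever calls H1_BOOK()/H1_FILL(); the mapping below is illustrative only. */
#if defined(HAVE_HBOOK)
#define H1_BOOK(id, title, nbins, xlo, xhi)  HBOOK1(id, title, nbins, xlo, xhi, 0.f)
#define H1_FILL(id, x, weight)               HF1(id, x, weight)
#elif defined(HAVE_ROOT)
/* root_book_h1()/root_fill_h1() are hypothetical helpers that would keep a
   table of TH1F pointers indexed by id */
#define H1_BOOK(id, title, nbins, xlo, xhi)  root_book_h1(id, title, nbins, xlo, xhi)
#define H1_FILL(id, x, weight)               root_fill_h1(id, x, weight)
#else
/* neither package compiled in: complain at run time instead of crashing */
#define H1_BOOK(id, title, nbins, xlo, xhi) \
   cm_msg(MERROR, "H1_BOOK", "analyzer was built without HBOOK or ROOT support")
#define H1_FILL(id, x, weight) \
   cm_msg(MERROR, "H1_FILL", "analyzer was built without HBOOK or ROOT support")
#endif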
       Reply  01 Nov 2003, Konstantin Olchanski, , mana.c without ROOT and HBOOK 
> > Stephan, why did you prohibit building mana.c without ROOT and HBOOK
> > support? I think such a configuration is valid and should be allowed.
> 
> Oops, sorry, my fault. I forgot that people use mana.c without ROOT and 
> HBOOK. The reason I made the change was that people forgot the -DHAVE_HBOOK 
> in their makefile. In that case, no HBOOK init is done in mana.c and the 
> first histogram booking in the user code crashes HBOOK.

Ahem. There is only so much rope we can give out to prevent people from shooting
themselves in the foot...

> So please take the #error statement out of mana.c

Done.

> One possibility is that we put an additional layer on top of the histogram 
> boooking/filling. These macros are converted to their HBOOK or ROOT 
> equivalents depending on the HAVE_HBOOK/HAVE_ROOT. If none of both is 
> present, the histogram booking macro can produce a runtime error. This has 
> the additional advantage that users can switch from HBOOK to ROOT without 
> change of their user code.

I can't think of anything other than wrapping every HBOOK call with "if
(!hbook_is_initialized) initialize_hbook();". But then, where is PAWC
coming from anyway?!?

We could also print a warning message "This mana.c has no HBOOK support. If you
see HBOOK crashes, please relink with hmana.c". Ugly, but informative, plus it
points anybody who knows how to read towards a solution.

K.O.
Entry  31 Oct 2003, Konstantin Olchanski, , Do not frob "/runinfo" in mhttpd.c 
I found where we tickle the race condition in db_create_record().

1) in mhttpd.c,  every time we show the status page, we call
db_create_record(hDB, 0, "/Runinfo", strcomb(runinfo_str));
2) internally db_create_record() deletes /RunInfo
3) other programs that read "/runinfo/run number" while it is deleted do not
check for the db_get_value() error code and happily get a zero run number.

Stephan fixed the race condition, and now I committed an mhttpd.c change that
only calls db_create_record(hDB, 0, "/Runinfo", strcomb(runinfo_str)); if
/runinfo does not exist. This seems to be redundant with a similar call in
cm_connect_experiment1(), called each time a new client starts up.
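
A minimal sketch of what such a guard might look like (the surrounding
mhttpd.c code is omitted and the exact variable names here are assumptions):

/* sketch: create /Runinfo from the compiled-in structure only if it does
   not already exist, instead of on every status page display */
HNDLE hKey;

if (db_find_key(hDB, 0, "/Runinfo", &hKey) != DB_SUCCESS)
   db_create_record(hDB, 0, "/Runinfo", strcomb(runinfo_str));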

Files changed:
src/mhttpd.c

K.O.
    Reply  01 Nov 2003, Stefan Ritt, , Do not frob  
> I found where we tickle the race condition in db_create_record().
> 
> 1) in mhttpd.c,  every time we show the status page, we call
> db_create_record(hDB, 0, "/Runinfo", strcomb(runinfo_str));
> 2) internally db_create_record() deletes /RunInfo
> 3) other programs read "/runinfo/run number" while it is deleted do not
> check for the db_get_value() error code and happily get a zero run number.
> 
> Stephan fixed the race condition, and now I committed an mhttpd.c change that
> only calls db_create_record(hDB, 0, "/Runinfo", strcomb(runinfo_str)); if
> /runinfo does not exist. This seems to be redundant with a similar call in
> cm_connect_experiment1(), called each time a new client starts up.

The reason for the db_create_record() is the following: Assume that we change 
the /runinfo structure, by adding an additional variable in the future. If we 
run a "new" mhttpd on an "old" experiment, the "runinfo" C structure does not 
match the ODB contents. The db_create_record() ensures that the ODB structure 
exactly matches the C structure. I agree with you that this can cause 
potential problems. But most of them should be fixed by the additional lock() 
I added recently. So other programs cannot read the run number while it is 
deleted.

One could think of checking the record size, and re-creating the runinfo if 
the ODB record size does not match the C record size. But this does not 
prevent the potential error that some variables are reversed in order. They 
are then mapped wrongly to the C runinfo structure.

I see that you work very hard now on all possible checks for the run number. 
But I would not commit that and make it part of the distribution, since all 
experiments at PSI for example do not have this run number problem. Run it 
locally, determine the cause of your problem (the discovery of the race 
condition was already very good, I'm glad that you found it; it should make the 
system much more stable), and we'll fix it. Putting ASSERTs in is a good idea, I 
should have done it from the very beginning. But if you start now, please put 
them in all other 100000 places (;-)

I would not add a db_get_value_cannot_possibly_fail() into the standard 
distribution, because it probably cannot correct the initial problem and then 
will then just go into an infinite loop. We should always tackle problems at 
their source. 

If you cannot resolve your zero run number problem, do the following: There 
is a cm_msg(MDEBUG, ...) which only puts a message into the shared memory, 
but not in midas.log. This can be used for real time debugging. Add those 
message temporarily in db_get_value() etc. to see what is going on. As soon 
as the run number goes to zero, stop all processes immediately (for example 
by locking the database with db_lock_database), and the look backwards in the 
sysmsg buffer to see what happened *before* the run number went to zero.
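
Such a temporary message could look roughly like this (a sketch only; the
variable names are whatever is in scope at the spot where you insert it):

/* sketch: real-time debug message, goes only to the SYSMSG buffer */
cm_msg(MDEBUG, "db_get_value", "\"%s\" read, status %d", key_name, status);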

- Stefan
       Reply  01 Nov 2003, Konstantin Olchanski, , Do not frob  
> > I found where we tickle the race condition in db_create_record().
> The reason for the db_create_record() is the following: Assume that we change 
> the /runinfo structure...

I think there is a deep fundamental problem with changing data structures "on the
fly". Calling db_create_record("/runinfo") at every show_status_page() does not
fix it.

If I change the runinfo structure, rebuild, relink and restart "mhttpd", the
db_create_record("/runinfo") from cm_connect_experiment() will update the runinfo
structure in ODB. In this case, the call from show_status_page() is redundant. As
a side effect, when we do this, we break every running ODB client - they still
have the old runinfo layout. Not good...

If I change the runinfo structure, rebuild, relink and restart all applications,
*except* for mhttpd, "/runinfo" in ODB will be updated when the first updated
client connects to ODB via the db_create_record("/runinfo") from
cm_connect_experiment(). Then, the old mhttpd will restore the old layout via the 
db_create_record("/runinfo") in show_status_page(), breaking everything. Not good...

If I change the runinfo structure, rebuild, relink and restart everything,
"/runinfo" in ODB will be updated when the first client connects to ODB via the
db_create_record("/runinfo") from cm_connect_experiment(). In this case, the call
from show_status_page() is redundant. This is the only corruption-free scenario.

This lack of integrity enforcement vs version skew in binary data structures is,
I think, an ODB design error. Perhaps ODB applications should be prohibited from
direct access to ODB "C" data structures: we cannot ensure that the data layout
in the application and in ODB is the same.

> One could think of checking the record size, and re-creating the runinfo if 
> the ODB record size does not match the C record size. But this does not 
> prevent the potential error that some variable are reversed in order. They 
> are then mapped wrongly to the C runinfo structure.

Exacto.

> I see that you work very hard now on all possible checks for the run number. 
> But I would not commit that and make it part of the distribution...

This is a philosophical issue.

My checks are in line with the "design by contract" school of programming. In a
nutshell, this ideology requires that before I do anything, I should enforce the
validity of my inputs and after I am done, I should enforce the validity of my
outputs. In practice, this translates into liberal use of assert()'s *in
production code*.

To ensure that old bugs stay fixed, and that new bugs are promptly discovered, it
is essential that the "contract checks" stay in the production code forever.

But let better writers argue programming philosophy in the literature.

Personally, when hunting down bugs in unstable code, I find this technique to be
vastly superior to the more common approach of "This program has no bugs. Error
checking and assert()s are wasteful. Let's close our eyes and hope no bad things
happen to us (again)".

> But if you start now, please put [asserts] in all other 100000 places (;-)

I know that no good deed goes unpunished, but pewleeze!!!

> If you cannot resolve your zero run number problem, do the following: ...
> [lock ODB, freeze the experiment, look at log files]

This technique is obsolete. Today, we instrument the code with sanity checks
and validity tests. Then all the bugs find themselves with minimal manual
intervention.

K.O.
Entry  31 Oct 2003, Konstantin Olchanski, , more odb "run number" error checking 
I added error checking to the places where we read "/runinfo/run number". In
general, I do this:

  status = db_get_value("/runinfo/run number",&run_number);
  assert(status==SUCCESS);
  assert(run_number >= 0); (and run_number>0, where appropriate)

Here is the rationale: if we cannot read the run number, something must be
very terribly wrong. I cannot think of any recovery action other than to
abort() and make a core dump for our debugging enjoyment.

I considered and rejected adding a "retry" loop: if we allow db_get_value()
to intermittently fail, then its every use has to be wrapped in a retry
loop, which then should be inside db_get_value(), making it pointless to
have external "retry" loops.

I am now pondering on proposing a "db_get_value_cannot_possibly_fail()"
function (it would abort(), exit() with an error or commit harakiri if it
can't get the value). The way most db_xxx() functions are used in midas,
maybe they should be made "void" and "unfailable", with "STATUS
db_xxx_yes_I_can_fail_and_return_an_error_code()" evil twins. I guess this
is why "they" invented C/C++ exceptions. Anyway, something to think about.

Affected files:
src/lazylogger.c
src/odbedit.c
src/mlogger.c
src/mfe.c
src/odb.c
src/mana.c
src/midas.c
src/mhttpd.c

K.O.
    Reply  01 Nov 2003, Stefan Ritt, , more odb  
> I added error checking to the places where we read "/runinfo/run number". In
> general, I do this:

> Affected files:
> src/lazylogger.c
> src/odbedit.c
> src/mlogger.c
> src/mfe.c
> src/odb.c
> src/mana.c
> src/midas.c
> src/mhttpd.c

Now YOU broke the system by editing all these files with something I consider 
temporary debugging code. A run number of zero is *VALID*. If I want to make 
sure a new experiment starts with run number #1, I put a run number of 0 into 
the ODB. So on the first start the number is incremented by one, which results 
in run number one. So please remove those checks which prevent me from 
doing that. Again, your "run number zero" problem is somehow specific to your 
environment, and I would not put all these tests into the distribution, 
because this can have side effects, like the one I described above.

- Stefan
       Reply  01 Nov 2003, Konstantin Olchanski, , more odb  
> > I added error checking to the places where we read "/runinfo/run number". 
> Now YOU broke the system by editing all these files with something I consider 
> temporary debugging code. A run number of zero is *VALID*.

I think I broke nothing. I do know that run number 0 is a valid odb value. Here
is an audit of all places where I abort on invalid run numbers:

mana.c: line 3676: assert(current_run_number > 0);
we take the run number from an event and write it into ODB. Events cannot have
run number negative or zero.

mana.c:analyze_run(): line 4632: assert(run_number > 0);
we are asked to analyze run "run_number". zero or negative is not valid.

midas.c:assert(run_number > old_run_number);
midas.c:assert(run_number > 1);
this code is not in CVS.

odbedit.c: line 2563: assert(old_run_number >= 0);
run number zero is valid

odbedit.c: line 2641: assert(new_run_number > 0);
starting a new run number zero is not valid

mfe.c: line 1786: if (run_number<=0) cm_msg(MERROR, "main", "aborting on attempt
to use invalid run number %d", run_number);
auto restart from run 0 to 1 is not valid

midas.c: line 3917: if (run_number<=0) cm_msg(MERROR, "cm_transition", "aborting
on attempt to use invalid run number %d",run_number);
transition to run zero or negative is not valid

midas.c: line 16101: if (run_number<0) cm_msg(MERROR, "el_submit", "aborting on
attempt to use invalid run number %d", run_number);
negative run numbers are not valid

mlogger.c: line 3301: if (run_number<=0) cm_msg(MERROR, "main", "aborting on
attempt to use invalid run number %d", run_number);
auto restart from run 0 to run 1 is not valid

K.O.
          Reply  14 Nov 2003, Stefan Ritt, , more odb  
Ok, I apologize. It's all ok. Thanks for clarifying. Concerning the asserts, it 
would be nice to be able to disable them in release code. Under Windows, the 
assert() is actually a macro which expands to zero if NDEBUG is defined. I 
believe it's the same under linux, but I don't know about VxWorks. So we have 
three options:

1) Keep asserts always. This might possibly slow down a DAQ system, but I'm not 
sure how much. Might be negligible.

2) Disable asserts by default (standard make). Only the "experts" can enable them 
in the make file (by removing NDEBUG), since only they know what to do with the 
assertion messages.

3) Let the user decide on the standard installation. Maybe have two libraries, 
one debug, one no-debug. The debug one can even have the compiler optimization 
disabled, which makes debugging easier.

So what is your opinion (comments from others are welcome as well) of which way 
to go? 
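
For reference, the mechanism behind option 2) is just the standard <assert.h>
behaviour (a minimal sketch, not MIDAS-specific code):

/* when NDEBUG is defined (e.g. -DNDEBUG in the Makefile), <assert.h> turns
   assert() into a no-op, so the checks vanish from the release build */
#include <assert.h>
#include <stdio.h>

int main(void)
{
   int run_number = 1;

   assert(run_number > 0);   /* aborts in debug builds; compiled out with -DNDEBUG */
   printf("run number %d\n", run_number);
   return 0;
}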
Entry  17 Nov 2003, Pierre-André Amaudruz, , Lazylogger application 
- Remove temporary "/Programs/Lazy" creation.
- Fix Rate calculation for Web display.
- Change FTP channel description (see help).
Entry  15 Nov 2003, Konstantin Olchanski, , Phantom "open records" 
Sometimes (maybe after a client uncleanly exits?), I see phantom "open
records", for example:
[local:twist:Running]Gas>sor
/Equipment/Gas/Common open 2 times by fe1hp 
/Equipment/Gas/Variables open 1 times by Logger 
/Equipment/Gas/Variables/Flow1 open 2 times by uBeamTcl1 uBeamTcl 
/Equipment/Gas/Settings/Command open 2 times by fe1hp 
/Equipment/Gas/Statistics open 1 times by 

Note the blank client name in the "/Equipment/Gas/Statistics" line.

This causes these warnings from mfe.c:
Cannot init equipment record, probably other FE is using it
Cannot delete statistics record, error 320
Cannot create statistics record, error 320
Cannot open statistics record, error 318. Probably other FE is using it

Then the number of generated events for this front end is never incremented.

Also attempts to delete this "open" record fail:
[local:twist:Running]Gas>del /Equipment/Gas/Statistics
Are you sure to delete the key
"/Equipment/Gas/Statistics"
and all its subkeys? (y/[n]) y
key is open by other client

How do I go about writing the db_validate_xxx() code to cleanup this
bogosity? I am not too familiar with the implementation of "open record"...

K.O.
    Reply  16 Nov 2003, Stefan Ritt, , Phantom  
I have seen the same behaviour and it annoys me, too. What I did in the past 
is a "cleanup" in ODBEdit which removes these open records. I have soem code 
in cm_watchdog(), which should take care of that. If a client is dead, it 
gets removed from the ODB, and its open records should get its notify_count 
decremented. So obviously this code has some bug. I plan to do in the 
following week (now I got some spare time) the following:

- replace most db_create_record() calls with something better. Maybe 
db_check_record(..., correct_flag), which creates the record only if it does 
not exist at all, and otherwise checks the structure. If correct_flag is TRUE, it 
corrects the structure (by calling db_create_record()); if it is FALSE, it just 
returns an error code. This way one can decide from case to case which option 
is better. Like for the /Runinfo, the flag would be FALSE, maybe with a 
notification that the /Runinfo is different from the compiled-in structure, 
and one has to recompile the application.

- revisit the open record issue from dying frontends. I remember vaguely that 
I once killed a frontend (kill -9), waited until the watchdog cleaned up its 
entries, and it worked fine. So the problem is more to reproduce the issue 
described in the previous elog entry. 
       Reply  20 Nov 2003, Stefan Ritt, , Phantom  
I tried to reproduce the problem, but without success. So in case this happens 
again, one should debug the code in cm_watchdog() next to the line

/* decrement notify_count for open records and clear exclusive mode */
...

So if a killed client is removed from the ODB via the watchdog (or a "cleanup" 
is done in ODBEdit), the notify_count should be decreased and thus the "open 
records" should be closed.
Entry  20 Nov 2003, Konstantin Olchanski, , set-uid-root midas programs 
I see that MIDAS installs several set-uid-root programs into /usr/local/bin.
In this day and age of evil computer hackers, this is not a good idea and
we should Do Something (TM) about it. Here is my risk assessment:

[olchansk@midtis06 midas]$ ls -l /usr/local/bin | grep wsr
-rwsr-sr-x    1 root     root        25811 Nov 20 09:27 dio
-rwsr-sr-x    1 root     root       344553 Nov 20 09:27 mhttpd
-rwsr-sr-x    1 root     root        70736 Nov 20 09:27 webpaw

dio- is required to be setuid-root to gain I/O permissions. I looked at it a
few times, and it is probably safe, but I would like to get a second
opinion. Stephan, can you show it to your local security geeks?

mhttpd- definitely unsafe. It has more buffer overflows than I can shake a
stick at. Why is it suid-root anyway?

webpaw- what is it?!?

K.O.
    Reply  20 Nov 2003, Stefan Ritt, , set-uid-root midas programs 
> dio- is required to be setuid-root to gain I/O permissions. I looked at it a
> few times, and it is probably safe, but I would like to get a second
> opinion. Stephan, can you show it to your local security geeks?
> 
> mhttpd- definitely unsafe. It has more buffer overflows than I can shake a
> stick at. Why is it suid-root anyway?
> 
> webpaw- what is it?!?

dio was written by Pierre. 

mhttpd and webpaw both are web servers. webpaw is used to display PAW 
pictures over the web. If you run these programs at a port <1024, and most 
people do run them at port 80 (at least at PSI), they need to be setuid-root. 
Unless you know a better way to do that...
Entry  17 Nov 2003, Stefan Ritt, , Revised MVMESTD 
Let me propose a revised scheme for midas standard VME calls (mvmestd.h). 

Pierre mentioned some limitations before, and I now also find some things 
to improve. Right now, the vme_open() call retrieves a handle. For some 
interfaces (like SBS/Bit3), one has to obtain separate handles for 
different addressing modes A24D32/A32D32 and so on, which I find a bit 
troublesome. I would rather keep the handle internally, invisible to the 
user, and use ioctl() statements to change the address/data mode. 

So the API could look like:

vme_open()       Deprecated, will be removed
vme_init(void)   Standard initialization, open device(s), stores handles
                 internally in a table
vme_exit(void)   Deallocates any memory, close handles

vme_read(void *dst, DWORD vme_addr, DWORD size)
vme_write(void *src, DWORD vme_addr, DWORD size)

vme_ioctl(int request, int *param)

                 Request is one of 
                   VME_IOCTL_CRATE_SET/GET
                     Sets VME crate (in case several interfaces are
                     plugged into singlePC, meaningless for embedded CPUs)
                   VME_IOCTL_DEST_SET/GET
                     VME_BUS/VME_RAM/VME_LM for VME bus, RAM in VME 
                     interface, or LM for local memory (used in Bit3 
                     interface)
                   VME_IOCTL_AMOD_SET/GET
                     Sets/Retrieves VME AMOD (= VME_AMOD_xxx as currently
                     defined in mvmestd.h)
                   VME_IOCTL_DSIZE_SET/GET
                     Sets/Retrieves VME data size (D8/D16/D32/D64)
                   VME_IOCTL_DMA_SET/GET
                     Enable/Disable DMA, should be independent of AMOD
                   VME_IOCTL_INTR_ATTACH/DETACH/ENABLE/DISABLE
                     Set VME interrupts
                   VME_IOCTL_AUTO_INCR_SET/GET
                     Set autoincrement of source pointer, can be disabled
                     for FIFO readout

vme_mmap(void **ptr, DWORD vme_addr, DWORD size)
vme_unmap(void *ptr, DWORD size)
                  Map/Unmap VME to local memory

vme_read2(void *dst, DWORD vme_addr, DWORD size, DWORD flags)
vme_write2(void *src, DWORD vme_addr, DWORD size, DWORD flags)
                 With these functions one can directly specify the flags
                 usually managed by vme_ioctl(). Useful for applications
                 where the address modifier, for example, has to be
                 different in each read/write operation.  

Note that the vme_read/write functions do not have a VME handle any more, 
nor an address modifier. This is all accomplished with vme_ioctl() calls.
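
To make this more concrete, a typical access sequence with the proposed calls
could look roughly like this (a sketch only; the AMOD constant name and the
VME address are made-up examples, and none of this is implemented yet):

DWORD data[256];
int   param;

vme_init();                              /* open the default interface        */

param = VME_AMOD_A24;                    /* assumed constant name for A24     */
vme_ioctl(VME_IOCTL_AMOD_SET, &param);

param = 1;                               /* enable DMA for the block transfer */
vme_ioctl(VME_IOCTL_DMA_SET, &param);

vme_read(data, 0x100000, sizeof(data));  /* block read from a made-up address */

vme_exit();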

Please have a look at this proposal, compare it with what you do currently 
in VME, and let me know if we should add/modify something. I volunteer to 
implement the API for the SBS/Bit3 617 and the Struck SIS1100/3100 
interfaces, for VxWorks somebody at TRIUMF should take care.
    Reply  20 Nov 2003, Pierre-André Amaudruz, Konstantin Olchanski, , Revised MVMESTD 
Before we try to merge the different access schemes for the different VME hardware,
we present the "optimal" configuration for the VMIC setup. This is a first shot, so take it
with caution.
From these definitions, we should be able to work out a compromise and come up with
a satisfactory standard.

A) The VMIC vme_slave_xxx() options are not considered.
B) The interrupt handling can certainly match the 4 entries required in the user frontend
    code i.e. Attach, Detach, Enable, Disable.

I don't understand your argument that the handle should be hidden. In case of multiple
interfaces, how do you refer to a particular one if not specified? 
The following scheme does require a handle for referring to the proper (device AND window).

1 ) deviceHandle = vme_init(int devNumber);
    Even though the VMIC doesn't deal with multiple devices,
    the SIS/PCI does and needs to init on a specific PCI card.
    Internally:
      opening of the device (/dev/sisxxxx_1) (ignored in case of VMIC).
      Possibly including a mapping to a default VME region of default size with default AM
      (VMIC: 16MB, A24). This way, in a single call you get a valid handle for full VME access
      in A24 mode. This option needs to be elaborated. But in principle you need to declare the 
     VME region that you want to work on (vme_map).

2) mapHandle = vme_map(int deviceHandle, int vmeAddress, int am, int size);
    Return a mapHandle specific to a device and window. The am has to be specified.
    Whatever the operations needed to get there, the mapHandle is a reference to that setting.
    It could just fill a map structure.
    Internally:
      WindowHandle[deviceHandle] = vme_master_create(BusHandle[deviceHandle], ...
      WindowPtr[WindowHandle] = vme_master_window(BusHandle[deviceHandle]
                                                                           , WindowHandle[deviceHandle]...

3) vme_setmode(mapHandle, const int DATA_SIZE, const int AM
                           , const BOOL ENA_DMA, const BOOL ENA_FIFO);
    Mainly used for the vme_block_read/write. Defines, for the following reads, the data size and 
    AM in case of DMA (could use another DMA mode than the window definition for optimal
    transfer).

    Predefine the mode of access:
    DATA_SIZE : D8, D16, D32
    AM             : A16, A24, A32, etc...
    enaDMA     : optional if available.
    enaFIFO     : optional for block read for autoincrement source pointer.

Remark:
PAA- I can imagine this function to be a vme_ioctl (int mapHandle, int *param)
        such that extension of functionality is possible. But by passing const int
        arguments, the optimizer is able to substitute and reduce the internal code.

4)   
   uint_8Value   = vme_readD8  (int mapHandle, uint_64 vmeSrceOffset)
   uint_16Value = vme_readD16 (int mapHandle, uint_64 vmeSrceOffset)
   uint_32Value = vme_readD32 (int mapHandle, uint_64 vmeSrceOffset)
   Single VME read access. In the VMIC case, this access is always through mapping.
   Value = *(WindowPtr[WindowHandle] + vmeSrceOffset) 
   or 
   Value = *(WindowStruct->WindowPtr[WindowHandle] + vmeSrceOffset) 
 
5)   
   status  = vme_writeD8   (int mapHandle, uint_64 vmeSrceOffset, uint_8 Value)
   status  = vme_writeD16 (int mapHandle, uint_64 vmeSrceOffset, uint_16 Value)
   status  = vme_writeD32 (int mapHandle, uint_64 vmeSrceOffset, uint_32 Value)
   Single VME write access.

6)
   nBytes = vme_block_read(mapHandle, char * pDest, uint_64 vmeSrceOffset, int size);
   Multiple read access. Can be done through standard do loop or DMA if available.
   nBytes < 0 :  error
   Incremented pDest  = (pDest + nBytes); Don't need to pass **pDest for autoincrement.

7)
   nBytes = vme_block_write(mapHandle, uint_64 vmeSrceOffset, char *pSrce, int size);
   Multiple write access.
   nBytes < 0 :  error
   Incremented pSrce  = (pSrce + nBytes); Don't need to pass **pSrce for autoincrement.

8) status = vme_unmap(int mapHandle)
   Cleanup internal pointers or structure of given mapHandle only.

9) status = vme_exit()
   Cleanup deviceHandle and release device.
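
Put together, a readout with this proposal might look roughly like the
following (a sketch only; the device number, VME address, window size and AM
constant are made-up examples):

int     deviceHandle, mapHandle, status, nBytes;
char    buffer[4096];
uint_32 value;

deviceHandle = vme_init(0);                              /* first interface      */
mapHandle    = vme_map(deviceHandle, 0x100000, A24, 0x10000);

value = vme_readD32(mapHandle, 0x0);                     /* single 32-bit read   */

vme_setmode(mapHandle, D32, A24, TRUE, FALSE);           /* DMA block transfer   */
nBytes = vme_block_read(mapHandle, buffer, 0x0, sizeof(buffer));

status = vme_unmap(mapHandle);
status = vme_exit();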
       Reply  21 Nov 2003, Stefan Ritt, , Revised MVMESTD 
Thanks for your contribution. Let me try to map your functionality to mvmestd calls:

> A) The VMIC vme_slave_xxx() options are not considered.

We could maybe do that through mvme_mmap(SLAVE, ...) instead of mvme_mmap(MASTER, ...)

> B) The interrupt handling can certainly match the 4 entries required in the user frontend
>     code i.e. Attach, Detach, Enable, Disable.

mvme_ioctl(VME_IOCTL_INTR_ATTACH/DETACH/ENABLE/DISABLE, func())

> I don't understand your argument that the handle should be hidden. In case of multiple
> interfaces, how do you refer to a particular one if not specified? 
> The following scheme does require a handle for refering to the proper (device AND window).

Four reasons for that:

1) For the SBS/Bit3, you need a handle for each address mode. So if I have two crates (and I do in our 
current experiment), and have to access modules in A16, A24 and A32 mode, I need in total 6 handles. 
Sometimes I mix them up by mistake, and wonder why I get bus errors. 

2) Most installations will only have single crates (as your VMIC). So if there is only one crate, why 
bother with a handle? If you have hundreds of accesses in your code, you save some redundant typing work.

3) A handle is usually kept global, which is considered not good coding style.

4) Our MCSTD and MFBSTD functions also do not use a handle, so people used to those libraries will find it 
more natural not to use one.

> 1 ) deviceHandle = vme_init(int devNumber);
>     Even though the VMIC doesn't deal with multiple devices,
>     the SIS/PCI does and needs to init on a specific PCI card.
>     Internally:
>       opening of the device (/dev/sisxxxx_1) (ignored in case of VMIC).
>       Possible including a mapping to a default VME region of default size with default AM
>       (VMIC :16MB, A24). This way in a single call you get a valid handle for full VME access
>       in A24 mode. Needs to be elaborate this option. But in principle you need to declare the 
>      VME region that you want to work on (vme_map).

Just vme_init(); (like fb_init()).

This function takes the first device, opens it, and stores the handle internally. Sets the AM to a default 
value, and creates a mapping table which is initially empty or mapped to a default VME region. If one wants 
to access a secondary crate, one does a vme_ioctl(VME_IOCTL_CRATE_SET, 2), which opens the secondary crate, 
and stores the new handle in the internal table if applicable.

> 2) mapHandle = vme_map(int deviceHandle, int vmeAddress, int am, int size);
>     Return a mapHandle specific to a device and window. The am has to be specified.
>     What ever are the operation to get there, the mapHandle is a reference to thas setting.
>     It could just fill a map structure.
>     Internally:
>       WindowHandle[deviceHandle] = vme_master_create(BusHandle[deviceHandle], ...
>       WindowPtr[WindowHandle] = vme_master_window(BusHandle[deviceHandle]
>                               , WindowHandle[deviceHandle]...

The best would be if a mvme_read(...) to an unmapped region would automatically (internally) trigger a 
vme_map() call, and store the WindowHandle and WindowPtr internally. The advantage of this is that code 
written for the SIS for example (which does not require this kind of mapping) would work without change 
under the VMIC. The disadvantage is that for each mvme_read(), the code has to scan the internal mapping 
table to find the proper window handle. Now I don't know how much overhead this would be, but I guess a 
single for() loop over a couple of entries in the mapping table is still faster than a microsecond or so, 
thus making it negligible in a block transfer. 

> 3) vme_setmode(mapHandle, const int DATA_SIZE, const int AM
>                            , const BOOL ENA_DMA, const BOOL ENA_FIFO);
>     Mainly used for the vme_block_read/write. Define for following read the data size and 
>     am in case of DMA (could use orther DMA mode than window definition for optimal
>     transfer).
> 
>     Predefine the mode of access:
>     DATA_SIZE : D8, D16, D32
>     AM             : A16, A24, A32, etc...
>     enaDMA     : optional if available.
>     enaFIFO     : optional for block read for autoincrement source pointer.
> 
> Remark:
> PAA- I can imagine this function to be a vme_ioctl (int mapHandle, int *param)
>         such that extension of functionality is possible. But by passing cons int
>         arguments, the optimizer is able to substitute and reduce the internal code.

Right. mvme_ioctl(VME_IOCTL_AMOD_SET/DSIZE_SET/DMA_SET/AUTO_INCR_SET, ...)

>    uint_8Value   = vme_readD8  (int mapHandle, uint_64 vmeSrceOffset)
>    uint_16Value = vme_readD16 (int mapHandle, uint_64 vmeSrceOffset)
>    uint_32Value = vme_readD32 (int mapHandle, uint_64 vmeSrceOffset)
>    Single VME read access. In the VMIC case, this access is always through mapping.
>    Value = *(WindowPtr[WindowHandle] + vmeSrceOffset) 
>    or 
>    Value = *(WindowStruct->WindowPtr[WindowHandle] + vmeSrceOffset) 

mvme_read(*dst, DWORD vme_addr, DWORD size); would cover this in a single call. Note that the SIS for 
example does not have memory mapping, so if one consistently uses mvme_read(), it will work on both 
architectures. Again, this takes some overhead. Consider for example a possible VMIC implementation

mvme_read(char *dst, DWORD vme_addr, DWORD size)
{
  /* look for an existing window that already contains the requested range */
  for (i=0 ; table[i].valid ; i++)
    {
    if (table[i].start <= vme_addr && vme_addr+size <= table[i].end)
      break;
    }

  /* no matching window found: create one on the fly and remember it */
  if (!table[i].valid)
    {
    vme_master_create(...)
    table[i].window_handle = vme_master_window(...)
    }

  /* adjust the transfer data size for sub-word accesses */
  if (size == 2)
    mvme_ioctl(VME_IOCTL_DSIZE_SET, D16);
  else if (size == 1)
    mvme_ioctl(VME_IOCTL_DSIZE_SET, D8);

  /* copy from the mapped window into the destination buffer */
  memcpy(dst, table[i].window_handle + vme_addr - table[i].start, size);
}

Note this is only some rough code, would need more checking etc. But you see that for each access the for() 
loop has to be evaluated. Now I know that for the SBS/Bit3 and for the SIS a single VME access takes 
~0.5us. So the for() loop could be much faster than that. But one has to try. If one experiment needs the 
ultimate speed, it can use the native VMIC API, but then looses the portability. I'm not sure if one needs 
the automatic DSIZE_SET, maybe it works without.

>    status  = vme_writeD8   (int mapHandle, uint_64 vmeSrceOffset, uint_8 Value)
>    status  = vme_writeD16 (int mapHandle, uint_64 vmeSrceOffset, uint_16 Value)
>    status  = vme_writeD32 (int mapHandle, uint_64 vmeSrceOffset, uint_32 Value)
>    Single VME write access.

Ditto. mvme_write(void *src, DWORD vme_addr, DWORD size);

>    nBytes = vme_block_read(mapHandle, char * pDest, uint_64 vmeSrceOffset, int size);
>    Multiple read access. Can be done through standard do loop or DMA if available.
>    nBytes < 0 :  error
>    Incremented pDest  = (pDest + nBytes); Don't need to pass **pDest for autoincrement.

mvme_ioctl(VME_IOCTL_DMA_SET, TRUE);
n = mvme_read(char *pDest, DWORD vmd_addr, DWORD size);

>    nBytes = vme_block_write(mapHandle, uint_64 vmeSrceOffset, char *pSrce, int size);
>    Multiple write access.
>    nBytes < 0 :  error
>    Incremented pSrce  = (pSrce + nBytes); Don't need to pass **pSrce for autoincrement.

Ditto.

> 8) status = vme_unmap(int mapHandle)
>    Cleanup internal pointers or structure of given mapHandle only.

mvme_unmap(DWORD vme_addr, DWORD size)

Scan through internal table to find handle, then calls vme_unmap(mapHandle);

> 9) status = vme_exit()
>    Cleanup deviceHandle and release device.

mvme_exit();

Let me know if this all makes sense to you...

- Stefan
Entry  20 Nov 2003, Konstantin Olchanski, , midas timeout wraparound 
While reviving midas on midtig01 after it was not used for a while, we see
this. Notice negative "last called" numbers. Looks like a time_t wraparound
somewhere...

[local:tigress:S]/>scl -w
Name                Host                Timeout    Last called
mhttpd              midtig01.triumf.ca  10000      -2037131082
Logger              midtig01.triumf.ca  10000      -2037131166
Analyzer            midtig01.triumf.ca  10000      -2037131048
JACQ                midtig01.triumf.ca  10000      -2037131667
mhttpd1             midtig01.triumf.ca  10000      325
ODBEdit             midtig01.triumf.ca  10000      829

K.O.
    Reply  20 Nov 2003, Konstantin Olchanski, , cannot shutdown defunct clients 
> While reviving midas on midtig01 after it was not used for a while ... 
> [local:tigress:S]/>scl -w
> Name                Host                Timeout    Last called
> mhttpd              midtig01.triumf.ca  10000      -2037131082

These clients cannot be deleted. I tried:
1) shutdown from mhttpd "programs" page -> "cannot shutdown client"
2) "sh mhttpd" from odbedit -> 
   [midas.c:5298:cm_shutdown] cannot connect to client mhttpd on host
   midtig01.triumf.ca, port 32853
   Client mhttpd not active
3) in odbedit: "cd /system/clients; rm xxxx"
   refuses to delete the key

Lacking any better ideas, I deleted them via brain surgery on the odb file:
1) stop everything
2) ipcrm the SYSV shared memory segment
3) odbedit -> save xxx.odb
4) xemacs xxx.odb, delete offending odb entries
5) rm .ODB.SHM
6) odbedit -> load xxx.odb
7) voila, bad clients gone, gone, gone.

K.O.
       Reply  20 Nov 2003, Stefan Ritt, , cannot shutdown defunct clients 
> 1) shutdown from mhttpd "programs" page -> "cannot shutdown client"
> 2) "sh mhttpd" from odbedit -> 
>    [midas.c:5298:cm_shutdown] cannot connect to client mhttpd on host
>    midtig01.triumf.ca, port 32853
>    Client mhttpd not active
> 3) in odbedit: "cd /system/clients; rm xxxx"
>    refuses to delete the key

Have you tried a "cleanup" in ODBEdit?

The "last_activity" is a 32-bit int, filled with milliseconds. So indeed it 
wraps around after about one month. So if all clients are stopped 
simultaneously the hard way (such that nobody's watchdog can clean any other 
client from the ODB), like with a power off, and you start the thing one 
month later, there might be a problem. I never tried that before. So next 
time try a cleanup. If that does not help, we should change last_activity 
from INT to DWORD. This way it's always positive and the wraparound does not 
hurt.
          Reply  20 Nov 2003, Konstantin Olchanski, , cannot shutdown defunct clients 
> > 1) shutdown from mhttpd "programs" page -> "cannot shutdown client"
> Have you tried a "cleanup" in ODBEdit?

Nope. Will try next time...

> The "last_activity" is a 32-bit int, filled with milliseconds. So indeed it 
> wraps around after about one month.... change last_activity 
> from INT to DWORD. This way it's alway positive and the wraparound does not 
> hurt.

INT == "int", wraparound in 1 month
DWORD == "unsigned int", wraparound in 2 months

should we make it the 64-bit "long long" (or C99's "int64_t")?

K.O.
             Reply  20 Nov 2003, Stefan Ritt, , cannot shutdown defunct clients 
> INT == "int", wraparound in 1 month
> DWORD == "unsigned int", wraparound in 2 months
> 
> should we make it the 64-bit "long long" (or C98's "int64_t")?

Won't work on all supported compilers. The point is that DWORD wraps around in 
2 months, but the difference of two DWORDs is always positive, never negative 
like you had it. We only have to check whether the difference of the current 
time (in ms) minus the last_activity of a client is larger than the timeout, 
typically 10 seconds or so. If you have a wraparound on a 32-bit DWORD, the 
difference is still ok. Like

current "time" : 0x0000 0100
last_activity:   0xFFFF FF00

then current_time - last_activity = 0x00000100 - 0xFFFFFF00 = 0x00000200 if 
calculated with 32-bit values.
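
In C this amounts to nothing more than (a short sketch, not MIDAS code):

unsigned int last = 0xFFFFFF00u;    /* just before the wraparound */
unsigned int now  = 0x00000100u;    /* just after the wraparound  */
unsigned int elapsed = now - last;  /* == 0x200 = 512 ms, as above */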
             Reply  20 Nov 2003, Renee Poutissou, , cannot shutdown defunct clients 
Indeed the ODB command "cleanup" really works. I have used it several
times with the TWIST DAQ and regularly with the BNMR/MUSR setups where
we have these stubborn clients (i.e. feepics) that do not want to shut down
cleanly.  
But there is one problem with "cleanup". It has a hardwired timeout of
2 seconds.  This is a problem for tasks like lazylogger which set a timeout
of 60 seconds when moving the tape. So BEWARE, if you issue the "cleanup"
command, it might kill some clients who have setup their timeout to longer
than 2 seconds. 

I have asked Stefan to change this before. He said that, to be effective,
the timeout value used for "cleanup" has to be rather short. 
One possibility, would be to allow for a user entered "cleanup" timeout.
The default could stay at 2 seconds. 




> > Have you tried a "cleanup" in ODBEdit?
> 
> Nope. Will try next time...
> 
                Reply  24 Nov 2003, Stefan Ritt, , cannot shutdown defunct clients 
> But there is one problem with "cleanup". It has a hardwired timeout of
> 2 seconds.  This is a problem for tasks like lazylogger which set a timeout
> of 60 seconds when moving the tape. So BEWARE, if you issue the "cleanup"
> command, it might kill some clients who have setup their timeout to longer
> than 2 seconds. 
> 
> I have asked Stefan to change this before. He said that, to be effective,
> the timeout value used for "cleanup" has to be rather short. 
> One possibility, would be to allow for a user entered "cleanup" timeout.
> The default could stay at 2 seconds. 

I changed the behaviour of cleanup by adding an extra parameter 
ignore_timeout to cm_cleanup(). Now, in ODBEdit, a "cleanup" obeys the 
timeout set by the clients. The problem with that is that if the logger crashes, 
for example, and its timeout is set to 5 min., it cannot be cleaned up any 
more for the next five minutes, and therefore not be restarted, wasting 
precious beam time. That's why I originally hard-wired the "cleanup" timeout 
to 2 sec. Now I added a flag "-f" to the ODBEdit cleanup command which works 
in the old fashion with a 2 sec. timeout. So a "cleanup" alone won't kill a 
logger which is currently rewinding a tape or so, but a "cleanup -f" does.

I also changed internal timeouts from INT to DWORD, which should fix the 
problem Konstantin reported recently (re-starting an experiment after 
several weeks). New changes are committed, but I only did basic tests. So 
please try the new code and tell me if there is any problem.

- Stefan
Entry  30 Nov 2003, Konstantin Olchanski, , bad call to cm_cleanup() in fal.c 
fal.c does not compile: it calls cm_cleanup() with one argument when there
should be two arguments. K.O.
    Reply  30 Nov 2003, Stefan Ritt, , bad call to cm_cleanup() in fal.c 
> fal.c does not compile: it calls cm_cleanup() with one argument when there
> should be two arguments. K.O.

Fixed and committed.
Entry  25 Nov 2003, Suzannah Daviel, , delete key followed by create record leads to empty structure in experim.h 
Hi,

I have noticed a problem with deleting a key to an array in odb, then
recreating the record as in the code below. The record is recreated
successfully, but when viewing it with mhttpd, a spurious blank line
(coloured orange) is visible, followed by the rest of the data as normal.

This blank line causes trouble with experim.h because it
produces an empty structure e.g. :

#define CYCLE_SCALERS_SETTINGS_DEFINED

typedef struct {
  struct {
  } ;
  char      names[60][32];
} CYCLE_SCALERS_SETTINGS;

rather than :

#define CYCLE_SCALERS_SETTINGS_DEFINED

typedef struct {
  char      names[60][32];
} CYCLE_SCALERS_SETTINGS;


This empty structure causes a compilation error when rebuilding clients that
use experim.h

SD



 CYCLE_SCALERS_TYPE1_SETTINGS_STR(type1_str);
 CYCLE_SCALERS_TYPE2_SETTINGS_STR(type2_str);

Both type1_str and type2_str have been defined as in
experim.h
i.e.
#define CYCLE_SCALERS_TYPE1_SETTINGS_STR(_name) char *_name[] = {\
"[.]",\
"Names = STRING[60] :",\
"[32] Back%BSeg00",\
"[32] Back%BSeg01",\
 ........
 ........
"[32] General%NeutBm Cycle Sum",\
"[32] General%NeutBm Cycle Asym",\
"",\
NULL }

#define CYCLE_SCALERS_TYPE2_SETTINGS_STR(_name) char *_name[] = {\
"[.]",\
"Names = STRING[60] :",\
"[32] Back%BSeg00",\
"[32] Back%BSeg01",\
...........
............
"[32] General%B/F Cumul -",\
"[32] General%Asym Cumul -",\
"",\
NULL }

  if (db_find_key(hDB, 0, "/Equipment/Cycle_scalers/Settings/",&hKey) ==
DB_SUCCESS)
    db_delete_key(hDB,hKey,FALSE);
          
  if (  strncmp(fs.input.experiment_name,"1",1) == 0) {
      exp_mode = 1; /* Imusr type - scans */
      status =
db_create_record(hDB,0,"/Equipment/Cycle_scalers/Settings/",strcomb(type1_str));
    }
  else {
    exp_mode = 2; /* TDmusr types - noscans */
    status =
db_create_record(hDB,0,"/Equipment/Cycle_scalers/Settings/",strcomb(type2_str));
  }
    Reply  01 Dec 2003, Stefan Ritt, , delete key followed by create record leads to empty structure in experim.h 
> I have noticed a problem with deleting a key to an array in odb, then
> recreating the record as in the code below. The record is recreated
> successfully, but when viewing it with mhttpd, a spurious blank line
> (coloured orange) is visible, followed by the rest of the data as normal.
> 
> db_create_record(hDB,0,"/Equipment/Cycle_scalers/Settings/",strcomb(type1_str));
>     }
>   else {
>     exp_mode = 2; /* TDmusr types - noscans */
>     status =
> db_create_record(hDB,0,"/Equipment/Cycle_scalers/Settings/",strcomb(type2_str));
>   }

The first problem is that the db_create_record has a trailing "/" in the key name 
after Settings. This causes the (empty) subdirectory which causes your trouble. 
Simply removing it fixes the problem. I agree that this is not obvious, so I 
added some code in db_create_record() which removes such a trailing slash if 
present. New version under CVS.

Second, the db_create_record() call is deprecated. You should use the new 
function db_check_record() instead, and remove your db_delete_key(). This avoids 
possible ODB trouble since the structure is not re-created each time, but only 
when necessary.
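
For the example above, the change would then look roughly like this (a sketch;
only the type1 branch is shown, using the key name from the original post):

/* no db_delete_key(), no trailing slash; correct = TRUE rewrites the
   structure only when it does not match */
status = db_check_record(hDB, 0, "/Equipment/Cycle_scalers/Settings",
                         strcomb(type1_str), TRUE);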

- Stefan
Entry  20 Nov 2003, Stefan Ritt, , Implementation of db_check_record() 
As Konstantin pointed out correctly, the db_create_record() call is pretty 
heavy since it copies whole structures around the ODB. Therefore, it 
should not be used frequently. It might be that several problems are caused 
by that, for example the "phantom" records reported in elog:40.

I have therefore implemented the function 

db_check_record(HNDLE hDB, HNDLE hKey, char *keyname, char *rec_str, 
                BOOL correct)

which takes an ASCII structure in the same way as db_create_record(), but 
only checks this ASCII structure against the ODB contents without writing 
anything to the ODB. 

If the record does not exist at all, it is created via db_create_record(). 
This is useful for example with the /Runinfo structure on a virgin ODB.

If the parameter "correct" is FALSE, the function returns 
DB_STRUCT_MISMATCH if the ODB contents are wrong (wrong order of variables, 
wrong variable names, wrong type or array size). The calling function 
should then abort, since a subsequent db_open_record() would fail. Note 
that although abort() is useful, one should add cm_disconnect_experiment() 
just before the abort() in order to have the application "log out" from 
the ODB gracefully. If the parameter "correct" is TRUE, the function 
db_create_record() is called internally to correct a mismatching record.
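
The recommended calling pattern would then look roughly like this (a minimal
sketch; the key name and structure are just examples):

/* check /Runinfo against the compiled-in structure without rewriting it,
   and bail out cleanly on a mismatch */
status = db_check_record(hDB, 0, "/Runinfo", strcomb(runinfo_str), FALSE);
if (status == DB_STRUCT_MISMATCH) {
   cm_msg(MERROR, "main", "/Runinfo does not match the compiled-in structure");
   cm_disconnect_experiment();    /* "log out" from the ODB gracefully */
   abort();
}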

I have changed most calls of db_create_record() in mhttpd.c, mfe.c, mana.c 
and mlogger.c. Pierre, could you do the same for lazylogger.c?

I also started to put assert()'s everywhere and encourage everyone to 
follow. Under Windows, the asserts() are removed automatically if 
compiling in "Release" mode.

So I committed many changes, did some quick tests, but am not 100% 
convinced that all the changes are good. So please use the new code 
cautiously, and let me know if there is any new problem. I also would like 
to get some feedback if the whole thing becomes more stable now.
    Reply  27 Nov 2003, Konstantin Olchanski, , Implementation of db_check_record() 
> I have therefore implemented the function 
> db_check_record(HNDLE hDB, HNDLE hKey, char *keyname, char *rec_str, BOOL
correct)

Stephan, something is very wrong with the new code. My
"/logger/channels/0/settings" is being destroyed on "begin run". Midas
checkout from October 31st is okay. This is a show stopper, but I am in a rush
and cannot debug it. I am falling back to the Oct 31st version... K.O.
       Reply  30 Nov 2003, Konstantin Olchanski, , Implementation of db_check_record() 
> > I have therefore implemented the function 
> > db_check_record(HNDLE hDB, HNDLE hKey, char *keyname, char *rec_str, BOOL
> correct)
> 
> Stephan, something is very wrong with the new code. My
> "/logger/channels/0/settings" is being destroyed on "begin run".

Okay. I found the problem in db_check_record(): when we decide that we have a
mismatch, we call db_create_record(...,rec_str), but by this time, rec_str no
longer points to the beginning of the ODB string because we started parsing it.

I tried this solution: save rec_str into rec_str_orig, then when we decide that
we have a mismatch, call db_create_record() with this saved rec_str_orig. It
fixes my immediate problem (destruction of "/logger/channels/0/settings"), but is
it correct?

I would like to fix it ASAP to get cvs-head working again: our mhttpd dumps core
on an assert() failure in db_create_record() and the set of db_check_record()
changes might fix it for me.

Here is the CVS diff:

RCS file: /usr/local/cvsroot/midas/src/odb.c,v
retrieving revision 1.73
diff -r1.73 odb.c
7810a7811
> char             *rec_str_orig = rec_str;
7820c7821
<     return db_create_record(hDB, hKey, keyname, rec_str);
---
>     return db_create_record(hDB, hKey, keyname, rec_str_orig);
7838c7839
<       return db_create_record(hDB, hKey, keyname, rec_str);
---
>       return db_create_record(hDB, hKey, keyname, rec_str_orig);
8023c8024
<               return db_create_record(hDB, hKey, keyname, rec_str);
---
>               return db_create_record(hDB, hKey, keyname, rec_str_orig);
8037c8038
<               return db_create_record(hDB, hKey, keyname, rec_str);
---
>               return db_create_record(hDB, hKey, keyname, rec_str_orig);

K.O.
          Reply  30 Nov 2003, Stefan Ritt, , Implementation of db_check_record() 
Fixed and committed. Can you check if it's working?
             Reply  01 Dec 2003, Konstantin Olchanski, , Implementation of db_check_record() 
> Fixed and committed. Can you check if it's working?
Yes, it is fixed. Thanks. K.O.
Entry  09 Dec 2003, Paul Knowles, , db_close_record non-local/non-return 
Hi All,

I have found a weird one:

The following code executes on the frontend machine in the
frontend_exit() routine, and connects to the odb running on
another separate machine:
...
     cm_msg(MINFO,__func__, "line %d", __LINE__);

     cm_get_experiment_database(&hdb, NULL);

     cm_msg(MINFO,__func__, "line %d", __LINE__);
     status = db_find_key(hdb, 0, "/Experiment/Run Parameters", &hkey);
     cm_msg(MINFO,__func__, "line %d, hkey=%d, status=%d",
            __LINE__, hkey, status);
     checkstat("db_find_key returned status %d", status);
     cm_msg(MINFO,__func__, "line %d", __LINE__);
     status = db_close_record(hdb, hkey);

     /* NOTREACHED!! the above call to db_close_record
        doesn't return!
      */
     cm_msg(MINFO,__func__, "line %d, status=%d", __LINE__, status);
     checkstat("db_close_record returned status %d", status);

checkstat is a macro that does the following:
#define checkstat(format, arg...)\
do{ if(status != DB_SUCCESS) {\
cm_msg(MERROR, __func__, format, ## arg);\
return FE_ERR_ODB;}}while(0)

The key exists, and the status of the search is 1
(i.e., DB_SUCCESS) and the rest of the code tries to run.  What gets
really weird is that the db_close_record _doesn't_ _return_.
The code following the NOTREACHED comment just doesn't get
called.  I get the message from the __LINE__ just in front
of the call, but not the message afterwards (cm_msg and printf 
were tried).  Somehow db_close_record is causing a non-local 
exit or signal or something. No error message is printed and the 
frontend continues to exit with exit code 0.  But, since the rest
of my frontend_exit/odb closing doesn't happen, the odb is left in
a lost state requiring a cleanup.  If I comment out the calls to 
db_close_record, the rest of my frontend_exit runs normally 
and the cm_disconnect_experiment() in mfe.c eventually closes my 
open records correctly (I expect, anyway) and this is the present 
workaround I am using.  The terror I have is that several of my 
hotlinked callback routines will call the close_record routine 
when resetting illegal values.  No end of hilarity will result there...

I was using the same code in the frontend under 1.9.2 and
have only recently upgraded to 1.9.3-? tarball from PAA and 
there were no problems using the 1.9.2 code: this is a 1.9.3
issue.

I have localized the weirdness to what I think is the RPC interface.
Running the nullfrontend (no camac access) on the same machine that 
hosts the ODB, I can make the problem appear and disappear in the 
following way:
(odb is local on machine ``monet'')

nullfe -h monet -e acqmonad     : db_close_record will get lost

nullfe -e acqmonad              : db_close_record works as expected.

I've tried also with the patch for the 256 byte odb string bug since
many of the open records have strings of that length, but that isn't
it. The only substantial-looking change to mserver from 1.9.2 to 1.9.3
is the SIGPIPE ignore, and that doesn't look like a good candidate either.
Could it be that some of the 
   #ifdef LOCAL_ROUTINES
blocks that got moved about in odb.c and others
are causing the remote call to get confused?

Clearly the answer is to just use stable and happy 1.9.2, but the 
people for whom I am working now really want to use ROOT for
an analyzer...


cheers,
.p.

Paul Knowles.                   phone: 41 26 300 90 64
email: Paul.Knowles@unifr.ch      Fax: 41 26 300 97 47
finger me at pexppc33.unifr.ch for more contact information
    Reply  12 Dec 2003, Stefan Ritt, , db_close_record non-local/non-return 
Hi Paul,

sorry for my late reply, I had to find some time for debugging your problem. 
Thank you very much for the detailed description of the problem, I wish all 
bug reports would be such elaborate!

You were right that there was a bug in the RPC system. The function 
db_remove_open_record() got a new parameter recently, which was not changed 
in the RPC call, and caused the mserver side to crash on any 
db_close_record() call.

I fixed it and the update is under CVS (http://midas.psi.ch/cgi-
bin/cvsweb/midas/src/). Since you need to update many files, I wonder if I 
should enable anonymous CVS read access. Does anybody know how to set this 
up using "ssh" as the protocol (via CVS_RSH=ssh)?

Please note that db_close_record() is not necessary as 
cm_disconnect_experiment() takes care of this, but having it there does not 
hurt.
Entry  12 Dec 2003, Stefan Ritt, , Several small fixes and changes 
I committed several small fixes and changes:

- install.txt which now explicitly mentions ROOT
- mana.c and the main Makefile which fixes all HBOOK compiler warnings
- mana.c to write an explicit warning if the experiment directory contains 
uppercase letters in the path (HBOOK does not like this and refuses to 
read/write histos)
- mserver.c, mrpc.c, odb.c to fix a wrong parameter in 
db_remove_open_record() (see previous entry from Paul)
- added experim.h into the dependency of the hbookexpt Makefile
Entry  15 Dec 2003, Pierre-André Amaudruz, , ROOT GUI at Triumf 
Since the second quarter of this year (2003), the current Triumf DAQ standard
(Midas) has had the capability to deal with ROOT histograms. The internal
midas logger can save data files in ROOT format and the analyzer can book
and fill ROOT histograms. These features triggered a new project started
during summer 2003 for building a Triumf GUI ROOT/Midas display utility.

The initial requirements for this utility are:
1) Solely based on ROOT (VirtualX, no Qt)
2) Similar overall functionality than PAW.
   - Open concurrent ROOT files.
   - Open connection to a single Midas Online experiment (requires analyzer
                                                          as server)
   - Optional Auto-update in ONLINE mode.
   - Zoning, Zooming option display.
   - Simple histogram graphic manipulation (based on current ROOT
                                             implementation)
   - Tree manipulation (use of TBrowser())
   - Simple user script invocation.
   - Optional experiment specific customization.
3) Session configuration save/restore option.

An initial version has been developed and currently is under evaluation.
Improvement and further development will be based on the local experimenters'
responses. 

This utility will be available for external use around the second quarter of
2004 at the latest.

Entry  11 Aug 2003, Konstantin Olchanski, , Alarm on no ping? 
I want midas alarms to go off when I cannot ping arbitrary remote hosts. Is
there an easy/preferred way to do this? K.O.
    Reply  18 Dec 2003, Stefan Ritt, , Alarm on no ping? 
> I want midas alarms to go off when I cannot ping arbitrary remote hosts. Is
> there an easy/preferred way to do this? K.O.

There are "internal alarms" with type AT_EVALUATED. Just find a program 
where you can put some code which gets periodically executed (like the idle 
loop in the frontend), and so something like:

DWORD last = 0;
INT   status;
char  str[256];

  if (ss_time() > last+60)
    {
    last = ss_time();

    /* do a ping via socket(), bind() and connect(); set status and str accordingly */
    ...

    if (status != CM_SUCCESS)
      al_trigger_alarm("XYZ Ping", str, "Warning", 
                       "Host is dead", AT_INTERNAL);
    }

Pierre does the same thing in lazylogger.c, just have a look. I don't know 
how to do a ping correctly in C; I guess you have to send a UDP packet 
somewhere, but I never did it. If you find out, please post it.
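
As a follow-up sketch only: a real ICMP ping needs raw sockets and root privileges, so one
assumption-laden alternative is to check whether the host accepts a TCP connection on a known
port (sshd on port 22, for example). Host name and port below are made-up examples, and the
blocking connect() can hang for a while on a dead host; a real implementation would use a
non-blocking connect() with a select() timeout.

   /* "poor man's ping": TCP connect test for host reachability */
   #include <string.h>
   #include <unistd.h>
   #include <sys/socket.h>
   #include <netinet/in.h>
   #include <arpa/inet.h>
   #include <netdb.h>

   /* returns 1 if the host accepts a TCP connection on the given port, else 0 */
   int host_alive(const char *host, int port)
   {
      struct hostent *he = gethostbyname(host);
      struct sockaddr_in addr;
      int sock, ok = 0;

      if (he == NULL)
         return 0;

      sock = socket(AF_INET, SOCK_STREAM, 0);
      if (sock < 0)
         return 0;

      memset(&addr, 0, sizeof(addr));
      addr.sin_family = AF_INET;
      addr.sin_port = htons((unsigned short) port);
      memcpy(&addr.sin_addr, he->h_addr_list[0], he->h_length);

      /* blocking connect: fine for a sketch, use a timeout in practice */
      if (connect(sock, (struct sockaddr *) &addr, sizeof(addr)) == 0)
         ok = 1;

      close(sock);
      return ok;
   }

In the idle loop one could then call, with an obviously hypothetical host name,

   if (!host_alive("somehost.example.org", 22))
      al_trigger_alarm("XYZ Ping", "somehost unreachable", "Warning",
                       "Host is dead", AT_INTERNAL);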


 
Entry  15 Dec 2003, Stefan Ritt, , Poll about default indent style  
Dear all,

there are continuing requests about the C indent style we use in midas. As 
you know, the current style does not comply with any standard. It is even 
a mixture of styles since code comes from different people. To fix this 
once and forever, I am considering using the "indent" program which comes 
with every linux installation. Running indent regularly on all our code 
ensures a consistent look. So I propose (actually the idea came from Paul 
Knowles) to put a new section in the midas makefile:

indent:
        find . -name "*.[hc]" -exec indent <flags> {} \;

so one can easily do a "make indent". The question is now what the <flags> 
should look like. The standard is GNU style, but this deviates from the 
original K&R style such that the opening "{" is put on a new line, which I 
use but most of you do not. The "-kr" style does the standard K&R style, 
but uses tabs (which is not good), and does a 4-column indentation which 
I think is too much. So I would propose the following flags:

indent -kr -nut -i2 -di8 -bad <filename.c>

Please take some of your source code, and format it this way, and let me 
know if these flags are a good combination or if you would like to have 
anything changed. It should also be checked (->PAA) that this style 
complies with the DOC++ system. Once we all agree, I can put it into the 
makefile, execute it and commit the newly formatted code for the whole 
source tree.
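
To give a feel for the proposal, a small function run through these flags should come out
roughly like the snippet below (a hand-formatted approximation, so an actual indent run may
differ in detail): K&R braces, spaces instead of tabs (-nut), 2-column indentation (-i2),
declaration identifiers lined up (-di8) and a blank line after the declarations (-bad).

   int sample_sum(int n)
   {
     int     i;
     int     sum = 0;

     for (i = 0; i < n; i++) {
       if (i % 2 == 0)
         sum += i;
     }
     return sum;
   }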
    Reply  18 Dec 2003, Paul Knowles, , Poll about default indent style  
Hi Stefan,

> once and forever, I am considering using the "indent" program which comes 
> with every linux installation. Running indent regularly on all our code 
> ensures a consistent look.

I think this can be called a Good Thing.

> The "-kr" style does the standard K&R style, 
> but used tabs (which is not good), and does a 4-column 
> indention which is I think too much. So I would propose 
> following flags:
>        indent -kr -nut -i2 -di8 -bad <filename.c>

(some of this is a repeat from an earlier mail to SR):
You might also want a -l90 for a longer line length than 75
characters.  K&R style with indentation from 5 to 8 spaces
is a good indicator of complexity: as soon as 40 characters
of code wind up unreadably squashed to the right of the
screen, you have to refactor to have fewer indentation
levels.  This means you wind up rolling up the inner parts
of deeply nested conditionals or loops as separate
functions, making the whole code easier to understand.

I think that setting -i2 is ``going around the problem'' 
of deep nesting.  If you really need to keep the indentation 
tabs less than 4 (8 is ideal) because your code is falling off the 
right edge of the screen, you are indented too deeply.  Why do 
I say that?  There is the famous ``7+-1'' idea that you can hold
in your head only 7 ideas (give or take one) at any time.  I'm not 
that smart and I top out at about 5:  So for example, a conditional 
in a loop  in a conditional in a switch is about as deep a level 
of nesting as  I can easily understand (remember that I also have 
to hold the line I'm working on as well): that's 4 levels, plus one for the
function itself and we are at 40 characters away from the right edge
of the screen using -i8 and have some 40 characters available for writing code
(how often is a line of code really longer than about 40 characters?).
On top of that, the indentation is easily seen so you know immediately 
whether you are at the outer conditional or the inner conditional.  A -i2
just doesn't make the difference big enough.  -i5 is a happy balance 
with enough visual clue as to the indentation level, but leaves you 50
to 60 characters for the code line itself.

However, if you are indenting very deeply, then the poor reader can't hold
on to the context: there are more than 6 or 7 things to keep in mind.
In those cases, roll up the inner levels as a separate function and 
call it that way. The inner complexity of the nested statements gets 
nicely abstracted and then dumb people like me can understand what 
you are doing.

So, in brief: indent is a good idea, and -in with n>=4 will be best.
I don't think -i2 will lend itself to making the code so much easier 
to read.

thanks for listening.
.p.
       Reply  18 Dec 2003, Stefan Ritt, , Poll about default indent style  
Hi Paul,

I agree with you that a nesting level of more than 4-5 is a bad thing, but I 
believe that throughout the midas code, this level is not exceeded (my poor 
mind also does not hold more than 5 things (;-) ). An indent level of 8 columns 
alone does not by itself force you to limit the nesting level. I have 
seen code which ignores that, with nesting levels of 8 and more, which 
ends up with the code smashed to the right side of the screen, where each 
statement is broken into many lines since each line only holds 10 or 20 
characters. All the nice real estate on the left side of the screen is lost.

So having said that, I don't feel a strong need to give up a "-i2", since the 
midas code does not contain deep nesting levels and hopefully never will. 
In my opinion, a small indent level makes better use of your screen space, since 
you do not have a large white area at the left. A typical nesting level is 3-4, 
which already costs up to 32 blank characters at the left, or 1/3 of your screen, 
just for nothing. It will lead to more lines (even with -l90), so people have 
to scroll more.

What do others think (Pierre, Konstantin, Renee) ?
          Reply  01 Jan 2004, Konstantin Olchanski, , Poll about default indent style  
> I don't feel a strong need to give up a "-i2"...

I am comfortable with the current MIDAS styling convention and I would rather not
have yet another private religious war over the right location for the curly braces.

If we are to consider changing the MIDAS coding convention, I urge all and sundry
to read the ROOT coding convention, as written by Rene Brun and Fons Rademakers at
http://root.cern.ch/root/Conventions.html. The ROOT people did their homework: they
read the literature and produced a well-considered and well-argued style.

Also, while there, do read the Taligent documentation, by far one of the most
coherent manuals on C++ programming style.

K.O.
    Reply  06 Jan 2004, Stefan Ritt, , Poll about default indent style  
Ok, taking all comments so far into account, I conclude that adopting the ROOT 
coding style would be best for us. So I put

indent:
	find . -name "*.[hc]" -exec indent -kr -nut -i3 {} \;

into the makefile. Hope everybody is happy now (;-)))
Entry  14 Jan 2004, Razvan Stefan Gornea, , Access to hardware in the MIDAS framework 
I am just starting to explore MIDAS, i.e. reading the manual and trying 
some examples. For the moment I would like to make a simple frontend that 
accesses a portable multimeter through the RS-232 port. I think this could help 
me understand how to access hardware inside the MIDAS framework. Initially I 
started from the MiniFE.c example and tried to initialize the serial port 
on the run start transition and build a readout loop in the main function. I 
know that this is not a full frontend, but I was just interested in getting 
some experience with the drivers available in the distribution, in this 
case RS-232. The portable multimeter is very simple in principle: one just 
has to configure the port settings, then send the character 'R' and read 14 
ASCII characters from the device. Unfortunately I could not understand how 
to invoke the driver services, so I changed course and started again with the 
slowcont/frontend.c example. From this example, and after reading the "Slow 
Control System" section in the MIDAS manual, I think that all I need to do 
is define my own equipment structure based on the multi.c class driver 
with a single input channel (and replace the null driver with the RS-232 one).

Here I got stuck. I see from the source code that there is a relationship 
between drivers at all levels (even bus) and the ODB, but I don't yet fully 
understand how they work. Actually, for a couple of days now I have been in a 
loop going from class to device to bus and then back again to class drivers, 
trying to see how to create my own device driver and especially how to call 
the bus driver. It could be that the framework invokes the drivers and 
the user just has to configure things ... up to now I didn't dare to look 
at mfe.c.

Is there more detailed documentation about slow control and drivers than 
the MIDAS manual? What is the data flow through the three-layer driver 
system? What is the role of the framework and what is left to the user's 
choice?

Thanks
    Reply  14 Jan 2004, Stefan Ritt, , Access to hardware in the MIDAS framework 
There is some information at

http://midas.triumf.ca/doc/html/Internal.html#Slow_Control_system

and at

http://midas/download/course/course_rt03.zip , file "part1.ppt", especially 
page 59 and page 62, "writing your own device driver".

So what you are missing for your application is a "device driver" for your 
multimeter. The only functions it has to implement are CMD_INIT, 
where you initialize the RS232 port, and CMD_GET, which sends 
an "R" and reads the value. Now you have two options:

1) You implement RS232 calls directly in your device driver

You link against rs232.c and directly call rs232_init() at initialization, 
then call rs232_write() and rs232_read() where you read your 14 ASCII 
characters.

2) You call a "bus driver" in your device driver

This method makes the device driver independent of the underlying transport 
interface. So if your next multimeter accepts the same "R" command over 
Ethernet, you can just replace the RS232 bus driver by the TCPIP bus driver 
without having to change your device driver. But I guess that method 2) is not 
worth it for such a simple device as your multimeter.

So take nulldev.c or dastemp.c as your starting point, put some RS232 
initialization into the init routine and the communication via "R" into 
the "get" routine. The slow control frontend, driven by mfe.c, should then 
regularly read your multimeter and the value should appear in the ODB. Take 
the examples/slowcont/frontend.c as an example, and adjust the multi_driver[] 
list to use your new device driver (instead of the nulldev).
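
To make the shape concrete, a skeleton could look like the sketch below. This is only a
sketch: the driver name is made up, and the exact va_arg unpacking for CMD_INIT/CMD_GET as
well as the rs232_*() prototypes must be copied from the nulldev.c and rs232.c of the
installed MIDAS version (they are therefore only indicated in comments here).

   /* meterdev.c - sketch of a minimal device driver for the multimeter */
   #include <stdarg.h>
   #include "midas.h"

   INT meterdev(INT cmd, ...)
   {
      va_list argptr;
      INT     status = FE_SUCCESS;

      va_start(argptr, cmd);

      switch (cmd) {
      case CMD_INIT:
         /* unpack the arguments exactly as nulldev.c does, then open and
            configure the serial port, e.g. via rs232_init(...) */
         break;

      case CMD_GET:
         /* unpack the channel and float* as nulldev.c does, then:
              rs232_write(..., "R", 1);           send the request character
              rs232_read(..., buffer, 14, ...);   read the 14 ASCII characters
              *pvalue = (float) atof(buffer);     convert and return the value */
         break;

      default:
         break;
      }

      va_end(argptr);
      return status;
   }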

I would like to mention that using midas only makes sense for 
experiments which require event-based readout, using VME or CAMAC crates. If 
your only task is to read out some devices, which are called "slow control 
equipment" in the midas language, then you might be better off with labview or 
something similar.
       Reply  16 Jan 2004, Razvan Stefan Gornea, , Access to hardware in the MIDAS framework frontend.cmeterdev.c
The multimeter device is indeed too simple to justify MIDAS, but I am just trying 
it as a learning experience. The DAQ system to be developed involves VME crates 
and general purpose I/O boards. The slow control part, especially accessing 
the I/O boards, seems to me more complex than the VME access. I want to 
understand very well the "correct" way of using the MIDAS slow control 
framework before starting the project.

I chose the second method and created a meterdev.c driver (essentially a 
copy of nulldev.c) where I changed the init function and the get 
function. I am not sending an "INIT ..." string because for this device it 
is useless. In the get function I send a "D" and read my string. I changed 
the frontend of the example to have a new driver list (in the first try I 
eliminated the Output device but the ODB got corrupted; I guess the multi 
class needs to have output channels defined). The output channel is linked 
with nulldev and null (I guess this is as if they were not present).

The result is strange because the get function is called all the time, very 
fast (much faster than the 9 seconds set in the equipment) and even 
before the run is started (I only set the flag RO_RUNNING).

Thanks for any help
          Reply  17 Jan 2004, Stefan Ritt, , Access to hardware in the MIDAS framework 
> The result is strange because the get function is called all the time, very 
> fast (much faster than the 9 seconds set in the equipment) and even 
> before the run is started (I only set the flag RO_RUNNING).

This is on purpose. When the frontend is idle, it loops over the slow control 
equipment as fast as possible. This way, you see changes in your hardware very 
quickly. I see no reason to waste CPU cycles in the frontend when there are 
better things to do, like reading slow control equipment. Suppose you have the 
alarm system running, which turns off some equipment in case of an 
overcurrent. You had better do this as quickly as possible, not waiting up to 9 
seconds each time.

The 9 seconds you mention are for reading *EVENTS*. The frontend has a dual 
functionality: first, reading the slow control system and writing updated values 
to the ODB, where someone else can display or evaluate them (in the alarm 
system, for example); second, assembling events and sending them with the other 
data to disk or tape. Only the second one is controlled by RO_RUNNING and 
the 9 seconds. You can see this in the event statistics on your 
frontend display, which increment only when running, and then every 9 seconds.
Entry  14 Jan 2004, Konstantin Olchanski, , First try- midas on darwin/macosx xxx
While watching "The Wizard of Oz", the greatest movie ever made, I took a shot at building 
midas on my macosx computer. After stumbling on a few small and on a few hard problems, I 
built almost everything. However, odb does not work- some further debugging is in order.

Anyway, the easy problems are:
- a few missing header files: pty.h, sys/vfs.h, malloc.h
- a few missing features in system.c (stime(), "get tape position")
- /usr/include/string.h already has strlcpy() & co.
- dbg_malloc() has inconsistent prototypes (size_t vs unsigned int)
- for reasons unknown, PVM is #defined. This flushed a bug in mana.c

A few hard problems:
- namespace pollution by Apple- they #define ALIGN in system headers, colliding with ALIGN 
in midas.h. I was amazed that the two are almost identical, but MIDAS ALIGN aligns to 8 
bytes, while Apple does 4 bytes. ALIGN is used all over the place and I am not sure how to 
reconcile this.
- "timezone" in mhttpd.c. On linux, it's an "int", on darwin, it's a function. What gives?
- building libmidas.a requires running ranlib
- building libmidas.so requires unknown macosx specific magic.

For your enjoyment, the "cvs diff" is attached. The resulting code is known to not work.

K.O.
    Reply  14 Jan 2004, Stefan Ritt, , First try- midas on darwin/macosx 
Great, I already got questions about MacOSX support...

Once it's working, you should commit the changes. But take into account that using "//" for 
comments might cause problems for the VxWorks compiler (talk to Pierre about that!).

> A few hard problems:
> - namespace pollution by Apple- they #define ALIGN in system headers, colliding with ALIGN 
> in midas.h. I was amazed that the two are almost identical, but MIDAS ALIGN aligns to 8 
> bytes, while Apple does 4 bytes. ALIGN is used all over the place and I am not sure how to 
> reconcile this.

You can rename ALIGN to ALIGN8 all over the place.
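
(The macro itself is the usual rounding idiom; check midas.h for the exact definition after 
the rename, but it is essentially

   #define ALIGN8(x)  (((x) + 7) & ~7)    /* round x up to the next multiple of 8 */

i.e. the same idea as Apple's version, just with an 8-byte granularity.)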

> - "timezone" in mhttpd.c. On linux, it's an "int", on darwin, it's a function. What gives?

Wrap it into a function get_timezone(). Under linux, just return "timezone"; under OSX, 
return timezone() via conditional compilation.
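
A possible sketch of such a wrapper (the OS_DARWIN branch below uses the BSD tm_gmtoff field 
instead of the timezone() call; that choice is my assumption, not something settled here. The 
sign convention follows the glibc "timezone" global, i.e. seconds west of UTC):

   #include <time.h>

   int get_timezone(void)
   {
   #ifdef OS_DARWIN
      time_t now = time(NULL);
      struct tm *t = localtime(&now);
      return (int) (-t->tm_gmtoff);   /* tm_gmtoff is seconds east of UTC */
   #else
      tzset();
      return (int) timezone;          /* glibc global: seconds west of UTC */
   #endif
   }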

> - building libmidas.a requires running ranlib
> - building libmidas.so requires unknown macosx specific magic.

I guess we should forget about the shared libraries for now (Mac people anyhow have too much 
money, so they can afford additional RAM (;-) ), but building the static library is mandatory. 
       Reply  16 Jan 2004, Konstantin Olchanski, , First try- midas on darwin/macosx xxx
> Great, I got already questions about MacOSX support...
> Once it's working, you should commit the changes.

With the ALIGN8() change ODB works, mhttpd works. ALIGN8 change now committed to cvs, verified that "make all" builds 
on Linux.

ROOT stuff still blows up because of more namespace pollution (/usr/include/sys/something does #define Free(x) 
free(blah...)). Arguably, it is not Apple's fault- portable programs should not include any <sys/foo.h> header files. I 
think I can fix it by moving "#include <sys/mount.h>" from midasinc.h to system.h.

Also figured out why PVM is defined- more pollution from "#include <sys/blah...>". This is only in mana.c and I will 
replace every "#ifdef PVM" with "#ifdef HAVE_PVM". Is there documentation that should be updated as well? Alternatively I 
can try to play games with header files...


> But take into account that using "//" for comments might cause problems for the VxWorks compiler (talk to Pierre 
about that!).

Yes, "// comments" stay out of midas. I used them to make the modification more visible.

> You can rename ALIGN to ALIGN8 all over the place.

Done, commited.

> > - "timezone" in mhttpd.c. On linux, it's an "int", on darwin, it's a function. What gives?
> Wrap it into a function get_timezone(). Under linux, just return "timezone", under OSX, 
> return timezone() via conditional compiling.

Right. Still on the todo list.

> > - building libmidas.a requires running ranlib

I still have to clean up the Makefile. Not committing it yet.

Then, a new problem- on MacOSX, pthread_t is not an "INT" and system.c:ss_thread_create() whines about it. I want to 
introduce a system-dependent THREAD_T (or whatever) and make ss_thread_create() return that, rather than INT.
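
A sketch of what I have in mind (names are tentative; whatever finally goes into midas.h and 
msystem.h should be taken from the actual commit):

   /* system-dependent thread type and adjusted prototypes (sketch) */
   #ifdef OS_WINNT
   typedef HANDLE midas_thread_t;
   #else
   #include <pthread.h>
   typedef pthread_t midas_thread_t;
   #endif

   midas_thread_t ss_thread_create(INT (*thread_func) (void *), void *param);
   INT ss_thread_kill(midas_thread_t thread_id);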

ROOT stuff is still not fully tested- it takes a little while to build ROOT on a 600MHz laptop.

Attached is my current CVS diff.

K.O.
          Reply  17 Jan 2004, Stefan Ritt, , First try- midas on darwin/macosx 
> With the ALIGN8() change ODB works, mhttpd works. ALIGN8 change now committed to cvs, verified that "make all" builds 
> on Linux.

Verified that "make all" still works under Windows.

> ROOT stuff still blows up because of more namespace pollution (/usr/include/sys/something does #define Free(x) 
> free(blah...)). Arguably, it is not Apple's fault- portable programs should not include any <sys/foo.h> header files. I 
> think I can fix it by moving "#include <sys/mount.h>" from midasinc.h to system.h.

I would like to keep all OS-specific #includes in midasinc.h. In the worst case, put another section there for OSX, like

in midas.h:

#if !defined(OS_MACOSX)
#if defined ( __????__ ) <- put the proper thing here
#define OS_MACOSX
#endif
#endif

then make a new section in midasinc.h

#ifdef OS_MACOSX
#include <...>
#endif
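
For the placeholder above: Apple's compilers predefine __APPLE__ (and __MACH__), so one 
plausible way to write the guard would be

   #if !defined(OS_MACOSX)
   #if defined(__APPLE__) && defined(__MACH__)
   #define OS_MACOSX
   #endif
   #endif

(to be confirmed on the actual OSX compiler, of course).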

> Also figured out why PVM is defined- more pollution from "#include <sys/blah...>". This is only in mana.c and I will 
> replace every "#ifdef PVM" with "#ifdef HAVE_PVM". Is there documentation that should be updated as well? Alternatively I 
> can try to play games with header files...

Right, PVM should be replaced by HAVE_PVM. This is only for the analyzer. I planned at some point to run the analyzer in 
parallel on a linux cluster, but it was never really used. Going to ROOT, that facility should be replaced by PROOF.

> Then, a new problem- on MacOSX, pthread_t is not an "INT" and system.c:ss_thread_create() whines about it. I want to 
> introduce a system-dependent THREAD_T (or whatever) and make ss_thread_create() return that, rather than INT.

Good. If you have an OS_MACOSX define, that should help you there.

-SR
             Reply  18 Jan 2004, Konstantin Olchanski, , First try- midas on darwin/macosx xxx
> I would like to keep all OS specific #includes in midasinc.h

No go. Here is the problem:

midasinc.h includes sys/mount.h, which #defines Free(x) to be something else
mana.c includes msystem.h, which includes midasinc.h
mana.c includes ROOT header files, which blow up because Free(x) is redefined.

I want this:

mana.c does *not* include sys/mount.h
system.c does include sys/mount.h

Simplest solution is to take sys/mount.h out of midasinc.h and include it in system.c

> Right, PVM should be replaced by HAVE_PVM.

Commited.

> > Then, a new problem- on MacOSX, pthread_t is not an "INT" and system.c:ss_thread_create() whines about it. I want to 
> > introduce a system-dependent THREAD_T (or whatever) and make ss_thread_create() return that, rather than INT.
> Good. If you have a OS_MACOSX, that should help you there.

Okay. On Darwin, pthread_t is not an int; it is a pointer to a struct. In midas.c I typedef midas_pthread_t to HANDLE on Windows and to pthread_t on OS_UNIX.

This uncovered a problem with ss_getthandle(). What is it supposed to do? On Windows it returns a handle to the current thread, on OS_UNIX, it returns getpid(). 
What gives? I am leaving it alone for now.

Attached is the current diff. Most changes are in system.c: ss_timezone() and midas_pthread_t. The Makefile part is already committed. Building the shared 
library was made dependent on NEED_SHLIB. Now, building static midas applications is very simple: use "make SHLIB="

K.O.
                Reply  19 Jan 2004, Stefan Ritt, , First try- midas on darwin/macosx 
> I want this:
> 
> mana.c does *not* include sys/mount.h
> system.c does include sys/mount.h
> 
> Simplest solution is to take sys/mount.h out of midasinc.h and include it in system.c

Agree.

> This uncovered a problem with ss_getthandle(). What is it supposed to do? On Windows it returns a handle to the current thread, on OS_UNIX, it returns getpid(). 
> What gives? I am leaving it alone for now.

The Unix version of ss_getthandle() returns the pid since at the time I wrote that function (many years ago) there were no threads under Unix. It should now 
be replaced with a function which returns the real thread id (at least under Linux).
                   Reply  19 Jan 2004, Konstantin Olchanski, , First try- midas on darwin/macosx 
> > Simplest solution is to take sys/mount.h out of midasinc.h and include it in system.c
> Agree.

Done.

With this, I committed the rest of my changes: the midas_thread_t in midas.h, the changed ss_thread_xxx() prototypes in msystem.h, 
and the implementation in system.c.

My cvs diff is now empty.

Midas should compile on Darwin aka macosx. I tested "odbedit" and "mhttpd"; they seem to work.
 
> > This uncovered a problem with ss_getthandle(). 
> The Unix version of ss_getthandle() returns the pid since at the time I wrote that function (many years ago) there were no threads under Unix. It should now 
> be replaced with a function which returns the real thread id (at least under Linux).

I do not want to touch this. Sorry.

K.O.
Entry  19 Jan 2004, Konstantin Olchanski, , darwin aka macosx changes 
I commited the final bits to make Midas build on Darwin aka macosx.

Here is the summary:

1) I treat Darwin as a funny linux, so OS_LINUX is always defined
2) OS_DARWIN is defined for places where the two differ
3) the system-dependent directory is "midas/darwin/{bin,lib}"
4) a few header files had to be moved around to dodge namespace pollution by Apple system 
header files (e.g. one of the PowerPC header files #defines PVM, colliding with PVM in mana.c, 
and another #defines Free(x), colliding with ROOT header files)
5) ss_thread_create() and ss_thread_kill() now use midas_thread_t. On Darwin pthread_t is not 
an "int".
6) the Makefile has no support for building the midas shared library on macosx.
7) on my Mac OS 10.2.8 machine, "make all" works, "odbedit" and "mhttpd" run. This is the 
full extent of my testing. Status on Mac OS 10.3.x is unknown.

K.O.
Entry  10 Mar 2004, Jan Wouters, , Creation of secondary Midas output file. dance193.tar
Dear Midas Team,

I have run into a problem with Midas and was wondering if you could explain what I 
am doing wrong.  I have included a simple demo to illustrate what I am doing and 
can send a small input data file if needed.

WHAT I AM TRYING TO DO:
Every midas event for the DANCE experiment consists of many physics events.  I am 
trying to create a secondary mid file where the event boundaries are now the 
physics events rather than the midas events.  This secondary mid file will be 
analyzed using a second stage midas analyzer.

For the demo, I use the data from EV02 (one of our 15 frontends), which consists of a 
variable number of fixed length structures where each structure contains the data for 
one crystal from the DANCE detector. 
 I treat each crystal as a separate physics event and write it out as a separate 
event in the TREK bank, which is a demo calculated output bank.

(The only difference between this demo and our real system is that we would include 
all the crystals from the other frontends that have approximately the same time stamp 
in the output bank.  Thus the output bank would consist of a varying number of 
crystals in one event rather than the fixed one crystal per event used in this demo.)

THE CHANGES TO analyzer.c AND adccalib.c
I loop through the EV02 bank examining each crystal structure in turn.  I calculate 
"calibrated" parameters and put them into an output bank called TREK.  The unusual 
part of this example is that the TREK bank is no longer part of the main list of input 
banks, ana_trigger_bank_list[].   Instead it is now part of a new bank list called 
ana_physics_bank_list[].  See the analyzer.c file for this definition.

In adccalib.c I  create the space for this new bank as follows. 

	EVENT_HEADER   gPhysicsEventHeaders[ MAX_EVENT_SIZE / sizeof( EVENT_HEADER ) ];
	WORD*          gPhysicsEventData = ( WORD * )( gPhysicsEventHeaders + 1 );

In the adc_calib routine I create the bank header as follows.  Note that the serial 
numbers will restart at 0 at the beginning of each midas event.  Should I let the serial 
number increment monotonically until the end of the run?:

	gPhysicsEventHeaders->serial_number = (DWORD) - 1;
	gPhysicsEventHeaders->event_id = 2;
	gPhysicsEventHeaders->trigger_mask = 0;
	gPhysicsEventHeaders->time_stamp = pheader->time_stamp;

In a loop over all the crystals contained in EV02, I extract each crystal, 
calibrate it, and store it in a TREK structure.  In creating the TREK bank I assume that 
each one will be a separate physics event, so I update the event serial number and 
use bk_init32 to initialize the memory.

   	for (short i = 0; i < nItems; i++)
   	{
   		++(gPhysicsEventHeaders->serial_number);	// Update serial number.
   		bk_init32( gPhysicsEventData );			// Initialize storage.
   		bk_create( gPhysicsEventData, "TREK", TID_STRUCT, &trek );

   		trek->one = (double) pev->areahg * 1.0;
   		trek->two = (float) pev->timelo * 1.0;

   		bk_close( gPhysicsEventData, trek+1 );

   		pev++;						// Move to the next crystal's data.
   	}

The output bank should consist of multiple events for each individual EV02 midas 
input event. 

 As far as I can tell the code compiles and runs fine, but I get no data in the .mid 
output file except for the ODB. I have a print statement at the beginning of each 
midas event stating how many crystals were found in the EV02 bank.  I also print out 
the calibrated value for each crystal as it is being placed in its own TREK output 
bank.  The data appears correct.

 I cannot place TREK in the input bank the way it normally is done in the examples 
because there is not a one-to-one correspondence between a midas event and a 
true physics event.  Instead one midas event has many physics events.  Thus the 
output bank needs to be in a new memory area so that I can create a custom header 
and increment the serial number properly for each event.  Our follow-on analysis 
using a second Midas analyzer only needs to analyze one physics event at a time 
rather than one Midas event at a time, which is why we are going to all the trouble to 
get this paradigm working.

I include all the code for this very simple example. 

RUNNING THE CODE:
To run the example just use the run01220.mid file I will send:

./analyzer -i run01220.mid.gz -o run01220out.mid -c settings.odb_cfg -n 50

The only thing done by the settings.odb_cfg file is to turn on the TREK output bank.  I 
have verified that the bank is on.

SUMMARY:
I believe that I must not be creating the new TREK output bank correctly so that 
midas understands that the event-by-event calculated physics data should be written 
out event-by-event.  I have pointed out several places in the above discussion where 
I might be making a mistake.

I would like to get both this example running and a similar one which creates Root trees, 
though the Root trees are of secondary importance.  With this example I can finish 
writing the second stage analyzer and get the DANCE collaboration moving forward 
with their analysis.  Currently, we cannot use this paradigm because I cannot create 
a secondary mid file in our stage one analysis.  I would be very grateful if you could 
take a look at this example and tell me what I am doing incorrectly.

Jan
    Reply  10 Mar 2004, Stefan Ritt, , Creation of secondary Midas output file. adccalib.c
Dear Jan,

I had a look at your code. You create a gPhysicsEventHeader array, fill it, and expect the 
framework to write it to disk. But how can the framework "guess" that you want your private 
global array written? Unfortunately, it cannot do magic!

To do what you want, you have to write a "secondary" midas file yourself. I modified your 
code to do that. First, I define the event storage like

BYTE           gSecEvent[ MAX_EVENT_SIZE ];
EVENT_HEADER   *gPhysicsEventHeader = (EVENT_HEADER *) gSecEvent;
WORD* 	       gPhysicsEventData = ( WORD * )( gPhysicsEventHeader + 1 );		

I use gSecEvent as a BYTE array, since it only contains one event at a time, so this is more 
appropriate. Then, in the BOR routine, I open a file:

  sprintf(str, "sec%05d.mid", run_number);
  sec_fh = open(str, O_CREAT | O_RDWR | O_BINARY, 0644);

and close it in the EOR routine

  close(sec_fh);

The event routine now manually fills events into the secondary file:

      /* write event to secondary .mid file */
      gPhysicsEventHeader->data_size = bk_size(gPhysicsEventData);
      write(sec_fh, gPhysicsEventHeader, sizeof(EVENT_HEADER)+bk_size(gPhysicsEventData));

Note that this code is placed *inside* the for() loop over nItems, so for each detector you 
create an event and write it.

That's all you need, the full file adccalib.c is attached. I tried to produce a sec01220.mid 
file and was able to read it back with the mdump utility.

Best regards,

  Stefan
    Reply  11 Mar 2004, Renee Poutissou, , Creation of secondary Midas output file. 
Jan , 

Do you need to log this stage 1 output?  If not, you would use the 
eventbuilder mechanism to create your stage 2 events.  
I use the eventbuilder mechanism with success for my TWIST experiment.

Renee