ELOG Midas

> I just tried now and it seemed to work fine. Do you still have the problem?
> 
> - Stefan

The problem was still there this morning, shortly after I saw your mail, but it seems
to be fixed now.
BTW, what is the best way to submit patches? I have a version of khyt1331 for Linux
kernel 2.6 (we are running Scientific Linux 4.1) and a few smaller changes, mostly in
the examples.

Thanks, Sergio

250

26 Mar 2006

Stefan Ritt

Info

svn@savannah.psi.ch down?

>  Hi,
> I was trying to update the checkout of Midas, but it looks like something is not
> working - maybe a component of the Savannah system:
> [sergio@daq-pc midas-SVN]$ svn update
> svn@savannah.psi.ch's password: svn
> unix dgram connect: Connection refused at /bin/cvssh.pl line 32
> no connection to syslog available at /bin/cvssh.pl line 32
> svn: Connection closed unexpectedly
> 
> my .svn/entries says (amongst the rest)
>  url="svn+ssh://svn@savannah.psi.ch/afs/psi.ch/project/meg/svn/midas/trunk"
> and yes, it used to work well... 
> 
> Cheers,
>   Sergio

I just tried now and it seemed to work fine. Do you still have the problem?

- Stefan

svn@savannah.psi.ch down?

 Hi,
I was trying to update the checkout of Midas, but it looks like something is not
working - maybe a component of the Savannah system:
[sergio@daq-pc midas-SVN]$ svn update
svn@savannah.psi.ch's password: svn
unix dgram connect: Connection refused at /bin/cvssh.pl line 32
no connection to syslog available at /bin/cvssh.pl line 32
svn: Connection closed unexpectedly

my .svn/entries says (amongst the rest)
 url="svn+ssh://svn@savannah.psi.ch/afs/psi.ch/project/meg/svn/midas/trunk"
and yes, it used to work well... 

Cheers,
  Sergio

How do I do custom event building?

At DANCE we have a similar issue. We are still doing "software
handshaking" between multiple frontends (15 which read data, and a 16th
with direct access to the trigger logic), and we apply a time stamp
using gettimeofday(). We use the regular mevb, sorting on serial number.

In the analyzer (MIDAS or ROME) we then keep a big circular buffer of
event fragments, which are rebuilt into new events based on the time stamp
obtained from gettimeofday().  We keep the system clocks synchronized
(often to within about 1ms) using ntp (need to average over several
ntp servers to avoid issues with network noise).  ntp can take a while
to stabilize, so we never reboot our computers... (well almost never).
We have a slow control frontend which monitors the ntp time offsets and
puts them in the history system for easy visualization.
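A minimal C sketch of the rebuild step described above, assuming each fragment carries only a frontend number and its gettimeofday() time in microseconds. The names fragment_t, ring_push() and ring_build(), the ring depth and the matching tolerance are all hypothetical; this is not the DANCE analyzer code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fragment record. */
typedef struct {
   int frontend;     /* 0 .. n_frontends-1 */
   int64_t t_usec;   /* gettimeofday() time stamp in microseconds */
   int used;         /* already assigned to a rebuilt event */
} fragment_t;

#define RING_SIZE 1024   /* assumed buffer depth */

typedef struct {
   fragment_t slot[RING_SIZE];
   size_t head;    /* next slot to overwrite */
   size_t count;   /* number of valid slots (<= RING_SIZE) */
} fragment_ring_t;

/* Store a fragment, overwriting the oldest one once the ring is full. */
void ring_push(fragment_ring_t *r, int frontend, int64_t t_usec)
{
   r->slot[r->head].frontend = frontend;
   r->slot[r->head].t_usec = t_usec;
   r->slot[r->head].used = 0;
   r->head = (r->head + 1) % RING_SIZE;
   if (r->count < RING_SIZE)
      r->count++;
}

/* Try to rebuild one event around time t0: for every frontend, look for an
   unused fragment within +/- tol_usec of t0. On success the fragments are
   marked used, their slot indices are stored in out[] and 1 is returned;
   otherwise nothing is modified and 0 is returned. */
int ring_build(fragment_ring_t *r, int64_t t0, int64_t tol_usec,
               int n_frontends, size_t *out)
{
   for (int fe = 0; fe < n_frontends; fe++) {
      int found = 0;
      for (size_t k = 0; k < r->count && !found; k++) {
         const fragment_t *f = &r->slot[k];
         int64_t d = f->t_usec - t0;
         if (d < 0)
            d = -d;
         if (f->frontend == fe && !f->used && d <= tol_usec) {
            out[fe] = k;
            found = 1;
         }
      }
      if (!found)
         return 0;   /* incomplete: leave all fragments in the ring */
   }
   for (int fe = 0; fe < n_frontends; fe++)
      r->slot[out[fe]].used = 1;
   return 1;
}
```

A caller would try ring_build() around each time stamp from the trigger frontend, with a tolerance of e.g. a few ms (comparable to the ntp agreement quoted above); fragments that never match are eventually overwritten by ring_push().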

Occasionally we seem to get in a mess, but somehow this fixes itself on
the next run, so it has been a useable system.  Maybe one day we will
get hardware handshaking between the frontend computers and the trigger
logic, but in the meantime we are taking data.

John.

Handling multiple identical USB devices

> Any thoughts?

I got an idea of how to solve this problem in an OS-independent manner. The USB
devices and hubs form a tree, like this

  Root  HUB
  0   1   2
  |   |   \__...
  |   \___
 DevY     \
         HUB
        0 1 2
        |   |
        |  DevX
       HUB
      0 1 2
          |
        DevZ

This tree can be considered an ordered tree if you read it from left to right.
In that order, the devices are ordered

DevY - DevZ - DevX

Since the devices are ordered, the "instance" parameter from musb_init can be used
to identify them uniquely, like

instance==0   => DevY
instance==1   => DevZ
instance==2   => DevX

So I would say that we can keep the current API and use the "instance" parameter to
uniquely access a device. All we have to do is build that tree, sort it, and then
use the instance parameter as an index into that tree. The sorting takes care of
different orderings, which can happen during enumeration (depending on power-up
sequence, phase of the Moon etc.). So if you have three devices like above, DevZ
should always be at "instance==1". The only problem arises if you unplug DevY, for
example; then you get the map

instance==0   => DevZ
instance==1   => DevX

which is different from above. But if you have a different number of devices, you
likely have to change your frontend code anyhow, so you can change the device
mapping there as well.

In order to simplify the code, I would not build a complete tree and sort it, but
scan the whole tree hierarchically, i.e. look at

Bus1/Port1
Bus1/Port2
Bus1/...
Bus2/Port1
Bus2/Port2
...

Since there is a maximum of 127 USB devices per bus, this scan should be pretty quick.
If you find a device with matching vendor and product ID, you increment an internal
counter. If that counter matches your instance parameter, you open that device.
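The scan described above could look like the following C sketch. It walks a hypothetical in-memory model of the topology (usb_node_t is not a musbstd or libusb type) depth-first and left to right, counting instances from 0 as in the table above, so for the example tree it returns DevY, DevZ and DevX for instance 0, 1 and 2. A real implementation would fill the tree from the OS or from libusb.

```c
#include <stddef.h>

/* Hypothetical model of the USB topology: hubs have downstream ports,
   leaf devices have vendor/product IDs (hubs use 0/0 in this sketch). */
typedef struct usb_node {
   unsigned short vendor, product;
   int nports;              /* number of downstream ports, 0 for a device */
   struct usb_node **port;  /* port[i] is NULL if nothing is plugged in */
} usb_node_t;

/* Depth-first, left-to-right walk. Every device with matching IDs bumps
   *counter; the one found while *counter equals 'instance' is returned. */
static usb_node_t *scan(usb_node_t *node, unsigned short vendor,
                        unsigned short product, int instance, int *counter)
{
   if (node == NULL)
      return NULL;
   if (node->vendor == vendor && node->product == product) {
      if (*counter == instance)
         return node;
      (*counter)++;
   }
   for (int i = 0; i < node->nports; i++) {
      usb_node_t *hit = scan(node->port[i], vendor, product, instance, counter);
      if (hit != NULL)
         return hit;
   }
   return NULL;
}

/* Scan bus 1, bus 2, ... in order; returns NULL if there are fewer than
   instance+1 matching devices. */
usb_node_t *find_instance(usb_node_t *roots[], int nroots, unsigned short vendor,
                          unsigned short product, int instance)
{
   int counter = 0;
   for (int b = 0; b < nroots; b++) {
      usb_node_t *hit = scan(roots[b], vendor, product, instance, &counter);
      if (hit != NULL)
         return hit;
   }
   return NULL;
}
```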

The ultimate solution, of course, is to put an additional address into each device, so
you can distinguish them easily. For an out-of-the-box webcam you probably have no
chance, but for the home-made MSCB nodes I put such an address into each node, so I
can distinguish them even if they have the same product and vendor ID.

mhttpd "edit on start" broken for arrays

> If a variable under "/experiment/edit on start/" is an array, it is correctly
> offered for editing on the "start run page", but then all elements in the array
> end up set to the value of the first element.

You are right. This bug was there from the beginning; you are just the first one
trying "edit on start" with an array. I applied your fix and committed it to SVN
revision 3013.

Stefan

mhttpd "edit on start" broken for arrays

If a variable under "/experiment/edit on start/" is an array, it is correctly
offered for editing on the "start run page", but then all elements in the array
end up set to the value of the first element.

This appears to be an error in mhttpd.c:interprete(), in the "start dialog"
section. The non-working version in CVS reads:

               for (j = 0; j < key.num_values; j++) {
                  size = key.item_size;
                  sprintf(str, "x%d", n++);
                  db_sscanf(getparam(str), data, &size, j, key.type);
                  db_set_data_index(hDB, hsubkey, data, size + 1, j, key.type);
               }

The fix that works for me reads:
                  db_sscanf(getparam(str), data, &size, 0, key.type);

(notice: the argument "j" is replaced with "0").

The way I understand this, all array elements are encoded into individual HTTP
request parameters, named sequentially x0, x1, ..., each holding a single value, so
when we parse the values out of them, the array index should never show up.
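To see why the index argument breaks arrays, here is a small standalone C simulation. It assumes, as the index argument suggests, that db_sscanf() stores the converted value at element i of its output buffer, while db_set_data_index() in the loop always copies from the start of that buffer. sscanf_int_at() and store_array() are stand-ins for illustration, not MIDAS functions.

```c
#include <stdlib.h>

/* Stand-in for db_sscanf() on an INT array: stores the value at element i. */
static void sscanf_int_at(const char *str, int *data, int i)
{
   data[i] = (int) strtol(str, NULL, 0);
}

/* Stand-in for the "start dialog" loop. params[j] plays the role of
   getparam("x<n>"), result[j] the ODB array element j, which is always
   filled from data[0] (like db_set_data_index() reading from 'data').
   use_index = 1 reproduces the bug, use_index = 0 the fix. */
void store_array(const char *params[], int n, int *result, int use_index)
{
   int data[16] = {0};
   if (n > 16)
      n = 16;   /* keep the sketch within its scratch buffer */
   for (int j = 0; j < n; j++) {
      sscanf_int_at(params[j], data, use_index ? j : 0);
      result[j] = data[0];
   }
}
```

With the index, every value after the first lands in data[j] and is never read, so all elements end up equal to the first one, which is exactly the reported symptom.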

(Stefan, if you can, please commit a fix to svn).

K.O.

Handling multiple identical USB devices

When I wrote the musbstd.h "open" method, I kind of punted on the problem of
handling multiple identical USB devices. Instead of a real solution, I added an
"instance" parameter, which allows one to "open" the "first", "second", etc. USB
device, as listed in a magic, random, system-dependent order.

Normally, USB devices are identified by two 16-bit integers: manufacturer ID and
product ID (i.e. as reported by "lsusb"). This works well until one has more
than one "identical" device. Two years ago, I had 5 identical USB cameras
(optical alignment system for TRIUMF-TWIST); last year, I had multiple USB
serial adapters; today I have two identical USB-TPC interfaces.

Most of the time, the devices are plugged into the same USB ports, so
theoretically, one should be able to tell exactly which one is which ("upstream
camera is plugged into port 1, downstream camera is plugged into port 2"). But
in the magic system-dependent enumeration order, they keep moving around,
depending on the order of enumeration, history of powering up and down, phase of
the Moon, etc.

So my generic "musbstd" method of "open first", "open second", etc. turned out to
be completely dysfunctional.

So far, I am unable to come up with a system-independent solution. But I have a
solution for Linux and maybe for MacOSX:

1) on Linux, I can use the information parsed from /proc/bus/usb/devices to say
"please open the USB device on USB bus 1, port 1", the so called USB device
"path", as seen in the system log and in /sys/bus/usb/devices.

2) on MacOSX, I was unable to find a way to discover the USB topology, but they
seem to maintain a uint32_t "location", which they promise to keep at least
across reboots (I did not check this yet).

3) Windows I did not look at yet.
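For option 1), a C sketch that checks whether the device at a given USB path carries the expected IDs. It reads the idVendor and idProduct attributes from /sys/bus/usb/devices/<path>/ (the sysfs directory mentioned above) instead of parsing /proc/bus/usb/devices. The function names are hypothetical, actually opening the device is not shown, and the sysfs root is a parameter only so that the function can be tested.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a hexadecimal sysfs attribute such as idVendor ("046d\n").
   Returns -1 if the file cannot be read. */
static long read_hex_attr(const char *sysfs_root, const char *usbpath,
                          const char *attr)
{
   char fname[512];
   char buf[32];
   snprintf(fname, sizeof(fname), "%s/%s/%s", sysfs_root, usbpath, attr);
   FILE *f = fopen(fname, "r");
   if (f == NULL)
      return -1;
   if (fgets(buf, sizeof(buf), f) == NULL) {
      fclose(f);
      return -1;
   }
   fclose(f);
   return strtol(buf, NULL, 16);
}

/* Returns 1 if the device at usbpath (e.g. "1-1") has the given IDs. */
int usb_path_matches(const char *sysfs_root, const char *usbpath,
                     int vendor, int product)
{
   return read_hex_attr(sysfs_root, usbpath, "idVendor") == vendor &&
          read_hex_attr(sysfs_root, usbpath, "idProduct") == product;
}
```

On a typical system, usb_path_matches("/sys/bus/usb/devices", "1-1", vendor, product) tests the device on bus 1, port 1; devices behind a hub get names like "1-1.2".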

So we have a choice:

a) use system-dependent "musb_open_linux(usbpath,vendor,product)",
"musb_open_macosx(???,vendor,product)", etc.

b) create order out of chaos by manually keeping a map of "instances" (first,
second, third device) to "persistent addresses". On Linux, it would be a file
containing something like this: "USB-TPC-0 is on bus1-port1, USB-TPC-1 is on
bus1-port2". Then again, I can say "please open USB-TPC interface instance 0" or
"instance 1", etc. There is a small difficulty in dealing with devices
temporarily or permanently going away, or changing physical addresses ("I moved
the USB device from port 1 to port 3"). This could be handled by telling the
user "hmm... USB topology has changed, please delete the map file and try
again", or we could come up with something more user-friendly.
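Option b) needs little more than a lookup in the map file. A C sketch, assuming a hypothetical one-line-per-device format "<name> <usb-path>" such as "USB-TPC-0 1-1", with '#' starting a comment line; neither the format nor map_lookup() exists in musbstd.

```c
#include <stdio.h>
#include <string.h>

/* Look up 'name' in an open map file and copy its USB path into 'path'.
   Returns 0 on success, -1 if the name is not listed. */
int map_lookup(FILE *f, const char *name, char *path, size_t pathsize)
{
   char line[256], key[128], val[128];
   rewind(f);   /* always search from the top of the file */
   while (fgets(line, sizeof(line), f) != NULL) {
      if (line[0] == '#')
         continue;   /* comment line */
      if (sscanf(line, "%127s %127s", key, val) != 2)
         continue;   /* blank or malformed line */
      if (strcmp(key, name) == 0) {
         snprintf(path, pathsize, "%s", val);
         return 0;
      }
   }
   return -1;
}
```

The path found this way could then be checked against the actual device IDs; a mismatch is one way to notice that the USB topology has changed and the map file needs to be regenerated.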

Any thoughts?

P.S. For my immediate need (I need this tomorrow), I will write a
musb_open_linux(usbpath,vendor,product) function.

K.O.

minor changes to run transition code

> I am now considering allowing the run to end even if some clients cannot be
> contacted. The begin, pause and resume transitions would continue to fail if
> clients cannot be contacted.

Sounds like a good idea.

- Stefan

minor changes to run transition code

> Minor changes to run transitions code:
> - fail transition if cannot connect to one of the clients

This change introduced a problem:
1) a run is happily taking data
2) a frontend crashes
3) the web interface cannot stop the run (it cannot contact the crashed frontend)
until the frontend is removed by the timeout (10-60 seconds?).

I am now considering allowing the run to end even if some clients cannot be
contacted. The begin, pause and resume transitions would continue to fail if
clients cannot be contacted.
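The proposed policy fits in a few lines. In this C sketch the transition codes and run_transition() are hypothetical (MIDAS has its own transition constants and RPC layer), and reachable[c] stands for whether client c answered.

```c
#include <stdio.h>

/* Hypothetical transition codes for this sketch. */
typedef enum { TRANS_START, TRANS_STOP, TRANS_PAUSE, TRANS_RESUME } trans_t;

/* An unreachable client aborts start/pause/resume, but a stop goes ahead
   with a warning, so a crashed frontend cannot block the end of the run.
   Returns 0 on success, -1 if the transition was aborted. */
int run_transition(int nclients, const int *reachable, trans_t t)
{
   for (int c = 0; c < nclients; c++) {
      if (reachable[c])
         continue;
      if (t == TRANS_STOP) {
         fprintf(stderr, "warning: client %d unreachable, stopping anyway\n", c);
         continue;
      }
      fprintf(stderr, "error: client %d unreachable, transition aborted\n", c);
      return -1;
   }
   return 0;
}
```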

K.O.

How do I do custom event building?

> It turns out that the standard event builder fragment matching algorithm cannot
> be used in my TPC application. I have two TPC-USB interfaces, which lack any
> "busy" or synchronization logic. I send the hardware trigger into both
> interfaces, and if one of them misses it, the data is out of sync forever. Consider:
> 
> Hardware
> trigger    trig1     trig2    trig3    trig4
> TPC01      serial1   serial2  serial3  serial4
> TPC02      serial1  (missing) serial2  serial3
> 
> With the event builder matching only the event serial numbers, the first event
> will be okay, but the second event will have trig2 data from TPC01 and trig3
> data from TPC02, etc.

Well, I would say: this is a very poor design of an experiment. Before curing the
problems in software, I first would consider a redesign of the data readout scheme with
a global hardware trigger and a hardware busy.

> So in each frontend, I have a high-precision timestamp (gettimeofday(), usec
> resolution) and I would like to have the event builder match the timestamps
> instead of event serial numbers.

What do you do if the frontend clock drifts away? I have seen drifts of up to 10 sec/day
on some PCs, so your required accuracy of 1/50 s would be violated after 3 minutes. You
would have to synchronize your clocks constantly. If your synchronization algorithm
determines a clock is out of sync and adjusts it, and the delta t is more than 1/50 sec,
you are screwed.
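The 3-minute figure follows from dividing the tolerance by the drift rate. A small C helper (the name is hypothetical) that reproduces it:

```c
/* Seconds until a free-running clock drifting by drift_s_per_day seconds
   per day (magnitude, must be > 0) accumulates an error of tol_s seconds. */
double seconds_until_out_of_tolerance(double drift_s_per_day, double tol_s)
{
   const double seconds_per_day = 86400.0;
   double drift_rate = drift_s_per_day / seconds_per_day;  /* s of error per s */
   return tol_s / drift_rate;
}
```

For 10 s/day and 1/50 s this gives 0.02 / (10 / 86400) = 172.8 s, i.e. just under 3 minutes.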

So all together I conclude that this proposed synchronization scheme is pretty dangerous
and could ruin the whole experiment.

> What is the best way to do this? The mevb.c
> code does not have any user callbacks for checking "do these fragments belong to
> the same event?".

Pierre can answer that.

- Stefan

midas max event size?

> My TPC events are fairly large: 18 FEC cards * 128 channels per card * 2 Kbytes
> per channel = about 4.5 Mbytes. In my frontend, when I request this event size,
> MIDAS complains (in mfe.c) that it is bigger than MAX_EVENT_SIZE, which is set
> to 0.5 Mbytes in midas.h. What is the best way to deal with this? Should we
> increase MAX_EVENT_SIZE to something bigger? Remove the MAX_EVENT_SIZE
> limitation altogether?

If you teach me how to remove the MAX_EVENT_SIZE, that would be perfect!

Unfortunately the limit comes from the shared memory on the back end (the so-called
"SYSTEM" shared memory). Due to the structure of the buffer manager, the shared
memory has to hold at least two events simultaneously. And once the shared memory
is created, its size cannot be changed without restarting all the clients. That's
the origin of MAX_EVENT_SIZE. In former days, the total allowed shared memory on
a typical Linux machine was 2MB. That's why I set MAX_EVENT_SIZE to 0.5 MB, so midas
takes 2*0.5MB=1MB plus 0.2MB for the ODB, leaving 0.8MB for other applications.
Nowadays, the shared memory limit might be bigger (it is actually a parameter set
during kernel compilation), so one could consider increasing the default
MAX_EVENT_SIZE. If you make a survey of the shared memory sizes in some of the
current distributions, we can choose a safe value.
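As a starting point for such a survey, a C sketch that applies the budget above (two events plus the ODB) to a given limit, and reads the per-segment limit /proc/sys/kernel/shmmax on Linux. Treating shmmax as the relevant limit is an assumption (the system-wide total is governed separately by kernel.shmall, in pages), and the function names are hypothetical.

```c
#include <stdio.h>

/* Budget from the text: the SYSTEM buffer holds at least two events, and
   the ODB needs its own space. All sizes in bytes. */
int midas_shm_fits(unsigned long long max_event_size,
                   unsigned long long odb_size,
                   unsigned long long shm_limit)
{
   return 2 * max_event_size + odb_size <= shm_limit;
}

/* Largest shared memory segment allowed by the running Linux kernel.
   Returns 0 if the value cannot be read. */
unsigned long long read_shmmax(void)
{
   unsigned long long v = 0;
   FILE *f = fopen("/proc/sys/kernel/shmmax", "r");
   if (f == NULL)
      return 0;
   if (fscanf(f, "%llu", &v) != 1)
      v = 0;
   fclose(f);
   return v;
}
```

For the old numbers, midas_shm_fits(512*1024ULL, 200*1024ULL, 2*1024*1024ULL) is true, while a 10 Mbyte MAX_EVENT_SIZE obviously does not fit into 2 Mbytes.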

> For now, I increased the value of MAX_EVENT_SIZE & co to (10*1024*1024) and it
> seems to work (I also had to bump the sanity check in bm_open_buffer() from 10E6
> to 100E6). With 1/4 of the FEC cards, the event size is 1 Mbyte; at ~6 ev/sec the
> machine is almost idle, with the biggest CPU user being the event builder at 10%
> CPU utilization.

I made sure that there is no other limitation than the one given by MAX_EVENT_SIZE, so
it should work fine. Thanks for pointing out the wrong sanity check; that should be
changed in the repository.

How do I do custom event building?

It turns out that the standard event builder fragment matching algorithm cannot
be used in my TPC application. I have two TPC-USB interfaces, which lack any
"busy" or synchronization logic. I send the hardware trigger into both
interfaces, and if one of them misses it, the data is out of sync forever. Consider:

Hardware
trigger    trig1     trig2    trig3    trig4
TPC01      serial1   serial2  serial3  serial4
TPC02      serial1  (missing) serial2  serial3

With the event builder matching only the event serial numbers, the first event
will be okay, but the second event will have trig2 data from TPC01 and trig3
data from TPC02, etc.

The problem exists even if the TPC-USB interfaces do not miss any triggers:
during begin and end of run, the interfaces are enabled one at a time, so if a
trigger arrives after the first interface was enabled, but before the second is
enabled, the data starts being out of sync (and if the same happens during the
end-of-run, the event counts from both frontends will match, but all data would
*still* be out of sync).

Obviously additional data is needed to match the fragments.

So in each frontend, I have a high-precision timestamp (gettimeofday(), usec
resolution) and I would like to have the event builder match the timestamps
instead of event serial numbers. What is the best way to do this? The mevb.c
code does not have any user callbacks for checking "do these fragments belong to
the same event?".
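Independent of mevb, the matching itself is a two-pointer merge over the two time-ordered fragment streams. A C sketch, assuming both frontends stamp each fragment with gettimeofday() in microseconds and that their clocks agree to within tol; match_by_time() is hypothetical and not an mevb hook.

```c
#include <stddef.h>
#include <stdint.h>

/* a[] and b[] are time stamps (usec) of the fragments from two frontends,
   each sorted in time. A pair is accepted when the time stamps differ by
   at most tol; a fragment with no partner (e.g. a trigger one interface
   missed) is skipped. Matched indices go to ia[]/ib[]; returns the count. */
size_t match_by_time(const int64_t *a, size_t na, const int64_t *b, size_t nb,
                     int64_t tol, size_t *ia, size_t *ib)
{
   size_t i = 0, j = 0, n = 0;
   while (i < na && j < nb) {
      int64_t d = a[i] - b[j];
      if (d > tol) {
         j++;            /* b[j] is older than anything left in a[] */
      } else if (d < -tol) {
         i++;            /* a[i] is older than anything left in b[] */
      } else {
         ia[n] = i++;
         ib[n] = j++;
         n++;
      }
   }
   return n;
}
```

With the table above (trig2 missing on TPC02), the pairs are TPC01 serial1/TPC02 serial1, serial3/serial2 and serial4/serial3, and the unpaired TPC01 serial2 fragment is skipped instead of shifting every later event.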

P.S. The event rate will be about 1/sec from cosmic ray tests and at most
10-50/sec in the M11 beam line at TRIUMF. At these low rates, the gettimeofday()
timestamps should be adequate.

K.O.

midas max event size?

My TPC events are fairly large: 18 FEC cards * 128 channels per card * 2 Kbytes
per channel = about 4.5 Mbytes. In my frontend, when I request this event size,
MIDAS complains (in mfe.c) that it is bigger than MAX_EVENT_SIZE, which is set to
0.5 Mbytes in midas.h. What is the best way to deal with this? Should we increase
MAX_EVENT_SIZE to something bigger? Remove the MAX_EVENT_SIZE limitation
altogether?

For now, I increased the value of MAX_EVENT_SIZE & co to (10*1024*1024) and it
seems to work (I also had to bump the sanity check in bm_open_buffer() from 10E6
to 100E6). With 1/4 of the FEC cards, the event size is 1 Mbyte; at ~6 ev/sec the
machine is almost idle, with the biggest CPU user being the event builder at 10%
CPU utilization.
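The sizes quoted in this post can be checked with a few lines of C. The macro names are made up for the sketch; 512*1024 is an assumption for the 0.5 Mbyte default, and 10*1024*1024 is the value tried above.

```c
#define CHANNELS_PER_FEC   128
#define BYTES_PER_CHANNEL  (2 * 1024)          /* 2 Kbytes per channel */
#define OLD_MAX_EVENT_SIZE (512 * 1024)        /* assumed 0.5 Mbyte default */
#define NEW_MAX_EVENT_SIZE (10 * 1024 * 1024)  /* value tried in the post */

/* Event size in bytes for a readout with n_fec FEC cards. */
long tpc_event_bytes(int n_fec)
{
   return (long) n_fec * CHANNELS_PER_FEC * BYTES_PER_CHANNEL;
}
```

18 cards give 4718592 bytes (4.5 Mbytes, nine times the old limit), and 4 cards, roughly the 1/4 mentioned above, give exactly 1 Mbyte.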

K.O.

I would like to document a few problems I ran into while setting up a new
experiment (two USB interfaces to Alice TPC electronics, plus maybe a USB
interface to CAMAC). I am using a midas cvs checkout from last October, so I am
not sure if these problems exist in the very latest code. I have fixes for all
of them and I will commit them after some more testing and after I figure out
how to commit into this new svn thingy.

- mxml: writing xml into an in-memory buffer probably produces invalid xml
because one of the mxml functions always writes "/>" into writer->fh, which is 0
for in-memory writers, so the "/>" tag goes to the console instead of the xml
data stream.

- hs_write_event() closes fd 0 (standard input), which confuses ss_getch(),
which makes mlogger not work (at least on my machine). I traced this down to the
history file file descriptors being initialized to zero and hs_write_event()
closing files without checking that it ever opened them.

- mevb: the event builder did not work with a single frontend (a two-line fix, once
Pierre showed me where to look). Why did I need that? My second TPC-USB interface had
not arrived yet and I wanted to test my frontend code. Yes, it had enough bugs to
prevent the event builder from working.

- mevb: consumes 100% CPU. Fix: add a delay in the main busy-loop.

- mlogger ROOT tree output does not work for data banks coming through the event
builder: mlogger looks for the bank definition under the event_id of mevb, in 
/equipment/evb/variables, which is empty, as the data banks are under
/equipment/frontendNN/variables. This may be hard to fix: bank "TPCA" may be
under "fe01", "TPCB" under "fe02" and mlogger knows nothing about any of this.
Fix: go back to .mid files.
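The hs_write_event() problem is the classic zero-initialised descriptor bug. A C sketch of the usual fix, with hypothetical names (hist_fd, hist_close_all()) rather than the real history code: mark unused slots with -1 and close only what was actually opened.

```c
#include <unistd.h>

#define N_HIST_FILES 3   /* hypothetical number of history files */

/* -1 means "not open". Zero would be wrong: fd 0 is standard input. */
static int hist_fd[N_HIST_FILES] = { -1, -1, -1 };

/* Close only descriptors that were opened, then mark them closed again. */
void hist_close_all(void)
{
   for (int i = 0; i < N_HIST_FILES; i++) {
      if (hist_fd[i] >= 0) {
         close(hist_fd[i]);
         hist_fd[i] = -1;
      }
   }
}
```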

K.O.

If you want to run MIDAS on Cygwin, make sure you have cygserver running. First set a Windows system environment variable CYGWIN=server. This is best done through the Control Panel -> System -> Advanced -> Environment Variables. Then run /usr/bin/cygserver-config in a Cygwin console window. Then reboot. After that your MIDAS executables should run properly.

If cygserver is not running, one (obvious) symptom is that odbedit fails immediately with a "Bad system call" error.

I've only tested this so far with odbedit and with an offline analyzer that generates histograms in the same structure. Both of those work properly.

Endian swapping in mana.c

It was reported that following code in mana.c :

  /* swap event header if in wrong format */
  if (pevent->serial_number > 0x1000000) {
     WORD_SWAP(&pevent->event_id);
     WORD_SWAP(&pevent->trigger_mask);
     DWORD_SWAP(&pevent->serial_number);
     DWORD_SWAP(&pevent->time_stamp);
     DWORD_SWAP(&pevent->data_size);
  }

does not work correctly for events having a true serial number above 16777216 (=0x1000000). After some consideration, I concluded that there is no good way to automatically determine the endian format of midas events without adding another field to the header, which would break compatibility with all data recorded to date. I therefore changed the above code to

  /* swap event header if in wrong format */
#ifdef SWAP_EVENTS
  WORD_SWAP(&pevent->event_id);
  WORD_SWAP(&pevent->trigger_mask);
  DWORD_SWAP(&pevent->serial_number);
  DWORD_SWAP(&pevent->time_stamp);
  DWORD_SWAP(&pevent->data_size);
#endif

So if one wants to analyze events with the midas analyzer on, for example, a PC system where the events come from a VxWorks system with the opposite endian encoding, one has to set the flag -DSWAP_EVENTS when compiling the analyzer for that type of analysis.
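A standalone C sketch of both behaviours, with hypothetical names: event_header_t mirrors the five header fields swapped in the code above, swap_event_header() is the unconditional swap that SWAP_EVENTS selects, and old_heuristic_says_swap() is the removed test. A native event with serial number 16777232 (0x01000010) is wrongly flagged by the old test.

```c
#include <stdint.h>

/* Mirrors the header fields swapped above (layout assumed for the sketch). */
typedef struct {
   int16_t  event_id;
   int16_t  trigger_mask;
   uint32_t serial_number;
   uint32_t time_stamp;
   uint32_t data_size;
} event_header_t;

static uint16_t swap16(uint16_t x) { return (uint16_t) ((x >> 8) | (x << 8)); }

static uint32_t swap32(uint32_t x)
{
   return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
          ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Unconditional swap, as compiled in with -DSWAP_EVENTS. */
void swap_event_header(event_header_t *h)
{
   h->event_id      = (int16_t) swap16((uint16_t) h->event_id);
   h->trigger_mask  = (int16_t) swap16((uint16_t) h->trigger_mask);
   h->serial_number = swap32(h->serial_number);
   h->time_stamp    = swap32(h->time_stamp);
   h->data_size     = swap32(h->data_size);
}

/* The old guess: "a serial number above 0x1000000 means wrong byte order". */
int old_heuristic_says_swap(const event_header_t *h)
{
   return h->serial_number > 0x1000000u;
}
```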

Where to put drivers?

Stefan Ritt wrote:

We have both the example experiment and the MSCB Makefile which both expect to find the midas drivers under $MIDASSYS/drivers/camac or $MIDASSYS/drivers/usb. The documentation does not explicitly mention to define MIDASSYS as /usr/local, but some people do it. That however requires putting all drivers under /usr/local/drivers, which is not the case in the current Makefile for midas. Do you think that we should add this? Or should we rather ask (->documentation) people to define MIDASSYS to wherever they install the midas package (usually /usr/home/<name>/midas or so)?

Pierre-André Amaudruz wrote:

The purpose of the MIDASSYS introduction was to permit the placement of the package in the user area as well as publishing the Midas entry point. Doing so, we lessen the necessity to "install" Midas in the standard OS directory such as /opt or /usr/local. Static linking, use of rpath, new "make minimal_install" go in that direction.
Regarding the drivers, organizing the directories per hardware type (camac, vme, fastbus, usb, etc.) seems better to me. Originally, we mostly dealt with CAMAC and therefore the various Makefiles had a default reference to /drivers/bus/(camacrpc). Now that we have removed cnaf/rpc from the automatic mfe build, it indicates that CAMAC is no longer the prime hardware. We should therefore leave the selection of the hardware to the user and document the need for him/her to adjust the build appropriately ($MIDASSYS/drivers/<HW_type>). The different Makefile examples should be adjusted to the proper driver location they're dealing with.
Pierre-André

I agree with what you say. So I will include the drivers in the ("full") install to be copied under /usr/local/drivers, just for the people using midas in an "installed" way, but we keep the possibility to use a minimal_install to skip the driver installation.

233

06 Nov 2005

Pierre-Andre Amaudruz

Suggestion

Where to put drivers?

Stefan Ritt wrote:

Hi,

I would like to raise the question where to put the midas drivers.

We have both the example experiment and the MSCB Makefile which both expect to find the midas drivers under $MIDASSYS/drivers/camac or $MIDASSYS/drivers/usb. The documentation does not explicitly mention to define MIDASSYS as /usr/local, but some people do it. That however requires putting all drivers under /usr/local/drivers, which is not the case in the current Makefile for midas. Do you think that we should add this? Or should we rather ask (->documentation) people to define MIDASSYS to wherever they install the midas package (usually /usr/home/<name>/midas or so)?

Looking forward to hear your opinion,

Stefan

Pierre-André Amaudruz wrote:

Where to put drivers?


ELOG V3.1.4-2e1708b5