Back Midas Rome Roody Rootana
  Midas DAQ System, Page 136 of 136  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
ID Date Author Topicdown Subject
  28   20 Jul 2004 Konstantin Olchanski Introduction of environment variable MIDASSYS
> > Starting from midas version 1.9.4 on, the environment variable 'MIDASSYS' ...
> 2. What will the entire structure tree look like?
> 
> Here's my suggestion
> MIDASSYS=/opt/midas-1.9.4 (for example)   

Where should MIDAS be installed?

After looking at the LSB and at the FHS, it appears that the standards permit all of:
1) /opt/midas...
2) /usr/{bin,lib,...}
3) /usr/local/{bin,lib,...}

Some handy references:
http://www.pathname.com/fhs/pub/fhs-2.3.html
http://www.linuxbase.org/spec/

The "example LSB-compliant packages" appear to install into /opt/lsb, but I do not see
 any guidance as to where "my" packages should go.

Then, after some googling, I see that IBM "recommends" /opt (see
http://www-106.ibm.com/developerworks/linux/library/l-lsb.html):

begin-quote---
To avoid name space collisions when installing LSB-conforming applications, the
applications belonging to the base operating system or the distribution are to be
installed in /sbin/, /bin/, or /usr/. System administrators can build packages from
source and install them into the /usr/local/ directory. However, third-party packages
of add-on software must be installed in /opt/<package>/, where <package> is the name
that describes a software suite.
end-quote---


K.O.
  29   21 Jul 2004 Stefan Ritt Introduction of environment variable MIDASSYS
> Where should MIDAS be installed?

I personally don't have any preference, as long as it's in accordance with "the standard"
(whatever this is). Maybe one should add a flag to the makefile to specify the
installation directory, either /opt or /usr/local, so people then have the choice. I have
seen that in other packages. As for the RPM, I leave the final proposal to the person
writing the spec file (Paul? Piotr? Konstantin?). We should then commonly agree on the
location based on that proposal. The person supplying the RPM will "officially" become the
RPM maintainer and be responsible for maintaining it.

> installed in /sbin/, /bin/, or /usr/. System administrators can build packages from
> source and install them into the /usr/local/ directory. However, third-party packages
> of add-on software must be installed in /opt/<package>/, where <package> is the name
> that describes a software suite.

Well, midas is kind of in the middle. On one hand it's a third-party package (-> /opt),
but it requires some compilation to allow meaningful work (frontend, analyzer). So maybe
the RPM should go to /opt, and if compiled from the TAR ball it should go to /usr/local?
But that means if someone has to maintain a large basis of midas machines, he/she has to
always search two locations. On the other hand one can alway do a "cd $MIDASSYS" ...

- Stefan
  40   31 Aug 2004 Konstantin Olchanski midas odb locking
One of our experiments is suffering from periodic ODB corruption and I
suspected that there might be a problem with ODB locking. In the last few
days, I finally had time to read the ODB locking code, to write a little
test program and to play with ODB. This is what I found:

1) ODB locking appears to be sound, my test program failed to find any
locking flaws, except for a big problem, described below. Please read on.

2) ODB locking is "unfair". A program "while (1) { lock(); do_stuff();
unlock(); /* no sleep here */ }" would lock out other users of ODB
(including odbedit) for seconds and minutes at a time. I see this as a flaw
in the semop() implementation in the Linux kernel and I cannot think of an
easy way to fix it in our code. (I tested only on RHL9 2.4.20-31.9smp on a
dual CPU machine. 2.6 kernels may work better).

3) presently, we use an infinite timeout waiting for the ODB lock. I suggest
we set the timeout to, say, 5 minutes, to protect against dead (or live)
locks that we saw a few times here at TRIUMF- every ODB client would hang
without any error messages or explanations forever waiting for the ODB lock
that is held by some rogue ODB client stuck in an infinite loop in corrupted
ODB.

4) while reading the locking code in db_{lock,unlock}_database(), I thought
that there is a race condition against the "lock_cnt" variable, until I
realized that this variable is local and there is no race condition. I would
like to comment this in the code?

5) I found a failure mode where db_close_database() erroneously deletes the
lock semaphore. Once the semaphore is deleted, ODB locking silently fails
(in db_lock_database() we do not check for success status of
mutex_wait_for()) and remaining ODB clients operate without locking protection.

This failure happens after ODB undercounts active clients after losing track
of clients removed by "idle timeouts" (and by other checks?). At some point,
db_close_database() decides that there are no more clients left, attempts to
delete the shared memory (this fails because there are still active clients
attached) and deletes the lock semaphore. Afterwards, the remaining "lost"
active clients operate without lock protection. This would tend to happen
while shutting down all clients, a time when they all rush-in to delete
themselves from "/system/clients", unsuring ODB corruption.

A quick solution I just coded would not work for mmap()-based shared memory
(I destroy the lock semaphore after the ODB shared memory is destroyed) as
this relies on "shm_nattch" counting feature of System-V shared memories,
absent in the mmap() based shared memories. Since the Windows implementation
uses mmap(), my "solution" is an obvious no-go. Alternatives would be to add
a second semaphore, just for counting active ODB clients (kluncky); or never
delete the semaphore in the first place (dirty, and how does one clear it if
it gets stuck in the locked state?).

For now, I would like to add a check to ss_mutex_waitfor() call in
db_lock_database() and crash if we can't get the mutex. Returning an error
code would be cleaner, but would not work because nobody checks the return
status of db_lock_database(). If can't get the mutex for (say) 5 minutes, I
think we should crash, too- something is very wrong and it is pointless to
continue waiting.

K.O.
  43   31 Aug 2004 Konstantin Olchanski mlogger crash if using mserver.
Our users keep making a simple mistake- they set MIDAS_SERVER_HOST in their
environement. Most midas programs do not mind this- they go through the
mserver, inefficient but benign- except for the mlogger, which dumps core
about 10 seconds after starting. This mightily confuses the users-
everything works perfectly, except for the mlogger, (for most users) the
most obscure and magical part of midas. Obviously they can't take data
without the mlogger and they fail to correlate this crash with editing their
.cshrc file, so we get panic calls at midnight or whenever. And every time,
while debugging midas malfunctions, changes to .cshrc is absolutely the last
place we look for. Ouch!

As it turns out, mlogger does crash if it uses the mserver-
log_system_history() calls db_lock_database(), with a prompt crash because
the mlogger is not directly connected to any ODB (it's mserver is).

Obviously, running the mlogger via the mserver makes no sense, but we should
warn about this rather than dump core.

I propose this patch to src/mlogger.c::log_system_history():

-   db_lock_database(hDB);
-   db_notify_clients(hDB, hist_log[index].hKeyVar, FALSE);
-   db_unlock_database(hDB);
-
+   if (!rpc_is_remote())
+     {
+       db_lock_database(hDB);
+       db_notify_clients(hDB, hist_log[index].hKeyVar, FALSE);
+       db_unlock_database(hDB);
+     }
+   else
+     {
+       cm_msg(MERROR, "log_system_history", "Warning: mlogger is running
remotely via the mserver. This is an unsupported configuration. Please unset
MIDAS_SERVER_HOST and restart the mlogger");
+     }

K.O.
  44   07 Sep 2004 Stefan Ritt mlogger crash if using mserver.
I trapped myself into that problem recently so it's the right time to fix it (;-).

We have two options: 

a) Make the logger work remotely, even if it's suboptimal and 
b) Make the logger refuse to run remotely. 

I have no case where I need to run the logger remotely, so I would opt for b).
This would mean removing the "-h" command line switch and the evaluation of
MIDAS_SERVER_HOST, or just supplying an empty host string to
cm_connect_experiment().

Let me know if you agree, I can then remove the "-h" option. The patch you
suggested I would apply in addition.

- Stefan
  45   15 Sep 2004 Konstantin Olchanski mlogger crash if using mserver.
> I trapped myself into that problem recently so it's the right time to fix it (;-).
> We have two options: 
> a) Make the logger work remotely, even if it's suboptimal and 
> b) Make the logger refuse to run remotely.

After some discussion between Stefan, Pierre and myself, it was decided to disallow
running mlogger remotely via the mserver.

K.O.
  41   15 Sep 2004 Konstantin Olchanski midas odb locking
After some discussion with Stefan-

> 1) ODB locking appears to be sound...
> 2) ODB locking is "unfair"

Stefan reminded me that "priority boosting" is the standard solution for this
problem. Since Linux does not appear to implement this, we may try doing it inside
midas, time permitting. "Fairness" behaviour of Win32, BSD and MacOSX may be worth
investigating.

> 3) presently, we use an infinite timeout waiting for the ODB lock.

I will add a timeout of 10 minutes, then shutdown the ODB client with an error message.

> 4) in db_{lock,unlock}_database(), [there is no] race condition against the
"lock_cnt" variable [because it is local].

I will document this.

> 5) I found a failure mode where db_close_database() erroneously deletes the
> lock semaphore. Once the semaphore is deleted, ODB locking silently fails
> (in db_lock_database() we do not check for success status of
> mutex_wait_for()) and remaining ODB clients operate without locking protection.

I will add a check and shutdown the ODB client with an error message if the lock
cannot be obtained (the mutex was deleted, the "lock" system call returns an error,
etc).

> [how to decide when the last ODB client disconnected from the shared memory and
when to delete the lock semaphore?]

We considered using a counting semaphore to count active ODB clients, if counting
semaphores do the right things on all supported systems (Linux, Win32, MacOSX).

K.O.
  42   16 Sep 2004 Stefan Ritt midas odb locking
> I will add a timeout of 10 minutes, then shutdown the ODB client with an error message.

I added a timeout handling to db_lock_database. It was already present in
ss_mutex_wait_for, so it was just a matter of passing the status up the calling stack.
ODBEdit stops if it cannot obtain a lock after 5 minutes.
  38   21 Sep 2004 Konstantin Olchanski ODB-EPICS gateway
At TRIUMF, we use several different versions of code to interface MIDAS and
EPICS (http://www.aps.anl.gov/epics). Now that we more or less understand
our needs, I propose this design for a simplified "EPICS" MIDAS frontend. I
would like to keep this new front end in the MIDAS CVS repository, possibly
replacing the existing EPICS frontend in examples/epics.

The basic idea is to provide an ODB-driven bi-directional gateway between
EPICS and ODB with this functionality: periodically read EPICS data and save
it in ODB, optionally generate MIDAS events with EPICS data; for writing
data to EPICS, use hotlinks- if the user changes "write" variables in ODB,
the changes are sent to EPICS.

1) ODB structure
   /equipment/epicsgw/
     common/...
     statistics/...
     variables/
       epics[...] <--- EPICS->ODB data (double[])
       write[...] <--- ODB->EPICS data (double[])
     settings/
       num epics  <--- number of epics variables (int)
       num write  <--- number of write variables (int)
       names epics <--- human-readable names for EPICS variables (string[])
       names write <--- human-readable names for EPICS variables (string[])
       chans epics <--- EPICS channels for epics-read data (string[])
       chans write <--- EPICS channels for epics-write data (string[])
       period      <--- EPICS read period in milliseconds (int[])
       enable epics <-- enable (y/n) epics-read (bool[])
       enable write <-- enable (y/n) epics-write (bool[])
       enable events <- enable event generation (bool)

2) EPICS to ODB data path: periodically read each enabled "epics" variable
and write the data values to ODB. Other front ends can hotlink the "epics"
variables to receive updated epics data.

3) ODB to EPICS data path: monitor the hotlink to ".../variables/write". If
data changes, send the changes to EPICS. At startup, write all "write"
variables to EPICS.

4) event generation: TBD.

5) error handling: TBD.

K.O.
  39   21 Sep 2004 Stefan Ritt ODB-EPICS gateway
The easiest way to achieve this is to write a new class driver, probably derived
from the multi.c class driver. One has just to rename all "output" with "write"
(or better "ODB2EPICS") and all "input" with "EPICS2ODB". The multi class driver
handles already a factor/offset for each channel (which could be 1/0 of course),
a threshold to update the ODB/EPICS only when a value changes significantly, to
retrieve labes from the bus driver (EPICS labes -> ODB settings), automatic
event generation and error handling. So it would be a good starting point.

What one gets from the class driver in the ODB is:

  /equipment/<name>/
     variables/
        Input[]     <--- read from the bus driver (float)
        Output[]    <--- witten to the bus driver (float)
     settings/
        Names Input[]        <--- human readable names
        Names Output[]       <--- human readable names
        Update Threshold[]
        Input Offset[]
        Input Factor[]
        Output Offset[]
        Output Factor[]
        Devices/
           Input/
              DD/   <--- parameters for Device Driver
                 ... Epics addresses, flags etc.
              BD/   <--- parameters for Bus Driver
           Output/
        
So if one uses the standard mfe.c code together with the multi.c class driver
and epics_ca.c device driver all what is left is the following:

- replace cd_gen.c by multi.c in the examples/epics directory
- break down the already existing flags into enable epics/write/events
- maybe add th EPICS read period

The last two things should be done in the epics_ca.c device driver, so one can
use the multi.c class driver without any change. Event generation and error
handling then comes for free.
  Draft   27 Jun 2019 Hassan  
  Draft   20 Feb 2020 Marius Koeppel  
We also agree and found the problem now. Since we build everything (MIDAS Event Header, Bank Header, Banks etc.) in the FPGA we had some struggle with the MIDAS data format (http://lmu.web.psi.ch/docu/manuals/bulk_manuals/software/midas195/html/AppendixA.html). We thought that only the MIDAS Event needs to be aligned to 64 bit but as it turned out also the bank data (Stefan updated the wiki page already) needs to be aligned. Since we are using the BANK32 it was a bit unclear for us since the bank header is not 64 bit aligned. But we managed this now by adding empty data and the system is running now.

Our setup looks like this:

- mfe.cxx multithread equipment
- mfe readout thread grabs pointer from dma ring buffer 
- since the dma buffer is volatile we do copy_n for transforming the data to MIDAS 
- the data is already in the MIDAS format so done from our side :)
- mfe readout thread increments the ring buffer
- mfe main thread grabs events from ring buffer, sends them to the mserver

From the firmware side we have an Arria 10 development board and 

But now I am curious, which DMA controller you use? The Altera or Xilinx PCIe block with the vendor supplied DMA driver? Or you do DMA on an ARM SoC FPGA? (no PCI/PCIe, 
different DMA controller, different DMA driver).

I am curious because we will be implementing pretty much what you do on ARM SoC FPGAs pretty soon, so good to know
if there is trouble to expect.

But I will probably use the tmfe.h c++ frontend and a "pure c++" ring buffer instead of mfe.cxx and the midas "rb" ring buffer.

(I did not look at your code at all, there could be a bug right there, this ring buffer stuff is tricky. With luck there is no bug
in your dma driver. The dma drivers for our vme bridges did do have bugs).

K.O.
  Draft   04 Jun 2020 Lukas Gerritzen stime() deprecated in glibc 2.31
In glibc 2.31, the stime function was deprecated:

* The obsolete function stime is no longer available to newly linked
  binaries, and its declaration has been removed from <time.h>.
  Programs that set the system time should use clock_settime instead.

https://sourceware.org/legacy-ml/libc-announce/2020/msg00001.html

This creates a problem in src/system.cxx:3197:4
ELOG V3.1.4-2e1708b5