Back Midas Rome Roody Rootana
  Midas DAQ System, Page 29 of 154  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
ID Date Author Topic Subjectdown
  775   10 Jul 2011 Konstantin OlchanskiBug Fixmidas shared memory changes
> > 2) the shared memory type used by an experiment is recorded in the file .SHM_TYPE.TXT.
> > 3) the hostname of the computer where the ODB shared memory is meant to reside is now
> > recorded in the file .SHM_HOST.TXT.

Due to a typo in src/system.c svn rev 5125, ss_shm_delete() did not work at all. This broke "odbedit -R", "odbedit -s 5000000" (to change ODB size), etc. 
Fixed in src/system.c svn rev 5134. (It is safe to update just tis one file to fix this problem).

Sorry for the inconvenience,
K.O.
  776   11 Jul 2011 Konstantin OlchanskiBug Fixmidas shared memory changes
> > > 2) the shared memory type used by an experiment is recorded in the file .SHM_TYPE.TXT.
> > > 3) the hostname of the computer where the ODB shared memory is meant to reside is now
> > > recorded in the file .SHM_HOST.TXT.


Because the mserver did not setup correct experiment name and path, POSIX shared memory did not work at all when used with the mserver. Fixed in mserver.c rev 5135


Sorry for the inconvenience,
K.O.
  919   22 Oct 2013 Konstantin OlchanskiInfomidas programs "auto start", etc
MIDAS "programs" settings include: /programs/xxx/"auto start", "auto restart" and "auto stop". What do 
they do?

"auto start":

if set to "y", the program's "start command" will be unconditionally executed at the beginning of the run 
start transition.

Because there are no checks or tests, the "start command" will be executed even if the program is already 
running. It means that this function cannot be used to start frontend programs - a new copy will be 
started each time, and a previously running copy will be killed.

Also the timing of the program startup and run transition is wrong - in my tests, the program starts too 
late to see the run transition. If the program is a frontend, it will never see the begin-of-run transition.

1st conclusion: "auto start" should be "n" for frontend programs and for any other programs that are 
supposed to be continuously running (mlogger, lazylogger, etc).

2nd conclusion: "auto start" does the same thing as "/programs/execute on start run".

"auto stop":

if set to "y", the program will be stopped after the end of run. (using cm_shutdown).

"auto restart":

this has nothing to do with starting and stopping runs. Instead, it works in conjunction with the alarm 
system and the "program is not running" alarm.

The alarm system periodically calls al_check(). al_check() checks all programs defined under /Programs to 
see if they are running (using cm_exist()). If a program is not running and an alarm is defined, the alarm is 
raised ("program is not running" alarm). If there is a start command and "auto restart" is set to "y", the 
start command is executed.

When using these "auto start" and "auto restart" functions, one needs to be careful about the context 
where the start command will be executed: midas clients may be running from different directories, under 
different user names and on different computers.

In "auto start", the start command is executed from cm_transition. For remote clients, this will happen on 
the remote computer. (against the expectation that the program will be started on the main computer).

In "auto restart", the start command is executed by al_check() which always runs locally (for remote 
clients, it runs inside the mserver). So the started program will always run on the main computer, but 
maybe not in the same directory as when started from the mhttpd "programs -> start" button.

Conclusion:

"programs auto start" : works but has strange interactions and side effects, do not use it.
"programs auto stop" : works, can be used to stop programs at the end of run (but what for?)
"programs auto restart" : works, seems to work correctly, can be used to auto restart mlogger, frontends, 
etc.

K.O.
  926   06 Nov 2013 Stefan RittInfomidas programs "auto start", etc
> "programs auto start" : works but has strange interactions and side effects, do not use it.
> "programs auto stop" : works, can be used to stop programs at the end of run (but what for?)
> "programs auto restart" : works, seems to work correctly, can be used to auto restart mlogger, frontends, 

auto start and auto stop have been requested by PAA loooong time ago. Maybe he remembers if/where this has been used at all. I never used it. So if 
this is the case for others, we can easily change it and won't break anything. Like auto start can be executed before the run transition happens, check 
for a previous version of the program, and only continue when the program is actually running. Should be only a few lines of code. Auto restart is used 
successfully here at PSI, for example for the lazy logger.

/Stefan
  2420   08 Aug 2022 Konstantin OlchanskiInfomidas on ubuntu LTS 22.04
reporting that as of commit 78f707c0686d22f8329c7a1f1c46d7dccf35ceff, midas builds 
without errors or warnings on Ubuntu LTS 22.04, 20.04, CentOS-7 and MacOS 12.4. 
(except for some warnings from mscb and msc). K.O.
  1758   12 Jan 2020 Konstantin OlchanskiInfomidas on centos-8 status
I now have a centos-8 computer and I tried midas on it:

- the develop and midas-2019-09 branches build, mhttpd runs
- there are compiler warnings about use of strncpy() that need to be looked into, but see https://stackoverflow.com/questions/50198319/gcc-8-wstringop-truncation-what-is-
the-good-practice
- mhttpd built-in https support does not seem to work (see the other forum thread)
- apache httpd proxy for https can be made to work, but there are problems with certbot.

K.O.
  175   24 Nov 2004 chris pearsonInfomidas on 64bit opteron
   Midas, version 1.9.5 of 7th October, was installed, with a few changes, on a
64 bit opteron computer, running linux.  For this processor, as for the alpha
processor, long integers and addresses are 64 bits.  We added a new flag in the
Makefile,

250a251
> ARCH   = $(shell uname -m)
377a379,381
> ifeq ($(ARCH),x86_64)
> OSFLAGS := $(OSFLAGS) -DX86_64
> endif

and extended the alpha-specific definitions, of DWORD and PTYPE, in midas.h to
include this case,

549c549
< #ifdef __alpha
---
> #if defined(__alpha) || defined(X86_64)
598c598
< #ifdef __alpha
---
> #if defined(__alpha) || defined(X86_64)

apart from this, there are a large number of cases where pointers are cast to
integers, without using the PTYPE definition.  These all need to be changed by
hand, although these conversions should probably be removed anyway - in almost
all cases they are unnecessary, as just differences are being calculated.

There were also a number of warnings, which we ignored, where printf format
strings specified long integers, but the argument was not a long integer.  Casts
should probably be added in all cases where the type of the argument can vary
depending on the machine.

A midas analyser was made, which was able to successfully replay some data, but
this was all that was tested.

Chris
  40   31 Aug 2004 Konstantin Olchanski midas odb locking
One of our experiments is suffering from periodic ODB corruption and I
suspected that there might be a problem with ODB locking. In the last few
days, I finally had time to read the ODB locking code, to write a little
test program and to play with ODB. This is what I found:

1) ODB locking appears to be sound, my test program failed to find any
locking flaws, except for a big problem, described below. Please read on.

2) ODB locking is "unfair". A program "while (1) { lock(); do_stuff();
unlock(); /* no sleep here */ }" would lock out other users of ODB
(including odbedit) for seconds and minutes at a time. I see this as a flaw
in the semop() implementation in the Linux kernel and I cannot think of an
easy way to fix it in our code. (I tested only on RHL9 2.4.20-31.9smp on a
dual CPU machine. 2.6 kernels may work better).

3) presently, we use an infinite timeout waiting for the ODB lock. I suggest
we set the timeout to, say, 5 minutes, to protect against dead (or live)
locks that we saw a few times here at TRIUMF- every ODB client would hang
without any error messages or explanations forever waiting for the ODB lock
that is held by some rogue ODB client stuck in an infinite loop in corrupted
ODB.

4) while reading the locking code in db_{lock,unlock}_database(), I thought
that there is a race condition against the "lock_cnt" variable, until I
realized that this variable is local and there is no race condition. I would
like to comment this in the code?

5) I found a failure mode where db_close_database() erroneously deletes the
lock semaphore. Once the semaphore is deleted, ODB locking silently fails
(in db_lock_database() we do not check for success status of
mutex_wait_for()) and remaining ODB clients operate without locking protection.

This failure happens after ODB undercounts active clients after losing track
of clients removed by "idle timeouts" (and by other checks?). At some point,
db_close_database() decides that there are no more clients left, attempts to
delete the shared memory (this fails because there are still active clients
attached) and deletes the lock semaphore. Afterwards, the remaining "lost"
active clients operate without lock protection. This would tend to happen
while shutting down all clients, a time when they all rush-in to delete
themselves from "/system/clients", unsuring ODB corruption.

A quick solution I just coded would not work for mmap()-based shared memory
(I destroy the lock semaphore after the ODB shared memory is destroyed) as
this relies on "shm_nattch" counting feature of System-V shared memories,
absent in the mmap() based shared memories. Since the Windows implementation
uses mmap(), my "solution" is an obvious no-go. Alternatives would be to add
a second semaphore, just for counting active ODB clients (kluncky); or never
delete the semaphore in the first place (dirty, and how does one clear it if
it gets stuck in the locked state?).

For now, I would like to add a check to ss_mutex_waitfor() call in
db_lock_database() and crash if we can't get the mutex. Returning an error
code would be cleaner, but would not work because nobody checks the return
status of db_lock_database(). If can't get the mutex for (say) 5 minutes, I
think we should crash, too- something is very wrong and it is pointless to
continue waiting.

K.O.
  41   15 Sep 2004 Konstantin Olchanski midas odb locking
After some discussion with Stefan-

> 1) ODB locking appears to be sound...
> 2) ODB locking is "unfair"

Stefan reminded me that "priority boosting" is the standard solution for this
problem. Since Linux does not appear to implement this, we may try doing it inside
midas, time permitting. "Fairness" behaviour of Win32, BSD and MacOSX may be worth
investigating.

> 3) presently, we use an infinite timeout waiting for the ODB lock.

I will add a timeout of 10 minutes, then shutdown the ODB client with an error message.

> 4) in db_{lock,unlock}_database(), [there is no] race condition against the
"lock_cnt" variable [because it is local].

I will document this.

> 5) I found a failure mode where db_close_database() erroneously deletes the
> lock semaphore. Once the semaphore is deleted, ODB locking silently fails
> (in db_lock_database() we do not check for success status of
> mutex_wait_for()) and remaining ODB clients operate without locking protection.

I will add a check and shutdown the ODB client with an error message if the lock
cannot be obtained (the mutex was deleted, the "lock" system call returns an error,
etc).

> [how to decide when the last ODB client disconnected from the shared memory and
when to delete the lock semaphore?]

We considered using a counting semaphore to count active ODB clients, if counting
semaphores do the right things on all supported systems (Linux, Win32, MacOSX).

K.O.
  42   16 Sep 2004 Stefan Ritt midas odb locking
> I will add a timeout of 10 minutes, then shutdown the ODB client with an error message.

I added a timeout handling to db_lock_database. It was already present in
ss_mutex_wait_for, so it was just a matter of passing the status up the calling stack.
ODBEdit stops if it cannot obtain a lock after 5 minutes.
  569   07 May 2009 Konstantin OlchanskiInfomidas misc timeout fixes
(catching up on recent changes from t2k and pienu)

Various timeout problems fixed:
- cm_transition() timeouts now settable from ODB (/experiment/transition timeout, transition connect 
timeout). Rev 4479
- rpc_client_call() timeout did not work because of bad select() and alarm() interaction. Rev 4479
- implement rpc connect timeout (was hardwired 10 sec) via rpc_{set,get}_option(-2, RPC_OTIMEOUT). Rev 
4478
- ss_mutex_wait_for() timeout only worked if 1Hz alarm() interrupts are present. Now I use semtimedop() 
and timeout should always work. Rev 4472

K.O.
  577   15 May 2009 Konstantin OlchanskiInfomidas misc timeout fixes
> - cm_transition() timeouts now settable from ODB (/experiment/transition timeout, transition connect timeout). Rev 4479

transition connect timeout was actually only half of that specified because of an error in computing timeout arguments to the select() system 
call in recv_string() in system.c. This is now fixed.

rev 4488
K.O.
  578   15 May 2009 Konstantin OlchanskiInfomidas misc timeout fixes
> - cm_transition() timeouts now settable from ODB (/experiment/transition timeout, transition connect timeout). Rev 4479

transition connect timeout was actually only half of that specified because of an error in computing timeout arguments to the select() system 
call in recv_string() in system.c. This is now fixed.

rev 4488
K.O.
  238   22 Dec 2005 Konstantin OlchanskiInfomidas max event size?
My TPC events are fairly large: 18 FEC cards * 128 channels per card * 2 Kbytes
per channel = about 4 Mbytes. In my
frontend, when I request this event size, MIDAS complaints (in mfe.c) that it is
bigger than MAX_EVENT_SIZE, which
is set to 0.5 Mbytes in midas.h. What is the best way to deal with this? Should
we increase MAX_EVENT_SIZE to
something bigger? Remove the MAX_EVENT_SIZE limitation altogether?
  
For now, I increased the value MAX_EVENT_SIZE & co to (10*1024*1024) and it
seems to work (I also had to bump the
sanity check in bm_open_buffer() from 10E6 to 100E6). With 1/4 of the FEC cards,
the event size is 1 Mbyte at ~6
ev/sec the machine is almost idle, with the biggest CPU user being the event
builder at 10% CPU utilization.

K.O.
  240   23 Dec 2005 Stefan RittInfomidas max event size?
> My TPC events are fairly large: 18 FEC cards * 128 channels per card * 2 Kbytes
> per channel = about 4 Mbytes. In my
> frontend, when I request this event size, MIDAS complaints (in mfe.c) that it is
> bigger than MAX_EVENT_SIZE, which
> is set to 0.5 Mbytes in midas.h. What is the best way to deal with this? Should
> we increase MAX_EVENT_SIZE to
> something bigger? Remove the MAX_EVENT_SIZE limitation altogether?

If you teach me how to remove the MAX_EVENT_SIZE, that would be perfect!

Unfortunately the limit comes from the shared memory on the back end (the so-called
"SYSTEM" shared memory). Due to the structure of the buffer manager, the shared
memory has to hold at least two events simultaneously. And once the shared memeory
is created, it's size cannot be changed without restarting all the clients. That's
the origin of the MAX_EVENT_SIZE. In former days, the total allowed shared memory on
a typical linux machine was 2MB. That's why I set MAX_EVENT_SIZE to 0.5 MB, so midas
takes 2*0.5MB=1MB plus 0.2MB for the ODB, leaving 0.8MB for other applications.
Nowadays, the shared memory might be bigger (actually it's a parameter during kernel
compilation), so one could consider increasing the default MAX_EVENT_SIZE. If you
make a survey of the shared memory sizes in some of the current distributions, we
can choose a safe value.

> For now, I increased the value MAX_EVENT_SIZE & co to (10*1024*1024) and it
> seems to work (I also had to bump the
> sanity check in bm_open_buffer() from 10E6 to 100E6). With 1/4 of the FEC cards,
> the event size is 1 Mbyte at ~6
> ev/sec the machine is almost idle, with the biggest CPU user being the event
> builder at 10% CPU utilization.

I made sure that there is no other limitation as the one given by MAX_EVENT_SIZE, so
it should work fine. Thanks for telling me the wrong sanity check, that should be
changed in the repository.
  849   18 Dec 2012 xelapForummidas installation on SL6.3
I try to do make in zlib folder and got  this
cc -O -o example example.o -L. -lz
/usr/bin/ld: errno: TLS definition in /lib/libc.so.6 section .tbss mismatches
non-TLS reference in ./libz.a(gzio.o)
/lib/libc.so.6: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [example] Error 1

Do I miss any package to be installed?
Thanks in advance,
Xelap
  2288   11 Oct 2021 Konstantin OlchanskiForummidas forum updated, moved
The midas forum software (elogd) was updated to latest version and moved from our old server 
(ladd00.triumf.ca) to our new server (daq00.triumf.ca).

The following URLs should work:

https://daq00.triumf.ca/elog-midas/Midas/ (new URL)
https://midas.triumf.ca/elog/Midas/ (old URL, redirects to daq00)
https://midas.triumf.ca/forum (link from midas wiki)

The configuration on the old server ladd00.triumf.ca is quite tangled between
several virtual hosts and several DNS CNAMEs. I think I got all the redirects
correct and all old URLs and links in old emails & etc still work.

If you see something wrong, please reply to this message here or email me directly.

K.O.
  1391   29 Aug 2018 Konstantin OlchanskiForummidas forum mail relay changed to smtp.triumf.ca
Per changes at TRIUMF, the MIDAS forum mail relay was changed from trmail.triumf.ca to 
smtp.triumf.ca. K.O.
  1112   16 Sep 2015 Konstantin OlchanskiForummidas forum elog updated
the midas forum elog is updated to latest version from Stefan - ELOG V3.1.1-b4d2a37 built from git sources. K.O.
  2963   20 Mar 2025 Konstantin OlchanskiBug Reportmidas equipment "format"
we are migrating the dragon experiment from an old mac to a new mac studio and we ran into a problem 
where one equipment format was set to "fixed" instead of "midas". lots of confusion, mdump crash, 
analyzer crash, etc. (mdump fixes for this are already committed).

it made us think whether equipment format is still needed. in the old days we had choice of MIDAS and 
YBOS formats, but YBOS was removed years ago, and I was surprised that format FIXED was permitted at 
all.

I did a midas source code review, this is what I found:

- remnants of YBOS support in a few places, commit to remove them pending.
- FORMAT_ROOT is used in mlogger for automatic conversion of MIDAS banks to ROOT trees
- FORMAT_FIXED is used in a few slow control drivers in drivers/class, instead of creating MIDAS 
banks, they copy raw data directly into an event (there is no bank header and no way to identify such 
events automatically)
- lots of code to support different formats in mdump (mostly dead code)
- the rest of the code does not care or use this format stuff

Current proposal is to remove support for all formats except FORMAT_MIDAS (and FORMAT_ROOT in 
mlogger).

- defines of FORMAT_XXX will be removed from midas.h
- "Format" will be removed from ODB Equipment/Common
- "Format" will be removed from ODB Logger/Channel
- to maintain binary compatibility, we can keep the "Format" ODB entries, but they will be ignored.

List of slow control drivers that support FORMAT_FIXED:

daq00:midas$ grep FORMAT_FIXED drivers/class/*
drivers/class/cd_fdg.cxx:   if (fgd_info->format == FORMAT_FIXED) {
drivers/class/cd_ivc32.cxx:   if (hv_info->format == XFORMAT_FIXED) {
drivers/class/cd_rf.cxx:	if (rf_info->format == XFORMAT_FIXED) 
drivers/class/generic.cxx:   if (gen_info->format == XFORMAT_FIXED) {
drivers/class/hv.cxx:   if (hv_info->format == XFORMAT_FIXED) {
drivers/class/multi.cxx:   if (m_info->format == XFORMAT_FIXED) {
drivers/class/slowdev.cxx:   if (gen_info->format == XFORMAT_FIXED) {
daq00:midas$ 

K.O.
ELOG V3.1.4-2e1708b5