ID |
Date |
Author |
Topic |
Subject |
757
|
15 Apr 2011 |
Konstantin Olchanski | Forum | How large does "bank32" support? | > Reading an FADC buffer often needs large buffer size, especially while several
> FADCs work together. I want to know how large a bank32 can support.
Limitations in order:
- bank32 size is limited to a 32 bit integer size (about 4000 Gbytes)
- bank size is limited by event size
- event size in a midas mfe.c based frontend is limited to the value of
max_event_size set by the user
- maximum event size that can go through the MIDAS event buffer system is limited
to ODB value /Experiment/MAX_EVENT_SIZE (MAX_EVENT_SIZE in midas.h does not do
anything now)
- maximum event size is limited to *half* the size of the SYSTEM shared memory
event buffer (or any other buffers that the event has to go through)
- default size of the SYSTEM buffer is 8 Mbytes set by ODB /Experiment/Buffer
sizes/SYSTEM. This limits maximum event size to about 4 Mbytes.
- size of SYSTEM buffer can be increased to arbitrary size, but in practice no
bigger than the amount of computer physical memory minus space needed for running
the frontend program and the mlogger, which also allocate buffer space to hold 1
event of maximum size.
So for a computer with 8 Gbytes of RAM, you can make the SYSTEM buffer size 4
GBytes, set ODB MAX_EVENT_SIZE to 1 Gbyte, and in theory, you should be able to
write 1 Gbyte events from your frontend to disk.
In practice, I think the biggest events I have seen go through a MIDAS system are
non-compressed waveforms in the T2K/ND280 FGD and TPC detectors, about 4 Mbytes of
data per event.
Other considerations (rules of thumb):
1) the SYSTEM event buffer should be big enough to hold 10-100 events.
2) the SYSTEM event buffer should be big enough to hold about 5-10 seconds of data
- i.e. if your event size is 1 Mbyte and data rate is 1 Hz, 10 seconds of data
will be 1Mbyte*1Hz*10sec = 10 Mbytes.
This is because the SYSTEM buffer decouples the real-time activity of the frontend
program from non-real-time activity of writing data to storage devices.
K.O. |
764
|
17 Jun 2011 |
Konstantin Olchanski | Forum | ladd00.triumf.ca https ssl certificate update | The HTTPS SSL certificate on ladd00.triumf.ca has been updated. Same as the old
certificate, the new one is self-signed and your web browser may complain about
that and ask you to "save a security exception".
When you save the new certificate, you can verify that you are connected to the
real ladd00.triumf.ca by comparing the "SHA1 fingerprint" reported by your web
browser to the one given below (as reported by "svn update"):
Certificate information:
- Hostname: ladd00.triumf.ca
- Valid: from Jun 17 23:36:35 2011 GMT until Jun 16 23:36:35 2012 GMT
- Issuer: DAQ, TRIUMF, Vancouver, BC, CA
- Fingerprint: 2a:be:9f:9f:70:d4:dc:72:9f:63:bf:4f:fe:c0:2c:8f:a8:29:f2:f1
K.O. |
769
|
27 Jun 2011 |
Konstantin Olchanski | Suggestion | Build MIDAS debian packages using autoconf/automake. | > I deployed several Debian Linux boxes as the DAQ systems in our lab. But I
feel it's boring to build and install midas and its related softwares (such as
root) on each box.
Our solution at TRIUMF is to install such packages on a shared NFS filesystem
visible to all client computers. This works well for ROOT and but MIDAS we found
it nearly impossible to keep MIDAS versions in sync between different projects
and expiments, so each experiment uses it's own copy of MIDAS, usually located
in the experiment home directory ($HOME/packages/midas). Because we often need
to make local modifications to MIDAS sources (Makefile, etc), we do not
"install" MIDAS into non-user-writable /usr/local & etc.
> I use autoconf/automake
The promise (premise) of autoconf/automake is to "hide" system dependencies. The
scripts are supposed to automatically probe the build environment and construct
an appropriate Makefile.
In practice, the autotool scripts always have bugs and incorrect assumptions
about the build environment and only work well for a few standardized systems
(RHEL and Debian derivatives) where the differences are so trivial that
autotools is an overkill and a normal Makefile is adequate for the job.
In my experience, as soon as I try to build an autotool-ized package on anything
that does not look like RHEL or Debian, autotool scripts explode and have to be
debugged and kludged by hand. Anybody who has ever done that would agree with me
that one would rather hack the ugliest Makefile than any of the autotool
generated gibberish.
And of course autotools have never handled cross-compilation in any reasonable
way. Since we do cross-compile MIDAS (for VxWorks and embedded Linux, see "make
crosscompile") a Makefile is required and it so happens that the same Makefile
also works for normal Linux and MacOS, thank you very much.
> Here are the installation:
> [*] executalbes -- /usr/lib/daq-midas/bin
> [*] library and objs -- /usr/lib/daq-midas/lib
Is this in violation of the LSB (or LFS)? I though they mandate that files
controlled by package manager should be /usr/bin/odbedit, /usr/lib64/libmidas.a,
etc (/usr/bin/midas/odbedit no permitted).
> gcc `mdaq-config --cflags` -c -o myfe.o myfe.c
Please check if your config scripts correctly handle the "-m32" and "-m64" flags
- we frequently cross-compile 32-bit MIDAS executables on 64-bit machines.
K.O. |
770
|
27 Jun 2011 |
Konstantin Olchanski | Info | updated mhttpd history "export" function | The mhttpd history "export" function has been converted to the new midas history
interface and should now work for SQL-based history systems. In the process,
improvements by Eoin Butler (CERN AD-5/ALPHA) were merged - adding a UNIX
timestamp and a better text timestamp. Also now "export" outputs the actual
values from the history file - the scaling values from the definition of the
history plot panel are no longer applied.
Here is an example of the new file format:
Time, Timestamp, Run, Run State, SLOW
2011.06.21 15:45:21, 1308696321, 13292, 3, -89.1007
svn rev 5104
K.O. |
771
|
27 Jun 2011 |
Konstantin Olchanski | Info | midas shared memory changes | A number of changes were made to the midas shared memory implementation for
Linux and MacOS:
1) SysV or POSIX shared memory compile-type choice is removed. Both shared
memory types are compiled-in and are selected at run time.
2) the shared memory type used by an experiment is recorded in the file
.SHM_TYPE.TXT. Currently implemented are "POSIXv2_SHM" (the new default for new
experiments), "POSIX_SHM", "MMAP_SHM" and "SYSV_SHM". (see system.c) (MMAP_SHM
is fully functional but is not recommended). The POSIXv2_SHM uses an improved
filename scheme (on Linux, see "ls -l /dev/shm") and permits multiple
experiments to coexist on a MacOS computer (where there is a severe limit on
shared memory filename length).
3) following a number of mishaps where "odbedit" has been run on the wrong
computer (causing havoc with ODB and .xxx.SHM files), for each experiment, the
hostname of the computer where the ODB shared memory is meant to reside is now
recorded in the file .SHM_HOST.TXT. Typically, this is the machine running
mserver, mhttpd and mlogger. If some client is accidentally started on the wrong
machine or if MIDAS_SERVER_HOST is accidentally left undefined, MIDAS will now
print a stern message reporting the hostname mismatch, tell the user to use the
mserver and refuse to run. The user has the choice of starting the client on the
correct computer (as reported in the error message), using the mserver (start
client with -H flag) or edit/delete the .SHM_HOST.TXT file (full pathname is
reported by the error message).
With this update, MIDAS on MacOS becomes fully functional (before, only one
experiment could be used at a time).
svn rev 5105
K.O. |
772
|
27 Jun 2011 |
Konstantin Olchanski | Info | mlogger lock for runNNN.mid.gz files | By popular request, Stefan R. implemented a locking scheme for mlogger output files.
To use this function, set the mlogger ODB /Logger/Channels/NNN/Settings/Filename
to ".run%05dsub%05d.mid.gz" (note the leading dot).
In this mode, active output files will have a filename with a leading dot
(.run00001sub00001.mid.gz) while the file is being written to. After the file is
closed, it is renamed and the leading dot is removed.
To use this function with the lazylogger, please set ODB
"/Lazy/Foo/Settings/Filename format" to "run*.mid.gz,run*.xml" (note the leading
text "run"). Set "stay behind" to 0.
svn rev 5080 (or so, checking by Stefan R.)
K.O. |
773
|
05 Jul 2011 |
Konstantin Olchanski | Info | midas shared memory changes | > 2) the shared memory type used by an experiment is recorded in the file .SHM_TYPE.TXT.
An error with creating the file .SHM_TYPE.TXT was corrected in system.c svn rev 5125 - if file did not exist, it is
created correctly, but MIDAS reports "cannot connect to ODB". Second try works correctly because the file exists
now.
> 3) the hostname of the computer where the ODB shared memory is meant to reside is now
> recorded in the file .SHM_HOST.TXT.
This is causing problems on mobile computers where "hostname" changes all the time (i.e. set according to
DHCP on whatever network happens to be connected).
If you run into this problem, keep deleting .SHM_HOST.TXT or use this workaround: disable the hostname check
by making the file .SHM_HOST.TXT empty (zero length).
K.O. |
774
|
05 Jul 2011 |
Konstantin Olchanski | Bug Report | MacOS network socket timeouts non-functional | It turns out that because of differences between select() syscall implementation between UNIX (MacOS,
maybe BSD) and Linux, network socket timeouts do not work.
This affects timeouts during run transitions (transition calls to dead clients do not timeout), maybe other
places.
I am looking into fixing this. The main difficulty is with UNIX select() not updating the timeout parameter
when it is interrupted by the MIDAS watchdog alarm signal. Linux select() subtracts the elapsed time from
the timeout value and this code from system.c works correctly: while (1) { status = select(..., &timeout); if
(status==0) break; } (value of timeout becomes smaller each time), while on MacOS it loops forever (value
of timeout does not change).
K.O. |
775
|
10 Jul 2011 |
Konstantin Olchanski | Bug Fix | midas shared memory changes | > > 2) the shared memory type used by an experiment is recorded in the file .SHM_TYPE.TXT.
> > 3) the hostname of the computer where the ODB shared memory is meant to reside is now
> > recorded in the file .SHM_HOST.TXT.
Due to a typo in src/system.c svn rev 5125, ss_shm_delete() did not work at all. This broke "odbedit -R", "odbedit -s 5000000" (to change ODB size), etc.
Fixed in src/system.c svn rev 5134. (It is safe to update just tis one file to fix this problem).
Sorry for the inconvenience,
K.O. |
776
|
11 Jul 2011 |
Konstantin Olchanski | Bug Fix | midas shared memory changes | > > > 2) the shared memory type used by an experiment is recorded in the file .SHM_TYPE.TXT.
> > > 3) the hostname of the computer where the ODB shared memory is meant to reside is now
> > > recorded in the file .SHM_HOST.TXT.
Because the mserver did not setup correct experiment name and path, POSIX shared memory did not work at all when used with the mserver. Fixed in mserver.c rev 5135
Sorry for the inconvenience,
K.O. |
777
|
11 Jul 2011 |
Konstantin Olchanski | Info | Make "STOP" run transition always succeed | Over the years, there was some back-and-forth changes in what happens to run transitions when some
of the participants misbehave (do not respond to RPC calls, timeout, crash, etc).
The very original behaviour was to ignore all errors. This resulted in user confusion when some clients
would start, some would not, data from frontends that missed the transition did not arrive, etc.
So it was changed to fail the transition if any client misbehaves.
This left mlogger (who is usually the first one to see the TR_START transition) in a funny state - output
file is open, etc, but there is no run active. This was fixed by adding a TR_STARTABORT transition to tell
mlogger, event builder & co that the just started run did not start after all.
Also at some point code was added to forcefully kill clients that do not respond to run transitions (do
not respond to RPC, timeout, etc).
Recently, it was observed how during unattended overnight operation of a MIDAS DAQ system, with the
logger set to "auto restart", some unnecessary clients misbehave during the run stop transition, and
prevent the run from stopping and restarting. The user comes in the morning and is unhappy that data
taking stopped some time during the night.
midas.c svn rev 5136 changes the TR_STOP transition to always succeed, even if some clients had
transition errors. If these clients are unnecessary for normal operation of the DAQ, the following run
"auto restart" will continue taking data. If those were important clients, data taking will continue the
best it can - it *is* unattended operation - nobody is looking - but users can always setup alarms for
checking that important clients are always running during data taking. (For very important clients, one
can setup alarms to send email, send SMS messages, etc).
K.O. |
781
|
16 Dec 2011 |
Konstantin Olchanski | Bug Report | bk_delete uses memcpy instead of memmove | > In midas.c, the bk_delete function removes a bank by decrementing the total
> event size and then copying the remaining banks into the location of the first
> using memcpy from string.h.
I confirm the documented difference between memcpy() and memmove() and I confirm the
questionable use of memcpy() in bk_delete(). I think it should be memmove(). I made it so in my copy
of midas, so this change will not be lost.
But I am not sure how to test it - I do not think I ever used bk_delete(). I will probably ponder upon
this and do a blind commit.
K.O. |
784
|
29 Feb 2012 |
Konstantin Olchanski | Bug Report | Problem with semaphores | Hi there! In the T2K/ND280 experiment in Japan, we keep having problems with MIDAS locking (probably
of ODB). The symptoms are: some program reports a timeout waiting for the ODB lock, then all programs
eventually die with this same error. Complete system meltdown. This does not look like the deadlock
between locks for ODB, cm_msg and the data buffers that I looked into last year. It looks more like
somebody locks ODB, dies and the Linux kernel fails to unlock the lock (via the SYSV "sem undo"
function). But it is hard to confirm, hence this message:
The implementation of semaphores in MIDAS (used for locking ODB and the shared memory data buffers)
uses the straight SYSV semaphore API - which lacks basic debugging features - there is no tracking of
who locked what when, so if anything at all goes wrong at all, i.e. we are confronted with a timeout
waiting for the ODB lock, the only corrective action possible is to kill all MIDAS clients and tell the user to
start from scratch. There is no additional information available from the SYSV semaphore API to identify
which MIDAS program caused the fault.
The POSIX semaphore API is even worse - no debugging features are available, *and* if a program dies
while holding a lock, the lock stays locked forever (everybody else will wait forever or see a semaphore
timeout, and then what?).
So I am looking for an "advanced semaphore library" to use in MIDAS. In addition to the boring functions
of reliable locking and unlocking, it should support:
- wait with timeout
- remember who is holding the lock
- detect that the process holding the lock is dead and take corrective action (automatic unlock as done by
SYSV semaphores, call back to user code where we can cleanup and unlock ourselves, etc)
- maybe permit recursive locking (not really required as ODB locks are already made recursive "by hand")
- maybe remember some of the locking history (so we can dump it into a log file when we detect a
deadlock or other lock malfunction).
Quick google search only find sundry wrappers for SYSV and POSIX semaphores. How they deal with the
problem of processes locking the semaphore and dying remains a mystery to me (other than telling users
to remove the Ctrl-C button from their keyboard). BTW, we have seen this problem with several
commercial applications that use SYSV semaphores but forget to enable the SEM_UNDO function).
Anyhow, if anybody can suggest such an advanced locking library it would be great. Will save me the
effort of writing one.
K.O. |
788
|
25 Apr 2012 |
Konstantin Olchanski | Bug Report | Build error with mlogger: invalid conversion from ‘void*’ to ‘gzFile’ | Stefan's fix is incomplete - the "gzFile" cast is needed for all calls to zlib, not just those that some version
of GCC happens to complain about. Fixed.
svn rev 5286.
BTW, I read the midas elog via email and if you post html or elcode messages, I receive complete
gibberish. For prompt service, please select message type "plain". (yes, you cannot use fancy colours and
blinking text, but better than me not reading your stuff at all).
BTW2, for easier reading, please include error messages as plain text in your message. As opposed to
compressed attachements.
K.O. |
791
|
10 Jun 2012 |
Konstantin Olchanski | Bug Report | _net_send_buffer realloc | > In midas.c, ...
>
> 1) _net_send_buffer is not set to NULL when declared.
_net_send_buffer is a global variable. All global variables are automatically initialized to zero before the program
starts.
static char*x; // = NULL; is redundant
char*y=realloc(x, 100); // x is NULL, usage is correct
> 2) cm_disconect_experiment() calls free(_net_send_buffer) but does not set its
> value to NULL.
My copy of midas.c (svn rev 5256) sets _net_send_buffer to NULL:
if (_net_send_buffer_size > 0) {
M_FREE(_net_send_buffer);
_net_send_buffer_size = 0;
}
What version of midas do you have? (svn info .)
K.O. |
793
|
11 Jun 2012 |
Konstantin Olchanski | Bug Report | _net_send_buffer realloc | > > > In midas.c, ...
> > >
> > > 1) _net_send_buffer is not set to NULL when declared.
>
> Ah,okay. I was not aware of this feature of global variables.
>
RTFM K&R "The C programming language".
http://en.wikipedia.org/wiki/The_C_Programming_Language
>
> > > 2) cm_disconect_experiment() calls free(_net_send_buffer) but does not set
> its value to NULL.
>
Confirmed. Sorry for confusion in my previous message. Set the pointer to NULL after free() is good practice.
But note that calling cm_connect and cm_disconnect multiple times is unusual use of MIDAS and you will most
likely find more breakage.
K.O. |
795
|
13 Jun 2012 |
Konstantin Olchanski | Bug Report | Cannot start/stop run through mhttpd | > Revision: r5286
> Platform: Debian Linux 6.0.5 AMD64, with packages from squeeze-backports
> Problem:
> After building and installation, using the script 'start_daq.sh' to start
> 'sampleexpt'. Everything seems fine. But I cannot start a run through web. Using
> 'odbedit' and 'mtransition' to start/stop a run works fine. So, what may cause
> such a problem?
Well, it's mhttpd who cannot start the run, not you. So what happens when you press
the "start run" button? Any errors in midas.log or in midas messages? Is mtransition
in your PATH?
K.O. |
796
|
13 Jun 2012 |
Konstantin Olchanski | Forum | ladd00.triumf.ca https ssl certificate update | The HTTPS SSL certificate on ladd00.triumf.ca has been updated. Same as the old
certificate, the new one is self-signed and your web browser may complain about
that and ask you to "save a security exception".
When you save the new certificate, you can verify that you are connected to the
real ladd00.triumf.ca by comparing the "SHA1 fingerprint" reported by your web
browser to the one given below (as reported by "svn update"):
Certificate information:
- Hostname: ladd00.triumf.ca
- Valid: from Wed, 13 Jun 2012 22:31:51 GMT until Thu, 13 Jun 2013 22:31:51 GMT
- Issuer: DAQ, TRIUMF, Vancouver, BC, CA
- Fingerprint: 82:95:78:cb:78:d3:93:1d:d4:c8:e8:1a:64:0f:62:04:2d:0e:c3:4a
K.O. |
800
|
14 Jun 2012 |
Konstantin Olchanski | Bug Report | Cannot start/stop run through mhttpd | > > I found the problem only appears when I run mhttpd in scripts, whether bash or python.
> > And I'm quite sure that the MIDAS environments (e.g. PATH, MIDAS_EXPTAB, MIDASSYS, etc.)
> > are set in such scripts. If I start mhttpd in an xterm with or without "-D", it works
> > fine. So, what's the difference between invoking mhttpd directly and through a script?
>
> When you start it with "-D", then mhttpd become a daemon. According to linux rules, it has to "cd /", so it lives in the
> root directory, in order not to block any NFS mount/unmount. If something with the path is not correct then, mhttpd
> cannot find mtransition then. Once I fixed that problem my moving mtransition to /usr/bin.
>
I agree. Somehow mhttpd cannot run mtransition. I am not super happy with this dependance on user $PATH settings and the inability to capture error messages
from attempts to start mtransition. I am now thinking in the direction of running mtransition code by forking. But remember that mlogger and the event builder also
have to use mtransition to stop runs (otherwise they can dead-lock). So an mhttpd-only solution is not good enough...
K.O. |
801
|
14 Jun 2012 |
Konstantin Olchanski | Bug Report | Cannot start/stop run through mhttpd | > > > Revision: r5286
> > > Platform: Debian Linux 6.0.5 AMD64, with packages from squeeze-backports
>
> I found the problem only appears when I run mhttpd in scripts, whether bash or python.
> And I'm quite sure that the MIDAS environments (e.g. PATH, MIDAS_EXPTAB, MIDASSYS, etc.)
> are set in such scripts. If I start mhttpd in an xterm with or without "-D", it works
> fine.
Right. I see Debian 6.0.5 just came out hot off the presses. Would be good to fix this problem.
As a work around, can you run mhttpd without "-D", but in the background, i.e. "mhttpd -p xxx >& mhttpd.log &"?
Also what are your $PATH settings?
> So, what's the difference between invoking mhttpd directly and through a script?
As Stefan mentioned, "-D" invokes some nasty unix magic to disconnect the process from the user login session. It is
possible that this magic breaks in the latest Debian.
MIDAS "-D" does roughly the same thing as "nohup".
K.O. |
|