Back Midas Rome Roody Rootana
  Midas DAQ System, Page 79 of 142  Not logged in ELOG logo
ID Date Authordown Topic Subject
  747   16 Feb 2011 Konstantin OlchanskiBug ReportProblems with midas history SVN 4936
It looks like email notices did not go the first time. Please read my replies below. K.O.

> > I have the following problems after updating to midas SVN 4936: the history 
> > system (web-page via mhttpd) seems to stop working. I checked the history files 
> > themself and they are indeed written, except that the events ID's are not the 
> > same anymore (I mean the ones defined under /Equipment/XXX/Common/Event ID), 
> > rather the mlogger seems to choose an ID by itself.
> 
> Yes, I found the problem - it was introduced around svn rev 4827 in September 2010.
> 
> It is fixed now, please do this:
> 1) update history_midas.c to latest svn rev 4979
> 1a) do NOT update any other files - update only history_midas.c
> 2) rebuild mlogger (it will do no harm and no good if you rebuild everything)
> 3) odbedit save odb.xml
> 4) in odb, remove /history/events and /history/tags (you can also set "/History/DisableTags" to "y")
> 5) restart mlogger
> 6) observe that odb /history/events now has event ids same as equipment ids
> 7) restart your frontend, observe that history file is growing
> 8) use mhdump to observe that history is now written with correct event id
> 9) go to mhttpd history plot, you should see the new data coming in. Plot history in the "1 year" scale, you 
> should see the old data and you should see a gap where data was written with wrong event id
> 10) I should still have  an mhrewrite program sitting somewhere that can change the event ids inside midas 
> history files, if you have many data files with wrong event id, let me know, I will find this program and tell you 
> how to use it to repair your data files.
> 
> > Currently the only way to get things working again was to recompile midas with 
> > adding -DOLD_HISTORY to the CFLAGS which is troublesome since it is likely to be 
> > forgotton with the next SVN update.
> 
> Yes, I am glad you found OLD_HISTORY, I kept it just for the case some breakage like this happens. I will still 
> keep it around until the dust settles.
> 
> > When looking into the SVN I have the  impression there is something going on concerning the history 
> system, however I couldn't find any documentation.
> 
> Yes, you found the right stuff, and it is partially documented. mlogger uses /History/Events to map history 
> event names (equipment names in your case) to history event ids. But in your case, the wrong event id has 
> been assigned by mlogger so nothing worked right. As a bonus, I now see inconsistency between event_id 
> code remaining in mlogger (which is not used) and event_id code in history_midas (which *is* used). I will be 
> straightening this stuff over the next few days.
> 
> I hope my correction to history_midas.cxx is good enough to get you going for now.
> 
> > What is the best practice for the future, in order not to run into any problems 
> > but still being able to look at the old history (also from within the web-page 
> > via mhttpd)?
> 
> Personally, I think that the midas history storage into binary files is not robust enough
> when facing changes to equipment and event ids, renaming and deleting of stuff, etc. There
> are other limitations, as well, i.e. the 16-bit history event id, etc.
> 
> The newly implemented SQL history storage (uses ODBC layer, MySQL supported, PgSQL partially
> implemented) does not have any of these problems and seems to work well enough
> for T2K/ND280. Sometimes MySQL history is even faster when making history plots in mhttpd.
> 
> I am now thinking about implementing SQL history storage in SQLite files, and it will not have
> any of these problems, too. Performance and robustness for database corruption remain a question, though.
> 
> K.O.
  748   16 Feb 2011 Konstantin OlchanskiBug ReportProblems with midas history SVN 4936
> 
> Do you mind giving little more detail? We might have the same issue, where we got
> complaints that midas history stops working after a certain time.
> 


Yes, please do supply more information. What problems do *you* see?


K.O.
  749   16 Feb 2011 Konstantin OlchanskiInfoNotes on MIDAS history
> 
> 1) PerVariableHistory.
> 
> The default value of 0 is intended to operate the midas history in "traditional" mode. In this mode:
> - there is one history record for each equipment
> - history record id is equal to the equipment id
> - /History/Events and /History/Tags are not required and can be safely deleted
>

I now commited an example experiment for testing and using non-per-variable history:
.../midas/examples/history1

I confirm that this example does work as expected after src/history_midas.cxx is updated to latest rev 4979 (today). I guess it also worked just 
fine before breakage in svn rev 4827 last September.

svn rev 4980.



Here is the README file:

Example experiment "history1"

Purpose:
example and test of a simple periodic frontend writing slow controls data into midas history

To run:
use bash shell
assuming MIDAS is installed in $HOME/packages/midas on linux, otherwise edit setup.sh and Makefile
run make to build feslow.exe
run source ./setup.sh
when starting this experiment for the very first time, load experiment settings from odb.xml: odbedit -c "load odb.xml"
run ./start_daq.sh
mlogger and mhttpd should now be running
connect to the midas status page at http://localhost:8080 (port number is set in start_daq.sh
start the example frontend from the "programs" page
observe event number of equipment "slow" is incrementing
go to the "Slow" equipment page (click on "Slow" on the midas status page)
observe numbers are changing when you refresh the web page
from the midas status page, go to "history" -> "slow" - observe history plot for "SLOW[2]" shows a sine wave
from shell, examine contents of history file: "mhdump *.hst"
study feslow.cxx

Enjoy,
K.O.
  750   16 Feb 2011 Konstantin OlchanskiBug Reportfixed. odb corruption, odb race condition?
> My torture test runs okey in my mac now, one remaining problem is spurious client removal caused
> by semaphore starvation...

My torture test runs okey on Linux and I do not see any problems with spurious client removal - actually
I do not see any strange longs waits for semaphores that I was seeing on MacOS. Must be another
proof that MacOS is years behind Linux in kernel technology (but parsecs ahead in user experience)

K.O.
  753   28 Feb 2011 Konstantin OlchanskiInfojavascript example experiment
I just committed a MIDAS example for using most mhttpd html and javascript functions: ODBGet(), 
ODBSet(), ODBRpc() & co.

Please checkout .../midas/examples/javascript1, and follow instructions in the README file.

For your enjoyment, the example html file is attached to this message.

(to use the function ODBRpc_rev1 - the midas rpc call that returns data to the web page - you need 
mhttpd svn rev 4994 committed today. Other functions should work with any revision of mhttpd)

svn rev 4992.
K.O.
Attachment 1: example.html
  757   15 Apr 2011 Konstantin OlchanskiForumHow large does "bank32" support?
> Reading an FADC buffer often needs large buffer size, especially while several
> FADCs work together. I want to know how large a bank32 can support.

Limitations in order:

- bank32 size is limited to a 32 bit integer size (about 4000 Gbytes)
- bank size is limited by event size
- event size in a midas mfe.c based frontend is limited to the value of
max_event_size set by the user
- maximum event size that can go through the MIDAS event buffer system is limited
to ODB value /Experiment/MAX_EVENT_SIZE (MAX_EVENT_SIZE in midas.h does not do
anything now)
- maximum event size is limited to *half* the size of the SYSTEM shared memory
event buffer (or any other buffers that the event has to go through)
- default size of the SYSTEM buffer is 8 Mbytes set by ODB /Experiment/Buffer
sizes/SYSTEM. This limits maximum event size to about 4 Mbytes.
- size of SYSTEM buffer can be increased to arbitrary size, but in practice no
bigger than the amount of computer physical memory minus space needed for running
the frontend program and the mlogger, which also allocate buffer space to hold 1
event of maximum size.

So for a computer with 8 Gbytes of RAM, you can make the SYSTEM buffer size 4
GBytes, set ODB MAX_EVENT_SIZE to 1 Gbyte, and in theory, you should be able to
write 1 Gbyte events from your frontend to disk.

In practice, I think the biggest events I have seen go through a MIDAS system are
non-compressed waveforms in the T2K/ND280 FGD and TPC detectors, about 4 Mbytes of
data per event.

Other considerations (rules of thumb):

1) the SYSTEM event buffer should be big enough to hold 10-100 events.
2) the SYSTEM event buffer should be big enough to hold about 5-10 seconds of data
- i.e. if your event size is 1 Mbyte and data rate is 1 Hz, 10 seconds of data
will be 1Mbyte*1Hz*10sec = 10 Mbytes.

This is because the SYSTEM buffer decouples the real-time activity of the frontend
program from non-real-time activity of writing data to storage devices.

K.O.
  764   17 Jun 2011 Konstantin OlchanskiForumladd00.triumf.ca https ssl certificate update
The HTTPS SSL certificate on ladd00.triumf.ca has been updated. Same as the old
certificate, the new one is self-signed and your web browser may complain about
that and ask you to "save a security exception".

When you save the new certificate, you can verify that you are connected to the
real ladd00.triumf.ca by comparing the "SHA1 fingerprint" reported by your web
browser to the one given below (as reported by "svn update"):

Certificate information:
 - Hostname: ladd00.triumf.ca
 - Valid: from Jun 17 23:36:35 2011 GMT until Jun 16 23:36:35 2012 GMT
 - Issuer: DAQ, TRIUMF, Vancouver, BC, CA
 - Fingerprint: 2a:be:9f:9f:70:d4:dc:72:9f:63:bf:4f:fe:c0:2c:8f:a8:29:f2:f1

K.O.
  769   27 Jun 2011 Konstantin OlchanskiSuggestionBuild MIDAS debian packages using autoconf/automake.
> I deployed several Debian Linux boxes as the DAQ systems in our lab. But I
feel it's boring to build and install midas and its related softwares (such as
root) on each box.


Our solution at TRIUMF is to install such packages on a shared NFS filesystem
visible to all client computers. This works well for ROOT and but MIDAS we found
it nearly impossible to keep MIDAS versions in sync between different projects
and expiments, so each experiment uses it's own copy of MIDAS, usually located
in the experiment home directory ($HOME/packages/midas). Because we often need
to make local modifications to MIDAS sources (Makefile, etc), we do not
"install" MIDAS into non-user-writable /usr/local & etc.


> I use autoconf/automake


The promise (premise) of autoconf/automake is to "hide" system dependencies. The
scripts are supposed to automatically probe the build environment and construct
an appropriate Makefile.

In practice, the autotool scripts always have bugs and incorrect assumptions
about the build environment and only work well for a few standardized systems
(RHEL and Debian derivatives) where the differences are so trivial that
autotools is an overkill and a normal Makefile is adequate for the job.

In my experience, as soon as I try to build an autotool-ized package on anything
that does not look like RHEL or Debian, autotool scripts explode and have to be
debugged and kludged by hand. Anybody who has ever done that would agree with me
that one would rather hack the ugliest Makefile than any of the  autotool
generated gibberish.

And of course autotools have never handled cross-compilation in any reasonable
way. Since we do cross-compile MIDAS (for VxWorks and embedded Linux, see "make
crosscompile") a Makefile is required and it so happens that the same Makefile
also works for normal Linux and MacOS, thank you very much.



> Here are the installation:
> [*] executalbes -- /usr/lib/daq-midas/bin
> [*] library and objs -- /usr/lib/daq-midas/lib


Is this in violation of the LSB (or LFS)? I though they mandate that files
controlled by package manager should be /usr/bin/odbedit, /usr/lib64/libmidas.a,
etc (/usr/bin/midas/odbedit no permitted).


> gcc `mdaq-config --cflags` -c -o myfe.o myfe.c


Please check if your config scripts correctly handle the "-m32" and "-m64" flags
- we frequently cross-compile 32-bit MIDAS executables on 64-bit machines.


K.O.
  770   27 Jun 2011 Konstantin OlchanskiInfoupdated mhttpd history "export" function
The mhttpd history "export" function has been converted to the new midas history
interface and should now work for SQL-based history systems. In the process,
improvements by Eoin Butler (CERN AD-5/ALPHA) were merged - adding a UNIX
timestamp and a better text timestamp. Also now "export" outputs the actual
values from the history file - the scaling values from the definition of the
history plot panel are no longer applied.

Here is an example of the new file format:

Time, Timestamp, Run, Run State, SLOW
2011.06.21 15:45:21, 1308696321, 13292, 3, -89.1007

svn rev 5104
K.O.
  771   27 Jun 2011 Konstantin OlchanskiInfomidas shared memory changes
A number of changes were made to the midas shared memory implementation for
Linux and MacOS:

1) SysV or POSIX shared memory compile-type choice is removed. Both shared
memory types are compiled-in and are selected at run time.
2) the shared memory type used by an experiment is recorded in the file
.SHM_TYPE.TXT. Currently implemented are "POSIXv2_SHM" (the new default for new
experiments), "POSIX_SHM", "MMAP_SHM" and "SYSV_SHM". (see system.c) (MMAP_SHM
is fully functional but is not recommended). The POSIXv2_SHM uses an improved
filename scheme (on Linux, see "ls -l /dev/shm") and permits multiple
experiments to coexist on a MacOS computer (where there is a severe limit on
shared memory filename length).
3) following a number of mishaps where "odbedit" has been run on the wrong
computer (causing havoc with ODB and .xxx.SHM files), for each experiment, the
hostname of the computer where the ODB shared memory is meant to reside is now
recorded in the file .SHM_HOST.TXT. Typically, this is the machine running
mserver, mhttpd and mlogger. If some client is accidentally started on the wrong
machine or if MIDAS_SERVER_HOST is accidentally left undefined, MIDAS will now
print a stern message reporting the hostname mismatch, tell the user to use the
mserver and refuse to run. The user has the choice of starting the client on the
correct computer (as reported in the error message), using the mserver (start
client with -H flag) or edit/delete the .SHM_HOST.TXT file (full pathname is
reported by the error message).

With this update, MIDAS on MacOS becomes fully functional (before, only one
experiment could be used at a time).

svn rev 5105
K.O.
  772   27 Jun 2011 Konstantin OlchanskiInfomlogger lock for runNNN.mid.gz files
By popular request, Stefan R. implemented a locking scheme for mlogger output files.

To use this function, set the mlogger ODB /Logger/Channels/NNN/Settings/Filename
to ".run%05dsub%05d.mid.gz" (note the leading dot).

In this mode, active output files will have a filename with a leading dot
(.run00001sub00001.mid.gz) while the file is being written to. After the file is
closed, it is renamed and the leading dot is removed.

To use this function with the lazylogger, please set ODB
"/Lazy/Foo/Settings/Filename format" to "run*.mid.gz,run*.xml" (note the leading
text "run"). Set "stay behind" to 0.

svn rev 5080 (or so, checking by Stefan R.)
K.O.
  773   05 Jul 2011 Konstantin OlchanskiInfomidas shared memory changes
> 2) the shared memory type used by an experiment is recorded in the file .SHM_TYPE.TXT.

An error with creating the file .SHM_TYPE.TXT was corrected in system.c svn rev 5125 - if file did not exist, it is 
created correctly, but MIDAS reports "cannot connect to ODB". Second try works correctly because the file exists 
now.

> 3) the hostname of the computer where the ODB shared memory is meant to reside is now
> recorded in the file .SHM_HOST.TXT.

This is causing problems on mobile computers where "hostname" changes all the time (i.e. set according to 
DHCP on whatever network happens to be connected).

If you run into this problem, keep deleting .SHM_HOST.TXT or use this workaround: disable the hostname check 
by making the file .SHM_HOST.TXT empty (zero length).

K.O.
  774   05 Jul 2011 Konstantin OlchanskiBug ReportMacOS network socket timeouts non-functional
It turns out that because of differences between select() syscall implementation between UNIX (MacOS, 
maybe BSD) and Linux,  network socket timeouts do not work.

This affects timeouts during run transitions (transition calls to dead clients do not timeout), maybe other 
places.

I am looking into fixing this. The main difficulty is with UNIX select() not updating the timeout parameter 
when it is interrupted by the MIDAS watchdog alarm signal. Linux select() subtracts the elapsed time from 
the timeout value and this code from system.c works correctly: while (1) { status = select(..., &timeout); if 
(status==0) break; } (value of timeout becomes smaller each time), while on MacOS it loops forever (value 
of timeout does not change).
K.O.
  775   10 Jul 2011 Konstantin OlchanskiBug Fixmidas shared memory changes
> > 2) the shared memory type used by an experiment is recorded in the file .SHM_TYPE.TXT.
> > 3) the hostname of the computer where the ODB shared memory is meant to reside is now
> > recorded in the file .SHM_HOST.TXT.

Due to a typo in src/system.c svn rev 5125, ss_shm_delete() did not work at all. This broke "odbedit -R", "odbedit -s 5000000" (to change ODB size), etc. 
Fixed in src/system.c svn rev 5134. (It is safe to update just tis one file to fix this problem).

Sorry for the inconvenience,
K.O.
  776   11 Jul 2011 Konstantin OlchanskiBug Fixmidas shared memory changes
> > > 2) the shared memory type used by an experiment is recorded in the file .SHM_TYPE.TXT.
> > > 3) the hostname of the computer where the ODB shared memory is meant to reside is now
> > > recorded in the file .SHM_HOST.TXT.


Because the mserver did not setup correct experiment name and path, POSIX shared memory did not work at all when used with the mserver. Fixed in mserver.c rev 5135


Sorry for the inconvenience,
K.O.
  777   11 Jul 2011 Konstantin OlchanskiInfoMake "STOP" run transition always succeed
Over the years, there was some back-and-forth changes in what happens to run transitions when some 
of the participants misbehave (do not respond to RPC calls, timeout, crash, etc).

The very original behaviour was to ignore all errors. This resulted in user confusion when some clients 
would start, some would not, data from frontends that missed the transition did not arrive, etc.

So it was changed to fail the transition if any client misbehaves.

This left mlogger (who is usually the first one to see the TR_START transition) in a funny state - output 
file is open, etc, but there is no run active. This was fixed by adding a TR_STARTABORT transition to tell 
mlogger, event builder & co that the just started run did not start after all.

Also at some point code was added to forcefully kill clients that do not respond to run transitions (do 
not respond to RPC, timeout, etc).

Recently, it was observed how during unattended overnight operation of a MIDAS DAQ system, with the 
logger set to "auto restart", some unnecessary clients misbehave during the run stop transition, and 
prevent the run from stopping and restarting. The user comes in the morning and is unhappy that data 
taking stopped some time during the night.

midas.c svn rev 5136 changes the TR_STOP transition to always succeed, even if some clients had 
transition errors. If these clients are unnecessary for normal operation of the DAQ, the following run 
"auto restart" will continue taking data. If those were important clients, data taking will continue the 
best it can - it *is* unattended operation - nobody is looking - but users can always setup alarms for 
checking that important clients are always running during data taking. (For very important clients, one 
can setup alarms to send email, send SMS messages, etc).

K.O.
  781   16 Dec 2011 Konstantin OlchanskiBug Reportbk_delete uses memcpy instead of memmove
> In midas.c, the bk_delete function removes a bank by decrementing the total
> event size and then copying the remaining banks into the location of the first
> using memcpy from string.h.


I confirm the documented difference between memcpy() and memmove() and I confirm the 
questionable use of memcpy() in bk_delete(). I think it should be memmove(). I made it so in my copy 
of midas, so this change will not be lost.

But I am not sure how to test it - I do not think I ever used bk_delete(). I will probably ponder upon 
this and do a blind commit.


K.O.
  784   29 Feb 2012 Konstantin OlchanskiBug ReportProblem with semaphores
Hi there! In the T2K/ND280 experiment in Japan, we keep having problems with MIDAS locking (probably 
of ODB). The symptoms are: some program reports a timeout waiting for the ODB lock, then all programs 
eventually die with this same error. Complete system meltdown. This does not look like the deadlock 
between locks for ODB, cm_msg and the data buffers that I looked into last year. It looks more like 
somebody locks ODB, dies and the Linux kernel fails to unlock the lock (via the SYSV "sem undo" 
function). But it is hard to confirm, hence this message:

The implementation of semaphores in MIDAS (used for locking ODB and the shared memory data buffers) 
uses the straight SYSV semaphore API - which lacks basic debugging features - there is no tracking of 
who locked what when, so if anything at all goes wrong at all, i.e. we are confronted with a timeout 
waiting for the ODB lock, the only corrective action possible is to kill all MIDAS clients and tell the user to 
start from scratch. There is no additional information available from the SYSV semaphore API to identify 
which MIDAS program caused the fault.

The POSIX semaphore API is even worse - no debugging features are available, *and* if a program dies 
while holding a lock, the lock stays locked forever (everybody else will wait forever or see a semaphore 
timeout, and then what?).

So I am looking for an "advanced semaphore library" to use in MIDAS. In addition to the boring functions 
of reliable locking and unlocking, it should support:
- wait with timeout
- remember who is holding the lock
- detect that the process holding the lock is dead and take corrective action (automatic unlock as done by 
SYSV semaphores, call back to user code where we can cleanup and unlock ourselves, etc)
- maybe permit recursive locking (not really required as ODB locks are already made recursive "by hand")
- maybe remember some of the locking history (so we can dump it into a log file when we detect a 
deadlock or other lock malfunction).

Quick google search only find sundry wrappers for SYSV and POSIX semaphores. How they deal with the 
problem of processes locking the semaphore and dying remains a mystery to me (other than telling users 
to remove the Ctrl-C button from their keyboard). BTW, we have seen this problem with several 
commercial applications that use SYSV semaphores but forget to enable the SEM_UNDO function).

Anyhow, if anybody can suggest such an advanced locking library it would be great. Will save me the 
effort of writing one.

K.O.
  788   25 Apr 2012 Konstantin OlchanskiBug ReportBuild error with mlogger: invalid conversion from ‘void*’ to ‘gzFile’
Stefan's fix is incomplete - the "gzFile" cast is needed for all calls to zlib, not just those that some version 
of GCC happens to complain about. Fixed.
svn rev 5286.

BTW, I read the midas elog via email and if you post html or elcode messages, I receive complete 
gibberish. For prompt service, please select message type "plain". (yes, you cannot use fancy colours and 
blinking text, but better than me not reading your stuff at all).

BTW2, for easier reading, please include error messages as plain text in your message. As opposed to 
compressed attachements.

K.O.
  791   10 Jun 2012 Konstantin OlchanskiBug Report_net_send_buffer realloc
> In midas.c, ...
>
> 1) _net_send_buffer is not set to NULL when declared.

_net_send_buffer is a global variable. All global variables are automatically initialized to zero before the program 
starts.

static char*x; // = NULL; is redundant
char*y=realloc(x, 100);  // x is NULL, usage is correct

> 2) cm_disconect_experiment() calls free(_net_send_buffer) but does not set its 
> value to NULL.

My copy of midas.c (svn rev 5256) sets _net_send_buffer to NULL:

   if (_net_send_buffer_size > 0) { 
      M_FREE(_net_send_buffer); 
      _net_send_buffer_size = 0; 
   } 
 
What version of midas do you have? (svn info .)

K.O.
ELOG V3.1.4-2e1708b5