Back Midas Rome Roody Rootana
  Midas DAQ System, Page 35 of 47  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
Entry  07 May 2009, Konstantin Olchanski, Bug Fix, Fixed mlogger run start and stop 
Fixed problems with mlogger starting and stopping runs.

Basic difficulty was with the mlogger using ASYNC transitions, which did not implement proper 
transition sequencing according to transition sequence numbers. Basically all clients were called at the 
same time, regardless of how long they took to process the transitions.

Switching from ASYNC to SYNC transitions introduces a deadlock between mlogger (not reading data 
from SYSTEM buffer while inside cm_transition) and any program trying to write into the SYSTEM buffer 
(buffer is full, does not listen for transition requests while waiting for mlogger which tries to call it's 
transition handler).

Then we invented the mtransition helper program. In the original implemtation for t2k it was spawned 
directly from the mlogger to stop the run (avoiding the deadlock). Then cm_transition(DETACHED) was 
introduced, but the mlogger start/stop/restart run logic became broken. One problem was with when 
auto restart delay is zero, mtransition tries to restart the run before previous run is stopped (instead, 
mlogger should restart the run from it's tr_stop() handler). Another problem was with the auto restart 
delay counting from the time when we start stopping the run - because stopping the run can take an 
unpredictable time, depending on when various frontends have to do - it is impossible to have a 
predictable delay between runs (again this is fixed by restarting the run from mlogger.c::tr_stop()).

All this has been straightened out by svn revision 4484. Basically the old run stop/restart logic was 
restored in mlogger.c, using cm_transition(DETACH) to avoid the deadlocks.

To remind all, these are the present controls for transitions initiated by mlogger:

/experiment/transition debug flag - set to "2" to capture transition sequences into midas.log
/experiment/transition timeout and transition connect timeout - one can change default timeouts as 
needed to accommodate non cooperative frontends.
/logger/async transitions - do not use mtransition - do ASYNC transitions, as before.
/logger/auto restart delay - delay between stopping the run (mlogger.c::tr_stop) and starting the next 
run.

svn rev 4484
K.O.
Entry  07 May 2009, Konstantin Olchanski, Bug Fix, mhttpd "Names" length 
mhttpd did not like it when the equipment "Names" arrays had different length compared to the 
corresponding "Variables" arrays. These limitations are now removed.
svn rev 4469
K.O.
Entry  17 Apr 2009, Jimmy Ngai, Forum, MIDAS mhttpd custom page questions control_panel.html
Dear All,

I have created a custom page (please see the attachment) and imported into 
MIDAS with key name "Control panel&" (without the ""). I have the following 
two questions:

1) I display the status of the run with <odb src="/Runinfo/State">, but it 
returns numbers which is not user friendly. How can I make something 
like "Running" with green background and "Stopped" with red background in the 
default status page?

2) When I click either Start/Stop/Pause/Resume, it can performs the right 
things, but afterward it jumps to the page "http://domain.name:8081/CS/" which 
shows "Invalid custom page: NULL path". How can I make it returns to the 
correct page "http://domain.name:8081/CS/Control%20panel"?

Thank you for your attention.

Best Regards,
Jimmy
    Reply  20 Apr 2009, Jimmy Ngai, Forum, MIDAS mhttpd custom page questions 
Dear All,

I have one more question. I use <odb src="odb field" edit=1> to display an 
editable ODB value, but how can I show this value in hexadecimal?

Thanks.

Best Regards,
Jimmy


> Dear All,
> 
> I have created a custom page (please see the attachment) and imported into 
> MIDAS with key name "Control panel&" (without the ""). I have the following 
> two questions:
> 
> 1) I display the status of the run with <odb src="/Runinfo/State">, but it 
> returns numbers which is not user friendly. How can I make something 
> like "Running" with green background and "Stopped" with red background in the 
> default status page?
> 
> 2) When I click either Start/Stop/Pause/Resume, it can performs the right 
> things, but afterward it jumps to the page "http://domain.name:8081/CS/" which 
> shows "Invalid custom page: NULL path". How can I make it returns to the 
> correct page "http://domain.name:8081/CS/Control%20panel"?
> 
> Thank you for your attention.
> 
> Best Regards,
> Jimmy
       Reply  06 May 2009, Stefan Ritt, Forum, MIDAS mhttpd custom page questions 
> I have one more question. I use <odb src="odb field" edit=1> to display an 
> editable ODB value, but how can I show this value in hexadecimal?

Again with JavaScript:

  var v = ODBGet('/some/path&format=%X');

this will retrieve /some/path and format it in hexadecimal. Then you can set a table 
cell with "v" as I wrote in the last reply. If you want to change this value 
however, you need to encode this yourself in JavaScript.

- Stefan
    Reply  06 May 2009, Stefan Ritt, Forum, MIDAS mhttpd custom page questions control.html
> 1) I display the status of the run with <odb src="/Runinfo/State">, but it 
> returns numbers which is not user friendly. How can I make something 
> like "Running" with green background and "Stopped" with red background in the 
> default status page?

Sorry my late reply, I was really busy. You need JavaScript to perform such a 
task. See the attached example.

> 2) When I click either Start/Stop/Pause/Resume, it can performs the right 
> things, but afterward it jumps to the page "http://domain.name:8081/CS/" 
> which shows "Invalid custom page: NULL path". How can I make it returns 
> to the correct page "http://domain.name:8081/CS/Control%20panel"?

You add a hidden redirect statement:

  <input type=hidden name=redir value="CS/Control panel">

Best regards,

  Stefan
Entry  04 Mar 2009, Dawei Liu, Forum, Analyzer gets killed cm_watchdog 
Hello Midas experts:

We have setup a DAQ using MIDAS to readout two ADCs in the crate.
We are running into problem of analyzer getting killed between 
runs.  Sometimes it would crash after a few runs and sometimes it 
would go on for many many runs before analyzer gets killed.  It always 
occurred between runs not when we are taking data.  Any suggestions 
on what we could try?  The error message from the midas.log file is 
appended below.

Thanks,

Dawei

Wed Mar  4 11:53:11 2009 [Analyzer,ERROR] [midas.c:1739:,ERROR]
cm_disconnect_experiment not called at end of program
Wed Mar  4 11:53:22 2009 [mhttpd,INFO] Client 'Analyzer' on buffer 'SYSMSG'
removed by cm_watchdog (idle 10.7s,TO 10s)
Wed Mar  4 11:53:22 2009 [mhttpd,INFO] Client 'Analyzer' (PID 1) on buffer 'ODB'
removed by cm_watchdog (idle 10.7s,TO 10s)
Wed Mar  4 11:53:22 2009 [AL Experiment Frontend,INFO] Client 'Analyzer' on
buffer 'SYSTEM' removed by cm_watchdog (idle 10.9s,TO 10s)
Wed Mar  4 11:53:29 2009 [AL Experiment Frontend,TALK] starting new run
Wed Mar  4 11:53:29 2009 [AL Experiment Frontend,ERROR]
[midas.c:8264:rpc_client_check,ERROR] Connection broken to "Analyzer" on host
tsunami
    Reply  24 Mar 2009, Stefan Ritt, Forum, Analyzer gets killed cm_watchdog 
Hi,

your log script sound to me like the analyzer either got into an infinite loop or 
did a segment violation and just died. I would recommend to run the analyzer from 
inside the debugger. When you then get the segment violation, you can inspect the 
stack trace and see where the bad things happen. Since the analyzer works nicely in 
other experiment, I expect that your problem is related to the user code. Maybe it 
happens at the end of the run, but there is a timeout before the crashed process 
gets cleaned from the ODB, that's why you might think that it happens "between" 
runs.

Best regards,

  Stefan

> 
> Hello Midas experts:
> 
> We have setup a DAQ using MIDAS to readout two ADCs in the crate.
> We are running into problem of analyzer getting killed between 
> runs.  Sometimes it would crash after a few runs and sometimes it 
> would go on for many many runs before analyzer gets killed.  It always 
> occurred between runs not when we are taking data.  Any suggestions 
> on what we could try?  The error message from the midas.log file is 
> appended below.
> 
> Thanks,
> 
> Dawei
> 
> Wed Mar  4 11:53:11 2009 [Analyzer,ERROR] [midas.c:1739:,ERROR]
> cm_disconnect_experiment not called at end of program
> Wed Mar  4 11:53:22 2009 [mhttpd,INFO] Client 'Analyzer' on buffer 'SYSMSG'
> removed by cm_watchdog (idle 10.7s,TO 10s)
> Wed Mar  4 11:53:22 2009 [mhttpd,INFO] Client 'Analyzer' (PID 1) on buffer 'ODB'
> removed by cm_watchdog (idle 10.7s,TO 10s)
> Wed Mar  4 11:53:22 2009 [AL Experiment Frontend,INFO] Client 'Analyzer' on
> buffer 'SYSTEM' removed by cm_watchdog (idle 10.9s,TO 10s)
> Wed Mar  4 11:53:29 2009 [AL Experiment Frontend,TALK] starting new run
> Wed Mar  4 11:53:29 2009 [AL Experiment Frontend,ERROR]
> [midas.c:8264:rpc_client_check,ERROR] Connection broken to "Analyzer" on host
> tsunami
Entry  17 Jan 2009, Konstantin Olchanski, Info, mhttpd, mlogger updates 
mhttpd and mlogger have been updated with potentially troublesome changes.
Before using these latest versions, please make a backup of your ODB. This is
svn revisions 4434 (mhttpd.c) and 4435 (mlogger.c).

These new features are now available:
- a "feature complete" implementation of "history in an SQL database". We use
this new code to write history data from the T2K test setup in the TRIUMF M11
beam line to a MySQL database (mlogger) and to make history plots directly from
this database (mhttpd). We still write normal midas history files and we have a
utility to import midas .hst files into an SQL database (utils/mh2sql). The code
is functional, but incomplete. For best SQL database data layout, you should
enable the "per variable history" (but backup your ODB before you do this!). All
are welcome to try it, kick the tires, report any problems. Documentation TBW.
- experimental implementation of "ODBRpc" added to the midas javascript library
(ODBSet, ODBGet & co). This permits buttons on midas "custom" web pages to
invoke RPC calls directly into user frontend programs, for example to turn
things on or off. Documentation TBW.
- the mlogger/mhttpd implementation of /History/Tags has proved troublesome and
we are moving away from it. The SQL database history implementation already does
not use it. During the present transition period:
- mlogger and mhttpd will now work without /History/Tags. This implementation
reads history tags directly from the history files themselves. Two downsides to
this: it is slower and tags become non-persistent: if some frontends have not
been running for a while, their variables may vanish from the history panel
editor. To run in this mode, set "/History/DisableTags" to "y". Existing
/History/Tags will be automatically deleted.
- for the above 2 reasons, I still recommend using /History/Tags, but the format
of the tags is now changed to simplify management and reduce odb size. mlogger
will automatically convert the tags to this new format (this is why you should
make a backup of your ODB).
- using old mlogger with new mhttpd is okey: new mhttpd understands both formats
of /History/Tags.
- using old mhttpd with new mlogger is okey: please set ODB
"/History/CreateOldTags" to "y" (type TID_BOOL/"boolean") before starting mlogger.

K.O.
    Reply  21 Jan 2009, Andreas Suter, Bug Report, mhttpd, mlogger updates 
There is an obvious "unwanted feature" in this version of the mhttpd. It writes the
"plot time" into the gif (mhttpd, if-statement starting in line 8853). 

Please check this obvious things more carefully in the future before submitting code. ;-)

> mhttpd and mlogger have been updated with potentially troublesome changes.
> Before using these latest versions, please make a backup of your ODB. This is
> svn revisions 4434 (mhttpd.c) and 4435 (mlogger.c).
> 
> These new features are now available:
> - a "feature complete" implementation of "history in an SQL database". We use
> this new code to write history data from the T2K test setup in the TRIUMF M11
> beam line to a MySQL database (mlogger) and to make history plots directly from
> this database (mhttpd). We still write normal midas history files and we have a
> utility to import midas .hst files into an SQL database (utils/mh2sql). The code
> is functional, but incomplete. For best SQL database data layout, you should
> enable the "per variable history" (but backup your ODB before you do this!). All
> are welcome to try it, kick the tires, report any problems. Documentation TBW.
> - experimental implementation of "ODBRpc" added to the midas javascript library
> (ODBSet, ODBGet & co). This permits buttons on midas "custom" web pages to
> invoke RPC calls directly into user frontend programs, for example to turn
> things on or off. Documentation TBW.
> - the mlogger/mhttpd implementation of /History/Tags has proved troublesome and
> we are moving away from it. The SQL database history implementation already does
> not use it. During the present transition period:
> - mlogger and mhttpd will now work without /History/Tags. This implementation
> reads history tags directly from the history files themselves. Two downsides to
> this: it is slower and tags become non-persistent: if some frontends have not
> been running for a while, their variables may vanish from the history panel
> editor. To run in this mode, set "/History/DisableTags" to "y". Existing
> /History/Tags will be automatically deleted.
> - for the above 2 reasons, I still recommend using /History/Tags, but the format
> of the tags is now changed to simplify management and reduce odb size. mlogger
> will automatically convert the tags to this new format (this is why you should
> make a backup of your ODB).
> - using old mlogger with new mhttpd is okey: new mhttpd understands both formats
> of /History/Tags.
> - using old mhttpd with new mlogger is okey: please set ODB
> "/History/CreateOldTags" to "y" (type TID_BOOL/"boolean") before starting mlogger.
> 
> K.O.
    Reply  18 Feb 2009, Konstantin Olchanski, Info, odbc sql history mlogger update 
> mhttpd and mlogger have been updated with potentially troublesome changes.
> These new features are now available:
> - a "feature complete" implementation of "history in an SQL database".

The mlogger SQL history driver has been updated with improvements that make this new system usable in 
production environment: the silly "create all tables on startup, every time, even if they already exist" is fixed,
mlogger survives restarts of mysqld and checks that existing sql columns have data types compatible with the 
data we are trying to write.

There are still a few trouble spots remaining. For example, in mapping midas names into sql names (sql names 
have more restrictions on permitted characters) and in reverse mapping of sql data types to midas data types. 
To properly solve this, I may have to save the midas names and data types into an additional index table.

Included is the mh2sql utility for importing existing history files into an SQL database (in the same way as if 
they were written into the database by mlogger).

The mhttpd side of this system still needs polishing, but should be already fully functional.

A preliminary version of documentation for this new SQL history system is here. After additional review and 
editing it will be committed to the midas midox documentation. Included are full instructions on enabling 
writing of midas history into a MySQL database.
http://ladd00.triumf.ca/~olchansk/midas/Internal.html#History_sql_internal

svn revision 4452
K.O.
Entry  26 Jan 2009, Derek Escontrias, Forum, Question - ODB access from a custom page 
Hi, I am looking for a way to mutate ODB values from a custom page. I have been
using the edit attribute for the 'odb' tag, but for some things it would be nice
if a form can handle the change. I have seen references to ODBSet on the forums,
but I haven't been able to find documentation on it. Is there an available
Javascript library for Midas and/or are there more tags than I am aware of (I am
only aware of the 'odb' tag)?
    Reply  27 Jan 2009, Suzannah Daviel, Forum, Question - ODB access from a custom page 
At present the only documentation on the Javascript library is in this elog
e.g. Message 496 31 Jul 08

The Javascript library which you can view
http://<your mhttpd host>/mhttpd.js
now supports ODBEdit as well as ODBGet and ODBSet
 
I advise you get the latest version of mhttpd.c so you can use ODBEdit which changes
the ODB value directly via ODBSet.

You use it like this:
document.write('<a href="#" onclick="ODBEdit(/Equipment/test/Variables/Demand[0])">');
document.write('<odb src="/Equipment/test/Variables/Demand[0]">');
document.write('</a>');

You can also use HTML to edit the variables, but the advantage of Javascript is that
you can use variable ODB paths, so it is more powerful.

Here is an example of using a form on a custom page to edit a variable (in the
example, the run number) using Javascript (ODBEdit) and HTML. 

To try this example, in ODB, create key (STRING)
/custom/try& 
 and set it to "/home/user/try.html"

where the path of the example code on the disk is  /home/user/try.html

This will put an alias link on the Main Status page called "try" which you click on
to see the custom page.

Code of try.html:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<html><head>
<title> ODBEdit test</title>
<script src="/js/mhttpd.js" type="text/javascript"></script>

<script type="text/javascript">
var my_action = '"/CS/try&"'
var rn
var path

document.write('</head><body>')
document.write('<form method="get" name="form2" action='+my_action+'> ') 
document.write('<input name="exp" value="'+my_expt+'" type="hidden">');

document.write('Using Javascript and ODBEdit:
') path='/runinfo/run number' rn = ODBGet(path) document.write('Run Number: '+rn+'
') document.write('Edit Run Number:') document.write('<a href="#" onclick="ODBEdit(path)" >') document.write(rn) document.write('</a>'); document.write('
') ; </script>
Using HTML :
Using edit=2 ... Run Number: <odb src="/runinfo/run number" edit=2>
Using edit=1 ... Run Number: <odb src="/runinfo/run number" edit=1>
</form> </html> Note the "edit=2" feature is handy so that you can use Javascript or HTML on your page and the user sees no difference. > Hi, I am looking for a way to mutate ODB values from a custom page. I have been > using the edit attribute for the 'odb' tag, but for some things it would be nice > if a form can handle the change. I have seen references to ODBSet on the forums, > but I haven't been able to find documentation on it. Is there an available > Javascript library for Midas and/or are there more tags than I am aware of (I am > only aware of the 'odb' tag)?
Entry  20 Jan 2009, Stefan Ritt, Info, Subrun scheme implemented 
A new "subrun" scheme has been implemented in mlogger to split a big data file into several individual data files. This feature might be helpful if a data file from a single run gets too large (>4 GB for example) and if shorter runs are not wanted for efficiency reasons. The scheme works as follows:

  • Set /Channels/x/Settings/Subrun Byte limit to the number of bytes for a subrun
  • Set /Channels/x/Settings/Filename to something like run%05d_%02d.mid. The first %05d gets replaced by the run number, while the second one gets replaced by the subrun number. This will result in files such as
    run00001_00.mid    run #1
    run00001_01.mid      "
    run00001_02.mid      "
    run00001_03.mid      "
    run00002_00.mid    run #2
    run00002_01.mid      "
    run00002_02.mid      "
    run00002_03.mid      "

Each subrun will contain an ODB dump if this is turned on via /Channels/x/Settings/ODB dump. The stopping of the "main" run (after four subruns in the above example) can be done in the usual way (event limit in the front-end, manually through odbedit, etc.).

The code has been tested in two test environments, but not yet in a real experiment. So please test it before going into production. The modification in mlogger requires SVN revision 4440 of mlogger.c and 4441 of odb.c.

Please note that the lazylogger cannot be used with this scheme at the moment since it does not recognize the subruns. That will be fixed in a future version and announced in this forum.

- Stefan
    Reply  23 Jan 2009, Renee Poutissou, Info, Subrun scheme implemented 
Hi Stefan,
My colleague Tobi Raufer (tobi.raufer@stfc.ac.uk) has tested this new implementation and
sent me the following questions:
-------- Original Message --------
Subject: Re: [Fwd: [Midas] Subrun scheme implemented]
Date: Fri, 23 Jan 2009 01:52:37 +0000
From: Tobias Raufer <tobi.raufer@stfc.ac.uk>
To: Renee Poutissou <renee@triumf.ca>
Hi Renee

I have tested the new subrun functionality a bit more and I have two observations. First, it seems to work on a basic level, i.e. subruns are created, which are equal in size. However, I can't relate their size to the byte limit set in the ODB.

Here is an example. The settings in the ODB are the following:
[local:testExp:S]/>ls /Logger/Channels/0/Settings/
Active y
Type Disk
Filename run%05d_%02d.mid
Format MIDAS
Compression 0
ODB dump n
Log messages 0
Buffer SYSTEM
Event ID -1
Trigger mask -1
Event limit 0
Byte limit 0
Subrun Byte limit 10000
Tape capacity 0
Subdir format
Current filename run00005_07.mid

As you can see, I set the subrun byte limit to 10000. Here are the subrun files which were created:

-rw-r--r-- 1 raufer 32800 Jan 23 01:36 run00005_00.mid
-rw-r--r-- 1 raufer 32800 Jan 23 01:36 run00005_01.mid
-rw-r--r-- 1 raufer 32800 Jan 23 01:36 run00005_02.mid
-rw-r--r-- 1 raufer 32800 Jan 23 01:36 run00005_03.mid
-rw-r--r-- 1 raufer 32800 Jan 23 01:36 run00005_04.mid
-rw-r--r-- 1 raufer 32800 Jan 23 01:36 run00005_05.mid
-rw-r--r-- 1 raufer 32800 Jan 23 01:36 run00005_06.mid
-rw-r--r-- 1 raufer 4960 Jan 23 01:36 run00005_07.mid

The file size seems to be 32800 bytes. Any idea what's going on? I first thought this might have to do with the ODB dump not being accounted for but as you can see from the configuration above, I turned it off for this run.

When I run with the ODB dump on but with the same byte limit, things become even more strange. I get the following sizes:

bash-3.2$ ls -l run00006_*.mid
-rw-r--r-- 1 raufer 53798 Jan 23 01:46 run00006_00.mid
-rw-r--r-- 1 raufer 53804 Jan 23 01:46 run00006_01.mid
-rw-r--r-- 1 raufer 53793 Jan 23 01:46 run00006_02.mid
-rw-r--r-- 1 raufer 53781 Jan 23 01:46 run00006_03.mid
-rw-r--r-- 1 raufer 53781 Jan 23 01:46 run00006_04.mid
-rw-r--r-- 1 raufer 53781 Jan 23 01:46 run00006_05.mid
-rw-r--r-- 1 raufer 53802 Jan 23 01:46 run00006_06.mid
-rw-r--r-- 1 raufer 53833 Jan 23 01:46 run00006_07.mid
-rw-r--r-- 1 raufer 71557 Jan 23 01:46 run00006_08.mid
-rw-r--r-- 1 raufer 20999 Jan 23 01:46 run00006_09.mid

As you can see, now the sizes are larger and they don't even seem to be consistent between the different subruns. Renee, could you forward this to the MIDAS developers?

Thanks much,

Tobi



Quote:

The code has been tested in two test environments, but not yet in a real experiment. So please test it before going into production. The modification in mlogger requires SVN revision 4440 of mlogger.c and 4441 of odb.c.

Please note that the lazylogger cannot be used with this scheme at the moment since it does not recognize the subruns. That will be fixed in a future version and announced in this forum.

- Stefan
       Reply  25 Jan 2009, Stefan Ritt, Info, Subrun scheme implemented 

Renee Poutissou wrote:
I have tested the new subrun functionality a bit more and I have two observations. First, it seems to work on a basic level, i.e. subruns are created, which are equal in size. However, I can't relate their size to the byte limit set in the ODB.


What you describe is expected. The logger process maintains a write cache, which is 32 kB under linux and 1 MB under Windows. The size is controlled through the constant TAPE_BUFFER_SIZE defined in midas.h. The reason for this buffer is to optimize writes to disks and tapes and has been carefully optimized to give maximum performance. It means however that data gets written only in 32 kB chunks to disk. That's the reason why your run size is 32kB plus a few bytes. You can change this by modifying TAPE_BUFFER_SIZE, but be aware that this will then slow down your logging of data.
Entry  09 Jan 2009, Derek Escontrias, Forum, mlogger problem 
Hi,

I am running Scientific Linux with kernel 2.6.9-34.EL and  I have
glibc-2.3.4-2.25. When I run mlogger, I receive the error:

*** glibc detected *** free(): invalid pointer: 0x0073e93e ***
Aborted

Any ideas?
    Reply  13 Jan 2009, Stefan Ritt, Forum, mlogger problem 
> Hi,
> 
> I am running Scientific Linux with kernel 2.6.9-34.EL and  I have
> glibc-2.3.4-2.25. When I run mlogger, I receive the error:
> 
> *** glibc detected *** free(): invalid pointer: 0x0073e93e ***
> Aborted
> 
> Any ideas?

Not much. Try to clean up the ODB (delete the .ODB.SHM file, remove all shared 
memory via ipcrm) and run again. I run under kernel 2.6.18 and glibc 2.5 and this 
problem does not occur. If you cannot fix it, try to run mlogger inside gdb and 
make a stack trace to see who called the free().
       Reply  13 Jan 2009, Derek Escontrias, Forum, mlogger problem 
> > Hi,
> > 
> > I am running Scientific Linux with kernel 2.6.9-34.EL and  I have
> > glibc-2.3.4-2.25. When I run mlogger, I receive the error:
> > 
> > *** glibc detected *** free(): invalid pointer: 0x0073e93e ***
> > Aborted
> > 
> > Any ideas?
> 
> Not much. Try to clean up the ODB (delete the .ODB.SHM file, remove all shared 
> memory via ipcrm) and run again. I run under kernel 2.6.18 and glibc 2.5 and this 
> problem does not occur. If you cannot fix it, try to run mlogger inside gdb and 
> make a stack trace to see who called the free().

Sorry for being vague. I cleaned up the ODB, but it doesn't seem to be the
problem. Here is a sample run of mlogger and gdb:


/**************************************************************
/**************************************************************
/**************************************************************
[root@tsunami AL_Test]# mlogger -v -d
*** glibc detected *** free(): invalid pointer: 0x007f793e ***
Aborted (core dumped)
[root@tsunami AL_Test]# 
[root@tsunami AL_Test]# 
[root@tsunami AL_Test]# 
[root@tsunami AL_Test]# 
[root@tsunami AL_Test]# 
[root@tsunami AL_Test]# 
[root@tsunami AL_Test]# gdb mlogger core.23213 
GNU gdb Red Hat Linux (6.3.0.0-1.143.el4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library
"/lib/tls/libthread_db.so.1".

Core was generated by `mlogger -v -d'.
Program terminated with signal 6, Aborted.
Reading symbols from /home/dayabay/Software/Root/lib/libCore.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libCore.so
Reading symbols from /home/dayabay/Software/Root/lib/libCint.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libCint.so
Reading symbols from /home/dayabay/Software/Root/lib/libRIO.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libRIO.so
Reading symbols from /home/dayabay/Software/Root/lib/libNet.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libNet.so
Reading symbols from /home/dayabay/Software/Root/lib/libHist.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libHist.so
Reading symbols from /home/dayabay/Software/Root/lib/libGraf.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libGraf.so
Reading symbols from /home/dayabay/Software/Root/lib/libGraf3d.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libGraf3d.so
Reading symbols from /home/dayabay/Software/Root/lib/libGpad.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libGpad.so
Reading symbols from /home/dayabay/Software/Root/lib/libTree.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libTree.so
Reading symbols from /home/dayabay/Software/Root/lib/libRint.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libRint.so
Reading symbols from /home/dayabay/Software/Root/lib/libPostscript.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libPostscript.so
Reading symbols from /home/dayabay/Software/Root/lib/libMatrix.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libMatrix.so
Reading symbols from /home/dayabay/Software/Root/lib/libPhysics.so...done.
Loaded symbols for /home/dayabay/Software/Root/lib/libPhysics.so
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libutil.so.1...done.
Loaded symbols for /lib/libutil.so.1
Reading symbols from /lib/tls/libpthread.so.0...done.
Loaded symbols for /lib/tls/libpthread.so.0
Reading symbols from /usr/lib/libstdc++.so.6...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/libpcre.so.0...done.
Loaded symbols for /lib/libpcre.so.0
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /usr/lib/libfreetype.so.6...done.
Loaded symbols for /usr/lib/libfreetype.so.6
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
#0  0x002e37a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) 
(gdb) 
(gdb) 
(gdb) where
#0  0x002e37a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x016d68b5 in raise () from /lib/tls/libc.so.6
#2  0x016d8329 in abort () from /lib/tls/libc.so.6
#3  0x0170a40a in __libc_message () from /lib/tls/libc.so.6
#4  0x01710a08 in _int_free () from /lib/tls/libc.so.6
#5  0x01710fda in free () from /lib/tls/libc.so.6
#6  0x08057108 in main (argc=3, argv=0xbff94f14) at src/mlogger.c:3473
(gdb) 
/**************************************************************
/**************************************************************
/**************************************************************


I am running Midas 2.0.0 and here is a section of my mlogger.c:


/**************************************************************
/**************************************************************
/**************************************************************
/********************************************************************\

  Name:         mlogger.c
  Created by:   Stefan Ritt

  Contents:     MIDAS logger program

  $Id: mlogger.c 3476 2006-12-20 09:00:26Z ritt $

\********************************************************************/

// stuff...

/*------------------------ main ------------------------------------*/

int main(int argc, char *argv[])
{
   INT status, msg, i, size, run_number, ch = 0, state;
   char host_name[HOST_NAME_LENGTH], exp_name[NAME_LENGTH], dir[256];
   BOOL debug, daemon, save_mode;
   DWORD last_time_kb = 0;
   DWORD last_time_stat = 0;
   HNDLE hktemp;

#ifdef HAVE_ROOT
   char **rargv;
   int rargc;

   /* copy first argument */
   rargc = 0;
   rargv = (char **) malloc(sizeof(char *) * 2);
   rargv[rargc] = (char *) malloc(strlen(argv[rargc]) + 1);
   strcpy(rargv[rargc], argv[rargc]);
   rargc++;

   /* append argument "-b" for batch mode without graphics */
   rargv[rargc] = (char *) malloc(3);
   rargv[rargc++] = "-b";

   TApplication theApp("mlogger", &rargc, rargv);

   /* free argument memory */
   free(rargv[0]);
   free(rargv[1]);   // Line: 3473
   free(rargv);

#endif

// etc...

/**************************************************************
/**************************************************************
/**************************************************************


I'll play with it some, but I wanted to post this info first.
          Reply  13 Jan 2009, Stefan Ritt, Forum, mlogger problem 
> Sorry for being vague. I cleaned up the ODB, but it doesn't seem to be the
> problem. Here is a sample run of mlogger and gdb:

Thanks for the info, that explained the problem. It is related to the lines

rargv[rargc] = (char *)malloc(3);
rargv[rargc++] = "-b";

where one first allocates some memory (3 bytes), but then overwrites the pointer with 
another pointer to some static memory ("-b"). The following

free(rargv[1]);

then tries to free the static memory which fails.

The problem was already fixed some time ago, so please update your version from the SVN 
revision (see https://midas.psi.ch/download.html for details).
             Reply  14 Jan 2009, Konstantin Olchanski, Forum, mlogger problem 
> The problem was already fixed some time ago, so please update your version from the SVN 
> revision (see https://midas.psi.ch/download.html for details).

I wanted to check out the latest websvn midas repository viewer installed at PSI, so I used the web "annotate/blame" tools 
to trace the fix to this bug down to revision 3660 committed in April 2007. (It turns out that "svn blame" is not very useful 
for tracing *removed* lines, so I ended up doing a manual binary search across different revisions of mlogger.c)

K.O.
Entry  01 Jan 2009, Konstantin Olchanski, Info, odb "hot link" magic explored 
Here are my notes on the MIDAS ODB "hot link" function. Perhaps others can find them useful.

Using db_open_record(key,function), the user can tell MIDAS to call the specified user function when 
the specified ODB key is modified by any other MIDAS program. This function works both locally 
(shared memory odb access) and remotely (odb access through mserver tcp rpc). For example, the 
MIDAS "history" mechanism is implemented in the mlogger by "hot-linking" ODB 
"/equipment/xxx/Variables".

First, the relevant data structures defined in midas.h and msystem.h (ODB database headers, etc)

(in midas.h)
#define NAME_LENGTH            32            /**< length of names, mult.of 8! */
#define MAX_CLIENTS            64            /**< client processes per buf/db */
#define MAX_OPEN_RECORDS       256           /**< number of open DB records   */

(in msystem.h)
DATABASE buf <--- local, private to each client)
  DATABASE_HEADER* database_header <--- odb in shared memory
    char name[NAME_LENGTH]
    DATABASE_CLIENT client[MAX_CLIENTS]
      char name[NAME_LENGTH]
      OPEN_RECORD open_record[MAX_OPEN_RECORDS]
        handle
        access_mode
        flags

(the above means that each midas client has access to the list of all open records through
buf->database_header.client[i].open_record[j])

Second, the data path through db_set_data & co: (other odb "write" functions work the same way)

db_set_data(key)
  lock db
  update odb <--- memcpy(), really
  db_notify_clients(key)
  unlock db
  return

db_notify_clients(key)
  loop: <--- data for this key changed and so data for all keys containing it
             also changed, and we need to notify anybody who has an open record
             on the parents of this key. need to loop over parents of this key (follow "..")
  if (key->notify_count)
    foreach client
      foreach open_record
        if (open_record.handle == key)
          ss_resume(client->port, "O hDB hKey")
  key = key.parent
  goto loop;

ss_resume(port, message)
  idx = ss_suspend_get_index()  <--- magic here
  send udp message ("O hDB hKey") to localhost:port <-- notifications sent only to local host!

note 1: I do not completely understand the ss_suspend_xxx() stuff. The best I can tell
is it creates a number of udp sockets bound to the local host and at least one udp rpc
receive socket ultimately connected to the cm_dispatch_rpc() function.

note 2: More magic here: database_header->client[i].port appears to be the udp rpc server
port of the mserver, while ODB /Clients/xxx/Port is the tcp rpc server port
of the client itself, on the remote host

note 3: the following is for remote odb clients connected through the mserver. For local
clients, cm_dispatch_rpc() calls the local db_update_record() as shown at the very end.

note 4: this uses udp rpc. If the udp datagram is lost inside the os kernel (it looks like these udp/rpc 
datagrams never go out to the network), "hot-link" silently fails: code below is not executed. Some 
OSes (namely, Linux) are known to lose udp datagrams with high probability under certain
not very well understood conditions.

local mserver receives the udp datagram
  ...
  cm_dispatch_ipc()
    if (message=="O hDB hKey")
      decode message (hDB, hKey)
      db_update_record(hDB, hKey)
        send tcp rpc with args(MSG_ODB, hDB, hKey)

(note- unlike udp rpc, tcp rpc are never "lost")

remote client receives tcp rpc:
rpc_client_dispatch()
  recv_tcp(net_buffer)
  if (net_buffer.routine_id == MSG_ODB)
    db_update_record(hDB, hKey)

db_update_record(hDB, hKey)
  if remote delivery, see cm_dispatch_ipc() above
  <--- local delivery
  foreach (_recordlist)
    if (recordlist.handle == hKey)
      if (!recordlist.access_mode&MODE_WRITE)
        db_get_record(hDB,hKey,recordlist.data,recordlist.size)
        recordlist.dispatcher(hDB,hKey,recordlist.info); <-- user-supplied handler

Note: the dispatcher() above is the function supplied by the user in db_open_record().

K.O.
    Reply  14 Jan 2009, Stefan Ritt, Info, odb "hot link" magic explored 

KO wrote:
note 1: I do not completely understand the ss_suspend_xxx() stuff. The best I can tell is it creates a number of udp sockets bound to the local host and at least one udp rpc receive socket ultimately connected to the cm_dispatch_rpc() function.


The ss_suspend_xxx() stuff is indeed the most complicated thing in midas an I have to remind myself always
on how this works. So let me try again:

The basic idea is that for a high performance system, you cannot do the inter-process communication via
polling. That would waste CPU time. Inter-process communication is necessary for for buffer manager
(producer notifies consumer when new events are there), for the RPC mechanism (odbedit tells mlogger to
start a run) or for ODB hot-links. To avoid polling, the inter-process communication works with sockets (UDP
and TCP). This allows to use the select() call, which suspends the calling process until some socket
receives data or a pre-defined time-out expires. This is the only portable method I found which works under
unix and windows (signals are only poorly supported under windows).

So after creating all sockets, ss_suspend() does a select() on these sockets:

_suspend_struct[idx].listen_socket Server side for any new RPC connection (each client is also a RPC server which gets contacted directly during run transitions for example
_suspend_struct[idx].server_acception.recv_sock Receive socket (TCP) for any active RPC connection
_suspend_struct[idx].server_acception.event_sock Receive socket (TCP) for bare events (bypassing RPC layer for performance reasons)
_suspend_struct[idx].server_connection->recv_sock Outgoing TCP connection to mserver. Used for example for hot-link notifications from mserver
_suspend_struct[idx].ipc_recv_socket UDP socket for inter-process notification


For each socket there is a dispatch function, which gets called if that socket receives some data. Hope this sheds some light on the guts of that.
Entry  12 Dec 2008, Jimmy Ngai, Info, Custom page which executes custom function 
Dear All,

How can I add a button at the top of the "Status" webpage which will show a 
page similar to the "CNAF" one after I click on it? and how can I make a 
custom page similar to "CNAF" which allow me to call some custom funtions? I 
want to make a page which is particularly for doing calibration.

Thank you for your attention!

Best Regards,
Jimmy Ngai
    Reply  14 Dec 2008, Stefan Ritt, Info, Custom page which executes custom function 
> How can I add a button at the top of the "Status" webpage which will show a 
> page similar to the "CNAF" one after I click on it? and how can I make a 
> custom page similar to "CNAF" which allow me to call some custom funtions? I 
> want to make a page which is particularly for doing calibration.

The CNAF page calls directly functions through the RPC layer of midas, which is 
not possible from custom pages. All you can do is to execute a scrip on the 
server side, which then causes some action. For details please consult the 
documentation.
    Reply  01 Jan 2009, Konstantin Olchanski, Info, Custom page which executes custom function 
> How can I add a button at the top of the "Status" webpage which will show a 
> page similar to the "CNAF" one after I click on it? and how can I make a 
> custom page similar to "CNAF" which allow me to call some custom funtions? I 
> want to make a page which is particularly for doing calibration.


I was going to say that you can do this by using the MIDAS "hot-link" function.

In your equipment program, you create a string /eq/xxx/Settings/Command, and hot-link
it to the function you want to be called. (See midas function db_open_record() for details
and examples). (To test it, you put a call to printf("Hello world!\n") into your handler function,
then change the value of "command" using odbedit or the mhttpd odb editor
and observe that your function gets called and that it receives the correct value of "command").

Then on your custom web page you create 2 buttons "aaa" and "bbb" attached to javascript
ODBset("/eq/xxx/Settings/Command","aaa") and "bbb" respectively. When you push the button,
the specified string is written into ODB, and your hot-link handler function is called with the contents
of "command", which you can then look at to find out which web button was pushed.


But after looking at the hot-link data paths (see https://ladd00.triumf.ca/elog/Midas/546), I see 2 
problems that make the above scheme unreliable and maybe unusable in some applications:

1) the data path contains one UDP communication and it is well known that UDP datagrams can be (and 
are) lost with low or high probability, depending on not-well-understood external factors.

The effect is that the hot-link fails to "fire": odb contents is changed but your function is not called.

2) there is a timing problem with multiple odb writes: the odb lock is dropped before the "hot-link" gets 
to see the new contents of odb: db_data_set()->lock odb->change data->send notification->unlock 
odb->xxx->notification received by client->read the data->call user function. If something else is 
written into odb during "xxx" above, the client may never see the data written by the first odb write. For 
local clients, the delay between "send notification" and "notification is received by client" is not bounded in 
time (can be arbitrary long, depending on the system load, etc). For remote clients, there is an additional 
delay as the udp datagram is received by the local mserver and is forwarded to the remote client through 
a tcp rpc connection (another source of unbounded delay).

The effect is that if buttons "aaa" and "bbb" are pushed quickly one right after the other, while your 
function will be called 2 times (if neither udp packet is dropped), you may never see the value of "aaa"
as is it will be overwritten by "bbb" by the time you receive the first notification.

Probability of malfunction increases with code written like this: { ODBset("command", "open door"); 
ODBset("command", "walk through doorway"); }. You may see the "open door" command sometimes 
mysteriously disappear...

The net effect is that sometimes you will push the button but nothing will happen. This may be okey,
depending on your application and depending on how often it happens in practice on your specific system 
If you are lucky, you may never see either of the 2 problems listed above ad hot-links will work for you 
perfectly. At TRIUMF, in the past, we have seen hot-links misbehave in the TWIST experiment, and now I 
think I understand why (because of the 2 problems described above).

K.O.
       Reply  13 Jan 2009, Stefan Ritt, Info, Custom page which executes custom function 
The UDP connection you mention is only used locally for inter-process communication. When I implemented that, I 
made extensive tests and found that there is never a packet being dropped. This happens for UDP only if the packet 
goes over a physical network. Maybe this is different in modern Linux versions, so one should double check this 
again.

For remote hot-link notification, the notification is sent over the TCP link, so it should not be lost either. But 
your second point is correct. The hot-link mechanism was developed to change parameters in front-end programs for 
example. So by design it is guaranteed that if you change a value in the ODB, any client hot-linked to that will 
see the change (sooner or later). If there are many changes in short intervals (or the callback function on the 
remote client takes long time), only the last change is guaranteed to arrive. Therefore, as you correctly state, 
the hot-link mechanism is not a save replacement for the RPC layer (That's why the RPC layer is there after all).
Entry  17 Dec 2008, Renee Poutissou, Bug Report, Overflow on "cm_msg" command generates segfault 
The following error has been reported to me by T2K colleagues:

When using  "odbedit -c "msg my_message", the following behavior 
has been observed depending on the length "n" of the message. 

1)  n < 100        All is well
2)  100 <= n < 245 Log not written but exit code = 0
3)  245 <= n < 280 Error: "Experiment not defined" and exit code = 1
4)  280 <= n       Error: "Cannot connect to remote host" and exit code = 1

Also, when logging from compiled C code - when messages reach some magic length
the MIDAS client sending them segfaults.

Please fix
    Reply  22 Dec 2008, Stefan Ritt, Bug Report, Overflow on "cm_msg" command generates segfault 
> The following error has been reported to me by T2K colleagues:
> 
> When using  "odbedit -c "msg my_message", the following behavior 
> has been observed depending on the length "n" of the message. 
> 
> 1)  n < 100        All is well
> 2)  100 <= n < 245 Log not written but exit code = 0
> 3)  245 <= n < 280 Error: "Experiment not defined" and exit code = 1
> 4)  280 <= n       Error: "Cannot connect to remote host" and exit code = 1
> 
> Also, when logging from compiled C code - when messages reach some magic length
> the MIDAS client sending them segfaults.
> 
> Please fix

Uhhh, who wants this long messages? You should consider to split this into several 
smaller messages. Anyhow, having the above behavior is not good, so I fixed it in 
SVN revision 4422. I increased the maximum length to 1000 characters. Above that, 
the message gets truncated. If you need even more, we can make it a #define.

The second problem you describe (logging from compiled C code) I could not 
reproduce, so maybe it was related to the first one. Please try again and report 
if it persists.
Entry  21 Dec 2008, Konstantin Olchanski, Bug Fix, mhttpd minor bug fixes and improvements 
Committed minor bug fixes and improvements to mhttpd:
1) when generating history plots, use type "double" instead of "float" because "float" does not have enough 
significant digits to plot values of large integer numbers. For example, serial numbers of T2K FGD FEB 
cards are large integers, i.e. 99000001, 99000002, etc, but when we plot them with offset "-99000000", 
the plots show "0" for all cards because when these numbers are converted to "float", they are truncated to 
about 5 digits and the least significant digit (the only one of interest, the "1", "2", etc) is lost. Switching to 
type "double" makes the plots come out with correct values.
2) fixed breakage of "/History/URL" ODB setting used to offload generation of history plots to a separate 
mhttpd process, greatly improving responsiveness of the main mhttpd.
3) fixed memory leak in processing the new javascript requests (jset, jget & co).
svn revisions 4415-4417
K.O.
Entry  27 Nov 2008, Konstantin Olchanski, Bug Fix, Fix ss_file_size() on 32-bit Linux 
It turns out that on 32-bit Linux, ss_file_size() returns the wrong answer for
files bigger than 2 GB (4GB?). The Linux stat() system call returns an error
(which is ignored) and bogus file size data (returned to the caller).

On 64-bit Linux (compiled with -m64), stat() appears to return correct data.

Related functions ss_disk_size() and ss_disk_free() return correct answers on
both 32-bit and 64-bit Linux (biggest disk I tried was 5.5 TB).

I now fixed this problem by using the stat64() system call for "#ifdef OS_LINUX".

I also changed ss_file_size(), ss_disk_size() and ss_disk_free() to return -1 if
the system call returns an error. I also added a test program
utils/test_ss_file_size.c.

svn revision 4397.
K.O.
    Reply  01 Dec 2008, Stefan Ritt, Bug Fix, Fix ss_file_size() on 32-bit Linux 
> I also changed ss_file_size(), ss_disk_size() and ss_disk_free() to return -1 if
> the system call returns an error. I also added a test program
> utils/test_ss_file_size.c.

The test program gave under 64-bit SL5:

For [(null)], file size: -1, disk size: -0.001, disk free -0.001
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `/bin/ls -ld (null)'
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `/bin/df -k (null)'

Anyhow I guess that this test program just accidentally slipped into the repository.
Test programs for the developers should not be in the repository since they are of
not much use for the average user. If I would have added every test I made as an
individual test program, we would by now have tons of test programs making the whole
distribution pretty bulky, which nobody would know how to use now. So I removed the
test program again. If people do not agree, I suggest to make a central "main" test
program which combines all tests. I know there are also some C structure alignment
tests etc., which then could all be combined into a single, well documented, test
program.
    Reply  02 Dec 2008, Stefan Ritt, Bug Fix, Fix ss_file_size() on 32-bit Linux 
> I now fixed this problem by using the stat64() system call for "#ifdef OS_LINUX".

That does not work if _LARGEFILE64_SOURCE is not defined. In that case, the compiler 
complains that stat64 is undefined. Since many Makefiles for front-ends out there do 
not have _LARGEFILE64_SOURCE defined, I changed system.c so that stat64 is only used 
if that flag is defined:

#ifdef _LARGEFILE64_SOURE
   struct stat64 stat_buf;
   int status;

   /* allocate buffer with file size */
   status = stat64(path, &stat_buf);
   if (status != 0)
      return -1;
   return (double) stat_buf.st_size;
#else
   ...
       Reply  02 Dec 2008, Konstantin Olchanski, Bug Fix, Fix ss_file_size() on 32-bit Linux 
> > I now fixed this problem by using the stat64() system call for "#ifdef OS_LINUX".
> That does not work if _LARGEFILE64_SOURCE is not defined.
> #ifdef _LARGEFILE64_SOURE
>    struct stat64 stat_buf;

This does not work (observe the typoe in the #ifdef). But you cannot know this because
you already deleted the test program I wrote and committed to svn exactly to detect and
prevent this kind of breakage (+ plus to give the Solaris, BSD and other wierdo users
some way to check that ss_file_size() works on their systems).

K.O.
          Reply  02 Dec 2008, Stefan Ritt, Bug Fix, Fix ss_file_size() on 32-bit Linux 

K.O. wrote:
This does not work (observe the typoe in the #ifdef).


Sorry for that, I fixed and committed it.


K.O. wrote:
But you cannot know this because you already deleted the test program I wrote and committed to svn exactly to detect and prevent this kind of breakage (+ plus to give the Solaris, BSD and other wierdo users some way to check that ss_file_size() works on their systems)..


Well, you figured it out even without the test program in the distribution! But I'm sure no other user would have known how to use your test program to diagnose this problem. So 99% of the users would scratch their head about this undocumented program and get confused. I believe we two are responsible that the midas kernel functions work correctly and the average user should not have to bother with it. I agree that it's handy for you to have this little test program in the distribution, so you can run it everywhere you install midas. But for me it would be handy to have files with, let's say, nature's constants, particle decay life times, list of ASCII codes, and so on. But it would clutter up the distribution and the disadvantage of annoying users would be bigger than my personal benefit, so I don't do it.

If you absolutely want to keep a certain test functionality, you can add it into a "central" test program, write some help and documentation for it, educate users how to use it and how to report any errors back to you. Maybe some printout like "all tests ok" and some specific comment if a test fails would be helpful for the normal user. This test program could then also contain other tests like C structure alignment (which sometimes is a problem), some mutex tests and whatever we collected along the road. An alternative would be to add this into a "test" command inside odbedit.
Entry  01 Dec 2008, Randolf Pohl, Bug Report, gcc warning in melog.c for midas 4401 
Hi all,

I have just compiled midas 4401 using SuSE 11.0.
gcc is some odd SuSE version:
gcc version 4.3.1 20080507 (prerelease) [gcc-4_3-branch revision 135036] (SUSE
Linux) 

Anyway, gcc stumbled over melog.c. I don't see the reason myself, but my
experience is that gcc is usually right when complaining about "array subscript
is above array bounds". So, just in case somebody knowlegeable wants to have a
look at this....

Cheers,

Randolf

The gcc output:

[...]
cc -g -O3 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib
-DINCLUDE_FTPLIB   -D_LARGEFILE64_SOURCE -DHAVE_MYSQL -I/usr/include/mysql
-DHAVE_ROOT -pthread -m64 -I/usr/local/root/root_v5.20.00/include/root
-DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o linux/bin/melog
utils/melog.c linux/lib/libmidas.a -lutil -lpthread -lz
utils/melog.c: In function 'submit_elog':
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
utils/melog.c:224: warning: array subscript is above array bounds
cc -g -O3 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib
-DINCLUDE_FTPLIB   -D_LARGEFILE64_SOURCE -DHAVE_MYSQL -I/usr/include/mysql
-DHAVE_ROOT -pthread -m64 -I/usr/local/root/root_v5.20.00/include/root
-DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o linux/bin/mlxspeaker
utils/mlxspeaker.c linux/lib/libmidas.a -lutil -lpthread -lz
Entry  23 Mar 2008, Konstantin Olchanski, Info, Per-variable history implementation in the mlogger 
The changes to mlogger implementing per-variable history have been committed to
svn. Revision 4145.

The rationale for these changes is roughly described in
https://ladd00.triumf.ca/elog/Midas/347

The main user-visible effect is reduction of data volume written to history
files and better integration with the history plot system in mhttpd.

The new functionality is disabled by default, pending review by Stefan (Except
for /history/tags stuff, which will be created by mlogger and used by mhttpd).
To enable it, set "/equipment/xxx/Common/PerVariableHistory" to 1 (type TID_INT).

In the "per-variable" mode, each entry in /equipment/xxx/variables is assigned
it's own event id and creates it's own events in the history file. In the
"classical" (or per-equipment) mode, all variables are assigned the same event
id (equal to the equipment id) and are written to disk at the same time.

In other words, in per-equipment mode, if there are 100 variables and 1 of them
is updated, all 100 numbers are written to disk. In per-variable mode, only the
one updated variable is written out.

The one point for review in this implementation is the assignment of event id's.
Committed code uses the formula "1000*eq_id + n" (i.e. variables in equipment id
2 get 2001, 2002, etc..., equipment id 3 get 3001, 3002, ...). This formula
works for most experiments, but as I understand is no good for some experiments
at PSI. Other than inventing a better formula that would work for everybody in
every case, one can also assign event id's manually by creating appropriate
entries in "/history/events".

This code has been used at CERN for running ALPHA since last Summer and it will
be used extensively at TRIUMF for T2K/ND280 slow controls. Per-variable history
is also required for the pending implementation of "history logged directly to
an SQL database", to be used at T2K/ND280.

If history (ahem) is any guide, we will now have a brief period of fixing merge
errors and "works for me" mistakes.

K.O.
    Reply  23 Mar 2008, Konstantin Olchanski, Info, Per-variable history implementation in the mlogger 
> The changes to mlogger implementing per-variable history have been committed to
> svn. Revision 4145.

To make code changes more clear, the commit was done in 3 stages:

revision 4142+4143 are minor fixes, refactoring (switch the code to use helper
functions) and implementation of history for structured banks
revision 4144 implements the per-variable history
revision 4145 is minor cleanup.

K.O.
       Reply  27 Nov 2008, Konstantin Olchanski, Info, Fixed mlogger crash, was Per-variable history implementation in the mlogger 
> revision 4142+4143 are minor fixes, refactoring (switch the code to use helper
> functions) and implementation of history for structured banks

The implementation of "history for structured banks" had a bug - tags inside
structured banks were counted incorrectly, leading to memory overwrites and mlogger
crash in open_history().

This is problem is now fixed (plus added assert() checks to crash-out if overwrite of
tags[] array is detected).

svn revision 4398.
K.O.
    Reply  25 Mar 2008, Stefan Ritt, Info, Per-variable history implementation in the mlogger 
Before approving the code, two conditions have to be fulfilled:

1) The code has to work on PSI experiments
2) The code must work without any SQL database

Concerning point 1), you correctly mentioned that the event numbering does not work
if there are more than 1000 variables per event. What I do not want is that there
will be a special T2K midas version and a special PSI version. This would make
maintenance horrible in the future. One could make the formula variable with id =
ev_id*n+var_n, where n is not fixed to 1000, but variable (stored in the ODB). The
down side would be that if you analyze your history files offline (outside the
experiment) you have to know a priori n in order to read back the data. If you have
990 variables, then you add 20, then you modify n from 1000 to 1500, then you would
screw up yourself since you cannot read the old data any more. 

Taking all this into account, I see no clean way to fix this except to modify the
database format (which you change anyhow "somehow" going to per-variable mode). Use a
32-bit ID for the event (16-bit) and the variable (16-bit). This will increase the
overhead, but only marginally, since there is already a 32-bit time stamp. But this
method would then work for all experiments at all times. I suspect that even in T2K
you will come at some point to a configuration where you have move than n variables
per event, whatever n is. So even you would benefit.

Concernign ponit 2), I like your ODBC approach. I never used it, but if you tell me
it works on all supported OSes it's fine with me, but make sure it compiles under
Windows (with the help of Pierre). One thing I would make sure however is that it
runs by default without setting up a database. There are many experiments out there
which do not need a SQL database, and it would be a hassle for them all to set up a
database, just to continue running. So by default I would use either the current flat
file system, and then per configuration enable ODBC, with bindings to MySQL pgSQL and
maybe SQLite3.

Cheers,

  Stefan
Entry  27 Nov 2008, Konstantin Olchanski, Info, lazylogger updated 
lazylogger was updated to improve handling of the list of runs still on disk
(odb /Lazy/xxx/List).

Previously, each and every run was listed in the List arrays. With modern
Terabyte-sized data disks, many many days worth of runs tend to remain on disk
and these List arrays were getting too big, inflating the size of ODB dumps
written by mlogger into the output data file and slowing down starting and
stopping of runs considerably.

Now, the runs are listed as ranges of "first run" - "last run", (see example below).

This significantly reduces the size of the "List" arrays and makes lazylogger
usable for the ALPHA experiment at CERN and for T2K/ND280 prototype DAQ at
TRIUMF (writing to Castor and Dcache respectively, using the newly added
"Script" method).

The new List format is fully compatible with the old format and you can update
and run the new lazylogger without changing anything in ODB. New runs will be
added to the List arrays in the new format and data in the old format will
eventually go away as old runs are removed from disk.

svn revision 4394.
K.O.

Example: this reads like this:
range from 7100 to 7154
range from 7157 to 7161 (7155-7156 are missing)
range from 7163 to 7168 (7162 is missing)
runs 7170, 7173, 7176
range from 7179 to 7182
and so forth.

ODB /Lazy/Dcache/List
007100
[0] 7100 (0x1BBC)
[1] -7154 (0xFFFFE40E)
[2] 7157 (0x1BF5)
[3] -7161 (0xFFFFE407)
[4] 7163 (0x1BFB)
[5] -7168 (0xFFFFE400)
[6] 7170 (0x1C02)
[7] 7173 (0x1C05)
[8] 7176 (0x1C08)
[9] 7179 (0x1C0B)
[10] -7182 (0xFFFFE3F2)
[11] 7184 (0x1C10)
[12] 7188 (0x1C14)
[13] -7199 (0xFFFFE3E1)
007200
[0] 7200 (0x1C20)
[1] -7225 (0xFFFFE3C7)
Entry  14 Oct 2004, Konstantin Olchanski, Bug Report, lazylogger complains about zero-size files 
With latest midas, I see this:

Thu Oct 14 19:31:17 2004 [Lazy_Tape] [lazylogger.c:1717:Lazy] lazy_file_exists
file run17567.ybs doesn't exists
Thu Oct 14 19:31:27 2004 [Lazy_Tape] [lazylogger.c:1717:Lazy] lazy_file_exists
file run17567.ybs doesn't exists

The file run17567.ybs has size zero:

-rw-r--r--    1 twistonl users      950272 Oct 13 19:29
/twist/data_onl/current/run17565.ybs
-rw-r--r--    1 twistonl users      950272 Oct 13 19:45
/twist/data_onl/current/run17566.ybs
-rw-r--r--    1 twistonl users           0 Oct 13 20:00
/twist/data_onl/current/run17567.ybs
-rw-r--r--    1 twistonl users      983040 Oct 13 20:03
/twist/data_onl/current/run17568.ybs
-rw-r--r--    1 twistonl users      950272 Oct 13 20:26
/twist/data_onl/current/run17569.ybs

I am not sure how to fix this lazylogger logic. Please help.

K.O.
    Reply  27 Nov 2008, Konstantin Olchanski, Bug Report, lazylogger complains about zero-size files 
I now have a better understanding of this: lazylogger uses ss_file_size() to find
out if a file exists or not. This function used to return 0 (probably) for
non-existant files (there was no check for error status from stat() system call,
so the return value for non-existant files was never well defined).

With ss_file_size() returning 0 for nonexistant files, 0-size files clearly cause
problems to lazylogger.

Now, since svn revision 4397, ss_file_size() returns -1 for non-existant files,
but lazylogger still needs to be tought about this.

The problem "lazylogger does not like 0-size files" remains for now.

K.O.


> With latest midas, I see this:
> 
> Thu Oct 14 19:31:17 2004 [Lazy_Tape] [lazylogger.c:1717:Lazy] lazy_file_exists
> file run17567.ybs doesn't exists
> Thu Oct 14 19:31:27 2004 [Lazy_Tape] [lazylogger.c:1717:Lazy] lazy_file_exists
> file run17567.ybs doesn't exists
> 
> The file run17567.ybs has size zero:
> 
> -rw-r--r--    1 twistonl users      950272 Oct 13 19:29
> /twist/data_onl/current/run17565.ybs
> -rw-r--r--    1 twistonl users      950272 Oct 13 19:45
> /twist/data_onl/current/run17566.ybs
> -rw-r--r--    1 twistonl users           0 Oct 13 20:00
> /twist/data_onl/current/run17567.ybs
> -rw-r--r--    1 twistonl users      983040 Oct 13 20:03
> /twist/data_onl/current/run17568.ybs
> -rw-r--r--    1 twistonl users      950272 Oct 13 20:26
> /twist/data_onl/current/run17569.ybs
> 
> I am not sure how to fix this lazylogger logic. Please help.
> 
> K.O.
Entry  26 Nov 2008, Jimmy Ngai, Info, Send email alert in alarm system 
Dear All,

We have a temperature/humidity sensor in MIDAS now and will add a liquid level 
sensor to MIDAS soon. We want the operators to get alerted ASAP when the 
laboratory environment or the liquid level reached some critical levels. Can 
MIDAS send email alerts or SMS alerts to cell phones when the alarms are 
triggered? If yes, how can I config it?

Many thanks!

Best Regards,
Jimmy
    Reply  26 Nov 2008, Stefan Ritt, Info, Send email alert in alarm system 
> We have a temperature/humidity sensor in MIDAS now and will add a liquid level 
> sensor to MIDAS soon. We want the operators to get alerted ASAP when the 
> laboratory environment or the liquid level reached some critical levels. Can 
> MIDAS send email alerts or SMS alerts to cell phones when the alarms are 
> triggered? If yes, how can I config it?

Sure that's possible, that's why MIDAS contains an alarm system. To use it, define 
an ODB alarm on your liquid level, like

/Alarms/Alarms/Liquid Level
Active	                 y
Triggered	         0 (0x0)
Type	                 3 (0x3)
Check interval	        60 (0x3C)
Checked last	1227690148 (0x492D10A4)
Time triggered first	(empty)
Time triggered last	(empty)
Condition	        /Equipment/Environment/Variables/Input[0] < 10
Alarm Class	        Level Alarm
Alarm Message	        Liquid Level is only %s

The Condition if course might be different in your case, just select the correct 
variable from your equipment. In this case, the alarm triggers an alarm of class 
"Level Alarm". Now you define this alarm class:

/Alarms/Classes/Level Alarm
Write system message	y
Write Elog message	n
System message interval	600 (0x258)
System message last	0 (0x0)
Execute command	        /home/midas/level_alarm '%s'
Execute interval	1800 (0x708)
Execute last	        0 (0x0)
Stop run	        n
Display BGColor	        red
Display FGColor	        black

The key here is to call a script "level_alarm", which can send emails. Use 
something like:

#/bin/csh
echo $1 | mail -s \"Level Alarm\" your.name@domain.edu
odbedit -c 'msg 2 level_alarm \"Alarm was sent to your.name@domain.edu\"'

The second command just generates a midas system message for confirmation. Most 
cell phones (depends on the provider) have an email address. If you send an email 
there, it gets translated into a SMS message.

The script file above can of course be more complicated. We use a perl script 
which parses an address list, so everyone can register by adding his/her email 
address to that list. The script collects also some other slow control variables 
(like pressure, temperature) and combines this into the SMS message.

For very sensitive systems, having an alarm via SMS is not everything, since the 
alarm system could be down (computer crash or whatever). In this case we use 
'negative alarms' or however you might call it. The system sends every 30 minutes 
an SMS with the current levels etc. If the SMS is missing for some time, it might 
be an indication that something in the midas system is wrong and one can go there 
and investigate.
Entry  20 Nov 2008, Jimmy Ngai, Info, Recommended platform for running MIDAS 
Dear All,

Is there any recommended platforms for running MIDAS? Have anyone encountered 
problems when running MIDAS on Scientific Linux?

Thanks.

Jimmy
    Reply  20 Nov 2008, Stefan Ritt, Info, Recommended platform for running MIDAS 
> Dear All,
> 
> Is there any recommended platforms for running MIDAS? Have anyone encountered 
> problems when running MIDAS on Scientific Linux?
> 
> Thanks.
> 
> Jimmy

I run MIDAS on scientific Linux 5.1 without any problem.
Entry  20 Oct 2008, Suzannah Daviel, Bug Report, custom web pages: customscript buttons and start/stop buttons generate errors 
I am using an external Custom web page via a link in the ODB in /Custom, and
Javascript to add customscript button(s) and run start/stop buttons.

After executing these buttons, instead of returning to the custom page, or
to the Midas main status page, there is an error page generated:

Invalid custom page: NULL path
and the URL is 

http://lxfred:8082/CS/

The behaviour is the same whether the custom page replaces the main status page
or not.

I am using
MIDAS version 2.0.0
mhttpd.c SVN Rev 4282

In an older version of mhttpd.c, buttons of this type used to return to the
Midas main status page regardless of whether the custom page replaced the status
page. I found this behaviour annoying, and I made a custom mhttpd.c that
returned to the custom page. 
Would it be possible to fix this problem, and to return to the custom page after
pressing the buttons?


Here is the Javascript to add the buttons:

<script type="text/javascript">
var rstate = '<odb src="/runinfo/run state">'

 if (rstate == 1) // stopped
    document.write('<input name="cmd" value="Start" type="submit">')
 else if (rstate == 2 // paused
    document.write('<input name="cmd" value="Resume" type="submit">')
 else  // running
 {
    document.write('<input name="cmd" value="Stop" type="submit">')
    document.write('<input name="cmd" value="Pause" type="submit">')
 }

 if (rstate == 1) // stopped
    document.write('<input name="customscript" value="tri_config" type="submit">');
</script>
    Reply  29 Oct 2008, Stefan Ritt, Bug Report, custom web pages: customscript buttons and start/stop buttons generate errors 
To fix this problem, do the following:

- Update to the current SVN revision 4368 of mhttpd.c
- Add following tag into your custom page:

  <input type=hidden name="redir" value="name">

  where "name" is the name of your custom page which follows the CS/ in the URL. Like 
if you have a custom page which you access through httpd://localhost/CS/junk then the 
tag would be 

  <input type=hidden name="redir" value="junk">

The "redir" parameter is now evaluated inside mhttpd and brings you back to the proper 
custom page. You can also define another custom page as the target, if that makes 
sense in your application.

Pierre: Would be nice to document this somewhere more officially.
       Reply  04 Nov 2008, Suannah Daviel, Bug Report, custom web pages: customscript buttons and start/stop buttons generate errors 
Thanks Stefan. 
Your fix works nicely with the start/stop buttons not returning to the same or to a
different web page.

However, it does not seem to have fixed the problem with the Customscript button. It does
not seem to pick up the redirect, nor do the Pause/Resume buttons (which are programmed to
appear when the run starts).


> To fix this problem, do the following:
> 
> - Update to the current SVN revision 4368 of mhttpd.c
> - Add following tag into your custom page:
> 
>   <input type=hidden name="redir" value="name">
> 
>   where "name" is the name of your custom page which follows the CS/ in the URL. Like 
> if you have a custom page which you access through httpd://localhost/CS/junk then the 
> tag would be 
> 
>   <input type=hidden name="redir" value="junk">
> 
> The "redir" parameter is now evaluated inside mhttpd and brings you back to the proper 
> custom page. You can also define another custom page as the target, if that makes 
> sense in your application.
> 
> Pierre: Would be nice to document this somewhere more officially.
          Reply  09 Nov 2008, Stefan Ritt, Bug Report, custom web pages: customscript buttons and start/stop buttons generate errors 
> Thanks Stefan. 
> Your fix works nicely with the start/stop buttons not returning to the same or to a
> different web page.
> 
> However, it does not seem to have fixed the problem with the Customscript button. It does
> not seem to pick up the redirect, nor do the Pause/Resume buttons (which are programmed to
> appear when the run starts).

That has been fixed in rev. 4377
ELOG V3.1.4-2e1708b5