ELOG Midas

Midas DAQ System

20 Nov 2008, Jimmy Ngai, Info, Recommended platform for running MIDAS

Dear All,

Are there any recommended platforms for running MIDAS? Has anyone encountered 
problems when running MIDAS on Scientific Linux?

Thanks.

Jimmy

20 Nov 2008, Stefan Ritt, Info, Recommended platform for running MIDAS

> Dear All,
> 
> Is there any recommended platforms for running MIDAS? Have anyone encountered 
> problems when running MIDAS on Scientific Linux?
> 
> Thanks.
> 
> Jimmy

I run MIDAS on Scientific Linux 5.1 without any problem.

20 Oct 2008, Suzannah Daviel, Bug Report, custom web pages: customscript buttons and start/stop buttons generate errors

I am using an external Custom web page via a link in the ODB in /Custom, and
Javascript to add customscript button(s) and run start/stop buttons.

After executing these buttons, instead of returning to the custom page, or
to the Midas main status page, there is an error page generated:

Invalid custom page: NULL path
and the URL is 

http://lxfred:8082/CS/

The behaviour is the same whether the custom page replaces the main status page
or not.

I am using
MIDAS version 2.0.0
mhttpd.c SVN Rev 4282

In an older version of mhttpd.c, buttons of this type used to return to the
Midas main status page regardless of whether the custom page replaced the status
page. I found this behaviour annoying, and I made a custom mhttpd.c that
returned to the custom page. 
Would it be possible to fix this problem, and to return to the custom page after
pressing the buttons?


Here is the Javascript to add the buttons:

<script type="text/javascript">
var rstate = '<odb src="/runinfo/run state">'

 if (rstate == 1) // stopped
    document.write('<input name="cmd" value="Start" type="submit">')
 else if (rstate == 2) // paused
    document.write('<input name="cmd" value="Resume" type="submit">')
 else  // running
 {
    document.write('<input name="cmd" value="Stop" type="submit">')
    document.write('<input name="cmd" value="Pause" type="submit">')
 }

 if (rstate == 1) // stopped
    document.write('<input name="customscript" value="tri_config" type="submit">');
</script>

29 Oct 2008, Stefan Ritt, Bug Report, custom web pages: customscript buttons and start/stop buttons generate errors

To fix this problem, do the following:

- Update to the current SVN revision 4368 of mhttpd.c
- Add following tag into your custom page:

  <input type=hidden name="redir" value="name">

  where "name" is the name of your custom page which follows the CS/ in the URL. For 
example, if you have a custom page which you access through http://localhost/CS/junk, then the 
tag would be 

  <input type=hidden name="redir" value="junk">

The "redir" parameter is now evaluated inside mhttpd and brings you back to the proper 
custom page. You can also define another custom page as the target, if that makes 
sense in your application.

Pierre: Would be nice to document this somewhere more officially.

04 Nov 2008, Suzannah Daviel, Bug Report, custom web pages: customscript buttons and start/stop buttons generate errors

Thanks Stefan. 
Your fix works nicely with the start/stop buttons, which now return to the same or to a
different web page.

However, it does not seem to have fixed the problem with the Customscript button. It does
not seem to pick up the redirect, nor do the Pause/Resume buttons (which are programmed to
appear when the run starts).


> To fix this problem, do the following:
> 
> - Update to the current SVN revision 4368 of mhttpd.c
> - Add following tag into your custom page:
> 
>   <input type=hidden name="redir" value="name">
> 
>   where "name" is the name of your custom page which follows the CS/ in the URL. Like 
> if you have a custom page which you access through httpd://localhost/CS/junk then the 
> tag would be 
> 
>   <input type=hidden name="redir" value="junk">
> 
> The "redir" parameter is now evaluated inside mhttpd and brings you back to the proper 
> custom page. You can also define another custom page as the target, if that makes 
> sense in your application.
> 
> Pierre: Would be nice to document this somewhere more officially.

09 Nov 2008, Stefan Ritt, Bug Report, custom web pages: customscript buttons and start/stop buttons generate errors

> Thanks Stefan. 
> Your fix works nicely with the start/stop buttons not returning to the same or to a
> different web page.
> 
> However, it does not seem to have fixed the problem with the Customscript button. It does
> not seem to pick up the redirect, nor do the Pause/Resume buttons (which are programmed to
> appear when the run starts).

That has been fixed in rev. 4377

04 Nov 2008, Suzannah Daviel, Suggestion, <odb ... edit=1> buttons and javascript

When writing custom webpages, it would be nice to be able to write code such as

<odb src="/Equipment/TITAN_ACQ/ppg cycle/trans3/time offset (ms)" edit=1>

from Javascript, e.g.
<script  type="text/javascript">
if ( flag != 3)
   document.write('<odb src="/Equipment/TITAN_ACQ/ppg cycle/trans3/time offset (ms)" edit=1>ms');
else
   document.write('<odb src="/Equipment/TITAN_ACQ/ppg cycle/trans4/time offset (ms)" edit=1>ms');
</script>

This is not translated correctly by mhttpd; the final quote and bracket get
stripped off, and the result is this JavaScript error:

 Error: unterminated string literal
Source File: http://titan04:8089/CS/ppg_cycle?cmd=Edit&index=11
Line: 477, Column: 18
Source Code:
   document.write('<input type=text size=10 maxlength=80 name=value value="1">

I can get round this by using an input box and a combination of ODBGet and
ODBSet, but it would be easier if the edit=1 form above worked correctly, or
there was a command like ODBSet that would accept input from the user.

Thanks.


09 Nov 2008, Stefan Ritt, Suggestion, <odb ... edit=1> buttons and javascript

> When writing custom webpages, it would be nice to be able to write code such as
> 
> <odb src="/Equipment/TITAN_ACQ/ppg cycle/trans3/time offset (ms)" edit=1>
> 
> from Javascript, e.g.
> <script  type="text/javascript">
> if ( flag != 3)
>    document.write('<odb src="/Equipment/TITAN_ACQ/ppg cycle/trans3/time offset
> (ms)" edit=1>ms');
> else
>    document.write('<odb src="/Equipment/TITAN_ACQ/ppg cycle/trans4/time offset
> (ms)" edit=1>ms');
> </script>
> 
> This is not translated correctly by mhttpd; the final quote and bracket get
> stripped off, and it gives Javascript error
> 
>  Error: unterminated string literal
> Source File: http://titan04:8089/CS/ppg_cycle?cmd=Edit&index=11
> Line: 477, Column: 18
> Source Code:
>    document.write('<input type=text size=10 maxlength=80 name=value value="1">
> 
> I can get round this by using an input box and a combination of ODBGet and
> ODBSet, but it would be easier if the edit=1 form above worked correctly, or
> there was a command like ODBSet that would accept input from the user.
> 
> Thanks.

Actually that won't work, even if I fixed it. The <odb> tag is evaluated on the
server side (mhttpd), where it gets replaced by the actual ODB value. But if you
use JavaScript to generate the <odb> tag dynamically, this only happens on the
client side, so the server has no chance to substitute it. So you have to go with
ODBGet's, I'm afraid. Nevertheless, I changed the code such that any ODB tags inside
a JavaScript block are not interpreted by mhttpd.

04 Nov 2008, Suzannah Daviel, Bug Report, bool values in "/custom/images/my_image.gif/labels/src" seem to lose their format string

Not sure if this is a bug or a feature:

Writing a boolean label on an image seems to produce rather strange behaviour.

For example,
odb>ls /Equipment/gas/settings/my_bool -lt
Key name                        Type    #Val  Size  Last Opn Mode Value
---------------------------------------------------------------------------
my_bool                         BOOL    1     4     14m  0   RWD  y


odb>cd /custom/images/my_image.gif/labels
odb>ls
Src                             /Equipment/gas/settings/my_bool
Format                          val: %d (bool)
Font                            Medium
X                               10
Y                               10
Align                           0
FGColor                         FFFFFF
BGColor                         FF8800

Instead of the expected string "val: y (bool)", only the value of the key
appears, i.e. "y". 
The behaviour is the same whether I use %d, %u, %s, %c etc as the format character.

09 Nov 2008, Stefan Ritt, Bug Report, bool values in "/custom/images/my_image.gif/labels/src" seem to lose their format string

> Not sure if this is a bug or a feature:
> 
> Writing a boolean label on an image seems to produce rather strange behaviour.
> 
> For example,
> odb>ls /Equipment/gas/settings/my_bool -lt
> Key name                        Type    #Val  Size  Last Opn Mode Value
> ---------------------------------------------------------------------------
> my_bool                         BOOL    1     4     14m  0   RWD  y
> 
> 
> odb>cd /custom/images/my_image.gif/labels
> odb>ls
> Src                             /Equipment/gas/settings/my_bool
> Format                          val: %d (bool)
> Font                            Medium
> X                               10
> Y                               10
> Align                           0
> FGColor                         FFFFFF
> BGColor                         FF8800
> 
> Instead of the expected string "val: y (bool)", only the value of the key
> appears, i.e. "y". 
> The behaviour is the same whether I use %d, %u, %s, %c etc as the format character. 

That has been fixed in rev. 4379

06 Nov 2008, Konstantin Olchanski, Info, midas elog outage

Around Wednesday noon, there was a power outage at TRIUMF (loss of UPS power in the TRIUMF 
computing center) and after rebooting ladd00, https/ssl access stopped working with a complaint 
about mismatching server name and ssl certificate name. This configuration used to work, so one of the 
system updates must have broken it. This problem is now fixed and access to the midas elog is restored. 
K.O.

22 Oct 2008, Konstantin Olchanski, Info, mscb timeouts and retries

A new set of functions was added to mscb.h to adjust mscb timeouts and retries to better match specific 
applications:

+   int EXPRT mscb_get_max_retry();
+   int EXPRT mscb_set_max_retry(int max_retry);
+   int EXPRT mscb_get_usb_timeout();
+   int EXPRT mscb_set_usb_timeout(int timeout);
+   int EXPRT mscb_get_eth_max_retry();
+   int EXPRT mscb_set_eth_max_retry(int eth_max_retry);

There are 3 settings:

1) mscb_max_retry: most (all?) mscb operations, like mscb_read(), retry failed mscb transactions up to 
10 times. The corresponding set and get functions allow tuning this retry limit.

2) mscb_usb_timeout: the driver for the USB-MSCB adapter uses a timeout of 6 seconds. 
mscb_set_usb_timeout() permits changing this value.

3) mscb_eth_max_retry: the driver for the Ethernet-MSCB adapter has to deal with UDP packet loss. If 
the adapter does not respond to a UDP command, the UDP command is sent again, with a bigger 
timeout (timeout = 100 * (retry+1), in ms), this is repeated up to 10 times. mscb_set_eth_max_retry() 
permits adjusting this number of retries.

This is how it works for the usb interface:

int mscb_read(...)
   for (retry=0; retry<mscb_max_retry; retry++)
       mscb_exch()
            musb_write(..., mscb_usb_timeout)
            musb_read(..., mscb_usb_timeout)     

This is how it works for the ethernet interface:

int mscb_read(...)
   for (retry=0; retry<mscb_max_retry; retry++)
       mscb_exch()
            for (retry=0; retry<mscb_eth_max_retry; retry++)
                 send_udp_command()
                 wait_for_udp_response(timeout = 100 * (retry+1))
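
To get a feeling for what these numbers mean, here is a minimal sketch that adds up the worst-case waiting time of the nested ethernet loops shown above. It assumes that every attempt times out and that only the waits cost time; the function names are made up for this illustration and are not part of mscb.h.

```c
#include <assert.h>

/* Worst-case time (ms) that one mscb_exch() spends waiting on the
 * ethernet adapter, using the rule quoted above:
 * timeout = 100 * (retry + 1) ms for retry = 0 .. eth_max_retry-1.
 * Hypothetical helper, not an MSCB library function. */
long eth_exch_worst_case_ms(int eth_max_retry)
{
   long total = 0;
   int retry;

   assert(eth_max_retry >= 0);
   for (retry = 0; retry < eth_max_retry; retry++)
      total += 100L * (retry + 1);
   return total;
}

/* Worst case for a whole mscb_read(): the outer loop repeats the
 * exchange up to max_retry times. Also hypothetical. */
long eth_read_worst_case_ms(int max_retry, int eth_max_retry)
{
   assert(max_retry >= 0);
   return (long) max_retry * eth_exch_worst_case_ms(eth_max_retry);
}
```

With both limits at 10, one exchange can wait 100 + 200 + ... + 1000 = 5500 ms, so a single mscb_read() can block for up to 55 seconds under these assumptions. Lowering mscb_max_retry to 2 brings this down to 11 seconds.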

This is how the new functions are intended to be used:
   ...
   int old = mscb_set_max_retry(2);
   ... do stuff ...
   mscb_set_max_retry(old); // restore default value

svn revision 4356.
K.O.

28 Oct 2008, Stefan Ritt, Info, mscb timeouts and retries

> A new set of functions was added to mscb.h to adjust mscb timeouts and retries to better match specific 
> applications:
> 
> +   int EXPRT mscb_get_max_retry();
> +   int EXPRT mscb_set_max_retry(int max_retry);
> +   int EXPRT mscb_get_usb_timeout();
> +   int EXPRT mscb_set_usb_timeout(int timeout);
> +   int EXPRT mscb_get_eth_max_retry();
> +   int EXPRT mscb_set_eth_max_retry(int eth_max_retry);

In the spirit of this, a variable retry scheme has been implemented in the mscbdev.c device driver. At the 
MEG experiment, we have one mscb device which is pretty slow, while the others are fast. It is therefore 
necessary to have a per-device max retry count which can be different for different submasters. I therefore 
moved the max_eth_retry variable into the mscb_fd structure and adjusted a few functions accordingly. I 
did not bother with the other timeouts and retries, since I don't need this for the moment, but it would be 
nice if they were handled in the same way. Then I added code to mscbdev.c to read the retry variable 
from the ODB under /Equipment/<name>/Settings/Device/<Name>/Retries. The default is 10; a changed value 
takes effect after the program has been restarted.

23 Oct 2008, Konstantin Olchanski, Bug Report, strange output from "odbedit cleanup"

When I run odbedit remotely (odbedit -h ladd09), the "cleanup" command unexpectedly produces the 
output of the "sor" command (sure enough, there is a call to db_get_open_records() there), but when I run 
it locally, I do not get this output (but db_get_open_records() is still called). Strange. K.O.

28 Oct 2008, Stefan Ritt, Bug Report, strange output from "odbedit cleanup"

> When I run odbedit remotely (odbedit -h ladd09), the "cleanup" command unexpectedly produces the 
> output of the "sor" command (sure enough, there is a call to db_get_open_records() there), but when I run 
> it locally, I do not get this output (but db_get_open_records() is still called). Strange. K.O.

The db_get_open_records() call was there by mistake; I removed it. What remains is that the notification 
message sent when a client is removed from the ODB goes through the system messages. When running locally, 
odbedit echoes its own messages, but when running remotely, this is not the case. So the messages can be 
seen by everybody else (and they end up in the message file), but not by the remote odbedit where the 
cleanup was started. The quick fix for that is to say "old" in odbedit, which shows the last few lines of 
the message file, so one can see any successful cleanup.

23 Oct 2008, Konstantin Olchanski, Bug Report, Inconsistent handling of odb and event buffer timeouts

In midas.c there are several places where client last activity time stamps are checked against the 
watchdog timeout and the clients are declared dead if they fail to update their activity time stamps. 
ODB time stamps and data buffer time stamps appear to be handled in a similar manner.

Most checks are done like this:

now = ss_millitime();
if (client->watchdog > 0                                        /* check that the watchdog is enabled */
    && now > client->last_activity                              /* check for crazy time stamps from the future */
    && now - client->last_activity > client->watchdog_timeout)  /* normal timeout */
        remove_client(client);

But in a few places, the extra checks are missing:

now = ss_millitime();
if (now - client->last_activity > client->watchdog_timeout)
        remove_client(client);

Is this an oversight from when additional checks were added?
Should I make all checks read like the first one?

K.O.

28 Oct 2008, Stefan Ritt, Bug Report, Inconsistent handling of odb and event buffer timeouts

> In midas.c there are several places where client last activity time stamps are checked against the 
> watchdog timeout and the clients are declared dead if they fail to update their activity time stamps. 
> ODB time stamps and data buffer time stamps appear to be handled in a similar manner.
> 
> Most checks are done like this:
> 
> now = ss_millitime();
> if (client->watchdog > 0      <----- check that the watchdog is enabled
>     && now > client->last_activity    <---- check for crazy time stamps from the future
>     && now - client->last_activity > client->watchdog_timeout)   <--- normal timeout
>         remove_client(client);
> 
> But in a few places, the extra checks are missing:
> 
> now = ss_millitime();
> if (now - client->last_activity > client->watchdog_timeout)
>         remove_client(client);
> 
> Is this an oversight from when additional checks were added?
> Should I make all checks read like the first one?
> 
> K.O.

This is on purpose. Inside cm_watchdog(), the system checks for client->watchdog > 0. If the watchdog 
timeout is zero, the client is not removed. This feature is used when you debug a program. If you come to a 
breakpoint and sit there for a while, you might be declared dead and the application is removed from the 
ODB, meaning that you cannot continue debugging (on the next ODB access the application asserts). This 
can be avoided by setting the watchdog to zero, which is implemented in most applications by supplying 
"-d" on the command line. Now assume you debug a program, so you set the watchdog timeout to zero, but in 
the debugging session you decide to quit. Since the watchdog timeout is zero, you will never be removed 
from the ODB. Therefore, the code inside cm_cleanup() does NOT check client->watchdog > 0, and a 
"cleanup" inside odbedit will remove even clients that have the timeout set to zero. 

Now there might be more clever ways to accomplish that, but that's how it is implemented right now.
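
The difference between the two checks can be sketched as two small predicates. This is only an illustration under simplifying assumptions: the struct is a reduced stand-in, not the real MIDAS client structure, a single watchdog_timeout field plays the role of both "watchdog" and "watchdog_timeout", and the function names are invented here.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int DWORD;   /* stand-in for the MIDAS DWORD type */

/* Hypothetical, reduced client record with only the fields discussed. */
typedef struct {
   DWORD last_activity;       /* time stamp of last activity, in ms */
   DWORD watchdog_timeout;    /* in ms; 0 means "watchdog disabled" */
} CLIENT_SKETCH;

/* Check in the style described for cm_watchdog(): a zero timeout
 * protects the client, and time stamps from the future are ignored. */
int expired_for_watchdog(const CLIENT_SKETCH *c, DWORD now)
{
   assert(c != NULL);
   return c->watchdog_timeout > 0
       && now > c->last_activity
       && now - c->last_activity > c->watchdog_timeout;
}

/* Check in the style described for cm_cleanup(): no "enabled" test,
 * so a client with timeout 0 is removed once it has been idle at all.
 * Note the unsigned subtraction: a time stamp from the future wraps
 * around to a huge difference and also counts as expired. */
int expired_for_cleanup(const CLIENT_SKETCH *c, DWORD now)
{
   assert(c != NULL);
   return now - c->last_activity > c->watchdog_timeout;
}
```

A debugged client with timeout 0 is never expired by the first predicate but is expired by the second, which matches the behaviour described above.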

23 Oct 2008, Konstantin Olchanski, Bug Report, bm_wait_for_free_space never sleeps inside the mserver

When mserver receives events from a remote client and writes them into a data buffer, and this data 
buffer becomes 100% full, we see mserver go to 100% CPU consumption.

It turns out this happens because bm_wait_for_free_space() never sleeps; instead, it busy-loops waiting 
for free space. bm_wait_for_free_space() does call ss_suspend(), but ss_suspend() does not sleep 
because there is pending data in the event network connection and it wants to process it.

Best solution I have is to use silly "if (ss_suspend()!=SS_TIMEOUT) sleep(1);"

Also read this explanation: (bm_cleanup is needed to detect that the client holding the buffer at 100% 
full (a stuck or dead GET_ALL reader, mevb in our case), has been killed off and we can continue as 
usual)

       /* signal other clients wait mode */
       pheader->client[bm_validate_client_index(pbuf)].write_wait = requested_space;
 
+      bm_cleanup("bm_wait_for_free_space", ss_millitime(), FALSE);
+
       status = ss_suspend(1000, MSG_BM);
 
+      /* make sure we do sleep in this loop:
+       * if we are the mserver receiving data on the event
+       * socket and the data buffer is full, ss_suspend() will
+       * never sleep: it will detect data on the event channel,
+       * call rpc_server_receive() (recursively, we already *are* in
+       * rpc_server_receive()) and return without sleeping. Result
+       * is a busy loop waiting for free space in data buffer */
+      if (status != SS_TIMEOUT)
+         sleep(1);
+
       /* validate client index: we could have been removed from the buffer */
       pheader->client[bm_validate_client_index(pbuf)].write_wait = 0;

K.O.

21 Oct 2008, Randolf Pohl, Forum, Mixed CAMAC/VME frontend, SIS3100

Dear MIDAS-addicts,

I would like to hear your opinion on this:
We've until now used CAMAC with Hytec 1331 controllers. We're using Yale FADCs 
whose readout takes ages in CAMAC (2048 samples take 2 milliseconds to be 
read). We've got 20+ FADC channels (we usually read only 2-3)

Now we've had the brilliant idea to replace the Yale FADCs with some VME 
digitizer and we now plan to buy a Struck SIS 1100/3100 PCI-VME controller,
plus 4 pc. CAEN 1720 8ch 12bit, 250MHz WFD.

(1) Can anybody comment on this choice? Good experiences/problems?

We are still using the CAMAC stuff for all other modules (TDCs, ADCs, 
scalers). So my plan is to have ONE frontend that reads both the CAMAC modules 
and the VME modules.

(2) Is it possible to build and run a dual-controller frontend for both CAMAC 
and VME? Does anybody have experience with that? Or is it a stupid idea?

I'd appreciate any hints.

[Edit: We're using Linux]

Thanks a lot,

Randolf

22 Oct 2008, Stefan Ritt, Forum, Mixed CAMAC/VME frontend, SIS3100

> Dear MIDAS-addicts,
> 
> I would like to hear your opinion on this:
> We've until now used CAMAC with Hytec 1331 controllers. We're using Yale FADCs 
> whose readout takes ages in CAMAC (2048 samples take 2 milliseconds to be 
> read). We've got 20+ FADC channels (we usually read only 2-3)
> 
> Now we've had the brilliant idea to replace the Yale FADCs with some VME 
> digitizer and we now plan to buy a Struck SIS 1100/3100 PCI-VME controller,
> plus 4 pc. CAEN 1720 8ch 12bit, 250MHz WFD.
> 
> (1) Can anybody comment on this choice? Good experiences/problems?
> 
> We are still using the CAMAC stuff for all other modules (TDCs, ADCs, 
> scalers). So my plan is to have ONE frontend who reads both the CAMAC modules 
> and the VME modules.
> 
> (2) Is it possible to build and run a dual-controller frontend for both CAMAC 
> and VME? Does anybody have experience with that? Or is it a stupid idea?
> 
> I'd appreciate any hints.
> 
> [Edit: We're using Linux]
> 
> Thanks a lot,
> 
> Randolf

Dear Randolf,

I used some time ago several HYTEC 1331 controllers together with the Struck 
SIS3100. Since the HYTEC is IO-mapped and the SIS3100 is memory mapped, there was 
no problem in running them in parallel. Note however that there will soon be an 
improved version of the SIS3100 with improved speed, and also CAEN plans a WFD 
with 32 channels, 6 GSPS, 12 bit, using the DRS chip, for next year. I don't 
know if you need that, but just so you know.

Best regards, 
  Stefan

18 Oct 2008, Konstantin Olchanski, Info, make linux32 & co

The Makefile targets for crosscompiling MIDAS are now documented in the MIDAS
Doxygen documentation:

make linux32 & make clean32
make linux64 & make clean64
make crosscompile
make dox

This has to do with which flavour of MIDAS is built by default: 32-bit or 64-bit.

This is how this works now.

Default flavour is determined by ROOT. If ROOTSYS points to 32-bit ROOT, then
32-bit MIDAS is built, if 64-bit ROOT, then 64-bit MIDAS. This works well after
the ROOT team added the correct "-m32" and "-m64" flags to "root-config --cflags".

If for some reason, we also need a non-default flavour of MIDAS, for example
when the main daq computer runs 64-bit MIDAS, but one frontend has to run on a
"32-bit only" VME processor, you say "make linux32". This creates the
"linux-m32/{lib,bin}" tree that you then reference in the Makefile of your
special frontend (i.e. instead of "-L$MIDASSYS/linux/lib" say
"-L$MIDASSYS/linux-m32/lib"). "make linux64" works the same way.

These non-default flavours of MIDAS are compiled with most special features
disabled: no ROOT, no MYSQL, etc.

When building "make linux32", you may also see errors caused by missing 32-bit
libraries - many 64-bit Linux distributions do not install the full 32-bit
development environment by default - so some header files and libraries may be
reported as missing. These not-installed-by-default 32-bit packages are usually
easy to install using commands like "yum install libxxx-devel.i386".

K.O.

17 Oct 2008, Konstantin Olchanski, Info, mlogger async transitions, etc

As we were looking into problems with starting and stopping runs in one of our
daq systems, we found that the mlogger does something differently compared to
mhttpd and odbedit. Starting and stopping runs from mhttpd and odbedit works
correctly, but runs restarted by the file size limit in mlogger would often have
problems.

It turns out that mlogger calls cm_transition() with the ASYNC flag, while
mhttpd and odbedit always use SYNC.

As best I can tell, the ASYNC flag tells cm_transition() to fire off the
end-run rpc calls to all clients at once, without waiting for a reply from the
previous client before calling the next one. This effectively defeats the
transition sequence numbers - higher-numbered clients are told to end-run before
the lower-numbered clients have finished their end-run processing.
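
For comparison, a SYNC-style transition that honours the sequence numbers can be sketched like this. It is not the cm_transition() code: the client record, the callback standing in for the blocking rpc call, the abort-on-error behaviour and all names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical client record: a name and its transition sequence number. */
typedef struct {
   const char *name;
   int sequence_number;
} TR_CLIENT_SKETCH;

/* qsort() comparison: ascending sequence number. */
static int by_sequence(const void *a, const void *b)
{
   const TR_CLIENT_SKETCH *ca = a, *cb = b;
   return (ca->sequence_number > cb->sequence_number)
        - (ca->sequence_number < cb->sequence_number);
}

/* Example callback standing in for a blocking rpc call:
 * it just reports which client is being called. */
int announce(const TR_CLIENT_SKETCH *c)
{
   printf("end-run -> %s (%d)\n", c->name, c->sequence_number);
   return 0;
}

/* SYNC-style transition: call the clients one at a time in sequence
 * order and wait for each reply before calling the next one.
 * A non-zero status stops the loop (an assumption of this sketch). */
int sync_transition(TR_CLIENT_SKETCH *clients, size_t n,
                    int (*call_and_wait)(const TR_CLIENT_SKETCH *))
{
   size_t i;

   assert(clients != NULL && call_and_wait != NULL);
   qsort(clients, n, sizeof(clients[0]), by_sequence);
   for (i = 0; i < n; i++) {
      int status = call_and_wait(&clients[i]);
      if (status != 0)
         return status;
   }
   return 0;
}
```

An ASYNC-style transition would issue all the calls without waiting for any reply, so the order in which the clients actually finish their end-run processing is no longer controlled by the sequence numbers.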

Most of the time, transition sequence numbers do not matter - all frontends can
stop at the same time, only mlogger has to be the very last, and for transitions
initiated by the mlogger itself, this sequencing is preserved.

It turns out that for our system, correct sequencing of individual frontends is
important, for example, the frontend controlling the trigger system has to stop
first. As we are using correctly adjusted transition sequence numbers, the right
sequence is always done when runs are started/stopped from mhttpd and from
odbedit, but not for runs started/stopped by the mlogger.

So by changing mlogger to always do SYNC transitions, we fixed our sequencing
problem - now runs always start and stop correctly.

But then we ran into a deadlock between the mlogger and the event builder:

1) mlogger wants to stop the run
2a) mlogger stops reading the SYSTEM buffer
2b) mlogger starts cm_transition(SYNC)
3) rpc call to trigger frontend, trigger is blocked (no new events are
generated, but existing data is still flowing through the system)
4) other frontends are stopped (data still flowing)
5) data still flowing through the system, into the event builder, into the
SYSTEM buffer
6) SYSTEM buffer becomes 100% full (mlogger is not reading it, it is busy inside
cm_transition()), event builder is waiting for free space inside bm_send_event()
7) mlogger issues end-run rpc call to event builder
8) deadlock: mlogger is waiting for a reply from the event builder, the event
builder is waiting for free space in the SYSTEM buffer (not processing rpc
calls), mlogger is supposed to empty the SYSTEM buffer, but it is waiting for an
rpc reply instead.

In our particular case, the deadlock was easy to avoid by making the SYSTEM
buffer big enough to accommodate all in-flight data, but the problem remains in
the general case. I suspect mlogger uses ASYNC transitions exactly to avoid
this type of deadlock (mlogger has used ASYNC transitions since svn revision 2, the
beginning of time).

Personally, I am not happy about the inconsistency of run sequencing between
mlogger and mhttpd/odbedit (hmm... should also check mfe.c, it also stops runs
based on event count limits, etc). I think it would be better if all programs
did the same exact thing when starting/stopping runs. When mlogger does
something different, we get surprising unexpected behaviour, best avoided.

One possible solution could be to add an odb variable "/logger/async
transitions", set to "false" by default - to be consistent with other programs.
Systems that benefit from the old ASYNC behaviour and do not care about exact
sequencing can set this flag to "true".

K.O.

18 Oct 2008, Stefan Ritt, Info, mlogger async transitions, etc

> I suspect mlogger uses ASYNC transactions exactly to avoid
> this type of deadlock (mlogger used ASYNC transactions since svn revision 2, the
> beginning of time).

That's exactly the case. If you had asked me, I would have told you 
immediately, but it is also good that you re-confirmed the deadlock behavior with 
the SYNC flag. I haven't checked this for the last ten years or so.

Making the buffers bigger is only a partial solution. Assume that the disk gets 
slow for some reason; then any buffer will fill up and you get the deadlock.

The only real solution is to put the logic into a separate thread. So the thread 
does all the RPC communication with the clients, while the main logger thread logs 
data as usual in parallel. The problem is that the RPC layer is not yet completely 
tested to be thread safe. I put in some mutexes and you correctly realized that these 
are system wide, but you want a local mutex just for the logger process. You also need 
some basic communication between the "run stop thread" and the "logger main 
thread". Maybe Pierre remembers that once there was the problem that the logger did 
not know when all events "came down the pipe" and could close the file. He added 
some delay which helped most of the time. But if we had some communication 
from the "run stop thread" telling the main thread that all programs except the 
logger have stopped the run, then the logger only has to empty the local system 
buffer and knows 100% that everything is done.

In the MEG experiment we have the same problem. We need a certain sequence 
(basically because we have 9 front-ends and one event builder, which has to be 
called after the front-ends). We realized quickly that the logger cannot stop the 
run, so we wrote a little tool "RunSubmit", which is a run sequencer with a scripting 
facility. So you write an XML file, telling RunSubmit to start 10 runs, each with 
5000 events. RunSubmit then watches the run statistics and stops the run. Since it's 
outside the logger process, there is no deadlock. Unfortunately RunSubmit was 
written by one of our students and contains some MEG specific code. Otherwise it 
could be committed to the distribution.

So I feel that a separate thread for run stop (and maybe even start) would be a 
good thing, but I'm not sure when I will have time to address this issue.

- Stefan

13 Oct 2008, Konstantin Olchanski, Info, MIDAS drivers for Tundra tsi148 pci-vme bridge

The latest MIDAS mvmestd.h driver for the Tundra tsi148 pci-vme bridge, as used
on GEFANUC VME processors, has been committed, revision 4349.

This MIDAS driver requires the "gefvme" Linux kernel driver supplied by GEFANUC
as part of their Linux BSP. (Note that version "v7865-sdk-linux-R01.00" from
GEFANUC is mostly non-functional).

At TRIUMF we have the V7865 VME processors and use the kernel driver
v7865-sdk-linux-R01.00-KO6. This driver supports these functions:

1) memory mapped access to full VME A16 and A24 address spaces and window-mapped
access to VME A32 address space. (original gefvme driver does not do
memory-mapped access)
2) DMA directly from vme to user memory, with support for multi-segment chained
transfers (original gefvme driver lacks chained transfers)
3) DMA from user memory to vme should work but is untested
4) no support for interrupts (original gefvme driver does not do interrupts).

If you are interested in using the TRIUMF driver, please contact me directly.

If you already purchased the GEFANUC BSP, I think you can use my drivers
immediately, without objection from GEFANUC.

Otherwise, I will have to do some research into the gefvme code license: since
all of the code appears to have GPL headers and identical code exists on the
internet, I expect to find that my gefvme driver can be freely distributed under
the GPL. But until then, and until it is cleared with TRIUMF management, I
cannot make my gefvme driver available for free download.

K.O.

13 Oct 2008, Stefan Ritt, Info, mhttpd multi-experiment support removed

Previously, one mhttpd server could serve several experiments at the same time. 
However, this sometimes caused problems and was hard to maintain. Starting from 
SVN revision 4348, I removed the multi-experiment support, which I believe gives 
a much cleaner implementation. So if several experiments are defined on a 
computer, each one needs a separate mhttpd process listening on a different 
port. The experiment name can now be supplied on the command line to mhttpd 
like for any other midas program. I have tested this so far at two experiments 
at PSI, but this does not cover all possibilities. What I did not try was 
experiments with web passwords and odb passwords. If there is any problem after 
upgrading to 4348, please report it.

10 Oct 2008, Konstantin Olchanski, Bug Report, mhttpd "messages" broken

mhttpd "messages" page stopped working after svn revision 4327 because of uninitialized variable 
"filename2" in midas.c:cm_message_retrieve(). Attached patch fixes the problem for me.
K.O.


--- src/midas.c (revision 4342)
+++ src/midas.c (working copy)
@@ -978,6 +978,8 @@
       size = sizeof(filename);
       db_get_value(hDB, 0, "/Logger/Message file", filename, &size, TID_STRING, TRUE);
 
+      strlcpy(filename2, filename, sizeof(filename2));
+
       if (strchr(filename, '%')) {
          /* replace strings such as midas_%y%m%d.mid with current date */
          tzset();
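
The pattern behind this patch can be shown in isolation: give the second buffer a defined value before a branch that only sometimes overwrites it. What follows is a simplified sketch, not the actual cm_message_retrieve() code. The helper names copy_string() and expand_log_name() are hypothetical, and copy_string() stands in for strlcpy(), which not every C library provides.

```c
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for strlcpy(): bounded copy, always NUL-terminated. */
static void copy_string(char *dst, const char *src, size_t size)
{
   if (size == 0)
      return;
   strncpy(dst, src, size - 1);
   dst[size - 1] = '\0';
}

/* Hypothetical helper: fill 'out' with the file name to open.
   'out' is initialised first, so it is valid even when 'pattern'
   contains no '%' and the date expansion below is skipped. */
static void expand_log_name(const char *pattern, const struct tm *when,
                            char *out, size_t size)
{
   char expanded[256];

   copy_string(out, pattern, size);

   if (strchr(pattern, '%')) {
      /* replace strings such as midas_%y%m%d.log with the given date */
      if (strftime(expanded, sizeof(expanded), pattern, when) > 0)
         copy_string(out, expanded, size);
   }
}
```

Without the initial copy, a pattern with no '%' in it would leave 'out' holding whatever happened to be on the stack, which is the kind of failure described above.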

11 Oct 2008, Stefan Ritt, Bug Report, mhttpd "messages" broken

> mhttpd "messages" page stopped working after svn revision 4327 because of uninitialized variable 
> "filename2" in midas.c:cm_message_retrieve(). Attached patch fixes the problem for me.
> K.O.
> 
> 
> --- src/midas.c (revision 4342)
> +++ src/midas.c (working copy)
> @@ -978,6 +978,8 @@
>        size = sizeof(filename);
>        db_get_value(hDB, 0, "/Logger/Message file", filename, &size, TID_STRING, TRUE);
>  
> +      strlcpy(filename2, filename, sizeof(filename2));
> +
>        if (strchr(filename, '%')) {
>           /* replace strings such as midas_%y%m%d.mid with current date */
>           tzset();

Oops, that was my fault, sorry. I committed your change.

03 Oct 2008, Konstantin Olchanski, Info, Implement non-default mserver tcp port numbers.

midas revision 4342 implements non-default tcp port numbers for the mserver.

To use, run "mserver -p 7070" and say "setenv MIDAS_SERVER_HOST
host.example.com:7070".

This is useful when multiple experiments share the same computer, but one does
not want to setup a global /etc/exptab (non-root users cannot change it) or one
does not want to run the mserver from xinetd (i.e. all experiments run different
versions of midas and cannot use the same common mserver executable).
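
For illustration, a "host:port" value like the one above could be split roughly as follows. This is a minimal sketch and not the code in src/midas.c. The function name split_host_port() is hypothetical, and the caller passes in whatever port should apply when none is given.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "host" or "host:port" into its parts.
   Copies the host name into 'host' and returns the port number,
   'default_port' if no port is given, or -1 on a malformed value. */
static int split_host_port(const char *spec, int default_port,
                           char *host, size_t host_size)
{
   const char *colon;
   size_t len;
   char *end;
   long port;

   assert(spec != NULL && host != NULL);

   colon = strchr(spec, ':');
   len = colon ? (size_t) (colon - spec) : strlen(spec);

   /* reject an empty host name or one that does not fit the buffer */
   if (len == 0 || len >= host_size)
      return -1;
   memcpy(host, spec, len);
   host[len] = '\0';

   if (colon == NULL)
      return default_port;

   /* the port must be a plain decimal number in the valid TCP range */
   port = strtol(colon + 1, &end, 10);
   if (end == colon + 1 || *end != '\0' || port < 1 || port > 65535)
      return -1;
   return (int) port;
}
```

A client would typically feed it the result of getenv("MIDAS_SERVER_HOST") after checking that the variable is set.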

Changed files:
src/mserver.c
src/midas.c
doxfiles/utilities.dox
doxfiles/appendixD.dox

Revision 4342.

K.O.

19 Sep 2008, Stefan Ritt, Info, Lazylogger logging changed

I modified the logging behavior of lazylogger. Originally, it was writing 
messages (run copied, removed, ...) both into midas.log and 
lazy_log_update.log. Since we have many files, this clutters up the 
logging files. I think it is a good idea to have a separate file (which I 
changed now to "lazy.log" instead of "lazy_log_update.log", which I guess was a 
bug), so I put the logging into the main file under a conditional compile:

#ifdef WRITE_MIDAS_LOG
   cm_msg(MINFO, "lazy_log_update", str);
#endif

so it can be turned on again by adding -DWRITE_MIDAS_LOG to the compile line. 
If other experiments have different needs, one could make the logging behavior 
controllable through the ODB. In that case, I would suggest a single parameter 
"Logging file" which can be either "midas.log" for the normal logging or 
"lazy.log" for logging into the extra file. I guess having the messages twice 
on the system is not needed by any experiment.

- Stefan

18 Sep 2008, Stefan Ritt, Info, Potential problems in multi-threaded slow control front-end

We recently had some problems at our experiment which I would like to share 
with the community. However, this affects only experiments which run a slow 
control front-end in multi-threaded mode.

The problem is related to the fact that the midas API is not thread safe, so 
a device driver or bus driver from the slow control system may not call any ODB 
function. We found several drivers (mainly psi_separator.c, psi_beamline.c etc.) 
which use the midas API function cm_msg() inside their read/write functions to 
report errors. While this is OK for the init section (which is executed in the 
main frontend thread), it is not OK for the read/write functions inside the 
driver. If this is done anyhow, it can happen that the main thread locks the ODB 
(via db_lock_database()) and the driver thread interrupts that call and locks 
the ODB again. In rare cases this can cause a stale lock on the ODB. This blocks 
all other programs from accessing the ODB and the experiment will die loudly. It 
is hard to identify, since error messages cannot be produced any more, and 
remote programs (not affected by the lock) just show an RPC timeout.
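
One possible way to keep cm_msg() out of the driver thread is to let the read/write function only record the error, and have the main frontend thread pick it up and report it. The sketch below illustrates that idea under stated assumptions. It is not how the PSI drivers were actually changed, and the names error_slot, error_post() and error_fetch() are hypothetical.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical one-slot mailbox shared by a driver thread and the main thread. */
typedef struct {
   pthread_mutex_t mutex;
   int pending;            /* 1 if 'text' holds an unreported message */
   char text[256];
} error_slot;

/* Called from the driver's read/write function (device thread).
   It only touches the slot and never calls into the midas API. */
static void error_post(error_slot *slot, const char *text)
{
   pthread_mutex_lock(&slot->mutex);
   snprintf(slot->text, sizeof(slot->text), "%s", text);
   slot->pending = 1;
   pthread_mutex_unlock(&slot->mutex);
}

/* Called from the main frontend thread. Copies out a pending message
   and returns 1, or returns 0 if there is nothing to report. The caller
   can then report 'out' from the main thread. */
static int error_fetch(error_slot *slot, char *out, size_t size)
{
   int had_message = 0;

   pthread_mutex_lock(&slot->mutex);
   if (slot->pending) {
      snprintf(out, size, "%s", slot->text);
      slot->pending = 0;
      had_message = 1;
   }
   pthread_mutex_unlock(&slot->mutex);
   return had_message;
}
```

The slot can be set up with an initializer such as { PTHREAD_MUTEX_INITIALIZER, 0, "" }. With a single slot a newer error overwrites an older unreported one, so a driver that can fail in bursts would need a small queue instead.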

I fixed all drivers now in our experiment which solved the problem for us, but 
I urge other people to double check their device drivers as well.

In case of problems, there is a thread ID check in 
db_lock_database()/db_unlock_database() which can be activated by supplying 

-DCHECK_THREAD_ID

in the compile command line. If these functions are then called from different 
threads, the program aborts with an assertion failure, which can then be 
debugged. 
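
The idea of such a check can be sketched as follows: remember which thread used the lock first and assert that every later call comes from the same thread. This is only an illustration of the technique, not the code behind CHECK_THREAD_ID in midas, and the names lock_owner, same_thread() and checked_lock() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical bookkeeping for a lock that must always be used from one thread. */
typedef struct {
   int have_owner;      /* 0 until the first lock call */
   pthread_t owner;     /* thread that made the first lock call */
} lock_owner;

/* Returns 1 if 'caller' is the thread that first used the lock, else 0.
   The first caller becomes the owner. */
static int same_thread(lock_owner *lo, pthread_t caller)
{
   if (!lo->have_owner) {
      lo->owner = caller;
      lo->have_owner = 1;
      return 1;
   }
   return pthread_equal(lo->owner, caller) != 0;
}

/* What a checked lock function might do before taking the real lock. */
static void checked_lock(lock_owner *lo)
{
   int ok = same_thread(lo, pthread_self());

   assert(ok);          /* aborts here if called from a second thread */
   (void) ok;           /* avoid an unused-variable warning with NDEBUG */
   /* ... acquire the actual lock here ... */
}
```

Running the program under a debugger then stops at the failing assertion, and the backtrace shows which driver made the call.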

There is also a stack history system implemented with new functions 
ss_stack_xxxx. Using this system, one can check which functions called 
db_lock_database() *before* an error occurs. Using this system, I identified 
the malicious drivers. Maybe this system can also be used in other error 
debugging scenarios.

17 Sep 2008, Stefan Ritt, Info, New flag for auto restart

A new ODB flag has been introduced. When the logger is configured for automatic 
stop and restart (/Logger/Auto restart = y), the restart delay was hard-wired 
to 20 sec., which might be too long or short for some experiments. Therefore a 
new parameter "/Logger/Auto restart delay" has been introduced which can be 
used to accommodate different delays. A non-zero delay is necessary for 
experiments where some lengthy activities occur during the stop of a run, like 
an analyzer writing many histograms to disk.

29 Aug 2008, Konstantin Olchanski, Info, history_odbc: store MIDAS history in ODBC/MySQL database

The code for storing midas history in an odbc sql database has been committed.
Changes:
include/history_odbc.h, src/history_odbc.cxx --- implementation
src/mlogger.c --- call the history_odbc functions
utils/mh2sql.cxx --- import existing midas history files (*.hst) into an odbc
sql database.

This new code is enabled by the HAVE_ODBC gunk in the Makefile. If compilation
bombs, please let me know and as a work around, comment out all instances of
HAVE_ODBC from your Makefile.

Limitations:
- mhttpd support for reading history data from odbc sql database is missing
- many sql functions are implemented in a very minimalistic form (i.e. when
defining a history event, we blindly ask sql to create the tables, even if they
already exist - this works, but spams the midas log with sql errors).
- error handling is incomplete: after any sql error, the odbc connection is closed.
- only MySQL (and ascii output) are supported: we use mysql-specific data types
as they match midas types exactly. Code to support PgSQL is present and it used
to work, but is commented out. (At TRIUMF/T2K, we intend to use MySQL exclusively).
- ODBC ascii interface is used, instead of the potentially more efficient binary
interface.

To enable:
- create a MySQL database,
- create $HOME/.odbc.ini (see attached example)
- set ODB "/History/PerVariableHistory" to "1" - the new code is intended to be
used with per-variable history. Per-equipment (traditional) history would work,
but will result in suboptimal layout of SQL tables.
- set ODB "/Logger/ODBC_DSN" to the DSN defined in .odbc.ini.
- set ODB "/Logger/ODBC_Debug" to non-zero to enable debugging output from the
new code.

To use the "ascii output" mode:
Included is code to write "ascii" sql output into a text file, instead of using
an actual SQL database. To enable it, set "ODBC_DSN" to
"/path/to/some/text/file" and all SQL output will be written to this file. No
actual SQL database required. This mode exists mostly for debugging the SQL syntax.
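
A rough sketch of how such a switch between the two modes could look is given below. The entry does not say how the code tells a file path from a DSN name, so the leading '/' test is an assumption, and the names dsn_is_text_file() and write_sql_text() are hypothetical.

```c
#include <stdio.h>

/* Assumption: a value starting with '/' names a text file,
   anything else is treated as a real ODBC DSN. */
static int dsn_is_text_file(const char *dsn)
{
   return dsn != NULL && dsn[0] == '/';
}

/* Hypothetical sink for the text mode: append one SQL statement to an
   already opened file. Returns 0 on success, -1 on a write error. */
static int write_sql_text(FILE *fp, const char *statement)
{
   return fprintf(fp, "%s;\n", statement) < 0 ? -1 : 0;
}
```

In text mode every statement that would have gone to the database ends up as one line in the file, which makes it easy to inspect the generated SQL syntax.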

Despite limitations, the committed code is fully functional - we are presently
using it to record history data from slow controls of T2K detector tests
(voltages, currents, temperatures).

Comments and suggestions on naming and mapping from odb structures to SQL tables
are very much welcome.

K.O.
