Back Midas Rome Roody Rootana
  Midas DAQ System, Page 20 of 152  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
ID Date Authordown Topic Subject
  501   19 Sep 2008 Stefan RittInfoLazylogger logging changed
I modified the logging behavior of lazylogger. Originally, it was writing 
messages (run copied, removed, ...) both into midas.log and 
lazy_log_update.log. Since we have many files, it kind of clutters up the 
logging files. I think it is a good idea to have a separate file (which I 
changed not to "lazy.log" instead of "lazy_log_update.log" which I guess was a 
bug), so I put the logging into the main file under a conditional compile:

#ifdef WRITE_MIDAS_LOG
   cm_msg(MINFO, "lazy_log_update", str);
#endif

so it can be turned on again by adding -DWRITE_MIDAS_LOG to the compile line. 
If other experiments have different needs, one could make the logging behavior 
controllable through the ODB. In that case, I would suggest a single parameter 
"Logging file" which can be either "midas.log" for the normal logging or 
"lazy.log" for logging into the extra file. I guess having the messages twice 
on the system is not needed by any experiment.

- Stefan
  504   11 Oct 2008 Stefan RittBug Reportmhttpd "messages" broken
> mhttpd "messages" page stopped working after svn revision 4327 because of uninitialized variable 
> "filename2" in midas.c:cm_message_retrieve(). Attached patch fixes the problem for me.
> K.O.
> 
> 
> --- src/midas.c (revision 4342)
> +++ src/midas.c (working copy)
> @@ -978,6 +978,8 @@
>        size = sizeof(filename);
>        db_get_value(hDB, 0, "/Logger/Message file", filename, &size, TID_STRING, TRUE);
>  
> +      strlcpy(filename2, filename, sizeof(filename2));
> +
>        if (strchr(filename, '%')) {
>           /* replace strings such as midas_%y%m%d.mid with current date */
>           tzset();

Ups, was my fault, sorry. I committed your change.
  505   13 Oct 2008 Stefan RittInfomhttpd multi-experiment support removed
Previously, one mhttpd server could sever several experiments at the same time. 
This caused however sometimes problems and was hard to maintain. Starting from 
SVN revision 4348, I removed the multi-experiment support, which I believe is 
now a much cleaner implementation. So if several experiments are defined on a 
computer, each one need a separate mhttpd process listening on a different 
port. The experiment name can now be supplied on the command line to mhttpd 
like for any other midas program. I have tested this so far at two experiments 
at PSI, but this does not cover all possibilities. What I did not try was 
experiments with web passwords and odb passwords. If there is any problem after 
upgrading to 4348, please report.
  508   18 Oct 2008 Stefan RittInfomlogger async transitions, etc
> I suspect mlogger uses ASYNC transactions exactly to avoid
> this type of deadlock (mlogger used ASYNC transactions since svn revision 2, the
> beginning of time).

That's exactly the case. If you would have asked me, I would have told you 
immediately, but it is also good that you re-confirmed the deadlock behavior with 
the SYNC flag. I didn't check this for the last ten years or so.

Making the buffers bigger is only a partial solution. Assume that the disk gets 
slow for some reason, then any buffer will fill up and you get the dead lock.

The only real solution is to put the logic into a separate thread. So the thread 
does all the RPC communication with the clients, while the main logger thread logs 
data as usual in parallel. The problem is that the RPC layer is not yet completely 
tested to be thread safe. I put some mutex and you correctly realized that these 
are system wide, but you want a local mutex just for the logger process. You need 
also some basic communication between the "run stop thread" and the "logger main 
thread". Maybe Pierre remembers that once there was the problem that the logger did 
not know when all events "came down the pipe" and could close the file. He added 
some delay which helped most of the time. But if we would have some communication 
from the "run stop thread" telling the main thread that all programs except the 
logger have stopped the run, then the logger only has to empty the local system 
buffer and knows 100% that everything is done.

In the MEG experiment we have the same problem. We need a certain sequence 
(basically because we have 9 front-ends and one event builder, which has to be 
called after the front-ends). We realized quickly that the logger cannot stop the 
run, so we wrote a little tool "RunSubmit", which is a run sequence with scripting 
facility. So you write a XML file, telling RunSubmit to start 10 runs, each with 
5000 events. RunSubmit now watches the run statistics and stops the run. Since it's 
outside the logger process, there is no dead lock. Unfortunately RunSubmit was 
written by one of our students and contains some MEG specific code. Otherwise it 
could be committed to the distribution.

So I feel that a separate thread for run stop (and maybe even start) would be a 
good thing, but I'm not sure when I will have time to address this issue.

- Stefan
  512   22 Oct 2008 Stefan RittForumMixed CAMAC/VME frontend, SIS3100
> Dear MIDAS-addicts,
> 
> I would like to hear your opinion on this:
> We've until now used CAMAC with Hytec 1331 controllers. We're using Yale FADCs 
> whose readout takes ages in CAMAC (2048 samples take 2 milliseconds to be 
> read). We've got 20+ FADC channels (we usually read only 2-3)
> 
> Now we've had the brilliant idea to replace the Yale FADCs with some VME 
> digitizer and we now plan to buy a Struck SIS 1100/3100 PCI-VME controller,
> plus 4 pc. CAEN 1720 8ch 12bit, 250MHz WFD.
> 
> (1) Can anybody comment on this choice? Good experiences/problems?
> 
> We are still using the CAMAC stuff for all other modules (TDCs, ADCs, 
> scalers). So my plan is to have ONE frontend who reads both the CAMAC modules 
> and the VME modules.
> 
> (2) Is it possible to build and run a dual-controller frontend for both CAMAC 
> and VME? Does anybody have experience with that? Or is it a stupid idea?
> 
> I'd appreciate any hints.
> 
> [Edit: We're using Linux]
> 
> Thanks a lot,
> 
> Randolf

Dear Randolf,

I used some time ago several HYTEC 1331 controllers together with the Struck 
SIS3100. Since the HYTEC is IO-mapped and the SIS3100 is memory mapped, there was 
no problem in running them in parallel. Note however that there will soon be an 
improved version of the SIS3100 with improved speed, and also CAEN plans a WFD 
with 32 channels, 6 GSPS, 12 bit, using the DRS chip for the next year. I don't 
know if you need that, but just that you know.

Best regards, 
  Stefan
  517   28 Oct 2008 Stefan RittBug ReportInconsistent handling of odb and evet buffer timeouts
> In midas.c there are several places where client last activity time stamps are checked against the 
> watchdog timeout and the clients are declared dead if they fail to update their activity time stamps. 
> ODB time stamps and data buffer time stamps appear to be handled in a similar manner.
> 
> Most checks are done like this:
> 
> now = ss_millitime();
> if (client->watchdog > 0      <----- check that the watchdog is enabled
>     && now > client->last_activity    <---- check for crazy time stamps from the future
>     && now - client->last_activity > client->watchdog_timeout)   <--- normal timeout
>         remove_client(client);
> 
> But in a few places, the extra checks are missing:
> 
> now = ss_millitime();
> if (now - client->last_activity > client->watchdog_timeout)
>         remove_client(client);
> 
> Is this an oversight from when additional checks were added?
> Should I make all checks read like the first one?
> 
> K.O.

This is on purpose. Inside cm_watchdog(), the system check for client->watchdog > 0. If the watchdog 
timeout is zero, the client is not removed. This feature is used if you debug a program. If you come to a 
breakpoint and sit there for a while, you might be declared dead and the application is removed from the 
ODB, meaning that you cannot continue debugging (on the next ODB access the application asserts). This 
can be avoided by setting the watchdog to zero, which is implemented in most applications by supplying 
"-d" on the command line. Now assume you debug a program, so you set the watchdog timeout to zero, but in 
the debugging session you decide to quit. Since the watchdog timeout is zero, you will never be removed 
from the ODB. Therefore, the code inside cm_cleanup() doe NOT check client->watchdog > 0. Therefore, a 
"cleanup" inside odbedit will even remove clients having the timeout set to zero. 

Now there might be more clever ways to accomplish that, but that's how it is implemented right now.
  518   28 Oct 2008 Stefan RittBug Reportstrange output from "odbedit cleanup"
> When I run odbedit remotely (odbedit -h ladd09), the "cleanup" command unexpectedly produces the 
> output of the "sor" command (sure enough, there is a call to db_get_open_records() there), but when I run 
> it locally, I do not get this output (but db_get_open_records() is still called). Strange. K.O.

The db_get_open_records() call was by mistake there, I removed it. What remains is that the notification 
message if a client is removed from the ODB goes through the system messages. When running locally, odbedit 
echoes it's own messages, but when running remotely, this is not the case. So the messages can be seen by 
everybody else (plus it ends up in the message file), but not by the remote odbedit where the cleanup is 
started. The quick fix for that is to say "old" in odbedit which shows the last few lines of the message 
file, so one can see any successful cleanup. 
  519   28 Oct 2008 Stefan RittInfomscb timeouts and retries
> A new set of functions was added to mscb.h to adjust mscb timeouts and retries to better match specific 
> applications:
> 
> +   int EXPRT mscb_get_max_retry();
> +   int EXPRT mscb_set_max_retry(int max_retry);
> +   int EXPRT mscb_get_usb_timeout();
> +   int EXPRT mscb_set_usb_timeout(int timeout);
> +   int EXPRT mscb_get_eth_max_retry();
> +   int EXPRT mscb_set_eth_max_retry(int eth_max_retry);

In the spirit of this, a variable retry scheme has been implemented in the mscbdev.c device driver. At the 
MEG experiment, we have one mscb device which is pretty slow, while the others are fast. Therefore it is 
necessary to have a per-device max retry count which can be different for different submasters. I moved 
therefore the max_eth_retry variable into the mscb_fd structure and adjusted a few functions accordingly. I 
did not bother with the other timeouts and retries, since I don't need this for the moment, but it would be 
nice if they would be handled in the same way. Then I added code into mscbdev.c to read the retry variable 
form the ODB under /Equipment/<name>/Settings/Device/<Name>/Retries. The default is 10, but it can be 
changed and becomes valid after the program has been restarted. 
  520   29 Oct 2008 Stefan RittBug Reportcustom web pages: customscript buttons and start/stop buttons generate errors
To fix this problem, do the following:

- Update to the current SVN revision 4368 of mhttpd.c
- Add following tag into your custom page:

  <input type=hidden name="redir" value="name">

  where "name" is the name of your custom page which follows the CS/ in the URL. Like 
if you have a custom page which you access through httpd://localhost/CS/junk then the 
tag would be 

  <input type=hidden name="redir" value="junk">

The "redir" parameter is now evaluated inside mhttpd and brings you back to the proper 
custom page. You can also define another custom page as the target, if that makes 
sense in your application.

Pierre: Would be nice to document this somewhere more officially.
  525   09 Nov 2008 Stefan RittBug Reportbool values in "/custom/images/my_image.gif/labels/src" seem to lose their format string
> Not sure if this is a bug or a feature:
> 
> Writing a boolean label on an image seems to produce rather strange behaviour.
> 
> For example,
> odb>ls /Equipment/gas/settings/my_bool -lt
> Key name                        Type    #Val  Size  Last Opn Mode Value
> ---------------------------------------------------------------------------
> my_bool                         BOOL    1     4     14m  0   RWD  y
> 
> 
> odb>cd /custom/images/my_image.gif/labels
> odb>ls
> Src                             /Equipment/gas/settings/my_bool
> Format                          val: %d (bool)
> Font                            Medium
> X                               10
> Y                               10
> Align                           0
> FGColor                         FFFFFF
> BGColor                         FF8800
> 
> Instead of the expected string "val: y (bool)", only the value of the key
> appears, i.e. "y". 
> The behaviour is the same whether I use %d, %u, %s, %c etc as the format character. 

That has been fixed in rev. 4379
  526   09 Nov 2008 Stefan RittSuggestion<odb ... edit=1> buttons and javascript
> When writing custom webpages, it would be nice to be able to write code such as
> 
> <odb src="/Equipment/TITAN_ACQ/ppg cycle/trans3/time offset (ms)" edit=1>
> 
> from Javascript, e.g.
> <script  type="text/javascript">
> if ( flag != 3)
>    document.write('<odb src="/Equipment/TITAN_ACQ/ppg cycle/trans3/time offset
> (ms)" edit=1>ms');
> else
>    document.write('<odb src="/Equipment/TITAN_ACQ/ppg cycle/trans4/time offset
> (ms)" edit=1>ms');
> </script>
> 
> This is not translated correctly by mhttpd; the final quote and bracket get
> stripped off, and it gives Javascript error
> 
>  Error: unterminated string literal
> Source File: http://titan04:8089/CS/ppg_cycle?cmd=Edit&index=11
> Line: 477, Column: 18
> Source Code:
>    document.write('<input type=text size=10 maxlength=80 name=value value="1">
> 
> I can get round this by using an input box and a combination of ODBGet and
> ODBSet, but it would be easier if the edit=1 form above worked correctly, or
> there was a command like ODBSet that would accept input from the user.
> 
> Thanks.
> 
>  would be nice is there was a command such as ODBGet or ODBSet that would work
> with javascript to 

Actually that won't work, even if I would fix it. The <odb> tag is evaluated on the
server side (mhttpd), where is gets replaced by the actual ODB value. But if you
use JavaScript to generate the <odb> tag dynamically, this only happens on the
client side, so the server has no chance to substitute them. So you have to go with
ODBGet's I'm afraid. Nevertheless, I changed the code such that any ODB tags inside
a JavaScript is not interpreted by mhttpd.
  527   09 Nov 2008 Stefan RittBug Reportcustom web pages: customscript buttons and start/stop buttons generate errors
> Thanks Stefan. 
> Your fix works nicely with the start/stop buttons not returning to the same or to a
> different web page.
> 
> However, it does not seem to have fixed the problem with the Customscript button. It does
> not seem to pick up the redirect, nor do the Pause/Resume buttons (which are programmed to
> appear when the run starts).

That has been fixed in rev. 4377
  529   20 Nov 2008 Stefan RittInfoRecommended platform for running MIDAS
> Dear All,
> 
> Is there any recommended platforms for running MIDAS? Have anyone encountered 
> problems when running MIDAS on Scientific Linux?
> 
> Thanks.
> 
> Jimmy

I run MIDAS on scientific Linux 5.1 without any problem.
  531   26 Nov 2008 Stefan RittInfoSend email alert in alarm system
> We have a temperature/humidity sensor in MIDAS now and will add a liquid level 
> sensor to MIDAS soon. We want the operators to get alerted ASAP when the 
> laboratory environment or the liquid level reached some critical levels. Can 
> MIDAS send email alerts or SMS alerts to cell phones when the alarms are 
> triggered? If yes, how can I config it?

Sure that's possible, that's why MIDAS contains an alarm system. To use it, define 
an ODB alarm on your liquid level, like

/Alarms/Alarms/Liquid Level
Active	                 y
Triggered	         0 (0x0)
Type	                 3 (0x3)
Check interval	        60 (0x3C)
Checked last	1227690148 (0x492D10A4)
Time triggered first	(empty)
Time triggered last	(empty)
Condition	        /Equipment/Environment/Variables/Input[0] < 10
Alarm Class	        Level Alarm
Alarm Message	        Liquid Level is only %s

The Condition if course might be different in your case, just select the correct 
variable from your equipment. In this case, the alarm triggers an alarm of class 
"Level Alarm". Now you define this alarm class:

/Alarms/Classes/Level Alarm
Write system message	y
Write Elog message	n
System message interval	600 (0x258)
System message last	0 (0x0)
Execute command	        /home/midas/level_alarm '%s'
Execute interval	1800 (0x708)
Execute last	        0 (0x0)
Stop run	        n
Display BGColor	        red
Display FGColor	        black

The key here is to call a script "level_alarm", which can send emails. Use 
something like:

#/bin/csh
echo $1 | mail -s \"Level Alarm\" your.name@domain.edu
odbedit -c 'msg 2 level_alarm \"Alarm was sent to your.name@domain.edu\"'

The second command just generates a midas system message for confirmation. Most 
cell phones (depends on the provider) have an email address. If you send an email 
there, it gets translated into a SMS message.

The script file above can of course be more complicated. We use a perl script 
which parses an address list, so everyone can register by adding his/her email 
address to that list. The script collects also some other slow control variables 
(like pressure, temperature) and combines this into the SMS message.

For very sensitive systems, having an alarm via SMS is not everything, since the 
alarm system could be down (computer crash or whatever). In this case we use 
'negative alarms' or however you might call it. The system sends every 30 minutes 
an SMS with the current levels etc. If the SMS is missing for some time, it might 
be an indication that something in the midas system is wrong and one can go there 
and investigate.
  536   01 Dec 2008 Stefan RittBug FixFix ss_file_size() on 32-bit Linux
> I also changed ss_file_size(), ss_disk_size() and ss_disk_free() to return -1 if
> the system call returns an error. I also added a test program
> utils/test_ss_file_size.c.

The test program gave under 64-bit SL5:

For [(null)], file size: -1, disk size: -0.001, disk free -0.001
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `/bin/ls -ld (null)'
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `/bin/df -k (null)'

Anyhow I guess that this test program just accidentally slipped into the repository.
Test programs for the developers should not be in the repository since they are of
not much use for the average user. If I would have added every test I made as an
individual test program, we would by now have tons of test programs making the whole
distribution pretty bulky, which nobody would know how to use now. So I removed the
test program again. If people do not agree, I suggest to make a central "main" test
program which combines all tests. I know there are also some C structure alignment
tests etc., which then could all be combined into a single, well documented, test
program.
  538   02 Dec 2008 Stefan RittBug FixFix ss_file_size() on 32-bit Linux
> I now fixed this problem by using the stat64() system call for "#ifdef OS_LINUX".

That does not work if _LARGEFILE64_SOURCE is not defined. In that case, the compiler 
complains that stat64 is undefined. Since many Makefiles for front-ends out there do 
not have _LARGEFILE64_SOURCE defined, I changed system.c so that stat64 is only used 
if that flag is defined:

#ifdef _LARGEFILE64_SOURE
   struct stat64 stat_buf;
   int status;

   /* allocate buffer with file size */
   status = stat64(path, &stat_buf);
   if (status != 0)
      return -1;
   return (double) stat_buf.st_size;
#else
   ...
  540   02 Dec 2008 Stefan RittBug FixFix ss_file_size() on 32-bit Linux

K.O. wrote:
This does not work (observe the typoe in the #ifdef).


Sorry for that, I fixed and committed it.


K.O. wrote:
But you cannot know this because you already deleted the test program I wrote and committed to svn exactly to detect and prevent this kind of breakage (+ plus to give the Solaris, BSD and other wierdo users some way to check that ss_file_size() works on their systems)..


Well, you figured it out even without the test program in the distribution! But I'm sure no other user would have known how to use your test program to diagnose this problem. So 99% of the users would scratch their head about this undocumented program and get confused. I believe we two are responsible that the midas kernel functions work correctly and the average user should not have to bother with it. I agree that it's handy for you to have this little test program in the distribution, so you can run it everywhere you install midas. But for me it would be handy to have files with, let's say, nature's constants, particle decay life times, list of ASCII codes, and so on. But it would clutter up the distribution and the disadvantage of annoying users would be bigger than my personal benefit, so I don't do it.

If you absolutely want to keep a certain test functionality, you can add it into a "central" test program, write some help and documentation for it, educate users how to use it and how to report any errors back to you. Maybe some printout like "all tests ok" and some specific comment if a test fails would be helpful for the normal user. This test program could then also contain other tests like C structure alignment (which sometimes is a problem), some mutex tests and whatever we collected along the road. An alternative would be to add this into a "test" command inside odbedit.
  542   14 Dec 2008 Stefan RittInfoCustom page which executes custom function
> How can I add a button at the top of the "Status" webpage which will show a 
> page similar to the "CNAF" one after I click on it? and how can I make a 
> custom page similar to "CNAF" which allow me to call some custom funtions? I 
> want to make a page which is particularly for doing calibration.

The CNAF page calls directly functions through the RPC layer of midas, which is 
not possible from custom pages. All you can do is to execute a scrip on the 
server side, which then causes some action. For details please consult the 
documentation.
  545   22 Dec 2008 Stefan RittBug ReportOverflow on "cm_msg" command generates segfault
> The following error has been reported to me by T2K colleagues:
> 
> When using  "odbedit -c "msg my_message", the following behavior 
> has been observed depending on the length "n" of the message. 
> 
> 1)  n < 100        All is well
> 2)  100 <= n < 245 Log not written but exit code = 0
> 3)  245 <= n < 280 Error: "Experiment not defined" and exit code = 1
> 4)  280 <= n       Error: "Cannot connect to remote host" and exit code = 1
> 
> Also, when logging from compiled C code - when messages reach some magic length
> the MIDAS client sending them segfaults.
> 
> Please fix

Uhhh, who wants this long messages? You should consider to split this into several 
smaller messages. Anyhow, having the above behavior is not good, so I fixed it in 
SVN revision 4422. I increased the maximum length to 1000 characters. Above that, 
the message gets truncated. If you need even more, we can make it a #define.

The second problem you describe (logging from compiled C code) I could not 
reproduce, so maybe it was related to the first one. Please try again and report 
if it persists.
  549   13 Jan 2009 Stefan RittForummlogger problem
> Hi,
> 
> I am running Scientific Linux with kernel 2.6.9-34.EL and  I have
> glibc-2.3.4-2.25. When I run mlogger, I receive the error:
> 
> *** glibc detected *** free(): invalid pointer: 0x0073e93e ***
> Aborted
> 
> Any ideas?

Not much. Try to clean up the ODB (delete the .ODB.SHM file, remove all shared 
memory via ipcrm) and run again. I run under kernel 2.6.18 and glibc 2.5 and this 
problem does not occur. If you cannot fix it, try to run mlogger inside gdb and 
make a stack trace to see who called the free().
ELOG V3.1.4-2e1708b5