  Midas DAQ System, Page 20 of 142
ID   Date   Author   Topic   Subject
  531   26 Nov 2008   Stefan Ritt   Info   Send email alert in alarm system
> We have a temperature/humidity sensor in MIDAS now and will add a liquid level 
> sensor to MIDAS soon. We want the operators to get alerted ASAP when the 
> laboratory environment or the liquid level reached some critical levels. Can 
> MIDAS send email alerts or SMS alerts to cell phones when the alarms are 
> triggered? If yes, how can I config it?

Sure that's possible, that's why MIDAS contains an alarm system. To use it, define 
an ODB alarm on your liquid level, like

/Alarms/Alarms/Liquid Level
Active	                 y
Triggered	         0 (0x0)
Type	                 3 (0x3)
Check interval	        60 (0x3C)
Checked last	1227690148 (0x492D10A4)
Time triggered first	(empty)
Time triggered last	(empty)
Condition	        /Equipment/Environment/Variables/Input[0] < 10
Alarm Class	        Level Alarm
Alarm Message	        Liquid Level is only %s

The Condition of course might be different in your case, just select the correct 
variable from your equipment. In this case, the alarm triggers an alarm of class 
"Level Alarm". Now you define this alarm class:

/Alarms/Classes/Level Alarm
Write system message	y
Write Elog message	n
System message interval	600 (0x258)
System message last	0 (0x0)
Execute command	        /home/midas/level_alarm '%s'
Execute interval	1800 (0x708)
Execute last	        0 (0x0)
Stop run	        n
Display BGColor	        red
Display FGColor	        black

The key here is to call a script "level_alarm", which can send emails. Use 
something like:

#!/bin/csh
echo "$1" | mail -s "Level Alarm" your.name@domain.edu
odbedit -c 'msg 2 level_alarm \"Alarm was sent to your.name@domain.edu\"'

The second command just generates a midas system message for confirmation. Most 
cell phones (depending on the provider) have an email address. If you send an email 
there, it gets translated into an SMS message.

The script file above can of course be more complicated. We use a perl script 
which parses an address list, so everyone can register by adding his/her email 
address to that list. The script also collects some other slow control variables 
(like pressure, temperature) and combines them into the SMS message.

For very sensitive systems, an alarm via SMS alone is not enough, since the 
alarm system itself could be down (computer crash or whatever). For this case we use 
'negative alarms', or whatever you want to call them: the system sends an SMS with 
the current levels etc. every 30 minutes. If the SMS fails to arrive for some time, 
that is an indication that something in the midas system is wrong, and one can go 
there and investigate.
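
The receiving side of such a 'negative alarm' only has to notice that the periodic message is overdue. A minimal sketch of that check, assuming a 30-minute period as described above; the function name and the tolerance are hypothetical and not part of MIDAS:

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper, not a MIDAS function: returns true if the last
   periodic status message is older than max_gap seconds. */
bool heartbeat_overdue(time_t last_seen, time_t now, double max_gap)
{
   return difftime(now, last_seen) > max_gap;
}
```

With a 30-minute period, a max_gap of 2700 seconds (one and a half periods) tolerates one delayed message before a person is asked to go and look.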
  536   01 Dec 2008   Stefan Ritt   Bug Fix   Fix ss_file_size() on 32-bit Linux
> I also changed ss_file_size(), ss_disk_size() and ss_disk_free() to return -1 if
> the system call returns an error. I also added a test program
> utils/test_ss_file_size.c.

The test program gave under 64-bit SL5:

For [(null)], file size: -1, disk size: -0.001, disk free -0.001
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `/bin/ls -ld (null)'
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `/bin/df -k (null)'

Anyhow, I guess that this test program just accidentally slipped into the repository.
Test programs for the developers should not be in the repository since they are not
of much use to the average user. If I had added every test I made as an individual
test program, we would by now have tons of test programs which nobody would know how
to use, making the whole distribution pretty bulky. So I removed the test program
again. If people do not agree, I suggest making a central "main" test program which
combines all tests. I know there are also some C structure alignment tests etc.,
which could then all be combined into a single, well documented test program.
  538   02 Dec 2008   Stefan Ritt   Bug Fix   Fix ss_file_size() on 32-bit Linux
> I now fixed this problem by using the stat64() system call for "#ifdef OS_LINUX".

That does not work if _LARGEFILE64_SOURCE is not defined. In that case, the compiler 
complains that stat64 is undefined. Since many Makefiles for front-ends out there do 
not have _LARGEFILE64_SOURCE defined, I changed system.c so that stat64 is only used 
if that flag is defined:

#ifdef _LARGEFILE64_SOURE
   struct stat64 stat_buf;
   int status;

   /* allocate buffer with file size */
   status = stat64(path, &stat_buf);
   if (status != 0)
      return -1;
   return (double) stat_buf.st_size;
#else
   ...
  540   02 Dec 2008   Stefan Ritt   Bug Fix   Fix ss_file_size() on 32-bit Linux

K.O. wrote:
This does not work (observe the typo in the #ifdef).


Sorry for that, I fixed and committed it.


K.O. wrote:
But you cannot know this because you already deleted the test program I wrote and committed to svn exactly to detect and prevent this kind of breakage (plus to give the Solaris, BSD and other weirdo users some way to check that ss_file_size() works on their systems).


Well, you figured it out even without the test program in the distribution! But I'm sure no other user would have known how to use your test program to diagnose this problem. So 99% of the users would scratch their heads about this undocumented program and get confused. I believe the two of us are responsible for making sure that the midas kernel functions work correctly, and the average user should not have to bother with it. I agree that it's handy for you to have this little test program in the distribution, so you can run it everywhere you install midas. But for me it would be handy to have files with, let's say, nature's constants, particle decay lifetimes, a list of ASCII codes, and so on. That would clutter up the distribution, and the disadvantage of annoying users would be bigger than my personal benefit, so I don't do it.

If you absolutely want to keep a certain test functionality, you can add it into a "central" test program, write some help and documentation for it, educate users how to use it and how to report any errors back to you. Maybe some printout like "all tests ok" and some specific comment if a test fails would be helpful for the normal user. This test program could then also contain other tests like C structure alignment (which sometimes is a problem), some mutex tests and whatever we collected along the road. An alternative would be to add this into a "test" command inside odbedit.
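
A minimal sketch of what such a central test program could look like, assuming each test is a function that returns 0 on success and prints a specific comment on failure. The header structure and all names below are hypothetical and only illustrate a C structure alignment check; they are not taken from the midas sources.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 16-byte header, used only to illustrate an alignment check. */
typedef struct {
   uint16_t id;
   uint16_t mask;
   uint32_t serial;
   uint32_t time_stamp;
   uint32_t size;
} SAMPLE_HEADER;

/* Returns 0 if the compiler lays out SAMPLE_HEADER without padding. */
static int test_struct_alignment(void)
{
   if (sizeof(SAMPLE_HEADER) != 16 || offsetof(SAMPLE_HEADER, serial) != 4) {
      printf("FAILED: SAMPLE_HEADER layout differs, check compiler packing options\n");
      return 1;
   }
   return 0;
}

/* Runs every registered test and returns the number of failures. */
int run_all_tests(void)
{
   int (*tests[])(void) = { test_struct_alignment };
   int failed = 0;
   size_t i;

   for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++)
      failed += tests[i]();
   if (failed == 0)
      printf("all tests ok\n");
   return failed;
}
```

Further checks (file size functions, mutex tests and so on) would be added to the tests[] array, so that a user only ever runs one program and reports its output.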
  542   14 Dec 2008   Stefan Ritt   Info   Custom page which executes custom function
> How can I add a button at the top of the "Status" webpage which will show a 
> page similar to the "CNAF" one after I click on it? and how can I make a 
> custom page similar to "CNAF" which allow me to call some custom functions? I 
> want to make a page which is particularly for doing calibration.

The CNAF page calls functions directly through the RPC layer of midas, which is 
not possible from custom pages. All you can do is to execute a script on the 
server side, which then causes some action. For details please consult the 
documentation.
  545   22 Dec 2008   Stefan Ritt   Bug Report   Overflow on "cm_msg" command generates segfault
> The following error has been reported to me by T2K colleagues:
> 
> When using  "odbedit -c "msg my_message", the following behavior 
> has been observed depending on the length "n" of the message. 
> 
> 1)  n < 100        All is well
> 2)  100 <= n < 245 Log not written but exit code = 0
> 3)  245 <= n < 280 Error: "Experiment not defined" and exit code = 1
> 4)  280 <= n       Error: "Cannot connect to remote host" and exit code = 1
> 
> Also, when logging from compiled C code - when messages reach some magic length
> the MIDAS client sending them segfaults.
> 
> Please fix

Uhhh, who wants such long messages? You should consider splitting them into several 
smaller messages. Anyhow, the above behavior is not good, so I fixed it in 
SVN revision 4422. I increased the maximum length to 1000 characters. Above that, 
the message gets truncated. If you need even more, we can make it a #define.
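
For illustration, truncating instead of overflowing can be sketched as below. This is not the code from revision 4422; the constant name MAX_MSG_LEN and the function are assumptions, only the value 1000 comes from the text above.

```c
#include <stddef.h>
#include <string.h>

#define MAX_MSG_LEN 1000   /* assumed name; only the value is from the post */

/* Copies msg into out (capacity out_size), cutting it off after MAX_MSG_LEN
   characters or when out is full, whichever comes first.
   Returns 1 if the message was truncated, 0 otherwise. */
int copy_message(char *out, size_t out_size, const char *msg)
{
   size_t limit = MAX_MSG_LEN;
   size_t n;
   int truncated;

   if (out_size == 0)
      return 0;
   if (limit > out_size - 1)
      limit = out_size - 1;

   n = strlen(msg);
   truncated = n > limit;
   if (truncated)
      n = limit;

   memcpy(out, msg, n);
   out[n] = '\0';          /* result is always null-terminated */
   return truncated;
}
```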

The second problem you describe (logging from compiled C code) I could not 
reproduce, so maybe it was related to the first one. Please try again and report 
if it persists.
  549   13 Jan 2009   Stefan Ritt   Forum   mlogger problem
> Hi,
> 
> I am running Scientific Linux with kernel 2.6.9-34.EL and  I have
> glibc-2.3.4-2.25. When I run mlogger, I receive the error:
> 
> *** glibc detected *** free(): invalid pointer: 0x0073e93e ***
> Aborted
> 
> Any ideas?

Not much. Try to clean up the ODB (delete the .ODB.SHM file, remove all shared 
memory via ipcrm) and run again. I run under kernel 2.6.18 and glibc 2.5 and this 
problem does not occur. If you cannot fix it, try to run mlogger inside gdb and 
make a stack trace to see who called the free().
  551   13 Jan 2009   Stefan Ritt   Forum   mlogger problem
> Sorry for being vague. I cleaned up the ODB, but it doesn't seem to be the
> problem. Here is a sample run of mlogger and gdb:

Thanks for the info, that explains the problem. It is related to the lines

rargv[rargc] = (char *)malloc(3);
rargv[rargc++] = "-b";

where one first allocates some memory (3 bytes), but then overwrites the pointer with 
another pointer to some static memory ("-b"). The following

free(rargv[1]);

then tries to free the static memory which fails.
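
One way to avoid this class of bug is to copy the string into the allocated memory instead of overwriting the pointer. The sketch below is only an illustration; the helper name dup_arg is hypothetical, and the fix actually committed to SVN may look different.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of arg that may later be passed to free().
   Returns NULL if the allocation fails. */
char *dup_arg(const char *arg)
{
   char *copy = (char *) malloc(strlen(arg) + 1);   /* +1 for the '\0' */

   if (copy != NULL)
      strcpy(copy, arg);
   return copy;
}
```

With this helper the assignment becomes rargv[rargc++] = dup_arg("-b"); and a later free() on that element releases heap memory rather than a string literal.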

The problem was already fixed some time ago, so please update to the current SVN 
revision (see https://midas.psi.ch/download.html for details).
  552   13 Jan 2009   Stefan Ritt   Info   Custom page which executes custom function
The UDP connection you mention is only used locally for inter-process communication. When I implemented that, I 
made extensive tests and found that there is never a packet being dropped. This happens for UDP only if the packet 
goes over a physical network. Maybe this is different in modern Linux versions, so one should double check this 
again.

For remote hot-link notification, the notification is sent over the TCP link, so it should not be lost either. But 
your second point is correct. The hot-link mechanism was developed to change parameters in front-end programs for 
example. So by design it is guaranteed that if you change a value in the ODB, any client hot-linked to that will 
see the change (sooner or later). If there are many changes in short intervals (or the callback function on the 
remote client takes a long time), only the last change is guaranteed to arrive. Therefore, as you correctly state, 
the hot-link mechanism is not a safe replacement for the RPC layer (that's why the RPC layer is there after all).
  553   14 Jan 2009   Stefan Ritt   Info   odb "hot link" magic explored

KO wrote:
note 1: I do not completely understand the ss_suspend_xxx() stuff. The best I can tell is it creates a number of udp sockets bound to the local host and at least one udp rpc receive socket ultimately connected to the cm_dispatch_rpc() function.


The ss_suspend_xxx() stuff is indeed the most complicated thing in midas and I always have to remind myself
how it works. So let me try again:

The basic idea is that for a high performance system, you cannot do the inter-process communication via
polling. That would waste CPU time. Inter-process communication is necessary for the buffer manager
(producer notifies consumer when new events are there), for the RPC mechanism (odbedit tells mlogger to
start a run) and for ODB hot-links. To avoid polling, the inter-process communication works with sockets (UDP
and TCP). This allows the use of the select() call, which suspends the calling process until some socket
receives data or a pre-defined time-out expires. This is the only portable method I found which works under
Unix and Windows (signals are only poorly supported under Windows).

So after creating all sockets, ss_suspend() does a select() on these sockets:

_suspend_struct[idx].listen_socket
    Server side for any new RPC connection (each client is also an RPC server, which gets contacted directly during run transitions, for example)
_suspend_struct[idx].server_acception.recv_sock
    Receive socket (TCP) for any active RPC connection
_suspend_struct[idx].server_acception.event_sock
    Receive socket (TCP) for bare events (bypassing the RPC layer for performance reasons)
_suspend_struct[idx].server_connection->recv_sock
    Outgoing TCP connection to mserver. Used for example for hot-link notifications from mserver
_suspend_struct[idx].ipc_recv_socket
    UDP socket for inter-process notification


For each socket there is a dispatch function, which gets called if that socket receives some data. Hope this sheds some light on the guts of that.
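
The core of this scheme, waiting on a socket with select() instead of polling, can be sketched as follows. This is a simplified POSIX-only illustration with a single socket and a hypothetical function name; it is not the actual ss_suspend() code, which waits on all the sockets listed above and then calls the matching dispatch function.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Suspends the caller until sock becomes readable or timeout_ms expires.
   Returns 1 if data is ready, 0 on time-out, -1 on error. */
int wait_for_socket(int sock, int timeout_ms)
{
   fd_set readfds;
   struct timeval tv;
   int status;

   FD_ZERO(&readfds);
   FD_SET(sock, &readfds);
   tv.tv_sec = timeout_ms / 1000;
   tv.tv_usec = (timeout_ms % 1000) * 1000;

   /* the process sleeps inside select() and uses no CPU time meanwhile */
   status = select(sock + 1, &readfds, NULL, NULL, &tv);
   if (status < 0)
      return -1;
   return (status > 0 && FD_ISSET(sock, &readfds)) ? 1 : 0;
}
```

A caller would typically loop on this function and, on a return value of 1, read from the socket and hand the data to the handler registered for it.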
  556   20 Jan 2009   Stefan Ritt   Info   Subrun scheme implemented
A new "subrun" scheme has been implemented in mlogger to split a big data file into several individual data files. This feature might be helpful if a data file from a single run gets too large (>4 GB for example) and if shorter runs are not wanted for efficiency reasons. The scheme works as follows:

  • Set /Channels/x/Settings/Subrun Byte limit to the number of bytes for a subrun
  • Set /Channels/x/Settings/Filename to something like run%05d_%02d.mid. The first %05d gets replaced by the run number, while the second one gets replaced by the subrun number. This will result in files such as
    run00001_00.mid    run #1
    run00001_01.mid      "
    run00001_02.mid      "
    run00001_03.mid      "
    run00002_00.mid    run #2
    run00002_01.mid      "
    run00002_02.mid      "
    run00002_03.mid      "
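
The file name expansion described in the list above can be reproduced with snprintf(). The sketch below only illustrates the pattern; the function names are hypothetical, and the roll-over rule (including treating a limit of 0 as "no limit") is an assumption rather than the logic in mlogger.c.

```c
#include <stdio.h>

/* Builds a data file name from run and subrun number, following the
   run%05d_%02d.mid pattern. Returns the snprintf() result. */
int make_subrun_name(char *out, size_t size, int run, int subrun)
{
   return snprintf(out, size, "run%05d_%02d.mid", run, subrun);
}

/* Assumed roll-over rule: the next subrun starts once the bytes written to
   the current file reach the byte limit (0 is treated as "no limit"). */
int subrun_full(double bytes_written, double byte_limit)
{
   return byte_limit > 0 && bytes_written >= byte_limit;
}
```

Calling make_subrun_name(name, sizeof(name), 2, 1) fills name with run00002_01.mid, matching the listing above.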

Each subrun will contain an ODB dump if this is turned on via /Channels/x/Settings/ODB dump. The stopping of the "main" run (after four subruns in the above example) can be done in the usual way (event limit in the front-end, manually through odbedit, etc.).

The code has been tested in two test environments, but not yet in a real experiment. So please test it before going into production. The modification in mlogger requires SVN revision 4440 of mlogger.c and 4441 of odb.c.

Please note that the lazylogger cannot be used with this scheme at the moment since it does not recognize the subruns. That will be fixed in a future version and announced in this forum.

- Stefan
  559   25 Jan 2009   Stefan Ritt   Info   Subrun scheme implemented

Renee Poutissou wrote:
I have tested the new subrun functionality a bit more and I have two observations. First, it seems to work on a basic level, i.e. subruns are created, which are equal in size. However, I can't relate their size to the byte limit set in the ODB.


What you describe is expected. The logger process maintains a write cache, which is 32 kB under Linux and 1 MB under Windows. The size is controlled through the constant TAPE_BUFFER_SIZE defined in midas.h. This buffer exists to optimize writes to disks and tapes, and its size has been carefully tuned to give maximum performance. It means, however, that data gets written to disk only in 32 kB chunks. That's why your run size is 32 kB plus a few bytes. You can change this by modifying TAPE_BUFFER_SIZE, but be aware that this will slow down your logging of data.
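
The effect of such a write cache on file sizes can be illustrated with a small model. This is a sketch under stated assumptions, not the mlogger implementation: CACHE_SIZE stands in for TAPE_BUFFER_SIZE with the 32 kB Linux value, the on_disk counter stands in for a real write() call, and all names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_SIZE (32 * 1024)   /* stands in for TAPE_BUFFER_SIZE on Linux */

typedef struct {
   char   buf[CACHE_SIZE];
   size_t used;      /* bytes waiting in the cache */
   size_t on_disk;   /* bytes already flushed (replaces a real write) */
} WRITE_CACHE;

/* Appends len bytes to the cache and flushes one full chunk each time the
   cache fills up, so on_disk only ever grows in steps of CACHE_SIZE. */
void cache_write(WRITE_CACHE *c, const char *data, size_t len)
{
   while (len > 0) {
      size_t room = CACHE_SIZE - c->used;
      size_t n = len < room ? len : room;

      memcpy(c->buf + c->used, data, n);
      c->used += n;
      data += n;
      len -= n;

      if (c->used == CACHE_SIZE) {   /* cache full: flush one chunk */
         c->on_disk += CACHE_SIZE;
         c->used = 0;
      }
   }
}
```

In this model, any size check based on the bytes already on disk only sees multiples of 32 kB, which is one way to picture why file sizes do not track a small byte limit exactly.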
  564   24 Mar 2009   Stefan Ritt   Forum   Analyzer gets killed cm_watchdog
Hi,

your log sounds to me like the analyzer either got into an infinite loop or 
hit a segment violation and just died. I would recommend running the analyzer 
inside the debugger. When you then get the segment violation, you can inspect the 
stack trace and see where the bad things happen. Since the analyzer works nicely in 
other experiments, I expect that your problem is related to the user code. Maybe it 
happens at the end of the run, but there is a timeout before the crashed process 
gets cleaned from the ODB; that's why you might think that it happens "between" 
runs.

Best regards,

  Stefan

> 
> Hello Midas experts:
> 
> We have setup a DAQ using MIDAS to readout two ADCs in the crate.
> We are running into problem of analyzer getting killed between 
> runs.  Sometimes it would crash after a few runs and sometimes it 
> would go on for many many runs before analyzer gets killed.  It always 
> occurred between runs not when we are taking data.  Any suggestions 
> on what we could try?  The error message from the midas.log file is 
> appended below.
> 
> Thanks,
> 
> Dawei
> 
> Wed Mar  4 11:53:11 2009 [Analyzer,ERROR] [midas.c:1739:,ERROR]
> cm_disconnect_experiment not called at end of program
> Wed Mar  4 11:53:22 2009 [mhttpd,INFO] Client 'Analyzer' on buffer 'SYSMSG'
> removed by cm_watchdog (idle 10.7s,TO 10s)
> Wed Mar  4 11:53:22 2009 [mhttpd,INFO] Client 'Analyzer' (PID 1) on buffer 'ODB'
> removed by cm_watchdog (idle 10.7s,TO 10s)
> Wed Mar  4 11:53:22 2009 [AL Experiment Frontend,INFO] Client 'Analyzer' on
> buffer 'SYSTEM' removed by cm_watchdog (idle 10.9s,TO 10s)
> Wed Mar  4 11:53:29 2009 [AL Experiment Frontend,TALK] starting new run
> Wed Mar  4 11:53:29 2009 [AL Experiment Frontend,ERROR]
> [midas.c:8264:rpc_client_check,ERROR] Connection broken to "Analyzer" on host
> tsunami
  567   06 May 2009   Stefan Ritt   Forum   MIDAS mhttpd custom page questions
> 1) I display the status of the run with <odb src="/Runinfo/State">, but it 
> returns numbers which is not user friendly. How can I make something 
> like "Running" with green background and "Stopped" with red background in the 
> default status page?

Sorry for my late reply, I was really busy. You need JavaScript to perform such a 
task. See the attached example.

> 2) When I click either Start/Stop/Pause/Resume, it can performs the right 
> things, but afterward it jumps to the page "http://domain.name:8081/CS/" 
> which shows "Invalid custom page: NULL path". How can I make it returns 
> to the correct page "http://domain.name:8081/CS/Control%20panel"?

You add a hidden redirect statement:

  <input type=hidden name=redir value="CS/Control panel">

Best regards,

  Stefan
Attachment 1: control.html
  568   06 May 2009   Stefan Ritt   Forum   MIDAS mhttpd custom page questions
> I have one more question. I use <odb src="odb field" edit=1> to display an 
> editable ODB value, but how can I show this value in hexadecimal?

Again with JavaScript:

  var v = ODBGet('/some/path&format=%X');

This will retrieve /some/path and format it in hexadecimal. Then you can set a table 
cell with "v" as I wrote in the last reply. If you want to change this value, 
however, you need to encode it yourself in JavaScript.

- Stefan
  580   18 May 2009   Stefan Ritt   Suggestion   Question about using mvmestd.h

Exaos Lee wrote:
The "mvmestd.h" uses the following function to open a VME device:
int mvme_open(MVME_INTERFACE **vme, int idx)
I found that the "driver/vme/sis3100/sis3100.c" uses the implementation as:
   /* open VME */
   sprintf(str, "/dev/sis1100_%02dremote", idx);
   (*vme)->handle = open(str, O_RDWR, 0);
   if ((*vme)->handle < 0)
      return MVME_NO_INTERFACE;
   }

The problem is: I renamed my SIS1100 devices as /dev/sis1100/xxxxx. So I have to hack the "sis3100.c".
Shall we have some smart way? :-)


In principle one could pass the device name to the user level. But I would like to keep the same code for Windows and Linux, and Windows does not need a device name. So you can either hack the file (I'm pretty sure it won't change in the next few years) or what I do is to make a symbolic link

/dev/sis1100_00remote -> /dev/sis1100/xxxx

Best regards,

Stefan
  588   04 Jun 2009   Stefan Ritt   Bug Report   odbedit bad ctrl-C
> When using "/bin/bash" shell, if I exit odbedit (and other midas programs) using ctrl-C, the terminal 
> enters a funny state, "echo" is turned off (I cannot see what I type), "delete" key does not work (echoes 
> ^H instead).
> 
> This problem does not happen if I exit using the "exit" command or if I use the "/bin/tcsh" shell.
> 
> When this happens, the terminal can be restored to close to normal state using "stty sane", and "stty 
> erase ^H".
> 
> The terminal is set into this funny state by system.c::getchar() and normal settings are never restored 
> unless the midas program calls getchar(1) at the end. If the program does not finish normally, original 
> terminal settings are never restored and the terminal is left in a funny state.
> 
> It is not clear why the problem does not happen with /bin/tcsh - perhaps they restore sane terminal 
> settings automatically for us.
> K.O.

Who uses bash ??? And who keeps banging on Ctrl-C, when there is a nice "exit" command ;-)

Well, I implemented a simple CTRL-C handler in odbedit (Rev. 4503) which resets the terminal before exiting. 
Give it a try. Of course this cannot catch a hard kill (-9), but CTRL-C works now correctly under bash at 
least.
  589   04 Jun 2009   Stefan Ritt   Info   RPC.SHM gyration
> Right now, MIDAS does not have an abstraction for "local multi-thread mutex" (i.e. pthread_mutex & co) and mostly uses global semaphores 
> for this task (with interesting coding results, i.e. for multithreaded locking of ODB). Perhaps such an abstraction should be introduced?

Yes. In the old days when I designed the inter-process communication (~1993), there was no such thing as pthread_mutex (only under Windows). 
Now it would be time to implement this, since it would then work under Posix and Windows (don't know about VxWorks). That will at least 
allow multi-threaded client applications, which can safely call midas functions through the RPC layer. For local thread safety, all midas 
functions have to be checked and modified if necessary, which is a major piece of work right now, but for remote clients it's rather simple.
  591   05 Jun 2009   Stefan Ritt   Bug Report   mhttpd command line experiment specifying
> Not sure how the rest of you specify mhttpd to work with multiple experiments on
> one machine, but it would seem not the same as me ;-)

Please note that there has been a change concerning multiple experiments inside 
mhttpd. From revision 4346 on, mhttpd can only connect to one single experiment, 
and the experiment name in the URL (aka ?exp=name) is not supported any more. So if 
you have several experiments, you start several instances of mhttpd now on 
different ports.

> that experiment name is not transfered to transitions as cm_transition never
> specifies the experiment in the call to "transition STOP" etc.
> the only flag it sends is a -d for debug if selected.

When connecting to an experiment, any midas client uses the ODB from that 
experiment, so it lives in that "namespace". One client can therefore never call a 
client from another experiment, so your problem must be something else. Of course 
there is no parameter "experiment" passed to cm_transition(), since the experiment 
is implicitly defined by the ODB mhttpd is attached to.

> The result is that the stop and start button of the webinterface does not work,
> and transitions sit endlessly doing nothing but consuming all the processor,
> odbedit works fine though.

I guess you have to do some debugging there. Note that "detached" transitions have 
been implemented recently by Konstantin, so maybe your problem is related to that. 
In this case Konstantin should check what's wrong.

> Does everyone else use an apache reverse proxy and or explicit experiment choice
> in the url ?

I use a

ProxyPass /megon/ http://megon.psi.ch/

on our public web server to make an online machine accessible from outside the 
firewall, but just with a single experiment.

> As an aside in mhttpd.c in the reply to -? it states 2 -h options the second
> should be a -e. line 13378.

Fixed in revision 4504.
  599   25 Jun 2009   Stefan Ritt   Bug Report   TR_STARTABORT transition, mlogger duplicate event problem
> Stefan suggested implementing a new transition, TR_STARTABORT, issued if TR_START fails. mlogger can use it to cleanup open files, etc, similar to TR_STOP.
> 
> This is now implemented. In mlogger, TR_STARTABORT is similar to TR_STOP, but deletes open output files and does not save end-of-run information into databases, etc. mfe.c does not handle this transition yet, but I 
> plan to add it - to fix the observed situations where the run failed to start, but some equipment does not know about it and continues to generate events and send data.
> 
> svn rev 4514
> K.O.

There is one problem with the TR_STARTABORT: If you combine old and new clients they will crash, since the old clients don't know anything about TR_STARTABORT. The way to prevent this is to increase the Midas version from 
2.0.0 to 2.1.0. Then you will get a warning if you mix clients. Please test this and commit the change if it works.