Midas DAQ System
ID   Date   Author   Topic   Subject
  949   16 Jan 2014   Konstantin Olchanski   Info   MIDAS and "international characters", UTF-8 and Unicode.
I made some tests of MIDAS support for "international characters" and we seem to be in reasonable 
shape.

The accepted standard is UTF-8 encoding of Unicode and the MIDAS core is believed to be UTF-8 clean - 
one can use "international characters" in ODB names, in ODB values, in filenames, etc.

The web interface had some problems with percent-encoding of ODB URLs, but as of current git version, 
everything seems to work okay, as long as the web browser is in the UTF-8 encoding mode. The default 
mode is "Western ISO-8859-1" and in that mode javascript encodeURIComponent() mangles some characters, 
which breaks the ODB editor. Switching to UTF-8 mode seems to fix that.
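
To illustrate what percent-encoding of a UTF-8 ODB path looks like, here is a minimal sketch. The function percent_encode() is hypothetical and is not the code used by mhttpd; it assumes the input is already UTF-8 and simply escapes every byte that is not an unreserved ASCII character.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of MIDAS: percent-encode a UTF-8 string
 * byte by byte. Unreserved ASCII characters (RFC 3986) and '/' are copied
 * as-is, every other byte becomes %XX.
 * Returns 0 on success, -1 if dst is too small. */
int percent_encode(const char *src, char *dst, size_t dst_size)
{
   static const char hex[] = "0123456789ABCDEF";
   size_t n = 0;

   assert(src != NULL && dst != NULL);
   if (dst_size == 0)
      return -1;

   for (; *src; src++) {
      unsigned char c = (unsigned char) *src;
      int keep = (c < 0x80 && isalnum(c)) ||
                 c == '-' || c == '_' || c == '.' || c == '~' || c == '/';
      if (keep) {
         if (n + 1 >= dst_size)   /* need room for this byte and the final 0 */
            return -1;
         dst[n++] = (char) c;
      } else {
         if (n + 3 >= dst_size)   /* need room for %XX and the final 0 */
            return -1;
         dst[n++] = '%';
         dst[n++] = hex[c >> 4];
         dst[n++] = hex[c & 0x0F];
      }
   }
   dst[n] = 0;
   return 0;
}
```

With this sketch, the two UTF-8 bytes of "ü" (0xC3 0xBC) come out as "%C3%BC". Note that javascript encodeURIComponent() also escapes "/"; the sketch keeps it only to make ODB paths easier to read.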

Perhaps we should make the UTF-8 encoding the default for mhttpd-generated web pages. This should be 
okay for TRIUMF - we use English language almost exclusively, but need to check with other labs before 
making such a change. I especially worry about PSI because I am not sure if and how they use any of the special 
German-language characters.

On the minus side, odbedit does not seem to accept non-English characters at all. Maybe it is easy to fix.

K.O.
  948   15 Jan 2014   Konstantin Olchanski   Bug Fix   Fixed spurious symlinks to midas.log
In some experiments (e.g. DEAP), we see spurious symlinks to midas.log scattered just about everywhere. I 
now traced this to an uninitialized variable in cm_msg_log() and it should be fixed now. K.O.
  947   15 Jan 2014   Konstantin Olchanski   Bug Report   MIDAS password protection is broken
> I thought to improve this by fixing a bug in cm_msg_log() (where the messages are coming from)

The periodic messages about broken semaphore actually come from al_check(). I put some whining there, too.

K.O.
  946   15 Jan 2014   Konstantin Olchanski   Bug Report   MIDAS Web password broken
The MIDAS Web password function is broken - with the web password enabled, I am not prompted for a 
password when editing ODB. The password still partially works - I am prompted for the web password 
when starting a run. K.O.

P.S. https://midas.triumf.ca/MidasWiki/index.php/Security says "web password" needed for "write access", 
but does not specify if this includes editing odb. (I would think so, and I think I remember that it used to).
  945   15 Jan 2014   Konstantin Olchanski   Bug Report   MIDAS password protection is broken
If you follow the MIDAS documentation for setting up password protection, you will get strange messages:

ladd00:midas$ ./linux/bin/odbedit
[local:testexpt:S]/>passwd                <---- setup a password
Password: 
Retype password: 
[local:testexpt:S]/> exit

ladd00:midas$ odbedit
Password:    <---- enter correct password here
ss_semaphore_wait_for: semop/semtimedop(21135376) returned -1, errno 22 (Invalid argument)
ss_semaphore_release: semop/semtimedop(21135376) returned -1, errno 22 (Invalid argument)
[local:testexpt:S]/>ss_semaphore_wait_for: semop/semtimedop(21037069) returned -1, errno 43 (Identifier removed)

The same messages will appear from all other programs - mhttpd, etc. They will be printed about every 1 second.

So what do they mean? They mean what they say - the semaphore is not there, it is easy to check using "ipcs" that semaphores with 
those ids do not exist. In fact all the semaphores are missing (the ODB semaphore is eventually recreated, so at least ODB works 
correctly).

In this situation, MIDAS will not work correctly.

What is happening?

- cm_connect_experiment1() creates all the semaphores and remembers them in cm_set_experiment_semaphore()
- calls cm_set_client_info()
- cm_set_client_info() finds ODB /expt/sec/password, and returns CM_WRONG_PASSWORD
- before returning, it calls db_close_all_databases() and bm_close_all_buffers(), which delete all semaphores (put a print statement in 
ss_semaphore_delete() to see this).
- (values saved by cm_set_experiment_semaphore() are stale now).
- (if by luck you have other midas programs still running, the semaphores will not be deleted)
- we are back to cm_connect_experiment1() which will ask for the password, call cm_set_client_info() again and continue as usual
- it will reopen ODB, recreating the ODB semaphore
- (but all the other semaphores are still deleted and values saved by cm_set_experiment_semaphore() are stale)

I thought to improve this by fixing a bug in cm_msg_log() (where the messages are coming from) - it tries to lock the "MSG" 
semaphore, but even if it could not lock it, it continues as usual and even calls an unlock at the end. (very bad). For catastrophic 
locking failures like this (semaphore is deleted), we usually abort. But if I abort here, I get completely locked out from odb - odbedit 
crashes right away and there is no way to do any corrective action other than delete odb and reload it from an xml file.
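
The locking pattern in question (take the lock, and if that fails, do not touch the protected resource and do not unlock) can be sketched as below. This is only an illustration under stated assumptions: fake_sem, fake_lock(), fake_unlock() and log_message() are hypothetical stand-ins and not the MIDAS ss_semaphore_xxx() or cm_msg_log() code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for a semaphore API, not the MIDAS calls. */
enum { LOCK_OK = 1, LOCK_FAILED = 2 };

typedef struct {
   int exists;       /* 0 simulates a semaphore that was deleted behind our back */
   int locked;       /* 1 while we hold the lock */
   int bad_unlocks;  /* counts unlock calls made without holding the lock */
} fake_sem;

static int fake_lock(fake_sem *s)
{
   if (!s->exists)
      return LOCK_FAILED;
   s->locked = 1;
   return LOCK_OK;
}

static void fake_unlock(fake_sem *s)
{
   if (!s->locked)
      s->bad_unlocks++;
   s->locked = 0;
}

/* Write one message under the lock. If the lock cannot be taken, skip
 * both the write and the unlock, and report the failure to the caller. */
int log_message(fake_sem *s, FILE *fp, const char *msg)
{
   assert(s != NULL && fp != NULL && msg != NULL);

   if (fake_lock(s) != LOCK_OK)
      return LOCK_FAILED;   /* never reach fake_unlock() without the lock */

   fputs(msg, fp);
   fake_unlock(s);
   return LOCK_OK;
}
```

Whether the caller should then abort, retry or just complain is a separate decision; the sketch only shows that the unlock is never called for a lock that was not taken.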

I know that some experiments use this password protection - why/how does it work there?

I think they are okay because they put critical programs like odbedit, mserver, mlogger and mhttpd into "/expt/sec/allowed 
programs". In this case they pass the password check in cm_set_client_info() and the semaphores are not deleted. If any subsequent 
program asks for the password, the semaphores survive because mlogger or mhttpd is already running and keeps semaphores from 
being deleted.

What a mess.

K.O.
  944   17 Dec 2013   Stefan Ritt   Info   IEEE Real Time 2014 Call for Abstracts
Hello,

I'm co-organizing the upcoming Real Time Conference, which also covers the field of data acquisition, so it might be interesting for people working 
with MIDAS. If you have something to report, you could consider sending an abstract to this conference. It will be held in Nara, Japan. The conference
site is now open at http://rt2014.rcnp.osaka-u.ac.jp/

Best regards,
Stefan Ritt
  943   16 Dec 2013   Konstantin Olchanski   Bug Fix   Abolished SYNC and ASYNC defines
A few months ago, definitions of SYNC and ASYNC in midas.h were changed away from "0" and "1", 
and this caused problems with some event buffer management functions bm_xxx().

For example, when event buffers are getting full, bm_send_event(SYNC) unexpectedly started returning 
BM_ASYNC_RETURN instead of waiting for free space, causing unexpected crashes of frontend programs.

Part of the problem was confusion between SYNC/ASYNC used by buffer management (bm_xxx) and by run 
transition (cm_transition()) functions. Adding to the confusion, documentation of bm_send_event() & co used 
FALSE/TRUE while most actual calls used SYNC/ASYNC.

To sort this out, an executive decision was made to abolish the SYNC/ASYNC defines:

For buffer management calls bm_send_event(), bm_receive_event(), etc, please use:
SYNC -> BM_WAIT
ASYNC -> BM_NO_WAIT

For run transitions, please use:
SYNC -> TR_SYNC
ASYNC -> TR_ASYNC
MTHREAD -> TR_MTHREAD
DETACH -> TR_DETACH

K.O.
  942   16 Dec 2013   Konstantin Olchanski   Info   MIDAS on ARM
I added MIDAS Makefile rules for building ARM binaries: "make linuxarm" and "make cleanarm" will create 
(and clean) object files, libraries and executables under "linux-arm" using the TI Sitara ARM SDK or the 
Yocto SDK ARM cross-compilers (GCC 4.7.x and 4.8.x respectively). (Makefile rules for building PPC 
binaries have existed for years).

The hardware we have at TRIUMF are "ARMv7" machines - TI Sitara 335x CPUs (google mityarm) and Altera 
Cyclone 5 FPGA ARM (google sockit). (as opposed to the ARMv6 CPU on the original RaspberryPi). The software 
binary API standard settled by Fedora Linux is "hard float" (as opposed to "soft float" used by older SDKs).

So "ARMv7 hard float" is what we intend to use at TRIUMF, but ARMv5 and soft-float should also work ok, 
so please report successes and/or problems to this forum.

K.O.
  941   28 Nov 2013   Konstantin Olchanski   Info   Audit of fixed size arrays
In one of the experiments, we hit a long-standing bug in mdump - there was an array of 32 equipment entries and if 
there were more than 32 entries under /equipment, it would overrun and corrupt memory. Somehow this 
only showed up after mdump was switched to c++. The solution was to use std::vector instead of a fixed 
size array.

Just in case, I checked other midas programs for fixed size arrays (other than fixed size strings) and found 
none. (in midas.c, there is a fixed size array of TR_FIFO[10], but code inspection shows that it cannot 
overrun).

I used this script. It can be modified to also identify any strange sized string arrays.

K.O.

#!/usr/bin/perl -w

while (1) {
  my $in = <STDIN>;
  last unless $in;
  #print $in;

  $in =~ s/^\s+//;

  next if $in =~ /^char/;
  next if $in =~ /^static char/;

  my $a = $in =~ /(.*)\[(\d+)\]/;

  next unless $a;

  my $a1 = $1;
  my $a2 = $2;

  next if $a2 == 0;
  next if $a2 == 1;
  next if $a2 == 2;
  next if $a2 == 3;

  #print "[$a] [$a1] [$a2]\n";
  print "-> $a1\[$a2]\n";
}

# end
  940   21 Nov 2013   Stefan Ritt   Bug Report   Too many bm_flush_cache() in mfe.c
> And I think that works just fine for frontends directly connected to the shared memory, one call to 
> bm_flush_cache() should be sufficient.

That's correct. What you want is once per second or so for polled events, and once per periodic event (which anyhow will typically come only every 10 seconds or so). If there are 3 calls 
per event, this is certainly too much.


> But for remote frontends connected through the mserver, it turns out there is a race condition between 
> sending the event data on one tcp connection and sending the bm_flush_cache() rpc request on another 
> tcp connection.
> 
> ...
> 
> One solution to this would be to implement periodic bm_flush_cache() in the mserver, making all calls to 
> bm_flush_cache() in mfe.c unnecessary (unless it's a direct connection to shared memory).
> 
> Another solution could be to send events with a special flag telling the mserver to "flush the buffer right 
> away".

That's a very good and useful observation. I never really thought about that. 

Looking at your proposed solutions, I prefer the second one. mserver is just an interface for RPC calls, it should not do anything "by itself". This was a strategic decision at the beginning. 
So sending a flag to punch through the cache on mserver seems to me to have fewer side effects. It will just break binary compatibility :-)

/Stefan
  939   20 Nov 2013   Konstantin Olchanski   Bug Report   Too many bm_flush_cache() in mfe.c
I was looking at something in the mserver and noticed that for remote frontends, for every periodic event, 
there are about 3 RPC calls to bm_flush_cache().

Sure enough, in mfe.c::send_event(), for every event sent, there are 2 calls to bm_flush_cache() (once for 
the buffer we used, second for all buffers). Then, for a good measure, the mfe idle loop calls 
bm_flush_cache() for all buffers about once per second (even if no events were generated).

So what is going on here? To allow good performance when processing many small events,
the MIDAS event buffer code (bm_send_event()) buffers small events internally, and only after this internal
buffer is full, the accumulated events are flushed into the shared memory event buffer,
where they become visible to the mlogger, mdump and other consumers.

Because of this internal buffering, infrequent small size periodic events can become
stuck for quite a long time, confusing the user: "my frontend is sending events, how come I do not
see them in mdump?"

To avoid this, mfe.c manually flushes these internal event buffers by calling bm_flush_cache().

And I think that works just fine for frontends directly connected to the shared memory, one call to 
bm_flush_cache() should be sufficient.

But for remote frontends connected through the mserver, it turns out there is a race condition between 
sending the event data on one tcp connection and sending the bm_flush_cache() rpc request on another 
tcp connection.

I see that the mserver always reads the rpc connection before the event connection, so bm_flush_cache() 
is done *before* the event is written into the buffer by bm_send_event(). So the newly
sent event is stuck in the buffer until bm_flush_cache() for the *next* event shows up:

mfe.c: send_event1 -> flush -> ... wait until next event ... -> send_event2 -> flush
mserver: flush -> receive_event1 -> ... wait ... -> flush -> receive_event2 -> ... wait ...
mdump -> ... nothing ... -> ... nothing ... -> event1 -> ... nothing ...

Enter the 2nd call to bm_flush_cache in mfe.c (flush all buffers) - now because mserver seems to be 
alternating between reading the rpc connection and the event connection, the race condition looks like 
this:

mfe.c: send_event -> flush -> flush
mserver: flush -> receive_event -> flush
mdump: ... -> event -> ...

So in this configuration, everything works correctly, the data is not stuck anywhere - but by accident, and 
at the price of an extra rpc call.
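
The two orderings above can be reproduced with a toy model. This is only a sketch of the behaviour as described in this message, not mserver code: mserver_model, frontend_send() and mserver_run() are hypothetical names, and the model assumes the mserver always serves one pending rpc request before one pending event.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the mserver side (hypothetical, not MIDAS code). */
typedef struct {
   int rpc_pending;  /* flush requests waiting on the rpc connection */
   int evt_pending;  /* events waiting on the event connection */
   int cached;       /* events sitting in the internal write cache */
   int visible;      /* events flushed to shared memory (seen by mdump) */
} mserver_model;

/* Frontend side: send one event, followed by n_flush flush requests. */
void frontend_send(mserver_model *m, int n_flush)
{
   assert(m != NULL && n_flush >= 0);
   m->evt_pending++;
   m->rpc_pending += n_flush;
}

/* mserver side: alternate between the two connections, rpc first. */
void mserver_run(mserver_model *m)
{
   assert(m != NULL);
   while (m->rpc_pending > 0 || m->evt_pending > 0) {
      if (m->rpc_pending > 0) {        /* flush: cache -> shared memory */
         m->rpc_pending--;
         m->visible += m->cached;
         m->cached = 0;
      }
      if (m->evt_pending > 0) {        /* receive event into the cache */
         m->evt_pending--;
         m->cached++;
      }
   }
}
```

In this model, one flush per event leaves the newest event in the cache until the next event arrives, while two flushes per event make it visible right away, which matches the two traces above.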

But what about the periodic 1/second bm_flush_cache() on all buffers? I think it does not quite work
either because the race condition is still there: we send an event, and the first flush may race it and only 
the 2nd flush gets the job done, so the delay between sending the event and seeing it in mdump would be 
around 1-2 seconds. (no more than 2 seconds, I think). Since users expect their events to show up "right
away", a 2 second delay is probably not very good.

Because periodic events are usually not high rate, the current situation (4 network transactions to send 1 
event - 1x send event, 3x flush buffer) is probably acceptable. But this definitely limits the maximum 
event rate: each event costs 3x (2x?) the mserver rpc latency. Without the rpc calls to bm_flush_cache() there 
would be no such limit - the events themselves are sent through a pipelined tcp connection without 
handshaking.

One solution to this would be to implement periodic bm_flush_cache() in the mserver, making all calls to 
bm_flush_cache() in mfe.c unnecessary (unless it's a direct connection to shared memory).

Another solution could be to send events with a special flag telling the mserver to "flush the buffer right 
away".

P.S. Look ma!!! A race condition with no threads!!!

K.O.
  938   15 Nov 2013   Konstantin Olchanski   Bug Report   stuck data buffers
We have seen several times a problem with stuck data buffers. The symptoms are very confusing - 
frontends cannot start, instead hang forever in a state very hard to kill. Also "mdump -s -d -z 
BUF03" for the affected data buffers is stuck.

We have identified the source of this problem - the semaphore for the buffer is locked and nobody 
will ever unlock it. MIDAS relies on a feature of SYSV semaphores where they are automatically 
unlocked by the OS when the locking process dies, so they should never remain stuck (see man semop, SEM_UNDO flag).

I think this SEM_UNDO feature is broken in recent Linux kernels and sometimes the semaphore 
remains locked after the process that locked it has died. MIDAS is not programmed to deal with this 
situation and the stuck semaphore has to be cleared manually.

Here, "BUF03" is used as example, but we have seen "SYSTEM" and ODB with stuck semaphores, too.

Steps:
a) confirm that we are using SYSV semaphores: "ipcs" should show many semaphores
b) identify the stuck semaphore: "strace mdump -s -d -z BUF03".
c) there will be a large printout, but ultimately you will see repeated entries of 
"semtimedop(9633800, {{0, -1, SEM_UNDO}}, 1, {1, 0}^C <unfinished ...>"
d) erase the stuck semaphore "ipcrm -s 9633800", where the number comes from semtimedop() in 
the strace output.
e) try again: "mdump -s -d -z BUF03" should work now.

Ultimately, I think we should switch to POSIX semaphores - they are easier to manage (the strace 
and ipcrm dance becomes "rm /dev/shm/deap_BUF03.sem") - but they do not have the SEM_UNDO 
feature, so detection of locked and stuck semaphores will have to be done by MIDAS. (Unless we 
can find some library of semaphore functions that already provides such advanced functionality).
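
As a rough idea of what "detection by MIDAS" could start from, here is a minimal sketch of locking a POSIX semaphore with a timeout using sem_timedwait(). The function lock_with_timeout() is hypothetical and not part of MIDAS. A timeout only tells us the lock has been held too long; deciding that the holder is really dead would need something more (for example recording the owner's pid), which is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper, not part of MIDAS: try to lock a POSIX semaphore,
 * giving up after timeout_sec seconds.
 * Returns 0 if locked, 1 on timeout (possibly stuck), -1 on other errors. */
int lock_with_timeout(sem_t *sem, int timeout_sec)
{
   struct timespec deadline;

   assert(sem != NULL && timeout_sec >= 0);

   /* sem_timedwait() takes an absolute CLOCK_REALTIME deadline */
   if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
      return -1;
   deadline.tv_sec += timeout_sec;

   for (;;) {
      if (sem_timedwait(sem, &deadline) == 0)
         return 0;
      if (errno == ETIMEDOUT)
         return 1;
      if (errno != EINTR)   /* retry only if interrupted by a signal */
         return -1;
   }
}
```

On older glibc versions this may need to be linked with -pthread.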

K.O.
  937   14 Nov 2013   Konstantin Olchanski   Bug Report   MacOS10.9 strlcpy() problem
On MacOS 10.9 MIDAS crashes in strlcpy() somewhere inside odb.c. We think this is because strlcpy() 
in MacOS 10.9 was changed to abort() if input and output strings overlap. For overlapping memory one is 
supposed to use memmove(). This is fixed in current midas; for older versions, you can try this patch:

konstantin-olchanskis-macbook:midas olchansk$ git diff
diff --git a/src/odb.c b/src/odb.c
index 1589dfa..762e2ed 100755
--- a/src/odb.c
+++ b/src/odb.c
@@ -6122,7 +6122,10 @@ INT db_paste(HNDLE hDB, HNDLE hKeyRoot, const char *buffer)
                   pc++;
                while ((*pc == ' ' || *pc == ':') && *pc)
                   pc++;
-               strlcpy(data_str, pc, sizeof(data_str));
+
+               //strlcpy(data_str, pc, sizeof(data_str)); // MacOS 10.9 does not permit strlcpy() of overlapping strings
+               assert(strlen(pc) < sizeof(data_str)); // "pc" points at a substring inside "data_str"
+               memmove(data_str, pc, strlen(pc)+1);
 
                if (n_data > 1) {
                   data_str[0] = 0;
konstantin-olchanskis-macbook:midas olchansk$ 
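
The same idea in isolation, as a minimal sketch: when the source pointer points inside the destination buffer, memmove() is the call that is defined for overlapping memory. The helper strip_leading() is hypothetical and only mimics the situation in db_paste(), it is not the MIDAS code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove leading blanks and colons from buf in place.
 * "pc" points at a substring inside "buf", so source and destination
 * overlap and memmove() must be used instead of strcpy()/strlcpy(). */
void strip_leading(char *buf, size_t buf_size)
{
   const char *pc = buf;

   assert(buf != NULL && strlen(buf) < buf_size);

   while (*pc == ' ' || *pc == ':')
      pc++;

   memmove(buf, pc, strlen(pc) + 1);  /* +1 copies the terminating 0 */
}
```

For example, calling strip_leading() on a buffer holding " : value 42" leaves "value 42" in the same buffer.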


As historical reference:

a) MacOS documentation says "behavior is undefined", which is no longer true, the behaviour is KABOOM!
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/strlcpy.3.html

b) the original strlcpy paper from OpenBSD does not contain the word "overlap" 
http://www.courtesan.com/todd/papers/strlcpy.html

c) the OpenBSD man page says the same as Apple man page (behaviour undefined)
http://www.openbsd.org/cgi-bin/man.cgi?query=strlcpy

d) the linux kernel strlcpy() uses memcpy() and is probably unsafe for overlapping strings
http://lxr.free-electrons.com/source/lib/string.c#L149

e) midas strlcpy() looks to be safe for overlapping strings.

K.O.
  936   14 Nov 2013   Konstantin Olchanski   Forum   Installation problem
> #include "use.h"
>  { USED int i=foo(); }

Sounds nifty, but google does not find use.h.

As for unused variables, some can be removed, others not so much, there is some code in there:

int i = blah...
#if 0
if (i==42) printf("wow, we got a 42!\n");
#endif
and
if (0) printf("debug: i=%d\n", i);

(difference is if you remove "i" or otherwise break the disabled debug code, "#if 0" will complain the next time you need that debugging code, "if (0)" will 
complain right away).

Some of this disabled debug code I would rather not remove - so much debug scaffolding I have added, removed, added again, removed again, all in the same 
places that I cannot be bothered with removing it anymore. I "#if 0" it and it stays there until I need it next time. But of course now gcc complains about it.

K.O.
  935   14 Nov 2013   Konstantin Olchanski   Forum   Installation problem
# slackpkg file-search sql.h
[ installed ] - libiodbc-3.52.7-x86_64-2
...
# slackpkg search package
...
# cat /var/log/packages/libiodbc-3.52.7-x86_64-2
            usr/include/sql.h
...
            usr/lib64/libiodbc.so.2.1.19
...

Thanks, I am saving the slackpkg commands for future reference. Looks like the immediate problem is 
with the library name: libiodbc instead of libodbc. But the header file sql.h is the same.

I am not sure if it is worth making a generic solution for this: on MacOS, all ODBC functions are now 
obsoleted, to be removed, and since we are standardized on MySQL anyway, I think I will rewrite the SQL 
history driver to use the MySQL interface directly. Then all this ODBC extra layering will go away.

K.O.
  934   14 Nov 2013   Razvan Stefan Gornea   Forum   Installation problem

Hi, Thanks a lot for the response! Yes to search packages and list their content in Slackware it is pretty similar to your illustration. Slackware seems to use iODBC in which case it would link with -liodbc I guess.

root@lheppc83:~# slackpkg file-search sql.h

Looking for sql.h in package list. Please wait... DONE

The list below shows the packages that contains "sql\.h" file.

[ installed ] - libiodbc-3.52.7-x86_64-2

You can search specific packages using "slackpkg search package".

root@lheppc83:~# cat /var/log/packages/libiodbc-3.52.7-x86_64-2
PACKAGE NAME:     libiodbc-3.52.7-x86_64-2
COMPRESSED PACKAGE SIZE:     255.0K
UNCOMPRESSED PACKAGE SIZE:     1.0M
PACKAGE LOCATION: /var/log/mount/slackware64/l/libiodbc-3.52.7-x86_64-2.txz
PACKAGE DESCRIPTION:
libiodbc: libiodbc (Independent Open DataBase Connectivity)
libiodbc:
libiodbc: iODBC is the acronym for Independent Open DataBase Connectivity,
libiodbc: an Open Source platform independent implementation of both the ODBC
libiodbc: and X/Open specifications.  It allows for developing solutions
libiodbc: that are language, platform and database independent.
libiodbc:
libiodbc:
libiodbc:
libiodbc: Homepage: http://iodbc.org/
libiodbc:
FILE LIST:
./
usr/
usr/share/
usr/share/libiodbc/
usr/share/libiodbc/samples/
usr/share/libiodbc/samples/iodbctest.c
usr/share/libiodbc/samples/Makefile
usr/man/
usr/man/man1/
usr/man/man1/iodbc-config.1.gz
usr/man/man1/iodbctestw.1.gz
usr/man/man1/iodbctest.1.gz
usr/man/man1/iodbcadm-gtk.1.gz
usr/bin/
usr/bin/iodbctest
usr/bin/iodbcadm-gtk
usr/bin/iodbctestw
usr/bin/iodbc-config
usr/include/
usr/include/iodbcinst.h
usr/include/sqlext.h
usr/include/iodbcunix.h
usr/include/isqltypes.h
usr/include/sql.h
usr/include/iodbcext.h
usr/include/isql.h
usr/include/odbcinst.h
usr/include/isqlext.h
usr/include/sqlucode.h
usr/include/sqltypes.h
usr/lib64/
usr/lib64/libiodbc.la
usr/lib64/libdrvproxy.so.2.1.19
usr/lib64/libiodbcinst.la
usr/lib64/libiodbcadm.so.2.1.19
usr/lib64/libiodbcinst.so.2.1.19
usr/lib64/libiodbcadm.la
usr/lib64/pkgconfig/
usr/lib64/pkgconfig/libiodbc.pc
usr/lib64/libiodbc.so.2.1.19
usr/lib64/libdrvproxy.la
usr/doc/
usr/doc/libiodbc-3.52.7/
usr/doc/libiodbc-3.52.7/ChangeLog
usr/doc/libiodbc-3.52.7/README
usr/doc/libiodbc-3.52.7/COPYING
usr/doc/libiodbc-3.52.7/AUTHORS
usr/doc/libiodbc-3.52.7/INSTALL
install/
install/doinst.sh
install/slack-desc

  933   13 Nov 2013   Stefan Ritt   Forum   Installation problem
> got around to look at compile messages on ubuntu: in addition to "variable 'error' set but not used" we have these:
> 
> warning: ignoring return value of 'ssize_t write(int, const void*, size_t)'
> warning: ignoring return value of 'ssize_t read(int, void*, size_t)'
> warning: ignoring return value of 'int setuid(__uid_t)'
> and a few more of similar

Arghh, now it is getting even more picky. I can understand the "variable xyz set but not used" and I'm willing to remove all the variables. But checking the 
return value from every function? Well, if the disk gets full, our code will silently ignore this for write(), so maybe it's not a bad idea to add a few checks. Also 
for the read(), there could be some problem, where an explicit cm_msg() in case of an error would help.
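
A minimal sketch of what such a check for write() could look like is shown below. The helper write_all() is hypothetical and not part of MIDAS; it retries short writes and EINTR and returns an error the caller can report (for example through cm_msg(), which the sketch itself does not call).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not part of MIDAS: write all "count" bytes to fd.
 * Short writes are continued and EINTR is retried.
 * Returns 0 on success, -1 on error with errno set by write(). */
int write_all(int fd, const void *buf, size_t count)
{
   const char *p = (const char *) buf;

   assert(buf != NULL || count == 0);

   while (count > 0) {
      ssize_t n = write(fd, p, count);
      if (n < 0) {
         if (errno == EINTR)
            continue;       /* interrupted by a signal, try again */
         return -1;         /* e.g. ENOSPC when the disk is full */
      }
      p += n;
      count -= (size_t) n;
   }
   return 0;
}
```

A caller would then check the return value once, instead of silently ignoring a full disk.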
  932   13 Nov 2013   Konstantin Olchanski   Forum   Installation problem
> > I run into problems while trying to install Midas on Slackware 14.0.
> 
> Thank you for reporting this. We do not have any slackware computers so we usually cannot see these messages.
> 
> 
> src/midas.c: In function 'cm_transition2':
> src/midas.c:3769:74: warning: variable 'error' set but not used [-Wunused-but-set-variable]
> 

got around to look at compile messages on ubuntu: in addition to "variable 'error' set but not used" we have these:

warning: ignoring return value of 'ssize_t write(int, const void*, size_t)'
warning: ignoring return value of 'ssize_t read(int, void*, size_t)'
warning: ignoring return value of 'int setuid(__uid_t)'
and a few more of similar

K.O.
  931   12 Nov 2013   Stefan Ritt   Forum   Installation problem
The warnings with the set but unused variables are real. While John O'Donnell proposed:

==========

somewhere along the way I found an include file to help remove this kind of message.  try something like:

#include "use.h"
int foo () { return 3; }
int main () {
 { USED int i=foo(); }
 return 0;
}

with -Wall, and you will see the unused messages are gone.

==========

I would rather go and remove the unused variables to clean up the code a bit. Unfortunately my gcc version does 
not yet bark on that. So once I get a new version and have plenty of spare time (....) I will consider removing all 
these variables.

/Stefan
  930   11 Nov 2013   Konstantin Olchanski   Forum   Installation problem
> > I run into problems while trying to install Midas on Slackware 14.0.
> 
> b) an actual error in fal.c:
> 
> src/fal.c:131:0: warning: "EQUIPMENT_COMMON_STR" redefined [enabled by default]
> 
> c) actual error in fal.c: assignment into string constant is not permitted: char*x="aaa"; x[0]='c'; // core dump
> 
> src/fal.c:383:1: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
> 
> these are fixed by making sure all such pointers are "const char*" and the corresponding midas functions are 

the warnings in fal.c are now fixed.

K.O.