ID   Date   Author   Topic   Subject
  896   26 Jul 2013   Konstantin Olchanski   Bug Report   odbedit fixed size buffer overrun
odbedit uses a fixed size buffer for ODB data. If an array in ODB is bigger than this size, 
db_get_data() correctly returns DB_TRUNCATED and there is no memory overwrite, but the following 
code for printing the data does not know about the truncation and proceeds to print memory 
values located after the end of the fixed size data buffer - in the current case, this memory 
somehow held the contents of the shell history, which very much confused the MIDAS users, as they 
thought that the funny printout was actually in ODB. This is in the print_key() function. If db_get_data() 
returns DB_TRUNCATED, it should allocate a bigger buffer and retry.
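A minimal sketch of that handling (the db_get_key()/db_get_data() calls are the standard ODB API; 
the buffer size, the hDB/hKey variables and the surrounding print_key() structure are assumptions, 
not the actual odbedit code):

/* sketch: retry with a heap buffer sized from the key when the fixed buffer is too small */
KEY key;
char fixed_buf[10000];
char *pdata = fixed_buf;
int size = sizeof(fixed_buf);

db_get_key(hDB, hKey, &key);
int status = db_get_data(hDB, hKey, pdata, &size, key.type);
if (status == DB_TRUNCATED) {
   size = key.total_size;                 /* full size of the ODB array */
   pdata = (char *) malloc(size);
   status = db_get_data(hDB, hKey, pdata, &size, key.type);
}
/* ... print "size" bytes from pdata, never past the end of the buffer ... */
if (pdata != fixed_buf)
   free(pdata);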
This (and the previous) bug was found by the TIGRESS experiment. K.O.
  676   25 Nov 2009   Konstantin Olchanski   Bug Report   once in 100 years midas shared memory bug
We were debugging a strange problem in the event builder, where out of 14
fragments, two fragments were always getting serial number mismatches and the
serial numbers were not sequentially increasing (the other 12 fragments were
just fine).

Then we noticed in the event builder debug output that these 2 fragments were
getting assigned the same buffer handle number, despite having different names -
BUF09 and BUFTPC.

Then we looked at "ipcs", counted the buffers, and there are only 13 buffers for
14 frontends.

Aha, we went, maybe we have unlucky buffer names, renamed BUFTPC to BUFAAA and
everything started to work just fine.

It turns out that the MIDAS ss_shm_open() function uses "ftok" to convert buffer
names to SystemV shared memory keys. This "ftok" function promises to create
unique keys, but I guess, just not today.

Using a short test program, I confirmed that indeed we have unlucky buffer names
and ftok() returns duplicate keys, see below.

Apparently ftok() uses the low 16 bits of the file inode number, but in our
case, the files are NFS mounted and inode numbers are faked inside NFS. When I
run my test program on a different computer, I get non-duplicate keys. So I
guess we are double unlucky.

Test program:

#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>

int main(int argc, char* argv[])
{
  //key_t ftok(const char *pathname, int proj_id);
  
  int k1 = ftok("/home/t2kdaq/midas/nd280/backend/.BUF09.SHM", 'M');
  int k2 = ftok("/home/t2kdaq/midas/nd280/backend/.BUFTPC.SHM", 'M');
  int k3 = ftok("/home/t2kdaq/midas/nd280/backend/.BUFFGD.SHM", 'M');

  printf("key1: 0x%08x, key2: 0x%08x, key3: 0x%08x\n", k1, k2, k3);
  return 0;
}

[t2kfgd@t2knd280logger ~/xxx]$ g++ -o ftok -Wall ftok.cxx
[t2kfgd@t2knd280logger ~/xxx]$ ./ftok
key1: 0x4d138154, key2: 0x4d138154, key3: 0x4d138152

Also:

[t2kfgd@t2knd280logger ~/xxx]$ ls -li ...
14385492 -rw-r--r-- 1 t2kdaq t2kdaq  8405052 Nov 24 17:42
/home/t2kdaq/midas/nd280/backend/.BUF09.SHM
36077906 -rw-r--r-- 1 t2kdaq t2kdaq 67125308 Nov 26 10:19
/home/t2kdaq/midas/nd280/backend/.BUFFGD.SHM
36077908 -rw-r--r-- 1 t2kdaq t2kdaq  8405052 Nov 25 15:53
/home/t2kdaq/midas/nd280/backend/.BUFTPC.SHM

(hint: print the inode numbers in hex and compare to shm keys printed by the
program)
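
For reference, here is roughly how a typical Linux (glibc) ftok() composes the key - only the low
16 bits of the inode number survive, which is why two different files can collide. This is a sketch
of the usual implementation, not something guaranteed by the ftok() man page:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* sketch: reproduce the key composition used by typical glibc ftok() */
int ftok_like(const char *path, int proj_id)
{
  struct stat st;
  if (stat(path, &st) < 0)
    return -1;
  return ((proj_id & 0xff) << 24)           /* 'M' -> the 0x4d in the keys above       */
       | ((int)(st.st_dev & 0xff) << 16)    /* low 8 bits of the device number          */
       | ((int)(st.st_ino & 0xffff));       /* low 16 bits of the inode: the weak part  */
}

int main()
{
  printf("0x%08x\n", ftok_like("/home/t2kdaq/midas/nd280/backend/.BUF09.SHM", 'M'));
  return 0;
}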

K.O.
  1179   17 May 2016   Konstantin Olchanski   Info   openssl situation, MacOS 10.11 (El Capitan) openssl compilation errors
> I recently upgraded my macbook to MacOS 10.11. 
> [ and midas would not compile ]
> It seems that MacOS has now fully removed openssl ...

My read of the tea leaves - the macos version of openssl was so old it was almost useless and did not support any of the modern HTTPS 
features. So to use mhttpd with https you pretty much had to install openssl from macports anyway. For macos 10.11 maybe they 
looked at upgrading to a newer version, but since the openssl kerfuffle last year there are several forks of openssl (the OpenBSD fork 
named libressl is the best, IMO), so rather than picking and choosing, they deleted the whole thing.

Now back to MIDAS.

We use the mongoose web server module and I had expected them by now to make a move on improving HTTPS support, but no 
move has happened.

Right now mongoose supports OpenSSL only (I would expect the OpenBSD LibreSSL fork to work out of the box, too). Other than that, 
they have:
a) their own mickey-mouse https library (krypton) which does not support any modern cryptography (RC4 only - when RC4 is known to 
be useless).
b) an adapter library (polar) for interfacing with PolarSSL (mbedtls)

At this point I would rather abandon the implicit dependency on the system-provided openssl and have an explicit dependency on a 
modern https crypto library.

Option (b) would work for us - 
1) add "git clone mbedtls; cd mbedtls; make" to midas build instructions
2) add polarssl_compat.c to midas git (from cessanta/polar repo)
3) retest mhttpd against ssllabs https scanner, retest against all web browsers.

The downside of this route is loss of automatic nightly updates to the https crypto library (for better or for worse).

K.O.

P.S. Because on MacOS use of openssl from macports is pretty much required, it should be moved from the "tricks" page to the 
standard midas installation instructions ("install required packages").
  481   20 May 2008   Konstantin Olchanski   Bug Report   pending problems and fixes from triumf
Here is the list of known problems I am aware of and of fixes not yet committed
to midas svn:

1) added variable /equiment/foo/common/PerVariableHistory breaks stuff (mostly
mhttpd). It is not clear how this problem escaped my pre-commit checks. This
per-equipment variable enables the per-variable history for the given equipment.
Local consensus is that this variable should not be in "common" and should not
be in "settings". Probably in "/history"? Or have only one variable to enable
this for all equipments at once (like we do in ALPHA).

2) writing compressed midas files (foo.mid.gz) crashes the mlogger when file
size reaches 2 GBytes. This problem could be new in SL5.1.

3) when a midas client becomes unresponsive, runs cannot be stopped using the
"stop" button in mhttpd. This is because cm_transition() loops over all attached
clients, but never removes clients that are known to be dead. Proposed fix is to
call cm_check_client() for each client before calling their rpc transition handler.

4) the previously discussed fix for reading broken history files (skip bad data).

5) mhttpd history "export" button needs to be fixed (by request from ALPHA). At
present it either does not return all existing data or crashes mhttpd. (no fix)

6) mhttpd ODB editor in "set value" page, the "cancel" button is broken (needs
to be corrected for "relative URL"). (no fix)

7) mhttpd needs AJAX-style methods for reading and writing ODB. (no fix)

K.O.
  483   28 May 2008   Konstantin Olchanski   Bug Report   pending problems and fixes from triumf
> Here is the list of known problems I am aware of and of fixes not yet committed
> to midas svn:
> 
> 1) added variable /equiment/foo/common/PerVariableHistory

corrected in svn revision 4203, read
http://savannah.psi.ch/viewcvs/trunk/src/mlogger.c?root=midas&rev=4203&sortby=rev&view=log

> 2) writing compressed midas files (foo.mid.gz) crashes the mlogger when file
> size reaches 2 GBytes. This problem could be new in SL5.1.

(no change)

> 3) when a midas client becomes unresponsive, runs cannot be stopped using the
> "stop" button in mhttpd. This is because cm_transition() loops over all attached
> clients, but never removes clients that are known to be dead. Proposed fix is to
> call cm_check_client() for each client before calling their rpc transition handler.

Fixed in SVN revision 4198, read
http://savannah.psi.ch/viewcvs/trunk/src/midas.c?root=midas&rev=4201&sortby=rev&view=log

> 4) the discussed before fix for reading broken history files (skip bad data).

Fixed in SVN revision 4202, read https://ladd00.triumf.ca/elog/Midas/482

> 5) mhttpd history "export" button needs to be fixed (by request from ALPHA). At
> present it either does not return all exiting data or crashes mhttpd. (no fix)

(no change)

> 6) mhttpd ODB editor in "set value" page, the "cancel" button is broken (needs
> to be corrected for "relative URL").

Apply this patch to src/mhttpd.c

@@ -11156,10 +11190,7 @@
          sprintf(str, "SC/%s/%s", eq_name, group);
          redirect(str);
       } else {
-         strlcpy(str, path, sizeof(str));
-         if (strrchr(str, '/'))
-            strlcpy(str, strrchr(str, '/')+1, sizeof(str));
-         redirect(str);
+         redirect("./");
       }

> 7) mhttpd needs AJAX-style methods for reading and writing ODB. (no fix)

(no change)

K.O.
  484   29 May 2008   Konstantin Olchanski   Bug Report   pending problems and fixes from triumf
> > Here is the list of known problems I am aware of and of fixes not yet committed to midas svn:
> > 1) added variable /equiment/foo/common/PerVariableHistory
> 
> corrected in svn revision 4203, read
> http://savannah.psi.ch/viewcvs/trunk/src/mlogger.c?root=midas&rev=4203&sortby=rev&view=log

Was still broken - all should work in revision 4207.

> > 2) writing compressed midas files (foo.mid.gz) crashes the mlogger when file
> > size reaches 2 GBytes. This problem could be new in SL5.1.

It turns out that on SL5 and SL5.1 (and others?) the 32-bit version of ZLIB opens the
compressed output file without the O_LARGEFILE flag, which limits the file size to 2 GB.

Fixed by opening the file ourselves, then attaching the compression stream using gzdopen().

Revision 4207. (Not tested on Windows - may be broken!)
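
The fix amounts to something like this (a sketch of the approach, not the literal mlogger code; on
32-bit Linux O_LARGEFILE also needs _GNU_SOURCE or building with -D_FILE_OFFSET_BITS=64):

#define _GNU_SOURCE           /* for O_LARGEFILE */
#include <fcntl.h>
#include <zlib.h>

/* open the output file ourselves so it can grow past 2 GB, then
   attach the zlib compression stream to the descriptor */
gzFile open_gz_largefile(const char *filename)
{
   int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_LARGEFILE, 0644);
   if (fd < 0)
      return NULL;
   return gzdopen(fd, "wb");   /* gzclose() will later also close fd */
}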

> > 5) mhttpd history "export" button needs to be fixed (by request from ALPHA). At
> > present it either does not return all exiting data or crashes mhttpd. (no fix)
> 
> (no change)
> 
> > 6) mhttpd ODB editor in "set value" page, the "cancel" button is broken (needs
> > to be corrected for "relative URL").
> 
> Apply this patch to src/mhttpd.c
> 
> @@ -11156,10 +11190,7 @@
>           sprintf(str, "SC/%s/%s", eq_name, group);
>           redirect(str);
>        } else {
> -         strlcpy(str, path, sizeof(str));
> -         if (strrchr(str, '/'))
> -            strlcpy(str, strrchr(str, '/')+1, sizeof(str));
> -         redirect(str);
> +         redirect("./");
>        }
> 
> > 7) mhttpd needs AJAX-style methods for reading and writing ODB. (no fix)
> 
> (no change)
> 
> K.O.
  206   05 Apr 2005   Donald Arseneau   Bug Report   pointers and segfault in yb_any_file_rclose
I'm getting segfaults in yb_any_file_rclose (closing a file opened with
yb_any_file_ropen with type MIDAS).

I think there are bugs with freeing from uninitialized pointers my.pmagta,
my.pyh, and my.pylrl (which are only set when opening a YBOS file).  These
should be set to NULL in yb_any_file_ropen (case MIDAS).  Likewise, the MIDAS
format pointers my.pmp and my.pmrd should be NULLed for YBOS opens. 

It might be wise to also initialize the pointers in the "my" structure to null.

--Donald              
  207   21 Apr 2005   Konstantin Olchanski   Bug Report   pointers and segfault in yb_any_file_rclose
> I'm getting segfaults in yb_any_file_rclose (closing a file opened with
> yb_any_file_ropen with type MIDAS).
> 
> I think there are bugs with freeing from uninitialized pointers my.pmagta,
> my.pyh, and my.pylrl (which are only set when opening a YBOS file).  These
> should be set to NULL in yb_any_file_ropen (case MIDAS).  Likewise, the MIDAS
> format pointers my.pmp and my.pmrd should be NULLed for YBOS opens. 
> 
> It might be wise to also initialize the pointers in the "my" structure to null.

Do you see this crash even after my fix to (another?) double free?

K.O.
  2071   13 Jan 2021   Isaac Labrie Boulay   Forum   poll_event() is very slow.
Hi all,

I'm currently trying to see if I can speed up polling in a frontend I'm testing. 
Currently it seems like I can't get 'lam's to happen faster than 120 times/second. 
There must be a way to make this faster. From what I understand, changing the poll 
time (500ms by default) won't affect the frequency of polling, just the 'lam' 
period.

Any suggestions?

Thanks for your help!

Isaac

Hi,

What are the actual readout time and event size?
Do you have multiple equipments, and if so, of what type?

PAA
  2072   13 Jan 2021   Konstantin Olchanski   Forum   poll_event() is very slow.
> 
> I'm currently trying to see if I can speed up polling in a frontend I'm testing. 
> Currently it seems like I can't get 'lam's to happen faster than 120 times/second. 
> There must be a way to make this faster. From what I understand, changing the poll 
> time (500ms by default) won't affect the frequency of polling just the 'lam' 
> period.
> 
> Any suggestions?
> 

You could switch from the traditional midas mfe.c frontend to the C++ TMFE frontend,
where all this "lam" and "poll" business is removed.

At the moment, there are two example programs using the C++ TMFE frontend,
single threaded (progs/fetest_tmfe.cxx) and multithreaded (progs/fetest_tmfe_thread.cxx).

K.O.
  Draft   13 Jan 2021   Pierre-Andre Amaudruz   Forum   poll_event() is very slow.
> Hi all,
> 
> I'm currently trying to see if I can speed up polling in a frontend I'm testing. 
> Currently it seems like I can't get 'lam's to happen faster than 120 times/second. 
> There must be a way to make this faster. From what I understand, changing the poll 
> time (500ms by default) won't affect the frequency of polling just the 'lam' 
> period.
> 
> Any suggestions?
> 
> Thanks for your help!
> 
> Isaac

Hi,

How many equipment do you have and of what type?
What is the measured readout time of your equipment?

As you mentioned, the polling time defines the maximum time you spend in the polling call before checking other equipment and system activities. But as soon as you get a LAM during the polling loop, the event is read out. The readout time of this equipment obviously has to be considered as well.

In case you have multiple equipments, the readout time of the other equipments also has to be taken into account, as you won't return to your polling before they have completed.
  2074   13 Jan 2021   Stefan Ritt   Forum   poll_event() is very slow.
Something must be wrong on your side. If you take the example frontend under

midas/examples/experiment/frontend.cxx

and let it run to produce dummy events, you get about 90 Hz. This is because we have a

  ss_sleep(10);

in the read_trigger_event() routine to throttle things down. If you remove that sleep, 
you get an event rate of about 500'000 Hz. So the framework is really quick.

Probably your routine which looks for a 'lam' takes really long and should be fixed.
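
For reference, the traditional mfe.c poll_event() has roughly this shape (paraphrased from the
frontend examples, types from midas.h; hardware_has_event() is a placeholder for whatever register
read you do - that per-iteration check is the part that must be fast, since the framework
calibrates "count" so the loop runs for about the configured poll time):

INT poll_event(INT source, INT count, BOOL test)
{
   for (int i = 0; i < count; i++) {
      DWORD lam = hardware_has_event();   /* placeholder: e.g. read a VME status register */
      if (lam && !test)
         return lam;                      /* the frontend then calls the readout routine */
   }
   return 0;                              /* "test" calls always run the full loop */
}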

Stefan
  2075   14 Jan 2021   Pintaudi Giorgio   Forum   poll_event() is very slow.
> Something must be wrong on your side. If you take the example frontend under
> 
> midas/examples/experiment/frontend.cxx
> 
> and let it run to produce dummy events, you get about 90 Hz. This is because we have a
> 
>   ss_sleep(10);
> 
> in the read_trigger_event() routine to throttle things down. If you remove that sleep, 
> you get an event rate of about 500'000 Hz. So the framework is really quick.
> 
> Probably your routine which looks for a 'lam' takes really long and should be fixed.
> 
> Stefan

Sorry if I am going off-topic, but since the ss_sleep function was mentioned here, I 
would like to take the chance to report an issue that I am having.

In all my slow control frontends, the CPU usage for each frontend is close to 100%. This 
means that each frontend is monopolizing a single core. When I did some profiling, I 
noticed that 99% of the time is spent inside the ss_sleep function. Now, I would expect 
the ss_sleep function to require very little CPU time, or none at all.

So my two questions are:
Is this a bug or a feature?
Would you be able to check/reproduce this behavior, or do you need additional info from my 
side?
  2076   14 Jan 2021   Isaac Labrie Boulay   Forum   poll_event() is very slow.
> Something must be wrong on your side. If you take the example frontend under
> 
> midas/examples/experiment/frontend.cxx
> 
> and let it run to produce dummy events, you get about 90 Hz. This is because we have a
> 
>   ss_sleep(10);
> 
> in the read_trigger_event() routine to throttle things down. If you remove that sleep, 
> you get an event rate of about 500'000 Hz. So the framework is really quick.
> 
> Probably your routine which looks for a 'lam' takes really long and should be fixed.
> 
> Stefan

Hi Stefan,

I should mention that I was using midas/examples/Triumf/c++/fevme.cxx. I was trying to see 
the max speed so I had the 'lam' always = 1 with nothing else to add overhead in the 
poll_event(). I was getting <200 Hz. I am assuming that this is a bug. There is no 
ss_sleep() in that function.

Thanks for your quick response!

Isaac
  2077   15 Jan 2021   Isaac Labrie Boulay   Forum   poll_event() is very slow.
> > 
> > I'm currently trying to see if I can speed up polling in a frontend I'm testing. 
> > Currently it seems like I can't get 'lam's to happen faster than 120 times/second. 
> > There must be a way to make this faster. From what I understand, changing the poll 
> > time (500ms by default) won't affect the frequency of polling just the 'lam' 
> > period.
> > 
> > Any suggestions?
> > 
> 
> You could switch from the traditional midas mfe.c frontend to the C++ TMFE frontend,
> where all this "lam" and "poll" business is removed.
> 
> At the moment, there are two example programs using the C++ TMFE frontend,
> single threaded (progs/fetest_tmfe.cxx) and multithreaed (progs/fetest_tmfe_thread.cxx).
> 
> K.O.

Ok. I did not know that there was a C++ OOD frontend example in MIDAS. I'll take a look at 
it. Is there any documentation on how it works?

Thanks for the support!

Isaac
  2084   08 Feb 2021   Konstantin Olchanski   Forum   poll_event() is very slow.
> I should mention that I was using midas/examples/Triumf/c++/fevme.cxx

this is correct, the fevme frontend is written to do 100% CPU-busy polling.

there are several reasons for this:
- on our VME processors, we have 2-core CPUs: the 1st core can poll the VME bus, the 2nd core can run 
mfe.c and the ethernet transmitter.
- interrupts are expensive to use (in latency and in cpu use) because the kernel handler has to call 
the user handler, return back, etc.
- sub-millisecond sleep used to be expensive and unreliable (on 1-2GHz "core 1" and "core 2" 
CPUs running SL6 and SL7 era linux). As I understand it, current linux and current 3+GHz CPUs can 
do reliable microsecond sleep.
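
(As an aside, the "reliable microsecond sleep" claim is easy to check on a given machine with
something like this - plain POSIX, not MIDAS code:

#include <stdio.h>
#include <time.h>

/* measure how long a requested 10 microsecond sleep actually takes */
int main()
{
   struct timespec req = {0, 10 * 1000};   /* 10 us */
   struct timespec t1, t2;
   clock_gettime(CLOCK_MONOTONIC, &t1);
   nanosleep(&req, NULL);
   clock_gettime(CLOCK_MONOTONIC, &t2);
   double us = (t2.tv_sec - t1.tv_sec) * 1e6 + (t2.tv_nsec - t1.tv_nsec) / 1e3;
   printf("requested 10 us, got %.1f us\n", us);
   return 0;
}

On older kernels the answer could be a millisecond or more; on current hardware it is typically
tens of microseconds.)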

K.O.
  434   18 Feb 2008   Konstantin Olchanski   Bug Report   potential memory corruption in odb.c:extract_key()
It looks like the ODB function extract_key() will overwrite the array pointed to by "key_name" if given an ODB 
path with very long names (as seems to happen when redirection explodes in the Safari web browser, via 
db_get_value(TRUE) via the mhttpd "start program" button). All callers of this function seem to provide 256-byte 
strings, so the problem would not show up in normal use - only when abnormal ODB paths are being 
parsed. Proposed solution is to add a "length" argument to this function. (Actually ODB path elements 
should be restricted to NAME_LENGTH (32 bytes), right?)
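A sketch of what the bounds-checked version might look like (hypothetical - the real extract_key() 
signature and behaviour may differ; the point is only that the copy is limited by an explicit size):

/* copy one path element into key_name, truncating instead of overrunning */
const char *extract_key_sketch(const char *path, char *key_name, int key_name_size)
{
   int i = 0;
   if (*path == '/')
      path++;                        /* skip leading slash */
   while (*path && *path != '/') {
      if (i < key_name_size - 1)
         key_name[i++] = *path;      /* keep at most key_name_size-1 characters */
      path++;                        /* but always consume the whole path element */
   }
   key_name[i] = 0;
   return path;                      /* points at the next '/' or at '\0' */
}

K.O.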
  444   21 Feb 2008   Konstantin Olchanski   Bug Report   potential memory corruption in odb.c:extract_key()
> It looks like ODB function extract_key() will overwrite the array pointed to by "key_name" if given an odb 
> path with very long names (as seems to happen when redirection explodes in the Safari web browser, via 
> db_get_value(TRUE) via mhttpd "start program" button). All  callers of this function seem to provide 256 
> byte strings, so the problem would not show up in normal use - only when abnormal odb paths are being 
> parsed. Proposed solution is to add a "length" argument to this function. (Actually ODB path elements 
> should be restricted to NAME_LENGTH (32 bytes), right?). K.O.

This is fixed in svn revision 4129.

K.O.
  1216   24 Oct 2016   Tim Gorringe   Bug Report   problem with error code DB_NO_MEMORY from db_open_record() call when establishing additional hotlinks
Hi Midas forum,

I'm having a problem with ODB hotlinks after increasing the number of sub-directories in an 
ODB. I now get the error code DB_NO_MEMORY after some db_open_record() calls. I 
tried

1) increasing the parameter DEFAULT_ODB_SIZE in midas.h and make clean, make
but got the same error

2) increasing the parameter MAX_OPEN_RECORDS in midas.h and make clean, make,
but got fatal errors from odbedit and my midas FE and couldn't run anything

3) deleting my experiment's SHM files and starting odbedit with "odbedit -e SLAC -s 
0x1000000" to increase the ODB size, but got the same error

4) I tried a different computer and got the same error code DB_NO_MEMORY

Maybe I'm running into some system limit that restricts the number of open records? 
Or maybe I've not increased the correct MIDAS parameter?

Best ,Tim.