ID |
Date |
Author |
Topic |
Subject |
896
|
26 Jul 2013 |
Konstantin Olchanski | Bug Report | odbedit fixed size buffer overrun |
odbedit uses a fixed size buffer for ODB data. If an array in ODB is bugger than this size,
db_get_data() correctly returns DB_TRUNCATED and there is no memory overwrite, but the following
code for printing the data does not know about this truncation and proceeds printing memory
values contained after the end of the fixed size data buffer - in the current case, this memory
somehow had the contents of the shell history - this very confused the MIDAS users as they though
that the funny printout was actually in ODB. This is in the print_key() function. If db_get_data()
returns DB_TRUNCATED, it should allocate a buffer of bigger size. This (and the previous) bug found
by the TIGRESS experiment. K.O. |
676
|
25 Nov 2009 |
Konstantin Olchanski | Bug Report | once in 100 years midas shared memory bug |
We were debugging a strange problem in the event builder, where out of 14
fragments, two fragments were always getting serial number mismatches and the
serial numbers were not sequentially increasing (the other 12 fragments were
just fine).
Then we noticed in the event builder debug output that these 2 fragments were
getting assigned the same buffer handle number, despite having different names -
BUF09 and BUFTPC.
Then we looked at "ipcs", counted the buffers, and there are only 13 buffers for
14 frontends.
Aha, we went, maybe we have unlucky buffer names, renamed BUFTPC to BUFAAA and
everything started to work just fine.
It turns out that the MIDAS ss_shm_open() function uses "ftok" to convert buffer
names to SystemV shared memory keys. This "ftok" function promises to create
unique keys, but I guess, just not today.
Using a short test program, I confirmed that indeed we have unlucky buffer names
and ftok() returns duplicate keys, see below.
Apparently ftok() uses the low 16 bits of the file inode number, but in our
case, the files are NFS mounted and inode numbers are faked inside NFS. When I
run my test program on a different computer, I get non-duplicate keys. So I
guess we are double unlucky.
Test program:
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
int main(int argc, char* argv[])
{
//key_t ftok(const char *pathname, int proj_id);
int k1 = ftok("/home/t2kdaq/midas/nd280/backend/.BUF09.SHM", 'M');
int k2 = ftok("/home/t2kdaq/midas/nd280/backend/.BUFTPC.SHM", 'M');
int k3 = ftok("/home/t2kdaq/midas/nd280/backend/.BUFFGD.SHM", 'M');
printf("key1: 0x%08x, key2: 0x%08x, key3: 0x%08x\n", k1, k2, k3);
return 0;
}
[t2kfgd@t2knd280logger ~/xxx]$ g++ -o ftok -Wall ftok.cxx
[t2kfgd@t2knd280logger ~/xxx]$ ./ftok
key1: 0x4d138154, key2: 0x4d138154, key3: 0x4d138152
Also:
[t2kfgd@t2knd280logger ~/xxx]$ ls -li ...
14385492 -rw-r--r-- 1 t2kdaq t2kdaq 8405052 Nov 24 17:42
/home/t2kdaq/midas/nd280/backend/.BUF09.SHM
36077906 -rw-r--r-- 1 t2kdaq t2kdaq 67125308 Nov 26 10:19
/home/t2kdaq/midas/nd280/backend/.BUFFGD.SHM
36077908 -rw-r--r-- 1 t2kdaq t2kdaq 8405052 Nov 25 15:53
/home/t2kdaq/midas/nd280/backend/.BUFTPC.SHM
(hint: print the inode numbers in hex and compare to shm keys printed by the
program)
K.O. |
1179
|
17 May 2016 |
Konstantin Olchanski | Info | openssl situation, MacOS 10.11 (El Capitan) openssl compilation errors |
> I recently upgraded my macbook to MacOS 10.11.
> [ and midas would not compile ]
> It seems that MacOS has now fully removed openssl ...
My read of tea leaves - the macos version of openssl was so old it was almost useless, did not support any of the modern HTTPS
features. So to use mhttpd with https you pretty much had to install openssl from macports anyway. For macos 10.11 maybe they
looked at upgrading to newer version, but since the openssl kerfuffle last year, there is several forks of openssl (the OpenBSD fork
named libressl is the best, IMO), so rather than picking and choosing, they deleted the whole thing.
Now back to MIDAS.
We use the mongoose web server module and I have expected by now for them to make a move on improving HTTPS support, but no
move happened.
Right now mongoose support OpenSSL only (I would expect the OpenBSD LibreSSL fork to work to of the box, too). Other then that,
they have:
a) their own mickey-mouse https library (krypton) which does not support any modern cryptography (RC4 only - when RC4 is known to
be useless).
b) an adapter library (polar) for interfacing with PolarSSL (mbedtls)
At this point I would rather abandon the implicit dependency on the system-provided openssl and have an explicit dependancy on a
modern https crypto library.
Option (b) would work for us -
1) add "git clone mbedtls; cd mbedtls; make" to midas build instructions
2) add polarssl_compat.c to midas git (from cessanta/polar repo)
3) retest mhttpd against ssllabs https scanner, retest against all web browsers.
The downside of this route is loss of automatic nightly updates to the https crypto library (for better or for worse).
K.O.
P.S. Because on MacOS use of openssl from macports is pretty much required, it should be moved from the "tricks" page to the
standard midas installation instructions ("install required packages"). |
480
|
20 May 2008 |
Konstantin Olchanski | Bug Report | pending problems and fixes from triumf |
Here is the list of known problems I am aware of and of fixes not yet committed
to midas svn: |
481
|
20 May 2008 |
Konstantin Olchanski | Bug Report | pending problems and fixes from triumf |
Here is the list of known problems I am aware of and of fixes not yet committed
to midas svn:
1) added variable /equiment/foo/common/PerVariableHistory breaks stuff (mostly
mhttpd). It is not clear how this problem escaped my pre-commit checks. This
per-equipment variable enables the per-variable history for the given equipment.
Local consensus is that this variable should not be in "common" and should not
be in "settings". Probably in "/history"? Or have only one variable to enable
this for all equipments at once (like we do in ALPHA).
2) writing compressed midas files (foo.mid.gz) crashes the mlogger when file
size reaches 2 GBytes. This problem could be new in SL5.1.
3) when a midas client becomes unresponsive, runs cannot be stopped using the
"stop" button in mhttpd. This is because cm_transition() loops over all attached
clients, but never removes clients that are known to be dead. Proposed fix is to
call cm_check_client() for each client before calling their rpc transition handler.
4) the discussed before fix for reading broken history files (skip bad data).
5) mhttpd history "export" button needs to be fixed (by request from ALPHA). At
present it either does not return all exiting data or crashes mhttpd. (no fix)
6) mhttpd ODB editor in "set value" page, the "cancel" button is broken (needs
to be corrected for "relative URL"). (no fix)
7) mhttpd needs AJAX-style methods for reading and writing ODB. (no fix)
K.O. |
483
|
28 May 2008 |
Konstantin Olchanski | Bug Report | pending problems and fixes from triumf |
> Here is the list of known problems I am aware of and of fixes not yet committed
> to midas svn:
>
> 1) added variable /equiment/foo/common/PerVariableHistory
corrected in svn revision 4203, read
http://savannah.psi.ch/viewcvs/trunk/src/mlogger.c?root=midas&rev=4203&sortby=rev&view=log
> 2) writing compressed midas files (foo.mid.gz) crashes the mlogger when file
> size reaches 2 GBytes. This problem could be new in SL5.1.
(no change)
> 3) when a midas client becomes unresponsive, runs cannot be stopped using the
> "stop" button in mhttpd. This is because cm_transition() loops over all attached
> clients, but never removes clients that are known to be dead. Proposed fix is to
> call cm_check_client() for each client before calling their rpc transition handler.
Fixed in SVN revision 4198, read
http://savannah.psi.ch/viewcvs/trunk/src/midas.c?root=midas&rev=4201&sortby=rev&view=log
> 4) the discussed before fix for reading broken history files (skip bad data).
Fixed in SVN revision 4202, read https://ladd00.triumf.ca/elog/Midas/482
> 5) mhttpd history "export" button needs to be fixed (by request from ALPHA). At
> present it either does not return all exiting data or crashes mhttpd. (no fix)
(no change)
> 6) mhttpd ODB editor in "set value" page, the "cancel" button is broken (needs
> to be corrected for "relative URL").
Apply this patch to src/mhttpd.c
@@ -11156,10 +11190,7 @@
sprintf(str, "SC/%s/%s", eq_name, group);
redirect(str);
} else {
- strlcpy(str, path, sizeof(str));
- if (strrchr(str, '/'))
- strlcpy(str, strrchr(str, '/')+1, sizeof(str));
- redirect(str);
+ redirect("./");
}
> 7) mhttpd needs AJAX-style methods for reading and writing ODB. (no fix)
(no change)
K.O. |
484
|
29 May 2008 |
Konstantin Olchanski | Bug Report | pending problems and fixes from triumf |
> > Here is the list of known problems I am aware of and of fixes not yet committed to midas svn:
> > 1) added variable /equiment/foo/common/PerVariableHistory
>
> corrected in svn revision 4203, read
> http://savannah.psi.ch/viewcvs/trunk/src/mlogger.c?root=midas&rev=4203&sortby=rev&view=log
Was still broken - all should work in revision 4207.
> > 2) writing compressed midas files (foo.mid.gz) crashes the mlogger when file
> > size reaches 2 GBytes. This problem could be new in SL5.1.
It turns out that on SL5 and SL5.1 (and others?) the 32-bit version of ZLIB opens the
compressed output file without the O_LARGEFILE flag, this limits the file size to 2 GB.
Fixed by opening the file ourselves, then attach compression stream using gzdopen().
Revision 4207. (Not tested on Windows - may be broken!)
> > 5) mhttpd history "export" button needs to be fixed (by request from ALPHA). At
> > present it either does not return all exiting data or crashes mhttpd. (no fix)
>
> (no change)
>
> > 6) mhttpd ODB editor in "set value" page, the "cancel" button is broken (needs
> > to be corrected for "relative URL").
>
> Apply this patch to src/mhttpd.c
>
> @@ -11156,10 +11190,7 @@
> sprintf(str, "SC/%s/%s", eq_name, group);
> redirect(str);
> } else {
> - strlcpy(str, path, sizeof(str));
> - if (strrchr(str, '/'))
> - strlcpy(str, strrchr(str, '/')+1, sizeof(str));
> - redirect(str);
> + redirect("./");
> }
>
> > 7) mhttpd needs AJAX-style methods for reading and writing ODB. (no fix)
>
> (no change)
>
> K.O. |
206
|
05 Apr 2005 |
Donald Arseneau | Bug Report | pointers and segfault in yb_any_file_rclose |
I'm getting segfaults in yb_any_file_rclose (closing a file opened with
yb_any_file_ropen with type MIDAS).
I think there are bugs with freeing from uninitialized pointers my.pmagta,
my.pyh, and my.pylrl (which are only set when opening a YBOS file). These
should be set to NULL in yb_any_file_ropen (case MIDAS). Likewise, the MIDAS
format pointers my.pmp and my.pmrd should be NULLed for YBOS opens.
It might be wise to also initialize the pointers in the "my" structure to null.
--Donald |
207
|
21 Apr 2005 |
Konstantin Olchanski | Bug Report | pointers and segfault in yb_any_file_rclose |
> I'm getting segfaults in yb_any_file_rclose (closing a file opened with
> yb_any_file_ropen with type MIDAS).
>
> I think there are bugs with freeing from uninitialized pointers my.pmagta,
> my.pyh, and my.pylrl (which are only set when opening a YBOS file). These
> should be set to NULL in yb_any_file_ropen (case MIDAS). Likewise, the MIDAS
> format pointers my.pmp and my.pmrd should be NULLed for YBOS opens.
>
> It might be wise to also initialize the pointers in the "my" structure to null.
Do you see this crash even after my fix to (another?) double free?
K.O. |
2071
|
13 Jan 2021 |
Isaac Labrie Boulay | Forum | poll_event() is very slow. |
Hi all,
I'm currently trying to see if I can speed up polling in a frontend I'm testing.
Currently it seems like I can't get 'lam's to happen faster than 120 times/second.
There must be a way to make this faster. From what I understand, changing the poll
time (500ms by default) won't affect the frequency of polling just the 'lam'
period.
Any suggestions?
Thanks for your help!
Isaac
Hi,
What is the actual readout time, event size?
Do you have multiple equipment and of what type if any?
PAA |
2072
|
13 Jan 2021 |
Konstantin Olchanski | Forum | poll_event() is very slow. |
>
> I'm currently trying to see if I can speed up polling in a frontend I'm testing.
> Currently it seems like I can't get 'lam's to happen faster than 120 times/second.
> There must be a way to make this faster. From what I understand, changing the poll
> time (500ms by default) won't affect the frequency of polling just the 'lam'
> period.
>
> Any suggestions?
>
You could switch from the traditional midas mfe.c frontend to the C++ TMFE frontend,
where all this "lam" and "poll" business is removed.
At the moment, there are two example programs using the C++ TMFE frontend,
single threaded (progs/fetest_tmfe.cxx) and multithreaed (progs/fetest_tmfe_thread.cxx).
K.O. |
Draft
|
13 Jan 2021 |
Pierre-Andre Amaudruz | Forum | poll_event() is very slow. |
> Hi all,
>
> I'm currently trying to see if I can speed up polling in a frontend I'm testing.
> Currently it seems like I can't get 'lam's to happen faster than 120 times/second.
> There must be a way to make this faster. From what I understand, changing the poll
> time (500ms by default) won't affect the frequency of polling just the 'lam'
> period.
>
> Any suggestions?
>
> Thanks for your help!
>
> Isaac
Hi,
How many equipment do you have and of what type?
What is the measured readout time of your equipment?
As you mentioned the polling time define the maximum time you spend in the in polling call before checking other equipment and system activities. But as soon as you get a LAM during the polling loop, the event is readout. The readout time of this equipment is obviously to be considered as well.
In case you have multiple equipment, the readout time of the other equipment is to be taken in account as you wont return to your polling prior the completion of them. |
2074
|
13 Jan 2021 |
Stefan Ritt | Forum | poll_event() is very slow. |
Something must be wrong on your side. If you take the example frontend under
midas/examples/experiment/frontend.cxx
and let it run to produce dummy events, you get about 90 Hz. This is because we have a
ss_sleep(10);
in the read_trigger_event() routine to throttle things down. If you remove that sleep,
you get an event rate of about 500'000 Hz. So the framework is really quick.
Probably your routine which looks for a 'lam' takes really long and should be fixed.
Stefan |
2075
|
14 Jan 2021 |
Pintaudi Giorgio | Forum | poll_event() is very slow. |
> Something must be wrong on your side. If you take the example frontend under
>
> midas/examples/experiment/frontend.cxx
>
> and let it run to produce dummy events, you get about 90 Hz. This is because we have a
>
> ss_sleep(10);
>
> in the read_trigger_event() routine to throttle things down. If you remove that sleep,
> you get an event rate of about 500'000 Hz. So the framework is really quick.
>
> Probably your routine which looks for a 'lam' takes really long and should be fixed.
>
> Stefan
Sorry if I am going off-topic but, because the ss_sleep function was mentioned here, I
would like to take the chance and report an issue that I am having.
In all my slow control frontends, the CPU usage for each frontend is close to 100%. This
means that each frontend is monopolizing a single core. When I did some profiling, I
noticed that 99% of the time is spent inside the ss_sleep function. Now, I would expect
that the ss_sleep function should not require any CPU usage at all or very little.
So my two questions are:
Is this a bug or a feature?
Would you able to check/reproduce this behavior or do you need additional info from my
side? |
2076
|
14 Jan 2021 |
Isaac Labrie Boulay | Forum | poll_event() is very slow. |
> Something must be wrong on your side. If you take the example frontend under
>
> midas/examples/experiment/frontend.cxx
>
> and let it run to produce dummy events, you get about 90 Hz. This is because we have a
>
> ss_sleep(10);
>
> in the read_trigger_event() routine to throttle things down. If you remove that sleep,
> you get an event rate of about 500'000 Hz. So the framework is really quick.
>
> Probably your routine which looks for a 'lam' takes really long and should be fixed.
>
> Stefan
Hi Stefan,
I should mention that I was using midas/examples/Triumf/c++/fevme.cxx. I was trying to see
the max speed so I had the 'lam' always = 1 with nothing else to add overhead in the
poll_event(). I was getting <200 Hz. I am assuming that this is a bug. There is no
ss_sleep() in that function.
Thanks for your quick response!
Isaac |
2077
|
15 Jan 2021 |
Isaac Labrie Boulay | Forum | poll_event() is very slow. |
> >
> > I'm currently trying to see if I can speed up polling in a frontend I'm testing.
> > Currently it seems like I can't get 'lam's to happen faster than 120 times/second.
> > There must be a way to make this faster. From what I understand, changing the poll
> > time (500ms by default) won't affect the frequency of polling just the 'lam'
> > period.
> >
> > Any suggestions?
> >
>
> You could switch from the traditional midas mfe.c frontend to the C++ TMFE frontend,
> where all this "lam" and "poll" business is removed.
>
> At the moment, there are two example programs using the C++ TMFE frontend,
> single threaded (progs/fetest_tmfe.cxx) and multithreaed (progs/fetest_tmfe_thread.cxx).
>
> K.O.
Ok. I did not know that there was a C++ OOD frontend example in MIDAS. I'll take a look at
it. Is there any documentation on it works?
Thanks for the support!
Isaac |
2084
|
08 Feb 2021 |
Konstantin Olchanski | Forum | poll_event() is very slow. |
> I should mention that I was using midas/examples/Triumf/c++/fevme.cxx
this is correct, the fevme frontend is written to do 100% CPU-busy polling.
there is several reasons for this:
- on our VME processors, we have 2 core CPUs, 1st core can poll the VME bus, 2nd core can run
mfe.c and the ethernet transmitter.
- interrupts are expensive to use (in latency and in cpu use) because kernel handler has to call
use handler, return back etc
- sub-millisecond sleep used to be expensive and unreliable (on 1-2GHz "core 1" and "core 2"
CPUs running SL6 and SL7 era linux). As I understand, current linux and current 3+GHz CPUs can
do reliable microsecond sleep.
K.O. |
434
|
18 Feb 2008 |
Konstantin Olchanski | Bug Report | potential memory corruption in odb,c:extract_key() |
It looks like ODB function extract_key() will overwrite the array pointed to by "key_name" if given an odb
path with very long names (as seems to happen when redirection explodes in the Safari web browser, via
db_get_value(TRUE) via mhttpd "start program" button). All callers of this function seem to provide 256
byte strings, so the problem would not show up in normal use - only when abnormal odb paths are being
parsed. Proposed solution is to add a "length" argument to this function. (Actually ODB path elements
should be restricted to NAME_LENGTH (32 bytes), right?). K.O. |
444
|
21 Feb 2008 |
Konstantin Olchanski | Bug Report | potential memory corruption in odb,c:extract_key() |
> It looks like ODB function extract_key() will overwrite the array pointed to by "key_name" if given an odb
> path with very long names (as seems to happen when redirection explodes in the Safari web browser, via
> db_get_value(TRUE) via mhttpd "start program" button). All callers of this function seem to provide 256
> byte strings, so the problem would not show up in normal use - only when abnormal odb paths are being
> parsed. Proposed solution is to add a "length" argument to this function. (Actually ODB path elements
> should be restricted to NAME_LENGTH (32 bytes), right?). K.O.
This is fixed in svn revision 4129.
K.O. |
1216
|
24 Oct 2016 |
Tim Gorringe | Bug Report | problem with error code DB_NO_MEMORY from db_open_record() call when establish additional hotlinks |
Hi Midas forum,
I'm having a problem with odb hotlinks after increasing sub-directories in an
odb. I now get the error code DB_NO_MEMORY after some db_open_record() calls. I
tried
1) increasing the parameter DEFAULT_ODB_SIZE in midas.h and make clean, make
but got the same error
2) increasing the parameter MAX_OPEN_RECORDS in midas.h and make clean, make
but got fatal errors from odbedit and my midas FE and couldnt run anything
3) deleting my expts SHM files and starting odbedit with "odbedit -e SLAC -s
0x1000000" to increse the odb size but got the same error?
4) I tried a different computer and got the same error code DB_NO_MEMORY
Maybe I running into some system limit that restricts the humber of open records?
Or maybe I've not increased the correct midas parameter?
Best ,Tim. |