ID |
Date |
Author |
Topic |
Subject |
110
|
14 Nov 2003 |
Stefan Ritt | | more odb | Ok, I apologize. It's all ok. Thanks for clearifying. Concerning the assert's, it
would be nice to be able to disable them in release code. Under Windows, the
assert() is actually a macro which expands to zero if NDEBUG is defined. I
believe it's the same under linux, but I don't know about VxWorks. So we have
three options:
1) Keep asserts always. This might possible slow down a DAQ system, but I'm not
sure how much. Might be negligible.
2) Disable asserts by default (standard make). Only the "experts" can enable it
in the make file (by removing NDEBUG), since only they know what to do with the
assertation messages.
3) Let the user decide on the standard installation. Maybe have two libraries,
one debug, one no-debug. The no-debug can even have the compiler optimization
disabled, which makes debugging easier.
So what is your opinion (comments from others are welcome as well) of which way
to go? |
107
|
31 Oct 2003 |
Konstantin Olchanski | | more odb "run number" error checking | I added error checking to the places where we read "/runinfo/run number". In
general, I do this:
status = db_get_value("/runinfo/run number",&run_number);
assert(status==SUCCESS);
assert(run_number >= 0); (and run_number>0, where appropriate)
Here is the rationale: if we cannot read the run number, something must be
very terribly wrong. I cannot think of any recovery action other than
abort() and make a core dump for our debugging enjoyment.
I considered and rejected adding a "retry" loop: if we allow db_get_value()
to intermittently fail, then it's every use has to be wrapped in a retry
loop, which then should be inside db_get_value(), making it pointless to
have external "retry" loops.
I am now pondering on proposing a "db_get_value_cannot_possibly_fail()"
function (it would abort(), exit() with an error or commit harakiri if it
can't get the value). They way most db_xxx() functions are used in midas,
maybe they should be made "void" and "unfailible", with "STATUS
db_xxx_yes_I_can_fail_and_return_an_error_code()" evil twins. I guess this
is why "they" invented C/C++ exceptions. Anyway, something to think about.
Affected files:
src/lazylogger.c
src/odbedit.c
src/mlogger.c
src/mfe.c
src/odb.c
src/mana.c
src/midas.c
src/mhttpd.c
K.O. |
2045
|
30 Nov 2020 |
Konstantin Olchanski | Info | more wisdom from linux kernel people | As you may know, I am a big fan of two software projects - the linux kernel and ROOT. The linux kernel is one of
the few software projects "done right". ROOT is where normal people try to "get it right" with real-world level
of success. I use both softwares daily and I try to apply their ways and methods to MIDAS as much as I can.
So just in time for our discussion of array indexes, a talk by gregkh shows
up on slashdot. The title is "how to keep your users happy". (Nobody
ever wants to be nasty to their users, but do read his talk).
https://git.sr.ht/~gregkh/presentation-application_summit/tree/main/keep_users_happy.pdf
The talk refers to some older stuff, still relevant, of course, in case you miss the links
in the pdf file, here they are:
https://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html
https://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
https://ozlabs.org/~rusty/ols-2003-keynote/img0.html (click on "continue" to see next page)
K.O. |
367
|
09 Apr 2007 |
Konstantin Olchanski | Info | move history, elog and alarm functions into separate files | As approved by Stefan, I moved the history (hs_xxx), alarm (al_xxx) and elog (el_xxx) functions out of
midas.c into separate files. Commited as revision 3665. This change should be transparent to all users.
K.O. |
513
|
22 Oct 2008 |
Konstantin Olchanski | Info | mscb timeouts and retries | A new set of functions was added to mscb.h to adjust mscb timeouts and retries to better match specific
applications:
+ int EXPRT mscb_get_max_retry();
+ int EXPRT mscb_set_max_retry(int max_retry);
+ int EXPRT mscb_get_usb_timeout();
+ int EXPRT mscb_set_usb_timeout(int timeout);
+ int EXPRT mscb_get_eth_max_retry();
+ int EXPRT mscb_set_eth_max_retry(int eth_max_retry);
There are 3 settings:
1) mscb_max_retry: most (all?) mscb operations, like mscb_read(), retry failed mscb transactions up to
10 times. The corresponding set and get functions allow tuning this retry limit.
2) mscb_usb_timeout: the driver for the USB-MSCB adapter uses a timeout of 6 seconds.
mscb_set_usb_timeout() permits changing this value.
3) mscb_eth_max_retry: the driver for the Ethernet-MSCB adapter has to deal with UDP packet loss. If
the adapter does not respond to a UDP command, the UDP command is sent again, with a bigger
timeout (timeout = 100 * (retry+1), in ms), this is repeated up to 10 times. mscb_set_eth_max_retry()
permits adjusting this number of retries.
This is how it works for the usb interface:
int mscb_read(...)
for (retry=0; retry<mscb_max_retry; retry++)
mscb_exch()
musb_write(..., mscb_usb_timeout)
musb_read(..., mscb_usb_timeout)
This is how it works for the ethernet interface:
int mscb_read(...)
for (retry=0; retry<mscb_max_retry; retry++)
mscb_exch()
for (retry=0; retry<mscb_eth_max_retry; retry++)
send_udp_command()
wait_for_udp_response(timeout = 100 * (retry+1))
This is how the new functions are intended to be used:
...
int old = mscb_set_max_retry(2);
... do stuff ...
mscb_set_max_retry(old); // restore default value
svn revision 4356.
K.O. |
519
|
28 Oct 2008 |
Stefan Ritt | Info | mscb timeouts and retries | > A new set of functions was added to mscb.h to adjust mscb timeouts and retries to better match specific
> applications:
>
> + int EXPRT mscb_get_max_retry();
> + int EXPRT mscb_set_max_retry(int max_retry);
> + int EXPRT mscb_get_usb_timeout();
> + int EXPRT mscb_set_usb_timeout(int timeout);
> + int EXPRT mscb_get_eth_max_retry();
> + int EXPRT mscb_set_eth_max_retry(int eth_max_retry);
In the spirit of this, a variable retry scheme has been implemented in the mscbdev.c device driver. At the
MEG experiment, we have one mscb device which is pretty slow, while the others are fast. Therefore it is
necessary to have a per-device max retry count which can be different for different submasters. I moved
therefore the max_eth_retry variable into the mscb_fd structure and adjusted a few functions accordingly. I
did not bother with the other timeouts and retries, since I don't need this for the moment, but it would be
nice if they would be handled in the same way. Then I added code into mscbdev.c to read the retry variable
form the ODB under /Equipment/<name>/Settings/Device/<Name>/Retries. The default is 10, but it can be
changed and becomes valid after the program has been restarted. |
150
|
03 Oct 2004 |
Konstantin Olchanski | Info | mscb usb support for macosx | After a felicitous confuence of stellar bodies (Stefan, myself, some mscb hardware
and a mac laptop all in the same room for a few days), I wrote some MacOSX
code to support the MSCB-USB dongle using the native IoKit USB API. During testing,
I was able to communicate with an MSCB High voltage regulator module. I am now
commiting this code to CVS, warts and all (we can clean it up when somebody actually
uses it). Tested compilation on Linux (with libusb) and MacOSX (native
IoKit. MacOSX+libusb is possible but untested), Win32 should be unaffected by my changes,
but I could not test it.
K.O. |
391
|
29 Jun 2007 |
Konstantin Olchanski | Bug Fix | mscb, musbstd fixed on Linux, MacOS | I commited a few minor changes to musbstd and mscb code to make them work on
MacOSX (tested on 10.3.9) and Linux (tested on Fedora 6).
The basic functions work with the MSCB USB master, but I still need to
investigate some cases where the connection hangs and usb communications do not
work until the USB cable is unplugged and plugged back in. I see this problem
both on MacOS and Linux.
Important changes:
1) mscb_select_device() does not work on both Linux and MacOS and is disabled.
Please run "msc -d usb0".
2) on Linux, the Makefile should define -DOS_LINUX and -DHAVE_LIBUSB;
on MacOS, the Makefile should define -DOS_LINUX and -DOS_DARWIN. (This is
because MacOS is treated as a funny type of Linux).
3) when doing USB communications, one has to use the correct endpoint numbers,
which seem to be system dependant and for now, I hard code them in mscb.c for
the tested systems.
There supposed to be no changes to the Windows code, but I cannot test on
Windows, so if somebody does and finds breakage, please let me know.
K.O. |
392
|
02 Jul 2007 |
Stefan Ritt | Bug Fix | mscb, musbstd fixed on Linux, MacOS |
KO wrote: | There supposed to be no changes to the Windows code, but I cannot test on Windows, so if somebody does and finds breakage, please let me know. |
I can confirm that revision 3713 still works under Windows. |
394
|
06 Jul 2007 |
Konstantin Olchanski | Bug Fix | mscb, musbstd fixed on Linux, MacOS | > I commited a few minor changes to musbstd and mscb code...
>
> The basic functions work with the MSCB USB master, but I still need to
> investigate some cases where the connection hangs and usb communications do not
> work until the USB cable is unplugged and plugged back in. I see this problem
> both on MacOS and Linux.
I think I fixed the hangs we see on linux and macos - at the end all I had to do is
issue a usb reset to make mscb communicate again.
Also tested on Linux FC6 and SL4.5.
K.O. |
1173
|
30 Mar 2016 |
Belina von Krosigk | Forum | mserver ERR message saying data area 100% full, though it is free | Hi,
I have just installed Midas and set-up the ODB for a SuperCDMS test-facility (on
a SL6.7 machine). All works fine except that I receive the following error message:
[mserver,ERROR] [odb.c:944:db_validate_db,ERROR] Warning: database data area is
100% full
Which is puzzling for the following reason:
-> I have created the ODB with: odbedit -s 4194304
-> Checking the size of the .ODB.SHM it says: 4.2M
-> When I save the ODB as .xml and check the file's size it says: 1.1M
-> When I start odbedit and check the memory usage issuing 'mem', it says:
...
Free Key area: 1982136 bytes out of 2097152 bytes
...
Free Data area: 2020072 bytes out of 2097152 bytes
Free: 1982136 (94.5%) keylist, 2020072 (96.3%) data
So it seems like nearly all memory is still free. As a test I created more
instances of one of our front-ends and checked 'mem' again. As expected the free
memory was decreasing. I did this ten times in fact, reaching
...
Free Key area: 1440976 bytes out of 2097152 bytes
...
Free Data area: 1861264 bytes out of 2097152 bytes
Free: 1440976 (68.7%) keylist, 1861264 (88.8%) data
So I could use another >20% of the database data area, which is according to the
error message 100% (resp. >95%) full. Am I misunderstanding the error message?
I'd appreciate any comments or ideas on that subject!
Thanks, Belina |
2718
|
26 Feb 2024 |
Maia Henriksson-Ward | Forum | mserver ERR message saying data area 100% full, though it is free | > Hi,
>
> I have just installed Midas and set-up the ODB for a SuperCDMS test-facility (on
> a SL6.7 machine). All works fine except that I receive the following error message:
>
> [mserver,ERROR] [odb.c:944:db_validate_db,ERROR] Warning: database data area is
> 100% full
>
> Which is puzzling for the following reason:
>
> -> I have created the ODB with: odbedit -s 4194304
> -> Checking the size of the .ODB.SHM it says: 4.2M
> -> When I save the ODB as .xml and check the file's size it says: 1.1M
> -> When I start odbedit and check the memory usage issuing 'mem', it says:
> ...
> Free Key area: 1982136 bytes out of 2097152 bytes
> ...
> Free Data area: 2020072 bytes out of 2097152 bytes
> Free: 1982136 (94.5%) keylist, 2020072 (96.3%) data
>
> So it seems like nearly all memory is still free. As a test I created more
> instances of one of our front-ends and checked 'mem' again. As expected the free
> memory was decreasing. I did this ten times in fact, reaching
>
> ...
> Free Key area: 1440976 bytes out of 2097152 bytes
> ...
> Free Data area: 1861264 bytes out of 2097152 bytes
> Free: 1440976 (68.7%) keylist, 1861264 (88.8%) data
>
> So I could use another >20% of the database data area, which is according to the
> error message 100% (resp. >95%) full. Am I misunderstanding the error message?
> I'd appreciate any comments or ideas on that subject!
>
> Thanks, Belina
This is an old post, but I encountered the same error message recently and was looking for a
solution here. Here's how I solved it, for anyone else who finds this:
The size of .ODB.SHM was bigger than the maximum ODB size (4.2M > 4194304 in Belina's case). For us,
the very large odb size was in error and I suspect it happened because we forgot to shut down midas
cleanly before shutting the computer down. Using odbedit to load a previously saved copy of the ODB
did not help me to get .ODB.SHM back to a normal size. Following the instructions on the wiki for
recovery from a corrupted odb,
https://daq00.triumf.ca/MidasWiki/index.php/FAQ#How_to_recover_from_a_corrupted_ODB, (odbinit with --cleanup option) should
work, but didn't for me. Unfortunately I didn't save the output to figure out why. My solution was to manually delete/move/hide
the .ODB.SHM file, and an equally large file called .ODB.SHM.1701109528, then run odbedit again and reload that same saved copy of my ODB.
Manually changing files used by mserver is risky - for anyone who has the same problem, I suggest trying odbinit --cleanup -s
<yoursize> first. |
2543
|
21 Jun 2023 |
Gennaro Tortone | Bug Report | mserver and script execution |
Hi,
I have the following setup:
- MIDAS release: release/midas-2022-05-c
- host with MIDAS frontend (mclient)
- host with MIDAS server (mhttpd / mserver)
On mclient I run a frontend with:
./feodt5751 -h mserver -e develop -i 0
On mserver I see frontend ready and ODB variables in place;
I noticed a strange behavior with "/Programs/Execute on start run" and
"/Programs/Execute on stop run". In details the script to execute at start of run
is executed on "mserver" host but the script to execute at stop of run is executed on
"mclient" host (!)
Is this a bug or I'm missing some documentation links ?
Thanks in advance,
Gennaro |
2550
|
26 Jun 2023 |
Stefan Ritt | Bug Report | mserver and script execution | Indeed that could well be (and is certainly not intended like that). I checked the code
and found that "execute on start run" and "execute on stop run" are called inside
cm_transition(). That means they are executed on the computer which calls cm_transition().
If you use mhttpd and start a run through the web interface, then mhttpd runs on your
server and "execute on start run" gets executed on your server. If you stop the run
by your frontend running on the client machine (like if a certain number of events
is reached), then "execute on stop run" gets executed on your client.
An easy way around would not to use "/Equipment/Trigger/Common/Event limit" which
gets check by your frontend and therefore on the client computer, but use
"/Logger/Channels/0/Settings/Event limit" which gets checked by the logger and
therefore executed on the server computer.
Getting a consistent behaviour (like always executing scripts on the server) would
require a major rework of the run transition framework with probably many undesired
side-effects, so lots of debugging work.
Stefan |
2551
|
27 Jun 2023 |
Gennaro Tortone | Bug Report | mserver and script execution | Hi Stefan,
> Indeed that could well be (and is certainly not intended like that). I checked the code
> and found that "execute on start run" and "execute on stop run" are called inside
> cm_transition(). That means they are executed on the computer which calls cm_transition().
> If you use mhttpd and start a run through the web interface, then mhttpd runs on your
> server and "execute on start run" gets executed on your server. If you stop the run
> by your frontend running on the client machine (like if a certain number of events
> is reached), then "execute on stop run" gets executed on your client.
ok, this is clear to me...
> An easy way around would not to use "/Equipment/Trigger/Common/Event limit" which
> gets check by your frontend and therefore on the client computer, but use
> "/Logger/Channels/0/Settings/Event limit" which gets checked by the logger and
> therefore executed on the server computer.
we never used "/Equipment/Trigger/Common/Event limit" but we always used
"/Logger/Channels/0/Settings/Event limit"...
btw I did some tests and I understand that this issue is related to 'deferred transition'
on frontend. Indeed I disabled deferred transition on frontend side and now script
execution is carried out always on MIDAS server;
Cheers,
Gennaro |
2552
|
27 Jun 2023 |
Stefan Ritt | Bug Report | mserver and script execution | > btw I did some tests and I understand that this issue is related to 'deferred transition'
> on frontend. Indeed I disabled deferred transition on frontend side and now script
> execution is carried out always on MIDAS server;
Ah, that's clear now. In a deferred transition, the frontend finally stops the run (after the
condition is given to finish). Since the client calls cm_transition(), the script gets executed on
the client. Changing that would be a rather large rework of the code. So maybe better call a
script which executes another script via ssh on the server.
Stefan |
2386
|
24 Apr 2022 |
Konstantin Olchanski | Bug Fix | mserver buffer overrun and crash | There is a memory allocation bug in the mserver.
ALIGN8() was missing when receiving events from the event socket and data buffer
was allocated 4 bytes too short. but only for some received events and only in
very unlucky sequence of received events. result was a rare but obnoxious crash
of fevme frontend in alpha-2 at CERN. (we do not see any crash from this in
alpha-g or anywhere else, the best I can tell).
fixed in commit 4dc06ba47ff7caa5251fd8c48d8533f35799f3a6.
If you use the mserver, please update to this commit or apply following patch in
midas.cxx:
- int bufsize = sizeof(INT) + event_size;
+ int bufsize = sizeof(INT) + total_size;
K.O. |
2405
|
16 May 2022 |
Konstantin Olchanski | Bug Fix | mserver buffer overrun and crash | > There is a memory allocation bug in the mserver.
Fix for this problem introduced a new problem, an infinite loop in bm_flush_cache,
bitbucket bugs https://bitbucket.org/tmidas/midas/issues/339/infinite-loop-in-
mserver-due-to-mfes and https://bitbucket.org/tmidas/midas/issues/331/stuck-
semaphore-of-system-buffer
This is now fixed and the buffer write cache logic and size was rejigged
according to calculations in https://daq00.triumf.ca/elog-midas/Midas/2401
Event buffer write cache (as set via ODB Equipment/Common and via
bm_set_cache_size()) now take 2 possible values:
0 - write cache is disabled and
MIN_WRITE_CACHE_SIZE - (10 Mbytes) minimum permitted cache size
bigger cache size values are permitted, up to buffer_size/3, but probably not useful
if my calculations are right.
smaller cache size values are generally not useful, if my calculations are right.
mfe.c and tmfe c++ frontends updated to request the new write cache size by default.
if events are getting stuck in the write cache for too long, instead of reducing the
cache size, one should increase frequency of bm_flush_cache() calls (1/sec by
default).
commit 373bcc3ab7f83c3c7bf6c051c237de043a982502
K.O. |
654
|
08 Oct 2009 |
Tim Nicholls | Bug Report | mserver linking fails when using shared library | I have experienced a problem building MIDAS from the head of the SVN repository (rev 4458) when
specifying the shared library flag. Whie the shared library appears to compile and link OK, the
subsequent compilation of mserver fails as follows:
$ make ROOTSYS= NEED_SHLIB=1
<... snipped some lines ...>
ld -shared -o linux/lib/libmidas.so linux/lib/midas.o linux/lib/system.o linux/lib/mrpc.o
linux/lib/odb.o linux/lib/ybos.o linux/lib/ftplib.o linux/lib/mxml.o linux/lib/history_midas.o
linux/lib/history_sql.o linux/lib/history.o linux/lib/alarm.o linux/lib/elog.o linux/lib/strlcpy.o -lutil -
lpthread -lz -lc
cc -c -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB -
D_LARGEFILE64_SOURCE -DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o linux/lib/mana.o
src/mana.c
cc -c -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB -
D_LARGEFILE64_SOURCE -DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o
linux/lib/cnaf_callback.o src/cnaf_callback.c
cc -c -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB -
D_LARGEFILE64_SOURCE -DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o linux/lib/mfe.o
src/mfe.c
g++ -Dextname -DMANA_LITE -c -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -
Llinux/lib -DINCLUDE_FTPLIB -D_LARGEFILE64_SOURCE -DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-
unused-function -o linux/lib/fal.o src/fal.c
cc -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB -
D_LARGEFILE64_SOURCE -DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o
linux/bin/mserver src/mserver.c -lmidas -Wl,-rpath,/usr/local/lib -lutil -lpthread -lz
/usr/bin/ld: linux/bin/mserver: hidden symbol `__dso_handle' in /usr/lib/gcc/x86_64-redhat-
linux/4.1.2/crtbegin.o is referenced by DSO
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: ld returned 1 exit status
make: *** [linux/bin/mserver] Error 1
Having googled the error, it appears to be solved by modifying the linker statement for the shared
library in the Makefile at line 464 to use g++ rather than ld:
463c463
< ld -shared -o $@ $^ $(LIBS) -lc
---
> $(CXX) -shared -o $@ $^ $(LIBS) -lc
Presumably this is because g++ knows better how to link in the appropriate system libraries required
for some of the recently added C++ code?
This was on Scientific Linux SL5.2 x86_64, gcc version 4.1.2, glibc version 2.5-24.
Tim |
679
|
26 Nov 2009 |
Konstantin Olchanski | Bug Fix | mserver network routing fix | mserver update svn rev 4625 fixes an anomaly in the MIDAS RPC network code where
in some network configurations MIDAS mserver connections work, but some RPC
transactions, such as starting and stopping runs, do not (use the wrong network
names or are routed over the wrong network).
The problem is a possible discrepancy between network addresses used to
establish the mserver connection and the value of "/System/Clients/xxx/Host"
which is ultimately set to the value of "hostname" of the remote client. This
ODB setting is then used to establish additional network connections, for
example to start or stop runs.
Use the client "hostname" setting works well for standard configurations, when
there is only one network interface in the machine, with only one IP address,
and with "hostname" set to the value that this IP address resolves to using DNS.
However, if there are private networks, multiple network interfaces, or multiple
network routes between machines, "/System/Clients/xxx/Host" may become set to an
undesirable value resulting in asymmetrical network routing or complete failure
to establish RPC connections.
Svn rev 4625 updates mserver.c to automatically set "/System/clients/xxx/Host"
to the same network name as was used to establish the original mserver connection.
As always with networking, any fix always breaks something somewhere for
somebody, in which case the old behavior can be restored by "setenv
MIDAS_MSERVER_DO_NOT_USE_CALLBACK_ADDR 1" before starting mserver.
The specific problem fixed by this change is when the MIDAS client and server
are on machines connected by 2 separate networks ("client.triumf.ca" and
"client.daq"; "server.triumf.ca" and "server.daq"). The ".triumf.ca" network
carries the normal SSH, NFS, etc traffic, and the ".daq" network carries MIDAS
data traffic.
The client would use the "server.daq" name to connect to the server and this
traffic would go over the data network (good).
However, previously, the client "/System/Clients/xxx/Host" would be set to
"client.triumf.ca" and any reverse connections (i.e. RPC to start/stop runs)
would go over the normal ".triumf.ca" network (bad).
With this modification, mserver will set "/System/Clients/xxx/Host" to
"client.daq" (the IP address of the interface on the ".daq" network) and all
reverse connections would also go over the ".daq" network (good).
P.S. This modification definitely works only for the default "mserver -m" mode,
but I do not think this is a problem as using "-s" and "-t" modes is not
recommended, and the "-s" mode is definitely broken (see my previous message).
svn rev 4625
K.O. |
|