  Midas DAQ System
ID Date Author Topic Subject
  110   14 Nov 2003 Stefan Ritt more odb
Ok, I apologize. It's all ok. Thanks for clarifying. Concerning the asserts, it 
would be nice to be able to disable them in release code. Under Windows, 
assert() is actually a macro which expands to a no-op if NDEBUG is defined. I 
believe it's the same under Linux, but I don't know about VxWorks. So we have 
three options:

1) Keep asserts always. This might possibly slow down a DAQ system, but I'm not 
sure by how much. It might be negligible.

2) Disable asserts by default (standard make). Only the "experts" would enable them 
in the makefile (by removing NDEBUG), since only they know what to do with the 
assertion messages.

3) Let the user decide on the standard installation. Maybe have two libraries, 
one debug, one no-debug. The debug one can even have the compiler optimization 
disabled, which makes debugging easier.

So what is your opinion (comments from others are welcome as well) of which way 
to go? 
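
To illustrate the mechanism behind options 1 and 2, here is a minimal standalone sketch (not MIDAS code; check(), release_path() and debug_path() are made-up names). The C standard redefines assert() according to the current state of NDEBUG each time <assert.h> is included, so a single file can show both behaviours. Note that with NDEBUG defined the asserted expression is not evaluated at all, so an assert must not contain side effects the program relies on.

```c
static int checks_run = 0;               /* counts evaluations of the asserted expression */

static int check(int ok)
{
   checks_run++;
   return ok;
}

#define NDEBUG                            /* what a release build would pass as -DNDEBUG */
#include <assert.h>

int release_path(int run_number)
{
   assert(check(run_number >= 0));        /* compiled out: check() is never called */
   return run_number;
}

#undef NDEBUG                             /* debug build: assert() is active again */
#include <assert.h>

int debug_path(int run_number)
{
   assert(check(run_number >= 0));        /* evaluated; abort() if the condition is false */
   return run_number;
}
```

Calling release_path(-1) returns normally, while debug_path(-1) would abort with an assertion message.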
  107   31 Oct 2003 Konstantin Olchanski more odb "run number" error checking
I added error checking to the places where we read "/runinfo/run number". In
general, I do this:

  status = db_get_value("/runinfo/run number", &run_number);
  assert(status == SUCCESS);
  assert(run_number >= 0);  /* or run_number > 0, where appropriate */

Here is the rationale: if we cannot read the run number, something must be
very terribly wrong. I cannot think of any recovery action other than
abort() and make a core dump for our debugging enjoyment.

I considered and rejected adding a "retry" loop: if we allow db_get_value()
to intermittently fail, then every use of it has to be wrapped in a retry
loop, which then should be inside db_get_value(), making it pointless to
have external "retry" loops.

I am now pondering proposing a "db_get_value_cannot_possibly_fail()"
function (it would abort(), exit() with an error or commit harakiri if it
can't get the value). The way most db_xxx() functions are used in midas,
maybe they should be made "void" and "infallible", with "STATUS
db_xxx_yes_I_can_fail_and_return_an_error_code()" evil twins. I guess this
is why "they" invented C/C++ exceptions. Anyway, something to think about.
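
A minimal sketch of the "cannot possibly fail" idea, using a stand-in lookup instead of the real ODB API. demo_get_int(), demo_get_int_or_die() and the DEMO_* status codes are hypothetical names for illustration, not MIDAS functions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_SUCCESS 1
#define DEMO_NO_KEY  2

/* Stand-in for a status-returning getter: it knows exactly one key. */
int demo_get_int(const char *key, int *value)
{
   assert(key != NULL && value != NULL);
   if (strcmp(key, "/runinfo/run number") == 0) {
      *value = 42;
      return DEMO_SUCCESS;
   }
   return DEMO_NO_KEY;
}

/* "Cannot fail" twin: returns the value directly and aborts on any error. */
int demo_get_int_or_die(const char *key)
{
   int value = 0;
   int status = demo_get_int(key, &value);
   if (status != DEMO_SUCCESS) {
      fprintf(stderr, "cannot read \"%s\", status %d\n", key, status);
      abort();                            /* typically leaves a core dump for debugging */
   }
   return value;
}
```

Unlike assert(), the explicit abort() in the wrapper stays active when the code is built with NDEBUG.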

Affected files:
src/lazylogger.c
src/odbedit.c
src/mlogger.c
src/mfe.c
src/odb.c
src/mana.c
src/midas.c
src/mhttpd.c

K.O.
  2045   30 Nov 2020 Konstantin Olchanski Info more wisdom from linux kernel people
As you may know, I am a big fan of two software projects - the linux kernel and ROOT. The linux kernel is one of 
the few software projects "done right". ROOT is where normal people try to "get it right" with real-world level 
of success. I use both daily and I try to apply their ways and methods to MIDAS as much as I can.

So just in time for our discussion of array indexes, a talk by gregkh shows
up on slashdot. The title is "how to keep your users happy". (Nobody
ever wants to be nasty to their users, but do read his talk).

https://git.sr.ht/~gregkh/presentation-application_summit/tree/main/keep_users_happy.pdf

The talk refers to some older stuff that is still relevant, of course. In case you miss the links
in the pdf file, here they are:

https://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html
https://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
https://ozlabs.org/~rusty/ols-2003-keynote/img0.html (click on "continue" to see next page)

K.O.
  367   09 Apr 2007 Konstantin Olchanski Info move history, elog and alarm functions into separate files
As approved by Stefan, I moved the history (hs_xxx), alarm (al_xxx) and elog (el_xxx) functions out of 
midas.c into separate files. Committed as revision 3665. This change should be transparent to all users. 
K.O.
  513   22 Oct 2008 Konstantin Olchanski Info mscb timeouts and retries
A new set of functions was added to mscb.h to adjust mscb timeouts and retries to better match specific 
applications:

+   int EXPRT mscb_get_max_retry();
+   int EXPRT mscb_set_max_retry(int max_retry);
+   int EXPRT mscb_get_usb_timeout();
+   int EXPRT mscb_set_usb_timeout(int timeout);
+   int EXPRT mscb_get_eth_max_retry();
+   int EXPRT mscb_set_eth_max_retry(int eth_max_retry);

There are 3 settings:

1) mscb_max_retry: most (all?) mscb operations, like mscb_read(), retry failed mscb transactions up to 
10 times. The corresponding set and get functions allow tuning this retry limit.

2) mscb_usb_timeout: the driver for the USB-MSCB adapter uses a timeout of 6 seconds. 
mscb_set_usb_timeout() permits changing this value.

3) mscb_eth_max_retry: the driver for the Ethernet-MSCB adapter has to deal with UDP packet loss. If 
the adapter does not respond to a UDP command, the UDP command is sent again with a bigger 
timeout (timeout = 100 * (retry+1), in ms). This is repeated up to 10 times. mscb_set_eth_max_retry() 
permits adjusting this number of retries.

This is how it works for the usb interface:

int mscb_read(...)
   for (retry=0; retry<mscb_max_retry; retry++)
       mscb_exch()
            musb_write(..., mscb_usb_timeout)
            musb_read(..., mscb_usb_timeout)     

This is how it works for the ethernet interface:

int mscb_read(...)
   for (retry=0; retry<mscb_max_retry; retry++)
       mscb_exch()
            for (eth_retry=0; eth_retry<mscb_eth_max_retry; eth_retry++)
                 send_udp_command()
                 wait_for_udp_response(timeout = 100 * (eth_retry+1))
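
As a worked example of the numbers above: if every UDP attempt times out, one mscb_exch() over Ethernet waits 100 + 200 + ... ms in total. The sketch below computes that sum, assuming the timeout formula exactly as quoted and ignoring any other processing time. The function names are illustrative and not part of mscb.h.

```c
#include <assert.h>

/* Worst-case wait in ms for one exchange when all attempts time out,
   using timeout = 100 * (retry + 1) ms for retry = 0 .. eth_max_retry-1. */
long eth_worst_case_ms(int eth_max_retry)
{
   long total = 0;
   int retry;
   assert(eth_max_retry >= 0);
   for (retry = 0; retry < eth_max_retry; retry++)
      total += 100L * (retry + 1);
   return total;
}

/* Worst case for one mscb_read(): the outer loop repeats the whole exchange. */
long read_worst_case_ms(int max_retry, int eth_max_retry)
{
   assert(max_retry >= 0);
   return (long) max_retry * eth_worst_case_ms(eth_max_retry);
}
```

With both limits at 10 this gives 5500 ms per exchange and 55000 ms per read; lowering the outer limit to 2 brings the read down to 11000 ms.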

This is how the new functions are intended to be used:
   ...
   int old = mscb_set_max_retry(2);
   ... do stuff ...
   mscb_set_max_retry(old); // restore previous value
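
The save-and-restore pattern above relies on the setter returning the previous value, as the snippet implies. Here is a self-contained sketch with stand-in functions (demo_set_max_retry(), demo_get_max_retry() and with_fewer_retries() are hypothetical; the real functions are declared in mscb.h):

```c
#include <assert.h>

static int demo_max_retry = 10;           /* stand-in for the library's internal setting */

/* Sets a new retry limit and returns the previous one (assumed behaviour). */
int demo_set_max_retry(int max_retry)
{
   int old = demo_max_retry;
   assert(max_retry >= 0);
   demo_max_retry = max_retry;
   return old;
}

int demo_get_max_retry(void)
{
   return demo_max_retry;
}

/* Runs one step with a temporary retry limit, then restores the caller's value. */
int with_fewer_retries(int temporary_limit)
{
   int old = demo_set_max_retry(temporary_limit);
   int seen = demo_get_max_retry();       /* ... do stuff with the temporary limit ... */
   demo_set_max_retry(old);               /* restore whatever was set before */
   return seen;
}
```

Restoring the saved value rather than a hard-coded 10 keeps the pattern correct when the caller had already changed the limit.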

svn revision 4356.
K.O.
  519   28 Oct 2008 Stefan Ritt Info mscb timeouts and retries
> A new set of functions was added to mscb.h to adjust mscb timeouts and retries to better match specific 
> applications:
> 
> +   int EXPRT mscb_get_max_retry();
> +   int EXPRT mscb_set_max_retry(int max_retry);
> +   int EXPRT mscb_get_usb_timeout();
> +   int EXPRT mscb_set_usb_timeout(int timeout);
> +   int EXPRT mscb_get_eth_max_retry();
> +   int EXPRT mscb_set_eth_max_retry(int eth_max_retry);

In the spirit of this, a variable retry scheme has been implemented in the mscbdev.c device driver. At the 
MEG experiment, we have one mscb device which is pretty slow, while the others are fast. Therefore it is 
necessary to have a per-device max retry count which can be different for different submasters. I therefore 
moved the max_eth_retry variable into the mscb_fd structure and adjusted a few functions accordingly. I 
did not bother with the other timeouts and retries, since I don't need this for the moment, but it would be 
nice if they were handled in the same way. Then I added code to mscbdev.c to read the retry variable 
from the ODB under /Equipment/<name>/Settings/Device/<Name>/Retries. The default is 10, but it can be 
changed; the new value takes effect after the program has been restarted. 
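
For illustration, the ODB path pattern above can be assembled per device as sketched below. retries_odb_path() is a hypothetical helper, not a function from mscbdev.c, and the equipment and device names in the usage note are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "/Equipment/<equipment>/Settings/Device/<device>/Retries" into buf.
   Returns 1 on success, 0 if the buffer is too small. */
int retries_odb_path(char *buf, size_t size, const char *equipment, const char *device)
{
   int n;
   assert(buf != NULL && equipment != NULL && device != NULL);
   assert(strlen(equipment) > 0 && strlen(device) > 0);
   n = snprintf(buf, size, "/Equipment/%s/Settings/Device/%s/Retries", equipment, device);
   return n >= 0 && (size_t) n < size;
}
```

For example, equipment "SlowControl" with device "MSCB0" yields "/Equipment/SlowControl/Settings/Device/MSCB0/Retries".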
  150   03 Oct 2004 Konstantin Olchanski Info mscb usb support for macosx
After a felicitous confluence of stellar bodies (Stefan, myself, some mscb hardware
and a mac laptop all in the same room for a few days), I wrote some MacOSX
code to support the MSCB-USB dongle using the native IoKit USB API. During testing,
I was able to communicate with an MSCB high voltage regulator module. I am now
committing this code to CVS, warts and all (we can clean it up when somebody actually
uses it). Tested compilation on Linux (with libusb) and MacOSX (native
IoKit; MacOSX+libusb is possible but untested). Win32 should be unaffected by my changes,
but I could not test it.
K.O.
  391   29 Jun 2007 Konstantin Olchanski Bug Fix mscb, musbstd fixed on Linux, MacOS
I committed a few minor changes to musbstd and mscb code to make them work on
MacOSX (tested on 10.3.9) and Linux (tested on Fedora 6).

The basic functions work with the MSCB USB master, but I still need to
investigate some cases where the connection hangs and usb communications do not
work until the USB cable is unplugged and plugged back in. I see this problem
both on MacOS and Linux.

Important changes:
1) mscb_select_device() does not work on either Linux or MacOS and is disabled.
Please run "msc -d usb0".
2) on Linux, the Makefile should define -DOS_LINUX and -DHAVE_LIBUSB;
   on MacOS, the Makefile should define -DOS_LINUX and -DOS_DARWIN. (This is
because MacOS is treated as a funny type of Linux).
3) when doing USB communications, one has to use the correct endpoint numbers,
which seem to be system dependent, and for now I hard code them in mscb.c for
the tested systems.

There are supposed to be no changes to the Windows code, but I cannot test on
Windows, so if somebody does and finds breakage, please let me know.

K.O.
  392   02 Jul 2007 Stefan Ritt Bug Fix mscb, musbstd fixed on Linux, MacOS

KO wrote:
There are supposed to be no changes to the Windows code, but I cannot test on Windows, so if somebody does and finds breakage, please let me know.


I can confirm that revision 3713 still works under Windows.
  394   06 Jul 2007 Konstantin Olchanski Bug Fix mscb, musbstd fixed on Linux, MacOS
> I committed a few minor changes to musbstd and mscb code...
>
> The basic functions work with the MSCB USB master, but I still need to
> investigate some cases where the connection hangs and usb communications do not
> work until the USB cable is unplugged and plugged back in. I see this problem
> both on MacOS and Linux.

I think I fixed the hangs we see on Linux and MacOS. In the end, all I had to do was
issue a usb reset to make mscb communicate again.

Also tested on Linux FC6 and SL4.5.

K.O.
  1173   30 Mar 2016 Belina von Krosigk Forum mserver ERR message saying data area 100% full, though it is free
Hi,

I have just installed Midas and set up the ODB for a SuperCDMS test facility (on
an SL6.7 machine). All works fine except that I receive the following error message:

[mserver,ERROR] [odb.c:944:db_validate_db,ERROR] Warning: database data area is
100% full

Which is puzzling for the following reason:

-> I have created the ODB with: odbedit -s 4194304
-> Checking the size of the .ODB.SHM it says: 4.2M
-> When I save the ODB as .xml and check the file's size it says: 1.1M
-> When I start odbedit and check the memory usage issuing 'mem', it says: 
...
Free Key area: 1982136 bytes out of 2097152 bytes
...
Free Data area: 2020072 bytes out of 2097152 bytes
Free: 1982136 (94.5%) keylist, 2020072 (96.3%) data

So it seems like nearly all memory is still free. As a test I created more
instances of one of our front-ends and checked 'mem' again. As expected the free
memory was decreasing. I did this ten times in fact, reaching

...
Free Key area: 1440976 bytes out of 2097152 bytes
...
Free Data area: 1861264 bytes out of 2097152 bytes
Free: 1440976 (68.7%) keylist, 1861264 (88.8%) data

So I could use another >20% of the database data area, which is according to the
error message 100% (resp. >95%) full. Am I misunderstanding the error message?
I'd appreciate any comments or ideas on that subject!
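
For reference, the numbers quoted above can be cross-checked with a few lines of arithmetic. This is only a sketch with illustrative helper names, and it does not explain the warning itself. It shows that the key and data areas reported by 'mem' each hold half of the 4194304 bytes requested with -s, that 4194304 bytes is exactly 4 MiB (about 4.19 MB in 1000-based units, so how "4.2M" compares depends on which convention the reporting tool uses), and that the free-space percentages match the 'mem' output.

```c
#include <assert.h>

/* Percentage of an area that is still free, e.g. from the odbedit 'mem' output. */
double percent_free(long free_bytes, long total_bytes)
{
   assert(total_bytes > 0 && free_bytes >= 0 && free_bytes <= total_bytes);
   return 100.0 * (double) free_bytes / (double) total_bytes;
}

/* Size in 1024-based units (MiB). */
double bytes_to_mib(long bytes)
{
   assert(bytes >= 0);
   return (double) bytes / (1024.0 * 1024.0);
}

/* Size in 1000-based units (MB). */
double bytes_to_mb(long bytes)
{
   assert(bytes >= 0);
   return (double) bytes / 1.0e6;
}
```

percent_free(2020072, 2097152) comes out at about 96.3 and percent_free(1861264, 2097152) at about 88.8, in agreement with the listing.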

Thanks, Belina
  2718   26 Feb 2024 Maia Henriksson-Ward Forum mserver ERR message saying data area 100% full, though it is free
> Hi,
> 
> I have just installed Midas and set up the ODB for a SuperCDMS test facility (on
> an SL6.7 machine). All works fine except that I receive the following error message:
> 
> [mserver,ERROR] [odb.c:944:db_validate_db,ERROR] Warning: database data area is
> 100% full
> 
> Which is puzzling for the following reason:
> 
> -> I have created the ODB with: odbedit -s 4194304
> -> Checking the size of the .ODB.SHM it says: 4.2M
> -> When I save the ODB as .xml and check the file's size it says: 1.1M
> -> When I start odbedit and check the memory usage issuing 'mem', it says: 
> ...
> Free Key area: 1982136 bytes out of 2097152 bytes
> ...
> Free Data area: 2020072 bytes out of 2097152 bytes
> Free: 1982136 (94.5%) keylist, 2020072 (96.3%) data
> 
> So it seems like nearly all memory is still free. As a test I created more
> instances of one of our front-ends and checked 'mem' again. As expected the free
> memory was decreasing. I did this ten times in fact, reaching
> 
> ...
> Free Key area: 1440976 bytes out of 2097152 bytes
> ...
> Free Data area: 1861264 bytes out of 2097152 bytes
> Free: 1440976 (68.7%) keylist, 1861264 (88.8%) data
> 
> So I could use another >20% of the database data area, which is according to the
> error message 100% (resp. >95%) full. Am I misunderstanding the error message?
> I'd appreciate any comments or ideas on that subject!
> 
> Thanks, Belina

This is an old post, but I encountered the same error message recently and was looking for a 
solution here. Here's how I solved it, for anyone else who finds this: 
The size of .ODB.SHM was bigger than the maximum ODB size (4.2M > 4194304 in Belina's case). For us, 
the very large odb size was in error and I suspect it happened because we forgot to shut down midas 
cleanly before shutting the computer down. Using odbedit to load a previously saved copy of the ODB 
did not help me to get .ODB.SHM back to a normal size. Following the instructions on the wiki for 
recovery from a corrupted odb, 
https://daq00.triumf.ca/MidasWiki/index.php/FAQ#How_to_recover_from_a_corrupted_ODB, (odbinit with --cleanup option) should 
work, but didn't for me. Unfortunately I didn't save the output to figure out why. My solution was to manually delete/move/hide 
the .ODB.SHM file, and an equally large file called .ODB.SHM.1701109528, then run odbedit again and reload that same saved copy of my ODB. 
Manually changing files used by mserver is risky - for anyone who has the same problem, I suggest trying odbinit --cleanup -s 
<yoursize> first.
  2543   21 Jun 2023 Gennaro Tortone Bug Report mserver and script execution
Hi,
I have the following setup:

- MIDAS release: release/midas-2022-05-c
- host with MIDAS frontend (mclient)
- host with MIDAS server (mhttpd / mserver)

On mclient I run a frontend with:

./feodt5751 -h mserver -e develop -i 0

On mserver I see frontend ready and ODB variables in place;

I noticed a strange behavior with "/Programs/Execute on start run" and 
"/Programs/Execute on stop run". In detail, the script to execute at start of run
is executed on the "mserver" host, but the script to execute at stop of run is executed on
the "mclient" host (!)

Is this a bug, or am I missing some documentation?

Thanks in advance,
Gennaro
  2550   26 Jun 2023 Stefan Ritt Bug Report mserver and script execution
Indeed that could well be (and is certainly not intended like that). I checked the code
and found that "execute on start run" and "execute on stop run" are called inside
cm_transition(). That means they are executed on the computer which calls cm_transition().
If you use mhttpd and start a run through the web interface, then mhttpd runs on your
server and "execute on start run" gets executed on your server. If you stop the run
by your frontend running on the client machine (like if a certain number of events 
is reached), then "execute on stop run" gets executed on your client.

An easy workaround would be not to use "/Equipment/Trigger/Common/Event limit", which
gets checked by your frontend and therefore on the client computer, but to use 
"/Logger/Channels/0/Settings/Event limit", which gets checked by the logger and
therefore executed on the server computer.

Getting a consistent behaviour (like always executing scripts on the server) would
require a major rework of the run transition framework with probably many undesired
side-effects, so lots of debugging work.

Stefan
  2551   27 Jun 2023 Gennaro Tortone Bug Report mserver and script execution
Hi Stefan,

> Indeed that could well be (and is certainly not intended like that). I checked the code
> and found that "execute on start run" and "execute on stop run" are called inside
> cm_transition(). That means they are executed on the computer which calls cm_transition().
> If you use mhttpd and start a run through the web interface, then mhttpd runs on your
> server and "execute on start run" gets executed on your server. If you stop the run
> by your frontend running on the client machine (like if a certain number of events 
> is reached), then "execute on stop run" gets executed on your client.

ok, this is clear to me...

> An easy workaround would be not to use "/Equipment/Trigger/Common/Event limit", which
> gets checked by your frontend and therefore on the client computer, but to use 
> "/Logger/Channels/0/Settings/Event limit", which gets checked by the logger and
> therefore executed on the server computer.

we never used "/Equipment/Trigger/Common/Event limit" but we always used
"/Logger/Channels/0/Settings/Event limit"...

btw I did some tests and I understand that this issue is related to 'deferred transition'
on the frontend. Indeed, I disabled deferred transition on the frontend side and now script
execution is always carried out on the MIDAS server.

Cheers,
Gennaro
  2552   27 Jun 2023 Stefan Ritt Bug Report mserver and script execution
> btw I did some tests and I understand that this issue is related to 'deferred transition'
> on the frontend. Indeed, I disabled deferred transition on the frontend side and now script
> execution is always carried out on the MIDAS server.

Ah, that's clear now. In a deferred transition, the frontend finally stops the run (once the 
condition to finish is met). Since the client calls cm_transition(), the script gets executed on 
the client. Changing that would be a rather large rework of the code. So it may be better to call a 
script which executes another script via ssh on the server.

Stefan
  2386   24 Apr 2022 Konstantin Olchanski Bug Fix mserver buffer overrun and crash
There is a memory allocation bug in the mserver.

ALIGN8() was missing when receiving events from the event socket, and the data buffer 
was allocated 4 bytes too short, but only for some received events and only in 
a very unlucky sequence of received events. The result was a rare but obnoxious crash 
of the fevme frontend in alpha-2 at CERN. (We do not see any crash from this in 
alpha-g or anywhere else, as best I can tell.)

Fixed in commit 4dc06ba47ff7caa5251fd8c48d8533f35799f3a6.

If you use the mserver, please update to this commit or apply the following patch in 
midas.cxx:

-   int bufsize = sizeof(INT) + event_size;
+   int bufsize = sizeof(INT) + total_size;
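
To see where the 4 bytes can come from, here is a sketch of the size arithmetic. DEMO_ALIGN8 is a local macro that rounds up to a multiple of 8 and is assumed to behave like ALIGN8(). Treating total_size as the 8-byte-aligned event size is likewise an assumption based on the description above, and int stands in for INT. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of 8 (assumed equivalent of ALIGN8()). */
#define DEMO_ALIGN8(x) (((x) + 7) & ~((size_t) 7))

/* Buffer size computed from the unaligned event size (before the fix). */
size_t bufsize_before_fix(size_t event_size)
{
   return sizeof(int) + event_size;
}

/* Buffer size computed from the aligned size (after the fix). */
size_t bufsize_after_fix(size_t event_size)
{
   return sizeof(int) + DEMO_ALIGN8(event_size);
}

/* Number of bytes by which the old calculation was too small. */
size_t shortfall(size_t event_size)
{
   size_t before = bufsize_before_fix(event_size);
   size_t after = bufsize_after_fix(event_size);
   assert(after >= before);
   return after - before;
}
```

Under these assumptions an event size that is a multiple of 8 gives no shortfall, while a size that is a multiple of 4 but not of 8 (for example 20) comes out 4 bytes short, which would fit the "only for some received events" observation.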

K.O.
  2405   16 May 2022 Konstantin Olchanski Bug Fix mserver buffer overrun and crash
> There is a memory allocation bug in the mserver.

The fix for this problem introduced a new problem, an infinite loop in bm_flush_cache, 
bitbucket bugs https://bitbucket.org/tmidas/midas/issues/339/infinite-loop-in-mserver-due-to-mfes
and https://bitbucket.org/tmidas/midas/issues/331/stuck-semaphore-of-system-buffer

This is now fixed and the buffer write cache logic and size was rejigged
according to calculations in https://daq00.triumf.ca/elog-midas/Midas/2401

The event buffer write cache size (as set via ODB Equipment/Common and via 
bm_set_cache_size()) now takes 2 possible values:
0 - write cache is disabled, and
MIN_WRITE_CACHE_SIZE (10 Mbytes) - minimum permitted cache size.
Bigger cache size values are permitted, up to buffer_size/3, but are probably not useful 
if my calculations are right.
Smaller cache size values are generally not useful, if my calculations are right.
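
The rule above can be written down as a simple check. This is a sketch only: cache_size_ok() and DEMO_MIN_WRITE_CACHE_SIZE are illustrative names, the exact byte value behind "10 Mbytes" is assumed to be 10,000,000, and the sketch says nothing about how MIDAS itself treats a value outside the permitted range.

```c
#include <assert.h>
#include <stddef.h>

/* "10 Mbytes" from the text; the exact byte count is an assumption. */
#define DEMO_MIN_WRITE_CACHE_SIZE ((size_t) 10 * 1000 * 1000)

/* Returns 1 if cache_size follows the rule described above:
   0 (cache disabled), or between the minimum and buffer_size/3. */
int cache_size_ok(size_t cache_size, size_t buffer_size)
{
   assert(buffer_size > 0);
   if (cache_size == 0)
      return 1;                           /* write cache disabled */
   return cache_size >= DEMO_MIN_WRITE_CACHE_SIZE && cache_size <= buffer_size / 3;
}
```

Note that under this rule a non-zero cache only fits into buffers of at least three times the minimum cache size.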

mfe.c and tmfe c++ frontends updated to request the new write cache size by default.

if events are getting stuck in the write cache for too long, instead of reducing the 
cache size, one should increase frequency of bm_flush_cache() calls (1/sec by 
default).

commit 373bcc3ab7f83c3c7bf6c051c237de043a982502

K.O.
  654   08 Oct 2009 Tim Nicholls Bug Report mserver linking fails when using shared library
I have experienced a problem building MIDAS from the head of the SVN repository (rev 4458) when 
specifying the shared library flag. While the shared library appears to compile and link OK, the 
subsequent compilation of mserver fails as follows:

$ make ROOTSYS= NEED_SHLIB=1

<... snipped some lines ...>

ld -shared -o linux/lib/libmidas.so linux/lib/midas.o linux/lib/system.o linux/lib/mrpc.o linux/lib/odb.o linux/lib/ybos.o linux/lib/ftplib.o linux/lib/mxml.o linux/lib/history_midas.o linux/lib/history_sql.o linux/lib/history.o linux/lib/alarm.o linux/lib/elog.o linux/lib/strlcpy.o -lutil -lpthread -lz -lc
cc -c -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB   -D_LARGEFILE64_SOURCE -DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o linux/lib/mana.o src/mana.c
cc -c -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB   -D_LARGEFILE64_SOURCE -DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o linux/lib/cnaf_callback.o src/cnaf_callback.c
cc -c -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB   -D_LARGEFILE64_SOURCE -DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o linux/lib/mfe.o src/mfe.c
g++ -Dextname -DMANA_LITE -c -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB   -D_LARGEFILE64_SOURCE -DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o linux/lib/fal.o src/fal.c
cc -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB   -D_LARGEFILE64_SOURCE -DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o linux/bin/mserver src/mserver.c -lmidas -Wl,-rpath,/usr/local/lib -lutil -lpthread -lz
/usr/bin/ld: linux/bin/mserver: hidden symbol `__dso_handle' in /usr/lib/gcc/x86_64-redhat-linux/4.1.2/crtbegin.o is referenced by DSO
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: ld returned 1 exit status
make: *** [linux/bin/mserver] Error 1

Having googled the error, it appears to be solved by modifying the linker statement for the shared 
library in the Makefile at line 463 to use g++ rather than ld:

463c463
< 	ld -shared -o $@ $^ $(LIBS) -lc
---
> 	$(CXX) -shared -o $@ $^ $(LIBS) -lc

Presumably this is because g++ knows better how to link in the appropriate system libraries required 
for some of the recently added C++ code? 

This was on Scientific Linux SL5.2 x86_64, gcc version 4.1.2, glibc version 2.5-24.

Tim
  679   26 Nov 2009 Konstantin Olchanski Bug Fix mserver network routing fix
mserver update svn rev 4625 fixes an anomaly in the MIDAS RPC network code where
in some network configurations MIDAS mserver connections work, but some RPC
transactions, such as starting and stopping runs, do not (use the wrong network
names or are routed over the wrong network).

The problem is a possible discrepancy between network addresses used to
establish the mserver connection and the value of "/System/Clients/xxx/Host"
which is ultimately set to the value of "hostname" of the remote client. This
ODB setting is then used to establish additional network connections, for
example to start or stop runs.

Using the client "hostname" setting works well for standard configurations, when
there is only one network interface in the machine, with only one IP address,
and with "hostname" set to the value that this IP address resolves to using DNS.

However, if there are private networks, multiple network interfaces, or multiple
network routes between machines, "/System/Clients/xxx/Host" may become set to an
undesirable value resulting in asymmetrical network routing or complete failure
to establish RPC connections.

Svn rev 4625 updates mserver.c to automatically set "/System/Clients/xxx/Host"
to the same network name as was used to establish the original mserver connection.

As always with networking, any fix always breaks something somewhere for
somebody, in which case the old behavior can be restored by "setenv
MIDAS_MSERVER_DO_NOT_USE_CALLBACK_ADDR 1" before starting mserver.

The specific problem fixed by this change is when the MIDAS client and server
are on machines connected by 2 separate networks ("client.triumf.ca" and
"client.daq"; "server.triumf.ca" and "server.daq"). The ".triumf.ca" network
carries the normal SSH, NFS, etc traffic, and the ".daq" network carries MIDAS
data traffic.

The client would use the "server.daq" name to connect to the server and this
traffic would go over the data network (good).

However, previously, the client "/System/Clients/xxx/Host" would be set to
"client.triumf.ca" and any reverse connections (i.e. RPC to start/stop runs)
would go over the normal ".triumf.ca" network (bad).

With this modification, mserver will set "/System/Clients/xxx/Host" to
"client.daq" (the IP address of the interface on the ".daq" network) and all
reverse connections would also go over the ".daq" network (good).
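
The selection logic described above can be sketched as follows. This is an illustration, not the code from mserver.c: choose_client_host() and old_behaviour_requested() are hypothetical names, and treating any non-empty value of the environment variable as "set" is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Picks the name to store in "/System/Clients/xxx/Host".
   hostname:      what the remote client reports as its "hostname"
   callback_name: the network name the mserver connection was established with
   use_old:       non-zero restores the old behaviour (use "hostname") */
const char *choose_client_host(const char *hostname, const char *callback_name, int use_old)
{
   assert(hostname != NULL && callback_name != NULL);
   return use_old ? hostname : callback_name;
}

/* Reads the opt-out switch from the environment (non-empty value assumed to mean "on"). */
int old_behaviour_requested(void)
{
   const char *v = getenv("MIDAS_MSERVER_DO_NOT_USE_CALLBACK_ADDR");
   return v != NULL && v[0] != '\0';
}
```

In the two-network example, choose_client_host("client.triumf.ca", "client.daq", old_behaviour_requested()) returns "client.daq" unless the environment variable is set.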

P.S. This modification definitely works only for the default "mserver -m" mode,
but I do not think this is a problem as using "-s" and "-t" modes is not
recommended, and the "-s" mode is definitely broken (see my previous message).

svn rev 4625
K.O.