ELOG Midas

Back Midas Rome Roody Rootana

Midas DAQ System, Page 132 of 152

Not logged in

Find | Login | Help

New entries since:

Wed Dec 31 16:00:00 1969

Full | Summary | Threaded | Hide attachments

3027 Entries

Goto page Previous 1, 2, 3 ... 131, 132, 133 ... 150, 151, 152 Next

mscb, musbstd fixed on Linux, MacOS

KO wrote:

There supposed to be no changes to the Windows code, but I cannot test on Windows, so if somebody does and finds breakage, please let me know.

I can confirm that revision 3713 still works under Windows.

mscb, musbstd fixed on Linux, MacOS

> I commited a few minor changes to musbstd and mscb code...
>
> The basic functions work with the MSCB USB master, but I still need to
> investigate some cases where the connection hangs and usb communications do not
> work until the USB cable is unplugged and plugged back in. I see this problem
> both on MacOS and Linux.

I think I fixed the hangs we see on linux and macos - at the end all I had to do is
issue a usb reset to make mscb communicate again.

Also tested on Linux FC6 and SL4.5.

K.O.

mserver ERR message saying data area 100% full, though it is free

Hi,

I have just installed Midas and set-up the ODB for a SuperCDMS test-facility (on
a SL6.7 machine). All works fine except that I receive the following error message:

[mserver,ERROR] [odb.c:944:db_validate_db,ERROR] Warning: database data area is
100% full

Which is puzzling for the following reason:

-> I have created the ODB with: odbedit -s 4194304
-> Checking the size of the .ODB.SHM it says: 4.2M
-> When I save the ODB as .xml and check the file's size it says: 1.1M
-> When I start odbedit and check the memory usage issuing 'mem', it says: 
...
Free Key area: 1982136 bytes out of 2097152 bytes
...
Free Data area: 2020072 bytes out of 2097152 bytes
Free: 1982136 (94.5%) keylist, 2020072 (96.3%) data

So it seems like nearly all memory is still free. As a test I created more
instances of one of our front-ends and checked 'mem' again. As expected the free
memory was decreasing. I did this ten times in fact, reaching

...
Free Key area: 1440976 bytes out of 2097152 bytes
...
Free Data area: 1861264 bytes out of 2097152 bytes
Free: 1440976 (68.7%) keylist, 1861264 (88.8%) data

So I could use another >20% of the database data area, which is according to the
error message 100% (resp. >95%) full. Am I misunderstanding the error message?
I'd appreciate any comments or ideas on that subject!

Thanks, Belina

mserver ERR message saying data area 100% full, though it is free

> Hi,
> 
> I have just installed Midas and set-up the ODB for a SuperCDMS test-facility (on
> a SL6.7 machine). All works fine except that I receive the following error message:
> 
> [mserver,ERROR] [odb.c:944:db_validate_db,ERROR] Warning: database data area is
> 100% full
> 
> Which is puzzling for the following reason:
> 
> -> I have created the ODB with: odbedit -s 4194304
> -> Checking the size of the .ODB.SHM it says: 4.2M
> -> When I save the ODB as .xml and check the file's size it says: 1.1M
> -> When I start odbedit and check the memory usage issuing 'mem', it says: 
> ...
> Free Key area: 1982136 bytes out of 2097152 bytes
> ...
> Free Data area: 2020072 bytes out of 2097152 bytes
> Free: 1982136 (94.5%) keylist, 2020072 (96.3%) data
> 
> So it seems like nearly all memory is still free. As a test I created more
> instances of one of our front-ends and checked 'mem' again. As expected the free
> memory was decreasing. I did this ten times in fact, reaching
> 
> ...
> Free Key area: 1440976 bytes out of 2097152 bytes
> ...
> Free Data area: 1861264 bytes out of 2097152 bytes
> Free: 1440976 (68.7%) keylist, 1861264 (88.8%) data
> 
> So I could use another >20% of the database data area, which is according to the
> error message 100% (resp. >95%) full. Am I misunderstanding the error message?
> I'd appreciate any comments or ideas on that subject!
> 
> Thanks, Belina

This is an old post, but I encountered the same error message recently and was looking for a 
solution here. Here's how I solved it, for anyone else who finds this: 
The size of .ODB.SHM was bigger than the maximum ODB size (4.2M > 4194304 in Belina's case). For us, 
the very large odb size was in error and I suspect it happened because we forgot to shut down midas 
cleanly before shutting the computer down. Using odbedit to load a previously saved copy of the ODB 
did not help me to get .ODB.SHM back to a normal size. Following the instructions on the wiki for 
recovery from a corrupted odb, 
https://daq00.triumf.ca/MidasWiki/index.php/FAQ#How_to_recover_from_a_corrupted_ODB, (odbinit with --cleanup option) should 
work, but didn't for me. Unfortunately I didn't save the output to figure out why. My solution was to manually delete/move/hide 
the .ODB.SHM file, and an equally large file called .ODB.SHM.1701109528, then run odbedit again and reload that same saved copy of my ODB. 
Manually changing files used by mserver is risky - for anyone who has the same problem, I suggest trying odbinit --cleanup -s 
<yoursize> first.

mserver and script execution

Hi,
I have the following setup:

- MIDAS release: release/midas-2022-05-c
- host with MIDAS frontend (mclient)
- host with MIDAS server (mhttpd / mserver)

On mclient I run a frontend with:

./feodt5751 -h mserver -e develop -i 0

On mserver I see frontend ready and ODB variables in place;

I noticed a strange behavior with "/Programs/Execute on start run" and 
"/Programs/Execute on stop run". In details the script to execute at start of run
is executed on "mserver" host but the script to execute at stop of run is executed on
"mclient" host (!)

Is this a bug or I'm missing some documentation links ?

Thanks in advance,
Gennaro

mserver and script execution

Indeed that could well be (and is certainly not intended like that). I checked the code
and found that "execute on start run" and "execute on stop run" are called inside
cm_transition(). That means they are executed on the computer which calls cm_transition().
If you use mhttpd and start a run through the web interface, then mhttpd runs on your
server and "execute on start run" gets executed on your server. If you stop the run
by your frontend running on the client machine (like if a certain number of events 
is reached), then "execute on stop run" gets executed on your client.

An easy way around would not to use "/Equipment/Trigger/Common/Event limit" which
gets check by your frontend and therefore on the client computer, but use 
"/Logger/Channels/0/Settings/Event limit" which gets checked by the logger and
therefore executed on the server computer.

Getting a consistent behaviour (like always executing scripts on the server) would
require a major rework of the run transition framework with probably many undesired
side-effects, so lots of debugging work.

Stefan

mserver and script execution

Hi Stefan,

> Indeed that could well be (and is certainly not intended like that). I checked the code
> and found that "execute on start run" and "execute on stop run" are called inside
> cm_transition(). That means they are executed on the computer which calls cm_transition().
> If you use mhttpd and start a run through the web interface, then mhttpd runs on your
> server and "execute on start run" gets executed on your server. If you stop the run
> by your frontend running on the client machine (like if a certain number of events 
> is reached), then "execute on stop run" gets executed on your client.

ok, this is clear to me...

> An easy way around would not to use "/Equipment/Trigger/Common/Event limit" which
> gets check by your frontend and therefore on the client computer, but use 
> "/Logger/Channels/0/Settings/Event limit" which gets checked by the logger and
> therefore executed on the server computer.

we never used "/Equipment/Trigger/Common/Event limit" but we always used
"/Logger/Channels/0/Settings/Event limit"...

btw I did some tests and I understand that this issue is related to 'deferred transition'
on frontend. Indeed I disabled deferred transition on frontend side and now script
execution is carried out always on MIDAS server;

Cheers,
Gennaro

mserver and script execution

> btw I did some tests and I understand that this issue is related to 'deferred transition'
> on frontend. Indeed I disabled deferred transition on frontend side and now script
> execution is carried out always on MIDAS server;

Ah, that's clear now. In a deferred transition, the frontend finally stops the run (after the 
condition is given to finish). Since the client calls cm_transition(), the script gets executed on 
the client. Changing that would be a rather large rework of the code. So maybe better call a 
script which executes another script via ssh on the server.

Stefan

mserver buffer overrun and crash

There is a memory allocation bug in the mserver.

ALIGN8() was missing when receiving events from the event socket and data buffer 
was allocated 4 bytes too short. but only for some received events and only in 
very unlucky sequence of received events. result was a rare but obnoxious crash 
of fevme frontend in alpha-2 at CERN. (we do not see any crash from this in 
alpha-g or anywhere else, the best I can tell).

fixed in commit 4dc06ba47ff7caa5251fd8c48d8533f35799f3a6.

If you use the mserver, please update to this commit or apply following patch in 
midas.cxx:

-   int bufsize = sizeof(INT) + event_size;
+   int bufsize = sizeof(INT) + total_size;

K.O.

mserver buffer overrun and crash

> There is a memory allocation bug in the mserver.

Fix for this problem introduced a new problem, an infinite loop in bm_flush_cache, 
bitbucket bugs https://bitbucket.org/tmidas/midas/issues/339/infinite-loop-in-
mserver-due-to-mfes and https://bitbucket.org/tmidas/midas/issues/331/stuck-
semaphore-of-system-buffer

This is now fixed and the buffer write cache logic and size was rejigged
according to calculations in https://daq00.triumf.ca/elog-midas/Midas/2401

Event buffer write cache (as set via ODB Equipment/Common and via 
bm_set_cache_size()) now take 2 possible values:
0 - write cache is disabled and
MIN_WRITE_CACHE_SIZE - (10 Mbytes) minimum permitted cache size
bigger cache size values are permitted, up to buffer_size/3, but probably not useful 
if my calculations are right.
smaller cache size values are generally not useful, if my calculations are right.

mfe.c and tmfe c++ frontends updated to request the new write cache size by default.

if events are getting stuck in the write cache for too long, instead of reducing the 
cache size, one should increase frequency of bm_flush_cache() calls (1/sec by 
default).

commit 373bcc3ab7f83c3c7bf6c051c237de043a982502

K.O.

mserver linking fails when using shared library

I have experienced a problem building MIDAS from the head of the SVN repository (rev 4458) when 
specifying the shared library flag. Whie the shared library appears to compile and link OK, the 
subsequent compilation of mserver fails as follows:

$ make ROOTSYS= NEED_SHLIB=1

<... snipped some lines ...>

ld -shared -o linux/lib/libmidas.so linux/lib/midas.o linux/lib/system.o linux/lib/mrpc.o 
linux/lib/odb.o linux/lib/ybos.o linux/lib/ftplib.o linux/lib/mxml.o linux/lib/history_midas.o 
linux/lib/history_sql.o linux/lib/history.o linux/lib/alarm.o linux/lib/elog.o linux/lib/strlcpy.o -lutil -
lpthread -lz -lc
cc -c -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB   -
D_LARGEFILE64_SOURCE -DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o linux/lib/mana.o 
src/mana.c
cc -c -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB   -
D_LARGEFILE64_SOURCE -DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o 
linux/lib/cnaf_callback.o src/cnaf_callback.c
cc -c -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB   -
D_LARGEFILE64_SOURCE -DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o linux/lib/mfe.o 
src/mfe.c
g++ -Dextname -DMANA_LITE -c -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -
Llinux/lib -DINCLUDE_FTPLIB   -D_LARGEFILE64_SOURCE -DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-
unused-function -o linux/lib/fal.o src/fal.c
cc -g -O2 -Wall -Wuninitialized -Iinclude -Idrivers -I../mxml -Llinux/lib -DINCLUDE_FTPLIB   -
D_LARGEFILE64_SOURCE -DHAVE_ZLIB -DOS_LINUX -fPIC -Wno-unused-function -o 
linux/bin/mserver src/mserver.c -lmidas -Wl,-rpath,/usr/local/lib -lutil -lpthread -lz
/usr/bin/ld: linux/bin/mserver: hidden symbol `__dso_handle' in /usr/lib/gcc/x86_64-redhat-
linux/4.1.2/crtbegin.o is referenced by DSO
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: ld returned 1 exit status
make: *** [linux/bin/mserver] Error 1

Having googled the error, it appears to be solved by modifying the linker statement for the shared 
library in the Makefile at line 464 to use g++ rather than ld:

463c463
< 	ld -shared -o $@ $^ $(LIBS) -lc
---
> 	$(CXX) -shared -o $@ $^ $(LIBS) -lc

Presumably this is because g++ knows better how to link in the appropriate system libraries required 
for some of the recently added C++ code? 

This was on Scientific Linux SL5.2 x86_64, gcc version 4.1.2, glibc version 2.5-24.

Tim

mserver network routing fix

mserver update svn rev 4625 fixes an anomaly in the MIDAS RPC network code where
in some network configurations MIDAS mserver connections work, but some RPC
transactions, such as starting and stopping runs, do not (use the wrong network
names or are routed over the wrong network).

The problem is a possible discrepancy between network addresses used to
establish the mserver connection and the value of "/System/Clients/xxx/Host"
which is ultimately set to the value of "hostname" of the remote client. This
ODB setting is then used to establish additional network connections, for
example to start or stop runs.

Use the client "hostname" setting works well for standard configurations, when
there is only one network interface in the machine, with only one IP address,
and with "hostname" set to the value that this IP address resolves to using DNS.

However, if there are private networks, multiple network interfaces, or multiple
network routes between machines, "/System/Clients/xxx/Host" may become set to an
undesirable value resulting in asymmetrical network routing or complete failure
to establish RPC connections.

Svn rev 4625 updates mserver.c to automatically set "/System/clients/xxx/Host"
to the same network name as was used to establish the original mserver connection.

As always with networking, any fix always breaks something somewhere for
somebody, in which case the old behavior can be restored by "setenv
MIDAS_MSERVER_DO_NOT_USE_CALLBACK_ADDR 1" before starting mserver.

The specific problem fixed by this change is when the MIDAS client and server
are on machines connected by 2 separate networks ("client.triumf.ca" and
"client.daq"; "server.triumf.ca" and "server.daq"). The ".triumf.ca" network
carries the normal SSH, NFS, etc traffic, and the ".daq" network carries MIDAS
data traffic.

The client would use the "server.daq" name to connect to the server and this
traffic would go over the data network (good).

However, previously, the client "/System/Clients/xxx/Host" would be set to
"client.triumf.ca" and any reverse connections (i.e. RPC to start/stop runs)
would go over the normal ".triumf.ca" network (bad).

With this modification, mserver will set "/System/Clients/xxx/Host" to
"client.daq" (the IP address of the interface on the ".daq" network) and all
reverse connections would also go over the ".daq" network (good).

P.S. This modification definitely works only for the default "mserver -m" mode,
but I do not think this is a problem as using "-s" and "-t" modes is not
recommended, and the "-s" mode is definitely broken (see my previous message).

svn rev 4625
K.O.

Hi. We've just updated our midas installation to the newest version, and we now see repeated errors from the 
mserver in messages. Mostly we see

11:17:02.994 2018/08/21 [ODBEdit,TALK] Program mserver restarted

which happens 2-3 times per minute. 

We have also been seeing occasional dropped rpc connections to our frontends, which could be related. 

The version we were running with previously was ~1 year old, and we have just updated to the newest version 
on bitbucket. 

Thanks,
Wes

> Hi. We've just updated our midas installation to the newest version, and we now see repeated errors from the 
> mserver in messages. Mostly we see
> 
> 11:17:02.994 2018/08/21 [ODBEdit,TALK] Program mserver restarted
> 
> which happens 2-3 times per minute. 
> 
> We have also been seeing occasional dropped rpc connections to our frontends, which could be related. 
> 
> The version we were running with previously was ~1 year old, and we have just updated to the newest version 
> on bitbucket. 

Hmm... usually mserver will not restart automatically, maybe you have set it to autorestart on ODB (/programs/mserver/auto_restart 
set to "y").

It would be unusual for the main mserver program to crash, to debug it, you will need to run it in a terminal
and see if there is any error messages. Even better to run it in a terminal inside "gdb" and capture the stack trace
when it crashes.

Anyhow, crash of main mserver will not cause "dropped rpc connections" to clients - this would require for their
individual mserver subprocesses to crash. Such crashes would be highly unusual and are harder to debug.

Perhaps for the crashes you see there is some error messages in midas.log?

K.O.

mstrcpy, was: strlcpy and strlcat added to glibc 2.38

for the record, as ultimate solution, strlcpy() and strlcat() were wholesale 
replaced by mstrlcpy() and mstrlcat(). this should fix "missing strlcpy()" 
problem for good and make midas more consistent across all platforms (including 
non-linux, non-unix). on my side, I continue replacing these function with proper 
std::string operations. K.O.

multithreaded run transitions work!

As of commit
https://bitbucket.org/tmidas/midas/commits/dfa5fb1a93cae11a2960d441044c7fd277e1f0ec
(we are now liberated from the tyranny of SVN IDs),
multithreaded run transitions seem to work reliably and are now the default in mhttpd.

In odbedit and mtransition, the default is the old sequential transitions. "-m" and "-a" flags activate 
the new multithread run transitions. mhttpd now uses the equivalent of "mtransition -a" 
(mutithreaded asynchronous).

This is one of the new features implemented by Stefan while at TRIUMF.

K.O.

(We hope to write up all the recent changes soon).

30 Jun 2004

Piotr Zolnierczuk

mvme167 problems

Hi,
 I am really puzzled: I am running the very same as far as sources
are concerned (Dec 12, 2003 snapsot) midas frontend (miniexp + camacnul)
on two different machines (and the same trusted private network):

1) one is an ancient Pentium/100 MHz laptop with RedHat Linux 7.3 and 
2) another one is event more ancient MVME167 25MHz running VxWorks 5.4.2

The front end on my Linux PC works just fine, whereas on the MVME167
I get intermittent crashes (most often at the end of the run).
[Correction: the crashes happen, I think, when the frontend wants 
to update the ODB]

The crashes happen in db_set_record routines

Any ideas what might be wrong? 
Except that MVME167 is a piece of ...#@!% 

Piotr

30 Jun 2004

Piotr Zolnierczuk

mvme167 problems

A followup: I traced back the problem to version 1.9.2.

Version 1.9.1 does not have this problem but 1.9.2 does. 
For now I stick with 1.9.1

Piotr

mvodb functionality to get the 'LastWritten' property of a key



I want to read data from the ODB with the mvodb interface in one of my frontends, it's useful to know how old that data is, so I prototyped functionality in a pull request to mvodb:

https://bitbucket.org/tmidas/mvodb/pull-requests/2/add-readkeylastwritten-function-to-extract

mvodb functionality to get the 'LastWritten' property of a key

> I want to read data from the ODB with the mvodb interface in one of my frontends, it's useful to know how old that data is, so I prototyped functionality in a pull request to mvodb:
> 
> https://bitbucket.org/tmidas/mvodb/pull-requests/2/add-readkeylastwritten-function-to-extract

Thanks for raising that point. I realized that the odbxx API was also missing that functionality, so I added it: 
https://bitbucket.org/tmidas/midas/commits/6991a92c19292eaf67721cb80f182c61db077f45

Best,
Stefan

Goto page Previous 1, 2, 3 ... 131, 132, 133 ... 150, 151, 152 Next

ELOG V3.1.4-2e1708b5