Back Midas Rome Roody Rootana
  Midas DAQ System, Page 108 of 138  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
ID Date Authordown Topic Subject
  2322   26 Jan 2022 Konstantin OlchanskiBug ReportWritting MIDAS Events via FPGAs
> 
> > Any error messages printed by the frontend? any error message in midas.log? core dumps? crashes? 
> > I do not understand what you mean by "did not get the data into midas". You create events
> > and send them to a midas event buffer and you do not see them there? With mdump?
> > Do you see this both connected locally and connected remotely through the mserver?
> 
> I simply don't see the event counter counting up and I also don't see them using mdump. No logs, no dumps and no crashes - every is quite. I only tested it locally.
>

If you are connected locally (no mserver), I want to know the value returned by bm_send_event(). Simplest
if you edit mfe.c and everywhere it calls bm_send_event() and rpc_send_event(), print the returned value.

It would be very interesting to see if bm_send_event() returns 1 (SUCCESS), but the event vanishes
without a trace.

Before you do that, try something simpler:

Run "mdump -s -d", it will print some event buffer internals.

Watch to see if any data pointers change when you send your events ("wp", "rp", etc).

If nothing changes at all, then we are not sending anything (fault is in your code or on mfe.c).

If you see "wp" counting up, then we definitely write your events into the buffer and mdump & mlogger should see them.

But there is some funny logic for event_id and trigger_mask and it is worth checking their
values. For a good test, set event_id=1 and trigger_mask=0x1. There might be trouble if either is set to zero.

K.O.
  2323   26 Jan 2022 Konstantin OlchanskiBug ReportUnknown Error 319 from client
> I’m trying to run MIDAS using a frontend code/client named “fetiglab”. Run stops 
> after 2/3sec with an error saying “Unknown error 319 from client “fetiglab” on 
> localhost.

actually run never starts.

> 11:46:32 [fetiglab,ERROR] [odb.cxx:11268:db_get_record,ERROR] struct size 
> mismatch for "/" (expected size: 1, size in ODB: 41920)

this is the error that causes run start to fail. for reasons unknown
your frontend is trying to do a db_get_record() from "/" (ODB root top directory).

if this is an mfe.c frontend, I do not think I have ever seen it do something
like this.

so, a puzzle.

K.O.
  2324   26 Jan 2022 Konstantin OlchanskiForummhttpd error
> > Enable IPv6                     y
> 
> Probably the IPv6 problem, see here elog:2269
> 
> I asked to turn off IPv6 by default, or at least mention this in the documentation,
> but unfortunately nothing happened.

But IPv4 and IPv6 code is completely separate, if IPv6 bind fails, IPv4 should still 
work.

This is all very strange.

It does not help that the OP does not say in which way things do not work,
"the server is not accessible from other machines" is not an error message
reported by any browser, and we do not know what URL he is using
to access mhttpd - http: or https:

Also he is enabling the "insecure" port 8081, I am pretty sure the documentation
is pretty clear, either use the secure https port or the insecure port,
but not both at the same time.

In any case, I see current version of mongoose have removed support
for password files, so all this stuff will likely become reworked
and at the end mhttpd will only listen to localhost ports. To make it "accessible
to other machines", one will have to use the apache https proxy. (or mtpcproxy from 
midas).

K.O.
  2329   07 Feb 2022 Konstantin OlchanskiForumMidasWiki moved from ladd00 to daq00.triumf.ca and updated to MediaWiki 1.35
MidasWiki moved from ladd00 (obsolete SL6) to daq00.triumf.ca (Ubuntu LTS 20.04) 
and updated from obsolete MediaWiki LTS 1.27.7 to MediaWiki LTS 1.35, supported 
until mid-2023, see https://www.mediawiki.org/wiki/Version_lifecycle

Old URL https://midas.triumf.ca and https://midas.triumf.ca/MidasWiki/...
redirect to new URL https://daq00.triumf.ca/MidasWiki/index.php/Main_Page

All old links and bookmarks should continue to work (via redirect).

To report problems with this MediaWiki instance and to request
any changes in configuration or installed extensions, please reply to this
message here.

K.O.
  2331   08 Feb 2022 Konstantin OlchanskiBug FixODBINC/Sequencer Issue
Please post the output of odbedit "ls -l" for /eq/ar.../variables. (you posted the 
variable name as an image, and I cannot cut-and-paste the odb path!). BTW data size 4 is 
correct, 4 bytes for INT32/UINT32/FLOAT. For DOUBLE it should be 8. For you it prints 32 
and this is wrong, we need to see the output of "ls -l".
K.O.
  2333   09 Feb 2022 Konstantin OlchanskiBug FixODBINC/Sequencer Issue
> 
> [local:mu3eMSci:S]/>cd Equipment/ArduinoTestStation/Variables
> [local:mu3eMSci:S]Variables>ls -l
> Key name                        Type    #Val  Size  Last Opn Mode Value
> ---------------------------------------------------------------------------
> _T_                             FLOAT   1     4     1h   0   RWD  20.93
> _F_                             FLOAT   1     4     1h   0   RWD  12.8
> _P_                             FLOAT   1     4     1h   0   RWD  56
> _S_                             FLOAT   1     4     1h   0   RWD  5
> _H_                             FLOAT   1     4     60h  0   RWD  44.74
> _B_                             FLOAT   1     4     60h  0   RWD  18.54
> _A_                             FLOAT   1     4     1h   0   RWD  14.41
> _RH_                            FLOAT   1     4     1h   0   RWD  41.81
> _AT_                            FLOAT   1     4     1h   0   RWD  20.46
> SP                              INT16   1     2     1h   0   RWD  10
> 

This looks okey, so we still have no explanation for your error. Please post your sequencer 
script?

K.O.
  2350   03 Mar 2022 Konstantin OlchanskiInfozlib required, lz4 internal
as of commit 8eb18e4ae9c57a8a802219b90d4dc218eb8fdefb, the gzip compression
library is required, not optional.

this fixes midas and manalyzer mis-build if the system gzip library
is accidentally not installed. (is there any situation where
gzip library is not installed on purpose?)

midas internal lz4 compression library was renamed to mlz4 to avoid collision
against system lz4 library (where present). lz4 files from midasio are now
used, lz4 files in midas/include and midas/src are removed.

I see that on recent versions of ubuntu we could switch to the system version 
of the lz4 library. however, on centos-7 systems it is usually not present
and it still is a supported and widely used platform, so we stay
with the midas-internal library for now.

K.O.
  2351   03 Mar 2022 Konstantin OlchanskiInfomanalyzer updated
manalyzer was updated to latest version. mostly multi-threading improvements from 
Joseph and myself. K.O.
  2354   15 Mar 2022 Konstantin OlchanskiBug Fixmhttpd ipv6 bind should be fixed now
Something changed after my initial implementation of ipv6 in mhttpd
and listening to ipv6 http/https connections was broken.

It turns out, I do not need to listen to both ipv4 and ipv6 sockets,
it is sufficient to listen to just ipv6. ipv4 connections will also
magically work. see linux kernel "bindv6only" sysctl setting: https://sysctl-
explorer.net/net/ipv6/bindv6only/

The specific bug in mhttpd was to bind to ipv4 socket first, subsequent bind() to ipv6 socket 
fails with error "Address already in use", which is silent, not reported by the mongoose library. 
For reasons unknown, this does not happen to bind() to "localhost" aka ipv6 "::1".

Apparently other web servers (apache, nginx) are/were also affected by this problem. 
https://chrisjean.com/fix-nginx-emerg-bind-to-80-failed-98-address-already-in-use/

First fix was to bind to ipv6 first (success) and to ipv4 second (fails). Second fix
committed to git is to only listen to ipv6.

This works both on MacOS and on Linux. Linux reports the listener socket is "tcp6", MacOS reports 
the listener socket as "tcp46":

4ed0:javascript1 olchansk$ netstat -an | grep 808 | grep LISTEN
tcp46      0      0  *.8081                 *.*                    LISTEN     
tcp6       0      0  ::1.8080               *.*                    LISTEN     
tcp4       0      0  127.0.0.1.8080         *.*                    LISTEN     
4ed0:javascript1 olchansk$ 

K.O.
  2359   22 Mar 2022 Konstantin OlchanskiBug Fixfix for event buffer corruption in bm_flush_cache()
multithreaded frontends have an unusual event buffer corruption if the write 
cache is enabled. For a long time now I had to disable the write cache on
all multithreaded frontends in alpha-g, I was hitting this bug quite often.
(somehow I do not see this problem reported on bitbucket!)

last week I reworked the multithread locking of event buffers, in hope
that this bug will turn up, but nope, all mutexes and locking look okey,
except for a number of unrelated problems (races against bm_close_buffer()
were the most troublesome to fix).

but finally found the trouble.

first, some background.

because multiprocess locking is expensive, frontends that generate
a large number of small events can use the write cache to reduce
this overhead. instead of locking the shared memory event buffer for
each event, events are accumulated in the write cache, and periodic
calls to bm_flush_buffer() flush them to shared memory. For best effect,
one should increase the size of the write cache until lock rate is around
10/second.

it turns out introduction of multithreading broke bm_flush_cache().

it does this:

- int ask_free = pbuf->wp; // how much data we have in the write cache now
- call bm_wait_for_free_space(ask_free); // ensure we have this much free shared 
memory space
- copy pbuf->wp worth if events to shared memory

looks okey at first sight. this is what happens to trigger the bug:

- int ask_free = pbuf->wp; // ok
- call bm_wait_for_free_space(ask_free); // ok, but if shared memory is full, it 
will go to sleep waiting for free space
- in the mean time, another thread calls bm_send_event(), this adds more data to 
the write cache, moves pbuf->wp
- bm_wait_for_free_space() eventually returns
- copy pbuf->wp worth of data to shared memo KABOOM! shared memory corruption!

we just overwrote some unlucky event in shared memory: we only have "ask_free"
free bytes available, but pbuf->wp moved and now has more data,
and it does not fit, and there is no check against it.

of course in the single threaded world this bug did not exist, there was no 
other thread to call bm_send_event() while bm_flush_cache() is sleeping.

the obvious fix is to ask for more free space if cached data does not fit.

this is now implemented on the branch feature/buffer_mutex. after a bit more 
tested I will merge it into develop.

so that's it?

not so fast. there was more going on. as described, the bug will only happen
when shared memory event buffer is full. (i.e. rarely or never). It turns
out the old version of thread locking code was defective and permitted
a race between bm_send_event() and bm_send_event() in another thread:

thread 1: while (1) { bm_send_event(very small event); }
thread 2:
-> bm_send_event(very big event)
-> no space in the cache for the very big event, call bm_flush_cache()
-> bm_flush_cache() asks bm_wait_for_free_space() to make space for cached data
-> this was done with write cache mutex released (mistake!)
-> at the same time bm_send_event(very small event) added 1 more small event to 
the cache
-> back in bm_flush_cache() write cache mutex is locked correctly, we copy 
cached data to shared memory and again KABOOM because we now have more data than 
we asked free space for.

So in the original implementation, corruption was possible even when share 
memory event buffer was pretty much empty.

The reworked locking code closed that loop hole - bm_flush_cache() is now
called with write cache locked, and bm_send_event() from another thread
cannot confuse things, unless shared memory buffer is full and we go to
sleep inside bm_wait_for_free_space(). And this is now fixed, too.

K.O.
  2362   23 Mar 2022 Konstantin OlchanskiBug Fixmhttpd ipv6 bind should be fixed now
> Something changed after my initial implementation of ipv6 in mhttpd
> and listening to ipv6 http/https connections was broken.

Reporting that mhttpd ipv6 works at CERN. The hostnames for ipv6 connections
come back as alphacpc05.ipv6.cern.ch instead of alphacpc05.cern.ch
so both are added to the http "insecure port" whitelist.

K.O.
  2363   23 Mar 2022 Konstantin OlchanskiBug Fixfix for event buffer corruption in bm_flush_cache()
I confirm, there is no problem in single-threaded programs, and
there is no problem if all bm_send_event() and bm_flush_cache() are called
from the same thread.

> ... instead of struggling with all your locks.

it is better to have midas fully thread safe. ODB has been so for a long time,
event buffer partially (except for this bug), now fully.

without that the problem still exists, because in many frontends,
bm_flush_buffer() is called from the main thread, and will race
against the "bm_send_event() thread". Of course you can do
everything on the main thread, but this opens you to RPC timeouts
during run transitions (if you sleep in bm_wait_for_free_space()).

also the SYSMSG buffer is subject to the same bug. cm_msg() is of course
safe to call from anywhere, but cm_msg_flush_buffer() and cm_periodic_tasks()
can be called from any thread, and they issue bm_send_event(SYSMSG),
and there will be mysterious crashes and SYSMSG corruptions, probably
only during message storms, but still!

K.O.
  2364   23 Mar 2022 Konstantin OlchanskiBug Fixmhttpd bug fixed
the mhttpd bug should be fixed now (branch feature/buffer_mutex).

simplest way to reproduce:

wget http://localhost:8080/
quickly ctrl-C it
wget http://localhost:8080/
inside mhttpd (by hook or crook) observe that the second wget got the data meant for the first wget.

if you cannot ctrl-C the first wget quickly enough, put a sleep somewhere in the worker thread (in 
mongoose_write(), I think).

this is what happens.

1st wget stops (by ctrl-C), socket is closed, mongoose frees it's mg_connection object
(corresponding worker is still labouring, hmm... actually sleeping, and now has a stale nc pointer)

2nd wget starts, new socket is opened, mongoose allocates a new mg_connection object,
but malloc() gives it back the same memory we just freed(), and the 1st wget's worker thread
nc pointer is no longer stale, but points to 2nd wget's connection.

so we think we are clever and we check the socket file descriptors. but same thing
happens there, too. if 1st wget was file descriptor 7, it is closed, (1st wget worker now has
a stale file handle), then reopened for the 2nd wget, per POSIX, we get back the same
file descriptor 7. 1st wget worker now has the file handle for the 2nd wget tcp socket and
the famous test/crash for "sending data to wrong socket" is defeated.

now, worker thread for the 1st wget wants to send a reply, it has a valid nc pointer (points to 2nd wget's
mg_connection object) and a valid file descriptor (points to 2nd wget's tcp socket),
reply meant for the 1st wget is successfully sent to the 2nd wget, 2nd wget finishes, it's socket
is closed, mg_connection object is free'ed. Now the worker thread for the 2nd wget has stale
connection info, but this is okey, mongoose does not find a matching connection, 2nd wget
worked thread reply goes nowhere, thread finishes silently (no memory leaks here, I checked).

so, connection for 2nd wget completely impersonates the closed connection of 1st wget (I guess I could
check the full socket address info, remote ip address, remote port number, etc, but...)

in practice, this bug does not happen often because modern browsers tend to keep tcp sockets open
for very long time. (not sure about sundry web proxies, etc).

solution of course is very simple. match worker thread data to mongoose mg_connection objects
using our own connection sequential number, which are unique and very easy to keep track
of through the mongoose event handler. all this mess runs in the main thread,
so no locking trouble here, small blessing.

K.O.
  2368   24 Mar 2022 Konstantin OlchanskiBug Fixmhttpd bug fixed
> > 1st wget stops (by ctrl-C), socket is closed, mongoose frees it's mg_connection object
> > (corresponding worker is still labouring, hmm... actually sleeping, and now has a stale nc pointer)
> > 
> > 2nd wget starts, new socket is opened, mongoose allocates a new mg_connection object,
> > but malloc() gives it back the same memory we just freed(), and the 1st wget's worker thread
> > nc pointer is no longer stale, but points to 2nd wget's connection.
> 
> Why don't we CLEAR the memory (memset(object,0,sizeof(object)) before the free(), this way it cannot be 
> mistakenly re-used by the next thread.
> 

My description was unclear. I will try better now.

When http replies are generated by worker threads, matching of reply to mg_connection is done
by checking the address of the mg_connection object. (mongoose itself unhelpfully offers
to send the reply to every mg_connection, see the responder to mg_broadcast() messages).

This works for open/active connections, addresses of all mg_connections are unique.

But if connection is closed and a new connection is opened, the address is reused (by malloc()/free()
reusing memory blocks or by mongoose using a pool of mg_connection objects, does not matter).

So matching http reply to mg_connection using only address of mg_connection can match the wrong connection.

(contents of mg_connection object does not matter, only address is used by matching. so memzero() of
mg_connection object does not help).

I saw this during my testing - wrong data was sent to wrong browser often enough - but did
not understand that the above problem is happening.

Because I was unable to reliably reproduce the problem, I could not debug it. I tried to add
a check for the tcp socket file descriptor number, in case there is a straight bug or multithread race
or simple memory corruption. This replaced "we sent wrong data to wrong browser, poisoned browser
cache, confused the user" with a crash. This "fix" seemed effective at the time.

Maybe I should mention browser cache poisoning again. What happened is html pages and rpc replies
were returned as responses to load things like CSS files, these bad responses are cached by the browser
pretty much forever, so all subsequent midas pages will look wrong (bad css!) forever, until
user manually clears browser cache. reload of page did not help, restart of browser did not help (I think).

So a very bad bug.

Unfortunately, the check for file descriptor was not effective because file descriptors are also
reused. And I did see wrong data returned by mhttpd, but even more rarely. And everybody (myself
included) complained about mhttpd crashes.

Now, matching of responses to connections is done by connection sequential/serial number,
which is unique 32-bit counter. Mismatch of reply to connection should not happen again.

P.S. Latest version of the mongoose web server library does not help with this problem,
the example code for matching reply to connection in their multithread example looks bogus:
https://github.com/cesanta/mongoose/blob/master/examples/multi-threaded/main.c

K.O.
  2369   24 Mar 2022 Konstantin OlchanskiBug Fixfix for event buffer corruption in bm_flush_cache()
> Thanks for the investigation. Back in 2020, we had some issues
> of losing data between the system buffer and the logger writing them
> to disk (https://daq00.triumf.ca/elog-midas/Midas/1966). This was polled equipment
> but we had a multithreaded FE running at the same time. Could this be related to the same problem?

I think we will have to follow up on your problem 1966 separately.

I think this bug cannot lose events. Writing events to the write cache has correct
locking, no loss here. writing write cache to shared memory has correct locking,
no loss there. the bug will cause the *next* event in the event buffer to be overwritten,
this will be detected by most programs as shared memory corruption and everybody
will quit. (mhttpd, mserver, odbedit will probably survive).

I guess there could be unlucky corruption that looks like nothing was corrupted,
but this will affect only a few events right at the shared memory read/write
pointer, it so happens that they are the oldest events in the buffer and likely
mlogger already wrote them to disk. mlogger read pointer will likely follow
the shared memory write pointer closely, well ahead of the shared memory
read pointer which always pointe to the older event and where this bug's corruption
will happen.

So no, I do not think this bug can cause event loss between frontend and mlogger.

K.O.
  2370   24 Mar 2022 Konstantin OlchanskiBug Reportdata missing in runXXXXXX.mid
> > It would be good to pin point there the data is lost. This is the sequence:
> > 
> > frontend user code -> mfe.c code -> SYSTEM buffer -> mlogger -> disk
> > 
> > To see if correct data arrives to the SYSTEM buffer, run:
> > mdump -z SYSTEM
> > 
> > To see if mlogger is receiving events from the SYSTEM buffer, run:
> > mlogger -v ### mlogger should report all events, history and data
> > 
> > To see if mlogger writes events to disk, examine the disk file (in this case, you already did, data is not there).
> > 
> > I would guess that your data does not make it out from the frontend (mdump shows "nothing"),
> > if data were to arrive into the SYSTEM buffer, it would make it to disk, unless
> > mlogger is misconfigured (but you already checked that).
> > 
> > If you have trouble with the frontend framework code, you can try to switch from the mfe.c frontend
> > to the newer c++ tmfe frontend (see progs/fetest_tmfe.cxx and progs/fetest_tmfe_thread.cxx).
> > 
> > K.O.
> 
> Good evening
> 
> I tried to reproduce the behavior in a very simple FE but it did not work out.
> The next thing for me would be to take the FE that is producing this behavior,
> replace all the device communication and data with dummies. If the problem is still
> there I would start to simplify as much as possible. 
> 
> Following the inputs of KO, I pin-pointed the data loss. The system buffer still
> gets the data but the mlogger does not write the data event. Then of course the data
> is also not anymore present in the data file. Therefore, I checked the logger
> settings again, Event ID and Trigger Mask still -1. Nothing else, at least from my point of view,
> that is misconfigured. Nevertheless, if it helps I can send my ODB settings. 
> 
> When doing the tests just before I found something else that probably
> can give a hint to the problem. The data is only lost if the time between
> two runs is long (a few seconds). As an example: If I run a sequence with a loop
> and after the FE stops the run the loop ends and the next run is started automatically,
> then only the first run has no data, which is the one after a longer time of
> no data taking. When I add a "WAIT Seconds 5" after the run before starting
> the next, not data is written to the disk for any run. I also found this
> once when adding a sleep(1) at the end of the FE readout function
> but back then did not think about it any further. 
> 

Looks like this problem fell into the covid crack.

As far as I know, MIDAS does not lose any events between bm_send_event() and the shared memory
buffer. It does not lose any events in the mlogger (unless the "event request" is misconfigured).
(there is lots of opportunity to lose events in complicated frontends).

If you have some evidence otherwise, I would very much like to hear about
it and I want to fix all problems that cause it.

In your previous report I was under the impression that you lose random events here and there,
but your latest report is about mlogger not writing anything at all.

Which case is it?

If you can definitely say that all your events make it to the SYSTEM buffer
but mlogger sometimes does not see some of them and sometimes does not see all of them,
we should look very closely at bm_receive_event() and mlogger itself.

In the case where mlogger is not seeing any events at all (output file is empty), as this is
happening, I would like to see the output of mdump (to confirm events are written to SYSTEM
buffer with correct event_id and trigger_mask) and the output of (say)
"manalyzer_test.exe --dump run01161.mid.lz4" on your output file.

If the output is very long, you can email it to me directly instead of posting it here.

K.O.
  2373   24 Mar 2022 Konstantin OlchanskiBug Reportdata missing in runXXXXXX.mid
> One idea: we should have a look at mlogger::close_channels().
> There the SYSTEM buffer is emptied through the cm_yield() call.
> Instrumenting this with some debugging code will enlighten us.

right. this will "last few events are lost at the end of run".

but that code in the mlogger was not touched in years, if there is a problem there,
we would have seen it by now, most experiments check that the number
of events in the data file is same as number of triggers generated, both
numbers are shown on the midas status page.

> Another possible problem: If the frontend requested to be notified for a run stop AFTER the logger, then the problem might happen: Logger closes file, and THEN the frontend flushes events ending up in the SYSTEM buffer and being logged at the beginning of the next run. The mfe.cxx framework takes care of this by calling 
> cm_register_transition(TR_STOP, 500);

default sequence, both mfe.c frontend and c++ tmfe frontend:

start of run:
- mlogger first (configure history, open data file)
- frontends last
- (if any frontend fails, TR_STARTABORT is sent to mlogger to close the output file and "undo" the run start)

end of run:
- frontends first (must not send any events after after processing the TR_STOP RPC call, inside the TR_STOP handler, bm_flush_cache() takes care of the write cache)
- mlogger last
- (if any frontend fails, failure is ignored, run stops regardless)

wrong order will be only if they manually change it, and whatever order
they set, you see it on the midas transition page (and mtransition -v and odbedit stop now -v, etc).

K.O.
  2374   24 Mar 2022 Konstantin OlchanskiBug Fixmhttpd bug fixed
> As for the browser cache problem: This Chrome extension is your friend ...

for google chrome, it is easy, open the javascript debugger (left-click "inspect"),
the reload button becomes a left-click menu, one left-click option is "clear cache and reload".
(there is no button for "clear cookies and reload", re recent elog cookie problem).

but this does not help me personally any. if midas web pages get confused, I will also get confused, too,
and I will spend hours debugging mhttpd before thinking "hmm... maybe I should clear the browser cache!"

not sure about firefox, safari, microsoft edge and opera. if I ever need it, I google it.

K.O.
  2377   29 Mar 2022 Konstantin OlchanskiBug Fixmdump can read lz4 and bz2 files now
I converted mdump file i/o from older mdsupport library to newer midasio library 
and it can now read .mid, .mid.gz, .mid.lz4 and .mid.bz2 files. Output should be 
identical to what it printed before, if you see any differences, please report 
them here or on bitbucket. K.O.
  2378   30 Mar 2022 Konstantin OlchanskiBug Fixerroneous removal of odb clients, fixed
commit https://bitbucket.org/tmidas/midas/commits/b1fe21445109774be3f059c2124727b414abf835
made on 2022-02-21 fixed a serious bug in ODB.

a multithread race condition against an incorrectly updated shared variable caused removal of 
random clients from ODB with error message:

My client index %d in ODB is invalid: out of range 0..%d. Maybe this client was removed by a 
timeout, see midas.log. Cannot continue, aborting...

the race is between db_open_database() in one program (executed when any midas program starts) and 
db_get_my_client_locked() in all running midas programs.

as long as no midas programs are started (db_open_database() is not executed), this bug does not 
happen.

if i.e. odbedit is executed very often, i.e. from a script, probability of hitting this bug becomes 
quite high.

fixed now.

K.O.
ELOG V3.1.4-2e1708b5