Back Midas Rome Roody Rootana
  Midas DAQ System, Page 95 of 150  Not logged in ELOG logo
    Reply  17 Nov 2017, Konstantin Olchanski, Bug Report, bug in init of hv class driver 
Hi, Frederick, this is my personal opinion on the slow controls hv classes, I have
used them a couple of times and I found them full of little buglets like this,
plus some incomplete functions, plus some missing features, plus it is all
written in C trying to do object oriented programming. On the balance my opinion
is that it is less work to write a high voltage control program in C++ from scratch
using the regular midas frontend infrastructure compared to having to understand
the hv class driver, write the missing bits, fix the little buglets, debug
the crashes in the C string handling, and what not. (For example I had to debug
mysterious failures to pass float and double values through the C stdarg interface,
there are more fun things to do out there).

K.O.

> bug in init
> -----------
> 
> I used the lv.c class driver, combined with a custom device driver, to control 
> our Keithley2611B source meter. This to set negative voltage on Si detectors.
> 
> In the 'init' routing, the class driver sets the hv:
> 
>   hv_info->demand_mirror[i] = MIN(hv_info->demand[i], hv_info->voltage_limit[i]);
> 
> This fails for negative voltage, as it sets the (negative) voltage limit, instead 
> of the demand voltage. A simple 'fabs' solves this.
> 
> suggestion for 'idle'
> ---------------------
> 
> I let the device do the ramping, not the driver. This also means I have to reset 
> the state of the device (current limit) after ramping. The easiest way to to 
> this, is using CMD_IDLE of the device driver. This is currently not done in the 
> hv.c class driver.
    Reply  21 Nov 2017, Konstantin Olchanski, Bug Report, bug in init of hv class driver 
> 
> The original idea behind the hv driver is that all voltages in the ODB and the class driver are
> positive. If you have a negative power supply, then the voltage is inverted at the device
> driver level. That's why you have MIN and MAX in the class driver.
> 

This rings a bell. I used the hv class driver to write a frontend for the L1440 mainframe (negative voltage),
on ODB it will be positive values, when writing to the device I had to add a minus sign,
and when reading back they came back negative and I had to add an fabs() in the comparison
between readback and demand.

Persons with bipolar power supplies need not apply.

K.O.
Entry  21 Nov 2017, Konstantin Olchanski, Info, MIDAS support on el5? 
It has been reported that the current midas release candidate does not build on el5 linux (SL/RHEL/CentOS-5).

According to Red Hat, el5 is end-of-life, last SL 5 (SL5.11) was done in 2014, so this linux is very old. Also as it happens, I do not have access to any 
el5 machines to check if midas builds or runs (but this can be fixed).

https://www.scientificlinux.org/downloads/sl-versions/sl5/
https://access.redhat.com/support/policy/updates/errata

On the midas web page (https://midas.triumf.ca) we do not explicitly state which versions of which linux we definitely support. Most other open-
source projects only support current major linux distributions, hardly anybody supports end-of-life linuxes such as el5. Some projects do not even 
support recent linuxes still widely in use (ROOT6 does not build on stock el6 and there is no KDE5 for el7).

So back to midas. Support for different operating systems comes down to:

1) C/C++ language support. We still use el6 (GCC 4.4.7), so use of c++-11 language features should be avoided
2) operating system features support:
a) sysv semaphores (sysv shared memory no longer used, cannot be used on macos)
aa) (macos also is missing parts of the sysv semaphore api, such as "wait for lock, with timeout", we are using an ugly work-around)
b) posix shared memory with mprotect() & co
c) posix mutexes, including recursive-type mutexes (this seems to be the problem on el5)
d) bsd networking (need to migrate from select() to poll() and from gethostbyname() to getaddrinfo() & co (for IPv6 support))

Not all of these operating system functions are required for all of midas. Running mhttpd and mlogger requires
pretty much everything. Running just a frontend connected to midas through the mserver requires the least features,
just the networking is enough, I think.

Obviously we cannot support midas in perpetuity on all versions of all operating systems, once I do not have
access to a machine, I cannot even check that midas builds and that it runs the basic functions.

Instead, we could provide a "feature reduced" build of midas (makefile target) that includes "just enough" of midas
to (say) run a frontend, maybe even odbedit. We already have some provisions for this, but no obvious documented
way actually doing it.

So back to el5.

How important it is to support very old operating systems?
How many people still use el5?
How about old  versions of Ubuntu? Macos?

If you use anything older than el6, can you speak up,
(and if possible say why you cannot migrate to an up-to-date linux).

K.O.
Entry  21 Nov 2017, Konstantin Olchanski, Release, Pending release of midas 
We are readying a new release of midas and it is almost here except for a few buglets on the new html status page.

The current release candidate branch is "feature/midas-2017-10" and if you have problems with the older versions
of midas, I recommend that you try this release candidate to check if your problem is already fixed. If the problem
still exists, please file a bug report on this forum or on the bitbucket issue tracker
https://bitbucket.org/tmidas/midas/issues?status=new&status=open

Highlights of the new release include
- new and improved web pages done in html and javascript
- many bug fixes and improvements for json and json-rpc support, including improvements in handling of long strings in odb
- locked (protected) operation of odb, where odb shared memory is not writable outside of odb operations
- improved multithead support for odb
- fixes for odb corruption when odb becomes 100% full

For the next release we hope to switch midas from C to fully C++ (building everything with C++ already works). To support el6 we avoid use of 
c++11 language constructs.

K.O.
    Reply  11 Jan 2018, Konstantin Olchanski, Bug Report, mhttpd - custom page - RHEL/Fedora 
> [mhttpd,ERROR] [mhttpd.cxx:563:rread,ERROR] Cannot read file '/root', read of 
> 4096 returned -1, errno 21 (Is a directory)

On some linux systems, "/root" exists, it is a directory used as the home directory 
of user "root" (~root is /root; traditional UNIX has ~root as /).

open() of a directory succeeds on some UNIX systems, on some of them,
read() also works, but on other systems one is supposed
to use opendir() and readdir().

MacOS is of course a BSD system (not SysV like Solaris, not Linux), so things
are different yet again. I think MacOS does not have a /root.

In any case, IMO, mhttpd has no business serving the contents of /root,
or serving any files outside of the mhttpd user $HOME directory. (but also
should not serve files from ~user/.ssh, or any other "secret" files, good
luck making a complete axhuastive list of all secret files that should not be
served).

K.O.


> 
> and in the browser I get a popup which tries to save a file called 'root'.
> 
> I track this down to the following: in mhttpd, interprete (line 18046) it is 
> check if a custom page file exists (ss_file_exist) and if yes, it tries to 'load' 
> it. Now, at this stage the variable dec_path contains '/root'.
> 
> Here now what goes wrong: ss_file_exist tries to open the given path, and if a 
> valid file descriptor is returned it assumes the file exists. This is not 
> perfectly correct since it also will get a valid file descriptor is path is an 
> accessible directory!
> 
> Now for whatever reason, on RHEL/Fedora '/root' will return a valid file 
> descriptor, but not on macOS and Ubuntu. Others I haven't tested. A possible fix 
> would be to check explicitly if path is a directory and if yes return 0 in 
> ss_file_exist (see attached diff).
> 
> Perhaps there is cleaner way to deal with this issue?! 
Entry  20 Jul 2018, Konstantin Olchanski, Info, ROOT I/O workshop notable 
The ROOT I/O workshop was held on June 20th at CERN. A few things of interest in MIDAS land:

- LZ4 is now used as default compression (replacing gzip-1)
- JSON class streamer is finally implemented (XML streamer updated/reworked)
- recursive read-write lock class implemented
- do not see any special mention of Javascript I/O or jsroot, but jsroot git repo seems to be quite active

Of these the recursive read-write lock is most interesting - using something similar would improve ODB performance
and presumably fix the existing lock fairness problems.

https://root.cern.ch/doc/master/TReentrantRWLock_8hxx_source.html
https://indico.cern.ch/event/715802/contributions/2942560/attachments/1670191/2680682/ROOT_IO_June_Workshop_v2.pdf
https://github.com/root-project/jsroot

K.O.
    Reply  20 Jul 2018, Konstantin Olchanski, Forum, strings in sqlite 
> Invalid tag 0 'Comment' in event 21 'Run Parameters': cannot do history for 
> TID_STRING data, sorry!

The original MIDAS history API does not have provisions for storing TID_STRING data,
it is a very unfortunate limitation that has been with us for a very long time.

If I ever get around to rewrite the MIDAS history API, I will definitely add support for TID_STRING data.

But not today.

K.O.

P.S. Support for arbitrary binary blobs is also possible, but this will make the midas history
a kind of "a daq inside the daq" thing, probably we do not want to go this direction.

K.O.
    Reply  20 Jul 2018, Konstantin Olchanski, Forum, ODB full 
Concurrence.

Normally, MIDAS data events are saved to ODB (via RO_ODB into /eq/xxx/variables) to make them go into the midas history (/eq/xxx/common/history > 0).

If you do not want events to go into the history, but still want them saved to ODB, it should work (as long as ODB itself
is big enough), but you may run into other problems, specifically ODB free space fragmentation, when no matter how big ODB is, there is never
enough continuous free space for saving a large event. If it happens you will also see random "odb full" errors.

K.O.



> Two options:
> 
> 1) Do NOT send your events into the ODB. This is controlled via the flag RO_ODB in your frontend setting. For simple experiments with small events, it might make sense to copy each 
> event into the ODB for debugging, but if you have large events, this does not make sense. Use the "mdump" utility to check your events instead.
> 
> 2) Increase the size of the ODB. See the first FAQ here: https://midas.triumf.ca/MidasWiki/index.php/FAQ
> 
> Stefan
> 
> 
> > Dear expert,
> >       I'm developing a frontend and I'm getting this kind of error at each event:
> > 
> > 10:14:56.564 2018/05/04 [Sample Frontend,ERROR] [odb.c:5911:db_set_data1,ERROR]
> > online database full
> > 
> > If I run the mem command in odbedit I get the result at the end of this post.
> > 
> > Notice that I need to use an event size which is significantly larger than the
> > default one. I don't know if it is relevant for this error. I have in the ODB:
> > 
> >   /Experiment/MAX_EVENT_SIZE = 900000000
> > 
> > and in the frontend code:
> > 
> >   /* maximum event size produced by this frontend */
> >   INT max_event_size = 300000000;
> > 
> >   /* maximum event size for fragmented events (EQ_FRAGMENTED) */
> >   INT max_event_size_frag = 5 * 1024 * 1024;
> > 
> >   /* buffer size to hold events */
> >   INT event_buffer_size = 600000000;
> > 
> > Events seem to be properly stored in the output files, but I'm afraid I could
> > get some other problem.
> > 
> > Thank you for your help,
> >          Francesco
> > 
> > -------------------------------------------------------------------------
> > 
> > Database header size is 0x21040, all following values are offset by this!
> > Key area  0x00000000 - 0x0007FFFF, size 524288 bytes
> > Data area 0x00080000 - 0x00100000, size 524288 bytes
> > 
> > Keylist:
> > --------
> > Free block at 0x00000B58, size 0x00000008, next 0x000053E0
> > Free block at 0x000053E0, size 0x00000008, next 0x00006560
> > Free block at 0x00006560, size 0x00079AA0, next 0x00000000
> > 
> > Free Key area: 498352 bytes out of 524288 bytes
> > 
> > Data:
> > -----
> > Free block at 0x000847F0, size 0x0007B810, next 0x00000000
> > 
> > Free Data area: 505872 bytes out of 524288 bytes
> > 
> > Free: 498352 (95.1%) keylist, 505872 (96.5%) data
    Reply  23 Jul 2018, Konstantin Olchanski, Forum, Question about distributing event builder function on remote PC 
> I'm going to develop MIDAS DAQ for COMET experiment.
> I'm thinking to distribute the load of event building to different PCs.
> I attach a schematics of one of the examples of the design.

Your schematic is reminiscent of the T2K/ND280 structure where the MIDAS DAQ
was split into several separate MIDAS instances (separate "experiments": the FGD, the TPC, 
the slow controls, etc).

They were joined together by the "cascade" equipment which provided a path
for the data events to flow from subsidiary midas instances to the main system (the one
with the final mlogger). It also provided a reverse path for run control, where starting
a run in the main experiment also started the run in all the subsidiary experiments.

This cascade frontend was never included in the midas distribution (an oversight),
but I still have the code for it somewhere.

How many "frontend PC" components do you envision? (10, 100, 1000?).

In T2K/ND280, each subsidiary experiment had it's own ODB which made sense
because e.g. the FGD and the TPC were quite different and were managed by different
groups.

But for you it probably makes sense to have one common ODB. This means a MIDAS
structure where ODB is located on the main computer ("event builder PC"),
all others connect to it via the mserver and midas rpc.

But you will need to have the MIDAS shared event buffers on each "frontend PC" to be local,
which means the bm_xxx() functions have run locally instead of throuhg the mserver rpc.
This is not how midas works right now, but it could be modified to do this.

On the other hand, you do not have to use midas to write the "frontend pc" code. Today's
C++ provides enough features - threads, locks, mutexes, shared memories, event queues,
etc so you can write the whole sub-event builder as one monolithic c++ program
and use midas only to send the data to the main event builder. (plus midas rpc to handle
run control). In this scheme, technically, this "frontend pc" program would
be a multithreaded midas frontend.

K.O.
    Reply  28 Aug 2018, Konstantin Olchanski, Bug Report, Deleting Links in ODB via mhttpd 
> Asume you have a variable foo and a link bar -> foo. When you go to the ODB in
> mhttpd, click "Delete" and select bar, it actually deletes foo. bar stays,
> stating "<cannot resolve link>". Trying the same in odbedit with rm gives the
> expected result (bar is gone, foo is still there).
> 
> I'm on the develop branch.

I think I can confirm this. Created a bug report on bitbucket:

https://bitbucket.org/tmidas/midas/issues/148/mhttpd-odb-editor-deletes-wrong-symlink

K.O.
    Reply  28 Aug 2018, Konstantin Olchanski, Forum, Problems with virtual history events 
Hi, what you try should have worked. Perhaps your symlink is wrong and should say "/External/..." (with a leading slash). The "links period" would have
worked same as equipment/common/history period - as a rate limiter.

Anyhow, I suggest another way to do the same - create a fake equipment - the logger does
not care if the equipment is real or not and if you write into /eq/fake/variables from a proper frontend
or from a script. To hide the fake equipment from the status page, set /eq/fake/common/hidden to "true".

This will work for sure.

K.O.




> Hi,
> I am trying to set up virtual history events following
> https://midas.triumf.ca/MidasWiki/index.php/History_System#Virtual_History_Event 
> 
> Trying it the first way, using the following setup:
> Key name                        Type    #Val  Size  Last Opn Mode Value
> ---------------------------------------------------------------------------
> Links                           DIR
>     dirlink -> External/dir     KEY     1     12    >99d 0   RWD  <subdirectory>
> 
> 
> Key name                        Type    #Val  Size  Last Opn Mode Value
> ---------------------------------------------------------------------------
> External                        DIR
>     dir                         DIR
>         foo                     FLOAT   1     4     16s  0   RWD  12.5
> 
> 
> Then I get the following error message:
> ==================== History link "dirlink", ID 28150  =======================
> [Logger,ERROR] [mlogger.cxx:4942:open_history,ERROR] History event dirlink has
> no variables in ODB
> 
> 
> Trying the second way, I set up the following:
> Key name                        Type    #Val  Size  Last Opn Mode Value
> ---------------------------------------------------------------------------
> Links                           DIR
>     dir                         DIR
>         testlink -> External/foo
>                                 FLOAT   1     4     8m   0   RWD  5.2
> 
> Key name                        Type    #Val  Size  Last Opn Mode Value
> ---------------------------------------------------------------------------
> External                        DIR
>     foo                         FLOAT   1     4     6m   0   RWD  5.2
> 
> 
> Starting mlogger in verbose mode yields the following error:
> ==================== History link "dir", ID 28150  =======================
> [Logger,ERROR] [mlogger.cxx:4935:open_history,ERROR] History link
> /History/Links/dir/testlink is invalid
> Error in history system, aborting startup.
> 
> I'm not sure if this is a bug or just a case of PEBCAK.
> 
> Finally, to set the update period, do I need entries in /history/links periods
> with the tag name? Is there a way to only write them in the history file when
> they change? I want to use the virtual history events for measurements I get
> from external scripts, some periodic, some manual.
> 
> Thanks
    Reply  28 Aug 2018, Konstantin Olchanski, Forum, Int64 datatype 
> I would like to store the address of 1-Wire temperature sensors in a device
> driver. However, the supportet data types (as definded around
> include/midas.h:311) do not foresee a type large enough.
>

Hmm... you do not say what sensor you use and how many bits you actually need.

For up to 32 bits you can use TID_DWORD (uint32_t) (obviously)

For up to 48 bits (or so), you can use TID_DOUBLE (double) (wierd, but IEEE754 double precision variables would work as 48-bit (or so) integers).

For more, I would use arrays of TID_DWORD (64 bits, store low 32 bits into a[0], high bits into a[1]).

> 
> Is there a good reason against this?
> 

We had requests for implementing uint64_t 64-bit data types in MIDAS before. There are two problems:

a) in the MIDAS data banks, there is a problem with the bank header definition which only has 3 DWORDSs so causes
each alternating data bank to be 64-bit misaligned. And misaligned 64-bit data is very bad.

b) in ODB, 64-bit data support will need to be added from scratch and again it is not clear without doing it
if there will be any alignement problems. If one were to implement ODB from scratch, one would have everything
aligned to 64-bits or maybe even 128-bits, with uint64_t fully supported.

It is unlikely this kind of work will ever be done on ODB, but who knows.

> I know that other experiments use this kind of sensor, how do you store the
> addresses? I've noticed that most of the address is just zeroes, but I wouldn't
> like to store just half the address, assuming that half the address is always
> zeroes.

Cannot answer without knowing what sensor you use, but certainly you can use an array of bytes
or an array of integers to store arbitrarily long addresses. You can also use a TID_STRING
and store the address as a text string "0xabcdabcdabcdabcd" of arbitrary length.

K.O.
    Reply  28 Aug 2018, Konstantin Olchanski, Bug Report, mserver problem 
> Hi. We've just updated our midas installation to the newest version, and we now see repeated errors from the 
> mserver in messages. Mostly we see
> 
> 11:17:02.994 2018/08/21 [ODBEdit,TALK] Program mserver restarted
> 
> which happens 2-3 times per minute. 
> 
> We have also been seeing occasional dropped rpc connections to our frontends, which could be related. 
> 
> The version we were running with previously was ~1 year old, and we have just updated to the newest version 
> on bitbucket. 

Hmm... usually mserver will not restart automatically, maybe you have set it to autorestart on ODB (/programs/mserver/auto_restart 
set to "y").

It would be unusual for the main mserver program to crash, to debug it, you will need to run it in a terminal
and see if there is any error messages. Even better to run it in a terminal inside "gdb" and capture the stack trace
when it crashes.

Anyhow, crash of main mserver will not cause "dropped rpc connections" to clients - this would require for their
individual mserver subprocesses to crash. Such crashes would be highly unusual and are harder to debug.

Perhaps for the crashes you see there is some error messages in midas.log?

K.O.
Entry  29 Aug 2018, Konstantin Olchanski, Forum, midas forum mail relay changed to smtp.triumf.ca 
Per changes at TRIUMF, the MIDAS forum mail relay was changed from trmail.triumf.ca to 
smtp.triumf.ca. K.O.
Entry  22 Nov 2018, Konstantin Olchanski, Info, status of self-signed https certificates 
I just happened to check the current situation with self-signed https certificates as implemented in mhttpd.

(To remember, the powers-that-be are pushing for universal use of https for all web access. The https
implementation in mhttpd at the moment can only generate self-signed certificates, so...)

plain unencrypted http:
- both google chrome and firefox say "connection not secure", but connect without any fuss.
- apple safari does not say anything

https with self-signed certificate:
- google chrome goes through an "are you sure?" page, "red not secure" status in toolbar
- firefox does the same thing, requires adding a security exception, but still shows "not secure" status in toolbar
- apple safari goes through a sequence of "are you sure?" pages, asks for the user password to add the self-signed certificate to 
the macos key store, then marks the connection as "secure" (good)

So clearly powers-that-be do not want us to use self-signed certificates for https. (And frown on use of unencrypted
http even for localhost connections). Properly signed certificates can be obtained from letsencrypt almost
automatically, but of course mhttpd needs to know how to use them and how to do handle their automatic renewals.

I plan to update the mongoose web server library inside mhttpd and with luck I will straighten some of this certificate business at 
the same time.

In the mean time, we continue to recommend that mhttpd should be used behind a password protected https proxy (i.e. apache 
httpd, etc).

K.O.
    Reply  03 Dec 2018, Konstantin Olchanski, Info, status of self-signed https certificates 
> > In the mean time, we continue to recommend that mhttpd should be used behind a password protected https proxy (i.e. apache 
> > httpd, etc).
> 
> I guess this is what moste people do anyhow these days. Do I understand correctly that this then rules out the usage of letsencrype certificates, since the 
> host needs to be accessed from outside, which is not possible if running behind a password protected firewall.
> 
> Stefan

Careful, firewall != proxy, very different things.

A firewall prevents network communications, period. (Like fences and locked doors, there are good reasons to have them).

An https proxy is a way to have encrypted (protected) web communications with a machine behind a firewall.

Basically, we have 4 main cases, all with trouble.

1) mhttpd running on localhost, "just for testing", is in trouble. there is no simple way to get a "blessed" certificate, and self-signed certificates are now "almost forbidden". http is "okey 
for now", but the writing is on the wall. There is no special exception for "local-only" connections.

2a) mhttpd running on an internet-connected machine, with apache httpd, our best case. To get this working one has to configure both apache httpd and the "blessed certificate" 
certbot tool. With luck, both tools work smoothly on current OSes (they do NOT).

2b) same, but without apache httpd. One still has to run certbot, and the "glue" between mhttpd and certbot is currently missing: need a way to point mhttpd to the certbot certificate 
files and a way to reload mhttpd when the certificate is auto-renewed.

3) mhttpd running on a machine behind a corporate firewall. worst case. if firewall Gods make an opening for ports 80 and 443, it becomes case (2a/b), otherwise, one must use some 
kind of https proxy. (Plus there is no trivial way to setup an encrypted secure communication channel between mhttpd and this proxy, a double bad).

K.O.

P.S. I guess one can use nginx as the https proxy instead of apache httpd. I did not try yet. My impression is that everybody uses nginx, except for people who started with apache httpd 
and are too lazy to try nginx.

K.O.
Entry  05 Dec 2018, Konstantin Olchanski, Info, Partial refactoring of ODB code 
The current ODB code has several structural problems and I think I now figured out how to straighten them out.

Here is the problems:

a) nested (recursive) odb locks
b) no clear separation between read-only access and read-write access
c) no clear separation between odb validation and repair functions
d) cm_msg() is called while holding a database lock

Discussion:

a) odb locks are nested because most functions lock the database, then call other functions that lock the database again. Most locking primitives - SystemV 
semaphores, POSIX semaphores and mutexes - usually do not permit nested (recursive) locking.

For locking the odb shared memory we use a SystemV semaphore with recursion implemented "by hand" in ss_semaphore_wait_for(). This works ok.

For making odb thread-safe, we use POSIX mutexes, and we rely on an optional feature (PTHREAD_MUTEX_RECURSIVE) which seems to work on most OSes, but 
is not required to exist and work by any standard. For example, recursive mutexes do not work in uclinux (linux for machines without an MMU).

I looked at implementing recursive mutexes "by hand", same as we have the recursive semaphores, and realized that it is quite complicated and computationally 
expensive (read: inefficient). (Also I think nested and recursive locks is "not mainstream" and should rather be avoided). As an example you can see full 
complexity of a nested lock as recent implementation in ROOT. (good luck finding it).

A solution for this problem is well known. All functions are separated into "unlocked" user-callable functions and "locked" internal functions. Nested locking is 
naturally eliminated.

Call sequences:
db_get_key() -> db_find_key() // odb is locked twice
become
db_get_key() -> db_get_key_locked() -> db_find_key_locked() // odb is locked once

Actual implementation of this scheme turns out to be a very clean and mechanical refactoring (moving the code without changing what it does).

As a try, I refactored db_find_key() and db_get_key() and I like the result. Locking is now obvious - obscure error paths with hidden "unlock before return"  - are all 
gone. Extra conversions between hDB and pheader are gone.

b) in this refactoring, functions that do not (should not) modify odb become easy to identify - the pheader argument is tagged "const".

This simplifies the implementation of "write-protected" odb - instead of ad-hoc db_allow_write_locked() sprinkled everywhere, one can have obvious calls to 
"db_lock_read_only()" and "db_lock_read_write()".

Separation of locks into "read" and "write" locks, in turn, improves locking behaviour - helps against problems like lock starvation - which we did see with MIDAS - 
as "read" locks are much more efficient - all readers can read the data at the same time, locking is only done when somebody need to "write".

c) some db_validate() functions also try to do repair. this cannot work if validation is called from "read-only" functions like db_find_key(). I now think the "repair" 
functions should be separate from "validate" functions. validate functions should detect problems, repair functions would repair them. The question remains - 
when is good time to run a full repair. (probably at the time when we connect to the database - this way, simply starting "odbedit" will force a database check and 
repair).

d) calls to cm_msg() when odb is locked has been a problem for a long time. because cm_msg() itself calls odb and because it also calls event buffer code 
(SYSMSG buffer) which in turn call odb functions, there was trouble with deadlocks between ODB and event buffer semaphores, trouble with recursive use of 
ODB, etc.

Right now we have all this partially papered over by having cm_msg() put messages into a memory buffer that we periodically flush, but I was never super happy 
with that solution. For example, if we crash before the message buffer is flushed, all error messages are lost, they do not go into midas.log, they are not printed on 
the screen, they are not accessible in the core dump.

To resolve this problem, I have all "locked" functions call db_msg() instead of cm_msg(). db_msg() saves the messages in a linked list which is flushed into 
cm_msg() immediately after we unlock odb.

If we crash after generating an error message but before it is flushed to cm_msg(), we can still access it through the linked list inside the core dump. This is an 
improvement over what we have now. Ideally, all messages should be printed to the terminal and saved to midas.log and pushed into SYSMSG, but most of this is 
impractical at a moment when odb is locked - as we already know it leads to deadlocks and other trouble...

Bottom line, I now have a path to improve the odb code and to resolve some of the long standing structural problems.

K.O.
Entry  18 Dec 2018, Konstantin Olchanski, Info, mxml update 
the mxml library was updated to make it thread-safe.
https://bitbucket.org/tmidas/mxml/src/master/

I also take an opportunity to remind all to update your copy to the latest version
as I just stumbled on old bug that I fixed 1 year ago (crash of mlogger)
but forgot to update all and every of my copies of mxml.

I also looked at the xml encoder and I see that it has several places where it may
truncate the data, but none of these places can cause truncation of ODB data
because the fixed-size internal buffers are big enough to hold the longest
values sent by the odb xml encoder.

K.O.
    Reply  26 Dec 2018, Konstantin Olchanski, Bug Report, mhttpd - custom page - RHEL/Fedora 
> > [mhttpd,ERROR] [mhttpd.cxx:563:rread,ERROR] Cannot read file '/root', read of 
> > 4096 returned -1, errno 21 (Is a directory)
> 
> On some linux systems, "/root" exists, it is a directory used as the home directory 
> of user "root" (~root is /root; traditional UNIX has ~root as /).
> 

I just got burned by the same problem on MacOS. mhttpd odb editor cannot open ODB "/System" 
because on MacOS there is a subdirectory called "/System".

So the question is why did mhttpd suddenly started serving files from the main URL?

K.O.
    Reply  26 Dec 2018, Konstantin Olchanski, Bug Report, mhttpd - custom page - RHEL/Fedora 
> I implemented that fix. Thank you to Andreas. Creating "Custom" directory from the web now does 
> not have that problem any more.

This fix also stops mhttpd from serving the /etc/passwd file.

BTW, "the fix" in mhttpd unconditionally creates /Custom/Path and sets it to the value of $MIDASSYS. This path 
seems to be prepended to all file paths, so this fix also breaks the normal use of /Custom/xxx that contain the full 
path name of the file to serve...

Looks like file serving in mhttpd got messed up and needs to be reviewed. I still strongly believe that mhttpd should 
be serve arbitrary files (only serve files explicitly listed in ODB) or as next best option, only serve files from 
subdirectories explicitly listed in ODB.

K.O.
ELOG V3.1.4-2e1708b5