The current ODB code has several structural problems and I think I now figured out how to straighten them out.
Here is the problems:
a) nested (recursive) odb locks
b) no clear separation between read-only access and read-write access
c) no clear separation between odb validation and repair functions
d) cm_msg() is called while holding a database lock
Discussion:
a) odb locks are nested because most functions lock the database, then call other functions that lock the database again. Most locking primitives - SystemV
semaphores, POSIX semaphores and mutexes - usually do not permit nested (recursive) locking.
For locking the odb shared memory we use a SystemV semaphore with recursion implemented "by hand" in ss_semaphore_wait_for(). This works ok.
For making odb thread-safe, we use POSIX mutexes, and we rely on an optional feature (PTHREAD_MUTEX_RECURSIVE) which seems to work on most OSes, but
is not required to exist and work by any standard. For example, recursive mutexes do not work in uclinux (linux for machines without an MMU).
I looked at implementing recursive mutexes "by hand", same as we have the recursive semaphores, and realized that it is quite complicated and computationally
expensive (read: inefficient). (Also I think nested and recursive locks is "not mainstream" and should rather be avoided). As an example you can see full
complexity of a nested lock as recent implementation in ROOT. (good luck finding it).
A solution for this problem is well known. All functions are separated into "unlocked" user-callable functions and "locked" internal functions. Nested locking is
naturally eliminated.
Call sequences:
db_get_key() -> db_find_key() // odb is locked twice
become
db_get_key() -> db_get_key_locked() -> db_find_key_locked() // odb is locked once
Actual implementation of this scheme turns out to be a very clean and mechanical refactoring (moving the code without changing what it does).
As a try, I refactored db_find_key() and db_get_key() and I like the result. Locking is now obvious - obscure error paths with hidden "unlock before return" - are all
gone. Extra conversions between hDB and pheader are gone.
b) in this refactoring, functions that do not (should not) modify odb become easy to identify - the pheader argument is tagged "const".
This simplifies the implementation of "write-protected" odb - instead of ad-hoc db_allow_write_locked() sprinkled everywhere, one can have obvious calls to
"db_lock_read_only()" and "db_lock_read_write()".
Separation of locks into "read" and "write" locks, in turn, improves locking behaviour - helps against problems like lock starvation - which we did see with MIDAS -
as "read" locks are much more efficient - all readers can read the data at the same time, locking is only done when somebody need to "write".
c) some db_validate() functions also try to do repair. this cannot work if validation is called from "read-only" functions like db_find_key(). I now think the "repair"
functions should be separate from "validate" functions. validate functions should detect problems, repair functions would repair them. The question remains -
when is good time to run a full repair. (probably at the time when we connect to the database - this way, simply starting "odbedit" will force a database check and
repair).
d) calls to cm_msg() when odb is locked has been a problem for a long time. because cm_msg() itself calls odb and because it also calls event buffer code
(SYSMSG buffer) which in turn call odb functions, there was trouble with deadlocks between ODB and event buffer semaphores, trouble with recursive use of
ODB, etc.
Right now we have all this partially papered over by having cm_msg() put messages into a memory buffer that we periodically flush, but I was never super happy
with that solution. For example, if we crash before the message buffer is flushed, all error messages are lost, they do not go into midas.log, they are not printed on
the screen, they are not accessible in the core dump.
To resolve this problem, I have all "locked" functions call db_msg() instead of cm_msg(). db_msg() saves the messages in a linked list which is flushed into
cm_msg() immediately after we unlock odb.
If we crash after generating an error message but before it is flushed to cm_msg(), we can still access it through the linked list inside the core dump. This is an
improvement over what we have now. Ideally, all messages should be printed to the terminal and saved to midas.log and pushed into SYSMSG, but most of this is
impractical at a moment when odb is locked - as we already know it leads to deadlocks and other trouble...
Bottom line, I now have a path to improve the odb code and to resolve some of the long standing structural problems.
K.O. |