In the DarkSide vertical slice MIDAS DAQ, we observed ODB corruption, which I
traced to db_delete_key(). The cause of the corruption is not important. What is
important is to have a robust ODB where a small corruption stays localized and
does not require erasing the corrupt ODB and reloading it from a backup file.
To help debug such corruption, one can try setting ODB "/Experiment/Protect ODB"
to "yes". This makes the ODB shared memory read-only, so user code scribbling
into the wrong memory address will cause a seg fault and core dump instead of
silent ODB corruption. This feature is not enabled by default because changing
the ODB shared memory mapping from "read-only" to "writable" (and back) is not
very fast, and it slows down MIDAS noticeably.
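
To illustrate the idea behind a read-only mapping, here is a minimal standalone sketch. It assumes a POSIX system and uses plain mmap()/mprotect() on an anonymous page. It is not the MIDAS implementation, and stray_write_faults() and demo_protect() are made-up names for this example only. The signal handler is there just so the example can report the fault; in real use you would let the process crash and dump core.

```cpp
#include <cassert>
#include <cstddef>
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf g_recover;

// Jump back to the test point instead of crashing (demo only).
static void on_fault(int) { siglongjmp(g_recover, 1); }

// Returns 1 if a stray write faulted, 0 if it went through silently,
// -1 if the demo could not be set up.
int stray_write_faults(bool protect)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    size_t size = static_cast<size_t>(page);

    // One anonymous page stands in for the shared memory segment.
    void* raw = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) return -1;
    volatile char* mem = static_cast<volatile char*>(raw);
    mem[0] = 'o'; // legitimate write while the page is writable

    // Flip the page to read-only, as a protected database would be
    // between legitimate writes.
    if (protect && mprotect(raw, size, PROT_READ) != 0) {
        munmap(raw, size);
        return -1;
    }

    // Some systems report this fault as SIGBUS rather than SIGSEGV.
    struct sigaction sa = {}, old_segv, old_bus;
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    int result;
    if (sigsetjmp(g_recover, 1) == 0) {
        mem[0] = 'X'; // the "scribble into the wrong address"
        result = 0;   // nobody noticed: silent corruption
    } else {
        result = 1;   // the write was stopped by the hardware
    }

    sigaction(SIGSEGV, &old_segv, nullptr);
    sigaction(SIGBUS, &old_bus, nullptr);
    munmap(raw, size);
    return result;
}

void demo_protect()
{
    assert(stray_write_faults(false) == 0); // unprotected: write succeeds
    assert(stray_write_faults(true) == 1);  // protected: write faults
}
```

Every legitimate write has to be bracketed by two such mprotect()-style calls, which is one plausible reading of why the feature costs noticeable time.
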
MIDAS right before this merge was tagged "midas-2025-11-a". If you see this ODB
update cause trouble, please report it here and revert to this tagged version.
Updates:
- harden db_delete_key() against internal corruption: if an ODB inconsistency is
detected, do a clean crash instead of trying to delete stuff and corrupting the
ODB to the point where it has to be erased and reloaded from a backup file.
- additional refactoring to separate read-locked and write-locked code.
- merge of a missing patch to avoid ODB corruption when the key area becomes
100% full (or was it the data area? I forget now. I fixed one of them a long
time ago, now both are fixed).
- remove the "follow_links" argument from db_delete_key(), see separate
discussion on this.
- add db_delete() to delete things by ODB path instead of by hkey (an atomic
fusion of db_find_link() and db_delete_key()).
- fixes for incorrect use of db_find_key() followed by db_delete_key(): this
combination unexpectedly follows symlinks and deletes the wrong ODB entry
(should have been db_find_link(), now replaced with atomic db_delete()).
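
The symlink problem in the last two items can be shown with a toy model. This is a sketch only: ToyOdb, its methods, and the paths used are invented for illustration and are not the MIDAS API or its data structures. find_key() plays the role of a lookup that follows links, find_link() a lookup that does not, and delete_path() a fused find-and-delete done under one lock.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Toy entry: either an ordinary key or a symlink (hypothetical type).
struct ToyEntry {
    std::string value;       // payload of an ordinary key
    std::string link_target; // non-empty means "this entry is a symlink"
};

class ToyOdb {
public:
    void set(const std::string& path, const std::string& value) {
        std::lock_guard<std::mutex> guard(lock_);
        entries_[path] = ToyEntry{value, ""};
    }
    void link(const std::string& path, const std::string& target) {
        std::lock_guard<std::mutex> guard(lock_);
        entries_[path] = ToyEntry{"", target};
    }
    bool exists(const std::string& path) const {
        std::lock_guard<std::mutex> guard(lock_);
        return entries_.count(path) != 0;
    }
    // Lookup that follows symlinks; returns the resolved path as the
    // "handle", or "" if not found (or the link chain is too long).
    std::string find_key(const std::string& path) const {
        std::lock_guard<std::mutex> guard(lock_);
        std::string current = path;
        for (int hops = 0; hops < 16; hops++) {
            auto it = entries_.find(current);
            if (it == entries_.end()) return "";
            if (it->second.link_target.empty()) return current;
            current = it->second.link_target;
        }
        return "";
    }
    // Lookup that never follows symlinks.
    std::string find_link(const std::string& path) const {
        std::lock_guard<std::mutex> guard(lock_);
        return entries_.count(path) ? path : "";
    }
    // Delete by handle: a second, separate lock acquisition.
    bool delete_key(const std::string& handle) {
        std::lock_guard<std::mutex> guard(lock_);
        return !handle.empty() && entries_.erase(handle) > 0;
    }
    // Fused lookup (no link following) and delete under a single lock.
    bool delete_path(const std::string& path) {
        std::lock_guard<std::mutex> guard(lock_);
        auto it = entries_.find(path);
        if (it == entries_.end()) return false;
        entries_.erase(it);
        return true;
    }
private:
    mutable std::mutex lock_;
    std::map<std::string, ToyEntry> entries_;
};

void demo_delete() {
    ToyOdb odb;
    odb.set("/Runinfo/Run number", "42");
    odb.link("/Alias/Run", "/Runinfo/Run number");

    // Two-step pattern with link following: the target is deleted and
    // the link is left dangling.
    assert(odb.delete_key(odb.find_key("/Alias/Run")));
    assert(!odb.exists("/Runinfo/Run number"));
    assert(odb.exists("/Alias/Run"));

    // Fused delete by path: only the link itself goes away.
    odb.set("/Runinfo/Run number", "42");
    assert(odb.delete_path("/Alias/Run"));
    assert(odb.exists("/Runinfo/Run number"));
    assert(!odb.exists("/Alias/Run"));
}
```

In the two-step pattern the lock is also released between the lookup and the delete, so another client could change the tree in between. Doing both steps under one lock closes that window.
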
K.O.