We see ODB corruption crashes in the DS20k vertical slice MIDAS instance.
Crash is memset() called by db_delete_key1() called by cm_connect_experiment().
I look at the source code and I see that ODB pkey and hkey validation is absent
from most iterators and it is possible for "bad" pkey to cause corruption. Many
other places in the ODB code use db_get_pkey() and db_validate_hkey() to prevent
invalid data from causing further corruption and breakage.
Also db_delete_key1() needs to be refactored and renamed db_delete_key_wlocked().
I will not do this immediately today, but hopefully next week or so.
Stack trace is attached, observe how free_data() was called on a completely invalid pkey,
bad pkey->type, bad pkey sizes, etc.
#0 __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:250
250 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
(gdb) bt
#0 __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:250
#1 0x00005ad4102b4217 in memset (__len=<optimized out>, __ch=0, __dest=<optimized out>) at /usr/include/x86_64-linux-
gnu/bits/string_fortified.h:59
#2 free_data (pheader=pheader@entry=0x75aaea4f4000, address=0x75aaed4cea50, size=<optimized out>, caller=caller@entry=0x5ad4102ffb6c
"db_delete_key1") at /home/dsdaqdev/packages_common/midas/src/odb.cxx:513
#3 0x00005ad4102b6a5b in free_data (caller=0x5ad4102ffb6c "db_delete_key1", size=<optimized out>, address=<optimized out>,
pheader=0x75aaea4f4000) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:453
#4 db_delete_key1 (hDB=1, hKey=<optimized out>, level=<optimized out>, follow_links=0) at
/home/dsdaqdev/packages_common/midas/src/odb.cxx:3789
#5 0x00005ad4102b6979 in db_delete_key1 (hDB=1, hKey=288672, level=0, follow_links=0) at
/home/dsdaqdev/packages_common/midas/src/odb.cxx:3731
#6 0x00005ad4102cc923 in db_create_record (hDB=hDB@entry=1, hKey=hKey@entry=0, orig_key_name=orig_key_name@entry=0x7ffd75987280
"/Programs/ODBEdit", init_str=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:12916
#7 0x00005ad4102cca73 in db_create_record (hDB=hDB@entry=1, hKey=hKey@entry=0, orig_key_name=orig_key_name@entry=0x7ffd75987280
"/Programs/ODBEdit", init_str=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:12942
#8 0x00005ad4102a00ba in cm_set_client_info (hDB=1, hKeyClient=0x7ffd75987420, host_name=0x5ad412262ee0 "dsdaqgw.triumf.ca",
client_name=0x7ffd759874c0 "ODBEdit", hw_type=<optimized out>, password=<optimized out>, watchdog_timeout=<optimized out>)
at /usr/include/c++/11/bits/basic_string.h:194
#9 0x00005ad4102a902b in cm_connect_experiment1 (host_name=<optimized out>, host_name@entry=0x7ffd759876c0 "",
default_exp_name=default_exp_name@entry=0x7ffd759876a0 "vslice", client_name=client_name@entry=0x5ad4102f28fa "ODBEdit",
func=func@entry=0x0,
odb_size=odb_size@entry=1048576, watchdog_timeout=<optimized out>, watchdog_timeout@entry=10000) at
/usr/include/c++/11/bits/basic_string.h:194
#10 0x00005ad41027e58d in main (argc=3, argv=0x7ffd759881e8) at /home/dsdaqdev/packages_common/midas/progs/odbedit.cxx:3025
(gdb) up
...
#4 db_delete_key1 (hDB=1, hKey=<optimized out>, level=<optimized out>, follow_links=0) at
/home/dsdaqdev/packages_common/midas/src/odb.cxx:3789
3789 free_data(pheader, (char *) pheader + pkey->data, pkey->total_size, "db_delete_key1");
(gdb) p pkey
$1 = (KEY *) 0x75aaea53b400
(gdb) p *pkey
$2 = {type = 1684370529, num_values = 0, name = '\000' <repeats 16 times>, "xQ\375\002\004\000\000\000\004\000\000\000\a\000\000", data = 0,
total_size = 290944, item_size = 1743544378, access_mode = 0, notify_count = 0, next_key = 15, parent_keylist = 1,
last_written = 1953785965}
(gdb)
K.O. |