Back Midas Rome Roody Rootana
  Midas DAQ System, Page 3 of 153  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
ID Date Author Topic Subject
  3036   05 May 2025 Konstantin OlchanskiBug Reportabort and core dump in cm_disconnect_experiment()
I noticed that some programs like mhist, if they take too long, there is an abort and core dump at the very end. This is because they forgot to 
set/disable the watchdog timeout, and they got remove from odb and from the SYSMSG event buffer.

mhist is easy to fix, just add the missing call to disable the watchdog, but I also see a similar crash in the mserver which of course requires 
the watchdog.

In either case, the crash is in cm_disconnect_experiment() where we know we are shutting down and we know there is no useful information in the 
core dump.

I think I will fix it by adding a flag to bm_close_buffer() to bypass/avoid the crash from "we are already removed from this buffer".

Stack trace from mhist:

[mhist,ERROR] [midas.cxx:5977:bm_validate_client_index,ERROR] My client index 6 in buffer 'SYSMSG' is invalid: client name '', pid 0 should be my 
pid 3113263
[mhist,ERROR] [midas.cxx:5980:bm_validate_client_index,ERROR] Maybe this client was removed by a timeout. See midas.log. Cannot continue, 
aborting...
bm_validate_client_index: My client index 6 in buffer 'SYSMSG' is invalid: client name '', pid 0 should be my pid 3113263
bm_validate_client_index: Maybe this client was removed by a timeout. See midas.log. Cannot continue, aborting...

Program received signal SIGABRT, Aborted.
Download failed: Invalid argument.  Continuing without source file ./nptl/./nptl/pthread_kill.c.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
warning: 44	./nptl/pthread_kill.c: No such file or directory
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff71df27e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff71c28ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x00005555555768b4 in bm_validate_client_index_locked (pbuf_guard=...) at /home/olchansk/git/midas/src/midas.cxx:5993
#6  0x000055555557ed7a in bm_get_my_client_locked (pbuf_guard=...) at /home/olchansk/git/midas/src/midas.cxx:6000
#7  bm_close_buffer (buffer_handle=1) at /home/olchansk/git/midas/src/midas.cxx:7162
#8  0x000055555557f101 in cm_msg_close_buffer () at /home/olchansk/git/midas/src/midas.cxx:490
#9  0x000055555558506b in cm_disconnect_experiment () at /home/olchansk/git/midas/src/midas.cxx:2904
#10 0x000055555556d2ad in main (argc=<optimized out>, argv=<optimized out>) at /home/olchansk/git/midas/progs/mhist.cxx:882
(gdb) 

Stack trace from mserver:

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=138048230684480) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=138048230684480, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007d8ddbc4e476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007d8ddbc347f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x000059beb439dab0 in bm_validate_client_index_locked (pbuf_guard=...) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:5993
#6  0x000059beb43a859c in bm_get_my_client_locked (pbuf_guard=...) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:6000
#7  bm_close_buffer (buffer_handle=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7162
#8  0x000059beb43a89af in bm_close_all_buffers () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7256
#9  bm_close_all_buffers () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:7243
#10 0x000059beb43afa20 in cm_disconnect_experiment () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:2905
#11 0x000059beb43afdd8 in rpc_check_channels () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:16317
#12 0x000059beb43b0cf5 in rpc_server_loop () at /home/dsdaqdev/packages_common/midas/src/midas.cxx:15858
#13 0x000059beb4390982 in main (argc=9, argv=0x7ffc07e5bed8) at /home/dsdaqdev/packages_common/midas/progs/mserver.cxx:387

K.O.
  3035   05 May 2025 Konstantin OlchanskiInfodb_delete_key(TRUE)
I was working on an odb corruption crash inside db_delete_key() and I noticed 
that I did not test db_delete_key() with follow_links set to TRUE. Then I noticed 
that nobody nowhere seems to use db_delete_key() with follow_links set to TRUE. 

Instead of testing it, can I just remove it?

This feature existed since day 1 (1st commit) and it does something unexpected 
compared to filesystem "/bin/rm": the best I can tell, it is removes the link 
*and* whatever the link points to. For people familiar with "/bin/rm", this is 
somewhat unexpected and by my thinking, if nobody ever added such a feature to 
"/bin/rm", it is probably not considered generally useful or desirable. (I would 
think it dangerous, it removes not 1 but 2 files, the 2nd file would be in some 
other directory far away from where we are).

By this thinking, I should remove "follow_links" (actually just make it do thing 
, to reduce the disturbance to other source code). db_delete_key() should work 
similar to /bin/rm aka the unlink() syscall.

K.O.
  3034   05 May 2025 Konstantin OlchanskiBug FixBug fix in SQL history
A bug was introduced to the SQL history in 2022 that made renaming of variable names not work. This is now fixed.

break commit:
54bbc9ed5d65d8409e8c9fe60b024e99c9f34a85
fix commit:
159d8d3912c8c92da7d6d674321c8a26b7ba68d4

P.S.

This problem was caused by an unfortunate design of the c++ class system. If I want to add more data to an existing 
class, I write this:

class old_class {
int i,j,k;
}

class bigger_class: public old_class {
int additional_variable;
}

But if I have this:

struct x { int i,j; }

class y {
std::vector<x> array_of_x;
}

and I want to add "k" to "x", c++ has not way to do this. history code has this workaround:

class bigger_y: public y
{
std::vector<int> array_of_k;
}

int bigger_y:foo(int n) {
printf("%d %d %d\", array_of_x[n].i, array_of_x[n].j, array_of_k[n]);
}

problem is that it is not obvious that "array_of_x" and "array_of_k" are connected
and they can easily get out of sync (if elements are added or removed). this is the
bug that happened in the history code. I now added assert(array_of_x.size()==array_of_k.size())
to offer at least some protection going forward.

P.S. As final solution I think I want to completely separate file history and sql history code,
they have more things different than common.

K.O.
  3033   30 Apr 2025 Stefan RittBug ReportODBXX : ODB links in the path names ?
Indeed this was missing from the very beginning. I added it, please report back if it's not working.

Stefan
  3032   30 Apr 2025 Pavel MuratInfoNew ODB++ API
it is a very convenient interface! Does it support the ODB links in the path names ? -- thanks, regards, Pasha
  3031   30 Apr 2025 Stefan RittInfoNew ODB++ API
I had to change the ODBXX API: https://bitbucket.org/tmidas/midas/commits/273c4e434795453c0c6bceb46bac9a0d2d27db18

The old C API is case-insensitive, meaning db_find_key("name") returns a key "name" or "Name" or "NAME". We can discuss if this is good or bad, but that's how it is since 30 years.

I now realized the the ODBXX API keys is case sensitive, so a o["NAME"] does not return any key "name". Rather, it tries to create a new key which of course fails. I changed therefore
the ODBXX to become case-insensitive like the old C API.

Stefan
  3030   29 Apr 2025 Pavel MuratBug ReportODBXX : ODB links in the path names ?
Dear MIDAS experts,

does the ODBXX interface to ODB currently ODB links in the path names? - From what I see so far, it currently fails to do so,
but I could be doing something else wrong... 

-- thanks, regards, Pasha
  3029   16 Apr 2025 Thomas LindnerInfoMIDAS workshop (online) Sept 22-23, 2025
Dear MIDAS enthusiasts,

We are planning a fifth MIDAS workshop, following on from previous successful 
workshops in 2015, 2017, 2019 and 2023.  The goals of the workshop include:

- Getting updates from MIDAS developers on new features, bug fixes and planned 
changes.
- Getting reports from MIDAS users on how they are using MIDAS and what problems 
they are facing.
- Making plans for future MIDAS changes and improvements

We are planning to have an online workshop on Sept 22-23, 2025 (it will coincide 
with a visit of Stefan to TRIUMF).  We are tentatively planning to have a four 
hour session on each day, with the sessions timed for morning in Vancouver and 
afternoon/evening in Europe.  Sorry, the sessions are likely to again not be well 
timed for our colleagues in Asia.  

We will provide exact times and more details closer to the date.  But I hope 
people can mark the dates in their calendars; we are keen to hear from as much of 
the MIDAS community as possible.  

Best Regards,
Thomas Lindner
  3028   08 Apr 2025 Lukas MandokkInfoMSL Syntax Highlighting Extension for VSCode (Release)

Hello everyone, 

I just wanted to let you know, that I published a MSL Syntax Highlighting Extension for VSCode.
It is still in a quite early stage, so there might be some missing keywords and edge cases which are not fully handled. So in case you find any issues or have suggestions for improvements, I am happy to implement them. Also I only tested it with a custom theme (One Monokai), so it might look very different with the default theme and other ones.

The extension is called "MSL Syntax Highlighter" and can be found in the extension marketplace in VSCode.  (vscode marketplace: https://marketplace.visualstudio.com/items?itemName=LukasMandok.msl-syntax-highlighter, github repo: https://github.com/LukasMandok/msl-syntax-highlighter)

One additional remark:
- To keep a consitent style with existing themes, one is a bit limited in regard to colors. For this reason a distinction betrween LOOP and IF Blocks is not really possible without writing a custom theme. A workaround would be to add the theming in the custom user settings (explained in the readme).

  3027   07 Apr 2025 Stefan RittSuggestionSequencer ODBSET feature requests
If people are simultaneously editing scripts this is indeed an issue, which probably never can be resolved by technical means. It need communication between the users.

For the main script some ODB locking might look like:

- First person clicks on "Edit", system checks that file is not locked and sequencer is not running, then goes into edit mode
- When entering edit mode, the editor puts a lock in to the ODB, like "Scrip lock = pc1234".
- When another person clicks on "Edit", the system replies "File current being edited on pc1234"
- When the first person saves the file or closes the web browser, the lock gets removed.
- Since a browser can crash without removing a lock, we need some automatic lock recovery, like if the lock is there, the next users gets a message "file currently locked. Click "override" to "steal" the lock and edit the file".

All that is not 100% perfect, but will probably cover 99% of the cases.

There is still the problem on all other scripts. In principle we would need a lock for each file which is not so simple to implement (would need arrays of files and host names).

Another issue will arise if a user opens a file twice for editing. The second attempt will fail, but I believe this is what we want.

A hostname for the lock is the easiest we can get. Would be better to also have a user name, but since the midas API does not require a log in, we won't have a user name handy. At it would be too tedious to ask "if you want to edit this file, enter your username".

Just some thoughts.

Stefan
  3026   07 Apr 2025 Zaher SalmanSuggestionSequencer ODBSET feature requests

Lukas Gerritzen wrote:

I also encountered a small annoyance in the current workflow of editing sequencer files in the browser:
  • Load a file
  • Double-click it to edit it, acknowledge the "To edit the sequence it must be opened in an editor tab" dialog
  • A new tab opens
  • Edit something, click "Start", acknowledge the "Save and start?" dialog (which pops up even if no changes are made)
  • Run the script
  • Double-click to make more changes -> another tab opens
After a while, many tabs with the same file are open. I understand this may be considered "user error", but perhaps the sequencer could avoid opening redundant tabs for the same file, or prompt before doing so?

Thanks for considering these suggestions!


The original reason the restricting edits in the first tab is that it is used to reflect the state of the sequencer, i.e. the file that is currently loaded in the ODB.

Imagine two users are working in parallel on the same file, each preparing their own sequence. One finishes editing and starts the sequencer. How would the second person know that by now the file was changed and is running?

I am open to suggestions to minimize the number of clicks and/or other options to make the first tab editable while making it safe and visible to all other users. Maybe a lock mechanism in the ODB can help here.

Zaher
  3025   02 Apr 2025 Stefan RittSuggestionSequencer ODBSET feature requests
And there is one more argument: 

We have a Python expert in our development team who wrote already the Python-to-C bindings. That means when running a Python 
script, we can already start/stop runs, write/read to the ODB etc. We only have to get the single stepping going which seems feasible to 
me, since there are some libraries like inspect.currentframe() and traceback.extract_stack(). For single-stepping there are debug APIs 
like debugpy. With Lua we really would have to start from scratch.

Stefan
  3024   02 Apr 2025 Konstantin OlchanskiSuggestionSequencer ODBSET feature requests
> I once looked at using LUA for this
>
> > but I think basing off an full featured programming language like python
> > is better.
> 
> if it came to a vote, my vote would go to Lua: it would allow to do everything needed, 
> with much less external dependencies and with much less motivation to over-use the interpreter. 
> The CMS experience was very teaching in this respect... 

Unfortunately I am only slightly aware of Lua to say how nicve or how bad it is. And we are
not sure how well it supports the single-line-stepping that permits the nice graphical
visualization of Stefan's sequencer.

It looks like python has the single-line-stepping built-in as a standard feature
and python is a more popular and more versatile machine, so to me python looks
like a better choice compared to lua (obscure), perl ("nobody uses it anymore")
or bash (ugly syntax).

K.O.
  3023   02 Apr 2025 Konstantin OlchanskiBug ReportMIDAS history system not using the event timestamps ?
> > You can always include your "true" data timestamp as the first value in your data.
> 
> Are you saying that if the first data word of a history event were a timestamp, 
> the MIDAS history system, when plotting the time dependencies, would use that timestamp 
> instead of the mlogger timestamp?
>

you are correct, midas knows nothing about what you put in the history data.

what I suggested is: if you want your true data timestamp recorded in the history,
you can put it into the history data yourself, and I suggested using the 1st value,
but you can also make it the last value or the 10th value, it is up to you.

for making history plots, the history timestamp is used, as you wrote and I confirmed,
this timestamp is generated by mlogger.

what is not clear to me is why this is a problem? do you see a big difference between the 
true data timestamp and the mlogger data timestamp? bigger than 1 second? (this would change 
the shape of "last 10 minutes" plots (600 seconds). bigger than 1 minute? (this would change 
the shape of "last 1 hour plots" (60 minutes, 3600 seconds).

that said, note that we currently store the timestamp as a DWORD 32-bit UNIX time value 
which will overflow in 2038 and which is quickly becoming incompatible with the ongoing 
switch to 64-bit time_t. Ubuntu-24 already build a large number of system libraries with 64-
bit time_t and building MIDAS with 32-bit time_t may soon become as difficult as building 
32-bit MIDAS for 32-bit i686 VME processors. we have to move with the times.

what it means is that the history system data format will have to be updated to 64-bit 
time_t and at the same time, we may try to change the timestamp from mlogger-generated to 
frontend-generated.

but it is still not clear to me how that helps you, because the frontend-generated timestamp 
is not the true data timestamp that you wanted. (and only you know what the true data 
timestamp is and where it comes from and how to tell it to MIDAS).

K.O.
  3022   02 Apr 2025 Pavel MuratBug ReportMIDAS history system not using the event timestamps ?
> You can always include your "true" data timestamp as the first value in your data.

Are you saying that if the first data word of a history event were a timestamp, 
the MIDAS history system, when plotting the time dependencies, would use that timestamp 
instead of the mlogger timestamp?  

if that is true, what tells MIDAS that the first data word is the timestamp? 

I couldn't find a discussion of that on the page describing the history system - 

 https://daq00.triumf.ca/MidasWiki/index.php/History_System#Frontend_history_event

- perhaps I should be looking at a different page?

-- thanks again, regards, Pasha 
  3021   01 Apr 2025 Pavel MuratSuggestionSequencer ODBSET feature requests
 I once looked at using LUA for this,
> but I think basing off an full featured programming language like python
> is better.

if it came to a vote, my vote would go to Lua: it would allow to do everything needed, 
with much less external dependencies and with much less motivation to over-use the interpreter. 
The CMS experience was very teaching in this respect... 

-- my 2c, regards, Pasha
  3020   01 Apr 2025 Konstantin OlchanskiBug ReportMIDAS history system not using the event timestamps ?
> I confirm that when writing out history files corresponding to the slow control event data, 
> MIDAS history system timestamps the data not with the event time coming from the event data, 
> but with the current time determined by [mlogger].

This is correct. The timestamp in the history file is the mlogger timestamp.

In theory we could use the ODB "last_written" timestamp, but in practice,
timestamps are 1 second granularity and the difference between the two
timestamps normally would be less than 1 second. (time to react to db_watch()).

But ODB last_written also is not the data timestamp. For remote connected clients
it includes the mserver communication delay.

What is the data timestamp, only the user knows - for some FPGA based equipments,
I can see the data timestamp being read from an FPGA register together with the data.

But back to earth.

For making history plots, 1 second granularity with a small (a few seconds) delay should be okey,
and I think the mserver timestamp is good enough.

For data analysis, you are reading history data from a history data file and you are
not constrained to using the MIDAS timestamp.

You can always include your "true" data timestamp as the first value in your data.

We do this in felaview for writing labview data to midas history in the ALPHA antihydrogen experiment at CERN.

This also anticipates your next request, can we have millisecond, microsecond, nanosecond history timestamps:
since you define your "true" data timestamp, you an make it anything you want. (I use "double" time in seconds,
64-bit IEEE-754 "double" has enough precision for microsecond granularity. FPGA based devices can have timestamps
with 10 ns or 8 ns granularity, in this case a uint64_t clock counter could be more appropriate).

K.O.
  3019   01 Apr 2025 Konstantin OlchanskiSuggestionSequencer ODBSET feature requests
> ODBSET "/Path/value[1,3,5]"
> ODBSET "/Path/value[1-5,7-9]"

we support this array index syntax in several places,
specifically, in javascript odb get and set mjsonrpc RPCs.

> SET GOODCHANNELS, "1-5,7,9"; ODBSET "/Path/value[$GOODCHANNELS]"
> SET BADCHANNELS, "6,8"; ODBSET "/Path/value[!$BADCHANNELS]"
> ODBSET "/Path/value[0-100, except $BADCHANNELS]"

this is very clever syntax, but I have not seen any programming
language actually implement it (not even perl).

there must be a good reason why nobody does this. probably we should not do it either.

but as Stefan said (and my opinion), the route of extending MIDAS sequencer
language until it becomes a superset of python, perl, tcl, bash, javascript
and algol is not a sustainable approach. I once looked at using LUA for this,
but I think basing off an full featured programming language like python
is better.

K.O.
  3018   01 Apr 2025 Konstantin OlchanskiBug ReportODB corruption
We see ODB corruption crashes in the DS20k vertical slice MIDAS instance.

Crash is memset() called by db_delete_key1() called by cm_connect_experiment().

I look at the source code and I see that ODB pkey and hkey validation is absent
from most iterators and it is possible for "bad" pkey to cause corruption. Many
other places in the ODB code use db_get_pkey() and db_validate_hkey() to prevent
invalid data from causing further corruption and breakage.

Also db_delete_key1() needs to be refactored and renamed db_delete_key_wlocked().

I will not do this immediately today, but hopefully next week or so.

Stack trace is attached, observe how free_data() was called on a completely invalid pkey,
bad pkey->type, bad pkey sizes, etc.

#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:250
250	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
(gdb) bt
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:250
#1  0x00005ad4102b4217 in memset (__len=<optimized out>, __ch=0, __dest=<optimized out>) at /usr/include/x86_64-linux-
gnu/bits/string_fortified.h:59
#2  free_data (pheader=pheader@entry=0x75aaea4f4000, address=0x75aaed4cea50, size=<optimized out>, caller=caller@entry=0x5ad4102ffb6c 
"db_delete_key1") at /home/dsdaqdev/packages_common/midas/src/odb.cxx:513
#3  0x00005ad4102b6a5b in free_data (caller=0x5ad4102ffb6c "db_delete_key1", size=<optimized out>, address=<optimized out>, 
pheader=0x75aaea4f4000) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:453
#4  db_delete_key1 (hDB=1, hKey=<optimized out>, level=<optimized out>, follow_links=0) at 
/home/dsdaqdev/packages_common/midas/src/odb.cxx:3789
#5  0x00005ad4102b6979 in db_delete_key1 (hDB=1, hKey=288672, level=0, follow_links=0) at 
/home/dsdaqdev/packages_common/midas/src/odb.cxx:3731
#6  0x00005ad4102cc923 in db_create_record (hDB=hDB@entry=1, hKey=hKey@entry=0, orig_key_name=orig_key_name@entry=0x7ffd75987280 
"/Programs/ODBEdit", init_str=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:12916
#7  0x00005ad4102cca73 in db_create_record (hDB=hDB@entry=1, hKey=hKey@entry=0, orig_key_name=orig_key_name@entry=0x7ffd75987280 
"/Programs/ODBEdit", init_str=<optimized out>) at /home/dsdaqdev/packages_common/midas/src/odb.cxx:12942
#8  0x00005ad4102a00ba in cm_set_client_info (hDB=1, hKeyClient=0x7ffd75987420, host_name=0x5ad412262ee0 "dsdaqgw.triumf.ca", 
client_name=0x7ffd759874c0 "ODBEdit", hw_type=<optimized out>, password=<optimized out>, watchdog_timeout=<optimized out>)
    at /usr/include/c++/11/bits/basic_string.h:194
#9  0x00005ad4102a902b in cm_connect_experiment1 (host_name=<optimized out>, host_name@entry=0x7ffd759876c0 "", 
default_exp_name=default_exp_name@entry=0x7ffd759876a0 "vslice", client_name=client_name@entry=0x5ad4102f28fa "ODBEdit", 
func=func@entry=0x0, 
    odb_size=odb_size@entry=1048576, watchdog_timeout=<optimized out>, watchdog_timeout@entry=10000) at 
/usr/include/c++/11/bits/basic_string.h:194
#10 0x00005ad41027e58d in main (argc=3, argv=0x7ffd759881e8) at /home/dsdaqdev/packages_common/midas/progs/odbedit.cxx:3025
(gdb) up
...
#4  db_delete_key1 (hDB=1, hKey=<optimized out>, level=<optimized out>, follow_links=0) at 
/home/dsdaqdev/packages_common/midas/src/odb.cxx:3789
3789	            free_data(pheader, (char *) pheader + pkey->data, pkey->total_size, "db_delete_key1");
(gdb) p pkey
$1 = (KEY *) 0x75aaea53b400
(gdb) p *pkey
$2 = {type = 1684370529, num_values = 0, name = '\000' <repeats 16 times>, "xQ\375\002\004\000\000\000\004\000\000\000\a\000\000", data = 0, 
total_size = 290944, item_size = 1743544378, access_mode = 0, notify_count = 0, next_key = 15, parent_keylist = 1, 
  last_written = 1953785965}
(gdb) 

K.O.
  3017   01 Apr 2025 Konstantin OlchanskiBug FixODB and event buffer - release semaphore before abort() and core dump
There is a long standing problem with ODB and event buffers. If they detect an 
internal data inconsistency and cannot continue running, they call abort() to 
dump core and stop.

Problem is in some code paths, they do this while holding the ODB or event 
buffer semaphore. (Linux kernel automatically releases SYSV semaphores after 
core dump is finished and program holding them is stopped).

If core dump takes longer than 10 seconds (for whatever reason, but we see this 
often enough), all other programs that wait for ODB or event buffer access, will 
also timeout and also crash (with core dump). Result is a core dump storm, at 
the end all MIDAS programs are crashed. (Luckily recovery is easy, simply 
restart everything).

Now I realize that in many situation, we do not need to hold the semaphore while 
dumping core - the content of ODB and event buffer shared memories is not 
important for debugging the crash - and it is safe to release the semaphore 
before calling abort().

This is now implemented for ODB and event buffers. Hopefully core dump storms 
will not happen again.

commit 96369c29deba1752fd3d25bed53e6594773d7e1a
release ODB semaphore before calling abort() to dump core. if core dump takes 
longer than 10 sec all other midas programs will timeout and crash.

commit 2506406813f1e7581572f0d5721d3761b7c8e8dd
unlock event buffer before calling abort() in bm_validate_client_index_locked(), 
refactor bm_get_my_client_locked()


K.O.
ELOG V3.1.4-2e1708b5