| ID |
Date |
Author |
Topic |
Subject |
|
3179
|
09 Dec 2025 |
Konstantin Olchanski | Bug Report | odbxx memory leak with JSON ODB dump | > Thanks for reporting this. It was caused by a
>
> MJsonNode* node = MJsonNode::Parse(str.c_str());
>
> not followed by a
>
> delete node;
>
> I added that now in odb::odb_from_json_string(). Can you try again?
>
> Stefan
Close, but no cigar, node you delete is not the node you got from Parse(), see "node = subnode;".
If I delete the node returned by Parse(), I confirm the memory leak is gone.
BTW, it looks like we are parsing the whole JSON ODB dump (200k+) on every odbxx access. Can you parse it just once?
K.O. |
|
3180
|
10 Dec 2025 |
Stefan Ritt | Bug Report | odbxx memory leak with JSON ODB dump | > BTW, it looks like we are parsing the whole JSON ODB dump (200k+) on every odbxx access. Can you parse it just once?
You are right. I changed the code so that the dump is only parsed once. Please give it a try.
Stefan |
|
3182
|
11 Dec 2025 |
Konstantin Olchanski | Bug Report | odbxx memory leak with JSON ODB dump | > > BTW, it looks like we are parsing the whole JSON ODB dump (200k+) on every odbxx access. Can you parse it just once?
> You are right. I changed the code so that the dump is only parsed once. Please give it a try.
Confirmed fixed, thanks! There are 2 small changes I made in odbxx.h, please pull.
K.O. |
|
3184
|
11 Dec 2025 |
Stefan Ritt | Bug Report | odbxx memory leak with JSON ODB dump | > Confirmed fixed, thanks! There are 2 small changes I made in odbxx.h, please pull.
There was one missing enable_jsroot in manalyzer, please pull yourself.
Stefan |
|
3185
|
12 Dec 2025 |
Konstantin Olchanski | Bug Report | odbxx memory leak with JSON ODB dump | > > Confirmed fixed, thanks! There are 2 small changes I made in odbxx.h, please pull.
> There was one missing enable_jsroot in manalyzer, please pull yourself.
pulled, pushed to rootana, thanks for fixing it!
K.O. |
|
2871
|
09 Oct 2024 |
Lukas Gerritzen | Suggestion | odbedit minor quality of life | I have made two minor quality of life changes to odbedit.
- cd command: Typing cd without arguments now changes the directory to /, similar to the behaviour of the cd command in Linux sending you to the home directory.
- Exit behavior: Upon exiting the program with Ctrl+C, a newline character is printed so that the command line starts on an empty line rather than the last line from odbedit.
Here's the diff:
@@ -1668,7 +1668,10 @@ int command_loop(char *host_name, char *exp_name, char *cmd, char *start_dir)
/* cd */
else if (param[0][0] == 'c' && param[0][1] == 'd') {
- compose_name(pwd, param[1], str);
+ if (strlen(param[1]) == 0)
+ strcpy(str, "/");
+ else
+ compose_name(pwd, param[1], str);
status = db_find_key(hDB, 0, str, &hKey);
@@ -2962,6 +2965,7 @@ void ctrlc_odbedit(INT i)
cm_disconnect_experiment();
+ printf("\n");
exit(EXIT_SUCCESS);
}
Please consider incorporating those changes to odbedit.
Lukas |
|
2872
|
09 Oct 2024 |
Stefan Ritt | Suggestion | odbedit minor quality of life | Ok, accepted, done and pushed.
Stefan
| Lukas Gerritzen wrote: | I have made two minor quality of life changes to odbedit.
- cd command: Typing cd without arguments now changes the directory to /, similar to the behaviour of the cd command in Linux sending you to the home directory.
- Exit behavior: Upon exiting the program with Ctrl+C, a newline character is printed so that the command line starts on an empty line rather than the last line from odbedit.
Here's the diff:
@@ -1668,7 +1668,10 @@ int command_loop(char *host_name, char *exp_name, char *cmd, char *start_dir)
/* cd */
else if (param[0][0] == 'c' && param[0][1] == 'd') {
- compose_name(pwd, param[1], str);
+ if (strlen(param[1]) == 0)
+ strcpy(str, "/");
+ else
+ compose_name(pwd, param[1], str);
status = db_find_key(hDB, 0, str, &hKey);
@@ -2962,6 +2965,7 @@ void ctrlc_odbedit(INT i)
cm_disconnect_experiment();
+ printf("\n");
exit(EXIT_SUCCESS);
}
Please consider incorporating those changes to odbedit.
Lukas |
|
|
2770
|
17 May 2024 |
Konstantin Olchanski | Bug Report | odbedit load into the wrong place | Trying to restore IRIS ODB was a nasty surprise, old save files are in .odb format and odbedit "load xxx.odb"
does an unexpected thing.
mkdir tmp
cd tmp
load odb.xml loads odb.xml into the current directory "tmp"
load odb.json same thing
load odb.odb loads into "/" unexpectedly overwriting everything in my ODB with old data
this makes it impossible for me to restore just /equipment/beamline from old .odb save file (without
overwriting all of my odb with old data).
I look inside db_paste() and it looks like this is intentional, if ODB path names in the odb save file start
with "/" (and they do), instead of loading into the current directory it loads into "/", overwriting existing
data.
The fix would be to ignore the leading "/", always restore into the current directory. This will make odbedit
load consistent between all 3 odb save file formats.
Should I apply this change?
K.O. |
|
2774
|
18 May 2024 |
Stefan Ritt | Bug Report | odbedit load into the wrong place | Taht's strange. I always was under the impression that .odb files are loaded relatively to the current location in
the ODB. The behaviour should not be different for different data formats, so I agree to change the .odb loading to
behave like the .xml and .json save/load.
Stefan |
|
896
|
26 Jul 2013 |
Konstantin Olchanski | Bug Report | odbedit fixed size buffer overrun | odbedit uses a fixed size buffer for ODB data. If an array in ODB is bugger than this size,
db_get_data() correctly returns DB_TRUNCATED and there is no memory overwrite, but the following
code for printing the data does not know about this truncation and proceeds printing memory
values contained after the end of the fixed size data buffer - in the current case, this memory
somehow had the contents of the shell history - this very confused the MIDAS users as they though
that the funny printout was actually in ODB. This is in the print_key() function. If db_get_data()
returns DB_TRUNCATED, it should allocate a buffer of bigger size. This (and the previous) bug found
by the TIGRESS experiment. K.O. |
|
575
|
07 May 2009 |
Konstantin Olchanski | Bug Report | odbedit bad ctrl-C | When using "/bin/bash" shell, if I exit odbedit (and other midas programs) using ctrl-C, the terminal
enters a funny state, "echo" is turned off (I cannot see what I type), "delete" key does not work (echoes
^H instead).
This problem does not happen if I exit using the "exit" command or if I use the "/bin/tcsh" shell.
When this happens, the terminal can be restored to close to normal state using "stty sane", and "stty
erase ^H".
The terminal is set into this funny state by system.c::getchar() and normal settings are never restored
unless the midas program calls getchar(1) at the end. If the program does not finish normally, original
terminal settings are never restored and the terminal is left in a funny state.
It is not clear why the problem does not happen with /bin/tcsh - perhaps they restore sane terminal
settings automatically for us.
K.O. |
|
588
|
04 Jun 2009 |
Stefan Ritt | Bug Report | odbedit bad ctrl-C | > When using "/bin/bash" shell, if I exit odbedit (and other midas programs) using ctrl-C, the terminal
> enters a funny state, "echo" is turned off (I cannot see what I type), "delete" key does not work (echoes
> ^H instead).
>
> This problem does not happen if I exit using the "exit" command or if I use the "/bin/tcsh" shell.
>
> When this happens, the terminal can be restored to close to normal state using "stty sane", and "stty
> erase ^H".
>
> The terminal is set into this funny state by system.c::getchar() and normal settings are never restored
> unless the midas program calls getchar(1) at the end. If the program does not finish normally, original
> terminal settings are never restored and the terminal is left in a funny state.
>
> It is not clear why the problem does not happen with /bin/tcsh - perhaps they restore sane terminal
> settings automatically for us.
> K.O.
Who uses bash ??? And who keeps baning on Ctrl-C, when there is a nice "exit" command ;-)
Well, I implemented a simple CTRL-C handler in odbedit (Rev. 4503) which resets the terminal before exiting.
Give it a try. Of course this cannot catch a hard kill (-9), but CTRL-C works now correctly under bash at
least. |
|
562
|
18 Feb 2009 |
Konstantin Olchanski | Info | odbc sql history mlogger update | > mhttpd and mlogger have been updated with potentially troublesome changes.
> These new features are now available:
> - a "feature complete" implementation of "history in an SQL database".
The mlogger SQL history driver has been updated with improvements that make this new system usable in
production environment: the silly "create all tables on startup, every time, even if they already exist" is fixed,
mlogger survives restarts of mysqld and checks that existing sql columns have data types compatible with the
data we are trying to write.
There are still a few trouble spots remaining. For example, in mapping midas names into sql names (sql names
have more restrictions on permitted characters) and in reverse mapping of sql data types to midas data types.
To properly solve this, I may have to save the midas names and data types into an additional index table.
Included is the mh2sql utility for importing existing history files into an SQL database (in the same way as if
they were written into the database by mlogger).
The mhttpd side of this system still needs polishing, but should be already fully functional.
A preliminary version of documentation for this new SQL history system is here. After additional review and
editing it will be committed to the midas midox documentation. Included are full instructions on enabling
writing of midas history into a MySQL database.
http://ladd00.triumf.ca/~olchansk/midas/Internal.html#History_sql_internal
svn revision 4452
K.O. |
|
1448
|
20 Feb 2019 |
Konstantin Olchanski | Info | odb needs protection against ctrl-c | Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.
This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
and strange things will happen (including odb corruption).
In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
arrangements to make it happen, but I have seen it happen in normal use when running experiments.
K.O.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
* frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL :
y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD :
0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at
alarm.c:310 [opt]
frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at
odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1 |
|
1449
|
20 Feb 2019 |
Stefan Ritt | Info | odb needs protection against ctrl-c | Not sure if you realized, but there is a two-stage Ctrl-C handling inside midas. The first time you hit ctrl-c, the handler just sets a flag for the main event loop, so that the program can gracefully exit without trouble. This is
done inside cm_ctrlc_handler(), which sets _ctrlc_pressed true if called. Then cm_yield() tests this flag and returns RPC_SHUTDOWN if so. I agree not very obvious, maybe we should return a more appropriate status. So the
main loop must check the return status of cm_yield() and break if it's RPC_SHUTDOWN. The frontend framework mfe.c does this for example in
while (status != RPC_SHUTDOWN && status != SS_ABORT);
Any use-written program should do the same (well, probably this is nowhere documented).
Now when the program does not exit (e.g. if it's in an infinite loop), then the second ctrl-c creates a hard abort and terminates the program non-gracefully, which as you noticed can lead to undesired results. All the
semaphores (at least when I implemented it) had a SEM_UNDO flag when obtaining ownership. This means that if the semaphore is locked and the process who owns it terminates (even with a hard kill), then the semaphore
is released by the OS. This way a crashed program cannot keep the ODB locked for example. Not sure that with all your modifications in the semaphore calls this functionality is still guaranteed.
Stefan
> Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.
>
> This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
> and strange things will happen (including odb corruption).
>
> In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
> arrangements to make it happen, but I have seen it happen in normal use when running experiments.
>
> K.O.
>
> (lldb) bt
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
> * frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
> frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
> frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
> frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
> frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
> frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
> frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
> frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
> frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
> frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
> frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
> frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
> frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
> frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL :
> y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD :
> 0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
> frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
> frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at
> alarm.c:310 [opt]
> frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
> frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
> frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
> frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
> frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at
> odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
> frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
> frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
> frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1 |
|
1450
|
20 Feb 2019 |
Konstantin Olchanski | Info | odb needs protection against ctrl-c | Commit f81ff3c protects db_lock/unlock, but not any of the other functions. What if we do ctrl-c in the middle
of some odb write operation in the middle of memory allocation, etc.
A sure way to corrupt odb.
Perhaps we should disallow odb access from signal handlers? But we still want to be able to stop midas
programs using ctrl-c, even if the program is in some infinite loop somewhere and is not processing
midas events (no calls to cm_yield(), etc).
Maybe I should change the ctrl-c handler to set a flag for cm_yield() to return SS_EXIT,
and additional ctrl-c do nothing if this flag is already set? (maybe abort() if they do ctrl-c 10 times?).
K.O.
> Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.
>
> This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
> and strange things will happen (including odb corruption).
>
> In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
> arrangements to make it happen, but I have seen it happen in normal use when running experiments.
>
> K.O.
>
> (lldb) bt
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
> * frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
> frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
> frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
> frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
> frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
> frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
> frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
> frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
> frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
> frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
> frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
> frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
> frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
> frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL :
> y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD :
> 0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
> frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
> frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at
> alarm.c:310 [opt]
> frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
> frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
> frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
> frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
> frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at
> odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
> frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
> frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
> frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1 |
|
1451
|
20 Feb 2019 |
Konstantin Olchanski | Info | odb needs protection against ctrl-c | > Not sure if you realized, but there is a two-stage Ctrl-C handling inside midas.
Hmm... I am looking at the ctrl-c handler inside odbedit.
Yes, and the original bug report is against odbedit - press of ctrl-c in odbedit corrupts odb,
see stack trace in https://bitbucket.org/tmidas/midas/issues/99
So maybe only the odbedit ctrl-c handler is defective...
I will take a look at what the other ctrl-c handler does.
Safest is probably to call exit() without calling cm_disconnect_experiment().
From the ctrl-c handler, if we call cm_disconnect_experiment() -
- if we hold odb locked, we deadlock (after I remove the recursive mutex) or corrupt odb (if we run form inside db_create or db_set_data).
- if we run from inside db_lock/unlock, we abort() (with my newly added protection) or explode (if we run from inside mprotect(), like in the stack strace in the bug report)
I would say, from a signal handler, only safe things are - set a flag, or abort()/exit().
exit() is not super safe because the user may have attached some code to it that may access odb. (our
default atexit() handler just prints an error message).
K.O.
> The first time you hit ctrl-c, the handler just sets a flag for the main event loop, so that the program can gracefully exit without trouble. This is
> done inside cm_ctrlc_handler(), which sets _ctrlc_pressed true if called. Then cm_yield() tests this flag and returns RPC_SHUTDOWN if so. I agree not very obvious, maybe we should return a more appropriate status. So the
> main loop must check the return status of cm_yield() and break if it's RPC_SHUTDOWN. The frontend framework mfe.c does this for example in
>
> while (status != RPC_SHUTDOWN && status != SS_ABORT);
>
> Any use-written program should do the same (well, probably this is nowhere documented).
>
> Now when the program does not exit (e.g. if it's in an infinite loop), then the second ctrl-c creates a hard abort and terminates the program non-gracefully, which as you noticed can lead to undesired results. All the
> semaphores (at least when I implemented it) had a SEM_UNDO flag when obtaining ownership. This means that if the semaphore is locked and the process who owns it terminates (even with a hard kill), then the semaphore
> is released by the OS. This way a crashed program cannot keep the ODB locked for example. Not sure that with all your modifications in the semaphore calls this functionality is still guaranteed.
>
> Stefan
>
> > Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.
> >
> > This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
> > and strange things will happen (including odb corruption).
> >
> > In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
> > arrangements to make it happen, but I have seen it happen in normal use when running experiments.
> >
> > K.O.
> >
> > (lldb) bt
> > * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
> > * frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
> > frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
> > frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
> > frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
> > frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
> > frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
> > frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
> > frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
> > frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
> > frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
> > frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
> > frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
> > frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
> > frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL :
> > y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD :
> > 0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
> > frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
> > frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at
> > alarm.c:310 [opt]
> > frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
> > frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
> > frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
> > frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
> > frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at
> > odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
> > frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
> > frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
> > frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1 |
|
1452
|
20 Feb 2019 |
Stefan Ritt | Info | odb needs protection against ctrl-c | Have you read what I wrote? The current ctrl-c handler just sets the _ctrlc_pressed flag. It might be that some programs do not correctly interprete the return of cm_yield(), certainly the frontend does it correctly. On the SECOND ctrl-c, the program gets
(internally) hard aborted, equivalent to calling abort(). Not sure if the code works everywhere, I see now that cm_yield(() should maybe return SS_ABORT like
if (_ctrlc_pressed)
return SS_ABORT;
then command_loop will exit gracefully. Have to check mfe.c, mlogger.c etc. if they all check for SS_ABORT returned from cm_yield().
> > Not sure if you realized, but there is a two-stage Ctrl-C handling inside midas.
>
> Hmm... I am looking at the ctrl-c handler inside odbedit.
>
> Yes, and the original bug report is against odbedit - press of ctrl-c in odbedit corrupts odb,
> see stack trace in https://bitbucket.org/tmidas/midas/issues/99
>
> So maybe only the odbedit ctrl-c handler is defective...
>
> I will take a look at what the other ctrl-c handler does.
>
> Safest is probably to call exit() without calling cm_disconnect_experiment().
>
> From the ctrl-c handler, if we call cm_disconnect_experiment() -
> - if we hold odb locked, we deadlock (after I remove the recursive mutex) or corrupt odb (if we run form inside db_create or db_set_data).
> - if we run from inside db_lock/unlock, we abort() (with my newly added protection) or explode (if we run from inside mprotect(), like in the stack strace in the bug report)
>
> I would say, from a signal handler, only safe things are - set a flag, or abort()/exit().
>
> exit() is not super safe because the user may have attached some code to it that may access odb. (our
> default atexit() handler just prints an error message).
>
> K.O.
>
>
> > The first time you hit ctrl-c, the handler just sets a flag for the main event loop, so that the program can gracefully exit without trouble. This is
> > done inside cm_ctrlc_handler(), which sets _ctrlc_pressed true if called. Then cm_yield() tests this flag and returns RPC_SHUTDOWN if so. I agree not very obvious, maybe we should return a more appropriate status. So the
> > main loop must check the return status of cm_yield() and break if it's RPC_SHUTDOWN. The frontend framework mfe.c does this for example in
> >
> > while (status != RPC_SHUTDOWN && status != SS_ABORT);
> >
> > Any use-written program should do the same (well, probably this is nowhere documented).
> >
> > Now when the program does not exit (e.g. if it's in an infinite loop), then the second ctrl-c creates a hard abort and terminates the program non-gracefully, which as you noticed can lead to undesired results. All the
> > semaphores (at least when I implemented it) had a SEM_UNDO flag when obtaining ownership. This means that if the semaphore is locked and the process who owns it terminates (even with a hard kill), then the semaphore
> > is released by the OS. This way a crashed program cannot keep the ODB locked for example. Not sure that with all your modifications in the semaphore calls this functionality is still guaranteed.
> >
> > Stefan
> >
> > > Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.
> > >
> > > This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
> > > and strange things will happen (including odb corruption).
> > >
> > > In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
> > > arrangements to make it happen, but I have seen it happen in normal use when running experiments.
> > >
> > > K.O.
> > >
> > > (lldb) bt
> > > * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
> > > * frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
> > > frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
> > > frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
> > > frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
> > > frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
> > > frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
> > > frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
> > > frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
> > > frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
> > > frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
> > > frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
> > > frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
> > > frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
> > > frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL :
> > > y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD :
> > > 0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
> > > frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
> > > frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at
> > > alarm.c:310 [opt]
> > > frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
> > > frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
> > > frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
> > > frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
> > > frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at
> > > odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
> > > frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
> > > frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
> > > frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1 |
|
1318
|
13 Oct 2017 |
Konstantin Olchanski | Info | odb multithread support repaired | multithreaded access to odb was implemented back in 2013-2014. but recently a bug surfaced -
there was a race condition in the odb locking code against cm_watchdog(). Somehow this only
affected the mserver for the DRAGON experiment at TRIUMF. This is now fixed on the branch
feature/midas-2017-10. (this branch collects all the code that needs additional testing before
merging into develop and becoming the next release of midas).
K.O. |
|
2422
|
08 Aug 2022 |
Konstantin Olchanski | Info | odb disallow key names that start or end with spaces | while testing the new odb editor, we ran into a number of problems with key names
that start or end with spaces. we cannot think of any valid use case for such key
names (subdirectories and variables) and we think they could only have been
created by mistake. ODB now disallows such names. K.O. |
|