Back Midas Rome Roody Rootana
  Midas DAQ System, Page 39 of 154  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
    Reply  03 Jan 2019, Konstantin Olchanski, Info, note on the midas event buffer code, part 7, event buffer polling frequencies 
> > > > > In this technical note, I write down the workings of the midas event buffer code
> > > > > we need to understand and write down how the event buffer code works.
> > > > > bm_send_event() does this ...
> > > > rpc_send_event() does this ...
> > > > mserver rpc_server_receive() does this ...
> > bm_read_buffer() does this ...
> > bm_dispatch_event() does this ...
> > - bm_push_event(buffer name) ...
> > - bm_check_buffers() - call bm_push_event() for all open buffers ...
> > - bm_receive_event() ...
> > - bm_receive_event_alloc() ...
> remote client: cm_yield() -> bm_poll_event() -> bm_receive_event() -> bm_dispatch_event()
> remote client: ss_suspend() -> receive MSG_BM -> rpc_client_dispatch() -> bm_poll_event() -> ...
> mserver: cm_dispatch_ipc -> bm_notify_client() -> send MSG_BM

We now understand that cm_yield() polls the data buffers for new events, adding to buffer lock congestion. What is the
polling frequency in different programs:

mlogger, no run: 1000 ms cm_yield() split by firing of cm_watchdog() (average sleep time is 500ms), poll frequency 2 Hz
mlogger, when running at high data rate: most time is spent looping inside bm_check_buffers(), poll frequency is ~1 Hz
mdump: same thing, 1000 ms cm_yield() is split by cm_watchdog(), poll frequency is 2 Hz
odbedit: 100 ms cm_yield(), poll frequency is 10 Hz, normally it polls just the SYSMSG buffer.
mfe.c frontend: 100 ms cm_yield(), but it makes no event requests, so no polling of event buffers
mhttpd: 0 ms (!!!) cm_yield(), period 10 ms (there is a 10 ms ss_suspend() somewhere there), polls SYSMSG at crazy frequency of 100 Hz.
mserver: 5000 ms cm_yield() no polling for events

hmm... perhaps the logic of cm_yield() should be changed to only poll for new events once per second -
as for rare events we receive them through the "B" messages from the buffer writer, for high rate
messages we process them in a batch via the read cache and the loop in bm_check_buffer().

Next is to write down the communication between buffer writers and buffer readers.

to be continued,
K.O.
    Reply  03 Jan 2019, Konstantin Olchanski, Info, note on the midas event buffer code, part 8, writer and reader communications 
> > > > > > In this technical note, I write down the workings of the midas event buffer code

Event buffer readers and writers need to communicate following information:

- writer to reader: when new events are written to the buffer, send a message to wakeup any readers that have matching requests
- reader to writer: when buffer is full and we are waiting for free space, readers need to send us a message after they free up some space

Writer to reader path:
writer - bm_send_event() -> bm_wake_up_client_locked() -> for all readers with matching requests -> if "read_wait" is set, send "B" message
reader - ... -> bm_read_buffer() -> bm_wait_for_more_events() -> set "read_wait" to TRUE -> ss_suspend(1000, MSG_BM) - wait for "B" message, also poll the 
buffer every 1000 msec -> when new event is received, clear "read_wait".

Reader to writer path:
writer - bm_send_event() -> bm_wait_for_free_space() -> set "write_wait" to the amount of free space requested -> ss_suspend(1000, MSG_BM) - wait for "B" 
message, also poll every 1000 msec -> clear "write_wait" to zero.
reader - ... -> bm_read_buffer() -> (also via bm_fill_read_cache() -> bm_wakeup_producers_locked() -> if we are a GET_ALL reader -> if buffer is 50% empty -> 
if write_wait < free_space -> send "B" message.

N.B.1. There is a fly in the ointment for the writer-to-reader path: a function called
bm_mark_read_waiting() is called from several places to set and clear the reader "read_wait" flag.
I think this causes several logic errors: "read_wait" is mistakenly set for buffers where we have
not requested anything; and "read_wait" is forcibly cleared when we are actually waiting
for a "B" message, with "read_wait" cleared, this message will not be sent. But because we  also
poll for new events (nominally every 1000 msec, with cm_watchdog() cutting it down to average 500 msec),
we will not see this fault unless we look carefully for any extra delays between sending and receiving events.

N.B.2. There is a mistake in bm_wakeup_producers(), it should not send "B" messages to clients with zero "write_wait". (fixed in the new code).

and that's all she wrote,
K.O.
    Reply  10 Jan 2019, Konstantin Olchanski, Info, note on the midas event buffer code, part 8, writer and reader communications 
> > > > > > > In this technical note, I write down the workings of the midas event buffer code
> Event buffer readers and writers need to communicate following information:
> 
> N.B.1. There is a fly in the ointment for the writer-to-reader path: a function called
> bm_mark_read_waiting() is called from several places to set and clear the reader "read_wait" flag.
> I think this causes several logic errors...

bm_mark_read_waiting() is removed in the new code, it is unnecessary. This uncovered a number of problems
in the writer-to-reader communications, fixed in the new code:

- bm_poll_event() did not poll anything because internal flags were not set right
- reader "read_wait" now means "I am waiting for new data, please send me a notification "B" message".
- writer sends a notification "B" message and clears "read_wait" to stop sending more notifications until the reader asks for more data.
- for remote clients, there is interplay between the 500 ms sec blank-off in sending BM_MSG notifications from the mserver and the 1000 ms timeout in the loop of bm_poll_event(). Do not 
change one of them without thinking how it will affect the other one and how they interact.

K.O.
Entry  10 Jan 2019, Konstantin Olchanski, Info, removal of cm_watchdog() 
cm_watchdog() has been removed from the latest midas sources. The watchdog functions performed by cm_watchdog() were 
moved to cm_yield() - those are - maintaining odb and event buffer "last active" timestamps and checking for and removing of 
timed-out clients.

Those who write midas programs should ensure that they call cm_yield() at least every 1-2 seconds (for the normal 10 second 
timeout settings). As always, before calling potentially time consuming operations, such as accessing slow responding 
hardware, one should increase or disable their watchdog timeout.

Those who write midas programs that do not use cm_yield() (i.e. the mserver), should call cm_periodic_tasks() instead with 
the same period as described for cm_yield() above.

Removal of cm_watchdog() solves many problems in the midas code base:
- firing of cm_watchdog() at random times in random places makes it difficult to do static code path analysis (call paths, etc) - 
this is needed to ensure correct multithread locking, etc
- ditto caused trouble with multithread locking when cm_watchdog() fires *inside* the pthread mutex locking library itself 
causing mutexes to be in an inconsistent state. This had to be kludged against in the ODB multithread locks - now this kludge 
can be removed.
- many non-midas codes in experiment frontends, etc was not expecting and did not correctly handle firing of the 
cm_watchdog() SIGALARM at random times (i.e. select() with timeout returned too soon, etc).
- today, UNIX signals are pretty obscure, best avoided. (i.e. interaction of signals and threads is not super will defined, etc).

commits up to (and including) plus merge of branch feature/remove_cm_watchdog
https://bitbucket.org/tmidas/midas/commits/9f1775d2fc75d0de0b9d4ef1abc7b2fb9bacca28

K.O.
Entry  18 Jan 2019, Konstantin Olchanski, Info, bitbucket issue tracker "feature" 
It turns out the bitbucket issue tracker has a feature - I cannot make it automatically add me 
to the watcher list of all new issues.

So when you create a new issue, I think I get one message about it, and no more.

If there is some other activity on the issue (Stefan answers, Thomas answers),
I most definitely will not see any of it.

So in case you wondered why I sometimes completely do not react to some bug reports, this 
is why.

So much for "advanced" and "highly automated" and "intelligent" bug tracking.

K.O.
    Reply  21 Jan 2019, Konstantin Olchanski, Info, removal of cm_watchdog() 
> cm_watchdog() has been removed from the latest midas sources
> Removal of cm_watchdog() solves many problems in the midas code base:

Removal of cm_watchdog() creates new problems:

a) the bm_send_event(BM_WAIT) and bm_receive_event(BM_WAIT) wait for free space and wait for new event do not update the timeouts (need to add a call 
to cm_periodic_tasks())
b) frontends that talk to slow external equipment now die unless they have their timeout adjusted to be longer than the longest equipment operation (they 
were already supposed to do this, but...)
c) mhttpd sometimes dies from from an odb timeout (with the default 10 sec timeout).

As one solution, we may bring an automatic cm_watchdog() back, but running from a thread instead of from SIGALARM.

K.O.
    Reply  24 Jan 2019, Konstantin Olchanski, Info, removal of cm_watchdog() 
> > cm_watchdog() has been removed from the latest midas sources
> > Removal of cm_watchdog() solves many problems in the midas code base:
> Removal of cm_watchdog() creates new problems:
> a) the bm_send_event(BM_WAIT) and bm_receive_event(BM_WAIT)
> b) frontends that talk to slow external equipment
> c) mhttpd sometimes dies from from an odb timeout (with the default 10 sec timeout).

The watchdog is back, in a "light" form. Added:

- cm_watchdog_thread() - runs every 2 seconds and updates the timestamps on ODB and all open event buffers (SYSMSG, SYSTEM, etc).
- cm_start_watchdog_thread() - added to mfe.c and mhttpd - so user frontends work the same as before cm_watchdog() removal
- cm_stop_watchdog_thread() - added to cm_disconnect_experiment() to avoid leaving the thread running after we closed odb and all event buffers.

As before, the watchdog only runs on locally attached midas programs. For programs attached remotely via the mserver, the mserver handles the watchdog functions.

This new light-weight watchdog thread only updates the timestamps, it does not check and remove dead clients, it does not check the alarms. These functions are now performed 
by cm_yield() and cm_periodic_tasks(). At least some program in an experiment should call them periodically. (normally, at least mlogger and mhttpd will do that).

Programs that accidentally relied on SIGALRM firing at 1Hz may still be affected - i.e. with the old cm_watchdog(), ::sleep(1000) will only sleep for 1 second (interrupted by 
SIGALARM), now it will sleep for the full 1000 seconds. Other syscalls, i.e. select(), are similarly affected.

For now, I think only mfe.c frontends and mhttpd need the watchdog thread. With luck all the other midas programs (mlogger, mdump, etc) will run fine without it.

K.O.
Entry  07 Feb 2019, Stefan Ritt, Info, History panels in custom pages Screenshot_2019-02-07_at_10.39.44_.pngtriggerrate.txt
A new tag has been implemented to display history panels in custom pages, integrated in the 
new custom page design from 2017. The full documentation can be found at 

https://midas.triumf.ca/MidasWiki/index.php/New_Custom_Pages_(2017)#mhistory

Attached is a simple example of such a panel and the result. You have to rename triggerrate.txt to triggerrate.html if you want to use it (can't do it here since otherwise the browser wants to render it which does not work outside of midas).

Stefan
    Reply  08 Feb 2019, Thomas Lindner, Info, History panels in custom pages 
> A new tag has been implemented to display history panels in custom pages, integrated in the 
> new custom page design from 2017. The full documentation can be found at 
> 

As part of consolidating/cleaning the MIDAS Wiki documentation, the "New Custom Pages" was folded into the main "Custom Page".  So to see a
description of Stefan's new functionality please go to 

https://midas.triumf.ca/MidasWiki/index.php/Custom_Page#mhistory
Entry  11 Feb 2019, Konstantin Olchanski, Info, json-rpc request for ODB /Script and /CustomScript. 
I added json-rpc requests for ODB /Script and /CustomScript (the first one shows up on the status page in the left hand side menu, the 
second one is "hidden", intended for use by custom pages).

To invoke the RPC method do this: (from mhttpd.js). Use parameter "customscript" instead of "script" to execute scripts from ODB 
/CustomScript.

One can identify the version of MIDAS that has this function implemented by the left hand side menu - the script links are placed by script 
buttons.

<pre>
function mhttpd_exec_script(name)
{
   //console.log("exec_script: " + name);
   var params = new Object;
   params.script = name;
   mjsonrpc_call("exec_script", params).then(function(rpc) {
      var status = rpc.result.status;
      if (status != 1) {
         dlgAlert("Exec script \"" + name + "\" status " + status);
      }
   }).catch(function(error) {
      mjsonrpc_error_alert(error);
   });
}
</pre>

The underlying code moved from mhttpd.cxx to the midas library as cm_exec_script(odb_path);

K.O.
Entry  20 Feb 2019, Konstantin Olchanski, Info, odb needs protection against ctrl-c 
Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.

This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
and strange things will happen (including odb corruption).

In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
arrangements to make it happen, but I have seen it happen in normal use when running experiments.

K.O.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
    frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
    frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
    frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
    frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
    frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
    frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
    frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
    frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
    frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
    frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
    frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
    frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL : 
y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD : 
0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
    frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
    frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at 
alarm.c:310 [opt]
    frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
    frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
    frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
    frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
    frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at 
odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
    frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
    frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
    frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1
    Reply  20 Feb 2019, Stefan Ritt, Info, odb needs protection against ctrl-c 
Not sure if you realized, but there is a two-stage Ctrl-C handling inside midas. The first time you hit ctrl-c, the handler just sets a flag for the main event loop, so that the program can gracefully exit without trouble. This is 
done inside cm_ctrlc_handler(), which sets _ctrlc_pressed true if called. Then cm_yield() tests this flag and returns RPC_SHUTDOWN if so. I agree not very obvious, maybe we should return a more appropriate status. So the 
main loop must check the return status of cm_yield() and break if it's RPC_SHUTDOWN. The frontend framework mfe.c does this for example in

 while (status != RPC_SHUTDOWN && status != SS_ABORT);

Any use-written program should do the same (well, probably this is nowhere documented).

Now when the program does not exit (e.g. if it's in an infinite loop), then the second ctrl-c creates a hard abort and terminates the program non-gracefully, which as you noticed can lead to undesired results. All the 
semaphores (at least when I implemented it) had a SEM_UNDO flag when obtaining ownership. This means that if the semaphore is locked and the process who owns it terminates (even with a hard kill), then the semaphore 
is released by the OS. This way a crashed program cannot keep the ODB locked for example. Not sure that with all your modifications in the semaphore calls this functionality is still guaranteed.

Stefan

> Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.
> 
> This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
> and strange things will happen (including odb corruption).
> 
> In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
> arrangements to make it happen, but I have seen it happen in normal use when running experiments.
> 
> K.O.
> 
> (lldb) bt
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
>   * frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
>     frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
>     frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
>     frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
>     frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
>     frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
>     frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
>     frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
>     frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
>     frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
>     frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
>     frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
>     frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
>     frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL : 
> y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD : 
> 0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
>     frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
>     frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at 
> alarm.c:310 [opt]
>     frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
>     frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
>     frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
>     frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
>     frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at 
> odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
>     frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
>     frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
>     frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1
    Reply  20 Feb 2019, Konstantin Olchanski, Info, odb needs protection against ctrl-c 
Commit f81ff3c protects db_lock/unlock, but not any of the other functions. What if we do ctrl-c in the middle
of some odb write operation in the middle of memory allocation, etc.

A sure way to corrupt odb.

Perhaps we should disallow odb access from signal handlers? But we still want to be able to stop midas
programs using ctrl-c, even if the program is in some infinite loop somewhere and is not processing
midas events (no calls to cm_yield(), etc).

Maybe I should change the ctrl-c handler to set a flag for cm_yield() to return SS_EXIT,
and additional ctrl-c do nothing if this flag is already set? (maybe abort() if they do ctrl-c 10 times?).

K.O.

> Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.
> 
> This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
> and strange things will happen (including odb corruption).
> 
> In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
> arrangements to make it happen, but I have seen it happen in normal use when running experiments.
> 
> K.O.
> 
> (lldb) bt
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
>   * frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
>     frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
>     frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
>     frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
>     frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
>     frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
>     frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
>     frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
>     frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
>     frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
>     frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
>     frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
>     frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
>     frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL : 
> y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD : 
> 0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
>     frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
>     frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at 
> alarm.c:310 [opt]
>     frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
>     frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
>     frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
>     frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
>     frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at 
> odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
>     frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
>     frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
>     frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1
    Reply  20 Feb 2019, Konstantin Olchanski, Info, odb needs protection against ctrl-c 
> Not sure if you realized, but there is a two-stage Ctrl-C handling inside midas.

Hmm... I am looking at the ctrl-c handler inside odbedit.

Yes, and the original bug report is against odbedit - press of ctrl-c in odbedit corrupts odb,
see stack trace in https://bitbucket.org/tmidas/midas/issues/99

So maybe only the odbedit ctrl-c handler is defective...

I will take a look at what the other ctrl-c handler does.

Safest is probably to call exit() without calling cm_disconnect_experiment().

From the ctrl-c handler, if we call cm_disconnect_experiment() -
- if we hold odb locked, we deadlock (after I remove the recursive mutex) or corrupt odb (if we run form inside db_create or db_set_data).
- if we run from inside db_lock/unlock, we abort() (with my newly added protection) or explode (if we run from inside mprotect(), like in the stack strace in the bug report)

I would say, from a signal handler, only safe things are - set a flag, or abort()/exit().

exit() is not super safe because the user may have attached some code to it that may access odb. (our
default atexit() handler just prints an error message).

K.O.


> The first time you hit ctrl-c, the handler just sets a flag for the main event loop, so that the program can gracefully exit without trouble. This is 
> done inside cm_ctrlc_handler(), which sets _ctrlc_pressed true if called. Then cm_yield() tests this flag and returns RPC_SHUTDOWN if so. I agree not very obvious, maybe we should return a more appropriate status. So the 
> main loop must check the return status of cm_yield() and break if it's RPC_SHUTDOWN. The frontend framework mfe.c does this for example in
> 
>  while (status != RPC_SHUTDOWN && status != SS_ABORT);
> 
> Any use-written program should do the same (well, probably this is nowhere documented).
> 
> Now when the program does not exit (e.g. if it's in an infinite loop), then the second ctrl-c creates a hard abort and terminates the program non-gracefully, which as you noticed can lead to undesired results. All the 
> semaphores (at least when I implemented it) had a SEM_UNDO flag when obtaining ownership. This means that if the semaphore is locked and the process who owns it terminates (even with a hard kill), then the semaphore 
> is released by the OS. This way a crashed program cannot keep the ODB locked for example. Not sure that with all your modifications in the semaphore calls this functionality is still guaranteed.
> 
> Stefan
> 
> > Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.
> > 
> > This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
> > and strange things will happen (including odb corruption).
> > 
> > In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
> > arrangements to make it happen, but I have seen it happen in normal use when running experiments.
> > 
> > K.O.
> > 
> > (lldb) bt
> > * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
> >   * frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
> >     frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
> >     frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
> >     frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
> >     frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
> >     frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
> >     frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
> >     frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
> >     frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
> >     frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
> >     frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
> >     frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
> >     frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
> >     frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL : 
> > y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD : 
> > 0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
> >     frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
> >     frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at 
> > alarm.c:310 [opt]
> >     frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
> >     frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
> >     frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
> >     frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
> >     frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at 
> > odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
> >     frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
> >     frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
> >     frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1
    Reply  20 Feb 2019, Stefan Ritt, Info, odb needs protection against ctrl-c 
Have you read what I wrote? The current ctrl-c handler just sets the _ctrlc_pressed flag. It might be that some programs do not correctly interprete the return of cm_yield(), certainly the frontend does it correctly. On the SECOND ctrl-c, the program gets 
(internally) hard aborted, equivalent to calling abort(). Not sure if the code works everywhere, I see now that cm_yield(() should maybe return SS_ABORT like 

if (_ctrlc_pressed)
  return SS_ABORT;

then command_loop will exit gracefully. Have to check mfe.c, mlogger.c etc. if they all check for SS_ABORT returned from cm_yield().

> > Not sure if you realized, but there is a two-stage Ctrl-C handling inside midas.
> 
> Hmm... I am looking at the ctrl-c handler inside odbedit.
> 
> Yes, and the original bug report is against odbedit - press of ctrl-c in odbedit corrupts odb,
> see stack trace in https://bitbucket.org/tmidas/midas/issues/99
> 
> So maybe only the odbedit ctrl-c handler is defective...
> 
> I will take a look at what the other ctrl-c handler does.
> 
> Safest is probably to call exit() without calling cm_disconnect_experiment().
> 
> From the ctrl-c handler, if we call cm_disconnect_experiment() -
> - if we hold odb locked, we deadlock (after I remove the recursive mutex) or corrupt odb (if we run form inside db_create or db_set_data).
> - if we run from inside db_lock/unlock, we abort() (with my newly added protection) or explode (if we run from inside mprotect(), like in the stack strace in the bug report)
> 
> I would say, from a signal handler, only safe things are - set a flag, or abort()/exit().
> 
> exit() is not super safe because the user may have attached some code to it that may access odb. (our
> default atexit() handler just prints an error message).
> 
> K.O.
> 
> 
> > The first time you hit ctrl-c, the handler just sets a flag for the main event loop, so that the program can gracefully exit without trouble. This is 
> > done inside cm_ctrlc_handler(), which sets _ctrlc_pressed true if called. Then cm_yield() tests this flag and returns RPC_SHUTDOWN if so. I agree not very obvious, maybe we should return a more appropriate status. So the 
> > main loop must check the return status of cm_yield() and break if it's RPC_SHUTDOWN. The frontend framework mfe.c does this for example in
> > 
> >  while (status != RPC_SHUTDOWN && status != SS_ABORT);
> > 
> > Any use-written program should do the same (well, probably this is nowhere documented).
> > 
> > Now when the program does not exit (e.g. if it's in an infinite loop), then the second ctrl-c creates a hard abort and terminates the program non-gracefully, which as you noticed can lead to undesired results. All the 
> > semaphores (at least when I implemented it) had a SEM_UNDO flag when obtaining ownership. This means that if the semaphore is locked and the process who owns it terminates (even with a hard kill), then the semaphore 
> > is released by the OS. This way a crashed program cannot keep the ODB locked for example. Not sure that with all your modifications in the semaphore calls this functionality is still guaranteed.
> > 
> > Stefan
> > 
> > > Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.
> > > 
> > > This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
> > > and strange things will happen (including odb corruption).
> > > 
> > > In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
> > > arrangements to make it happen, but I have seen it happen in normal use when running experiments.
> > > 
> > > K.O.
> > > 
> > > (lldb) bt
> > > * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
> > >   * frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
> > >     frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
> > >     frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
> > >     frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
> > >     frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
> > >     frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
> > >     frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
> > >     frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
> > >     frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
> > >     frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
> > >     frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
> > >     frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
> > >     frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
> > >     frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL : 
> > > y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD : 
> > > 0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
> > >     frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
> > >     frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at 
> > > alarm.c:310 [opt]
> > >     frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
> > >     frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
> > >     frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
> > >     frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
> > >     frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at 
> > > odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
> > >     frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
> > >     frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
> > >     frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1
Entry  27 Feb 2019, Konstantin Olchanski, Info, mhttpd magic urls 
Here is the list of mhttpd magic URLs.

http "get" path:

handle_http_message()
handle_http_get()
?mjsonrpc_schema -> serve mjsonrpc_get_schema() // JSON RPC Schema in JSON format
?mjsonrpc_schema_text -> serve mjsonrpc_schema_to_text() // same, but human-readable
handle_decode_get()
decode_get()
interprete()

http "post" path:

handle_http_message()
handle_http_post()
?mjsonrpc -> serve mjsonrpc_decode_post_data() // process RPC request
handle_decode_post()
decode_post()
- maybe decode file attachment
interprete()

interprete() path:

url contains favicon.{ico,png} -> send_icon()
url contains mhttpd.css -> send_css() (see ODB /Experiment/CSS File) // obsolete? see midas.css below
url ends with "mp3" -> send_resource(url) // alarm sound
url contains midas.js -> send_resource("midas.js")
url contains midas.css -> send_resource("midas.css")
url ... ditto mhttpd.js
url ... ditto obsolete.js
url ... ditto controls.js
cmd is "example" -> send_resource("example.html")
?script -> cm_exec_script(), see ODB /Script/...
?customscript -> same, see ODB /CustomScript/...
cmd is "start" -> send resource start.html
cmd is blank -> send resource status.html
cmd is "status" -> send resource status.html
cmd is "newODB" -> send resource "odb.html" // not used at the moment
cmd is "programs" -> programs.html
cmd is "alarms" -> alarms.html
cmd is "transition" -> transition.html
cmd is "messages" -> messages.html
cmd is "config" and url is not "HS/" -> config.html
cmd is "chat" -> chat.html
cmd is "buffers" -> buffers.html
// elog section
cmd is "Show elog" -> elog
cmd is "Query elog" -> elog
cmd is "New elog" -> elog
cmd is "Edit elog" -> elog
cmd is "Reply elog" -> elog
cmd is "Last elog" -> elog
cmd is "Submit Query" -> elog
// end of elog section
url is "spinning-wheel.gif" -> send_resource("spinning-wheel.gif")
// section "new custom pages"
if ODB /Custom exists,
get value of $MIDAS_DIR or $MIDASSYS or "/home/custom"
write it to ODB /Custom/Path (if it does not already exist)
concatenate value of ODB /CustomPath and the URL (without a "/" in between)
if this file exists, send_resource() it.
// end of "new custom pages" section
// section for old AJAX requests
cmd is "jset", "jget", etc -> javascript_commands()
// commented out: send_resource(command+".html") // if cmd is "start" will send start.html
cmd is "mscb" -> show_mscb_page()
cmd is "help" -> show_help_page()
cmd is "trigger" -> send RPC RPC_MANUAL_TRIG
cmd is "Next subrun" -> set ODB "/Logger/Next subrun" to TRUE
cmd is "cancel" -> redirect to getparam("redir")
cmd is "set" -> show_set_page() // set ODB value
cmd is "find" -> show_find_page()
cmd is "CNAF" or url is "CNAF" -> show_cnaf_page()
cmd is "elog" -> redirect to external ELOG or send_resource("elog_show.html")
cmd starts with "Elog last" -> send_resource("elog_query.html") // Elog last N days & co
cmd is "Create Elog from this page" -> redirect to "?cmd=new elog" // called from ODB editor
cmd is "Submit elog" -> submit_elog() // usually a POST request from the "elog_edit.html"
cmd is "elog_att" -> show_elog_attachment()
cmd is "accept" -> what does this do?!?
cmd is "eqtable" -> show_eqtable_path() // page showing equipment variables as a table ("slow control page")
// section for the sequencer
cmd is "sequencer" -> show_seq_page()
cmd is "start script" -> seq
cmd is "cancel script" -> seq
cmd is "load script" -> ...
cmd is "new script" -> ...
cmd is "save script" -> ...
cmd is "edit script" -> ...
cmd is "spause" -> ...
cmd is "sresume" -> ...
cmd is "stop immeditely" -> ...
cmd is "stop after current run" -> ...
cmd is "cancel stop after current run" -> ...
cmd is "custom" -> show_custom_page()
cmd is "odb" -> show_odb_page()
show_error()

K.O.
    Reply  28 Feb 2019, Konstantin Olchanski, Info, resource file search path, mhttpd magic urls 
> url contains midas.js -> send_resource("midas.js")

mhttpd looks for resource files in these directories in this order:

(ODB /experiment/Resources)/filename ### this ODB entry is not created automatically (hidden)
./filename                       ### for testing custom files, start mhttpd in the directory with the test files
./resources/filename     ### ditto
$MIDAS_DIR/filename   ### per experiment custom files or overwrite of midas standard files
$MIDAS_DIR/resources/filename
$MIDASSYS/resources/filename ### standard midas resource files live here: midas.js, midas.css, etc

K.O.
Entry  01 Mar 2019, Konstantin Olchanski, Info, Gyrations of custom pages and ODB /Custom/Path 
Before the days of javascript and ajax and web 2.0, MIDAS introduced "custom pages" for
building graphical display that could show "live" data from MIDAS and that could
have buttons and controls to operate slow controls equipment, etc.

This is how it works:
- entries from ODB /Custom are shown on the MIDAS menu -
an odb entry /Custom/Foo generate a link labeled "Foo"
to a special mhttpd url /CS/Foo.
- access to mhttpd url /CS/Foo invokes show_custom_page()
- show_custom_page() reads the custom page file name from ODB /Custom/Foo
- content of this file is served the web browser (after substituting the <odb> tags with values from ODB).
- in addition, it is possible to store the contents of the custom page in the ODB variable /Custom/Foo itself,
making it easy to edit the custom pages through the web browser (using the mhttpd odb editor).
- (if the value of /Custom/Foo has no "\n", then it's a file name, otherwise it is the page contents).
- if /Custom/Path exists, it is prepended to all file names.
- read more about this here:
https://midas.triumf.ca/MidasWiki/index.php//Custom_ODB_tree
https://midas.triumf.ca/MidasWiki/index.php/Internal_Custom_Page

This method required each custom web page served by mhttpd to have a corresponding
entry in ODB /Custom. Quite tedious for big experiments with a large number
of web pages (in T2K/ND280/FGD, for 1 page per frontend board, these entries
had to be created using a script, no practical to create them manually).

To fix this, in 2015, a modification was made to the code for ODB /Custom/Path. Instead of reading the file name
from ODB /Custom/Foo, the filename was made by adding /Custom/Path and the URL itself:
URL /CS/Foo.html will serve the file ODBValue["/Custom/Path"]/Foo.html. In this scheme,
the ODB entry /Custom/Foo.html is optional and is only needed to create a link in the MIDAS menu.

Commit: https://bitbucket.org/tmidas/midas/commits/5a2ef7d66df353684c4b40882a391b64a068f61f made on 2015-03-19.
Corresponding midas forum entry: https://midas.triumf.ca/elog/Midas/1109 made on 2015-09-09.

This change had the effect of creating 2 different operating modes for custom pages:
- if ODB /Custom/Path was absent (the default case), file names are taken from ODB /Custom/Foo.
- if ODB /Custom/Path is present (it has be created manually), file names are taken from the URLs.

These two modes could not be mixed, some experiments (TRIUMF MUSR/BNMR, CERN ALPHA, etc)
continued to use the old scheme (no /Custom/Path), some experiments switched
to the new scheme (DEAP, etc - TBC?).

(The best I can tell, this also create a security problem were using URLs containing "..", one can
force mhttpd to escape the /Custom/Path filename jail to cause it to serve arbitrary files from
the filesystem. Note that many web browsers remove the ".." entries from URLs, special tricks
are required to exploit this).

Then, in 2017, something called "new custom pages" was added to mhttpd interprete().
commit https://bitbucket.org/tmidas/midas/commits/c574a45c6da1290430f5a85a348fb629596a0de0 made on 2017-07-24
(I cannot find the corresponding explanation on the midas forum).

The best I can tell this code is intended to do the same thing that was done before:
map URLs like http://localhost:8080/Foo.html to file name ODBValue["/Custom/Path"]/Foo.html
and if this file exists, serve it to the browser.

However, the code has a bug - there is no check for absence of ODB /Custom/Path. If it does
not exist (as in experiments that do not use custom pages or use the old-style custom pages without
/Custom/Path), an empty string is used and URLs like http://localhost:8080/Foo.html becomes
mapped to /Foo.html (in the root of the file system, "/"!!!).

This introduced 3 problems:
- http://localhost:8080/etc/hosts serve the system password file (a security problem)
- because this code is before the odb code, it intercepts (and breaks) valid ODB URLs:
- on Linux, the ODB editor URL http://localhost:8080/root instead of showing the ODB root, attempts to serve the local subdirectory "/root" as a file
(fails, /root is a directory on most Linuxes). This is reported here: https://midas.triumf.ca/elog/Midas/1335 (2018-01-10)
- on MacOS, ODB editor URL http://localhost:8080/System instead of showing ODB /System, attempts to serve the local directory "/System" is a file (fails, 
/System is a subdirectory in MacOS). This is reported here: https://bitbucket.org/tmidas/midas/issues/156/mhttpd-odb-editor-cannot-open-system-on 
(2018-12-20)

The problem with broken access to the ODB editor URLs is fixed by commit:
https://bitbucket.org/tmidas/midas/commits/f071bc1daa4dc7ec582586659dd2339f1cd1fa21 (2018-12-21)
https://midas.triumf.ca/elog/Midas/1416

The fix unconditionally creates /Custom/Path and sets it to the value of $MIDAS_DIR or $MIDASSYS.

This breaks experiments that use custom pages with no /Custom/Path (ALPHA, BNMR/BNQR), see
https://bitbucket.org/tmidas/midas/issues/157/odb-custom-is-broken, and this is impossible
to fix by deleting /Custom/Path as it will be created again.

Also the problem with serving the system password file remains, see
https://midas.triumf.ca/elog/Midas/1425

So, a https://en.wikipedia.org/wiki/SNAFU

Since then, we have changed the mhttpd URL scheme: we no longer use URL subdirectories to navigate the ODB editor (we now use &odb_path=xxx)
and to refer to custom files (use now use &cmd=custom&page=xxx).

This liberates the mhttpd URL space for serving custom files without colliding with mhttpd internals and allows us to repair the current situation.

I will propose a solution in a follow-up message.

K.O.

P.S. The actual series of malfunctions is a bit more complicated than I described in this message,
but there is no need for getting to the very bottom bottom bottom of this as we can now do
a fairly clean solution.
    Reply  04 Mar 2019, Stefan Ritt, Info, Gyrations of custom pages and ODB /Custom/Path 
Parsing all URL in mhttpd to prevent /etc/passwd etc. to be returned is tricky, because people can use escape sequences etc. Therefore I think it is much better to restrict file access 
on the file system level when opening a file. The only escape there one could have is "..", which can be tested easily. 

Therefore, I propose to restrict file access to two well-defined directories, which is one system directory and one user directory. The system directory should be defined via 
$MIDASSYS/resources, and the user directory should be the experiment directory (as defined in exptab) followed by "resources". So if MIDASSYS equals to /usr/local/midas and the 
experiment directory equals to /home/users/exp for example, we would only have these two directories (and of course the subdirectories within these) served by mhttpd:

$MIDASSYS/resource -> /usr/local/midas/resources
<exptab>/resources -> /home/users/exp/resources

These directories should be hard-wired into mhttpd, and not go through and ODB entry, since otherwise one could manipulate the ODB entries (knowingly or unknowingly) and open a 
back-door. 

If users need a more complex structure, they can put soft links into these directories.

The code which opens a resource file should then first evaluate $MIDASSYS, then add "/resources/", then add the requested file name, make sure that there is no ".." in the file name, 
then open the file. If not existing, do the same for the <exptab>/resources/ directory.

This change will break most experiments, and forces people to move their custom pages to different directories, but I think it's the only clean solution and we just have to bite the 
bullet.

Comments are welcome.

Stefan
    Reply  04 Mar 2019, Thomas Lindner, Info, Gyrations of custom pages and ODB /Custom/Path 
Hi Stefan and Konstantin,

I think that this proposal sounds fairly reasonable.  I agree that we might as well move to a secure final solution at this point.  

One comment: since this change would break almost every experiment I have worked on for the last 4 years, it would be nice to add a command-line option to mhttpd that preserves the old /Custom/Path behavior.  This would allow experiments a transition 
period, so that they didn't immediately need to fix their setup.  The command-line option could be clearly marked as obsolete behaviour and could be removed within a year.

Cheers,
Thomas



> Parsing all URL in mhttpd to prevent /etc/passwd etc. to be returned is tricky, because people can use escape sequences etc. Therefore I think it is much better to restrict file access 
> on the file system level when opening a file. The only escape there one could have is "..", which can be tested easily. 
> 
> Therefore, I propose to restrict file access to two well-defined directories, which is one system directory and one user directory. The system directory should be defined via 
> $MIDASSYS/resources, and the user directory should be the experiment directory (as defined in exptab) followed by "resources". So if MIDASSYS equals to /usr/local/midas and the 
> experiment directory equals to /home/users/exp for example, we would only have these two directories (and of course the subdirectories within these) served by mhttpd:
> 
> $MIDASSYS/resource -> /usr/local/midas/resources
> <exptab>/resources -> /home/users/exp/resources
> 
> These directories should be hard-wired into mhttpd, and not go through and ODB entry, since otherwise one could manipulate the ODB entries (knowingly or unknowingly) and open a 
> back-door. 
> 
> If users need a more complex structure, they can put soft links into these directories.
> 
> The code which opens a resource file should then first evaluate $MIDASSYS, then add "/resources/", then add the requested file name, make sure that there is no ".." in the file name, 
> then open the file. If not existing, do the same for the <exptab>/resources/ directory.
> 
> This change will break most experiments, and forces people to move their custom pages to different directories, but I think it's the only clean solution and we just have to bite the 
> bullet.
> 
> Comments are welcome.
> 
> Stefan
ELOG V3.1.4-2e1708b5