Back Midas Rome Roody Rootana
  Midas DAQ System, Page 93 of 142  Not logged in ELOG logo
    Reply  03 Jan 2019, Konstantin Olchanski, Info, note on the midas event buffer code, part 7, event buffer polling frequencies 
> > > > > In this technical note, I write down the workings of the midas event buffer code
> > > > > we need to understand and write down how the event buffer code works.
> > > > > bm_send_event() does this ...
> > > > rpc_send_event() does this ...
> > > > mserver rpc_server_receive() does this ...
> > bm_read_buffer() does this ...
> > bm_dispatch_event() does this ...
> > - bm_push_event(buffer name) ...
> > - bm_check_buffers() - call bm_push_event() for all open buffers ...
> > - bm_receive_event() ...
> > - bm_receive_event_alloc() ...
> remote client: cm_yield() -> bm_poll_event() -> bm_receive_event() -> bm_dispatch_event()
> remote client: ss_suspend() -> receive MSG_BM -> rpc_client_dispatch() -> bm_poll_event() -> ...
> mserver: cm_dispatch_ipc -> bm_notify_client() -> send MSG_BM

We now understand that cm_yield() polls the data buffers for new events, adding to buffer lock congestion. What is the
polling frequency in different programs:

mlogger, no run: 1000 ms cm_yield() split by firing of cm_watchdog() (average sleep time is 500ms), poll frequency 2 Hz
mlogger, when running at high data rate: most time is spent looping inside bm_check_buffers(), poll frequency is ~1 Hz
mdump: same thing, 1000 ms cm_yield() is split by cm_watchdog(), poll frequency is 2 Hz
odbedit: 100 ms cm_yield(), poll frequency is 10 Hz, normally it polls just the SYSMSG buffer.
mfe.c frontend: 100 ms cm_yield(), but it makes no event requests, so no polling of event buffers
mhttpd: 0 ms (!!!) cm_yield(), period 10 ms (there is a 10 ms ss_suspend() somewhere there), polls SYSMSG at crazy frequency of 100 Hz.
mserver: 5000 ms cm_yield() no polling for events

hmm... perhaps the logic of cm_yield() should be changed to only poll for new events once per second -
as for rare events we receive them through the "B" messages from the buffer writer, for high rate
messages we process them in a batch via the read cache and the loop in bm_check_buffer().

Next is to write down the communication between buffer writers and buffer readers.

to be continued,
K.O.
    Reply  03 Jan 2019, Konstantin Olchanski, Info, note on the midas event buffer code, part 8, writer and reader communications 
> > > > > > In this technical note, I write down the workings of the midas event buffer code

Event buffer readers and writers need to communicate following information:

- writer to reader: when new events are written to the buffer, send a message to wakeup any readers that have matching requests
- reader to writer: when buffer is full and we are waiting for free space, readers need to send us a message after they free up some space

Writer to reader path:
writer - bm_send_event() -> bm_wake_up_client_locked() -> for all readers with matching requests -> if "read_wait" is set, send "B" message
reader - ... -> bm_read_buffer() -> bm_wait_for_more_events() -> set "read_wait" to TRUE -> ss_suspend(1000, MSG_BM) - wait for "B" message, also poll the 
buffer every 1000 msec -> when new event is received, clear "read_wait".

Reader to writer path:
writer - bm_send_event() -> bm_wait_for_free_space() -> set "write_wait" to the amount of free space requested -> ss_suspend(1000, MSG_BM) - wait for "B" 
message, also poll every 1000 msec -> clear "write_wait" to zero.
reader - ... -> bm_read_buffer() -> (also via bm_fill_read_cache() -> bm_wakeup_producers_locked() -> if we are a GET_ALL reader -> if buffer is 50% empty -> 
if write_wait < free_space -> send "B" message.

N.B.1. There is a fly in the ointment for the writer-to-reader path: a function called
bm_mark_read_waiting() is called from several places to set and clear the reader "read_wait" flag.
I think this causes several logic errors: "read_wait" is mistakenly set for buffers where we have
not requested anything; and "read_wait" is forcibly cleared when we are actually waiting
for a "B" message, with "read_wait" cleared, this message will not be sent. But because we  also
poll for new events (nominally every 1000 msec, with cm_watchdog() cutting it down to average 500 msec),
we will not see this fault unless we look carefully for any extra delays between sending and receiving events.

N.B.2. There is a mistake in bm_wakeup_producers(), it should not send "B" messages to clients with zero "write_wait". (fixed in the new code).

and that's all she wrote,
K.O.
    Reply  10 Jan 2019, Konstantin Olchanski, Info, note on the midas event buffer code, part 8, writer and reader communications 
> > > > > > > In this technical note, I write down the workings of the midas event buffer code
> Event buffer readers and writers need to communicate following information:
> 
> N.B.1. There is a fly in the ointment for the writer-to-reader path: a function called
> bm_mark_read_waiting() is called from several places to set and clear the reader "read_wait" flag.
> I think this causes several logic errors...

bm_mark_read_waiting() is removed in the new code, it is unnecessary. This uncovered a number of problems
in the writer-to-reader communications, fixed in the new code:

- bm_poll_event() did not poll anything because internal flags were not set right
- reader "read_wait" now means "I am waiting for new data, please send me a notification "B" message".
- writer sends a notification "B" message and clears "read_wait" to stop sending more notifications until the reader asks for more data.
- for remote clients, there is interplay between the 500 ms sec blank-off in sending BM_MSG notifications from the mserver and the 1000 ms timeout in the loop of bm_poll_event(). Do not 
change one of them without thinking how it will affect the other one and how they interact.

K.O.
Entry  10 Jan 2019, Konstantin Olchanski, Info, removal of cm_watchdog() 
cm_watchdog() has been removed from the latest midas sources. The watchdog functions performed by cm_watchdog() were 
moved to cm_yield() - those are - maintaining odb and event buffer "last active" timestamps and checking for and removing of 
timed-out clients.

Those who write midas programs should ensure that they call cm_yield() at least every 1-2 seconds (for the normal 10 second 
timeout settings). As always, before calling potentially time consuming operations, such as accessing slow responding 
hardware, one should increase or disable their watchdog timeout.

Those who write midas programs that do not use cm_yield() (i.e. the mserver), should call cm_periodic_tasks() instead with 
the same period as described for cm_yield() above.

Removal of cm_watchdog() solves many problems in the midas code base:
- firing of cm_watchdog() at random times in random places makes it difficult to do static code path analysis (call paths, etc) - 
this is needed to ensure correct multithread locking, etc
- ditto caused trouble with multithread locking when cm_watchdog() fires *inside* the pthread mutex locking library itself 
causing mutexes to be in an inconsistent state. This had to be kludged against in the ODB multithread locks - now this kludge 
can be removed.
- many non-midas codes in experiment frontends, etc was not expecting and did not correctly handle firing of the 
cm_watchdog() SIGALARM at random times (i.e. select() with timeout returned too soon, etc).
- today, UNIX signals are pretty obscure, best avoided. (i.e. interaction of signals and threads is not super will defined, etc).

commits up to (and including) plus merge of branch feature/remove_cm_watchdog
https://bitbucket.org/tmidas/midas/commits/9f1775d2fc75d0de0b9d4ef1abc7b2fb9bacca28

K.O.
    Reply  18 Jan 2019, Konstantin Olchanski, Bug Report, Custom script with new MIDAS 
> I am having difficulty getting the custom scripts to work within the updated MIDAS. Before the 
> update I was using something like : 
> 
> <input type=submit name=customscript value="test">
> 
> on my custom page to run a script under /CustomScript/test, however, with the update to 
> MIDAS this is no longer working. I can't find any information about this functionality being 
> updated in the latest version - has this changed? Or should it still work? 
> 
> Thanks,
> Becky (g-2 DAQ)

I do not see any messages about anybody changing this function. I hope it did not break by accident.

Right now I am working on the event buffer code, and did not plan to look at mhttpd, but it looks like
your problem is important and there is at least on more problem (but it has a work-around),
so I may look at it sooner than later...

K.O.
Entry  18 Jan 2019, Konstantin Olchanski, Info, bitbucket issue tracker "feature" 
It turns out the bitbucket issue tracker has a feature - I cannot make it automatically add me 
to the watcher list of all new issues.

So when you create a new issue, I think I get one message about it, and no more.

If there is some other activity on the issue (Stefan answers, Thomas answers),
I most definitely will not see any of it.

So in case you wondered why I sometimes completely do not react to some bug reports, this 
is why.

So much for "advanced" and "highly automated" and "intelligent" bug tracking.

K.O.
    Reply  21 Jan 2019, Konstantin Olchanski, Info, removal of cm_watchdog() 
> cm_watchdog() has been removed from the latest midas sources
> Removal of cm_watchdog() solves many problems in the midas code base:

Removal of cm_watchdog() creates new problems:

a) the bm_send_event(BM_WAIT) and bm_receive_event(BM_WAIT) wait for free space and wait for new event do not update the timeouts (need to add a call 
to cm_periodic_tasks())
b) frontends that talk to slow external equipment now die unless they have their timeout adjusted to be longer than the longest equipment operation (they 
were already supposed to do this, but...)
c) mhttpd sometimes dies from from an odb timeout (with the default 10 sec timeout).

As one solution, we may bring an automatic cm_watchdog() back, but running from a thread instead of from SIGALARM.

K.O.
    Reply  24 Jan 2019, Konstantin Olchanski, Info, removal of cm_watchdog() 
> > cm_watchdog() has been removed from the latest midas sources
> > Removal of cm_watchdog() solves many problems in the midas code base:
> Removal of cm_watchdog() creates new problems:
> a) the bm_send_event(BM_WAIT) and bm_receive_event(BM_WAIT)
> b) frontends that talk to slow external equipment
> c) mhttpd sometimes dies from from an odb timeout (with the default 10 sec timeout).

The watchdog is back, in a "light" form. Added:

- cm_watchdog_thread() - runs every 2 seconds and updates the timestamps on ODB and all open event buffers (SYSMSG, SYSTEM, etc).
- cm_start_watchdog_thread() - added to mfe.c and mhttpd - so user frontends work the same as before cm_watchdog() removal
- cm_stop_watchdog_thread() - added to cm_disconnect_experiment() to avoid leaving the thread running after we closed odb and all event buffers.

As before, the watchdog only runs on locally attached midas programs. For programs attached remotely via the mserver, the mserver handles the watchdog functions.

This new light-weight watchdog thread only updates the timestamps, it does not check and remove dead clients, it does not check the alarms. These functions are now performed 
by cm_yield() and cm_periodic_tasks(). At least some program in an experiment should call them periodically. (normally, at least mlogger and mhttpd will do that).

Programs that accidentally relied on SIGALRM firing at 1Hz may still be affected - i.e. with the old cm_watchdog(), ::sleep(1000) will only sleep for 1 second (interrupted by 
SIGALARM), now it will sleep for the full 1000 seconds. Other syscalls, i.e. select(), are similarly affected.

For now, I think only mfe.c frontends and mhttpd need the watchdog thread. With luck all the other midas programs (mlogger, mdump, etc) will run fine without it.

K.O.
    Reply  24 Jan 2019, Konstantin Olchanski, Suggestion, json rpc API for history data 
> For us it would be a handy feature if history data could be requested directly
> from a custom page (time range or run based intervals) . Here I am not talking
> about history plots but I am talking about recorded time series data. This way
> we could easily generate useful graphs. For instance, if we measure the voltage
> (constant current) while cooling, we could instantly get the resistance versus
> temperature. Often we would like to 'correlate' recorded slow control data.
> 
> Is it already possible to extract history data the way suggested?

There are sundry hs_read() json rpc methods already implemented in preparation
of writing a javascript based history viewer (did not happen yet).

You can try to use them, they should work, but have not been tested extensively.

To find this stuff:
a) go to the mhttpd "help" page, open the "json rpc schema in text table format", look for the "hs_xxx" methods.
b) also from the "help" page, open "javascript examples- example.html", scroll down to "hs_get_active_events". Press the buttons, they should 
work, look at the source code to see how to call the rpc methods from your own page.

K.O.
    Reply  24 Jan 2019, Konstantin Olchanski, Bug Report, Custom script with new MIDAS 
> <input type=submit name=customscript value="test">

Stefan is right, input-type-submit has to be inside a form. This type of rpc call is "old school". Today, we should 
have a json-rpc request to execute a custom script.

https://bitbucket.org/tmidas/midas/issues/163/need-json-rpc-method-to-execute-custom

K.O.
Entry  11 Feb 2019, Konstantin Olchanski, Info, json-rpc request for ODB /Script and /CustomScript. 
I added json-rpc requests for ODB /Script and /CustomScript (the first one shows up on the status page in the left hand side menu, the 
second one is "hidden", intended for use by custom pages).

To invoke the RPC method do this: (from mhttpd.js). Use parameter "customscript" instead of "script" to execute scripts from ODB 
/CustomScript.

One can identify the version of MIDAS that has this function implemented by the left hand side menu - the script links are placed by script 
buttons.

<pre>
function mhttpd_exec_script(name)
{
   //console.log("exec_script: " + name);
   var params = new Object;
   params.script = name;
   mjsonrpc_call("exec_script", params).then(function(rpc) {
      var status = rpc.result.status;
      if (status != 1) {
         dlgAlert("Exec script \"" + name + "\" status " + status);
      }
   }).catch(function(error) {
      mjsonrpc_error_alert(error);
   });
}
</pre>

The underlying code moved from mhttpd.cxx to the midas library as cm_exec_script(odb_path);

K.O.
Entry  20 Feb 2019, Konstantin Olchanski, Info, odb needs protection against ctrl-c 
Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.

This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
and strange things will happen (including odb corruption).

In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
arrangements to make it happen, but I have seen it happen in normal use when running experiments.

K.O.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
    frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
    frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
    frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
    frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
    frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
    frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
    frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
    frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
    frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
    frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
    frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
    frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL : 
y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD : 
0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
    frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
    frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at 
alarm.c:310 [opt]
    frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
    frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
    frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
    frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
    frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at 
odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
    frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
    frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
    frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1
    Reply  20 Feb 2019, Konstantin Olchanski, Info, odb needs protection against ctrl-c 
Commit f81ff3c protects db_lock/unlock, but not any of the other functions. What if we do ctrl-c in the middle
of some odb write operation in the middle of memory allocation, etc.

A sure way to corrupt odb.

Perhaps we should disallow odb access from signal handlers? But we still want to be able to stop midas
programs using ctrl-c, even if the program is in some infinite loop somewhere and is not processing
midas events (no calls to cm_yield(), etc).

Maybe I should change the ctrl-c handler to set a flag for cm_yield() to return SS_EXIT,
and additional ctrl-c do nothing if this flag is already set? (maybe abort() if they do ctrl-c 10 times?).

K.O.

> Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.
> 
> This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
> and strange things will happen (including odb corruption).
> 
> In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
> arrangements to make it happen, but I have seen it happen in normal use when running experiments.
> 
> K.O.
> 
> (lldb) bt
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
>   * frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
>     frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
>     frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
>     frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
>     frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
>     frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
>     frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
>     frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
>     frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
>     frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
>     frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
>     frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
>     frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
>     frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL : 
> y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD : 
> 0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
>     frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
>     frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at 
> alarm.c:310 [opt]
>     frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
>     frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
>     frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
>     frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
>     frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at 
> odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
>     frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
>     frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
>     frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1
    Reply  20 Feb 2019, Konstantin Olchanski, Info, odb needs protection against ctrl-c 
> Not sure if you realized, but there is a two-stage Ctrl-C handling inside midas.

Hmm... I am looking at the ctrl-c handler inside odbedit.

Yes, and the original bug report is against odbedit - press of ctrl-c in odbedit corrupts odb,
see stack trace in https://bitbucket.org/tmidas/midas/issues/99

So maybe only the odbedit ctrl-c handler is defective...

I will take a look at what the other ctrl-c handler does.

Safest is probably to call exit() without calling cm_disconnect_experiment().

From the ctrl-c handler, if we call cm_disconnect_experiment() -
- if we hold odb locked, we deadlock (after I remove the recursive mutex) or corrupt odb (if we run form inside db_create or db_set_data).
- if we run from inside db_lock/unlock, we abort() (with my newly added protection) or explode (if we run from inside mprotect(), like in the stack strace in the bug report)

I would say, from a signal handler, only safe things are - set a flag, or abort()/exit().

exit() is not super safe because the user may have attached some code to it that may access odb. (our
default atexit() handler just prints an error message).

K.O.


> The first time you hit ctrl-c, the handler just sets a flag for the main event loop, so that the program can gracefully exit without trouble. This is 
> done inside cm_ctrlc_handler(), which sets _ctrlc_pressed true if called. Then cm_yield() tests this flag and returns RPC_SHUTDOWN if so. I agree not very obvious, maybe we should return a more appropriate status. So the 
> main loop must check the return status of cm_yield() and break if it's RPC_SHUTDOWN. The frontend framework mfe.c does this for example in
> 
>  while (status != RPC_SHUTDOWN && status != SS_ABORT);
> 
> Any use-written program should do the same (well, probably this is nowhere documented).
> 
> Now when the program does not exit (e.g. if it's in an infinite loop), then the second ctrl-c creates a hard abort and terminates the program non-gracefully, which as you noticed can lead to undesired results. All the 
> semaphores (at least when I implemented it) had a SEM_UNDO flag when obtaining ownership. This means that if the semaphore is locked and the process who owns it terminates (even with a hard kill), then the semaphore 
> is released by the OS. This way a crashed program cannot keep the ODB locked for example. Not sure that with all your modifications in the semaphore calls this functionality is still guaranteed.
> 
> Stefan
> 
> > Even with the cm_watchdog signal removed, some trouble from UNIX signals remains.
> > 
> > This time, when one presses Ctrl-C at the wrong time, the Ctrl-C signal handler will run at the wrong time
> > and strange things will happen (including odb corruption).
> > 
> > In the captured stack trace, I pressed Ctrl-C right when odbedit was inside db_lock_database(). I had to make special
> > arrangements to make it happen, but I have seen it happen in normal use when running experiments.
> > 
> > K.O.
> > 
> > (lldb) bt
> > * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
> >   * frame #0: 0x00007fff6c2ceb66 libsystem_kernel.dylib`__pthread_kill + 10
> >     frame #1: 0x00007fff6c499080 libsystem_pthread.dylib`pthread_kill + 333
> >     frame #2: 0x00007fff6c22a1ae libsystem_c.dylib`abort + 127
> >     frame #3: 0x00000001057ccf95 odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2048 [opt]
> >     frame #4: 0x00000001057aed9d odbedit`cm_delete_client_info(hDB=1, pid=46856) at midas.c:1702 [opt]
> >     frame #5: 0x00000001057b08fe odbedit`cm_disconnect_experiment at midas.c:2704 [opt]
> >     frame #6: 0x00000001057a8231 odbedit`ctrlc_odbedit(i=<unavailable>) at odbedit.cxx:2863 [opt]
> >     frame #7: 0x00007fff6c48cf5a libsystem_platform.dylib`_sigtramp + 26
> >     frame #8: 0x00007fff6c2ced83 libsystem_kernel.dylib`__semwait_signal + 11
> >     frame #9: 0x00007fff6c249724 libsystem_c.dylib`nanosleep + 199
> >     frame #10: 0x00007fff6c249586 libsystem_c.dylib`sleep + 41
> >     frame #11: 0x00000001057cce6b odbedit`db_lock_database(hDB=<unavailable>) at odb.c:2057 [opt]
> >     frame #12: 0x00000001057e129a odbedit`db_get_record_size(hDB=1, hKey=141848, align=8, buf_size=0x00007ffeea44c14c) at odb.c:10232 [opt]
> >     frame #13: 0x00000001057e1b58 odbedit`db_get_record1(hDB=1, hKey=141848, data=0x00007ffeea44c320, buf_size=0x00007ffeea44c2a8, align=0, rec_str="[.]\nWrite system message = BOOL : 
> > y\nWrite Elog message = BOOL : n\nSystem message interval = INT : 60\nSystem message last = DWORD : 0\nExecute command = STRING : [256] \nExecute interval = INT : 0\nExecute last = DWORD : 
> > 0\nStop run = BOOL : n\nDisplay BGColor = STRING : [32] red\nDisplay FGColor = STRING : [32] black\n\n") at odb.c:10390 [opt]
> >     frame #14: 0x00000001057f1de3 odbedit`al_trigger_class(alarm_class="Warning", alarm_message="This is an example alarm", first=YES) at alarm.c:389 [opt]
> >     frame #15: 0x00000001057f19e8 odbedit`al_trigger_alarm(alarm_name="Example alarm", alarm_message="This is an example alarm", default_class="Warning", cond_str="", type=<unavailable>) at 
> > alarm.c:310 [opt]
> >     frame #16: 0x00000001057f2c4e odbedit`al_check at alarm.c:655 [opt]
> >     frame #17: 0x00000001057b9f88 odbedit`cm_periodic_tasks at midas.c:5066 [opt]
> >     frame #18: 0x00000001057ba26d odbedit`cm_yield(millisec=100) at midas.c:5137 [opt]
> >     frame #19: 0x00000001057a30b8 odbedit`cmd_idle() at odbedit.cxx:1238 [opt]
> >     frame #20: 0x00000001057a92df odbedit`cmd_edit(prompt="[local:javascript1:S]/>", cmd=<unavailable>, dir=(odbedit`cmd_dir(char*, int*) at odbedit.cxx:705), idle=(odbedit`cmd_idle() at 
> > odbedit.cxx:1233))(char*, int*), int (*)()) at cmdedit.cxx:235 [opt]
> >     frame #21: 0x00000001057a3863 odbedit`command_loop(host_name="", exp_name="javascript1", cmd="", start_dir=<unavailable>) at odbedit.cxx:1435 [opt]
> >     frame #22: 0x00000001057a8664 odbedit`main(argc=1, argv=<unavailable>) at odbedit.cxx:2997 [opt]
> >     frame #23: 0x00007fff6c17e015 libdyld.dylib`start + 1
Entry  27 Feb 2019, Konstantin Olchanski, Info, mhttpd magic urls 
Here is the list of mhttpd magic URLs.

http "get" path:

handle_http_message()
handle_http_get()
?mjsonrpc_schema -> serve mjsonrpc_get_schema() // JSON RPC Schema in JSON format
?mjsonrpc_schema_text -> serve mjsonrpc_schema_to_text() // same, but human-readable
handle_decode_get()
decode_get()
interprete()

http "post" path:

handle_http_message()
handle_http_post()
?mjsonrpc -> serve mjsonrpc_decode_post_data() // process RPC request
handle_decode_post()
decode_post()
- maybe decode file attachment
interprete()

interprete() path:

url contains favicon.{ico,png} -> send_icon()
url contains mhttpd.css -> send_css() (see ODB /Experiment/CSS File) // obsolete? see midas.css below
url ends with "mp3" -> send_resource(url) // alarm sound
url contains midas.js -> send_resource("midas.js")
url contains midas.css -> send_resource("midas.css")
url ... ditto mhttpd.js
url ... ditto obsolete.js
url ... ditto controls.js
cmd is "example" -> send_resource("example.html")
?script -> cm_exec_script(), see ODB /Script/...
?customscript -> same, see ODB /CustomScript/...
cmd is "start" -> send resource start.html
cmd is blank -> send resource status.html
cmd is "status" -> send resource status.html
cmd is "newODB" -> send resource "odb.html" // not used at the moment
cmd is "programs" -> programs.html
cmd is "alarms" -> alarms.html
cmd is "transition" -> transition.html
cmd is "messages" -> messages.html
cmd is "config" and url is not "HS/" -> config.html
cmd is "chat" -> chat.html
cmd is "buffers" -> buffers.html
// elog section
cmd is "Show elog" -> elog
cmd is "Query elog" -> elog
cmd is "New elog" -> elog
cmd is "Edit elog" -> elog
cmd is "Reply elog" -> elog
cmd is "Last elog" -> elog
cmd is "Submit Query" -> elog
// end of elog section
url is "spinning-wheel.gif" -> send_resource("spinning-wheel.gif")
// section "new custom pages"
if ODB /Custom exists,
get value of $MIDAS_DIR or $MIDASSYS or "/home/custom"
write it to ODB /Custom/Path (if it does not already exist)
concatenate value of ODB /CustomPath and the URL (without a "/" in between)
if this file exists, send_resource() it.
// end of "new custom pages" section
// section for old AJAX requests
cmd is "jset", "jget", etc -> javascript_commands()
// commented out: send_resource(command+".html") // if cmd is "start" will send start.html
cmd is "mscb" -> show_mscb_page()
cmd is "help" -> show_help_page()
cmd is "trigger" -> send RPC RPC_MANUAL_TRIG
cmd is "Next subrun" -> set ODB "/Logger/Next subrun" to TRUE
cmd is "cancel" -> redirect to getparam("redir")
cmd is "set" -> show_set_page() // set ODB value
cmd is "find" -> show_find_page()
cmd is "CNAF" or url is "CNAF" -> show_cnaf_page()
cmd is "elog" -> redirect to external ELOG or send_resource("elog_show.html")
cmd starts with "Elog last" -> send_resource("elog_query.html") // Elog last N days & co
cmd is "Create Elog from this page" -> redirect to "?cmd=new elog" // called from ODB editor
cmd is "Submit elog" -> submit_elog() // usually a POST request from the "elog_edit.html"
cmd is "elog_att" -> show_elog_attachment()
cmd is "accept" -> what does this do?!?
cmd is "eqtable" -> show_eqtable_path() // page showing equipment variables as a table ("slow control page")
// section for the sequencer
cmd is "sequencer" -> show_seq_page()
cmd is "start script" -> seq
cmd is "cancel script" -> seq
cmd is "load script" -> ...
cmd is "new script" -> ...
cmd is "save script" -> ...
cmd is "edit script" -> ...
cmd is "spause" -> ...
cmd is "sresume" -> ...
cmd is "stop immeditely" -> ...
cmd is "stop after current run" -> ...
cmd is "cancel stop after current run" -> ...
cmd is "custom" -> show_custom_page()
cmd is "odb" -> show_odb_page()
show_error()

K.O.
    Reply  28 Feb 2019, Konstantin Olchanski, Info, resource file search path, mhttpd magic urls 
> url contains midas.js -> send_resource("midas.js")

mhttpd looks for resource files in these directories in this order:

(ODB /experiment/Resources)/filename ### this ODB entry is not created automatically (hidden)
./filename                       ### for testing custom files, start mhttpd in the directory with the test files
./resources/filename     ### ditto
$MIDAS_DIR/filename   ### per experiment custom files or overwrite of midas standard files
$MIDAS_DIR/resources/filename
$MIDASSYS/resources/filename ### standard midas resource files live here: midas.js, midas.css, etc

K.O.
Entry  01 Mar 2019, Konstantin Olchanski, Info, Gyrations of custom pages and ODB /Custom/Path 
Before the days of javascript and ajax and web 2.0, MIDAS introduced "custom pages" for
building graphical display that could show "live" data from MIDAS and that could
have buttons and controls to operate slow controls equipment, etc.

This is how it works:
- entries from ODB /Custom are shown on the MIDAS menu -
an odb entry /Custom/Foo generate a link labeled "Foo"
to a special mhttpd url /CS/Foo.
- access to mhttpd url /CS/Foo invokes show_custom_page()
- show_custom_page() reads the custom page file name from ODB /Custom/Foo
- content of this file is served the web browser (after substituting the <odb> tags with values from ODB).
- in addition, it is possible to store the contents of the custom page in the ODB variable /Custom/Foo itself,
making it easy to edit the custom pages through the web browser (using the mhttpd odb editor).
- (if the value of /Custom/Foo has no "\n", then it's a file name, otherwise it is the page contents).
- if /Custom/Path exists, it is prepended to all file names.
- read more about this here:
https://midas.triumf.ca/MidasWiki/index.php//Custom_ODB_tree
https://midas.triumf.ca/MidasWiki/index.php/Internal_Custom_Page

This method required each custom web page served by mhttpd to have a corresponding
entry in ODB /Custom. Quite tedious for big experiments with a large number
of web pages (in T2K/ND280/FGD, for 1 page per frontend board, these entries
had to be created using a script, no practical to create them manually).

To fix this, in 2015, a modification was made to the code for ODB /Custom/Path. Instead of reading the file name
from ODB /Custom/Foo, the filename was made by adding /Custom/Path and the URL itself:
URL /CS/Foo.html will serve the file ODBValue["/Custom/Path"]/Foo.html. In this scheme,
the ODB entry /Custom/Foo.html is optional and is only needed to create a link in the MIDAS menu.

Commit: https://bitbucket.org/tmidas/midas/commits/5a2ef7d66df353684c4b40882a391b64a068f61f made on 2015-03-19.
Corresponding midas forum entry: https://midas.triumf.ca/elog/Midas/1109 made on 2015-09-09.

This change had the effect of creating 2 different operating modes for custom pages:
- if ODB /Custom/Path was absent (the default case), file names are taken from ODB /Custom/Foo.
- if ODB /Custom/Path is present (it has be created manually), file names are taken from the URLs.

These two modes could not be mixed, some experiments (TRIUMF MUSR/BNMR, CERN ALPHA, etc)
continued to use the old scheme (no /Custom/Path), some experiments switched
to the new scheme (DEAP, etc - TBC?).

(The best I can tell, this also create a security problem were using URLs containing "..", one can
force mhttpd to escape the /Custom/Path filename jail to cause it to serve arbitrary files from
the filesystem. Note that many web browsers remove the ".." entries from URLs, special tricks
are required to exploit this).

Then, in 2017, something called "new custom pages" was added to mhttpd interprete().
commit https://bitbucket.org/tmidas/midas/commits/c574a45c6da1290430f5a85a348fb629596a0de0 made on 2017-07-24
(I cannot find the corresponding explanation on the midas forum).

The best I can tell this code is intended to do the same thing that was done before:
map URLs like http://localhost:8080/Foo.html to file name ODBValue["/Custom/Path"]/Foo.html
and if this file exists, serve it to the browser.

However, the code has a bug - there is no check for absence of ODB /Custom/Path. If it does
not exist (as in experiments that do not use custom pages or use the old-style custom pages without
/Custom/Path), an empty string is used and URLs like http://localhost:8080/Foo.html becomes
mapped to /Foo.html (in the root of the file system, "/"!!!).

This introduced 3 problems:
- http://localhost:8080/etc/hosts serve the system password file (a security problem)
- because this code is before the odb code, it intercepts (and breaks) valid ODB URLs:
- on Linux, the ODB editor URL http://localhost:8080/root instead of showing the ODB root, attempts to serve the local subdirectory "/root" as a file
(fails, /root is a directory on most Linuxes). This is reported here: https://midas.triumf.ca/elog/Midas/1335 (2018-01-10)
- on MacOS, ODB editor URL http://localhost:8080/System instead of showing ODB /System, attempts to serve the local directory "/System" is a file (fails, 
/System is a subdirectory in MacOS). This is reported here: https://bitbucket.org/tmidas/midas/issues/156/mhttpd-odb-editor-cannot-open-system-on 
(2018-12-20)

The problem with broken access to the ODB editor URLs is fixed by commit:
https://bitbucket.org/tmidas/midas/commits/f071bc1daa4dc7ec582586659dd2339f1cd1fa21 (2018-12-21)
https://midas.triumf.ca/elog/Midas/1416

The fix unconditionally creates /Custom/Path and sets it to the value of $MIDAS_DIR or $MIDASSYS.

This breaks experiments that use custom pages with no /Custom/Path (ALPHA, BNMR/BNQR), see
https://bitbucket.org/tmidas/midas/issues/157/odb-custom-is-broken, and this is impossible
to fix by deleting /Custom/Path as it will be created again.

Also the problem with serving the system password file remains, see
https://midas.triumf.ca/elog/Midas/1425

So, a https://en.wikipedia.org/wiki/SNAFU

Since then, we have changed the mhttpd URL scheme: we no longer use URL subdirectories to navigate the ODB editor (we now use &odb_path=xxx)
and to refer to custom files (use now use &cmd=custom&page=xxx).

This liberates the mhttpd URL space for serving custom files without colliding with mhttpd internals and allows us to repair the current situation.

I will propose a solution in a follow-up message.

K.O.

P.S. The actual series of malfunctions is a bit more complicated than I described in this message,
but there is no need for getting to the very bottom bottom bottom of this as we can now do
a fairly clean solution.
    Reply  04 Mar 2019, Konstantin Olchanski, Info, Gyrations of custom pages and ODB /Custom/Path 
Hi, guys, as I was exploring the code and the commit history on Thursday (git rules!) and
as I worked on getting the old custom files to work with Suzannah on Friday, I think
I know how I want this code to work. I think there is no need to break with the old
way of doing things and force every experiment to move things around if they want
to use the latest midas.

I will write more about this, but in the nutshell, I am happy with the current code:
- the old custom pages work (filename is taken from ODB, with /Custom/Path prepended if it exists; this is what we had for a very long time)
- the serving of new custom pages also works - via Stefan's code in interprete() where /Custom/File is added to the URL
  and if the file exists, we serve it. The only problem in that code was the missing check for absent or empty ODB /Custom/Path.
- the serving of new custom pages through the normal resource path (show_resource()) also works now,
   this serves files from $MIDASSYS/resource, from ODB /Experiment/Resource and a few other locations. (this code was there
   for a long time, but disabled because for a number of reasons, things like http://localhost:8080/Status.html did not work right,
   and after fixing a few buglets, they do now).

The serving of /etc/passwd I killed by forbidding "/" (the directory separator) in resource file names. I think this is safer
for enforcing a file jail compared to checking for "..".

I think the current code fixes all the reported problems (in conjunction with the change of the URL scheme) -
- /Custom/Path set to "" now works and provides the old way of doing custom pages
- /Custom/Path set to a directory name works and all Thomas's experiments should be good. the old custom files way *also* works, as long as the filenames in ODB are adjusted.
- URL /root and /System no longer try to serve system directories, plus /Custom/Path set to "/" is explicitly forbidden.
- mhttpd cannot serve /etc/passwd by default as "/" is forbidden in file names added to /Custom/Path. (I discount the case where mhttpd is running in /etc or /Custom/Path is set to /etc or symlinked to /etc)

But not all is good. The change of custom page URL scheme from /CS/... to top level or ?cmd=custom&page=...
cannot be swept under the rug - the user will have to make changes to their custom pages to adjust for it,
I see no way to avoid that. The current code catches and redirects/serves/helps with some of that,
i.e. pages loading custom files like "bnmr.css" still work even though bnmr.css is no longer under /CS/bnmr.css).

Again, apologies that things are moving faster than I can write them up. I am trying.


K.O.


> I see two separate issues here. 
> 
> One is restricting the custom pages to ONE directory such as 
> <exptab>/resources -> /home/users/exp/resources
> and its subdirectories which seems like a good solution for all the
> reasons you've mentioned. 
> 
> The other issue is the use of the "Path" key in /Custom, which is used to differentiate
> between the "new" way (all resources served from the Path directory) 
> and the original way where all the custom keys are specified with their full directories.
> 
> Recent versions of Midas had broken the original behaviour by insisting on the presence of the
> "Path" key.  Konstantin fixed this by allowing the "Path" key to take the value "".  It is true
> that some experiments currently may be serving resources from more than one directory tree, but changing
> to storage of all custom pages in one directory (and its subdirectories) does not necessarily mean that
> the original way of serving resources must be made obsolete.
>  
> I actually like the original way of specifying the custom keys for the pages and resources under /Custom, which
> is presently selected  without the /Custom/Path key present at all (older versions) or with the
> /Custom/Path key set to "" (latest versions). I like it for debugging, and I like to be able to see 
> at a glance what resource files are in use from /Custom. 
> 
> I have a suggestion:
> 
> The resources could still be served from the /Custom directory if desired, except now mhttpd will ALWAYS add the 
> fixed path in front of the given paths in /Custom. This would mean a fixed path and a minimal disruption to older pages 
> (the <script> and <link> statements in the HTML code to include the resources would not need to be changed).
> The "/Path" key is no longer be useful, since the resource path is now fixed. Instead a key e.g. "FlagRS"  could 
> be used to select the desired behaviour,  with the default being the "new" (no key present).
> 
> For example, the full directory paths in /custom
> ScanParams&                     /home/users/online/custom/scan/scan_select_popup.html
> mpet.css!                       /home/users/online/custom/rs/mpet.css
> scanvoltages!                   /home/users/online/custom/scan/scan_voltages.js
> 
> would become subdirectory path(s)
> ScanParams&                     custom/scan/scan_select_popup.html
> mpet.css!                       custom/rs/mpet.css
> scanvoltages!                   custom/scan/scan_voltages.js
> FlagRS                          y
> 
> The pages would be served from /home/users/exp/resources/custom/...
> 
> Suzannah
> 
>  
> > > Hi Stefan and Konstantin,
> > > 
> > > I think that this proposal sounds fairly reasonable.  I agree that we might as well move to a secure final solution at this point.  
> > > 
> > > One comment: since this change would break almost every experiment I have worked on for the last 4 years, it would be nice to add a command-line option to mhttpd that preserves the old /Custom/Path behavior.  This would allow experiments a transition 
> > > period, so that they didn't immediately need to fix their setup.  The command-line option could be clearly marked as obsolete behaviour and could be removed within a year.
> > > 
> > > Cheers,
> > > Thomas
> > > 
> > > 
> > > 
> > > > Parsing all URL in mhttpd to prevent /etc/passwd etc. to be returned is tricky, because people can use escape sequences etc. Therefore I think it is much better to restrict file access 
> > > > on the file system level when opening a file. The only escape there one could have is "..", which can be tested easily. 
> > > > 
> > > > Therefore, I propose to restrict file access to two well-defined directories, which is one system directory and one user directory. The system directory should be defined via 
> > > > $MIDASSYS/resources, and the user directory should be the experiment directory (as defined in exptab) followed by "resources". So if MIDASSYS equals to /usr/local/midas and the 
> > > > experiment directory equals to /home/users/exp for example, we would only have these two directories (and of course the subdirectories within these) served by mhttpd:
> > > > 
> > > > $MIDASSYS/resource -> /usr/local/midas/resources
> > > > <exptab>/resources -> /home/users/exp/resources
> > > > 
> > > > These directories should be hard-wired into mhttpd, and not go through and ODB entry, since otherwise one could manipulate the ODB entries (knowingly or unknowingly) and open a 
> > > > back-door. 
> > > > 
> > > > If users need a more complex structure, they can put soft links into these directories.
> > > > 
> > > > The code which opens a resource file should then first evaluate $MIDASSYS, then add "/resources/", then add the requested file name, make sure that there is no ".." in the file name, 
> > > > then open the file. If not existing, do the same for the <exptab>/resources/ directory.
> > > > 
> > > > This change will break most experiments, and forces people to move their custom pages to different directories, but I think it's the only clean solution and we just have to bite the 
> > > > bullet.
> > > > 
> > > > Comments are welcome.
> > > > 
> > > > Stefan
    Reply  04 Mar 2019, Konstantin Olchanski, Forum, Best MIDAS branch/version for "production" 
Hi, Giorgio - you are asking excellent questions. I will try to answer them, but as ever, there are no 
easy answers.

In general, the top of the midas "develop" branch is "the best midas there is".

So for a new experiment, it is a reasonable place to start. Of course you can see
that there is quite a bit of commit activity going on, however, most substantial
changes are done on separate branches, where we try the new code, debug
it and test it. Only when the new code is "ready", we commit it to the "develop"
branch. Then, most often, we find some more last minute post-merge bugs,
and fix them right then and there on the head of the "develop" branch. Eventually
the dust settles and we have stable code that stays stable for a long time.

For example, right now we are waiting for the dust to settle on the change
of the MIDAS URL scheme, which was necessitated by needs of several experiments
that have more-complex-then-usual https proxy configurations. Unusual today,
but more common as we move forward, I think.

So if the head of the "develop" branch does not work for you, we encourage
that you file a bug report (here on this forum or on the bitbucket issue tracker).

While we try to sort out your problem, you can fall back to a previous version
of midas:

a) go back to one of the older "midas-YYYY-MM-X" tags
b) go back to one of the release candidate branches "feature/midas-YYYY-MM".

But if you have an already running system and you already have a working
instance of MIDAS, you do not have to update it to the latest version
unless you need some newest feature or you suffer from a bug
that has been fixed in a newer version.

In general, I find that it is fairly safe to update a working instance of MIDAS to
the latest code. But do keep your old working copy, if there is trouble, you can
always go back to something that works.

Now to your questions:

> For a running experiment that needs software stability what branch of the MIDAS 
> repository is better suited?

easy to answer. the most stable is the version you are using right now (doh!). each time
you upgrade, there is a risk that something will go wrong.

if you start from scratch, use the head of the "develop" branch (the latest and greatest),
if you run into trouble, report the trouble and update to a newer version with your
trouble fixed, or go back in time to previous tags and release candidate branches (as I described 
above).

> The master branch or the develop branch?

the "develop" branch.

> Moreover, what point in time do you think is more stable?

I find that it is impossible to have a stable-stable-stable version of midas
because the rest of the world flows forward in time. Old versions of midas
stop compiling because of OS and compiler changes; they stop running
because of OS or web browser or hardware changes. Then somebody
always asks for new features and new or old bugs surface constantly.

But we try to mark some "good" spots using git tags and release branches,
however the more far you go back in time, the more likely the code will
not even compile anymore.


K.O.



> Hello!
> My name is Giorgio Pintaudi and I am a Ph.D. student at Yokohama National 
> University (Japan). I also happen to be a T2K collaborator.
> I am currently developing the DAQ software for a T2K near detector called WAGASCI 
> (different from ND280) and we recently decided to adopt MIDAS as a user 
> interface.
> Now I am using the "develop" branch of the MIDAS BitBucket repository: I merge 
> the remote repository every now and then with my local copy and that is fine ... 
> but on the 25th of April our experiment is officially starting and I was 
> wondering which version of MIDAS should I use for "production".
> So my question is:
> For a running experiment that needs software stability what branch of the MIDAS 
> repository is better suited? The master branch or the develop branch? Moreover, 
> what point in time do you think is more stable?
> Best regards
> Giorgio
    Reply  05 Mar 2019, Konstantin Olchanski, Forum, Best MIDAS branch/version for "production" 
> 
> PS other than the code peculiar to our experiment, I have made a little modification to the MIDAS install 
> Makefile (I noticed that there is an "install" target but not an "uninstall" target so I wrote it).
> If you are interested, I could make a merge request on BitBucket, just let me know.
> 

Hmm... for most experiments, we do not "install" midas. I should probably remove the "install" target from the Makefile.

K.O.



> Bests
> Giorgio
> 
> > Hi, Giorgio - you are asking excellent questions. I will try to answer them, but as ever, there are no 
> > easy answers.
> > 
> > In general, the top of the midas "develop" branch is "the best midas there is".
> > 
> > So for a new experiment, it is a reasonable place to start. Of course you can see
> > that there is quite a bit of commit activity going on, however, most substantial
> > changes are done on separate branches, where we try the new code, debug
> > it and test it. Only when the new code is "ready", we commit it to the "develop"
> > branch. Then, most often, we find some more last minute post-merge bugs,
> > and fix them right then and there on the head of the "develop" branch. Eventually
> > the dust settles and we have stable code that stays stable for a long time.
> > 
> > For example, right now we are waiting for the dust to settle on the change
> > of the MIDAS URL scheme, which was necessitated by needs of several experiments
> > that have more-complex-then-usual https proxy configurations. Unusual today,
> > but more common as we move forward, I think.
> > 
> > So if the head of the "develop" branch does not work for you, we encourage
> > that you file a bug report (here on this forum or on the bitbucket issue tracker).
> > 
> > While we try to sort out your problem, you can fall back to a previous version
> > of midas:
> > 
> > a) go back to one of the older "midas-YYYY-MM-X" tags
> > b) go back to one of the release candidate branches "feature/midas-YYYY-MM".
> > 
> > But if you have an already running system and you already have a working
> > instance of MIDAS, you do not have to update it to the latest version
> > unless you need some newest feature or you suffer from a bug
> > that has been fixed in a newer version.
> > 
> > In general, I find that it is fairly safe to update a working instance of MIDAS to
> > the latest code. But do keep your old working copy, if there is trouble, you can
> > always go back to something that works.
> > 
> > Now to your questions:
> > 
> > > For a running experiment that needs software stability what branch of the MIDAS 
> > > repository is better suited?
> > 
> > easy to answer. the most stable is the version you are using right now (doh!). each time
> > you upgrade, there is a risk that something will go wrong.
> > 
> > if you start from scratch, use the head of the "develop" branch (the latest and greatest),
> > if you run into trouble, report the trouble and update to a newer version with your
> > trouble fixed, or go back in time to previous tags and release candidate branches (as I described 
> > above).
> > 
> > > The master branch or the develop branch?
> > 
> > the "develop" branch.
> > 
> > > Moreover, what point in time do you think is more stable?
> > 
> > I find that it is impossible to have a stable-stable-stable version of midas
> > because the rest of the world flows forward in time. Old versions of midas
> > stop compiling because of OS and compiler changes; they stop running
> > because of OS or web browser or hardware changes. Then somebody
> > always asks for new features and new or old bugs surface constantly.
> > 
> > But we try to mark some "good" spots using git tags and release branches,
> > however the more far you go back in time, the more likely the code will
> > not even compile anymore.
> > 
> > 
> > K.O.
> > 
> > 
> > 
> > > Hello!
> > > My name is Giorgio Pintaudi and I am a Ph.D. student at Yokohama National 
> > > University (Japan). I also happen to be a T2K collaborator.
> > > I am currently developing the DAQ software for a T2K near detector called WAGASCI 
> > > (different from ND280) and we recently decided to adopt MIDAS as a user 
> > > interface.
> > > Now I am using the "develop" branch of the MIDAS BitBucket repository: I merge 
> > > the remote repository every now and then with my local copy and that is fine ... 
> > > but on the 25th of April our experiment is officially starting and I was 
> > > wondering which version of MIDAS should I use for "production".
> > > So my question is:
> > > For a running experiment that needs software stability what branch of the MIDAS 
> > > repository is better suited? The master branch or the develop branch? Moreover, 
> > > what point in time do you think is more stable?
> > > Best regards
> > > Giorgio
ELOG V3.1.4-2e1708b5