Back Midas Rome Roody Rootana
  Midas DAQ System, Page 107 of 146  Not logged in ELOG logo
    Reply  02 May 2017, Konstantin Olchanski, Bug Report, run start/stop oddity  
I should really get around to fix this junk error message:

> 07:13:32.335 2017/04/19 [Logger,ERROR] [midas.c:14079:rpc_server_receive,ERROR]
> rpc check ok, abort canceled

What happens is this. For each run transition, cm_transition does RPC calls
to each client telling them to transition. So even if you run only on localhost, there is still
tcp connections being created and broken to do these RPCs. These connections are
typically created and left open, but when you stop odbedit, it's connections would
be closed/broken. Now in the midas rpc code there is confusion between the main rpc
connection for remote clients and temporary rpc connections for run transitions. This
confusion is the cause of these junk error messages - first the code thinks that the main rpc
connection is closed it it should commit suicide (abort), then it find that it was
just an rpc connection and there is no need to die.

https://bitbucket.org/tmidas/midas/issues/44/junk-messages-about-rpc-check-ok-abort

>
> - Enable non-localhost RPC -> y
> - Disable RPC hosts check  -> y
> 

this is unsafe:

if you only run on localhost, "enable non-localhost rpc" should be "n" and midas will no listen to any 
outside connections (except for mhttpd, of course).

if you have remote clients, enable non-localhost rpc and enter their hostnames to the access control list.

"disable rpc hosts check" is for the case where you do not know the hostnames of your remote clients, 
for example when they come from dynamic ip addresses on a wifi network.

In this case you tell midas to accept connections from everybody everywhere in the world
and hopefully you have a firewall somewhere to prevent the evil hackers from actually connecting.
I hope this is not your situation.

K.O.
Entry  09 May 2017, Andreas Suter, Bug Report, mhttpd / history / export data 
A handy feature of the history of the mhttpd is to export the data. However, this 
seems to be broken. It currently only works if the run marker flag is activated by 
fails otherwise.
    Reply  16 May 2017, Konstantin Olchanski, Bug Report, mhttpd / history / export data 
> A handy feature of the history of the mhttpd is to export the data. However, this 
> seems to be broken. It currently only works if the run marker flag is activated by 
> fails otherwise.

imo, it never worked properly. I think the best hope for working "export" button
is an "export as json" which gives you basically the output of hs_read_buffer() in the json
format. With options for "raw data" or "binned, with mean, rms, min, max for each bin".

K.O.
Entry  16 May 2017, Konstantin Olchanski, Bug Report, problem with odb strings and db_get_record() 
Suddenly the mhttpd odb inline editor is truncating the odb string entries to the actual length of the 
stored string value, this causes db_get_record() explode with "structure mismatch" errors. (Not my 
fault, You Honor! Honest!). For example, I see these errors from al_check() after changing 
"/programs/foo/start command" - suddenly it cannot get the program_info record.

What a mess.

Actually, this is not a new mess, midas was always been rather brittle with db_get_record()
and db_open_record(), always unhappy if something goes wrong in odb, like a lost
entry in equipment statistics or an extra variable in equipment common, etc.

To patch it all up, I added a function db_get_record1() which knows the structure of the data
and can call db_check_record() to fix the odb structure and make db_get_record() happy.
Many places in midas now use it, making odb structure mismatches "self healing" in a way.

But when looking at uses of db_get_record(), I notices that in many places it can be trivially
replaced by one or two db_get_value(). I did change this in a couple of places in mhttpd.
This way of coding is more robust against unexpected contents in odb and is easier
to maintain going forward, when new odb entries must be added for new functionality.

Most uses of db_get_record() are now converted to db_get_record1(), except where it is
used in together with of db_open_record(). (which uses db_get_record() internally).

To fix the db_open_record() uses, I considered adding db_open_record1() which would
also know the data structure and automatically repair any mismatch, but I think instead of that,
I will switch them to use db_watch() (in conjunction with manual db_get_record()/get_record1()
and plain db_get_value()).

When adding automatic repair mechanism like this, one should beware of "update wars",
where two midas programs built against slightly different versions of midas would
each try to change odb in it's way, in an endless loop. (yes, it did happen, more than once).
One solution to this is to assign an "owner" to each data structure, the "consumers"
of the data have to deal with anything missing or unexpected. If they use db_get_value()
it should all be happy. (if the owner has to be reassigned, back to the wars again, until
everything is rebuilt against the same version of midas).

P.S. In languages lacking reflection, like C and C++, it is impossible to trivially implement
a mapping from a data structure to an external entity, such as db_get_record() to map C struct
into ODB. Many attempts have been made, i.e. ROOT CINT, all of them brittle, hard
to maintain, generally unsatisfactory. Java was the first mainstream language
to have reflection. Modern languages, such as Go, have reflection from day 1. Of course
all scripting languages, perl, python, javascript, always had reflection. The C++ language
standard will get reflections some day. Today one can easily do reflection in C++ using the Clang
compiler, the main reason for ROOT v6 switching from CINT to Clang.

K.O.
    Reply  31 May 2017, Konstantin Olchanski, Bug Report, problem with odb strings and db_get_record() 
> What a mess.

The mess with db_get_record() and db_open_record() is even deeper than I thought. There are several anomalies.

Records opened by db_open_record() are later accessed via db_get_record() which requires
that the odb structure and the C structure match exactly.

Of course anybody can modify anything in odb at any time, so there are protections against
modifying the odb structures "from under" db_open_record():

a) db_open_record(MODE_WRITE) makes the odb structure immutable by setting the "exclusive" flag. This works well. In the past 
there were problems with "exclusive mode" getting stuck behind dead clients, but these days it is efficiently cleaned and recovered 
by the odb validation code at the start of all midas programs.

b) db_create_record(), db_reorder_key() and db_delete_key() refuse to function on watched/hotlinked odb structures. One would 
think this is good, but there is a side-effect. If I run "odbedit watch /", all odb delete operations fail (including deletion of temporary 
items in /system/tmp).

c) db_create_key() and db_set_data()/db_set_value() do not have such protections, and they can (and do) add new odb entries and 
change size of existing entries (especially size of strings), and make db_get_record() fail. note that db_get_record() inside 
db_open_record() fails silently and odb hotlinks mysteriously stop working.

One could keep fixing this by adding protections against modification of hotlinked odb structures, but unfortunately, one cannot tell
db_watch() hotlinks from db_open_record() hotlinks. Only the latter ones require protection. db_watch() does not require such 
protections because it does not use db_get_record() internally, it leaves it to the user to sort out any mismatches.

Also it would be nice if "odbedit watch /" did not have the nasty side effect of making all odb unchangable (presently it only makes
things undeletable).

To sort it all out, I am moving in this direction:

1) replace all uses of db_get_record() with db_get_record1() which automatically cures any structure mismatch
2) replace all uses of db_open_record(MODE_READ) with db_watch() in conjunction with db_get_record1(). This is done in mfe.c 
and seems to work ok.
2a) automatic repair of structure mismatch is presently defeated by db_create_record() refusing to work on hotlinked odb entries.
3) with db_get_record() and db_open_record(MODE_READ) removed from use, turn off hotlink protection in item (b) above. This will 
fix problem (2a).
4) maybe replace db_open_record(MODE_WRITE) with explicit db_set_record(). I personally do not like it's "magical" operation, 
when in fact, it is just a short hand for "db_get_key/db_set_record" hidden inside db_send_changed_records().
4a) db_open_record(MODE_WRITE) works well enough right now, no need to touch it.


K.O.
    Reply  31 May 2017, Konstantin Olchanski, Bug Report, problem with odb strings and db_get_record() 
> 2) replace all uses of db_open_record(MODE_READ) with db_watch() in conjunction with db_get_record1().

Done to all in-tree programs, except for mana.c (not using it), sequencer.cxx (cannot test it) and a few places where watching a TID_INT.

Nothing more needs to be done, other than turn off the check for hotlink in db_create_record() & co (removed #define CHECK_OPEN_RECORD in odb.c).

K.O.

$ grep db_open_record src/* | grep MODE_READ
src/lazylogger.cxx:   status = db_open_record(hDB, hKey, &run_state, sizeof(run_state), MODE_READ, NULL, NULL); // watch a TID_INT
src/mana.cxx:   db_open_record(hDB, hkey, NULL, 0, MODE_READ, banks_changed, NULL);
src/mana.cxx:   db_open_record(hDB, hkey, NULL, 0, MODE_READ, banks_changed, NULL);
src/mana.cxx:          db_open_record(hDB, hkey, &out_info, sizeof(out_info), MODE_READ, NULL, NULL);
src/mana.cxx:      db_open_record(hDB, hKey, ar_info, sizeof(AR_INFO), MODE_READ, update_request,
src/midas.c:   status = db_open_record(hDB, hKey, &_requested_transition, sizeof(INT), MODE_READ, NULL, NULL);
src/mlogger.cxx:      status = db_open_record(hDB, hKey, hist_log[index].buffer, size, MODE_READ, log_history, NULL);
src/mlogger.cxx:                     db_open_record(hDB, hVarKey, NULL, varkey.total_size, MODE_READ, log_system_history, (void *) (POINTER_T) index);
src/mlogger.cxx:               db_open_record(hDB, hHistKey, NULL, size, MODE_READ, log_system_history, (void *) (POINTER_T) index);
src/odbedit.cxx:            db_open_record(hDB, hKey, data, size, MODE_READ, key_update, NULL);
src/sequencer.cxx:   status = db_open_record(hDB, hKey, &seq, sizeof(seq), MODE_READ, NULL, NULL);
8s-macbook-pro:midas 8ss$ 
    Reply  02 Jun 2017, Stefan Ritt, Bug Report, problem with odb strings and db_get_record() 
That all makes sense to me. 

Stefan

> > What a mess.
> 
> The mess with db_get_record() and db_open_record() is even deeper than I thought. There are several anomalies.
> 
> Records opened by db_open_record() are later accessed via db_get_record() which requires
> that the odb structure and the C structure match exactly.
> 
> Of course anybody can modify anything in odb at any time, so there are protections against
> modifying the odb structures "from under" db_open_record():
> 
> a) db_open_record(MODE_WRITE) makes the odb structure immutable by setting the "exclusive" flag. This works well. In the past 
> there were problems with "exclusive mode" getting stuck behind dead clients, but these days it is efficiently cleaned and recovered 
> by the odb validation code at the start of all midas programs.
> 
> b) db_create_record(), db_reorder_key() and db_delete_key() refuse to function on watched/hotlinked odb structures. One would 
> think this is good, but there is a side-effect. If I run "odbedit watch /", all odb delete operations fail (including deletion of temporary 
> items in /system/tmp).
> 
> c) db_create_key() and db_set_data()/db_set_value() do not have such protections, and they can (and do) add new odb entries and 
> change size of existing entries (especially size of strings), and make db_get_record() fail. note that db_get_record() inside 
> db_open_record() fails silently and odb hotlinks mysteriously stop working.
> 
> One could keep fixing this by adding protections against modification of hotlinked odb structures, but unfortunately, one cannot tell
> db_watch() hotlinks from db_open_record() hotlinks. Only the latter ones require protection. db_watch() does not require such 
> protections because it does not use db_get_record() internally, it leaves it to the user to sort out any mismatches.
> 
> Also it would be nice if "odbedit watch /" did not have the nasty side effect of making all odb unchangable (presently it only makes
> things undeletable).
> 
> To sort it all out, I am moving in this direction:
> 
> 1) replace all uses of db_get_record() with db_get_record1() which automatically cures any structure mismatch
> 2) replace all uses of db_open_record(MODE_READ) with db_watch() in conjunction with db_get_record1(). This is done in mfe.c 
> and seems to work ok.
> 2a) automatic repair of structure mismatch is presently defeated by db_create_record() refusing to work on hotlinked odb entries.
> 3) with db_get_record() and db_open_record(MODE_READ) removed from use, turn off hotlink protection in item (b) above. This will 
> fix problem (2a).
> 4) maybe replace db_open_record(MODE_WRITE) with explicit db_set_record(). I personally do not like it's "magical" operation, 
> when in fact, it is just a short hand for "db_get_key/db_set_record" hidden inside db_send_changed_records().
> 4a) db_open_record(MODE_WRITE) works well enough right now, no need to touch it.
> 
> 
> K.O.
    Reply  06 Jun 2017, Konstantin Olchanski, Bug Report, problem with odb strings and db_get_record() 
> Done to all in-tree programs, except for mana.c (not using it), sequencer.cxx (cannot test it) and a few places where watching a TID_INT.
> Nothing more needs to be done, other than turn off the check for hotlink in db_create_record() & co (removed #define CHECK_OPEN_RECORD in odb.c).

Fixed a bug in mfe.c - it was overwriting odb /eq/xxx/common with default values. fixed now.

Running with CHECK_OPEN_RECORD seems to work okey so far. Will test some more before proposing to make it the default.

K.O.
    Reply  06 Jun 2017, Konstantin Olchanski, Bug Report, MAX_STRING_LENGTH, stop form odbedit broken 
> ... the xml reader, probably has same problem
> ... xml writer truncates long strings via truncation in db_sprintf()

Removed truncation of overlong strings in the xml writer and confirmed that xml reader handles them correctly (always loaded overlong strings correctly).

Both JSON and XML odb dumps now handle strings of unlimited size correctly.

K.O.
Entry  19 Jun 2017, Thomas Lindner, Bug Report, mhttpd ODB editor changes string length, breaks  
I guess this might be related to the changes in the last elog conversation; but
I'll break it out as a separate problem.

The new mhttpd ODB editor seems to resize all strings (not just strings that are
greater than 256 characters).  So, when I change some string with the mhttpd ODB
editor to 'ffffff', then I find that the string size is now ~7 characters.

This might be fine in general; but it seems to cause a problem when dealing with
alarms.  In particular, I find that if I try to set (through mhttpd) the
"execute command" for an alarm class or the "condition" for an alarm, then I get
into lots of trouble.  For instance, I changed the "execute command" for my
alarm class through mhttpd; when associated alarms were triggered, I got errors

21:58:12 [feSourceEpics,ERROR] [odb.c:9133:db_get_record,ERROR] struct size
mismatch for "/Alarms/Classes/Alarm" (expected size: 348, size in ODB: 100)
21:58:12 [feSourceEpics,ERROR] [alarm.c:379:al_trigger_class,ERROR] Cannot get
alarm class record

This makes sense, since ALARM_CLASS has a fixed size

typedef struct {
   BOOL write_system_message;
   ...
   char execute_command[256];
   ...
   char display_fgcolor[32];
} ALARM_CLASS;

so problems will clearly occur when I change the size and try to grab it:

   ALARM_CLASS ac;
   status = db_get_record1(hDB, hkeyclass, &ac, &size, 0, strcomb(alarm_class_str));
 
I guess that similar problems also occur if you edit the string for ALARM or
PROGRAM_INFO instances.  These problems do not occur when I change my strings
with odbedit, which doesn't resize strings below 256.

I'm not sure what the proper solution is.  A temporary solution is that the
mhttpd ODB editor shouldn't resize strings if the new size is less than 256
characters; in that case the size should be left as 256 characters.

This test was done with MIDAS git repository as of today:
commit 45a90dc329554f528485da121501daf6ecde100d
    Reply  21 Jun 2017, Thomas Lindner, Bug Report, mhttpd ODB editor changes string length, breaks  
To follow up; with some help from Konstantin and Stefan, we realized that this
particular problem should already be fixed.  While I was using the most recent version
of MIDAS, I hadn't rebuild the EPICS frontend programs when I was doing this test.  Once
I did that the error no longer occurred.  This is because the most recent version of
MIDAS includes a check that will resize these particular string variables before using
them (technically, this is included in db_get_record1()); this resizing only happens for
these couple strings that must have a fixed size.

We are still having a separate discussion about whether this treatment of string lengths
that need to have a fixed size can be further improved.  Will update once discussion
converges.


> I guess this might be related to the changes in the last elog conversation; but
> I'll break it out as a separate problem.
> 
> The new mhttpd ODB editor seems to resize all strings (not just strings that are
> greater than 256 characters).  So, when I change some string with the mhttpd ODB
> editor to 'ffffff', then I find that the string size is now ~7 characters.
> 
> This might be fine in general; but it seems to cause a problem when dealing with
> alarms.  In particular, I find that if I try to set (through mhttpd) the
> "execute command" for an alarm class or the "condition" for an alarm, then I get
> into lots of trouble.  For instance, I changed the "execute command" for my
> alarm class through mhttpd; when associated alarms were triggered, I got errors
> 
> 21:58:12 [feSourceEpics,ERROR] [odb.c:9133:db_get_record,ERROR] struct size
> mismatch for "/Alarms/Classes/Alarm" (expected size: 348, size in ODB: 100)
> 21:58:12 [feSourceEpics,ERROR] [alarm.c:379:al_trigger_class,ERROR] Cannot get
> alarm class record
> 
> This makes sense, since ALARM_CLASS has a fixed size
> 
> typedef struct {
>    BOOL write_system_message;
>    ...
>    char execute_command[256];
>    ...
>    char display_fgcolor[32];
> } ALARM_CLASS;
> 
> so problems will clearly occur when I change the size and try to grab it:
> 
>    ALARM_CLASS ac;
>    status = db_get_record1(hDB, hkeyclass, &ac, &size, 0, strcomb(alarm_class_str));
>  
> I guess that similar problems also occur if you edit the string for ALARM or
> PROGRAM_INFO instances.  These problems do not occur when I change my strings
> with odbedit, which doesn't resize strings below 256.
> 
> I'm not sure what the proper solution is.  A temporary solution is that the
> mhttpd ODB editor shouldn't resize strings if the new size is less than 256
> characters; in that case the size should be left as 256 characters.
> 
> This test was done with MIDAS git repository as of today:
> commit 45a90dc329554f528485da121501daf6ecde100d
Entry  10 Nov 2017, Frederik Wauters, Bug Report, bug in init of hv class driver 
bug in init
-----------

I used the lv.c class driver, combined with a custom device driver, to control 
our Keithley2611B source meter. This to set negative voltage on Si detectors.

In the 'init' routing, the class driver sets the hv:

  hv_info->demand_mirror[i] = MIN(hv_info->demand[i], hv_info->voltage_limit[i]);

This fails for negative voltage, as it sets the (negative) voltage limit, instead 
of the demand voltage. A simple 'fabs' solves this.

suggestion for 'idle'
---------------------

I let the device do the ramping, not the driver. This also means I have to reset 
the state of the device (current limit) after ramping. The easiest way to to 
this, is using CMD_IDLE of the device driver. This is currently not done in the 
hv.c class driver.
    Reply  17 Nov 2017, Konstantin Olchanski, Bug Report, bug in init of hv class driver 
Hi, Frederick, this is my personal opinion on the slow controls hv classes, I have
used them a couple of times and I found them full of little buglets like this,
plus some incomplete functions, plus some missing features, plus it is all
written in C trying to do object oriented programming. On the balance my opinion
is that it is less work to write a high voltage control program in C++ from scratch
using the regular midas frontend infrastructure compared to having to understand
the hv class driver, write the missing bits, fix the little buglets, debug
the crashes in the C string handling, and what not. (For example I had to debug
mysterious failures to pass float and double values through the C stdarg interface,
there are more fun things to do out there).

K.O.

> bug in init
> -----------
> 
> I used the lv.c class driver, combined with a custom device driver, to control 
> our Keithley2611B source meter. This to set negative voltage on Si detectors.
> 
> In the 'init' routing, the class driver sets the hv:
> 
>   hv_info->demand_mirror[i] = MIN(hv_info->demand[i], hv_info->voltage_limit[i]);
> 
> This fails for negative voltage, as it sets the (negative) voltage limit, instead 
> of the demand voltage. A simple 'fabs' solves this.
> 
> suggestion for 'idle'
> ---------------------
> 
> I let the device do the ramping, not the driver. This also means I have to reset 
> the state of the device (current limit) after ramping. The easiest way to to 
> this, is using CMD_IDLE of the device driver. This is currently not done in the 
> hv.c class driver.
    Reply  21 Nov 2017, Stefan Ritt, Bug Report, bug in init of hv class driver 
> bug in init
> -----------
> 
> I used the lv.c class driver, combined with a custom device driver, to control 
> our Keithley2611B source meter. This to set negative voltage on Si detectors.
> 
> In the 'init' routing, the class driver sets the hv:
> 
>   hv_info->demand_mirror[i] = MIN(hv_info->demand[i], hv_info->voltage_limit[i]);
> 
> This fails for negative voltage, as it sets the (negative) voltage limit, instead 
> of the demand voltage. A simple 'fabs' solves this.
> 
> suggestion for 'idle'
> ---------------------
> 
> I let the device do the ramping, not the driver. This also means I have to reset 
> the state of the device (current limit) after ramping. The easiest way to to 
> this, is using CMD_IDLE of the device driver. This is currently not done in the 
> hv.c class driver.

I can't find the line you quote in the class driver. Why don't you make a git pull request
and I will approve it.

The original idea behind the hv driver is that all voltages in the ODB and the class driver are
positive. If you have a negative power supply, then the voltage is inverted at the device
driver level. That's why you have MIN and MAX in the class driver.

Stefan
    Reply  21 Nov 2017, Konstantin Olchanski, Bug Report, bug in init of hv class driver 
> 
> The original idea behind the hv driver is that all voltages in the ODB and the class driver are
> positive. If you have a negative power supply, then the voltage is inverted at the device
> driver level. That's why you have MIN and MAX in the class driver.
> 

This rings a bell. I used the hv class driver to write a frontend for the L1440 mainframe (negative voltage),
on ODB it will be positive values, when writing to the device I had to add a minus sign,
and when reading back they came back negative and I had to add an fabs() in the comparison
between readback and demand.

Persons with bipolar power supplies need not apply.

K.O.
Entry  01 Dec 2017, Frederik Wauters, Bug Report, small bug in mfe.c init 
There is a small bug in the mfe.c initialization for the EQ_POLLED mode. There 
is a routine where the number of polls fitting in eq_info->period is counted:


         count = 1;
         do {
            if (display_period)
               printf(".");

            start_time = ss_millitime();

            poll_event(equipment[idx].info.source, (INT)count, TRUE);

            delta_time = ss_millitime() - start_time;

            ...

            if (delta_time > 0)
               count = count * eq_info->period / delta_time;
            else
               count *= 100;
            
            // avoid overflows
            if (count > 2147483647.0) {
               count = 2147483647.0;
               break;
            }
            
         } while (delta_time > eq_info->period * 1.2 || delta_time < eq_info-
>period * 0.8);

As "start_time = ss_millitime();" resets "delta_time" each time, only the 
"avoid overflows" addition saves the day. 

start_time = ss_millitime(); show be out of the loop.
    Reply  01 Dec 2017, Stefan Ritt, Bug Report, small bug in mfe.c init 
> There is a small bug in the mfe.c initialization for the EQ_POLLED mode. There 
> is a routine where the number of polls fitting in eq_info->period is counted:
> 
> 
>          count = 1;
>          do {
>             if (display_period)
>                printf(".");
> 
>             start_time = ss_millitime();
> 
>             poll_event(equipment[idx].info.source, (INT)count, TRUE);
> 
>             delta_time = ss_millitime() - start_time;
> 
>             ...
> 
>             if (delta_time > 0)
>                count = count * eq_info->period / delta_time;
>             else
>                count *= 100;
>             
>             // avoid overflows
>             if (count > 2147483647.0) {
>                count = 2147483647.0;
>                break;
>             }
>             
>          } while (delta_time > eq_info->period * 1.2 || delta_time < eq_info-
> >period * 0.8);
> 
> As "start_time = ss_millitime();" resets "delta_time" each time, only the 
> "avoid overflows" addition saves the day. 
> 
> start_time = ss_millitime(); show be out of the loop.

Nope.

What I want is to determine how often I have to call poll_event to stay there for a certain time (usually 100ms). So I iterate "count" until I roughly get to my 100ms. Each call to 
poll_event with a different count is a new measurement, therefore I initialize start_time before each measurement. If i do it outside the loop, and kind of incrementally increase 
it, then the whole code inside the loop is added to the measurement which makes it (slightly) wrong.

The whole loop optimization has some background. Polling can be sow (think of talking to a device via Ethernet which can easily take milli seconds). So how often do we poll 
before we do other things in the main look (like looking if a run has been started). If I only poll once, then the average front-end response time would be poor, because I mostly 
look if a run has been started in the main loop. This is not effective. If I poll too often inside the poll_event loop, then the front-end does not react on run stops any more. So 
there is some optimum, and this is set by the polling time of usually 100ms. This ensures that the front-end does optimal polling - without ANYTHING in between - for about 
100ms. But how can I know how often I should poll for 100 ms? As said above, polling can be very fast (reading a memory cell) or very slow (network). The the best method I 
found is to do a calibration at the startup, and this is what the code above does. Maybe there are better ways today, but that code worked nicely in the last 25 years.

Stefan
    Reply  04 Dec 2017, Frederik Wauters, Bug Report, small bug in mfe.c init 
> > There is a small bug in the mfe.c initialization for the EQ_POLLED mode. There 
> > is a routine where the number of polls fitting in eq_info->period is counted:
> > 
> > 
> >          count = 1;
> >          do {
> >             if (display_period)
> >                printf(".");
> > 
> >             start_time = ss_millitime();
> > 
> >             poll_event(equipment[idx].info.source, (INT)count, TRUE);
> > 
> >             delta_time = ss_millitime() - start_time;
> > 
> >             ...
> > 
> >             if (delta_time > 0)
> >                count = count * eq_info->period / delta_time;
> >             else
> >                count *= 100;
> >             
> >             // avoid overflows
> >             if (count > 2147483647.0) {
> >                count = 2147483647.0;
> >                break;
> >             }
> >             
> >          } while (delta_time > eq_info->period * 1.2 || delta_time < eq_info-
> > >period * 0.8);
> > 
> > As "start_time = ss_millitime();" resets "delta_time" each time, only the 
> > "avoid overflows" addition saves the day. 
> > 
> > start_time = ss_millitime(); show be out of the loop.
> 
> Nope.
> 
> What I want is to determine how often I have to call poll_event to stay there for a certain time (usually 100ms). So I iterate "count" until I roughly get to my 100ms. Each call to 
> poll_event with a different count is a new measurement, therefore I initialize start_time before each measurement. If i do it outside the loop, and kind of incrementally increase 
> it, then the whole code inside the loop is added to the measurement which makes it (slightly) wrong.
> 
> The whole loop optimization has some background. Polling can be sow (think of talking to a device via Ethernet which can easily take milli seconds). So how often do we poll 
> before we do other things in the main look (like looking if a run has been started). If I only poll once, then the average front-end response time would be poor, because I mostly 
> look if a run has been started in the main loop. This is not effective. If I poll too often inside the poll_event loop, then the front-end does not react on run stops any more. So 
> there is some optimum, and this is set by the polling time of usually 100ms. This ensures that the front-end does optimal polling - without ANYTHING in between - for about 
> 100ms. But how can I know how often I should poll for 100 ms? As said above, polling can be very fast (reading a memory cell) or very slow (network). The the best method I 
> found is to do a calibration at the startup, and this is what the code above does. Maybe there are better ways today, but that code worked nicely in the last 25 years.
> 
> Stefan

Thanks, I misunderstood the loop then. If poll_event(equipment[idx].info.source, (INT)count, TRUE); doesn`t do anything with "count", the loop becomes infinite except for the overflow 
check. 
    Reply  04 Dec 2017, Stefan Ritt, Bug Report, small bug in mfe.c init 
> Thanks, I misunderstood the loop then. If poll_event(equipment[idx].info.source, (INT)count, TRUE); doesn`t do anything with "count", the loop becomes infinite except for the overflow 
> check. 

Well, the function poll_event() is _supposed_ to use "count" in a for loop as written in the example frontend:

   for (i = 0; i < count; i++) {
      /* poll hardware and set flag to TRUE if new event is available */
      flag = TRUE;

      if (flag)
         if (!test)
            return TRUE;
   }

where "flag = TRUE" must be replaced with the proper hardware check. This can be a VME access, a network TCP exchange with some Ethernet based hardware, or even a mutex check if the events are collected by a 
separate thread in the frontend.

The idea of having the for (i=0 ; i<count ; i++) loop _inside_ the poll_event() function and not outside is the fact that each function call to poll_event() takes time, and we want the minimal possible response time to new 
events. It might be just a micro-second, but having an experiment running at 100 Hz for one year (like Mu3e), this adds up to about one hour per year, which is a considerable amount of precious beam time.

Stefan
Entry  10 Jan 2018, Andreas Suter, Bug Report, mhttpd - custom page - RHEL/Fedora system.c.diff
Description of the problem (starting with 61be7a1):

When starting a new experiment, creating a fresh ODB and than adding the 
directory '/Custom', the mhttpd runs into a problem on RHEL/Fedora, but not on 
Ubuntu and macOS. When trying to open the ODB from within whatever browser I get 
the following error message in the midas message queque:

[mhttpd,ERROR] [mhttpd.cxx:563:rread,ERROR] Cannot read file '/root', read of 
4096 returned -1, errno 21 (Is a directory)

and in the browser I get a popup which tries to save a file called 'root'.

I track this down to the following: in mhttpd, interprete (line 18046) it is 
check if a custom page file exists (ss_file_exist) and if yes, it tries to 'load' 
it. Now, at this stage the variable dec_path contains '/root'.

Here now what goes wrong: ss_file_exist tries to open the given path, and if a 
valid file descriptor is returned it assumes the file exists. This is not 
perfectly correct since it also will get a valid file descriptor is path is an 
accessible directory!

Now for whatever reason, on RHEL/Fedora '/root' will return a valid file 
descriptor, but not on macOS and Ubuntu. Others I haven't tested. A possible fix 
would be to check explicitly if path is a directory and if yes return 0 in 
ss_file_exist (see attached diff).

Perhaps there is cleaner way to deal with this issue?! 
ELOG V3.1.4-2e1708b5