Back Midas Rome Roody Rootana
  Midas DAQ System, Page 33 of 146  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
    Reply  24 Mar 2022, Konstantin Olchanski, Bug Fix, mhttpd bug fixed 
> > 1st wget stops (by ctrl-C), socket is closed, mongoose frees it's mg_connection object
> > (corresponding worker is still labouring, hmm... actually sleeping, and now has a stale nc pointer)
> > 
> > 2nd wget starts, new socket is opened, mongoose allocates a new mg_connection object,
> > but malloc() gives it back the same memory we just freed(), and the 1st wget's worker thread
> > nc pointer is no longer stale, but points to 2nd wget's connection.
> 
> Why don't we CLEAR the memory (memset(object,0,sizeof(object)) before the free(), this way it cannot be 
> mistakenly re-used by the next thread.
> 

My description was unclear. I will try better now.

When http replies are generated by worker threads, matching of reply to mg_connection is done
by checking the address of the mg_connection object. (mongoose itself unhelpfully offers
to send the reply to every mg_connection, see the responder to mg_broadcast() messages).

This works for open/active connections, addresses of all mg_connections are unique.

But if connection is closed and a new connection is opened, the address is reused (by malloc()/free()
reusing memory blocks or by mongoose using a pool of mg_connection objects, does not matter).

So matching http reply to mg_connection using only address of mg_connection can match the wrong connection.

(contents of mg_connection object does not matter, only address is used by matching. so memzero() of
mg_connection object does not help).

I saw this during my testing - wrong data was sent to wrong browser often enough - but did
not understand that the above problem is happening.

Because I was unable to reliably reproduce the problem, I could not debug it. I tried to add
a check for the tcp socket file descriptor number, in case there is a straight bug or multithread race
or simple memory corruption. This replaced "we sent wrong data to wrong browser, poisoned browser
cache, confused the user" with a crash. This "fix" seemed effective at the time.

Maybe I should mention browser cache poisoning again. What happened is html pages and rpc replies
were returned as responses to load things like CSS files, these bad responses are cached by the browser
pretty much forever, so all subsequent midas pages will look wrong (bad css!) forever, until
user manually clears browser cache. reload of page did not help, restart of browser did not help (I think).

So a very bad bug.

Unfortunately, the check for file descriptor was not effective because file descriptors are also
reused. And I did see wrong data returned by mhttpd, but even more rarely. And everybody (myself
included) complained about mhttpd crashes.

Now, matching of responses to connections is done by connection sequential/serial number,
which is unique 32-bit counter. Mismatch of reply to connection should not happen again.

P.S. Latest version of the mongoose web server library does not help with this problem,
the example code for matching reply to connection in their multithread example looks bogus:
https://github.com/cesanta/mongoose/blob/master/examples/multi-threaded/main.c

K.O.
    Reply  24 Mar 2022, Stefan Ritt, Bug Fix, mhttpd bug fixed 
I see, now I understand.

As for the browser cache problem: This Chrome extension is your friend: 

https://chrome.google.com/webstore/detail/clear-cache/cppjkneekbjaeellbfkmgnhonkkjfpdn?hl=en

I use it all the time I change the CSS or a JS file. Having the "Developer Tools" open in Chrome helps as well 
(cache is then turned off). Firefox has similar extensions.

Stefan
    Reply  24 Mar 2022, Konstantin Olchanski, Bug Fix, mhttpd bug fixed 
> As for the browser cache problem: This Chrome extension is your friend ...

for google chrome, it is easy, open the javascript debugger (left-click "inspect"),
the reload button becomes a left-click menu, one left-click option is "clear cache and reload".
(there is no button for "clear cookies and reload", re recent elog cookie problem).

but this does not help me personally any. if midas web pages get confused, I will also get confused, too,
and I will spend hours debugging mhttpd before thinking "hmm... maybe I should clear the browser cache!"

not sure about firefox, safari, microsoft edge and opera. if I ever need it, I google it.

K.O.
Entry  25 Jan 2021, Thomas Lindner, Suggestion, mhttpd browser caching 
I have a more subtle point about the new ODB key for using an external elog I mentioned in [1].  I was very confused after changing the ODB "External Elog" because mhttpd still wasn't using my external elog URL.  I started trying to debug mhttpd.cxx, but found a lot of bits of mhttpd didn't seem to be getting called.  I eventually realized that my browser had been caching the responses for some (though not all) of the MIDAS navigation buttons.  Clearing my browser cache fixed the problem and allowed me to use the MIDAS button to the external ELOG.  This caching happens on my macbook for both Firefox 84.0.2 and Safari 13.1.

Many of the requests to mhttpd end up going to send_fp(), where we explicitly set the cache time to 24 hours.

   // send HTTP cache control headers                                                                                               
   time_t now = time(NULL);
   now += (int) (3600 * 24);
   struct tm* gmt = gmtime(&now);
   const char* format = "%A, %d-%b-%y %H:%M:%S GMT";
   char str[256];
   strftime(str, sizeof(str), format, gmt);
   r->rsprintf("Expires: %s\r\n", str);

Some other MIDAS buttons don't seem to be cached by the browser; for instance the response for the 'OldHistory' button doesn't get cached.

Should we remove the cache instruction for at least some of the buttons?  At least for the elog button where we want the link direction to get switched by an ODB key the caching seems a bad idea.

[1] https://midas.triumf.ca/elog/Midas/2078
    Reply  25 Jan 2021, Stefan Ritt, Suggestion, mhttpd browser caching 
Let me first explain a bit why caching is there. Once we had the case that someone from 
TRIUMF opened a midas custom page at T2K. It took about one minute (!) to load the page. 

When we looked at it, we found that the custom page pulled about 100 items with individual
HTTP requests from Japan, each taking about one second for the roundtrip. Then we redesigned
the custom page communication so that many ODB entries could be retrieved in one operation,
which improved the loading time from 100s to about 2s.

With the buttons we will have to make the same compromise. If we do not cache anything,
loading the midas status page over the Pacific takes many seconds. If we cache all, any
change on the midas side will not be reflected on the web page. So there is a compromise
to be made. I thought I designed it such that the side menu is cached locally, but when
the user presses "reload", then the full menu is fetched from the server. Of course one
has to remember this, so changing the ELOG URL or other things on the menu require a
reload (or wait a certain time for the cache to expire). So try again if that's working
for you. If not, I can visit it again and check if there is any bug.

If we go the route to disable the cache, better try this to T2K and see what you get before
we commit ourselves to that. Last time TRIUMF people were complaining a lot about long
load times.

Best,
Stefan
    Reply  25 Jan 2021, Thomas Lindner, Suggestion, mhttpd browser caching 
I tried reloading the pages.  If I reloaded the actual elog page 

https://server.triumf.ca/?cmd=Elog

then it bypassed the cache and got the correct updated page from mhttpd.

However, if when I reloaded the status page

https://server.triumf.ca/?cmd=Status

and then clicked the Elog button then I just got the cached (old) page.  Admittedly reloading the status page doesn't make so much sense (once I thought about it), but it is what I tried first (I'm good at modelling unexpected user behaviour); so there is some risk that the user will try reloading the wrong page and will be stuck not getting the external elog page (until 24 hours runs out).

Anyway, I will update the documentation to note that you need to reload the elog page after changing this variable.  That's probably an adequate solution.

I certainly don't suggest getting rid of caching entirely.  I was trying to think whether there was a set of pages where it would make sense to disable the cache (like the elog page).  But maybe that will just cause more problems.


> Let me first explain a bit why caching is there. Once we had the case that someone from 
> TRIUMF opened a midas custom page at T2K. It took about one minute (!) to load the page. 
> 
> When we looked at it, we found that the custom page pulled about 100 items with individual
> HTTP requests from Japan, each taking about one second for the roundtrip. Then we redesigned
> the custom page communication so that many ODB entries could be retrieved in one operation,
> which improved the loading time from 100s to about 2s.
> 
> With the buttons we will have to make the same compromise. If we do not cache anything,
> loading the midas status page over the Pacific takes many seconds. If we cache all, any
> change on the midas side will not be reflected on the web page. So there is a compromise
> to be made. I thought I designed it such that the side menu is cached locally, but when
> the user presses "reload", then the full menu is fetched from the server. Of course one
> has to remember this, so changing the ELOG URL or other things on the menu require a
> reload (or wait a certain time for the cache to expire). So try again if that's working
> for you. If not, I can visit it again and check if there is any bug.
> 
> If we go the route to disable the cache, better try this to T2K and see what you get before
> we commit ourselves to that. Last time TRIUMF people were complaining a lot about long
> load times.
> 
> Best,
> Stefan
    Reply  08 Feb 2021, Konstantin Olchanski, Suggestion, mhttpd browser caching 
>    r->rsprintf("Expires: %s\r\n", str);

The best I can tell, none of this works in current browsers. with google-chrome,
I see it cache pretty much everything regardless of "expires", "no cache", etc
and anything else I tried.

Things like shift-<reload>, etc used to work to refresh the cache, but not any more.

So, I too, see confusing side-effects of caching, where I change something in ODB,
but "nothing happens". Then I scratch my head for 30 minutes until I remember
to open the javascript debugger where shift-<reload> (or is it ctrl-<reload>) actually works.

It seems that the only reliable way to bypass the browser cache is to add
a tag with a random number to the URL ("&ts=currenttime").

This is for HTTP GET requests. HTTP POST does not seem to be cached, so I do not worry
about this nonsense for json-rpc requests.

Perhaps we should do this random number trick for all user actions. User can
press buttons only so fast, we should be able to sustain the rate. Anything
loaded automatically or from a timer, we should allow caching.

BTW, things like midas.js are also cached, and it is common to see problems
after updating midas, where status.html is newly loaded, but midas.js is an old
stale version from cache.

Messy.

K.O.
    Reply  08 Feb 2021, Stefan Ritt, Suggestion, mhttpd browser caching 
> It seems that the only reliable way to bypass the browser cache is to add
> a tag with a random number to the URL ("&ts=currenttime").

Indeed that's the only reliable way to avoid caching across browsers. An alternative is

("&r=" + Math.random())

to add a random number.


> BTW, things like midas.js are also cached, and it is common to see problems
> after updating midas, where status.html is newly loaded, but midas.js is an old
> stale version from cache.

Reloading JavaScript file NOT from the cache is really tricky these days. I added a
special Google Chrome extension to clear my browser cache, which works reliably:

https://chrome.google.com/webstore/detail/clear-cache/cppjkneekbjaeellbfkmgnhonkkjfpdn

Stefan
Entry  12 May 2021, Mathieu Guigue, Bug Report, mhttpd WebServer ODBTree initialization 
Hi,

Using midas version 12-2020,  I am trying to run mhttpd from within a docker container using docker-compose.
Starting from an empty ODB, I simply run `mhttpd` and this is the output I have:
midas_hatfe_1  | <Warning> Starting mhttpd...
midas_hatfe_1  | [mhttpd,INFO] ODB subtree /Runinfo corrected successfully
midas_hatfe_1  | MVOdb::SetMidasStatus: Error: MIDAS db_find_key() at ODB path "/WebServer/Host list" returned status 312
midas_hatfe_1  | Mongoose web server will not use password protection
midas_hatfe_1  | Mongoose web server will not use the hostlist, connections from anywhere will be accepted
midas_hatfe_1  | Mongoose web server listening on http address "localhost:8080", passwords OFF, hostlist OFF
midas_hatfe_1  | [mhttpd,ERROR] [mhttpd.cxx:19160:mongoose_listen,ERROR] Cannot mg_bind address "[::1]:8080"

According to the documentation, the WebServer tree should be created automatically when starting the mhttpd; but it seems not as it doesn't find the entry "/WebServer/Host list".
If I create it by end (using "create STRING /WebServer/Host list"), I still get the error message that mhttpd didn't bind properly to the local port 8080.
I am not sure what it wrong, as mhttpd is working perfectly well in this exact container for midas 03-2020.

Any idea what difference makes it not possible anymore to run into these container?

Thanks very much for your help.
Cheers
Mathieu
    Reply  12 May 2021, Ben Smith, Bug Report, mhttpd WebServer ODBTree initialization 
> midas_hatfe_1  | Mongoose web server listening on http address "localhost:8080", passwords OFF, hostlist OFF
> midas_hatfe_1  | [mhttpd,ERROR] [mhttpd.cxx:19160:mongoose_listen,ERROR] Cannot mg_bind address "[::1]:8080"

It looks like mhttpd managed to bind to the IPv4 address (localhost), but not the IPv6 address (::1). If you don't need it, try setting "/Webserver/Enable IPv6" to false.
    Reply  12 May 2021, Stefan Ritt, Bug Report, mhttpd WebServer ODBTree initialization 
> It looks like mhttpd managed to bind to the IPv4 address (localhost), but not the IPv6 address (::1). If you don't need it, try setting "/Webserver/Enable IPv6" to false.

We had this issue already several times. This info should be put into the documentation at a prominent location.

Stefan
    Reply  13 May 2021, Mathieu Guigue, Bug Report, mhttpd WebServer ODBTree initialization 
> > It looks like mhttpd managed to bind to the IPv4 address (localhost), but not the IPv6 address (::1). If you don't need it, try setting "/Webserver/Enable IPv6" to false.
> 
> We had this issue already several times. This info should be put into the documentation at a prominent location.
> 
> Stefan

Thanks a lot, this solved my issue!
    Reply  14 May 2021, Stefan Ritt, Bug Report, mhttpd WebServer ODBTree initialization 
> Thanks a lot, this solved my issue!

... or we should turn IPv6 off by default, since not many people use this right now.
    Reply  02 Jun 2021, Konstantin Olchanski, Bug Report, mhttpd WebServer ODBTree initialization 
> > Thanks a lot, this solved my issue!
> 
> ... or we should turn IPv6 off by default, since not many people use this right now.

IPv6 certainly works and is used at CERN.

But I am not sure why people see this message. I do not see it on any machines at 
TRIUMF, even those with IPv6 turned off.

K.O.
    Reply  05 Aug 2021, Stefan Ritt, Bug Report, mhttpd WebServer ODBTree initialization 
Well, we all see it here at PSI, so this is enough reason to turn this off by default. Shall 
I do it?
Entry  19 Jun 2017, Thomas Lindner, Bug Report, mhttpd ODB editor changes string length, breaks  
I guess this might be related to the changes in the last elog conversation; but
I'll break it out as a separate problem.

The new mhttpd ODB editor seems to resize all strings (not just strings that are
greater than 256 characters).  So, when I change some string with the mhttpd ODB
editor to 'ffffff', then I find that the string size is now ~7 characters.

This might be fine in general; but it seems to cause a problem when dealing with
alarms.  In particular, I find that if I try to set (through mhttpd) the
"execute command" for an alarm class or the "condition" for an alarm, then I get
into lots of trouble.  For instance, I changed the "execute command" for my
alarm class through mhttpd; when associated alarms were triggered, I got errors

21:58:12 [feSourceEpics,ERROR] [odb.c:9133:db_get_record,ERROR] struct size
mismatch for "/Alarms/Classes/Alarm" (expected size: 348, size in ODB: 100)
21:58:12 [feSourceEpics,ERROR] [alarm.c:379:al_trigger_class,ERROR] Cannot get
alarm class record

This makes sense, since ALARM_CLASS has a fixed size

typedef struct {
   BOOL write_system_message;
   ...
   char execute_command[256];
   ...
   char display_fgcolor[32];
} ALARM_CLASS;

so problems will clearly occur when I change the size and try to grab it:

   ALARM_CLASS ac;
   status = db_get_record1(hDB, hkeyclass, &ac, &size, 0, strcomb(alarm_class_str));
 
I guess that similar problems also occur if you edit the string for ALARM or
PROGRAM_INFO instances.  These problems do not occur when I change my strings
with odbedit, which doesn't resize strings below 256.

I'm not sure what the proper solution is.  A temporary solution is that the
mhttpd ODB editor shouldn't resize strings if the new size is less than 256
characters; in that case the size should be left as 256 characters.

This test was done with MIDAS git repository as of today:
commit 45a90dc329554f528485da121501daf6ecde100d
    Reply  21 Jun 2017, Thomas Lindner, Bug Report, mhttpd ODB editor changes string length, breaks  
To follow up; with some help from Konstantin and Stefan, we realized that this
particular problem should already be fixed.  While I was using the most recent version
of MIDAS, I hadn't rebuild the EPICS frontend programs when I was doing this test.  Once
I did that the error no longer occurred.  This is because the most recent version of
MIDAS includes a check that will resize these particular string variables before using
them (technically, this is included in db_get_record1()); this resizing only happens for
these couple strings that must have a fixed size.

We are still having a separate discussion about whether this treatment of string lengths
that need to have a fixed size can be further improved.  Will update once discussion
converges.


> I guess this might be related to the changes in the last elog conversation; but
> I'll break it out as a separate problem.
> 
> The new mhttpd ODB editor seems to resize all strings (not just strings that are
> greater than 256 characters).  So, when I change some string with the mhttpd ODB
> editor to 'ffffff', then I find that the string size is now ~7 characters.
> 
> This might be fine in general; but it seems to cause a problem when dealing with
> alarms.  In particular, I find that if I try to set (through mhttpd) the
> "execute command" for an alarm class or the "condition" for an alarm, then I get
> into lots of trouble.  For instance, I changed the "execute command" for my
> alarm class through mhttpd; when associated alarms were triggered, I got errors
> 
> 21:58:12 [feSourceEpics,ERROR] [odb.c:9133:db_get_record,ERROR] struct size
> mismatch for "/Alarms/Classes/Alarm" (expected size: 348, size in ODB: 100)
> 21:58:12 [feSourceEpics,ERROR] [alarm.c:379:al_trigger_class,ERROR] Cannot get
> alarm class record
> 
> This makes sense, since ALARM_CLASS has a fixed size
> 
> typedef struct {
>    BOOL write_system_message;
>    ...
>    char execute_command[256];
>    ...
>    char display_fgcolor[32];
> } ALARM_CLASS;
> 
> so problems will clearly occur when I change the size and try to grab it:
> 
>    ALARM_CLASS ac;
>    status = db_get_record1(hDB, hkeyclass, &ac, &size, 0, strcomb(alarm_class_str));
>  
> I guess that similar problems also occur if you edit the string for ALARM or
> PROGRAM_INFO instances.  These problems do not occur when I change my strings
> with odbedit, which doesn't resize strings below 256.
> 
> I'm not sure what the proper solution is.  A temporary solution is that the
> mhttpd ODB editor shouldn't resize strings if the new size is less than 256
> characters; in that case the size should be left as 256 characters.
> 
> This test was done with MIDAS git repository as of today:
> commit 45a90dc329554f528485da121501daf6ecde100d
    Reply  17 May 2013, Konstantin Olchanski, Info, mhttpd JSON-P support 
> 
> Added JSON encoding format to Javascript ODBCopy(path,format) ("jcopy"). Use format="json", Javascript example updated with an example example.
> 

More ODBCopy() expansion: format="json-p" returns data suitable for JSON-P ("script tag") messaging.

Also implemented multiple-paths for "jcopy" (similar to "jget"/ODBMGet()). An example ODBMCopy(paths,callback,format) is present in example.html (will move to mhttpd.js).

Added JSON encoding options:
- format="json-nokeys" will omit all KEY information except for "last_written"
- "json-nokeys-nolastwritten" will also omit "last_written"
- "json-nofollowlinks" will return ODB symlink KEYs instead of following them (ODBGet/ODBMGet always follows symlinks)
- "json-p" adds JSON-P encapsulation
All these JSON format options can be used at the same time, i.e. format="json-p-nofollowlinks"

To see how it all works, please look at examples/javascript1/example.html.

The new code seems to be functional enough, but it is still work in progress and there are a few problems:
- ODBMCopy() using the "xml" format returns gibberish (the MIDAS XML encoder has to be told to omit the <?xml> header)
- example.html does not actually parse any of the XML data, so we do not know if XML encoding is okey
- JSON encoding has an extra layer of objects (variables.Variables.foo instead of variables.foo)
- ODBRpc() with JSON/JSON-P encoding not done yet.

mhttpd.cxx, example.html
svn rev 5364
K.O.
    Reply  31 May 2013, Konstantin Olchanski, Info, mhttpd JSON-P support 
> To see how it all works, please look at examples/javascript1/example.html.
> 
> - JSON encoding has an extra layer of objects (variables.Variables.foo instead of variables.foo)
>

This is now fixed. See updated example.html. Current encoding looks like this:

{
  "System" : {
    "Clients" : {
      "24885" : {
        "Name/key" : { "type" : 12, "item_size" : 32, "last_written" : 1370024816 },
        "Name" : "ODBEdit",
        "Host/key" : { "type" : 12, "item_size" : 256, "last_written" : 1370024816 },
        "Host" : "ladd03.triumf.ca",
        "Hardware type/key" : { "type" : 7, "last_written" : 1370024816 },
        "Hardware type" : 44,
        "Server Port/key" : { "type" : 7, "last_written" : 1370024816 },
        "Server Port" : 52539
      }
    },
    "Tmp" : {
...

odb.c, example.html
svn rev 5368
K.O.
    Reply  10 May 2013, Konstantin Olchanski, Info, mhttpd JSON support 
> odbedit can now save ODB in JSON-formatted files.
> Added functions:
>    INT EXPRT db_save_json(HNDLE hDB, HNDLE hKey, const char *file_name);
>    INT EXPRT db_copy_json(HNDLE hDB, HNDLE hKey, char **buffer, int *buffer_size, int *buffer_end, int  save_keys, int follow_links);
> 

Added JSON encoding format to Javascript ODBCopy() ("jcopy"). Use format="json", Javascript example updated with an example example.

Also updated db_copy_json():
- always return NUL-terminated string
- "save_keys" values: 0 - do not save any KEY data, 1 - save all KEY data, 2 - save only KEY.last_written

odb.c, mhttpd.cxx, example.html
svn rev 5362
K.O.
ELOG V3.1.4-2e1708b5