Midas DAQ System
ID   Date   Author   Topic   Subject
  593   05 Jun 2009   Konstantin Olchanski   Bug Report   mhttpd command line experiment specifying
> I guess you have to do some debugging there. Note that "detached" transitions have 
> been implemented recently by Konstantin, so maybe your problem is related to that. 
> In this case Konstantin should check what's wrong.

Yes, I think there is a problem - cm_transition() starts the mtransition helper without the "-e expt" switch, so 
mtransition can only connect to the "default" experiment. Will fix. K.O.
  596   18 Jun 2009   Konstantin Olchanski   Bug Report   mhttpd command line experiment specifying
> > I guess you have to do some debugging there. Note that "detached" transitions have 
> > been implemented recently by Konstantin, so maybe your problem is related to that. 
> > In this case Konstantin should check what's wrong.
> 
> Yes, I think there is a problem - cm_transition() starts the mtransition helper without the "-e expt" switch, so 
> mtransition can only connect to the "default" experiment. Will fix. K.O.

Fixed in midas.c svn rev 4506: cm_transition() now always passes "-e expt" to mtransition and, if connected remotely,
also passes "-h host:port".

svn rev 4506
K.O.
  376   21 May 2007   Konstantin Olchanski   Info   mhttpd changes to use /History/Tags data
I am slowly committing the changes to the history code. This installment adds
code to mhttpd to use the /History/Tags data (to be) generated by the mlogger.

In a nutshell, the logger fills /History/Tags to "remember" what events,
variables and tags exist in the history files.

This replaces the old code that attempts to guess the contents of the history files
by looking at the /Equipment tree.

To ease the transition to the new system, I am leaving all the old code alive
and active in the absence of "/History/Tags" entries.

As soon as one starts using the new mlogger (to be committed), the new tags-based
mhttpd code will activate itself.

K.O.
  2364   23 Mar 2022   Konstantin Olchanski   Bug Fix   mhttpd bug fixed
the mhttpd bug should be fixed now (branch feature/buffer_mutex).

simplest way to reproduce:

1. wget http://localhost:8080/
2. quickly ctrl-C it
3. wget http://localhost:8080/
4. inside mhttpd (by hook or crook) observe that the second wget got the data meant for the first wget.

if you cannot ctrl-C the first wget quickly enough, put a sleep somewhere in the worker thread (in 
mongoose_write(), I think).

this is what happens.

1st wget stops (by ctrl-C), socket is closed, mongoose frees its mg_connection object
(corresponding worker is still labouring, hmm... actually sleeping, and now has a stale nc pointer)

2nd wget starts, new socket is opened, mongoose allocates a new mg_connection object,
but malloc() gives it back the same memory we just freed, and the 1st wget's worker thread
nc pointer is no longer dangling, but now points to the 2nd wget's connection.

so we think we are clever and check the socket file descriptors. but the same thing
happens there, too. if the 1st wget was file descriptor 7, it is closed (the 1st wget worker now has
a stale file handle), then a socket is reopened for the 2nd wget and, per POSIX, we get back the same
file descriptor 7. the 1st wget worker now holds the file handle for the 2nd wget's tcp socket, and
the famous test/crash for "sending data to the wrong socket" is defeated.
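
A tiny stand-alone demo of the descriptor reuse described above (an assumption-laden sketch: plain open()/close() on /dev/null stands in for the accept()/close() of the wget sockets; POSIX hands out the lowest-numbered free descriptor, so a just-closed number comes right back):

   // demo (not mhttpd code): POSIX reuses the lowest-numbered free file descriptor
   #include <fcntl.h>
   #include <unistd.h>
   #include <cstdio>

   int main()
   {
      int a = open("/dev/null", O_RDONLY);
      printf("first  open(): fd=%d\n", a);
      close(a);                              // like the 1st wget socket being closed
      int b = open("/dev/null", O_RDONLY);   // like the 2nd wget socket being accepted
      printf("second open(): fd=%d\n", b);   // prints the same number as 'a'
      return 0;
   }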

now, the worker thread for the 1st wget wants to send a reply: it has a valid nc pointer (pointing to the 2nd wget's
mg_connection object) and a valid file descriptor (pointing to the 2nd wget's tcp socket), so the
reply meant for the 1st wget is successfully sent to the 2nd wget. the 2nd wget finishes, its socket
is closed, its mg_connection object is freed. now the worker thread for the 2nd wget has stale
connection info, but this is okay: mongoose does not find a matching connection, the 2nd wget
worker thread's reply goes nowhere, and the thread finishes silently (no memory leaks here, I checked).

so, the connection for the 2nd wget completely impersonates the closed connection of the 1st wget (I guess I could
check the full socket address info - remote ip address, remote port number, etc - but...)

in practice, this bug does not happen often because modern browsers tend to keep tcp sockets open
for a very long time. (not sure about sundry web proxies, etc).

the solution of course is very simple: match worker thread data to mongoose mg_connection objects
using our own connection sequential numbers, which are unique and very easy to keep track
of through the mongoose event handler. all this mess runs in the main thread,
so no locking trouble here, small blessing.
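
A minimal sketch of this bookkeeping (hypothetical names, not the actual mhttpd code; it assumes, as stated above, that all of it runs in the single mongoose event thread, so no locking is shown):

   // match worker replies to connections by our own serial number,
   // never by the mg_connection pointer (which can be reused after free)
   #include <cstdint>
   #include <map>

   struct Connection {                        // stand-in for mongoose's mg_connection
      uint32_t seqno = 0;                     // our serial number, assigned at accept time
      // socket, buffers, etc.
   };

   static uint32_t gNextSeqno = 0;                   // only touched by the event thread
   static std::map<uint32_t, Connection*> gConns;    // seqno -> live connection

   void on_accept(Connection* c) {            // from the mongoose event handler
      c->seqno = ++gNextSeqno;
      gConns[c->seqno] = c;
   }

   void on_close(Connection* c) {             // from the mongoose event handler
      gConns.erase(c->seqno);
   }

   // a worker thread remembers only the seqno of the request it is serving;
   // when its reply comes back to the event thread, look the connection up:
   Connection* find_connection(uint32_t seqno) {
      auto it = gConns.find(seqno);
      return (it == gConns.end()) ? nullptr : it->second;   // nullptr: client gone, drop the reply
   }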

K.O.
  2366   24 Mar 2022   Stefan Ritt   Bug Fix   mhttpd bug fixed
> 1st wget stops (by ctrl-C), socket is closed, mongoose frees its mg_connection object
> (corresponding worker is still labouring, hmm... actually sleeping, and now has a stale nc pointer)
> 
> 2nd wget starts, new socket is opened, mongoose allocates a new mg_connection object,
> but malloc() gives it back the same memory we just freed, and the 1st wget's worker thread
> nc pointer is no longer dangling, but now points to the 2nd wget's connection.

Why don't we CLEAR the memory (memset(object, 0, sizeof(object))) before the free()? This way it cannot be 
mistakenly re-used by the next thread.

Stefan
  2368   24 Mar 2022   Konstantin Olchanski   Bug Fix   mhttpd bug fixed
> > 1st wget stops (by ctrl-C), socket is closed, mongoose frees its mg_connection object
> > (corresponding worker is still labouring, hmm... actually sleeping, and now has a stale nc pointer)
> > 
> > 2nd wget starts, new socket is opened, mongoose allocates a new mg_connection object,
> > but malloc() gives it back the same memory we just freed, and the 1st wget's worker thread
> > nc pointer is no longer dangling, but now points to the 2nd wget's connection.
> 
> Why don't we CLEAR the memory (memset(object, 0, sizeof(object))) before the free()? This way it cannot be 
> mistakenly re-used by the next thread.
> 

My description was unclear. I will try to explain it better now.

When http replies are generated by worker threads, matching a reply to its mg_connection is done
by checking the address of the mg_connection object. (mongoose itself unhelpfully offers
to send the reply to every mg_connection; see the responder to mg_broadcast() messages).

This works for open/active connections: the addresses of all open mg_connections are unique.

But if a connection is closed and a new connection is opened, the address can be reused (whether by malloc()/free()
reusing memory blocks or by mongoose using a pool of mg_connection objects does not matter).

So matching an http reply to an mg_connection using only the address of the mg_connection can match the wrong connection.

(The contents of the mg_connection object do not matter; only the address is used for matching, so zeroing the
mg_connection object does not help).

I saw this during my testing - wrong data was sent to the wrong browser often enough - but did
not understand that the above problem was happening.

Because I was unable to reliably reproduce the problem, I could not debug it. I tried adding
a check for the tcp socket file descriptor number, in case there was a straight bug, a multithread race
or simple memory corruption. This replaced "we sent wrong data to the wrong browser, poisoned the browser
cache, confused the user" with a crash. This "fix" seemed effective at the time.

Maybe I should mention browser cache poisoning again. What happened is that html pages and rpc replies
were returned as responses to requests for things like CSS files. These bad responses are cached by the browser
pretty much forever, so all subsequent midas pages look wrong (bad css!) until the
user manually clears the browser cache. Reloading the page did not help; restarting the browser did not help (I think).

So a very bad bug.

Unfortunately, the check for the file descriptor was not effective because file descriptors are also
reused. I did still see wrong data returned by mhttpd, though even more rarely. And everybody (myself
included) complained about mhttpd crashes.

Now, matching of responses to connections is done by a connection sequential/serial number,
which is a unique 32-bit counter. Mismatching a reply to a connection should not happen again.

P.S. The latest version of the mongoose web server library does not help with this problem;
the example code for matching a reply to a connection in their multithreaded example looks bogus:
https://github.com/cesanta/mongoose/blob/master/examples/multi-threaded/main.c

K.O.
  2371   24 Mar 2022   Stefan Ritt   Bug Fix   mhttpd bug fixed
I see, now I understand.

As for the browser cache problem: This Chrome extension is your friend: 

https://chrome.google.com/webstore/detail/clear-cache/cppjkneekbjaeellbfkmgnhonkkjfpdn?hl=en

I use it every time I change the CSS or a JS file. Having the "Developer Tools" open in Chrome helps as well 
(the cache is then turned off). Firefox has similar extensions.

Stefan
  2374   24 Mar 2022   Konstantin Olchanski   Bug Fix   mhttpd bug fixed
> As for the browser cache problem: This Chrome extension is your friend ...

for google chrome, it is easy: open the javascript debugger (left-click "inspect"),
and the reload button becomes a left-click menu; one of the options is "clear cache and reload".
(there is no button for "clear cookies and reload", re the recent elog cookie problem).

but this does not help me personally. if midas web pages get confused, I will get confused too,
and I will spend hours debugging mhttpd before thinking "hmm... maybe I should clear the browser cache!"

not sure about firefox, safari, microsoft edge and opera; if I ever need it, I will google it.

K.O.
  2079   25 Jan 2021   Thomas Lindner   Suggestion   mhttpd browser caching
I have a more subtle point about the new ODB key for using an external elog that I mentioned in [1].  I was very confused after changing the ODB "External Elog" key because mhttpd still wasn't using my external elog URL.  I started trying to debug mhttpd.cxx, but found that a lot of mhttpd didn't seem to be getting called.  I eventually realized that my browser had been caching the responses for some (though not all) of the MIDAS navigation buttons.  Clearing my browser cache fixed the problem and allowed me to use the MIDAS button for the external ELOG.  This caching happens on my macbook with both Firefox 84.0.2 and Safari 13.1.

Many of the requests to mhttpd end up going to send_fp(), where we explicitly set the cache time to 24 hours.

   // send HTTP cache control headers                                                                                               
   time_t now = time(NULL);
   now += (int) (3600 * 24);
   struct tm* gmt = gmtime(&now);
   const char* format = "%A, %d-%b-%y %H:%M:%S GMT";
   char str[256];
   strftime(str, sizeof(str), format, gmt);
   r->rsprintf("Expires: %s\r\n", str);

Some other MIDAS buttons don't seem to be cached by the browser; for instance the response for the 'OldHistory' button doesn't get cached.

Should we remove the cache instruction for at least some of the buttons?  At least for the elog button, where we want the link destination to be switched by an ODB key, the caching seems like a bad idea.

[1] https://midas.triumf.ca/elog/Midas/2078
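
For illustration only, one possible shape of such a per-page opt-out (a rough sketch; the helper and the response type are hypothetical, assuming the same rsprintf-style interface as the snippet above): pages whose content depends on ODB settings would send no-cache headers instead of the 24-hour Expires header.

   // hypothetical helper: ask the browser not to cache this response at all
   template <typename Response>              // stands in for mhttpd's response object
   void send_no_cache_headers(Response* r)
   {
      r->rsprintf("Cache-Control: no-store, no-cache, must-revalidate\r\n");
      r->rsprintf("Pragma: no-cache\r\n");   // for old HTTP/1.0 caches
      r->rsprintf("Expires: 0\r\n");
   }

Static assets such as midas.js could keep the long expiry, while pages like the Elog redirect would call something like this instead of the Expires code above.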
  2080   25 Jan 2021   Stefan Ritt   Suggestion   mhttpd browser caching
Let me first explain a bit why caching is there. Once we had the case that someone from 
TRIUMF opened a midas custom page at T2K. It took about one minute (!) to load the page. 

When we looked at it, we found that the custom page pulled about 100 items with individual
HTTP requests from Japan, each taking about one second for the roundtrip. Then we redesigned
the custom page communication so that many ODB entries could be retrieved in one operation,
which improved the loading time from 100s to about 2s.

With the buttons we will have to make the same compromise. If we do not cache anything,
loading the midas status page over the Pacific takes many seconds. If we cache all, any
change on the midas side will not be reflected on the web page. So there is a compromise
to be made. I thought I designed it such that the side menu is cached locally, but when
the user presses "reload", then the full menu is fetched from the server. Of course one
has to remember this, so changing the ELOG URL or other things on the menu requires a
reload (or waiting a certain time for the cache to expire). So try again and see if that works
for you. If not, I can look at it again and check if there is any bug.

If we go the route of disabling the cache, we had better try it against T2K and see what you get before
we commit ourselves to that. Last time, TRIUMF people complained a lot about long
load times.

Best,
Stefan
  2081   25 Jan 2021   Thomas Lindner   Suggestion   mhttpd browser caching
I tried reloading the pages.  If I reloaded the actual elog page 

https://server.triumf.ca/?cmd=Elog

then it bypassed the cache and got the correct updated page from mhttpd.

However, if I reloaded the status page

https://server.triumf.ca/?cmd=Status

and then clicked the Elog button, I just got the cached (old) page.  Admittedly, reloading the status page doesn't make much sense (once I thought about it), but it is what I tried first (I'm good at modelling unexpected user behaviour); so there is some risk that a user will reload the wrong page and be stuck not getting the external elog page (until the 24 hours run out).

Anyway, I will update the documentation to note that you need to reload the elog page after changing this variable.  That's probably an adequate solution.

I certainly don't suggest getting rid of caching entirely.  I was trying to think whether there was a set of pages where it would make sense to disable the cache (like the elog page).  But maybe that will just cause more problems.


> Let me first explain a bit why caching is there. Once we had the case that someone from 
> TRIUMF opened a midas custom page at T2K. It took about one minute (!) to load the page. 
> 
> When we looked at it, we found that the custom page pulled about 100 items with individual
> HTTP requests from Japan, each taking about one second for the roundtrip. Then we redesigned
> the custom page communication so that many ODB entries could be retrieved in one operation,
> which improved the loading time from 100s to about 2s.
> 
> With the buttons we will have to make the same compromise. If we do not cache anything,
> loading the midas status page over the Pacific takes many seconds. If we cache all, any
> change on the midas side will not be reflected on the web page. So there is a compromise
> to be made. I thought I designed it such that the side menu is cached locally, but when
> the user presses "reload", then the full menu is fetched from the server. Of course one
> has to remember this, so changing the ELOG URL or other things on the menu requires a
> reload (or waiting a certain time for the cache to expire). So try again and see if that works
> for you. If not, I can look at it again and check if there is any bug.
> 
> If we go the route of disabling the cache, we had better try it against T2K and see what you get before
> we commit ourselves to that. Last time, TRIUMF people complained a lot about long
> load times.
> 
> Best,
> Stefan
  2085   08 Feb 2021   Konstantin Olchanski   Suggestion   mhttpd browser caching
>    r->rsprintf("Expires: %s\r\n", str);

As best I can tell, none of this works in current browsers. With google-chrome,
I see it cache pretty much everything regardless of "expires", "no cache"
and anything else I tried.

Things like shift-<reload>, etc used to work to refresh the cache, but not any more.

So I, too, see confusing side-effects of caching, where I change something in ODB,
but "nothing happens". Then I scratch my head for 30 minutes until I remember
to open the javascript debugger, where shift-<reload> (or is it ctrl-<reload>) actually works.

It seems that the only reliable way to bypass the browser cache is to add
a tag with a random number to the URL ("&ts=currenttime").

This is for HTTP GET requests. HTTP POST does not seem to be cached, so I do not worry
about this nonsense for json-rpc requests.

Perhaps we should do this random-number trick for all user actions. A user can
press buttons only so fast; we should be able to sustain that rate. For anything
loaded automatically or from a timer, we should allow caching.

BTW, things like midas.js are also cached, and it is common to see problems
after updating midas, where status.html is newly loaded, but midas.js is an old
stale version from cache.

Messy.

K.O.
  2086   08 Feb 2021   Stefan Ritt   Suggestion   mhttpd browser caching
> It seems that the only reliable way to bypass the browser cache is to add
> a tag with a random number to the URL ("&ts=currenttime").

Indeed that's the only reliable way to avoid caching across browsers. An alternative is

("&r=" + Math.random())

to add a random number.


> BTW, things like midas.js are also cached, and it is common to see problems
> after updating midas, where status.html is newly loaded, but midas.js is an old
> stale version from cache.

Reloading a JavaScript file NOT from the cache is really tricky these days. I added a
special Google Chrome extension to clear my browser cache, which works reliably:

https://chrome.google.com/webstore/detail/clear-cache/cppjkneekbjaeellbfkmgnhonkkjfpdn

Stefan
  2163   12 May 2021   Mathieu Guigue   Bug Report   mhttpd WebServer ODBTree initialization
Hi,

Using midas version 12-2020,  I am trying to run mhttpd from within a docker container using docker-compose.
Starting from an empty ODB, I simply run `mhttpd` and this is the output I have:
midas_hatfe_1  | <Warning> Starting mhttpd...
midas_hatfe_1  | [mhttpd,INFO] ODB subtree /Runinfo corrected successfully
midas_hatfe_1  | MVOdb::SetMidasStatus: Error: MIDAS db_find_key() at ODB path "/WebServer/Host list" returned status 312
midas_hatfe_1  | Mongoose web server will not use password protection
midas_hatfe_1  | Mongoose web server will not use the hostlist, connections from anywhere will be accepted
midas_hatfe_1  | Mongoose web server listening on http address "localhost:8080", passwords OFF, hostlist OFF
midas_hatfe_1  | [mhttpd,ERROR] [mhttpd.cxx:19160:mongoose_listen,ERROR] Cannot mg_bind address "[::1]:8080"

According to the documentation, the WebServer tree should be created automatically when starting mhttpd; but it seems it is not, since mhttpd doesn't find the entry "/WebServer/Host list".
If I create it by hand (using "create STRING /WebServer/Host list"), I still get the error message that mhttpd didn't bind properly to the local port 8080.
I am not sure what is wrong, as mhttpd works perfectly well in this exact container with midas 03-2020.

Any idea what difference makes it no longer possible to run in this container?

Thanks very much for your help.
Cheers
Mathieu
  2164   12 May 2021   Ben Smith   Bug Report   mhttpd WebServer ODBTree initialization
> midas_hatfe_1  | Mongoose web server listening on http address "localhost:8080", passwords OFF, hostlist OFF
> midas_hatfe_1  | [mhttpd,ERROR] [mhttpd.cxx:19160:mongoose_listen,ERROR] Cannot mg_bind address "[::1]:8080"

It looks like mhttpd managed to bind to the IPv4 address (localhost), but not the IPv6 address (::1). If you don't need it, try setting "/Webserver/Enable IPv6" to false.
  2165   12 May 2021   Stefan Ritt   Bug Report   mhttpd WebServer ODBTree initialization
> It looks like mhttpd managed to bind to the IPv4 address (localhost), but not the IPv6 address (::1). If you don't need it, try setting "/Webserver/Enable IPv6" to false.

We had this issue already several times. This info should be put into the documentation at a prominent location.

Stefan
  2167   13 May 2021   Mathieu Guigue   Bug Report   mhttpd WebServer ODBTree initialization
> > It looks like mhttpd managed to bind to the IPv4 address (localhost), but not the IPv6 address (::1). If you don't need it, try setting "/Webserver/Enable IPv6" to false.
> 
> We had this issue already several times. This info should be put into the documentation at a prominent location.
> 
> Stefan

Thanks a lot, this solved my issue!
  2168   14 May 2021   Stefan Ritt   Bug Report   mhttpd WebServer ODBTree initialization
> Thanks a lot, this solved my issue!

... or we should turn IPv6 off by default, since not many people use this right now.
  2200   02 Jun 2021   Konstantin Olchanski   Bug Report   mhttpd WebServer ODBTree initialization
> > Thanks a lot, this solved my issue!
> 
> ... or we should turn IPv6 off by default, since not many people use this right now.

IPv6 certainly works and is used at CERN.

But I am not sure why people see this message. I do not see it on any machines at 
TRIUMF, even those with IPv6 turned off.

K.O.
  2269   05 Aug 2021   Stefan Ritt   Bug Report   mhttpd WebServer ODBTree initialization
Well, we all see it here at PSI, so this is enough reason to turn this off by default. Shall 
I do it?