Back Midas Rome Roody Rootana
  Midas DAQ System, Page 32 of 139  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
ID Date Author Topic Subjectdown
  2276   17 Sep 2021 Stefan RittForummhttpd crash
To limit the impact of the numerous crashes of mhttpd, I installed the monit tool at MEG at PSI 
(https://en.wikipedia.org/wiki/Monit). It monitors mhttpd, and if it cannot connect to it for a certain
time, it kills the process and restarts it. This covers endless loops, simple crashes (caused by the
known multi-threading issue in mongoose), and also cases where mhttpd develops a memory leak and becomes
unresponsive. 

To configure monit for mhttpd, first install the package, make sure the daemon gets started automatically
after reboot (typically "sysemctl enable monit"), and put the attached file into

/etc/monit.d/mhttpd

You have to adjust the <path-to-midas> according to your midas installation, and probably also the port
under which mhttpd is listening (8082 in my case). Put 

set daemon 10

into /etc/monitrc if you want monit to check mhttpd every 10 seconds (default is 30 seconds). Then, every
10 seconds monit request "midas.css" from mhttpd, and if it cannot obtain it after 30 seconds, it kills
mhttpd and restarts it.

Loading long history plots taking more than 30 seconds should probably not be an issue since mhttpd is 
multi-threaded, but I haven't tested this in detail.

Attached below is a typical status page produced by monit, which has its own built-in web server (normally
listening at port 2812, accessible only from localhost by default).

I hope this helps some of you.

Stefan
  590   04 Jun 2009 bazinskiBug Reportmhttpd command line experiment specifying
Hi

Not sure how the rest of you specify mhttpd to work with multiple experiments on
one machine, but it would seem not the same as me ;-)

when executing mhttpd with 

mhttpd -e "experimentname" -p "experimentport" -D 

that experiment name is not transfered to transitions as cm_transition never
specifies the experiment in the call to "transition STOP" etc.
the only flag it sends is a -d for debug if selected.

The result is that the stop and start button of the webinterface does not work,
and transitions sit endlessly doing nothing but consuming all the processor,
odbedit works fine though.

Does everyone else use an apache reverse proxy and or explicit experiment choice
in the url ?

As an aside in mhttpd.c in the reply to -? it states 2 -h options the second
should be a -e. line 13378.


Thanks
Sean
  591   05 Jun 2009 Stefan RittBug Reportmhttpd command line experiment specifying
> Not sure how the rest of you specify mhttpd to work with multiple experiments on
> one machine, but it would seem not the same as me ;-)

Please note that there has been a change concerning multiple experiments inside 
mhttpd. From revision 4346 on, mhttpd can only connect to one single experiment, 
and the experiment name in the URL (aka ?exp=name) is not supported any more. So if 
you have several experiments, you start several instances of mhttpd now on 
different ports.

> that experiment name is not transfered to transitions as cm_transition never
> specifies the experiment in the call to "transition STOP" etc.
> the only flag it sends is a -d for debug if selected.

When connecting to an experiment, any midas client uses the ODB from that 
experiment so lives in that "namespace". So one client can never call any client 
from another experiment. So your problem must be something else. Of course there is 
not parameter "experiment" passed to cm_transition() since the experiment is 
implicitly defined by the ODB mhttpd is attached to.

> The result is that the stop and start button of the webinterface does not work,
> and transitions sit endlessly doing nothing but consuming all the processor,
> odbedit works fine though.

I guess you have to do some debugging there. Note that "detached" transitions have 
been implemented recently by Konstantin, so maybe your problem is related to that. 
In this case Konstantin should check what's wrong.

> Does everyone else use an apache reverse proxy and or explicit experiment choice
> in the url ?

I use a

ProxyPass /megon/ http://megon.psi.ch/

on our public web server to make an online machine accessible from outside the 
firewall, but just with a single experiment.

> As an aside in mhttpd.c in the reply to -? it states 2 -h options the second
> should be a -e. line 13378.

Fixed in revision 4504.
  592   05 Jun 2009 bazinskiBug Reportmhttpd command line experiment specifying
Hi

> > Not sure how the rest of you specify mhttpd to work with multiple experiments on
> > one machine, but it would seem not the same as me ;-)
> 
> Please note that there has been a change concerning multiple experiments inside 
> mhttpd. From revision 4346 on, mhttpd can only connect to one single experiment, 
> and the experiment name in the URL (aka ?exp=name) is not supported any more. So if 
> you have several experiments, you start several instances of mhttpd now on 
> different ports.

That i do with : 
mhttpd -p xx -e experiment_name -D

> 
> > that experiment name is not transfered to transitions as cm_transition never
> > specifies the experiment in the call to "transition STOP" etc.
> > the only flag it sends is a -d for debug if selected.
> 
> When connecting to an experiment, any midas client uses the ODB from that 
> experiment so lives in that "namespace". So one client can never call any client 
> from another experiment. So your problem must be something else. Of course there is 
> not parameter "experiment" passed to cm_transition() since the experiment is 
> implicitly defined by the ODB mhttpd is attached to.

Will have to look else where.

> 
> > The result is that the stop and start button of the webinterface does not work,
> > and transitions sit endlessly doing nothing but consuming all the processor,
> > odbedit works fine though.
> 
> I guess you have to do some debugging there. Note that "detached" transitions have 
> been implemented recently by Konstantin, so maybe your problem is related to that. 
> In this case Konstantin should check what's wrong.

cm_transition does a "system(str)" on line 3243 inside the "if(async_flag == DETACH)" of
line 3219, how does an external program know about the state of the originating mhttpd
process ? Surely that str which executes "mtransition ......." should get a -e
specifying the experiment explicitly ? probably a -h as well to be thorough.
The only other way that mtransition.cxx will be able to pull in the experimentname is
from the environment variable in its call to cm_get_environment(....) on its startup.


Ok after some testing .... 
If i start the mhttpd with the environment variable MIDAS_EXPT_NAME set then its happy
as mtransition inherits the environment of mhttpd so cm_get_environment(...) of
mtransition picks up the experiment. Similarly if i insert "-e experimentname" into the
string "str" that is passed in system(str) of line 3243. Then start and stop buttons work. 

Konstantin any comments.

I suppose i can live with starting mhttpd with the environment set before running, but
that kind of negates the command line argument to mhttpd. 

Thanks for the help

Sean
  593   05 Jun 2009 Konstantin OlchanskiBug Reportmhttpd command line experiment specifying
> I guess you have to do some debugging there. Note that "detached" transitions have 
> been implemented recently by Konstantin, so maybe your problem is related to that. 
> In this case Konstantin should check what's wrong.

Yes, I think there is a problem - cm_transition() starts the mtransition helper without the "-h expt" switch, so 
mtransition can only connect to the "default" experiment. Will fix. K.O.
  596   18 Jun 2009 Konstantin OlchanskiBug Reportmhttpd command line experiment specifying
> > I guess you have to do some debugging there. Note that "detached" transitions have 
> > been implemented recently by Konstantin, so maybe your problem is related to that. 
> > In this case Konstantin should check what's wrong.
> 
> Yes, I think there is a problem - cm_transition() starts the mtransition helper without the "-h expt" switch, so 
> mtransition can only connect to the "default" experiment. Will fix. K.O.

Fixed midas.c svn rev 4506: in cm_transition(), always pass "-e expt" to mtransition, if connected remotely, pass the
"-h host:port".

svn rev 4506
K.O.
  376   21 May 2007 Konstantin OlchanskiInfomhttpd changes to use /History/Tags data
I am slowly commiting the changes to the history code. This installement adds
code to mhttpd to use the /History/Tags data (to be) generated by the mlogger.

In the nutshell, the logger fills /History/Tags to "remember" what events,
variables and tags exist in the history files.

This replaces the old code that attempts to guess the contents of history files
by looking at /Equipment tree.

To ease the transition to the new system, I am leaving all the old code alive
and active in the absense of "/History/Tags" entries.

As soon as one starts using the new mlogger (to be commited), the new tags based
mhttpd code will activate itself.

K.O.
  2364   23 Mar 2022 Konstantin OlchanskiBug Fixmhttpd bug fixed
the mhttpd bug should be fixed now (branch feature/buffer_mutex).

simplest way to reproduce:

wget http://localhost:8080/
quickly ctrl-C it
wget http://localhost:8080/
inside mhttpd (by hook or crook) observe that the second wget got the data meant for the first wget.

if you cannot ctrl-C the first wget quickly enough, put a sleep somewhere in the worker thread (in 
mongoose_write(), I think).

this is what happens.

1st wget stops (by ctrl-C), socket is closed, mongoose frees it's mg_connection object
(corresponding worker is still labouring, hmm... actually sleeping, and now has a stale nc pointer)

2nd wget starts, new socket is opened, mongoose allocates a new mg_connection object,
but malloc() gives it back the same memory we just freed(), and the 1st wget's worker thread
nc pointer is no longer stale, but points to 2nd wget's connection.

so we think we are clever and we check the socket file descriptors. but same thing
happens there, too. if 1st wget was file descriptor 7, it is closed, (1st wget worker now has
a stale file handle), then reopened for the 2nd wget, per POSIX, we get back the same
file descriptor 7. 1st wget worker now has the file handle for the 2nd wget tcp socket and
the famous test/crash for "sending data to wrong socket" is defeated.

now, worker thread for the 1st wget wants to send a reply, it has a valid nc pointer (points to 2nd wget's
mg_connection object) and a valid file descriptor (points to 2nd wget's tcp socket),
reply meant for the 1st wget is successfully sent to the 2nd wget, 2nd wget finishes, it's socket
is closed, mg_connection object is free'ed. Now the worker thread for the 2nd wget has stale
connection info, but this is okey, mongoose does not find a matching connection, 2nd wget
worked thread reply goes nowhere, thread finishes silently (no memory leaks here, I checked).

so, connection for 2nd wget completely impersonates the closed connection of 1st wget (I guess I could
check the full socket address info, remote ip address, remote port number, etc, but...)

in practice, this bug does not happen often because modern browsers tend to keep tcp sockets open
for very long time. (not sure about sundry web proxies, etc).

solution of course is very simple. match worker thread data to mongoose mg_connection objects
using our own connection sequential number, which are unique and very easy to keep track
of through the mongoose event handler. all this mess runs in the main thread,
so no locking trouble here, small blessing.

K.O.
  2366   24 Mar 2022 Stefan RittBug Fixmhttpd bug fixed
> 1st wget stops (by ctrl-C), socket is closed, mongoose frees it's mg_connection object
> (corresponding worker is still labouring, hmm... actually sleeping, and now has a stale nc pointer)
> 
> 2nd wget starts, new socket is opened, mongoose allocates a new mg_connection object,
> but malloc() gives it back the same memory we just freed(), and the 1st wget's worker thread
> nc pointer is no longer stale, but points to 2nd wget's connection.

Why don't we CLEAR the memory (memset(object,0,sizeof(object)) before the free(), this way it cannot be 
mistakenly re-used by the next thread.

Stefan
  2368   24 Mar 2022 Konstantin OlchanskiBug Fixmhttpd bug fixed
> > 1st wget stops (by ctrl-C), socket is closed, mongoose frees it's mg_connection object
> > (corresponding worker is still labouring, hmm... actually sleeping, and now has a stale nc pointer)
> > 
> > 2nd wget starts, new socket is opened, mongoose allocates a new mg_connection object,
> > but malloc() gives it back the same memory we just freed(), and the 1st wget's worker thread
> > nc pointer is no longer stale, but points to 2nd wget's connection.
> 
> Why don't we CLEAR the memory (memset(object,0,sizeof(object)) before the free(), this way it cannot be 
> mistakenly re-used by the next thread.
> 

My description was unclear. I will try better now.

When http replies are generated by worker threads, matching of reply to mg_connection is done
by checking the address of the mg_connection object. (mongoose itself unhelpfully offers
to send the reply to every mg_connection, see the responder to mg_broadcast() messages).

This works for open/active connections, addresses of all mg_connections are unique.

But if connection is closed and a new connection is opened, the address is reused (by malloc()/free()
reusing memory blocks or by mongoose using a pool of mg_connection objects, does not matter).

So matching http reply to mg_connection using only address of mg_connection can match the wrong connection.

(contents of mg_connection object does not matter, only address is used by matching. so memzero() of
mg_connection object does not help).

I saw this during my testing - wrong data was sent to wrong browser often enough - but did
not understand that the above problem is happening.

Because I was unable to reliably reproduce the problem, I could not debug it. I tried to add
a check for the tcp socket file descriptor number, in case there is a straight bug or multithread race
or simple memory corruption. This replaced "we sent wrong data to wrong browser, poisoned browser
cache, confused the user" with a crash. This "fix" seemed effective at the time.

Maybe I should mention browser cache poisoning again. What happened is html pages and rpc replies
were returned as responses to load things like CSS files, these bad responses are cached by the browser
pretty much forever, so all subsequent midas pages will look wrong (bad css!) forever, until
user manually clears browser cache. reload of page did not help, restart of browser did not help (I think).

So a very bad bug.

Unfortunately, the check for file descriptor was not effective because file descriptors are also
reused. And I did see wrong data returned by mhttpd, but even more rarely. And everybody (myself
included) complained about mhttpd crashes.

Now, matching of responses to connections is done by connection sequential/serial number,
which is unique 32-bit counter. Mismatch of reply to connection should not happen again.

P.S. Latest version of the mongoose web server library does not help with this problem,
the example code for matching reply to connection in their multithread example looks bogus:
https://github.com/cesanta/mongoose/blob/master/examples/multi-threaded/main.c

K.O.
  2371   24 Mar 2022 Stefan RittBug Fixmhttpd bug fixed
I see, now I understand.

As for the browser cache problem: This Chrome extension is your friend: 

https://chrome.google.com/webstore/detail/clear-cache/cppjkneekbjaeellbfkmgnhonkkjfpdn?hl=en

I use it all the time I change the CSS or a JS file. Having the "Developer Tools" open in Chrome helps as well 
(cache is then turned off). Firefox has similar extensions.

Stefan
  2374   24 Mar 2022 Konstantin OlchanskiBug Fixmhttpd bug fixed
> As for the browser cache problem: This Chrome extension is your friend ...

for google chrome, it is easy, open the javascript debugger (left-click "inspect"),
the reload button becomes a left-click menu, one left-click option is "clear cache and reload".
(there is no button for "clear cookies and reload", re recent elog cookie problem).

but this does not help me personally any. if midas web pages get confused, I will also get confused, too,
and I will spend hours debugging mhttpd before thinking "hmm... maybe I should clear the browser cache!"

not sure about firefox, safari, microsoft edge and opera. if I ever need it, I google it.

K.O.
  2079   25 Jan 2021 Thomas LindnerSuggestionmhttpd browser caching
I have a more subtle point about the new ODB key for using an external elog I mentioned in [1].  I was very confused after changing the ODB "External Elog" because mhttpd still wasn't using my external elog URL.  I started trying to debug mhttpd.cxx, but found a lot of bits of mhttpd didn't seem to be getting called.  I eventually realized that my browser had been caching the responses for some (though not all) of the MIDAS navigation buttons.  Clearing my browser cache fixed the problem and allowed me to use the MIDAS button to the external ELOG.  This caching happens on my macbook for both Firefox 84.0.2 and Safari 13.1.

Many of the requests to mhttpd end up going to send_fp(), where we explicitly set the cache time to 24 hours.

   // send HTTP cache control headers                                                                                               
   time_t now = time(NULL);
   now += (int) (3600 * 24);
   struct tm* gmt = gmtime(&now);
   const char* format = "%A, %d-%b-%y %H:%M:%S GMT";
   char str[256];
   strftime(str, sizeof(str), format, gmt);
   r->rsprintf("Expires: %s\r\n", str);

Some other MIDAS buttons don't seem to be cached by the browser; for instance the response for the 'OldHistory' button doesn't get cached.

Should we remove the cache instruction for at least some of the buttons?  At least for the elog button where we want the link direction to get switched by an ODB key the caching seems a bad idea.

[1] https://midas.triumf.ca/elog/Midas/2078
  2080   25 Jan 2021 Stefan RittSuggestionmhttpd browser caching
Let me first explain a bit why caching is there. Once we had the case that someone from 
TRIUMF opened a midas custom page at T2K. It took about one minute (!) to load the page. 

When we looked at it, we found that the custom page pulled about 100 items with individual
HTTP requests from Japan, each taking about one second for the roundtrip. Then we redesigned
the custom page communication so that many ODB entries could be retrieved in one operation,
which improved the loading time from 100s to about 2s.

With the buttons we will have to make the same compromise. If we do not cache anything,
loading the midas status page over the Pacific takes many seconds. If we cache all, any
change on the midas side will not be reflected on the web page. So there is a compromise
to be made. I thought I designed it such that the side menu is cached locally, but when
the user presses "reload", then the full menu is fetched from the server. Of course one
has to remember this, so changing the ELOG URL or other things on the menu require a
reload (or wait a certain time for the cache to expire). So try again if that's working
for you. If not, I can visit it again and check if there is any bug.

If we go the route to disable the cache, better try this to T2K and see what you get before
we commit ourselves to that. Last time TRIUMF people were complaining a lot about long
load times.

Best,
Stefan
  2081   25 Jan 2021 Thomas LindnerSuggestionmhttpd browser caching
I tried reloading the pages.  If I reloaded the actual elog page 

https://server.triumf.ca/?cmd=Elog

then it bypassed the cache and got the correct updated page from mhttpd.

However, if when I reloaded the status page

https://server.triumf.ca/?cmd=Status

and then clicked the Elog button then I just got the cached (old) page.  Admittedly reloading the status page doesn't make so much sense (once I thought about it), but it is what I tried first (I'm good at modelling unexpected user behaviour); so there is some risk that the user will try reloading the wrong page and will be stuck not getting the external elog page (until 24 hours runs out).

Anyway, I will update the documentation to note that you need to reload the elog page after changing this variable.  That's probably an adequate solution.

I certainly don't suggest getting rid of caching entirely.  I was trying to think whether there was a set of pages where it would make sense to disable the cache (like the elog page).  But maybe that will just cause more problems.


> Let me first explain a bit why caching is there. Once we had the case that someone from 
> TRIUMF opened a midas custom page at T2K. It took about one minute (!) to load the page. 
> 
> When we looked at it, we found that the custom page pulled about 100 items with individual
> HTTP requests from Japan, each taking about one second for the roundtrip. Then we redesigned
> the custom page communication so that many ODB entries could be retrieved in one operation,
> which improved the loading time from 100s to about 2s.
> 
> With the buttons we will have to make the same compromise. If we do not cache anything,
> loading the midas status page over the Pacific takes many seconds. If we cache all, any
> change on the midas side will not be reflected on the web page. So there is a compromise
> to be made. I thought I designed it such that the side menu is cached locally, but when
> the user presses "reload", then the full menu is fetched from the server. Of course one
> has to remember this, so changing the ELOG URL or other things on the menu require a
> reload (or wait a certain time for the cache to expire). So try again if that's working
> for you. If not, I can visit it again and check if there is any bug.
> 
> If we go the route to disable the cache, better try this to T2K and see what you get before
> we commit ourselves to that. Last time TRIUMF people were complaining a lot about long
> load times.
> 
> Best,
> Stefan
  2085   08 Feb 2021 Konstantin OlchanskiSuggestionmhttpd browser caching
>    r->rsprintf("Expires: %s\r\n", str);

The best I can tell, none of this works in current browsers. with google-chrome,
I see it cache pretty much everything regardless of "expires", "no cache", etc
and anything else I tried.

Things like shift-<reload>, etc used to work to refresh the cache, but not any more.

So, I too, see confusing side-effects of caching, where I change something in ODB,
but "nothing happens". Then I scratch my head for 30 minutes until I remember
to open the javascript debugger where shift-<reload> (or is it ctrl-<reload>) actually works.

It seems that the only reliable way to bypass the browser cache is to add
a tag with a random number to the URL ("&ts=currenttime").

This is for HTTP GET requests. HTTP POST does not seem to be cached, so I do not worry
about this nonsense for json-rpc requests.

Perhaps we should do this random number trick for all user actions. User can
press buttons only so fast, we should be able to sustain the rate. Anything
loaded automatically or from a timer, we should allow caching.

BTW, things like midas.js are also cached, and it is common to see problems
after updating midas, where status.html is newly loaded, but midas.js is an old
stale version from cache.

Messy.

K.O.
  2086   08 Feb 2021 Stefan RittSuggestionmhttpd browser caching
> It seems that the only reliable way to bypass the browser cache is to add
> a tag with a random number to the URL ("&ts=currenttime").

Indeed that's the only reliable way to avoid caching across browsers. An alternative is

("&r=" + Math.random())

to add a random number.


> BTW, things like midas.js are also cached, and it is common to see problems
> after updating midas, where status.html is newly loaded, but midas.js is an old
> stale version from cache.

Reloading JavaScript file NOT from the cache is really tricky these days. I added a
special Google Chrome extension to clear my browser cache, which works reliably:

https://chrome.google.com/webstore/detail/clear-cache/cppjkneekbjaeellbfkmgnhonkkjfpdn

Stefan
  2163   12 May 2021 Mathieu GuigueBug Reportmhttpd WebServer ODBTree initialization
Hi,

Using midas version 12-2020,  I am trying to run mhttpd from within a docker container using docker-compose.
Starting from an empty ODB, I simply run `mhttpd` and this is the output I have:
midas_hatfe_1  | <Warning> Starting mhttpd...
midas_hatfe_1  | [mhttpd,INFO] ODB subtree /Runinfo corrected successfully
midas_hatfe_1  | MVOdb::SetMidasStatus: Error: MIDAS db_find_key() at ODB path "/WebServer/Host list" returned status 312
midas_hatfe_1  | Mongoose web server will not use password protection
midas_hatfe_1  | Mongoose web server will not use the hostlist, connections from anywhere will be accepted
midas_hatfe_1  | Mongoose web server listening on http address "localhost:8080", passwords OFF, hostlist OFF
midas_hatfe_1  | [mhttpd,ERROR] [mhttpd.cxx:19160:mongoose_listen,ERROR] Cannot mg_bind address "[::1]:8080"

According to the documentation, the WebServer tree should be created automatically when starting the mhttpd; but it seems not as it doesn't find the entry "/WebServer/Host list".
If I create it by end (using "create STRING /WebServer/Host list"), I still get the error message that mhttpd didn't bind properly to the local port 8080.
I am not sure what it wrong, as mhttpd is working perfectly well in this exact container for midas 03-2020.

Any idea what difference makes it not possible anymore to run into these container?

Thanks very much for your help.
Cheers
Mathieu
  2164   12 May 2021 Ben SmithBug Reportmhttpd WebServer ODBTree initialization
> midas_hatfe_1  | Mongoose web server listening on http address "localhost:8080", passwords OFF, hostlist OFF
> midas_hatfe_1  | [mhttpd,ERROR] [mhttpd.cxx:19160:mongoose_listen,ERROR] Cannot mg_bind address "[::1]:8080"

It looks like mhttpd managed to bind to the IPv4 address (localhost), but not the IPv6 address (::1). If you don't need it, try setting "/Webserver/Enable IPv6" to false.
  2165   12 May 2021 Stefan RittBug Reportmhttpd WebServer ODBTree initialization
> It looks like mhttpd managed to bind to the IPv4 address (localhost), but not the IPv6 address (::1). If you don't need it, try setting "/Webserver/Enable IPv6" to false.

We had this issue already several times. This info should be put into the documentation at a prominent location.

Stefan
ELOG V3.1.4-2e1708b5