ELOG Midas

Back Midas Rome Roody Rootana

Midas DAQ System, Page 119 of 152

Not logged in

Find | Login | Help

Full | Summary | Threaded | Hide attachments

3028 Entries

Goto page Previous 1, 2, 3 ... 118, 119, 120 ... 150, 151, 152 Next

I see, now I understand.

As for the browser cache problem: This Chrome extension is your friend: 

https://chrome.google.com/webstore/detail/clear-cache/cppjkneekbjaeellbfkmgnhonkkjfpdn?hl=en

I use it all the time I change the CSS or a JS file. Having the "Developer Tools" open in Chrome helps as well 
(cache is then turned off). Firefox has similar extensions.

Stefan

> As for the browser cache problem: This Chrome extension is your friend ...

for google chrome, it is easy, open the javascript debugger (left-click "inspect"),
the reload button becomes a left-click menu, one left-click option is "clear cache and reload".
(there is no button for "clear cookies and reload", re recent elog cookie problem).

but this does not help me personally any. if midas web pages get confused, I will also get confused, too,
and I will spend hours debugging mhttpd before thinking "hmm... maybe I should clear the browser cache!"

not sure about firefox, safari, microsoft edge and opera. if I ever need it, I google it.

K.O.

mhttpd changes to use /History/Tags data

I am slowly commiting the changes to the history code. This installement adds
code to mhttpd to use the /History/Tags data (to be) generated by the mlogger.

In the nutshell, the logger fills /History/Tags to "remember" what events,
variables and tags exist in the history files.

This replaces the old code that attempts to guess the contents of history files
by looking at /Equipment tree.

To ease the transition to the new system, I am leaving all the old code alive
and active in the absense of "/History/Tags" entries.

As soon as one starts using the new mlogger (to be commited), the new tags based
mhttpd code will activate itself.

K.O.

mhttpd command line experiment specifying

Hi

Not sure how the rest of you specify mhttpd to work with multiple experiments on
one machine, but it would seem not the same as me ;-)

when executing mhttpd with 

mhttpd -e "experimentname" -p "experimentport" -D 

that experiment name is not transfered to transitions as cm_transition never
specifies the experiment in the call to "transition STOP" etc.
the only flag it sends is a -d for debug if selected.

The result is that the stop and start button of the webinterface does not work,
and transitions sit endlessly doing nothing but consuming all the processor,
odbedit works fine though.

Does everyone else use an apache reverse proxy and or explicit experiment choice
in the url ?

As an aside in mhttpd.c in the reply to -? it states 2 -h options the second
should be a -e. line 13378.


Thanks
Sean

mhttpd command line experiment specifying

> Not sure how the rest of you specify mhttpd to work with multiple experiments on
> one machine, but it would seem not the same as me ;-)

Please note that there has been a change concerning multiple experiments inside 
mhttpd. From revision 4346 on, mhttpd can only connect to one single experiment, 
and the experiment name in the URL (aka ?exp=name) is not supported any more. So if 
you have several experiments, you start several instances of mhttpd now on 
different ports.

> that experiment name is not transfered to transitions as cm_transition never
> specifies the experiment in the call to "transition STOP" etc.
> the only flag it sends is a -d for debug if selected.

When connecting to an experiment, any midas client uses the ODB from that 
experiment so lives in that "namespace". So one client can never call any client 
from another experiment. So your problem must be something else. Of course there is 
not parameter "experiment" passed to cm_transition() since the experiment is 
implicitly defined by the ODB mhttpd is attached to.

> The result is that the stop and start button of the webinterface does not work,
> and transitions sit endlessly doing nothing but consuming all the processor,
> odbedit works fine though.

I guess you have to do some debugging there. Note that "detached" transitions have 
been implemented recently by Konstantin, so maybe your problem is related to that. 
In this case Konstantin should check what's wrong.

> Does everyone else use an apache reverse proxy and or explicit experiment choice
> in the url ?

I use a

ProxyPass /megon/ http://megon.psi.ch/

on our public web server to make an online machine accessible from outside the 
firewall, but just with a single experiment.

> As an aside in mhttpd.c in the reply to -? it states 2 -h options the second
> should be a -e. line 13378.

Fixed in revision 4504.

mhttpd command line experiment specifying

Hi

> > Not sure how the rest of you specify mhttpd to work with multiple experiments on
> > one machine, but it would seem not the same as me ;-)
> 
> Please note that there has been a change concerning multiple experiments inside 
> mhttpd. From revision 4346 on, mhttpd can only connect to one single experiment, 
> and the experiment name in the URL (aka ?exp=name) is not supported any more. So if 
> you have several experiments, you start several instances of mhttpd now on 
> different ports.

That i do with : 
mhttpd -p xx -e experiment_name -D

> 
> > that experiment name is not transfered to transitions as cm_transition never
> > specifies the experiment in the call to "transition STOP" etc.
> > the only flag it sends is a -d for debug if selected.
> 
> When connecting to an experiment, any midas client uses the ODB from that 
> experiment so lives in that "namespace". So one client can never call any client 
> from another experiment. So your problem must be something else. Of course there is 
> not parameter "experiment" passed to cm_transition() since the experiment is 
> implicitly defined by the ODB mhttpd is attached to.

Will have to look else where.

> 
> > The result is that the stop and start button of the webinterface does not work,
> > and transitions sit endlessly doing nothing but consuming all the processor,
> > odbedit works fine though.
> 
> I guess you have to do some debugging there. Note that "detached" transitions have 
> been implemented recently by Konstantin, so maybe your problem is related to that. 
> In this case Konstantin should check what's wrong.

cm_transition does a "system(str)" on line 3243 inside the "if(async_flag == DETACH)" of
line 3219, how does an external program know about the state of the originating mhttpd
process ? Surely that str which executes "mtransition ......." should get a -e
specifying the experiment explicitly ? probably a -h as well to be thorough.
The only other way that mtransition.cxx will be able to pull in the experimentname is
from the environment variable in its call to cm_get_environment(....) on its startup.


Ok after some testing .... 
If i start the mhttpd with the environment variable MIDAS_EXPT_NAME set then its happy
as mtransition inherits the environment of mhttpd so cm_get_environment(...) of
mtransition picks up the experiment. Similarly if i insert "-e experimentname" into the
string "str" that is passed in system(str) of line 3243. Then start and stop buttons work. 

Konstantin any comments.

I suppose i can live with starting mhttpd with the environment set before running, but
that kind of negates the command line argument to mhttpd. 

Thanks for the help

Sean

mhttpd command line experiment specifying

> I guess you have to do some debugging there. Note that "detached" transitions have 
> been implemented recently by Konstantin, so maybe your problem is related to that. 
> In this case Konstantin should check what's wrong.

Yes, I think there is a problem - cm_transition() starts the mtransition helper without the "-h expt" switch, so 
mtransition can only connect to the "default" experiment. Will fix. K.O.

mhttpd command line experiment specifying

> > I guess you have to do some debugging there. Note that "detached" transitions have 
> > been implemented recently by Konstantin, so maybe your problem is related to that. 
> > In this case Konstantin should check what's wrong.
> 
> Yes, I think there is a problem - cm_transition() starts the mtransition helper without the "-h expt" switch, so 
> mtransition can only connect to the "default" experiment. Will fix. K.O.

Fixed midas.c svn rev 4506: in cm_transition(), always pass "-e expt" to mtransition, if connected remotely, pass the
"-h host:port".

svn rev 4506
K.O.

midas version used: midas-2019-05-cxx-1461-g906be8b

I find in the systemd log every couple of days/weeks the following error message related to the mhttpd:

[mhttpd,ERROR] [mhttpd.cxx:18886:on_work_complete,ERROR] Should not send response to request from socket 28 to socket 26, abort!

with various socket numbers of course.

Can anybody hint me what is going wrong here?

The bad thing on the crash is, that sometimes it is leading to a "chain-reaction" killing multiple midas frontends, which essentially stop the experiment.

Help would be very much appreciated!

Andreas

> [mhttpd,ERROR] [mhttpd.cxx:18886:on_work_complete,ERROR] Should not send response to request from socket 28 to socket 26, abort!
> Can anybody hint me what is going wrong here?
> The bad thing on the crash is, that sometimes it is leading to a "chain-reaction" killing multiple midas frontends, which essentially stop the experiment.

This is my code. I am the culprit. I had a bit of discussion about this with Stefan.

Bottom line is something is rotten in the multithreading code inside mhttpd and under conditions unknown,
it sends the wrong data into the wrong socket. This causes midas web pages to be really confused (RPC replies
processed as CSS file, HTML code processed at RPC replies, a mess), this wrong data is cached by the browser,
so restarting mhttpd does not fix the web pages. So a mess.

I find this is impossible to replicate, and so cannot debug it, cannot fix it. Best I was able to do
is to add a check for socket numbers, and thankfully it catches the condition before web browser caches
become poisoned. So, broken web pages replaced by mhttpd crash.

This situation reinforces my opinion that multi-threading and C++ classes "do not mix" (like H2 and O2 do not mix).
If you write a multithreaded C++ program and it works, good for you, if there is a malfunction, good luck with it,
C++ just does not have any built-in support for debugging typical multithreading problems. I think others have come
to the same conclusion and invented all these new "safe" programming languages, like Rust and Go.

Back to your troubles.

1) If you see a way to replicate this crash, or some way to reliably cause
the crash within 5-10 minutes after starting mhttpd, please let me know. I can work with that
and I wish to fix this problem very much.

2) My "wrong socket" check calls abort() to produce a core dump. In my experience these core dumps
are useless for debugging the present problem. There is just no way to examine the state of each
thread and of each http request using gdb by hand.

3) this abort() causes linux to write a core dump, this takes a long time and I think it causes
other MIDAS program to stop, timeout and die. You can try to fix this by disabling core dumps (set "enable core dumps"
to "false" in ODB and set core dump size limit to 0), or change abort() to exit(). (You can also disable
the "wrong socket" check, but most likely you will not like the result).

4) run mhttpd inside a script: "while (1) { start mhttpd; sleep 1 sec; rinse, repeat; }" (run mhttpd without "-D", yes?)

In other news, the mongoose web server library have a new version available, they again changed their
multithreading scheme (I think it is an improvement). If I update mhttpd to this new version, it is very
likely the code with the "wrong socket" bug will be deleted. (with new bugs added to replace old bugs, of course).

K.O.

Dear Konstantin,

thanks for the prompt response, this helps a lot!

> 1) If you see a way to replicate this crash, or some way to reliably cause
> the crash within 5-10 minutes after starting mhttpd, please let me know. I can work with that
> and I wish to fix this problem very much.

I wished I could! This happens 3-4 times per year only, so close to impossible to trigger.

> 2) My "wrong socket" check calls abort() to produce a core dump. In my experience these core dumps
> are useless for debugging the present problem. There is just no way to examine the state of each
> thread and of each http request using gdb by hand.
> 
> 3) this abort() causes linux to write a core dump, this takes a long time and I think it causes
> other MIDAS program to stop, timeout and die. You can try to fix this by disabling core dumps (set "enable core dumps"
> to "false" in ODB and set core dump size limit to 0), or change abort() to exit(). (You can also disable
> the "wrong socket" check, but most likely you will not like the result).
> 

I changed now to exit() rather than abort on the production machine. Perhaps this should be the default?

Andreas

To limit the impact of the numerous crashes of mhttpd, I installed the monit tool at MEG at PSI 
(https://en.wikipedia.org/wiki/Monit). It monitors mhttpd, and if it cannot connect to it for a certain
time, it kills the process and restarts it. This covers endless loops, simple crashes (caused by the
known multi-threading issue in mongoose), and also cases where mhttpd develops a memory leak and becomes
unresponsive. 

To configure monit for mhttpd, first install the package, make sure the daemon gets started automatically
after reboot (typically "sysemctl enable monit"), and put the attached file into

/etc/monit.d/mhttpd

You have to adjust the <path-to-midas> according to your midas installation, and probably also the port
under which mhttpd is listening (8082 in my case). Put 

set daemon 10

into /etc/monitrc if you want monit to check mhttpd every 10 seconds (default is 30 seconds). Then, every
10 seconds monit request "midas.css" from mhttpd, and if it cannot obtain it after 30 seconds, it kills
mhttpd and restarts it.

Loading long history plots taking more than 30 seconds should probably not be an issue since mhttpd is 
multi-threaded, but I haven't tested this in detail.

Attached below is a typical status page produced by monit, which has its own built-in web server (normally
listening at port 2812, accessible only from localhost by default).

I hope this helps some of you.

Stefan

Attachment 1: mhttpd

check process mhttpd matching "mhttpd"
  start program = "/bin/su -l meg -c '/<path-to-midas>/bin/mhttpd -D'"
  stop program = "/usr/bin/killall mhttpd"
  if failed
    host 127.0.0.1
    port 8082 
    protocol http
    method GET
    request "/midas.css"
    with timeout 30 seconds 
  then restart

Attachment 2: Screenshot_2021-09-17_at_21.11.15_.png

135

11 Aug 2003

Konstantin Olchanski

mhttpd crash on corrupted ODB /RunInfo

Invalid values of ODB /RunInfo/State cause mhttpd crash in
show_status_page() because of an out of bounds access to the array of state
names. Suggest this fix: remove array of state names, use existing ladder of
if/else statements to explicitely set state name. Verified the fix works for
TWIST. Will commit this into MIDAS CVS unless get feedback.

src/mhttpd.c:show_status_page() {
  ...
  rsprintf("<tr align=center><td>Run #%d", runinfo.run_number);

  if (runinfo.state == STATE_STOPPED)
    rsprintf("<td colspan=1 bgcolor=#FF0000>Stopped");
  else if (runinfo.state == STATE_PAUSED)
    rsprintf("<td colspan=1 bgcolor=#FFFF00>Paused");
  else if (runinfo.state == STATE_RUNNING)
    rsprintf("<td colspan=1 bgcolor=#00FF00>Running");
  else
    rsprintf("<td colspan=1 bgcolor=#FFFFFF>Unknown");

  if (runinfo.requested_transition)
  ...

K.O.

136

10 Oct 2003

Konstantin Olchanski

mhttpd crash on corrupted ODB /RunInfo

There was no feedback. This code has been commited. K.O.

> Invalid values of ODB /RunInfo/State cause mhttpd crash in
> show_status_page() because of an out of bounds access to the array of state
> names. Suggest this fix: remove array of state names, use existing ladder of
> if/else statements to explicitely set state name. Verified the fix works for
> TWIST. Will commit this into MIDAS CVS unless get feedback.
> 
> src/mhttpd.c:show_status_page() {
>   ...
>   rsprintf("<tr align=center><td>Run #%d", runinfo.run_number);
> 
>   if (runinfo.state == STATE_STOPPED)
>     rsprintf("<td colspan=1 bgcolor=#FF0000>Stopped");
>   else if (runinfo.state == STATE_PAUSED)
>     rsprintf("<td colspan=1 bgcolor=#FFFF00>Paused");
>   else if (runinfo.state == STATE_RUNNING)
>     rsprintf("<td colspan=1 bgcolor=#00FF00>Running");
>   else
>     rsprintf("<td colspan=1 bgcolor=#FFFFFF>Unknown");
> 
>   if (runinfo.requested_transition)
>   ...
> 
> K.O.

mhttpd crashes when including nonexistent script in msequencer

Hi,
the subject line describes the project already
Suppose you have a file foo.msl. Somewhere in the file, you have the line 
INCLUDE bar.msl

Once you click save in the sequencer page, mhttpd crashes:
$ mhttpd
free(): double free detected in tcache 2
[1]    27590 abort (core dumped)  mhttpd


GDB helps shed some light on the problem:

#0  0x00007ffff76b057f in raise () from /lib64/libc.so.6
#1  0x00007ffff769a895 in abort () from /lib64/libc.so.6
#2  0x00007ffff76f39d7 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff76fa2ec in malloc_printerr () from /lib64/libc.so.6
#4  0x00007ffff76fbdf5 in _int_free () from /lib64/libc.so.6
#5  0x00000000004b8b41 in mxml_parse_entity (buf=buf@entry=0x7fffffffc2c8,
file_name=file_name@entry=0x7fffffffc710
"/home/luk/packages/mutrig_daq/online/foo.xml",
            error=error@entry=0x7fffffffcd24 "XML read error in file
\"/home/luk/packages/mutrig_daq/online/foo.xml\", line 2: bar.msl.xml is
missing", error_size=error_size@entry=256,
                error_line=error_line@entry=0x7fffffffce24) at ../mxml/mxml.c:1996
#6  0x00000000004b966d in mxml_parse_file
(file_name=file_name@entry=0x7fffffffc710
"/home/luk/packages/mutrig_daq/online/foo.xml",
            error=error@entry=0x7fffffffcd24 "XML read error in file
\"/home/luk/packages/mutrig_daq/online/foo.xml\", line 2: bar.msl.xml is
missing", error_size=error_size@entry=256,
                error_line=error_line@entry=0x7fffffffce24) at ../mxml/mxml.c:2041
#7  0x000000000041d9c2 in init_sequencer () at src/mhttpd.cxx:14321
#8  0x000000000040c2b6 in main (argc=<optimized out>, argv=<optimized out>) at
src/mhttpd.cxx:18028

Cheers
Lukas

P. S. This problem reminds me of the old joke: A man goes to his doctor and says
"Doc, it hurts when I do this" to which the doctor replies "Then don't do that".
However, I think, mhttpd should not crash even if you're not supposed to include
non-existent scripts in msequencer.

mhttpd crashes when including nonexistent script in msequencer

The bug has been fixed. It was actually in the mxml library. So you have to go to the midas/mxml 
subdirectory and update that one via "git pull origin master".

Stefan

> Hi,
> the subject line describes the project already
> Suppose you have a file foo.msl. Somewhere in the file, you have the line 
> INCLUDE bar.msl
> 
> Once you click save in the sequencer page, mhttpd crashes:
> $ mhttpd
> free(): double free detected in tcache 2
> [1]    27590 abort (core dumped)  mhttpd
> 
> 
> GDB helps shed some light on the problem:
> 
> #0  0x00007ffff76b057f in raise () from /lib64/libc.so.6
> #1  0x00007ffff769a895 in abort () from /lib64/libc.so.6
> #2  0x00007ffff76f39d7 in __libc_message () from /lib64/libc.so.6
> #3  0x00007ffff76fa2ec in malloc_printerr () from /lib64/libc.so.6
> #4  0x00007ffff76fbdf5 in _int_free () from /lib64/libc.so.6
> #5  0x00000000004b8b41 in mxml_parse_entity (buf=buf@entry=0x7fffffffc2c8,
> file_name=file_name@entry=0x7fffffffc710
> "/home/luk/packages/mutrig_daq/online/foo.xml",
>             error=error@entry=0x7fffffffcd24 "XML read error in file
> \"/home/luk/packages/mutrig_daq/online/foo.xml\", line 2: bar.msl.xml is
> missing", error_size=error_size@entry=256,
>                 error_line=error_line@entry=0x7fffffffce24) at ../mxml/mxml.c:1996
> #6  0x00000000004b966d in mxml_parse_file
> (file_name=file_name@entry=0x7fffffffc710
> "/home/luk/packages/mutrig_daq/online/foo.xml",
>             error=error@entry=0x7fffffffcd24 "XML read error in file
> \"/home/luk/packages/mutrig_daq/online/foo.xml\", line 2: bar.msl.xml is
> missing", error_size=error_size@entry=256,
>                 error_line=error_line@entry=0x7fffffffce24) at ../mxml/mxml.c:2041
> #7  0x000000000041d9c2 in init_sequencer () at src/mhttpd.cxx:14321
> #8  0x000000000040c2b6 in main (argc=<optimized out>, argv=<optimized out>) at
> src/mhttpd.cxx:18028
> 
> Cheers
> Lukas
> 
> P. S. This problem reminds me of the old joke: A man goes to his doctor and says
> "Doc, it hurts when I do this" to which the doctor replies "Then don't do that".
> However, I think, mhttpd should not crash even if you're not supposed to include
> non-existent scripts in msequencer.

mhttpd elog corruption via double-edit

Aparently the mhttpd elog will corrupt the elog files if two (or more?) elog entries are being edited at the 
same time. K.O.

mhttpd elog corruption via double-edit

K.O. wrote:

Aparently the mhttpd elog will corrupt the elog files if two (or more\?) elog entries are being edited at the same time. K.O.

That's strange. Since mhttpd is single threaded, there should not be any multi-thread/process conflict there, since the elog files cannot be written simultaneously from two different browser sessions. If entries are edited at the same time, they get then submitted one after the other. Of course it is possible to edit the same entry, in which case the second submission "wins", overwriting the first one without notification. Withing the standalone elog server there is the option to lock entries ("use lock = 1") to prevent this, but this feature is not present in the mhttpd elog.

mhttpd elog corruption via double-edit

[quote="Stefan Ritt"][Quote="K.O.]Aparently the mhttpd elog will corrupt the
elog files if two (or more\?) elog entries are being edited at the same time.
K.O.[/quote]

The corruption is very simple. mhttpd elog indexes the elog entries by the elog
file and offset inside the file, i.e. "http://ladd00:8088/EL/060927.318",
"060927" corresponds to log file "060927.log", "318" is the offset inside the
file where the message is located.

During "edit", the code "remembers" the offset of the original message and in
el_submit() blindly writes the edited message into the file at the remembered
offset.

If another message was edited before the edit of the first message is submitted,
the remembered offset becomes invalid (messages have shifted inside the file)
and el_submit() writes the edited text into the wrong place in the file,
corrupting it.

I have now added a check for this and we crash instead of corrupting the elog
file (midas.c rev 3340).

I do not know how to "properly" fix this bug without changing the indexing
scheme to something similar to what is used by elogd- message numbers instead of
file indices. In the existing scheme, message editing also breaks URLs shown in
the email notifications (they contain file indices that point to the wrong
places after messages are moved around by editing) and "reply threading" links.

Here is how I reproduce this bug:

1) start with an empty elog
2) create two messages
3) "edit" the second message, but do not submit it yet.
4) "edit" the first message, change the text to make sure the message size
becomes different; submit this change.
5) submit the "edit" of the first message. !!BOOM!!

K.O.

mhttpd elog corruption via double-edit

> I do not know how to "properly" fix this bug without changing the indexing
> scheme to something similar to what is used by elogd- message numbers instead of
> file indices. In the existing scheme, message editing also breaks URLs shown in
> the email notifications (they contain file indices that point to the wrong
> places after messages are moved around by editing) and "reply threading" links.

Well, the development of elogd with it's message numbers was actually stimulated by
the problem you mentioned. After that all those problems went away. Another
incarnation of that problem is if you edit an mhttpd log file manually. Afterwards
the file offsets are different and the system gets corrupted. To fix this properly,
one would have to backport the el_xxx functions from elogd to mhttpd, or, even
simpler, remove the elog functionality in mhttpd and "force" everybody to use elogd
(after doing elconv to convert the files into the new format).

Goto page Previous 1, 2, 3 ... 118, 119, 120 ... 150, 151, 152 Next

ELOG V3.1.4-2e1708b5