ID |
Date |
Author |
Topic |
Subject |
2274
|
06 Sep 2021 |
Konstantin Olchanski | Forum | mhttpd crash | > [mhttpd,ERROR] [mhttpd.cxx:18886:on_work_complete,ERROR] Should not send response to request from socket 28 to socket 26, abort!
> Can anybody hint me what is going wrong here?
> The bad thing on the crash is, that sometimes it is leading to a "chain-reaction" killing multiple midas frontends, which essentially stop the experiment.
This is my code. I am the culprit. I had a bit of discussion about this with Stefan.
Bottom line is something is rotten in the multithreading code inside mhttpd and under conditions unknown,
it sends the wrong data into the wrong socket. This causes midas web pages to be really confused (RPC replies
processed as CSS file, HTML code processed at RPC replies, a mess), this wrong data is cached by the browser,
so restarting mhttpd does not fix the web pages. So a mess.
I find this is impossible to replicate, and so cannot debug it, cannot fix it. Best I was able to do
is to add a check for socket numbers, and thankfully it catches the condition before web browser caches
become poisoned. So, broken web pages replaced by mhttpd crash.
This situation reinforces my opinion that multi-threading and C++ classes "do not mix" (like H2 and O2 do not mix).
If you write a multithreaded C++ program and it works, good for you, if there is a malfunction, good luck with it,
C++ just does not have any built-in support for debugging typical multithreading problems. I think others have come
to the same conclusion and invented all these new "safe" programming languages, like Rust and Go.
Back to your troubles.
1) If you see a way to replicate this crash, or some way to reliably cause
the crash within 5-10 minutes after starting mhttpd, please let me know. I can work with that
and I wish to fix this problem very much.
2) My "wrong socket" check calls abort() to produce a core dump. In my experience these core dumps
are useless for debugging the present problem. There is just no way to examine the state of each
thread and of each http request using gdb by hand.
3) this abort() causes linux to write a core dump, this takes a long time and I think it causes
other MIDAS program to stop, timeout and die. You can try to fix this by disabling core dumps (set "enable core dumps"
to "false" in ODB and set core dump size limit to 0), or change abort() to exit(). (You can also disable
the "wrong socket" check, but most likely you will not like the result).
4) run mhttpd inside a script: "while (1) { start mhttpd; sleep 1 sec; rinse, repeat; }" (run mhttpd without "-D", yes?)
In other news, the mongoose web server library have a new version available, they again changed their
multithreading scheme (I think it is an improvement). If I update mhttpd to this new version, it is very
likely the code with the "wrong socket" bug will be deleted. (with new bugs added to replace old bugs, of course).
K.O. |
2275
|
07 Sep 2021 |
Andreas Suter | Forum | mhttpd crash | Dear Konstantin,
thanks for the prompt response, this helps a lot!
> 1) If you see a way to replicate this crash, or some way to reliably cause
> the crash within 5-10 minutes after starting mhttpd, please let me know. I can work with that
> and I wish to fix this problem very much.
I wished I could! This happens 3-4 times per year only, so close to impossible to trigger.
> 2) My "wrong socket" check calls abort() to produce a core dump. In my experience these core dumps
> are useless for debugging the present problem. There is just no way to examine the state of each
> thread and of each http request using gdb by hand.
>
> 3) this abort() causes linux to write a core dump, this takes a long time and I think it causes
> other MIDAS program to stop, timeout and die. You can try to fix this by disabling core dumps (set "enable core dumps"
> to "false" in ODB and set core dump size limit to 0), or change abort() to exit(). (You can also disable
> the "wrong socket" check, but most likely you will not like the result).
>
I changed now to exit() rather than abort on the production machine. Perhaps this should be the default?
Andreas |
2276
|
17 Sep 2021 |
Stefan Ritt | Forum | mhttpd crash | To limit the impact of the numerous crashes of mhttpd, I installed the monit tool at MEG at PSI
(https://en.wikipedia.org/wiki/Monit). It monitors mhttpd, and if it cannot connect to it for a certain
time, it kills the process and restarts it. This covers endless loops, simple crashes (caused by the
known multi-threading issue in mongoose), and also cases where mhttpd develops a memory leak and becomes
unresponsive.
To configure monit for mhttpd, first install the package, make sure the daemon gets started automatically
after reboot (typically "sysemctl enable monit"), and put the attached file into
/etc/monit.d/mhttpd
You have to adjust the <path-to-midas> according to your midas installation, and probably also the port
under which mhttpd is listening (8082 in my case). Put
set daemon 10
into /etc/monitrc if you want monit to check mhttpd every 10 seconds (default is 30 seconds). Then, every
10 seconds monit request "midas.css" from mhttpd, and if it cannot obtain it after 30 seconds, it kills
mhttpd and restarts it.
Loading long history plots taking more than 30 seconds should probably not be an issue since mhttpd is
multi-threaded, but I haven't tested this in detail.
Attached below is a typical status page produced by monit, which has its own built-in web server (normally
listening at port 2812, accessible only from localhost by default).
I hope this helps some of you.
Stefan |
2282
|
30 Sep 2021 |
Francesco Renga | Forum | OPC client within MIDAS | Dear all,
I need to integrate in my MIDAS project the communication with an OPC UA
server. My plan is to develop an OPC UA client as a "device" in
midas/drivers/device.
Two questions:
1) Is anybody aware of some similar effort for some other project, so that I can
get some example?
2) What could be the more appropriate driver's class to be used? generic.cxx?
multi.cxx?
Thank you for your help,
Francesco |
2284
|
11 Oct 2021 |
Konstantin Olchanski | Forum | test | test, no email. K.O. |
2285
|
11 Oct 2021 |
Konstantin Olchanski | Forum | test | > test, no email. K.O.
test reply, no email. K.O. |
2286
|
11 Oct 2021 |
Konstantin Olchanski | Forum | test | > > test, no email. K.O.
>
> test reply, no email. K.O.
test attachment, no email. K.O. |
2287
|
11 Oct 2021 |
Konstantin Olchanski | Forum | test | > > > test, no email. K.O.
> >
> > test reply, no email. K.O.
>
> test attachment, no email. K.O.
test email. K.O. |
2288
|
11 Oct 2021 |
Konstantin Olchanski | Forum | midas forum updated, moved | The midas forum software (elogd) was updated to latest version and moved from our old server
(ladd00.triumf.ca) to our new server (daq00.triumf.ca).
The following URLs should work:
https://daq00.triumf.ca/elog-midas/Midas/ (new URL)
https://midas.triumf.ca/elog/Midas/ (old URL, redirects to daq00)
https://midas.triumf.ca/forum (link from midas wiki)
The configuration on the old server ladd00.triumf.ca is quite tangled between
several virtual hosts and several DNS CNAMEs. I think I got all the redirects
correct and all old URLs and links in old emails & etc still work.
If you see something wrong, please reply to this message here or email me directly.
K.O. |
2291
|
22 Oct 2021 |
Francesco Renga | Forum | mhttpd error | Dear all,
I am trying to make the MIDAS web server for my DAQ project accessible from other machines. In the ODB, I activated the necessary flags:
[local:CYGNUS_RD:S]/WebServer>ls
Enable localhost port y
localhost port 8080
localhost port passwords n
Enable insecure port y
insecure port 8081
insecure port passwords y
insecure port host list y
Enable https port y
https port 8443
https port passwords y
https port host list n
Host list
localhost
Enable IPv6 y
Proxy
mime.types
Following the instructions on the Wiki I enabled the SSL support. When running mhttpd, I get these messages:
Mongoose web server will use HTTP Digest authentication with realm "CYGNUS_RD" and password file "/home/cygno/DAQ/online/htpasswd.txt"
Mongoose web server will use the hostlist, connections will be accepted only from: localhost
Mongoose web server listening on http address "localhost:8080", passwords OFF, hostlist OFF
Mongoose web server listening on http address "[::1]:8080", passwords OFF, hostlist OFF
Mongoose web server listening on http address "8081", passwords enabled, hostlist enabled
[mhttpd,ERROR] [mhttpd.cxx:19166:mongoose_listen,ERROR] Cannot mg_bind address "[::0]:8081"
Mongoose web server will use https certificate file "/home/cygno/DAQ/online/ssl_cert.pem"
Mongoose web server listening on https address "8443", passwords enabled, hostlist OFF
[mhttpd,ERROR] [mhttpd.cxx:19166:mongoose_listen,ERROR] Cannot mg_bind address "[::0]:8443"
and the server is not accessible from other machines. Any suggestion to solve or better investigate this problem?
Thank you very much,
Francesco |
2292
|
22 Oct 2021 |
Stefan Ritt | Forum | mhttpd error | > Enable IPv6 y
Probably the IPv6 problem, see here elog:2269
I asked to turn off IPv6 by default, or at least mention this in the documentation,
but unfortunately nothing happened.
Stefan |
2293
|
25 Oct 2021 |
Francesco Renga | Forum | mhttpd error | It worked, thank you very much!
Francesco
> > Enable IPv6 y
>
> Probably the IPv6 problem, see here elog:2269
>
> I asked to turn off IPv6 by default, or at least mention this in the documentation,
> but unfortunately nothing happened.
>
> Stefan |
2294
|
25 Oct 2021 |
Francesco Renga | Forum | Logger crash | Hello,
I'm experiencing crashes of the mlogger program on the time scale of a couple
of days. The only messages from MIDAS are:
05:34:47.336 2021/10/24 [mhttpd,INFO] Client 'Logger' (PID 14281) on database
'ODB' removed by db_cleanup called by cm_periodic_tasks (idle 10.2s,TO 10s)
05:34:47.335 2021/10/24 [mhttpd,INFO] Client 'Logger' on buffer 'SYSMSG' removed
by cm_periodic_tasks (idle 10.2s, timeout 10s)
Any suggestion to further investigate this issue?
Thank you very much,
Francesco |
2295
|
25 Oct 2021 |
Stefan Ritt | Forum | Logger crash | The short term solution would be to increase the logger timeout in the ODB under
/Programs/Logger/Watchdog timeout
and set it to 6000 (one minute). But that is curing just the symptoms. It would be
interesting to understand the cause of this error. Probably the logger takes more than 10
seconds to start or stop the run. The reason could be that the history grow too big (what
we have right now in MEG II), or some disk problems. But that needs detailed debugging on
the logger side.
Stefan |
2299
|
09 Nov 2021 |
Francesco Renga | Forum | Issue in data writing speed | Dear all,
I've a frontend writing a quite big bunch of data into a MIDAS bank (16bit output from a 4MP photo camera).
I'm experiencing a writing speed problem that I don't understand. When the photo camera is triggered at a low rate (< 2 Hz)
writing into the bank takes a very short time for each event (indeed, what I measure is the time to write and go back
into the polling function). If I increase the rate to 4 Hz, I see that writing the first two events takes a sort time,
but the third event takes a very long time (hundreds of ms), then again the fourth and fifth events are very fast, and
the sixth is very slow. If I further increase the rate, every other event is very slow. The problem is not in the readout
of the camera, because if I just remove the bank writing and keep the camera readout, the problem disappears. Can you
explain this behavior? Is there any way to improve it?
Below you can also find the code I use to copy the data from the camera buffer into the bank. If you have any suggestion
to improve it, it would be really appreciated.
Thank you very much,
Francesco
const char* pSrc = (const char*)bufframe.buf;
for(int y = 0; y < bufframe.height; y++ ){
//Copy one row
const unsigned short* pDst = (const unsigned short*)pSrc;
//go through the row
for(int x = 0; x < bufframe.width; x++ ){
WORD tmpData = *pDst++;
*pdata++ = tmpData;
}
pSrc += bufframe.rowbytes;
}
|
2300
|
09 Nov 2021 |
Hunter Lowe | Forum | MityCAMAC Login | Hello all,
I've recently acquired a MityCAMAC system that was built at TRIUMF and I'm
having issues accessing it over ethernet.
The system: Ubuntu VM inside Windows 10 machine.
I've tried reconfiguring the network settings for the VM but nmap and arp/ip
commands have yielded me no results in finding the crate controller.
I was getting help from Pierre Amaudruz but I think he is now busy for some
time. I have the mac address of the crate controller and its name. The
controller seems to initialize fine inside of the CAMAC crate. The windows side
of the workstation also tells me that an unknown network is in fact connected.
I suspect I either need to do something with an ssh key (which I thought we
accomplished but maybe not), or perhaps the domain name in the controller needs
to be changed.
If anybody has experience working with MityARM I would appreciate any advice I
could get.
Best,
Hunter Lowe
UNBC Graduate Physics |
2301
|
10 Nov 2021 |
Stefan Ritt | Forum | Issue in data writing speed | Midas uses various buffers (in the frontend, at the server side before the SYSTEM buffer, the SYSTEM buffer itself, on the
logger before writing to disk. All these buffers are in RAM and have fast access, so you can fill them pretty quickly. When
they are full, the logger writes to disk, which is slower. So I believe at 2 Hz your disk can keep up with your writing
speed, but at 4 Hz (2x8MBx4=32 MB/sec) your disk starts slowing down the writing process. Now 32MB/s is pretty slow for
a disk, so I presume you have turned compression on which takes quite some time.
To verify this, disable logging. The disable compression and keep logging. Then report back here again.
> Dear all,
> I've a frontend writing a quite big bunch of data into a MIDAS bank (16bit output from a 4MP photo camera).
> I'm experiencing a writing speed problem that I don't understand. When the photo camera is triggered at a low rate (< 2 Hz)
> writing into the bank takes a very short time for each event (indeed, what I measure is the time to write and go back
> into the polling function). If I increase the rate to 4 Hz, I see that writing the first two events takes a sort time,
> but the third event takes a very long time (hundreds of ms), then again the fourth and fifth events are very fast, and
> the sixth is very slow. If I further increase the rate, every other event is very slow. The problem is not in the readout
> of the camera, because if I just remove the bank writing and keep the camera readout, the problem disappears. Can you
> explain this behavior? Is there any way to improve it?
>
> Below you can also find the code I use to copy the data from the camera buffer into the bank. If you have any suggestion
> to improve it, it would be really appreciated.
>
> Thank you very much,
> Francesco
>
>
>
> const char* pSrc = (const char*)bufframe.buf;
>
> for(int y = 0; y < bufframe.height; y++ ){
>
> //Copy one row
> const unsigned short* pDst = (const unsigned short*)pSrc;
>
> //go through the row
> for(int x = 0; x < bufframe.width; x++ ){
>
> WORD tmpData = *pDst++;
>
> *pdata++ = tmpData;
>
> }
>
> pSrc += bufframe.rowbytes;
>
> }
> |
2302
|
11 Nov 2021 |
Thomas Lindner | Forum | MityCAMAC Login | Hi Hunter
This sounds like a Triumf specific problem;
not a MIDAS problem. Please email me directly
and we can try to solve this problem.
Thomas Lindner
TRIUMF DAQ
Hello all,
>
> I've recently acquired a MityCAMAC system
that was built at TRIUMF and I'm
> having issues accessing it over ethernet.
>
> The system: Ubuntu VM inside Windows 10
machine.
>
> I've tried reconfiguring the network
settings for the VM but nmap and arp/ip
> commands have yielded me no results in
finding the crate controller.
>
> I was getting help from Pierre Amaudruz but
I think he is now busy for some
> time. I have the mac address of the crate
controller and its name. The
> controller seems to initialize fine inside
of the CAMAC crate. The windows side
> of the workstation also tells me that an
unknown network is in fact connected.
>
> I suspect I either need to do something with
an ssh key (which I thought we
> accomplished but maybe not), or perhaps the
domain name in the controller needs
> to be changed.
>
> If anybody has experience working with
MityARM I would appreciate any advice I
> could get.
>
> Best,
> Hunter Lowe
> UNBC Graduate Physics |
2303
|
19 Nov 2021 |
Jacob Thorne | Forum | Sequencer error with ODB Inc | Hi,
I am having problems with the midas sequencer, here is my code:
1 COMMENT "Example to move a Standa stage"
2 RUNDESCRIPTION "Example movement sequence - each run is one position of a single stage
3
4 PARAM numRuns
5 PARAM sequenceNumber
6 PARAM RunNum
7
8 PARAM positionT2
9 PARAM deltapositionT2
10
11 ODBSet "/Runinfo/Run number", $RunNum
12 ODBSet "/Runinfo/Sequence number", $sequenceNumber
13
14 ODBSet "/Equipment/Neutron Detector/Settings/Detector/Type of Measurement", 2
15 ODBSet "/Equipment/Neutron Detector/Settings/Detector/Number of Time Bins", 10
16 ODBSet "/Equipment/Neutron Detector/Settings/Detector/Number of Sweeps", 1
17 ODBSet "/Equipment/Neutron Detector/Settings/Detector/Dwell Time", 100000
18
19 ODBSet "/Equipment/MTSC/Settings/Devices/Stage 2 Translation/Device Driver/Set Position", $positionT2
20
21 LOOP $numRuns
22 WAIT ODBvalue, "/Equipment/MTSC/Settings/Devices/Stage 2 Translation/Ready", ==, 1
23 TRANSITION START
24 WAIT ODBvalue, "/Equipment/Neutron Detector/Statistics/Events sent", >=, 1
25 WAIT ODBvalue, "/Runinfo/State", ==, 1
26 WAIT ODBvalue, "/Runinfo/Transition in progress", ==, 0
27 TRANSITION STOP
28 ODBInc "/Equipment/MTSC/Settings/Devices/Stage 2 Translation/Device Driver/Set Position", $deltapositionT2
29
30 ENDLOOP
31
32 ODBSet "/Runinfo/Sequence number", 0
The issue comes with line 28, the ODBInc does not work, regardless of what number I put I get the following error:
[Sequencer,ERROR] [odb.cxx:7046:db_set_data_index1,ERROR] "/Equipment/MTSC/Settings/Devices/Stage 2 Translation/Device Driver/Set Position" invalid element data size 32, expected 4
I don't see why this should happen, the format is correct and the number that I input is an int.
Sorry if this is a basic question.
Jacob |
2306
|
02 Dec 2021 |
Stefan Ritt | Forum | Sequencer error with ODB Inc | Thanks for reporting that bug. Indeed there was a problem in the sequencer code which I fixed now. Please try the updated develop branch.
Stefan |
|