09 Nov 2021, Francesco Renga, Forum, Issue in data writing speed
|
Dear all,
I have a frontend writing a fairly large amount of data into a MIDAS bank (the 16-bit output of a 4 MP photo camera).
I'm experiencing a writing speed problem that I don't understand. When the photo camera is triggered at a low rate (< 2 Hz),
writing into the bank takes a very short time for each event (what I actually measure is the time to write and go back
into the polling function). If I increase the rate to 4 Hz, I see that writing the first two events takes a short time,
but the third event takes a very long time (hundreds of ms); then again the fourth and fifth events are very fast, and
the sixth is very slow. If I increase the rate further, every other event is very slow. The problem is not in the readout
of the camera, because if I just remove the bank writing and keep the camera readout, the problem disappears. Can you
explain this behavior? Is there any way to improve it?
Below you can also find the code I use to copy the data from the camera buffer into the bank. Any suggestion
to improve it would be really appreciated.
Thank you very much,
Francesco
const char* pSrc = (const char*)bufframe.buf;
for(int y = 0; y < bufframe.height; y++ ){
//Copy one row
const unsigned short* pDst = (const unsigned short*)pSrc;
//go through the row
for(int x = 0; x < bufframe.width; x++ ){
WORD tmpData = *pDst++;
*pdata++ = tmpData;
}
pSrc += bufframe.rowbytes;
}
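(For reference, a per-row memcpy variant of the same copy, assuming pdata points to a contiguous WORD buffer with room
for width*height samples and that the pixel data needs no per-sample conversion, would look like this:)

const char* pSrc = (const char*)bufframe.buf;
for (int y = 0; y < bufframe.height; y++) {
   // copy one full row of 16-bit pixels in a single call (memcpy from <cstring>)
   memcpy(pdata, pSrc, bufframe.width * sizeof(WORD));
   pdata += bufframe.width;      // pdata is a WORD*, so this advances by one row of samples
   pSrc += bufframe.rowbytes;    // rowbytes may include padding, so advance by rowbytes
}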
|
10 Nov 2021, Stefan Ritt, Forum, Issue in data writing speed
|
Midas uses various buffers (in the frontend, on the server side before the SYSTEM buffer, the SYSTEM buffer itself, and in the
logger before writing to disk). All these buffers are in RAM and have fast access, so you can fill them pretty quickly. When
they are full, the logger writes to disk, which is slower. So I believe at 2 Hz your disk can keep up with your writing
speed, but at 4 Hz (4 MP x 2 bytes x 4 Hz = 32 MB/sec) your disk starts slowing down the writing process. Now 32 MB/s is pretty slow
for a disk, so I presume you have turned compression on, which takes quite some time.
To verify this, first disable logging completely. Then re-enable logging but disable compression. Then report back here.
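For example, from the command line (the exact channel index and key names depend on your /Logger tree, so check them in odbedit first):

odbedit -c 'set "/Logger/Write data" n'                        # step 1: turn off logging completely
odbedit -c 'set "/Logger/Write data" y'
odbedit -c 'set "/Logger/Channels/0/Settings/Compression" 0'   # step 2: log again, but without compression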
> Dear all,
> I've a frontend writing a quite big bunch of data into a MIDAS bank (16bit output from a 4MP photo camera).
> I'm experiencing a writing speed problem that I don't understand. When the photo camera is triggered at a low rate (< 2 Hz)
> writing into the bank takes a very short time for each event (indeed, what I measure is the time to write and go back
> into the polling function). If I increase the rate to 4 Hz, I see that writing the first two events takes a short time,
> but the third event takes a very long time (hundreds of ms), then again the fourth and fifth events are very fast, and
> the sixth is very slow. If I further increase the rate, every other event is very slow. The problem is not in the readout
> of the camera, because if I just remove the bank writing and keep the camera readout, the problem disappears. Can you
> explain this behavior? Is there any way to improve it?
>
> Below you can also find the code I use to copy the data from the camera buffer into the bank. If you have any suggestion
> to improve it, it would be really appreciated.
>
> Thank you very much,
> Francesco
>
>
>
> const char* pSrc = (const char*)bufframe.buf;
>
> for(int y = 0; y < bufframe.height; y++ ){
>
> //Copy one row
> const unsigned short* pDst = (const unsigned short*)pSrc;
>
> //go through the row
> for(int x = 0; x < bufframe.width; x++ ){
>
> WORD tmpData = *pDst++;
>
> *pdata++ = tmpData;
>
> }
>
> pSrc += bufframe.rowbytes;
>
> }
> |
26 Jan 2022, Konstantin Olchanski, Forum, Issue in data writing speed
|
Francesco, when you say "writing an event is slow", do you mean it in the frontend
or in the output data file?
Stefan is quite right about the data file, it can take seconds between generating
an event in the frontend and seeing it written to the data file. (if compression
buffers are too big, an event can sit there forever, until pushed out by next events
or by run stop).
But maybe you see this on the frontend side.
What you are looking at is "real time" performance of the frontend and of the linux kernel.
The mfe.c frontend has many problems with real time performance: it can stall and take a long
time between calls to read_event(), for many reasons.
There are ways around that, but it is simpler to switch to the tmfe c++ frontend
that was designed for good real time performance.
In the tmfe frontend, if you use the polled equipment and enable the poll thread,
your frontend will be limited only by the linux kernel real time performance (i.e.
on a single-core CPU, other programs will delay the execution of your frontend,
and you will see this as long delays (usec, millisec) between calls to your read_event()).
The next limit to real time performance (common to mfe.c and tmfe frontends) is the writing
of event data to the midas shared event buffer. One has to lock the shared memory semaphore,
and this has to wait until other users of the event buffer finish their reading
or writing and unlock it. An arbitrary amount of time (usec, millisec, sec) can pass.
(There are also problems with "fairness" of the linux semaphores, but that is a different story.)
Making things more interesting, midas event buffers implement a write cache (default size 100 kbytes):
events smaller than the cache are quickly accumulated (no need to lock the shared memory semaphore),
then flushed to shared memory when the cache is full. This is done to reduce the number
of shared memory semaphore locks per event, in the case of a very high rate of very small events.
The solution to all this is to use 2 threads: read the data from the hardware in one thread and write the data to midas
in a different thread. Between the threads would be an event fifo (circular buffer in mfe.c,
std::deque<EVENT> in tmfe c++ frontends).
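A very rough sketch of that two-thread pattern (this is not the actual mfe.c or tmfe code; read_camera() and
send_event_to_midas() are placeholders for the camera readout and for the bank-composing/bm_send_event() part):

#include <atomic>
#include <chrono>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

std::vector<char> read_camera();                      // placeholder: your hardware readout
void send_event_to_midas(const std::vector<char>&);   // placeholder: compose banks and send the event

std::deque<std::vector<char>> fifo;   // one entry per event
std::mutex fifo_mutex;
std::atomic<bool> running{true};

void reader_thread()   // talks only to the hardware, never blocks on the midas event buffer
{
   while (running) {
      std::vector<char> ev = read_camera();
      std::lock_guard<std::mutex> lock(fifo_mutex);
      fifo.push_back(std::move(ev));
   }
}

void writer_thread()   // the only thread that touches the midas event buffer
{
   while (running) {
      std::vector<char> ev;
      {
         std::lock_guard<std::mutex> lock(fifo_mutex);
         if (!fifo.empty()) {
            ev = std::move(fifo.front());
            fifo.pop_front();
         }
      }
      if (!ev.empty())
         send_event_to_midas(ev);   // may stall on the semaphore, but the reader keeps going
      else
         std::this_thread::sleep_for(std::chrono::milliseconds(1));
   }
}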
For remote connected frontends, things are a bit different. Event data is written directly
into the TCP socket and, as long as socket buffers are big enough, there are no real-time delays,
unless SYSTEM buffer is very congested and mserver does not read the TCP socket quickly enough.
So depending on event size, data rate and tcp socket buffer size, the extra 2nd thread
may not be necessary and poll thread real time performance may be good enough.
I hope this clarifies the situation somewhat.
K.O.
> Dear all,
> I've a frontend writing a quite big bunch of data into a MIDAS bank (16bit output from a 4MP photo camera).
> I'm experiencing a writing speed problem that I don't understand. When the photo camera is triggered at a low rate (< 2 Hz)
> writing into the bank takes a very short time for each event (indeed, what I measure is the time to write and go back
> into the polling function). If I increase the rate to 4 Hz, I see that writing the first two events takes a short time,
> but the third event takes a very long time (hundreds of ms), then again the fourth and fifth events are very fast, and
> the sixth is very slow. If I further increase the rate, every other event is very slow. The problem is not in the readout
> of the camera, because if I just remove the bank writing and keep the camera readout, the problem disappears. Can you
> explain this behavior? Is there any way to improve it?
>
> Below you can also find the code I use to copy the data from the camera buffer into the bank. If you have any suggestion
> to improve it, it would be really appreciated.
>
> Thank you very much,
> Francesco
>
>
>
> const char* pSrc = (const char*)bufframe.buf;
>
> for(int y = 0; y < bufframe.height; y++ ){
>
> //Copy one row
> const unsigned short* pDst = (const unsigned short*)pSrc;
>
> //go through the row
> for(int x = 0; x < bufframe.width; x++ ){
>
> WORD tmpData = *pDst++;
>
> *pdata++ = tmpData;
>
> }
>
> pSrc += bufframe.rowbytes;
>
> }
> |
26 Jan 2022, Konstantin Olchanski, Forum, Issue in data writing speed
|
> Francesco, when you say "writing an event is slow", do you mean it in the frontend
> or in the output data file?
Another explanation just occurred to me. We do not know your event size and we do not
know the size of your SYSTEM buffer. But if you have an unlucky combination,
this can happen:
Consider event size is 6 Mbytes, buffer size is 8 Mbytes, enough space for only 1 event.
First event is written quickly (buffer is empty).
Second event will be delayed, there is not enough free space in the buffer, we have
to wait for mlogger to finish reading the first event.
The same thing happens if the event size is 3 Mbytes: the first 2 events will write quickly,
and writing the 3rd event will be delayed until mlogger does its thing.
The mlogger normally reads the SYSTEM buffer quickly, but it can be delayed
for a number of reasons, e.g. handling a history event, a delay writing to disk,
a delay writing to network connected storage, etc.
In general, it is best to size the SYSTEM buffer to hold about 1 second worth
of data (of average size, average rate). If your event size is 4 Mbytes, and you
record them at 10/sec, SYSTEM buffer should be at least 40 Mbytes big. (this is
set in ODB /Experiment/Buffer Sizes). (MIDAS event buffer size is limited to 2 GBytes).
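For example, for 4 Mbyte events at 10/sec (check the exact key name in your own ODB; the new size typically only takes
effect after all clients of the buffer have been restarted):

odbedit -c 'set "/Experiment/Buffer sizes/SYSTEM" 41943040'   # 40 MBytes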
K.O. |
09 Nov 2021, Hunter Lowe, Forum, MityCAMAC Login
|
Hello all,
I've recently acquired a MityCAMAC system that was built at TRIUMF and I'm
having issues accessing it over ethernet.
The system: Ubuntu VM inside Windows 10 machine.
I've tried reconfiguring the network settings for the VM but nmap and arp/ip
commands have yielded me no results in finding the crate controller.
I was getting help from Pierre Amaudruz but I think he is now busy for some
time. I have the mac address of the crate controller and its name. The
controller seems to initialize fine inside of the CAMAC crate. The windows side
of the workstation also tells me that an unknown network is in fact connected.
I suspect I either need to do something with an ssh key (which I thought we
accomplished but maybe not), or perhaps the domain name in the controller needs
to be changed.
If anybody has experience working with MityARM I would appreciate any advice I
could get.
Best,
Hunter Lowe
UNBC Graduate Physics |
11 Nov 2021, Thomas Lindner, Forum, MityCAMAC Login
|
Hi Hunter
This sounds like a TRIUMF-specific problem,
not a MIDAS problem. Please email me directly
and we can try to solve this problem.
Thomas Lindner
TRIUMF DAQ
> Hello all,
>
> I've recently acquired a MityCAMAC system
that was built at TRIUMF and I'm
> having issues accessing it over ethernet.
>
> The system: Ubuntu VM inside Windows 10
machine.
>
> I've tried reconfiguring the network
settings for the VM but nmap and arp/ip
> commands have yielded me no results in
finding the crate controller.
>
> I was getting help from Pierre Amaudruz but
I think he is now busy for some
> time. I have the mac address of the crate
controller and its name. The
> controller seems to initialize fine inside
of the CAMAC crate. The windows side
> of the workstation also tells me that an
unknown network is in fact connected.
>
> I suspect I either need to do something with
an ssh key (which I thought we
> accomplished but maybe not), or perhaps the
domain name in the controller needs
> to be changed.
>
> If anybody has experience working with
MityARM I would appreciate any advice I
> could get.
>
> Best,
> Hunter Lowe
> UNBC Graduate Physics |
26 Jan 2022, Konstantin Olchanski, Info, MityCAMAC Login
|
For those curious about CAMAC controllers, this one was built around 2014 to
replace the aging CAMAC A1/A2 controllers (parallel and serial) in the TRIUMF
cyclotron controls system (around 50 CAMAC crates). It implements the main
and the auxiliary controller mode (single width and double width modules).
The design predates Altera Cyclone-5 SoC and has separate
ARM processor (TI 335x) and Cyclone-4 FPGA connected by GPMC bus.
ARM processor boots Linux kernel and CentOS-7 userland from an SD card,
and the FPGA boots from its own EPCS flash.
User program running on the ARM processor (i.e. a MIDAS frontend)
initiates CAMAC operations, FPGA executes them. Quite simple.
K.O. |
16 Dec 2021, Zaher Salman, Forum, Device driver for modbus
|
Dear all, does anyone have an example of a device driver using modbus or modbus tcp to communicate with a device, and would be willing to share it? Thanks. |
26 Jan 2022, Konstantin Olchanski, Forum, Device driver for modbus
|
> Dear all, does anyone have an example of a device driver using modbus or modbus tcp to communicate with a device, and would be willing to share it? Thanks.
I have not seen any modbus devices recently, so all my code and examples are quite old.
Basic modbus/tcp communication driver is in the midas repo:
daq00:midas$ find . | grep -i modbus
./drivers/divers/ModbusTcp.cxx
./drivers/divers/ModbusTcp.h
daq00:midas$
This driver worked for communication to a modbus PLC (T2K/ND280/TPC experiment in Japan).
An example program to use this driver and test modbus communication is here:
https://bitbucket.org/expalpha/agdaq/src/master/src/modbus.cxx
Because in the end we do not have any modbus devices in any recent experiment,
I do not have an example of using this driver in a midas frontend. Sorry.
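Not MIDAS-specific, but for what it is worth, the modbus/tcp transaction itself is simple enough to sketch with plain
sockets (read of holding registers only, minimal error handling; this does not use ModbusTcp.cxx, whose API may differ):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>

// read 'count' (max 125) holding registers starting at 'addr' from modbus unit 'unit'
// returns the number of registers read, or -1 on error
int modbus_read_holding(const char* host, int port, int unit, int addr, int count, uint16_t* out)
{
   int fd = socket(AF_INET, SOCK_STREAM, 0);
   if (fd < 0) return -1;
   sockaddr_in sa = {};
   sa.sin_family = AF_INET;
   sa.sin_port = htons(port);
   inet_pton(AF_INET, host, &sa.sin_addr);
   if (connect(fd, (sockaddr*)&sa, sizeof(sa)) < 0) { close(fd); return -1; }

   unsigned char req[12];
   req[0] = 0; req[1] = 1;                  // transaction id
   req[2] = 0; req[3] = 0;                  // protocol id (0 = modbus)
   req[4] = 0; req[5] = 6;                  // length of the remaining bytes
   req[6] = (unsigned char)unit;            // unit (slave) id
   req[7] = 0x03;                           // function 3: read holding registers
   req[8] = addr >> 8;   req[9]  = addr & 0xff;
   req[10] = count >> 8; req[11] = count & 0xff;
   if (write(fd, req, sizeof(req)) != (int)sizeof(req)) { close(fd); return -1; }

   unsigned char resp[9 + 2*125];
   int n = read(fd, resp, sizeof(resp));    // simplistic: assumes the reply arrives in one read
   close(fd);
   if (n < 9 + 2*count || resp[7] != 0x03) return -1;
   for (int i = 0; i < count; i++)
      out[i] = (resp[9 + 2*i] << 8) | resp[9 + 2*i + 1];   // register data is big-endian
   return count;
}

In a midas frontend one would call something like this from the periodic or polled readout and copy the registers into a bank.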
K.O. |
19 Nov 2021, Jacob Thorne, Forum, Sequencer error with ODB Inc
|
Hi,
I am having problems with the midas sequencer, here is my code:
1 COMMENT "Example to move a Standa stage"
2 RUNDESCRIPTION "Example movement sequence - each run is one position of a single stage"
3
4 PARAM numRuns
5 PARAM sequenceNumber
6 PARAM RunNum
7
8 PARAM positionT2
9 PARAM deltapositionT2
10
11 ODBSet "/Runinfo/Run number", $RunNum
12 ODBSet "/Runinfo/Sequence number", $sequenceNumber
13
14 ODBSet "/Equipment/Neutron Detector/Settings/Detector/Type of Measurement", 2
15 ODBSet "/Equipment/Neutron Detector/Settings/Detector/Number of Time Bins", 10
16 ODBSet "/Equipment/Neutron Detector/Settings/Detector/Number of Sweeps", 1
17 ODBSet "/Equipment/Neutron Detector/Settings/Detector/Dwell Time", 100000
18
19 ODBSet "/Equipment/MTSC/Settings/Devices/Stage 2 Translation/Device Driver/Set Position", $positionT2
20
21 LOOP $numRuns
22 WAIT ODBvalue, "/Equipment/MTSC/Settings/Devices/Stage 2 Translation/Ready", ==, 1
23 TRANSITION START
24 WAIT ODBvalue, "/Equipment/Neutron Detector/Statistics/Events sent", >=, 1
25 WAIT ODBvalue, "/Runinfo/State", ==, 1
26 WAIT ODBvalue, "/Runinfo/Transition in progress", ==, 0
27 TRANSITION STOP
28 ODBInc "/Equipment/MTSC/Settings/Devices/Stage 2 Translation/Device Driver/Set Position", $deltapositionT2
29
30 ENDLOOP
31
32 ODBSet "/Runinfo/Sequence number", 0
The issue comes at line 28: the ODBInc does not work, and regardless of what number I put in I get the following error:
[Sequencer,ERROR] [odb.cxx:7046:db_set_data_index1,ERROR] "/Equipment/MTSC/Settings/Devices/Stage 2 Translation/Device Driver/Set Position" invalid element data size 32, expected 4
I don't see why this should happen: the format is correct and the number that I input is an int.
Sorry if this is a basic question.
Jacob |
02 Dec 2021, Stefan Ritt, Forum, Sequencer error with ODB Inc
|
Thanks for reporting that bug. Indeed there was a problem in the sequencer code, which I have now fixed. Please try the updated develop branch.
Stefan |
29 Oct 2021, Frederik Wauters, Bug Report, midas::odb::iterator + operator
|
I have a 16-element array odb key
{"FIR Energy", {
{"Energy Gap Value", std::array<uint32_t,16>(10) },
I can get the maximum of this array like
uint32_t max_value = *std::max_element(values.begin(),values.end());
but when I need the maximum of a sub range
uint32_t max_value = *std::max_element(values.begin(),values.begin()+4);
I get
/home/labor/new_daq/frontends/SIS3316Module.cpp:584:62: error: no match for ‘operator+’ (operand types are ‘midas::odb::iterator’ and ‘int’)
584 | max_value = *std::max_element(values.begin(),values.begin()+4);
| ~~~~~~~~~~~~~~^~
| | |
| | int
|
As the + operator is overloaded for midas::odb::iterator, I expected this to work.
(And yes, I can find the max element by accessing the elements one by one.) |
29 Oct 2021, Frederik Wauters, Bug Report, midas::odb::iterator + operator | work around
|
ok, so retrieving as a std::array (as it was defined) does not work
std::array<uint32_t,16> avalues = settings["FIR Energy"]["Energy Gap Value"];
but retrieving it as a std::vector does, and then I have a standard c++ iterator which I can use with the std algorithms
std::vector<uint32_t> values = settings["FIR Energy"]["Energy Gap Value"];
> I have 16 array odb key
>
> {"FIR Energy", {
> {"Energy Gap Value", std::array<uint32_t,16>(10) },
>
> I can get the maximum of this array like
>
>
> uint32_t max_value = *std::max_element(values.begin(),values.end());
>
> but when I need the maximum of a sub range
>
> uint32_t max_value = *std::max_element(values.begin(),values.begin()+4);
>
> I get
>
> /home/labor/new_daq/frontends/SIS3316Module.cpp:584:62: error: no match for ‘operator+’ (operand types are ‘midas::odb::iterator’ and ‘int’)
> 584 | max_value = *std::max_element(values.begin(),values.begin()+4);
> | ~~~~~~~~~~~~~~^~
> | | |
> | | int
> |
>
> As the + operator is overloaded for midas::odb::iterator, I was expected this to work.
>
> (and yes, I can find the max element by accessing the elements on by one) |
25 Oct 2021, Francesco Renga, Forum, Logger crash
|
Hello,
I'm experiencing crashes of the mlogger program on the time scale of a couple
of days. The only messages from MIDAS are:
05:34:47.336 2021/10/24 [mhttpd,INFO] Client 'Logger' (PID 14281) on database
'ODB' removed by db_cleanup called by cm_periodic_tasks (idle 10.2s,TO 10s)
05:34:47.335 2021/10/24 [mhttpd,INFO] Client 'Logger' on buffer 'SYSMSG' removed
by cm_periodic_tasks (idle 10.2s, timeout 10s)
Any suggestion to further investigate this issue?
Thank you very much,
Francesco |
25 Oct 2021, Stefan Ritt, Forum, Logger crash
|
The short term solution would be to increase the logger timeout in the ODB under
/Programs/Logger/Watchdog timeout
and set it to 60000 (one minute, in milliseconds). But that is curing just the symptoms. It would be
interesting to understand the cause of this error. Probably the logger takes more than 10
seconds to start or stop the run. The reason could be that the history has grown too big (which is what
we have right now in MEG II), or some disk problems. But that needs detailed debugging on
the logger side.
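For example, from the command line (the value is in milliseconds):

odbedit -c 'set "/Programs/Logger/Watchdog timeout" 60000'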
Stefan |
14 Oct 2021, Amy Roberts, Suggestion, Adding (or improving discoverability) of TID for odbset
|
Creating an ODB key requires users to know the Type IDs that are defined in
https://bitbucket.org/tmidas/midas/src/develop/include/midas.h starting at line 320.
I can't find any information on the Midas Wiki about these values or how to find
them.
Am I missing something obvious? Is there a way to improve how to find these values?
Or is this not the best way to interact with the ODB? |
15 Oct 2021, Stefan Ritt, Suggestion, Adding (or improving discoverability) of TID for odbset
|
> Creating an ODB key requires users to know the Type ID that are defined in
> https://bitbucket.org/tmidas/midas/src/develop/include/midas.h starting at line 320.
>
> I can't find any information on the Midas Wiki about these values or how to find
> them.
>
> Am I missing something obvious? Is there a way to improve how to find these values?
> Or is this not the best way to interact with the ODB?
Well, you found them in midas.h, so where is the problem?
If you want a more detailed description, just look in the midas documentation (RTFM):
https://midas.triumf.ca/MidasWiki/index.php/Midas_Data_Types
If you want a more modern interface to the ODB without these data types, look here:
https://midas.triumf.ca/MidasWiki/index.php/Odbxx
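For example, with odbxx the data type is deduced from the C++ value, so you never have to spell out a TID (a minimal
sketch; the ODB path is just an example):

#include "odbxx.h"

midas::odb settings = {
   {"Int Key", 42},          // becomes TID_INT32
   {"Float Key", 3.14f},     // becomes TID_FLOAT
   {"Flag", true},           // becomes TID_BOOL
   {"Comment", "hello"}      // becomes TID_STRING
};
settings.connect("/Equipment/Test/Settings");   // creates any missing keys in the ODB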
Best regards,
Stefan |
11 Oct 2021, Konstantin Olchanski, Forum, midas forum updated, moved
|
The midas forum software (elogd) was updated to latest version and moved from our old server
(ladd00.triumf.ca) to our new server (daq00.triumf.ca).
The following URLs should work:
https://daq00.triumf.ca/elog-midas/Midas/ (new URL)
https://midas.triumf.ca/elog/Midas/ (old URL, redirects to daq00)
https://midas.triumf.ca/forum (link from midas wiki)
The configuration on the old server ladd00.triumf.ca is quite tangled between
several virtual hosts and several DNS CNAMEs. I think I got all the redirects
correct and all old URLs and links in old emails & etc still work.
If you see something wrong, please reply to this message here or email me directly.
K.O. |
11 Oct 2021, Konstantin Olchanski, Forum, test
|
test, no email. K.O. |
11 Oct 2021, Konstantin Olchanski, Forum, test
|
> test, no email. K.O.
test reply, no email. K.O. |
11 Oct 2021, Konstantin Olchanski, Forum, test
|
> > test, no email. K.O.
>
> test reply, no email. K.O.
test attachment, no email. K.O. |
11 Oct 2021, Konstantin Olchanski, Forum, test
|
> > > test, no email. K.O.
> >
> > test reply, no email. K.O.
>
> test attachment, no email. K.O.
test email. K.O. |
11 Oct 2021, Stefan Ritt, Info, Modification in the history logging system
|
A requested change in the history logging system has been made today. Previously, history values were
logged with a maximum frequency (usually once per second) but also with a minimum frequency, meaning
that values were logged for example every 60 seconds, even if they did not change. This causes a problem:
if a frontend which produces logged variables is inactive or has crashed, one cannot distinguish between
a crashed or inactive frontend program and a history value which simply did not change much over time.
The history system was designed from the beginning in a way that values are only logged when they actually
change. This design pattern was broken since about spring 2021, see for example this issue:
https://bitbucket.org/tmidas/midas/issues/305/log_history_periodic-doesnt-account-for
Today I modified the history code to fix this issue. History logging is now controlled by the value of
common/Log history in the following way:
* Common/Log history = 0 means no history logging
* Common/Log history = 1 means log whenever the value changes in the ODB
* Common/Log history = N means log whenever the value changes in the ODB and
the previous write was more than N seconds ago
So most experiments should be happy with 0 or 1. Only experiments which have fluctuating values due to noisy
sensors might benefit from a value larger than 1 to limit the history logging. Anyhow this is not the preferred
way to limit history logging. This should be done by the front-end limiting the updates to the ODB. Most of the
midas slow control drivers have a “threshold” value: only inputs that change by more than the threshold are
written to the ODB. This allows a per-channel “dead band” rather than a per-event limit on history logging
as ‘Log history’ would do. In addition, the threshold reduces the write accesses to the ODB, although that is
only important for very large experiments.
Stefan |
29 Sep 2021, Richard Longland, Bug Report, Install clash between MIDAS 2020-08 and mscb
|
Thank you, Stefan.
I found these instructions under
1) The changelog: https://midas.triumf.ca/MidasWiki/index.php/Changelog#2020-12
2) Konstantin's elog announcements (e.g. https://midas.triumf.ca/elog/Midas/2089)
I do see reference to updating the submodules under the TRIUMF install
instructions
(https://midas.triumf.ca/MidasWiki/index.php/Setup_MIDAS_experiment_at_TRIUMF#Install_MIDAS),
although perhaps it can be clarified.
Cheers,
Richard |
29 Sep 2021, Stefan Ritt, Bug Report, Install clash between MIDAS 2020-08 and mscb
|
> Thank you, Stefan.
>
> I found these instructions under
> 1) The changelog: https://midas.triumf.ca/MidasWiki/index.php/Changelog#2020-12
> 2) Konstantin's elog announcements (e.g. https://midas.triumf.ca/elog/Midas/2089)
>
> I do see reference to updating the submodules under the TRIUMF install
> instructions
> (https://midas.triumf.ca/MidasWiki/index.php/Setup_MIDAS_experiment_at_TRIUMF#Inst
> all_MIDAS) although perhaps it can be clarified.
>
> Cheers,
> Richard
Hi Richard,
I updated the documentation at
https://midas.triumf.ca/MidasWiki/index.php/Changelog#Updating_midas
by putting the submodule update command everywhere.
Best,
Stefan |
28 Sep 2021, Richard Longland, Bug Report, Install clash between MIDAS 2020-08 and mscb
|
All,
I am performing a fresh install of MIDAS on an Ubuntu linux box. I follow the
usual installation procedure:
1) git clone https://bitbucket.org/tmidas/midas --recursive
2) cd midas
3) git checkout release/midas-2020-08
4) mkdir build
5) cd build
6) cmake ..
7) make
Step 3 warns me that
"warning: unable to rmdir 'manalyzer': Directory not empty" and
"warning: unable to rmdir 'midasio': Directory not empty"
Step 7 fails.
Compilation fails with an mhttp error related to mscb:
mhttpd.cxx:8224:59: error: too few arguments to function 'int mscb_ping(int,
short unsigned int, int, int)'
8224 | status = mscb_ping(fd, (unsigned short) ind, 1);
I was able to get around this by rolling mscb back to some old version (commit
74468dd), but am extremely nervous about mix-and-matching the code this way.
Any advice would be greatly appreciated.
Cheers,
Richard |
28 Sep 2021, Stefan Ritt, Bug Report, Install clash between MIDAS 2020-08 and mscb
|
> 1) git clone https://bitbucket.org/tmidas/midas --recursive
> 2) cd midas
> 3) git checkout release/midas-2020-08
> 4) mkdir build
> 5) cd build
> 6) cmake ..
> 7) make
When you do step 3), you get
~/tmp/midas$ git checkout release/midas-2020-08
warning: unable to rmdir 'manalyzer': Directory not empty
warning: unable to rmdir 'midasio': Directory not empty
M mjson
M mscb
M mvodb
M mxml
The 'M' in front of the submodules like mscb tells you that you
have an older version of midas (namely midas-2020-08), but the
*current* submodules, which won't match. So you also have to roll
back the submodules with:
3.5) git submodule update --recursive
This fetches the versions of the submodules which match
midas version 2020-08. See here for details:
https://git-scm.com/book/en/v2/Git-Tools-Submodules
From where did you get the command
git checkout release/xxxx ???
If you tell me the location of that documentation, I will take
care that it will be amended with the command
git submodule update --recursive
Best,
Stefan |
19 Sep 2021, Stefan Ritt, Bug Fix, Chat working again
|
Not sure how many people are using it, but the Chat facility in midas had been broken
for some time and got fixed again today.
Just for your information: Chat can be used like WhatsApp & Co, and connects all
people who access a midas experiment through their browser. It's good to
communicate between shift crew members located at different places. One advantage
is that the chat messages can get 'spoken' by the text-to-speech engine of your
browser, so it can be used to "wake up" shifters. Can be configured through the
"Config" page.
Stefan |
06 Sep 2021, Andreas Suter, Forum, mhttpd crash
|
midas version used: midas-2019-05-cxx-1461-g906be8b
I find in the systemd log every couple of days/weeks the following error message related to the mhttpd:
[mhttpd,ERROR] [mhttpd.cxx:18886:on_work_complete,ERROR] Should not send response to request from socket 28 to socket 26, abort!
with various socket numbers of course.
Can anybody hint me what is going wrong here?
The bad thing about the crash is that it sometimes leads to a "chain reaction" killing multiple midas frontends, which essentially stops the experiment.
Help would be very much appreciated!
Andreas |
06 Sep 2021, Konstantin Olchanski, Forum, mhttpd crash
|
> [mhttpd,ERROR] [mhttpd.cxx:18886:on_work_complete,ERROR] Should not send response to request from socket 28 to socket 26, abort!
> Can anybody hint me what is going wrong here?
> The bad thing on the crash is, that sometimes it is leading to a "chain-reaction" killing multiple midas frontends, which essentially stop the experiment.
This is my code. I am the culprit. I had a bit of discussion about this with Stefan.
Bottom line is something is rotten in the multithreading code inside mhttpd and under conditions unknown,
it sends the wrong data into the wrong socket. This causes midas web pages to be really confused (RPC replies
processed as a CSS file, HTML code processed as RPC replies, a mess), and this wrong data is cached by the browser,
so restarting mhttpd does not fix the web pages. So a mess.
I find this impossible to replicate, and so I cannot debug it or fix it. The best I was able to do
is to add a check for socket numbers, and thankfully it catches the condition before web browser caches
become poisoned. So, broken web pages replaced by mhttpd crash.
This situation reinforces my opinion that multi-threading and C++ classes "do not mix" (like H2 and O2 do not mix).
If you write a multithreaded C++ program and it works, good for you, if there is a malfunction, good luck with it,
C++ just does not have any built-in support for debugging typical multithreading problems. I think others have come
to the same conclusion and invented all these new "safe" programming languages, like Rust and Go.
Back to your troubles.
1) If you see a way to replicate this crash, or some way to reliably cause
the crash within 5-10 minutes after starting mhttpd, please let me know. I can work with that
and I wish to fix this problem very much.
2) My "wrong socket" check calls abort() to produce a core dump. In my experience these core dumps
are useless for debugging the present problem. There is just no way to examine the state of each
thread and of each http request using gdb by hand.
3) this abort() causes linux to write a core dump; this takes a long time and I think it causes
other MIDAS programs to stop, time out and die. You can try to fix this by disabling core dumps (set "enable core dumps"
to "false" in ODB and set core dump size limit to 0), or change abort() to exit(). (You can also disable
the "wrong socket" check, but most likely you will not like the result).
4) run mhttpd inside a script: "while (1) { start mhttpd; sleep 1 sec; rinse, repeat; }" (run mhttpd without "-D", yes?)
In other news, the mongoose web server library has a new version available; they again changed their
multithreading scheme (I think it is an improvement). If I update mhttpd to this new version, it is very
likely the code with the "wrong socket" bug will be deleted. (with new bugs added to replace old bugs, of course).
K.O. |
07 Sep 2021, Andreas Suter, Forum, mhttpd crash
|
Dear Konstantin,
thanks for the prompt response, this helps a lot!
> 1) If you see a way to replicate this crash, or some way to reliably cause
> the crash within 5-10 minutes after starting mhttpd, please let me know. I can work with that
> and I wish to fix this problem very much.
I wish I could! This happens only 3-4 times per year, so it is close to impossible to trigger.
> 2) My "wrong socket" check calls abort() to produce a core dump. In my experience these core dumps
> are useless for debugging the present problem. There is just no way to examine the state of each
> thread and of each http request using gdb by hand.
>
> 3) this abort() causes linux to write a core dump, this takes a long time and I think it causes
> other MIDAS program to stop, timeout and die. You can try to fix this by disabling core dumps (set "enable core dumps"
> to "false" in ODB and set core dump size limit to 0), or change abort() to exit(). (You can also disable
> the "wrong socket" check, but most likely you will not like the result).
>
I have now changed to exit() rather than abort() on the production machine. Perhaps this should be the default?
Andreas |
17 Sep 2021, Stefan Ritt, Forum, mhttpd crash 
|
To limit the impact of the numerous crashes of mhttpd, I installed the monit tool at MEG at PSI
(https://en.wikipedia.org/wiki/Monit). It monitors mhttpd, and if it cannot connect to it for a certain
time, it kills the process and restarts it. This covers endless loops, simple crashes (caused by the
known multi-threading issue in mongoose), and also cases where mhttpd develops a memory leak and becomes
unresponsive.
To configure monit for mhttpd, first install the package, make sure the daemon gets started automatically
after reboot (typically "systemctl enable monit"), and put the attached file into
/etc/monit.d/mhttpd
You have to adjust the <path-to-midas> according to your midas installation, and probably also the port
under which mhttpd is listening (8082 in my case). Put
set daemon 10
into /etc/monitrc if you want monit to check mhttpd every 10 seconds (default is 30 seconds). Then, every
10 seconds monit requests "midas.css" from mhttpd, and if it cannot obtain it after 30 seconds, it kills
mhttpd and restarts it.
Loading long history plots taking more than 30 seconds should probably not be an issue since mhttpd is
multi-threaded, but I haven't tested this in detail.
Attached below is a typical status page produced by monit, which has its own built-in web server (normally
listening at port 2812, accessible only from localhost by default).
I hope this helps some of you.
Stefan |
24 Jun 2021, Konstantin Olchanski, Bug Fix, changes in history plots
|
I am updating the history plots. Main changes:
- the old history display code should again be easily usable (use the "open in old history display" checkbox)
- the history plot editor has an "edit in ODB" button that takes us to the plot definition in ODB (sometimes it is
easier to edit things in the ODB editor)
- error in history plot editor that created "formula" entry of incorrect size should be fixed
- "reorder" (and "delete entry") functions in the history plot editor should work again (plus added explanation text)
- "factor" and "offset" restored in the history plot editor
- added the long desired "voffset" to simplify plot scaling and positioning
- (factor, offset and voffset do not yet work in the new history plots, TBI ASAP)
- history plot editor and generate_hist_graph() now use the same code to read plot definitions from ODB. There should
be no more confusion about content of history plot entries in ODB and what each entry is supposed to do.
These changes have been precipitated by our inability to plot high-voltage voltages and currents on the same plot,
see bug https://bitbucket.org/tmidas/midas/issues/308/history-plot-formula-cannot-be-used-to
Voltage is in the range 0..1000 (volts) and the currents are in the ranges 0..50 uA and 0..0.100 uA; autoscaling on voltage
makes the currents invisible at the zero line. In the past, we used the "factor" setting to scale
the graphs so we can see both voltage and currents at the same time (currents scaled up by factor 25 and 600,
as example).
The new "formula" feature was supposed to replace (and improve upon) the "factor" and "offset". But if I use
the formula "x*25", suddenly the plot is telling us that current values are not 50 uA, but 1250 uA (50*25),
and this is just wrong. We do not want to scale the micro-amps, we want to better position the plot on the graph,
like the old "factor" and "offset" allowed us to do.
So the idea is to use this computation:
y_position_on_plot = offset + factor*(formula(history_value) - voffset)
- "formula" is to transform history values into physical values (i.e. pressure meter reports bars, but we want atm, or
voltmeter is reading in discrete units of 0.125V, we want to see volts)
- "factor" and "offset" is to position the graphs on the plot for best visual presentation of data
- I also added is the much desired "voffset", you only know it is needed if you have a non-zero "offset" and you need
to change the "factor", surprise, "offset" has ot be changed, too, and good luck recalculating it correctly in one
try.
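A worked example with the numbers above (illustrative values only): leave the 0..1000 V graph alone (formula empty,
factor 1, offset 0, voffset 0) and give the 0..50 uA current graph factor 25, offset 0, voffset 0. A reading of 40 uA
is then drawn at y = 0 + 25*(40 - 0) = 1000, i.e. at the same height as a 1000 V point, while the value reported for
that point is still 40 uA.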
The way to use this stuff:
- adjust "voffset" to bring the graph to around y=0
- increase the "factor" to zoom-in on features and stuff
- adjust "offset" to move the graph up and down relative to all the other graphs on the plot
- now one can zoom in and out as needed by changing the "factor" and the plot will stay roughly in the right place
without having to readjust the offsets.
K.O. |
24 Jun 2021, Stefan Ritt, Bug Fix, changes in history plots
|
I disagree with the proposed change to scale the HV current for a "nice" display. If values are scaled, the axis should be
scaled in the same way. Otherwise people might read the current from the plot, look at the axis, and again get the wrong
value (the factor of 25x you mention). Sure, you can hover with the cursor over the graph and see the right value, but think
of taking a screen shot, putting it into a publication, and getting complaints from the reviewer.
The only "correct" way in my opinion is to implement two vertical axis, as can be seen in some papers. One for the HV, and a
new TBD right axis for the current values, then indicating for each graph if the left or right vertical axis applies. For
the secondary axis we can have autoscaling or fixed scaling, as we have for the primary axis.
Stefan |
25 Jun 2021, Marco Francesconi, Bug Fix, changes in history plots
|
We are using the new history formula as a quick way to convert signals from sensors to actual physical values (for example Voltage->Temperature, Voltage->relative humidity
...), so it is great that the shown voltage is the calculated one.
I would like to add a point to this discussion.
In our collaboration people attach images of history plots to elogs, meeting presentation and/or physical logbooks.
The proposed scaling formula may work fine online using the cursors, but, once an image is created, I do not understand how it is possible to extract the value of a scaled
variable.
Suppose you see a graph in a presentation showing a current increase in some PSU, where the current was scaled to fit in the same plot as the voltage.
Looking at the delta in the image, how can you judge the current increase without any axis/grid to refer to?
So I support Stefan proposal for a secondary axis, as long as it is clear which value belong to which axis.
Maybe marking the channels in the description or using different line styles/thickness?
Best,
Marco |
25 Jun 2021, Konstantin Olchanski, Bug Fix, changes in history plots
|
I will have to post an example of a scaled plot. I figure everybody forgot what they look like.
K.O.
> We are using the new history formula as a quick way to convert signals from sensors to actual physical values (for example Voltage->Temperature, Voltage->relative humidity
> ...), so it is great that the shown voltage is the calculated one.
>
> I would like to add a point to this discussion.
> In our collaboration people attach images of history plots to elogs, meeting presentation and/or physical logbooks.
> The proposed scaling formula may work fine online using the cursors, but, once an image is created, I do not understand how it is possible to extract the value for a scaled
> variables.
> Suppose you see a graph in a presentation with a current increase by some PSU and the current was scaled to be in the same plot of the voltage.
> Looking at the delta in the image, how can you judge the current increase without any axis/grid to refer to?
>
> So I support Stefan proposal for a secondary axis, as long as it is clear which value belong to which axis.
> Maybe marking the channels in the description or using different line styles/thickness?
>
> Best,
> Marco |
25 Jun 2021, Konstantin Olchanski, Bug Fix, changes in history plots
|
> I disagree ...
I am happy with disagreement and differences of opinions. Zest of life, driver of progress and improvements, etc.
I am even more happy with solutions to problems. The current problem is that the offset and factor feature
of history plots has been removed without much discussion.
I stress, we have been using this feature to run experiments for the last 20 years.
I do not understand objections to it being restored. If you do not want to use it, do not use it.
K.O.
> with the proposed change to scale the HV current for a "nice" display. If values are scaled, the axis should be
> scaled in the same way. Otherwise people might read the current from the plot, look at the axis, and again get the wrong
> value (the factor of 25x you mention). Sure you can hover with the cursor over the graph, and see the right value, but think
> of taking a screen shot, putting this into a publication, and get complaints from the reviewer.
>
> The only "correct" way in my opinion is to implement two vertical axis, as can be seen in some papers. One for the HV, and a
> new TBD right axis for the current values, then indicating for each graph if the left or right vertical axis applies. For
> the secondary axis we can have autoscaling or fixed scaling, as we have for the primary axis.
>
> Stefan |
25 Jun 2021, Konstantin Olchanski, Bug Fix, changes in history plots
|
> > The only "correct" way in my opinion is to implement two vertical axis, as can be seen in some papers. One for the HV, and a
> > new TBD right axis for the current values, then indicating for each graph if the left or right vertical axis applies. For
> > the secondary axis we can have autoscaling or fixed scaling, as we have for the primary axis.
In the past, we have done some useful plots with maybe 10 variables plotted
at the same time with different scaling and positioning on the graph.
Having 2 vertical axes is maybe useful for the specific case of plotting high voltages,
but not in the general case.
Actually, just 2 vertical axes will not work to plot high voltages in ALPHA-g, because
we have anode currents on the scale 0..0.1 uA and cathode currents on the scale 50..60 uA.
K.O. |
25 Jun 2021, Stefan Ritt, Bug Fix, changes in history plots
|
A general warning: With the recent history changes implemented in the develop branch, starting from a fresh ODB and editing
any history panel, one gets tons of errors and debug output from mhttpd:
MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Minimum" returned status 312
MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Minimum" returned status 312
MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Maximum" returned status 312
MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Maximum" returned status 312
MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Zero ylow" returned status 312
MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Log axis" returned status 312
MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Zero ylow" returned status 312
Load from ODB History/Display/Default/Trigger rate: hist plot: 2 variables
timescale: 1h, minimum: 0.000000, maximum: 0.000000, zero_ylow: 0, log_axis: 0, show_run_markers: 1, show_values: 1,
show_fill: 1
var[0] event [System][Trigger per sec.] formula [], colour [#00AAFF] label [] factor 1.000000 offset 0.000000 voffset
0.000000 order 10
var[1] event [System][Trigger kB per sec.] formula [], colour [#FF9000] label [] factor 1.000000 offset 0.000000 voffset
0.000000 order 20
This has to be fixed by the original author. I strongly recommend making such modifications on a separate branch so as not to
break running experiments.
Stefan |
25 Jun 2021, Konstantin Olchanski, Bug Fix, changes in history plots
|
> A general warning: With the recent history changes implemented in the develop branch, starting from a fresh ODB and editing
> any history panel, on gets tons of errors and debug output from mhttpd: ...
This is the reason most projects have separate development and production branches.
I recommend everybody to use the released tagged versions of midas for production.
> I strongly recommend to make such modifications on a separate branch not to
> break running experiments.
Is there something that does not work anymore? Did I break something? The debug messages I am still
tuning.
K.O.
>
> MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Minimum" returned status 312
> MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Minimum" returned status 312
> MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Maximum" returned status 312
> MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Maximum" returned status 312
> MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Zero ylow" returned status 312
> MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Log axis" returned status 312
> MVOdb: Error: MIDAS db_get_value() at ODB path "/History/Display/Default/Trigger rate/Zero ylow" returned status 312
> Load from ODB History/Display/Default/Trigger rate: hist plot: 2 variables
> timescale: 1h, minimum: 0.000000, maximum: 0.000000, zero_ylow: 0, log_axis: 0, show_run_markers: 1, show_values: 1,
> show_fill: 1
> var[0] event [System][Trigger per sec.] formula [], colour [#00AAFF] label [] factor 1.000000 offset 0.000000 voffset
> 0.000000 order 10
> var[1] event [System][Trigger kB per sec.] formula [], colour [#FF9000] label [] factor 1.000000 offset 0.000000 voffset
> 0.000000 order 20
>
>
>
> This has to be fixed by the original author. I strongly recommend to make such modifications on a separate branch not to
> break running experiments.
>
> Stefan |
30 Jun 2021, Konstantin Olchanski, Bug Fix, changes in history plots
|
> I am updating the history plots.
> So the idea is to use this computation:
> y_position_on_plot = offset + factor*(formula(history_value) - voffset)
Stefan and myself did some brain storming on zoom. Writing it down the way I remember it.
- we distilled the gist of the problem: are the numerical values we show in the plot labels and in the hover-over-the-graph
display the ones before the formula is applied or after?
- I suggested a universal solution using a double formula: use formula1 for one case;
use formula2 for the other case;
use formula1 for "physics calibration", use formula2 for factor and offset for composite plots:
numeric_value = formula1(history_value)
plotted_value = formula2(numeric_value)
- we agree that this is way too complicated, difficult to explain and difficult to coherently present in the history editor
- Stefan suggested a simple solution, a checkbox labeled "show raw value" next to each history variable. by default, the
value after the formula is plotted and displayed. if checked, the raw value (before the formula) is displayed, and the
value after the formula is plotted. (so this works the same as the factor and offset on the old history plots).
- if "show raw value" is enabled, the numerical values shown will be inconsistent against the labels on the vertical axis.
Our solution is to turn the axis labels off. (for composite plots, like oscillator frequency in Hz vs oscillator
temperature in degC, both scaled to see their correlation, the vertical axis is unit-less "arbitrary units", of course)
- to simplify migration of old history plots that use custom factor and offset settings, we think in the direction of
automatically moving them to the "formula". (factor=2, offset=10 automatically populates formula with "2*x+10", "show raw
value" checked/enabled). Thus we can avoid implementing factor and offset in the new history code (an unwelcome
complication).
- I think this covers all the use cases I have seen in the past, so we will move in this direction.
K.O. |
14 Jul 2021, Konstantin Olchanski, Bug Fix, changes in history plots
|
Moving in the direction of this proposal. The history plot editor is updated accordingly. The remaining missing piece is the "show
raw value" buttons and the code behind them.
Changes:
- "show factor and offset" moved to the top of the page, "off" by default
- factor and offset (if not zero) are automatically migrated to the formula field (if it is empty), one needs to save the panel
for this to take effect.
K.O.
> > I am updating the history plots.
> > So the idea is to use this computation:
> > y_position_on_plot = offset + factor*(formula(history_value) - voffset)
>
> Stefan and myself did some brain storming on zoom. Writing it down the way I remember it.
>
> - we distilled the gist of the problem - the numerical values we show in the plot labels and in hover-over-the-graph
> are before formula is applied or after the formula is applied?
>
> - I suggested a universal solution using a double formula: use formula1 for one case;
> use formula2 for the other case;
> use formula1 for "physics calibration", use formula2 for factor and offset for composite plots:
> numeric_value = formula1(history_value)
> plotted_value = formula2(numeric_value)
>
> - we agree that this is way too complicated, difficult to explain and difficult to coherently present in the history editor
>
> - Stefan suggested a simple solution, a checkbox labeled "show raw value" next to each history variable. by default, the
> value after the formula is plotted and displayed. if checked, the raw value (before the formula) is displayed, and the
> value after the formula is plotted. (so this works the same as the factor and offset on the old history plots).
>
> - if "show raw value" is enabled, the numerical values shown will be inconsistent against the labels on the vertical axis.
> Our solution it to turn the axis labels off. (for composite plots, like oscillator frequency in Hz vs oscillator
> temperature in degC, both scaled to see their correlation, the vertical axis is unit-less "arbitrary units", of course)
>
> - to simplify migration of old history plots that use custom factor and offset settings, we think in the direction of
> automatically moving them to the "formula". (factor=2, offset=10 automatically populates formula with "2*x+10", "show raw
> value" checked/enabled). Thus we can avoid implementing factor and offset in the new history code (an unwelcome
> complication).
>
> - I think this covers all the use cases I have seen in the past, so we will move in this direction.
>
> K.O. |
14 Jul 2021, Konstantin Olchanski, Bug Fix, changes in history plots
|
> Moving in the direction of this proposal. Remaining missing piece is the "show
> raw value" buttons and code behind them.
added "show raw value" button, updated on-page instructions.
I think this is the final layout of the history panel editor, conversion
to html+javascript will be done "as is". If you have suggestions to improve
the layout (add/remove/move things around, etc), please shout out (on the elog
here or by direct email to me).
I am thinking in the direction of changing the control flow of the history editor:
- midas "history" manu button click redirects to
- current history panel selection (with checkbox to open old history plots), click on "new plot" button redirects to
- new page for creating new plots. This will present a list of all history variables; clicking on a variable name creates a new history
panel containing just this one variable and redirects to it.
In other words, to see the history for any history variable:
- click on "history" menu button
- click on "new"
- click on desired history variable
- see this history plot
From here, click on the "wheel" button to open the existing history panel editor and add any additional variables, change settings,
etc.
In the history panel editor, I am thinking in the direction of replacing the existing drop-down selection of history variables (not
very workable for large experiments) with an overlay dialog to show all history variables, with checkboxes to select them, basically
the same history variable select page as described above. Not sure yet how this will work visually.
K.O. |
24 Aug 2021, Stefan Ritt, Bug Fix, changes in history plots
|
One addition I would be in favour of is to remove the "Order" and replace it with drag&drop handles, because this is what people are more
used to today. Only the old guys like us remember the /etc/init.d/xx_yy scheme where one uses an integer number in the file name to
determine an order.
See for example: https://jsbin.com/hijetos/edit?js,output
But instead of relying on a foreign library, I would rather implement that myself, since I need the same thing later for the to-be-
implemented ODB editor (next year? next lockdown?)
Stefan |
19 Aug 2021, Konstantin Olchanski, Bug Report, select() FD_SETSIZE overrun
|
I am looking at the mlogger in the ALPHA anti-hydrogen experiment at CERN. It is
mysteriously misbehaving during run start and stop.
The problem turns out to be with the select() system call.
The corresponding FD_SET(), FD_ISSET() & co operate on an array of fixed size
FD_SETSIZE, value 1024, in my case. But the socket number is 1409, so we overrun
the FD_SET() array. Ouch.
I see that all uses of select() in midas have no protection against this.
(we should probably move away from select() to newer poll() or whatever it is)
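For reference, a minimal sketch of the poll() replacement for waiting on a single socket (poll() takes an array of
pollfd structures and has no FD_SETSIZE limit):

#include <poll.h>

// returns 1 if 'sock' becomes readable within 'timeout_ms' milliseconds, 0 otherwise
int wait_readable(int sock, int timeout_ms)
{
   struct pollfd pfd;
   pfd.fd = sock;          // valid even if sock >= 1024
   pfd.events = POLLIN;
   pfd.revents = 0;
   int rc = poll(&pfd, 1, timeout_ms);   // >0 ready, 0 timeout, <0 error
   return (rc > 0) && (pfd.revents & POLLIN);
}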
Why does mlogger open so many file descriptors? The usual, scaling problems in the
history. The old midas history does not reuse file descriptors, so opens the same
3 history files (.hst, .idx, etc) for each history event. The new FILE history
opens just one file per history event. But if the number of events is bigger than
1024, we run into same trouble.
(BTW, the system limit on file descriptors is 4096 on the affected machine, 1024
on some other machines, see "limit" or "ulimit -a").
K.O. |
20 Aug 2021, Stefan Ritt, Bug Report, select() FD_SETSIZE overrun
|
> I am looking at the mlogger in the ALPHA anti-hydrogen experiment at CERN. It is
> mysteriously misbehaving during run start and stop.
>
> The problem turns out to be with the select() system call.
>
> The corresponding FD_SET(), FD_ISSET() & co operate on a an array of fixed size
> FD_SETSIZE, value 1024, in my case. But the socket number is 1409, so we overrun
> the FD_SET() array. Ouch.
>
> I see that all uses of select() in midas have no protection against this.
>
> (we should probably move away from select() to newer poll() or whatever it is)
>
> Why does mlogger open so many file descriptors? The usual, scaling problems in the
> history. The old midas history does not reuse file descriptors, so opens the same
> 3 history files (.hst, .idx, etc) for each history event. The new FILE history
> opens just one file per history event. But if the number of events is bigger than
> 1024, we run into same trouble.
>
> (BTW, the system limit on file descriptors is 4096 on the affected machine, 1024
> on some other machines, see "limit" or "ulimit -a").
>
> K.O.
I cannot imagine that you have more than 1024 different events in ALPHA. That wouldn't
fit on your status page.
I have some other suspicion: The logger opens a history file on access, then closes it
again after writing to it. In the old days we had a case where we had a return from the
write function BEFORE the file had been closed. This is kind of a memory leak, but with
file descriptors. After some time of course you run out of file descriptors and crash.
Now that bug has been fixed many years ago, but it sounds to me like there is another
"fd leak" somewhere. You should add some debugging in the history code to print the
file descriptors when you open a file and when you leave that routine. The leak could
however also be somewhere else, like writing to the message file, ODB dump, ...
The right thing of course would be to rewrite everything with std::ofstream, which
automatically closes the file when the object goes out of scope.
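Something along these lines (just a sketch of the RAII idea, not the actual history code):

#include <fstream>

void append_record(const char* fname, const char* buf, std::streamsize n)
{
   std::ofstream f(fname, std::ios::binary | std::ios::app);
   if (f)
      f.write(buf, n);
}   // f is closed here automatically, even on an early return or an exception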
Stefan |
12 May 2021, Mathieu Guigue, Bug Report, mhttpd WebServer ODBTree initialization
|
Hi,
Using midas version 12-2020, I am trying to run mhttpd from within a docker container using docker-compose.
Starting from an empty ODB, I simply run `mhttpd` and this is the output I have:
midas_hatfe_1 | <Warning> Starting mhttpd...
midas_hatfe_1 | [mhttpd,INFO] ODB subtree /Runinfo corrected successfully
midas_hatfe_1 | MVOdb::SetMidasStatus: Error: MIDAS db_find_key() at ODB path "/WebServer/Host list" returned status 312
midas_hatfe_1 | Mongoose web server will not use password protection
midas_hatfe_1 | Mongoose web server will not use the hostlist, connections from anywhere will be accepted
midas_hatfe_1 | Mongoose web server listening on http address "localhost:8080", passwords OFF, hostlist OFF
midas_hatfe_1 | [mhttpd,ERROR] [mhttpd.cxx:19160:mongoose_listen,ERROR] Cannot mg_bind address "[::1]:8080"
According to the documentation, the WebServer tree should be created automatically when starting mhttpd, but it seems it is not, as mhttpd doesn't find the entry "/WebServer/Host list".
If I create it by hand (using "create STRING /WebServer/Host list"), I still get the error message that mhttpd didn't bind properly to the local port 8080.
I am not sure what is wrong, as mhttpd is working perfectly well in this exact container with midas 03-2020.
Any idea what difference makes it impossible to run in these containers anymore?
Thanks very much for your help.
Cheers
Mathieu |
12 May 2021, Ben Smith, Bug Report, mhttpd WebServer ODBTree initialization
|
> midas_hatfe_1 | Mongoose web server listening on http address "localhost:8080", passwords OFF, hostlist OFF
> midas_hatfe_1 | [mhttpd,ERROR] [mhttpd.cxx:19160:mongoose_listen,ERROR] Cannot mg_bind address "[::1]:8080"
It looks like mhttpd managed to bind to the IPv4 address (localhost), but not the IPv6 address (::1). If you don't need it, try setting "/Webserver/Enable IPv6" to false. |
12 May 2021, Stefan Ritt, Bug Report, mhttpd WebServer ODBTree initialization
|
> It looks like mhttpd managed to bind to the IPv4 address (localhost), but not the IPv6 address (::1). If you don't need it, try setting "/Webserver/Enable IPv6" to false.
We have had this issue several times already. This info should be put in the documentation in a prominent location.
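For completeness: besides odbedit or the web interface, the flag Ben mentions can also be set from a small client. A minimal sketch using the standard ODB C API (the client name "disable_ipv6" is just an example, not an existing midas program):

#include "midas.h"

// Connect to the local experiment and switch off the mhttpd IPv6 listener
// by setting "/WebServer/Enable IPv6" to "no".
int main()
{
   HNDLE hDB;
   if (cm_connect_experiment("", "", "disable_ipv6", NULL) != CM_SUCCESS)
      return 1;
   cm_get_experiment_database(&hDB, NULL);
   BOOL flag = FALSE;
   // db_set_value creates the key if it does not exist yet
   db_set_value(hDB, 0, "/WebServer/Enable IPv6", &flag, sizeof(flag), 1, TID_BOOL);
   cm_disconnect_experiment();
   return 0;
}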
Stefan |
13 May 2021, Mathieu Guigue, Bug Report, mhttpd WebServer ODBTree initialization
|
> > It looks like mhttpd managed to bind to the IPv4 address (localhost), but not the IPv6 address (::1). If you don't need it, try setting "/Webserver/Enable IPv6" to false.
>
> We had this issue already several times. This info should be put into the documentation at a prominent location.
>
> Stefan
Thanks a lot, this solved my issue! |
14 May 2021, Stefan Ritt, Bug Report, mhttpd WebServer ODBTree initialization
|
> Thanks a lot, this solved my issue!
... or we should turn IPv6 off by default, since not many people use this right now. |
02 Jun 2021, Konstantin Olchanski, Bug Report, mhttpd WebServer ODBTree initialization
|
> > Thanks a lot, this solved my issue!
>
> ... or we should turn IPv6 off by default, since not many people use this right now.
IPv6 certainly works and is used at CERN.
But I am not sure why people see this message. I do not see it on any machines at
TRIUMF, even those with IPv6 turned off.
K.O. |
05 Aug 2021, Stefan Ritt, Bug Report, mhttpd WebServer ODBTree initialization
|
Well, we all see it here at PSI, so this is enough reason to turn this off by default. Shall
I do it? |
04 Jun 2021, Andreas Suter, Bug Report, cmake with CMAKE_INSTALL_PREFIX fails
|
Hi,
if I check out midas and try to configure it with
cmake ../ -DCMAKE_INSTALL_PREFIX=/usr/local/midas
I get the following error message:
Target "midas" INTERFACE_INCLUDE_DIRECTORIES property contains path:
"<path>/tmidas/midas/include"
which is prefixed in the source directory.
Is the cmake setup not relocatable? This is new and was working until recently:
MIDAS version: 2.1
GIT revision: Thu May 27 12:56:06 2021 +0000 - midas-2020-08-a-295-gfd314ca8-dirty on branch HEAD
ODB version: 3 |
04 Jun 2021, Konstantin Olchanski, Bug Report, cmake with CMAKE_INSTALL_PREFIX fails
|
> cmake ../ -DCMAKE_INSTALL_PREFIX=/usr/local/midas
Good timing. I am working on cmake for manalyzer and rootana and have not tested
the install prefix business; now I know to test it for all 3 packages.
I will also change find_package(Midas) slightly (see my other message here);
I hope you can confirm that I do not break it for you.
K.O. |
04 Jun 2021, Konstantin Olchanski, Bug Report, cmake with CMAKE_INSTALL_PREFIX fails
|
> cmake ../ -DCMAKE_INSTALL_PREFIX=/usr/local/midas
> Is the cmake setup not relocatable? This is new and was working until recently:
Indeed. Not relocatable. This is because we do not install the header files.
When you use the CMAKE_INSTALL_PREFIX, you get MIDAS "installed" in:
prefix/lib
prefix/bin
$MIDASSYS/include <-- this is the source tree and so not "relocatable"!
Before, this was kludged and cmake did not complain about it.
Now I changed cmake to handle the include path "the cmake way", and now it knows to complain about it.
I am not sure how to fix this: we have a conflict between:
- our normal way of using midas (include $MIDASSYS/include, link $MIDASSYS/lib, run $MIDASSYS/bin)
- the cmake way (packages *must be installed* or else! but I do like install(EXPORT)!)
- and your way (midas include files are in $MIDASSYS/include, everything else is in your special location)
I think your case is strange. I am curious why you want midas libraries to be in prefix/lib instead of in
$MIDASSYS/lib (in the source tree), but are happy with header files remaining in the source tree.
K.O. |
04 Jun 2021, Andreas Suter, Bug Report, cmake with CMAKE_INSTALL_PREFIX fails
|
> > cmake ../ -DCMAKE_INSTALL_PREFIX=/usr/local/midas
> > Is the cmake setup not relocatable? This is new and was working until recently:
>
> Indeed. Not relocatable. This is because we do not install the header files.
>
> When you use the CMAKE_INSTALL_PREFIX, you get MIDAS "installed" in:
>
> prefix/lib
> prefix/bin
> $MIDASSYS/include <-- this is the source tree and so not "relocatable"!
>
> Before, this was kludged and cmake did not complain about it.
>
> Now I changed cmake to handle the include path "the cmake way", and now it knows to complain about it.
>
> I am not sure how to fix this: we have a conflict between:
>
> - our normal way of using midas (include $MIDASSYS/include, link $MIDASSYS/lib, run $MIDASSYS/bin)
> - the cmake way (packages *must be installed* or else! but I do like install(EXPORT)!)
> - and your way (midas include files are in $MIDASSYS/include, everything else is in your special location)
>
> I think your case is strange. I am curious why you want midas libraries to be in prefix/lib instead of in
> $MIDASSYS/lib (in the source tree), but are happy with header files remaining in the source tree.
>
> K.O.
We do it this way since the libs and bins need to be in a place where standard users have no access.
If I think of all the other packages I am working with, e.g. ROOT, the includes are also installed under CMAKE_INSTALL_PREFIX.
Up until recently there was no issue working with CMAKE_INSTALL_PREFIX, accepting that the includes stay under
$MIDASSYS/include; even though this is not quite the standard way, that was no problem here. Anyway, since CMAKE_INSTALL_PREFIX
is a standard cmake option, I think things should not "break" if you want to use it.
A.S. |
08 Jun 2021, Konstantin Olchanski, Bug Report, cmake with CMAKE_INSTALL_PREFIX fails
|
> > > cmake ../ -DCMAKE_INSTALL_PREFIX=/usr/local/midas
> > > Is the cmake setup not relocatable? This is new and was working until recently:
> > Not relocatable. This is because we do not install the header files.
>
> We do it this way, since the lib and bin needs to be in a place where standard users have no access to.
Hmm... I did not get this: "needs to be in a place where standard users have no access to". What do you
mean by this? Do you install midas in a secret location to prevent somebody from linking to it?
> If I think an all other packages I am working with, e.g. ROOT, the includes are also installed under CMAKE_INSTALL_PREFIX.
cmake and other frameworks tend to be like procrustean beds (https://en.wikipedia.org/wiki/Procrustes),
pre-cmake packages never quite fit perfectly, and either the legs or the heads get cut off. post-cmake packages
are constructed to fit the bed, whether it makes sense or not.
Given that this situation has been known since antiquity, I doubt we will solve it here today.
(I exercise my freedom of speech rights to state that I object to being put into
such situations. And I would like to make it clear that I hate cmake (ask me why).)
>
> Up until recently there was no issue to work with CMAKE_INSTALL_PREFIX, accepting that the includes stay under
> $MIDASSYS/include, even though this is not quite the standard way, but no problem here.
>
I think a solution would be to add install rules for the include files. There will be a bit of trouble:
the normal include path is $MIDASSYS/include, $MIDASSYS/mxml, $MIDASSYS/mjson, etc.; after installing,
it will be $CMAKE_INSTALL_PREFIX/include (all header files from the different git submodules
dumped into one directory). I do not know what problems will show up from that.
I think if midas is used as a subproject of a bigger project, this is pretty much required
(and I have seen big experiments, like STAR and ND280, do this type of stuff with CMT,
another horror and the historical precursor of cmake).
The problem is that we do not have any super-project like this here, so I can never
be sure that I have done everything correctly. cmake itself can be helpful, as
in the current situation where it told us about a problem, but I will never trust
cmake completely; I see cmake do crazy and unreasonable things way too often.
One solution would be for you or somebody else to contribute such a cmake super-project
that builds midas as a subproject, installs it with a CMAKE_INSTALL_PREFIX and
tries to link some trivial frontend or analyzer to check that everything is installed
correctly. It would become an example of "how to use midas as a subproject".
Ideally, it should be usable in a bitbucket automatic build (assuming bitbucket
has correct versions of cmake, which it does not half the time).
P.S. I already spent half a week tinkering with cmake rules, only to discover
that I broke a kludge that allows you to do something strange (if I have it right,
the CMAKE_INSTALL_PREFIX code is your contribution). This does not encourage
me to tinker with cmake even more. Who knows what other
kludge I will bump into. (Oh yes, I know, I already bumped into the nonsense
find_package(Midas) implementation.)
K.O. |
09 Jun 2021, Andreas Suter, Bug Report, cmake with CMAKE_INSTALL_PREFIX fails
|
> > > > cmake ../ -DCMAKE_INSTALL_PREFIX=/usr/local/midas
> > > > Is the cmake setup not relocatable? This is new and was working until recently:
> > > Not relocatable. This is because we do not install the header files.
> >
> > We do it this way, since the lib and bin needs to be in a place where standard users have no access to.
>
> hmm... i did not get this. "needs to be in a place where standard users have no access to". what do you
> mean by this? you install midas in a secret location to prevent somebody from linking to it?
>
This was wrong wording on my side. We do not want the users to have write access to the midas installation libs and bins.
I have submitted a pull request which should resolve this without interfering with your usage.
Hope this will resolve the issue. |
10 Jun 2021, Konstantin Olchanski, Bug Report, cmake with CMAKE_INSTALL_PREFIX fails
|
> > > > > cmake ../ -DCMAKE_INSTALL_PREFIX=/usr/local/midas
> > > > > Is the cmake setup not relocatable? This is new and was working until recently:
> > > > Not relocatable. This is because we do not install the header files.
> > >
> > > We do it this way, since the lib and bin needs to be in a place where standard users have no access to.
> >
> > hmm... i did not get this. "needs to be in a place where standard users have no access to". what do you
> > mean by this? you install midas in a secret location to prevent somebody from linking to it?
> >
>
> This was a wrong wording from my side. We do not want the the users have write access to the midas installation libs and bins.
> I have submitted the pull request which should resolve this without interfere with your usage.
> Hope this will resolve the issue.
Excellent. I think it is good to have midas "install" in a sane manner.
But I still struggle to understand what you do. Presumably you can "install" midas
in the "midas account", which is not writable by the experiment and user accounts.
Then it does not matter whether you "install" it in its build directory (like we do)
or in some other location (like you do now).
This does not work of course if you only have one account, so do you build midas
as root? or install it as root?
I do ask because in the current computing world, doing things as root requires
a certain amount of trust, which may not be there anymore; see the recent "supply chain" attacks
against python packages, the SolarWinds hack, the malicious Linux kernel patches from UMN, etc.
Personally, I do not want to answer questions like "is midas safe to run as root?" or
"can I trust the midas install scripts to run as root?", and I certainly do not want to hear
"I installed midas and 100 other packages as root and got hacked 7 days later".
(And running midas as root was never safe; neither mhttpd nor mserver would pass
a security audit.)
Anyhow, it looks like I will look at cmake again next week. Right now I have a major
breakthrough in the ALPHA-g experiment: my big 96-port Juniper switch suddenly
has working Ethernet flow control and I can record data at 600 Mbytes/sec without
any UDP packet loss. Above that, my event builder explodes. I want to fix it and get
it up to 1000 Mbytes/sec, the limit of my 10GigE network link. (In this system I do not
have the disk subsystem to record data at this rate, but I have built 8-disk ZFS arrays
that would sink it, no problem.) And the day has come when I ran out of CPU cores.
The UDP packet receivers are multithreaded, the event builder is multithreaded, and I am using
all 4 of the available cores (Intel CPU). As soon as I can get a rackmounted AMD Ryzen
or Threadripper machine, we will likely upgrade (we need at least one more CPU core to run
the online analyzer!). Exciting.
K.O. |
10 Jun 2021, Andreas Suter, Bug Report, cmake with CMAKE_INSTALL_PREFIX fails
|
> > > > > > cmake ../ -DCMAKE_INSTALL_PREFIX=/usr/local/midas
> > > > > > Is the cmake setup not relocatable? This is new and was working until recently:
> > > > > Not relocatable. This is because we do not install the header files.
> > > >
> > > > We do it this way, since the lib and bin needs to be in a place where standard users have no access to.
> > >
> > > hmm... i did not get this. "needs to be in a place where standard users have no access to". what do you
> > > mean by this? you install midas in a secret location to prevent somebody from linking to it?
> > >
> >
> > This was a wrong wording from my side. We do not want the the users have write access to the midas installation libs and bins.
> > I have submitted the pull request which should resolve this without interfere with your usage.
> > Hope this will resolve the issue.
>
> Excellent. I think it is good to have midas "install" in a sane manner.
>
> But I still struggle to understand what you do. Presumably you can "install" midas
> in the "midas account", which is not writable by the experiment and user accounts.
> Then it does not matter if you "install" it in it's build directory (like we do)
> or in some other location (like you do now).
>
> This does not work of course if you only have one account, so do you build midas
> as root? or install it as root?
>
We work the following way: there is a production Midas under, let's say, /usr/local/midas (make install as sudo/root). This is for the running experiment. Since we are doing muSR, we
have experiments on a daily basis, rather than the months or years that are typical for a particle physics experiment. Still, we would like to test updates and new features of Midas on
the same machine. For this we use the repo directly. If we are happy with the new features and fixes, we again do a 'make install' and hence freeze a specific snapshot for production.
Of course we could use various local copies of the Midas repo, but over the last years this approach has been very convenient and productive. Hope this explains a bit better
why we want to work with a CMAKE_INSTALL_PREFIX.
AS |
11 Jul 2021, Konstantin Olchanski, Bug Report, cmake with CMAKE_INSTALL_PREFIX fails
|
Big thanks to Andreas S. for getting most of this figured out. I now understand
much better how cmake installs things and how it generates config files, both
find_package(midas) style and install(export) style.
With the latest updates, CMAKE_INSTALL_PREFIX should work correctly. I now understand how it works,
how to use it and how to test it; it should not break again.
For posterity, my comments on Andreas's pull request:
Thank you for providing this code, it was very helpful. In the end I implemented things slightly differently. It took me a while to understand that I have to provide 2 "install" modes: for your case, I need to
"install" the header files and everything works "the cmake way"; for our normal case, we use the include files in-place and have to add all the git submodules to the include path. I am quite happy with the
result. K.O.
K.O. |
02 Aug 2021, Andreas Suter, Bug Report, cmake with CMAKE_INSTALL_PREFIX fails
|
Dear Konstantin,
I have tried your adopted version. You have already done quite a job, which is more consistent than what I was suggesting.
Yet, I still have a problem (git sha 2d3872dfd31) when starting on a clean system (i.e. no midas present yet):
Without CMAKE_INSTALL_PREFIX set, everything is fine.
However, when setting CMAKE_INSTALL_PREFIX, I get the following error message on the build level (cmake --build ./ -- VERBOSE=1) from the manalyzer:
[ 32%] Building CXX object manalyzer/CMakeFiles/manalyzer.dir/manalyzer.cxx.o
cd /home/l_musr_tst/Tmp/midas/build/manalyzer && /usr/bin/c++ -DHAVE_FTPLIB -DHAVE_MIDAS -DHAVE_ROOT_HTTP -DHAVE_THTTP_SERVER -DHAVE_TMFE -DHAVE_ZLIB -D_LARGEFILE64_SOURCE -I/home/l_musr_tst/Tmp/midas/manalyzer -I/usr/local/root/include -O2 -g -Wall -Wformat=2 -Wno-format-nonliteral -Wno-strict-aliasing -Wuninitialized -Wno-unused-function -std=c++11 -pipe -fsigned-char -pthread -DHAVE_ROOT -std=gnu++11 -o CMakeFiles/manalyzer.dir/manalyzer.cxx.o -c /home/l_musr_tst/Tmp/midas/manalyzer/manalyzer.cxx
In file included from /home/l_musr_tst/Tmp/midas/manalyzer/manalyzer.cxx:14:0:
/home/l_musr_tst/Tmp/midas/manalyzer/manalyzer.h:13:21: fatal error: midasio.h: No such file or directory
#include "midasio.h"
^
compilation terminated.
Obviously, some include paths are still missing. I quickly tried to see if an easy fix is possible, but I failed.
Question: is it possible to use manalyzer without midas? I am asking since the MIDAS_FOUND flag is confusing me.
> big thanks to Andreas S. for getting most of this figured out. I now understand
> much better how cmake installs things and how it generates config files, both
> find_package(midas) style and install(export) style.
>
> with the latest updates, CMAKE_INSTALL_PREFIX should work correctly. I now understand how it works,
> how to use it and how to test it, it should not break again.
>
> for posterity, my commends to Andreas's pull request:
>
> thank you for providing this code, it was very helpful. at the end I implemented things slightly differently. It took me a while to understand that I have to provide 2 “install” modes, for your case, I need to
> “install” the header files and everything works “the cmake way”, for our normal case, we use include files in-place and have to include all the git submodules to the include path. I am quite happy with the
> result. K.O.
>
> K.O. |
31 Jul 2021, Peter Kunz, Bug Report, ss_shm_name: unsupported shared memory type, bye!
|
I ran into a problem trying to compile the latest MIDAS version on a Fedora
system.
mhttpd and odbedit return:
ss_shm_name: unsupported shared memory type, bye!
check_shm_type: preferred POSIXv4_SHM got SYSV_SHM
The check returns SYSV_SHM which doesn't seem to be supported in ss_shm_name.
Is there an easy solution for this?
Thanks. |
09 Jul 2021, Konstantin Olchanski, Bug Report, cmake question
|
cmake check and mate in 1 move. please help.
The midas cmake file has a typo in the ROOT_CXX_FLAGS. I fixed it, and now I am dead in the
water and need help from cmake experts and pushers.
On Ubuntu:
ROOT_CXX_FLAGS has -std=c++14
midas cmake defines -std=gnu++11 (never mind that I asked for c++11, not "c++11 with GNU
extensions").
The two compiler flags collide and the build explodes; as best I can tell, c++11 prevails
and the ROOT header files blow up because they expect c++14.
If I remove the midas cmake request for c++11, -std=gnu++11 is gone, there is no conflict
with the ROOT C++14 request and the build works just fine.
But now it explodes on CentOS-7, because c++11 is not enabled by default (#include <mutex>
blows up).
What a mess.
K.O. |
13 Jul 2021, Konstantin Olchanski, Bug Report, cmake question
|
> cmake check and mate in 1 move. please help.
> -std=c++11 and -std=c++14 collision...
I have a solution implemented for this; I am not happy with it, and Stefan is not happy with it either. See the
discussion: https://bitbucket.org/tmidas/midas/commits/50a15aa70a4fe3927764605e8964b55a3bb1732b
K.O. |
14 Jul 2021, Konstantin Olchanski, Bug Report, cmake question
|
> > cmake check and mate in 1 move. please help.
> > -std=c++11 and -std=c++14 collision...
>
> I have a solution implemented for this, I am not happy with it, Stefan is not happy with it. See
> discussion: https://bitbucket.org/tmidas/midas/commits/50a15aa70a4fe3927764605e8964b55a3bb1732b
>
I figured it out; the solution is to use:
target_compile_features(midas PUBLIC cxx_std_11)
This is how it works:
- centos-7 (g++ has c++11 off by default): -std=gnu++11 is added automatically (not -std=c++11, but
probably correct, as some c++11 functions were available as gnu extensions)
- ubuntu-20.04 LTS without ROOT: nothing added (I guess correct, g++ has c++11 enabled by default)
- ubuntu-20.04 LTS with -std=c++14 from ROOT: nothing added, c++14 as requested by ROOT is in effect.
- macos without ROOT: -std=gnu++11 is added automatically
- macos with -std=c++11 from ROOT: ditto, so both -std=c++11 and -std=gnu++11 are present in this order,
wrong-ish, but works.
And good luck figuring this out just from the cmake documentation:
https://cmake.org/cmake/help/latest/command/target_compile_features.html
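If in doubt about which standard actually ended up in effect on a given platform, a trivial check (my own snippet, not part of the midas build):

#include <cstdio>

// Prints the C++ standard the compiler actually used:
// 201103L for C++11, 201402L for C++14, 201703L for C++17.
int main()
{
   std::printf("__cplusplus = %ld\n", (long)__cplusplus);
   return 0;
}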
K.O. |