ID |
Date |
Author |
Topic |
Subject |
2269
|
05 Aug 2021 |
Stefan Ritt | Bug Report | mhttpd WebServer ODBTree initialization | Well, we all see it here at PSI, so this is enough reason to turn this off by default. Shall
I do it? |
2270
|
19 Aug 2021 |
Konstantin Olchanski | Bug Report | select() FD_SETSIZE overrun | I am looking at the mlogger in the ALPHA anti-hydrogen experiment at CERN. It is
mysteriously misbehaving during run start and stop.
The problem turns out to be with the select() system call.
The corresponding FD_SET(), FD_ISSET() & co operate on an array of fixed size
FD_SETSIZE, value 1024, in my case. But the socket number is 1409, so we overrun
the FD_SET() array. Ouch.
I see that all uses of select() in midas have no protection against this.
(we should probably move away from select() to newer poll() or whatever it is)
Why does mlogger open so many file descriptors? The usual, scaling problems in the
history. The old midas history does not reuse file descriptors, so opens the same
3 history files (.hst, .idx, etc) for each history event. The new FILE history
opens just one file per history event. But if the number of events is bigger than
1024, we run into same trouble.
(BTW, the system limit on file descriptors is 4096 on the affected machine, 1024
on some other machines, see "limit" or "ulimit -a").
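A minimal sketch of the kind of protection described above as missing. This is not code from midas: the helper names safe_fd_set and wait_readable are made up for illustration. The first refuses to store a descriptor that does not fit into an fd_set, the second waits on a single descriptor with poll(), which takes an array of descriptors and therefore has no FD_SETSIZE limit.

```cpp
#include <cassert>
#include <poll.h>
#include <sys/select.h>

// Add fd to the set only if it fits. FD_SET() with fd >= FD_SETSIZE
// writes past the end of the fd_set, which is the overrun described above.
bool safe_fd_set(int fd, fd_set* set) {
    assert(set != nullptr);
    if (fd < 0 || fd >= FD_SETSIZE)
        return false;
    FD_SET(fd, set);
    return true;
}

// Wait for activity on fd using poll() instead of select().
// Returns 1 if poll() reports an event (readable, hang-up or error),
// 0 on timeout, -1 if poll() itself fails.
int wait_readable(int fd, int timeout_ms) {
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    const int n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;
}
```

With a guard like this, a socket number such as 1409 would be rejected with an error instead of silently corrupting memory next to the fd_set.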
K.O. |
2271
|
20 Aug 2021 |
Stefan Ritt | Bug Report | select() FD_SETSIZE overrun | > I am looking at the mlogger in the ALPHA anti-hydrogen experiment at CERN. It is
> mysteriously misbehaving during run start and stop.
>
> The problem turns out to be with the select() system call.
>
> The corresponding FD_SET(), FD_ISSET() & co operate on an array of fixed size
> FD_SETSIZE, value 1024, in my case. But the socket number is 1409, so we overrun
> the FD_SET() array. Ouch.
>
> I see that all uses of select() in midas have no protection against this.
>
> (we should probably move away from select() to newer poll() or whatever it is)
>
> Why does mlogger open so many file descriptors? The usual, scaling problems in the
> history. The old midas history does not reuse file descriptors, so opens the same
> 3 history files (.hst, .idx, etc) for each history event. The new FILE history
> opens just one file per history event. But if the number of events is bigger than
> 1024, we run into same trouble.
>
> (BTW, the system limit on file descriptors is 4096 on the affected machine, 1024
> on some other machines, see "limit" or "ulimit -a").
>
> K.O.
I cannot imagine that you have more than 1024 different events in ALPHA. That wouldn't
fit on your status page.
I have some other suspicion: The logger opens a history file on access, then closes it
again after writing to it. In the old days we had a case where we had a return from the
write function BEFORE the file has been closed. This is kind of a memory leak, but with
file descriptors. After some time of course you run out of file descriptors and crash.
Now that bug has been fixed many years ago, but it sounds to me like there is another
"fd leak" somewhere. You should add some debugging in the history code to print the
file descriptors when you open a file and when you leave that routine. The leak could
however also be somewhere else, like writing to the message file, ODB dump, ...
The right thing of course would be to rewrite everything with std::ofstream, which
automatically closes the file when the object goes out of scope.
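A minimal sketch of that idea, not taken from the midas history code. The function name append_line is hypothetical. The point is that every return path, including the early ones, closes the file, because the std::ofstream destructor runs when the object goes out of scope.

```cpp
#include <fstream>
#include <string>

// Append one line to a file. There is no explicit close(): the
// std::ofstream destructor closes the file on every return path,
// so an early return cannot leak the file descriptor.
bool append_line(const std::string& path, const std::string& line) {
    std::ofstream out(path, std::ios::app);
    if (!out.is_open())
        return false;   // open failed, nothing to close
    if (line.empty())
        return true;    // early return: file is still closed here
    out << line << '\n';
    return out.good();
}
```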
Stefan |
2278
|
28 Sep 2021 |
Richard Longland | Bug Report | Install clash between MIDAS 2020-08 and mscb | All,
I am performing a fresh install of MIDAS on an Ubuntu linux box. I follow the
usual installation procedure:
1) git clone https://bitbucket.org/tmidas/midas --recursive
2) cd midas
3) git checkout release/midas-2020-08
4) mkdir build
5) cd build
6) cmake ..
7) make
Step 3 warns me that
"warning: unable to rmdir 'manalyzer': Directory not empty" and
"warning: unable to rmdir 'midasio': Directory not empty"
Step 7 fails.
Compilation fails with an mhttpd error related to mscb:
mhttpd.cxx:8224:59: error: too few arguments to function 'int mscb_ping(int,
short unsigned int, int, int)'
8224 | status = mscb_ping(fd, (unsigned short) ind, 1);
I was able to get around this by rolling mscb back to some old version (commit
74468dd), but am extremely nervous about mix-and-matching the code this way.
Any advice would be greatly appreciated.
Cheers,
Richard |
2279
|
28 Sep 2021 |
Stefan Ritt | Bug Report | Install clash between MIDAS 2020-08 and mscb | > 1) git clone https://bitbucket.org/tmidas/midas --recursive
> 2) cd midas
> 3) git checkout release/midas-2020-08
> 4) mkdir build
> 5) cd build
> 6) cmake ..
> 7) make
When you do step 3), you get
~/tmp/midas$ git checkout release/midas-2020-08
warning: unable to rmdir 'manalyzer': Directory not empty
warning: unable to rmdir 'midasio': Directory not empty
M mjson
M mscb
M mvodb
M mxml
The 'M' in front of the submodules like mscb tells you that you
have an older version of midas (namely midas-2020-08), but the
*current* submodules, which won't match. So you also have to roll back
the submodules with:
3.5) git submodule update --recursive
This fetches those versions of the submodules which match the
midas version 2020-08. See here for details:
https://git-scm.com/book/en/v2/Git-Tools-Submodules
From where did you get the command
git checkout release/xxxx ???
If you tell me the location of that documentation, I will take
care that it will be amended with the command
git submodule update --recursive
Best,
Stefan |
2280
|
29 Sep 2021 |
Richard Longland | Bug Report | Install clash between MIDAS 2020-08 and mscb | Thank you, Stefan.
I found these instructions under
1) The changelog: https://midas.triumf.ca/MidasWiki/index.php/Changelog#2020-12
2) Konstantin's elog announcements (e.g. https://midas.triumf.ca/elog/Midas/2089)
I do see reference to updating the submodules under the TRIUMF install
instructions
(https://midas.triumf.ca/MidasWiki/index.php/Setup_MIDAS_experiment_at_TRIUMF#Install_MIDAS)
although perhaps it can be clarified.
Cheers,
Richard |
2281
|
29 Sep 2021 |
Stefan Ritt | Bug Report | Install clash between MIDAS 2020-08 and mscb | > Thank you, Stefan.
>
> I found these instructions under
> 1) The changelog: https://midas.triumf.ca/MidasWiki/index.php/Changelog#2020-12
> 2) Konstantin's elog announcements (e.g. https://midas.triumf.ca/elog/Midas/2089)
>
> I do see reference to updating the submodules under the TRIUMF install
> instructions
> (https://midas.triumf.ca/MidasWiki/index.php/Setup_MIDAS_experiment_at_TRIUMF#Install_MIDAS)
> although perhaps it can be clarified.
>
> Cheers,
> Richard
Hi Richard,
I updated the documentation at
https://midas.triumf.ca/MidasWiki/index.php/Changelog#Updating_midas
by putting the submodule update command everywhere.
Best,
Stefan |
2296
|
29 Oct 2021 |
Frederik Wauters | Bug Report | midas::odb::iterator + operator | I have an ODB key holding a 16-element array
{"FIR Energy", {
{"Energy Gap Value", std::array<uint32_t,16>(10) },
I can get the maximum of this array like
uint32_t max_value = *std::max_element(values.begin(),values.end());
but when I need the maximum of a sub range
uint32_t max_value = *std::max_element(values.begin(),values.begin()+4);
I get
/home/labor/new_daq/frontends/SIS3316Module.cpp:584:62: error: no match for ‘operator+’ (operand types are ‘midas::odb::iterator’ and ‘int’)
584 | max_value = *std::max_element(values.begin(),values.begin()+4);
| ~~~~~~~~~~~~~~^~
| | |
| | int
|
As the + operator is overloaded for midas::odb::iterator, I expected this to work.
(and yes, I can find the max element by accessing the elements one by one) |
2297
|
29 Oct 2021 |
Frederik Wauters | Bug Report | midas::odb::iterator + operator | work around | ok, so retrieving as a std::array (as it was defined) does not work
std::array<uint32_t,16> avalues = settings["FIR Energy"]["Energy Gap Value"];
but retrieving as an std::vector does, and then I have a standard c++ iterator which I can use in std stuff
std::vector<uint32_t> values = settings["FIR Energy"]["Energy Gap Value"];
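A minimal sketch of the sub-range maximum once the values are in a std::vector. It takes a plain vector as a stand-in for the ODB read shown above, and the helper name max_of_first is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Maximum of the first `count` elements. `count` is clamped to the vector
// size so that begin() + n never runs past end().
uint32_t max_of_first(const std::vector<uint32_t>& values, std::size_t count) {
    const std::size_t n = std::min(count, values.size());
    assert(n > 0);  // std::max_element on an empty range must not be dereferenced
    return *std::max_element(values.begin(),
                             values.begin() + static_cast<std::ptrdiff_t>(n));
}
```

This works because std::vector iterators are random-access iterators, so begin() + 4 is well defined.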
> I have 16 array odb key
>
> {"FIR Energy", {
> {"Energy Gap Value", std::array<uint32_t,16>(10) },
>
> I can get the maximum of this array like
>
>
> uint32_t max_value = *std::max_element(values.begin(),values.end());
>
> but when I need the maximum of a sub range
>
> uint32_t max_value = *std::max_element(values.begin(),values.begin()+4);
>
> I get
>
> /home/labor/new_daq/frontends/SIS3316Module.cpp:584:62: error: no match for ‘operator+’ (operand types are ‘midas::odb::iterator’ and ‘int’)
> 584 | max_value = *std::max_element(values.begin(),values.begin()+4);
> | ~~~~~~~~~~~~~~^~
> | | |
> | | int
> |
>
> As the + operator is overloaded for midas::odb::iterator, I expected this to work.
>
> (and yes, I can find the max element by accessing the elements one by one) |
2298
|
29 Oct 2021 |
Kushal Kapoor | Bug Report | Unknown Error 319 from client | I’m trying to run MIDAS using a frontend code/client named “fetiglab”. The run stops
after 2-3 sec with an error saying “Unknown error 319 from client 'fetiglab' on
host "localhost"”.
The frontend code compiles without any errors and MIDAS reads the frontend
successfully; the error only appears when I start a new run in MIDAS. Here are a few
more details from the terminal:
11:46:32 [fetiglab,ERROR] [odb.cxx:11268:db_get_record,ERROR] struct size
mismatch for "/" (expected size: 1, size in ODB: 41920)
11:46:32 [Logger,INFO] Deleting previous file
"/home/rcmp/online3/run00621_000.root"
11:46:32 [ODBEdit,ERROR] [midas.cxx:5073:cm_transition,ERROR] transition START
aborted: client "fetiglab" returned status 319
11:46:32 [ODBEdit,ERROR] [midas.cxx:5246:cm_transition,ERROR] Could not start a
run: cm_transition() status 319, message 'Unknown error 319 from client
'fetiglab' on host "localhost"'
TR_STARTABORT transition: cleanup after failure to start a run
I’ve also enclosed a screenshot for the same, any suggestions would be highly
appreciated. thanks |
2304
|
01 Dec 2021 |
Lars Martin | Bug Report | Off-by-one in sequencer documentation | The documentation for the sequencer loop says:
<quote>
LOOP [name ,] n ... ENDLOOP To execute a loop n times. For infinite loops, "infinite"
can be specified as n. Optionally, the loop variable running from 0...(n-1) can be accessed
inside the loop via $name.
</quote>
In fact the loop variable runs from 1...n, as can be seen by running this exciting
sequencer code:
COMMENT "Figuring out MSL"

LOOP n,4
  MESSAGE $n,1
ENDLOOP |
2305
|
02 Dec 2021 |
Stefan Ritt | Bug Report | Off-by-one in sequencer documentation | > The documentation for the sequencer loop says:
>
> <quote>
> LOOP [name ,] n ... ENDLOOP To execute a loop n times. For infinite loops, "infinite"
> can be specified as n. Optionally, the loop variable running from 0...(n-1) can be accessed
> inside the loop via $name.
> </quote>
>
> In fact the loop variable runs from 1...n, as can be seen by running this exciting
> sequencer code:
>
> COMMENT "Figuring out MSL"
>
> LOOP n,4
>   MESSAGE $n,1
> ENDLOOP
Indeed you're right. The loop variable runs from 1...n. I fixed that in the documentation.
Stefan |
2307
|
02 Dec 2021 |
Alexey Kalinin | Bug Report | some frontend kicked by cm_periodic_tasks | Hello,
We have a small experiment with MIDAS based DAQ.
Status page shows :
ES ESFrontend@192.168.0.37 207 0.2 0.000
Trigger06 Sample Frontend06@192.168.0.37 1.297M 0.3 0.000
Trigger01 Sample Frontend01@192.168.0.37 1.297M 0.3 0.000
Trigger16 Sample Frontend16@192.168.0.37 1.297M 0.3 0.000
Trigger38 Sample Frontend38@192.168.0.37 1.297M 0.3 0.000
Trigger37 Sample Frontend37@192.168.0.37 1.297M 0.3 0.000
Trigger03 Sample Frontend03@192.168.0.38 1.297M 0.3 0.000
Trigger07 Sample Frontend07@192.168.0.38 1.297M 0.3 0.000
Trigger04 Sample Frontend04@192.168.0.38 59898 0.0 0.000
Trigger08 Sample Frontend08@192.168.0.38 59898 0.0 0.000
Trigger17 Sample Frontend17@192.168.0.38 59898 0.0 0.000
And SYSTEM buffers page shows:
ESFrontend 1968 198 47520 0 0x00000000 0 193 ms
Sample Frontend06 1332547 1330826 379729872 0 0x00000000 0 1.1 sec
Sample Frontend16 1332542 1330839 361988208 0 0x00000000 0 94 ms
Sample Frontend37 1332530 1330841 337798408 0 0x00000000 0 1.1 sec
Sample Frontend01 1332543 1330829 467136688 0 0x00000000 0 34 ms
Sample Frontend38 1332528 1330830 291453608 0 0x00000000 0 1.1 sec
Sample Frontend04 63254 61467 20882584 0 0x00000000 0 208 ms
Sample Frontend08 63262 61476 27904056 0 0x00000000 0 205 ms
Sample Frontend17 63271 61473 20433840 0 0x00000000 0 213 ms
Sample Frontend03 1332549 1330818 386821728 0 0x00000000 0 82 ms
Sample Frontend07 1332554 1330821 462210896 0 0x00000000 0 37 ms
Logger 968742 0w+9500418r 0w+2718405736r 0 0x00000000 0 GET_ALL Used 0 bytes 0.0% 303 ms
rootana 254561 0w+29856958r 0w+8718288352r 0 0x00000000 0 762 ms
The problem is that eventually some of the frontends get closed with the message:
19:22:31.834 2021/12/02 [rootana,INFO] Client 'Sample Frontend38' on buffer
'SYSMSG' removed by cm_periodic_tasks because process pid 9789 does not exist
In the meantime, the mserver logs:
mserver started interactively
mserver will listen on TCP port 1175
double free or corruption (!prev)
double free or corruption (!prev)
free(): invalid next size (normal)
double free or corruption (!prev)
I can see some correlation with the number of events/event size produced by the
frontend: it fails when these become big enough.
frontend scheme is like this:
poll event time set to 0;
poll_event {
  // if buffer not transferred: return (continue cutting the main buffer)
  // read main buffer from hardware
  // buffer not transferred
}
read_event {
  // cut the main buffer into subevents (cut one event from the main buffer), return
  // if (last subevent) { buffer transferred; return }
}
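A minimal sketch of such a "cut the main buffer into subevents" scheme, to make the bookkeeping explicit. Everything in it is an assumption for illustration: the class name SubeventCutter, and a buffer layout where each subevent is a length word (payload size in 32-bit words) followed by the payload. The real hardware format is not shown in this post.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical layout: [n_words, payload ...][n_words, payload ...] ...
class SubeventCutter {
public:
    // Corresponds to "read main buffer from hardware".
    void load(std::vector<uint32_t> words) {
        buffer_ = std::move(words);
        pos_ = 0;
    }

    // True once every subevent of the current main buffer has been cut out.
    bool transferred() const { return pos_ >= buffer_.size(); }

    // Cut one subevent into `out`. Returns false when nothing is left.
    // A length word pointing past the end of the buffer discards the rest
    // instead of reading out of bounds.
    bool next(std::vector<uint32_t>& out) {
        if (transferred())
            return false;
        const std::size_t n = buffer_[pos_];
        const std::size_t remaining = buffer_.size() - pos_ - 1;
        if (n > remaining) {
            pos_ = buffer_.size();
            return false;
        }
        const auto first = buffer_.begin() + static_cast<std::ptrdiff_t>(pos_ + 1);
        out.assign(first, first + static_cast<std::ptrdiff_t>(n));
        pos_ += 1 + n;
        return true;
    }

private:
    std::vector<uint32_t> buffer_;
    std::size_t pos_ = 0;
};
```

In this picture poll_event would call load() only when transferred() is true, and read_event would call next() once per event.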
What is strange to me is that 2 frontends (1 per remote PC) are causing this.
Also, I'm running a single frontend code with the -i # flag, setting the event ID in
frontend_init, and using the SYSTEM buffer for all.
Is there something I'm missing?
Thanks.
A. |
2308
|
12 Dec 2021 |
Marius Koeppel | Bug Report | Writing MIDAS Events via FPGAs | Dear all,
from 13 Feb 2020 to 21 Feb 2020 we had a discussion about how I try to create MIDAS events directly on an FPGA and
then use DMA to hand the events over to MIDAS. In that thread I also explained how I do it in my MIDAS frontend.
For testing the DAQ I created a dummy frontend which emulates my FPGA (see attached file). The interesting code is
in the function read_stream_thread, where I just fill an array according to the 32-bit BANKS, which are 64-bit aligned (more or less
lines 306-369). And then I do:
uint32_t * dma_buf_volatile;
dma_buf_volatile = dma_buf_dummy;
copy_n(&dma_buf_volatile[0], sizeof(dma_buf_dummy)/4, pdata);
pdata+=sizeof(dma_buf_dummy);
rb_increment_wp(rbh, sizeof(dma_buf_dummy)); // in byte length
to send the data to the buffer.
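A small sketch of the word-versus-byte bookkeeping in a copy like this. The declaration of pdata is not shown in the post, so this assumes a uint32_t* write pointer. With that assumption, pointer arithmetic counts 32-bit words, while the ring buffer increment above is given in bytes. The helper name copy_words is made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Copy n_words 32-bit words to dst, advance dst past them, and return the
// number of BYTES written (the unit used for a byte-length increment).
std::size_t copy_words(const uint32_t* src, std::size_t n_words, uint32_t*& dst) {
    std::copy_n(src, n_words, dst);
    dst += n_words;  // uint32_t* arithmetic is in words, not in bytes
    return n_words * sizeof(uint32_t);
}
```

If pdata is a byte pointer instead, advancing it by sizeof(dma_buf_dummy) as in the snippet above is the matching unit.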
This summer (May - July) everything was working fine, but today I did not get the data into MIDAS.
I hopped around a bit between commits, and everything was at least working until: 3921016ce6d3444e6c647cbc7840e73816564c78.
Thanks,
Marius |
2313
|
26 Jan 2022 |
Konstantin Olchanski | Bug Report | Writing MIDAS Events via FPGAs | > today I did not get the data into MIDAS.
Any error messages printed by the frontend? any error message in midas.log? core dumps? crashes?
I do not understand what you mean by "did not get the data into midas". You create events
and send them to a midas event buffer and you do not see them there? With mdump?
Do you see this both connected locally and connected remotely through the mserver?
BTW, I see you are using the mfe.c frontend. Event data handling in mfe.c frontends
is quite convoluted and impossible to straighten out. I recommend that you use
the tmfe c++ frontend instead. Event data handling is much simplified and is easier to debug
compared to the mfe.c frontend. There are examples in the midas repository and there are
tutorials for converting frontends from mfe.c to tmfe posted in this forum here.
BTW, the commit you refer to only changed some html files, could not have affected
your data.
K.O. |
2314
|
26 Jan 2022 |
Konstantin Olchanski | Bug Report | some frontend kicked by cm_periodic_tasks | > The problem is that eventually some of frontend closed with message
> :19:22:31.834 2021/12/02 [rootana,INFO] Client 'Sample Frontend38' on buffer
> 'SYSMSG' removed by cm_periodic_tasks because process pid 9789 does not exist
This message means what it says. A client was registered with the SYSMSG buffer and this
client had pid 9789. At some point some other client (rootana, in this case) checked it and
process pid 9789 was no longer running. (it then proceeded to remove the registration).
There are 2 possibilities:
- simplest: your frontend has crashed. best to debug this by running it inside gdb, wait for
the crash.
- unlikely: reported pid is bogus, real pid of your frontend is different, the client
registration in SYSMSG is corrupted. this would indicate massive corruption of midas shared
memory buffers, not impossible if your frontend misbehaves and writes to random memory
addresses. ODB has protection against this (normally turned off, easy to enable, set ODB
"/experiment/protect odb" to yes), shared memory buffers do not have protection against this
(should be added?).
Do this. When you start your frontend, write down its pid; when you see the crash message,
confirm pid number printed is the same. As additional test, run your frontend inside gdb,
after it crashes, you can print the stack trace, etc.
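For reference, a minimal sketch of one common POSIX way to test whether a pid still exists. Whether cm_periodic_tasks does exactly this is not stated here, and the function name process_exists is made up for illustration. It can be handy for cross-checking the pid you wrote down (for your own process, process_exists(getpid()) is always true).

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// True if a process with this pid exists. kill() with signal 0 only
// performs the existence and permission checks; no signal is delivered.
bool process_exists(pid_t pid) {
    if (pid <= 0)
        return false;       // 0 and negative values address process groups
    if (kill(pid, 0) == 0)
        return true;
    return errno == EPERM;  // exists, but belongs to another user
}
```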
>
> in the meantime mserver loggging :
> mserver started interactively
> mserver will listen on TCP port 1175
> double free or corruption (!prev)
> double free or corruption (!prev)
> free(): invalid next size (normal)
> double free or corruption (!prev)
>
Are these "double free" messages coming from the mserver or from your frontend? (i.e. you run
them in different terminals, not all in the same terminal?).
If messages are coming from the mserver, this confirms possibility (1),
except that for frontends connected remotely, the pid is the pid of the mserver,
and what we see are crashes of mserver, not crashes of your frontend. These are much harder to
debug.
You will need to enable core dumps (ODB /Experiment/Enable core dumps set to "y"),
confirm that core dumps work (i.e. "killall -SEGV mserver", observe core files are created
in the directory where you started the mserver), reproduce the crash, run "gdb mserver
core.NNNN", run "bt" to print the stack trace, post the stack trace here (or email to me
directly).
>
> I can find some correlation between number of events/event size produced by
> frontend, cause its failed when its become big enough.
>
There is no limit on event size or event rate in midas, you should not see any crash
regardless of what you do. (there is a limit of event size, because an event has
to fit inside an event buffer and event buffer size is limited to 2 GB).
Obviously you hit a bug in mserver that makes it crash. Let's debug it.
One thing to try is set the write cache size to zero and see if your crash goes away. I see
some indication of something rotten in the event buffer code if write cache is enabled. This
is set in ODB "/Eq/XXX/Common/Write Cache Size", set it to zero. (beware recent confusion
where odb settings have no effect depending on value of "equipment_common_overwrite").
>
> frontend scheme is like this:
>
Best if you use the tmfe c++ frontend, event data handling is much simpler and we do not
have to debug the convoluted old code in mfe.c.
K.O.
>
> poll event time set to 0;
>
> poll_event{
> //if buffer not transferred return (continue cutting the main buffer)
> //read main buffer from hardware
> //buffer not transfered
> }
>
> read event{
> // cut the main buffer to subevents (cut one event from main buffer) return;
> //if (last subevent) {buffer transfered ;return}
> }
>
> What is strange to me that 2 frontends (1 per remote pc) causing this.
>
> Also, I'm executing one FEcode with -i # flag , put setting eventid in
> frontend_init , and using SYSTEM buffer for all.
>
> Is there something I'm missing?
> Thanks.
> A. |
2315
|
26 Jan 2022 |
Konstantin Olchanski | Bug Report | Off-by-one in sequencer documentation | > > 3 LOOP n,4
> > 4 MESSAGE $n,1
> > 5 ENDLOOP
>
> Indeed you're right. The loop variable runs from 1...n. I fixed that in the documentation.
Shades/ghosts of FORTRAN. c/c++/perl/python loops loop from 0 to n-1.
K.O. |
2317
|
26 Jan 2022 |
Marius Koeppel | Bug Report | Writing MIDAS Events via FPGAs |
> Any error messages printed by the frontend? any error message in midas.log? core dumps? crashes?
> I do not understand what you mean by "did not get the data into midas". You create events
> and send them to a midas event buffer and you do not see them there? With mdump?
> Do you see this both connected locally and connected remotely through the mserver?
I simply don't see the event counter counting up and I also don't see the events using mdump. No logs, no dumps and no crashes - everything is quiet. I only tested it locally.
> BTW, I see you are using the mfe.c frontend. Event data handling in mfe.c frontends
> is quite convoluted and impossible to straighten out. I recommend that you use
> the tmfe c++ frontend instead. Event data handling is much simplified and is easier to debug
> compared to the mfe.c frontend. There is examples in the midas repository and there are
> tutorials for converting frontends from mfe.c to tmfe posted in this forum here.
I know the code I used is really old; that's why I was so surprised that it suddenly did not work. But I am on the way to change it. Stefan also gave me some comments on how to improve the code, but applying them did not really change the behavior.
> BTW, the commit you refer to only changed some html files, could not have affected
> your data.
I just hopped around and the commit I sent was the first one which worked again. But it's of course not the one where the stuff broke. I did a bit of git-bisect and ended up with this commit as the first one where my frontend is not working anymore: 91582e4172d534bf9b10e661a423c399fd1a69f4
Cheers,
Marius |
2319
|
26 Jan 2022 |
Stefan Ritt | Bug Report | Off-by-one in sequencer documentation | > Shades/ghosts of FORTRAN. c/c++/perl/python loops loop from 0 to n-1.
for (i=1 ; i<=10 ; i++); ;-) |
2321
|
26 Jan 2022 |
Konstantin Olchanski | Bug Report | Off-by-one in sequencer documentation | > > Shades/ghosts of FORTRAN. c/c++/perl/python loops loop from 0 to n-1.
>
> for (i=1 ; i<=10 ; i++); ;-)
Similar code made big news just recently: (scroll down to the example main() program)
https://blog.qualys.com/vulnerabilities-threat-research/2022/01/25/pwnkit-local-privilege-escalation-vulnerability-discovered-in-polkits-pkexec-cve-2021-4034
I forget if the FORTRAN rules were "loop once" or "never loop" or if it was different
between Fortran-4, fortran-77, DEC extensions and IBM extension, or if it was a compiler switch.
We should check that we do something reasonable with such loops to zero:
LOOP n,0
MESSAGE $n,1
ENDLOOP
P.S. Yup. "man g77" option "-fonetrip".
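To make the two candidate behaviours concrete, here is a small sketch (not the sequencer implementation) that lists the values a 1-based loop variable takes under "never loop" (zero-trip) and "loop once" (one-trip) rules. The function name loop_values is made up for illustration.

```cpp
#include <vector>

// Values taken by a loop variable running from 1 to n.
// one_trip == false: the body is skipped when n < 1 ("never loop").
// one_trip == true:  the body runs at least once, even for n < 1.
std::vector<int> loop_values(int n, bool one_trip) {
    std::vector<int> values;
    int i = 1;
    if (one_trip) {
        do {
            values.push_back(i);
            ++i;
        } while (i <= n);
    } else {
        for (; i <= n; ++i)
            values.push_back(i);
    }
    return values;
}
```

For n = 4 both variants give 1, 2, 3, 4. They differ only for n = 0, where the zero-trip variant produces nothing and the one-trip variant produces a single pass with the variable set to 1.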
K.O. |
|