ID |
Date |
Author |
Topic |
Subject |
1608
|
10 Jul 2019 |
Vinzenz Bildstein | Bug Report | Frontend killed at stop of run | > > > >
> > > > For SIGKILL, my gdb reports "Program terminated with signal SIGKILL, Killed." and there is no stack
> > > > trace. Is this what you see?
> > >
> > > Yes, that is exactly what I remember seeing.
> > >
> >
> > Where would a SIGKILL come from?!?
> >
> > Look in the syslog (/var/log/messages). If the program was killed by the linux kernel, it would be logged there,
> > the usual cause is the machine runs out of memory and programs are killed by the OOM killer, this is logged
> > into the syslog, always.
> >
> > MIDAS also can issue a SIGKILL sometimes, again this is always logged in midas.log. see src/midas.c, search for SIGKILL to see
> > the exact messages printed before it is sent out.
> >
> > K.O.
>
> I haven't been able to reproduce the error from the overnight run so far. I will try and leave this running in gdb overnight to see
> if I can get that error again.
I was able to reproduce the error after an overnight run. gdb reported that the program received a SIGKILL, but no sign of it in
/var/log/messages. I've tried finding a current midas.log file, but it seems we don't have one? The most recent one was last updated
on May 24th this year. |
1613
|
11 Jul 2019 |
Vinzenz Bildstein | Bug Report | Frontend killed at stop of run | > > ... finding a current midas.log file
>
> On the "help" page, see "midas.log".
>
> Same information is in ODB, the midas log file name is concatenation of "/Logger/Data dir" and "message file".
>
> K.O.
Sorry, should have found that myself ...
Anyway, the output from midas is
Tue Jul 9 07:24:06 2019 [mhttpd,INFO] Run #13456 started
Wed Jul 10 06:23:58 2019 [mhttpd,ERROR] [system.c:4580:ss_recv_net_command,ERROR] timeout receiving network
command header
Wed Jul 10 06:23:58 2019 [mhttpd,ERROR] [midas.c:10322:rpc_client_call,ERROR] call to "fedescant" on
"grsmid00.triumf.ca" RPC "rc_transition": timeout waiting for reply
Wed Jul 10 06:24:02 2019 [mhttpd,ERROR] [midas.c:5495:cm_shutdown,ERROR] Client 'fedescant' not responding to
shutdown command
Wed Jul 10 06:24:02 2019 [mhttpd,ERROR] [midas.c:5497:cm_shutdown,ERROR] Killing and Deleting client 'fedescant'
pid 31482
Wed Jul 10 06:24:02 2019 [Logger,INFO] Client 'fedescant' on buffer 'SYSMSG' removed by cm_watchdog because
process pid 31482 does not exist
Wed Jul 10 06:24:02 2019 [fegrifip09,INFO] Client 'fedescant' on buffer 'SYSTEM' removed by cm_watchdog because
process pid 31482 does not exist
Wed Jul 10 06:24:03 2019 [mhttpd,INFO] Run #13456 stopped
And I think I tracked down where this comes from with help from Thomas Lindner. It is a problem in the communication via the A3818 card from CAEN. This seems to block the frontend, even though it still reacts normal to a shutdown. So no issue with midas, even if it seemed that way at first. Thanks for all your help! |
1680
|
08 Sep 2019 |
Vinzenz Bildstein | Bug Report | https redirect and ODB access | I'm not sure if these issues are related or not, but I'm getting an error
message when I want to access the root of the ODB via the webserver:
[mhttpd,ERROR] [mhttpd.cxx:563:rread,ERROR] Cannot read file '/root', read of
4096 returned -1, errno 21 (Is a directory)
I also tried turning the re-direct from http to https off, but this does not
seem to work. I also noticed that the redirect changes the localhost into a
hostname. Where does mongoose take this hostname from?
EDIT: Seems that the change of the hostname is due to a setting in /etc/hosts,
i.e. all my fault ...
EDIT: I think there was some issue with the mhttpd. When I checked the output (I
used screen to run it), it was full of these messages:
ss_semaphore_wait_for: semop/semtimedop(2588679) returned -1, errno 43
(Identifier removed)
al_check: Something is wrong with our semaphore, ss_semaphore_wait_for()
returned 408, aborting.
al_check: Cannot abort - this will lock you out of odb. From this point, MIDAS
will not work correctly. Please read the discussion at
https://midas.triumf.ca/elog/Midas/945
Restarted it and it stopped redirecting. So accessing the root of the ODB via
the webserver is the only issue now. |
1728
|
21 Oct 2019 |
Vinzenz Bildstein | Forum | Data for key truncated | I keep on getting messages like this:
16:25:35 [fecaen,ERROR] [odb.c:4567:db_get_data,ERROR] data for key
"/DAQ/params/VX1730/custom/Board 0/Channel 0/Input range" truncated
whenever I start my frontend. Input range is defined to be a BOOL and using
odbedit to read it shows:
Key name Type #Val Size Last Opn Mode Value
---------------------------------------------------------------------------
Input range BOOL 1 4 75h 0 RWD y
without any error message. The entry is read using
size = sizeof(fInputRange);
db_get_data(hDb, hSubKey, &fInputRange, &size, TID_BOOL);
where fInputRange is a bool.
Where does this message come from and how can I resolve this? |
859
|
11 Feb 2013 |
Wes Gohn | Forum | send_tcp error | I am getting a series of errors from MIDAS that I do not understand, so I hope
someone can help me figure this out.
I am attempting to run many frontends on one machine. I can run 8 with no
problem, but if I try to add a 9th I get errors relating to send_tcp.
I have tried adjusting the max event sizes and buffer sizes, but it has not
resolved the problem. I also tried adjusting the data rates and the total data
volume going through each frontend, but there was no change. And as far as I can
tell I am not up against any hardware limits.
The errors are repeated continuously while a run is going. The three errors I
get are:
16:45:22 [FakeData09,ERROR] [midas.c:9958:rpc_client_call,ERROR] send_tcp() failed
16:45:22 [FakeData09,ERROR] [frontend_rpc.c:191:rpc_call,ERROR] No RPC to master
16:45:22 [FakeData09,ERROR] [system.c:4166:send_tcp,ERROR]
send(socket=9,size=16) returned -1, errno: 32 (Broken pipe)
If you have any suggestions of how I can debug this, please let me know. Thanks! |
861
|
11 Feb 2013 |
Wes Gohn | Forum | send_tcp error | > > I am getting a series of errors from MIDAS that I do not understand, so I hope
> > someone can help me figure this out.
> >
> > I am attempting to run many frontends on one machine. I can run 8 with no
> > problem, but if I try to add a 9th I get errors relating to send_tcp.
> >
> > I have tried adjusting the max event sizes and buffer sizes, but it has not
> > resolved the problem. I also tried adjusting the data rates and the total data
> > volume going through each frontend, but there was no change. And as far as I can
> > tell I am not up against any hardware limits.
> >
> > The errors are repeated continuously while a run is going. The three errors I
> > get are:
> >
> > 16:45:22 [FakeData09,ERROR] [midas.c:9958:rpc_client_call,ERROR] send_tcp() failed
> > 16:45:22 [FakeData09,ERROR] [frontend_rpc.c:191:rpc_call,ERROR] No RPC to master
> > 16:45:22 [FakeData09,ERROR] [system.c:4166:send_tcp,ERROR]
> > send(socket=9,size=16) returned -1, errno: 32 (Broken pipe)
> >
> > If you have any suggestions of how I can debug this, please let me know. Thanks!
>
> Can you tell me
>
> - why you need 9 frontends
> - what kind of data your frontends produce
> - how your event builder looks like and how you assemble the fragments
> - what messages/errors you see when you run odbedit BEFORE the crash
>
> /Stefan
Our experiment will need 24 frontends that will each run on its own machine. For now we
want to run 24 "fake" frontends on one machine for testing purposes. 9 is the limit
where it stops working properly.
We have a pulser that is giving us periodic data at a constant rate. We have a master
frontend running on a different PC in interrupt mode that assembles the events, and then
N "FakeData" frontends running in polled mode on a single PC.
We do have an event builder, but we get these errors whether the event builder is
running or not.
At the start of a run, I see the following messages:
[mtransition,INFO] Run #21 started
Sat Feb 9 16:14:57 2013 [FakeData09,ERROR] [system.c:4166:send_tcp,ERROR]
send(socket=9,size=16) returned -1, errno: 104 (Connection reset by peer)
Sat Feb 9 16:14:57 2013 [FakeData09,ERROR] [midas.c:9958:rpc_client_call,ERROR]
send_tcp() failed
Sat Feb 9 16:14:57 2013 [FakeData09,ERROR] [frontend_rpc.c:191:rpc_call,ERROR] No RPC to
master
Sat Feb 9 16:14:57 2013 [master,ERROR] [midas.c:10844:recv_tcp_server,ERROR] Cannot
allocate 268435512 bytes for network buffer
Sat Feb 9 16:14:57 2013 [master,ERROR] [midas.c:12893:rpc_server_receive,ERROR]
recv_tcp_server() returned -1, abort
Sat Feb 9 16:14:57 2013 [master,TALK] Program 'FakeData09' on host 'fe01' aborted
After this it recycles just the first three errors that I mentioned above. |
865
|
19 Feb 2013 |
Wes Gohn | Forum | send_tcp error |
Thank you for the help. As it turns out, the problem was due to the fact that we were compiling MIDAS on our 64 bit backend machine, but one of the frontend machines is 32 bit. The problem was resolved by compiling a 32 bit version of MIDAS in
addition to the 64 bit version. |
990
|
15 Apr 2014 |
Wes Gohn | Forum | C++11 error | I am having some trouble creating a frontend that interacts with some libraries that use C++11.
The flag I added to my MIDAS Makefile to get the C++11 part of the code to work is -std=c++0x. This
causes an error in the equipment description in the frontend code.
The error I get is:
frontend.cpp:149: error: narrowing conversion of ‘-0x00000000000000001’ from ‘int’ to ‘WORD’ inside {
}
This corresponds to the following in the MIDAS frontend code.
EQUIPMENT equipment[] = {
{
"MWPC", /* equipment name */
{1, TRIGGER_ALL, /* event ID, trigger mask */
"BUF2", /* event buffer */
EQ_POLLED | EQ_EB, /* equipment type */
LAM_SOURCE(0, 0xFFFFFF), /* event source crate 0, all stations */
"MIDAS", /* format */
TRUE, /* enabled */
RO_RUNNING, /* read only when running */
1, /* poll for 1ms */
0, /* stop run after this event limit */
0, /* number of sub events */
0, /* don't log history */
"", "", "",},
read_trigger_event, /* readout routine */
},
{""}
}; <- this is line 149
#ifdef __cplusplus
}
#endif
Do you know a way to make this compatible with C++11?
Thanks! |
992
|
16 Apr 2014 |
Wes Gohn | Forum | C++11 error | > > I am having some trouble creating a frontend that interacts with some libraries that use C++11.
> >
> > The flag I added to my MIDAS Makefile to get the C++11 part of the code to work is -std=c++0x. This
> > causes an error in the equipment description in the frontend code.
> >
> > The error I get is:
> >
> > frontend.cpp:149: error: narrowing conversion of ‘-0x00000000000000001’ from ‘int’ to ‘WORD’ inside {
> > }
> >
> > This corresponds to the following in the MIDAS frontend code.
> >
> > EQUIPMENT equipment[] = {
> > {
> > "MWPC", /* equipment name */
> > {1, TRIGGER_ALL, /* event ID, trigger mask */
> > "BUF2", /* event buffer */
> > EQ_POLLED | EQ_EB, /* equipment type */
> > LAM_SOURCE(0, 0xFFFFFF), /* event source crate 0, all stations */
> > "MIDAS", /* format */
> > TRUE, /* enabled */
> > RO_RUNNING, /* read only when running */
> > 1, /* poll for 1ms */
> > 0, /* stop run after this event limit */
> > 0, /* number of sub events */
> > 0, /* don't log history */
> > "", "", "",},
> > read_trigger_event, /* readout routine */
> > },
> >
> > {""}
> > }; <- this is line 149
> > #ifdef __cplusplus
> > }
> > #endif
> >
> > Do you know a way to make this compatible with C++11?
> >
> > Thanks!
>
> Is this maybe related to the LAM_SOURCE(0, 0xFFFFFFFF) where 0xFFFFFFFF is -1. I guess you are not using CAMAC, so just replace the
> LAM_SOURCE(...) with zero and try again.
>
> /Stefan
Thanks for the suggestion. It looks like it is instead the TRIGGER_ALL that is causing the problem. TRIGGER_ALL is defined as -1 in midas.h. If I replace TRIGGER_ALL with 0 in the
frontend, it compiles, but if I use -1, I get the same error. I do not think that I want my trigger mask set to 0. Do you have a suggestion of how to get around this?
To answer the other questions, we are running on SLF6. I am building a frontend for a MWPC to read data from CAEN TDCs. |
1048
|
17 Mar 2015 |
Wes Gohn | Forum | PosgresQL | For our MIDAS installation at Fermilab, it is necessary that we be able to write to a PosgresQL
database (MySQL is not supported here). This will be required of both mlogger and mscb.
Has anyone done this before? And do you know of a relatively simple way of implementing it, or do
we need to replicate the mysql functions that are already in the mlogger/mscb code to add functions
that perform the equivalent Posgres commands?
Thanks!
Wes |
1083
|
29 Jul 2015 |
Wes Gohn | Forum | error | SIGSEGV is a segmentation fault. Most often it means some ODB parameter is out of bounds or there is
an invalid memcpy somewhere in your code.
> Hello, I am new in the forum. We are running an experiment for a week with no
> problems. Now we add a detector a we found an error. Even we come back to our
> previous configuration the error continues appearing. Please, may someone help
> us? You can find the error in the attachment. Thanks! |
1088
|
10 Aug 2015 |
Wes Gohn | Forum | bk_create change | After pulling the newest version of midas, our compilation would fail on
bk_create, with the error:
frontend.cpp:954: error: invalid conversion from ‘DWORD**’ to ‘void**’
frontend.cpp:954: error: initializing argument 4 of ‘void bk_create(void*,
const char*, WORD, void**)’
I noticed a change to the function in midas.c that changes the type of pdata
from a pointer to a double pointer, and changes
*((void **) pdata) = pbk + 1;
to
*pdata = pbk + 1;
The fix is simple. In each call to bk_create, I changed it to:
bk_create(pevent, bk_name, TID_DWORD, (void**)&pdata);
I suggest updating the documentation. Also, why the change? Does it add some
improvement in efficiency? |
1175
|
22 Apr 2016 |
Wes Gohn | Bug Report | Calling external script from sequencer | Can the MIDAS Sequencer call an external script? It seems that it should be able to. I have a simple
test script to do so. It claims to execute, but the bash script never appears to be executed. Any
suggestions?
1 COMMENT "This is a MSL test file"
2 RUNDESCRIPTION "Test run"
3
4 LOOP setting, 1,2, 3
5 SCRIPT test_wheel.sh ,$setting
6 TRANSITION START
7 WAIT Seconds 10
8 TRANSITION STOP
9 ENDLOOP
I've also tried using an xml script with <Script params="1">test_wheel.sh</Script>, but with the same
result.
Thanks! |
1176
|
22 Apr 2016 |
Wes Gohn | Bug Report | Calling external script from sequencer | Nevermind. I just had to give it a path to my script. Now it's fine.
> Can the MIDAS Sequencer call an external script? It seems that it should be able to. I have a simple
> test script to do so. It claims to execute, but the bash script never appears to be executed. Any
> suggestions?
>
> 1 COMMENT "This is a MSL test file"
> 2 RUNDESCRIPTION "Test run"
> 3
> 4 LOOP setting, 1,2, 3
> 5 SCRIPT test_wheel.sh ,$setting
> 6 TRANSITION START
> 7 WAIT Seconds 10
> 8 TRANSITION STOP
> 9 ENDLOOP
>
> I've also tried using an xml script with <Script params="1">test_wheel.sh</Script>, but with the same
> result.
>
> Thanks! |
1193
|
07 Sep 2016 |
Wes Gohn | Forum | ODB as JSON file | Hi. Is it currently possible to automatically save the MIDAS ODB as a JSON file?
I can do it manually in odbedit, but it looks like the only option for the
automatic ODB save for each run is the standard .ODB format. Is there a way to
change this? |
1201
|
26 Sep 2016 |
Wes Gohn | Info | mongoose v6.4 is ready for use | Since updating to the most recent midas commit, we get the following error if we try running mhttpd without su privileges:
>mhttpd -e CR --http 8081
mhttpd is running in setuid-root mode.
mhttpd is listening on port 80
Mongoose version 4 cannot listen to port 80 in setuid mode. Please use mongoose version 6. Sorry, bye!
[mhttpd,ERROR] [midas.c:1960:,ERROR] cm_disconnect_experiment not called at end of program
It works if we run it as root, but that creates other problems. Is there a flag to turn off setuid-root mode? Or some other fix?
Thanks,
Wes |
1261
|
14 Apr 2017 |
Wes Gohn | Forum | mhttpd lag | Hi everyone,
We have recently been experiencing a lot of lag with our midas control webpage,
which is making it very frustrating to use. Has anyone experienced this, and do
you have any advice to speed it up? Are there particular web browsers that work
better than others, or certain settings that can make respond more quickly?
Thanks!
Wes |
1309
|
27 Jul 2017 |
Wes Gohn | Suggestion | Increasing Max Number of Frontends | Below are the steps we used to increase the maximum number of frontends that we could run.
In midas.h
#define MAX_CLIENTS 64
changed to
#define MAX_CLIENTS 128
In msystem.h:
#define MAX_RPC_CONNECTION 64
changed to
#define MAX_RPC_CONNECTION 128
In odb.c:
assert(sizeof(BUFFER_HEADER) == 16444);
GUESS: 256*64+60 = 16444, so change 64 to 128
changed to:
assert(sizeof(BUFFER_HEADER) == 32828); //256*128+60
DATABASE_HEADER = 64 + 64*DATABASE_CLIENT = 64 + 64*8256 = 528448
changed to:
DATABASE_HEADER = 64 + 128*DATABASE_CLIENT = 64 + 128*8256 = 1056832. |
1382
|
21 Aug 2018 |
Wes Gohn | Bug Report | mserver problem | Hi. We've just updated our midas installation to the newest version, and we now see repeated errors from the
mserver in messages. Mostly we see
11:17:02.994 2018/08/21 [ODBEdit,TALK] Program mserver restarted
which happens 2-3 times per minute.
We have also been seeing occasional dropped rpc connections to our frontends, which could be related.
The version we were running with previously was ~1 year old, and we have just updated to the newest version
on bitbucket.
Thanks,
Wes |
1245
|
27 Feb 2017 |
William Moore | Suggestion | analyzer failing to load ODB parameters | Hi,
I am attempting to compile and run analysis code on a completely different,
unconnected system than the DAQ computer for the experiment. The analyzer was
developed previously and my goal is to get it running and then update it to
achieve my needs. Before compiling the analyzer, I load a backup ODB file in
odbedit, and compile experim.h. I then compile the analyzer with that experim.h
file. When I run the analyzer I get the following output:
> MIDAS version 2.1ROOT version 5.34/36Root server listening on port 9090...
> Running analyzer offline. Stop with "!"
> Configuration file "/somedir/switches.odb" loaded
> [Analyzer,INFO] Set run number 1290 in ODB
> Load ODB from run 1290...[Analyzer,INFO] cannot load value "Client Notify":
write protected
> [Analyzer,INFO] cannot load value "Prompt": write protected
.
.
.
> [Analyzer,INFO] cannot load value "LANSCE-ops": write protected
> MIDAS version 2.1ROOT version 5.34/36OK
> Configuration file "/somedir/switches.odb" loaded
> Data_Raw/run01290.mid.gz:16355 Data_Analyzed/run01290.root:15208 events, 0.43s
I have confirmed all files being used have read/write access to all users. The
analyzer does populate a .root output file with filled histograms, however not
all histograms are filled. I believe this is because histograms that relied on
an ODB paramater that failed to load did not populate. Any idea as to what I am
doing wrong or how I could resolve this issue are greatly appreciated.
Thanks,
William Moore |
|