| ID |
Date |
Author |
Topic |
Subject |
|
2479
|
27 Apr 2023 |
Konstantin Olchanski | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option | Looks like your MIDAS is built without debug information: the stack trace does not have file names and line numbers. Please rebuild with debug information (-O2 -g) and report the stack trace. Thanks. K.O.
> Connect to experiment Mu3e on host 10.32.113.210...
> OK
> Init hardware...
> terminate called after throwing an instance of 'mexception'
> what():
> /home/mu3e/midas/include/odbxx.h:1102: Wrong key type in XML file
> Stack trace:
> 1 0x00000000000042D828 (null) + 4380712
> 2 0x00000000000048ED4D midas::odb::odb_from_xml(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 605
> 3 0x0000000000004999BD midas::odb::odb(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 317
> 4 0x000000000000495383 midas::odb::read_key(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 1459
> 5 0x0000000000004971E3 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool) + 259
> 6 0x000000000000497636 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool) + 502
> 7 0x00000000000049883B midas::odb::connect_and_fix_structure(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 171
> 8 0x0000000000004385EF setup_odb() + 8351
> 9 0x00000000000043B2E6 frontend_init() + 22
> 10 0x000000000000433304 main + 1540
> 11 0x0000007F8C6FE3724D __libc_start_main + 239
> 12 0x000000000000433F7A _start + 42
>
> Aborted (core dumped)
K.O. |
|
2484
|
28 Apr 2023 |
Martin Mueller | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option | > Looks like your MIDAS is built without debug information (-O2 -g), the stack trace does not have file names and line numbers. Please rebuild with debug information and report the stack trace. Thanks. K.O.
As I said, we can easily reproduce this with midas/examples/odbxx/odbxx_test.cxx (with the host name in cm_connect_experiment changed to "localhost").
Stack trace of odbxx_test with line numbers:
Set ODB key "/Test/Settings/String Array 10[0...9]" = ["","","","","","","","","",""]
Created ODB key "/Test/Settings/Large String Array 10"
Set ODB key "/Test/Settings/Large String Array 10[0...9]" = ["","","","","","","","","",""]
[test,ERROR] [system.cxx:5104:recv_tcp2,ERROR] unexpected connection closure
[test,ERROR] [system.cxx:5158:ss_recv_net_command,ERROR] error receiving network command header, see messages
[test,ERROR] [midas.cxx:13900:rpc_call,ERROR] routine "db_copy_xml": error, ss_recv_net_command() status 411, program abort
Program received signal SIGABRT, Aborted.
0x00007ffff6665cdb in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: zypper install libgcc_s1-debuginfo-11.3.0+git1637-150000.1.9.1.x86_64 libstdc++6-debuginfo-11.2.1+git610-1.3.9.x86_64 libz1-debuginfo-1.2.11-3.24.1.x86_64
(gdb) bt
#0 0x00007ffff6665cdb in raise () from /lib64/libc.so.6
#1 0x00007ffff6667375 in abort () from /lib64/libc.so.6
#2 0x0000000000431bba in rpc_call (routine_id=11249) at /home/labor/midas/src/midas.cxx:13904
#3 0x0000000000460c4e in db_copy_xml (hDB=1, hKey=1009608, buffer=0x7ffff7e9c010 "", buffer_size=0x7fffffffadbc, header=false) at /home/labor/midas/src/odb.cxx:8994
#4 0x000000000046fc4c in midas::odb::odb_from_xml (this=0x7fffffffb3f0, str=...) at /home/labor/midas/src/odbxx.cxx:133
#5 0x000000000040b3d9 in midas::odb::odb (this=0x7fffffffb3f0, str=...) at /home/labor/midas/include/odbxx.h:605
#6 0x000000000040b655 in midas::odb::odb (this=0x7fffffffb3f0, s=0x4a465a "/Test/Settings") at /home/labor/midas/include/odbxx.h:629
#7 0x0000000000407bba in main () at /home/labor/midas/examples/odbxx/odbxx_test.cxx:56
(gdb) |
|
2490
|
28 Apr 2023 |
Konstantin Olchanski | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option | > As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp (with cm_connect_experiment changed to "localhost")
> [test,ERROR] [system.cxx:5104:recv_tcp2,ERROR] unexpected connection closure
> [test,ERROR] [system.cxx:5158:ss_recv_net_command,ERROR] error receiving network command header, see messages
> [test,ERROR] [midas.cxx:13900:rpc_call,ERROR] routine "db_copy_xml": error, ss_recv_net_command() status 411, program abort
ok, cool. looks like we crashed the mserver. either run mserver attached to gdb or enable mserver core dumps; we need its stack trace.
the correct stack trace should be rooted in the handler for db_copy_xml.
but most likely odbxx is asking for more data than can be returned through the MIDAS RPC.
what is the ODB key passed to db_copy_xml() and how much data is in ODB at that key? (odbedit "du", right?).
K.O. |
|
2491
|
28 Apr 2023 |
Martin Mueller | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option | > > As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp (with cm_connect_experiment changed to "localhost")
> > [test,ERROR] [system.cxx:5104:recv_tcp2,ERROR] unexpected connection closure
> > [test,ERROR] [system.cxx:5158:ss_recv_net_command,ERROR] error receiving network command header, see messages
> > [test,ERROR] [midas.cxx:13900:rpc_call,ERROR] routine "db_copy_xml": error, ss_recv_net_command() status 411, program abort
>
> ok, cool. looks like we crashed the mserver. either run mserver attached to gdb or enable mserver core dump, we need it's stack trace,
> the correct stack trace should be rooted in the handler for db_copy_xml.
>
> but most likely odbxx is asking for more data than can be returned through the MIDAS RPC.
>
> what is the ODB key passed to db_copy_xml() and how much data is in ODB at that key? (odbedit "du", right?).
>
> K.O.
OK, maybe I have to make this clearer: ANY odbxx access of a remote ODB reproduces this error for us, on multiple machines.
It does not matter how much data odbxx is asking for.
Something as simple as this, asking for a single integer, reproduces the error:
#include "midas.h"
#include "odbxx.h"

int main() {
   cm_connect_experiment("localhost", "Mu3e", "test", NULL);
   midas::odb o = {
      {"Int32 Key", 42}
   };
   o.connect("/Test/Settings");
   cm_disconnect_experiment();
   return 1;
}
At the same time, this runs fine:
int main() {
   cm_connect_experiment(NULL, NULL, "test", NULL);
   midas::odb o = {
      {"Int32 Key", 42}
   };
   o.connect("/Test/Settings");
   cm_disconnect_experiment();
   return 1;
}
In both cases mserver does not crash, so I do not have a stack trace, and mserver does not report any error either.
Last year we did not have these problems with the same MIDAS frontends (for example, with MIDAS commit 9d2ef471 the code above runs fine). I am now trying to pinpoint the exact commit where this stopped working. |
|
2492
|
28 Apr 2023 |
Konstantin Olchanski | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option | > > > As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp
> > ok, cool. looks like we crashed the mserver.
> Ok. Maybe i have to make this more clear. ANY odbxx access of a remote odb reproduces this error for us on multiple machines.
> It does not matter how much data odbxx is asking for.
> midas commit 9d2ef471 the code from above runs fine
so, a regression. ouch.
if core dumps are turned off, you will not "see" the mserver crash, because the main mserver is still running. it's the mserver forked to
serve your RPC connection that crashes.
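To see why such a crash stays invisible, here is a minimal POSIX sketch (generic C++, not mserver code; it only assumes, as described above, that a separate process is forked to serve each RPC connection). The parent forks a child, the child aborts the way a crashing RPC handler would, and the parent keeps running and can only learn about the crash through waitpid(). The function name crash_in_forked_child() is made up for this illustration.

```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo: forks a child that crashes, the way a per-connection
// server process might, and returns the signal that killed it (0 if it
// exited normally, -1 on error). The parent itself keeps running.
int crash_in_forked_child() {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                       // fork failed
    if (pid == 0) {
        // Child: disable core files for this demo only, then crash.
        struct rlimit no_core = {0, 0};
        setrlimit(RLIMIT_CORE, &no_core);
        std::abort();                    // stands in for the crashing RPC handler
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) // the only place the parent sees the crash
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

In this sketch the child deliberately switches off its own core file so the demo leaves nothing behind; in a real debugging session you want the opposite, so that the crashed child leaves a core file to inspect with gdb.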
> int main() {
> cm_connect_experiment("localhost", "Mu3e", "test", NULL);
> midas::odb o = {
> {"Int32 Key", 42}
> };
> o.connect("/Test/Settings");
> cm_disconnect_experiment();
> return 1;
> }
to debug this, after cm_connect_experiment() one has to put ::sleep(1000000000); (not that big, obviously),
then while it is sleeping do "ps -efw | grep mserver", this will show the mserver for the test program,
connect to it with gdb, wait for ::sleep() to finish and o.connect() to crash, with luck gdb will show
the crash stack trace in the mserver.
so easy to debug? this is why back in the 1970s clever people invented core dumps, only to have
even more clever people in the 2020s turn them off and generally make debugging more difficult (attaching
gdb to a running program is also disabled by default in some recent linuxes).
rant-off.
to check if core dumps work, do "killall -7 mserver" (signal 7 is SIGBUS on x86 Linux, whose default action dumps core). to enable core dumps on Ubuntu, see here:
https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu
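If changing the system configuration is not convenient, the core-file size limit can also be raised per process. The sketch below uses plain POSIX getrlimit()/setrlimit() with made-up helper names (this is not something MIDAS provides) to raise the soft RLIMIT_CORE limit to the hard limit; processes forked afterwards, such as a per-connection mserver, inherit it. It is similar in spirit to running "ulimit -c unlimited" in the shell that starts mserver, and it assumes the hard limit is not 0 and that the kernel's core_pattern (or a crash handler such as apport or systemd-coredump) does not redirect the core file elsewhere.

```cpp
#include <cassert>
#include <sys/resource.h>

// Hypothetical helper: raises the soft core-file size limit of the calling
// process to its hard limit. Processes forked afterwards inherit the new
// limit. Returns false if the limit could not be read or changed.
bool enable_core_dumps() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return false;
    lim.rlim_cur = lim.rlim_max;  // an unprivileged process may raise soft up to hard
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}

// Hypothetical helper: true if the soft core limit already equals the hard limit.
bool core_limit_is_max() {
    struct rlimit lim;
    return getrlimit(RLIMIT_CORE, &lim) == 0 && lim.rlim_cur == lim.rlim_max;
}
```

Calling enable_core_dumps() early in the server's main() (before it forks) would be enough for the children; whether a core file actually appears still depends on the system-wide settings mentioned above.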
last known-working point is:
commit 9d2ef471c4e4a5a325413e972862424549fa1ed5
Author: Ben Smith <bsmith@triumf.ca>
Date: Wed Jul 13 14:45:28 2022 -0700
Allow odbxx to handle connecting to "/" (avoid trying to read subkeys as "//Equipment" etc.
K.O. |
|
2500
|
02 May 2023 |
Niklaus Berger | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option | Thanks for all the helpful hints. After finally managing to evade all timeouts and attach the debugger at just the right moment, we found a segfault in mserver at L827:
case RPC_DB_COPY_XML:
   status = db_copy_xml(CHNDLE(0), CHNDLE(1), CSTRING(2), CPINT(3), CBOOL(4));
Some printf debugging then pointed us to the culprit: the pointer dereferencing in CBOOL(4). This in turn can be traced back to mrpc.cxx L282 ff., where the line marked with the arrow was missing:
{RPC_DB_COPY_XML, "db_copy_xml",
{{TID_INT32, RPC_IN},
{TID_INT32, RPC_IN},
{TID_ARRAY, RPC_OUT | RPC_VARARRAY},
{TID_INT32, RPC_IN | RPC_OUT},
-> {TID_BOOL, RPC_IN},
{0}}},
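The failure mode is a mismatch between a table-driven parameter list and the routine it describes: db_copy_xml() takes five arguments and the server handler uses CBOOL(4), but the table only declared four. The sketch below uses a deliberately simplified, hypothetical descriptor struct and constants (ParamDesc, kInt32, kBool, kIn, etc. are illustrative, not the MIDAS definitions) to show how counting entries up to the {0} terminator and comparing with the routine's arity could catch the missing TID_BOOL line at compile time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified RPC parameter descriptor (not the MIDAS struct).
struct ParamDesc {
    int tid;    // parameter type; 0 terminates the list, like the {0} entry
    int flags;  // direction flags
};

// Illustrative constants only; the values are not those used by MIDAS.
constexpr int kEnd = 0, kInt32 = 1, kArray = 2, kBool = 3;
constexpr int kIn = 1, kOut = 2, kVarArray = 4;

// Counts declared parameters up to the terminating {0} entry.
constexpr std::size_t declared_params(const ParamDesc* p) {
    std::size_t n = 0;
    while (p[n].tid != kEnd)
        ++n;
    return n;
}

// db_copy_xml(hDB, hKey, buffer, buffer_size, header) has five parameters.
constexpr std::size_t kDbCopyXmlArgs = 5;

// Table as it was: the trailing bool "header" entry is missing.
constexpr ParamDesc kDbCopyXmlOld[] = {
    {kInt32, kIn}, {kInt32, kIn}, {kArray, kOut | kVarArray},
    {kInt32, kIn | kOut}, {kEnd, 0}};

// Table with the missing entry added.
constexpr ParamDesc kDbCopyXmlNew[] = {
    {kInt32, kIn}, {kInt32, kIn}, {kArray, kOut | kVarArray},
    {kInt32, kIn | kOut}, {kBool, kIn}, {kEnd, 0}};

// Compiles only if the table and the signature agree;
// the same check on kDbCopyXmlOld would fail to compile.
static_assert(declared_params(kDbCopyXmlNew) == kDbCopyXmlArgs,
              "RPC table for db_copy_xml does not match its signature");
```

Such a check needs one arity constant per routine, kept next to its prototype; this is a design suggestion, not part of MIDAS.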
If we put that in, the mserver process completes peacefully and we get a segfault in the client ("Wrong key type in XML file"), which we will attempt to debug next. Shall I create a pull request for the additional RPC argument, or will you just fix this on the fly? |
|
2501
|
02 May 2023 |
Niklaus Berger | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option | And now we have also fixed the client segfault: in odb.cxx (L8992), the remote branch of db_copy_xml() also needs to pass on the header argument:
if (rpc_is_remote())
   return rpc_call(RPC_DB_COPY_XML, hDB, hKey, buffer, buffer_size, header);
(last argument was missing before). |
|
2502
|
02 May 2023 |
Stefan Ritt | Forum | Problem with running midas odbxx frontends on a remote machine using the -h option |
> Shall I create a pull request for the additional RPC argument or will you just fix this on the fly?
Just fix it on the fly yourself. It’s an obvious bug, so please commit to develop.
Stefan |
|
2254
|
08 Jul 2021 |
Francesco Renga | Forum | Problem with python file reader | Dear experts,
while trying to read out a MIDAS file from a Python script, I get the error below at the very first event. Any hint?
Thank you very much,
Francesco
File "/home/cygno/DAQ/offline/file_reader.py", line 9, in <module>
for event in mfile:
File "/home/cygno/DAQ/python/midas/file_reader.py", line 159, in __next__
ev = self.read_next_event()
File "/home/cygno/DAQ/python/midas/file_reader.py", line 264, in read_next_event
return self.read_this_event_body()
File "/home/cygno/DAQ/python/midas/file_reader.py", line 307, in read_this_event_body
self.event.unpack_body(body_data, 0, self.use_numpy)
File "/home/cygno/DAQ/python/midas/event.py", line 648, in unpack_body
bank.fill_header_from_bytes(bank_header_data, self.is_bank_32(), self.is_bank_data_64bit_aligned())
File "/home/cygno/DAQ/python/midas/event.py", line 298, in fill_header_from_bytes
self.name = "".join(x.decode('ascii') for x in unpacked[:4])
File "/home/cygno/DAQ/python/midas/event.py", line 298, in <genexpr>
self.name = "".join(x.decode('ascii') for x in unpacked[:4])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc8 in position 0: ordinal not in range(128) |
|
2255
|
09 Jul 2021 |
Ben Smith | Forum | Problem with python file reader | Hi Francesco,
Can you send me an example file to look at please? Either attached to the elog or sent directly to bsmith@triumf.ca
Thanks,
Ben |
|
1279
|
26 Apr 2017 |
Francesco Renga | Forum | Problem with logger at run start | Dear experts,
we have a problem when trying to run a MIDAS DAQ which worked in the past on the same PC (but on a different
network). We get the following error messages when starting a new run:
Wed Apr 26 23:03:12 2017 [mhttpd,ERROR] [midas.c:9106:rpc_client_connect,ERROR] cannot connect to host "scarlett", port 44858: connect() returned -1, errno 113 (No route to host)
Wed Apr 26 23:03:12 2017 [mhttpd,ERROR] [midas.c:3539:cm_transition_call,ERROR] cannot connect to client "Logger" on host scarlett, port 44858, status 503
(scarlett is indeed the hostname of the PC). The error occurs even if the PC is disconnected from the network.
Any suggestion?
Best Regards,
Francesco |
|
1281
|
26 Apr 2017 |
Stefan Ritt | Forum | Problem with logger at run start | Dear Francesco,
Your error (No route to host) typically means that you have a network problem outside of MIDAS. Your computer has to "find itself" and
this is probably broken. Try "ping scarlett" or "nslookup scarlett" and you will see that the DNS server can't be reached or is
wrongly configured. Sometimes it helps to put scarlett explicitly into /etc/hosts.
Stefan
> Dear experts,
> we have a problem when trying to run a MIDAS DAQ which worked in the past on the same PC (but on a different
> network). We get the following error messages when starting a new run:
>
> Wed Apr 26 23:03:12 2017 [mhttpd,ERROR] [midas.c:9106:rpc_client_connect,ERROR] cannot connect to host "scarlett", port 44858: connect() returned -1, errno 113 (No route to host)
> Wed Apr 26 23:03:12 2017 [mhttpd,ERROR] [midas.c:3539:cm_transition_call,ERROR] cannot connect to client "Logger" on host scarlett, port 44858, status 503
>
> (scarlett is indeed the hostname of the PC). The error occurs even if the PC is disconnected from the network.
>
> Any suggestion?
>
> Best Regards,
> Francesco |
|
1283
|
26 Apr 2017 |
Francesco Renga | Forum | Problem with logger at run start | Dear Stefan,
thank you very much for your reply. We could finally fix the problem by replacing "scarlett" with "scarlett.localdomain" in our
hostname configuration file /etc/hostname (under Debian).
Best Regards,
Francesco
> Dear Francesco,
>
> Your error (No route to host) typically means that you have a network problem outside of MIDAS. Your computer has to "find itself" and
> this is probably broken. Try to do a "ping scarlett" or "nslookup scarlett" and you will see that the DNS server can't be reached or is
> wrongly configured. Sometimes it helps to put scarlett explicitly into /etc/hosts
>
> Stefan |
|
1287
|
02 May 2017 |
Konstantin Olchanski | Forum | Problem with logger at run start | > Wed Apr 26 23:03:12 2017 [mhttpd,ERROR] [midas.c:9106:rpc_client_connect,ERROR] cannot connect to host "scarlett", port 44858: connect() returned -1, errno 113 (No route to host)
Forgot to reply to this: if you read the error messages, you will see the actual problem is "no route to host". The next step
is to ping the same hostname or try "telnet hostname 22" (cut-and-paste the hostname from the error message
to avoid the common pitfall of not seeing a typo, e.g. "ping host00" works while MIDAS connecting to "hostOO" does not (zero vs. capital O)).
In your case you had the wrong hostname ("foo" and "foo.localdomain" resolve to different IP addresses; one works, the other
does not). You can also try to use the IP address instead of the hostname; this avoids hostname resolution problems
(inconsistency between /etc/hosts and hostnames in DNS is very easy to have when using self-made private networks).
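A quick way to compare what "foo" and "foo.localdomain" resolve to, through the same resolver path a TCP client uses, is getaddrinfo(). The sketch below is generic POSIX C++; resolve_host() is a helper name made up for this example and is not part of MIDAS.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netdb.h>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <vector>

// Hypothetical helper: resolves a host name with getaddrinfo() and returns
// every address as text. An empty result means resolution failed.
std::vector<std::string> resolve_host(const std::string& host) {
    std::vector<std::string> result;
    struct addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // accept IPv4 and IPv6
    hints.ai_socktype = SOCK_STREAM;  // one entry per address, not per socket type
    struct addrinfo* res = nullptr;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &res) != 0)
        return result;
    for (struct addrinfo* ai = res; ai != nullptr; ai = ai->ai_next) {
        char buf[INET6_ADDRSTRLEN];
        const void* addr = nullptr;
        if (ai->ai_family == AF_INET)
            addr = &reinterpret_cast<struct sockaddr_in*>(ai->ai_addr)->sin_addr;
        else if (ai->ai_family == AF_INET6)
            addr = &reinterpret_cast<struct sockaddr_in6*>(ai->ai_addr)->sin6_addr;
        if (addr != nullptr && inet_ntop(ai->ai_family, addr, buf, sizeof buf) != nullptr)
            result.emplace_back(buf);
    }
    freeaddrinfo(res);
    return result;
}
```

Printing resolve_host("scarlett") and resolve_host("scarlett.localdomain") side by side would show whether the two names map to different addresses. Unlike getaddrinfo(), nslookup queries the DNS servers directly and ignores /etc/hosts, which is one way the two views can disagree.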
K.O. |
|
1024
|
14 Oct 2014 |
Konstantin Olchanski | Bug Report | Problem with EQ_USER | If you use EQ_USER in mfe.c and have multiple threads writing into the ring buffer, you will have a big
problem: the thread locking in the ring buffer code only works for a single writer thread and a single
reader thread.
Presently, it is not clear how to have multiple multithreaded equipments inside one frontend.
During the summer of 2013, code briefly existed in mfe.c to have an array of ring buffers, so that each
multithreaded equipment could write into its own buffer.
But this code is now removed: mfe.c can only read from a single ring buffer and, as I noted above, ring
buffer locking requires that only a single thread writes into it.
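The single-writer restriction is inherent to the usual lock-free ring-buffer design. The following is a generic single-producer/single-consumer sketch in C++17 (SpscRing is an illustrative class, not the MIDAS rb_* code): correctness relies on exactly one thread advancing the write index and exactly one thread advancing the read index. With two writers, both could load the same head index, fill the same slot and publish the same new index, so one event would be silently lost.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Illustrative single-producer/single-consumer ring buffer.
// push() may be called from ONE writer thread only, pop() from ONE reader
// thread only. One slot is kept empty to tell "full" from "empty".
template <typename T, std::size_t N>
class SpscRing {
public:
    bool push(const T& value) {                        // writer thread only
        std::size_t head = head_.load(std::memory_order_relaxed);
        std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire))
            return false;                              // full
        slots_[head] = value;
        head_.store(next, std::memory_order_release);  // publish the slot to the reader
        return true;
    }

    std::optional<T> pop() {                           // reader thread only
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return std::nullopt;                       // empty
        T value = slots_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);  // hand the slot back
        return value;
    }

private:
    std::array<T, N> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to write (owned by the writer)
    std::atomic<std::size_t> tail_{0};  // next slot to read (owned by the reader)
};
```

Because each index has a single owner, no lock is needed; supporting several writer threads this way means giving each writer its own buffer, which is the scheme discussed in the replies.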
K.O. |
|
1028
|
15 Oct 2014 |
Stefan Ritt | Bug Report | Problem with EQ_USER | Sure, each thread needs its own ring buffer for writing.
So I see that we need the multiple-ring-buffer readout scheme back even before MEG starts. So what you need is something like
for (i=0 ; rb[i] != 0 ; i++) {
read event from rb[i];
}
as it was before. What I do not like is that rb is a global variable; we should rather use the encapsulation functions and extend get_event_rb() to
get_event_rb(i), so you can have N ring buffers.
Give me one day; I will extend the current code to make it work again and to implement N threads.
Cheers,
Stefan |
|
1029
|
16 Oct 2014 |
Stefan Ritt | Bug Report | Problem with EQ_USER | I restructured the front-end code to enable multiple readout threads for EQ_USER equipment. Last summer I was interrupted during
that work and left it in a half-finished state, sorry for that.
The way it works now is illustrated in mtfe.c. You create N ring buffers and N threads via
for (int i=0 ; i<N ; i++) {
   create_event_rb(i);
   ss_thread_create(trigger_thread, (void*)(PTYPE)i);
}
then each readout thread accesses its own readout buffer:
thread(...)
{
   index = (int)(PTYPE)param;
   signal_readout_thread_active(index, TRUE);
   rbh = get_event_rbh(index);
   while (is_readout_thread_enabled()) {
      ... read event and put it into ring buffer ...
   }
   signal_readout_thread_active(index, FALSE);
}
The is_readout_thread_enabled() and signal_readout_thread_active() functions are used by the framework to shut down the threads gracefully at the end
of the program, so that each thread can close its hardware correctly.
Note that no other thread management is done by the framework. In the old days with interrupt equipment, the framework disabled interrupts
when reading out periodic events, since that was necessary when using a single CAMAC crate for ADCs and scalers. This is obsolete now and not
needed any longer. It is now the responsibility of the user code to resolve hardware access conflicts between different threads (like using a local
mutex to access the same hardware). There is also no "readout when running" handling. If events should not be read out when the run is stopped,
the readout thread has to check the run status, or, better, the EOR routine should disable the hardware trigger and the BOR routine should re-enable
it. The readout threads will then poll for new events and just go to sleep if nothing is there.
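The shutdown handshake described above can be sketched with standard C++ threads. The names g_readout_enabled, g_active_threads, readout_thread() and run_readout() are hypothetical stand-ins for what is_readout_thread_enabled() and signal_readout_thread_active() provide; this is not the mtfe.c code. Each thread writes only into its own buffer. For brevity the buffers are inspected only after the threads have stopped, whereas a real frontend drains its ring buffers while the threads run.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the framework state (not the MIDAS implementation).
std::atomic<bool> g_readout_enabled{false};
std::atomic<int> g_active_threads{0};

// One readout thread: it is the only writer of its own buffer.
void readout_thread(int index, std::vector<int>& own_buffer) {
    g_active_threads.fetch_add(1);          // like signal_readout_thread_active(index, TRUE)
    while (g_readout_enabled.load()) {      // like is_readout_thread_enabled()
        if (own_buffer.size() < 100)
            own_buffer.push_back(index);    // stands in for "read event, put it into buffer"
        else
            std::this_thread::sleep_for(std::chrono::milliseconds(1));  // nothing there: sleep
    }
    // per-thread hardware shutdown would go here
    g_active_threads.fetch_sub(1);          // like signal_readout_thread_active(index, FALSE)
}

// Starts n readout threads, lets them run briefly, then shuts them down
// gracefully. Returns the number of events each thread produced.
std::vector<std::size_t> run_readout(int n) {
    std::vector<std::vector<int>> buffers(n);  // one buffer per thread, never resized
    std::vector<std::thread> threads;
    g_readout_enabled = true;
    for (int i = 0; i < n; i++)
        threads.emplace_back(readout_thread, i, std::ref(buffers[i]));
    while (g_active_threads.load() < n)        // wait until every thread reports "active"
        std::this_thread::yield();
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    g_readout_enabled = false;                 // end of program: ask threads to stop
    while (g_active_threads.load() > 0)        // wait until each thread has cleaned up
        std::this_thread::yield();
    for (auto& t : threads)
        t.join();
    std::vector<std::size_t> counts;
    for (const auto& b : buffers)
        counts.push_back(b.size());
    return counts;
}
```

The main thread first waits until every thread reports itself active, so a shutdown request cannot overtake a thread that has not started yet; hardware-access conflicts between threads remain the user's responsibility, as noted above.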
I tested the mtfe.c program at 100 Hz and 1 MHz event rates on a dummy experiment (no hardware access) and it worked without problems.
Let me know if there is any issue left over.
/Stefan |
|
824
|
10 Aug 2012 |
Carl Blaksley | Forum | Problem with CAMAC controlled by CES8210 and read out by CAEN V1718 VME controller | Hello all,
I am trying to put together a system to read out several CAMAC ADCs. The CAMAC crate is
read by a CES8210 CAMAC-to-VME controller. The VME crate is then interfaced to a
computer through a CAEN V1718 USB control module. Has anyone gotten the latter to
work?
Previous users seemed to indicate that they had, here:
https://ladd00.triumf.ca/elog/Midas/493
but I am having problems getting this example frontend to compile. What should be set as
the driver in the makefile, for example? If I put v1718 there, I receive
numerous errors from the CAENVMElib files.
If someone else has gotten the V1718 running, I would be grateful for their
insight.
Thanks,
-Carl |
|
1164
|
22 Feb 2016 |
ZiyiGuo | Forum | Problem with BLTRead | Dear all,
I'm using MIDAS and a CAEN V1721 to digitize the waveforms from photomultipliers (the
link bridge to the PC is a V2718). I use BLTRead to read data from the digitizer, but
I found that if the event rate is high (about 100 KB/s), the communication
between the V1721 and the PC is randomly suspended, and I get an error code of -2. Could you
give me some suggestions? Thanks a lot. |
|
1165
|
23 Feb 2016 |
Pierre-Andre Amaudruz | Forum | Problem with BLTRead | > Dear all,
>
> I'm using MIDAS system and CAEN V1721 to digitize the waveform from photomultipliers (
> and the link bridge to PC is V2718 ). I use BLTRead to read data of the digitizer, but
> I found that if the event counting rate is high ( about 100KB/s ), the communication
> of V1721 and PC would be suspended randomly, and I get an error code of -2. Could you
> give me some suggestion? Thanks a lot.
Hi,
Can you provide the code fragment with the BLTRead call, and the PC's /var/log/messages from the time of
the hang-up?
What is needed to restart the DAQ?
PAA |
|