Back Midas Rome Roody Rootana
  Midas DAQ System, Page 85 of 150  Not logged in ELOG logo
    Reply  06 Mar 2023, Gennaro Tortone, Forum, pull request for PostgreSQL support 
Hi 


> > some minutes ago I published a PR for PostgreSQL support I developed
> > at INFN-Napoli for Darkside experiment...
> > 
> > I don't know if you receive a notification about this PR and in doubt
> > I wrote this message...
> 
> Hi, Gennaro, thank you for the very useful contribution. I saw the previous version 
> of your pull request and everything looked quite good. But that pull request was 
> for an older version of midas and it would not have applied cleanly to the current 
> version. I will take a look at your updated pull request. In theory it should only 
> add the Postgres class and modify a few other places in history_schema.cxx and have 
> no changes to anything else. (if you need those changes, it should be a separate 
> pull request).
> 
> Also I am curious what benefits and drawbacks of Postgres vs mysql/mariadb you have 
> observed for storing and using midas history data.
> 
> K.O.
    Reply  06 Mar 2023, Gennaro Tortone, Forum, pull request for PostgreSQL support 
Hi Konstantin,
thanks for this update |

My main interest for PostgreSQL is usage of TimescaleDB
(https://github.com/timescale/timescaledb) a PostgreSQL extension that
makes possible usage of downsampling functions on time-series...

here at INFN-Napoli we have a large history dataset that we manage
with MIDAS history and MySQL tables. We have a lot of issues 
(wait time, browser hangs, crashes) when we use MIDAS history plot 
pages on large time period because the Javascript web page try to 
download million of records in order to display them on a plot of 
(max) 2000 pixel width...

with native downsampling we can reduce a large dataset keeping the
"shape" of the curve using only the points needed by the plot area;

in TimescaleDB there is "lttb" ( Largest Triangle Three Bucket) a very 
nice and impressive downsampling function that preserve very well the
shape of the series.

If you are interested to see a lttb at work on some data you can open this page:
   https://www.base.is/flot

In next days I will work to add TimescaleDB backend to MIDAS history (it will be
similar to PostgreSQL backend) and we can discuss on how to add these 
downsampling features to history plot web pages, I already developed some
solutions and I will be happy to share them with MIDAS community;

Cheers,
Gennaro

> > some minutes ago I published a PR for PostgreSQL support I developed
> > at INFN-Napoli for Darkside experiment...
> > 
> > I don't know if you receive a notification about this PR and in doubt
> > I wrote this message...
> 
> Hi, Gennaro, thank you for the very useful contribution. I saw the previous version 
> of your pull request and everything looked quite good. But that pull request was 
> for an older version of midas and it would not have applied cleanly to the current 
> version. I will take a look at your updated pull request. In theory it should only 
> add the Postgres class and modify a few other places in history_schema.cxx and have 
> no changes to anything else. (if you need those changes, it should be a separate 
> pull request).
> 
> Also I am curious what benefits and drawbacks of Postgres vs mysql/mariadb you have 
> observed for storing and using midas history data.
> 
> K.O.
Entry  15 Mar 2023, Casey, Forum, Having trouble with MIDAS setup 
Hi

I'm not sure if this is the right forum for this query (if it is not, I would truly appreciate it if someone could point me at the right forum). I'm having a little bit of trouble with the setup of a Midas system.

Now, this is a system that I inherited after the previous guy who was looking after it went to other employment. There was a point at which it was working. And then there was an unrelated issue in the electrical system which, as a side effect, meant that the building lost power for a time, and the entire system had to be rebooted.

No problem, I thought. I'll just reset and restart all of the software...

...and I can't seem to get it to work. I keep getting the error message "mvme_read_value: Could not perform read!: Bad address". So far as I can tell, this seems to be related to the idea that the base address being used to read from the boards is incorrect. The base addresses are hardcoded in the software (not autogenerated) and, aside from the power going down and up again, the hardware hasn't been touched since the system was working.

I imagine that there is something that needs to be set, twiddled, tweaked, or turned on in the driver. The output of 'lsmod | grep vme' is:

vmedriver             117742  0

so presumably the driver is at least *present*, even if I have no idea how to twiddle anything on it.

Could anyone perhaps suggest a way forward? Is there some way to gather the information that I need, perhaps, or some way to twiddle anything twiddle-able on the driver?

Casey
Entry  16 Mar 2023, Konstantin Olchanski, Forum, bitbucket issue spam cleaned 
midas bitbucket repository had a spam attack, about 40 spam messages were posted 
into the issues. I was able to delete them manually. No idea how they got past 
bitbucket spam filters and if they are spam or an attack against automated issue 
tracker tools or an attack against the repo owner (who is vulnerable as they rush 
in to deal with the spam). if this happens again, "anonymous issues" may have to 
be disabled, bitbucket login required. K.O.
    Reply  16 Mar 2023, Konstantin Olchanski, Forum, Having trouble with MIDAS setup 
> I'm not sure if this is the right forum for this query

this might be the right place, depending.

> I'm having a little bit of trouble with the setup of a Midas system ...
> inherited after the previous guy ...

a rather major problem, but a typical situation.

> There was a point at which it was working.

this is very good. if it worked before, there is good chance it will work again.

> And then there was an unrelated issue in the electrical system which, as a side effect,
> meant that the building lost power for a time, and the entire system had to be rebooted.
> No problem, I thought. I'll just reset and restart all of the software...
> ...and I can't seem to get it to work.

this happens often enough. several things are likely to happen:
- unexpected software updates, i.e. new linux kernel was installed but inactive, waiting for a reboot
- hardware failures, i.e. we usually see blown up power supplies. check that all VME crate voltages are okey. (ask me how).
- firmware corruption, i.e. we have seen VME modules lose their firmware after power outage, had to be reloaded by jtag

> I keep getting the error message "mvme_read_value: Could not perform read!: Bad address".

this is a generic error, it does not mean that software suddenly is trying to read from wrong address.

> I imagine that there is something that needs to be set, twiddled, tweaked, or turned on in the driver. The output of 'lsmod | grep vme' is:
> vmedriver             117742  0

this is not the vme driver we use at TRIUMF, so I am not familiar with it's errors. we use the vme_tsi148 driver and the vme_universe driver. (ask me about them).

> so presumably the driver is at least *present*, even if I have no idea how to twiddle anything on it.

could be the wrong version of the driver or the wrong version of the linux kernel. worth checking log files
to see if kernel and driver version numbers are the same.

> Could anyone perhaps suggest a way forward?

Yes. You will have to tell me much more about your system. You can do this publicly here or privately by email to olchansk@triumf.ca

To start, I need to know your VME setup, what is the crate, what is the VME processor, what OS you run, what VME driver you use, what VME modules you have installed.

K.O.
    Reply  16 Mar 2023, Konstantin Olchanski, Forum, bitbucket issue spam cleaned 
> midas bitbucket repository had a spam attack, about 40 spam messages were posted 
> into the issues. I was able to delete them manually. No idea how they got past 
> bitbucket spam filters and if they are spam or an attack against automated issue 
> tracker tools or an attack against the repo owner (who is vulnerable as they rush 
> in to deal with the spam). if this happens again, "anonymous issues" may have to 
> be disabled, bitbucket login required. K.O.

Two more spam messages, deleted. "Anonymous users can create issues" is now turned off.

K.O.
    Reply  16 Mar 2023, Konstantin Olchanski, Forum, bitbucket issue spam cleaned 
> > midas bitbucket repository had a spam attack, about 40 spam messages were posted 
> > into the issues. I was able to delete them manually. No idea how they got past 
> > bitbucket spam filters and if they are spam or an attack against automated issue 
> > tracker tools or an attack against the repo owner (who is vulnerable as they rush 
> > in to deal with the spam). if this happens again, "anonymous issues" may have to 
> > be disabled, bitbucket login required. K.O.
> 
> Two more spam messages, deleted. "Anonymous users can create issues" is now turned off.

Also, same for: rootana.
Also, empty issue trackers disabled: mvodb, midasio, mscb.

K.O.
    Reply  20 Mar 2023, Gennaro Tortone, Forum, pull request for PostgreSQL support 
Hi,
I have updated the PR with a new one that includes TimescaleDB support and some
changes to mhistory.js to support downsampling queries...

Cheers,
Gennaro

> > some minutes ago I published a PR for PostgreSQL support I developed
> > at INFN-Napoli for Darkside experiment...
> > 
> > I don't know if you receive a notification about this PR and in doubt
> > I wrote this message...
> 
> Hi, Gennaro, thank you for the very useful contribution. I saw the previous version 
> of your pull request and everything looked quite good. But that pull request was 
> for an older version of midas and it would not have applied cleanly to the current 
> version. I will take a look at your updated pull request. In theory it should only 
> add the Postgres class and modify a few other places in history_schema.cxx and have 
> no changes to anything else. (if you need those changes, it should be a separate 
> pull request).
> 
> Also I am curious what benefits and drawbacks of Postgres vs mysql/mariadb you have 
> observed for storing and using midas history data.
> 
> K.O.
Entry  21 Apr 2023, Grzegorz Nieradka, Forum, Setup Midas with Caen vx2740 - ask for help 
I'm trying to setup Midas with the Caen vx2740 VME digitizer board.
As the backend driver I used the software from Darkside located here:

https://bitbucket.org/ttriumfdaq/dsproto_vx2740/src/develop/

They implemented some helpers program and one from them should diagnose correct running of digitizer. But when I'm trying to run example program "vx2740_readout_test" I have segmentation fault:

Thread 1 "vx2740_readout_" received signal SIGSEGV, Segmentation fault.
0x00005555555c2ee1 in rpc_register_function (id=id@entry=18000, func=func@entry=0x5555555a2790 <jrpc_helper(int, void**)>) at /home/astrocent/workspace/packages/midas/src/midas.cxx:11947

During the calling this program I have running mhttpd, mlogger and the backend for vx2740 from the repository.

I'm not able to find documentation what is purpose of the RPC? Could someone give any indicators how I can start debug this behavior? Or there is some documentation about the RPC?


I'm freshman in the Midas world, so at this moment everything seems for me very complicated - and I'm learning by doing.

Regards,
Grzegorz

The backtrace from gdb which indicates the function in Midas package:

#0  0x00005555555c2ee1 in rpc_register_function (id=id@entry=18000, func=func@entry=0x5555555a2790 <jrpc_helper(int, void**)>)
    at /home/astrocent/workspace/packages/midas/src/midas.cxx:11947
#1  0x00005555555c2f12 in cm_register_function (id=id@entry=18000, func=func@entry=0x5555555a2790 <jrpc_helper(int, void**)>)
    at /home/astrocent/workspace/packages/midas/src/midas.cxx:5840
#2  0x00005555555a26f6 in VX2740GroupFrontend::init (this=this@entry=0x7fffffffcba0, group_idx=group_idx@entry=-1, hDB=hDB@entry=0, enable_jrpc=enable_jrpc@entry=true)
    at /home/astrocent/workspace/packages/ttriumfdaq-dsproto_vx2740-8122058cacd1/vx2740_fe_class.cxx:134
#3  0x000055555557e492 in do_fe (board_name=..., is_scope=<optimized out>) at /home/astrocent/workspace/packages/ttriumfdaq-dsproto_vx2740-8122058cacd1/vx2740_readout_test.cxx:185
#4  0x000055555557adc9 in main (argc=<optimized out>, argv=0x7fffffffd1f8) at /home/astrocent/workspace/packages/ttriumfdaq-dsproto_vx2740-8122058cacd1/vx2740_readout_test.cxx:253
    Reply  21 Apr 2023, Ben Smith, Forum, Setup Midas with Caen vx2740 - ask for help 
> I'm not able to find documentation what is purpose of the RPC? Could someone give any indicators how I can start debug this behavior? Or there is some documentation about the RPC?

The RPC system allows midas clients to issue commands to each other. In the case of the VX2740 code we use it so the midas webserver (mhttpd) can tell the frontend to perform some actions when a user clicks a button on a webpage.

I've been writing most of the code for the VX2740 for Darkside, so will contact you directly to help debug the issue.
    Reply  21 Apr 2023, Konstantin Olchanski, Forum, Setup Midas with Caen vx2740 - ask for help 
> I'm trying to setup Midas with the Caen vx2740 VME digitizer board.

welcome to the world of daq and midas! Ben already answered and he will help you with this specific hardware. (we work together)

> #0  0x00005555555c2ee1 in rpc_register_function (id=id@entry=18000, func=func@entry=0x5555555a2790 <jrpc_helper(int, void**)>)
>     at /home/astrocent/workspace/packages/midas/src/midas.cxx:11947

I look at this line in midas and I do not see any problems other than all functions that touch rpc_list are not thread safe,
and calling them at the same time as rpc calls are active will cause memory corruption and crash. This is not a problem
in most programs because rpc_register_function() is usually called once at the beginning of everything, before any RPCs
are received, sent or processed. I filed a bug against this problem. https://bitbucket.org/tmidas/midas/issues/362/rpc_list-is-not-thread-safe

> with is "jrpc"?

I implemented it years ago to allow web pages to call mhttpd (XHR/HTTP) to call user frontends (MIDAS RPC) to perform real-time actions,
i.e. to turn power supplied on or off. "j" stands for "json", but most experiments send very simple commands
and do not use json encoding.

K.O.
Entry  26 Apr 2023, Martin Mueller, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option 
Hi

We have a problem with running midas frontends when they should connect to a experiment on a different machine using the -h option. Starting them locally works fine. Firewall is off on both systems, Enable non-localhost RPC and Disable RPC hosts check are set to 'yes', mserver is running on the machine that we want to connect to. 
Error message looks like this: 

...
Connect to experiment Mu3e on host 10.32.113.210...
OK
Init hardware...
terminate called after throwing an instance of 'mexception'
  what():  
/home/mu3e/midas/include/odbxx.h:1102: Wrong key type in XML file
Stack trace:
1   0x00000000000042D828 (null) + 4380712
2   0x00000000000048ED4D midas::odb::odb_from_xml(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 605
3   0x0000000000004999BD midas::odb::odb(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 317
4   0x000000000000495383 midas::odb::read_key(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 1459
5   0x0000000000004971E3 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool) + 259
6   0x000000000000497636 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool) + 502
7   0x00000000000049883B midas::odb::connect_and_fix_structure(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 171
8   0x0000000000004385EF setup_odb() + 8351
9   0x00000000000043B2E6 frontend_init() + 22
10  0x000000000000433304 main + 1540
11  0x0000007F8C6FE3724D __libc_start_main + 239
12  0x000000000000433F7A _start + 42

Aborted (core dumped)


We have the same problem for all our frontends. When we want to start them locally they work. Starting them locally with ./frontend -h localhost also reproduces the error above.

The error can also be reproduced with the odbxx_test.cxx example in the midas repo by replacing line 22 in midas/examples/odbxx/odbxx_test.cxx (cm_connect_experiment(NULL, NULL, "test", NULL);) with cm_connect_experiment("localhost", "Mu3e", "test", NULL); (Put the name of the experiment instead of "Mu3e")

running odbxx_test locally gives us then the same error as our other frontend.

Thanks in advance,
Martin
    Reply  27 Apr 2023, Konstantin Olchanski, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option 
> H
> /home/mu3e/midas/include/odbxx.h:1102: Wrong key type in XML file
> Stack trace:
> 1   0x00000000000042D828 (null) + 4380712
> 2   0x00000000000048ED4D midas::odb::odb_from_xml(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 605
> 3   0x0000000000004999BD midas::odb::odb(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 317
> 4   0x000000000000495383 midas::odb::read_key(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 1459
> 5   0x0000000000004971E3 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool) + 259
> 6   0x000000000000497636 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool) + 502
> 7   0x00000000000049883B midas::odb::connect_and_fix_structure(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 171
> 8   0x0000000000004385EF setup_odb() + 8351
> 9   0x00000000000043B2E6 frontend_init() + 22
> 10  0x000000000000433304 main + 1540
> 11  0x0000007F8C6FE3724D __libc_start_main + 239
> 12  0x000000000000433F7A _start + 42
> 
> Aborted (core dumped)
> 
> 
> We have the same problem for all our frontends. When we want to start them locally they work. Starting them locally with ./frontend -h localhost also reproduces the error above.
> 
> The error can also be reproduced with the odbxx_test.cxx example in the midas repo by replacing line 22 in midas/examples/odbxx/odbxx_test.cxx (cm_connect_experiment(NULL, NULL, "test", NULL);) with cm_connect_experiment("localhost", "Mu3e", "test", NULL); (Put the name of the experiment instead of "Mu3e")
> 
> running odbxx_test locally gives us then the same error as our other frontend.
> 
> Thanks in advance,
> Martin
    Reply  27 Apr 2023, Konstantin Olchanski, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option 
Looks like your MIDAS is built without debug information (-O2 -g), the stack trace does not have file names and line numbers. Please rebuild with debug information and report the stack trace. Thanks. K.O.

> Connect to experiment Mu3e on host 10.32.113.210...
> OK
> Init hardware...
> terminate called after throwing an instance of 'mexception'
>   what():  
> /home/mu3e/midas/include/odbxx.h:1102: Wrong key type in XML file
> Stack trace:
> 1   0x00000000000042D828 (null) + 4380712
> 2   0x00000000000048ED4D midas::odb::odb_from_xml(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 605
> 3   0x0000000000004999BD midas::odb::odb(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 317
> 4   0x000000000000495383 midas::odb::read_key(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 1459
> 5   0x0000000000004971E3 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool) + 259
> 6   0x000000000000497636 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool) + 502
> 7   0x00000000000049883B midas::odb::connect_and_fix_structure(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 171
> 8   0x0000000000004385EF setup_odb() + 8351
> 9   0x00000000000043B2E6 frontend_init() + 22
> 10  0x000000000000433304 main + 1540
> 11  0x0000007F8C6FE3724D __libc_start_main + 239
> 12  0x000000000000433F7A _start + 42
> 
> Aborted (core dumped)

K.O.
    Reply  28 Apr 2023, Martin Mueller, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option 
> Looks like your MIDAS is built without debug information (-O2 -g), the stack trace does not have file names and line numbers. Please rebuild with debug information and report the stack trace. Thanks. K.O.
> 
> > Connect to experiment Mu3e on host 10.32.113.210...
> > OK
> > Init hardware...
> > terminate called after throwing an instance of 'mexception'
> >   what():  
> > /home/mu3e/midas/include/odbxx.h:1102: Wrong key type in XML file
> > Stack trace:
> > 1   0x00000000000042D828 (null) + 4380712
> > 2   0x00000000000048ED4D midas::odb::odb_from_xml(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 605
> > 3   0x0000000000004999BD midas::odb::odb(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 317
> > 4   0x000000000000495383 midas::odb::read_key(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 1459
> > 5   0x0000000000004971E3 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool) + 259
> > 6   0x000000000000497636 midas::odb::connect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool) + 502
> > 7   0x00000000000049883B midas::odb::connect_and_fix_structure(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 171
> > 8   0x0000000000004385EF setup_odb() + 8351
> > 9   0x00000000000043B2E6 frontend_init() + 22
> > 10  0x000000000000433304 main + 1540
> > 11  0x0000007F8C6FE3724D __libc_start_main + 239
> > 12  0x000000000000433F7A _start + 42
> > 
> > Aborted (core dumped)
> 
> K.O.

As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp  (with cm_connect_experiment changed to "localhost")
Stack trace of odbxx_test with line numbers:

Set ODB key "/Test/Settings/String Array 10[0...9]" = ["","","","","","","","","",""]
Created ODB key "/Test/Settings/Large String Array 10"
Set ODB key "/Test/Settings/Large String Array 10[0...9]" = ["","","","","","","","","",""]
[test,ERROR] [system.cxx:5104:recv_tcp2,ERROR] unexpected connection closure
[test,ERROR] [system.cxx:5158:ss_recv_net_command,ERROR] error receiving network command header, see messages
[test,ERROR] [midas.cxx:13900:rpc_call,ERROR] routine "db_copy_xml": error, ss_recv_net_command() status 411, program abort

Program received signal SIGABRT, Aborted.
0x00007ffff6665cdb in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: zypper install libgcc_s1-debuginfo-11.3.0+git1637-150000.1.9.1.x86_64 libstdc++6-debuginfo-11.2.1+git610-1.3.9.x86_64 libz1-debuginfo-1.2.11-3.24.1.x86_64
(gdb) bt
#0  0x00007ffff6665cdb in raise () from /lib64/libc.so.6
#1  0x00007ffff6667375 in abort () from /lib64/libc.so.6
#2  0x0000000000431bba in rpc_call (routine_id=11249) at /home/labor/midas/src/midas.cxx:13904
#3  0x0000000000460c4e in db_copy_xml (hDB=1, hKey=1009608, buffer=0x7ffff7e9c010 "", buffer_size=0x7fffffffadbc, header=false) at /home/labor/midas/src/odb.cxx:8994
#4  0x000000000046fc4c in midas::odb::odb_from_xml (this=0x7fffffffb3f0, str=...) at /home/labor/midas/src/odbxx.cxx:133
#5  0x000000000040b3d9 in midas::odb::odb (this=0x7fffffffb3f0, str=...) at /home/labor/midas/include/odbxx.h:605
#6  0x000000000040b655 in midas::odb::odb (this=0x7fffffffb3f0, s=0x4a465a "/Test/Settings") at /home/labor/midas/include/odbxx.h:629
#7  0x0000000000407bba in main () at /home/labor/midas/examples/odbxx/odbxx_test.cxx:56
(gdb) 
    Reply  28 Apr 2023, Konstantin Olchanski, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option 
> As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp  (with cm_connect_experiment changed to "localhost")
> [test,ERROR] [system.cxx:5104:recv_tcp2,ERROR] unexpected connection closure
> [test,ERROR] [system.cxx:5158:ss_recv_net_command,ERROR] error receiving network command header, see messages
> [test,ERROR] [midas.cxx:13900:rpc_call,ERROR] routine "db_copy_xml": error, ss_recv_net_command() status 411, program abort

ok, cool. looks like we crashed the mserver. either run mserver attached to gdb or enable mserver core dump, we need it's stack trace,
the correct stack trace should be rooted in the handler for db_copy_xml.

but most likely odbxx is asking for more data than can be returned through the MIDAS RPC.

what is the ODB key passed to db_copy_xml() and how much data is in ODB at that key? (odbedit "du", right?).

K.O.
    Reply  28 Apr 2023, Martin Mueller, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option 
> > As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp  (with cm_connect_experiment changed to "localhost")
> > [test,ERROR] [system.cxx:5104:recv_tcp2,ERROR] unexpected connection closure
> > [test,ERROR] [system.cxx:5158:ss_recv_net_command,ERROR] error receiving network command header, see messages
> > [test,ERROR] [midas.cxx:13900:rpc_call,ERROR] routine "db_copy_xml": error, ss_recv_net_command() status 411, program abort
> 
> ok, cool. looks like we crashed the mserver. either run mserver attached to gdb or enable mserver core dump, we need it's stack trace,
> the correct stack trace should be rooted in the handler for db_copy_xml.
> 
> but most likely odbxx is asking for more data than can be returned through the MIDAS RPC.
> 
> what is the ODB key passed to db_copy_xml() and how much data is in ODB at that key? (odbedit "du", right?).
> 
> K.O.

Ok. Maybe i have to make this more clear. ANY odbxx access of a remote odb reproduces this error for us on multiple machines. 
It does not matter how much data odbxx is asking for.

Something as simple as this reproduces the error, asking for a single integer:

int main() {
   cm_connect_experiment("localhost", "Mu3e", "test", NULL);
   midas::odb o = {
      {"Int32 Key", 42}
   };
   o.connect("/Test/Settings");
   cm_disconnect_experiment();
   return 1;
}

at the same time this runs fine:

int main() {
   cm_connect_experiment(NULL, NULL, "test", NULL);
   midas::odb o = {
      {"Int32 Key", 42}
   };
   o.connect("/Test/Settings");
   cm_disconnect_experiment();
   return 1;
}

in both cases mserver does not crash. I do not have a stack trace. There is also no error produced by mserver.

Last year we did not have these problems with the same midas frontends (For example in midas commit 9d2ef471 the code from above runs 
fine). I am trying to pinpoint the exact commit where this stopped working now. 
    Reply  28 Apr 2023, Konstantin Olchanski, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option 
> > > As i said we can easily reproduce this with midas/examples/odbxx/odbxx_test.cpp
> > ok, cool. looks like we crashed the mserver.
> Ok. Maybe i have to make this more clear. ANY odbxx access of a remote odb reproduces this error for us on multiple machines. 
> It does not matter how much data odbxx is asking for.
> midas commit 9d2ef471 the code from above runs fine

so, a regression. ouch.

if core dumps are turned off, you will not "see" the mserver crash, because the main mserver is still running. it's the mserver forked to 
serve your RPC connection that crashes.

> int main() {
>    cm_connect_experiment("localhost", "Mu3e", "test", NULL);
>    midas::odb o = {
>       {"Int32 Key", 42}
>    };
>    o.connect("/Test/Settings");
>    cm_disconnect_experiment();
>    return 1;
> }

to debug this, after cm_connect_experiment() one has to put ::sleep(1000000000); (not that big, obviously),
then while it is sleeping do "ps -efw | grep mserver", this will show the mserver for the test program,
connect to it with gdb, wait for ::sleep() to finish and o.connect() to crash, with luck gdb will show
the crash stack trace in the mserver.

so easy to debug? this is why back in the 1970-ies clever people invented core dumps, only to have
even more clever people in the 2020-ies turn them off and generally make debugging more difficult (attaching
gdb to a running program is also disabled-by-default in some recent linuxes).

rant-off.

to check if core dumps work, to "killall -7 mserver". to enable core dumps on ubuntu, see here:
https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu

last known-working point is:

commit 9d2ef471c4e4a5a325413e972862424549fa1ed5
Author: Ben Smith <bsmith@triumf.ca>
Date:   Wed Jul 13 14:45:28 2022 -0700

    Allow odbxx to handle connecting to "/" (avoid trying to read subkeys as "//Equipment" etc.

K.O.
    Reply  02 May 2023, Niklaus Berger, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option 
Thanks for all the helpful hints. When finally managing to evade all timeouts and attach the debugger in just the right moment, we find that we get a segfault in mserver at L827:
   case RPC_DB_COPY_XML:
      status = db_copy_xml(CHNDLE(0), CHNDLE(1), CSTRING(2), CPINT(3), CBOOL(4));
Some printf debugging then pointed us to the fact that the culprit is the pointer de-referencing in CBOOL(4). This in turn can be traced back to mrpc.cxx L282 ff, where the line with the arrow was missing:
  {RPC_DB_COPY_XML, "db_copy_xml",
    {{TID_INT32, RPC_IN},
     {TID_INT32, RPC_IN},
     {TID_ARRAY, RPC_OUT | RPC_VARARRAY},
     {TID_INT32, RPC_IN | RPC_OUT},
 ->  {TID_BOOL, RPC_IN},
     {0}}},

If we put that in, the mserver process completes peacfully and we get a segfault in the client ("Wrong key type in XML file") which we will attempt to debug next. Shall I create a pull request for the additional RPC argument or will you just fix this on the fly?
    Reply  02 May 2023, Niklaus Berger, Forum, Problem with running midas odbxx frontends on a remote machine using the -h option 
And now we also fixed the client segfault, odb.cxx L8992 also needs to know about the header:
if (rpc_is_remote())
      return rpc_call(RPC_DB_COPY_XML, hDB, hKey, buffer, buffer_size, header);

(last argument was missing before).
ELOG V3.1.4-2e1708b5