Midas DAQ System
  708   27 Jun 2010   Jimmy Ngai   Forum   Error connecting to back-end computer
> Hi, there. I have not recently run mserver through inetd, and we usually do not do
> that at TRIUMF. We do this:
> 
> a) on the main computer: start mserver: "mserver -p 7070 -D" (note - use non-default
> port - can use different ports for different experiments)
> b) on remote computer: "odbedit -h main:7070" ("main" is the hostname of your main
> computer). Use same "-h" switch for all other programs, including the frontends.
> 
> This works well when all computers are on the same network, but if you have some
> midas clients running on private networks you may get into trouble when they try to
> connect to each other and fail because network routing is funny.

Hi K.O.,

Thanks for your reply. I have tried your way but I got the same error: 

[midas.c:8623:rpc_server_connect,ERROR] mserver subprocess could not be started 
(check path)

My front-end and back-end computers are on the same network connected by a router. I 
have allowed port 7070 in the firewall and done the port forwarding in the router (for 
connecting from outside the network). From the error message it seems that some 
processes can not be started automatically. Could it be related to some security 
settings such as the SELinux?

Best Regards,
Jimmy
  709   28 Jun 2010   Stefan Ritt   Forum   Error connecting to back-end computer
> > Hi, there. I have not recently run mserver through inetd, and we usually do not do
> > that at TRIUMF. We do this:
> > 
> > a) on the main computer: start mserver: "mserver -p 7070 -D" (note - use non-default
> > port - can use different ports for different experiments)
> > b) on remote computer: "odbedit -h main:7070" ("main" is the hostname of your main
> > computer). Use same "-h" switch for all other programs, including the frontends.
> > 
> > This works well when all computers are on the same network, but if you have some
> > midas clients running on private networks you may get into trouble when they try to
> > connect to each other and fail because network routing is funny.
> 
> Hi K.O.,
> 
> Thanks for your reply. I have tried your way but I got the same error: 
> 
> [midas.c:8623:rpc_server_connect,ERROR] mserver subprocess could not be started 
> (check path)
> 
> My front-end and back-end computers are on the same network connected by a router. I 
> have allowed port 7070 in the firewall and done the port forwarding in the router (for 
> connecting from outside the network). From the error message it seems that some 
> processes can not be started automatically. Could it be related to some security 
> settings such as the SELinux?

The way connections work under Midas is there is a callback scheme. The client starts 
mserver on the back-end, then the back-end connects back to the front-end on three 
different ports. These ports are assigned dynamically by the operating system and are 
typically in the range 40000-60000. So you also have to allow the reverse connection on 
your firewalls.
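
To illustrate why the callback ports cannot be known in advance, here is a minimal sketch, assuming a POSIX system. It is not MIDAS code, and the function name listen_on_os_assigned_port() is made up for this example. Binding to port 0 asks the operating system to choose a free port, and getsockname() reports which one was chosen:

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper, not part of MIDAS.
// Open a listening TCP socket on a port chosen by the operating system and
// return that port number (host byte order), or -1 on failure.
// The socket is handed back through *out_fd so the caller can accept() on it.
int listen_on_os_assigned_port(int* out_fd)
{
   assert(out_fd != nullptr);

   int fd = socket(AF_INET, SOCK_STREAM, 0);
   if (fd < 0)
      return -1;

   sockaddr_in addr{};
   addr.sin_family = AF_INET;
   addr.sin_addr.s_addr = htonl(INADDR_ANY);
   addr.sin_port = htons(0); // port 0 means "let the OS pick a free port"

   if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
       listen(fd, 1) < 0) {
      close(fd);
      return -1;
   }

   // Ask the kernel which port it actually assigned.
   socklen_t len = sizeof(addr);
   if (getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) < 0) {
      close(fd);
      return -1;
   }

   *out_fd = fd;
   return ntohs(addr.sin_port);
}
```

The returned port usually differs from run to run, which is why a firewall rule for one fixed port on the front-end is not sufficient.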
  710   28 Jun 2010   Jimmy Ngai   Forum   Error connecting to back-end computer
> The way connections work under Midas is there is a callback scheme. The client starts 
> mserver on the back-end, then the back-end connects back to the front-end on three 
> different ports. These ports are assigned dynamically by the operating system and are 
> typically in the range 40000-60000. So you also have to allow the reverse connection on 
> your firewalls.

It works now after allowing ports 40000-60000 in the front-end computer. Thanks!

Best Regards,
Jimmy
  711   29 Jun 2010   Konstantin Olchanski   Forum   Error connecting to back-end computer
> > The way connections work under Midas is there is a callback scheme. The client starts 
> > mserver on the back-end, then the back-end connects back to the front-end on three 
> > different ports. These ports are assigned dynamically by the operating system and are 
> > typically in the range 40000-60000. So you also have to allow the reverse connection on 
> > your firewalls.
> 
> It works now after allowing ports 40000-60000 in the front-end computer. Thanks!


Yes, right. Midas networking does not like firewalls.

In a nutshell, TCP connections on all TCP ports have to be open between all computers
running MIDAS. I think in practice it is not a problem: you only ever have a finite (a small
integer) number of computers running MIDAS and you can add them as exceptions to the
firewall rules. These exceptions should not create any security problem because you still have
the MIDAS computers firewalled from the outside world and one hopes that they will not be
attacking each other.

P.S. Permitting ports 40000-60000 is not good enough. TCP ports are allocated to TCP
connections semi-randomly from a 16-bit address space (0..65535) and your system will bomb
whenever port numbers like 39999 or 60001 get used.


K.O.
  2569   02 Aug 2023   Stefan Ritt   Bug Report   Error accessing history files
We sporadically (about once every few hours) have an error message when we access the 
history plots through mhttpd:

07:21:35.109 2023/08/03 [mhttpd,ERROR] 
[history_schema.cxx:2345:FileHistory::read_data,ERROR] Cannot read 
'/data2/history/mhf_1690890685_20230801_dc_hv.dat', read() errno 2 (No such file 
or directory)

When I log in to the machine, I properly see the file and also can access it

[meg@megon02 history]$ ls -l mhf_1690890685_20230801_dc_hv.dat
-rw-rw-r--. 1 meg meg 34176312 Aug  3 07:23 mhf_1690890685_20230801_dc_hv.dat

and I also can dump that file. 

When I try again with mhttpd, I properly see that file. 

Now in principle this is not a problem, but the error message is annoying, since this 
is the only error we get in 24 hours. I attached a 24h log to show what I mean. If this 
is an OS issue, I wonder if we should add code to retry the file access in case we get 
that error.

Anybody seen a similar thing?

Best,
Stefan
Attachment 1: log.txt
07:22:54.488 2023/08/03 [Sequencer,INFO] Run #536882 started
07:22:50.710 2023/08/03 [Sequencer,INFO] Run #536881 stopped
07:21:35.109 2023/08/03 [mhttpd,ERROR] [history_schema.cxx:2345:FileHistory::read_data,ERROR] Cannot read '/data2/history/mhf_1690890685_20230801_dc_hv.dat', read() errno 2 (No such file or directory)
07:16:44.351 2023/08/03 [Sequencer,INFO] Run #536881 started
07:16:40.513 2023/08/03 [Sequencer,INFO] Run #536880 stopped
07:10:34.581 2023/08/03 [Sequencer,INFO] Run #536880 started
07:10:30.594 2023/08/03 [Sequencer,INFO] Run #536879 stopped
07:04:23.783 2023/08/03 [Sequencer,INFO] Run #536879 started
07:04:19.864 2023/08/03 [Sequencer,INFO] Run #536878 stopped
06:57:55.055 2023/08/03 [Sequencer,INFO] Run #536878 started
06:57:50.991 2023/08/03 [Sequencer,INFO] Run #536877 stopped
06:51:41.184 2023/08/03 [Sequencer,INFO] Run #536877 started
06:51:37.611 2023/08/03 [Sequencer,INFO] Run #536876 stopped
06:44:56.595 2023/08/03 [Sequencer,INFO] Run #536876 started
06:44:52.834 2023/08/03 [Sequencer,INFO] Run #536875 stopped
06:38:28.422 2023/08/03 [Sequencer,INFO] Run #536875 started
06:38:24.945 2023/08/03 [Sequencer,INFO] Run #536874 stopped
06:32:08.153 2023/08/03 [Sequencer,INFO] Run #536874 started
06:32:04.586 2023/08/03 [Sequencer,INFO] Run #536873 stopped
06:25:23.687 2023/08/03 [Sequencer,INFO] Run #536873 started
06:25:20.318 2023/08/03 [Sequencer,INFO] Run #536872 stopped
06:19:15.480 2023/08/03 [Sequencer,INFO] Run #536872 started
06:19:11.305 2023/08/03 [Sequencer,INFO] Run #536871 stopped
06:12:52.689 2023/08/03 [Sequencer,INFO] Run #536871 started
06:12:49.075 2023/08/03 [Sequencer,INFO] Run #536870 stopped
06:06:42.901 2023/08/03 [Sequencer,INFO] Run #536870 started
06:06:39.033 2023/08/03 [Sequencer,INFO] Run #536869 stopped
06:00:25.953 2023/08/03 [Sequencer,INFO] Run #536869 started
06:00:22.384 2023/08/03 [Sequencer,INFO] Run #536868 stopped
05:54:13.589 2023/08/03 [Sequencer,INFO] Run #536868 started
05:54:09.719 2023/08/03 [Sequencer,INFO] Run #536867 stopped
05:47:49.328 2023/08/03 [Sequencer,INFO] Run #536867 started
05:47:45.429 2023/08/03 [Sequencer,INFO] Run #536866 stopped
05:41:39.018 2023/08/03 [Sequencer,INFO] Run #536866 started
05:41:35.248 2023/08/03 [Sequencer,INFO] Run #536865 stopped
05:35:25.122 2023/08/03 [Sequencer,INFO] Run #536865 started
05:35:21.542 2023/08/03 [Sequencer,INFO] Run #536864 stopped
05:29:14.937 2023/08/03 [Sequencer,INFO] Run #536864 started
05:29:11.320 2023/08/03 [Sequencer,INFO] Run #536863 stopped
05:22:46.524 2023/08/03 [Sequencer,INFO] Run #536863 started
05:22:42.746 2023/08/03 [Sequencer,INFO] Run #536862 stopped
05:16:33.997 2023/08/03 [Sequencer,INFO] Run #536862 started
05:16:30.422 2023/08/03 [Sequencer,INFO] Run #536861 stopped
05:10:30.602 2023/08/03 [Sequencer,INFO] Run #536861 started
05:10:26.922 2023/08/03 [Sequencer,INFO] Run #536860 stopped
05:04:14.734 2023/08/03 [Sequencer,INFO] Run #536860 started
05:04:10.964 2023/08/03 [Sequencer,INFO] Run #536859 stopped
04:57:48.773 2023/08/03 [Sequencer,INFO] Run #536859 started
04:57:44.994 2023/08/03 [Sequencer,INFO] Run #536858 stopped
04:51:29.976 2023/08/03 [Sequencer,INFO] Run #536858 started
04:51:26.224 2023/08/03 [Sequencer,INFO] Run #536857 stopped
04:44:46.298 2023/08/03 [Sequencer,INFO] Run #536857 started
04:44:41.832 2023/08/03 [Sequencer,INFO] Run #536856 stopped
04:38:32.283 2023/08/03 [Sequencer,INFO] Run #536856 started
04:38:28.513 2023/08/03 [Sequencer,INFO] Run #536855 stopped
04:32:15.707 2023/08/03 [Sequencer,INFO] Run #536855 started
04:32:12.185 2023/08/03 [Sequencer,INFO] Run #536854 stopped
04:26:08.980 2023/08/03 [Sequencer,INFO] Run #536854 started
04:26:05.406 2023/08/03 [Sequencer,INFO] Run #536853 stopped
04:19:55.754 2023/08/03 [Sequencer,INFO] Run #536853 started
04:19:51.976 2023/08/03 [Sequencer,INFO] Run #536852 stopped
04:13:45.140 2023/08/03 [Sequencer,INFO] Run #536852 started
04:13:41.465 2023/08/03 [Sequencer,INFO] Run #536851 stopped
04:06:45.891 2023/08/03 [Sequencer,INFO] Run #536851 started
04:06:42.253 2023/08/03 [Sequencer,INFO] Run #536850 stopped
04:00:28.915 2023/08/03 [Sequencer,INFO] Run #536850 started
04:00:25.300 2023/08/03 [Sequencer,INFO] Run #536849 stopped
03:54:15.851 2023/08/03 [Sequencer,INFO] Run #536849 started
03:54:12.372 2023/08/03 [Sequencer,INFO] Run #536848 stopped
03:47:53.825 2023/08/03 [Sequencer,INFO] Run #536848 started
03:47:50.240 2023/08/03 [Sequencer,INFO] Run #536847 stopped
03:41:50.429 2023/08/03 [Sequencer,INFO] Run #536847 started
03:41:46.892 2023/08/03 [Sequencer,INFO] Run #536846 stopped
03:35:41.247 2023/08/03 [Sequencer,INFO] Run #536846 started
03:35:37.480 2023/08/03 [Sequencer,INFO] Run #536845 stopped
03:29:33.930 2023/08/03 [Sequencer,INFO] Run #536845 started
03:29:30.453 2023/08/03 [Sequencer,INFO] Run #536844 stopped
03:23:07.931 2023/08/03 [Sequencer,INFO] Run #536844 started
03:23:04.214 2023/08/03 [Sequencer,INFO] Run #536843 stopped
03:17:01.227 2023/08/03 [Sequencer,INFO] Run #536843 started
03:16:57.611 2023/08/03 [Sequencer,INFO] Run #536842 stopped
03:10:48.030 2023/08/03 [Sequencer,INFO] Run #536842 started
03:10:44.255 2023/08/03 [Sequencer,INFO] Run #536841 stopped
03:04:32.608 2023/08/03 [Sequencer,INFO] Run #536841 started
03:04:28.881 2023/08/03 [Sequencer,INFO] Run #536840 stopped
02:58:22.218 2023/08/03 [Sequencer,INFO] Run #536840 started
02:58:18.228 2023/08/03 [Sequencer,INFO] Run #536839 stopped
02:51:50.716 2023/08/03 [Sequencer,INFO] Run #536839 started
02:51:46.287 2023/08/03 [Sequencer,INFO] Run #536838 stopped
02:45:31.191 2023/08/03 [Sequencer,INFO] Run #536838 started
02:45:27.463 2023/08/03 [Sequencer,INFO] Run #536837 stopped
02:39:24.271 2023/08/03 [Sequencer,INFO] Run #536837 started
02:39:20.694 2023/08/03 [Sequencer,INFO] Run #536836 stopped
02:33:08.324 2023/08/03 [Sequencer,INFO] Run #536836 started
02:33:04.757 2023/08/03 [Sequencer,INFO] Run #536835 stopped
02:27:03.014 2023/08/03 [Sequencer,INFO] Run #536835 started
02:26:58.734 2023/08/03 [Sequencer,INFO] Run #536834 stopped
02:20:27.209 2023/08/03 [Sequencer,INFO] Run #536834 started
02:20:23.695 2023/08/03 [Sequencer,INFO] Run #536833 stopped
02:14:14.607 2023/08/03 [Sequencer,INFO] Run #536833 started
02:14:11.131 2023/08/03 [Sequencer,INFO] Run #536832 stopped
02:07:43.853 2023/08/03 [Sequencer,INFO] Run #536832 started
02:07:40.091 2023/08/03 [Sequencer,INFO] Run #536831 stopped
02:01:05.642 2023/08/03 [Sequencer,INFO] Run #536831 started
02:01:01.975 2023/08/03 [Sequencer,INFO] Run #536830 stopped
01:54:55.768 2023/08/03 [Sequencer,INFO] Run #536830 started
01:54:51.901 2023/08/03 [Sequencer,INFO] Run #536829 stopped
01:48:43.247 2023/08/03 [Sequencer,INFO] Run #536829 started
01:48:39.525 2023/08/03 [Sequencer,INFO] Run #536828 stopped
01:42:26.066 2023/08/03 [Sequencer,INFO] Run #536828 started
01:42:22.294 2023/08/03 [Sequencer,INFO] Run #536827 stopped
01:36:10.218 2023/08/03 [Sequencer,INFO] Run #536827 started
01:36:06.352 2023/08/03 [Sequencer,INFO] Run #536826 stopped
01:30:03.121 2023/08/03 [Sequencer,INFO] Run #536826 started
01:29:59.558 2023/08/03 [Sequencer,INFO] Run #536825 stopped
01:23:50.397 2023/08/03 [Sequencer,INFO] Run #536825 started
01:23:46.823 2023/08/03 [Sequencer,INFO] Run #536824 stopped
01:17:28.309 2023/08/03 [Sequencer,INFO] Run #536824 started
01:17:24.641 2023/08/03 [Sequencer,INFO] Run #536823 stopped
01:11:11.245 2023/08/03 [Sequencer,INFO] Run #536823 started
01:11:07.680 2023/08/03 [Sequencer,INFO] Run #536822 stopped
01:04:57.774 2023/08/03 [Sequencer,INFO] Run #536822 started
01:04:54.143 2023/08/03 [Sequencer,INFO] Run #536821 stopped
00:58:52.150 2023/08/03 [Sequencer,INFO] Run #536821 started
00:58:48.569 2023/08/03 [Sequencer,INFO] Run #536820 stopped
00:52:19.523 2023/08/03 [Sequencer,INFO] Run #536820 started
00:52:15.857 2023/08/03 [Sequencer,INFO] Run #536819 stopped
00:45:33.032 2023/08/03 [Sequencer,INFO] Run #536819 started
00:45:29.201 2023/08/03 [Sequencer,INFO] Run #536818 stopped
00:39:19.076 2023/08/03 [Sequencer,INFO] Run #536818 started
00:39:15.510 2023/08/03 [Sequencer,INFO] Run #536817 stopped
00:32:50.593 2023/08/03 [Sequencer,INFO] Run #536817 started
00:32:47.035 2023/08/03 [Sequencer,INFO] Run #536816 stopped
00:26:09.730 2023/08/03 [Sequencer,INFO] Run #536816 started
00:26:05.862 2023/08/03 [Sequencer,INFO] Run #536815 stopped
00:19:57.831 2023/08/03 [Sequencer,INFO] Run #536815 started
00:19:53.408 2023/08/03 [Sequencer,INFO] Run #536814 stopped
00:13:41.084 2023/08/03 [Sequencer,INFO] Run #536814 started
00:13:37.504 2023/08/03 [Sequencer,INFO] Run #536813 stopped
00:07:24.877 2023/08/03 [Sequencer,INFO] Run #536813 started
00:07:21.339 2023/08/03 [Sequencer,INFO] Run #536812 stopped
00:01:18.670 2023/08/03 [Sequencer,INFO] Run #536812 started
00:01:14.751 2023/08/03 [Sequencer,INFO] Run #536811 stopped
23:55:12.073 2023/08/02 [Sequencer,INFO] Run #536811 started
23:55:08.493 2023/08/02 [Sequencer,INFO] Run #536810 stopped
23:53:35.294 2023/08/02 [mhttpd,ERROR] [history_schema.cxx:2345:FileHistory::read_data,ERROR] Cannot read '/data2/history/mhf_1690890685_20230801_dc_hv.dat', read() errno 2 (No such file or directory)
23:48:55.498 2023/08/02 [Sequencer,INFO] Run #536810 started
23:48:51.817 2023/08/02 [Sequencer,INFO] Run #536809 stopped
23:42:30.422 2023/08/02 [Sequencer,INFO] Run #536809 started
23:42:26.677 2023/08/02 [Sequencer,INFO] Run #536808 stopped
23:36:23.171 2023/08/02 [Sequencer,INFO] Run #536808 started
23:36:19.592 2023/08/02 [Sequencer,INFO] Run #536807 stopped
23:30:19.344 2023/08/02 [Sequencer,INFO] Run #536807 started
23:30:15.672 2023/08/02 [Sequencer,INFO] Run #536806 stopped
23:24:03.697 2023/08/02 [Sequencer,INFO] Run #536806 started
23:23:59.570 2023/08/02 [Sequencer,INFO] Run #536805 stopped
23:17:33.870 2023/08/02 [Sequencer,INFO] Run #536805 started
23:17:30.488 2023/08/02 [Sequencer,INFO] Run #536804 stopped
23:11:21.650 2023/08/02 [Sequencer,INFO] Run #536804 started
23:11:18.176 2023/08/02 [Sequencer,INFO] Run #536803 stopped
23:05:00.652 2023/08/02 [Sequencer,INFO] Run #536803 started
23:04:56.880 2023/08/02 [Sequencer,INFO] Run #536802 stopped
22:58:59.679 2023/08/02 [Sequencer,INFO] Run #536802 started
22:58:56.249 2023/08/02 [Sequencer,INFO] Run #536801 stopped
22:52:43.033 2023/08/02 [Sequencer,INFO] Run #536801 started
22:52:39.452 2023/08/02 [Sequencer,INFO] Run #536800 stopped
22:46:37.568 2023/08/02 [Sequencer,INFO] Run #536800 started
22:46:33.953 2023/08/02 [Sequencer,INFO] Run #536799 stopped
22:40:28.270 2023/08/02 [Sequencer,INFO] Run #536799 started
22:40:24.906 2023/08/02 [Sequencer,INFO] Run #536798 stopped
22:33:53.886 2023/08/02 [Sequencer,INFO] Run #536798 started
22:33:50.529 2023/08/02 [Sequencer,INFO] Run #536797 stopped
22:27:35.712 2023/08/02 [Sequencer,INFO] Run #536797 started
22:27:32.270 2023/08/02 [Sequencer,INFO] Run #536796 stopped
22:21:26.568 2023/08/02 [Sequencer,INFO] Run #536796 started
22:21:23.007 2023/08/02 [Sequencer,INFO] Run #536795 stopped
22:15:25.397 2023/08/02 [Sequencer,INFO] Run #536795 started
22:15:21.933 2023/08/02 [Sequencer,INFO] Run #536794 stopped
22:09:18.390 2023/08/02 [Sequencer,INFO] Run #536794 started
22:09:14.976 2023/08/02 [Sequencer,INFO] Run #536793 stopped
22:02:59.421 2023/08/02 [Sequencer,INFO] Run #536793 started
22:02:56.075 2023/08/02 [Sequencer,INFO] Run #536792 stopped
21:56:39.940 2023/08/02 [Sequencer,INFO] Run #536792 started
21:56:36.518 2023/08/02 [Sequencer,INFO] Run #536791 stopped
21:50:39.308 2023/08/02 [Sequencer,INFO] Run #536791 started
21:50:35.893 2023/08/02 [Sequencer,INFO] Run #536790 stopped
21:44:27.002 2023/08/02 [Sequencer,INFO] Run #536790 started
21:44:23.435 2023/08/02 [Sequencer,INFO] Run #536789 stopped
21:38:23.480 2023/08/02 [Sequencer,INFO] Run #536789 started
21:38:20.087 2023/08/02 [Sequencer,INFO] Run #536788 stopped
21:31:57.894 2023/08/02 [Sequencer,INFO] Run #536788 started
21:31:54.508 2023/08/02 [Sequencer,INFO] Run #536787 stopped
21:26:00.453 2023/08/02 [Sequencer,INFO] Run #536787 started
21:25:57.011 2023/08/02 [Sequencer,INFO] Run #536786 stopped
21:20:00.772 2023/08/02 [Sequencer,INFO] Run #536786 started
21:19:57.301 2023/08/02 [Sequencer,INFO] Run #536785 stopped
21:13:46.342 2023/08/02 [Sequencer,INFO] Run #536785 started
21:13:42.774 2023/08/02 [Sequencer,INFO] Run #536784 stopped
21:07:24.345 2023/08/02 [Sequencer,INFO] Run #536784 started
21:07:20.974 2023/08/02 [Sequencer,INFO] Run #536783 stopped
21:00:34.335 2023/08/02 [Sequencer,INFO] Run #536783 started
21:00:30.962 2023/08/02 [Sequencer,INFO] Run #536782 stopped
20:54:26.725 2023/08/02 [Sequencer,INFO] Run #536782 started
20:54:23.260 2023/08/02 [Sequencer,INFO] Run #536781 stopped
20:48:17.056 2023/08/02 [Sequencer,INFO] Run #536781 started
20:48:13.680 2023/08/02 [Sequencer,INFO] Run #536780 stopped
20:41:54.420 2023/08/02 [Sequencer,INFO] Run #536780 started
20:41:51.061 2023/08/02 [Sequencer,INFO] Run #536779 stopped
20:35:50.859 2023/08/02 [Sequencer,INFO] Run #536779 started
20:35:47.280 2023/08/02 [Sequencer,INFO] Run #536778 stopped
20:29:51.914 2023/08/02 [Sequencer,INFO] Run #536778 started
20:29:48.259 2023/08/02 [Sequencer,INFO] Run #536777 stopped
20:23:41.311 2023/08/02 [Sequencer,INFO] Run #536777 started
20:23:37.784 2023/08/02 [Sequencer,INFO] Run #536776 stopped
20:17:25.427 2023/08/02 [Sequencer,INFO] Run #536776 started
20:17:21.759 2023/08/02 [Sequencer,INFO] Run #536775 stopped
20:11:15.119 2023/08/02 [Sequencer,INFO] Run #536775 started
20:11:11.604 2023/08/02 [Sequencer,INFO] Run #536774 stopped
20:05:05.195 2023/08/02 [Sequencer,INFO] Run #536774 started
20:05:01.833 2023/08/02 [Sequencer,INFO] Run #536773 stopped
19:59:04.956 2023/08/02 [Sequencer,INFO] Run #536773 started
19:59:01.477 2023/08/02 [Sequencer,INFO] Run #536772 stopped
19:52:59.175 2023/08/02 [Sequencer,INFO] Run #536772 started
19:52:55.092 2023/08/02 [Sequencer,INFO] Run #536771 stopped
19:46:40.384 2023/08/02 [Sequencer,INFO] Run #536771 started
19:46:36.999 2023/08/02 [Sequencer,INFO] Run #536770 stopped
19:40:31.744 2023/08/02 [Sequencer,INFO] Run #536770 started
19:40:28.278 2023/08/02 [Sequencer,INFO] Run #536769 stopped
19:34:17.986 2023/08/02 [Sequencer,INFO] Run #536769 started
19:34:14.533 2023/08/02 [Sequencer,INFO] Run #536768 stopped
19:28:11.473 2023/08/02 [Sequencer,INFO] Run #536768 started
19:28:08.058 2023/08/02 [Sequencer,INFO] Run #536767 stopped
19:22:01.786 2023/08/02 [Sequencer,INFO] Run #536767 started
19:21:58.413 2023/08/02 [Sequencer,INFO] Run #536766 stopped
19:15:54.577 2023/08/02 [Sequencer,INFO] Run #536766 started
  2577   09 Aug 2023   Konstantin Olchanski   Bug Report   Error accessing history files
I confirm I see the same on the agmini system. Two problems: (a) error message is wrong, it's a 
short read, not a read error (clue: read() syscall does not return "no such file"). (b) 
mlogger is supposed to write history in record-size blocks, read in the same record size 
blocks. UNIX file semantics require that both reader and writer see read() and write() as 
atomic, even on NFS, so mhttpd should never see partially written history records. I can 
debug this on the agmini system. Probably should.

Problem (a) fixed in commit bb423c8680cc67220312534403840442868f2b3b, if you update, you 
should see error messages about "short read" and the read sizes it reports are very 
interesting, please put them in the elog here.

K.O.


> We sporadically (about once every few hours) have an error message when we access the 
> history plots through mhttpd:
> 
> 07:21:35.109 2023/08/03 [mhttpd,ERROR] 
> [history_schema.cxx:2345:FileHistory::read_data,ERROR] Cannot read 
> '/data2/history/mhf_1690890685_20230801_dc_hv.dat', read() errno 2 (No such file 
> or directory)
> 
> When I log in to the machine, I properly see the file and also can access it
> 
> [meg@megon02 history]$ ls -l mhf_1690890685_20230801_dc_hv.dat
> -rw-rw-r--. 1 meg meg 34176312 Aug  3 07:23 mhf_1690890685_20230801_dc_hv.dat
> 
> and I also can dump that file. 
> 
> When I try again with mhttpd, I properly see that file. 
> 
> Now in principle this is not a problem, but the error message is annoying, since this 
> is the only error we get in 24 hours. I attached a 24h log to show what I mean. If this 
> is an OS issue, I wonder if we should add code to retry the file access in case we get 
> that error.
> 
> Anybody seen a similar thing?
> 
> Best,
> Stefan
  2588   16 Aug 2023   Stefan Ritt   Bug Report   Error accessing history files
Tonight we got another error of that type after the update:

04:17 - [mhttpd,ERROR] [history_schema.cxx:2913:FileHistory::read_data,ERROR] Cannot read 
'/data2/history/mhf_1692128214_20230815_gassystem.dat', read() errno 2 (No such file or directory)

This morning I looked at the file, and it was there:

[meg@megon02 history]$ ls -alg mhf_1692128214_20230815_gassystem.dat
-rw-rw-r--. 1 meg 4663228 Aug 17 08:50 mhf_1692128214_20230815_gassystem.dat
[meg@megon02 history]$


Stefan
  2591   17 Aug 2023   Konstantin Olchanski   Bug Report   Error accessing history files
Confirmed. The error message is wrong. It is printed after a short read(), but short read() does not 
set errno, and errno reported by the error message is from some previous syscall. Corrected error 
message is already committed. K.O.
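
The distinction between a failed read() and a short read() can be sketched as follows. This is only an illustration under POSIX assumptions, not the code from history_schema.cxx, and read_record() is a made-up name. errno is reported only when read() returns -1, because a short read leaves errno at whatever value an earlier call stored there:

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>
#include <unistd.h>

// Hypothetical helper, not the MIDAS implementation.
// Try to read exactly `size` bytes from `fd` with one read() call.
// Returns an empty string on success, otherwise a description of the problem.
std::string read_record(int fd, void* buf, size_t size)
{
   assert(buf != nullptr);

   ssize_t rd = read(fd, buf, size);
   if (rd < 0) {
      // Genuine error: errno was set by this read() call, so save it at once.
      int e = errno;
      return "read() error, errno " + std::to_string(e) + " (" + std::strerror(e) + ")";
   }
   if (static_cast<size_t>(rd) != size) {
      // Short read: errno is not set here, so it must not be reported.
      return "short read, got " + std::to_string(rd) + " bytes instead of " + std::to_string(size);
   }
   return "";
}
```

With a stale errno of 2 left over from some earlier call, a 4-byte read against an 8-byte request is reported by this sketch as a short read rather than as "No such file or directory".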


> Tonight we got another error of that type after the update:
> 
> 04:17 - [mhttpd,ERROR] [history_schema.cxx:2913:FileHistory::read_data,ERROR] Cannot read 
> '/data2/history/mhf_1692128214_20230815_gassystem.dat', read() errno 2 (No such file or directory)
> 
> This morning I looked at the file, and it was there:
> 
> [meg@megon02 history]$ ls -alg mhf_1692128214_20230815_gassystem.dat
> -rw-rw-r--. 1 meg 4663228 Aug 17 08:50 mhf_1692128214_20230815_gassystem.dat
> [meg@megon02 history]$
> 
> 
> Stefan
  2593   19 Aug 2023   Stefan Ritt   Bug Report   Error accessing history files
Still get the same error with the latest version:

3:28 [mhttpd,ERROR] [history_schema.cxx:2913:FileHistory::read_data,ERROR] Cannot read 
'/data2/history/mhf_1692391703_20230818_hv_tc.dat', read() errno 2 (No such file or directory)

Stefan
  2615   06 Oct 2023   Konstantin Olchanski   Bug Report   Error accessing history files
> Still get the same error with the latest version:
> 3:28 [mhttpd,ERROR] [history_schema.cxx:2913:FileHistory::read_data,ERROR] Cannot read 
> '/data2/history/mhf_1692391703_20230818_hv_tc.dat', read() errno 2 (No such file or directory)

I figured it out. I claim defense of temporary insanity and old age senility.

1) I added the "short read" check in one place, missed the second place
2) writes of history were meant to be atomic, and they are atomic in my head, but not in the midas 
code:

history_schema.cxx:HsFileSchema::write_event()
...
   status = write(s->writer_fd, &t, 4);
   if (status != 4) {
      cm_msg(MERROR, "FileHistory::write_event", "Cannot write to \'%s\', write(timestamp) errno %d (%s)", s->file_name.c_str(), errno, strerror(errno));
      return HS_FILE_ERROR;
   }

   status = write(s->writer_fd, data, expected_size);
   if (status != expected_size) {
      cm_msg(MERROR, "FileHistory::write_event", "Cannot write to \'%s\', write(%d) errno %d (%s)", s->file_name.c_str(), data_size, errno, strerror(errno));
      return HS_FILE_ERROR;
   }
...

that's not atomic, that's two separate writes. history reader hits the history file between the 
two writes and gets a short read of 4 bytes timestamp instead of full record size. that's the 
error message reported by mhttpd.

two fixes forthcoming:
a) check for short read in the 2nd place that I missed
b) two write() are replaced by 2 memcpy() to a preallocated buffer and 1 write()

Overall, I am pretty happy that this is the only bug in the FILE history code found in N years, 
and it does not even cause data corruption...

K.O.
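
Fix (b) can be sketched roughly like this. It is a minimal illustration, not the code from the actual commit: write_record() is a made-up name, and for brevity the buffer is allocated on every call, whereas the fix described above uses a preallocated one. The timestamp and the data are copied into one buffer and handed to a single write(), so a reader relying on the read()/write() semantics discussed earlier in this thread should not find a timestamp without its data:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unistd.h>
#include <vector>

// Hypothetical helper, not the MIDAS implementation.
// Append one record (4-byte timestamp followed by the event data) using a
// single write() call. Returns true if the whole record was written.
bool write_record(int fd, uint32_t timestamp, const void* data, size_t data_size)
{
   assert(data != nullptr || data_size == 0);

   // Assemble the complete record in memory first (the "2 memcpy()").
   std::vector<char> record(sizeof(timestamp) + data_size);
   std::memcpy(record.data(), &timestamp, sizeof(timestamp));
   if (data_size > 0)
      std::memcpy(record.data() + sizeof(timestamp), data, data_size);

   // Then hand it to the kernel in one go (the "1 write()").
   ssize_t wr = write(fd, record.data(), record.size());
   return wr == static_cast<ssize_t>(record.size());
}
```

A caller would use it as write_record(fd, t, data, expected_size) in place of the two separate write() calls.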
  2616   06 Oct 2023   Konstantin Olchanski   Bug Report   Error accessing history files
> two fixes forthcoming:
> a) check for short read in the 2nd place that I missed
> b) two write() are replaced by 2 memcpy() to a preallocated buffer and 1 write()

commit 713ec4a583365d57ffcd700ceeb09dcc14518295

K.O.
  1255   05 Apr 2017   Andreas Suter   Bug Report   Equipment Expand doesn't work anymore
I liked very much the possibility to hide away Equipment on the main page. It
is also nice to have the '+' to get it quickly back when needed. However, this
seems not to work anymore (git c9d9d604803). Is this a feature or something went
wrong?
  1257   10 Apr 2017   Stefan Ritt   Bug Report   Equipment Expand doesn't work anymore
> I liked very much the possibility to hide away Equipment on the main page. It
> is also nice to have the '+' to get it quickly back when needed. However, this
> seems not to work anymore (git c9d9d604803). Is this a feature or something went
> wrong?

The expansion of the equipment list is handled by a Cookie ("expeq" being 1 or 0). When Konstantin 
implemented the mongoose server instead of the internal mhttp server, he neglected to evaluate 
this cookie. I fixed this now (also renamed the cookie to "midas_expeq") in the current development 
branch. Please check if it's working.

Stefan
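
For readers unfamiliar with how such a cookie is evaluated, here is a minimal sketch. It is not the mhttpd code: get_cookie() and equipment_expanded() are made-up names, and only the cookie name "midas_expeq" and its values 1 and 0 come from the post above. The sketch assumes the value of the HTTP "Cookie:" header is available as a string of "name=value" pairs separated by ';':

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not the mhttpd implementation.
// Return the value of cookie `name` from a "Cookie:" header value such as
// "a=1; midas_expeq=1", or an empty string if the cookie is not present.
std::string get_cookie(const std::string& header, const std::string& name)
{
   assert(!name.empty());

   size_t pos = 0;
   while (pos < header.size()) {
      // Find the end of the current "name=value" pair.
      size_t end = header.find(';', pos);
      if (end == std::string::npos)
         end = header.size();

      // Skip spaces in front of the cookie name.
      while (pos < end && header[pos] == ' ')
         pos++;

      // Compare the text before '=' with the requested name.
      size_t eq = header.find('=', pos);
      if (eq != std::string::npos && eq < end && header.compare(pos, eq - pos, name) == 0)
         return header.substr(eq + 1, end - eq - 1);

      pos = end + 1;
   }
   return "";
}

// True if the equipment list should be shown expanded.
bool equipment_expanded(const std::string& cookie_header)
{
   return get_cookie(cookie_header, "midas_expeq") == "1";
}
```

In this sketch a missing cookie is treated the same as "0", i.e. the list stays collapsed. Whether mhttpd uses the same default is not stated in this thread.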
  1258   10 Apr 2017   Andreas Suter   Bug Report   Equipment Expand doesn't work anymore
> > I liked very much the possibility to hide away Equipment on the main page. It
> > is also nice to have the '+' to get it quickly back when needed. However, this
> > seems not to work anymore (git c9d9d604803). Is this a feature or something went
> > wrong?
> 
> The expansion of the equipment list is handled by a Cookie ("expeq" being 1 or 0). When Konstantin 
> implemented the mongoose server instead of the internal mhttp server, he neglected to evaluate 
> this cookie. I fixed this now (also renamed the cookie to "midas_expeq") in the current development 
> branch. Please check if it's working.
> 
> Stefan

Tested it on two machines and expansion is back and working! Thanks a lot!

Andreas
  1264   15 Apr 2017   Konstantin Olchanski   Bug Report   Equipment Expand doesn't work anymore
> > > I liked very much the possibility to hide away Equipment on the main page. It
> > > is also nice to have the '+' to get it quickly back when needed. However, this
> > > seems not to work anymore (git c9d9d604803). Is this a feature or something went
> > > wrong?
> > 
> > The expansion of the equipment list is handled by a Cookie ("expeq" being 1 or 0). When Konstantin 
> > implemented the mongoose server instead of the internal mhttp server, he neglected to evaluate 
> > this cookie. I fixed this now (also renamed the cookie to "midas_expeq") in the current development 
> > branch. Please check if it's working.
> > 
> > Stefan
> 
> Tested it on two machines and expansion is back and working! Thanks a lot!
> 

Confirmed fixed. Thanks. Not sure how this got lost.

K.O.
  2014   17 Nov 2020   Stefan Ritt   Info   Equipment "common" settings in ODB
Today I addressed a topic which has bugged me for a long time. The ODB contains 
settings under /Equipment/<name>/Common which are a "mirror" of the equipment[] 
setting in a frontend (using the mfe.cxx framework). If the "Common" entry in 
the ODB is not present (fresh experiment), the equipment[] settings from the 
frontend are copied to the ODB. But if it exists, it takes precedence over the 
equipment[] entries, which is wrong in my opinion. Like if you change some 
settings in equipment[] (like the logging period of the history), then recompile 
and restart the frontend, the old values in the ODB are kept and your 
modification in the frontend code has no effect.

Starting on commit c3017c6c on Nov. 17th 2020 I reversed the precedence: Now, on 
each start of the frontend program, the values from equipment[] are written to 
the ODB. They are still "live". If one changes them when the frontend is 
running, that change takes effect immediately. But on the next restart of the 
frontend, the old values from equipment[] are put back there.

I fell too many times into this trap, and I hope the modification helps 
everybody. If there are however experiments which rely on the fact that the 
common settings in the ODB are NOT overwritten by the frontend, please let me 
know and I can put a flag "EQUIPMENT_FE_PRECEDENCE = FALSE" somewhere to restore 
the old behaviour.

Stefan
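
The two precedence rules can be compared in a toy model. This is only a sketch: it does not use the MIDAS API, a std::map stands in for the ODB, and init_common_setting(), ToyOdb and the fe_precedence parameter are made-up names (the parameter only loosely mirrors the "EQUIPMENT_FE_PRECEDENCE" flag proposed above):

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for the ODB: key path -> integer value. Not the MIDAS API.
using ToyOdb = std::map<std::string, int>;

// Hypothetical model of what happens to one Common setting at frontend start.
// fe_precedence == true  : new behaviour, the compiled-in equipment[] value
//                          always overwrites the ODB value.
// fe_precedence == false : old behaviour, the equipment[] value is only used
//                          when the ODB key does not exist yet.
// Returns the value the frontend will run with.
int init_common_setting(ToyOdb& odb, const std::string& key, int fe_value, bool fe_precedence)
{
   assert(!key.empty());

   auto it = odb.find(key);
   if (it == odb.end() || fe_precedence)
      odb[key] = fe_value;
   return odb[key];
}
```

With the old rule, a value of 60 edited into the ODB survives a frontend restart even if equipment[] says 10. With the new rule the restart puts 10 back, while edits made during the run still take effect until then.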
  2019   20 Nov 2020   Pierre-Andre Amaudruz   Info   Equipment "common" settings in ODB
Indeed, this "mirror" of the equipment settings in the ODB can cause frustration, in 
particular when we think the ODB is empty but it is not.
On the other hand, over time the settings get adjusted to a particular 
configuration, and may or may not be touched by the individual run preset parameters. Later, if 
a bug or code correction requires multiple restarts of the fe, you lose the latest 
configuration on every start of the application. This can be frustrating as 
well, until you force a post-setting or copy the specific parameters into the fe 
code.
BTW, I believe we originally went for the ODB priority for that specific reason.
 
I would be in favour of having a general flag (FALSE) in /experiment which would 
define this global behaviour.  
PAA

> Today I addressed a topic which has bugged me for a long time. The ODB contains 
> settings under /Equipment/<name>/Common which are a "mirror" of the equipment[] 
> setting in a frontend (using the mfe.cxx framework). If the "Common" entry in 
> the ODB is not present (fresh experiment), the equipment[] settings from the 
> frontend are copied to the ODB. But if it exists, it takes precedence over the 
> equipment[] entries, which is wrong in my opinion. Like if you change some 
> settings in equipment[] (like the logging period of the history), then recompile 
> and restart the frontend, the old values in the ODB are kept and your 
> modification in the frontend code has no effect.
> 
> Starting on commit c3017c6c on Nov. 17th 2020 I reversed the precedence: Now, on 
> each start of the frontend program, the values from equipment[] are written to 
> the ODB. They are still "live". If one changes them when the frontend is 
> running, that change takes effect immediately. But on the next restart of the 
> frontend, the old values from equipment[] is put back there.
> 
> I fell too many times into this trap, and I hope the modification helps 
> everybody. If there are however experiments which rely on the fact that the 
> common settings in the ODB are NOT overwritten by the frontend, please let me 
> know and I can put a flag "EQUIPMENT_FE_PRECEDENCE = FALSE" somewhere to restore 
> the old behaviour.
> 
> Stefan
  2032   27 Nov 2020   Konstantin Olchanski   Info   Equipment "common" settings in ODB
> Today I addressed a topic which bugged me since long time.

Right. No easy subject. For me, too, this has been a problem in MIDAS for a long time.

> Now, on each start of the frontend program, the values from equipment[] are written to 
> the ODB. They are still "live". If one changes them when the frontend is 
> running, that change takes effect immediately. But on the next restart of the 
> frontend, the old values from equipment[] is put back there.

There is a downside to this behaviour.

If some values in equipment/common are "live" and the user is expected to change them,
the user will be unpleasantly surprised when their changes magically disappear (after reboot,
after frontend crash, after run restart if experiment requires restarting some frontends
before starting a new run).

This change will also break some experiments that rely on things like specifying
event buffer names through ODB. But experiments can adapt and specify buffer names
through command line switch instead of ODB.

This new way also makes the "live" Common/Period unusable. Sure I can speed up or slow
down a frontend even during the run, but if my change does not "stick", what good is it?

Personally, I think there is no easy solution for all these troubles.

I would advocate the following approach:

- think of MIDAS as a "mature" system,
- treasure backward compatibility
- (if we must break backward compatibility to introduce a new "must have" improvement, so be it)
- document how things work. If it is clearly written down what the different fields in "common" do, fewer people 
"get burned" by unexpected or illogical things (and any non-trivial system has plenty of those).

Going back to ODB equipment/common, my experience with midas and odb tells me
that one should avoid mixing together ODB entries set by user and ODB entries set by code.

For example, separating them as equipment/settings and equipment/variables works well. Mixing
them as in equipment/common and sequencer/state causes trouble.

So perhaps we should split Equipment/common into two pieces, user settable fields like
"Period" and "event buffer name" would move to equipment/settings or whatever.

This will open the discussion of which items in equipment/common should be user settable,
and some people would want event buffer specified in the code to prevail, while other
people would want the name from odb to prevail, and both are valid but conflicting preferences.

Or we could bite the bullet and say, equipment/common is controlled by the frontend code,
the user should not change it. (and mark it read-only in ODB).

For all the pain this may cause, at least this will make it self-consistent.

Per this proposal, in addition to Stefan's change, the hotlink on equipment/common goes away,
"period" is no longer "live" and the whole subdirectory is made "read-only".

K.O.
  2036   27 Nov 2020   Stefan Ritt   Info   Equipment "common" settings in ODB
Ok, so what about the following proposal:

- I change back the mfe.cxx code to behave like before (ODB has precedence and does not get overwritten when the 
front-end restarts)

- I add a global flag

BOOL equipment_common_overwrite;

and pre-set it to FALSE;

- So if nothing is changed the flag stays false and ODB keeps precedence

- If a frontend wants to overwrite equipment/common on each start, the user sets

BOOL equipment_common_overwrite = TRUE;

near the equipment[] structure in the front-end code. 

- If the flag is true, the mfe.cxx init code copies the equipment[] structure to the ODB on each frontend start
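The precedence rule in the list above can be sketched roughly as follows. This is only an illustration and not the actual mfe.cxx code: CommonSettings, FakeOdb and init_common() are made-up stand-ins for the equipment[] "Common" fields, the ODB and the frontend init step, and plain bool is used in place of the MIDAS BOOL type.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the "Common" part of one equipment[] entry.
struct CommonSettings {
    std::string buffer;  // event buffer name
    int period_ms;       // readout period in milliseconds
};

// Hypothetical stand-in for the ODB: equipment name -> stored Common record.
using FakeOdb = std::map<std::string, CommonSettings>;

// The proposed flag; a real frontend would define it next to equipment[].
bool equipment_common_overwrite = false;

// Returns the settings the frontend ends up running with after init.
CommonSettings init_common(FakeOdb& odb, const std::string& name,
                           const CommonSettings& from_code) {
    assert(!name.empty());
    auto it = odb.find(name);
    if (it == odb.end() || equipment_common_overwrite) {
        // Fresh experiment, or the frontend asked to overwrite:
        // copy the values from the code into the (fake) ODB.
        odb[name] = from_code;
        return from_code;
    }
    // Flag is false and an entry exists: the ODB keeps precedence.
    return it->second;
}
```

With the flag left at false, a period edited in the ODB survives a frontend restart; with the flag set to true, the value from the code is written back on every start.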

I believe this way we can keep backward compatibility, and add the new way with minimal effort. The only downside 
is that all frontends on this planet have to add at least "BOOL equipment_common_overwrite = FALSE;" in their 
code.

I know global variables are evil, but this way the user can just add the line directly above the equipment[] array, so 
one sees it when editing the equipment[] array, which gives motivation to change it as needed. So the code would be



BOOL equipment_common_overwrite = TRUE;

EQUIPMENT equipment[] = {
 ....
};



An alternative way would be to add a function

  set_equipment_common_overwrite(TRUE);

into the frontend_init() code. That's somehow cleaner (still needs an internal global variable), but it has to go 
into frontend_init() so won't be at the same place as the EQUIPMENT list in the frontend.

Thoughts?

Best,
Stefan
  2037   27 Nov 2020   Konstantin Olchanski   Info   Equipment "common" settings in ODB
Yes, I think this will work.

For old mfe.c frontends, a global variable set to "do it the new way" should be okay;
new experiments will have it the new way. Old experiments will be forced to add a one-line definition
of this global variable (otherwise mfe.o will not link), and at that time they get to choose "new way" or "old way".

For the new TMFE C++ frontend, this will work naturally: when they create the Equipment Common object,
one can see in the object constructor how it explicitly honors or overwrites the ODB common entries.

The TMFE frontend does not do a live "period", so there should be no issue with that.

Should I open a bitbucket issue "update TMFE frontend to new Equipment/Common scheme", to make sure
I do not forget about it?

K.O.