Back Midas Rome Roody Rootana
  Midas DAQ System, Page 120 of 136  Not logged in ELOG logo
New entries since:Wed Dec 31 16:00:00 1969
ID Date Author Topic Subjectdown
  649   29 Sep 2009 Exaos LeeBug ReportError invoking 'odbedit': db_validate_size
It seems to be fixed in svn-r4568:
-----------------------------------------------------------------------
r4568 | olchanski | 2009-09-27 23:56:39 +0800 (日, 27  9月 2009) | 5 lines

mhttpd: compile using the C++ compiler.
mhttpd: fix wrong initialization of /History/ODBC_DSN
odb.c: size of EQUIPMENT_INFO has changed.
Makefile: use "-O2" compiler flag instead of "-O3" - to fix SL5 gcc crash (ICE) 

But another compiling error:
Linking CXX executable bin/mh2sql
CMakeFiles/mh2sql.dir/utils/mh2sql.cxx.o: In function `main':
/opt/DAQ/repos/bot/midas/utils/mh2sql.cxx:150: undefined reference to `MakeMidasHistoryODBC()'
  650   30 Sep 2009 Konstantin OlchanskiBug ReportError invoking 'odbedit': db_validate_size
> $ odbedit -e expcvadc
> odbedit: /opt/DAQ/repos/bot/midas/src/odb.c:651: db_validate_sizes: Assertion 
`sizeof(EQUIPMENT_INFO) == 400' failed.

Yes, this is now fixed, svn rev 4571 should be okey. Sorry about causing this problem - Stefan added 
some useful additional data to EQUIPMENT_INFO and my check for binary compatibility caught it and 
complained. Unfortunately on Saturday Stefan had to abruptly go back to PSI and things have been a little 
bit chaotic because we did not complete the testing of all the new changes and additions.

K.O.
  408   08 Oct 2007 Carl MetelkoBug ReportError in data format- ending blocks on 32bit boundary x86_64
Hi,
    I found that midas banks can be given an extra 32 bits of zeros when
trying to keep to 32bit boundary on my x86_64. 

This can be fixed by changing (in midas.h)
#define ALIGN8(x)  (((x)+7) & ~7)
to
#define ALIGN8(x)  (((x)+3) & ~3)

Is there any bad consequences doing this?
  409   08 Oct 2007 Stefan RittBug ReportError in data format- ending blocks on 32bit boundary x86_64
> Hi,
>     I found that midas banks can be given an extra 32 bits of zeros when
> trying to keep to 32bit boundary on my x86_64. 
> 
> This can be fixed by changing (in midas.h)
> #define ALIGN8(x)  (((x)+7) & ~7)
> to
> #define ALIGN8(x)  (((x)+3) & ~3)
> 
> Is there any bad consequences doing this?

Yes. ALIGN8 means 'align to 8-byte boundary' (64-bit), and if you change that, you
break the code at various locations. Furthermore, 8-byte aligned access is faster
on x86_64 than 4-byte aligned access, so you will get a performance penalty. If
course if you have very many small banks, the zero padding can cause some
overhead, but in that case you could combine some data into a single bank.
  2412   20 Jun 2022 jianrunBug ReportError in "midas/src/mana.cxx"
Dear Midas developers,

When we are running the examples in $MIDASSYS/examples/experiment/, we meet some 
problems when analyzing the results:
1. When we analyze the data using the analyzer: ./analyzer -i run00001.mid -o 
run00001.rz  , we find some bugs: 
"
Root server listening on port 9090...
Running analyzer offline. Stop with "!"
[Analyzer,ERROR] [mana.cxx:1832:bor,ERROR] HBOOK support is not compiled in
[Analyzer,INFO] Set run number 6 in ODB
Load ODB from run 6...OK
run00006.mid:2680  events, 0.00s
"
We think this occurs in the "midas/src/mana.cxx ". How can we solve this?

2. When we analyze the above data, an error also occurs: 
[Analyzer,ERROR] [odb.cxx:847:db_validate_name,ERROR] Invalid name 
"/Analyzer/Tests/Always true/Rate [Hz]" passed to db_create_key_wlocked: should 
not contain "["

We simply fixed that just by replacing the "Rate [Hz]" with "Rate" in the 
test_write in midas/src/mana.cxx 
We are curious whether you can fix the problem permanently in the next version, 
or we are not running the code properly. Thanks!
  706   24 Jun 2010 Jimmy NgaiForumError connecting to back-end computer
Dear All,

This is my first time running an experiment on separate computers. I followed 
the documentation (https://midas.psi.ch/htmldoc/quickstart.html) to setup the 
files:
/etc/services
/etc/xinetd.d/midas
/etc/ld.so.conf
/etc/exptab

but when I started the frontend program in the front-end computer I got the 
following error (computerB is my back-end): 

[midas.c:8623:rpc_server_connect,ERROR] mserver subprocess could not be started 
(check path)
[mfe.c:2573:mainFE,ERROR] Cannot connect to experiment '' on host 'computerB', 
status 503

In both front-end and back-end computers only a file '.SYSMSG.SHM' was created 
after the attempt. If I start the frontend program somewhere in the back-end 
computer by connecting to 'localhost', seven .SHM files are created in the 
experiment directory together with a .RPC.SHM in the directory where I run the 
frontend program.

Is that I misconfigure something? I cannot find a solution...

Thanks.

Best Regards,
Jimmy
  707   26 Jun 2010 Konstantin OlchanskiForumError connecting to back-end computer
> This is my first time running an experiment on separate computers. I followed 
> the documentation (https://midas.psi.ch/htmldoc/quickstart.html) to setup the 
> files:
> /etc/services
> /etc/xinetd.d/midas


Hi, there. I have not recently run mserver through inetd, and we usually do not do
that at TRIUMF. We do this:

a) on the main computer: start mserver: "mserver -p 7070 -D" (note - use non-default
port - can use different ports for different experiments)
b) on remote computer: "odbedit -h main:7070" ("main" is the hostname of your main
computer). Use same "-h" switch for all other programs, including the frontends.

This works well when all computers are on the same network, but if you have some
midas clients running on private networks you may get into trouble when they try to
connect to each other and fail because network routing is funny.


K.O.
  708   27 Jun 2010 Jimmy NgaiForumError connecting to back-end computer
> Hi, there. I have not recently run mserver through inetd, and we usually do not do
> that at TRIUMF. We do this:
> 
> a) on the main computer: start mserver: "mserver -p 7070 -D" (note - use non-default
> port - can use different ports for different experiments)
> b) on remote computer: "odbedit -h main:7070" ("main" is the hostname of your main
> computer). Use same "-h" switch for all other programs, including the frontends.
> 
> This works well when all computers are on the same network, but if you have some
> midas clients running on private networks you may get into trouble when they try to
> connect to each other and fail because network routing is funny.

Hi K.O.,

Thanks for your reply. I have tried your way but I got the same error: 

[midas.c:8623:rpc_server_connect,ERROR] mserver subprocess could not be started 
(check path)

My front-end and back-end computers are on the same network connected by a router. I 
have allowed port 7070 in the firewall and done the port forwarding in the router (for 
connecting from outside the network). From the error message it seems that some 
processes can not be started automatically. Could it be related to some security 
settings such as the SELinux?

Best Regards,
Jimmy
  709   28 Jun 2010 Stefan RittForumError connecting to back-end computer
> > Hi, there. I have not recently run mserver through inetd, and we usually do not do
> > that at TRIUMF. We do this:
> > 
> > a) on the main computer: start mserver: "mserver -p 7070 -D" (note - use non-default
> > port - can use different ports for different experiments)
> > b) on remote computer: "odbedit -h main:7070" ("main" is the hostname of your main
> > computer). Use same "-h" switch for all other programs, including the frontends.
> > 
> > This works well when all computers are on the same network, but if you have some
> > midas clients running on private networks you may get into trouble when they try to
> > connect to each other and fail because network routing is funny.
> 
> Hi K.O.,
> 
> Thanks for your reply. I have tried your way but I got the same error: 
> 
> [midas.c:8623:rpc_server_connect,ERROR] mserver subprocess could not be started 
> (check path)
> 
> My front-end and back-end computers are on the same network connected by a router. I 
> have allowed port 7070 in the firewall and done the port forwarding in the router (for 
> connecting from outside the network). From the error message it seems that some 
> processes can not be started automatically. Could it be related to some security 
> settings such as the SELinux?

The way connections work under Midas is there is a callback scheme. The client starts 
mserver on the back-end, then the back-end connects back to the front-end on three 
different ports. These ports are assigned dynamically by the operating system and are 
typically in the range 40000-60000. So you also have to allow the reverse connection on 
your firewalls.
  710   28 Jun 2010 Jimmy NgaiForumError connecting to back-end computer
> The way connections work under Midas is there is a callback scheme. The client starts 
> mserver on the back-end, then the back-end connects back to the front-end on three 
> different ports. These ports are assigned dynamically by the operating system and are 
> typically in the range 40000-60000. So you also have to allow the reverse connection on 
> your firewalls.

It works now after allowing ports 40000-60000 in the front-end computer. Thanks!

Best Regards,
Jimmy
  711   29 Jun 2010 Konstantin OlchanskiForumError connecting to back-end computer
> > The way connections work under Midas is there is a callback scheme. The client starts 
> > mserver on the back-end, then the back-end connects back to the front-end on three 
> > different ports. These ports are assigned dynamically by the operating system and are 
> > typically in the range 40000-60000. So you also have to allow the reverse connection on 
> > your firewalls.
> 
> It works now after allowing ports 40000-60000 in the front-end computer. Thanks!


Yes, right. Midas networking does not like firewalls.

In the nutshell, TCP connections on all TCP ports have to be open between all computers
running MIDAS. I think in practice it is not a problem: you only ever have a finite (a small
integer) number of computers running MIDAS and you can be added them as exceptions to the
firewall rules. These exceptions should not create any security problem because you still have
the MIDAS computers firewalled from the outside world and one hopes that they will not be
attacking each other.

P.S. Permitting ports 40000-60000 is not good enough. TCP ports are allocated to TCP
connections semi-randomly from a 16-bit address space (0..65535) and your system will bomb
whenever port numbers like 39999 or 60001 get used.


K.O.
  2569   02 Aug 2023 Stefan RittBug ReportError accessing history files
We sporadically (like once per few hours) have an error message when we access the 
history plots through mhttpd:

07:21:35.109 2023/08/03 [mhttpd,ERROR] 
[history_schema.cxx:2345:FileHistory::read_data,ERROR] Cannot read 
'/data2/history/mhf_1690890685_20230801_dc_hv.dat', read() errno 2 (No such file 
or directory)

When I log in to the machine, I properly see the file and also can access it

[meg@megon02 history]$ ls -l mhf_1690890685_20230801_dc_hv.dat
-rw-rw-r--. 1 meg meg 34176312 Aug  3 07:23 mhf_1690890685_20230801_dc_hv.dat

and I also can dump that file. 

When I try again with mhttpd, I properly see that file. 

Now in principle this is not a problem, but the error message is annoying, since this 
is the only error we get in 24 hours. I attached a 24h log to see what I mean. If this 
is an OS issue, I wonder if we should add code to retry the file access in case we get 
that error.

Anybody seen a similar thing?

Best,
Stefan
  2577   09 Aug 2023 Konstantin OlchanskiBug ReportError accessing history files
I confirm I see same on the agmini system. Two problems: (a) error message is wrong, it's a 
short read, not a read error (clue: read() syscall does not return "no such file"). (b) 
mlogger is supposed to write history in record-size blocks, read in the same record size 
blocks. UNIX file semantics require that both reader and writer see read() and write() as 
atomic, even on NFS, so mhttpd should never see partially written history records. I can 
debug this on the agmini system. Probably should.

Problem (a) fixed in commit bb423c8680cc67220312534403840442868f2b3b, if you update, you 
should see error messages about "short read" and the read sizes it reports are very 
interesting, please put them in the elog here.

K.O.


> We sporadically (like once per few hours) have an error message when we access the 
> history plots through mhttpd:
> 
> 07:21:35.109 2023/08/03 [mhttpd,ERROR] 
> [history_schema.cxx:2345:FileHistory::read_data,ERROR] Cannot read 
> '/data2/history/mhf_1690890685_20230801_dc_hv.dat', read() errno 2 (No such file 
> or directory)
> 
> When I log in to the machine, I properly see the file and also can access it
> 
> [meg@megon02 history]$ ls -l mhf_1690890685_20230801_dc_hv.dat
> -rw-rw-r--. 1 meg meg 34176312 Aug  3 07:23 mhf_1690890685_20230801_dc_hv.dat
> 
> and I also can dump that file. 
> 
> When I try again with mhttpd, I properly see that file. 
> 
> Now in principle this is not a problem, but the error message is annoying, since this 
> is the only error we get in 24 hours. I attached a 24h log to see what I mean. If this 
> is an OS issue, I wonder if we should add code to retry the file access in case we get 
> that error.
> 
> Anybody seen a similar thing?
> 
> Best,
> Stefan
  2588   16 Aug 2023 Stefan RittBug ReportError accessing history files
Tonight we got another error of that type after the update:

04:17 - [mhttpd,ERROR] [history_schema.cxx:2913:FileHistory::read_data,ERROR] Cannot read 
'/data2/history/mhf_1692128214_20230815_gassystem.dat', read() errno 2 (No such file or directory)

This morning I looked at the file, and it was there:

[meg@megon02 history]$ ls -alg mhf_1692128214_20230815_gassystem.dat
-rw-rw-r--. 1 meg 4663228 Aug 17 08:50 mhf_1692128214_20230815_gassystem.dat
[meg@megon02 history]$


Stefan
  2591   17 Aug 2023 Konstantin OlchanskiBug ReportError accessing history files
Confirmed. The error message is wrong. It is printed after a short read(), but short read() does not 
set errno, and errno reported by the error message is from some previous syscall. Corrected error 
message is already committed. K.O.


> Tonight we got another error of that type after the update:
> 
> 04:17 - [mhttpd,ERROR] [history_schema.cxx:2913:FileHistory::read_data,ERROR] Cannot read 
> '/data2/history/mhf_1692128214_20230815_gassystem.dat', read() errno 2 (No such file or directory)
> 
> This morning I looked at the file, and it was there:
> 
> [meg@megon02 history]$ ls -alg mhf_1692128214_20230815_gassystem.dat
> -rw-rw-r--. 1 meg 4663228 Aug 17 08:50 mhf_1692128214_20230815_gassystem.dat
> [meg@megon02 history]$
> 
> 
> Stefan
  2593   19 Aug 2023 Stefan RittBug ReportError accessing history files
Still get the same error with the latest version:

3:28 [mhttpd,ERROR] [history_schema.cxx:2913:FileHistory::read_data,ERROR] Cannot read 
'/data2/history/mhf_1692391703_20230818_hv_tc.dat', read() errno 2 (No such file or directory)

Stefan
  2615   06 Oct 2023 Konstantin OlchanskiBug ReportError accessing history files
> Still get the same error with the latest version:
> 3:28 [mhttpd,ERROR] [history_schema.cxx:2913:FileHistory::read_data,ERROR] Cannot read 
> '/data2/history/mhf_1692391703_20230818_hv_tc.dat', read() errno 2 (No such file or directory)

I figured it out. I claim defense of temporary insanity and old age senility.

1) I added the "short read" check in one place, missed the second place
2) writes of history were meant to be atomic, and they are atomic in my head, but not in the midas 
code:

history_schema.cxx:HsFileSchema::write_event()
...
   status = write(s->writer_fd, &t, 4);
   if (status != 4) {
      cm_msg(MERROR, "FileHistory::write_event", "Cannot write to \'%s\', write(timestamp) errno 
%d (%s)", s->file_name.c_str(), errno, strerror(errno));
      return HS_FILE_ERROR;
   }

   status = write(s->writer_fd, data, expected_size);
   if (status != expected_size) {
      cm_msg(MERROR, "FileHistory::write_event", "Cannot write to \'%s\', write(%d) errno %d 
(%s)", s->file_name.c_str(), data_size, errno, strerror(errno));
      return HS_FILE_ERROR;
   }
...

that's not atomic, that's two separate writes. history reader hits the history file between the 
two writes and gets a short read of 4 bytes timestamp instead of full record size. that's the 
error message reported by mhttpd.

two fixes forthcoming:
a) check for short read in the 2nd place that I missed
b) two write() are replaced by 2 memcpy() to a preallocated buffer and 1 write()

Overall, I am pretty happy that this is the only bug in the FILE history code found in N years, 
and it does not even cause data corruption...

K.O.
  2616   06 Oct 2023 Konstantin OlchanskiBug ReportError accessing history files
> two fixes forthcoming:
> a) check for short read in the 2nd place that I missed
> b) two write() are replaced by 2 memcpy() to a preallocated buffer and 1 write()

commit 713ec4a583365d57ffcd700ceeb09dcc14518295

K.O.
  1255   05 Apr 2017 Andreas SuterBug ReportEquipment Expand doesn't work anymore
I'd liked very much the possibility to hide away Equipment on the main page. It
is also nice to have the '+' to get it quickly back when needed. However, this
seems not to work anymore (git c9d9d604803). Is this a feature or something went
wrong?
  1257   10 Apr 2017 Stefan RittBug ReportEquipment Expand doesn't work anymore
> I'd liked very much the possibility to hide away Equipment on the main page. It
> is also nice to have the '+' to get it quickly back when needed. However, this
> seems not to work anymore (git c9d9d604803). Is this a feature or something went
> wrong?

The expansion of the equipment list is handled by a Cookie ("expeq" being 1 or 0). When Konstantin 
implemented the mongoose server instead of the internal mhttp server, he neglected to evaluate 
this cookie. I fixed this now (also renamed the cookie to "midas_expeq") in the current development 
branch. Please check if it's working.

Stefan
ELOG V3.1.4-2e1708b5