> Still get the same error with the latest version:
> 3:28 [mhttpd,ERROR] [history_schema.cxx:2913:FileHistory::read_data,ERROR] Cannot read
> '/data2/history/mhf_1692391703_20230818_hv_tc.dat', read() errno 2 (No such file or directory)
I figured it out. I claim defense of temporary insanity and old age senility.
1) I added the "short read" check in one place, missed the second place
2) writes of history were meant to be atomic, and they are atomic in my head, but not in the midas
code:
history_schema.cxx:HsFileSchema::write_event()
...
status = write(s->writer_fd, &t, 4);
if (status != 4) {
cm_msg(MERROR, "FileHistory::write_event", "Cannot write to \'%s\', write(timestamp) errno
%d (%s)", s->file_name.c_str(), errno, strerror(errno));
return HS_FILE_ERROR;
}
status = write(s->writer_fd, data, expected_size);
if (status != expected_size) {
cm_msg(MERROR, "FileHistory::write_event", "Cannot write to \'%s\', write(%d) errno %d
(%s)", s->file_name.c_str(), data_size, errno, strerror(errno));
return HS_FILE_ERROR;
}
...
that's not atomic, that's two separate writes. history reader hits the history file between the
two writes and gets a short read of 4 bytes timestamp instead of full record size. that's the
error message reported by mhttpd.
two fixes forthcoming:
a) check for short read in the 2nd place that I missed
b) two write() are replaced by 2 memcpy() to a preallocated buffer and 1 write()
Overall, I am pretty happy that this is the only bug in the FILE history code found in N years,
and it does not even cause data corruption...
K.O. |