> Hello,
>
> I've noticed that SC FEs that use the TMFE class with midas-2022-05-c often report errors when calling TMFE:Sleep().
> The error is :
>
> [tmfe.cxx:1033:TMFE::Sleep,ERROR] select() returned -1, errno 22 (Invalid argument).
>
> This seems to happen in two different ways:
>
> 1. Error being reported repeatedly
> 2. Occasional single errors being reported
>
> When the first of these presents, we typically restart the FE to "solve" the problem.
> Case 2. is typically ignored.
>
> The code in question is:
>
> void TMFE::Sleep(double time)
> {
> int status;
> fd_set fdset;
> struct timeval timeout;
>
> FD_ZERO(&fdset);
>
> timeout.tv_sec = time;
> timeout.tv_usec = (time-timeout.tv_sec)*1000000.0;
>
> while (1) {
> status = select(1, &fdset, NULL, NULL, &timeout);
> #ifdef EINTR
> if (status < 0 && errno == EINTR) {
> continue;
> }
> #endif
> break;
> }
>
> if (status < 0) {
> TMFE::Instance()->Msg(MERROR, "TMFE::Sleep", "select() returned %d, errno %d (%s)", status, errno, strerror(errno));
> }
> }
>
> So it looks like either file descriptor of the timeval struct must have a problem.
> From some reading it seems that invalid timeval structs are often caused by one or both
> of tv_sec or tv_usec not being set. In the code above we can see that both appear to be
> correctly set initially.
>
> From the select() man page I see:
>
> RETURN VALUE
> On success, select() and pselect() return the number of file descriptors contained in
> the three returned descriptor sets (that is, the total number of bits that are set in
> readfds, writefds, exceptfds). The return value may be zero if the timeout expired
> before any file descriptors became ready.
>
> On error, -1 is returned, and errno is set to indicate the error; the file descriptor
> sets are unmodified, and timeout becomes undefined.
>
> The second paragraph quoted from the man page above would indicate to me that perhaps the
> timeout needs to be reset inside the if block. eg:
>
> if (status < 0 && errno == EINTR) {
> timeout.tv_sec = time;
> timeout.tv_usec = (time-timeout.tv_sec)*1000000.0;
> continue;
> }
>
> Please note that I've only just briefly looked at this and was hoping someone more
> familiar with using select() as a way to sleep() might be better able to understand
> what is happening.
>
> I wonder also if now that midas requires stricter/newer c++ standards if there maybe
> some more straightforward method to sleep that is sufficiently robust and portable.
>
> Thanks,
>
> Nick.
I had the same error a few months ago, though I wasn't using a tagged release. It happened because I was calling TMFE::Sleep()
with a negative time. If your issues were caused by the same reason, TMFE::Sleep() can handle negative times since commit
591f78f (https://bitbucket.org/tmidas/midas/commits/591f78f52893d5ffd64bf4e52a1daac537ebd672).
Early in my debugging, I did come to the same conclusions you did, and actually tried a similar solution the one you suggested.
This was a few months ago and I didn't write down what happened, but I believe it didn't work because in my case the errno was
something other than EINTR, and/or the timeval was still an invalid argument for sleep because the timeout was still negative. I
never followed it up because I was able to fix my problem by fixing my frontend. |