Back Midas Rome Roody Rootana
  Midas DAQ System  Not logged in ELOG logo
Entry  26 Nov 2024, Nick Hastings, Bug Report, TMFE::Sleep() errors 
    Reply  26 Nov 2024, Maia Henriksson-Ward, Bug Report, TMFE::Sleep() errors 
    Reply  27 Nov 2024, Konstantin Olchanski, Bug Report, TMFE::Sleep() errors 
    Reply  27 Nov 2024, Konstantin Olchanski, Bug Report, TMFE::Sleep() errors 
    Reply  27 Nov 2024, Konstantin Olchanski, Bug Report, TMFE::Sleep() errors 
       Reply  06 Dec 2024, Konstantin Olchanski, Bug Report, TMFE::Sleep() errors 
          Reply  06 Dec 2024, Konstantin Olchanski, Bug Report, TMFE::Sleep() errors 
Message ID: 2905     Entry time: 26 Nov 2024     In reply to: 2904
Author: Maia Henriksson-Ward 
Topic: Bug Report 
Subject: TMFE::Sleep() errors 
> Hello,
> 
> I've noticed that SC FEs that use the TMFE class with midas-2022-05-c often report errors when calling TMFE:Sleep().
> The error is :
> 
> [tmfe.cxx:1033:TMFE::Sleep,ERROR] select() returned -1, errno 22 (Invalid argument).
> 
> This seems to happen in two different ways:
> 
> 1. Error being reported repeatedly
> 2. Occasional single errors being reported
> 
> When the first of these presents, we typically restart the FE to "solve" the problem.
> Case 2. is typically ignored.
> 
> The code in question is:
> 
> void TMFE::Sleep(double time)
> {
>    int status;
>    fd_set fdset;
>    struct timeval timeout;
>       
>    FD_ZERO(&fdset);
>       
>    timeout.tv_sec = time;
>    timeout.tv_usec = (time-timeout.tv_sec)*1000000.0;
> 
>    while (1) {
>       status = select(1, &fdset, NULL, NULL, &timeout);
> #ifdef EINTR
>       if (status < 0 && errno == EINTR) {
>          continue;
>       }
> #endif
>       break;
>    }
>       
>    if (status < 0) {
>       TMFE::Instance()->Msg(MERROR, "TMFE::Sleep", "select() returned %d, errno %d (%s)", status, errno, strerror(errno));
>    }
> }
> 
> So it looks like either file descriptor of the timeval struct must have a problem.
> From some reading it seems that invalid timeval structs are often caused by one or both
> of tv_sec or tv_usec not being set. In the code above we can see that both appear to be
> correctly set initially.
> 
> From the select() man page I see:
> 
> RETURN VALUE
>        On success, select() and pselect() return the number of file descriptors contained in
>        the three returned descriptor sets (that is, the total number of bits that are set in
>        readfds,  writefds,  exceptfds).  The return value may be zero if the timeout expired
>        before any file descriptors became ready.
> 
>        On error, -1 is returned, and errno is set to indicate the error; the file descriptor
>        sets are unmodified, and timeout becomes undefined.
> 
> The second paragraph quoted from the man page above would indicate to me that perhaps the
> timeout needs to be reset inside the if block. eg:
> 
>       if (status < 0 && errno == EINTR) {
>          timeout.tv_sec = time;
>          timeout.tv_usec = (time-timeout.tv_sec)*1000000.0;
>          continue;
>       }
> 
> Please note that I've only just briefly looked at this and was hoping someone more
> familiar with using select() as a way to sleep() might be better able to understand
> what is happening.
> 
> I wonder also if now that midas requires stricter/newer c++ standards if there maybe
> some more straightforward method to sleep that is sufficiently robust and portable.
> 
> Thanks,
> 
> Nick.

I had the same error a few months ago, though I wasn't using a tagged release. It happened because I was calling TMFE::Sleep() 
with a negative time. If your issues were caused by the same reason, TMFE::Sleep() can handle negative times since commit 
591f78f (https://bitbucket.org/tmidas/midas/commits/591f78f52893d5ffd64bf4e52a1daac537ebd672).

Early in my debugging, I did come to the same conclusions you did, and actually tried a similar solution the one you suggested. 
This was a few months ago and I didn't write down what happened, but I believe it didn't work because in my case the errno was 
something other than EINTR, and/or the timeval was still an invalid argument for sleep because the timeout was still negative. I 
never followed it up because I was able to fix my problem by fixing my frontend.
ELOG V3.1.4-2e1708b5