Back Midas Rome Roody Rootana
  Midas DAQ System  Not logged in ELOG logo
Entry  26 Nov 2024, Nick Hastings, Bug Report, TMFE::Sleep() errors 
    Reply  26 Nov 2024, Maia Henriksson-Ward, Bug Report, TMFE::Sleep() errors 
    Reply  27 Nov 2024, Konstantin Olchanski, Bug Report, TMFE::Sleep() errors 
    Reply  27 Nov 2024, Konstantin Olchanski, Bug Report, TMFE::Sleep() errors 
    Reply  27 Nov 2024, Konstantin Olchanski, Bug Report, TMFE::Sleep() errors 
       Reply  06 Dec 2024, Konstantin Olchanski, Bug Report, TMFE::Sleep() errors 
          Reply  06 Dec 2024, Konstantin Olchanski, Bug Report, TMFE::Sleep() errors 
Message ID: 2904     Entry time: 26 Nov 2024     Reply to this: 2905   2906   2907   2908
Author: Nick Hastings 
Topic: Bug Report 
Subject: TMFE::Sleep() errors 
Hello,

I've noticed that SC FEs that use the TMFE class with midas-2022-05-c often report errors when calling TMFE:Sleep().
The error is :

[tmfe.cxx:1033:TMFE::Sleep,ERROR] select() returned -1, errno 22 (Invalid argument).

This seems to happen in two different ways:

1. Error being reported repeatedly
2. Occasional single errors being reported

When the first of these presents, we typically restart the FE to "solve" the problem.
Case 2. is typically ignored.

The code in question is:

void TMFE::Sleep(double time)
{
   int status;
   fd_set fdset;
   struct timeval timeout;
      
   FD_ZERO(&fdset);
      
   timeout.tv_sec = time;
   timeout.tv_usec = (time-timeout.tv_sec)*1000000.0;

   while (1) {
      status = select(1, &fdset, NULL, NULL, &timeout);
#ifdef EINTR
      if (status < 0 && errno == EINTR) {
         continue;
      }
#endif
      break;
   }
      
   if (status < 0) {
      TMFE::Instance()->Msg(MERROR, "TMFE::Sleep", "select() returned %d, errno %d (%s)", status, errno, strerror(errno));
   }
}

So it looks like either file descriptor of the timeval struct must have a problem.
From some reading it seems that invalid timeval structs are often caused by one or both
of tv_sec or tv_usec not being set. In the code above we can see that both appear to be
correctly set initially.

From the select() man page I see:

RETURN VALUE
       On success, select() and pselect() return the number of file descriptors contained in
       the three returned descriptor sets (that is, the total number of bits that are set in
       readfds,  writefds,  exceptfds).  The return value may be zero if the timeout expired
       before any file descriptors became ready.

       On error, -1 is returned, and errno is set to indicate the error; the file descriptor
       sets are unmodified, and timeout becomes undefined.

The second paragraph quoted from the man page above would indicate to me that perhaps the
timeout needs to be reset inside the if block. eg:

      if (status < 0 && errno == EINTR) {
         timeout.tv_sec = time;
         timeout.tv_usec = (time-timeout.tv_sec)*1000000.0;
         continue;
      }

Please note that I've only just briefly looked at this and was hoping someone more
familiar with using select() as a way to sleep() might be better able to understand
what is happening.

I wonder also if now that midas requires stricter/newer c++ standards if there maybe
some more straightforward method to sleep that is sufficiently robust and portable.

Thanks,

Nick.
ELOG V3.1.4-2e1708b5