Back Midas Rome Roody Rootana
  Midas DAQ System  Not logged in ELOG logo
Entry  19 Aug 2021, Konstantin Olchanski, Bug Report, select() FD_SETSIZE overrun 
    Reply  20 Aug 2021, Stefan Ritt, Bug Report, select() FD_SETSIZE overrun 
Message ID: 2271     Entry time: 20 Aug 2021     In reply to: 2270
Author: Stefan Ritt 
Topic: Bug Report 
Subject: select() FD_SETSIZE overrun 
> I am looking at the mlogger in the ALPHA anti-hydrogen experiment at CERN. It is 
> mysteriously misbehaving during run start and stop.
> The problem turns out to be with the select() system call.
> The corresponding FD_SET(), FD_ISSET() & co operate on a an array of fixed size 
> FD_SETSIZE, value 1024, in my case. But the socket number is 1409, so we overrun 
> the FD_SET() array. Ouch.
> I see that all uses of select() in midas have no protection against this.
> (we should probably move away from select() to newer poll() or whatever it is)
> Why does mlogger open so many file descriptors? The usual, scaling problems in the 
> history. The old midas history does not reuse file descriptors, so opens the same 
> 3 history files (.hst, .idx, etc) for each history event. The new FILE history 
> opens just one file per history event. But if the number of events is bigger than 
> 1024, we run into same trouble.
> (BTW, the system limit on file descriptors is 4096 on the affected machine, 1024 
> on some other machines, see "limit" or "ulimit -a").
> K.O.

I cannot imagine that you have more than 1024 different events in ALPHA. That wouldn't 
fit on your status page. 

I have some other suspicion: The logger opens a history file on access, then closes it 
again after writing to it. In the old days we had a case where we had a return from the 
write function BEFORE the file has been closed. This is kind of a memory leak, but with 
file descriptors. After some time of course you run out of file descriptors and crash. 
Now that bug has been fixed many years ago, but it sounds to me like there is another 
"fd leak" somewhere. You should add some debugging in the history code to print the 
file descriptors when you open a file and when you leave that routine. The leak could 
however also be somewhere else, like writing to the message file, ODB dump, ...

The right thing of course would be to rewrite everything with std::ofstream which 
closes automatically the file when the object gets out of scope.

ELOG V3.1.4-2e1708b5