since I am implementing a polled equipment, I was curious what is the smallest possible sleep time on current computers.
in current UNIX, there are 2 system calls available for sleeping: select() (with microsecond granularity) and nanosleep() (with nanosecond granularity).
So I wrote a little test program to check it out (progs/test_sleep).
First, Linux result using select(). Typical run on AMD 3700X CPU (4.1 GHz turbo boost) with Ubuntu LTS 20, linux kernel 5.8:
daq13:midas$ ./bin/test_sleep
sleep 10 loops, 0.100000 sec per loop, 1.000000 sec total, 1003368.855 usec actual, 100336.885 usec actual per loop, oversleep 336.885 usec, 0.3%
sleep 100 loops, 0.010000 sec per loop, 1.000000 sec total, 1008512.020 usec actual, 10085.120 usec actual per loop, oversleep 85.120 usec, 0.9%
sleep 1000 loops, 0.001000 sec per loop, 1.000000 sec total, 1062137.842 usec actual, 1062.138 usec actual per loop, oversleep 62.138 usec, 6.2%
sleep 10000 loops, 0.000100 sec per loop, 1.000000 sec total, 1528650.999 usec actual, 152.865 usec actual per loop, oversleep 52.865 usec, 52.9%
sleep 99999 loops, 0.000010 sec per loop, 0.999990 sec total, 6250898.123 usec actual, 62.510 usec actual per loop, oversleep 52.510 usec, 525.1%
sleep 1000000 loops, 0.000001 sec per loop, 1.000000 sec total, 54056918.144 usec actual, 54.057 usec actual per loop, oversleep 53.057 usec, 5305.7%
sleep 1000000 loops, 0.000000 sec per loop, 0.100000 sec total, 210875.988 usec actual, 0.211 usec actual per loop, oversleep 0.111 usec, 110.9%
sleep 1000000 loops, 0.000000 sec per loop, 0.010000 sec total, 204804.897 usec actual, 0.205 usec actual per loop, oversleep 0.195 usec, 1948.0%
daq13:midas$
How to read this:
First line is 10 sleeps of 100 ms, for a total of 1 sec. this actually sleeps for a bit longer,
average over-sleep is 300 usec out of 100 ms is 0.3%.
Next few lines use progressively shorter sleep, 10 ms, 1 ms and 0.1 ms. over-sleep is consistently around 50-60 usec,
which I conclude to be this linux sleep granularity.
Last two lines try sleep for 0.1 usec and 0.01 usec, resulting in a zero-time sleep of select(),
so we just measure the average time cost of a linux syscall, around 200 ns in this machine.
Going to different machines:
Intel E-2236 (4.8 GHz tutboboost), Ubuntu LTS 20, linux kernel 5.8: over-sleep is 60 usec, zero-sleep is 400 ns.
Intel E-2226G (same, see arc.intel.com), CentOS-7, linux kernel 3.10: over-sleep is 60 usec, zero-sleep is 600 ns.
VME processor (2 GHz Intel T7400), Ubuntu 20, linux kernel 5.8: over-sleep is 60 usec, zero-sleep is 1700 ns.
This is pretty consistent, select() over-sleep is 60 usec on all hardware, zero-sleep tracks CPU GHz ratings.
Next, MacOS result, MacBookAir2020, MacOS 10.15.7, CPU 1.2 GHz i7-1060G7:
4ed0:midas olchansk$ ./bin/test_sleep
sleep 10 loops, 0.100000 sec per loop, 1.000000 sec total, 1031108.856 usec actual, 103110.886 usec actual per loop, oversleep 3110.886 usec, 3.1%
sleep 100 loops, 0.010000 sec per loop, 1.000000 sec total, 1091104.984 usec actual, 10911.050 usec actual per loop, oversleep 911.050 usec, 9.1%
sleep 1000 loops, 0.001000 sec per loop, 1.000000 sec total, 1270800.829 usec actual, 1270.801 usec actual per loop, oversleep 270.801 usec, 27.1%
sleep 10000 loops, 0.000100 sec per loop, 1.000000 sec total, 1370345.116 usec actual, 137.035 usec actual per loop, oversleep 37.035 usec, 37.0%
sleep 99999 loops, 0.000010 sec per loop, 0.999990 sec total, 1706473.112 usec actual, 17.065 usec actual per loop, oversleep 7.065 usec, 70.6%
sleep 1000000 loops, 0.000001 sec per loop, 1.000000 sec total, 5150341.034 usec actual, 5.150 usec actual per loop, oversleep 4.150 usec, 415.0%
sleep 1000000 loops, 0.000000 sec per loop, 0.100000 sec total, 595654.011 usec actual, 0.596 usec actual per loop, oversleep 0.496 usec, 495.7%
sleep 1000000 loops, 0.000000 sec per loop, 0.010000 sec total, 591560.125 usec actual, 0.592 usec actual per loop, oversleep 0.582 usec, 5815.6%
4ed0:midas olchansk$
things are quite different here, OS is Mach microkernel with an oldish FreeBSD UNIX single-server (from NextSTEP),
so the sleep granularity is different, better than linux. zero-sleep still measures the syscall time, 600 ns on this machine.
Next we measure the same using the nansleep() syscall.
daq13:midas$ ./bin/test_sleep
sleep 10 loops, 0.100000 sec per loop, 1.000000 sec total, 1004133.940 usec actual, 100413.394 usec actual per loop, oversleep 413.394 usec, 0.4%
sleep 100 loops, 0.010000 sec per loop, 1.000000 sec total, 1046117.067 usec actual, 10461.171 usec actual per loop, oversleep 461.171 usec, 4.6%
sleep 1000 loops, 0.001000 sec per loop, 1.000000 sec total, 1096894.979 usec actual, 1096.895 usec actual per loop, oversleep 96.895 usec, 9.7%
sleep 10000 loops, 0.000100 sec per loop, 1.000000 sec total, 1526744.843 usec actual, 152.674 usec actual per loop, oversleep 52.674 usec, 52.7%
sleep 99999 loops, 0.000010 sec per loop, 0.999990 sec total, 6250154.018 usec actual, 62.502 usec actual per loop, oversleep 52.502 usec, 525.0%
sleep 1000000 loops, 0.000001 sec per loop, 1.000000 sec total, 53344123.125 usec actual, 53.344 usec actual per loop, oversleep 52.344 usec, 5234.4%
sleep 1000000 loops, 0.000000 sec per loop, 0.100000 sec total, 52641665.936 usec actual, 52.642 usec actual per loop, oversleep 52.542 usec, 52541.7%
sleep 1000000 loops, 0.000000 sec per loop, 0.010000 sec total, 52637501.001 usec actual, 52.638 usec actual per loop, oversleep 52.628 usec, 526275.0%
daq13:midas$
Here everything is simple. sleep longer than 1000 usec works the same as select(), sleep for shorter than 100 usec sleeps for 52 usec, regardless of what
we ask for.
MacOS does no better, long sleeps are same as select(), sleeps is 1 usec or less sleep for too long. no improvement over select().
4ed0:midas olchansk$ ./bin/test_sleep
sleep 10 loops, 0.100000 sec per loop, 1.000000 sec total, 1023327.827 usec actual, 102332.783 usec actual per loop, oversleep 2332.783 usec, 2.3%
sleep 100 loops, 0.010000 sec per loop, 1.000000 sec total, 1130330.086 usec actual, 11303.301 usec actual per loop, oversleep 1303.301 usec, 13.0%
sleep 1000 loops, 0.001000 sec per loop, 1.000000 sec total, 1333846.807 usec actual, 1333.847 usec actual per loop, oversleep 333.847 usec, 33.4%
sleep 10000 loops, 0.000100 sec per loop, 1.000000 sec total, 1402330.160 usec actual, 140.233 usec actual per loop, oversleep 40.233 usec, 40.2%
sleep 99999 loops, 0.000010 sec per loop, 0.999990 sec total, 2034706.831 usec actual, 20.347 usec actual per loop, oversleep 10.347 usec, 103.5%
sleep 1000000 loops, 0.000001 sec per loop, 1.000000 sec total, 6646192.074 usec actual, 6.646 usec actual per loop, oversleep 5.646 usec, 564.6%
sleep 1000000 loops, 0.000000 sec per loop, 0.100000 sec total, 7556284.189 usec actual, 7.556 usec actual per loop, oversleep 7.456 usec, 7456.3%
sleep 1000000 loops, 0.000000 sec per loop, 0.010000 sec total, 15720005.035 usec actual, 15.720 usec actual per loop, oversleep 15.710 usec, 157100.1%
4ed0:midas olchansk$
On Linux, strace tells us that the actual syscall behind nanosleep() is this:
clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=10000}, 0x7fffc159e200) = 0
Let's try it directly... result is the same.
Let's try it with CLOCK_MONOTONIC... result is the same.
The man page of clock_nanosleep() specifies that this syscall always suspends the calling thread,
so what we see here is the Linux scheduler tick size.
Bottom line.
On current linux, shortest sleep is around 100 usec both select() and nanosleep().
On MacOS, shortest sleep is down to 5 usec using select(), but I cannot tell if CPU sleeps or busy-loops.
select() is still the best syscall for sleeping.
K.O. |