At Microsoft we have an internal email list called “Spot the Defect” — people mail around buggy code they’ve discovered and we compete to see who can find the most problems with it. It’s fun, and you learn a lot about what other people consider bugs — everything from security holes to lying comments.
I love playing Spot the Defect. Here is the code for the WScript.Sleep
method with the comments removed and some serious bugs added. You’ll note that this code has all the required features I mentioned in my previous post.
We go to sleep in one-second (or less) intervals, and tell the operating system to wake us up if COM posts a message to the message queue, because there might be an event handler to dispatch. We also check to see if the host recorded a script error (either due to an event handler or due to the script timeout firing) so that we can abort the sleep. This way we never keep the script alive more than a second after it was shut down due to error.
What bugs did I add?
HRESULT CWScript::Sleep(long Time) { const DWORD TimerGranularity = 1000; if (Time < 0) return E_INVALIDARG; DWORD StartTickCount = ::GetTickCount(); DWORD EndTickCount = StartTickCount + Time; DWORD CurTickCount = StartTickCount; while(CurTickCount < EndTickCount) { MSG msg; DWORD CurWaitTime = (DWORD)(EndTickCount - CurTickCount); if (CurWaitTime > TimerGranularity) CurWaitTime = TimerGranularity; ::MsgWaitForMultipleObjects( 0, NULL, TRUE, CurWaitTime, QS_ALLINPUT | QS_ALLPOSTMESSAGE); if (0 != ::PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) ::DispatchMessage(&msg); if (m_pHost->FErrorPending()) return S_OK; CurTickCount = ::GetTickCount(); } return S_OK; }
Commentary from 2019:
Readers from 2019 who are used to reading C# may need the hint that long
and DWORD
are both 32 bit integers.
I don’t recall why we had this strange convention of UpperCasing
local variables; the WSH codebase was actually taken over by the scripting team from another team that had developed it for their own use, and there were a number of odd choices in that codebase. I suspect this was one of them; I’m a big believer in keeping new work consistent with old work, even if you wouldn’t use the conventions of the old work in brand-new green-field code.
There were a lot of good responses to this puzzle:
Is it legal to call
MsgWaitForMultipleObjects
with zero handles?
Yes, in regular desktop or server Windows. A user pointed out that this is not legal in the CE version of Windows.
After
MsgWaitForMultipleObjects
returns, you should pump all waiting messages, not just one.
Great catch! That bug was actually in the code; that’s not the bug I added for this puzzle.
I sent the bug on to the scripting sustaining engineering team.
It looks like there is an overflow bug at
StartTickCount + Time
There is; can you be more specific?
Suppose the machine has been on for just under 49.7 days, which is when
GetTickCount()
overflows, and addingStart
overflows. The end time will be less than the start time, so it will not sleep at all.
That is a bug, but there is a worse one.
Suppose
CurTickCount
andEndTickCount
are very close to overflowing, and the second time you setCurTickCount
it is set to a small value after the overflow. The function will then sleep for 49.7 days.
That’s correct, but it’s actually worse than that; it will sleep for at least 49.7 days, and then it has a possibly very narrow “window” of tick counts that will avoid sleeping for another 49.7 days. If that window is small then we are likely to sleep for a long, long time.
The real code was written to take all of these problems into account; it does all of its timing arithmetic in doubles, which have more than enough range and precision to avoid overflow issues, and detects clock overflows correctly.