Spot the defect!

At Microsoft we have an internal email list called “Spot the Defect” — people mail around buggy code they’ve discovered and we compete to see who can find the most problems with it. It’s fun, and you learn a lot about what other people consider bugs — everything from security holes to lying comments.

I love playing Spot the Defect. Here is the code for the WScript.Sleep method with the comments removed and some serious bugs added. You’ll note that this code has all the required features I mentioned in my previous post.

We go to sleep in one-second (or less) intervals, and tell the operating system to wake us up if COM posts a message to the message queue, because there might be an event handler to dispatch. We also check to see if the host recorded a script error (either due to an event handler or due to the script timeout firing) so that we can abort the sleep. This way we never keep the script alive more than a second after it was shut down due to error.

What bugs did I add?

HRESULT CWScript::Sleep(long Time)
  const DWORD TimerGranularity = 1000;
  if (Time < 0)
    return E_INVALIDARG;
  DWORD StartTickCount = ::GetTickCount();
  DWORD EndTickCount = StartTickCount + Time;
  DWORD CurTickCount = StartTickCount;
  while(CurTickCount < EndTickCount)
    MSG msg;
    DWORD CurWaitTime = (DWORD)(EndTickCount - CurTickCount);
    if (CurWaitTime > TimerGranularity)
      CurWaitTime = TimerGranularity;
      if (0 != ::PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
      if (m_pHost->FErrorPending())
        return S_OK;
      CurTickCount = ::GetTickCount();
  return S_OK;

Commentary from 2019:

Readers from 2019 who are used to reading C# may need the hint that long and DWORD are both 32 bit integers.

I don’t recall why we had this strange convention of UpperCasing local variables; the WSH codebase was actually taken over by the scripting team from another team that had developed it for their own use, and there were a number of odd choices in that codebase. I suspect this was one of them; I’m a big believer in keeping new work consistent with old work, even if you wouldn’t use the conventions of the old work in brand-new green-field code.

There were a lot of good responses to this puzzle:

Is it legal to call MsgWaitForMultipleObjects with zero handles?

Yes, in regular desktop or server Windows. A user pointed out that this is not legal in the CE version of Windows.

After MsgWaitForMultipleObjects returns, you should pump all waiting messages, not just one.

Great catch! That bug was actually in the code; that’s not the bug I added for this puzzle.

I sent the bug on to the scripting sustaining engineering team.

It looks like there is an overflow bug at StartTickCount + Time

There is; can you be more specific?

Suppose the machine has been on for just under 49.7 days, which is when GetTickCount() overflows, and adding Start overflows. The end time will be less than the start time, so it will not sleep at all.

That is a bug, but there is a worse one.

Suppose CurTickCount and EndTickCount are very close to overflowing, and the second time you set CurTickCount it is set to a small value after the overflow. The function will then sleep for 49.7 days.

That’s correct, but it’s actually worse than that; it will sleep for at least 49.7 days, and then it has a possibly very narrow “window” of tick counts that will avoid sleeping for another 49.7 days. If that window is small then we are likely to sleep for a long, long time.

The real code was written to take all of these problems into account; it does all of its timing arithmetic in doubles, which have more than enough range and precision to avoid overflow issues, and detects clock overflows correctly.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s