Nostalgia, horror, and a very old bug

My next article about graph traversal is pre-empted by this breaking news; I’ll pick up that series again soon.

Yesterday morning a coworker forwarded to me an article about a recently patched security hole in Windows, and wondered if I had any thoughts on it. Oh, did I! I read about the exploit with an odd mixture of nostalgia — because I worked on the code in question back in the 1990s — and horror at how long this exploitable bug had been in Windows.

To be clear, I did not write the actual exploitable code; it predates my time at Microsoft. But I was worried while I was reading the article that it might turn out to be my bad! This is the second time that has happened to me, and it is not a pleasant feeling.

Coverity has a research team devoted specifically to security-impacting bugs, and they were kind enough to ask me to write up my thoughts for their blog. You can read about my guess at what the buggy code looked like here.

If you have examples of “missing restore”-style bugs — security-impacting or not — in real-world code in any language, I would love to see them. Please leave examples in the comments here or on the security blog. Thanks!

22 thoughts on “Nostalgia, horror, and a very old bug”

  1. Not a security bug, but I know the feeling of thinking something like this may be your fault. I was working on a government website listing companies in the country. Because of bureaucratic delays the website was not deployed until about 6 months later, by which time I had moved on to other projects. When they finally deployed it I read in the news that “Personal data of thousands of businessmen leaked in the new portal”. Naturally I was horrified that we might have been hacked or had forgotten to secure access to this data, but it turned out that publishing it was required by law, and anyone could already see the personal data in question by visiting the paper archives before the website went online.

  2. I did something similar (but not as bad, thankfully) on a previous job. I wrote some new code for a web page, tested it on our copy of the database, found no problems, had it reviewed by a co-worker who also found no problems, and committed it to the production version. The next day, dozens of users called to say it was broken. The problem was that I had assumed a certain table had no duplicates, but the users had entered _hundreds_ of duplicate records. On encountering these duplicates, my code screwed up horribly. We quickly reverted to the previous day’s code while I feverishly investigated and then rewrote the code, finally adding a “Distinct” clause to a query. This fixed the problem. My supervisor told me not to worry about it too much; some of the other programmers had made similar errors as well. It did teach me not to trust the contents of any data that users could add things to.

  3. I think the original PC BIOS had a bug in one of its screen scrolling routines which would trash the BP register [which almost all 16-bit x86 languages use as a frame pointer]. The bug was easily worked around, but I think it was also fixed in later versions of the BIOS.

    Another kind of “restore bug” that can be rather nasty appears when reading something doesn’t reveal all the information contained therein, so the value read back cannot be used to restore the original state. At the hardware level, that can be a problem with many I/O registers; on some platforms it can also be a problem with floating-point units [it *shouldn’t* be a problem with a compiler that correctly supports the x87, but Microsoft’s support for x87 has generally been poor]. In the .NET Framework, such problems can appear in many places where property getters and setters report/change different things, such as the “Visible” property of WinForms Control objects.

    It would seem that the ability to cleanly and reliably save and restore things should be a fundamental part of architectures, languages, and frameworks, but a surprising number of problems persist at all levels.

  4. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1738

  5. In about 1980 on an early breadboard version of the Argus M700 (http://en.wikipedia.org/wiki/Ferranti_Argus#M700) we ran code with a ‘do forever’ loop in a common idiom used in the Coral 66 language: “FOREVER DO Statement” which gave “FOR dummy := 0 WHILE 0 = 0 DO Statement” after macro expansion. For this code the compiler generated a test for the “0 = 0” part followed by a conditional branch, i.e. two instructions. We found that every so often the loop finished because 0 did not equal 0. After a couple of days the hardware engineers reported that the ‘return from interrupt’ instruction was not restoring the condition code registers and we were getting a timer interrupt between the compare and the conditional branch.

    • Nice. I once fixed a similar bug in VBScript, where a script called into a control which changed the floating point register control flag from “set a bit on overflow” to “crash on overflow”, and then never changed it back. So you could call a method on this control, and then five minutes later your script would crash with an overflow exception when some completely unrelated computation overflowed. The solution was to save and restore the floating point chip control word around *every* call to an external component. Total pain in the rear.

    • I’ve seen a number of processors whose context save/restore on interrupts left something to be desired. On one TI DSP I’ve used, it would be essentially impossible to use the non-maskable interrupt (NMI) for any unpredictable event that needed to be recoverable, since it only has a one-deep stack for certain bits in the status register which control memory addressing modes. It’s not possible for an interrupt or NMI to do anything useful without overwriting some of the memory-address-mode bits with known values; the bits get copied to a backup register which code can then store to memory, but if an NMI is triggered between the time that an interrupt service routine overwrites the main status bits and the time that it saves the backup copy, there will be practically no way for the NMI to do anything useful without trashing the backup copy.

      I think it may have been technically possible to copy the accumulator to two address registers (out of eight) permanently set aside for that purpose (the compiler could be directed to leave one or two registers unused), pop the execution stack to the accumulator, check whether execution came from a spot in the ISR which had overwritten the address-mode bits but not saved the backup copy, and if so figure that there was no need to load the addressing mode bits. Either that or refrain from using the NMI for anything which could be unpredictable but needed to be recoverable [probably the only practical approach]. It’s not uncommon for systems to use NMI to perform quick actions that need to be done periodically with minimal delay, but that wouldn’t be at all workable on this chip given the register save/restore handling.
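The VBScript fix described in the replies above, saving and restoring the floating point control word around every call to an external component, is a general pattern. Here is a minimal sketch of it in C#. Managed code has no direct access to the control word, so `ProcessState.Overflow` and `OverflowMode` are hypothetical stand-ins invented for illustration; they are not a real API, and this is not the code VBScript actually used.

```csharp
using System;

// Hypothetical stand-in for a piece of process-wide state, such as the
// floating point control word discussed above. Not a real API.
enum OverflowMode { SetFlag, Trap }

static class ProcessState
{
    public static OverflowMode Overflow = OverflowMode.SetFlag;
}

static class ExternalCalls
{
    // Runs a callee that cannot be trusted to put shared state back,
    // then restores the state seen on entry, even if the callee throws.
    public static void CallPreservingState(Action callee)
    {
        OverflowMode saved = ProcessState.Overflow;
        try
        {
            callee();
        }
        finally
        {
            ProcessState.Overflow = saved;
        }
    }
}

static class Program
{
    static void Main()
    {
        // A badly behaved "control" that changes the mode and never changes it back.
        Action rudeControl = () => ProcessState.Overflow = OverflowMode.Trap;

        ExternalCalls.CallPreservingState(rudeControl);

        // The caller's mode is back in place after the call returns.
        Console.WriteLine(ProcessState.Overflow);
    }
}
```

The restore sits in a `finally` block so that an exception thrown by the callee cannot skip it. A restore that is skipped on one exit path is exactly the “missing restore” shape of bug.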

  6. Pingback: Dew Drop – November 17, 2014 (#1897) | Morning Dew

  7. Hello, this is about your site’s CSS. Adding this code would improve the look of your header’s menu:

    a, a:link, a:hover, a:focus {
        transition: background-color 0.3s, color 0.3s;
    }
    #access li:hover > a, #access a:focus {
        background-color: #eaeaea;
        transition: background-color 0.5s, color 0.5s;
    }
    #access li > a, #access a {
        background-color: #000;
        transition: background-color 0.5s, color 0.5s;
    }

  8. I’ve always thought that some kind of generic mechanism that leveraged the C# “using” keyword would be very handy. Something like the code below is close, but doesn’t work in the general case (I think I’ve escaped the angle brackets acceptably):
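    using System;

    // Runs a caller-supplied cleanup action on an object when disposed,
    // so that arbitrary cleanup can be expressed with a "using" statement.
    class Using<T> : IDisposable {
        private readonly Action<T> cleanupAction;
        private readonly T objectToCleanup;

        public Using(T objectToCleanup, Action<T> cleanupAction) {
            this.cleanupAction = cleanupAction;
            this.objectToCleanup = objectToCleanup;
        }

        // Invoked at the end of the "using" block.
        public void Dispose() {
            cleanupAction(objectToCleanup);
        }
    }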
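To make the idea concrete, here is a sketch of how a wrapper like `Using<T>` might serve a save/restore scenario. The class is repeated so that the example stands alone. `Settings.Mode` is a hypothetical piece of shared state invented for illustration, and this shows only the simple case, not the general one the commenter says the wrapper does not cover.

```csharp
using System;

sealed class Using<T> : IDisposable
{
    private readonly Action<T> cleanupAction;
    private readonly T objectToCleanup;

    public Using(T objectToCleanup, Action<T> cleanupAction)
    {
        this.cleanupAction = cleanupAction;
        this.objectToCleanup = objectToCleanup;
    }

    public void Dispose() => cleanupAction(objectToCleanup);
}

static class Settings
{
    // Hypothetical shared state that a caller wants to change temporarily.
    public static string Mode = "normal";
}

static class Program
{
    static void Main()
    {
        // Capture the current value and register the restore as the cleanup action.
        using (new Using<string>(Settings.Mode, saved => Settings.Mode = saved))
        {
            Settings.Mode = "temporary";
            Console.WriteLine(Settings.Mode);
        }

        // Dispose has run by this point, so the saved value is back.
        Console.WriteLine(Settings.Mode);
    }
}
```

The saved value is captured when the wrapper is constructed, so the wrapper has to be created before the state is changed. Creating it afterwards would “restore” the temporary value.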

  9. I think it’s not a bug but a necessary implementation step, and there was an MSDN article on this years ago, in the late 90s. In virtually all 32-bit versions of Windows starting from Windows NT 4.0, all Kernel functions are mapped to the same segment, and that lets you inject your code into another process’s protected memory space and execute it there (provided you’ve got the privileges). I have the code working, and I’ve been able to kill, say, MS Word with it, by an error of my choice (how’s Division by Zero?), at a specified moment (10 seconds after my code starts, for example). I haven’t tried it on Windows 7 or 8 because I’ve since switched to a 64-bit system, but I wonder: if I modify my code to run on a 64-bit system, will it still work?
