When everything you know is wrong, part two

Now that we’ve looked at a bunch of myths about when finalizers are required to run, let’s consider when they are required to not run:

Myth: Keeping a reference to an object in a variable prevents the finalizer from running while the variable is alive; a local variable is always alive at least until control leaves the block in which the local was declared.

{
  Foo foo = new Foo(); 
  Blah(foo);  // Last read of foo
  Bar(); 
  // We require that foo not be finalized before Bar();
  // Since foo is in scope until the end of the block,
  // it will not be finalized until this point, right?
} 

The C# specification states that the runtime is permitted broad latitude to detect when storage containing a reference is never going to be accessed again, and to stop treating that storage as a root of the garbage collector. For example, suppose we have a local variable foo and a reference is written into it at the top of the block. If the jitter knows that a particular read is the last read of that variable, the variable can legally be removed from the set of GC roots immediately; it doesn’t have to wait until control leaves the scope of the variable. If that variable contained the last reference then the GC can detect that the object is unreachable and put it on the finalizer queue immediately. Use GC.KeepAlive to avoid this.
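As a minimal sketch of that fix, here is the fragment above with a GC.KeepAlive call added. Foo, Blah and Bar are the placeholder names from the example; their bodies below are assumptions made only so that the sketch compiles.

```csharp
using System;

class Foo
{
    ~Foo() { Console.WriteLine("Foo finalized"); }
}

static class KeepAliveSketch
{
    // Hypothetical stand-ins for the methods in the example above.
    static void Blah(Foo foo) { /* uses foo */ }
    static void Bar() { /* requires that foo has not been finalized yet */ }

    public static void Run()
    {
        Foo foo = new Foo();
        Blah(foo);   // last ordinary read of foo
        Bar();
        // KeepAlive does no work at run time, but the call counts as a use of foo,
        // so the runtime must treat the object as reachable at least until here.
        GC.KeepAlive(foo);
    }
}
```

The call goes after the last point at which the object must still be alive, not next to the allocation.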

Why does the jitter have this latitude? Suppose the local variable is enregistered into the register needed to pass the value to Blah(). If foo is in a register that Bar() needs to use, there’s no point in saving the value of the never-to-be-read-again foo on the stack before Bar() is called. (If the actual details of the code generated by the jitter are of interest to you, see Raymond Chen’s deeper analysis of this issue.)

Extra bonus fun: the runtime uses less aggressive code generation and less aggressive garbage collection when running the program in the debugger, because it is a bad debugging experience to have objects that you are debugging suddenly disappear even though the variable referring to the object is in scope. That means that if you have a bug where an object is being finalized too early, you probably cannot reproduce that bug in the debugger!

See the last point in this article for an even more horrid version of this problem.

Myth: Finalizers run no more than once.

Suppose you have an object that is in the process of being finalized and is therefore no longer a candidate for finalization, or you have suppressed finalization. The aptly-named ReRegisterForFinalize method tells the runtime that you would like the object to be finalized. This can cause an object to be finalized more than once.

Why on earth would you want to do that? The most common use case is that you have a pool of objects that are very expensive for some reason. Perhaps they are producing collection pressure if they are allocated too often, or perhaps they are for some reason expensive to allocate but cheap to re-use. In this case you can have a “pool” of living objects. When you need an object, you remove it from the pool. When you’re done with the object, you put it back in the pool. What if you forget to put the object back in the pool? (This is analogous to forgetting to dispose of an object that has an unmanaged resource.) In that case, the finalizer can put the object being finalized back in the pool, so it is no longer dead. Of course the object now needs to be finalized again, should the user take it out of the pool and again forget to put it back.
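To make the pooling scenario concrete, here is a minimal sketch. It is not taken from Roslyn; PooledBuffer, Rent and Return are hypothetical names, and the buffer size is arbitrary. The pool is a ConcurrentQueue because the finalizer adds to it from a different thread than the one that rents.

```csharp
using System;
using System.Collections.Concurrent;

sealed class PooledBuffer
{
    // Hypothetical pool of idle buffers, shared with the finalizer thread.
    static readonly ConcurrentQueue<PooledBuffer> pool = new ConcurrentQueue<PooledBuffer>();

    public readonly byte[] Data = new byte[64 * 1024];

    public static PooledBuffer Rent()
    {
        // Reuse an idle buffer if there is one; otherwise allocate a new one.
        return pool.TryDequeue(out var buffer) ? buffer : new PooledBuffer();
    }

    public static void Return(PooledBuffer buffer)
    {
        pool.Enqueue(buffer);
    }

    ~PooledBuffer()
    {
        // Do not resurrect anything while the process is shutting down.
        if (Environment.HasShutdownStarted) return;

        // The user forgot to call Return. Putting "this" back in the pool
        // makes the object reachable again, that is, it resurrects it.
        pool.Enqueue(this);

        // The object has now been finalized once; ask the runtime to finalize
        // it again the next time it is abandoned.
        GC.ReRegisterForFinalize(this);
    }
}
```

A buffer that is returned properly never reaches the finalizer, because the pool keeps it alive; the finalizer is only the backstop for the forgotten ones.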

I do not recommend resurrecting dead objects unless you really know what you are doing and you have a clearly unacceptable performance problem that this technique solves. In the case of Roslyn we identified very early on that the compiler allocates a gazillion small objects, some of them very short-lived and reusable, and that we had a performance problem directly attributable to excess collection pressure. We used a pooling strategy for the cases where our performance tests indicated that it would be a win.

Myth: An object being finalized is a dead object.

The GC must identify an object as dead — no living references — in order to place it on the finalizer queue, but the finalizer queue is itself a living object, so objects on the finalizer queue are technically alive as far as the GC is concerned. Which is good; if the GC runs for a second time while the objects identified the previous time are still on the finalization queue, they should not be reclaimed, and they certainly should not be placed on the finalization queue again!

Myth: An object being finalized is guaranteed to be unreachable from code outside the finalization queue.

There could be two objects both determined by the GC to be dead, both with references to each other. When one is finalized, it decides to keep itself alive and copies its “this” to a static field, which is clearly reachable by user code. Since the now-reachable object has a reference to another object, that object is also reachable, so user code could be running in it while it is being finalized.
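A small sketch of that two-object scenario follows. Node, Resurrected and ResurrectionSketch are hypothetical names, and whether the finalizers have actually run by the time Run looks at the static field is not guaranteed, so the sketch checks rather than assumes.

```csharp
using System;
using System.Runtime.CompilerServices;

sealed class Node
{
    // Hypothetical static field; anything stored here is reachable by user code.
    public static Node Resurrected;

    public Node Other;

    ~Node()
    {
        // Resurrection: "this", and everything reachable from it, is alive again.
        Resurrected = this;
    }
}

static class ResurrectionSketch
{
    // Kept out of line so that no local variable in Run keeps the pair alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void MakeAbandonedPair()
    {
        var a = new Node();
        var b = new Node();
        a.Other = b;
        b.Other = a;
    }

    public static void Run()
    {
        MakeAbandonedPair();
        GC.Collect();
        GC.WaitForPendingFinalizers();

        // If a finalizer ran, user code can now reach that node and, through
        // its Other field, the partner that was finalized alongside it.
        Node n = Node.Resurrected;
        if (n != null)
            Console.WriteLine("resurrected; partner reachable: " + (n.Other != null));
    }
}
```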

Again, I strongly recommend against resurrecting dead objects unless you really know what you are doing and have a truly excellent reason for doing this crazy thing.

Myth: Finalizers run on the thread that created the object.

The finalizer typically runs on its own thread. If you have an object that in some way has affinity to a particular thread (perhaps it uses thread local storage, or perhaps it is an apartment threaded object) then you must do whatever threading magic is necessary to use the object safely from the finalizer thread, preferably without blocking the finalizer thread indefinitely.
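One way to see this for yourself is to record the thread that created an object and compare it with the thread that finalizes it. This is only a sketch: ThreadReporter and ThreadDemo are hypothetical names, and nothing guarantees that the finalizer has run by the time Run returns.

```csharp
using System;
using System.Runtime.CompilerServices;

sealed class ThreadReporter
{
    // Captured when the object is constructed, on the constructing thread.
    readonly int creatorThreadId = Environment.CurrentManagedThreadId;

    ~ThreadReporter()
    {
        Console.WriteLine("created on thread {0}, finalized on thread {1}",
            creatorThreadId, Environment.CurrentManagedThreadId);
    }
}

static class ThreadDemo
{
    // Kept out of line so that no local variable in Run keeps the object alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void AllocateAndAbandon()
    {
        new ThreadReporter();
    }

    public static void Run()
    {
        AllocateAndAbandon();
        GC.Collect();
        GC.WaitForPendingFinalizers();
    }
}
```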

Myth: Finalizers run on the garbage collector thread.

The finalizer and the garbage collector typically have their own threads. This is not a requirement of all versions of the CLR, but it is the typical case.

Myth: Finalizers run as the garbage collector determines that objects are dead.

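As we’ve discussed, the GC determines that the object is dead and needs finalization, and puts it on the finalizer queue. The GC then keeps on doing what it does best: looking for dead objects. The finalizers themselves run later, when the finalizer thread gets around to working through the queue.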

Myth: Finalizers never deadlock.

We can certainly force a finalizer to deadlock, illustrating that the myth is false:

class Deadlock
{
    ~Deadlock()
    {
        System.Threading.Monitor.Enter(this);
    }
    static void Main()
    {
        Deadlock d = new Deadlock();
        System.Threading.Monitor.Enter(d);
        d = null;
        System.GC.Collect();
        System.GC.WaitForPendingFinalizers();
    }
}

This is obviously unrealistic, but realistic deadlocks are possible, particularly in scenarios like the one I mentioned above, where a call must be marshalled to the correct thread for an object that has some sort of thread affinity. Here’s a link to a typical example. (Note that the article leads with “finalizers are dangerous and you should avoid them at all costs”. This is good advice.)

Myth: Finalizers run in a predictable order.

Suppose we have a tree of objects, all finalizable, and all on the finalizer queue. There is no requirement whatsoever that the tree be finalized from the root to the leaves, from the leaves to the root, or any other order.

Myth: An object being finalized can safely access another object.

This myth follows directly from the previous. If you have a tree of objects and you are finalizing the root, then the children are still alive — because the root is alive, because it is on the finalization queue, and so the children have a living reference — but the children may have already been finalized, and are in no particularly good state to have their methods or data accessed.

Myth: Running a finalizer frees the memory associated with the object.

The finalizer thread runs the finalizers; the GC thread identifies dead objects that do not need finalization and reclaims their memory. The finalizer thread does not try to do the GC’s job for it. The memory of a finalized object is reclaimed only when a later collection finds it dead again.

Myth: An object being finalized was fully constructed.

I’ve saved the worst for last. This is in my opinion truly the nastiest of all the issues with finalizers. I’ll give you two scenarios, both horrible.

sealed class Nasty : IDisposable
{
    IntPtr foo;
    IntPtr bar;
    public Nasty()
    {
        foo = AllocateFoo();
        // Suppose a thread abort exception is thrown right here.
        bar = AllocateBar();
    }
    ~Nasty()
    {
        Dispose(false);
    }
    public void Dispose()
    {
        Dispose(true);
    }
    private void Dispose(bool disposing)
    {
        DeallocateFoo(foo);
        DeallocateBar(bar);
    }
}

In C++, destructors don’t run if a constructor throws, but in C# an object becomes eligible for finalization the moment that it is created. If a thread abort exception is thrown after foo is initialized then bar is still zero when the finalizer runs, and zero might not be a valid input to DeallocateBar.
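A defensive sketch of the cleanup path, under stated assumptions: LessNasty is a hypothetical name, and Marshal.AllocHGlobal and Marshal.FreeHGlobal stand in for the unspecified AllocateFoo, AllocateBar and their deallocators. It checks each field before freeing it, so a half-constructed object does not pass zero to a deallocator.

```csharp
using System;
using System.Runtime.InteropServices;

sealed class LessNasty : IDisposable
{
    IntPtr foo;
    IntPtr bar;

    public LessNasty()
    {
        foo = Marshal.AllocHGlobal(16);   // stand-in for AllocateFoo()
        // An exception here still leaves bar equal to IntPtr.Zero.
        bar = Marshal.AllocHGlobal(16);   // stand-in for AllocateBar()
    }

    ~LessNasty()
    {
        Dispose(false);
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    private void Dispose(bool disposing)
    {
        // Either field may still be zero if the constructor did not complete,
        // so free only what was actually allocated, and only once.
        if (foo != IntPtr.Zero) { Marshal.FreeHGlobal(foo); foo = IntPtr.Zero; }
        if (bar != IntPtr.Zero) { Marshal.FreeHGlobal(bar); bar = IntPtr.Zero; }
    }
}
```

This narrows the problem rather than removing it: an exception that arrives after an allocation returns but before the result is stored in the field still leaks that allocation.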

Now let’s combine that with the first point in today’s episode: that a finalizer can run earlier than you think.

sealed class Horrid : IDisposable
{
    IntPtr foo;
    public Horrid()
    {
        foo = AllocateFoo();
        Bar.Blah(); // static method
    }
    ~Horrid()
    {
        Dispose(false);
    }
    public void Dispose()
    {
        Dispose(true);
    }
    private void Dispose(bool disposing)
    {
        // What do we know about the state of the object at this point?
    }
}

OK, what are the possible scenarios at this point? Plainly a thread abort exception could have been thrown before, during or after the execution of Blah(), so we cannot rely on any invariant set up by Blah() in the finalizer. But we can at least rely on the fact that there are only three possibilities: Blah() was never run, Blah() threw, or Blah() completed normally, right?

No; there is a fourth possibility: Blah() is still running on the user thread, the GC has identified that this is never read again, so the object is a candidate for finalization, and therefore it is possible that the finalizer and constructor are running concurrently. (Why you would create an object and then never read the reference I do not know, but people do strange things.)
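One commonly used mitigation for this fourth possibility is to read this at the very end of the constructor, for instance with GC.KeepAlive. The sketch below assumes that approach; LessHorrid is a hypothetical name, Bar.Blah is given an empty body, and Marshal.AllocHGlobal stands in for AllocateFoo.

```csharp
using System;
using System.Runtime.InteropServices;

static class Bar
{
    public static void Blah() { /* hypothetical static method from the example */ }
}

sealed class LessHorrid : IDisposable
{
    IntPtr foo;

    public LessHorrid()
    {
        foo = Marshal.AllocHGlobal(16);   // stand-in for AllocateFoo()
        Bar.Blah();
        // A use of "this" after Blah() returns: the object cannot be treated
        // as unreachable, and so cannot be finalized, while Blah() is running.
        GC.KeepAlive(this);
    }

    ~LessHorrid()
    {
        Dispose(false);
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    private void Dispose(bool disposing)
    {
        if (foo != IntPtr.Zero) { Marshal.FreeHGlobal(foo); foo = IntPtr.Zero; }
    }
}
```

This addresses only the concurrent case. If Blah() throws, or a thread abort arrives before the KeepAlive call, the finalizer still sees an object whose constructor never completed.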

And finally, I described an even more horrid version of this scenario in a previous blog entry.

Read the title of this article again: everything you know is wrong. In a finalizer you have no guarantee that anything happened other than the object was allocated, and that the GC at one time believed it to be dead. You have no guarantee that any invariant set up by the constructor is valid, and the constructor (or any other method of the object) could still be running when the finalizer is called, provided that the runtime knows that local copies of the reference will never be read again.

It is therefore very difficult indeed to write a correct finalizer, and the best advice I can give you is to not try.

Next time on FAIC: A far-too-detailed analysis of a copy-paste bug. But not in code this time!

19 thoughts on “When everything you know is wrong, part two”

  1. Before there was SafeHandle, the bug you described in the last blog post and mention here was relatively common.
    People would have a handle to some win32 resource as a member variable, and the finalizer would deallocate/free it. They would ‘read’ that handle and pass it to some unmanaged API.
    Meanwhile the handle would get deallocated.

    The first time I debugged this was painful. From then on, when I heard about sporadic win32 failures I knew exactly what to look for 🙂

  2. “In C++, destructors don’t run if a constructor throws”

    I fear that this may confuse some people (it certainly confused me until I realised what you meant). As far as I know, destructors are run for all fully constructed objects (temporaries within the constructor body before an exception is thrown, or within the initialiser list, or implicitly initialised members); but any destructor corresponding to a constructor on the unwinding stack won’t be called.

  3. “We used a pooling strategy”

    So, I get the reasoning behind using a pooling strategy. I also (sort of) get the motivation behind trying to write a finalizer-based pooling strategy as a sort of backstop for one’s pooling strategy.

    Are you saying that in Roslyn, that was actually the decision made? I.e. for some reason, it was deemed justified to backstop your pooling strategy with a finalizer-based implementation?

    If so, could you please clarify on why this made sense for Roslyn, even as this is clearly a bad idea in most other scenarios?

  4. A myth related to the idea of finalizers running on the GC thread: when using stop-the-world GC, finalizers run while most of the world is stopped. Until I found out how the finalizer queues work, I thought finalizers were called directly from the GC (in which case it would make sense to have severe limits on finalizers’ ability to access any kind of outside objects).

    As for the idea that finalized objects are unreachable from outside the freachable queue, it’s possible given any object reference to construct a long weak reference to it which will remain valid as long as the target still exists in any form and a strong rooted reference exists to the long weak reference. If a long weak reference exists to an object but no other strong rooted reference exists, then such an object may get queued for finalization at any time, and strong rooted references may be formed at any time. Consequently, a finalizable object that never makes use of resurrection will generally have no way to guard against the possibility of outside code manipulating references to it so as to cause its finalizer to run while the strongly-rooted references to the object are being used by outside code.

  5. “It is therefore very difficult indeed to write a correct finalizer, and the best advice I can give you is to not try.”. This sentence near the end of your post has me confused. How do I reconcile it with the advice to have a finalizer (destructor) when I’m using unmanaged resources? For example this sentence from (https://msdn.microsoft.com/en-us/library/66x5fx1b.aspx):

    “However, when your application encapsulates unmanaged resources such as windows, files, and network connections, you should use destructors to free those resources. When the object is eligible for destruction, the garbage collector runs the Finalize method of the object.”

    would seem to indicate I should have a finalizer, at least when I have unmanaged resources? Are you suggesting a different pattern for cleaning up unmanaged resources, and if so, what? Or, does one need a finalizer (destructor) if using unmanaged resources?

    Thanks,

    Dave

  6. I remember first reading about finalizers in the Java documentation, back in the mid 1990’s, and being confused and re-reading the whole section about five times. They’re both complex and counterintuitive, which is not a great combination.

    Over the years, I’ve written hundreds of thousands (millions?) of lines of code, in many languages, and never had any need to write a finalizer. Or maybe I would have, if I’d been smarter, but it seems weird to write a method which is super easy to screw up, virtually impossible to test, and may never run at all.

    I’d love to hear the flip side of all this: why on earth would anyone ever legitimately want to write a finalizer?

  7. I think this whole article series could be condensed without losing meaning:

    “Myth: adding finalizers to the C# language was a good idea

    Clearly not.”

    • The .NET framework borrowed the `Finalize` concept from Java; later versions of Java added a concept called “phantom reference” which, as implemented in Java, is a bit clunky, but encapsulates the idea that resource cleanup should not be handled by the object holding the resources, but rather by another object which the first creates to watch over it, and which should avoid holding any strong references to anything not needed for cleanup.

      Under such a design, cyclic reference chains between cleanup objects and the objects that they’re guarding can prevent such objects from ever becoming eligible for cleanup or collection, and even non-cyclic chains will increase the number of GC cycles required for cleanup and collection. On the flip side, however, such a design would eliminate many complications associated with intentional or unintentional resurrection, since the cleanup code wouldn’t run until after the guarded object was well and truly dead.

      There are a few things which finalizers can do semantically which a design using separate cleanup objects could not (finalizable objects may hold references to each other and even use such references in their cleanup, though there is no guarantee of the order in which finalizers will run). I don’t know how often such abilities can be used to accomplish anything that couldn’t be done as practically without them.

  8. Is there actually any compelling reason ever to use a finalizer (except within the Disposable pattern, and even that is questionable: if it is disposable it should have been disposed in the first place)?

    Stefan

    • There are some types that use resources which are plentiful and fungible but not unlimited, and whose consumers will often be abandoned outside their control. A prime example of such a type is “WeakReference”, which encapsulates a GCHandle. When a WeakReference is abandoned, the handle must be freed; there are a variety of object-abandonment-notification approaches a framework could provide which would allow the handles to be cleaned up when objects holding weak references get abandoned, but I think `Finalize` is the only one .NET provides that would really work well.

  9. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1866

  10. Pingback: Dew Drop – May 22, 2015 (#2020) | Morning Dew

  11. Just a note, I managed to reproduce the behavior of running the finalizer before the ctor completes (and this wasn’t that hard), but this does not happen if, at the last line of the ctor, there is a reference to “this”, like “this._handle = handle” or even “this.ToString()”, so this myth isn’t all that bad.

  12. Regarding aggressive GC of local vars before they’re out of scope – this can also occur with respect to the “this” reference. An object instance can be collected while an instance method is still running, as long as the method is past the point that no instance data is required (i.e. the “this” reference is no longer required). As an example, consider this code:
    1. public class Looper
    2. {
    3. private readonly int _numLoops;
    5.
    6. public Looper(int numLoops)
    7. {
    8. _numLoops = numLoops;
    9. }
    10.
    11. public void Go()
    12. {
    13. int numLoops = _numLoops;
    14. for (int i = 0; i < numLoops; i++)
    15. {
    16. Console.Out.WriteLine("Loop #" + i);
    17. }
    18. }
    19. }
    20.
    21. class Program
    22. {
    23. static void Main()
    24. {
    25. var looper = new Looper(1000000);
    26. looper.Go();
    27. }
    28. }

    What is the earliest point (i.e. line number during program execution) at which the instance "looper" in the Main() function could be collected by the garbage collector? The answer is line 14, while it's still executing the Go() method. No instance data is required after that point. You can see this by adding a finalizer that outputs a message. When you run, sometimes you'll see that message before the looping is finished (must be a release build).

    Another good reason for the collection of local vars before they are out of scope is that this can help in some situations under memory pressure. For example, I could write a method that has this:
    var a = new double[HUGE_SIZE];
    var b = new double[HUGE_SIZE];
    var c = new double[HUGE_SIZE];

    This could continue indefinitely. With aggressive GC, you'll never run out of memory even though all the variables are always in scope. If they're not referenced after instantiation, they can be collected.
