Alas, Smith and Jones

We have a feature in C# which allows you to declare a “friend assembly”. If assembly Smith says that assembly Jones is its friend, then code in Jones is allowed to see “internal” types of Smith as though they were public. Fully trusted assemblies of course can always use reflection to see private and internal details of other assemblies; that is what “full trust” means. And in the new CLR security model, two assemblies that have the same grant set can see each other’s private and internal details via reflection. But friend assemblies are the only mechanism that allows compile-time support for peering at the internal details of another assembly. It’s a pretty handy feature for building a “family” of assemblies that share common implementation details.

The compiler enforces one particularly interesting rule about the naming of friendly assemblies. If assembly Smith is a strong-named assembly, and Smith says that assembly Jones is its friend, then Jones must also be strong-named. If, however, Smith is not strong-named, then Jones need not be strong-named either.
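Mechanically, the friendship declaration is made with the InternalsVisibleTo attribute. Here is a minimal sketch, using the assembly names from the example above and with the public key elided for brevity:

// In a source file of assembly Smith:
using System.Runtime.CompilerServices;

// If Smith is weakly named, the friend's simple name suffices:
[assembly: InternalsVisibleTo("Jones")]

// If Smith is strong-named, Jones must be strong-named too, and must be
// identified by its full public key (not merely the public key token):
// [assembly: InternalsVisibleTo("Jones, PublicKey=0024000004800000...")]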

I’m occasionally asked “what’s up with that?”

When you call a method in Smith and pass in some data, you are essentially trusting Smith to (1) make proper use of your data; the data might contain sensitive information that you don’t want Smith to misuse, and (2) take some action on your behalf. In the non-computer world an entity which you share secrets with and then empower to take actions on your behalf is called an “attorney”. (Or “delegate”, which obviously I want to avoid because that means something technical in C#.)

That got me thinking, which usually spells trouble.

You want to hire an attorney. You go to Smith & Jones, Esqs. You meet with Smith, the trustworthy-looking man on the left of this photo:

[Photo: Mel Smith and Griff Rhys Jones]

Scenario (1):

You check Smith’s ID. It really is Mel Smith. You are contemplating giving Smith a secret, and empowering Smith to act on your behalf with the knowledge of that secret. Smith says “I share my secrets with my partner Griff Rhys Jones, whose identity you may now verify, here is Jones and here is Jones’ ID. Jones will also be acting on your behalf. Jones has full access to all secrets which are internal to this organization.”

You decide that you trust both Smith and Jones, so you share your secrets with Smith. A series of wacky sketch comedy hijinks then ensues.

(Smith and Jones are both strong-named assemblies. That is, they have “ID” you can verify. Smith’s internals are visible to Jones. Everything is fine.)

Scenario (2):

You check the ID. It is Mel Smith. Smith says “By the way, I share my internal secrets with everyone in the world who claims their name is Jones, I don’t bother to check IDs, I just trust ’em, and you should too! Everyone named Jones is trustworthy! Oh, and that includes my partner over here, Jones. Good old Jones! No, you can’t check Jonesie’s ID. Doesn’t have any ID, that Jones.”

This is ludicrous. Obviously this breaks the chain of trust that you showed you were looking for when you checked Smith’s ID in the first place.

(The compiler keeps you from getting into this terrible situation by preventing Smith from doing that; Smith, a strong-named assembly, can only state that it is sharing its internal details with another strong-named assembly. Smith cannot share its internals with any assembly named Jones, a weakly-named assembly in this scenario. We restrict Smith from exposing internals to weakly-named assemblies so that Smith does not accidentally create this security hole or accidentally mislead you into believing that Smith’s internals are in any way hidden from partially-trusted code.)

Scenario (3):

You don’t bother to check Smith’s ID. In fact, you give your secrets and power of attorney to anyone named Smith, regardless of who they actually are or where they came from. The first Smith you pick off the street says “By the way, I’ll give my internal secrets to anyone named Jones”.

Do you care?  Why would you?  You’re already giving secrets away to anyone named Smith! If a con artist wants your secrets, he can pretend to be Jones and take them from Smith, or he can pretend to be Smith and get them from you directly, but either way, the problem is that you are giving away secrets and power of attorney to strangers off the street.

(Smith and Jones are both weakly named assemblies, Smith exposes internals to Jones. This is perfectly legal, because if you don’t care enough about who you’re talking to to check IDs then what’s the point of the security system preventing those random strangers from talking to each other?)

foreach vs ForEach

A number of people have asked me why there is no Microsoft-provided ForEach sequence operator extension method. The List<T> class has such a method already of course, but there’s no reason why such a method could not be created as an extension method for all sequences. It’s practically a one-liner:

public static void ForEach<T>(
  this IEnumerable<T> sequence, 
  Action<T> action)
{
  // argument null checking omitted
  foreach(T item in sequence) 
    action(item);
}

My usual response to “why is feature X not implemented?” is to point out that, as Raymond Chen said, “Features start out nonexistent and somebody has to make them happen.” All features are unimplemented until someone designs, implements, tests, documents and ships the feature, and no one has yet spent the money to do so in this case.

It’s not much money, though. Though I have famously pointed out that even small features can have large costs, this one really is dead easy, obviously correct, easy to test, and easy to document. Cost is always a factor, of course, but the costs for this one really are quite small.

Of course, that cuts the other way too. If it is so cheap and easy, then you can do it yourself if you need it. And really what matters is not the low cost, but rather the net benefit. As we’ll see, I think the benefits are also very small, and therefore the net benefit might in fact be negative. But we can go a bit deeper here. I am philosophically opposed to providing such a method, for two reasons.

The first reason is that doing so violates the functional programming principles that all the other sequence operators are based upon. Clearly the sole purpose of a call to this method is to cause side effects. The purpose of an expression is to compute a value, not to cause a side effect. The purpose of a statement is to cause a side effect. The call site of this thing would look an awful lot like an expression (though, admittedly, since the method is void-returning, the expression could only be used in a “statement expression” context). It does not sit well with me to provide the one and only sequence operator that is useful solely for its side effects.

The second reason is that doing so adds zero new representational power to the language. Doing this lets you rewrite this perfectly clear code:

foreach(Foo foo in foos){ statement involving foo; }

into this code:

foos.ForEach((Foo foo)=>{ statement involving foo; });

which uses almost exactly the same characters in slightly different order. And yet the second version is harder to understand, harder to debug, and introduces closure semantics, thereby potentially changing object lifetimes in subtle ways.

When we provide two subtly different ways to do exactly the same thing, we produce confusion in the industry, we make it harder for people to read each other’s code, and so on. Sometimes the benefit added by having two different textual representations for one operation (like query comprehensions versus their underlying method call form, or + versus String.Concat) is so huge that it’s worth the potential confusion. But the compelling benefit of query expressions is their readability; this new form of foreach is certainly no more readable than the normal form and is arguably worse.

If you don’t agree with these philosophical objections and find practical value in this pattern, by all means, go ahead and write this trivial one-liner yourself.


Notes from 2020

This episode was, oddly enough, one of the more controversial things I’ve posted on this blog; it sparked a lot of comments and a lot of inbound links from people arguing about the points raised here. Some of the points made in the comments to the original were:

  • The ForEach method does exist on lists, as I noted. Is there anything different about lists, as opposed to arbitrary sequences, which makes the method better on lists? My answer was that I found it still distasteful on lists, but with lists at least we can make the argument that lists are intended to be a mutable data structure, so maybe it makes more sense to have a method designed to have a side effect operating on a mutable data structure? But that’s a pretty weak argument because of course you’re still not supposed to mutate a list while you are iterating it!
  • A commenter pointed out that the list implementation of the ForEach method iterates by index in a for loop; ironically it does not use a foreach.
  • A ForEach extension method is allegedly more readable at the end of a long chain of sequence operators than a foreach statement is at the top.
  • A ForEach-style extension method that introduces different semantics than regular foreach would have a clear benefit — as PLINQ’s AsParallel().ForAll() does. Good point, though of course that is a sharp tool.
  • You could make the extension method non-void by both causing a side effect on each element and then yielding it. There are usages for such a thing; in particular I have used that method for “dump this sequence to the debugger display”. It’s nice to be able to insert a debugging hook like that into the middle of a long chain of sequence operators that is going wrong. But I would still feel icky about putting it into production code.
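For what it’s worth, here is a minimal sketch of that last idea; the name Do is my invention, not a framework method:

public static IEnumerable<T> Do<T>(
  this IEnumerable<T> sequence,
  Action<T> action)
{
  // Perform the side effect, then pass the item along unchanged, so the
  // operator can be dropped into the middle of a chain of sequence operators.
  foreach (T item in sequence)
  {
    action(item);
    yield return item;
  }
}

Something like query.Do(x => Debug.WriteLine(x)) can then be inserted anywhere in a chain of operators as a debugging hook.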

Reserved and contextual keywords

Many programming languages, C# included, treat certain sequences of letters as “special”.

Some sequences are so special that they cannot be used as identifiers. Let’s call those the “reserved keywords”, and the remaining special sequences we’ll call the “contextual keywords”. They are “contextual” because the character sequence might have one meaning in a context where the keyword is expected and another in a context where an identifier is expected. (An unfortunate consequence of this definition is that using is said to be a reserved keyword even though its meaning depends on its context; whether the using begins a directive or a statement determines its meaning. And similarly for other keywords that have multiple usages, like fixed.)

The C# specification defines the following reserved keywords:

abstract, as, base, bool, break, byte, case, catch, char, checked, class, const, continue, decimal, default, delegate, do, double, else, enum, event, explicit, extern, false, finally, fixed, float, for, foreach, goto, if, implicit, in, int, interface, internal, is, lock, long, namespace, new, null, object, operator, out, override, params, private, protected, public, readonly, ref, return, sbyte, sealed, short, sizeof, stackalloc, static, string, struct, switch, this, throw, true, try, typeof, uint, ulong, unchecked, unsafe, ushort, using, virtual, void, volatile, while

The Stack Is An Implementation Detail, Part Two

A number of people have asked me, in the wake of my earlier posting about value types being on the stack, why it is that value types go on the stack but reference types do not.

The short answer is “because they can”. And since the stack is cheap, we do put them on the stack when possible.

The long answer is… long.

I’ll need to give a high-level description of the memory management strategies that we call “stack” and “heap”. Heap first.

The CLR’s garbage-collected heap is a marvel of engineering with a huge amount of detail; this sketch is not actually how it works, but it’s a good enough approximation to get the idea.

The idea is that there is a large block of memory reserved for instances of reference types. This block of memory can have “holes” – some of the memory is associated with “live” objects, and some of the memory is free for use by newly created objects. Ideally though we want to have all the allocated memory bunched together and a large section of “free” memory at the top.

If we’re in that situation when new memory is allocated then the “high water mark” is bumped up, eating up some of the previously “free” portion of the block. The newly-reserved memory is then usable for the reference type instance that has just been allocated. That is extremely cheap; just a single pointer move, plus zeroing out the newly reserved memory if necessary.

If we have holes then we can maintain a “free list” – a linked list of holes. We can then search the free list for a hole of appropriate size and fill it in. This is a bit more expensive since there is a list search. We want to avoid this suboptimal situation if possible.

When a garbage collection is performed there are three phases: mark, sweep and compact. In the “mark” phase, we assume that everything in the heap is “dead”. The CLR knows what objects were “guaranteed alive” when the collection started, so those guys are marked as alive. Everything they refer to is marked as alive, and so on, until the entire transitive closure of live objects has been marked. In the “sweep” phase, all the dead objects are turned into holes. In the “compact” phase, the block is reorganized so that it is one contiguous block of live memory, free of holes.

This sketch is complicated by the fact that there are actually three such arenas; the CLR collector is generational. Objects start off in the “short lived” heap. If they survive they eventually move to the “medium lived” heap, and if they survive there long enough, they move to the “long lived” heap. The GC runs very often on the short lived heap and very seldom on the long lived heap; the idea is that we do not want to have the expense of constantly re-checking a long-lived object to see if it is still alive. But we also want short-lived objects to be reclaimed swiftly.
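The generations are observable from C# as the numbers 0, 1 and 2, via the GC.GetGeneration method. A small sketch; exact promotion behaviour is GC policy, so the comments describe the typical case:

object o = new object();
Console.WriteLine(GC.GetGeneration(o)); // 0: new objects start in the short-lived generation
GC.Collect();                           // force a collection; o survives because it is still referenced
Console.WriteLine(GC.GetGeneration(o)); // typically 1: survivors are promoted
GC.Collect();
Console.WriteLine(GC.GetGeneration(o)); // typically 2: promoted again
GC.KeepAlive(o);                        // ensure o stays alive through the calls above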

The GC has a huge amount of carefully tuned policy that ensures high performance; it attempts to balance the memory and time costs of having a Swiss-cheesed heap against the high cost of the compaction phase. Extremely large objects are stored in a special heap that has a different compaction policy. And so on. I don’t know all the details, and fortunately, I don’t need to. (And of course, I have left out lots of additional complexity that is not germane to this article – pinning and finalization and weak refs and so on.)

Now compare this to the stack. The stack is like the heap in that it is a big block of memory with a “high water mark”. But what makes it a “stack” is that the memory on the bottom of the stack always lives longer than the memory on the top of the stack; the stack is strictly ordered. The objects that are going to die first are on the top, the objects that are going to die last are on the bottom. And with that guarantee, we know that the stack will never have holes, and therefore will not need compacting. We know that the stack memory will always be “freed” from the top, and therefore do not need a free list. We know that anything low-down on the stack is guaranteed alive, and so we do not need to mark or sweep.

On a stack, the allocation is just a single pointer move – the same as the best (and typical) case on the heap. But because of all those guarantees, the deallocation is also a single pointer move! And that is where the huge time performance savings is. A lot of people seem to think that heap allocation is expensive and stack allocation is cheap. They are actually about the same, typically. It’s the deallocation costs – the marking and sweeping and compacting and moving memory from generation to generation – that are massive for heap memory compared to stack memory.

Clearly it is better to use a stack than a GC heap if you can. When can you? Only when you can guarantee that all the necessary conditions that make a stack work are actually met. Local variables and formal parameters of value type are the sweet spot that meets those conditions. The locals of frames at the bottom of the stack clearly live longer than the locals of frames at the top of the stack. Locals of value type are copied by value, not by reference, so the local is the only thing that references the memory; there is no need to track who is referencing a particular value to determine its lifetime. And the only way to take a ref to a local is to pass it as a ref or out parameter, which just passes the ref on up the stack. The local is going to be alive anyway, so the fact that there is a ref to it “higher up” the call stack doesn’t change its lifetime.


A few asides:

  • This explains why you cannot make a “ref int” field. If you could then you could store a ref to the value of a short-lived local inside a long-lived object. Were that legal then using the stack as a memory management technique would no longer be a viable optimization; value types would be just another kind of reference type and would have to be garbage collected.
  • Anonymous function closures and iterator block closures are implemented behind-the-scenes by turning locals and formal parameters into fields. So now you know why it is illegal to capture a ref or out formal parameter in an anonymous function or iterator block; doing so would not be a legal field. (A small example appears after this list.) Of course we do not want to have ugly and bizarre rules in the language like “you can close over any local or value parameter but not a ref or out parameter”. But because we want to be able to get the optimization of putting value types on the stack, we have chosen to put this odd restriction into the language. Design is, as always, the art of finding compromises.

  • Finally, the CLR does allow “ref return types”; you could in theory have a method “ref int M() { … }” that returned a reference to an integer variable. If for some bizarre reason we ever decided to allow that in C#, we’d have to fix up the compiler and verifier so that they ensured that it was only possible to return refs to variables that were known to be on the heap, or known to be “lower down” on the stack than the callee. (NOTE FROM 2021: This feature was added to C# 7.)
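To make the capture restriction concrete, here is the sort of code the compiler rejects; this sketch deliberately does not compile:

using System;

class Example
{
  static Func<int> Capture(ref int x)
  {
    // error CS1628: cannot use ref or out parameter 'x' inside an
    // anonymous method, lambda expression, or query expression.
    // Capturing x would hoist it to a field of a closure class, and
    // a field that is a ref to a stack location would not be legal.
    return () => x;
  }
}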

So there you go. Local variables of value type go on the stack because they can. They can because (1) “normal” locals have strict ordering of lifetime, and (2) value types are always copied by value and (3) it is illegal to store a reference to a local anywhere that could live longer than the local. By contrast, reference types have a lifetime based on the number of living references, are copied by reference, and those references can be stored anywhere. The additional flexibility that reference types give you comes at the cost of a much more complex and expensive garbage collection strategy.

But again, all of this is an implementation detail. Using the stack for locals of value type is just an optimization that the CLR performs on your behalf. The relevant feature of value types is that they have the semantics of being copied by value, not that sometimes their deallocation can be optimized by the runtime.

The Stack Is An Implementation Detail, Part One

I blogged a while back about how “references” are often described as “addresses” when describing the semantics of the C# memory model. Though that’s arguably correct, it’s also arguably an implementation detail rather than an important eternal truth. Another memory-model implementation detail I often see presented as a fact is “value types are allocated on the stack”. I often see it because of course, that’s what our documentation says.

Almost every article I see that describes the difference between value types and reference types explains in (frequently incorrect) detail about what “the stack” is and how the major difference between value types and reference types is that value types go on the stack. I’m sure you can find dozens of examples by searching the web.

I find this characterization of a value type based on its implementation details rather than its observable characteristics to be both confusing and unfortunate. Surely the most relevant fact about value types is not the implementation detail of how they are allocated, but rather the by-design semantic meaning of “value type”, namely that they are always copied “by value”. If the relevant thing was their allocation details then we’d have called them “heap types” and “stack types”. But that’s not relevant most of the time. Most of the time the relevant thing is their copying and identity semantics.

I regret that the documentation does not focus on what is most relevant; by focusing on a largely irrelevant implementation detail, we enlarge the importance of that implementation detail and obscure the importance of what makes a value type semantically useful. I dearly wish that all those articles explaining what “the stack” is would instead spend time explaining what exactly “copied by value” means and how misunderstanding or misusing “copy by value” can cause bugs.

Of course, the simplistic statement I described is not even true. As the MSDN documentation correctly notes, value types are allocated on the stack sometimes. For example, the memory for an integer field in a class type is part of the class instance’s memory, which is allocated on the heap. A local variable is hoisted to be implemented as a field of a hidden class if the local is an outer variable used by an anonymous method(*), so again, the storage associated with that local variable will be on the heap if it is of value type.
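A sketch of those two cases; the class and method names here are mine, invented for illustration:

using System;

class C
{
  // The storage for this int is part of each instance of C, and
  // instances of C are allocated on the garbage-collected heap.
  private int x;
}

class D
{
  static Func<int> MakeCounter()
  {
    int count = 0;         // count is an outer variable of the lambda below, so the
                           // compiler hoists it to a field of a hidden class...
    return () => ++count;  // ...and that hidden class instance lives on the heap.
  }
}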

But more generally, again we have an explanation that doesn’t actually explain anything. Leaving performance considerations aside, what possible difference does it make to the developer whether the CLR’s jitter happens to allocate memory for a particular local variable by adding some integer to the pointer that we call “the stack pointer” or adding the same integer to the pointer that we call “the top of the GC heap”? As long as the implementation maintains the semantics guaranteed by the specification, it can choose any strategy it likes for generating efficient code.

Heck, there’s no requirement that the operating system that the CLI is implemented on top of provide a per-thread one-meg array called “the stack”. That Windows typically does so, and that this one-meg array is an efficient place to store small amounts of short-lived data is great, but it’s not a requirement that an operating system provide such a structure, or that the jitter use it. The jitter could choose to put every local “on the heap” and live with the performance cost of doing so, as long as the value type semantics were maintained.

Even worse, though, is the frequently-seen characterization that value types are “small and fast” and reference types are “big and slow”. Indeed, value types that can be jitted to code that allocates off the stack are extremely fast to both allocate and deallocate. Large heap-allocated structures like arrays of value type are also pretty fast, particularly if you need them initialized to the default state of the value type. And there is some memory overhead to reference types. And there are some high-profile cases where value types give a big perf win. But in the vast majority of programs out there, local variable allocations and deallocations are not going to be the performance bottleneck.

Making a type that really should be a reference type into a value type for a few nanoseconds of perf gain is a nano-optimization that is probably not worth it. I would only make that choice if profiling data showed that there was a large, real-world-customer-impacting performance problem directly mitigated by using value types. Absent such data, I’d always make the choice of value type vs reference type based on whether the type semantically represents a value or semantically represents a reference to something.


(*) Or in an iterator block.

Restating the problem

A problem statement:

I am trying to loop through a sequence of strings. How can I determine when I am on the last item in a sequence? I don’t see how to do it with foreach.

Indeed, foreach does not make it easy to know when you are almost done. Now, if I were foolish enough to actually answer the question as stated, I’d probably say something like this:

You can do so by eschewing foreach and rolling your own loop code that talks directly to the enumerator:

IEnumerable<string> items = GetStrings();
IEnumerator<string> enumtor = items.GetEnumerator();
if (enumtor.MoveNext())
{
  string current = enumtor.Current;
  while(true)
  {
    if (enumtor.MoveNext())
    {
      // current is not the last item in the sequence.  
      DoSomething(current); 
      current = enumtor.Current;
    }
    else
    {
      // current is the last item in the sequence. 
      DoSomethingElse(current); 
      break;
    }
  }
}
else
{
  // handle case where sequence was empty
}

Yuck. This is horrid. A fairly deep nesting level, some duplicated code, the mechanisms of iteration overwhelm any semantic meaning in the code, and it all makes my eyes hurt.

When faced with a question like that, rather than writing this horrid code I’ll usually push back and ask “why do you want to know?”

I am trying to walk through a sequence of strings and build a string like “a,b,c,d”. After each item I want to place a comma except not after the last item. So I need to know when I’m on the last item.

Well, knowing that certainly makes the problem a whole lot easier, doesn’t it? A whole bunch of techniques come to mind when given the real problem to solve:

First technique: find an off-the-shelf part that does what you want.

In this case, call the ToArray extension method on the sequence and then pass the whole thing to String.Join and you’re done.
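That is, assuming the strings are in a sequence named items:

string result = String.Join(",", items.ToArray()); // ToArray is the LINQ extension method; at the time, Join required an array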

Second technique: Do more work than you need, and then undo some of it.

Using a string builder to build up the result, put a comma after every item. Then “back up” when you’re done and remove the last comma (if there were any items in the sequence, of course).
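A sketch of that technique, again assuming a sequence named items:

StringBuilder sb = new StringBuilder();
foreach (string item in items)
  sb.Append(item).Append(","); // do a little too much work...
if (sb.Length > 0)
  sb.Length -= 1;              // ...then undo it by removing the trailing comma
string result = sb.ToString();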

Third technique: re-state the problem and see if that makes it any easier.

Consider this equivalent statement of the problem:

I am trying to walk through a sequence of strings and build a string like “a,b,c,d”. Before each item I want to place a comma except not before the first item. So I need to know when I’m on the first item.

Suddenly the problem is much simpler. It’s easy to tell if you’re on the first item in a sequence by just setting an “at the first item” flag outside the foreach loop and clearing it after going through the loop.
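In code, the restated problem is straightforward; a minimal sketch:

StringBuilder sb = new StringBuilder();
bool first = true;
foreach (string item in items)
{
  if (!first)
    sb.Append(",");
  sb.Append(item);
  first = false; // clear the "at the first item" flag after the first trip through the loop
}
string result = sb.ToString();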

When I have a coding problem to solve and it looks like I’m about to write a bunch of really gross code to do so, it’s helpful to take a step back and ask myself “Could I state the problem in an equivalent but different way?” I’ll try to state a problem that I’ve been thinking about iteratively in a recursive form, or vice versa. Or I’ll try to find formalizations that express the problem in mathematical terms. (For example, the method type inference algorithm can be characterized as a graph theory problem where type parameters are nodes and dependency relationships are edges.) Often by doing so I find that the new statement of the problem corresponds more directly to clear code.


Commentary from 2020

I should have anticipated when writing this that commenters would not focus on the general point I was making — that reformulating a specific problem can lead to a more elegant solution — but would rather get nerd-sniped into solving the specific problem. I mean, it was 2009. I had read the comments to Jeff’s 2007 article on FizzBuzz, and I knew that, as Jeff said, “it’s amusing to me that any reference to a programming problem immediately prompts developers to feverishly begin posting solutions.”

Unsurprisingly, the comments to this article were mostly alternative solutions for the comma-separated string problem, and then critiques of those solutions. I embraced this in my follow-up post, Comma Quibbling, where I explicitly solicited solutions to a harder version of the problem.

A selection of the solutions left in comments and some critiques is below.


Note that at the time this was written, string.Join did not take a sequence; it took an array. My preferred solution, suggested by a reader, would be to write your own extension method that has the pleasant interface you want:

public static string Join(
  this IEnumerable<string> source, string separator)
{
  return String.Join(separator, source.ToArray());
}

But I usually go one step farther and write something like:

public static string Comma(this IEnumerable<string> source)
{
  return source.Join(",");
}

Other solutions that were suggested and critiqued:

list.Aggregate((x, y) => x + "," + y) is a Schlemiel solution. (That is, a “Schlemiel the Painter” solution: each concatenation re-copies the entire accumulated string, giving quadratic running time.)

list.Skip(1).Aggregate(
  new StringBuilder().Append(list.First()),
  (buf, e) => buf.Append(",").Append(e),
  buf => buf.ToString());

is a non-Schlemiel solution, but it enumerates the sequence twice (once for First and once for the Skip(1) aggregation), which is considered inelegant.

This terrible solution calls a potentially linear count method twice on every iteration and is a Schlemiel solution:

string newString = string.Empty;
for (int i = 0; i < items.Count(); i++)
   newString += items[i] + (i == items.Count() - 1 ? string.Empty : ",");

If we know that the items are not empty strings, which seems reasonable, then we can use the length of the accumulated buffer as a “not first” flag:

StringBuilder sb = new StringBuilder();
foreach (var item in items)
{
  if (sb.Length > 0)
    sb.Append(",");
  sb.Append(item);
}

We can also avoid a flag entirely by doing an extra empty-string append:

StringBuilder sb = new StringBuilder();
String comma = "";
foreach (var item in list)
{
  sb.Append(comma); // the first time through, this appends the empty string
  sb.Append(item);
  comma = ",";      // every subsequent item gets a comma before it
}

A number of commenters pointed out that this problem becomes much easier in languages where lists are typically in head::tail form rather than arrays; one commenter even went so far as to implement an immutable stack in their comment! (I know how to write an immutable stack but thanks for the effort.) I’m not sure why this fact is relevant.

One commenter stated that the best solution is to treat the comma as coming before every item but the first, not after every item but the last, and gave a solution for that. The same commenter then pointed out in a follow-up comment that he had written the comment before reading the entire post. Since the title of the piece is “restating the problem” I suspect that this commenter also did not read the title before banging out a solution.

My favourite comment though was a link to Jon Skeet’s 2007 article where he solves the more general problem in an elegant way. If you want to enumerate a sequence and do something different if you are on the first or last item, just make an interface that provides that information. We are computer programmers; making abstractions that solve the stated problem is what we do.

 

Locks and exceptions do not mix

A couple years ago I wrote a bit about how our codegen for the lock statement could sometimes lead to situations in which an unoptimized build had different potential deadlocks than an optimized build of the same source code. This is unfortunate, so we’ve fixed that for C# 4.0. However, all is still not rainbows, unicorns and Obama, as we’ll see.

Representation and identity

(Note: not to be confused with Inheritance and Representation.)

I get a fair number of questions about the C# cast operator. The most frequent question I get is:

short sss = 123;
object ooo = sss;            // Box the short.
int iii = (int) sss;         // Perfectly legal.
int jjj = (int) (short) ooo; // Perfectly legal
int kkk = (int) ooo;         // Invalid cast exception?! Why?

Why? Because a boxed T can only be unboxed to T (or Nullable<T>.) Once it is unboxed, it’s just a value that can be cast as usual, so the double cast works just fine.

References are not addresses

[NOTE: Based on some insightful comments I have updated this article to describe more clearly the relationships between references, pointers and addresses. Thanks to those who commented.]

I review a fair number of C# books; in all of them of course the author attempts to explain the difference between reference types and value types. Unfortunately, most of them do so by saying something like “a variable of reference type stores the address of the object”. I always object to this. The last time this happened the author asked me for a more detailed explanation of why I always object, which I shall share with you now:

We have the abstract concept of “a reference”. If I were to write about “Beethoven’s Ninth Symphony”, those two-dozen characters are not a 90-minute long symphonic masterwork with a large choral section. They’re a reference to that thing, not the thing itself. And this reference itself contains references — the word “Beethoven” is not a long-dead famously deaf Romantic Period composer, but it is a reference to one.

Similarly in programming languages we have the concept of “a reference” distinct from “the referent”.

The inventor of the C programming language, oddly enough, chose to not have the concept of references at all. Rather, Ritchie chose to have “pointers” be first-class entities in the language. A pointer in C is like a reference in that it refers to some data by tracking its location, but there are more smarts in a pointer; you can perform arithmetic on a pointer as if it were a number, you can take the difference between two pointers that are both in the interior of the same array and get a sensible result, and so on.

Pointers are strictly “more powerful” than references; anything you can do with references you can do with pointers, but not vice versa. I imagine that’s why there are no references in C — it’s a deliberately austere and powerful language.

The down side of pointers-instead-of-references is that pointers are hard for many novices to understand, and make it very very very easy to shoot yourself in the foot.

Pointers are typically implemented as addresses. An address is a number which is an offset into the “array of bytes” that is the entire virtual address space of the process (or, sometimes, an offset into some well-known portion of that address space — I’m thinking of “near” vs. “far” pointers in win16 programming. But for the purposes of this article let’s assume that an address is a byte offset into the whole address space.) Since addresses are just numbers you can easily perform pointer arithmetic with them.

Now consider C#, a language which has both references and pointers. There are some things you can only do with pointers, and we want to have a language that allows you to do those things (under carefully controlled conditions that call out that you are doing something that possibly breaks type safety, hence “unsafe”.)  But we also do not want to force anyone to have to understand pointers in order to do programming with references.

We also want to avoid some of the optimization nightmares that languages with pointers have. Languages with heavy use of pointers have a hard time doing garbage collection, optimizations, and so on, because it is infeasible to guarantee that no one has an interior pointer to an object, and therefore the object must remain alive and immobile.

For all these reasons we do not describe references as addresses in the specification. The spec just says that a variable of reference type “stores a reference” to an object, and leaves it completely vague as to how that might be implemented. Similarly, a pointer variable stores “the address” of an object, which again, is left pretty vague. Nowhere do we say that references are the same as addresses.

So, in C# a reference is some vague thing that lets you reference an object. You cannot do anything with a reference except dereference it, and compare it with another reference for equality. And in C# a pointer is an address.

By contrast with a reference, you can do much more with a pointer that contains an address. Addresses can be manipulated mathematically; you can subtract one from another, you can add integers to them, and so on. Their legal operations indicate that they are “fancy numbers” that index into the “array” that is the virtual address space of the process.

Now, behind the scenes, the CLR actually does implement managed object references as addresses to objects owned by the garbage collector, but that is an implementation detail. There’s no reason why it has to do that other than efficiency and flexibility. C# references could be implemented by opaque handles that are meaningful only to the garbage collector, which, frankly, is how I prefer to think of them. That the “handle” happens to actually be an address at runtime is an implementation detail which I should neither know about nor rely upon. (Which is the whole point of encapsulation; the client doesn’t have to know.)

I therefore have three reasons why authors should not explain that “references are addresses”.

1) It’s close to a lie. References cannot be treated as addresses by the user, and in fact, they do not necessarily contain an address in the implementation. (Though our implementation happens to do so.)

2) It’s an explanation that explains nothing to novice programmers. Novice programmers probably do not know that an “address” is an offset into the array of bytes that is all process memory. To understand what an “address” is with any kind of depth, the novice programmer already has to understand pointer types and addresses — basically, they have to understand the memory model of many implementations of C. This is one of those “it’s clear only if it’s already known” situations that are so common in books for beginners.

3) If these novices eventually learn about pointer types in C#, their confused understanding of references will probably make it harder, not easier, to understand how pointers work in C#. The novice could sensibly reason “If a reference is an address and a pointer is an address, then I should be able to cast any reference to a pointer in unsafe code, right?”  But you cannot.

If you think of a reference as actually being an opaque GC handle then it becomes clear that to find the address associated with the handle you have to somehow “fix” the object. You have to tell the GC “until further notice, the object with this handle must not be moved in memory, because someone might have an interior pointer to it”. (There are various ways to do that which are beyond the scope of this screed.)
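One such mechanism, for the curious, is the fixed statement in unsafe code, which pins an object for the duration of a block; this sketch assumes compilation with /unsafe:

unsafe static void WriteFirstByte(byte[] data)
{
  // Pinning: the GC must not move the array while p is in scope,
  // so the address stored in p remains valid inside the block.
  fixed (byte* p = data)
  {
    p[0] = 42;
  }
  // When control leaves the block, the array is unpinned and may move again.
}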

Basically what I’m getting at here is that an understanding of the meaning of “addresses” in any language requires a moderately deep understanding of the memory model of that language. If an author does not provide an explanation of the memory model of either C or C#, then explaining references in terms of addresses becomes an exercise in question begging. It raises more questions than it answers.

This is one of those situations where the author has the hard call of deciding whether an inaccurate oversimplification serves the larger pedagogic goal better than an accurate digression or a vague hand-wave.

In the counterfactual world where I am writing a beginner C# book, I would personally opt for the vague hand-wave.  If I said anything at all I would say something like “a reference is actually implemented as a small chunk of data which contains information used by the CLR to determine precisely which object is being referred to by the reference”. That’s both vague and accurate without implying more than is wise.

Long division

I was recently asked by a reader why in C#, int divided by long has a result of long, even though it is clear that when an int is divided by a (nonzero) long, the result always fits into an int.
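To make the behaviour concrete:

int n = 123;
long d = 10;
var q = n / d; // n is implicitly converted to long; q is of type long, even though the value fits in an int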

I agree that this is a bit of a head scratcher. After scratching my head for a while, two reasons to not have the proposed behaviour came to mind.

First, why is it even desirable to have the result fit into an int? You’d be saving merely four bytes of memory and probably cheap stack memory at that. You’re already doing the math in longs anyway; it would probably be more expensive in time to truncate the result down to int. Let’s not have a false economy here. Bytes are cheap.

A second and more important reason is illustrated by this case:

long x = whatever;
x = 2000000000 + 2000000000 / x;

Suppose x is one. Should this be two integers, each equal to two billion, added together with unchecked integer arithmetic, resulting in a negative number which is then converted to a long? Or should this be the long 4000000000?

Once you start doing a calculation in longs, odds are good that it is because you want the range of a long in the result and in all the calculations that get you to the result.