What’s the difference between a trenchcoat and a duster?

Today, yet another episode in my ongoing series “What’s the difference?” This time, a non-computer-related topic.

I am often complimented on my choice of outerwear in the Seattle rainy season, and I hate to respond to a well-meant compliment with a correction. So I usually let all those “Nice trenchcoat!” comments slide and just say “Thanks!” But as a public service, let me lay it out for you so that you don’t make the same mistake. Here we see David Tennant as the Tenth Doctor wearing a classic example of a trenchcoat: (Click for a larger version.)

TenthDoctor

The trenchcoat is a long waterproof coat, traditionally made of gabardine. The term originated in the trenches of the First World War, due to the popularity of this style of coat amongst officers in the British armed forces. The trenchcoat is not merely a functional warm raincoat but also stylish, with long wide lapels and decorative buttons. It is often belted and might be tailored in at the waist, particularly in women’s styles.

A duster is also a long waterproof coat that is often referred to as a “trenchcoat” — but as you’ll see, it is quite different in its details. Here’s the duster I wear, an Australian-made Driza-Bone:

DrizaBone

Note the lack of decorative elements, the flap over the closure, the no-lapel collar (which clasps shut, completely enclosing the neck if necessary) and the built-in extra rain protection on the shoulders. (*) Dusters are typically made of oilcloth and are built for handling the practicalities of herding sheep in the rain, not for style (**).

Not shown in this view: the interior includes straps that let you attach the bottom of the coat to your legs, so that it does not blow around when you are on horseback. Also, the back is cut in such a way that you can cover both your legs and the rear portion of the saddle with the coat. I usually take the bus and not a horse to work, but still it’s nice to know that options are available should I need them. These practical elements are usually not present in trenchcoats.

Right, glad we got that sorted out.

Next time on FAIC: What is binding, and why is it always either early or late? Can’t it ever be on time?


(*) Duster manufacturers always hasten to point out that the shoulders are already waterproof; the extra layer keeps your shoulders warmer by shedding rain more effectively.

(**) There are, of course, some dusters built for style; if you watch the “Matrix” series of movies you’ll see the heroes wear an assortment of extremely stylish dusters and trenchcoats both.

Every public change is a breaking change

This article was inspired by a question from StackOverflow.


Here’s an inconvenient truth: just about every “public surface area” change you make to your code is a potential breaking change.

First off, I should clarify what I mean by a “breaking change” for the purposes of this article. If you provide a component to a third party, then a “breaking change” is a change such that the third party’s code compiled correctly with the previous version, but the change causes a recompilation to fail. (A more strict definition would be that a breaking change is one where the code recompiles successfully but has a different meaning; for today we will just consider actual “build breaks”.)

A “potential” breaking change is a change which might cause a break, if the third party happens to have consumed your component in a particular way. By a “public surface area” change, I mean a change to the “public metadata” surface of a component, like adding a new method, rather than changing the behaviour of an existing method by editing its body. (Such a change would typically cause a difference in runtime behaviour, rather than a build break.)

Some public surface area breaking changes are obvious: making a public method into a private method, sealing an unsealed class, and so on. Third-party code that called the method, or extended the class, will break. But many changes seem far safer: adding a new public method, for example, or making a read-only property into a read-write property. As it turns out, almost any change you make to the public surface area of a component is a potential breaking change. Let’s look at some examples. Suppose you add a new overload:

// old component code:
public interface IFoo {...}
public interface IBar { ... }
public class Component
{
    public void M(IFoo x) {...}
}

Suppose you then later add

    public void M(IBar x) {...}

to Component. Suppose the consumer code is:

// consumer code:
class Consumer : IFoo, IBar
{
   ...
   component.M(this);
   ...
}

The consumer code compiles successfully against the original component, but when it is recompiled against the new component, the build suddenly breaks with an overload resolution ambiguity error. Oops.

What about adding an entirely new method?

// old component code:
...
public class Component
{
  public void MFoo(IFoo x) {...}
}

and now you add

public void MBar(IBar x) {...}

No problem now, right? The consumer could not possibly have been consuming MBar. Surely adding it could not cause a build break in the consumer, right?

class Consumer
{
    class Blah
    {
        public void MBar(IBar x) {}
    }
    static void N(Action<Blah> a) {}
    static void N(Action<Component> a) {}
    static void D(IBar bar)
    {
        N(x=>{ x.MBar(bar); });
    }
}

Oh, the pain.

In the original version, overload resolution has two overloads of N to choose from. The lambda is not convertible to Action<Component> because typing formal parameter x as Component causes the body of the lambda to have an error. That overload is therefore discarded. The remaining overload is the sole applicable candidate; its body binds without error with x typed as Blah.

In the new version of Component the body of the lambda does not have an error; therefore overload resolution has two candidates to choose from and neither is better than the other; this produces an ambiguity error.

This particular “flavour” of breaking change is an odd one in that it makes almost every possible change to the surface area of a type into a potential breaking change, while at the same time being such an obviously contrived and unlikely scenario that no “real world” developers are likely to run into it. When we are evaluating the impact of potential breaking changes on our customers, we now explicitly discount this flavour of breaking change as so unlikely as to be unimportant. Still, I think it’s important to make that decision with eyes open, rather than being unaware of the problem.


This article elicited many reader comments:

You pretty much have to ignore these possible problems, the same way you can’t guess what extension methods we’ve made that may be silently overridden by new methods in a class.

That was our conclusion, yes.

Could you give an example where making a read-only property into a read-write property would result in a breaking change? I can’t think of any…

Sure:

class C { public int P { get; set; } } 
class D { public int P { get; private set; } } 
class E { 
  static void M(Action<C> ac){} 
  static void M(Action<D> ad) {} 
  static void X() { M(q=>{q.P = 123; }); } 
}

The body of X binds without error as long as D.P's setter is private. If it becomes public then the call to M is ambiguous.

I’m curious to see how unsealing a class can cause code to stop compiling.

Same trick. This trick is surprisingly versatile.

class B {}
sealed class C {}
class D : B, I {}
interface I {}
class P {
  static void M(Func<C, I> fci){}
  static void M(Func<B, I> fbi){} // special agent!
  static void Main() { M(q=>q as I); }
}

That compiles successfully because q as I is illegal if q is of type C. The compiler knows that C does not implement I, and because it is sealed, it knows that no possible type that is compatible with C could implement I. Therefore overload resolution chooses the func from B to I, because a type derived from B, say D, could implement I. When C is unsealed then overload resolution has no basis to choose one over the other, so it becomes an ambiguity error.

Seems to me you would be much better off in this regard avoiding the new functional features and sticking to the C# 2.0 specifications. That way you’re pretty much protected against breakages like these, right?

At what cost? Is it worthwhile to eschew the benefits of LINQ in order to avoid the drawbacks of some obscure, unlikely and contrived breaking changes? The benefits of LINQ outweigh the costs by a huge margin, in my opinion.

You know, all these examples make me think overload resolution is just too clever for its own good — these kinds of algorithms are more the kind you’d expect in the optimizing part, where cleverness abounds, not in the basic semantic part. Clearly, the solution is to abolish overload resolution, assign every method an unambiguous canonical name (that’s stable in the face of changes) and force the developer to specify this name on every call. Alternatively, make overload resolution unnecessary by forbidding overloads. Not sure how to handle generics in such a world — abolishing them seems too harsh. (Mandatory Internet disclaimer: the above is facetious, but slightly ha-ha-only-serious.)

There are languages that do not have overload resolution. A .NET language could, for instance, specify the unique metadata token associated with the method definition. But in a world without overload resolution, how would you do query comprehensions?

Keep it secret, keep it safe

A lot of people really love the idea of cryptography. To computer geeks like us there is nothing cooler than the idea that computing relatively simple arithmetic on a message can enable you to communicate secretly with anyone in the world, even if there are eavesdroppers. Unfortunately, this means that there are a lot of computer people out there who are enamoured with the idea of creating their own cryptosystems, or their own implementations of the cryptosystems invented by others. It also means there are a lot of software product managers who are convinced that security equals cryptography, and that their products will magically become “secure” if only there is more crypto in them.

I’ve gotten a fair number of questions over the years about how to add crypto to applications — a subject that I am actually not an expert on — and most of the time the answer is “don’t”. Many times, the question comes from a developer who, though an expert developer of applications in their line of business, doesn’t “get” the basic idea of cryptography. As a public service, let me lay it out for you. The fundamental idea of modern cryptography is this:

The strength of the security of a large quantity of data — known as the “plaintext” — against discovery or modification by a motivated attacker depends upon the security of a small quantity of data — known as the “key”. (*)

That is, modern crypto is essentially a form of mechanical advantage. With a gearing system or a lever you can turn a small motion into a large motion. With a strong cryptosystem you can turn the security of a 1 KB key file into the security of a 10 MB data file. Cryptosystems do not manufacture new security, any more than a lever manufactures new motion. Cryptosystems turn the security of one thing (the key) into the security of another much larger thing (the plaintext).

It is the failure to understand that fundamental idea that underlies most of the questions I get from non-crypto experts about implementing crypto in their applications. Non-crypto experts get enamoured with the math of the cryptosystem, but that is not the part that provides the security. The part that provides the security is the security of the key. The cryptosystem itself (or its implementation) might be weak or strong, but even the strongest modern cryptosystem depends fundamentally on the correct management of the keys for its security.

Before the 1970s, all cryptosystems were “shared key” systems. That is, if Alice wanted to send a message to Bob over an insecure channel that had attackers attempting to either read or modify the message, then first Alice and Bob would somehow communicate a secret shared key over a secure channel. Once Alice and Bob both had the shared key, Alice could encrypt the message with the shared key, send it over the insecure channel, and Bob could decrypt it with the shared key. Bob would then have the plaintext message, but the eavesdropper would only have the encrypted message. (There are also techniques in shared-key systems whereby Bob could verify that no one tampered with the encrypted message while it was in the insecure channel.)
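To make the shared-key idea concrete, here is a toy sketch in Python. This is emphatically not a real cipher; the keystream construction is invented for illustration only, and a real system would use a vetted algorithm such as AES. The point it demonstrates is the one above: the same small key both encrypts and decrypts, so the security of the message is exactly the security of the key.

```python
import hashlib
from itertools import count

def keystream(key: bytes, length: int) -> bytes:
    # Derive a pseudo-random byte stream from the key by hashing the key
    # together with a counter. (A crude stand-in for a real stream cipher.)
    out = b""
    for i in count():
        if len(out) >= length:
            break
        out += hashlib.sha256(key + i.to_bytes(8, "big")).digest()
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    # XOR the plaintext with the keystream. Decryption is the very same
    # operation, because XOR is its own inverse.
    return bytes(p ^ k for p, k in zip(plaintext, keystream(key, len(plaintext))))

decrypt = encrypt  # symmetric: same key, same operation

# The small secret that Alice and Bob exchanged over their secure channel:
key = b"decided over coffee in the hidden underground bunker"

message = b"Meet me in Zurich on Tuesday."
ciphertext = encrypt(key, message)

assert ciphertext != message                         # Eve sees only gibberish
assert decrypt(key, ciphertext) == message           # Bob, with the key, recovers the text
assert decrypt(b"a wrong key", ciphertext) != message  # without the key, more gibberish
```

Notice that the "mechanical advantage" is visible here: a few dozen bytes of key protect an arbitrarily large message, and nothing in the arithmetic helps you if the key leaks.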

Clearly the security of this system depends entirely on there being a secure channel over which Alice and Bob can negotiate what their small shared key is. If they try to exchange the secret key over an insecure channel then an eavesdropper can determine what the secret key is, and it’s no longer a secret. (As we’ll see later, a more aggressive “man in the middle” can cause even more havoc.)

You might immediately wonder why Alice and Bob need to use crypto at all if a basic assumption is that they have a secure method for key exchange. Why not just use the required-to-be-secure channel for sending the unencrypted text? The point is that the secure channel might be extremely expensive, or it might be available only at certain times. For example, the “secure channel” might be that Alice and Bob meet for coffee in Seattle in their hidden underground bunker, decide on a secret key, and then Alice moves to Switzerland and Bob moves to the Cayman Islands, and they can no longer cheaply meet for coffee. But they can still communicate securely after their key exchange, even if they are no longer in the same room together.

The security of the system also depends upon both Alice and Bob being able to keep that key a secret. If Alice or Bob reveals the key to a third party — say Eve, the eavesdropper — then Eve can discover the plaintext of every message sent over the insecure channel. Worse, if the key is discovered by Mallory — the “man in the middle” modifying the message — then she can not only discover the plaintext but can modify the plaintext, encrypt the modified message with the secret key, and send the fake message to Alice or Bob, purporting to be the other.

What about public key cryptosystems? Don’t they solve this problem? Turns out, no. Now you have four key management problems. Let me explain:

The idea of a public key cryptosystem is that there are two keys. One, the private key, is kept secret. The other, the public key, is — you guessed it — public. A message encrypted with the public key cannot be decrypted with the public key, but it can be decrypted with the private key, and vice versa. Alice and Bob both have a key pair; they do not know each other’s private keys, but they do know each other’s public keys. To send a message to Bob, Alice encrypts it with her private key (**), and then encrypts that with Bob’s public key. She sends the twice-encrypted message over the insecure channel to Bob. Bob decrypts it with his private key, and then decrypts the result with Alice’s public key. Now he knows that only he could read the message, because only he has his private key. And he knows that it was not tampered with, because only Alice’s public key could decrypt the message. Therefore the message was neither read by Eve nor tampered with by Mallory.
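The double-encryption protocol just described can be sketched with “textbook” RSA and comically small primes. This is a toy for illustrating the protocol only; the key sizes are laughably insecure, real systems add padding and sign a hash rather than the raw message, and the particular primes and exponents below are my own arbitrary choices.

```python
# Toy "textbook" RSA. A private-key operation and a public-key operation
# are both just modular exponentiation; the two exponents are inverses.

def make_keypair(p, q, e):
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # d = modular inverse of e (Python 3.8+)
    return (e, n), (d, n)              # (public key, private key)

alice_pub, alice_priv = make_keypair(61, 53, 17)  # Alice's modulus n = 3233
bob_pub, bob_priv = make_keypair(89, 97, 5)       # Bob's modulus n = 8633
                                                  # (larger, so Alice's signed value fits)

def apply_key(key, m):
    exp, n = key
    return pow(m, exp, n)

message = 1234  # must be smaller than Alice's modulus in this toy

# Alice: "encrypt" (sign) with her private key, then encrypt with Bob's public key.
wire = apply_key(bob_pub, apply_key(alice_priv, message))

# Bob: decrypt with his private key, then "decrypt" (verify) with Alice's public key.
recovered = apply_key(alice_pub, apply_key(bob_priv, wire))

assert recovered == message
```

If Bob's decryption with Alice's public key yields gibberish rather than a sensible message, he knows the message did not come from the holder of Alice's private key — which is exactly why everything hinges on Bob having Alice's *real* public key, as discussed below.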

Like I said, we now have four key management problems whereas before we had one. Before, Alice and Bob had to securely exchange their shared key and then keep it secret. Now Alice and Bob both have to keep their private keys secret. If Bob’s private key is compromised then Eve can read any message sent by Alice to Bob. And if Alice’s private key is compromised then Mallory can send a fake message to Bob purporting to come from Alice.

That’s two; what are the other two key management problems? I completely glossed over the most important part of the system! The detail that the security of the entire system rests on is that bit above where I said “but they do know each other’s public keys”. Somehow they had to securely exchange public keys. Why’s that?

Suppose Alice and Bob are sending each other their public keys. How do they do that? If they do it over an insecure channel, it does not matter if Eve can read the public keys. They are public, after all. But it matters very much if Mallory can modify the keys in transit! If Alice sends her public key to Bob over an insecure channel then Mallory can intercept the message and replace Alice’s public key with Mallory’s public key. Bob now believes that Mallory’s public key is Alice’s public key, and so Mallory can impersonate Alice! When Alice sends a message to Bob, Mallory can intercept it and replace it with a different message encrypted with Mallory’s private key and Bob’s public key. Bob decrypts the message with his private key and Mallory’s public key, believing it to be Alice’s public key, and Mallory has tricked Bob into thinking that she is Alice.

Similarly, when Bob sends his public key to Alice, Mallory can intercept it and replace it with Mallory’s public key. When Alice sends a message to Bob she encrypts it with her private key and Mallory’s public key, thinking that it is Bob’s public key. Mallory can intercept it, decode it with her private key, read the message, and re-send it along to Bob, encrypted with Bob’s real public key.

Somehow there has to be a secure channel by which public keys are exchanged. Remember, the entire point of crypto is to turn the security of a small amount of data into the security of a large amount of data. If the keys aren’t secure — either because the private keys are compromised or the public keys are not correctly associated with their real owners — then the communication is not secure at all.

This is not of theoretical concern; we have to solve this problem in real life. When you go to a web site that is going to take your credit card number, you want some assurances that first, no eavesdropper is going to be able to see that credit card number when it goes over the wire, and second, that when you send your credit card to a web site, you really are communicating with the real web site, not a fake hacker web site that looks just like it. You might not know the public key of the web site! Without a secure mechanism to obtain that public key, you might be obtaining the man-in-the-middle’s public key.

This problem is in practice solved by both the client (you) and the server (the web site) agreeing to trust a third party called the Certifying Authority, or CA. When the web site sends you the certificate containing their name and public key, the certificate is encrypted with the CA’s private key. By decrypting it with the CA’s public key, you can verify that the CA vouches for that web site actually being associated with that public key. Of course, if you do not trust the certifying authority, or the web site does not obtain certification from that certifying authority, then the key negotiation fails and the browser gives you some error message about the certificate not being verified.

But hold on a minute. Now we have yet another key management problem. The CA has to keep their private key a secret, and the CA has to somehow tell you what their public key is without being attacked by a man in the middle purporting to be the CA! We seem to have not solved the problem at all, but rather just pushed it off another level. How do we finally solve this problem once and for all? Can we use even more crypto?

Nope. This is the point where the crypto stops. (***) It has to stop somewhere; again, the point of crypto is to turn the security of a small thing into the security of a large thing; that security has to come from somewhere originally, just as the energetic motion of a lever has to come from somewhere outside the lever mechanism. The secure transmission of the CA’s public key is the piece that has to be accomplished without using any crypto. How you accomplish that is up to you; typically the operating system comes pre-installed with a list of known public keys of certifying authorities that have been verified independently by the operating system vendor. New CA “root” certificates can also be installed by your machine or network administrator. The CA roots are the organizations you trust to make security-impacting decisions on your behalf.

I seem to have strayed somewhat from my original point, but I hope I’ve made myself clear: the hard part of a secure design that uses crypto is not the math. When adding crypto to an application, the fundamental question you should be asking yourself is not “what math am I going to do on the plaintext to encrypt it?” The fundamental question is “how are my users going to generate keys, securely exchange secret keys or public keys, and keep the secret keys private?” The security of the entire system rests upon being able to leverage the secure generation and distribution of the keys into the security of the message, so solve the key management problem first.


(*) A cryptosystem is strong or weak to the extent that it delivers on this fundamental goal. If the security of the message can be compromised without the user compromising the key then the cryptosystem is weak. The goal of modern crypto is to create cryptosystems that deliver on this promise.

(**) In a realistic scenario she encrypts a hash and appends that to the message, because that is higher performance. But let’s gloss over that detail.

(***) Just to complete the sketch: the way HTTPS actually works is that a shared “session” key is securely exchanged between the client and the server. The shared key is then used to encrypt and decrypt all the messages sent between the client and the server. As we already discussed, to exchange that shared key we need a secure channel. To obtain a secure channel, the client and the server agree upon a mutually trusted CA. The server convinces the client that the agreed-upon CA vouches for the server’s public key, and so the client now has the real public key of the server. The client can then suggest a shared session key to use for the rest of the communication, and send it to the server encrypted with the server’s public key. So: the secure transmission of the shared session key relies upon (1) the secrecy of the server’s private key, (2) the secrecy of the CA’s private key, and (3) that the client and the server both have an accurate copy of the CA’s public key, obtained through some non-crypto-based secure channel. The cryptosystem leverages the secrecy of two private keys and the successful transmission of one public key into the security of the messages transmitted in the session. If any of those keys are compromised then the whole system falls apart. (To deal with that problem, CAs also provide the service of publishing a “revocation list” of servers that have lost control of their private keys, so that you know to not trust those guys.)


The MSFT archive of this article is here.

What is this thing you call a “type”? Part Two

Well that was entirely predictable; as I said last time, if you ask ten developers for a definition of “type”, you get ten different answers. The comments to the previous article make for fascinating reading!

Here’s my attempt at describing what “type” means to me as a compiler writer. I want to start by considering just the question of what a type is and not confuse that with how it is used.

Fundamentally, a type in C# is a mathematical entity that obeys certain algebraic rules, just as natural numbers, complex numbers, quaternions, matrices, sets, sequences, and so on, are mathematical entities that obey certain algebraic rules.

By way of analogy, I want to digress for a moment and ask the question “what is a natural number?” That is, what fundamentally characterizes the numbers 0, 1, 2, 3, … that we use to describe the positions of items in a sequence and the sizes of everyday collections of things?

This question has received a lot of attention over the centuries; the definitive answer that we still use today was created by Giuseppe Peano in the 19th century. Peano said that a natural number is defined as follows:

  • Zero is a natural number.
  • Every natural number has a “successor” natural number associated with it.
  • No natural number has zero as its successor.
  • Unequal natural numbers always have unequal successors.
  • If you start from zero, take its successor, and then take the successor of that, and so on, you will eventually encounter every natural number. (*)

Surprisingly, that’s all you need. Any mathematical entity that satisfies those postulates is usable as a natural number. (**) Notice that there’s nothing in there whatsoever about adding natural numbers together, or subtracting, multiplying, dividing, taking exponents, and so on. All of those things can be bolted on as necessary. For example, we can say that addition is defined as follows:

  • (n + 0) = n
  • (n + (successor of m)) = ((successor of n) + m)

And you’re done; you’ve got a recursive definition of addition. We can similarly define “less than”:

  • (n < 0) = false
  • (0 < (successor of m)) = true
  • ((successor of n) < (successor of m)) = (n < m)

We can define every operation on natural numbers in this way; try defining multiplication, just for fun. (Hint: assume that you’ve already defined addition.)
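These recursive equations transcribe directly into code. Here is a sketch in Python that represents a natural number as either Zero or the successor of another natural number, with addition and “less than” defined exactly by the equations above. (The from_int and to_int helpers are just conveniences for testing, not part of the axioms.)

```python
# A natural number is either Zero or the Successor of a natural number.
Zero = None

def successor(n):
    return ("S", n)

def add(n, m):
    if m is Zero:                   # (n + 0) = n
        return n
    return add(successor(n), m[1])  # (n + succ(m)) = (succ(n) + m)

def less_than(n, m):
    if m is Zero:                   # (n < 0) = false
        return False
    if n is Zero:                   # (0 < succ(m)) = true
        return True
    return less_than(n[1], m[1])    # (succ(n) < succ(m)) = (n < m)

def from_int(i):                    # convenience: 3 becomes S(S(S(Zero)))
    return Zero if i == 0 else successor(from_int(i - 1))

def to_int(n):                      # convenience: count the successors
    return 0 if n is Zero else 1 + to_int(n[1])

assert to_int(add(from_int(2), from_int(3))) == 5
assert less_than(from_int(2), from_int(3))
assert not less_than(from_int(3), from_int(3))
```

Multiplication, per the exercise, follows the same pattern on top of add; I will leave that one to you.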

We can come up with a similar “axiomatic” definition of “type”:

  • Object is a type
  • “Declared” types are types (note that int, uint, bool, string, and so on can be considered “declared” types for our purposes; the runtime “declares” them for you.)
  • If T is a type and n is a positive integer then “n-dimensional array of T” is also a type.

And that’s pretty much it as far as “safe” C# 1.0 is concerned. (To be as strict as the Peano axioms for natural numbers we’d want to also throw in some similar safeguards; for example, we don’t ever want to be in a situation where the type “one-dimensional array of double” has type equality with the type “int”, just as we don’t ever want to be in a situation where the successor of a natural number is zero.)

Things get a bit more complex when we throw in generic types and pointer types, but I’m sure you can see that we could come up with a precise axiomatic description of generic types and pointer types with a little bit of work. That is, type parameter declarations are types, generic type declarations constructed with the right number of type arguments are types, and so on.

We can then start piling on algebraic operations. Just as we defined “less than” on numbers, we can define the “is a subtype of” relation on types.

  • T <: object is true for any type T that is not object
  • object <: T is false for any type T
  • if S <: T and T <: U then S <: U
  • … and so on…
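As a sketch of how such axioms might be encoded, here is a toy Python model of a small fragment of the relation: only object and declared types with a single declared base, with no interfaces, arrays, or variance. Since every declared type’s base chain ends at object, the first axiom holds, and transitivity falls out of walking the chain. (This is an illustration of the algebra, not a claim about how the C# compiler represents types.)

```python
# A type is modeled as (name, declared base type); object has no base.
OBJECT = ("object", None)

def declared(name, base=OBJECT):
    return (name, base)

def is_subtype(s, t):
    # Proper subtype: S != T and T appears somewhere on S's base chain.
    # Because every chain ends at OBJECT, "T <: object" holds for all T
    # other than object, and "object <: T" is false for every other T.
    if s == t:
        return False
    base = s[1]
    while base is not None:
        if base == t:
            return True
        base = base[1]
    return False

animal = declared("Animal")               # Animal : object
giraffe = declared("Giraffe", animal)     # Giraffe : Animal

assert is_subtype(giraffe, animal)        # direct base
assert is_subtype(giraffe, OBJECT)        # transitively, via Animal
assert not is_subtype(animal, giraffe)    # subtyping is not symmetric
assert not is_subtype(OBJECT, animal)     # object is the top, not a subtype
```

The point is the one made below: once the axioms and operators are pinned down, questions about types can be answered purely by rule-following, without caring how types are “implemented”.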

Just as I do not care how numbers are “implemented” in order to manipulate them algebraically, I also do not care how types are “implemented” in order to manipulate them with the rules of type algebra. All I care about are the axioms of the system, and the rules that define the algebraic operators that I’ve made up for my own convenience.

So there you go; we have a definition of “type” that does not say anything whatsoever about “a set of values” or “a name”. A type is just an abstract mathematical entity, a bit like a number, that obeys certain defining axioms, and therefore can be manipulated algebraically by making up rules for useful operators — just like numbers can be manipulated abstractly according to algebraic rules that we make up for them.

Now that we have a sketch of an axiomatic definition of what a type is, what are we going to do with it?

Perhaps the most fundamental purposes of static type checking in a programming language like C# are to associate a type with every relevant storage location, to associate a type with every (†) relevant compile-time expression, and to then ensure that it is impossible for a value associated with an incompatible type to be written to or read from any storage location. A compile-time proof of runtime type safety (‡): that’s the goal.

The key word there is proof; now that we have developed an axiomatic definition of “type” we can start constructing proofs based on these axioms. The C# specification defines:

  • what type is associated with every compile-time expression; of course, expressions whose types cannot be determined might be program errors
  • what type is associated with every storage location
  • what constitutes an acceptable assignment from an expression of a given type (‡‡) to a storage location of a given type

The task of the compiler writer is then to implement those three parts of the specification: associating a type with every expression, associating a type with every storage location, and then constructing a proof that the assignment is valid given the rules. If the compiler is able to construct a proof then the assignment is allowed. The tricky bit is this: the specification typically just gives the rules of the system, not a detailed algorithm describing how to construct proofs using those rules. If any proof exists then the compiler is required to find it. Conversely, if no proof exists, the compiler is required to deduce that too and produce a suitable compile-time type error.

Unfortunately, as it turns out, that’s impossible in general. A system where it is possible for every expression to be classified as either “type safe” or “type unsafe” in a finite number of logical steps is called a “decidable” system. As Gödel famously proved, natural number arithmetic as axiomatized above is undecidable; there are statements in formal arithmetic that can be neither proved nor disproved in a finite number of logical steps. Assignment type checking is also in general undecidable in programming languages with both nominal subtyping (‡‡‡) and generic variance. As I mentioned a while back, it turns out that it would be possible to put additional restrictions on type declarations such that nominal subtyping would become decidable, but we have not ever done this in C#. Rather, when faced with a situation that produces an infinitely long proof of type compatibility, the compiler just up and crashes with an “expression was too complex to analyze” error. I’m hoping to fix that in a hypothetical future version of the compiler, but it is not a high priority because these situations are so unrealistic.

Fortunately, situations where type analysis is impossible in general, or extremely time consuming, are rare in realistic C# programs; rare enough that we can do a reasonable job of writing a fast compiler.

Summing up: A type is an abstract mathematical entity defined by some simple axioms. The C# language has rules for associating types with compile-time expressions and storage locations, and rules for ensuring that the expression type is compatible with the storage type. The task of the compiler writer is to use those axioms and algebraic rules to construct a proof or disproof that every expression has a valid type and that every assignment obeys the assignment compatibility rules.

Though the axioms of what makes a type are pretty simple, the rules of associating types to expressions and determining assignment compatibility are exceedingly complex; that’s why the word “type” appears 5000 times in the specification. To a large extent, the C# language is defined by its type rules.

———————-

(*) I am of course oversimplifying here; more formally, the real axiom formally states that proof by induction works on natural numbers. That is: if a property is true of zero, and the property being true for a number implies the truth for its successor, then the property holds for all numbers.

(**) It is instructive to consider why the third, fourth and fifth postulates are necessary. Without the third postulate, there need only be one number: zero, which is its own successor! Without the fourth postulate we could say that there are only two numbers: the successor of zero is one, the successor of one is one. Without the fifth postulate there could be *two* zeros: “red zero” and “blue zero”. The successor of red zero is red one, the successor of blue zero is blue one, and so on. In each case, the system without the axiom satisfies the remaining four axioms, but in each case it seems very contrary to our intuition about what a “natural number” is.

(†) We fudge this in C# of course; null literals, anonymous functions and method groups are technically classified as “expressions without any type”. However, in a legal program they must occur in a context where the type of the expression can be inferred from its surrounding context.

(‡) The cast operator gives the lie to this, of course; one meaning of the cast operator is “there is no compile-time proof that this is type safe; I assert that it is safe, so generate code that checks my assertion at runtime.” Also, unsafe array covariance makes this goal unreachable.

(‡‡) Again, this is fudged in a few places. For example, it is not legal to assign an expression of type int to a variable of type short, unless the expression of type int is a compile-time constant known to fit into a short.
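A quick illustration of that fudge (the names here are mine, chosen for the example):

```csharp
using System;

class ShortAssignmentDemo // hypothetical demo class, not from the spec
{
    static void Main()
    {
        short s1 = 100;            // legal: constant 100 is known to fit in a short
        int n = 100;
        // short s2 = n;           // illegal: no implicit int-to-short conversion
        // short s3 = 100000;      // illegal: constant 100000 does not fit in a short
        short s4 = (short)n;       // legal: the cast asserts safety explicitly
        Console.WriteLine(s1 + s4); // prints 200
    }
}
```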

(‡‡‡) That is, a language where you state the base type of a newly declared type. Like “interface IFoo&lt;T&gt; : IBar&lt;T&gt;” uses nominal subtyping.

What is this thing you call a “type”? Part one

The word “type” appears almost five thousand times in the C# 4 specification, and there is an entire chapter, chapter 4, dedicated to nothing but describing types. We start the specification by noting that C# is “type safe” and has “a unified type system” (*). We say that programs “declare” types, and that declared types can be organized by namespace. Clearly types are incredibly important to the design of C#, and incredibly important to C# programmers, so it is more than a little surprising that nowhere in the eight hundred pages of the specification do we ever actually define the word “type”.

We sort of assume that the developer reading the specification already has a working understanding of what a “type” is; the spec does not aim to be either a beginner programming tutorial or a mathematically precise formal language description. But if you ask ten working line-of-business developers for a formal definition of “type”, you might get ten different answers. So let’s consider today the question: what exactly is this thing you call a “type”?

A common answer to that question is that a type consists of:

* A name
* A set (possibly infinite) of values

And possibly also:

* A finite list of rules for associating values not in the type with values in the type (that is, coercions; 123.45 is not a member of the integer type, but it can be coerced to 123.)

Though that is not a terrible definition as a first attempt, it runs into some pretty nasty problems when you look at it more deeply.

The first problem is that of course not every type needs to have a name; C# 3 has anonymous types which have no names by definition. Is “string[][]” the name of a type? What about “List<string[][]>” — does that name a type? Is the name of the string type “string” or “String”, or “System.String”, or “global::System.String”, or all four? Does a type’s name change depending on where in the source code you are looking?

This gets to be a bit of a mess. I prefer to think of types as logically not having names at all. Program fragments “12” and “10 + 2” and “3 * 4” and “0x0C” are not names for the number 12, they are expressions which happen to all evaluate to the number 12. That number is just a number; how you choose to notate it is a fact about your notational system, not a fact about the number itself. Similarly for types; the program fragment “List<string[][]>” might, in its context, evaluate to refer to a particular type, but that type need not have that fragment as its name. It has no name.
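You can see this in action via reflection: several different notations all evaluate to the very same runtime type object, and the runtime’s internal name for a constructed type looks nothing like any of the source-code spellings. A small sketch:

```csharp
using System;
using System.Collections.Generic;

class TypeNameDemo // illustrative demo class
{
    static void Main()
    {
        // Three different spellings, one type:
        Console.WriteLine(typeof(string) == typeof(String));        // True
        Console.WriteLine(typeof(String) == typeof(System.String)); // True

        // The fragment "List<string[][]>" refers to a type, but the CLR's
        // reflection name for it is not that fragment:
        Console.WriteLine(typeof(List<string[][]>).Name); // List`1
    }
}
```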

The concept of a “set of values” is also problematic, particularly if those sets are potentially infinite “mathematical” sets.

To start with, suppose you have a value: how do you determine what its type is? There are infinitely many sets that can contain that value, and therefore infinitely many types that the value could be a member of! And indeed, if the string “hello” is a member of the string type, clearly it is also a member of the object type. How are we to determine what “the” type of a value is?

Things get even more brain-destroying when you think, oh, I know, I’ll just say that every type’s set of values is defined by a “predicate” that tells me whether a given value is in the type or not. That seems very plausible, but then you have to think about questions like “are types themselves values?” If so, then “the type whose values are all the types that are not members of themselves” is a valid predicate, and hey, we’ve got Russell’s Paradox all over again. (**)

Moreover, the idea of a type being defined by a predicate that identifies whether a given value is of that type or not is quite dissimilar to how we typically conceive of types in a C# program. If it were, then we could have “predicate types” like “x is an even integer” or “x is not a prime number” or “x is a string that is a legal VBScript program” or whatever. We don’t have types like that in the C# type system.

So if a type is not a name, and a type is not a set of values, and a type is not a predicate that determines membership, what is a type? We seem to be getting no closer to a definition.

Next time: I’ll try to come up with a more workable definition of what “type” means to both compiler writers and developers.

————

(*) An unfortunate half-truth, as pointer types are not unified into the type system in the version of the C# language which includes the “unsafe” subset of functionality.

(**) When Russell discovered that his paradox undermined the entire arithmetic theory of Frege, he and Whitehead set about inventing what is now known as the “ramified theory of types”, an almost impossible-to-understand theory that is immune to these sorts of paradoxes. Though the mathematical underpinnings of type theory/set theory/category theory/etc are fascinating, I do not understand them nearly well enough to go into detail here. Remember, what we’re looking for is a definition of “type” that is amenable to doing practical work in a modern programming language.

The curious property revealed

Today is the fifteenth anniversary of my first day of full time work here at Microsoft. Hard to believe it has been a decade and a half of writing developer tools. I am tremendously fortunate to be able to work with such a great team on such a great toolset for such great customers. I’m looking forward to the next decade and a half!

Now back to our regularly scheduled puzzling.

There are two curious properties of the string I mentioned last time. The first somewhat curious property is that this string has the same hash code as the empty string. A reader posted a comment to that effect within half an hour of the blog post going up.

Of course, there are only four billion possible hash codes and way more than four billion possible strings, so there have got to be collisions in there somewhere. There are infinitely many strings that have this property; this just happens to be a particularly short one.

What is much more interesting is that any number of concatenations of this string yields a string that also has the same hash code as the empty string! A number of readers discovered this property.

I hinted that the curious property was one shared by a common string. And indeed, the property of having the same hash code for arbitrarily many concatenations is also shared with the empty string. Ha ha!

I noted in my first hint that this property was platform-dependent; you need to be using a non-debug 32 bit version of CLR v4. As we document in the MSDN page, the string hashing algorithm is subject to change at any time, and in fact has changed in the past. The debug version of the CLR changes its string hashing algorithm every day, so as to be sure that none of our internal infrastructure depends on the hash code algorithm remaining stable.

What gives this particular string this bizarre property?

The internals of the hashing algorithm are pretty straightforward; we start with two integers in a particular state, and then mutate those integers in some way based on successive bytes of the string data. For efficiency, the algorithm considers two characters (four bytes) of string data at a time. The transformation consists of some bit shifts and adds and whatnot, as you’d expect.

You can think of the hashing algorithm as a finite state machine with 64 bits of state. Each four bytes of input from the string changes the 64 bits of state. The final hashed result is derived from those 64 bits of state at the point where the input stream ends.

This particular sequence of four bytes – A0 A2 A0 A2 – has the property that it transforms the start state bits into themselves. As long as that input keeps coming in, the internals of the hash calculation never get out of the start state, and therefore must always output the same hash code as that of the empty string.

If you have a function f, and a value x such that f(x) = x, the value x is called a “fixed point” of f. What I’ve found here is technically not exactly a fixed point of the state transition function, but it has the same sort of flavour as a fixed point. Fixed points in various forms come up a lot in computer science and mathematics; they have many interesting and useful properties which I might go into in later fabulous adventures. Today though I want to think a bit about the security implications.
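The flavour of the thing can be captured with a deliberately toy state machine; this is emphatically not the real CLR string hash, just a sketch of the idea. If some input block maps the start state to itself, then any number of repetitions of that block produces the same final hash as the empty input:

```csharp
using System;
using System.Linq;

class ToyHashDemo
{
    // A deliberately toy transition function; NOT the real CLR algorithm.
    // Rotate the state left by five bits, then xor in the input block.
    static uint Step(uint state, uint input) =>
        ((state << 5) | (state >> 27)) ^ input;

    static uint Hash(uint[] blocks)
    {
        uint state = 5381; // arbitrary start state
        foreach (uint b in blocks)
            state = Step(state, b);
        return state;
    }

    static void Main()
    {
        const uint start = 5381;
        // Construct an input block that maps the start state to itself:
        // Step(start, fixedInput) == start by construction.
        uint fixedInput = ((start << 5) | (start >> 27)) ^ start;

        uint emptyHash = Hash(new uint[0]);
        // Any number of repetitions of the magic block hashes like empty input:
        for (int n = 1; n <= 5; n++)
        {
            uint h = Hash(Enumerable.Repeat(fixedInput, n).ToArray());
            Console.WriteLine(h == emptyHash); // True
        }
    }
}
```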

Essentially what we have here is a way of constructing arbitrarily many strings that all have the same hash code. (*) If a hostile client can feed many such strings into a hash table on a benign server, the attacker can essentially mount a denial-of-service attack against the server that uses the hash table. The hash table will be unable to hash the strings into different buckets, and therefore it will get slower, and slower, and slower.

It’s not much of an attack, but it is an attack. (**)

As always, when thinking about how to mitigate attacks, think about “defense in depth”. If you think you might be vulnerable to this sort of attack, make the attacker do several impossible things, not just one, to pull off the attack. For example:

* Monitor inputs to ensure that one user isn’t sending an unusually huge amount of data for processing; throttle them back if they are.

* Filter inputs for unusual characters; attacks on hash tables are likely to involve out-of-the-ordinary data that has been specially crafted. Most people aren’t going to be sending strings to a database full of Yi Script characters.

* Monitor system performance, looking for slowdowns that might be indicative of DoS attacks

* Build a custom hash table that uses its own string hashing algorithm. When you make a new hash table, randomize the starting state and the transformation of the hash.

And so on. Of course, this is just a sketch of a few ideas; if you are genuinely worried about such an attack, it is quite tricky to actually design and implement an attack-resistant hash table. Consult an expert. (I am not an expert on this!)
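One way to sketch that last idea: give each table instance its own random starting state, so that an attacker who knows the algorithm still cannot precompute inputs that collide in your particular table. This is an illustrative FNV-1a-style hash with a per-instance seed, not the CLR’s algorithm, and a production table would need a vetted keyed hash:

```csharp
using System;

sealed class SeededStringHasher // hypothetical illustrative class
{
    // Per-instance random seed; an attacker cannot predict it.
    private readonly uint seed;

    public SeededStringHasher(Random random)
    {
        this.seed = (uint)random.Next();
    }

    // FNV-1a-style hash, perturbed by the per-table seed.
    public uint Hash(string s)
    {
        uint h = 2166136261u ^ this.seed;
        foreach (char c in s)
        {
            h ^= c;
            h *= 16777619u;
        }
        return h;
    }
}

class Demo
{
    static void Main()
    {
        var hasher = new SeededStringHasher(new Random());
        // Same string, same table instance: hashes agree, so lookups still work.
        Console.WriteLine(hasher.Hash("hello") == hasher.Hash("hello")); // True
    }
}
```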

———–

(*) Of course there are plenty of ways to come up with hash collisions; like I said before, there are infinitely many hash collisions since there are only four billion possible hash codes. This is just a particularly cheap and easy way to come up with hash collisions.

(**) And of course an attack by hostile clients on a slow hash table is just a special case of “attacking” yourself by accidentally hitting a slow code path due to a buggy design, as I learned the hard way.

What curious property does this string have?

There are all kinds of interesting things in the Unicode standard. For example, the block of characters from U+A000 to U+A48F is for representing syllables in the “Yi script”. Apparently it is a Chinese language writing system developed during the Tang Dynasty.

A string drawn from this block has an unusual property; the string consists of just two characters, both the same: a repetition of character U+A0A2:

string s = "ꂢꂢ";

Or, if your browser can’t hack the Yi script, that’s the equivalent of the C# program fragment:

string s = "\uA0A2\uA0A2";

What curious property does this string have?

I’ll leave some hints below — scroll down — and post the answer next time.

Hint #1: The curious property is platform-dependent; you’ll want to be using a 32 bit version of CLR v4.

Hint #2: The curious property is also a property of a much more commonly-used string.

Hint #3: This string has the same hash code as the empty string, but that’s not the only curious property it has in common with the empty string.

Read-only and threadsafe are different

Here’s a common problem that we face in the compiler realm all the time: you want to make an efficient immutable lookup table for mapping names to “symbols”. This is in a sense the primary problem that the compiler has to solve; someone says x = y + z; and we have to figure out what x, y and z mean before we can do any more analysis. An obvious way to do that is to figure out all the name-to-symbol mappings for a particular declaration space once, ahead of time, stuff the results into a lookup table, and then use that lookup table during the analysis.

The lookup table can be immutable because once it is built, it’s built; no more symbols are going to be added to it. It’s going to be used solely as a lookup mechanism. A common and cheap way of doing that is to use what I whimsically call “popsicle” immutability: the data structure is a fully mutable structure until you freeze it, at which point it becomes an error to attempt to change it. This technique differs markedly from the sort of “persistently re-usable data structure” immutability we’ve talked about in the past, where you want to reuse existing bits of the data structure as you build new, different data structures out of it.

For example, we might write up a really simple little hash table that supports “freezing”. Something like this:

abstract class Symbol
{
  public string Name { get; protected set; }
}
sealed class SymbolTable
{
  private bool frozen = false;
  private class BucketListNode
  {
    public Symbol Symbol { get; set; }
    public BucketListNode Next { get; set; }
    public BucketListNode Prev { get; set; }
  }
  private BucketListNode[] buckets = new BucketListNode[17];
  private int GetBucket(string s)
  {
    // Mask off the sign bit rather than calling Math.Abs, which
    // throws if GetHashCode returns int.MinValue.
    return (s.GetHashCode() & 0x7FFFFFFF) % buckets.Length;
  }
  public void Add(Symbol symbol)
  {
    // Omitted: error checking for null symbol, null name,
    // duplicated entry, and so on.
    if (this.frozen)
      throw new InvalidOperationException();
    int bucket = GetBucket(symbol.Name);
    var node = new BucketListNode();
    node.Symbol = symbol;
    node.Prev = null;
    node.Next = buckets[bucket];
    if (node.Next != null)
      node.Next.Prev = node;
    buckets[bucket] = node;
  }
  public void Freeze()
  {
    this.frozen = true;
  }
  public Symbol Lookup(string name)
  {
    // Omitted: error checking
    int bucket = GetBucket(name);
    for (var node = buckets[bucket]; node != null; node = node.Next)
    {
      if (node.Symbol.Name == name)
        return node.Symbol;
    }
    return null;
  }
}

Very simple. We have an array of seventeen doubly-linked lists. We balance the table by choosing which of the seventeen linked lists to go with by hashing the name of the symbol. That way our lookup cost is, on average, about one-seventeenth the cost of doing a straight lookup in a single complete list.

Now of course there are other ways to do this efficiently. We could be sorting the list of symbols and doing a binary search. We could be re-hashing into a larger array of bucket lists if it turns out that there are thousands of symbols in this table. For the sake of argument, let’s suppose that we’ve decided to go with a fixed-number-of-buckets hashing approach, as we’ve done here.

One of the much-touted benefits of immutable data structures is that they are “threadsafe”. Since they cannot be written to, you’ll never run into a situation where a write operation is interrupted halfway by a thread switch, causing another thread to read an inconsistent data structure. However, it is a fallacy to believe that just because a data structure does not admit any way for you change its contents, its implementation must be threadsafe!

Suppose for example we do a real-world performance analysis of our little table here and discover that in practice, this sort of pattern comes up a lot in our customers’ code:

    x.Foo();
    x.Bar();
    x.Blah();
    y.Abc();
    y.Def();
    y.Ghi();

That is, the same identifier is looked up multiple times in a row in the context of a particular symbol table. We might decide to add a little optimization to our table’s lookup method:

for (var node = buckets[bucket]; node != null; node = node.Next)
{
  if (node.Symbol.Name == name)
  {
    MoveToFront(bucket, node);
    return node.Symbol;
  }
}
...
private void MoveToFront(int bucket, BucketListNode node)
{
  if (buckets[bucket] == node)
    return;
  node.Prev.Next = node.Next;
  if (node.Next != null)
    node.Next.Prev = node.Prev;
  node.Prev = null;
  node.Next = buckets[bucket];
  node.Next.Prev = node;
  buckets[bucket] = node;
}

We have essentially tweaked our hash table to have an array of Most Recently Used Lists, rather than merely an array of Doubly Linked Lists. That might be a performance win in the case where the lists get long and the same identifiers tend to be looked up in clusters. (It is enough of a real-world win that in the VBScript engine, the lookup tables use a variant of this optimization.)

But you see what we’ve done here? Reading the table now sometimes mutates its interior structure, and that mutation is not threadsafe! Even though there is no way for the user to logically change a frozen table, we do not ensure that reading the data is threadsafe.

In practice, most data structures do not do mutations on reads, but you cannot rely upon that unless it is documented. For example, the documentation for the Dictionary class notes that a Dictionary is threadsafe for multiple readers so long as there are no writers (though of course the actual enumerators are not threadsafe objects, as they are constantly mutating; rather, the collection being enumerated is threadsafe). Unless the author of a particular object makes a guarantee to you like that, you should assume that none of the operations are threadsafe, even if they appear to be read-only.
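If you must share such a table across threads and cannot get a thread-safety guarantee from its author, the conservative fix is to serialize even the reads. Here is a miniature sketch of the idea; MruCache is a hypothetical stand-in for the MRU-optimized table above, not code from the compiler:

```csharp
using System;
using System.Collections.Generic;

// A miniature stand-in for the MRU-optimized table: Lookup is logically
// a read, but it mutates interior most-recently-used state.
sealed class MruCache
{
    private readonly LinkedList<string> mru = new LinkedList<string>();
    private readonly object padlock = new object();

    public void Add(string name)
    {
        lock (this.padlock) { mru.AddFirst(name); }
    }

    // Even this "read" takes the lock, because a hit moves the node
    // to the front of the list.
    public bool Lookup(string name)
    {
        lock (this.padlock)
        {
            var node = mru.Find(name);
            if (node == null)
                return false;
            mru.Remove(node);
            mru.AddFirst(node);
            return true;
        }
    }
}

class Demo
{
    static void Main()
    {
        var cache = new MruCache();
        cache.Add("x");
        cache.Add("y");
        Console.WriteLine(cache.Lookup("x")); // True
        Console.WriteLine(cache.Lookup("z")); // False
    }
}
```

Locking every read costs contention, of course; the cheaper alternative is to not do mutating-read optimizations in a table you intend to share.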


Original:

https://web.archive.org/web/20150523082529/http://blogs.msdn.com/b/ericlippert/archive/2011/05/23/read-only-and-threadsafe-are-different.aspx

Implementing the virtual method pattern in C#, part three

Last time we saw how you could emulate virtual methods in a language that only had static methods by creating fields of delegate type, and then choosing what delegates go into the fields. However, this is not very space-efficient. Suppose there were a hundred virtual methods on Animal instead of two. That means that every class derived from Animal has a hundred fields, and in most of them, those fields are exactly the same, all the time. You make three hundred giraffes and each one of them will have exactly the same delegates in those hundred fields. This seems redundant and wasteful.

What the CLR actually does is it bundles those up into a virtual function dispatch table, or “vtable” for short. A vtable is a collection of delegates; the purpose of the vtable is to answer the question “if a virtual method was invoked on an object of this runtime type, which method should be called?”
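Continuing the emulation from last time, a vtable can be modelled as a single delegate array shared by every instance of a runtime type; each object carries only a reference to its type’s table. The class and member names here are illustrative, and real CLR vtables hold method pointers rather than delegate objects:

```csharp
using System;

// Illustrative sketch of a vtable: one shared array of delegates per "type".
class Animal
{
    // Slot 0: Describe. A real vtable has one slot per virtual method.
    public static readonly Func<Animal, string>[] AnimalVtable =
        { self => "some animal" };

    // Every instance carries a reference to its type's table, not the delegates.
    public Func<Animal, string>[] Vtable;
    public Animal() { Vtable = AnimalVtable; }

    // A "virtual call" indexes into the table the object carries.
    public string Describe() => Vtable[0](this);
}

class Giraffe : Animal
{
    // One table shared by all giraffes, with slot 0 overridden:
    public static readonly Func<Animal, string>[] GiraffeVtable =
        { self => "a giraffe" };

    public Giraffe() { Vtable = GiraffeVtable; }
}

class Demo
{
    static void Main()
    {
        Animal a = new Animal();
        Animal g = new Giraffe();
        Console.WriteLine(a.Describe()); // some animal
        Console.WriteLine(g.Describe()); // a giraffe
        // Three hundred giraffes share one table, not three hundred fields each:
        Console.WriteLine(ReferenceEquals(g.Vtable, new Giraffe().Vtable)); // True
    }
}
```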
