Inferring from “is”, part two

Posted on October 22, 2015 by ericlippert

In part one I gave a bunch of reasons to reject the proposed feature where the compiler infers additional type information about a local variable when inside the consequence of a conditional statement:

if (animal is Dog)
{
    animal.Bark(); 
    // instead of ((Dog)animal).Bark();
}

But the problem still remains that this is a fairly common pattern, and that it seems weird that the compiler cannot make the necessary inference.

A number of readers anticipated my denouement and made some of the same proposals I was going to make in this part. Let’s go through a few of them; see the comments to the previous article for a few more.

First, allow the if statement to declare a new variable (as the for, foreach and using statements already do). Some proposed syntaxes:

if (var dog when animal is Dog)
  dog.Bark();

if (Dog dog from animal)
  dog.Bark();

if (animal is Dog dog)
  dog.Bark();

If I recall correctly, the last one there has been proposed for C# 7.
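
For what it is worth, that third form is the one that later shipped, as the type pattern in C# 7.0. Here is a minimal sketch of how it reads; the Animal, Dog and Cat classes and the Describe helper are hypothetical names made up for illustration, not part of any library:

```csharp
// Hypothetical types for illustration only.
class Animal { }
class Dog : Animal { public string Bark() => "Woof"; }
class Cat : Animal { }

static class IsPatternSketch
{
    // Requires C# 7.0 or later.
    public static string Describe(Animal animal)
    {
        // The is-expression tests the runtime type and, on success,
        // assigns the converted reference to the new local 'dog'.
        if (animal is Dog dog)
            return dog.Bark();

        return "(not a dog)";
    }
}
```

Calling IsPatternSketch.Describe with a Dog takes the first return; any other Animal falls through to the second.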

One of the hardest aspects of language design is deciding how general a feature should be. Should the syntaxes proposed above be restricted to the if statement? Can we add a larger, more orthogonal feature to the language and make the whole language more powerful? Suppose var dog when animal is Dog is simply a Boolean expression with the semantics of “declare a local variable of appropriate scope, initialize the local variable appropriately, the value of the expression is the value produced by the is subexpression.” Then you could use this construct in other locations. But that then raises other problems, as a commenter noted.

if (foo || var dog when animal is Dog)
  dog.Bark();

If that’s an expression, then it can be the right side of a logical operator, and therefore might not be evaluated! Should this be a “use of uninitialized variable” error? Seems likely. But these are solvable problems.
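
To make the short-circuit problem concrete, here is a sketch written with the "animal is Dog dog" syntax from above, which compilers for C# 7.0 and later accept. Animal, Dog and Speak are hypothetical names. In those compilers the existing definite assignment rules settle the question: the && form compiles, and the || form is rejected.

```csharp
// Hypothetical types for illustration only.
class Animal { }
class Dog : Animal { public string Bark() => "Woof"; }

static class ShortCircuitSketch
{
    public static string Speak(bool foo, Animal animal)
    {
        // With &&, the consequence runs only if the is-expression was
        // evaluated and succeeded, so 'dog' is definitely assigned here.
        if (!foo && animal is Dog dog)
            return dog.Bark();

        // With ||, the right side may be skipped, so this does not compile:
        // if (foo || animal is Dog d)
        //     return d.Bark(); // error CS0165: use of unassigned local variable 'd'

        return "(silence)";
    }
}
```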

I want to get back to the idea of generality though. If the feature is to allow a variable to be introduced in an expression and produce a value, then I say let’s just go all the way. (Something like this was proposed for C# 6, but was unfortunately cut.)

if ((var dog = animal as Dog) != null)
  dog.Bark();

Make a local variable declaration with an initializer an expression whose value is the value that was assigned to the variable. (Note, not the value of the initializer; that might be of a type different than the variable!)

There are a few tricky cases you have to consider here regarding what exactly is the scope of the variable depending on where it was declared lexically, and I’m not going to go into those today. Basically the idea here is to solve the problem by declaring a new variable that is clearly of a particular type. However there are other ways to solve the problems we raised last time.

Many of those problems arose from the fact that a variable can change; variables vary. But C# does have a few mechanisms whereby variables can be introduced that change only once, and are treated as values. The obvious example is readonly fields, which are variables only in a constructor and values otherwise. The “variables” introduced by foreach and using and let in a query also cannot be changed, passed by ref and so on. This has always bugged me, because of the lack of generality here. One of the few features Java has that C# lacks is “final” local variables. C# of course has const locals and fields, but they can only be initialized to compile time constants.
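
A small sketch of two of those mechanisms, assuming a hypothetical Kennel class invented for this example. The commented-out lines are the ones the compiler rejects:

```csharp
using System.Collections.Generic;

// Hypothetical class for illustration only.
class Kennel
{
    private readonly int limit;

    public Kennel(int limit)
    {
        // Inside the constructor a readonly field is an ordinary variable:
        this.limit = limit;
        this.limit = limit + 1; // a second write is still legal here
    }

    public int Limit => limit;

    public int CountLongNames(IEnumerable<string> names)
    {
        // Outside the constructor it is a value:
        // limit = 0; // error CS0191: a readonly field cannot be assigned to

        int count = 0;
        foreach (string name in names)
        {
            // The iteration variable is read-only too:
            // name = name.Trim(); // error CS1656: cannot assign to 'name'
            if (name.Length > limit)
                count++;
        }
        return count;
    }
}
```

Neither mechanism is available for an ordinary local declaration, which is the lack of generality in question.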

The argument against adding readonly locals to C# is that the feature is unnecessary. Locals have local scope, obviously. The region of code in which the name is valid is of a size of your choosing, and ideally that size is small enough that you can easily know whether the local is written more than once. If you choose to write exactly once, that’s your choice; there’s no need to have the compiler there to enforce that decision. I used to like that argument, but I am liking it less and less as time goes on.

Readonly locals allow the developer to express to the compiler “this variable is actually a named value, not a variable; an attempt to use it as a variable is wrong, and changing the code to make it a variable may be a breaking change, the costs of which I am willing to bear in the future should it become necessary; please feel free to introduce as many optimizations as you like assuming that this is a value, not a variable.”

So I really like readonly locals as a proposed future feature; when combined with the “declare a variable in an expression” feature, it gets even more useful. C# could really use something like a query let that works everywhere.

A third proposal is to introduce pattern matching / type switching / etc. A commenter points out that Nemerle uses:

match (animal)
{
  | dog is Dog => dog.Bark()
  | cat is Cat => cat.MaintainDignity()
}

I’m not super thrilled with the punctuation there but I like the general idea. There are proposals for adding sophisticated pattern matching to C# that I think I will deal with at another time.

18 thoughts on “Inferring from “is”, part two”

Jacob on October 22, 2015 at 9:22 am said:

why not just simply:

(animal as Dog)?.Bark()

- fizixman on October 22, 2015 at 10:10 am said:
  
  Unfortunately, this is only good for one-liners. Also, if your intent was to pass it as an argument into another method rather than invoke a method on the dog/animal, it doesn’t help.
  
pete.d on October 22, 2015 at 9:24 am said:

I have always been hoping for readonly local variables, for the reasons stated in the article. I still hope for them.

As for being able to declare variables in expressions, that seems a lot less useful to me. It doesn’t do anything I couldn’t do almost as easily otherwise and there’s a host of other features I’d rather see added to C# than that. It also seems like an obvious step in the opposite direction of simply disallowing assignments in “if” statements (Mr. Lippert has written in the past arguing against the practice of using assignments in “if” statements, for reasons which I agree with even as I do occasionally violate the advice 🙂 ).

But more importantly, it does not seem like the feature to allow the declaration of a new variable in an expression meets the usual “feature cost/benefit” analysis bar.

- John Payson on October 27, 2015 at 9:08 am said:
  
  There are a lot of cases where code ends up either needing lots of “throwaway” variables, or having to make lots of calls to 1-3 line functions. While some people like chopping methods up into lots of little tiny pieces, I would suggest that in many cases it’s better to read the code in the context where it’s used than to see a method name where the code is used and have to either guess what it means or jump to some entirely different spot in the file where the method is defined so as to read the actual definition.
  
  That having been said, what I’d like to see would be a special syntax for declaring “throwaway variables”, with the rule that (1) names of throwaway variables could be reused, (2) redeclaration of a name within an inner scope would invalidate the name in outer scopes, such that any reference to such a variable could not identify anything other than the previous definition in source-code order.
  
  Given something like:
  
  someType foo;
  
  foo=getFoo(wizzle);
  doSomething(foo.baz+boo, foo.boz+goo);
  
  … unrelated code…
  
  foo=getFoo(wuzzle);
  doSomething(foo.baz+moo, foo.boz+zoo);
  
  the code only declares “foo” once, but semantically the code uses two variables: one whose lifetime starts at the first “getFoo” and ends at the first call to “doSomething”, and another whose lifetime starts at the second “getFoo” and ends at the second call to “doSomething”. IMHO, there should be a way for code to use separate variables without having to come up with new names each time [requiring different names increases the likelihood of copy/paste errors, but the two-liner of getting a foo and using it would seem a bit short to justify placing it in its own method].
  
  - treed on October 27, 2015 at 9:40 am said:
    
    You could try introducing a new scope within the function itself. You just need to add in some curly braces to make a new block, and declare the variable within the new block. This will have 2 declarations of the throwaway variable, but it won’t escape the new blocks, and you can re-use the name.
    
    public void blah()
    {
        {
            someType foo=getFoo(wizzle);
            doSomething(foo.baz+boo, foo.boz+goo);
        } // foo goes out of scope here
    
        … unrelated code…
    
        {
            // same name, new scope, no clashes
            someType foo=getFoo(wuzzle);
            doSomething(foo.baz+moo, foo.boz+zoo);
        }
    }
    
Eldc on October 22, 2015 at 10:12 am said:

Readonly locals are great when you are refactoring. I often change complex code by applying a series of very simple code transformations that I know preserve semantics. Sometimes the transformation is only valid if I can prove that no one changed a given variable. To do that, I have to find all the references to it and verify them one by one, visually, with a high risk of human error.
With readonly locals, that whole process is replaced by just checking that the readonly keyword was used at the declaration. If it was, we’re instantly done; if it wasn’t, just add it and check that no compile errors appear. End result: get things done faster and with fewer human errors.

- Sahuagin on October 23, 2015 at 9:04 am said:
  
  just fyi, while it’s not exactly readonly locals, resharper does have the ability to color muted locals differently than non-muted ones. for example in my dark theme in VS2013, I have non-muted ones as a dull grey, and muted ones as bright white. as soon as I give a variable a second value, it goes from grey to white and I instantly see that it has become a muted variable. (not 100% sure “muted” is a word, but it seems to fit better than “mutable” here since both types are technically “mutable”.)
  
  - pete.d on October 23, 2015 at 9:29 am said:
    
    “Mutable” is derived from the regular verb “mutate”. So one can just use “mutated” as an adjective to describe a variable that has been modified. I would recommend that word, rather than “muted”.
    
    I would advise against using the term “muted”, as that’s a completely different English word, meaning something completely different (i.e. “silent; reduced volume or intensity”).
    
  - Eldc on October 24, 2015 at 3:39 pm said:
    
    Interesting… I generally avoid VS extensions because VS is already slow and buggy enough without them, so I haven’t tried Resharper in a while. I should take a fresh look at what it does now.
    
gregsdennis on October 22, 2015 at 10:16 am said:

I wonder what all of this compiler inference does to the language. I mean, we have a lot of inference in C# already, and we deal with it. We learn to read it and understand what it means. But how far can we take this before the code really becomes unreadable?

One argument against it becoming too much is that the compiler has to understand it, so there are specific rules. But I still fear that C# is becoming too much of an abstract, “this really means that is going on” type of language.

I like some explicitness in my code. It goes back to the idea that code should be self-documenting (to a degree).

doctorloser on October 22, 2015 at 2:20 pm said:

To go sideways on this issue, and considering that C# as of 5.0 is about as close to being a feature-complete language as one can imagine, can I ask the following question?
At what point in the evolution of a computer language do we start to talk about “syntactic sugar?”
To me, a language construct that saves me from casting to (Dog) in the very, very few cases that I need to cast to (Dog) is essentially a complete waste of time and space.
On yet another tangent, I suspect that the pattern-matching proposed for C# 7.0 (disclaimer: I have no idea how this is proposed to be implemented) might very well eliminate the few use cases where the is/as/cast tergiversations might help.
This is a fun discussion, however. Kudos to all contributors.

John Payson on October 22, 2015 at 2:51 pm said:

In C# it is possible for a procedure to have an alias for a variable in its caller, via a byref parameter. It’s also possible for a method with a byref parameter to have a different view of that parameter’s type than the caller had of the passed-in variable. For example, one could have a static method CallDispose<T>(ref T it) where T : IDisposable which calls `IDisposable.Dispose` on T even if T implements that method explicitly and doesn’t expose it via its own type.

Further, if one does use static delegates it’s possible to have a function with a generic type T pass T as a ref parameter to a method which applies a stronger generic constraint than the caller. I’m not sure if CIL would allow any means to accomplish that without using delegate dispatch, but I’ve used the technique to implement a generic “HasAnyFlag” which is usable on classes derived from System.Enum and performance is pretty good, without boxing. A similar approach would seem usable here. If one uses that approach, one needn’t worry about keeping the “variables” in sync because the byref would identify the original variable.

defaultex on October 24, 2015 at 3:22 am said:

Looking at the proposal for C# 6 brings to mind two theoretical snippets that seem like they are legal syntax currently.

while ((var dog = animal as Dog) != null) {
    dog.Bark();
    break;
}

or

for (var dog = animal as Dog; dog != null; ) {
    dog.Bark();
    break;
}

shaun on October 26, 2015 at 11:18 am said:

One of the things I really like about Rust is its pattern matching. I would love to see that in C# as well.
