What is up with transparent identifiers? Part two

This will be my last post before I head off for my annual vacation in Canada; see you again in September for more Fabulous Adventures in Coding!

Last time on FAIC I suggested a rule for translating nested “from” query expressions into a much simpler form than the C# specification requires. Why does the C# specification not use my simplified form?

In fact what I showed yesterday is pretty close to what the LINQ translation rules for SelectMany queries looked like shortly before shipping C# 3.0. The problem with it becomes apparent when you consider the following:

from q in items
from r in a(q)
from s in b(q, r)
from t in c(q, r, s)
from v in d(q, r, s, t)
select e(q, r, s, t, v)

How is this translated in the scheme I laid out last time? We apply the SelectMany rule four times and the Select rule once to obtain:

items.SelectMany(
  q => a(q).SelectMany(
    r => b(q, r).SelectMany(
      s => c(q, r, s).SelectMany(
        t => d(q, r, s, t).Select(
          v => e(q, r, s, t, v))))))

The range variables are all in scope exactly where they need to be and this query has the desired semantics so we’re good on that front. But now consider the problem that overload resolution has to solve. Go back and read my 2007 article on overload resolution problems for nested lambdas. I’ll wait.

Welcome back. Now you know that since there are in fact four SelectMany overloads and two Select overloads we have to consider all 4 x 4 x 4 x 4 x 2 possibilities, which is only 512; that’s totally doable. But now suppose that we wanted a query like this nested ten or fifteen deep: the number of possibilities that have to be evaluated grows exponentially. Also, even if we limited the number of overloads in LINQ-to-objects, there might be many overloads in other LINQ pattern implementations.

The C# testing and development team quickly discovered that even simple, realistic queries that had a lot of nested from clauses could cause the IDE to bog down or the compiler to run out of memory.

This was the problem we faced before C# 3.0 shipped. Let’s take a step back here, and remember the problem we’re trying to solve. We need to have all those range variables in scope in the right places, but now we want to do so without nesting any lambdas. Therefore we need a new scoping mechanism. Transparent identifiers are that mechanism! In a world where we have transparent identifiers, we do the translation as follows:

from q in items
from r in a(q)
from s in b(q, r)
from t in c(q, r, s)
from v in d(q, r, s, t)
select e(q, r, s, t, v)

first becomes

from * in items.
  SelectMany(
    q => a(q), 
    (q, r) => new {q, r})
from s in b(q, r)
from t in c(q, r, s)
from v in d(q, r, s, t)
select e(q, r, s, t, v)

which becomes

from ** in items.
  SelectMany(
    q => a(q), 
    (q, r) => new {q, r}).
  SelectMany(
    * => b(q, r), 
    (*, s) => {new {*, s}))
from t in c(q, r, s)
from v in d(q, r, s, t)
select e(q, r, s, t, v)

I’ll skip a few steps; you see how this goes. We end up with:

items.
  SelectMany(
    q => a(q), 
    (q, r) => new {q, r}).
  SelectMany(
    * => b(q, r), 
    (*, s)=>{new {*, s})).
  SelectMany(
    ** => c(q, r, s), 
    (**, t) => new {**, t})).
  SelectMany(
    *** => d(q, r, s, t), 
    (***, v) => e(q, r, s, t, v))

And now we desugar all the transparent identifiers:

items.
  SelectMany(
    q => a(q), 
    (q, r) => new {q, r}).
  SelectMany(
    d1 => b(d1.q, d1.r),
   (d1, s) => {new {d1, s})).
  SelectMany(
    d2 => c(d2.d1.q, d2.d1.r, d2.s),
   (d2, t) => new {d2, t})).
  SelectMany(
    d3 => d(d3.d2.d1.q, d3.d2.d1.r, d3.d2.s, d3.t), 
    (d3, v) => e(d3.d2.d1.q, d3.d2.d1.r, d3.d2.s, d3.t, v))

And hey presto, no nested lambdas, so this can be resolved very quickly with overload resolution.

This same technique was then applied to let and join clauses which introduce new range variables, to ensure that they did not also cause big blowups in the amount of lambda nesting during translations.

12 thoughts on “What is up with transparent identifiers? Part two”

Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1665

Pingback: Dew Drop – August 4, 2014 (#1828) | Morning Dew

This kind of translation might make the compiler faster but it also makes writing a LINQ provider harder as well. (On a side not, I also like transparent identifiers more, because the code is faster by about 2x constant factor).

The fact that the compiler team has managet to resolve the N1 x N2 x N2 … x Nm overload resolution problem by simply rewriting the query, this means that the overload resolution itself can use the same aproach instead of brute forcing all posible combinations.

Instead of trying to resolve all posible combinations for the inner most lambda, resolve one parameter a time from the outermost lambda to the innemost propagating the posible types down to the inner most lambda. This way the posible types for each parameter is resolved one at a time (just like using trasparent idenfiers, which makes overload resolution sequencial).

In your example:

items.SelectMany(
q => a(q).SelectMany(
r => b(q, r).SelectMany(
s => c(q, r, s).SelectMany(
t => d(q, r, s, t).Select(
v => e(q, r, s, t, v))))))

T is candidate type for the lambda q => … if it is also a candidate for a(q), b(q,r), c(q,r,s), d(q,r,s,t) and e(q,r,s,t,v). In this example T will most likely have one type (the item type of items source) which leads to only one candidate for the type of q (if there are more candidates it’s the same)). Based on this information overload resolution can be performed on a(q) which should produce at most one type candidate for each type candidate of q. In this case T1. then the type candidate for r becomes T1 if T1 is also a candidate for b(q, r), c(q,r,s), d(q,r,s,t), e(q,r,s,t,v). etc. This means wrong candidates for a type are found after at most N-level searches in dept. So if a(q) has type candidates [T1, T2] but T2 does not satisfy e(q, r, s, t, v) then imediatly after the search is done (n-2)^(n-1) wrong combinations are eliminated.

Hope I’ve managed to make this somewhat clear.

Reply ↓

You know, this blog has the strangest and most random ads of any blog I read.

It would be nice if I had some great insight on the actual subject matter of the post – which I found interesting and informative, thank you – but this is what I’ve got.

Reply ↓

Phil on August 6, 2014 at 10:51 am said:

Ads? I use AdBlock+ and NoScript, and I don’t remember what ads look like …

Reply ↓
- Joshua on August 6, 2014 at 6:28 pm said:
  
  I’m at work, and there are pretty severe limits on how I can administer my own machine, alas. I’m just glad I’m not using telnet to read this blog.
  
  Reply ↓
ericlippert on August 18, 2014 at 11:09 am said:

I agree, the ads served by WordAds are bizarre compared to those served by AdWords. WordAds, which pays per impression, actually generates enough revenue to cover my hosting costs. AdWords pays per click, generated less than half the revenue WordAds generates, and was constantly serving up tasteless, offensive ads with bikini-clad models selling software despite my best efforts to suppress them.

Reply ↓