Restating the problem

A problem statement:

I am trying to loop though a sequence of strings. How can I determine when I am on the last item in a sequence? I don’t see how to do it with foreach.

Indeed, foreach does not make it easy to know when you are almost done. Now, if I were foolish enough to actually answer the question as stated, I’d probably say something like this:

You can do so by eschewing foreach and rolling your own loop code that talks directly to the enumerator:

IEnumerable<string> items = GetStrings();
IEnumerator<string> enumtor = items.GetEnumerator();
if (enumtor.MoveNext())
{
  string current = enumtor.Current;
  while(true)
  {
    if (enumtor.MoveNext())
    {
      // current is not the last item in the sequence.  
      DoSomething(current); 
      current = enumtor.Current;
    }
    else
    {
      // current is the last item in the sequence. 
      DoSomethingElse(current); 
      break;
    }
  }
}
else
{
    // handle case where sequence was empty
}

Yuck. This is horrid. A fairly deep nesting level, some duplicated code, the mechanisms of iteration overwhelm any semantic meaning in the code, and it all makes my eyes hurt.

When faced with a question like that, rather than writing this horrid code I’ll usually push back and ask “why do you want to know?”

I am trying to walk though a sequence of strings and build a string like “a,b,c,d”.  After each item I want to place a comma except not after the last item. So I need to know when I’m on the last item.

Well knowing that certainly makes the problem a whole lot easier to solve, doesn’t it? A whole bunch of techniques come to mind when given the real problem to solve:

First technique: find an off-the-shelf part that does what you want.

In this case, call the ToArray extension method on the sequence and then pass the whole thing to String.Join and you’re done.

Second technique: Do more work than you need, and then undo some of it.

Using a string builder, to build up the result, put a comma after every item. Then “back up” when you’re done and remove the last comma (if there were any items in the sequence, of course).

Third technique: re-state the problem and see if that makes it any easier.

Consider this equivalent statement of the problem:

I am trying to walk though a sequence of strings and build a string like “a,b,c,d”. Before each item I want to place a comma except not before the first item. So I need to know when I’m on the first item.

Suddenly the problem is much simpler. It’s easy to tell if you’re on the first item in a sequence by just setting an “at the first item” flag outside the foreach loop and clearing it after going through the loop.

When I have a coding problem to solve and it looks like I’m about to write a bunch of really gross code to do so, it’s helpful to take a step back and ask myself “Could I state the problem in an equivalent but different way?” I’ll try to state a problem that I’ve been thinking about iteratively in a recursive form, or vice versa. Or I’ll try to find formalizations that express the problem in mathematical terms. (For example, the method type inference algorithm can be characterized as a graph theory problem where type parameters are nodes and dependency relationships are edges.) Often by doing so I find that the new statement of the problem corresponds more directly to clear code.


Commentary from 2020

I should have anticipated when writing this that commenters would not focus on the general point I was making — that reformulating a specific problem can lead to a more elegant solution — but would rather get nerd sniped into solving the specific problem. I mean, it was 2009. I had read the comments to Jeff’s 2007 article on FizzBuzz, and I knew that, as Jeff said “it’s amusing to me that any reference to a programming problem immediately prompts developers to feverishly begin posting solutions.”

Unsurprisingly, the comments to this article were mostly alternative solutions for the comma-separated string problem, and then critiques of those solutions. I embraced this in my follow-up post, Comma Quibbling, where I explicitly solicited solutions to a harder version of the problem.

A selection of the solutions left in comments and some critiques is below.


Note that at the time this was written, string.Join did not take a sequence, it took an array. My preferred solution suggested by a reader would be to write your own extension method that has the pleasant interface you want:

public static string Join(
  this IEnumerable<string> source, string separator)
{
  return String.Join(separator, source.ToArray());
}

But I usually go one step farther and write something like:

public static string Comma(this IEnumerable<string> source)
{
  return source.Join(",");
}

Other solutions that were suggested and critiqued:

list.Aggregate((x, y) => x + "," + y) is a Schelmiel solution.

list.Skip(1).Aggregate(
  new StringBuilder().Append(list.First()),
  (buf, e) => buf.Append(",").Append(e),
  buf => buf.ToString());

is a non-Schlemiel solution that builds an iterator twice, which is considered inelegant.

This terrible solution calls a potentially linear count method twice on every iteration and is a Schlemiel solution:

string newString = string.Empty;
for (int i = 0; i < items.Count(); i++)
   newString += items[i] + (i == items.Count() - 1 ? string.Empty : ",");

If we know that the items are not empty strings, which seems reasonable, then we can use the length of the accumulated buffer as a “not first” flag:

StringBuilder sb = new StringBuilder();
foreach (var item in items)
{
 if (sb.Length > 0)
   sb.Append(",");
 sb.Append(item);
}

We can also avoid a flag entirely by doing an extra empty-string append:

StringBuilder sb = new StringBuilder();
String comma = "";
foreach(item in list) {
 sb.Add(comma);
 sb.Add(item);
 comma = ",";
}

A number of commenters pointed out that this problem becomes much easier in languages where lists are typically in head::tail form rather than arrays; one commenter even went so far as to implement an immutable stack in their comment! (I know how to write an immutable stack but thanks for the effort.) I’m not sure why this fact is relevant.

One commenter stated that the best solution is to treat the comma as coming before every item but the first, not after every item but the last, and gave a solution for that. The same commenter then pointed out in a follow-up comment that he had written the comment before reading the entire post. Since the title of the piece is “restating the problem” I suspect that this commenter also did not read the title before banging out a solution.

My favourite comment though was a link to Jon Skeet’s 2007 article where he solves the more general problem in an elegant way. If you want to enumerate a sequence and do something different if you are on the first or last item, just make an interface that provides that information. We are computer programmers; making abstractions that solve the stated problem is what we do.

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s