DRY out your policies

Occasionally I’m asked to review code that has a lot of repetition in it. Like, for instance, someone is writing a function memoizer:

static Func<A, R> Memoize<A, R>(this Func<A, R> function)
{
  var cache = new Dictionary<A, R>();
  return argument =>
  {
    R result;
    if (!cache.TryGetValue(argument, out result))
    {
      result = function(argument);
      cache[argument] = result;
    }
    return result;
  };
}

And then they will write remarkably similar code for functions of two arguments, functions of three arguments, functions of four arguments, and so on. “I want my code to be DRY; how can I avoid repeating myself?” is what the developer asking for review usually says.

DRY of course means “Don’t Repeat Yourself”. In this case I simply deny the premise of the question. This sort of code is WET: We Enjoy Typing. There’s no good way in the C# type system to make abstractions across generic types that have a varying number of type arguments. In this case you’re just going to have to grit your teeth and deal with it. And besides, it’s not that much typing.
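That said, you can shave off some of the typing by reducing each higher-arity case to the one-argument case with a tuple key. Here is a sketch of that idea (my illustration, not code from the post; it assumes modern C# value tuples, which have structural equality and so work as dictionary keys):

```csharp
using System;
using System.Collections.Generic;

static class MemoizerSketch
{
    // The one-argument memoizer from above.
    public static Func<A, R> Memoize<A, R>(this Func<A, R> function)
    {
        var cache = new Dictionary<A, R>();
        return argument =>
        {
            R result;
            if (!cache.TryGetValue(argument, out result))
            {
                result = function(argument);
                cache[argument] = result;
            }
            return result;
        };
    }

    // Hypothetical sketch: the two-argument overload delegates to the
    // one-argument memoizer by packing both arguments into a value tuple.
    public static Func<A1, A2, R> Memoize<A1, A2, R>(this Func<A1, A2, R> function)
    {
        var memoized = Memoize<(A1, A2), R>(args => function(args.Item1, args.Item2));
        return (a1, a2) => memoized((a1, a2));
    }
}
```

This still does not DRY out the whole n-argument family — you need one overload per arity — but each overload shrinks to a couple of lines that delegate to the single real implementation.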

But more generally, I would say that the developer asking for review is misunderstanding the true value of the DRY principle. The fundamental idea of Don’t Repeat Yourself is not “never write similar code twice”, but rather, as Wikipedia helpfully points out, “every piece of knowledge must have a single, unambiguous, authoritative representation in the system”. What does that have to do with avoiding cutting-and-pasting to make a half-dozen slightly different function memoizers? Really, not much at all. Those things are not themselves representations of knowledge; they’re the springs and gears of the machine that represents the knowledge.

The real value of DRY is not in the mechanism domain of the program at all. By “mechanism” I mean the underlying computational machinery, like our memoizer above. You can tell it is a mechanism because the code is not part of the business domain of the program; this memoizer might be used in everything from a video game to online banking software. The business domain of those programs — the wizards and lasers, or the accounts and debits — is where you want to ensure that all the knowledge in the system has exactly one clear representation, because that’s the stuff that is likely to change due to user requirements.

In particular, when applying DRY to my own code I think hard about DRYing out the “policy” code: the code that determines the rules of the business domain of the program. If I want to add a new rule about how wizards interact with lasers, I want to do that in as few places as possible, and ideally only one.

A good way to tell whether policy code is insufficiently DRY is if any of the policy code contains comments like “if you change something here, remember to change it elsewhere too”. Back in the pre-Roslyn days of the C# compiler team we had this problem all the time. There would be multiple (and often slightly inconsistent) implementations of, say, the method that computes “is this expression convertible to this type?” — as clear an example of “policy” code as you can imagine. There would be comments in the code saying “if you add a new kind of conversion here, don’t forget to add it in the IntelliSense code as well”. Because of course we enjoy typing in everything twice, writing the test cases twice, regression testing everything twice, … That’s where DRY saves real time, effort and money; spending extra developer effort to avoid a few hundred duplicated keystrokes in a mechanism somewhere simply isn’t worth it most of the time.

A commenter asks when is it worthwhile to use a templating language or some such thing to generate highly repetitive code. This is a judgment call; for me, it comes down to the question of first, just how much mechanism code is being produced, and second, what is the likelihood that it is going to change? We used the Visitor Pattern extensively in Roslyn, and it is famous for having absolutely huge amounts of boilerplate code in the visitor base classes. We knew that we would be adding new syntax nodes all the time during the creation of Roslyn, and that even after it was done there would be new nodes to visit in the future, so I wrote up a trivial little code generator that took a list of node types written in XML and produced the visitor base classes automatically when Roslyn was rebuilt.
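The actual Roslyn generator is much richer than this, but the core idea can be sketched in a few lines (the XML shape, class names, and method shapes here are invented for illustration):

```csharp
using System.Linq;
using System.Text;
using System.Xml.Linq;

static class VisitorGenerator
{
    // Hypothetical sketch of a tiny code generator: read node type names
    // from an XML list and emit a visitor base class with one virtual
    // Visit method per node type. Adding a node to the XML regenerates
    // every visitor method automatically at build time.
    public static string GenerateVisitorBase(string xml)
    {
        var nodes = XDocument.Parse(xml).Root.Elements("Node")
            .Select(e => (string)e.Attribute("Name"));
        var sb = new StringBuilder();
        sb.AppendLine("public abstract class SyntaxVisitor");
        sb.AppendLine("{");
        foreach (var node in nodes)
            sb.AppendLine($"    public virtual void Visit{node}({node}Syntax node) {{ }}");
        sb.AppendLine("}");
        return sb.ToString();
    }
}
```

Feeding it `<Nodes><Node Name="IfStatement"/><Node Name="WhileStatement"/></Nodes>` would produce a base class with `VisitIfStatement` and `VisitWhileStatement` methods; the boilerplate lives in one authoritative place, the node list.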

42 thoughts on “DRY out your policies”

    • Out of curiosity: Why use tuples at all, if you use them essentially as collections?

      About the post itself:
      Thanks for pointing me at the Visitor Pattern, Eric. I was not aware of it by that name and it seems quite interesting.

      • Mainly curiosity. I wanted to experiment and learn. I was once forced by a bright mind to use a Tuples library on .NET 3.5 which has a different API but has an indexed property — perhaps useful when tuples are used as vectors. I wondered whether I could bolt it onto the standard one. I learned a lot about special situations in .NET, C# and VB.NET, and I still wonder about some of the choices. Recently I found time to expand it and implement an Either struct, a.k.a. union type, a.k.a. a crossover between Tuple and Nullable: a type which has at most one valid value. I ran into more problems and made a guess as to why Item1, Item2, etc. are used to access the values on tuples. Besides the limitations of .NET, I think it is mainly performance.

    • I’d not even use tuples. Tuples don’t communicate your intent; a tuple doesn’t say what it holds. While reading code, how can you know what `tuple.Item2` means? You need to go to the place where the tuple was populated to see what it holds, or at least you need the IDE’s help with the type parameter of `Item2` to learn a little more about it.

      I’d recommend not to use tuples at all and create your own named type (typically a class).

  1. I agree that repetitious boilerplate which does not entail “if you change this here, change it here too” is less of a DRY issue than having a policy implemented in multiple places. On the other hand, if certain patterns get repeated an awful lot, I’d say that’s a pretty good sign that the language or Framework should have something to handle it. If a compiler gets used by people writing 1,000,000 programs, and 5% of those programs would use a particular pattern, including it once in the compiler will allow it to be shaved out of 50,000 programs.

    Also, how do you suggest dealing with situations where the same general policy needs to be dealt with in two places, but the corner cases are slightly different? How do you decide whether to use two methods for the policy, use one method but have a parameter to indicate which set of corner-case behaviors to apply, or do something else?

    • I try hard to avoid *public* methods that take a Boolean which switches on a particular behavior. The call site either ends up unreadable: CreateAmortizationTable(principal, interest, true); or you end up putting in a comment or using an auxiliary variable: CreateAmortizationTable(principal, interest, true /* compound monthly */); If it is truly a binary choice then it might be wise to make two methods; if we have a policy that just happens to be binary right now but might be different later, then I might use an enum.

      I relax a little bit when the method is private because private methods are for the benefit of the expert on the implementation of the code, not for the benefit of the user of the library, so you can assume more knowledge in the reader. But still I try to avoid them.
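A minimal sketch of the enum alternative (the names CompoundingPeriod and PeriodicRate are invented for illustration; they are not from the post above):

```csharp
using System;

// Hypothetical enum standing in for what would otherwise be a bare Boolean.
enum CompoundingPeriod { Monthly, Annually }

static class RatePolicy
{
    // The policy dispatches on the enum, so the call site documents itself
    // and a third compounding period can be added without breaking callers.
    public static decimal PeriodicRate(decimal annualRate, CompoundingPeriod period)
    {
        switch (period)
        {
            case CompoundingPeriod.Monthly: return annualRate / 12m;
            case CompoundingPeriod.Annually: return annualRate;
            default: throw new ArgumentOutOfRangeException(nameof(period));
        }
    }
}

// Compare the call sites:
//   PeriodicRate(0.06m, true);                       // what does true mean?
//   PeriodicRate(0.06m, CompoundingPeriod.Monthly);  // self-explanatory
```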

  2. There are policy decisions in this code, too, though, like using a dictionary as the cache, and not making the memoized function threadsafe. If you wanted to change those policy decisions later, and you’ve got 10 copies of this code, you’ve got to make the same change in ten places.

    Definitely agree with Hans that there’s an argument for using codegen, such as through T4 templates, as a way to solve this sort of limitation.

      • I think James has a good point here. The “business domain” of this code is the cache policy; if we decided that instead of dictionaries that grow without bound, we should instead use a MRU list with ten elements, then we’d have to change that in as many places as we used dictionaries.
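One way to give that cache policy a single authoritative representation is to inject it into the memoizer; a sketch (the ICache interface and class names here are invented for illustration, not BCL types):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction: pull the cache policy out behind an interface
// so every memoizer shares one authoritative representation of it.
interface ICache<K, V>
{
    bool TryGetValue(K key, out V value);
    void Set(K key, V value);
}

// One policy: an unbounded dictionary, as in the original memoizer.
class UnboundedCache<K, V> : ICache<K, V>
{
    private readonly Dictionary<K, V> store = new Dictionary<K, V>();
    public bool TryGetValue(K key, out V value) => store.TryGetValue(key, out value);
    public void Set(K key, V value) => store[key] = value;
}

static class PolicyMemoizer
{
    // The memoizer no longer decides the cache policy; callers inject it.
    public static Func<A, R> Memoize<A, R>(Func<A, R> function, ICache<A, R> cache)
    {
        return argument =>
        {
            R result;
            if (!cache.TryGetValue(argument, out result))
            {
                result = function(argument);
                cache.Set(argument, result);
            }
            return result;
        };
    }
}
```

Switching to a bounded MRU cache then means writing one new ICache implementation, not editing ten copies of the memoizer.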

  3. What about things like argument checks, or other low-level dependencies? I now use Code Contracts, which does not compile out of the box in VS 2015. Should I wrap something like Code Contracts, or write my own probably-wrong version?

  4. I’ve seen people jump through all sorts of tortured hoops to avoid writing two superficially similar pieces of code. Frankly, I’d rather have a bit of duplication because it’s easy for me to see and understand, and therefore maintain.

    • I find that 99% of the time, non-duplicated code is “easier to maintain and understand” than duplicated code. It’s similar to the principle “encapsulate what changes”. The only thing that should be repetitious are the parts that change. The static parts should be written once.

      I think Eric is not so much making an argument that WET is in some cases better, as he is making an argument that WET is “good enough”, whereas DRY is not as important in all cases. *If* we could make his example DRY, it would just be better, wouldn’t it?

      • Sure, that’s pretty much how I was interpreting Eric’s post.

        And I wasn’t implying that duplicated code is better than non-duplicated code, but rather that simple duplicated code is (much!) better than complicated non-duplicated code.

        In other words, if I was asked to review a change that reduced duplication but increased complexity, I would reject it. If I was asked to review a change that reduced duplication but didn’t increase complexity, I’d accept the change and congratulate the developer on a job well done.

        • Do you have an example of something that you would reject? Removing duplication is generally going to always be more “complex”, since it involves extracting methods, extracting classes, adding parameters or overloads, etc.

          • Maybe simple vs complex / complicated is not the right terminology. I like Rich Hickey’s “Simple Made Easy” talk where it leads with a “simple = easy” and “complicated = hard” explanation.

            So, perhaps I could have explained it better if I’d said “remove the duplication, as long as it’s still at least as easy to understand”.

            I can’t give a “real” example from the production code that I’ve been working on because it belongs to my employer (and wouldn’t serve as an example anyway). However, I’ve tried to put together a trivial example: https://gist.github.com/JohnLudlow/1ebf112016d79557357a#file-dup

            Consider the two methods, M1 and M2. Suppose someone took a method that looks like M1, and changed it such that it looked like M2. I would, depending on some things, reject that, because I don’t consider M2 to be at least as easy to read as M1. I think it’s harder to read – trivially harder, because this is a trivial example, but harder.

            In a real example, the difference may be greater. There are also ways to rewrite M2 such that it’s much clearer. Write it in one of those ways, and then I’ll accept that.

            Extracting a method may make the code easier to read and understand, or it may have the opposite effect. Similarly, adding parameters or overloads may make the code easier or harder to understand. The reason I’ll reject a change if I think it makes the code harder to understand is that ease of understanding is the thing I care most about.

            I care more about the code being easy to understand than it fulfilling the requirements, because if the code is easy to understand but does the wrong thing, then we’re in a position to make it do the right thing in a simple, understandable way.

            If it’s hard to understand but appears to do the right thing, then we have two problems – we can’t actually be sure that the code is doing the right thing because it’s hard to understand, and (even if it is doing the right thing) when the requirements change we will struggle to adapt to them because that requires understanding the code.

        • I would say that’s only a problem *because* of how trivial it is. If you call 10 methods every time, 1 method that varies, and then another 10 methods every time, you should not be duplicating the 20 method calls, for example. It’s something that should be looked at in terms of trade-offs or cost-benefit. I understand what you mean, though, you shouldn’t be *so* gung-ho to be DRY that you remove even tiny bits of ostensible duplication, turning your code into a mess. The duplication removal should be making the code easier to read and maintain.

  5. I found above that the comments from John Payson and James Hart resonate most strongly with me:

    1. To the extent that we _do_ find ourselves rewriting code often, that is a strong suggestion that this is code that really should have been in the framework. Indeed, pretty much every non-trivial project I know of comes along with a custom “utility framework” that extends the capabilities of the nominal platform (Windows, .NET, Cocoa, whatever).

    2. To the extent that we not only find ourselves rewriting code, but having to duplicate that code in some material way (as in Eric’s example here), that is a strong suggestion that there is room for improvement in the language itself. E.g. when the only way to support memoization of functions with different length parameter lists is to rewrite the code for each number of parameters to support, that suggests it would be useful for the language to have a way to represent that in an abstract way that would allow writing the code just once.

    You can in fact encapsulate the memoization caching policy in a single reusable function, but in C# doing so would require some hoop-jumping and the loss of much of what makes generics useful. (I.e. I know for a fact I can put the caching policy into a single method, with the same code used no matter how many parameters are in the function to be memoized; but I also know that implementation would wind up pretty messy, and would not have the type-safety of the example Eric’s provided).

    Which ties into my main point here…

    3. James’ point about the example at hand and how it does in fact represent some specific policy that could change in the future. To that I will offer one of the points that I have found in practice to be at least as important with respect to DRY: fixing a bug.

    Encapsulating policy, of whatever sort, is important and useful. And I admit that in other scenarios, maybe that’s actually the most important goal, due to frequently-changing specifications or a desire to rapidly prototype or experiment with different policies. But in my experience, in the types of projects I tend to work on, what comes up a lot more often is simply dealing with poorly-written code (and once in a blue moon, I was even the one who wrote it!). It’s bad enough to have to track down a bug, especially one that’s hard to reproduce, but it’s even worse to discover that bug exists in multiple places, because it was present before the code was duplicated multiple times.

    Personally, I find that there are very few scenarios where copy/pasting code is acceptable. The example Eric’s given is a good one; the language just doesn’t provide a very good way to resolve the problem. But for the vast majority of cases, no matter what aspect of the program the code is implementing, whether a high-level business logic policy, or some nitty-gritty implementation detail, if I find myself even _tempted_ to hit that Ctrl-C/Ctrl-V combo, I know it’s time to start abstracting. Break out that part of the code that is identical, and share it between the other parts that do have their own specific implementation details.

    • The example is an interesting one because factoring out everything into a common base might seem to be forcing a global policy decision, but I think one could argue the reverse is actually true. If a single factory could easily produce code to handle items with any number of generic parameters, then in cases where a different policy was required one could substitute a different factory, and know that changes to that factory would have uniform effect on code using any number of parameters.

      The difficulty with the non-factored case is that the lack of variadic parameters forces the code to be duplicated along an axis which is orthogonal to any probable policy distinctions. If objects can be divided into groups which have the same policy today, but which may need to have different policies tomorrow, using separate types for each group will facilitate such a future split. If, for example, it were for some reason necessary to change a policy associated with four-parameter objects without changing the policy associated with three-parameter or five-parameter objects, then having separate code for each parameter count would be helpful. Unfortunately, the likelihood of needing a policy split based on parameter count pales in comparison with the likelihood of needing a split for other reasons.

  6. The whole discussion about business logic vs non-business logic is quite artificial. One person’s business is another person’s mechanism and vice-versa. Write a program in as few lines of code as possible, but no less.
    Not much to argue in this case, the language is a bit defective here, so there is no alternative but to write extra lines.
    The whole argumentation that this is actually not a bad thing is flawed.
    Code is a liability, not an asset.

  7. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1847

  8. Pingback: Dew Drop – April 24, 2015 (#2000) | Morning Dew

  9. Agreed with the metaprogramming. But to prevent all kinds of wonderful custom compositions of concerns, shouldn’t we use some standard constructs to compose them? I’m thinking of Aspect-Oriented Programming (Kiczales et al.), Composition Filters (Aksit et al.), or more general standard combinators like those presented in ‘To Mock a Mockingbird’ http://www.amazon.com/To-Mock-Mockingbird-Other-Puzzles/dp/0192801422 How to compose combinators is well defined.

  10. Pingback: Automate the Planet

  11. I prefer to use T4 templates when needing to write something like this.

    I think DRY is important for non-business logic as well: if you realize there is a bug in the method implementation, you’d need to manually change it in n places and hope you didn’t miss any (or, worse yet, hope that someone else doesn’t make a change and miss some). Using a templating system neatly avoids this problem.

  12. Pingback: Wizards and warriors, part five | Fabulous adventures in coding

  13. Pingback: Three Short Links: May 27, 2015 - Apprenda

  14. Pingback: DRY revisited - Enterprise Craftsmanship

  15. Pingback: Generic Memorize Method | Philip Rashleigh's coding blog

  16. Pingback: Generic Memoize Method – Philip Rashleigh's coding blog

  17. Pingback: Generic memoization in C# – Audacia Code Blog

  18. Pingback: Are you lazy? That may not be a bad thing | Software on a String
