Fixing Random, part 1

[Code is here.]

How to say this delicately?

I really, really dislike System.Random. Like, with a passion. It is so… awful.

The C# design team tries hard to make the language a “pit of success”, where the natural way to write programs is also the correct, elegant and performant way. And then System.Random comes along; I cringe every time I see code on StackOverflow that uses it, because it is almost always wrong, and it is seldom easy to see how to make it right.

Let’s start with the obvious problem: the natural way to use it is also the wrong way. The naive developer thinks “I need a new random dice roll, so…”

void M()
  Random r = new Random();
  int x = r.Next(1, 6);

Two lines in and we are already in a world of pain.

First, every time M() is called, we create a new Random, but in most of the implementations of Random available for the last couple decades, by default it seeds itself with the current time, and the granularity of the timer is only a few milliseconds. Computers run on the nanosecond scale, and so the likelihood that we’ll get two Randoms with the same seed is very high, and therefore it is very likely that successive calls to M() produce runs of the same number. You never want to make a new Random if you can avoid it; you want to make one once, and then re-use it. But it’s a bad idea to stash it away in a static, because it’s not thread safe!

Fortunately, I have just learned from an attentive reader that this problem has been fixed in some versions of the CLR; exactly which I am not sure. In those versions, a new Random() now seeds itself randomly, rather than based on the current time. Thanks to the BCL team for fixing this irksome problem; better late than never.

Second, the arguments to Next are the minimum value produced, inclusive, and the maximum value produced, exclusive! This program produces random numbers drawn from 1, 2, 3, 4, 5. The correct way to get numbers from 1 to 6 is of course Next(1, 7).

What we’ve got here is a classic example of Steve Maguire’s “candy machine interface” concept. The candy machines at Microsoft used to have an item number — say, 75, and a price, say, $0.85, you put in your quarters, punch in the item number and, dammit, wait, that’s item 085 and it costs $0.75, I got it backwards, and now I’ve got some weird gum instead of those jalapeno chips. This is a true story! I have actually made that mistake in Microsoft candy machines.

But those are trivial problems. The really fundamental problem here is that we’re working at too low a level of abstraction. It is not the 1970s anymore, when rand() was good enough. We have sophisticated problems in statistical modeling and the attendant probabilistic reasoning to solve in modern programming languages. We need to up our game by writing code in the “business domain” of probabilities, not the “mechanism domain” of calls to methods that return random numbers.

Coming up on FAIC: We will start by simply improving the existing implementation of Random, but from that humble beginning we’ll develop a new class library that makes programming with probabilities much more readable, powerful and efficient in C#.


31 thoughts on “Fixing Random, part 1

  1. “The C# design team tries hard to make the language a “pit of success”, where the natural way to write programs is also the correct, elegant and performant way.”

    And yet there’s no type aliasing, so you’re forced to use implementation types where you want to use problem types.

    • Sure. And it took seven versions for value tuple types to be made first class, and there still aren’t let-expressions outside of queries. We can list missing features all day. There’s only a finite amount of compiler developer effort available; not everyone’s favourite feature can get into v1.

      My point here is not that C# got everything right the first time; obviously it did not. My point is that Random in several ways violates a fundamental good design principle: that the naive, obvious way to write simple code is also the correct way to do it.

  2. Not sure I understand the part you said making Random static being thread unsafe? Isn’t static type by default thread safe in .NET?

    • What I mean is, if you say

      private static Random myRandom = new Random();

      and you use myRandom on multiple threads without locking, you can get into big trouble quickly. The implementation of Random updates its state in a non-threadsafe manner.

      And if you believe that static somehow makes things threadsafe, think again. When you see in the documentation comments that say “static members of this type are threadsafe”, that’s because someone designed, implemented and tested threadsafe code for the static members; that did not happen automatically.

      • I always hated that statement – “static members of this type are threadsafe” – because so many people read it and thought “great, so if I put one of these objects into a static member, it’s thread safe” rather than “the members *of* this type, which are static, are also thread safe”.

        There had to be a better/clearer way of phrasing it.

      • Ok, I thought you meant making Random a static class. Just to be clear, static types/methods are thread safe by default in .NET, right?

        • Absolutely not. Nothing is “threadsafe by default” in C#.

          There are many static methods in the base class library that are documented as being threadsafe. **That is because someone did the work to ensure that they were.**

          • Ok, what I really meant was in .NET framework class library (FCL) guarantees that all static methods are thread safe. But that’s probably your point i.e. they’re made thread safe by the FCL team rather thread safe by default from C#.

      • I used to work heavily with C#, and this quirk was one of my favorite. I was in the game industry at the time and someone introduced a random generator in our base framework, to provide some helper methods and plug our own seeder. But this random generator was a singleton pattern of sort with a private static field containing a System.Random instance somewhere…
        Thanks to this, I got the occasion to actually see what happens when a Random is heavily called by multiple threads (or at least that was the behavior at the time): at some point the state gets corrupted and from this point all the instance methods return “0” when called, until the end of the instance’s lifetime.
        This made it to production (none of our automated tests triggered the behavior), and that day a lot of people got the same game boards and the same loots…
        And a team learned that System.Random don’t have any thread safety guarantees 🙂

    • Fabulous! I did not know that and I am very happy to hear it. I’ll update the post.

      Fortunately (for the purposes of this blog!) there are plenty more deficiencies with the current approach to randomness that we can fix.

  3. Pingback: Roundup #34: Channels, ring buffers and logs | The Creeping IT Apocalypse | dotnet-format | Right Tool for the Job | Fixing Random | Microsoft Graph - CodeOpinion

  4. Another way System.Random is horrible is its design (or lack thereof) for inheritance. The class is not sealed and has virtual methods making it seem like you could override it and change the algorithm, but it’s not intuitive at all! I just looked and MSDN now mercifully says the following. This didn’t used to be documented.

    To supply your own algorithm, you must override the Sample() method, which implements the random number generation algorithm. You should also override the Next(), Next(Int32, Int32), and NextBytes() methods to ensure that they call your overridden Sample() method. You don’t have to override the Next(Int32) and NextDouble() methods.

    But the implementation of Next(int,int) is not obvious. (It’s wrong to use something like min + rand() % (max-min) because that introduces bias, and you also have to handle the case that max-min >= 2^31.) And there’s no magic going on there that prevents them from just calling Sample. Why can’t there just one be method to override? And even if you do override it, Random is still doing work to seed and carry around a lot of state for the base class algorithm. You’re duplicating effort… It’s also a bit annoying that everything is based on double. It’s much faster to construct a double from random integers (just do some bit twiddling on a long and reinterpret it as a double) than to construct integers from random doubles (which generally requires floating point multiplication).

    You also can’t natively get a 32-bit random number (i.e. a random uint). Next() returns 31 bits, and Next(int.MinValue, int.MaxValue) excludes int.MaxValue. Of course, you can do (uint)(NextDouble() * uint.MaxValue) but it seems inefficient.

    Finally there’s no way to save and load the state of the generator, which is useful in some applications.

    • Of course, you can do (uint)(NextDouble() * uint.MaxValue)

      I guess I shouldn’t have written “of course”, because that also excludes uint.MaxValue. I guess you’ve gotta do (uint)(NextDouble() * ((ulong)uint.MaxValue + 1)), or something, which is even uglier…

  5. I am looking forward to hearing your ideas. I was the person to propose the scheme to initialize Random un GitHub. Very glad it got done because, like you, I *really* hate this class. It’s design is so poor.

    The algorithm is horrible as well. It is slow, has a large state and does not seem to produce a good sequence. I personally researched good random algorithms a few years ago. The XorShift family is nearly optimal for non-cryptographic use. It is extremely fast (like 8 ALU operations), 4 words of state size and it wins almost all statistical tests in the common batteries (which I think puts it near the top of the field). It can be significantly strengthened with a simple extension. It’s 2x slower then (which is plenty) and passes all tests.

    The common Mersenne twister is actually quite bad. It seems that random algorithm choices are often made like how people buy competing products. They buy what they have heard about the most 🙂

  6. Pingback: Dew Drop – February 1, 2019 (#2890) | Morning Dew

  7. It’s been 25-ish years since I read Steve Maguire’s book. When I’m standing in front of a candy machine that has Letters for the row index and numbers for the “columns”, I still point out the importance of having the M&Ms be E-2 rather than 5-2.

  8. Just to make things more confusing: If the minimum and maximum values are equal, the maximum value is treated as being inclusive. Thus, Next(1,1) produces 1, rather than generating an exception. This isn’t implemented explicitly. What actually happened is they multiplied a sample from [0..1) by the range.

    floor ([0..1) * 0) == [0..0] //So Next(0,0) == 0
    floor ([0..1) * 1) == [0..0] //So Next(0,1) == 0
    floor ([0..1) * 2) == [0..1] //So Next(0,2) == 0 or 1

    • Just… wow.

      One of the characteristics of the system I’m going to explore in this series is the idea that every probability distribution is required to produce at least one value; should attempts to produce null distributions throw an exception? There are some subtleties here, but we won’t get to them for many episodes yet.

    • I think it is a great idea, and I’m going to address it in the third part of this series. There are good reasons why we want to have a slightly different abstraction over random numbers than “infinite sequence”. We’ll do it in a way where we get the benefits of both: a new, more powerful abstraction, and the ability to still treat a random distribution as an infinite sequence.

  9. Pingback: Fixing Random, part 2 | Fabulous adventures in coding

  10. Pingback: New top story on Hacker News: Fixing random – Golden News

  11. Pingback: New top story on Hacker News: Fixing random – Outside The Know

  12. Pingback: New top story on Hacker News: Fixing random – World Best News

  13. Pingback: New top story on Hacker News: Fixing random – Latest news

  14. Pingback: New top story on Hacker News: Fixing random – Hckr News

  15. Pingback: New top story on Hacker News: Fixing random – News about world

  16. Pingback: Fixing Random | My Tech Blog

  17. You should look at how random generators work in property-based testing frameworks like QuickCheck (Haskell), Hedgehog (Haskell) and ScalaCheck (Scala).

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s