How to say this delicately?
I really, really dislike System.Random
. Like, with a passion. It is so… awful.
The C# design team tries hard to make the language a “pit of success”, where the natural way to write programs is also the correct, elegant and performant way. And then System.Random comes along; I cringe every time I see code on StackOverflow that uses it, because it is almost always wrong, and it is seldom easy to see how to make it right.
Let’s start with the obvious problem: the natural way to use it is also the wrong way. The naive developer thinks “I need a new random dice roll, so…”
void M() { Random r = new Random(); int x = r.Next(1, 6); ...
Two lines in and we are already in a world of pain.
First, every time M()
is called, we create a new Random
, but in most of the implementations of Random available for the last couple decades, by default it seeds itself with the current time, and the granularity of the timer is only a few milliseconds. Computers run on the nanosecond scale, and so the likelihood that we’ll get two Random
s with the same seed is very high, and therefore it is very likely that successive calls to M()
produce runs of the same number. You never want to make a new Random
if you can avoid it; you want to make one once, and then re-use it. But it’s a bad idea to stash it away in a static, because it’s not thread safe!
Fortunately, I have just learned from an attentive reader that this problem has been fixed in some versions of the CLR; exactly which I am not sure. In those versions, a new Random() now seeds itself randomly, rather than based on the current time. Thanks to the BCL team for fixing this irksome problem; better late than never.
Second, the arguments to Next
are the minimum value produced, inclusive, and the maximum value produced, exclusive! This program produces random numbers drawn from 1, 2, 3, 4, 5. The correct way to get numbers from 1 to 6 is of course Next(1, 7)
.
What we’ve got here is a classic example of Steve Maguire’s “candy machine interface” concept. The candy machines at Microsoft used to have an item number — say, 75, and a price, say, $0.85, you put in your quarters, punch in the item number and, dammit, wait, that’s item 085 and it costs $0.75, I got it backwards, and now I’ve got some weird gum instead of those jalapeno chips. This is a true story! I have actually made that mistake in Microsoft candy machines.
But those are trivial problems. The really fundamental problem here is that we’re working at too low a level of abstraction. It is not the 1970s anymore, when rand()
was good enough. We have sophisticated problems in statistical modeling and the attendant probabilistic reasoning to solve in modern programming languages. We need to up our game by writing code in the “business domain” of probabilities, not the “mechanism domain” of calls to methods that return random numbers.
Coming up on FAIC: We will start by simply improving the existing implementation of Random, but from that humble beginning we’ll develop a new class library that makes programming with probabilities much more readable, powerful and efficient in C#.
“The C# design team tries hard to make the language a “pit of success”, where the natural way to write programs is also the correct, elegant and performant way.”
And yet there’s no type aliasing, so you’re forced to use implementation types where you want to use problem types.
Sure. And it took seven versions for value tuple types to be made first class, and there still aren’t let-expressions outside of queries. We can list missing features all day. There’s only a finite amount of compiler developer effort available; not everyone’s favourite feature can get into v1.
My point here is not that C# got everything right the first time; obviously it did not. My point is that Random in several ways violates a fundamental good design principle: that the naive, obvious way to write simple code is also the correct way to do it.
Not sure I understand the part you said making Random static being thread unsafe? Isn’t static type by default thread safe in .NET?
What I mean is, if you say
private static Random myRandom = new Random();
and you use myRandom on multiple threads without locking, you can get into big trouble quickly. The implementation of Random updates its state in a non-threadsafe manner.
And if you believe that static somehow makes things threadsafe, think again. When you see in the documentation comments that say “static members of this type are threadsafe”, that’s because someone designed, implemented and tested threadsafe code for the static members; that did not happen automatically.
I always hated that statement – “static members of this type are threadsafe” – because so many people read it and thought “great, so if I put one of these objects into a static member, it’s thread safe” rather than “the members *of* this type, which are static, are also thread safe”.
There had to be a better/clearer way of phrasing it.
Ok, I thought you meant making Random a static class. Just to be clear, static types/methods are thread safe by default in .NET, right?
Absolutely not. Nothing is “threadsafe by default” in C#.
There are many static methods in the base class library that are documented as being threadsafe. **That is because someone did the work to ensure that they were.**
Ok, what I really meant was in .NET framework class library (FCL) guarantees that all static methods are thread safe. But that’s probably your point i.e. they’re made thread safe by the FCL team rather thread safe by default from C#.
I used to work heavily with C#, and this quirk was one of my favorite. I was in the game industry at the time and someone introduced a random generator in our base framework, to provide some helper methods and plug our own seeder. But this random generator was a singleton pattern of sort with a private static field containing a System.Random instance somewhere…
Thanks to this, I got the occasion to actually see what happens when a Random is heavily called by multiple threads (or at least that was the behavior at the time): at some point the state gets corrupted and from this point all the instance methods return “0” when called, until the end of the instance’s lifetime.
This made it to production (none of our automated tests triggered the behavior), and that day a lot of people got the same game boards and the same loots…
And a team learned that System.Random don’t have any thread safety guarantees 🙂
Looks like they changed Random in .NET Core so it uses a random seed instead of the current time: https://github.com/dotnet/corefx/blob/master/src/Common/src/CoreLib/System/Random.cs#L126
Fabulous! I did not know that and I am very happy to hear it. I’ll update the post.
Fortunately (for the purposes of this blog!) there are plenty more deficiencies with the current approach to randomness that we can fix.
Pingback: Roundup #34: Channels, ring buffers and logs | The Creeping IT Apocalypse | dotnet-format | Right Tool for the Job | Fixing Random | Microsoft Graph - CodeOpinion
Another way System.Random is horrible is its design (or lack thereof) for inheritance. The class is not sealed and has virtual methods making it seem like you could override it and change the algorithm, but it’s not intuitive at all! I just looked and MSDN now mercifully says the following. This didn’t used to be documented.
But the implementation of Next(int,int) is not obvious. (It’s wrong to use something like min + rand() % (max-min) because that introduces bias, and you also have to handle the case that max-min >= 2^31.) And there’s no magic going on there that prevents them from just calling Sample. Why can’t there just one be method to override? And even if you do override it, Random is still doing work to seed and carry around a lot of state for the base class algorithm. You’re duplicating effort… It’s also a bit annoying that everything is based on double. It’s much faster to construct a double from random integers (just do some bit twiddling on a long and reinterpret it as a double) than to construct integers from random doubles (which generally requires floating point multiplication).
You also can’t natively get a 32-bit random number (i.e. a random uint). Next() returns 31 bits, and Next(int.MinValue, int.MaxValue) excludes int.MaxValue. Of course, you can do (uint)(NextDouble() * uint.MaxValue) but it seems inefficient.
Finally there’s no way to save and load the state of the generator, which is useful in some applications.
I guess I shouldn’t have written “of course”, because that also excludes uint.MaxValue. I guess you’ve gotta do (uint)(NextDouble() * ((ulong)uint.MaxValue + 1)), or something, which is even uglier…
I am looking forward to hearing your ideas. I was the person to propose the scheme to initialize Random un GitHub. Very glad it got done because, like you, I *really* hate this class. It’s design is so poor.
The algorithm is horrible as well. It is slow, has a large state and does not seem to produce a good sequence. I personally researched good random algorithms a few years ago. The XorShift family is nearly optimal for non-cryptographic use. It is extremely fast (like 8 ALU operations), 4 words of state size and it wins almost all statistical tests in the common batteries (which I think puts it near the top of the field). It can be significantly strengthened with a simple extension. It’s 2x slower then (which is plenty) and passes all tests.
The common Mersenne twister is actually quite bad. It seems that random algorithm choices are often made like how people buy competing products. They buy what they have heard about the most 🙂
That is an excellent point which I am not going to address at all in this series, but I strongly agree with you. We can do a lot better in the basic algorithms.
Pingback: Dew Drop – February 1, 2019 (#2890) | Morning Dew
It’s been 25-ish years since I read Steve Maguire’s book. When I’m standing in front of a candy machine that has Letters for the row index and numbers for the “columns”, I still point out the importance of having the M&Ms be E-2 rather than 5-2.
Just to make things more confusing: If the minimum and maximum values are equal, the maximum value is treated as being inclusive. Thus, Next(1,1) produces 1, rather than generating an exception. This isn’t implemented explicitly. What actually happened is they multiplied a sample from [0..1) by the range.
Thus:
floor ([0..1) * 0) == [0..0] //So Next(0,0) == 0
floor ([0..1) * 1) == [0..0] //So Next(0,1) == 0
floor ([0..1) * 2) == [0..1] //So Next(0,2) == 0 or 1
Just… wow.
One of the characteristics of the system I’m going to explore in this series is the idea that every probability distribution is required to produce at least one value; should attempts to produce null distributions throw an exception? There are some subtleties here, but we won’t get to them for many episodes yet.
Hey, what do you think about this approach (by Zoran Horvat) to:
http://www.codinghelmet.com/articles/wrap-system-random-into-infinite-ienumerable-int-sequence
I think it is a great idea, and I’m going to address it in the third part of this series. There are good reasons why we want to have a slightly different abstraction over random numbers than “infinite sequence”. We’ll do it in a way where we get the benefits of both: a new, more powerful abstraction, and the ability to still treat a random distribution as an infinite sequence.
Pingback: Fixing Random, part 2 | Fabulous adventures in coding
Pingback: New top story on Hacker News: Fixing random – Golden News
Pingback: New top story on Hacker News: Fixing random – Outside The Know
Pingback: New top story on Hacker News: Fixing random – World Best News
Pingback: New top story on Hacker News: Fixing random – Latest news
Pingback: New top story on Hacker News: Fixing random – Hckr News
Pingback: New top story on Hacker News: Fixing random – News about world
Pingback: Fixing Random | My Tech Blog
You should look at how random generators work in property-based testing frameworks like QuickCheck (Haskell), Hedgehog (Haskell) and ScalaCheck (Scala).
Pingback: So long, MSDN blog | Fabulous adventures in coding
I find myself coming back to your code on a fairly regular basis. Have you, or anyone else, turned this code into a NuGet library?
Thanks! This has not been shipped as a library to my knowledge. This series was intended to be an inspiring exploration of what could be done to make improvements here, rather than an actual production-quality implementation. Also, a library is a good first step but it would be even better to have support in the language itself. C# supports sequence generation coroutines and task coroutines; it could also support the multi-shot continuations that you need for inference.
Pingback: Introducing Bean Machine | Fabulous adventures in coding
Pingback: Bean Machine – A Declarative Probabilistic Programming Language | صحافة حرة FREE PRESS
Pingback: What Is A Way To Generate OTP 6-digit Pin In C# .NET WCF? (without Using Random()) - Programming Questions And Solutions Blog