Last time on FAIC I complained bitterly about the shortcomings of System.Random. In this episode, I’ll take a stab at the sort of thing I’d like to see in a redesigned random library. This is the “fix the trivial problems” episode, and then we’ll go deeper from there.

The first thing I want is two static methods, one that gives me an actually-random integer from 0 to Int32.MaxValue, and one that gives me a random double from 0.0 (inclusive) to 1.0 (exclusive). Not pseudo-random, but actually indistinguishable from a true random uniform distribution. Here’s an attempt:

```csharp
using CRNG = System.Security.Cryptography.RandomNumberGenerator;

public static class BetterRandom
{
  private static readonly ThreadLocal<CRNG> crng =
    new ThreadLocal<CRNG>(CRNG.Create);
  private static readonly ThreadLocal<byte[]> bytes =
    new ThreadLocal<byte[]>(() => new byte[sizeof(int)]);
  public static int NextInt()
  {
    crng.Value.GetBytes(bytes.Value);
    return BitConverter.ToInt32(bytes.Value, 0) & int.MaxValue;
  }
  public static double NextDouble()
  {
    while (true)
    {
      long x = NextInt() & 0x001FFFFF;
      x <<= 31;
      x |= (long)NextInt();
      double n = x;
      const double d = 1L << 52;
      double q = n / d;
      if (q != 1.0)
        return q;
    }
  }
}
```

It’s pretty straightforward, but let’s take a quick look. First, all the state is thread-local, so we trade a small amount of per-thread overhead and per-call indirection for thread safety, which is probably a good tradeoff. We’re going to be using the same four-byte buffer over and over again, so I figured it is probably worthwhile to cache it and avoid the hit to collection pressure; however this is just a guess and I would want to verify that with empirical tests.

There are other possible performance problems here; for example, is it worthwhile to generate a few thousand random bytes at once, cache them, and then gradually use them up? Or is the per-call cost of GetBytes sufficiently low that this is not an issue? I’d also want to empirically check that.
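If buffering did turn out to be a win, it might look something like the following sketch. To be clear, `BufferedRandom`, the buffer size, and the refill logic are my own invention for illustration, not part of the proposal, and whether this actually beats the simple version is exactly the sort of thing that needs to be measured rather than assumed:

```csharp
using System;
using System.Threading;
using CRNG = System.Security.Cryptography.RandomNumberGenerator;

// Hypothetical sketch: amortize the cost of GetBytes by filling a
// larger per-thread buffer and handing out four bytes at a time.
public static class BufferedRandom
{
    private const int BufferSize = 4096; // a multiple of sizeof(int)
    private static readonly ThreadLocal<CRNG> crng =
        new ThreadLocal<CRNG>(CRNG.Create);
    private static readonly ThreadLocal<byte[]> buffer =
        new ThreadLocal<byte[]>(() => new byte[BufferSize]);
    // Start "past the end" so the first call triggers a fill.
    private static readonly ThreadLocal<int> position =
        new ThreadLocal<int>(() => BufferSize);

    public static int NextInt()
    {
        if (position.Value >= BufferSize)
        {
            // Refill the whole buffer in one crypto call.
            crng.Value.GetBytes(buffer.Value);
            position.Value = 0;
        }
        int result = BitConverter.ToInt32(buffer.Value, position.Value)
            & int.MaxValue;
        position.Value += sizeof(int);
        return result;
    }
}
```

All the state is still thread-local, so the thread-safety story is unchanged; the only question is whether one big `GetBytes` every thousand calls beats a small one every call.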

I want only non-negative integers; clearing the top bit by and-ing with 0x7FFFFFFF does that nicely without changing the distribution.

For a random double, we know that doubles have 52 bits of precision, so we generate a random 52-bit integer and make that the numerator of a fraction. We want to guarantee that 1.0 is never a possible output, so we reject that possibility; one in every few billion calls we’ll do some additional calls to `NextInt`. I can live with that.

**Exercise**: An alternative, and possibly better solution, would be to build the double from bits directly, rather than doing an expensive division. I have not actually written the code or done the math here; as an exercise, does doing so produce a better distribution in any way?

Now that we have a crypto-strength threadsafe random number library, we can build a cheap pseudo-random library out of it:

```csharp
public static class Pseudorandom
{
  private readonly static ThreadLocal<Random> prng =
    new ThreadLocal<Random>(() =>
      new Random(BetterRandom.NextInt()));
  public static int NextInt() => prng.Value.Next();
  public static double NextDouble() => prng.Value.NextDouble();
}
```

We’ve solved two of the main problems with Random: historically it gave highly correlated values when called repeatedly from one thread, and it is not thread safe. We now have a single prng per thread, and it is seeded with a crypto-strength random seed, not the current time.

The problem of the candy-machine interface has gone away because I’ve eliminated that method entirely! We’ll come back to this point in a later episode.

I think this is already a major improvement, in that 99% of the questions about mis-uses of Random on StackOverflow would not exist if this had been the received randomness library in the first place, and I am very happy to learn that at long last, these problems have been fixed in the core implementation.

What about performance? The crypto RNG is a lot slower than the pseudo-RNG, and there is definitely overhead for the thread local infrastructure. But: **who cares?** Most programs would not be *observably* slowed down by these improvements, and for the ones that are slowed down, you go right ahead and make a cheap, thread-unsafe version if you need it for performance.

We should always make good tradeoffs; if the cost of eliminating a whole class of bugs is a tiny performance hit, we should make that tradeoff. **The entire premise of managed languages is that we trade a small amount of performance for improved developer productivity.** That’s why we have things like a GC, immutable strings, checked array access, and so on; these all trade a small amount of performance for more safety, robustness and productivity. And that’s why we have unsafe pointers; if you need to abandon some safety to get performance, C# should let you do that. But we should default to safety!

A commenter noted last time that recent implementations of System.Random solve this problem using some similar techniques; it’s instructive to compare my four-line solution here to a solution actually engineered to be performant at scale. As always, remember that the code on my blog is for pedagogy and not intended to be an example of industrial-quality code. You’ll notice that for instance they make only the rarely-used parts of the algorithm thread-safe. That seems like a reasonable tradeoff between performance and safety.

Already I think we’re off to a good start, but we can do a *lot* better still. Like I said last time, these are the trivial problems. There are a lot more problems we could be solving. Two that come to mind:

- The received PRNG is simply not very good in terms of its randomness; there are far better algorithms that produce harder-to-predict outputs with similar performance and similar amount of state.
- A common use for PRNGs is to generate a random-seeming but actually-deterministic sequence for the purposes of randomizing a game. But the received implementation has no way to say “save the current state of the PRNG to disk and restore it later”, or any such thing.

I’m not going to address either of these problems in this series, though both are fascinating problems that deserve posts of their own. I’m going to take a different approach to improving how we deal with randomness; I want to look at the problem at a higher level of abstraction than the low-level details of how the random numbers are generated and what the state of the generator is. *To really improve the state of the art of dealing with probability in C#, we’ll need some more powerful types.*

**Next time on FAIC:** I’ll make some useful classes for common continuous probability distributions.

Constructing the double directly from the bits would be…difficult. You’d be trying to select from the set of representable doubles, which are not uniformly distributed in [0,1), not even close. Assuming you wanted a uniform distribution within [0,1) you would have to weight the choice of exponent so that -1 is chosen half the time, -2 a quarter of the time, and so on.

Let’s see: a non-zero, non-NaN, non-infinite, non-denormal double is sign x 1.mantissa x 2^exponent. What if we set the sign bit to positive, the exponent to zero, the mantissa bits to a random 52 bit number, and then subtracted one from the resulting double? That might do the trick, but like I said, I haven’t worked out the math or tried it.
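Working that idea out in code might look like the following; this is my own arrangement using `BitConverter.Int64BitsToDouble` rather than anything from the article, so treat it as a sketch. An exponent field of 1023 means 2^0, so or-ing 52 random mantissa bits under that exponent produces the double 1.mantissa in [1.0, 2.0), and subtracting one lands it in [0.0, 1.0):

```csharp
using System;

static class DoubleFromBits
{
    // Sketch of the "build the double directly" idea:
    // 0x3FF0000000000000 is sign 0, biased exponent 1023 (i.e. 2^0),
    // mantissa all zero -- the double 1.0. Or-ing in 52 random
    // mantissa bits gives a double in [1.0, 2.0); subtract 1.0.
    public static double FromRandomBits(long random52)
    {
        long bits = 0x3FF0000000000000L | (random52 & 0x000FFFFFFFFFFFFFL);
        return BitConverter.Int64BitsToDouble(bits) - 1.0;
    }
}
```

For example, all-zero bits give 0.0, the top mantissa bit alone gives 0.5, and the all-ones pattern gives (2^52-1)/2^52, safely below 1.0.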

Yeah I suspect that would work. In fact I suspect that would give you exactly the same distribution you’ve already got. I’d definitely want to benchmark before choosing one or the other, though; I would not be prepared to bet money on whether a modern compiler would be better at subtracting floats or dividing a float by an integer power of two.

Floating-point division by a power of two is complicated enough that I doubt any compiler can optimize it better than just ensuring the appropriate CPU instruction gets used (e.g. FSCALE instead of FMUL or FDIV on x86). To do it in software is tricky. Normally you just reduce the exponent, but you’ve gotta handle denormalization and underflow, and the whole thing is much slower than a plain old multiplication.

Sadly, I’ve been pretty disappointed by the .NET JIT compiler in this regard.

Maybe it’s smart enough to use FSCALE these days, but I’d be surprised. AFAIK RyuJIT generates SSE instructions only for floating point nowadays.

There are many more doubles in the range [0,1) than in [1,2). Since your method is a direct mapping [1,2) -> [0,1), there are doubles that will never be produced, which makes the distribution non-uniform.

Right, but I’m not interested in producing a uniform distribution over possible doubles; I’m interested in producing a distribution over real numbers that is approximated by a distribution over doubles. What I don’t want is to end up biasing towards a region of the real line that is “more dense” in represented doubles.

I think what we are both getting at here is: there are 2^52 doubles between 0.5 and 1.0, and there are 2^52 doubles between 0.25 and 0.5. If we uniformly sampled from that set of 2^53 doubles, we would not get a uniform distribution on real numbers between 0.25 and 1.0; we’d get twice as many from 0.25 to 0.5 as we do 0.5 to 0.75.

My solution to this problem is to say that I’m OK with some doubles in the range of 0.0 to 1.0 never being produced; in fact, most of them are never produced. If there is an easy way to produce them that does not introduce a bias in the uniform distribution, I would be very happy to link to an implementation of it.

Incidentally, the current implementation of Random.NextDouble() only produces max 2^32 possible different doubles, so we’ve lived with this problem for a long time.

You are obviously correct, I saw that most results will end up with a lot of zeros at the end of mantissa and somehow confused that with the distribution being non-uniform.

After some research it seems the two methods you presented (in the comment and in the article) are the most commonplace. Both Rust and Julia use the [1,2)-1.0 approach in their standard library, while OpenJDK does random53bits/(double)(1L<<53). (Due to the implicit 1 in the IEEE 754 format, all 53-bit integers have an accurate double representation, despite the 52-bit mantissa.)

.NET Core and Mono, as you mentioned, do random32/(double)MAX_INT, so a lot less entropy there.
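For comparison, here is a sketch of the OpenJDK-style construction in C#; the bit-slicing arrangement is my own (26 bits from one draw, 27 from another — any unbiased bit source would do), not the article's:

```csharp
using System;

static class Jdk53BitStyle
{
    // OpenJDK-style NextDouble: 53 random bits divided by 2^53.
    // Every 53-bit integer is exactly representable as a double
    // (the implicit leading 1 gives 53 significand bits), and
    // division by a power of two is exact here, so the result is
    // uniform on the grid k/2^53 for k in [0, 2^53).
    public static double ToDouble53(int first31Bits, int second31Bits)
    {
        long hi = (long)(first31Bits & 0x03FFFFFF);  // keep 26 bits
        long lo = (long)(second31Bits & 0x07FFFFFF); // keep 27 bits
        long bits53 = (hi << 27) | lo;
        return bits53 / (double)(1L << 53);
    }
}
```

Note that because the numerator is at most 2^53-1, the result is always strictly less than 1.0 with no rejection loop needed.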

The only method covering all the doubles in [0,1) I came across is https://mumble.net/~campbell/2014/04/28/random_real.c (see random_real(void))

with the basic idea of treating the random bit stream as a binary expansion of a real number, and turning that into a double (so exponent = -CountTrailingZeros(), mantissa = random52b). Don't know how well tested that code is, but the idea seems sound.

Reminds me of an answer I attempted five years ago, https://stackoverflow.com/a/24648788/659190, I went with a shared buffer because I thought the call to `GetBytes` was expensive and worked better with a larger buffer. I don’t remember actually measuring that so it could quite likely be wrong. I did persist the clunky interface of the original `Random`.

In your first code sample, is there any reason other than personal coding style that you used the while(true) over a do…while loop?

(Subverting the intended form of a while loop is a fingernails-on-chalkboard thing for me, and I was wondering if there was a reason for it here.)

No particular reason. I personally find “while(true)” to be very easy to reason about.

I figured it was probably a personal preference thing. I see “while(true)” as saying “I’m probably going to loop for a while before exiting”, and “do {} while()” as saying “I’m probably going to exit after the first pass”, but I guess that’s just me.

.NET Core again makes our lives a bit easier, as it includes a static RandomNumberGenerator.Fill method that calls straight into the OS’s RNG without us having to worry about a disposable thread-unsafe object.

The System.Security.Cryptography.RandomNumberGenerator class has been around for a long time. If you call the (static) Create factory method, you will get a crypto RNG algorithm instance that you can use to call GetBytes (or GetNonZeroBytes). The Fill method has a Span&lt;T&gt; in its signature, so you need a late enough Framework or Core release to accommodate that.

The point is that the Fill method (which is new) is static, so you can skip the Create/Dispose ceremony.

And that method works with a Span, so you can stackalloc the 4 bytes and pass that in, which allows you to get rid of the second ThreadLocal field.
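Putting those two observations together, a sketch of BetterRandom.NextInt on .NET Core 3.0 or later might look like this; the class name is mine, but `RandomNumberGenerator.Fill(Span<byte>)` and the span overload of `BitConverter.ToInt32` are real APIs:

```csharp
using System;
using System.Security.Cryptography;

static class CoreBetterRandom
{
    // Sketch using the static RandomNumberGenerator.Fill: no
    // Create/Dispose ceremony, no ThreadLocal state, and the four
    // bytes live on the stack rather than adding collection pressure.
    public static int NextInt()
    {
        Span<byte> bytes = stackalloc byte[sizeof(int)];
        RandomNumberGenerator.Fill(bytes);
        return BitConverter.ToInt32(bytes) & int.MaxValue;
    }
}
```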

> As always, remember that the code on my blog is for pedagogy and not intended to be an example of industrial-quality code.

So, where can I look for a production-quality library that does this? I love your blog, but I think this is just too good to look at it “academically”.

Since you mention it a few times here, I’m curious about your approach to empirical testing small specific things. For example, if you *really* needed to know about the cost of GetBytes() vs a larger buffer, how do you approach it? Would you use a profiler against a real application with both approaches, or just write a small isolated test program using something like Stopwatch? Or perhaps an even more rigorous approach?

Thanks, and love this series!

What I’d be worried about here is collection pressure, so I’d likely write a test app that made reasonable use of the API, and then use a memory profiler to find out how much pressure it was producing. Then make some changes, profile again, and measure the impact of the change.

I would use BenchmarkDotNet. It takes care of a lot of issues with benchmarking (and especially microbenchmarking) and also has some nice additional features (like measuring allocations).

Gonna want to fix your RSS feed too, if you haven’t already 😉

You want to take a look at CryptoRandom.cs implementation in Inferno crypto library. https://github.com/sdrapkin/SecurityDriven.Inferno


The random double code I use in my own library is:

```csharp
public virtual unsafe double NextDouble()
{
    double n;
    *(uint*)&n = NextUInt32();
    *((uint*)&n + 1) = NextBits(20) | (1023u << 20);
    return n - 1;
}
```

It basically sets the exponent to 0 (1023 after biasing) with a random mantissa. It should generate a random double in [1,2), and then you subtract one to get it in [0,1).

As you say it's for pedagogy and the interesting bit you're developing is the new high-level randomness abstractions – which I'm looking forward to! – but I've gotta say that RNG interfaces with no way to directly get a 32-bit integer kinda bug me. If you want a random long (or ulong) you can't just OR two 31-bit integers together, 'cuz that's only 62-bits rather than 63 (or 64). You'd have to get a third int and then throw away almost all of it.

Fixing Random, part 15 popped up in my RSS reader yesterday. Are you that far ahead with the writing? Well done!

(It’s probably what Zac refers to too.)

Yes, I accidentally set it to publish immediately and it was in the RSS feed for a couple minutes. I try to get way ahead whenever possible, so that I know where I’m going to end up!

P.S. There’s no need to compare against 1.0. The maximum numerator is 2^52-1 and the denominator is 2^52, and since you’re not losing any precision with that division the result will never equal 1.0. (Dividing a float by a power of two doesn’t lose precision unless it makes the exponent go out of range, but at that point you’re getting very close to zero.)

To prove it, just try `(double)((1L << 52) - 1) / (1L << 52) == 1`. The result should be false.

That’s a good point, thanks. Floating point arithmetic being in nondeterministic precision makes me more conservative than I need to be.
