(This is a follow-up article to my post on generating random data that conforms to a given distribution; you might want to read it first.)
Here’s an interesting question I was pondering last week. The .NET base class library has a method
Random.Next(int) which gives you a pseudo-random integer greater than or equal to zero, and less than the argument. By contrast, the
rand() method in the standard C library returns a random integer between 0 and
RAND_MAX, which is usually 32768. A common technique for generating random numbers in a particular range is to use the remainder operator:
int value = rand() % range;
However, this almost always introduces some bias that causes the distribution to stop being uniform. Do you see why?
Last time on FAIC I generated a “random” permutation of a deck of cards and gave you the first five cards, and challenged you to determine what the next five cards were. David Poeschl (of the Roslyn team!) was the first to get a possible order and, with the additional hint that the sixth card was the three of hearts, Joel Rondeau found the correct solution, which is below. Both used brute-force algorithms.
Last time in this series I presented an algorithm for generating a random number, and the time before I presented an algorithm that turns such a number into a permutation, so we can put them together to make an algorithm that returns a random permutation:
static Permutation RandomPermutation(int size, Random random)
return Permutation.NthPermutation(size, RandomFactoradic(size, random));
Is this actually a correct algorithm for generating a random permutation? Give it some thought.
UPDATE: I’ve posted a related article here.
When building simulations of real-world phenomena, or when generating test data for algorithms that will be consuming information from the real world, it is often highly desirable to produce pseudo-random data that conform to some non-uniform probability distribution.
But perhaps I have already lost some readers who do not remember STATS 101 all those years ago. I sure don’t. Let’s take a step back. Continue reading
One more easy one. I want to “sort” a list into a random, shuffled order. I can do that by simply randomizing whether any two elements are greater than, less than, or equal to each other:
myList.Sort((x, y) => (new Random()).Next(-1, 2));
That generates a random -1, 0 or 1 for every comparison, right? So it will sort the list into random order, right?
There are multiple defects here. First off, clearly this violates all our rules for comparison functions. It does not produce a total ordering, and in fact it can tell you that two particular elements are equal and then later tell you that they have become unequal. The sort algorithm is allowed to go into infinite loops or crash horribly when given such an ill-behaved comparison function. And in fact, some implementations of sorting algorithms attempt to detect this error and will throw an exception if they determine that the comparison is inconsistent over time.
Second, every time this thing is called it creates a new
Random instance seeded to the current time, and therefore it produces the same result over and over again if called multiple times in the same timeslice; hardly random.
Shuffling is not sorting; it is the opposite of sorting, so don’t use a sort algorithm to shuffle. There are lots of efficient shuffle algorithms that are easy to implement. (That said, it is legal to shuffle by sorting a list ordered by a randomly chosen key. But the key must be chosen exactly once for each item in the list and there must be a correct comparison function on that key.)