Producing permutations, part six

Last time in this series I presented an algorithm for generating a random number, and the time before I presented an algorithm that turns such a number into a permutation, so we can put them together to make an algorithm that returns a random permutation:

static Permutation RandomPermutation(int size, Random random)
{
  return Permutation.NthPermutation(size, RandomFactoradic(size, random));
}

Is this actually a correct algorithm for generating a random permutation? Give it some thought.

Turns out, surprisingly, no, it is not correct. Now, there’s no flaw that I’m aware of in the logic or the mathematics that underlies it. Rather, the problem is that the Random type is a big fat lie. We’d like to have the property that a random permutation generator produces every possible permutation with equal likelihood; that is, it should be unbiased. This algorithm is deeply biased because Random is not a strong enough source of random numbers. Some permutations are less likely than others.

So what’s the big deal? If there were just a small bias, that would be unfortunate but not fatal. It’s actually worse than that. With this bad source of randomness, given a portion of a permutation, you can work out what the rest of the permutation likely is.

And that’s my challenge to you. I’ve posted some code below, putting together everything we’ve seen so far in this series. The first five card dealt when I ran the program were:

39: Ace of Clubs
11: Queen of Spades
29: Four of Diamonds
5: Six of Spades
20: Eight of Hearts

My challenge to you is: what are the next five cards that I generated? And…. go!

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

public static class Extensions
{
    public static IEnumerable<T> InsertAt<T>(
      this IEnumerable<T> items, int position, T newItem)
    {
        if (items == null)
            throw new ArgumentNullException("items");
        if (position < 0)
            throw new ArgumentOutOfRangeException("position");
        return InsertAtIterator<T>(items, position, newItem);
    }

    private static IEnumerable<T> InsertAtIterator<T>(
        this IEnumerable<T> items, int position, T newItem)
    {
        int index = 0;
        bool yieldedNew = false;
        foreach (T item in items)
        {
            if (index == position)
            {
                yield return newItem;
                yieldedNew = true;
            }
            yield return item;
            index += 1;
        }
        if (index == position)
        {
            yield return newItem;
            yieldedNew = true;
        }
        if (!yieldedNew)
            throw new ArgumentOutOfRangeException("position");
    }
}

struct Permutation : IEnumerable<int>
{
    public static Permutation Empty { get { return empty; } }
    private static Permutation empty = new Permutation(new int[] { });
    private int[] permutation;
    private Permutation(int[] permutation)
    {
      this.permutation = permutation;
    }

    private Permutation(IEnumerable<int> permutation) 
      : this(permutation.ToArray()) { }

    private static BigInteger Factorial(int x)
    {
      BigInteger result = 1;
      for (int i = 2 ; i <= x ; ++i)
        result *= i;
      return result;       
    }

    public static Permutation NthPermutation(int size, BigInteger index)
    {
      if (index < 0 || index >= Factorial(size))
        throw new ArgumentOutOfRangeException("index");
      if (size == 0)
        return Empty;
      BigInteger group = index / size;
      bool forwards = group % 2 != 0;
      Permutation permutation = NthPermutation(size - 1, group);
      int i = (int) (index % size);                      
      return new Permutation(
        permutation.InsertAt(forwards ? i : size - i - 1, size - 1));
    }

    // Produce a random number between 0 and n!-1
    static BigInteger RandomFactoradic(int n, Random random)
    {
      BigInteger result = 0;
      BigInteger radix = 1;
      for (int i = 1; i < n; ++i)
      {
        // We need a digit between 0 and i.
        result += radix * random.Next(i+1);
        radix *= (i+1);
      }
      return result;
    }
    public static Permutation RandomPermutation(int size, Random random)
    {
      return NthPermutation(size, RandomFactoradic(size, random));
    }
    public int this[int index]
    {
      get { return permutation[index]; }
    }
    public IEnumerator<int> GetEnumerator()
    {
      foreach (int item in permutation)
        yield return item;
    }
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
      return this.GetEnumerator();
    }
    public int Count { get { return this.permutation.Length; } }
    public override string ToString()
    {
      return string.Join<int>(",", permutation);
    }
}

class Program
{
    static void Main()
    {
        string[] suits = { "Spades", "Hearts", "Diamonds", "Clubs" };
        string[] values = {"Ace", "Deuce", "Trey", "Four", "Five",
                       "Six", "Seven", "Eight", "Nine", "Ten",
                       "Jack", "Queen", "King" };
        Permutation p = Permutation.RandomPermutation(52, new Random());
        foreach (int i in p)
            Console.WriteLine("{0}: {1} of {2}", 
                i, values[i % 13], suits[i / 13]);
    }
}

25 thoughts on “Producing permutations, part six

  1. Well, since you’re re-using the same instance of Random, the same sequence of numbers will be generated for a given seed. That’s actually a pretty useful property for online games. What I wonder is: if you create a new instance of Random in the RandomFactoradic method, and therefore get a time-based seed, will the bias still be there?

  2. The next 5 terms are:

    7: Eight of Spades
    31: Six of Diamonds
    8: Nine of Spades
    47: Nine of Clubs
    1: Deuce of Spades

    KooKiz – While bias could be a problem in a better algorithm (see the common bias in incorrect implementations of Fisher-Yates), the larger problem here is that there are only a few billion possible decks of cards (the seed is a 32-bit integer), while there are 52! distinct shuffled decks of cards. 2^32 / 52! = 5.3249002 x 10^-59. That’s a teeny tiny percentage of possible decks.

    • David, that was astonishingly fast but unfortunately not quite right. The sixth is:

      15: Trey of Hearts

      Does that give you enough information to find the answer?

      What technique did you use to find the answer that you propsed?

      (And your analysis of the bias due to insufficient entropy in the seed is spot on.)

      • There are 52*51*50*49*48 = 311875200 possible first five cards, so with 2^32 seeds we should see on average 13.77 seeds for a list of 5 starting cards. Given the sixth card, we should see about 0.29 seeds. The first seed I found that matched was 150272048. Will continue the hunt given the Trey of Harts.

        • You are correct that I underspecified the problem; I didn’t actually bother to check if there were collisions.

          I presume that you removed the error checking to make it run faster.

          I tried for a while to come up with a technique that was faster than brute force. I think there is one, but I was insufficiently clever. (Clearly there are much faster techniques if you have a less complicated shuffle algorithm, like Fischer-Yates but I couldn’t come up with one for this shuffle.)

        • Are you making the assumption that the Random ctor without constructor uses Environment.TickCount wich increment the value only by 15 or 16 (avarage 1000/64)? In this case, I think the sixth number isn’t needed.
          Knowing Eric’s habits regarding PC reboots would help too.

          • Unfortunately the machine I did the run on had not been rebooted in many days, so there’s not much of an advantage there.

  3. So about 226 bits of randomness are needed, right? Up from a “32 which isn’t even really 32 because time isn’t random”. Where can you even get something like that?

  4. 15: Trey of Hearts
    3: Four of Spades
    16: Four of Hearts
    33: Eight of Diamonds
    43: Five of Clubs

    The Random seed for this is -781599298.

    It was done via a brute force search, but by rewriting the NthPermutation method to short-circuit as soon as the RandomFactoradic value (index parameter) was known to produce a bad permutation (for example, placing any of the numbers 40-51 in one of the first 5 positions of the permutation).
    My final NthPermutation method didn’t actually produce a Permutation value, as that wasn’t necessary. It simply returned a bool if the index parameter would cause a proper permutation to be produced.

    • In case people were interested, here is the rewritten NthPermutation method, now called PermutationCheck. It uses the position from permutation.InsertAt() and validates that putting the value at that position results in a Permutation that gives the correct first 6 cards. Literals are off by 1 from expected due to InsertAt() being called with size – 1. Position values aren’t 0,1,2,3,4,5 because the larger values aren’t yet in the permutation when InsertAt is called for the smaller values.

       -- Code was a complete mess due to formatting problems; I've removed it. -- Eric 
    • Here, again, is the PermutationCheck() method. It uses the position from permutation.InsertAt() and validates that putting the value at that position results in a Permutation that gives the correct first 6 cards. Literals are off by 1 from expected due to InsertAt() being called with size – 1. Position values aren’t 0,1,2,3,4,5 because the larger values aren’t yet in the permutation when InsertAt is called for the smaller values.

      public static bool PermutationCheck(int size, BigInteger index)
      {
        //makes no difference where 0-4 go in the permutation.
        if(size <= 5)
        {
          return true;
        }
      
        BigInteger group = index / size;
        bool forwards = group % 2 != 0;
        int i = (int)(index % size);
        int position = forwards ? i : size - i - 1;
      
        //consider a lookup table.
        if((size == 40 && position != 0) ||
           (size == 30 && position != 1) ||
           (size == 21 && position != 2) ||
           (size == 16 && position != 2) ||
           (size == 12 && position != 0) ||
           (size == 6 && position != 0) ||
           (size > 40 && position <= 5) ||
           (size > 30 && size < 40 && position <= 4) ||
           (size > 21 && size < 30 && position <= 3) ||
           (size > 16 && size < 21 && position <= 2) ||
           (size > 12 && size < 16 && position <= 1) ||
           (size > 6 && size < 12 && position == 0))
        {
          return false;
        }
      
        return PermutationCheck(size - 1, group);
      }
      

      In the original post, I had the pre tags, just not the HTML escapes. Thanks for the tip.

  5. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1349

  6. I let it run all night, and I got the same result as Joel, except that for me the seed is *positive* 781599298, not negative… I didn’t optimize as heavily as Joel though, I just removed the error checking.

    I wonder if there is a more efficient way than brute force to solve this problem. Knowing an approximate value of the system’s TickCount would help a lot of course, but TickCount doesn’t seem to always be accurate: I usually shut down my computer every night, and the TickCount reports a duration of more than 23 days…

    • Looking at the source code (thanks Joel, it is much faster than looking at ildasm) the pseudo-random algorithm isn’t all that good, at least if non-reversability is desirable, even tought there are a lot of operations to fill the SeedArray they all happen to same places for every seed, the InternalSample is weak as well, it is possible to brute force much cheaper than creating a Random instance for every possible seed by just producing and looking at needed values, also keeping a track of those operations may reveal how to reverse them, to some point, but I think than humans trying to find the reverse will take more time than computers trying to brute force…

  7. I wonder if Neil Stephenson’s Cryptonomicon was an inspiration for this challenge, along with the detailed description of the Solitair encryption in the afterword to the novel. In any case, this random permutation stuff, complete with high-quality randomness, simply begs to be used for encryption.

  8. One interested in this topic may find interesting to read about format-preserving encryption. I’ve ported one implementation to .NET, available at dotfpe.codeplex.com .

  9. Hi Eric, I’ve noted with much interest the behavior of Random as discussed here in your bog. Now what I have is more of a question than anything. If one wanted to minimize the bias, would using code such as this really suffice?

    FischerYatesShuffle&#40;array1, new Random(new Random().Next()));

Leave a comment