Producing combinations, part one

I’ve done many articles over the years on different ways to manipulate sets and sequences in C#: The Cartesian product is when you have a sequence of sequences, say

{ 
  { 1, 2 }, 
  { 10, 11, 12} 
}

and produce the sequence of all possible sequences that take one element from each of the original sequences. That would be:

{
  {1, 10}, 
  {1, 11}, 
  {1, 12}, 
  {2, 10}, 
  {2, 11}, 
  {2, 12} 
}

in this case. The permutations of a sequence, say, {20, 30, 40} are all the different ways to reorder that sequence:

{ 
  {20, 30, 40}, 
  {20, 40, 30},
  {30, 20, 40}, 
  {30, 40, 20}, 
  {40, 20, 30}, 
  {40, 30, 20} 
}

Today I want to talk about a problem I’ve never covered, which is how to generate all the combinations of a sequence. The combinations are all the ways to take some given number of elements from the sequence. Or, equivalently, all the subsequences of a given length. (A subsequence of a sequence has elements from the sequence in the same order, but some elements from the original sequence can be missing.) For example, if we have the sequence {50, 60, 70, 80, 90} and we wish to choose three, then the combinations are:

 {
    {50, 60, 70},
    {50, 60, 80},
    {50, 60, 90},
    {50, 70, 80},
    {50, 70, 90},
    {50, 80, 90},
    {60, 70, 80},
    {60, 70, 90},
    {60, 80, 90},
    {70, 80, 90}
}

How many such subsequences are there? If there are n items in the original sequence and we are choosing k of those items then I will give you without proof or argument that the number of possible combinations is n!/k!(n-k)!. Let’s just double check that: we had five elements, we chose three of them, 5!/3!(5-3)! is in fact 10. This formula produces the binomial coefficient, and it has many interesting properties beyond those in combinatorics.

When writing computer programs it is wise to state the edge cases. Given the same sequence, what if we wish to choose none of them? There does exist a subsequence which has zero elements, so we should produce it; the answer would be { { } }. What if we have a sequence of five items and we wish to choose six of them? There is no way to do that; there is no six-element subsequence. So the answer should be { }, the empty sequence of sequences.

The first clever bit is to notice that essentially what we are doing in each subsequence is deciding whether or not to include each element from the sequence. Let’s lay out those subsequences again and indicate whether the corresponding member was included or not, true or false:

 
{                     // 50, 60, 70, 80, 90
    {50, 60, 70},     // T   T   T   F   F
    {50, 60, 80},     // T   T   F   T   F
    {50, 60, 90},     // T   T   F   F   T
    {50, 70, 80},     // T   F   T   T   F
    {50, 70, 90},     // T   F   T   F   T
    {50, 80, 90},     // T   F   F   T   T
    {60, 70, 80},     // F   T   T   T   F
    {60, 70, 90},     // F   T   T   F   T
    {60, 80, 90},     // F   T   F   T   T
    {70, 80, 90}      // F   F   T   T   T
}

Aha! If we can enumerate all the sequences of n bits that have exactly k true then we have solved our problem.

Next time: We’ll do just that.

19 thoughts on “Producing combinations, part one

  1. Pingback: The Morning Brew - Chris Alcock » The Morning Brew #1715

  2. It’s a fun problem to approach with just a whiteboard. You write out all the sub-sequences of {A,B,C}, and write out the truth table for whether A, B or C was included. Then you think, hang on a minute, that’s just a binary number incrementing.

    Though I’ve not tried solving it for specific lengths so now I’m curious to see if there’s a fancier way than just running through the enumerating and discarding results that are too short/long.

    • Suppose you are trying to choose 16 items out of 32. There are four billion combinations and you’re going to reject over three billion of them. You can certainly do better than that, as we’ll see.

      • An interesting approach to enumerating combinations, especially if the set of items has 63 or fewer items, is to write a function which, given a value with some number of bits set, will return the next larger value with the same number of bits set. This can be done quickly if a system has an efficient way to divide two numbers which are known to be powers of two (unfortunately I know of no way to express such a concept in C# or CIL).

        Start by determining the LSB of the number (which can be done with “lsb = n & (n-1);”. Then compute “temp = n+lsb; newSet = temp & ~n;” which will indicate the highest bit that needs to become set in the next value. Finally “adj = newSet/lsb/2-1;” to determine if more bits got cleared than set, and “return temp+adj;” to get the final answer.

  3. Pingback: Dew Drop – October 14, 2014 (#1876) | Morning Dew

  4. My first thought was to try recursion: given a set s, n, and k, if n == k, return the { s }, if n < k, return { }, otherwise recurse twice: Pick element x from s, and let s' = s – { x }. Recurse with s', n-1, and k-1, and then add x back into those subsets; then recurse with s', n-1, and k. Union the two results and you have your answer.

    Of course, there's a bit of overlap here. You'll be calling with small n/k values more than once. Maybe we could want to memoize the results somewhere, as you've done in some examples in the past.

    • This is basically the approach we’re going to take, though I’ll use a slightly different base case.

      Yes, you do end up making the subsequences towards the end over and over again. It’s hard to know what is harder on the GC — many small allocations of structures you’ve seen before that go away quickly vs an ever-growing cache — without profiling. My assumption is that each enumerated combination will be used and discarded, and the GC will clean it up quickly.

  5. I won’t pretend having a proper explanation but I think I found something that seems nice (to me at least) and the idea comes from seeing someone playing accordion xD

    (* weird explanation below *)
    start with the true at one side (the left for me), and see them as a trail of wagon
    advance the first one you can (the first true followed by a false) and if none are found then you’re done
    all “wagon” before (the true before the one moved) are reset at start
    rinse and repeat

    I know it’s not a good way of describing this (and not having english as my first language doesn’t help) but it’s more than nothing 😀

    I made a fiddle of all this ; it’s not commented but I hope it’s clear enough
    you can see it here (the site requires Silverlight) https://dotnetfiddle.net/mjt7QY

    • Indeed, I considered describing this algorithm. I see it the same way you do: one of the bits sets off heading to the left, goes as far as possible, and then runs back home and grabs a second bit to go with it on the journey again — and then recurse!

  6. Pingback: Producing combinations, part two | Fabulous adventures in coding

  7. Pingback: Producing combinations, part three | Fabulous adventures in coding

  8. Iterating an Enumerable of Enumerable of BitVector32 generated from enumerable.Range? What about expanding from binary to another radix? Binary gives powerset. Radix n gives partitions of size n.

    • As I mentioned before, we can do better than generating the power set and filtering it. I’m not quite following your suggestion of using another radix; can you give an example?

      • Thinking about using Enumerable.Range as source for generating numbers in another radix and use those to tag elements of a set like you tag numbers with 0 and 1 for the powerset. Maybe use a-z to tag elements in a set instead of numbers in radix26.

  9. Pingback: Graph traversal, part three | Fabulous adventures in coding

Leave a comment