Last time we showed how you can take any two coprime positive integers
m and compute a third positive integer
y with the property that
(x * y) % m == 1, and therefore that
(x * z * y) % m == z % m for any positive integer
z. That is, there always exists a “multiplicative inverse”, that “undoes” the results of multiplying by
m. That’s pretty neat, but of what use is it?
Well, here’s a question I occasionally get:
My service has an array of n objects, obviously indexed from 0 to n-1, each of which contains state that is relevant to some client. I don’t want to pass the object to the client directly; instead I want to give them an integer token that they can use to query the server about the state of the object. The obvious token is the index of the array, but I am worried that the client developer will take an unwarranted dependency on the fact that the token is a small integer, rather than treating it as a meaningless opaque handle. (Note that I am assuming that in this scenario the client is not actively hostile towards the server, but rather potentially buggy.) I want to obfuscate the token I pass back. I could generate a sequence of unique random numbers and maintain a two-way dictionary mapping indices to tokens and vice versa, but that sounds like (1) work, and (2) possible bugs, and (3) extra time and memory.
I issue a few hundred invoices a year, each with an invoice number. For various reasons I would rather not send out sequential invoice numbers. (For instance: it is too easy to make confusing transposition or off-by-one errors with sequential numbers.) But yet I want to be able to easily sort my invoices by the actual order in which they were produced.
Or any number of other equivalent statements of the problem.
This problem is easily solved with multiplicative inverses. Start by picking a large number
m, much larger than the number of tokens you’re ever going to need to generate. Let’s say 1000000000. Now pick any number
x coprime to that; lets say 387420489. We know at a glance that these numbers are coprime because 1000000000 has only factors that are divisible by 2 or 5, but a number ending in a 9 cannot possibly be divisible by 2 or 5; numbers divisible by 2 or 5 end in 0, 2, 4, 5, 6 or 8. Now compute the multiplicative inverse of
x using our large integer library, which turns out to be 513180409.
Now we’re all set. We encode the numbers 0 through n-1 via:
Integer encoded = invoiceNumber * i387420489 % i1000000000;
This gives us a unique random-seeming number between 0 and 999999999. We know the number will be different for every input.
We decode it with the inverse:
Integer decoded = encoded * i513180409 % i1000000000;
Pretty neat, and very cheap. Not cryptographically strong, but any two sequential tokens look very different from each other.
A practical use of multiplicative inverses: lightweight handle obfuscation. I like it.
This is great! I had no idea!
Is it true or likely to be true that “any two sequential tokens look very different from each other”? (Not that it matters much for practical purposes, I am just curious.)
Of course, thinking about it for a few more seconds, I realized that this question doesn’t make sense without a definition of “look[ing] very different.” Perhaps someone knows of a definition about which something can conveniently be proven.
With the example numbers, token k + 1 is token k + 387420489, truncating after the ninth most significant place. Without proving it (I woke up too early this morning), I’m pretty confident this means at least half the corresponding digits must differ between the two tokens.
I guess Hamming distance or Damerau–Levenshtein distance would work.
For the second example this is probably still not a very good solution.
What you want is a number where a single transposition or wrong digit doesn’t lead to another valid number – the added demand of being able to sort invoices without additional information shouldn’t be a problem either.
Still I’m very happy that for once someone uses something other than cryptography to demonstrate modulo math – as always very creative Eric! And the first is actually a really nice example, I may steal that some day 😉
To turn this into an error-detecting code (or even an error-correcting one!) is very easy to do. The simplest way is to add an additionnal checksum digit at the end of the result to detect errors.
Similar but a little more complicated methods can also preserve you from digit swapping or even give you the most probable correct number when an erroneous number is entered.
Turning it into an error-correcting code is probably trivial. If the maximum sequence number is going to be 999,999, then even though every nine-digit number will map to a different nine-digit number, 99.9% of them will map to invalid sequence numbers. I would expect that a good choice of multiplication factor would yield a code where most transpositions of valid numbers would get mapped to invalid numbers. That having been said, I don’t particularly like the idea of a purely-linear obfuscation code. Adding a non-linear stage–even something as simple as “if a number X is no larger than K, replace it with X-K, else leave it as-is”, may improve greatly the ability of the algorithm to obscure the underlying sequence, especially if one does a few “rounds”. If one runs the multiplication by itself, any two consecutive values will always have the same difference, but adding even a single “swap range” after it will mean that the difference between consecutive values will depend upon whether one or both was in the swapped range. If one does “multiply swap multiply swap”, there will be even more variety in the differences between consecutive values, but converting any scrambled value back to the original sequence number will still be pretty easy.
As Eric pointed out himself, this algorithm is not designed for security against an hostile user. A linear algorithm is plenty enough to prevent errors from a buggy user, and it is lightning fast too.
As a matter of fact, you should simply rely on obscurity for your security. You should rely on really complex mathematical problems (like big number factorization) for that never a “what if I add a swap here and another multiplication there” approach.
I wasn’t so much interested in the security as the mathematical distinction between sets of functions which are closed under composition and those which aren’t. If all operators use the same modulus, any combination of multiplication by a constant and addition of a constant can be replaced by a single multiplication and single addition. By contrast, if one adds a non-linearity, even a “simple” one, then the set of composite functions becomes much larger.
Yes I’m well aware of Verhoeff and similar algorithms, just pointing out that you actually have to use one of those to get the wanted properties for the sequence.
123 * 387420489 % 1000000000 = 652720147
652720147 * 513180409 % 1000000000 = 128
What am I missing here?
You can tell by looking that the last digit has to be a 3, not an 8. 652720147 * 513180409 = 334963192000000123.
You multiplied two odd numbers and got an even number out? There’s something wrong with your multiplication algorithm then. Are you sure you’re using a big integer library?
You don’t need a special library. Doing it in longs yields the right result for me (ie “652720147L * 513180409L % 1000000000L”). Doing it with unchecked ints doesn’t come up with anywhere near that wrong answer so I’m putting it down to a mistake in manual long multiplication.
P.S. Eric: I loved footnote 2. I hadn’t even really thought of that method of finding two large coprime numbers simply. 🙂
Another use which I didn’t even realize was relying on multiplicative inverse properties until reading your articles. You can encode multi-dimensional array indexers as a single integer. I’ve never went for more than 3 dimensions since decoding is a step by step process, but the math is recursive for each dimension after the first.
Do tell. Sounds interesting.
Not sure what he’s getting at either.
The best I can come up with is that if you limit your dimensions to something smaller than int-size you can store several dimensions in one int. You usually do that by just shifting and bitmasking, but the general solution can be done using multiplication and modulo operations.. but apart from the very superficial similarities I don’t see anything either.
Slightly off-topic, but many countries require that invoice numbers *are* sequential. For example:
“A VAT invoice must show:
* an invoice number which is unique and follows on from the number of the previous invoice …”
I was going to bring up the recent scandal in Europe where, when someone got their invoice back, they decided to try simple URL hacking (since the invoice number was in the URL) in order to see other invoices. They were able to find the invoices from several major companies. It is a baffling requirement, and that simple little hack must surely necessitate a change to that law.
Or the website could use some real method of authentication. I know which is the more straightforward fix.
Come on though. This is a government we’re talking about. It’s not like they are capable of anything above a middle-school level of programming anyway.
Agreed, I mean if you look at what the NSA is doing, that really shows how incompetent all those government agencies are when technology is involved.
That may be true, but the question is whether they are better at programming or legislation. 🙂
But even if the numbers were not consecutive, all that means is that you need to write a script to get URLs. Not exactly hard work.
The invoice numbers are required to be consecutive as part of preventing, and identifying, embezzlement. With bar-code scanners available in everyone’s phone now-days, the case could be made that the “invoice number” for GAP purposes need not be the “number printed on the invoice as identification”, provided they were proven to be in 1-1 correspondence and the conversion algorithm was as simple as Eric’s proposal.
Modular multiplicative inverses are pretty common in cryptography.
1. in RSA, to compute the private exponent `d` when you know the public exponent `e` and the prime factors, you solve `e*d = 1 mod phi(n)`, or `d=ModInv(e, phi(n))`
2. With elliptic curves one often works with fractions x/z internally, which need to be converted to an integer in the end. This is done by computing x*ModInv(z, p).
Besides Extended Euclidean, there is an alternative approach to computing the modular inverse if the prime factors of the modulus are known:
ModInv(x) = x^(phi(n)-1) mod n
If n is a prime p, this can be simplified to:
ModInv(x) = x^(p-2) mod n
This approach is usually a bit slower than extended euclidean. It’s main advantage is that its runtime does not depend on the input x. This is important in cryptography, since variable runtime can enable timing side-channel that recover x.
I love the ideal of using it for tracking invoicing as it would save me a few problems too (one-off services before creating accounts)… Well worth considering when I next play with my PDF generator 🙂
I read this post about 2 years ago, and I have actually come across a situation where I will use this. Thank you for your ready-made solution!
Awesome post. Exactly what I need to obfuscate the sequential ids generated in a Rails app without having to store additional info.
If you set m to to match the size of the datatype (2^32 or 2^64):
A) You know you’ll never overflow unless you also overflow the underlying datatype
B) Unchecked math is documented as dropping the overflowed bits. This is effectively a modulo operation.
C) Picking x is easy (just pick an odd number)
Pingback: Creating a random code and saving if it does not exist – yyyb
Pingback: ruby-on-rails - La génération unique, difficile à deviner “coupon codes
Pingback: ruby-on-rails - Generar los únicos, difíciles de adivinar "cupón" códigos de
Pingback: algorithm - La generazione unica, difficile da indovinare “coupon” codici
Pingback: Obfuscating an ID – Row Coding