High-Dimensional Spaces Are Counterintuitive, Part Three

My next book project is ready for copyediting, the wedding invitations are sent out, my bug count is under control, I’ve got a layer of levelling compound poured in the basement, there’s no more ivy on my roof, and I’m rigging my sailboat sometime this week (about six weeks later than I would have liked to, but let’s not be picky.) Life is getting back to normal in my little corner of Seattle. So what better time to continue with my little exploration of high-dimensional spaces?

Last time we noticed a few interesting facts:

  • most of the volume of a high-dimensional object is very near its surface.
  • small movements away from a point in many dimensions add up to a long total distance.

Suppose you have a bunch of points in space scattered randomly around some average. You would expect that if the points are more or less evenly distributed then the number of points in a given volume of the space would be proportional to the volume. (In fact, that’s probably how we’d define “more or less evenly distributed.”)

Let’s think of a low-dimensional example – points on a line. For example, suppose you’ve got a thousand 30-year-old women and you measure their heights. If the average is 150 cm, then you’d expect that the number of women that fall into the 2-cm range from 145-147 is roughly the same as from 150-152. There will probably be more in the range closer to the mean, but you’d expect that the order of magnitude would be roughly the same. It’s not like there’s going to be 100 in one range and 10 in the other. I’m assuming here that we’re considering regions close to the mean so that the distribution is pretty even throughout.

Now suppose you take a 2-cm range and a 1-cm range, both close to the mean. You would similarly expect that the 1-cm range would have roughly half as many points in it as the 2-cm range, right? The number of points in a region is roughly proportional to the size of the region.

What if the distribution is 2-dimensional? Suppose now you plot both the heights and hair lengths of these 1000 women. Again, there will be some average around which the points cluster, say (150 cm, 30 cm). Now imagine that you draw two circles of the same area close to the mean. Again, you’d expect that two circles of the same area would have roughly the same number of points inside them. The circle with its center closer to the mean probably has a few more, but you’d expect roughly the same number of points in each circular region.

Now what if you make one of those circles half the radius of the other? Then it will have only one quarter of the area, and therefore only on average one quarter as many points.
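You can watch this area-proportionality happen with a quick Monte Carlo sketch in Python; the particular means, spreads and circle positions here are invented just for the demonstration:

```python
import random

random.seed(1)

# Scatter points from a 2-D Gaussian cluster around a made-up mean,
# mimicking the height/hair-length example.
mean = (150.0, 30.0)
points = [(random.gauss(mean[0], 10.0), random.gauss(mean[1], 10.0))
          for _ in range(200_000)]

def count_in_circle(center, radius):
    # Count the sample points that fall inside the given circle.
    return sum(1 for (x, y) in points
               if (x - center[0]) ** 2 + (y - center[1]) ** 2 <= radius ** 2)

# Two concentric circles close to the mean, one with half the radius.
big = count_in_circle((151.0, 31.0), 2.0)
small = count_in_circle((151.0, 31.0), 1.0)
print(big / small)  # close to 4: half the radius, a quarter of the points
```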

Now suppose that you have a bunch of points scattered around a 100-dimensional space, all clustered around a particular average point. Pick a hyperspherical region centered on the average, with radius large enough to contain most of the points.

We expect to find more points in regions of the space with more volume. We know that 99% of the volume of this n-sphere is concentrated in a thin shell near its surface. The conclusion is inescapable: points clustered around a mean in high dimensions are far more likely to be found in the high-volume narrow shell far from the mean, not in the low-volume region near the mean itself!

This again seems really counterintuitive at first, but it's not if you think about it for a while. In our example above, most of the women are of near-average height. And most of the women are of near-average hair length. But clearly the number of women that are near-average height AND near-average hair length is considerably smaller than either taken separately.

As we add more and more dimensions to the mix, the number of points where all measurements are near the mean point gets very small indeed. Of the six billion people in the world, how many of them are within 1% of average height and average hair length and average age and average income and average cholesterol level and average IQ? Probably a very, very small number. Most everyone is slightly abnormal somehow! And in a high-dimensional space, many small deviations from the mean add up to a large distance from the mean. If you plotted 100 characteristics of 6 billion people in a 100-dimensional space, the number of people very near the middle of every axis is quite small; there's hardly any volume there.
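A quick simulation makes the concentration vivid. This Python sketch assumes (purely for illustration) that each of 100 traits deviates independently from its mean with a standard normal distribution:

```python
import math
import random

random.seed(1)
n = 100  # number of independent traits per "person"

def distance_from_mean():
    # Euclidean distance of one random point from the mean of the cluster.
    return math.sqrt(sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)))

distances = [distance_from_mean() for _ in range(2_000)]

# With 100 independent traits, distances pile up near sqrt(100) = 10;
# not one of the 2000 sampled points lands anywhere near the mean itself.
print(min(distances))
print(sum(distances) / len(distances))
```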

What the heck does this have to do with computer science? Suppose you have all that personal data in a database, you have a dead guy in the morgue with no ID, and you want to find a match in your database, if there is one. What if you have 100 variables describing a song and you want to identify whether a given scrap of music is in your database? There are lots of real-world search problems that involve sparse, large data sets spread out over a high-dimensional space.

Next time we’ll dig into that a little bit more.

High-Dimensional Spaces Are Counterintuitive, Part Two

The volume of an n-cube of edge length s is easy to work out. A 2-cube has s² units of area. A 3-cube has s³ units of volume. A 4-cube has s⁴ units of 4-volume, and so on — an n-cube has sⁿ units of n-volume. If the n-cube has an edge s > 1, say s = 2, then clearly the n-volume dramatically increases as the dimensionality increases — each dimension adds a lot more “room” to the n-cube.

A 2-sphere (circle) is pretty close in area to the smallest 2-cube (square) that encloses it — sure, you lose some area at the four corners, but not a whole lot. Though the circle is far from the square at the four corners, it is very close to the square at the four sides. A circle has about 78% of the area of its enclosing square. But a 3-sphere inside the smallest 3-cube that encloses it is far from eight corners and close to only six sides. A 3-sphere is about half the volume of the 3-cube. As you go up in dimensions, you get more and more corners that are far from the n-sphere — there are 2ⁿ corners but only 2n sides, so the comparative volume of the sphere goes down.

In fact, you don’t even need to compare n-spheres to n-cubes — after you reach 5 dimensions, the n-volume of an n-sphere starts going down, not up, as dimensionality increases. With some pretty easy calculus you can show that the n-volume of an n-sphere of radius r is:

V₁ = 2r
V₂ = πr²
Vₙ = Vₙ₋₂ · 2πr²/n

For any fixed radius this rapidly approaches zero as n gets big. Pick a big n, say 100. The volume of a 100-sphere is going to be (πr²/50) × (πr²/49) × (πr²/48) × … × (πr²/1). Suppose r is 1 — then all of those terms except for the last few are going to be quite a bit smaller than 1. Every time you add more dimensions, the n-volume of the unit n-sphere gets smaller and smaller even as the n-volume of the smallest n-cube that encloses the unit n-sphere gets exponentially larger and larger!
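The recurrence is easy to turn into a little Python sketch, which shows the unit n-sphere's volume peaking at n = 5 and then collapsing toward zero even as the enclosing n-cube's volume explodes:

```python
import math

def sphere_volume(n, r=1.0):
    # n-volume of an n-sphere of radius r via the recurrence:
    # V(1) = 2r, V(2) = pi r^2, V(n) = V(n-2) * 2 pi r^2 / n
    if n == 1:
        return 2.0 * r
    if n == 2:
        return math.pi * r * r
    return sphere_volume(n - 2, r) * 2.0 * math.pi * r * r / n

# Compare the unit n-sphere with the smallest enclosing n-cube (edge 2).
for n in (2, 3, 5, 10, 100):
    print(n, sphere_volume(n), 2.0 ** n)
```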

Here’s another weird fact about the volume of hypersolids. Consider two squares, one inside the other. How big does the small square have to be in order to have, say, 1% the area of the larger square? That’s pretty easy. If the inner square has 10% the edge length of the outer square, then it has 1% of the area of the outer square.

What about nested 3-cubes? An inner 3-cube with edges 10% the length of the edge of the outer 3-cube would have only 0.1% of the volume — too small. Rather, it needs to have an edge about 21% of the edge of the outer 3-cube, because 0.21 × 0.21 × 0.21 is about 0.01.

What about a nested n-cube? In order to have 1% of the n-volume of the outer n-cube, the inner n-cube needs an edge (0.01)^(1/n) times the edge of the outer n-cube. For n = 100, that's about 0.955. Think about that for a moment. You've got two 100-cubes; one has edges 2 units long, the other has edges 1.91 units long. The larger n-cube contains ONE HUNDRED TIMES more volume.
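That edge ratio is a one-liner to check; a quick Python sketch restating the arithmetic:

```python
# Edge ratio an inner n-cube needs in order to hold 1% of the
# outer n-cube's n-volume: (0.01)^(1/n).
for n in (2, 3, 10, 100):
    ratio = 0.01 ** (1.0 / n)
    print(n, round(ratio, 3))
# For n = 100 the ratio is about 0.955: edges 95.5% as long,
# yet only 1% of the volume.
```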

Try to visualize the smaller n-cube being entirely inside the larger n-cube, the two n-cubes having the same central point. Now wrap your mind around the fact that the smaller n-cube is 1% the volume of the larger. The conclusion is unavoidable: in high dimensions the vast majority of the volume of a solid is concentrated in a thin shell near its surface! Remember, there are 2¹⁰⁰ corners in a 100-cube, and that makes for a lot of space to put stuff.

It’s counterintuitive because the very idea of “near” is counterintuitive in higher dimensions. Every time you add another dimension, there’s more room for points to be farther apart.

The distance between opposite corners of a square of edge 2 is 2√2. The distance between opposite corners of a 3-cube is 2√3, quite a bit bigger. The distance between opposite corners of a 100-cube is 2√100 = 20 units! There are a whole lot of dimensions to move through, and that adds distance.

We could make the same argument for an n-sphere and show that the vast majority of its (comparatively tiny) volume is also in a thin shell near the surface; I’m sure you can see how the argument would go, so I won’t bother repeating myself.

Because distance is so much more “expensive” in higher dimensions, this helps explain why n-spheres have so much less volume than n-cubes. Consider a 100-cube of edge 2 centered on the origin enclosing a 100-sphere of diameter 2, also centered on the origin. The point (1,0,0,0,0,0…,0) is on both the 100-cube and the sphere, and is 1 unit from the origin. The point (1,1,1,…,1) is on the 100-cube and is ten units away from the origin. But a 100-sphere by definition is the set of points equidistant from the origin, and distance is expensive in high dimensions. The nearest point on the 100-sphere to that corner is (0.1, 0.1, 0.1, …, 0.1), 9 units away from the corner of the 100-cube. Now it's clear just how tiny the 100-sphere is compared to the 100-cube.

OK, so far we've been considering n-cubes that entirely enclose n-spheres, i.e., an n-cube of edge length 2 that encloses a unit n-sphere, kissing the sphere at 2n points. But we know that this n-cube has ginormously more volume than the n-sphere it encloses, and that most of that volume is near the edges and corners. What if we abandon the constraint that the n-cube contain 100% of the n-sphere's volume? After all, there are only 200 points where the 100-sphere kisses the 100-cube, and that's not very many at all.

Suppose we want a 100-cube that contains 99.9% of the volume of the unit 100-sphere. We can cover virtually all of the volume of the 100-sphere with a 100-cube of edge 0.7 instead of 2. Sure, we're missing (1,0,0,0,…,0), but we're still hitting (0.1,0.1,0.1,…) with huge amounts of room to spare. Most of the volume inside the 100-sphere isn't near the 200 points with coordinates near the axes.

How much do we reduce the volume of the 100-cube by shrinking it from 2 on the edge to 0.7? We go from 2¹⁰⁰ units of n-volume to 0.7¹⁰⁰, a factor of around 4×10⁴⁵ times smaller volume! And yet we still enclose virtually all the volume of the 100-sphere. The corner of the smaller 100-cube at (0.35, 0.35, 0.35, …) is now only 2.5 units away from (0.1, 0.1, …) instead of 9 units away. This is a much better approximation of the unit 100-sphere. It's still hugely enormous compared to the unit 100-sphere in terms of sheer volume, but look at how much volume we save by approximating the 100-sphere as a small 100-cube!
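Here's a quick Python sketch of that volume savings, just restating the arithmetic above:

```python
import math

n = 100

# Volume ratio when we shrink the 100-cube from edge 2 down to edge 0.7.
factor = 2.0 ** n / 0.7 ** n  # = (2/0.7)^100, about 4e45
print(f"{factor:.1e}")

# Distance from the small cube's corner (0.35, ..., 0.35) to the
# sphere point (0.1, ..., 0.1): sqrt(100 * 0.25^2) = 2.5 units.
corner_gap = math.sqrt(n * (0.35 - 0.1) ** 2)
print(corner_gap)
```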

Feeling dizzy yet? Next time we’ll see that these facts

  • n-spheres are tiny compared to n-cubes
  • hypersolids have most of their volume close to their surfaces
  • you can enclose almost all the volume of an n-sphere with a small n-cube

have major repercussions when we throw probability theory into the mix.

High-Dimensional Spaces Are Counterintuitive, Part One

A friend of mine over in Microsoft Research pointed out to me the other day that high-dimensional spaces are really counterintuitive. He’d just attended a lecture by the research guys who wrote this excellent paper and we were geeking out at a party about it. I found this paper quite eye-opening and I thought I might talk a bit about some of the stuff that’s in here at a slightly less highfalutin level — the paper assumes a pretty deep understanding of high-dimensional spaces from the get-go.

**************************

It’s hard to have a geometrical picture in your head of a more than three-dimensional space. I usually have to use one of two analogies. The first analogy I like goes like this: think of a line — one dimensional. Imagine that you have a slider control that determines your position on that line, from, say, -1 to 1, left-to-right. That’s pretty visually clear.

Add another slider that determines your up-and-down position, and you’ve got a square. Each point on the square has a unique set of slider positions.

Add another slider that determines your out-and-in position, and you’ve got a cube. Again, these are easy to picture. Every point in the cube has a unique combination of slider positions that gets you to that point.

Now think of a cube with a slider control below it that lets you slide from intense red on one end through dark red and to black on the other. Now you've got four axes you can move around — height, width, depth and redness. The top right front corner of the bright red cube is a certain “colour distance” from the top right front corner of the black cube. That this is not a spatial dimension isn't particularly relevant; we're just picturing a dimension as redness for convenience.

Every time you want to add another dimension, add another slider — just make sure that whatever is sliding is completely independent of every other dimension. Once you’ve added green and blue sliders then you’ve got a six-dimensional hypercube. The “distance” between any two 6-d points is a function of how much you have to move how many sliders to get from one to the other.

That analogy gets across one of the key ideas of multi-dimensional spaces: that each dimension is simply another independent degree of freedom through which you can move. But this is a quite mathematical and not very geometric way of thinking about dimensionality, and I want to think about the geometry of these objects. Let’s abandon this analogy.

The second analogy is a little bit more geometrical. Think of a line, say two units long. Now associate every point on that line with another line, also two units long “crossing” it at the new line’s center. Clearly that’s a filled-in square — after all, every point along one side of a square has a straight line coming out from it perpendicularly. In our slider analogy, one slider determines the point along the “main line”, and the second determines how far to go along its associated line.

Think of another line, but this time, associate every point on it with a square. That’s a solid cube.

Now think of yet another line. Associate every point on it with a cube, and you’ve got a 4-cube. At this point it gets hard to visualize, but just as a cube is an infinite number of equally-sized squares stuck together along a line, so is a 4-cube an infinite number of 3-cubes stuck together along a line. Similarly, a 5-cube is a line of 4-cubes, and so on.

Where things get weird is when you start to think about hyperspheres instead of hypercubes. Hyperspheres have some surprising properties that do not match our intuition, given that we only have experience with two and three dimensional spheres. (2-spheres are of course normally called “circles”.)

The definition of a hypersphere is pretty simple — like a 2-sphere or 3-sphere, a hypersphere is the collection of points that are all the same distance from a given center point. (But distance works strangely in higher dimensions, as we’ll see in future episodes!)

It's hard to picture a hypercube; it's even harder to picture a hypersphere. The equivalent analogy for n-spheres requires us to think about size. Again, imagine a line two units long. Associate with each point on the line another line crossing at the middle. But this time, the associated lines are of different lengths. The lines associated with the end points are tiny, and the lines associated with the middle are longer. This describes a circular disk — for each point along the diameter of a circle you can draw a perpendicular line through the point extending to the boundaries of the disk on each side.

Now do the same thing again. Take a line, and associate each point on the line with a circle. If the circles are all the same size, you have a cylinder. But if they vary from small at the ends to big in the middle, you’ve got a sphere. Successive cross-sections of a sphere are all circles, but they start small and get big and then get small again.

Now do the same thing again. Take a line and associate each point on the line with a sphere, small at the ends and big in the middle, and you’ve got a 4-sphere. Successive “cross sections” of a 4-sphere are 3-spheres of varying size. Keep going to 5-, 6-, etc, spheres.

A circle of diameter 2 fits into a square of edge length 2, and a sphere of diameter 2 fits into a cube of edge length 2. Clearly an n-sphere of diameter two fits exactly into an n-cube of edge length two — the n-sphere “kisses” the center of each face of the n-cube. You can’t make the n-cube smaller without the n-sphere poking out of it somewhere.

But things start getting weird when you consider the volume of an n-sphere. Tomorrow we’ll compare the volume of an n-sphere to the volume of an n-cube, and discover some surprising and counterintuitive things about where that volume is.

You Want Salt With That? Part Four: Challenge-Response

My friend Kristen asked me over the weekend when I was going to stop blogging about crypto math and say something funny again. Everyone’s a critic!

Patience, my dear. Today, the final entry in my series on salt. Tomorrow, who knows?


So far we’ve got a system whereby the server does not actually know what your password is, it just has a salted hash of your password. But we’re still sending the password over the wire in clear text, which seems risky.

System #4

What if the hash goes over the wire? The client sends the user name to the server. The server sends the password salt to the client. The client appends the password to the password salt, hashes the result, and sends the hash to the server. The server compares the hash from the client to the hash in the user list. Now the password never goes over the wire at all. Awesome!

Unfortunately, this system is worse. In previous systems the eavesdropper got the password; in this system the eavesdropper gets the salted hash. The eavesdropper can then write their own client which sends that username and salted hash to the server.

And the “steal the password list” attack just came back; now an attacker who gets the password list gets all the salted hashes, and can use them to impersonate the user. Sure, the attacker will still find it hard to deduce the original password from the salted hash, but he doesn’t need to deduce the password anymore. (Unless of course the attacker is attempting to deduce your password on one system so that he can use it to attack another system that you use. This is why it’s a bad idea to use the same password for two different systems.)

Essentially we’ve turned the salted hash into a “password equivalent”. Can we fix this up?

System #5

How about this?

The client sends the username to the server. The server creates a second random salt which is NOT stored in the user list. This random salt is used only once — we either make it so big that the odds of generating it again are low, or keep a list of previously used random salts and pick a new one if we have a collision. We'll call the random salt “the challenge” for reasons which will become apparent in a minute.

The server sends the user’s password salt and the challenge to the client. The client appends the password salt to the password and hashes the salted password. It converts the salted hash to a string, appends the string to the challenge, and hashes the resulting string to form the “response” hash. The response is sent across the wire.

The server then does the same thing – converts the stored salted password hash to a string, appends it to the challenge, and hashes the resulting string. If the response from the client is equal to the value the server just computed, then the client must have computed the same salted hash, and therefore knows the password.
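Here's a sketch of System #5 in Python, with SHA-256 standing in for whatever strong hash the real system would use; all the names and values are invented for illustration, and a real protocol would involve much more care:

```python
import hashlib
import secrets

def salted_hash(salt: str, password: str) -> str:
    # Hash of (password salt + password); SHA-256 as a stand-in hash.
    return hashlib.sha256((salt + password).encode()).hexdigest()

def response_hash(salted: str, challenge: str) -> str:
    # Hash of (salted-hash-as-string + challenge): the "response".
    return hashlib.sha256((salted + challenge).encode()).hexdigest()

# Stored on the server at account-creation time.
password_salt = "Q3vd"
stored_hash = salted_hash(password_salt, "bigsecret")

# At login, the server sends the password salt plus a fresh one-time challenge.
challenge = secrets.token_hex(16)

# Client: recompute the salted hash from the typed password, combine it
# with the challenge, and send only the response over the wire.
response = response_hash(salted_hash(password_salt, "bigsecret"), challenge)

# Server: do the same with the stored salted hash and compare.
expected = response_hash(stored_hash, challenge)
print(secrets.compare_digest(response, expected))  # True: the client knows the password
```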

Now what does the eavesdropper know? The eavesdropper knows the username, the password salt, the challenge and the response. The eavesdropper has enough information to launch an offline dictionary attack against that user. But since the random challenge is never going to be used again, the fact that the attacker knows a valid challenge/response pair is essentially irrelevant.

This system has the downside that an attacker who gets the password file has obtained password equivalents, so no dictionary attack is necessary. (Unless of course the attacker is trying to determine a user’s password in order to try it against the user’s account on a different system!)

Fortunately, these weaknesses can be mitigated somewhat by changing your password frequently, not using the same passwords for different accounts, never using common dictionary words as passwords, and making passwords long — passphrases are better than passwords.


This general challenge-response pattern is quite common in authentication systems that rely upon shared secrets, because at no point is the original secret password ever actually stored or transmitted! With such a system, a machine that does not know your secret can verify that you do know it — almost seems magical, doesn't it? Of course, now that you know how it works, it's not quite so magical — the salted hash is essentially the shared secret, and it is stored on the server.

Clearly we could go on, adding more and more layers of encryption to this system to further mitigate these vulnerabilities, but I’m going to stop here. Readers interested in ways to solve problems such as mutual authentication (where the server authenticates the client AND the client verifies that it is talking to the right server) or other heavy-duty authentication tasks should read this charming dialogue on the design of the Kerberos authentication system: https://web.mit.edu/kerberos/www/dialogue.html

The foregoing is of course just a sketch with lots of details glossed over and greatly simplified for didactic purposes. Real professional-grade authentication systems do not work exactly like any of my sketches, though several are quite similar.

Unix boxes, for instance, used to typically combine a 12-bit salt with a 56-bit password in a DES-based hash. That's pretty weak! A 12-bit salt only makes construction of a dictionary take 4096 times longer — one dictionary for each possible salt. In modern systems the stronger MD5 hash is often used instead, which supports arbitrarily large salts. Unix boxes also used to keep the user list in the clear, so that anyone could read the salted hashes of the passwords; nowadays the hashed passwords are kept in a file that can only be read by the operating system itself — defense in depth!

NT, by contrast, uses an unsalted 128-bit MD4 hash to store the password equivalent, so it is susceptible to dictionary attacks should the password file be stolen. NTLMv2 is an authentication system based on some of the challenge-response ideas from System #5. It hashes the password, user name, domain, and current time in a fairly complex way; I'll spare you the details. And of course, many NT and unix boxes use Kerberos-based authentication these days.

You Want Salt With That? Part Three: Salt The Hash

Last time we were considering what happens if an attacker gets access to your server’s password file. If the passwords themselves are stored in the file, then the attacker’s work is done. If they’re hashed and then stored, and the hash algorithm is strong, then there’s not much to do other than to hash every string and look through the password file for that hash. If there’s a match, then you’ve discovered the user’s password.

You don’t have to look through the vast space of strings in alphabetical order of course. An attacker will start with a dictionary of likely password strings. We want to find some way to make that attacker work harder. Setting a policy which disallows common dictionary words as passwords would be a good idea. Another technique is to spice up the hashes a bit with some salt.

System #3

For every user name we generate a random unique string of some fixed length. That string is called the “salt”. We now store the username, the salt and the hash of the string formed by concatenating the user’s password to the salt. If user Alpha’s password is “bigsecret” and the salt is “Q3vd” then we’ll hash “Q3vdbigsecret”.

Since every user has their own unique random salt, two users who happen to have the same password get different salted hashes. And the dictionary attack is foiled; the attacker cannot compute the hashes of every word in a dictionary once and then check every hash in the table for matches anymore. Rather, the attacker is going to have to re-hash the entire dictionary anew for every salt. A determined attacker who has compromised the server will have to mount an entire new dictionary attack against every user’s salted hash, rather than being able to quickly scan the list for known hashes.

Salting essentially makes it less feasible to attack every user at once when the password file is compromised; the attacker must start a whole new attack for each user. Still, given enough time and weak passwords, an attacker can recover passwords.

In this system the client sends the username and password to the server, the server appends the password to the salt, hashes the result, and compares the result to the salted hash in the table.
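A Python sketch of System #3 makes the idea concrete; SHA-256 stands in for the hash, and the function names are invented for illustration:

```python
import hashlib
import secrets

def make_record(username: str, password: str):
    # Per-user random salt; it's fine for this to be public.
    salt = secrets.token_urlsafe(6)
    digest = hashlib.sha256((salt + password).encode()).hexdigest()
    return (username, salt, digest)  # what the server stores

def check_password(record, password: str) -> bool:
    # Re-salt and re-hash the supplied password, compare to the stored hash.
    _, salt, digest = record
    return hashlib.sha256((salt + password).encode()).hexdigest() == digest

# Alpha and Beta happen to pick the same password...
alpha = make_record("Alpha", "bigsecret")
beta = make_record("Beta", "bigsecret")

print(check_password(alpha, "bigsecret"))   # True
print(check_password(alpha, "wrongguess"))  # False
print(alpha[2] == beta[2])                  # False: different salts, different hashes
```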

This answers the original question posed by the JOS poster; the salt can be public because it is just a random string. Ideally, both the salt and the salted hash would be kept private so that an attacker would not be able to mount a dictionary attack against that salt. But there is no way to deduce any information whatsoever just from the salt.

And of course, it’s better to not get into this situation in the first place — don’t allow your password list to be stolen! But it’s a good idea for a security system to not rely on other security systems for its own security. We call this idea “defense in depth”. You want to make the attacker have to do many impossible things to compromise your security, so that if just one of those impossible things turns out to be possible after all, you’re not sunk.

But what about the fact that the password goes over the wire in the clear, where anyone can eavesdrop? That’s now the weak point of this system. Can we do something about that? Tune in next time and we’ll see what we can come up with.


MSFT archive of original post is here.

You Want Salt With That? Part Two: We Need A Hash

OK, we want to sketch out an authentication system which is sufficiently secure against common attacks even if all the details of the system are known to the attacker. Let’s start with a simple system, take a look at what its vulnerabilities are, and see if we can mitigate them:

System #1

The client transmits the username and password “in the clear” to the server. The server consults a list of usernames and passwords, and grants or denies access accordingly.

There are two big problems with such a system. First, it’s susceptible to eavesdropping if the client and the server are not the same machine. If someone can listen in on the message sent from the client to the server with a network packet sniffer then the eavesdropper learns a valid username and password. Second, if the server is ever compromised and the password list is read, the attacker learns everyone’s username and password.

Can we mitigate these problems? Let’s look at the second vulnerability first. Does the server need to store the password? How could it NOT store the password? What would it compare against?

We can eliminate the need for the server to store the password if we have a hash function. A hash function is a function which takes a string as its argument and produces a number, usually a hundred-or-so-bit integer, as its result.

A good hash algorithm has the property that slightly different inputs produce very different outputs. A one-bit change in the input should cause on average 50% of the output bits to change. Because of this, it should be extremely difficult to deduce an input that produces a given output, even given partial information about the input. (There are other hash algorithm properties that are important for cryptographic operations such as document signing, but that's a topic for another day.)

Proving that a given algorithm actually has this property can be quite tricky, but we have some industry-standard hash algorithms which have withstood rigorous testing and deep analysis by professional cryptographers and are widely believed to be “one way functions” — it’s easy to go from string to number, and very hard to go backwards.

System #2

The client sends the username and password to the server. The server has a list of usernames and the hashes of passwords, but not the passwords themselves. (When the user first created the password, the system hashed the password and then discarded it, saving only the hash.) The server hashes the client-supplied password and compares the hashes.

This is better; now the server is not storing the passwords, just the hashes. If an attacker compromises the server, they can’t easily go from the hash back to the password. (It also has the nice property that every entry in the password table is now the same size. Longer passwords do not have longer hashes.)

But there are two new problems with this system. First, any two users who have the same password have the same hash. If one of those users is evil and compromises the server, they immediately learn who has the same password as they do.

Second, this system is susceptible to dictionary attacks. An attacker can hash every word in a large dictionary, compromise the server, and then compare every password hash to every word in the dictionary. Since dictionary words are likely passwords, the attacker will probably be able to figure out at least a few passwords.
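Here's a toy demonstration of both problems in Python, with SHA-256 as the hash; every username and password below is invented:

```python
import hashlib

def h(password: str) -> str:
    # Unsalted hash, as in System #2.
    return hashlib.sha256(password.encode()).hexdigest()

# A stolen password file: username -> unsalted hash.  Note that carol and
# dave chose the same password, so their identical hashes are visible at a glance.
stolen = {
    "alice": h("sunshine"),
    "bob":   h("x7$!pqzR"),
    "carol": h("dragon"),
    "dave":  h("dragon"),
}

# The attacker hashes a dictionary of likely passwords ONCE...
dictionary = ["password", "sunshine", "dragon", "letmein"]
precomputed = {h(word): word for word in dictionary}

# ...and then recovers every matching password with cheap lookups.
for user, stored in stolen.items():
    if stored in precomputed:
        print(user, "->", precomputed[stored])
# alice, carol and dave fall to the dictionary; bob's unlikely password survives.
```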

And of course we still haven’t mitigated the fact that eavesdroppers could be listening in on the conversation between the client and the server.

Next time, we’ll add a little salt to the mix in an attempt to mitigate the dictionary attack and same-password vulnerabilities. Then we’ll see if we can use some of the hash technology to mitigate the eavesdropping attack.


MSFT archive of original post is here.

You Want Salt With That? Part One: Security vs Obscurity

A poster to one of the Joel On Software fora the other day asked what a “salt” was (in the cryptographic sense, not the chemical sense!) and why it’s OK to make salts public knowledge. I thought I might talk about that a bit over the next few entries.

But before I do, let me give you all my standard caution about rolling your own cryptographic algorithms and security systems: don’t.   It is very, very easy to create security systems which are almost but not quite secure. A security system which gives you a false sense of security is worse than no security system at all! This blog posting is for informational purposes only; don’t think that after you’ve read this series, you have enough information to build a secure authentication system!

OK, so suppose you’re managing a resource which belongs to someone — a directory full of files, say.  A typical way to ensure that the resource is available only to the authorized users is to implement some authentication and authorization scheme.  You first authenticate the entity attempting to access the resource — you figure out who they are — and then you check to see whether that entity is authorized to delete the file, or whatever.

A standard trick for authenticating a user is to create a shared secret. If only the authentication system and the individual know the secret then the authentication system can verify the identity of the user by asking for the secret.
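As a minimal sketch, the naive form of this scheme amounts to the following. The user names and secrets are made up, and of course storing secrets in plain text is exactly the weakness the rest of this series addresses:

```javascript
// Naive shared-secret authentication (illustrative only).
// The server stores each user's secret and compares it against
// whatever secret the client presents.
const secrets = new Map([
  ['alice', 'correct horse battery staple'],
  ['bob', 'hunter2'],
]);

function authenticate(user, presentedSecret) {
  // Authenticated only if the presented secret matches exactly.
  return secrets.get(user) === presentedSecret;
}

const aliceOk = authenticate('alice', 'correct horse battery staple'); // true
const aliceBad = authenticate('alice', 'wrong guess');                 // false
```

Everything that follows in this series is about doing better than this: not storing the secret itself, and not sending it over the wire in the clear.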

But before I go on, I want to talk a bit about the phrase “security through obscurity” in the context of shared secrets. We usually think of “security through obscurity” as badness. A statistician friend of mine once asked me why security systems that depend on passwords or private keys remaining secret are not examples of bad “security through obscurity”.

By “security through obscurity” we mean that the system remains secure only if the implementation details of how the security system itself works are not known to attackers. Systems are seldom obscure enough of themselves to provide any real security; given enough time and effort, the details of the system can be deduced. Lack of source code, clever obfuscators, software that detects when it is being debugged, all of these things make algorithms more obscure, but none of these things will withstand a determined attacker with lots of time and resources. A login algorithm with a “back door” compiled into it is an example of security through obscurity; eventually someone will debug through the code and notice the backdoor algorithm, at which point the system is compromised.

A strong authentication system should be resistant to attack even if all of its implementation details are widely known. The time and resources required to crack the system should be provably well in excess of the value of the resource being protected.

To put it another way, the weakest point in a security system which works by keeping secrets should be the guy keeping the secret, not the implementation details of the system. Good security systems should be so hard to crack that it is easier for an attacker to break into your house and install spy cameras that watch you type than to deduce your password by attacking the system, even if all the algorithms that check the password are widely known. Good security systems let you leverage a highly secured secret into a highly secured resource.

One might think that ideally we’d want it both ways: a strong security system with unknown implementation details. There are arguments on both sides; on the one hand, security plus obscurity seems like it ought to make it especially hard on the attackers. On the other hand, the more smart, non-hostile people who look at a security system, the more likely that flaws in the system can be found and corrected. It can be a tough call.

Now that we’ve got that out of the way, back to our scenario. We want to design a security system that authenticates users based on a shared secret. Over the next few entries we’ll look at five different ways to implement such a system, and what the pros and cons are of each.


MSFT archive of original post is here.

The attribute of manliness

This is a technical, not a political, current-events, linguistic or academic blog. (You know of course that as soon as I say that, it’s because I’m about to post something that is political, timely, linguistic and academic. Foreshadowing: your sign of a quality blog!) Despite all that, I was so struck by this passage I read last night that I felt I had to share it. We’ll get back to error handling in VBScript or some such topic later this week.

The writer is discussing semantics, specifically how word meanings and popular opinions change in political debates during wartime. The writer is… well, I’ll just let him say it, and talk about the writer afterwards.

Words had to change their ordinary meaning:

  • reckless audacity came to be considered the courage of a loyal ally ; prudent hesitation, specious cowardice
  • moderation was held to be a cloak for unmanliness, frantic violence became the attribute of manliness
  • ability to see all sides of a question, inaptness to act on any
  • cautious plotting, a justifiable means of self-defense
  • the advocate of extreme measures was always trustworthy; his opponent was a man to be suspected
  • the fair proposals of an adversary were met with jealous precautions by the stronger of the two, and not with a generous confidence
  • revenge also was held of more account than self-preservation

The cause of all these evils was the lust for power arising from greed and ambition; and from these passions proceeded violence.

Thus Thucydides of Athens, 2435 years ago. (Translation by Richard Crawley. I’ve changed the formatting and trimmed it a bit — Crawley gets a little wordy, but I love the balanced sentences.)

The first reaction I had upon reading this was “isn’t it astonishing how modern Thucydides sounds across the ages? If he’d only thought to coin the snappy term ‘doublespeak’, he’d have scooped Orwell by a couple millennia!”

And then I gave my head a shake, because of course I was reasoning backwards. This shouldn’t be astonishing in the least; I live in a culture where general opinions on government, politics, warfare, sports and art are more or less just as they were in Classical Greece. It would be more astonishing if Thucydides’ insights into human nature were not applicable today.


Commentary from 2020:

I do not recall precisely what triggered this post but it was some disingenuous statement by President Bush or another federal politician on the subject of the ongoing then, and ongoing now, pointless then and pointless now, American invasion of Afghanistan.

The comments on this article mostly came from other fans of ancient Greece, rather than engaging with the modern political situation that inspired it. I did mention in the original comments a funny conversation I once had with a member of the VB compiler team when I was an intern:


I recall having a meeting when I was an intern. One of the devs said:

“We can’t ask the users to understand this Byzantine documentation. It’s a Sisyphean task!”

A silence fell over the conference room. Finally, the intern piped up.

“Did you by any chance attend a private school in England when you were growing up?”

“Why yes, but what does that have to do with anything? And how did you know?”

Customer service is not rocket science

I was down at Fry’s Electronics yesterday — huge electronics warehouse. Geek paradise. Everything from DVD boxed sets to multimeters. Why I was there is unimportant; let’s just say that the connector conspiracy is after me in a big way. This was my second time in Fry’s, Leah’s first.

It took a while to figure out which Monster widget was the right one to fix my problem, and by then it was getting into the afternoon and we were getting kind of peckish. “There’s a sandwich shop right here in the store” said Leah. Perfect.

These guys think of everything — you want people to spend all day shopping in your store, you’ve got to feed them. Smart business move.

Or is it? Maybe not.

We go into the sandwich shop, stand in line for a while, order chicken salad for Leah, pastrami, hot, no mustard for me. Sit down at a table.

The line was quite slow, but, whatever. No problem yet.

We wait. And wait. And wait. And wait some more. The place has other customers, but not so many that it should take twenty minutes to put together a couple of sandwiches. I ask the guy who took our order what was up with the sandwiches.

“Yeah, they’re coming”.

We now have a problem, obviously, but this problem is merely Vexing. I can live with Vexing.

A couple minutes later, there’s still no sign of Leah’s sandwich, but mine arrives — covered in mustard.

Now we have a more serious problem. Three problems actually: where’s Leah’s sandwich, why is mine covered in mustard, and why is this all taking so long? We have moved from the Vexing category to Boneheaded.

We are now seriously low on blood sugar and getting cranky.

I once more go up to the guy and point out that my sandwich has mustard on it.

I am not making this up: the very first thing out of his mouth is

“That’s not my fault. You saw me write down ‘no mustard’.”

OK, now we have a BIG problem. We have rapidly left Boneheaded far behind and are firmly ensconced in the Fatal problem category. We now have a meta-problem. This guy wants to argue with me about whose fault it is, rather than making me a new sandwich. I am not particularly interested in having that conversation.

“Look, you know what, I could have driven home and made a sandwich in the amount of time we’ve been waiting. Leah’s still hasn’t shown up. Void out the transaction and we’ll just eat at home.”

“I need my manager to void a transaction.”

“You do what you have to do, Zach.”

At this point we start watching the clock with growing interest.

A solid five minutes later, a young woman ambles in, who is apparently the manager. She completely ignores me — she does not speak a single word to me throughout this entire encounter, though she does attempt a feeble, unconvincing justification to Leah on the subject of why it is that a chicken salad takes so long.

She chews out Zach for writing down Leah’s order with no table number.

She then chews out the sandwich makers — who, I gather from her conversation with them, found an order for a chicken salad, made it, discovered that there was no table number written on the order, and therefore stuck it behind the counter and ignored it.

Obviously this belies her earlier ridiculous explanation that chicken salad takes a long time to prepare, but she chooses to ignore this little contradiction.

Now, I used to make sandwiches for a living, and let me tell you, even if you have only a few simple sandwich making skills, it’s not that hard to figure out that someone probably wants to eat that sandwich, and that if you don’t know to whom it belongs, it behooves you to find out. I mean, what did they think would be the outcome of hiding it? (If you guessed “they’d give up, go home and blog about it” — you’re right!)

When she’s done chewing out her staff, she admits that actually, she has no idea how to work the cash register and therefore cannot void out the transaction. She needs her manager.

So far, I have encountered zero competent employees, and a considerable number of incompetent employees. We sit back to watch the clock again.

You know those old Star Trek TNG episodes where Picard goes “hostile aliens are loose in the Engineering Room! Riker, Worf, take care of it!” and then instead of, oh, I don’t know, beaming themselves instantaneously into engineering, they kind of walk — briskly — the quarter mile from the bridge to the engine room? My initial conjecture was that things were taking so long because everyone was really, really busy serving other customers. Based on the speed that they actually move, I’m now starting to think that they’re just plain slow.

Anyway, ANOTHER solid five minutes later, the manager’s manager ambles in. This guy attempts to save the day. After he’s brought up to speed by the cashier and the manager, the first thing out of his mouth is “I’m sorry this happened.”

“You know, you’re the first person to say that in the last fifteen minutes.”

“Oh. Well. I’m sorry about that too.”

We are, for the first time, on the right track. Can he pull it off? Tragically, no. He tries to do the right thing, but he screws it up. How he screws it up is interesting. Thus far, every mistake made has been due to total incompetence. Let’s break it down:

First order mistakes: Hide a sandwich when you don’t know whose it is. Put mustard on a sandwich where the order clearly says no mustard. Fail to understand how your own cash register works.

Second order mistakes: When given a customer problem, engage in blame shifting. Argue back to the customer. Ignore the customer’s problem while you concentrate on process. Don’t take responsibility for your mistakes. Don’t apologize. Don’t do anything to actually SOLVE the PRIMARY problem (two hungry people with a basket full of high margin widgets that they’d like to buy). Call in multiple levels of management to solve a simple sandwich making issue.

These are all ridiculous and obvious mistakes that should be covered in the first day of new employee orientation at a business that so heavily depends on repeat customers.

What was the manager’s final, deal breaking mistake?

“Is there anything we can do to make it up to you?”

This last mistake is subtle. Clearly he meant well and knew what to do — apologize, take responsibility, mollify the customer — but not how to do it.

The problem is that I’ve already told them what I want them to do — I want them to give me a sandwich, and, if they cannot, to give me my money back so that I can stop spending my incredibly busy day with time-wasting idiots.

Engaging in a negotiation with management over what would be an appropriate level of contrition for them to display for the disaster they’ve managed to embroil me in is not how I want to spend another second of my day. We are in this situation BECAUSE I want to stop talking to them.

Figuring out what they can do for me when they screw up is management’s job, not the customer’s job! Thus, I said “Thanks, but I’m just going home,” put down my basket, and left.

Customer service at a sandwich shop is not rocket science. I said in an earlier entry:

My father has been in the restaurant business for many years. Something he taught me at an early age is that one measure of the quality of a restaurant is how few mistakes they make, but a more important measure is how they treat the customer once a mistake has been made. Do they apologize, take responsibility, and immediately act to correct the mistake, or do they engage in cover-ups, blame-shifting and foot-dragging? I don’t go back to the second kind of restaurant.

When that happens to me at a restaurant where the core competency is in serving food, the restaurant probably loses tens or hundreds of dollars of business from me. In a retail business where the core competency is actually separating technology-loving geeks like me from thousands of dollars at a time, the opportunity cost of making a customer relations disaster out of a sandwich is considerably higher.

Finally, let me make this very clear: though obviously it is fun to vent, that’s not my primary purpose here. I want to call attention to this problem in a public way because Fry’s sells Microsoft products and therefore I want them to succeed. Even if I cannot, in good conscience, ever shop there again, I want other people to have a pleasant shopping experience there, and buy lots of computers and XBox games. If I didn’t want them fixed, I wouldn’t point out the problems.

I’m going to send a link to this to upper management at Fry’s, and I invite them to respond with details of how they’re solving these problems.


Update from 2022: They never responded to my post and went out of business in 2021.

Multi-cast delegates the evil way

A lot of people have asked me over the years how various kinds of event binding work.  Basically, event binding works like this:

1)     Someone clicks on a button,
2)     then a miracle happens, and…
3)     the button’s event handlers execute.

It’s that second step that people struggle with.

First, some terminology.  I studied applied mathematics, and some things we talked about quite a bit were sources and sinks. Sources produce something — a faucet produces water at a certain rate, for example.  A sink takes that water away.  We’ll borrow this terminology for our discussion of events.  An event source is something that produces events, like a button or a timer.  An event sink is something that consumes events, like an event handler function.  (Event sinks are also sometimes called “listeners”, which mixes metaphors somewhat, but that’s hardly unusual in this profession.)

This terminology leads to a rather unfortunate homonymy — when I first heard “this method sinks the click event”, I heard “this method syncs the click event”.  When we talk about event sinks, we’re talking about the consumer of something, not about synchronizing two things in time.  (Sinks, of course, can be asynchronous…)

The miracle actually isn’t that miraculous.  Implementing event sources and sinks requires two things: first, a way to wrap up a function as an object, such that when the source wants to “fire” the event, all it does is invoke the sink’s wrapper.  Second, a way for the thread to detect that the button, or whatever, has been pressed and thereby know to trigger the sink wrappers.

An explanation of the magic behind the latter would take us fairly far afield.  Suffice to say that in IE, the details of how that mouse press gets translated into windows messages and how those messages are dispatched by the COM message loops behind the scenes are miracles that I don’t want to talk about in this article.  I’m more interested in those wrappers.

In the .NET world, an object that can be invoked to call a function is called a delegate.  In JScript Classic, all functions are first-class objects, so in a sense, all functions are delegates.  How does the source know that the developer wishes a particular delegate (i.e., an event sink) to be invoked when the event is sourced?

Well, in IE, it’s quite straightforward:

function doSomething() {  }
button1.onclick = doSomething;  // passes the function object, does not call the function

But here’s an interesting question — what if you want TWO things to happen when an event fires?  You can’t say

function doSomething() {  }
function doOtherThing() {  }
button1.onclick = doSomething;
button1.onclick = doOtherThing;

because that will just replace the old sink with the new one.  The DOM only supports “single-cast” delegates, not “multi-cast” delegates.  A given event can have no more than one handler in this model.
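The replacement behavior is easy to see if we simulate the property assignment outside the browser. This is just a sketch; `button1` here is a plain object standing in for the DOM element, and the handlers merely record that they ran:

```javascript
// Simulate the DOM's single-cast event property with a plain object.
var log = [];
function doSomething() { log.push('doSomething'); }
function doOtherThing() { log.push('doOtherThing'); }

var button1 = { onclick: null };
button1.onclick = doSomething;
button1.onclick = doOtherThing; // replaces doSomething entirely

// "Fire" the event the way a source would: invoke the one sink.
if (button1.onclick) button1.onclick();

// Only doOtherThing ran; the first assignment was silently lost.
// log is ['doOtherThing']
```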

What to do then?  The obvious solution is to simply combine the two.

function doSomething() {  }
function doOtherThing() {  }
function doEverything() { doSomething(); doOtherThing(); }
button1.onclick = doEverything;

But what if you want to dynamically add new handlers at runtime?  I recently saw an inventive, clever, and incredibly horribly awful solution to this problem.  Some code has been changed to protect the guilty.

function addDelegate( delegate, statement) 
{
  var source = delegate.toString() ;
  var body = source.substring(
    source.indexOf('{')+1,   
    source.lastIndexOf('}'));
  return new Function(body + statement);
}

Now you can do something like this:

function dosomething() { /* whatever */ }
button1.onclick = dosomething;
// ... later ...
button1.onclick = addDelegate(button1.onclick, "doOtherThing();");

That will then decompile the current delegate, extract the source code, append the new source code, recompile a new delegate using “eval”, and assign the new delegate back.

OK, people, pop quiz.  You’ve been reading this blog for a while.  What’s wrong with this picture?  Put your ideas in comments and I’ll discuss them in my next entry.

This is a gross abuse of the language, particularly considering that this is so easy to solve in a much more elegant way.  The way to build multi-cast delegates out of single-cast delegates is to — surprise — build multi-cast delegates out of single-cast delegates.  Not decompile the single-cast delegate, modify the source code in memory, and then recompile it!  There are lots of ways to do this.  Here’s one:

function blur1() { /* whatever */ }
function blur2() { /* whatever */ }

var onBlurMethods = new Array();

function onBlurMultiCast() {
  for (var i = 0; i < onBlurMethods.length; i++)
    onBlurMethods[i]();
}

blah.onblur = onBlurMultiCast;
onBlurMethods.push(blur1);
onBlurMethods.push(blur2);

I’ll talk about VBScript and JScript .NET issues with event binding another time.