Friday, April 03, 2009

What is wrong with short encryption passwords?

What is wrong with short encryption passwords (a.k.a. "keys")?

It is natural to want to use a short convenient password to control access to something like a bank account, a locked door, or an encrypted document. Banks and multiuser computer systems get away with using login passwords that are short enough to be easy to remember: 4-5 digits, or a single randomly chosen dictionary word. Why can't we use such short convenient keys to protect the secrets in an encrypted document?

The basic problem is that when someone gets access to an encrypted document, you (mostly) can't keep them from using their computer to try very many guessed encryption keys very fast. In this post, first I will write about why (for encrypted documents, as opposed to for some other applications of secret passwords) we can't stop people from trying guessed encryption keys very fast. Then I will write, or just do some arithmetic, to show how severe this vulnerability is for short passwords for encrypted documents.

If you try to guess the password on a stolen ATM card, the bank's ATM can make this difficult for you in two ways. First, the ATM can keep you from testing any guess quickly --- you could train yourself to type the key very quickly, but the ATM need not cooperate by giving you the answer very quickly. And second, the ATM can become suspicious (cooperating even less, and/or raising red flags for security people) when you make too many incorrect guesses.

(Multiuser computer systems can do similar things when people try to guess login passwords.)

(ATMs also have other security advantages that I won't discuss. For example, you have to be physically present at an ATM to try a guess.)

An encrypted document just sits there. Unlike an ATM or a multiuser computer system, an encrypted document (mostly) can't stop someone from trying guesses quickly. And an encrypted document absolutely can't become suspicious and change its behavior when someone tries guesses. An ATM or a computer system is a machine which draws electrical power and thus is capable of having interesting behavior which interferes with breakin attempts. An encrypted document can have no behavior. Typically it is stored in some fundamentally passive way, like a pattern of magnetization on a disk; even when it is not being stored that way, the need to make it passively storable that way keeps it from having behavior.

(I wrote "(mostly)" about not being able to stop someone from trying guesses quickly. There is one class of exceptions to that: an encrypted document can be encrypted in such an inconvenient way that it takes a large amount of computer time to decode it even when you have the correct key. This is generally impractical, because people mostly don't want to work with such inconveniently slow encryption systems. However, it is practical in a few niche applications; the main niche I am aware of is in encrypting the files which contain computer login passwords, partly by using the trivial technique of inconveniently slow encryption systems and partly by using more devious "salt" techniques.)

So now, having tried to convince you that someone who misappropriates an encrypted document will be able to quickly try guessed passwords against it, let me try to sketch how severe the problem is.

In order to estimate how fast a guess can be tried by a computer, we need to estimate two numbers. First, how much text will the computer need to decode in order to detect that it has made a bad guess, so it can go on to the next guess? It might be as few as 10 characters, but 100 characters is a more conservative guess. Second, how fast can a computer decode those 100 characters? We will assume that it can do that about as fast as it can read characters from a hard disk, because it is inconvenient to use encryption schemes which are much slower than that, so people tend to avoid them. In round numbers, then, a typical laptop computer can decrypt on the order of 10 million characters per second, so it can try about 100,000 guessed keys per second.

(Will someone who misappropriates an encrypted document be able to run the software to try these guesses? Maybe not. But the population of people who can run software is pretty large. The population of people who can write the software is smaller, but it only needs to be written once. (It is also pretty simple software, easy for an experienced programmer to write, and possible for a motivated teenager to write.) Useful software tends to get widely distributed. Software useful for mischief --- decryption programs, keystroke loggers, trojan horse programs, etc. --- gets distributed in some highly visible ways, but various less visible ways too. It is seldom safe to assume that someone who is motivated to break into an encrypted document will not have access to such software.)

What are the implications of guessing 100,000 keys per second? It takes one second to try all possible 5-digit keys. It takes a few hours to try all possible 9-digit keys. It takes about one second to try all dictionary words. It takes a few hours to try all sequences of six letters and numbers, such as "jrr6d9".

Roughly, then, the security you get from encrypting a document with a short easy-to-remember key is comparable to the security you get from locking a physical box with a padlocked hasp whose screws aren't protected by the hasp, so that the box can be opened simply by unscrewing the screws. (Since decryption software isn't as common as an ordinary or Phillips screwdriver, perhaps the screws should be something less common like TORX screws.) Such an arrangement does make it clear that you intend the message to be private, but it causes no more than a minute of inconvenience to someone who chooses to ignore your intent.

So now what? If expressing but not enforcing the intent to privacy is enough, use a short password and be happy. If instead you want to use encryption passwords as a serious obstacle to unintended reading, it is a well-studied problem, and I am neither an expert nor motivated to write a complete survey of what I do know, but some possibilities are: annoyingly long random keys (e.g., 16 or more random letters and numbers), passphrases (a messy subject, but think 5 randomly generated dictionary words), various schemes using "public key cryptography" to do an end run around the problem, or various schemes replacing the hard encrypt-a-document problem with the easier (ATM-like) authenticate-a-user problem. A useful programmer's reference for many problems related to this is Schneier's book Applied Cryptography; unfortunately I don't happen to know a comparably useful nonprogrammer's reference.