Passwords are a key to commerce today. So it behooves us to know something about them.
Well-maintained sites never store the password itself. Instead, they use a cryptographic hash: a function that takes an original string and produces a different string that, for practical purposes, only that original string produces, but for which knowing the result gives essentially no help in recovering the original string. For example, giving the word 123456 to one widely used cryptographic hash function called MD5 produces the result e10adc3949ba59abbe56e057f20f883e. (By the way, MD5 is too fast to be safely used for passwords.) A very slight change, like tweaking the last digit to get 123457, yields an entirely different result, f1887d3f9e6ee7a32fe5e76f4ab80d63. It needs to be entirely different: if similar inputs gave similar results, we would know when we were “getting warm” and could home in on the correct original.
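Python's standard hashlib module can reproduce the MD5 example. This is a small sketch for experimenting; the helper name md5_hex is ours, not part of any library.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Return the MD5 digest of text (encoded as UTF-8) as 32 hex digits."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

a = md5_hex("123456")
b = md5_hex("123457")
print(a)  # e10adc3949ba59abbe56e057f20f883e
print(b)

# Count hex positions where the two digests agree: a one-digit change in the
# input leaves the outputs looking unrelated, so there is no "getting warm".
same = sum(1 for x, y in zip(a, b) if x == y)
print(f"{same} of 32 hex digits happen to match")
```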
A cryptographic hash function is sometimes also called a one-way function: Knowing an original string, you can apply the function to get a result; but knowing the result you can't determine the original string. It may surprise you that such functions exist, but consider the process of squaring a number: Squaring a number is quite easy, but the inverse — the square root — is more difficult. But suppose we were to square the number and keep just the final eight digits? For example, 123,456² is 15,241,383,936, so our “hash” would be 41383936. Now this is still quite easy to compute, but there's no obvious way that if we knew the hash was 41383936 we could figure out 123456, aside from just trying many different values. (That's not to suggest this is a good hash function, though. For starters, seeing that the hash ends in a 6 already tells us that the original number ended in a 4 or a 6.)
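The squaring “hash” is easy to try out. This sketch (the function names are ours, purely illustrative) computes it and then inverts it the only way the text suggests: by trying many values one after another.

```python
def toy_hash(n: int) -> int:
    """Square n and keep only the final eight decimal digits (NOT a real hash)."""
    return (n * n) % 100_000_000

def invert_by_search(target: int, limit: int) -> list:
    """Recover candidates the slow way: test every n from 0 up to limit - 1."""
    return [n for n in range(limit) if toy_hash(n) == target]

print(toy_hash(123456))                       # 41383936
print(invert_by_search(41383936, 1_000_000))  # includes 123456

# The leak noted above: squares ending in 6 come only from numbers ending in 4 or 6.
endings = {n % 10 for n in range(10_000) if toy_hash(n) % 10 == 6}
print(endings)  # {4, 6}
```

The search does find 123456, possibly alongside other numbers that share the same hash; a real cryptographic hash is designed so that this kind of search is the best an attacker can do, and so that it is hopelessly slow.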
Let us suppose our cryptographic hash function is known as h. So if our password is x, we would compute h(x) and store the result in our database rather than x itself. Later, when somebody enters a potential password z, we of course can't compare z to x directly, since we never stored x. But we can compute h(z) and check whether it matches the h(x) that we stored in the database. If these match, then we conclude that z is a valid password.
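Here is a minimal sketch of that store-and-compare scheme. We assume SHA-256 as a stand-in for h and use a plain dictionary as a hypothetical database; being unsalted and fast, this only illustrates the idea and is not how a real site should store passwords (salt and slow hashing come up later in the text).

```python
import hashlib
import hmac

def h(password: str) -> str:
    # Stand-in for the text's hash function h; SHA-256 is our assumption here.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical "database": user name -> stored h(x). The password x itself is never saved.
database = {}

def register(user: str, x: str) -> None:
    database[user] = h(x)

def check(user: str, z: str) -> bool:
    stored = database.get(user)
    if stored is None:
        return False
    # compare_digest takes the same time regardless of where the strings differ,
    # so response timing doesn't reveal how much of the hash matched.
    return hmac.compare_digest(h(z), stored)

register("alice", "correct horse")
print(check("alice", "correct horse"))  # True
print(check("alice", "correct hose"))   # False
```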
Note that this means a site that stores passwords properly cannot tell you what your password is, since it simply doesn't store your password. So when you forget your password, the standard solution is for the site to generate a new password for you and send that to you. If you're dealing with a site that can actually send you your original password, be very wary: it must be storing your password in a recoverable form, which is poor security, and you can't trust it.
One more complication: to defeat a whole range of particular techniques for cracking passwords, secure Web sites in fact use “salt.” As each account is created, the site generates a random sequence specific to that account, known as its salt. Suppose s is the salt generated for an account, and x is the user's chosen password. Then in the database we store both s and h(s + x), that is, the result of appending x to the salt and then applying our cryptographic hash function to that. When the user later types z as the purported password, we check whether it is correct by looking up s in our database, appending z to s, computing the cryptographic hash of that (h(s + z)), and verifying whether this matches the h(s + x) stored in our database.
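The salted version changes only what gets hashed and stored. In this sketch we assume SHA-256 for h, a 16-byte salt from os.urandom, and a dictionary standing in for the database; in practice h should also be a deliberately slow hash.

```python
import hashlib
import hmac
import os

def h(data: bytes) -> bytes:
    # Stand-in for h (SHA-256 is our assumption; a slow hash is better in practice).
    return hashlib.sha256(data).digest()

database = {}  # hypothetical: user -> (s, h(s + x))

def register(user: str, x: str) -> None:
    s = os.urandom(16)  # fresh random salt for this account; 16 bytes is our choice
    database[user] = (s, h(s + x.encode("utf-8")))

def check(user: str, z: str) -> bool:
    if user not in database:
        return False
    s, stored = database[user]  # look up this account's salt
    return hmac.compare_digest(h(s + z.encode("utf-8")), stored)

register("alice", "123456")
register("bob", "123456")
# Same password, different salts, so the stored hashes differ.
print(database["alice"][1] == database["bob"][1])  # False
print(check("alice", "123456"))                    # True
```

Because the salts differ, an attacker who steals this table can't tell that Alice and Bob share a password, and can't reuse one hash computation across both accounts.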
How can a malicious person get a password?
One straightforward way to retrieve passwords is phishing. You have probably seen simple phishing before, where you get an e-mail from somebody claiming to be your e-mail administrator and saying that your account has failed but that they need your password in order to repair it. Or they might call you. Of course, you should assume that any unknown person who contacts you asking for your password is an impostor. If a real administrator needs your password, then the correct response is to tell them to go ahead and change your password, which they can easily do.
A different tack, though, is simply to get the database itself. An attacker can “break into” the system by figuring out how to access it remotely (or by physically getting into the room where it sits), and from there copy the file. Admittedly, if the system uses cryptographic hashing, then this doesn't directly yield the passwords, but the attacker can then apply a myriad of cracking techniques.
Another possible attack to retrieve passwords is a technique called SQL injection. When you visit a Web site, the site usually consults a database, and it talks to that database in a language called SQL. For example, suppose the Web site offers to let you type a “search string” and then lists the name and e-mail address of all users whose name starts with the search string. It would typically do this by executing an SQL query such as the following.
SELECT name, email FROM users WHERE name LIKE 'SEARCH_STRING%'
The idea of SQL injection is to enter something that fools the site into performing an entirely different query. For example, suppose a user typed the following as the “search string”:
never5#2' UNION SELECT login, password FROM users --
Obviously, the user isn't expecting any user name to genuinely start this way. But when this text is pasted into our SQL, we get the following:
SELECT name, email FROM users WHERE name LIKE 'never5#2' UNION SELECT login, password FROM users -- %'
This is valid SQL, and it will list every login ID along with its password. Of course, with a good site this yields only cryptographic hashes; the attacker must then crack those hashes, using techniques like those described below, to recover individual users' passwords.
Vulnerability to SQL injection is one of the topmost security problems with Web sites. Good programmers know how to avoid it: they make sure user input is always inserted so that special characters such as quotes are “escaped,” that is, treated as literal quote characters rather than as the end of a string in the SQL. But even good programmers occasionally lapse.
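Rather than escaping by hand, a common way to get this right is a parameterized query, where the database driver receives the user's text separately from the SQL and treats it purely as data. This sketch uses Python's built-in sqlite3 module with an in-memory table whose column names follow the example above; the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (login TEXT, name TEXT, email TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [("alice", "Alice", "alice@example.com", "hash-of-alice"),
     ("bob", "Bob", "bob@example.com", "hash-of-bob")],
)

def search_unsafe(search: str):
    # VULNERABLE: the user's text is pasted straight into the SQL string.
    sql = "SELECT name, email FROM users WHERE name LIKE '" + search + "%'"
    return conn.execute(sql).fetchall()

def search_safe(search: str):
    # The ? placeholder passes the input as a value, never as SQL syntax.
    return conn.execute(
        "SELECT name, email FROM users WHERE name LIKE ? || '%'", (search,)
    ).fetchall()

attack = "never5#2' UNION SELECT login, password FROM users --"
print(search_unsafe(attack))  # login/password pairs leak out
print(search_safe(attack))    # []: no name starts with that text
print(search_safe("Al"))      # [('Alice', 'alice@example.com')]
```

One caveat: in the safe version, any % or _ the user types still acts as a LIKE wildcard. That can widen the search but cannot change the query's structure.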
An injection attack was responsible for one of the biggest leaks of password data ever: in 2009, somebody retrieved the passwords of all 32 million users of RockYou, a company that wrote games to be played on Facebook. This was particularly notable because RockYou had not bothered to hash the passwords, so 32 million real-world passwords immediately became available.
But even a file containing only cryptographic hashes is not secure. Anybody skilled at cracking passwords knows that a large number of them can be cracked simply by buying a reasonably fast computer ($10,000 would be more than enough) and using it to hash a huge range of candidate passwords, checking each result against the stolen hashes.
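This sketch shows the basic guess-and-compare attack against a hypothetical stolen file of unsalted SHA-256 hashes. The accounts and the tiny guess list are invented for illustration; real attacks use lists of millions of leaked passwords.

```python
import hashlib

def h(password: str) -> str:
    # Assumed hash: a single fast, unsalted SHA-256.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical stolen file: login -> unsalted hash.
stolen = {"alice": h("monkey"), "bob": h("Tr0ub4dor&3"), "carol": h("iloveyou")}

# The attacker's guess list, here a few entries from the RockYou top 20.
guesses = ["123456", "password", "iloveyou", "princess", "monkey", "abc123"]

# Hash each guess once; with no salt, one table of hashes serves every account.
table = {h(g): g for g in guesses}
cracked = {user: table[digest] for user, digest in stolen.items() if digest in table}
print(cracked)  # {'alice': 'monkey', 'carol': 'iloveyou'}
```

The line building table is exactly what salt defeats: with a different salt per account, each guess must be rehashed separately for every account, multiplying the attacker's work by the number of users.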
To try to defeat these exhaustive techniques, good sites choose cryptographic hash functions that intentionally take lots of time. (That's a major reason why MD5 is such a poor choice: MD5 is designed to be fast.) A computationally expensive hash function does mean that the site spends more computation verifying passwords, but the site pays that cost only once per login attempt, while a cracker pays it for every single guess. So it slows down password crackers immensely, and they can try far fewer possibilities.
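One standard slow hash is PBKDF2, available in Python's hashlib as pbkdf2_hmac; bcrypt, scrypt, and Argon2 are other common choices. This sketch compares its cost with a single SHA-256. The iteration count of 600,000 is an assumed work factor, and a site would tune it to its own hardware.

```python
import hashlib
import hmac
import os
import time

ITERATIONS = 600_000  # assumed work factor; higher is slower for site and attacker alike

def slow_hash(salt: bytes, password: str, iterations: int = ITERATIONS) -> bytes:
    # PBKDF2-HMAC-SHA256 applies the underlying hash `iterations` times.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

salt = os.urandom(16)
pw = "correct horse battery staple"

start = time.perf_counter()
stored = slow_hash(salt, pw)
print(f"one PBKDF2 hash: {time.perf_counter() - start:.3f} s")

start = time.perf_counter()
hashlib.sha256(salt + pw.encode("utf-8")).digest()
print(f"one plain SHA-256: {time.perf_counter() - start:.6f} s")

# Verification repeats the same slow computation with the stored salt.
print(hmac.compare_digest(slow_hash(salt, pw), stored))  # True
```

A fraction of a second per login is barely noticeable to a user, but it cuts a cracker's guessing rate by the same large factor.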
So what can you do to protect yourself?
Choose uncommon passwords. For heaven's sake, don't choose anything that resembles any of the “top 20” passwords (as retrieved from RockYou, and confirmed in several similar leaks).
 1. 123456      6. princess   11. Nicole     16. Lovely
 2. 12345       7. rockyou    12. Daniel     17. michael
 3. 123456789   8. 1234567    13. babygirl   18. Ashley
 4. Password    9. 12345678   14. monkey     19. 654321
 5. iloveyou   10. abc123     15. Jessica    20. Qwerty
None of these look good, but in fact a lot of good-looking passwords do exist further down on the list of 32 million RockYou passwords. If it resembles a dictionary word or a name, it'll be there.
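What counts as “resembling” a listed password is a judgment call. The following is a hypothetical, deliberately crude check: it ignores case and trailing digits or punctuation, then looks the result up in the top 20. Real cracking tools try many more variations than this, so passing such a check says little about a password's strength.

```python
import string

# The top-20 RockYou passwords listed above, lowercased.
COMMON = {
    "123456", "12345", "123456789", "password", "iloveyou",
    "princess", "rockyou", "1234567", "12345678", "abc123",
    "nicole", "daniel", "babygirl", "monkey", "jessica",
    "lovely", "michael", "ashley", "654321", "qwerty",
}

def resembles_common(candidate: str) -> bool:
    """Hypothetical heuristic: case-insensitive match, also after stripping
    trailing punctuation, and after stripping trailing digits and punctuation."""
    lowered = candidate.lower()
    variants = {
        lowered,
        lowered.rstrip(string.punctuation),
        lowered.rstrip(string.punctuation + string.digits),
    }
    return any(v in COMMON for v in variants)

print(resembles_common("Monkey1!"))      # True
print(resembles_common("xk7#pQ2v!Lm9"))  # False
```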
Choose a password of 10 or more characters. After all, good crackers can iterate through all combinations of 6-character passwords. Sometimes they can even try all combinations of 8 characters. In today's world, 10 characters is considered reasonably safe.
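The arithmetic behind this advice is simple: with an alphabet of k characters there are kⁿ passwords of length n. This sketch assumes the 95 printable ASCII characters and an attacker testing 10¹⁰ guesses per second against a fast hash. That rate is an assumption for illustration, and real rates vary enormously with hardware and hash function.

```python
ALPHABET_SIZE = 95          # printable ASCII characters, including space
GUESSES_PER_SECOND = 1e10   # ASSUMED cracking speed against a fast hash

def seconds_to_exhaust(length: int) -> float:
    """Time to try every password of exactly this length at the assumed rate."""
    return ALPHABET_SIZE ** length / GUESSES_PER_SECOND

def human(seconds: float) -> str:
    """Express a duration in the largest unit that is at least 1."""
    for unit, size in (("years", 365 * 86400), ("days", 86400),
                       ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:,.1f} {unit}"
    return f"{seconds:.1f} seconds"

for n in (6, 8, 10):
    print(f"{n} characters: {ALPHABET_SIZE ** n:.2e} passwords, "
          f"{human(seconds_to_exhaust(n))}")
```

At the assumed rate this works out to roughly a minute for 6 characters, about a week for 8, and around 190 years for 10; each extra character multiplies the work by 95. Passwords drawn from a smaller set, such as lowercase letters only, fall much faster.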
Use a different password for every site. A lot of people reuse passwords from one site to another. If you do this, though, then somebody who manages to crack your password on one site (perhaps due to particularly poor security there) would be able to access all the sites.
Use a password manager of some sort. If you're choosing 10-character passwords that are genuinely difficult to guess, then you're going to forget them. A low-tech solution is simply to carry around a sheet of paper in your wallet with the passwords written on it; this actually isn't bad, as long as you're good at keeping track of your wallet and don't run it through the washing machine.
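A password that is genuinely hard to guess is easiest to get from a random generator. This sketch uses Python's secrets module, which draws from the operating system's cryptographically secure source (the ordinary random module is not suitable for this). The default length of 16 and the 94-symbol alphabet are our own choices.

```python
import secrets
import string

# Letters, digits, and punctuation: 94 symbols (space left out to avoid copy/paste trouble).
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def generate_password(length: int = 16) -> str:
    """Pick each character independently and uniformly from ALPHABET."""
    if length < 10:
        raise ValueError("use at least 10 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(generate_password())  # different on every run
```

You won't remember a password like this, which is exactly why it belongs on the sheet of paper or in a password manager.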
Another good solution is to use a “password manager” program that is widely known and well regarded; personally, I use Password Safe, but KeePass is another good choice. These work by storing an encrypted file on your computer; when you start the application, you enter a “local password,” which decrypts the file so that you can view passwords and copy them into whatever Web site you are accessing. True, somebody who gets both your file and your local password gets access to all your accounts, but that requires first figuring out how to get into your computer.