In most cases, computer access is protected by username and password.
Usually it is not too difficult to find out some or all user names
on a given computer. Names leak as email addresses and in usenet posts.
Utilities like finger
or rwho
may give some.
There are many standard user names, root
being the most obvious one.
System logs and similar may be visible on the web, and found using Google.
Finding the password is not so simple. Usually one has to brute-force, trying all words in a dictionary, a list of first names, or just all strings of at most six printable symbols. A good password cracker is John the Ripper. Given the passwd file of some Unix machine, say with two or three dozen user names and passwords, one normally finds two or three vulnerable ones within a day or two.
For Windows NT there is the very fast L0phtCrack password cracker. Later versions also work on W2000.
How does one obtain the passwd file? On a local machine it is just
readable. Sometimes one can obtain it remotely via anonymous ftp,
or via a CGI script, using a .../../../../../etc/passwd
parameter. Of course, nowadays people often use shadow password files,
and these may be more difficult to obtain.
On most Unix systems, passwords are at most 8 characters long. Picking control characters or non-ASCII characters is bound to give problems when logging in remotely via other systems, so it is reasonable to expect characters in the range 32-126. Now 95^6 = 0.74 . 10^12 and 95^8 = 0.66 . 10^16 so if one can check one password in a microsecond then nine days suffice to check all strings of length at most six. (On my computer a DES-type check takes 10 microseconds.)
Of course, it is not necessary to try all possible strings. Trying all words in a fat dictionary takes only a few minutes.
Exercise Crack some or all of the following passwords.
aap:$1$ucQls3qa$sUbnWL2cEHtjM5qnGGs1q/:501:501::/home/aap:/bin/bash noot:$1$rPntDw4c$kf5jBMhdI7JfT7C2FlkBs1:502:502::/home/noot:/bin/bash mies:$1$0AMQsYdo$bLi7rpEK7IaYzmt3bVYH70:503:503::/home/mies:/bin/bash wim:$1$PRrAVTSy$3xDb1kZi9Rz/pTG4KecQK/:504:504::/home/wim:/bin/bash zus:$1$ou8y1Y/R$NTwWWHtWN.TBYEi.1ul.l.:505:505::/home/zus:/bin/bash
I find that common passwords include '' (the empty string), 'secret', 'password' (and in Holland the Dutch versions 'geheim', 'wachtwoord'), strings of consecutive digits or letters like '123', '12345', '1234567', 'abc', and proper names like 'eric', 'kevin', 'sandra', 'melissa', 'Nikita'.
On 2010-12-12 hackers published about 750000 encrypted passwords of users of Gawker blogs such as Lifehacker, Consumer, Gizmodo, Gawker, Jezebel, io9, Jalopnik, Kotaku, Deadspin, Fleshbot. (This explains the occurrence of these words below.) I cracked a bit more than 425000 of these passwords in about 12 hours. The list below gives the 350 most frequent passwords with their frequencies in this cracked set.
2516 123456 124 swordfis 89 bubbles 74 tennis 62 fuckyou2 2190 password 124 summer 88 startrek 74 scooby 62 fender 1205 12345678 122 asdf 88 diamond 74 naruto 62 butter 782 lifehack 121 matthew 88 coffee 74 mercedes 61 wolverin 696 qwerty 121 asdfgh 88 butterfl 74 maxwell 61 samsung 499 abc123 120 mustang 88 brooklyn 74 fluffy 61 rush2112 459 12345 119 yankees 88 amanda 74 eagles 61 podnTN76 439 monkey 117 hannah 88 adidas 74 11111111 61 pa55word 413 111111 117 asdfghjk 87 test 73 penguin 61 lalala 385 consumer 117 1qaz2wsx 86 wordpass 73 muffin 61 christin 376 letmein 116 cookie 86 sparky 73 bullshit 60 yourmom 351 1234 115 midnight 86 morgan 73 6Lthpku5 60 westside 318 dragon 115 123qwe 86 merlin 72 steelers 60 rocket 307 trustno1 114 scooter 86 maverick 72 jasper 60 melissa 303 baseball 114 purple 86 elephant 72 flower 60 icecream 302 gizmodo 114 banana 86 Highlife 72 ferrari 60 casper 300 whatever 113 matrix 85 poopoo 71 slipknot 59 spooky 297 superman 113 jezebel 85 nirvana 71 pookie 59 random 276 1234567 113 daniel 85 love 71 murphy 59 prince 266 sunshine 111 hunter 85 liverpoo 71 joseph 59 mookie 266 iloveyou 111 freedom 85 lauren 71 calvin 59 greenday 262 fuckyou 110 secret 84 stupid 71 apples 59 cooper 256 starwars 110 redsox 84 chelsea 71 159753 59 arsenal 255 shadow 108 spiderma 84 8FNFYgx1 70 tucker 58 hello1 241 princess 108 phoenix 83 compaq 70 martin 58 guinness 234 cheese 108 joshua 83 boomer 70 11235813 58 gamecube 231 123123 108 jessica 82 yellow 69 whocares 58 diablo 229 computer 108 asshole 82 sophie 69 pineappl 58 999999 226 gawker 108 asdf1234 82 q1w2e3r4 69 nicholas 58 98765432 223 football 107 william 82 fucker 69 jackass 58 333333 204 blahblah 107 qwertyui 82 coolness 69 goober 58 131313 203 nintendo 107 jackson 82 cocacola 69 chester 58 00000000 199 000000 107 foobar 82 blink182 69 8675309 57 xbox360 198 soccer 106 nicole 81 zxcvbnm 69 222222 57 school 195 654321 106 123321 80 snowball 68 winston 57 iceman 193 asdfasdf 105 peanut 80 snoopy 68 somethin 57 goldfish 184 master 104 samantha 80 rachel 68 please 57 friends 182 passw0rd 104 mickey 80 gundam 68 dakota 57 defamer 182 michael 104 booger 80 alexande 68 112233 57 555555 175 hello 103 poop 79 jasmine 67 wonkette 56 winter 170 kotaku 102 hockey 79 danielle 67 rosebud 56 welcome1 167 pepper 100 thx1138 79 basketba 67 dallas 56 qaz159 165 jennifer 100 121212 79 7777777 67 696969 56 porsche 165 666666 99 ashley 78 thunder 66 shithead 56 monkey1 164 welcome 98 silver 78 snickers 66 popcorn 56 kermit 164 buster 98 gizmodo1 78 patrick 66 peaches 56 jackie 161 Password 98 chocolat 78 darkness 66 pakistan 56 hardcore 159 batman 98 booboo 78 boston 66 dexter 56 donkey 158 1q2w3e4r 97 metallic 78 abcd1234 66 canada 55 success 157 maggie 97 1q2w3e 77 pumpkin 65 victoria 55 richard 154 michelle 96 bailey 77 creative 65 rockstar 55 redrum 153 killer 95 google 77 88888888 65 qwe123 55 rainbow 153 andrew 95 babygirl 76 smokey 65 newyork 55 poohbear 151 pokemon 94 thomas 76 sample12 65 nathan 55 jordan23 151 internet 94 simpsons 76 godzilla 65 lovely 55 heather 150 biteme 94 remember 76 december 65 benjamin 55 fireball 148 orange 94 gateway 76 corvette 64 robert 55 dietcoke 148 jordan 93 oliver 76 bandit 64 pickle 55 access 147 ginger 93 monster 76 123abc 64 passport 54 warcraft 145 123 92 qazwsx 75 voodoo 64 mother 54 skippy 144 aaaaaa 91 taylor 75 turtle 64 harley 54 ranger 138 tigger 91 madison 75 spider 64 forever 54 radiohea 137 charlie 91 guitar 75 london 64 falcon 54 qwerty1 136 chicken 91 anthony 75 jonathan 63 trinity 54 qwer1234 135 nothing 90 justin 75 hello123 63 barney 54 michael1 132 fuckoff 90 elizabet 75 hahaha 62 willow 54 drpepper 130 deadspin 90 1111 75 chicago 62 ncc1701 54 destiny 125 valleywa 89 november 75 brandon 62 kitten 54 cowboy 125 qwerty12 89 monkey12 75 austin 62 jesus 54 changeme 125 george 89 drowssap 74 yvyfCbI6 62 hacker 53 xxxxxx
Having passwords in cleartext in a file is a bad idea - they will
be compromised. Unix introduced the idea of feeding the password
to some one-way hash function and storing the result.
Now the password file /etc/passwd
can (and does) have
general read permission.
Unix V5 and V6 used a simulation of the M-209 cipher machine, and encrypted (the first 8 characters of) the password to an 8-char string.
Anecdote In the Unix V6 days I once gave a Polish colleague a username and password, and told him his username and said that he could guess the password. He sat down and logged in, and was surprised that it worked. `How did you know I was going to try "ladne"?' But I had given him the password "aline".
(Thus, we found a collision. Rechecking:
# passwd ka Usage: passwd user password # passwd ka aline # grep ka /etc/passwd ka:ugiTjezp:11:1::/usr/ka: # passwd ka ladne # grep ka /etc/passwd ka:ugiTjezp:11:1::/usr/ka: #We see another weakness here: this version of
passwd
required the password on the command line. This means that it
would be visible to someone who did ps
at the same time.)
As Morris & Thompson wrote, the encryption algorithm was too fast, and brute force search was too easy. So, in Unix V7 the algorithm was replaced by a modified DES, repeated 25 times. DES because it was slower and safer, repeated to make it even slower, and modified in order to protect against hardware implementations of the actual DES. (Moreover, there has always been some uncertainty: could it be that there is some backdoor in DES? Maybe a modified DES is more secure than the actual DES.)
The input to this encryption consists of a 12-bit salt
concatenated with the user's password. The 64-bit output is
converted to an 11-char string and compared to the entry in
/etc/passwd
, which has a 13-char string representing
salt and encrypted password.
(DES has two inputs: key and data. Here salt plus password
is used as 64-bit key, and the initial data is the constant zero.)
Bits are converted to printing characters in groups of 6,
using the alphabet ./0-9A-Za-z
(in this order).
The salt makes sure that one cannot precompute encryptions for all dictionary words, say - each word has 4096 different encryptions. It is chosen at random when the user sets his password.
These passwords are recognized by their length of 13 characters.
root:VwL97VCAx1Qhs:0:1::/:Exercise Crack this.
This is what the standard Unix routine crypt()
does.
Today it is fairly insecure. Exhaustive search is feasible
with special purpose hardware, and the speed of 100000 attempts/second
is too high. Only 8 characters of the password are used.
The salt is too small - it is quite feasible to precompute
the encryption for all possible 4096 salts and all words in a large
dictionary or word list and store the result on disk.
(Also Windows NT uses a form of DES. It is even weaker and allows 800000 attempts/second. There is no salt.)
What to do about the weakness of crypt()
?
The main defense is now the use of shadow password files,
that is, the hiding of the password file from the users.
But that has all the problems that caused Unix to abandon
a plaintext password file. It is better to replace crypt()
.
Various cryptographic hash functions are designed to be fast, and such that constructing collisions or finding preimages is infeasible. That latter property is precisely what is needed for password encryption, but a password hash must be slow. Brute force cracking of raw MD5 is very easy.
According to US law, exporting cryptographic software was a form of munitions export. This caused a lot of stupid annoyances. Of course everybody in the whole world had DES source code, but nevertheless distribution was restricted.
In order to overcome this difficulty, FreeBSD 4.2 switched to a complicated
algorithm based on MD5. That had several advantages: it is a bit stronger,
with 128-bit output instead of 64-bit, it uses the entire password instead
of only the first 8 characters, and it is slower (the digest is rehashed
a thousand times), so brute force takes longer.
(On my machine 2000 attempts/sec, against 100000 for modified DES.)
Also RedHat 6.0 and up uses MD5 (but SuSE does not by default - ach).
These FreeBSD-type MD5 passwords can be recognized as 34-char encrypted
passwords starting with $1$
. The first 8 characters following
are the salt. Poul-Henning Kamp described his
design criteria.
Niels Provos and David Mazières developed bcrypt()
,
the best choice for a password hash today. It is based on Blowfish,
and contains facilities for making the algorithm arbitrarily expensive.
It is used by OpenBSD, and has passwords starting with $2$
,
$2a$
, $2x$
, or $2y$
.
Brute force is even slower here, at 100 attempts/sec.
Various implementations of crypt()
have suffered from problems
in an 8-bit environment since the programmers expected ASCII input.
What to do with non-ASCII bytes? In
some implementations they were
replaced by '?', so that a strong password turned into the constant string
"????????". In some implementations the high order bit was masked off,
so that 0x80 became end-of-string. In 2011 a sign-extension bug was
discovered in the Openwall implementation of Blowfish.
The $2y$
-prefix in bcrypt()
-generated passwords
indicates that they were generated by a post-fix algorithm.
Before version 4.1 MySQL had a very weak password algorithm. Here it is.
void hash_password(ulong *result, const char *password, uint password_len) { ulong nr=1345345333L, add=7, nr2=0x12345671L; ulong tmp; const char *password_end = password + password_len; for (; password < password_end; password++) { if (*password == ' ' || *password == '\t') continue; tmp = (ulong) (uchar) *password; nr ^= (((nr & 63)+add)*tmp) + (nr << 8); nr2 += (nr2 << 8) ^ nr; add += tmp; } result[0]=nr & (((ulong) 1L << 31) -1L); result[1]=nr2 & (((ulong) 1L << 31) -1L); // printf("%08lx%08lx\n", result[0], result[1]); }The input is an arbitrary string. The output is a 16-hexdigit hash, 62 bits. This is weak for many reasons. It is fast, so brute force cracking is easy. There is no salt, so precomputation is possible. The value of
nr2
is not used in the computation of nr
,
so a cracker can forget about the second half of the hash and work with
the first 31 bits only. On the other hand, nr2
provides
valuable information. Since the final nr
and nr2
are known
(except for their high order bit) one can find the value of nr2
before the last step, so that incorrect candidate passwords can be discarded
without looking at their last byte, greatly speeding up the search.
And with a bit of work one can also derive information about the stages
after all but two or all but three bytes of the password, making the search
even faster. This means that cracking 8-symbol passwords is quick.
See also Philippe Vigier's
poc.c.
Recent versions of MySQL use a double application of SHA1, and do not have obvious weaknesses. However, many sites still use the old scheme, e.g. for compatibility reasons.
Exercise Crack some or all of the following passwords.
Joe:4d432f8b35591ab8 Edv:0918419960333840 Bas:0692c5e37db23ab9 Jam:18bd29ea2b511d53
The PKZIP utility is used to create compressed archives.
The
format of the outputfile
is well-documented. One can protect archives with a password.
In the Microsoft world many (usually commercial) brute force
ZIP password crackers are available, the most famous being
Elcomsoft's AZPR. In the Unix world one has zipcracker
(for distributed cracking over a Beowulf network) and fcrackzip
(for simple and fast brute force), that come with source code.
There is also pkcrack
that implements the algorithm
described by Eli Biham and Paul Kocher and uses some (at least 13 bytes)
known plaintext. Altogether, it is usually feasible to find the
password of a traditional ZIP archive. Recognizing that the password
protection had become too weak, PKZIP 5.0 introduced stronger encryption.
Concerning the cracking speed one can expect: a moment ago I needed
to crack a ZIP password and found that zipcracker
did
approximately 1000000 tries/sec on a 1400MHz machine.
Exercise In my mailbox I found the password protected zip file Message.zip. What is the password? What does this file contain?
Adobe's Portable Document Format is one of the more popular formats
in which to distribute files representing printed material.
Such files are commonly viewed with Acrobat Reader or with xpdf
.
The format allows the creator of the file to set certain protections.
The protection comes in two flavours: protection bits and
password protection. There may be two passwords: the owner's
password and the user password.
The permission bits are not enforced in any way.
But Adobe asks implementers of PDF readers to respect their settings.
The Linux xpdf
indeed respects the bits. Of course it is
trivial to modify the source removing the tests on okToPrint()
,
etc.
Revision 2 knows about four permission bits (in a 16-bit short):
Bit | Permission |
2 | |
3 | Change |
4 | Copy |
5 | Annotate |
The 16-bit protection mode has the leading ten bits set to 1, the trailing two bits set to 0, and the remaining four permission bits Rev 2: Modifying / Copying text & graphics, accessibility support / Adding text annotations / Printing Rev 3: Filling in forms and signing the document / Assembling the document: adding or deleting pages, bookmarks, thumbnail images / Printing in a way that allows one to obtain a digital copy.
Encryption dictionary entries for the standard security handler R Revision (2, 3 or 4) O String related to Owner and User Password U String related to User Password P Permission mask
For people who don't know how to do this themselves, Password Crackers Inc. will remove permission bit protection of some document for $40, and password protection for $500. They write:
Our Acrobat .pdf recovery service does not brute-force check for passwords. We search for the encryption key that Acrobat used to encrypt the file. There are many fewer keys than possible passwords, hence we are able to search all of the possible keys in less than 25 days.
On the other hand, brute force may not be required.
Many services come with a default password and instructions to change that immediately, but often the default is left. See (the old and outdated) Default Password List to find, e.g., the default Debian LILO password, or the old Slackware user names without password. See also this webzcan list.
Example WWWBoard is a threaded World Wide Web discussion forum and message board. It comes with a default password file
WebAdmin:aepTOqxOi4i8U(or
WebAdmin:aeb/uHhRv6x2LQvxyii4Azf1
, or
WebAdmin:$1$ae$eVdFF2d.W9C3JSO3qluZ70
)
where the password is WebBoard. It is easy to find instances of WWWBoard
with the default password untouched, for example in /bbs/passwd.txt
.
Now in /cgi-bin/wwwadmin.pl
one might find the admin script
and help the sysadmin by discarding unwanted messages.
Cisco reports (April 2004): A default username/password pair is present in all releases of the Wireless LAN Solution Engine (WLSE) and Hosting Solution Engine (HSE) software. A user who logs in using this username has complete control of the device. This username cannot be disabled. There is no workaround.
Three days later we see: Backdoor in X-Micro WLAN 11b Broadband Router: The following username and password works in every case, even if you set an other password on the web interface: Username: super, Password: super. By default the builtin webserver is listening on all network interfaces (if connected to the internet, then it is accessible from the internet too). Using the webinterface one can install new firmware, download the old, view your password, etc., etc. (This is a funny one. The X-Micro people soon "fixed" this problem, and released new firmware. The new firmware has backdoor username and password "1502".)
There are lots and lots of such cases. Sometimes planned, sometimes by accident. Access paths needed in the testing phase are not removed when the product is released.
Some systems allow read-only access to kernel memory (e.g. in order
to allow the ps
utility to read system tables), and one
can read tty input buffers and snoop passwords.
(Nostalgia - this
worked beautifully on Unix V6.)
On a local network (and local may be the wireless network in the building you stand in front of) one can sniff passwords using an ethernet sniffer.
Some passwords are sufficiently interesting to be published in the news.
Usually such posts are removed rather quickly again, but internet has a
memory, and it is very difficult to erase what was published once.
For example, recently (2007-02-11) arnezami
published
that the processing key for HD DVD discs is
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
.
Today Google gives more than a million hits on this string.
Many sites have lists of cracked MD5 passwords. So if you find one, say
c3875d07f44c422f3b3bc019c23e16ae
, then ask Google before trying
to crack. Immediately a dozen sites will tell you that this is denis
.
If one is going to do brute force, a lot of time is needed. But if passwords have to be cracked repeatedly, it is possible to do an (expensive) precomputation with the result that subsequent password cracking will be fast.
The original idea is due to Hellman. Later many refinements have been proposed. A primitive version goes as follows.
Suppose one has a known algorithm, known plaintext and known ciphertext, and wants to find the key used. (This is often the case with passwords: the password is used as the key, and a constant is enciphered, and the password file contains the result of enciphering.) Say C = E(K,P), with obvious abbreviations.
It is inconvenient when C is longer than K, so we invent a reduction R that shortens C to have the same length as K, maybe just by throwing out some bits. Put f(K) = R(E(K,P)). Then f is a map from the space of keys to itself.
Starting from a key K0, put K1=f(K0), K2=f(K1), ... until we reach Kn, for some very large n. Store the pair (K0,Kn) in memory, and forget the intermediate steps.
The idea is to cover the space of all keys by such "threads" of keys, and to remember start and end of each thread. Either give all threads the same length n, or choose ends in such a way that they have some recognizable property, maybe having the first ten bits 0 or so.
Now the password cracking works as follows: given f(K) we would like to find K. Apply f many times to f(K) until we find the end Kn of the thread. Find the pair (K0,Kn) from the table. Apply f many times to K0 until we reach f(K). The step before that we had K.
This approach is best when there are no collisions, no pairs of distinct keys K, K' with f(K) = f(K'). If there are, then the threads merge and there wil be several possible K0 for a given Kn, leading to larger tables and larger cracking time. There is nothing one can do about the function E, it is given, but Oechslin proposes to vary the reduction function R along the thread. That speeds up things by a factor of at least 2.
On lasecpc13.epfl.ch an "instant NT password cracker" is available, that uses a 0.95 GB precomputed table to crack alphanumerical WinNT passwords in 5 sec average (and returns "not found" when the password contains non-alphanumerics). The precomputation took about six CPU days. For the theory, see Oech03.pdf.
Oechslin-type precomputed tables are known as rainbow tables. If the password algorithm uses a b-bit random salt, precomputation is made a factor 2b more difficult. See also Wikipedia.
Now that GPU's have become so fast, rainbow tables are obsolete.
Sometimes it is possible to recover a password or key by observing the timing or other external characteristics of a server or password checker.
There are several versions of the story referred to earlier that it used to be possible on the Tenex system in the early seventies to recover a password by laying out an attempted password across a page boundary, and observing whether the checker incurred a page fault.
The bug was that you could align the given password string so that it was at the end of a page boundary, and have the next page in the address space mapped to a non-existant page of a write-protected file. Normally, Tenex would create a page when you tried to access a non-existant page, but in this case it couldn't (since the file was write-protected).So, you did a password-checking system call (e.g. the system call which tried to obtain owner access to another directory) set up in this way and you would get one of two failures: Incorrect Password meant that your prefix was wrong, Page Fault meant that your prefix was right but that you need more characters. -- Mark Crispin
One can successfully use timing in attacks using blind SQL injection.
For example, give queries that will conditionally do a BENCHMARK
or sleep/delay, and in this way discover user names and passwords
in essentially the same way as in the Tenex example, where now
one sees whether or not a delay occurs.
The time needed for a memory access depends on whether the looked for data was cached already. This non-uniformity can be exploited. For example, Dan Bernstein demonstrated that one can recover an AES key by statistics over timing data. And Colin Percival shows that when hyperthreading is enabled, one process can determine which cache lines were used by the other process, and thus get strong information on an SSL key used. This means that data-dependent table lookup is suspect now.
A somewhat different situation is that where a large database is meant to be freely accessed by people (using a browser), but not by robots who might copy the entire contents. Frequently the protection consists of a picture with more or less clearly visible text, where the user is requested to type the text in an input field.
Such images are now known as captchas, and used everywhere where it is desirable to verify that the software is dealing with an actual human. An important use is that against spam robots. Long ago I wrote
Project Write (or find) a program that recognizes the text in such images.
but there are many such programs now, and captchas that cannot be read by software tend to be difficult to parse for humans as well. There exist audio versions for blind users. See also Wikipedia.