echo -n MyPassword | shasum | cut -c6-40
allows the user to create a SHA-1 sum of his password and take the 6th through 40th characters of the result. Then the user could easily search the 120MB file to see if his hash was present in the file. If it was, then of course his password had been leaked and his account associated with that password was at risk.
John the Ripper
When the OpenWall community released a patch to run John The Ripper on the leaked file, it caught my attention. It has been a long time since I have run John The Ripper, and I decided to install this new, community-enhanced "jumbo" version and apply the LinkedIn patch.
John the Ripper attempts to crack SHA-1 hashes of passwords by iterating on this process: 1. guess a password, 2. generate its SHA-1 hash, and 3. check if the generated hash matches a hash in the 120MB file. When it finds a match, then it knows it has a legitimate password. John the Ripper iterates in a very smart way, using word files (a.k.a. dictionary attack) and rules for word modifications, to make good guesses. It also has an incremental mode that can try any possible passwords (allowing you to define the set of passwords based on the length or the nature of the password, with numeric, uppercase, or special characters), but this becomes very compute-intensive for long passwords and large character sets.
The fact that the file of hashed passwords was not salted helps a lot. As an aside, even if they were salted, you could concentrate the cracking session to crack the easiest passwords first using the "single" mode of John the Ripper. But this works best with additional user information like a GECOS, which was not available in this case, at least to the public. So the difficulty would be much greater for salted hashes.
In my case, I have an old machine with no GPU and no rainbow table, so I decided to use good old dictionaries and rules.
I ran the default john command that just launches a small set of rules (like append/prepend 1 to every word, etc.) on a small default password dictionary of less than 4000 words. It then switches to incremental mode based on statistical analysis of known password structures, which helps it try the more likely passwords first. The result was quite impressive because after 4 hours I had approximately 900K passwords already cracked.
But then, as it got to the point were it was trying less and less likely passwords and therefore found matches more slowly, I decided to stop it and run a series of old dictionaries I had: from default common password lists (16KB of data) to words of every existing language (40MB of data). It was very efficient and found 500K more passwords in less than an hour, for a total of 1.4M passwords.
Even though my dictionaries were 10 years old and didn't contain newer words like "linkedin", it appeared that some cracking rules, by reversing strings or removing some vowels could guess new slang words from already cracked passwords.
And as I had just acquired 1.4M valid passwords, I believed that using these newly discovered passwords as a dictionary I could find more. It worked and the rules applied to the already cracked passwords produced 550K new ones. I ran a second iteration using the 550K passwords from the first iteration as a dictionary, and found 22K more. I iterated in this manner a total of ten times.