I like to thank Dale Johnstone and Meng-Chow Kang for their useful comments and suggestions.
Recent security breaches further confirm that a password alone is not an effective security measure. An increasing number of cyber attackers are going after the password storage, be it a file or a database. Password storage once exposed, results in an impact that has been shown to be catastrophic. Password storage facilities today need to designed on the basis that a password storage mechanism will be compromised by a malicious party who will get a copy of the password credentials. In both the Target and eBay cases, it is hacking. In other cases, it may be the result of a lost backup tape or as a result of an insider exploiting their authorised access for unauthorised purposes. Through the application of the defense in depth principle, the formulation of an additional layer of control to mitigate the risk of a compromised password storage is becoming more critical.
RSA tried to tackle this issue by using threshold cryptography, which protects through the splitting of an individual password across two servers, addressing the risk of a single point of compromise at any one server. This may be a solution to the rich man. However, I am going to suggest a DIY way for the more cost conscious and tech-savvy to implement this additional layer of defence.
A better strategy would be that we DO NOT store all input with the password hash. Making it less feasible to automate a password cracking system. When a password is required to be retrieved through a manual process that requires human interactions, it changes the game completely by making it extremely resource intensive to scale up an attack. It is easy to add hundreds of servers while it will be cost prohibitive to recruit a hundred computer operators to perform password screening.
Best Practises of Password Storage
So what do the cyber attackers get after gaining access to your system? When a cyber attacker gains control of a system, they will likely make a copy of your security configuration files, including database files, password file, private keys and logs. From these files, a cyber attacker obtains full knowledge of :
- The password hash
- Username, email, phone number, DoB etc
- User login records and transaction records (like invoices)
- Password reset questions plus answers
With the above information, a cyber attacker could launch Brute force attack, Dictionary attack and Birthday attacks. There is an excellent article on how to defend against these attacks, “How to encrypt user passwords?” provides 6 rules on how to store passwords securely as follows:
- Encrypt passwords using one-way techniques, this is, digests
- Match input and stored passwords by comparing digests, not unencrypted strings
- Use a salt containing at least 8 random bytes, and attach these random bytes, undigested, to the result
- Iterate the hash function at least 1,000 times
- Prior to digesting, perform string-to-byte sequence translation using a fixed encoding, preferably UTF-8
- Finally, apply BASE64 encoding and store the digest as an US-ASCII character string
Assume a developer follows these rules strictly and correctly, what are the risks when a cyber attacker gets a copy of the password storage and user information? The password is still safe with cryptography sound hashing, right?
No. These six rules offer good protection if the following assumptions hold:
- The password is complex enough
- The password is stored totally independent of the user’s personal details, like Date-of-Birth, spouse’s name, address etc
A brute force attack is still a very efficient way to uncover passwords like “123456” or “imapassword”!! If the user creates a password using a Date-of-Birth or home street name, then a cyber attacker could generate a set of possible password combinations using the user’s personal information. A dictionary attack is still a high risk. Users who create a password using personal information are the low hanging fruit. “PaulLivein_Florance” is a complex password in mathematical sense but is not complicated at all when hacking is involved.
Let us assume you are tasked to decrypt a password using the above-mentioned six rules. What will you do? Here are some examples:
1. Find the shared parameter using a few simple passwords “123456” or “abcd1234”. The shared parameters will be the hash iteration count shown above in rule 4, minimum, maximum password length.
2. Optimise the decryption mechanism with knowledge from step 1 and harvest ALL simple passwords.
3. For the remaining passwords, create hash using combinations of personal information (i.e “DoB_name”, “NameInCity” etc). Then match them with the stolen password storage.
With these three simple steps, most likely there are enough user/password pairs that will keep a cyber attacker busy for a few weeks. The cyber attacker may also possibly start profiting from it! While the cyber attacker is busy shopping using stolen information, the decryption process does not stop. It will continue to yield golden eggs everyday.
Applying defense in depth, an additional layer of defence
When password storage is compromised, our only defence boils down to protecting the secret by making it computational expensive to transform and reveal passwords in plain text. This worked in the past without cloud computing and GPU arrays. The Bitcoin gold rush has already created cloud based infrastructure for GPU mining. Relying on a computational expensive hash process alone is not the best strategy.
My suggestions will involve changing rule 3 and rule 4.
The above 6 rules focus on hashing and making it computational intensive. But these 6 rules ignore that the good guys has an advantage over the cyber attacker and do not make effective use of it. Correct password attributes give us an advantage over the cyber attacker.
For rule 3, the simple and traditional way is adding the salt and password directly. But we do not need to store the random salt side by side with the password! We store the password hash and the salt totally separate from each other in a manner where there is no logical relationship between the password hash and the salt. The good guys will associate which salt is to be used with which password hash using user input, which could be the answer to a security question or the sum of ASCII code of a user’s password.
Let me provide an example:
When user enters his password “Pass123”, these characters are converted to a number using a formula like ASCII(P)x10+ASCII(a)x100+ASCII(s)x1000+ASCII(s)x10000 etc.. which results in “566175500”. Let us call this number the salt index. Then the system will use the salt at location 566175500 as an input for the hash function.
How the formula is designed is not important. Anyone one can create one, as long as it maps the user-entered password string to a large number with few collisions. However, collisions will happen when two passwords generate the same number. Which will also mean two passwords are using the same salt. Using the above example formula, “Pass123” and “Pkrs123” both give the same salt index “566175500”. So the same salt is used for generating the password hash “Pass123” and “Pkrs123”. Collisions happen since we are using password attributes (like length, its ASCII code etc.) to generate salt index. This collision gives us another advantage over one-to-one mapping.
As the cyber attacker does not know the correct password, even the attack needs to determine the formula to generate the salt index and then spend time to cracking the passwords. The password cracking output will give multiple passwords for one user. Among this multiple possible passwords, which one is used by the user? “Paul@2014” is more likely to be a password than “Pkvl@2014”, right? To distinguish the linguistic differences and make a right guess, will require human cognitive brain. A computer program will not be able to tell immediately, without help from some linguistic algorithm. This is the game changer I mentioned earlier in the article. Making it less feasible to automate a password cracking system. An attacker trying to retrieve password from a stolen password storage will need to go through some manual steps that requires human interactions, it changes the game completely by making it resource intensive to scale up an attack. We choose the right battlefield to setup our defense.
Another advantage is of this approach is to prevent possible spillover when users reuse password in multiple websites/applications. When eBay has a data breaches and possible user password leakage, the real risk is on users reusing same eBay password in their email account or bank account. With this suggest method, the attacker will find a list of possible password instead of one. It is less likely for the attacker to gain immediately access to other websites/applications belonging to the same user.
I have to admit the headline statement is a bit exaggerating and it is not entirely hack proof, but engineering a mechanism to pinpoint a weakness of password cracking by adding manual interpretations certainly introduces some advantages.