Estimating Password Strength

We all struggle with passwords, but why are they so difficult? The fundamental problem with passwords can be stated in the following way:

Passwords must be unpredictable

AND

Passwords must be remembered by users.

These two requirements are in conflict. A password is unpredictable only to the extent that it contains randomness, and random characters are difficult to remember. Password policies often require users to include numbers, uppercase letters, and special characters in their passwords, which makes the resulting strings less meaningful and harder to remember.
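As a rough illustration of why randomness and memorability pull in opposite directions, the sketch below generates a uniformly random password and computes how many bits of unpredictability it carries. The function names and the choice of alphabet are assumptions made for this example. The bit count only holds when every character is chosen uniformly at random, which is not how people choose passwords.

```python
import math
import secrets
import string

# Hypothetical alphabet for this example: letters, digits, and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def random_password(length, alphabet=ALPHABET):
    """Draw each character independently and uniformly at random."""
    return "".join(secrets.choice(alphabet) for _ in range(length))

def random_password_bits(length, alphabet_size):
    """Bits of unpredictability for a uniformly random password.

    This does NOT apply to human-chosen passwords, which are far
    more predictable than their length and character set suggest.
    """
    return length * math.log2(alphabet_size)

if __name__ == "__main__":
    pw = random_password(8)
    print(pw, round(random_password_bits(8, len(ALPHABET)), 1), "bits")
```

A string produced this way is hard to guess and, for most people, equally hard to remember.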

However, this is not the whole story. A set of passwords is only truly strong if it is unpredictable to an attacker, and passwords are guessed by bots, not fellow humans. Even if a password appears random, an attacker could already know it from a data breach. To explore this, my dissertation estimated password strength by using machine learning to build models from the passwords leaked in data breaches.
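To make the idea of strength against an attacker concrete, here is a deliberately simple sketch. It is not the model from the dissertation: it only ranks passwords by how often they appear in a leaked list and reports how many guesses an attacker following that ranking would need. The function names and the tiny sample list are made up for illustration.

```python
from collections import Counter

def build_guess_order(leaked):
    """Order distinct passwords from most to least frequent in the leak.

    Ties are broken alphabetically so the order is deterministic.
    """
    counts = Counter(leaked)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pw for pw, _ in ranked]

def guess_number(password, guess_order):
    """Return the 1-based guess at which the password is tried, or None."""
    try:
        return guess_order.index(password) + 1
    except ValueError:
        return None  # never guessed by this (very limited) attacker

# Made-up stand-in for a breach corpus.
leaked_sample = ["123456", "password", "123456", "qwerty",
                 "password", "123456", "letmein"]
order = build_guess_order(leaked_sample)

if __name__ == "__main__":
    for candidate in ("123456", "password", "Xq7#pL2v"):
        print(candidate, guess_number(candidate, order))
```

A lookup like this only covers passwords that have already leaked. A learned model goes further by assigning guess numbers to passwords it has never seen.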

Billions of passwords have been leaked from over 600 websites (as of this writing), rendering many human-chosen passwords insecure. In addition, machine learning can mimic how humans generate passwords, making passwords we will create in the future insecure as well.

The chart below shows how the model I developed performed on a random sample of 1000 passwords created under an 8-character password policy. Passwords created under this policy are easy to predict, which makes it a poor choice compared to other policies. Much of my work focused on finding better password policies. Our research group also used this framework to evaluate the passwords of over 25,000 students, faculty, and staff at CMU, providing further evidence of the strength (or weakness) of human-chosen passwords.

[Interactive chart: model performance on a random sample of 1000 passwords created under an 8-character password policy]
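A chart of this kind can be summarized as the fraction of a password sample that is guessed within a given number of guesses. The sketch below shows one way to compute such points, assuming each password in the sample has already been assigned a guess number (or None if the model never guesses it). The guess numbers here are invented for illustration and are not results from the dissertation.

```python
def fraction_guessed(guess_numbers, budget):
    """Fraction of the sample guessed within `budget` guesses.

    `guess_numbers` holds one entry per password: an integer guess
    number, or None if the password was never guessed.
    """
    if not guess_numbers:
        return 0.0
    cracked = sum(1 for g in guess_numbers if g is not None and g <= budget)
    return cracked / len(guess_numbers)

# Invented guess numbers for a sample of ten passwords.
sample_guess_numbers = [3, 120, None, 45_000, 7, None, 2_000_000, 15, 980, None]

if __name__ == "__main__":
    for budget in (10, 1_000, 1_000_000, 10**9):
        share = fraction_guessed(sample_guess_numbers, budget)
        print(f"{budget:>13,} guesses: {share:.0%} guessed")
```

Plotting these fractions against the guess budget on a logarithmic axis gives a guessing curve. A policy whose curve rises quickly is easier to predict.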

My dissertation is available here, and the code for the framework I developed can be found on GitHub.