Finding a needle in a haystack – Lyft Engineering


Secrets don’t belong in source code. At Lyft we use a secret management system (Confidant) to keep secrets out of our repositories, and we also want to ensure no one accidentally adds secrets to them. Manually auditing for this is laborious, so we wanted automated tests that fail if a pull request introduces a secret.

OpenStack recently released Bandit, a static analyzer that traverses abstract syntax trees (ASTs) of Python code. Bandit looked like a solid base for building an automated Python analysis tool, and its architecture supports plugins, so we created a Bandit plugin for identifying secrets in source code.

Our plugin, bandit-high-entropy-string, captures strings from the AST and attempts to detect whether they are secrets. Most secrets are randomly generated (have high entropy), so unlike other risky strings, such as dangerous SQL statements, they have no fixed pattern to match and are difficult to find in an automated way. The high entropy itself is something we can use as a marker, though.

As an example, let’s look at the following code:

In the above, when the AST is parsed, the plugin captures strings found in assignments, function definitions, function calls, comparisons, dicts, lists, and so on. For each captured string, it looks at the surrounding context to bump the confidence level up or down.

If a string is being assigned to a variable that looks like it would store a secret, it gets higher confidence. If it’s being assigned to a variable that looks safe, or is inside a function call that looks safe, it gets lower confidence. We also lower the confidence on strings that look like common patterns in Python, such as filesystem paths, URLs, or Flask routes.
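To make the context heuristics concrete, here is a minimal sketch using only Python's standard `ast` module. It is not the plugin's actual code: the sample source, the hint list `SECRET_HINTS`, and the helpers `score_context` and `find_assigned_strings` are all hypothetical names chosen for illustration, and it only handles simple assignments rather than the full set of node types the plugin captures.

```python
import ast

# Hypothetical input to scan. The token is a made-up placeholder, not a real secret.
SAMPLE = '''
api_token = "q8Zr2LxV0pTn7WcY4sKd"
log_path = "/var/log/app.log"
homepage = "https://example.com/index"
'''

# Assumed substrings that suggest a variable stores a secret.
SECRET_HINTS = ("secret", "token", "password", "key")


def score_context(name, value):
    """Return a confidence adjustment based on the variable name and string shape."""
    score = 0
    if any(hint in name.lower() for hint in SECRET_HINTS):
        score += 1  # the variable name looks like it stores a secret
    if value.startswith(("/", "./", "http://", "https://")):
        score -= 1  # the string looks like a file path or a URL
    return score


def find_assigned_strings(source):
    """Collect (name, string, adjustment) for string literals assigned to plain names."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        if not (isinstance(value, ast.Constant) and isinstance(value.value, str)):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name):
                adjustment = score_context(target.id, value.value)
                results.append((target.id, value.value, adjustment))
    return results


for name, value, adjustment in find_assigned_strings(SAMPLE):
    print(name, repr(value), adjustment)
```

In this sketch the name-based and shape-based adjustments are simply summed; a real tool would likely need a richer set of rules and node types.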

We first assess the confidence of a string based on its context, and then we use Dropbox’s zxcvbn library to determine the entropy of each string. If the string’s total entropy is high, we increase the confidence; if the entropy per character (total entropy / string length) is also high, we increase the confidence some more.
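The two-step entropy bump can be sketched as follows. Note that this example swaps in a plain Shannon entropy estimate from the standard library instead of zxcvbn, so the numbers will differ from what the plugin computes. The function names and both thresholds are assumptions picked for illustration, not values taken from the plugin.

```python
import math
from collections import Counter


def shannon_entropy_per_char(s):
    """Shannon entropy in bits per character, from the string's own character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    counts = Counter(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_confidence(s, total_threshold=60.0, per_char_threshold=3.0):
    """Return 0, 1, or 2: one bump for high total entropy, one for high entropy per character.

    Both thresholds are assumed values for this sketch.
    """
    per_char = shannon_entropy_per_char(s)
    total = per_char * len(s)  # total entropy = per-character entropy * length
    bump = 0
    if total > total_threshold:
        bump += 1
    if per_char > per_char_threshold:
        bump += 1
    return bump


print(entropy_confidence("aaaaaaaa"))              # repeated character, no bump
print(entropy_confidence("q8Zr2LxV0pTn7WcY4sKd"))  # 20 distinct characters, both bumps
```

Checking the per-character value separately from the total helps distinguish a short, dense random token from a long string that merely accumulates entropy through length.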

Right now the plugin has relatively low noise and good signal when Bandit is run so that it reports only high-confidence issues (-iii). It’s not perfect by any means, though, so please give it a try, open issues, and send in pull requests for improvements!

Thanks to the OpenStack security team for giving us help and feedback on the plugin and, of course, for writing Bandit. Thanks also to the Dropbox product security team for feedback on docs and signal/noise issues.

Interested in open source work and having a big impact? Lyft is hiring! Drop me a note on Twitter or at ryan.lane@lyft.com.
