Open Source Pwned Passwords with FBI Feed and 225M New NCA Passwords is Now Live!
In the last month, there were 1,260,000,000 occasions where a service somewhere checked a password against Have I Been Pwned's (HIBP's) Pwned Passwords API. 99.7% of the time, that check went no further than one of hundreds of Cloudflare edge nodes spread around the world (95% of the world's population is within 50ms of one). It looks like this:
There are all sorts of amazing Pwned Passwords use cases out there. For example, the Hims personal wellness website:
Or literally thousands of other services doing everything from providing their own password checker through to checking their customers' passwords on every registration, login or password change to see if it's previously been breached. And per the network request in the above image, every single password check is done using the k-anonymity model I launched back in 2018 to ensure that privacy is maintained and passwords can safely be checked without risk of disclosure. Oh - and it's all 100% free from top to bottom 😊
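To make the k-anonymity model concrete, here is a minimal Python sketch of a range query. It assumes the public range endpoint at api.pwnedpasswords.com, which takes the first 5 hex characters of the SHA-1 hash and returns SUFFIX:COUNT lines; the function names and the User-Agent string are made up for illustration. Only the 5-character prefix ever leaves the machine, and the match against the remaining 35 characters happens locally.

```python
import hashlib
import urllib.request
from typing import Tuple

# Public range endpoint; only a 5-character hash prefix is ever sent to it.
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password: str) -> Tuple[str, str]:
    """Return (first 5 hex chars, remaining 35) of the uppercase SHA-1 hash."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix: str, body: str) -> int:
    """Look for our suffix among the 'SUFFIX:COUNT' lines of a range response."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix and count:
            return int(count)
    return 0  # suffix not present: the password is not in the corpus


def pwned_count(password: str) -> int:
    """Query the range API and return how often the password has been seen."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "pwned-passwords-sketch"},  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    return count_in_range(suffix, body)
```

Calling pwned_count("some password") returns the prevalence count, or 0 if the password is not in the data set. A service would typically reject or warn on any non-zero result.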
Today, I'm really excited to mark a major milestone in the project thanks to the support of two of the world's foremost law enforcement agencies, the FBI and the NCA.
Open Source Code and FBI Ingestion Pipeline
This has been a long time coming, and it's finally here. Today. Now. I mean it's literally live and working as you read this 😎
Last year I wrote about my intention to begin open sourcing parts of HIBP. I'm only able to run this project due to support from the community, so I wanted to start giving it back to the public in a bid to make it more open, more sustainable and in turn, more valuable to every single one of you using it. I made the decision to begin with Pwned Passwords and in May, I transitioned it into the .NET Foundation and announced we'd be building an ingestion pipeline. This pipeline enables the ingestion of passwords from law enforcement agencies, like the FBI. The premise is simple: during the course of their investigations, they come across a lot of compromised passwords and if they were able to continuously feed those into HIBP, all the other services out there using Pwned Passwords would be able to better protect their customers from account takeover attacks.
Fast forward to now and that ingestion pipeline is finally live. If you're using the Pwned Passwords API to check passwords, you're already benefiting; every new password added to the service will automatically be checked each time you call that API. Further, passwords already in the service are having their prevalence value updated to ensure you know just how bad those passwords really are.
I want to acknowledge Stefán Jökull Sigurðarson's role in making this possible. He stepped up and coordinated the community, worked with our FBI contacts, upgraded the tech stack to the latest and greatest versions and brought this whole thing to reality. He volunteered his time to make this possible and I'm enormously grateful for that, thanks mate 🙏
All of this alone would be awesome in and of itself, but as they say, there's just one more thing...
225M More Passwords From the NCA
The UK's National Crime Agency has done some wonderful work over the years to combat cybercrime. Back when I could travel, I'd often catch up with NCA folks in London and it was always fascinating to get just a little glimpse into how they were tackling things in that corner of the world. I used to show a short video of theirs at the beginning of many of my talks; it's titled Teenage Cybercrime: Help your child make the right choices and it formed part of their #CyberChoices campaign. I'd run it up front whilst people were filtering into the room because it was fun and light-hearted, but it told a serious story. There are a bunch of really smart kids out there and they find themselves at a crossroads where they could easily go down the wrong path with computer crimes but could just as easily be steered in a direction which may produce a wonderful career for them. The NCA wanted to help parents identify when kids may be at that crossroads and steer them in the right direction.
But let's get back to passwords: A little while back I was having a chat with some NCCU folks (the NCA's National Cyber Crime Unit), and talk turned to passwords. Turns out that like the FBI, they come across rather a lot of them and they had a very large corpus (as in hundreds of millions) they believed weren't already in HIBP. Now, keep in mind that before today's announcement, there were already 613M of them in the live Pwned Passwords service (and many millions more in my local working copy waiting for the next release), so the NCA's corpus represented a significant increase in size. Working in collaboration with the NCA, I imported the data set and parsed it against the existing passwords, finding 225,665,425 completely new instances out of a total set of 585,570,857. As such, this whole set (along with other sources I'd been accumulating since November last year) has all been rolled into a final version of the manually released Pwned Passwords data.
So, what does this mean mechanically? Two things: firstly, every single one of those NCA passwords is now searchable in the live API. Want to check? Here are some new passwords in the data set:
Secondly, as with the previous 7 releases, version 8 is downloadable as both SHA-1 and NTLM hashes (read back to the original release blog post if you're wondering why passwords for this service are stored in a fashion we'd never do with normal user credentials). You can download these, query them offline or build your own private Pwned Passwords which is now easier than ever as all the code is open source 😊
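For anyone querying the download offline, a minimal Python sketch of a SHA-1 lookup might look like the following. It assumes the extracted file is plain text with one HASH:COUNT entry per line; the function name and the file name in the usage comment are hypothetical. It does a simple linear scan, which is slow on a corpus this size, so a real deployment would more likely load the hashes into a database or binary search the version ordered by hash.

```python
import hashlib
from typing import Iterable


def offline_count(password: str, lines: Iterable[str]) -> int:
    """Scan 'HASH:COUNT' lines for the SHA-1 hash of the given password."""
    target = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    for line in lines:
        digest, _, count = line.strip().partition(":")
        if digest.upper() == target and count:
            return int(count)
    return 0  # hash not found in the corpus


# Usage against an extracted download (hypothetical file name):
# with open("pwned-passwords-sha1.txt", encoding="ascii") as corpus:
#     print(offline_count("password", corpus))
```

Taking any iterable of lines keeps the function easy to test with a handful of sample entries before pointing it at the full file.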
In The NCA's Words
I asked the NCA if there's anything they'd like to add here, and they were kind enough to provide me with the following:
The UK National Crime Agency’s (NCA) mission is to protect the public by leading the UK’s fight to cut serious and organised crime. Within the cyber arena, their National Cyber Crime Unit (NCCU) works proactively to identify members of the public who could be at risk of harm, whether through the fraudulent use of personal details stolen via cyber offences, or through cyber-attacks enabled by credentials held by criminal groups.
The NCCU's Mitigation@Scale team have, over the last few years, been engaging with companies who they identified as having had user account details compromised or stolen, to help them secure these accounts and protect members of the public from further victimisation.
During recent NCA operational activity, the NCCU’s Mitigation@Scale team were able to identify a huge amount of potentially compromised credentials (emails and associated passwords) in a compromised cloud storage facility. Through analysis, it became clear that these credentials were an accumulation of breached datasets known and unknown.
The fact that they had been placed on a UK business’s cloud storage facility by unknown criminal actors meant the credentials now existed in the public domain, and could be accessed by other 3rd parties to commit further fraud or cyber offences.
Because the credentials identified could not be attributed to any one company or platform, the NCCU engaged with Troy Hunt, the CEO and creator of the ‘Have I Been Pwned’ (HIBP) website. The NCCU’s Mitigation@Scale team conducted a comparison of the compromised data against the HIBP password repository to identify any previously unseen passwords now in the public domain.
As a result of this activity, over 225 million compromised passwords previously unseen by HIBP were provided by the NCA to HIBP for incorporation into their password repository, allowing them to be checked by individuals and companies worldwide seeking to verify the security risk of a password before usage, supporting the NCA’s mission to protect the public from cyber criminality.
K-anonymity API vs. Downloadable Corpus
The k-anonymity API is the fastest possible way of getting up and running with Pwned Passwords. Not just in terms of upfront effort, but because of the nature of Cloudflare and the proximity of their edge nodes to major data centres, it's blisteringly fast. The k-anonymity model was already awesome for privacy and I made it even better again last year with the introduction of padding which protects against an MitM drawing conclusions based on TLS packet sizes. But perhaps most importantly, the API is the best way to pick up newly identified compromised passwords because you don't have to do anything to get access to them!
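As a rough illustration of the padding feature, the Python sketch below builds a padded range request and strips the padding from a response. It assumes the API's Add-Padding request header and the convention that padding entries carry a count of 0; the helper names and the User-Agent string are made up for this example.

```python
import urllib.request
from typing import Dict


def build_padded_request(prefix: str) -> urllib.request.Request:
    """Build a range request that asks the API to pad the response."""
    return urllib.request.Request(
        "https://api.pwnedpasswords.com/range/" + prefix,
        headers={
            "Add-Padding": "true",  # ask for a padded response
            "User-Agent": "pwned-passwords-sketch",  # hypothetical name
        },
    )


def real_entries(body: str) -> Dict[str, int]:
    """Parse 'SUFFIX:COUNT' lines, discarding padding (count of 0)."""
    entries: Dict[str, int] = {}
    for line in body.splitlines():
        suffix, _, count = line.strip().partition(":")
        if not suffix or not count:
            continue  # skip blank or malformed lines
        if int(count) > 0:
            entries[suffix.upper()] = int(count)
    return entries
```

Because padding varies the response size, an observer watching encrypted traffic learns less about which prefix was requested, while the caller simply ignores the zero-count lines.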
The downloadable corpus, on the other hand, is... substantial. If you want the latest set in SHA-1 format ordered by prevalence, that's a 17.2GB zip file that expands out to more than double that. Part of the reason it's so large is that it's all passwords I've seen; short ones, long ones, sequential ones - whatever - the point is it's not filtered down at all. I've had many requests from people asking me to bundle everything up into separate zips such that they only contain passwords longer than [insert your own personal preference here]. It's not going to happen and it's not a problem anyway if you use the k-anonymity API. However, if you really want to download them and process them in a more optimised fashion, check out Scott Helme's recent blog post on how he used Count-Min Sketch on Pwned Passwords.
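For readers unfamiliar with the data structure, here is a generic, minimal Count-Min Sketch in Python. It is not Scott Helme's implementation; the class name, the default width and depth, and the use of SHA-256 to derive row positions are all assumptions chosen for illustration. The idea is to trade exactness for a fixed memory footprint: estimates can over-count when items collide, but they never under-count.

```python
import hashlib
from typing import Iterator, Tuple


class CountMinSketch:
    """Fixed-size table of counters giving approximate frequency estimates."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _positions(self, item: str) -> Iterator[Tuple[int, int]]:
        # Derive one column per row by salting the hash with the row index.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        """Add count to the item's counter in every row."""
        for row, column in self._positions(item):
            self.rows[row][column] += count

    def estimate(self, item: str) -> int:
        """Return the smallest of the item's counters (an upper bound)."""
        return min(self.rows[row][column] for row, column in self._positions(item))
```

Loading the corpus would mean calling add(hash, count) for each HASH:COUNT line, after which estimate(hash) gives an approximate prevalence without keeping every hash in memory. The width and depth control the trade-off between memory use and over-counting.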
Lastly, as of right now, the code to take the ingestion pipeline and dump all passwords into a downloadable corpus is yet to be written. We want to do this - we have every intention of doing this - but given how long it frequently was between releases, we don't feel the need to rush. The priority has been building the ingestion pipeline and providing new passwords via the k-anonymity API and as of now, that's working beautifully.
Today's release brings the total Pwned Passwords count to 847,223,402, a 38% increase over the last version. More significantly, if we take the prevalence counts into consideration that's 5,579,399,834 occurrences of a compromised password represented in this corpus.
But more importantly, today's release is about turning on the firehose of new passwords and making them immediately available to everyone for free. Having this open to the community, owned by the community and supported by the FBI and NCA is an enormously pleasing result, and I couldn't be happier than to end the year on this note 😊