Breach Hunting 101

June 9, 2018   


I hope this article gives you a basic introduction to the world of ‘breach hunting’, something I have recently come to enjoy (weird, right?). This is by no means an exhaustive list of tools and methodologies, but more a primer to get you going in the exhilarating world of spending hours trawling through other people’s lists, spreadsheets and datasets for the greater good.


With all the different methods, I find it more manageable to split my searches by country. Normally I focus first on the main AWS regions (IE / US / DE), then move on to the smaller AWS regions (GB / CA / AU), and finally to other countries (NL / SE / NO / BE / IN / JP and many more). I often filter CN out of my results, as the high number of devices in China tends to pollute my set.

Take a look at this ranked list of database flavours to see what is worth searching for.


Shodan

The most popular and well-known search engine for finding things rather than content. It has extremely powerful search filters and is well worth the initial fee if you don’t have a .edu email address.

Combine filters to home in on unauthenticated database web admin panels, or try a simple search such as title:"index of /" http.html:"sql", which will return surprising results. Results are limited to 200 pages via the web.
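The per-country split described earlier can be scripted so that each base search is expanded into one query per country. This is only a rough sketch: the function names are made up for illustration, the country batches are the ones listed above, and the country filter and the leading minus for exclusion are assumed to behave as in Shodan-style filter syntax, so check the engine's current filter reference before relying on them.

```python
from typing import Iterable, List

# Search order from the article: main AWS regions, smaller regions, then others.
COUNTRY_BATCHES = [
    ["IE", "US", "DE"],
    ["GB", "CA", "AU"],
    ["NL", "SE", "NO", "BE", "IN", "JP"],
]


def per_country_queries(base_query: str, countries: Iterable[str]) -> List[str]:
    """Return one query per country by appending a country filter."""
    return [f'{base_query} country:"{code}"' for code in countries]


def exclude_countries(base_query: str, excluded: Iterable[str]) -> str:
    """Return a single query that filters out the given countries.

    Assumes a leading minus negates a filter.
    """
    return " ".join([base_query] + [f'-country:"{code}"' for code in excluded])


if __name__ == "__main__":
    base = 'title:"index of /" http.html:"sql"'
    for batch in COUNTRY_BATCHES:
        for query in per_country_queries(base, batch):
            print(query)
    # A catch-all search with China filtered out.
    print(exclude_countries(base, ["CN"]))
```

The script only builds the query strings. You can paste them into the web search or feed them to whatever API client you use.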


ZoomEye

A lesser-known search engine very much like Shodan, run by Chinese security team Knownsec. It has a huge list of predefined dorks submitted by users and, again, very powerful search filters. As both Shodan and ZoomEye are scanning the same Internet, you are going to get overlapping results, but because the results are ordered differently, something on page 200 of Shodan might be on page 2 of ZoomEye.

Google & Other Regular Search Engines

I won’t go into much detail here, as there are many resources out there on Google dorking, but it’s safe to say that Google is still a great resource for finding things that shouldn’t be there. Take the following search, for example: inurl:gov filetype:bash_history

Internet Wide Scan Data

If you can get access to regularly updated Internet-wide scan data, you can start to build up your own index of exposed services and run a diff on the data every few days to see if any new services have appeared online. You could also conduct your own Internet-wide surveys, but be warned: most networks won’t react well to being probed and you’ll spend a lot of time answering abuse emails. Don’t do this from home, as you will end up on a blacklist.
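The diff itself can be very simple. Here is a minimal sketch that assumes each scan snapshot is a CSV with the IP in the first column and the port in the second. Real scan datasets come in many formats, so treat the parsing as a placeholder. The function names and the example addresses (taken from the documentation ranges) are illustrative only.

```python
import csv
from typing import Iterable, List, Set, Tuple

Service = Tuple[str, int]  # (ip, port)


def parse_snapshot(lines: Iterable[str]) -> Set[Service]:
    """Parse 'ip,port' CSV lines into a set of services.

    Rows without a numeric port (such as a header row) are skipped.
    """
    services: Set[Service] = set()
    for row in csv.reader(lines):
        if len(row) < 2:
            continue
        ip, port = row[0].strip(), row[1].strip()
        if not port.isdigit():
            continue
        services.add((ip, int(port)))
    return services


def new_services(previous: Set[Service], current: Set[Service]) -> List[Service]:
    """Return services present in the current snapshot but not the previous one."""
    return sorted(current - previous)


if __name__ == "__main__":
    last_week = parse_snapshot(["ip,port", "192.0.2.10,27017", "192.0.2.11,9200"])
    today = parse_snapshot(["ip,port", "192.0.2.10,27017", "198.51.100.7,5984"])
    for ip, port in new_services(last_week, today):
        print(f"new: {ip}:{port}")
```

An open file handle can be passed straight to parse_snapshot, since csv.reader accepts any iterable of lines.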

Tracking Down and Contacting a Data Owner

OK, so you have found some juicy data and it’s time to disclose this to the organization, but how can you be sure who actually owns this data?

  • SSL certificates. Check other ports on the server for HTTPS or other SSL-enabled services and check the certificate name.
  • Check to see if there are any websites listening on common web ports such as 80 / 8080.
  • OSINT tools. Take a look at SpiderFoot for this; it will reduce your OSINT tasks to a simple button click.
  • Check the data for clues, acronyms or a common theme.
  • Do you see email addresses? If you have an idea who owns this data, then check some of the emails in a ‘forgotten password’ process.
  • Is SSH open on the server? Check the SSH banner for a branded warning message.
  • Troy Hunt also published this in-depth article on streamlining breach disclosures.
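For the certificate check, something like the following could pull the names out of a certificate with nothing but the Python standard library. It is a sketch under a few assumptions: the helper names are made up, hostname checking is turned off on purpose because the whole point is to discover the name, and the chain is still validated, so a self-signed or otherwise untrusted certificate will raise an ssl.SSLError and needs to be inspected another way.

```python
import socket
import ssl
from typing import Dict, List


def names_from_cert(cert: Dict) -> List[str]:
    """Collect organisation, common name and DNS alt names from a getpeercert() dict."""
    names: List[str] = []
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key in ("organizationName", "commonName"):
                names.append(value)
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "DNS":
            names.append(value)
    # Remove duplicates while keeping the original order.
    return list(dict.fromkeys(names))


def fetch_cert(host: str, port: int = 443, timeout: float = 5.0) -> Dict:
    """Fetch the decoded certificate from host:port.

    The chain is validated but the hostname is not, so this works when
    connecting by IP address to a server with a CA-signed certificate.
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


if __name__ == "__main__":
    # Hypothetical certificate contents, shaped like getpeercert() output.
    sample = {
        "subject": ((("organizationName", "Example Corp"),),
                    (("commonName", "db.example.com"),)),
        "subjectAltName": (("DNS", "db.example.com"), ("DNS", "www.example.com")),
    }
    print(names_from_cert(sample))
```

Against a live host you would call names_from_cert(fetch_cert("192.0.2.10", 8443)), with the address and port swapped for the server you are looking at.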

Once you are (mostly) certain you know who owns this data, how do you go about telling them? I find that a simple, short email is the best way to make initial contact. Take a look around their website for a contact email address; a general customer service / support inbox should be a last resort. Look in their T&Cs and privacy policy for a data-related contact. Also check to see if they have a VRP / bug bounty. Don’t be scared to call them on the phone if you need to.

I use the following template for the first email.


Potential Data Breach  


Please pass this email to your IT team ASAP

I believe I have found an unauthenticated database instance routed to the Internet that belongs to <company>. The dataset contains <a rough overview of the contents without disclosing any actual data>

Please can someone get in touch to confirm the technical details.

Best Regards
<email signature>

If this is a high-priority dataset, I’ll normally send 3 emails similar to this over 3 to 5 days. If I don’t get a response, I would consider contacting the media to help; an email from a journalist will often cause a company to jump and get things fixed.

How to (try to) Stay Out Of Trouble

There are always going to be risks involved in this kind of work, and it is up to you whether you want to accept them. I’m not a lawyer and this is not legal advice, though I have a few rules that I always try to follow.

  • Don’t download an entire dataset unless you need to. Avoid copying and storing data.
  • Only query a dataset as much as you need to investigate.
  • Keep a log of your IP, as you may need to pass it on to a company’s security team so they can filter you out of their investigation.
  • Don’t ask for rewards / money / candy in exchange for a disclosure. This is extortion, and you are a bad person.
  • If you think a disclosure is going to pose serious legal risk to yourself, proxy through a journalist or another trusted party as a kind of loose safety net.
  • Personally, I only check for unauthenticated datasets. I don’t test default or common passwords, as this is technically hacking and thus probably illegal in most places.
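For the IP log, even a tiny append-only file is better than trying to remember later. Below is one possible sketch: each action is written as a JSON line with a UTC timestamp. The field names, function names and example addresses are all made up for illustration, and the source IP is passed in by hand rather than detected.

```python
import json
from datetime import datetime, timezone


def log_entry(source_ip: str, target: str, action: str) -> str:
    """Build one JSON log line with a UTC timestamp."""
    return json.dumps({
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "source_ip": source_ip,
        "target": target,
        "action": action,
    })


def append_log(path: str, source_ip: str, target: str, action: str) -> None:
    """Append a log line to the file at path, creating it if needed."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(log_entry(source_ip, target, action) + "\n")


if __name__ == "__main__":
    # Hypothetical values: your own egress IP and the service you looked at.
    print(log_entry("203.0.113.5", "198.51.100.7:9200", "listed index names"))
```

If a security team asks which traffic was yours, the file gives them an address and a time window to filter on.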