Smart searching with googleDorking
“googleDorking,” also known as “Google hacking”, is a technique used by newsrooms, investigative organisations, security auditors and tech-savvy criminals to query various search engines for information hidden on public websites and vulnerabilities exposed by public servers. Dorking is a way of using search engines to their full capacity to penetrate web-based services to depths that are not necessarily visible at first.
All you need to carry out a googleDork is a computer, an internet connection and knowledge of the appropriate search syntax.
This guide will describe what googleDorking is and how it works across different search engines, provide tips on how to protect yourself while googleDorking and suggest ways to protect your websites and servers from those who would use these techniques for malicious purposes.
History
A brief history of the googleDork
googleDorking has been in documented use since the early 2000s. Like many of the most successful hacks, googleDorking is not technically sophisticated. It simply requires that you use certain operators — special key words supported by a given search engine — correctly and sometimes creatively. Johnny Long, aka j0hnnyhax, was a pioneer of googleDorking. Johnny first posted his definition of the newly coined term in 2002:
Johnny Long's 2002 definition of a googleDork.
In a 2011 interview, Johnny Long said, “In the years I've spent as a professional hacker, I've learned that the simplest approach is usually the best. As hackers, we tend to get down into the weeds, focusing on technology, not realizing there may be non-technical methods at our disposal that work as well or better than their high-tech counterparts. I always kept an eye out for the simplest solution to advanced challenges.”
Rather than an ordinary type of search query that focuses on a semantic way of asking questions, either directly through writing the whole question or selected key words, googleDorking is based on reverse engineering the way machines scan and index web content.
In this context, googleDorking uses search functions beyond their semantic role, which not only changes how we typically imagine using search engines, but also vastly expands the capacity of the tool in the hands of people searching for a way of exploring content and access to various services.
Such access might lead to the discovery of information that can be used for fraud or terrorism, finding information on yourself or your institution, as well as information that assists in the investigation of governments, corporations or powerful individuals. These results, rather than being characteristic of the tool or method itself, instead rely on the intentions of those using googleDorking, the questions they are asking, and what they do with the results.
Dorking exposes vulnerabilities and also unleashes the unintended, often powerful, consequences of searching search engines.
To dork or not to dork
If you are thinking about using googleDorking as an investigative technique, there are several precautions to take. Although you are free to search at will on search engines, accessing certain webpages or downloading files from them can be a prosecutable offense, especially in the United States under the extremely vague and overreaching Computer Fraud and Abuse Act (CFAA). Moreover, if you're dorking in a country with heavy internet surveillance (i.e. any country), it's possible that your searches could be recorded and used against you in the future.
As protection, we recommend using the Tor Browser or Tails when googleDorking on any search engine. Tor masks your internet traffic, divorcing your computer's identifying information from the webpages that you are accessing. Security-in-a-Box includes detailed guides on how to use the Tor Browser on Linux and on Windows. Using Tor will often make your searches more difficult. Google and other search engines might ask you to solve captchas to prove you're human. If your Tor exit node has recently been overrun with bots, search engines might block your searches entirely. In this case, you should refresh your Tor circuit until you connect to an exit node that's not blacklisted. To do so, click the onion icon in the upper-left hand corner of the browser and select “New Tor Circuit for this Site,” as shown below.
Please note that, depending on what country you are in, using Tor might flag your online activity as suspicious. This is a risk you must be willing to take when using Tor, though you can mitigate that risk to some extent by using a Tor Bridge with an obfuscated pluggable transport. Unless you are specifically targeted by an advanced attack, however, the Tor Browser is quite good at preventing anyone from associating your online identity with the websites you visit or the search terms you enter. If you cannot use Tor, you might want to find a VPN provider that you trust and use it with a privacy-aware search engine, such as DuckDuckGo.
If you decide to proceed with an investigation that involves googleDorking, the remainder of this guide will help you get started and provide a comparison of supported dorks across search engines as of March 2017.
How it works
Dorking can be employed across various search engines, not just on Google. In everyday use, search engines like Google, Bing, Yahoo, and DuckDuckGo accept a search term, or a string of search terms and return matching results. But search engines are also programmed to accept more advanced operators that refine those search terms. An operator is a key word or phrase that has particular meaning for the search engine. Operators include things like “inurl”, “intext”, “site”, “feed”, “language”, and so on. Each operator is followed by a colon which is followed by the relevant term or terms (with no space before or after the colon).
A googleDork is just a search that uses one or more of these advanced techniques to reveal something interesting.
These operators allow a search to target more specific information, such as certain strings of text in the body of a website or files hosted on a given url. Among other things, a googleDorker can locate hidden login pages, error messages that give away too much information and files that a website administrator might not realise are publicly accessible.
Not all advanced search techniques rely on operators. For example, including quotation marks around text prompts the engine to search for only the exact phrase in quotes. Using an all-caps “OR” between search terms prompts the engine to return results with one term or the other.
A simple example of a dork that does rely on an operator might be:
site:tacticaltech.org filetype:pdf
This googleDork will search https://tacticaltech.org for all PDF files hosted under that domain name.
Another example might look something like this:
inurl:exposing inbody:invisible
If the search term contains multiple words, they should be surrounded by quotation marks:
intext:exposing intitle:“the invisible”
Dorks can also be paired with a general search term. For example:
exposing feed:rss
or
exposing site:tacticaltech.org filetype:pdf
Here, “exposing” is the general search term, and the operators “site” and “filetype” narrow down the results returned.
Example search results are shown below:
A similar search on https://exposingtheinvisible.org turns up no documents, showing us that there are no public PDFs hosted on that website:
You can use more than one operator, and the order generally does not matter. However, if your search isn't working, it wouldn't hurt to switch around operator names and test out the different results.
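The syntax rules described in this section (an operator name, a colon with no space on either side, and quotation marks around multi-word values) can be sketched as a small helper. This is only an illustration: the function name build_dork and its arguments are hypothetical and not part of any search engine's tooling.

```python
def build_dork(*terms, **operators):
    """Join general search terms and operator:value pairs into one query.

    Hypothetical helper for illustration; it only formats a string.
    """
    def quote(value):
        # Values containing spaces are wrapped in quotation marks.
        return f'"{value}"' if " " in value else value

    parts = list(terms)
    # No space is placed before or after the colon.
    parts += [f"{name}:{quote(value)}" for name, value in operators.items()]
    return " ".join(parts)


print(build_dork("exposing", site="tacticaltech.org", filetype="pdf"))
print(build_dork(intext="exposing", intitle="the invisible"))
```

Because operator order generally does not matter, the helper simply keeps the order in which the operators are passed.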
Dorking for Dummies
There are many existing googleDork operators, and they vary across search engines. To give you a general idea of what can be found, we have included four dorks below. Even if two search engines support the same operators, they often return different results. Replicating these searches across various search engines is a good way to get a sense of those differences. (You might also want to have a look at our Dorking operators across Google, DuckDuckGo, Yahoo and Bing table below.)
As you explore these searches, you might locate some sensitive information, so it's a good idea to use the Tor Browser, if you can, and to refrain from downloading any files. (In addition to legal issues, it's good to keep in mind that random files on the internet sometimes contain malware. Always download with caution.)
Example 1: Finding budgets on the US Homeland Security website
This dork will bring you all Excel spreadsheets that contain the word “budget”:
budget filetype:xls
The “filetype” operator does not recognise different versions of the same or similar formats (e.g. doc vs. docx, xls vs. xlsx vs. csv), so each of these formats must be dorked separately:
budget filetype:xlsx OR budget filetype:csv
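Since each format has to be named separately, a query covering several extensions can be assembled mechanically. The sketch below assumes the all-caps OR behaves as described earlier in this guide; the function name filetype_variants is hypothetical.

```python
def filetype_variants(term, extensions):
    """Build one query that ORs together a filetype search per extension."""
    return " OR ".join(f"{term} filetype:{ext}" for ext in extensions)


# One clause per spreadsheet format, joined with an all-caps OR.
print(filetype_variants("budget", ["xls", "xlsx", "csv"]))
```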
This dork will bring you all publicly-accessible PDF files on the NASA website:
site:nasa.gov filetype:pdf
This dork will bring you all publicly-accessible xls spreadsheets with the word “budget” on the United States Department of Homeland Security website:
budget site:dhs.gov filetype:xls
That final query, performed across various search engines, will return different results, as illustrated below:
On Google, we had to solve a captcha:
Bing
Yahoo
DuckDuckGo
As you can see, results vary from engine to engine. Importantly, the DuckDuckGo query does not return correct results. However, using the filetype operator on its own does return correct results, just not targeted to the dhs.gov website.
But using the ext operator, which serves the same purpose on DuckDuckGo, does return results targeted to the dhs.gov website.
You will have to investigate quirks like this as you proceed.
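To compare engines side by side, it can help to generate the search URL for each one from a single dork. The sketch below makes two assumptions that you should verify yourself: the URL patterns in SEARCH_URLS are the commonly seen public search endpoints and may change, and the swap from filetype: to ext: on DuckDuckGo simply mirrors the quirk described above. The code only prints URLs; open them in the Tor Browser if you follow the earlier advice.

```python
from urllib.parse import urlencode

# Assumed URL patterns (base address, query parameter name); verify before use.
SEARCH_URLS = {
    "google": ("https://www.google.com/search", "q"),
    "bing": ("https://www.bing.com/search", "q"),
    "yahoo": ("https://search.yahoo.com/search", "p"),
    "duckduckgo": ("https://duckduckgo.com/", "q"),
}


def search_urls(query):
    """Return a dict mapping each engine name to a search URL for the query."""
    urls = {}
    for engine, (base, param) in SEARCH_URLS.items():
        engine_query = query
        if engine == "duckduckgo":
            # Mirrors the quirk noted in this guide: ext: worked where
            # filetype: combined with site: did not.
            engine_query = engine_query.replace("filetype:", "ext:")
        urls[engine] = f"{base}?{urlencode({param: engine_query})}"
    return urls


for engine, url in search_urls("budget site:dhs.gov filetype:xls").items():
    print(engine, url)
```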
Example 2: Finding passwords
Searching for login and password information can be useful as a defensive dork. Passwords are, in rare cases, clumsily stored in publicly accessible documents on webservers. Try the following dorks in different search engines:
password filetype:doc site:[Your site]
password filetype:docx site:[Your site]
password filetype:pdf site:[Your site]
password filetype:xls site:[Your site]
In this case, the search engines again returned different results. When we tried this search without the "site:[Your site]" term, Google returned documents that contained actual usernames and passwords for a North American high school. We have blocked out these results in the screenshot below, and notified the school that their data is vulnerable. The other search engines did not return this information on the first few pages of results. As you can see, both Yahoo and DuckDuckGo also returned some non-relevant results. This is to be expected when dorking: some queries work better than others.
Example 3: London house prices
Another interesting example targets housing price information in London. Below are the results from the following query, which we entered into four different search engines:
filetype:xls “house prices” and “London”
Example 4: Looking for security plans on the government of India's website
A final example will locate any documents containing the words “security plan” on Indian government websites. Below are the results from the following query, which we entered into four different search engines:
filetype:doc “security plan” site:gov.in
Perhaps now you have your own ideas about what websites you'd like to focus on with your search. You can find more ideas in this guide from the Center for Investigative Journalism. In the following section, we will share the dorks we found, and how they work across search engines.
Dork It Yourself
Below is an updated list of the relevant dorks we identified as of March 2017. This list might not be exhaustive, but the operators below should help you get started. To understand more advanced uses of these dorks, see the Google Hacking Databases (GHDB). We collected and tested these dorks across search engines with the help of the following resources: Bruce Clay Inc, Wikipedia, DuckDuckGo, Microsoft and Google.
DorkDorkGo
We have included the most widely-used search engines in this analysis. Our recommendation is always to use DuckDuckGo, which is a privacy-focused search engine that does not log any data about its users. However, you should still use DuckDuckGo in combination with Tor while dorking to ensure someone else is not snooping on your search. (For general searching, we also recommend using StartPage, which is a search engine that returns Google results via a privacy filter, also masking user information from Google. However, as important as it is to use privacy-aware search engines in your day-to-day browsing, Tor should offer enough protection to let you dork across search engines. It might be interesting and helpful to your investigation to see the different results that search engines return even when they share the same set of operators.)
Dorking operators across Google, DuckDuckGo, Yahoo and Bing
Key | Colour |
Query works on this search engine | |
Query does not work on this search engine |
Dork | Description | DuckDuckGo | Yahoo | Bing | Google
cache:[url] | Shows the version of the web page from the search engine’s cache. | | | |
related:[url] | Finds web pages that are similar to the specified web page. | | | |
info:[url] | Presents some information that Google has about a web page, including similar pages, the cached version of the page, and sites linking to the page. | | | |
site:[url] | Finds pages only within a particular domain and all its subdomains. | | | |
intitle:[text] or allintitle:[text] | Finds pages that include a specific keyword as part of the indexed title tag. You must include a space between the colon and the query for the operator to work in Bing. | | | |
inurl:[text] or allinurl:[text] | Finds pages that include a specific keyword as part of their indexed URLs. | | | |
meta:[text] | Finds pages that contain the specific keyword in the meta tags. | | | |
filetype:[file extension] | Searches for specific file types. | | | |
intext:[text], allintext:[text], inbody:[text] | Searches text of page. For Bing and Yahoo the query is inbody:[text]. For DuckDuckGo the query is intext:[text]. For Google either intext:[text] or allintext:[text] can be used. | | | |
inanchor:[text] | Search link anchor text | | | |
location:[iso code] or loc:[iso code], region:[region code] | Search for specific region. For Bing use location:[iso code] or loc:[iso code] and for DuckDuckGo use region:[region code]. | | | |
contains:[text] | Identifies sites that contain links to filetypes specified (i.e. contains:pdf) | | | |
altloc:[iso code] | Searches for location in addition to one specified by language of site (i.e. pt-us or en-us) | | | |
domain:[url] | Wider than the site: operator, locates any subdomain containing the “suffix” of the main website's url | | | |
feed:[feed type, i.e. rss] | Find RSS feed related to search term | | | |
hasfeed:[url] | Finds webpages that contain both the term or terms for which you are querying and one or more RSS or Atom feeds. | | | |
imagesize:[digit, i.e. 600] | Constrains the size of returned images. | | | |
ip:[ip address] | Find sites hosted by a specific ip address | | | |
keyword:[text] | Metaoperator; that is, an operator that is used with other operators. Takes a simple list as a parameter. All the elements in the list are searched as and/or pairs together. keyword:(intitle inbody)software. This example is equivalent to intitle:software OR inbody:software. | | | |
language:[language code] | Returns websites that match the search term in a specified language | | | |
book:[title] | Searches for book titles related to keywords | | | |
maps:[location] | Searches for maps related to keywords | | | |
linkfromdomain:[url] | Shows websites that link to the specified url (with errors) | | | |
Defensive dorking
googleDorking can be used to protect your own data and to defend websites for which you are responsible. In 2011, after googleDorking his own name, a Yale university student discovered a spreadsheet containing his personal information, including his name and social security number, along with that of 43,000 others. The file had been publicly accessible for several years but had not been exposed by search engines until 2010, when Google began to index FTP (file transfer protocol) servers. Once indexed, it was possible for anyone to find, and it might have remained accessible if the student had not informed those responsible. Similarly, within ten minutes of beginning our research for this guide, we located PDFs containing login and password details for two different schools. We alerted both schools, and the information has since been removed.
There are two types of defensive dorking. The first involves looking for security vulnerabilities in online services you administer yourself, such as webservers or FTP servers. The second concerns sensitive information about yourself, your sources or your colleagues that might be unintentionally exposed.
The security software company McAfee recommends six precautions that webmasters and system administrators should take, and googleDorking can sometimes help identify failure to comply with the vast majority of them:
- Keep operating systems, services and applications up to date
- Make use of security solutions that prevent intrusion
- Understand how search engine crawlers work, know what is public, and audit your exposure
- Move sensitive resources out of public locations
- Block access to all non-essential resources from external or foreign identities
- Perform frequent penetration testing
In fact, googleDorking is an example of that final point. Frequent "penetration testing" can be undertaken by anyone who might be concerned about their data or the data of those they want to protect. To perform defensive googleDorking, we recommend starting with the following simple commands on your own websites, your name, and other websites that might contain information about you. For example:
[Your name] filetype:pdf
You can repeat this search with other potentially relevant filetypes: xls, xlsx, doc, docx, etc. Or you can search for regular website content with:
[your name] intext:[personal information such as a phone number, social security number or address]
See the table above for information about whether your search engine of choice uses intext: or inbody: as the text-searching operator.
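As a rough illustration of that difference, the sketch below picks the text-searching operator per engine. The mapping follows the description in the table above (inbody: for Bing and Yahoo, intext: for DuckDuckGo and Google); the function name and the sample name and phone number are hypothetical placeholders.

```python
# Text-searching operator per engine, as described in the operator table.
TEXT_OPERATOR = {
    "google": "intext",
    "duckduckgo": "intext",
    "bing": "inbody",
    "yahoo": "inbody",
}


def personal_info_dork(engine, name, detail):
    """Build a '[your name] intext:[personal information]' style query."""
    def quote(value):
        # Multi-word values are wrapped in quotation marks.
        return f'"{value}"' if " " in value else value

    operator = TEXT_OPERATOR[engine.lower()]
    return f"{quote(name)} {operator}:{quote(detail)}"


# Placeholder values; substitute your own name and details.
for engine in TEXT_OPERATOR:
    print(engine, personal_info_dork(engine, "Jane Doe", "555-0100"))
```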
You can also search for information associated with the IP address of your servers:
ip:[Your server's ip address]
Other useful tests might include:
site:[your website] filetype:[pdf, docx, doc, xls or xlsx]
or
ip:[your ip] filetype:[pdf, docx, doc, xls or xlsx]
If you're not running a lot of websites, scanning through several pages of results should be enough to give you an idea of what's publicly available. However, you can refine this with keywords and other terms taken from the Google Hacking Databases (linked below).
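If you want to run these checks regularly, the defensive queries above can be generated as a checklist. This is a minimal sketch under stated assumptions: the function name defensive_queries is hypothetical, and example.org and 192.0.2.10 are reserved documentation values standing in for your own site and server address. Note that the ip: operator is not supported by every search engine.

```python
FILETYPES = ["pdf", "docx", "doc", "xls", "xlsx"]


def defensive_queries(name=None, site=None, ip=None, filetypes=FILETYPES):
    """List one query per filetype for each target that was supplied."""
    queries = []
    for ext in filetypes:
        if name:
            queries.append(f'"{name}" filetype:{ext}')
        if site:
            queries.append(f"site:{site} filetype:{ext}")
        if ip:
            queries.append(f"ip:{ip} filetype:{ext}")
    return queries


# Placeholder targets; replace them with your own website and server IP.
for query in defensive_queries(site="example.org", ip="192.0.2.10"):
    print(query)
```

Each printed line is a separate search to paste into the engine of your choice.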
To strengthen this defense, try some of the malicious attacks in the Google Hacking Databases (GHDB) on your own websites and IP addresses. Various incarnations of the GHDB can be found here (the original), here (the original “reborn”), here, and here. Note that these databases include search operators as well as search terms. While they may help attackers locate vulnerable websites, they also help administrators protect their own.
Published on 29 May 2017. Follow us @seeingsideways, get in touch, or read another of our guides here.