But That’s What I Wanted!
As you grom in your Google-Fu, you will undoubtedly want to perform a search that Google’s syntax doesn’t allow. When this happens, you’ll have to find other ways to tackle the problem. For now though, take the easy route and play by Google’s rules.
Google Highlighting
Google highlights search terms using multiple colors when you’re viewing the cached version of a page, and uses a bold typeface when displaying search terms on the search results pages. Don’t let this confuse you if the term is highlighted in a way that’s not consistent with your search syntax. Google highlights your search terms everywhere they appear in the search results. You can also use Google’s cache as a sort of virtual highlighter. Experiment with modifying a Google cache URL. Locate your search terms in the URL, and add words around your search terms. If you do it correctly and those words are present, Google will highlight those new words on the page.
Allintext: Locate a String Within the Text of a Page
The allintext operator is perhaps the simplest operator to use since it performs the function that search engines are most known for: locating a term within the text of the page. Although this advanced operator might seem too generic to be of any real use, it is handy when you know that the text you’re looking for should only be found in the text of the page. Using allintext can also serve as a type of shorthand for “find this string anywhere except in the title, the URL, and links.” Since this operator starts with the word all, every search term provided after the operator is considered part of the operator’s search query.
For this reason, the allintext operator should not be mixed with other advanced operators.
Googleturds
So, what about that link that Google returned to r&besk.tr.cx? What is that thing? I coined the term googleturd to describe what is most likely a typo that was crawled by Google. Depending on certain undisclosed circumstances, oddball links like these are sometimes retained. Googleturds can be useful, as we will see later on.
Tools & Traps…
How’d You Do That?
The data in Table 2.2 came from two sources: filext.org and Google. First, I used lynx to scrape portions of the filext.org Web site in order to compile a list of known file extensions. For example, this line of bash will extract every file extension starting with the letter A, outputting it to a file called extensions:
Then, each extension is fired through a Google filext search, to concentrate on the Results line:
for ext in $$cat extensions$$; do lynx -dump
“http://www.google.com.ezp-prod1.hul.harvard.edu/search?q=filetype:$ext” | grep Results | grep “of about”; done
The process took tens of thousands of queries and several hours to run. Google was gracious enough not to blacklist me for the flagrant violation of its Terms of Use!
How’d You Do That?
The data in Table 2.2 came from two sources: filext.org and Google. First, I used lynx to scrape portions of the filext.org Web site in order to compile a list of known file extensions. For example, this line of bash will extract every file extension starting with the letter A, outputting it to a file called extensions:
Then, each extension is fired through a Google filext search, to concentrate on the Results line:
for ext in $$cat extensions$$; do lynx -dump
“http://www.google.com.ezp-prod1.hul.harvard.edu/search?q=filetype:$ext” | grep Results | grep “of about”; done
The process took tens of thousands of queries and several hours to run. Google was gracious enough not to blacklist me for the flagrant violation of its Terms of Use!
Notes from the Underground…
Bad Google Hacker!
If Gandalf the Grey were to author this sidebar, he wouldn’t be able to resist saying something like “There are fouler things than characters lurking in the dark places of Google’s cache.” The most grave examples of Google’s power lies in the use of the numrange operator. It would be extremely irresponsible of me to share these powerful queries with you. Fortunately, the abuse of this operator has been curbed due to the diligence of the hard-working members of the Search Engine Hacking forums at http://johnny.ihackstuff.com. The members of that community have taken the high road time and time again to get the word out about the dangers of Google hackers without spilling the beans and creating even more hackers. This sidebar is dedicated to them!
No comments:
Post a Comment