Last week we explored how to begin optimizing a website for search engines based on Google’s Webmaster Guidelines. This week, we are looking at why our searches almost always return the desired results, thanks to Google Basics. Three key processes are essential to Google’s success: crawling, indexing, and serving. Crawling is done by Googlebot, described as a “huge set of computers” that fetches billions of pages on the web. After discovering new sites, changes to existing sites, and dead links, Googlebot feeds what it finds into the next important process: indexing.
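To make the crawl process concrete, here is a toy sketch of the basic loop: fetch a page, queue up the links it contains, skip pages already seen, and note dead links along the way. The in-memory `PAGES` dict and the page names are my own invention standing in for the real web, and Googlebot’s actual system is of course vastly more sophisticated than this.

```python
from collections import deque

# A toy "web": page -> list of outgoing links. Real Googlebot fetches
# pages over HTTP from a huge set of machines; this dict is a stand-in.
PAGES = {
    "a.com": ["b.com", "c.com"],
    "b.com": ["c.com"],
    "c.com": ["a.com", "d.com"],  # d.com does not exist -> a dead link
}

def crawl(seed):
    """Breadth-first crawl: visit a page, queue its links, skip repeats."""
    seen, dead = set(), set()
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        links = PAGES.get(url)
        if links is None:          # "fetch" failed: record a dead link
            dead.add(url)
            continue
        queue.extend(links)
    return seen - dead, dead
```

Starting from `a.com`, this reaches every live page exactly once and reports `d.com` as dead, which mirrors the three things the crawl is said to find: new pages, their links, and broken ones.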
The indexing process sounds even simpler than the crawl process, but I imagine it isn’t. Googlebot takes not only the visible content into account but also key content tags and attributes hidden in each site’s HTML. After reading the explanation, I’m left with a few questions I’d like to ask Google. First, how does Googlebot determine whether a word is misspelled? There are lots of abbreviations, acronyms, and slang terms on the web that are perfectly searchable, and it must be difficult to have Googlebot recognize these as legitimate rather than as misspellings. Next, how does the index produce such accurate results when a user enters a long phrase, sentence, or question? I seem to always get accurate results when I enter a lengthy query, yet I picture an index as containing only individual words, so I would expect each word to return different results.
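One plausible answer to my own question is the classic inverted index: each word maps to the set of documents containing it, and a multi-word query intersects those sets, so only documents matching every word survive. This is a textbook sketch under my own assumptions, with made-up documents, not a description of how Google actually indexes.

```python
from collections import defaultdict

docs = {
    1: "google crawls the web with googlebot",
    2: "the index maps each word to documents",
    3: "googlebot feeds pages into the index",
}

# Inverted index: word -> set of document IDs containing that word.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

def search(query):
    """Return the documents that contain every word in the query."""
    word_sets = [index.get(word, set()) for word in query.split()]
    if not word_sets:
        return set()
    return set.intersection(*word_sets)
```

Here `search("googlebot index")` returns only document 3, because it is the only one containing both words. The longer the query, the more sets get intersected and the narrower the results, which may be part of why lengthy queries feel so accurate.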
The final key process, serving, is based on many factors, but in the simplest terms it comes down to keyword relevance from the index and PageRank. Thanks to Jessica’s presentation last week, we are now very familiar with backlinks, which are incoming links to a webpage from another page. These are the primary contributors to PageRank. Luckily, Google has found ways to deter manipulative backlink schemes, which I’m sure were cluttering search engines in their early phases. Another step I really appreciate in the serving process is the “Did you mean” feature, which displays related terms, popular queries, and common misspellings like the ones I mentioned earlier in this post. I find it incredible that Google’s algorithm can weigh all of these factors and return accurate search results in mere milliseconds.
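The core idea behind PageRank can be sketched in a few lines: every page starts with equal rank, then each round redistributes rank along outgoing links, so pages with more (and better-ranked) backlinks accumulate more. This is the simplified textbook power-iteration version with a small made-up link graph, not Google’s production algorithm, and it ignores complications like pages with no outgoing links.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Repeatedly redistribute each page's rank across its outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small base rank; the rest flows along links.
        new = {page: (1 - damping) / n for page in pages}
        for page, outgoing in links.items():
            share = rank[page] / len(outgoing)
            for target in outgoing:
                new[target] += damping * share
        rank = new
    return rank

# Toy link graph: "b" has backlinks from both "a" and "c".
links = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a", "b"],
}
```

Running `pagerank(links)` ranks page `b` highest, exactly because it has the most backlinks, which matches the intuition from Jessica’s presentation.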
The processes discussed in this post remind me of a real-life version of the movie The Matrix. I’m not too concerned with machines developing artificial intelligence and taking over the world, but it is amazing how powerful and seemingly smart Googlebot is. I know it rests on a lot of genius programming, but still, the sum of Googlebot’s parts accomplishes far more than any one human mind could. Hopefully the machines stay friendly and keep making our search-engine lives that much easier!
Questions for this week:
Question 1: Of the three processes discussed in this week’s post, which one do you think is the most impressive and why?
Question 2: What are some steps you could take to increase the amount of backlinks to help improve the PageRank of one of your websites?