Last week we discovered many of the details of how to begin optimizing a website for a search engine based on Google’s Webmaster Guidelines. This week, we are looking at why our searches almost always return the desired results thanks to Google Basics. The three key processes of crawling, indexing, and serving are essential to Google’s success. The crawl process is done by Googlebot, which is described as a “huge set of computers” used to fetch billions of pages on the web. After finding new sites, changes to existing sites, and dead links, Googlebot enters the next important process: indexing.
The indexing process sounds even simpler than the crawl process, but I imagine it isn’t. Not only does Googlebot take visible content into account, but it also searches key content tags and attributes hidden in each site’s HTML. After reading the explanation, I’m left with a few questions I’d like to ask Google. First, how does Googlebot determine whether a word is misspelled or not? There are lots of abbreviations, acronyms, and slang on the web that are definitely searchable. It must be difficult to have Googlebot recognize these as opposed to misspellings. Next, how does the index produce such accurate results when a user enters a long phrase, sentence, or question? I seem to always get accurate results when I enter a lengthy query. However, I picture an index as only containing individual words. Based on this, I would think each word would return different results.
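One way to picture how an index of individual words can still answer a long query is an inverted index: each word maps to the set of pages containing it, and a multi-word query keeps only the pages that appear in every word's set. The sketch below is a toy illustration under that assumption, not a description of Google's actual index. The page names, page text, and function names are all invented for the example.

```python
from collections import defaultdict

# Hypothetical mini-web: page names and text are invented for illustration.
PAGES = {
    "page_a": "how to bake sourdough bread at home",
    "page_b": "bread machine reviews and buying guide",
    "page_c": "how to fix a flat bike tire at home",
}


def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index, query):
    """Return the pages that contain every word of the query, sorted by name."""
    words = query.lower().split()
    if not words:
        return []
    # Start from the pages matching the first word, then narrow word by word.
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


INDEX = build_index(PAGES)
print(search(INDEX, "bread"))              # pages that mention "bread"
print(search(INDEX, "how to bake bread"))  # pages that mention all four words
```

Each extra word narrows the result set instead of returning its own separate results, which may be part of why longer queries feel more precise. A real search engine presumably layers ranking, phrase matching, and much more on top of this.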
The final key process of serving is based on many factors, but in the simplest terms, it’s based on keyword consistency from the index and PageRank. Thanks to Jessica’s presentation last week, we are now very familiar with backlinks, which are incoming links to a webpage from another page. These are the primary contributors to PageRank. Luckily Google has found ways to deter manipulative backlink practices, which I’m sure were cluttering search engines in their early phases. Another step I really appreciate in the serving process is the “Did you mean” feature, which displays related terms, popular queries, and common misspellings like I mentioned earlier in this post. I find it incredible that Google’s algorithm can determine these factors and return accurate search results in mere milliseconds.
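To make the link between backlinks and PageRank concrete, here is a minimal sketch of the simplified, textbook form of the PageRank calculation: every page repeatedly passes a share of its score to the pages it links to. It is only an illustration. The link graph, the page names, the damping value of 0.85, and the iteration count are assumptions for the example, and Google's production ranking uses many more signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated score passing.

    `links` maps each page to the list of pages it links to. Every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline score regardless of backlinks.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links spreads its score over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "home", nothing links to "fan_site".
LINKS = {
    "home": ["blog", "shop"],
    "blog": ["home"],
    "shop": ["home"],
    "fan_site": ["home"],
}

RANKS = pagerank(LINKS)
for page, score in sorted(RANKS.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph, "home" ends up with the highest score because it has the most backlinks, while "fan_site", which nobody links to, keeps only the baseline score.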
The processes discussed in this post remind me of a real-life version of the movie The Matrix. I’m not too concerned with machines developing artificial intelligence and taking over the world, but it is amazing how powerful and seemingly smart Googlebot is. I know it’s based on a lot of genius programming, but still, the sum of Googlebot’s parts greatly exceeds any one human brain’s ideas. Hopefully the machines stay friendly and keep making our search engine lives that much easier!
Questions for this week:
Question 1: Of the three processes discussed in this week’s post, which one do you think is the most impressive and why?
Question 2: What are some steps you could take to increase the number of backlinks to help improve the PageRank of one of your websites?
Before I go on to your question, I think the way Google is able to find the proper search results is through predictive analysis. I don’t know the computer science behind it (I’m sure there’s some major algorithm), but have you noticed when you misspell a word, Google will ask if you meant something else? Perhaps every click on that option is a means of crowdsourcing for better results. For instance, ‘the’ is commonly mistyped ‘teh.’ The more people who help Google recognize these patterns, the easier it is for them to recognize what the intended search might be. I believe it’s a form of learn-as-you-go artificial intelligence.
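The crowdsourcing idea above can be sketched in a few lines. This is purely a guess at the mechanism described in this comment, not documented Google behavior: the class name, the method names, and the two-click threshold are all hypothetical.

```python
from collections import Counter, defaultdict


class SuggestionLog:
    """Hypothetical log of which correction users pick for a typed query."""

    def __init__(self, min_clicks=2):
        # typed query -> Counter of corrections that users clicked on
        self.clicks = defaultdict(Counter)
        # Assumed threshold: ignore corrections seen fewer times than this.
        self.min_clicks = min_clicks

    def record_click(self, typed, chosen):
        """Record that a user who typed `typed` clicked the correction `chosen`."""
        self.clicks[typed][chosen] += 1

    def suggest(self, typed):
        """Return the most-clicked correction, or None if there is too little evidence."""
        counts = self.clicks.get(typed)
        if not counts:
            return None
        word, clicks = counts.most_common(1)[0]
        return word if clicks >= self.min_clicks else None


log = SuggestionLog()
for _ in range(3):
    log.record_click("teh", "the")
log.record_click("teh", "ten")

print(log.suggest("teh"))  # the correction most users chose for "teh"
print(log.suggest("the"))  # nothing recorded for a correctly spelled word
```

The threshold is there so that a single stray click does not turn into a suggestion. With more clicks the log gets more confident, which is the learn-as-you-go effect.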
As for the most impressive part of Google’s search process, I’m going to have to say index. While I understand the basics of databases, I can’t expand my mind enough to understand how they index the web. Is it alphabetically? Categorically? I can’t even start to figure it out.
Like I mentioned in my post, crawling has to be the easiest. Anyone can write a simple recursive program that can run until it hits the end of the Internet.
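As a rough sketch of that simple crawler idea: the version below walks a small in-memory link graph instead of the real web, so it runs without a network connection. The URLs and the `crawl` function are made up for illustration. It uses a queue in place of recursion, and it remembers which pages it has already seen, because pages link back to each other and a naive recursive version would loop forever.

```python
from collections import deque

# Hypothetical link graph standing in for the web: URL -> links found on that page.
# A URL that is linked to but missing from the dict plays the role of a dead link.
WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["a.example/", "b.example/missing"],
}


def crawl(web, start):
    """Breadth-first crawl from `start`; returns (pages found, dead links)."""
    seen = {start}  # without this, the a.example <-> b.example cycle never ends
    queue = deque([start])
    found, dead = [], []
    while queue:
        url = queue.popleft()
        if url not in web:
            dead.append(url)  # linked from somewhere, but the page does not exist
            continue
        found.append(url)
        for link in web[url]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return found, dead


FOUND, DEAD = crawl(WEB, "a.example/")
print("pages:", FOUND)
print("dead links:", DEAD)
```

The hard part at Google's scale is presumably everything this leaves out: fetching billions of pages politely, noticing when they change, and deciding what to revisit first.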
I think developing backlinks just has to be organic. You create good backlinks, I think, by creating content (or a product) people want to read about. Unique information is the key. I think Amazon, through its product information and user reviews, and Wikipedia, through, well… linking every subject in the world to others, are perfect examples of this.
I had an experience at my old company (again, the guilty to remain nameless) where I kept trying to explain to them that we needed linkable content. We needed to have our writers and bloggers write stories that people elsewhere, on blogs or other web sites, would willingly link to. They didn’t get it.
You can create backlinking through social media, but you can only do so much before you’re regarded as a spammer. The same goes for even churning out content — how many sites have simply turned into content mills to game search engines?
Yeah, I guess we’re all supposed to be cutthroat and ruthless to get sites up at the top of Google’s PageRank. But again, are we trying to serve humans or machines? If a web site is of good quality and creates the kind of content that engages users on the site and elsewhere, then the backlinks will grow organically.
Good thoughts Erin! You’re definitely right about the common misspellings. Google has even gotten to the point where it doesn’t even ask if you misspelled something. They just go ahead and display the results of the correct spelling and allow you to display the wrong spelling if you so choose. Every once in a while I will throw Google a curveball and actually not want it to display what it thinks is correct, but that happens very rarely.
Google’s index has to be one of the most impressive databases to ever exist. Can you imagine what would happen if it were erased? I’m sure it’s backed up and secured in a manner that is unfathomable, but still, the damage done by losing the index would be catastrophic. Indexing the entire web seems like an impossible task, but Google’s index is probably as close as it gets.
It would be nice if the process of producing backlinks was completely organic, but unfortunately I don’t think this is the case. Companies will always have a way of getting their URL out there without actually having people talking about it. Using social media has to be one of the best ways to do it inorganically without being regarded as a spammer. It seems like it’s a very fine line though. I wonder what Google’s cutoff is for having too many social media backlinks? Do they warn people about potential negative PageRank practices or do they just blacklist a site and move on? I’d be very interested to find out.
I tend to think that the spider or the Googlebot may be the most impressive thing simply because of the sheer number of computers and servers that are put into use to crawl through billions of web pages (and growing every day). There is a picture of Google’s color-coded data centers on its site (http://www.google.com/about/datacenters/gallery/#/tech) that gives an indication of the immense machinery that goes into Google’s business. Not only does the bot go through each page, it is sophisticated enough to recognize whether a page is new, has new content, or has dead links. I consider the algorithm that then indexes it all to be a programming marvel, but the bot and the horsepower behind it are staggering.
To increase the number of backlinks, I would make an attempt to find every website that has some tangible connection to mine where I can post a comment and/or a link back to my site. I would contribute content to sites, perhaps as a (free) guest blogger, to form an association that goes back to my site. I would also concoct ways of posting a related video on YouTube that identifies me (which is really my site). I would make sure to have a vibrant social media network that coordinates and relates together so that all of my account links go back to my site. I would enlist people I know to review my products and put in their social media sites how great those products are. In short, I would make every effort to get my site name out there across multiple sites, and I would have to stay on top of keeping them fresh.
Thanks for sharing those pictures Holly. They are all very impressive and a lot more visually pleasing than I would guess. The color-coded pipes and cables show how organized Google is with their data centers. I never would have guessed it takes that much water to keep Google running smoothly. The size of each facility is also worth noting. Geographically, they are all over the place. I wonder how many data centers Google actually has?
Your ideas for increasing backlinks are spot on. I think it’s important to look at every option, and you hit a lot of them. Keeping all of these fresh would probably be the most difficult part. It might even take a dedicated staff member, depending on the size of your company and how often you want to update everything. The best social media campaigns seem to have daily updates. Finding new blogs to post on is also an endless task. YouTube videos would probably be the least frequently updated, but video production does take a lot more time and resources.
Thanks for the comments Holly!
Good questions, Matt.
I am with Erin, in that indexing is the most impressive part of the process. I am not saying I would be able to write a program that would scan all Internet content (I am not nearly as adept at web design as most of you), but I am pretty confident I could figure it out. I have always been interested in organization (I like things to be organized), so understanding how Google takes and organizes all of this information into a workable system is amazing. We are not talking about organizing your shoe collection. We are talking about organizing and recalling innumerable amounts of information within seconds. That is mind blowing.
As far as back links are concerned, I think the best way to grow back links is organically. Growing organic back links takes time and genuine interest in the area. If you are genuinely interested in fashion, cooking, etc. you are more likely to become more involved in it and to follow the latest trends. If you have something valuable to add to this area, then people will begin to recognize that, but it will take time and effort to get the organic back links. Just my thoughts.
Staying organized is crucial in any successful business. If you haven’t already, you should check out the pictures Holly posted on her comment: http://www.google.com/about/datacenters/gallery/#/tech. Google seems to be well organized in every way possible, from hardware to software. Organizing a massive amount of information is impressive even if it takes a lot of time, but the fact that it can be processed in less than a second is, as you said, mind blowing.
I think you could take all of Holly’s ideas for growing backlinks and apply them to your idea. If you are genuinely interested in a topic, you will be able to contribute valuable information to blogs, social networks, and YouTube. If the information is relevant, people will likely check out your site. I think we’ve gotten to the point where people will only click on a link if the content surrounding it is interesting. Anyone can produce inorganic backlinks by just posting a URL on any of the previously mentioned avenues. Coming up with creative content to attract users is the real trick. Thanks Jessica!
Well, they are all very impressive, but I did love learning about crawling. I find it fascinating that Google can find pages and then index them. Indexing is incredible; I imagine it like a spreadsheet with all the internet’s information. There is so much out there and Google can store all of its information. This is truly amazing, not only because it records all this but also because it keeps itself organized. It knows what’s new and what’s old. Whoever programmed this is a genius.
As for back linking, if I had a blog I would definitely refer back to my previous posts many times. I was just reading a photography post and this man backlinked so much. It was so helpful to me and for him! His blog is popular and I learned a lot because of his back links. He also has lots of photography blogs linking to him.
Crawling is a great feature that likely set Google apart from the rest of the search engines. Googlebot is always working to make sure everything is up-to-date. Without it, we would probably see a lot of irrelevant information and wouldn’t rely nearly as much on search engines. It seems to be a general consensus that Google’s index contains an unfathomable amount of data. As I said in my first reply to Erin, I just can’t imagine if the index were somehow lost. While that is probably impossible, I think it would set the entire web back years.
Starting a blogging relationship with others that have similar interests is a really good idea for backlinking, especially if the other blogs are popular. Like you said, it not only helps you, but it also helps the author. You can both increase your PageRank while reading posts you are interested in. It’s a win-win!