Google can’t rank what it hasn’t indexed. Newbie webmasters will sometimes take for granted that Google can and will access their entire sites. But that’s not always the case. Lots of things can get in the way of a full crawl, including duplicate content, unfriendly navigation and too many parameters in URLs.
So the first thing you’ll want to do is determine what Google has indexed. There are several ways to do this, but my three favorites are Webmaster Tools, viewing your cached page and doing a site: search.
There are many great reasons to use Google Webmaster Tools, and one of them is that Google will tell you exactly how many of your pages it has indexed. Here’s how to add your site if you haven’t already:
To add and verify a site:
- Sign into Google Webmaster Tools with your Google Account.
- Click Add a site, and type the URL of the site you want to add. Make sure you type the entire URL, such as http://www.example.com/
- Click Continue. The Site verification page opens.
- (Optional) In the Name box, type a name for your site (for example, My Blog).
- Select the verification method you want, and follow the instructions.
The easiest way to tell whether Google can access a particular page in its entirety is to search for the cached version of that page. This shows you what’s in Google’s cache, the copy of your page that Google has stored. To see the cached version of a page on your site, or any site, search Google for (without quotes) “cache:http://yourpage.com/interior-page”. Remember to leave no space between the colon and your URL.
Is your page there, looking the way it should, with all its images and navigation? If something is missing, Google isn’t able to access it.
The last way to see what Google has indexed of your site is the site: command. To use it, just search Google for (without quotes) “site:http://mysite.com/”.
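If you check many pages, it can help to generate these cache: and site: searches as links instead of typing them by hand. Here is a minimal sketch in Python, assuming Google’s standard https://www.google.com/search?q=… search URL; the helper name operator_query_url is hypothetical, and it only builds the link without fetching any results. Stripping whitespace from the target enforces the no-space-after-the-colon rule. Note that Google has since retired the cache: operator, so that query may no longer return a cached copy.

```python
from urllib.parse import urlencode


def operator_query_url(operator: str, target: str) -> str:
    """Return a Google search URL for a query such as "site:http://mysite.com/".

    Hypothetical helper: it only builds the link; it does not fetch results.
    """
    # Strip whitespace so no space ends up between the colon and the URL.
    query = f"{operator}:{target.strip()}"
    # urlencode percent-encodes ":" and "/" so the whole query stays one q= value.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # https://www.google.com/search?q=site%3Ahttp%3A%2F%2Fmysite.com%2F
    print(operator_query_url("site", "http://mysite.com/"))
    # https://www.google.com/search?q=cache%3Ahttp%3A%2F%2Fyourpage.com%2Finterior-page
    print(operator_query_url("cache", "http://yourpage.com/interior-page"))
```

Opening the printed links in a browser runs the same searches as typing the operators into Google.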
Indexing and caching aren’t the same thing. Indexing means a page is eligible to rank; caching just means Google could access the page. The best information comes from Webmaster Tools, because it tells you which pages have actually been indexed. The cache: command is great for seeing what on your pages is accessible. And keep in mind that a page being indexed doesn’t mean everything on that page is indexed.
It doesn’t help to optimize or build links to a page that isn’t accessible, so this should be one of the first steps in any optimization effort.
Issues with any of this? Other tips for seeing what Google’s indexed? Let me know in the comments.