Making Sure Content Can Be Indexed by Search Engines [VIDEO]
The first in our series of Bite Size SEO video tips. This video covers how to make sure the content on your site is being indexed and how to avoid the common pitfalls.
Done with this one? View the next video: Checking & Updating Your Sitemap
Looking for the first video in the series? Find it here: Making Sure Your Site’s Content Can Be Indexed
Hi, this is Chris from Dejan SEO. In this tip, I’m going to show you the different ways a site can be blocked from being indexed by search engines. It’s important to be able to spot things that can block search engines from indexing your content, because they can prevent your site from ranking. It’s also useful to know how to block pages that you might want to keep out of search engines, like sensitive, private, or duplicate content.
This is something you should review at the start of your audit process, but it tends to be something that you just happen to spot as you’re looking around the site. There are a couple of tools which will make your job a lot easier. First, you might want to install the Google Chrome browser. This is a common preference among SEOs and web designers for many reasons, but for this audit, you’ll need to be able to install browser extensions. The same thing can be done in Firefox, but all my examples will be in Chrome.
You’ll also want to have an extension which can check for and highlight noindex and nofollow tags, an extension which can show canonical links, and ideally the Ayima Redirect Path plug-in or a similar extension which can show redirects and server response codes. And you should already have Webmaster Tools set up, which we mentioned in the first video.
While some of these might sound a bit complicated, once you have them ready, they can make long difficult jobs quick and easy and help you understand more about how your website is put together.
There are several ways your content can be prevented from being indexed, including the robots noindex tag, the robots.txt file, canonical links, and server response codes. The noindex tag tells search engines not to add the page to their index while still allowing them to crawl the page and the links on it. It’s placed between the head tags in the HTML. You can find these by looking at the source code, or you can more easily spot them by installing a browser extension that pops up whenever the tag is present.
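If you would rather check a page's source with a script than by eye, the idea can be sketched in Python. This is only a minimal illustration using the standard library's html.parser; the names RobotsMetaParser and has_noindex are made up for this example, and it only looks at the meta robots tag, not at anything a server might send in its headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when an attribute has no value.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page source, used only for illustration.
sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(sample))
```

You would pass in the page source you copied from the browser; fetching the page is left out on purpose to keep the sketch short.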
The robots.txt file is a way of telling search engines not to look at parts of your site. You can find the robots.txt file by going to your domain followed by /robots.txt. URLs that match a Disallow line will not be crawled by search engines. This can be useful for preventing Google from crawling parts of the site you don’t want it to, but if done incorrectly, it can prevent Google from accessing areas of the site you do want crawled and indexed. Something to watch out for is a Disallow followed by a single forward slash. This means that your whole site is being blocked from web crawlers like Google.
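To see the difference between blocking one folder and blocking the whole site, here is a small sketch using Python's built-in urllib.robotparser. The robots.txt contents, the example.com URLs, and the is_blocked helper are assumptions made for illustration, and Python's parser is not guaranteed to interpret every rule exactly the way Google does.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows crawling the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# A Disallow followed by a single forward slash blocks the whole site.
BLOCK_EVERYTHING = """\
User-agent: *
Disallow: /
"""

# Blocking a single (hypothetical) folder leaves the rest crawlable.
BLOCK_ONE_FOLDER = """\
User-agent: *
Disallow: /private/
"""

print(is_blocked(BLOCK_EVERYTHING, "https://example.com/about"))         # True
print(is_blocked(BLOCK_ONE_FOLDER, "https://example.com/about"))         # False
print(is_blocked(BLOCK_ONE_FOLDER, "https://example.com/private/page"))  # True
```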
Canonical links sound complicated but are actually very simple. Some websites create multiple URLs for the same content. One of the most common is the index.html version of the homepage. Canonical links are used to tell search engines which version of the page you want indexed. Whilst they don’t strictly stop content from being indexed, where two pages are identical, they do give a strong indication of which version should be indexed. Canonical links are placed between the head tags in the HTML. Again, the easiest way to spot whether a page is using a canonical tag is to use a browser plug-in or extension.
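As a rough illustration of what those extensions look for, the following Python sketch pulls the canonical URL out of a page's source. The class and function names are invented for this example, the sample HTML is hypothetical, and it simply returns the first canonical link it finds.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical source of an index.html page that points at the homepage.
sample = '<html><head><link rel="canonical" href="https://example.com/"></head></html>'
print(find_canonical(sample))  # https://example.com/
```

If the index.html version of the homepage returns the plain homepage URL here, search engines are being told which version you prefer.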
Server response codes are a way for a website to tell browsers and search engines the type of result they’re getting back from the server. You might be familiar with 404 pages for page not found. There are also 500 errors, which mean there’s an error on the server end. If Google sees these types of error codes, it will either not index new content or gradually remove existing pages from the index. Although rare, it’s possible for normal pages to return an error code and appear perfectly normal to users. Again, the easiest way to spot server response codes is using a browser plug-in like Redirect Path. These plug-ins can also show you what types of redirects are in place on your site, which can be useful for other technical tasks.
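If you have collected a list of URLs and their response codes, for example from a crawler export, a few lines of Python can group them. This is a simplified sketch: the classify_status and pages_with_errors helpers and the sample data are assumptions for illustration, and the grouping just follows the standard ranges of HTTP status codes.

```python
def classify_status(code):
    """Roughly group an HTTP status code by its standard range."""
    if 200 <= code < 300:
        return "ok"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"  # for example 404, page not found
    if 500 <= code < 600:
        return "server error"  # for example 500, error on the server end
    return "unknown"


def pages_with_errors(results):
    """Return the URLs whose status code is in an error range, sorted by URL."""
    error_groups = {"client error", "server error"}
    return sorted(url for url, code in results.items()
                  if classify_status(code) in error_groups)


# Hypothetical crawl results mapping URL to response code.
crawl = {
    "https://example.com/": 200,
    "https://example.com/old-page": 301,
    "https://example.com/missing": 404,
    "https://example.com/broken": 500,
}
print(pages_with_errors(crawl))
```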
With each of these, there are lots of things to test and tools to try out. If you want to try something more advanced, you can download a free version of a web crawler tool called Screaming Frog which lets you see lots of information about your site all at once.
That was our quick tip on making sure your site can be indexed. If you want to use the tips in this video, you can get started with the action shown here. You can also find supporting articles on HubSpot and the Dejan SEO website.