How I Hijacked Rand Fishkin’s Blog

Search Result Hijacking

Search result hijacking is a surprisingly straightforward process. This post goes over the theory, presents test cases carried out by the Dejan SEO team and offers ways for webmasters to defend against search result theft.

I wish to thank Jim Munro, Rob Maas and Rand Fishkin for allowing me to run my experiment on their pages.

Brief Introduction

Before I go any further I’d like to make it clear that this is not a bug, hack or exploit – it’s a feature. Google’s algorithm prevents duplicate content from displaying in search results, and everything is fine until you find yourself on the wrong end of the duplication scale. From time to time a larger, more authoritative site will overtake a smaller website’s position in the rankings for its own content. Read on to find out how exactly this happens.

Search Theory

When there are two identical documents on the web, Google will pick the one with the higher PageRank and use it in results. It will also forward any links from any perceived ‘duplicate’ towards the selected ‘main’ document. This idea first came to my mind while reading “Large-scale Incremental Processing Using Distributed Transactions and Notifications”, a paper by Daniel Peng and Frank Dabek from Google.

PageRank Copy

Here is the key part:

“Consider the task of building an index of the web that can be used to answer search queries. The indexing system starts by crawling every page on the web and processing them while maintaining a set of invariants on the index. For example, if the same content is crawled under multiple URLs, only the URL with the highest PageRank [28] appears in the index. Each link is also inverted so that the anchor text from each outgoing link is attached to the page the link points to. Link inversion must work across duplicates: links to a duplicate of a page should be forwarded to the highest PageRank duplicate if necessary.”
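
To make the mechanism concrete, here is how I read that passage, expressed as a minimal Python sketch. The data structures and function names are my own illustration of the idea, not Google’s actual implementation:

def canonicalise(duplicate_groups, pagerank, inbound_links):
    """Pick a 'main' document for each group of identical pages and
    forward links from the duplicates to it, as described in the paper.

    duplicate_groups: list of sets of URLs with identical content
    pagerank:         dict mapping URL -> PageRank score
    inbound_links:    dict mapping URL -> list of (source_url, anchor_text)
    """
    index = {}
    for group in duplicate_groups:
        # Only the URL with the highest PageRank appears in the index.
        main = max(group, key=lambda url: pagerank.get(url, 0))

        # Link inversion across duplicates: links pointing at any copy
        # are re-attached to the winning document.
        forwarded = []
        for url in group:
            forwarded.extend(inbound_links.get(url, []))

        index[main] = {"duplicates": group - {main}, "links": forwarded}
    return index

# Hypothetical example: the copy has the higher PageRank, so it wins
# the index slot and inherits the original's inbound links.
groups = [{"http://original.example/page", "http://copy.example/page"}]
pr = {"http://original.example/page": 1, "http://copy.example/page": 4}
links = {"http://original.example/page": [("http://a.example", "great page")]}
print(canonicalise(groups, pr, links))

This is exactly what we set out to trigger in the case studies below: make the copy the highest-PageRank member of its duplicate group.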

Case Studies

I decided to test the above theory on real pages from Google’s index. The following pages were our selected ‘victims’:

  1. MarketBizz
  2. Dumb SEO Questions
  3. ShopSafe
  4. Rand Fishkin’s Blog

Case Study #1: MarketBizz


26 October 2012: Rob Maas kindly volunteered for the first stage of the test and offered one of his English-language pages for our first ‘hijack’ attempt. We set up a new subdomain and created a single page by copying the original HTML and images. The newly created page was +1’d and linked to from our blog. At this stage it was uncertain how similar (or identical) the two documents had to be for our test to work.

30 October 2012: Search result successfully hijacked. Not only did our new subdomain replace Rob’s page in the results, but the info: command now showed the new page even when queried for the original URL, and the original’s PageRank of 1 was replaced by the new page’s PageRank of 0. Note: Do not confuse the toolbar PageRank of zero with real-time PageRank, which was calculated to be 4.

Hijacked SERP

Notice how the info: search for the original URL returns our test domain instead?

So all it took was a higher PageRank stream to the new page and a few days to allow for indexing of the new page.

A search for text from the original page also returned the new document:

Hijacked Result

One interesting fact is that a site: search still returns the original page and does not omit it from site search results. Interestingly, that URL does not return any results for the cache: command, just like the copy we created. Google’s merge seems pretty thorough and complete in this case.

Case Study #2: Dumb SEO Questions


30 October 2012: Jim Munro volunteered his website in order to test whether authorship helps against result hijacking attempts. We copied his content and replicated it on a new subdomain without copying any media across.

1 November 2012: Jim’s page was replaced with our subdomain within two days, rendering his original a duplicate in Google’s index. This suggests that authorship did little or nothing to stop this from happening.

Dumb SEO Questions Hijack

The original website was replaced for both the info: command and search queries.

Interesting Discovery

A search for the exact-match brand “Dumb SEO Questions” returns the correct result and not the newly created subdomain. This potentially reveals a domain/query match layer of Google’s algorithm in action.

Exact Brand Match

Whether Jim’s authorship helped in this instance is uncertain, but we did discover two conflicting search queries:

  1. Today we were fortunate to be joined by Richard Hearne from Red Cardinal Ltd. (returns the original site)
  2. Dumb SEO questions answered by some of the world’s leading SEO practitioners (returns the copy)

One returned the original site while the other showed its copy. At this stage we had not yet tested the impact of rel=”canonical” in potentially preventing result hijacking, and for that reason we created a separate experiment.

Case Study #3: ShopSafe


A new subdomain was created, replicating a page which contained rel=”canonical”. Naturally, the tag was stripped from the duplicate page for the purposes of the experiment.

This page managed to overtake the original in search, but never replaced it when tested using the info: command. All +1’s were purposely removed after the hijack to see if the original page would be restored. Several days later the original page overtook the copy; however, it is unclear whether the +1’s had any impact on this.

Possible defense mechanisms:

  1. Presence of rel=”canonical” on the original page
  2. Authorship markup / link from Google+ profile
  3. +1’s

Case Study #4: Rand Fishkin’s Blog

Rand's Blog

Our next test was related to domain authority, so we picked a hard one. Rand Fishkin agreed to a hijack attempt, so we set up a page in a similar way to the previous experiments, with a few minor edits (rel=prev/next, authorship, canonical). Given that a considerable amount of code was changed, I did not expect this particular experiment to succeed to its full extent.

We did manage to hijack Rand’s search result for both his name and one of his articles, but only for Australian searches:

Rand Fishkin

Notice that the top result is our test domain, only a few days old. The same goes for the test blog post, which now replaces the original site in Australian search results:

Rand's Article

This “geo-locking” could be happening for at least two reasons:

  1. A .au domain hosts the copy
  2. .au domain links point towards the copied page

Not a Full Hijack

What we failed to achieve was to completely replace his URL in Google’s index (where info: shows our subdomain), which is what happened with Rob’s page. This could be partly due to the fact that the code was slightly different from the original, and possibly due to Rand’s authorship link, which we left intact for a while (now removed for further testing). Naturally, Rand’s blog also has more social signals and inbound links than our previous test pages.

Interesting Observation

When a duplicate page is created and merged into a main “canonical” document version, it will display the main version’s PageRank, cache, links and info: results, and in Rand’s case also its +1’s. Yes, even +1’s. For example, if you +1 a designated duplicate, the selected main version will receive the +1’s. Similarly, if you +1 the selected main URL, the change in +1’s will immediately be reflected on any recognised copies.

Example: the test URL shows 18 +1’s which really belong to Rand’s main blog.

When a copy receives higher PageRank, however, and the switch takes place, all links and social signals are re-assigned to the “winning” version. So far we have seen two variants of this: in the case of a full hijack we see no +1’s for the removed version and all +1’s for the winning document, while borderline cases seem to show +1’s for both documents. Note that this could also be due to code/authorship markup on the page itself.

We’re currently investigating the cause for this behavior.

Preventative Measures

Further testing is needed to confirm the most effective way for webmasters to defend against result/document hijacking by stronger, more authoritative pages.


Canonical Tag

Most websites that take your content will simply mirror it or scrape a substantial amount of it from your site. This is typically done at the code level (particularly if automated). This means that the presence of a properly set rel=”canonical” tag (with a full URL) ensures that Google knows which document is the canonical version. Google takes rel=”canonical” as a hint and not an absolute directive, so the URL replacement could still happen in search results even if you canonicalise your pages.
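
As a quick sanity check, you can verify that a page emits a canonical tag pointing at a full URL. Here is a minimal Python sketch using only the standard library plus the requests package (the function name and placeholder URL are mine, for illustration):

from html.parser import HTMLParser
import requests

class CanonicalFinder(HTMLParser):
    """Records the href of any <link rel="canonical"> tag encountered."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")

def check_canonical(url):
    finder = CanonicalFinder()
    finder.feed(requests.get(url, timeout=10).text)
    # A relative or protocol-less canonical is a weaker hint than a full URL.
    if finder.canonical and not finder.canonical.startswith(("http://", "https://")):
        print("Warning: canonical is not a full URL:", finder.canonical)
    return finder.canonical

# Placeholder URL; substitute a page from your own site.
print(check_canonical("http://www.example.com/"))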

There is a way to protect your documents too (e.g. PDFs), through the use of HTTP header canonicalisation:

GET /white-paper.pdf HTTP/1.1
(…rest of HTTP request headers…)

HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <http://www.example.com/white-paper.pdf>; rel="canonical"
Content-Length: 785710
(…rest of HTTP response headers…)
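
If you serve your canonicals this way, you can confirm the header actually comes back with a quick request. A small sketch, again with the requests package and a placeholder URL:

import requests

# Placeholder URL; substitute your own document.
response = requests.head("http://www.example.com/white-paper.pdf",
                         allow_redirects=True, timeout=10)
print(response.headers.get("Link"))  # expect: <...>; rel="canonical"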


Authorship

I am not entirely convinced that authorship will do much to prevent a search result swap by a more ‘juiced’ URL; however, it could be a contributing factor or a signal, and it doesn’t hurt to have it implemented regardless.

Internal Links

Using full URLs to reference your home page and other pages on your site means that if somebody scrapes your content, they will automatically link to your pages, passing PageRank to them. This of course doesn’t help if they edit the page to set the URL paths to their own domain.
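
One way to audit a page for this is to list every internal link written as a relative path; each one is a candidate for conversion to a full URL. A rough Python sketch (the helper names are mine, not from any SEO toolset):

from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import requests

class LinkCollector(HTMLParser):
    """Collects the href of every anchor tag on the page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

def relative_internal_links(page_url):
    """Return internal links written as relative paths, i.e. links that
    would not pass PageRank back to you if the page were scraped verbatim."""
    collector = LinkCollector()
    collector.feed(requests.get(page_url, timeout=10).text)
    return [
        (href, urljoin(page_url, href))   # (as written, full URL to use)
        for href in collector.hrefs
        if not urlparse(href).scheme and not urlparse(href).netloc
    ]

# Placeholder URL; run against your own pages.
print(relative_internal_links("http://www.example.com/"))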

Content Monitoring

By using services such as CopyScape or Google Alerts, webmasters can listen for references to their brand and content segments online as they happen. If you notice a high-authority domain replicating your pages, acting quickly and requesting either removal or a link/citation back to your site is an option.
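
Once a suspected copy is found, a rough similarity score is usually enough to decide whether it is worth acting on. A quick sketch using Python’s standard difflib (the 0.8 threshold below is my guess, not a tested figure):

import difflib
import re
import requests

def strip_tags(html):
    # Crude tag stripper; fine for a rough text comparison.
    return re.sub(r"<[^>]+>", " ", html)

def similarity(original_url, suspect_url):
    """Ratio between 0 and 1 of how much visible text the two pages share."""
    a = strip_tags(requests.get(original_url, timeout=10).text).split()
    b = strip_tags(requests.get(suspect_url, timeout=10).text).split()
    return difflib.SequenceMatcher(None, a, b).ratio()

# Placeholder URLs; substitute your page and the suspected copy.
if similarity("http://www.example.com/page", "http://copy.example/page") > 0.8:
    print("Likely duplicate: consider a removal or citation request.")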

NOTE: I contacted John Mueller, Daniel Peng and Frank Dabek for comments and advice regarding this article and am still waiting to hear from them. Also, this was meant to be a draft version (accidentally published) and is missing information about how page hijacking reflects in Google Webmaster Tools.


An article titled “Mind-Blowing Hack for Competitive Link Research” explains how the behaviour described above allows webmasters to see somebody else’s links in their Google Webmaster Tools.

Dan Petrovic, the managing director of DEJAN, is Australia’s best-known name in the field of search engine optimisation. Dan is a web author, innovator and a highly regarded search industry event speaker.
