Is Google's Cache Your Traffic or Google's Traffic?

I have real-time analytics on my wall in the office and every now and then I bump into an odd case of search traffic. This time I spotted a very unusual keyword:

cache:https://dejanmarketing.com/anchor-text-proximity-experiment/

[Image: keyword]

It’s impossible to search Google for the above as a regular query (unless you use quotes); users are taken to Google’s cache of the page and not to your own site. It looks, though, as if Google is using your analytics code to record real-time traffic on Google’s own domain.

Even active pages are on Google’s own domain:

[Image: Active Pages]

I would imagine this could have a (rather small) impact on your overall organic search traffic, though the keywords are likely to be censored out due to low volume.

Is this feature intentional or just a bug?

Update: Alistair Lattimore contributes his explanation.

I think this is happening as a result of three points:

1) How the Google Analytics tracker is implemented

2) How Google Analytics web profiles are implemented by default

3) How Google’s advanced cache: search operator works

Google Analytics supports an array of different search engines out of the box. The code within the Google Analytics tracker is flexible enough to identify a good selection by default, and it also allows users to add further search engines to the list to be identified as ‘organic’. Because this matching is generic, hostnames that happen to meet the criteria can be wrongly attributed from time to time. In this instance, I don’t think that is the case: users have indeed searched for [cache:https://dejanmarketing.com/anchor-…].

By default, Google Analytics web profiles accept traffic from any website on which a given tracking ID is present. So if your favourite news website mistakenly included your Google Analytics profile ID in its configuration, your statistics would go through the roof with what is effectively fake traffic. For the same reason, when someone views the Google cache for a URL, the Google Analytics tracking code is loaded and passes the page view into your web statistics.

The third item to remember is that when you view a Google cache for a URL, the objects within the page load from the origin server; they aren’t loaded from a copy of those assets that Google might store. For example, the images, CSS, JavaScript and so forth that your web page includes are still loaded from your server or their specified location, as if the user had actually visited your web page. What isn’t loaded from your website is the HTML of the page itself: that comes from a copy/snapshot that Google stored when it last crawled and cached that particular URL.

Taken together, these conditions explain the result. The cache URL is hosted on webcache.googleusercontent.com (notably, the domain includes “google”), its path includes the string “/search”, and it carries the query parameter “q”. Because the Google Analytics code fired and your Google Analytics profile allows traffic from all domains, you end up seeing the cached view as a search query in your web statistics.
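As a rough illustration of the three URL conditions just described, here is a minimal Python sketch. It is not the actual Google Analytics tracker code: the `SEARCH_ENGINES` table and the `organic_keyword` function are hypothetical names, and the example cache URL is an assumed form built only from the hostname, the “/search” path and the “q” parameter mentioned above.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical, simplified engine table: (hostname substring, query parameter).
# Only the "google" / "q" pair comes from the explanation above.
SEARCH_ENGINES = [("google", "q")]


def organic_keyword(url):
    """Return the search term if the URL looks like a search results page, else None."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    query = parse_qs(parts.query)
    for host_fragment, param in SEARCH_ENGINES:
        # Condition 1: hostname contains the engine name.
        # Condition 2: path contains "/search".
        # Condition 3: the keyword parameter is present.
        if host_fragment in host and "/search" in parts.path and param in query:
            return query[param][0]
    return None


# Assumed shape of a cache URL, for illustration only.
cache_url = (
    "https://webcache.googleusercontent.com/search"
    "?q=cache:https://dejanmarketing.com/anchor-text-proximity-experiment/"
)
print(organic_keyword(cache_url))
```

Under these assumptions the cached view is classified as a search, and the whole cache: string is reported as the keyword, which matches what showed up in the real-time report.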

You could stop this from happening by adding a Google Analytics filter to your profile that excludes all traffic that isn’t from your domain. With that filter in place, the view of the cache URL itself wouldn’t cause the search query to appear. However, if the user then clicked a link on the cached page to load another URL on your site, the query would appear in your web statistics, because the same conditions would then be met by the referrer.
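The filter itself is set up in the Google Analytics admin interface rather than in code, but its effect can be sketched in a few lines of Python. Everything here is illustrative: the `OWN_HOSTNAME` pattern, the `keep_hit` function and the sample hits are assumptions, not part of any Google Analytics API.

```python
import re

# Assumption: the site's own hostname, with or without a leading "www.".
OWN_HOSTNAME = re.compile(r"^(www\.)?dejanmarketing\.com$", re.IGNORECASE)


def keep_hit(hostname):
    """Mimic an include-only hostname filter: keep hits served from our own domain."""
    return bool(OWN_HOSTNAME.match(hostname))


# Hypothetical sample hits: one real page view and one view of the cached copy.
hits = [
    {"hostname": "dejanmarketing.com", "page": "/anchor-text-proximity-experiment/"},
    {"hostname": "webcache.googleusercontent.com", "page": "/search"},
]

kept = [hit for hit in hits if keep_hit(hit["hostname"])]
for hit in kept:
    print(hit["hostname"], hit["page"])
```

Only the hit recorded on the site’s own hostname survives; the cached view is dropped. As noted above, a later click from the cached page to the live site would still pass such a filter, since that hit is recorded on your own hostname.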

Dan Petrovic, the managing director of DEJAN, is Australia’s best-known name in the field of search engine optimisation. Dan is a web author, innovator and a highly regarded search industry event speaker.
ORCID iD: https://orcid.org/0000-0002-6886-3211

2 thoughts on “Is Google's Cache Your Traffic or Google's Traffic?”

  1. I think this is happening as a result of three points:
    1) How the Google Analytics tracker is implemented
    2) How Google Analytics web profiles are implemented by default
    3) How Google’s advanced cache: search operator works
    Google Analytics supports an array of different search engines out of the box. The code within the Google Analytics tracker is flexible enough to identify a good selection by default, and it also allows users to add further search engines to the list to be identified as ‘organic’. Because this matching is generic, hostnames that happen to meet the criteria can be wrongly attributed from time to time. In this instance, I don’t think that is the case: users have indeed searched for [cache:https://dejanmarketing.com/anchor-text-proximity-experiment/].
    By default, Google Analytics web profiles accept traffic from any website on which a given tracking ID is present. So if your favourite news website mistakenly included your Google Analytics profile ID in its configuration, your statistics would go through the roof with what is effectively fake traffic. For the same reason, when someone views the Google cache for a URL, the Google Analytics tracking code is loaded and passes the page view into your web statistics.
    The third item to remember is that when you view a Google cache for a URL, the objects within the page load from the origin server; they aren’t loaded from a copy of those assets that Google might store. For example, the images, CSS, JavaScript and so forth that your web page includes are still loaded from your server or their specified location, as if the user had actually visited your web page. What isn’t loaded from your website is the HTML of the page itself: that comes from a copy/snapshot that Google stored when it last crawled and cached that particular URL.
    Taken together, these conditions explain the result. The cache URL is hosted on webcache.googleusercontent.com (notably, the domain includes “google”), its path includes the string “/search”, and it carries the query parameter “q”. Because the Google Analytics code fired and your Google Analytics profile allows traffic from all domains, you end up seeing the cached view as a search query in your web statistics.
    You could stop this from happening by adding a Google Analytics filter to your profile that excludes all traffic that isn’t from your domain. With that filter in place, the view of the cache URL itself wouldn’t cause the search query to appear. However, if the user then clicked a link on the cached page to load another URL on your site, the query would appear in your web statistics, because the same conditions would then be met by the referrer.
    Anyway, that is why I think it is happening.

  2. Tiggerito says:

    The Google Cache copy of your page also contains your GA code, so it will trigger hits.
    To see who is doing this sort of thing, look at the Hostname dimension in GA.
    I’ve also noticed that if another website frames yours, the Hostname will indicate who framed you. It is set to the outer frame of the page.
    Last month my website triggered hits from Google Translate, Google Cache and Yandex.
    You can also check the Landing Page dimension to get more of an idea about the URL in question.
    I was interested in what Yandex was doing. Yandex has an option in its search results to show the result page with the search terms highlighted. This result shows your website wrapped in their domain so they can alter it.