SEO Experiment: Google Image Search
Our previous SEO experiments hinted that Google may be (purposely?) shuffling results out of the expected pattern. This time we investigate what happens on a larger scale, using 100 images for our test.
We started by creating a row of numbers in Excel (1-100) and used conditional formatting to colour all the cells. This was then exported as a PDF and chopped up in Photoshop. Each file was named after its number (e.g. 64 = 64.png) and the files were arranged in the correct sequence on the experiment page. We told Google about the page (social sharing) and waited for the results. And then…
That’s right. What you see above is what Google displayed roughly 30 minutes after our test page went live.
We noticed several interesting things, but the main discovery is that our images seem to dance randomly in Google Image results rather than following the gradient we expected, with the numbered images incrementing from 1 to 100. Take a look at the original page and the way the images are ordered. One would expect this to be the order of indexation and, likewise, of display in search results.
Something like this perhaps:
So what’s going on? Well, there are at least three possibilities:
- The crawler sent the page to the index, and the media was absorbed in a distributed way by a number of separate machines and stored in different physical locations. This does not seem a particularly likely explanation, since Google tries to keep related items close together in its index and cache to minimise fragmentation.
- File size fluctuations could somehow influence ordering, though we see very little practical reason for this with such a small deviation (literally a matter of bytes).
- Google may be purposely adding a random (RND) function to mix things up a bit. This would keep results fresher and allow random discovery. It would also hinder any systematic attempt at reverse-engineering Google’s ranking algorithm.
Main highlight: Google seems to (whether purposely or not) randomise the order of otherwise neatly ordered images on a page.
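One way to put a number on how "shuffled" a result set is would be to count inversions: pairs of images that appear in the opposite order to their numbers. The sketch below is only an illustration of that idea, not something we ran as part of the experiment; the function name `inversion_fraction` and the sample sequence are hypothetical.

```python
def inversion_fraction(observed):
    """Fraction of image pairs that appear out of numeric order.

    0.0 means perfectly ascending, 1.0 means perfectly descending,
    and values near 0.5 are what a random shuffle would tend to give.
    """
    n = len(observed)
    if n < 2:
        return 0.0
    # Count every pair (i, j) with i before j where the earlier
    # image carries the larger number.
    inversions = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if observed[i] > observed[j]
    )
    total_pairs = n * (n - 1) / 2
    return inversions / total_pairs


# Hypothetical example sequence, NOT the order Google returned.
sample_order = [3, 1, 2]
print(inversion_fraction(sample_order))
```

Feeding in the image numbers in the order they appear in the search results would give a single score that can be compared between runs or domains.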
Secondary findings include:
- The page enters the cache within 60 seconds, assisted by social sharing.
- Two minutes later, image search displays nothing.
- 30 minutes later the images are in the results, but their timestamps refer to the time of indexation (back when they were not yet visible).
- Small variations in indexation time (seconds) do not seem to affect the order of images in Google search.
- Only 27 images were indexed, seemingly at random. Why? Perhaps due to a PageRank allowance or some other limiting factor.
- Image colour filtering did not work 2 hours after the page went up, indicating that colour processing may be a separate process.
Two highlights: Google image search really is faster since the recent updates and can index new images within minutes. However, due to an unknown bottleneck, they are not likely to be displayed in search results for at least 15-30 minutes.
The following four images were classified as photos:
Not directly associated with the test, but we also noticed that wide-screen mode brought up a set of images slightly smaller than the rest (see top two rows).
We performed the same test on a different domain and got the following 26 results:
For comparison, here are the original 27 results:
Here’s the table examining the correlation between file size and image results:
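For anyone wanting to repeat the file size check numerically, a rank correlation between file size and result position is one reasonable option. The following is a minimal sketch of Spearman's rank correlation using only the standard library; it is our suggestion rather than the method behind the table, and the helper names and the sample figures in the usage lines are hypothetical placeholders, not our measurements.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical placeholder data: file sizes in bytes and the position
# at which each image appeared in the results.
file_sizes = [1204, 1198, 1210, 1201]
positions = [3, 1, 4, 2]
print(spearman(file_sizes, positions))
```

A value close to +1 or -1 would suggest that file size tracks result position, while a value near 0 would suggest no monotonic relationship.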
5/5/2012: 30 images now display in results. Colours still not filtering.