Co-Occurrence as a Ranking Signal

TL;DR: A little-known property of Google Books search highlights co-occurrence as a ranking signal.

The concept of co-occurrence (not to be confused with co-citation) has been on the radar of many search professionals for a while. I accepted the idea as common sense and never thought to investigate it further. Then one day I stumbled into a peculiar set of search results, which led to what appears to be conclusive evidence of Google’s use of co-occurrence in their search results.

It all started when I discovered a reference to one of my articles from 2004 in Media Discourses by Donald Matheson of the University of Canterbury:

Media Discourses

Knowing Google, I was certain they would not ignore a citation like that, so I searched for my name in Google Books (in incognito mode, naturally) to see what else was there. The first result was a direct mention, followed by a few titles that mentioned some other Dan, but the results that followed shocked me with their accuracy.

[blockquote type=”blockquote_quotes” align=”right”]Co-occurrence is a linguistics term that can either mean concurrence / coincidence or, in a more specific sense, the above-chance frequent occurrence of two terms from a text corpus alongside each other in a certain order.[/blockquote]

Each listed book appeared highly relevant to me, yet I am not the author of any of them, nor am I mentioned in them. So how does Google know all this? I examined each title individually and compiled my results in a spreadsheet.*

A summary of my initial findings:

Relationship with the author (friends, peers, email exchange)
Connectivity with the author online (circles, mentions, following)
Content sharing (articles, videos, book recommendations, ratings)
Personal and professional interests (SEO, marketing, science fiction)
Common mentions online (conferences, events, interviews, social posts)

*ACO = Author Co-Occurrence, TCO = Title Co-Occurrence

I decided to test this with other people. Choosing a common name would have returned too many individuals blended together in the Google Books results, which would have been difficult to analyse, so I only interviewed people with unique names. Each was presented with a list of books returned for their name, even though no reference to their name could be found in any of the books. I asked them to explain their connection to the title, the author and the topic in general.

Aleyda Solis

“I could see how they got most of the books right as most of them are either appealing or I’ve already read them. For those that are not at all connected to me I can see how they could have inferred that by things I’ve searched and shared.”

Analysed Titles 25
Relevant to her interests 21
Knows the author personally 05
Familiar with the author 04
Unfamiliar with the author 16

Wolfsmund by Mitsuhisa Kuji

Funny because it’s a topic that doesn’t interest me but I can see how there could be possible connections:

  1. Some time ago I participated in a “you & your dog selfie competition” that an indie book publisher organized.
  2. I’ve searched many times about Game of Thrones, Sci-Fi movies & Lord of the Rings, for example.

Other responses:

  1. I follow the author on twitter and have shared his presentations & posts in the past.
  2. I’ve searched, read & shared about similar topics in the past.
  3. I know the author, interact with him in Twitter & Facebook, I share his posts regularly.
  4. I’ve searched, read & shared about similar topics in the past. 
  5. I’ve read the book.
  6. I follow him on Twitter.

Rand Fishkin

“I left every one of the books as “yes” on relevant to broad interests, since nearly all of them are about entrepreneurship and/or online marketing.”

Analysed Titles 20
Relevant to his interests 20
Knows the author personally 06
Interacted online 02
Unfamiliar with the author 12

The Billionaire Who Wasn’t: How Chuck Feeney Secretly Made and Gave Away a Fortune

I recommend this book constantly. It’s one of my favorites.

Other Responses:

  1. I follow the author on Twitter.
  2. I suspect I’m cited in this book.
  3. I believe I did a blurb about the author for this book.

Tim Capper

“Very Interesting to note that all the above relate to over 20 years ago when I worked in the Industry. No connections to current interests or connections.”

Note: Tim’s case was quite a surprise. Most of the books that came up for his name were cooking related, so I had to ask him about it before getting him to fill out the survey, just to make sure. In a sense, Google Books revealed something I didn’t know about one of my Google+ buddies.

Analysed Titles 05
Relevant to his interests 04
Familiar with the author 03
Unfamiliar with the author 02

Tim’s comments for various titles:

  1. I have 1 person in common with the author, which in turn have 5 people in common with.
  2. Not personally following, but a brand page that I own has interacted with them.
  3. No connection to authors, but have mentioned the subject many times.
  4. Contributed to research on the book.

Alistair Lattimore

Alistair’s case puzzled me at first, but then he managed to find a reference connecting him to one of the books returned by a Google Books search for his name:

Untold Wealth: Success from Scratch: Building Wealth Out of Property by John L. Fitzgerald, Claire Louise Wright

The connection? Attended a conference by John L. Fitzgerald, wrote an article about it.

Common Element: Co-Occurrence

I considered many data sources Google could use to discover our interests and connections, including Google Play, Google+, Gmail, social shares, book reviews, book topic, relationship to the author and many others, but they all appeared conveniently correlational. It was only after I started querying each phrase pair using the {name} + {author} or {name} + {title} formula that I realised the majority returned positive results*. Every document returned contained either the name of the person and the author, or the name of the person and the title of the book for which they appear as a result in Google Books.

WWW: Wake Dan Co-Occurrence
The Eventual Millionaire Rand Co-Occurrence
The HP Way Aleyda Co-Occurrence
The Aga Book Tim Co-Occurrence
Untold Wealth Alistair Co-Occurrence
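For anyone who wants to repeat the check, here is a minimal sketch of the {name} + {author} and {name} + {title} test in Python. It is only an illustration of the manual process described above, not a description of how Google works: the function names are my own, the sample document is invented for the example, and you would still need to fetch the candidate pages yourself (no search API is assumed here).

```python
import re


def build_queries(name, author, title):
    """Build the two exact-phrase pair queries: {name} + {author} and {name} + {title}."""
    return [f'"{name}" "{author}"', f'"{name}" "{title}"']


def contains_phrase(text, phrase):
    """Case-insensitive whole-phrase match that tolerates extra whitespace."""
    words = [re.escape(word) for word in phrase.split()]
    # Lookarounds instead of \b so phrases ending in punctuation still match.
    pattern = r"(?<!\w)" + r"\s+".join(words) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def classify(text, name, author, title):
    """Return the co-occurrence labels found in one document.

    ACO = Author Co-Occurrence, TCO = Title Co-Occurrence.
    """
    labels = []
    if contains_phrase(text, name):
        if contains_phrase(text, author):
            labels.append("ACO")
        if contains_phrase(text, title):
            labels.append("TCO")
    return labels


# Hypothetical page text, invented purely for illustration.
sample_page = (
    "Alistair Lattimore wrote about a property conference "
    "presented by John L. Fitzgerald."
)

print(build_queries("Alistair Lattimore", "John L. Fitzgerald", "Untold Wealth"))
print(classify(sample_page, "Alistair Lattimore", "John L. Fitzgerald", "Untold Wealth"))
```

A page that mentions the person alongside the author but not the title is labelled ACO only, and a page that never mentions the person gets no label at all, which mirrors how I scored the documents by hand.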

[blockquote type=”blockquote_quotes” align=”right”]That is a pretty interesting look at how semantic relationships surface content. Google uses the weak and strong ties principle with numeric scores ascribed to edges and nodes based on centrality to determine the relationship connections. David Amerland[/blockquote]

*One exception is when the page content changes (e.g. a mention in a sidebar or footer Twitter stream).

It’s hard to believe that a simple mechanism such as co-occurrence performs so well in surfacing relationships and interests of individuals.

Combine this with other signals including the social and knowledge graph and it becomes easier to understand why Google no longer needs authorship.

It appears they’re confident enough to leave mapping of individuals and relationships to their algorithms. We already know that they’re capable of mapping social graphs using implicit connections.

Schema.org itself could be just a form of training wheels for Google until the point when they no longer need any human input to understand structured data.

Dan Petrovic, the managing director of DEJAN, is Australia’s best-known name in the field of search engine optimisation. Dan is a web author, innovator and a highly regarded search industry event speaker.
ORCID iD: https://orcid.org/0000-0002-6886-3211
