Interview with Robert J. Sawyer
An interview with celebrity science fiction writer Robert J. Sawyer about his plan to beat PageRank.
Note: This interview is a follow-up to an article titled “Work of Fiction Inspires a New Search Engine”. For those of you with less than 10 minutes to spare, here’s a quick teaser:
I am a hard S.F. fan and cherish the works of Clarke and Asimov, but Sawyer’s novel WWW: Wake brought something entirely new and fresh into my reading experience: Google PageRank. Yes, you heard me right: the PageRank algorithm, Google, links, blackhat, search monopoly and open source alternatives.
Being an SEO, I study Google’s algorithm daily in an attempt to understand it, but what’s a science fiction writer doing researching Google’s algorithm? Do you have any special interest in search engines, or do you research everything this thoroughly while you write?
Well, that’s not an either/or question. I do research everything thoroughly—and, of course, search engines are a big part of helping me do that. Indeed, doing research is my favorite part of the job of being a writer. But I also have a particular interest in search-engine optimization. I was the first science-fiction writer in the world to have a website and the first Canadian author of any type to have one; my multipage site devoted to my science-fiction writing has been live since June 28, 1995 (and I started giving away free fiction via my website just a few days later; I was also the first science-fiction writer to do that). Since then, the site has grown to over one million words and 720 documents with 25,000 internal hyperlinks.
But even back then, before the buzzword existed, I was concerned about discoverability, and so I made it a point early on to learn about search-engine optimization. Besides scoring high for my own name, I also wanted to score high for the phrase “science fiction writer,” and I did everything from choosing my URL (http://SFwriter.com) to crafting meta tags, site content, and a site description to make that happen—and then, back when this sort of thing was done manually, submitting the site to search engines.
By the way, I chose my writing name, including the middle initial, well before there were any search engines, and including the initial was a tip of the hat to Arthur C. Clarke, Robert A. Heinlein, and James T. Kirk. But I’m so glad I did do it, because just “Robert Sawyer” is pretty common but almost all of the hits on “Robert J. Sawyer” are me; I always advise beginning writers to make sure the form of their name they use in their byline will be unique, so that readers can quickly find them with search engines.
Was Jagster always the central idea or just a convenient device you had to invent along the way in order to plug the main character into the web?
Well, the central notion of the book was simply this quote from Sergey Brin, which I use to begin WWW: Wonder, the third volume in the trilogy: “The perfect search engine would be like the mind of God.”
As for the mechanics of Jagster, and how it works, that was driven by my plot need, but it occurred to me as I was inventing it (in April 2007, according to my notes) that such a search engine would indeed give very meaningful results, and that its rankings might be more accurate than Google’s. I describe Jagster via this fictitious encyclopedia entry in WWW: Wake, the first volume in my trilogy:
Google is the de facto portal to the Web, and many people feel that a for-profit corporation shouldn’t hold that role—especially one that is secretive about how it ranks search results. The first attempt to produce an open-source, accountable alternative was Wikia Search, devised by the same people who had put together Wikipedia. However, by far the most successful such project to date is Jagster.
The problem is not with Google’s thoroughness, but rather with how it chooses which listings to put first. Google’s principal algorithm, at least initially, was called PageRank—a jokey name because not only did it rank pages but it had been developed by Larry Page, one of Google’s two founders. PageRank looked to see how many other pages linked to a given page, and took that as the ultimate democratic choice, giving top positioning to those that were linked to the most.
Since the vast majority of Google users look at only the ten listings provided on the first page of results, getting into the top ten is crucial for a business, and being number one is gold—and so people started trying to fool Google. Creating other sites that did little more than link back to your own site was one of several ways to fool PageRank. In response, Google developed new methods for assigning rankings to pages. And despite the company’s motto—“Don’t Be Evil”—people couldn’t help but question just what determined who now got the top spots, especially when the difference between being number ten and number eleven might be millions of dollars in online sales.
But Google refused to divulge its new methods, and that gave rise to projects to develop free, open-source, transparent alternatives to Google: “free” meaning that there would be no way to buy a top listing (on Google, you can be listed first by paying to be a “sponsored link”); “open source” meaning anyone could look at the actual code being used and modify it if they thought they had a fairer or more efficient approach; and “transparent” meaning the whole process could be monitored and understood by anyone.
What makes Jagster different from other open-source search engines is just how transparent it is. All search engines use special software called Web spiders to scoot along, jumping from one site to another, mapping out connections. That’s normally considered dreary under-the-hood stuff, but Jagster makes this raw database publicly available and constantly updates it in real time as its spiders discover newly added, deleted, or changed pages.
In the tradition of silly Web acronyms (“Yahoo!” stands for “Yet Another Hierarchical Officious Oracle”), Jagster is short for “Judiciously Arranged Global Search-Term Evaluative Ranker”—and the battle between Google and Jagster has been dubbed the “Ranker rancor” by the press …
That said, as a note of historical trivia, and despite the acronym I made up above, Jagster is really named for my great friend James Alan Gardner, whom I call the Jagster from his initials; Jim is a brilliant, Hugo-nominated science-fiction writer in his own right, and WWW: Watch, the second book in the trilogy, is dedicated to him.
But, of course, what’s really interesting about Jagster is how it ranks pages, and that’s described in dialogue later in the book by Anna Bloom, an Internet cartographer:
“Well, remember, Jagster was created as an alternative to the Google approach. PageRank, the standard Google method, looks for how many other pages link to a page, right? But that isn’t necessarily the best measure of how frequently a page is accessed. If you’re looking for info on a hot rock star, like, say, Lee Amodeo …”
“She’s awesome!” said Caitlin.
“So my granddaughter tells me,” said Anna. “Anyway, if you’re interested in Lee Amodeo, how do you find her website? You could go to Google and put ‘Lee Amodeo’ in as the search term, right? And Google will serve up as number one whichever page about her has the most links to it from other pages. But the best Lee Amodeo page isn’t necessarily the one people link to the most, it’s the page they go to the most. If people always go directly to her page by correctly guessing that the URL is leeamodeo.com—”
“Which it is,” Caitlin said.
“—then that might be the most popular Lee Amodeo site even if no one links to it, and Google wouldn’t know it. And, in fact, if you upload a document to the Internet but don’t link it to any Web page, but you send a link to it to people via email, again, Google—and other search engines—won’t know it’s there, even if ten thousand people access the document through the email links. So, besides just traditional spidering, Jagster monitors raw Web traffic going through major trunks, looking at the actual stream of data moving through the routers …”
And that is a cool idea, and one I’m rather proud of.
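The contrast Anna Bloom draws in the excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the link-based score is the raw in-link count as the novel describes it (Google's actual PageRank also weights each link by the rank of the page it comes from), the traffic score is a plain count of observed requests, and every domain other than leeamodeo.com, along with all function names and numbers, is made up for the example.

```python
from collections import Counter

def inlink_scores(links):
    """Count distinct other pages linking to each target.

    Simplified stand-in for the link-based ranking described in the
    excerpt; it is NOT Google's real PageRank computation.
    """
    return Counter(target for source, target in set(links) if source != target)

def traffic_scores(requests):
    """Count observed requests per page (the Jagster-style signal)."""
    return Counter(requests)

def rank(pages, scores):
    """Order pages by descending score, breaking ties alphabetically."""
    return sorted(pages, key=lambda page: (-scores[page], page))

# Hypothetical data: nobody links to the official site, but most
# visitors type its URL directly.
pages = ["leeamodeo.com", "fanblog.example", "news.example"]
links = [
    ("a.example", "fanblog.example"),
    ("b.example", "fanblog.example"),
    ("a.example", "news.example"),
]
requests = ["leeamodeo.com"] * 5 + ["fanblog.example"] * 2 + ["news.example"]

by_links = rank(pages, inlink_scores(links))
by_traffic = rank(pages, traffic_scores(requests))

print(by_links)    # link-based order
print(by_traffic)  # traffic-based order
```

With this toy data the official site comes last in the link-based order and first in the traffic-based order, which is exactly the gap the Jagster idea is meant to close.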
In your book you write that much of the communication between user and server happens in a very short time period, and the rest is us reading information from our cache. Does it bother you that a search engine like Jagster would have its own index of the web and not be an actual web with all its interactions and data flow?
Not at all. At least at the outset, and maybe still, Google indexed static copies of the Web dumped to its servers. It didn’t crawl the Web in real time; it captured the Web and indexed that captured snapshot.
How do you feel about people using Jagster as an idea to create a new search engine?
I love it! Arthur C. Clarke used to wear a T-shirt that said, “I invented the communications satellite and all I got was this lousy T-shirt,” but I’d be thrilled if someone could make the Jagster model a reality.
Please share some of your vision with us. Where are search engines heading in the next few years?
Well, the scariest thing of all is the removal of the visible search engine from the process. The iPhone’s Siri is a search engine that pretends not to be. Many iPhone apps are functioning as search engines behind the scenes, providing results without any transparency about what search algorithms or techniques were used, what role paid results have in their output, and what subset of sites was searched.
I want to control my search: I might prefer Google today, but there was a time when I preferred Hotbot, and there might be a time when I prefer something else—including Jagster, if anyone really does build it; this embedding of search behind app interfaces takes away my ability to choose which search provider I trust.
As we move further into immersive reality, search will increasingly become a push rather than pull technology, and that will mean “search” will no longer be the right term for the functionality. When your whole real-life experience is marked up with information, it’s not search; you haven’t asked for something to be found—it’s annotation. And that’s really the future of much of search: not the query box, but the automated marking-up of what we’re looking at, reading, and, eventually, simply just thinking about.