Google Fusion Tables
This is an overview of the research paper “Google Fusion Tables: Data Management, Integration and Collaboration in the Cloud”
Authors: Hector Gonzalez, Alon Halevy, Christian S. Jensen, Anno Langen, Jayant Madhavan, Rebecca Shapley, Warren Shen (Google Inc.)
When a user searches Google Maps for “best fish and chip restaurants in London,” the resulting map is the kind of visual display Google Fusion Tables is built to produce. Google Fusion Tables takes separately created CSV files, spreadsheets, and KML files, accepts data sets of up to 100 megabytes, merges related data sets, and turns them into user-friendly data visualisations. (1) Related tables are linked through “joins,” which combine the rows of two tables that share a value in a common key column. Two main architectural components store the data and support these joins: Bigtable and Megastore. (2)
Bigtable is the storage layer. Because user tables vary widely in schema, size, and query load, Fusion Tables stores them all in Bigtable in one consistent format. Bigtable keeps rows sorted by key and supports three basic kinds of access: lookup by key, lookup by key prefix, and lookup by key range. It can also keep multiple timestamped versions of a value, so edits to data configured to keep a history are recorded rather than overwritten. In the example above, if a user uploads a new coordinate for a particular fish and chip restaurant, Bigtable stores the new value together with the time of the edit. If a second user later changes the location again, that edit is recorded as well. The system displays the most recent value (the second edit) but keeps a record of every change along the way. (3)
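To make the three lookup types and the version history concrete, here is a minimal in-memory sketch in Python. It is not Bigtable's API: the class `ToyVersionedStore`, its methods, and the restaurant keys and coordinates are all hypothetical, and real Bigtable rows also have column families and column qualifiers, which are omitted here. Keeping keys sorted is what makes prefix and range lookups cheap: both reduce to a binary search for the first matching key followed by a sequential scan.

```python
import bisect
import time
from typing import Dict, List, Optional, Tuple


class ToyVersionedStore:
    """Hypothetical in-memory imitation of a sorted, versioned key-value store.

    Keys stay sorted so key, prefix, and range lookups can use binary search.
    Each key maps to its (timestamp, value) versions, oldest first.
    """

    def __init__(self) -> None:
        self._keys: List[str] = []  # sorted list of row keys
        self._cells: Dict[str, List[Tuple[float, str]]] = {}

    def put(self, key: str, value: str, ts: Optional[float] = None) -> None:
        """Record a new timestamped version of `key`; older versions are kept."""
        if ts is None:
            ts = time.time()
        if key not in self._cells:
            bisect.insort(self._keys, key)
            self._cells[key] = []
        self._cells[key].append((ts, value))
        self._cells[key].sort()  # order versions by timestamp

    def get(self, key: str) -> Optional[str]:
        """Lookup by key: the newest version, or None if the key is absent."""
        versions = self._cells.get(key)
        return versions[-1][1] if versions else None

    def history(self, key: str) -> List[Tuple[float, str]]:
        """Every recorded version of `key`, oldest first."""
        return list(self._cells.get(key, []))

    def scan_range(self, start: str, end: str) -> List[Tuple[str, str]]:
        """Lookup by key range: newest values for keys with start <= key < end."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._cells[k][-1][1]) for k in self._keys[lo:hi]]

    def scan_prefix(self, prefix: str) -> List[Tuple[str, str]]:
        """Lookup by key prefix: newest values for keys that start with `prefix`."""
        result = []
        for k in self._keys[bisect.bisect_left(self._keys, prefix):]:
            if not k.startswith(prefix):
                break  # sorted order: no later key can match
            result.append((k, self._cells[k][-1][1]))
        return result


# Hypothetical keys and coordinates for illustration only.
store = ToyVersionedStore()
store.put("london/rest-017", "51.5101,-0.1340", ts=1.0)  # first user's coordinate
store.put("london/rest-017", "51.5098,-0.1352", ts=2.0)  # second user's edit
store.put("london/rest-042", "51.5155,-0.0922", ts=1.5)
store.put("paris/rest-003", "48.8566,2.3522", ts=1.2)

print(store.get("london/rest-017"))  # 51.5098,-0.1352 (newest version wins)
print(store.history("london/rest-017"))  # both versions are kept
print(store.scan_prefix("london/"))
print(store.scan_range("london/rest-000", "london/rest-020"))
```

The display in the example corresponds to `get`, which returns only the newest version, while `history` plays the role of the retained record of earlier edits.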
Much like a departmental manager, Megastore sits above Bigtable, coordinating several Bigtable instances and keeping track of higher-level changes to the data. It maintains property indexes and replicates tables across multiple data centres, so users in different locations can view the same data at the same time. Its most important function is keeping that data consistent by providing ACID (atomicity, consistency, isolation, durability) semantics. (4)
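Megastore's role can be illustrated with a deliberately simplified Python sketch of a replicated group of rows with a property index. Everything here is hypothetical: `ToyReplicatedGroup`, the version check, and the "all replicas must be reachable" rule are assumptions for illustration, not Megastore's API or protocol (Megastore itself replicates synchronously using Paxos). The point is the ACID shape: a batch of row writes and the matching index updates are staged together and published to every replica at once, or not at all.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Replica:
    """One data centre's copy: rows plus a property index (value -> row ids)."""
    name: str
    rows: Dict[str, Dict[str, str]] = field(default_factory=dict)
    index: Dict[str, Set[str]] = field(default_factory=dict)
    version: int = 0


class ToyReplicatedGroup:
    """Hypothetical sketch: apply a batch of writes to every replica, or to none.

    Megastore replicates synchronously with a Paxos-based protocol; this toy
    swaps that for a much simpler "all replicas must be up" rule.
    """

    def __init__(self, names: List[str], indexed_property: str) -> None:
        self.replicas = [Replica(n) for n in names]
        self.indexed_property = indexed_property
        self.down: Set[str] = set()  # names of unreachable replicas

    def commit(self, expected_version: int, writes: Dict[str, Dict[str, str]]) -> bool:
        if any(r.name in self.down for r in self.replicas):
            return False  # cannot reach every copy: change nothing
        if any(r.version != expected_version for r in self.replicas):
            return False  # someone committed since we read: reject (isolation)
        staged = [copy.deepcopy(r) for r in self.replicas]
        for r in staged:
            for row_id, row in writes.items():
                old = r.rows.get(row_id)
                if old is not None:  # remove the stale index entry
                    r.index.get(old.get(self.indexed_property, ""), set()).discard(row_id)
                r.rows[row_id] = dict(row)
                r.index.setdefault(row.get(self.indexed_property, ""), set()).add(row_id)
            r.version += 1
        self.replicas = staged  # publish all replicas together (atomicity)
        return True

    def find(self, replica: str, value: str) -> Set[str]:
        """Query one replica's property index for rows with the given value."""
        r = next(r for r in self.replicas if r.name == replica)
        return set(r.index.get(value, set()))


# Hypothetical replica names and restaurant rows.
group = ToyReplicatedGroup(["eu", "us"], indexed_property="city")
print(group.commit(0, {"rest-017": {"name": "Golden Fry", "city": "London"}}))  # True
print(group.find("us", "London"))  # {'rest-017'}
print(group.commit(0, {"rest-017": {"city": "Leeds"}}))  # False: stale version
group.down.add("us")
print(group.commit(1, {"rest-042": {"name": "Chip Inn", "city": "London"}}))  # False
```

Because the property index is updated in the same step as the rows, a lookup on any replica in this sketch never sees a row without its index entry, and no replica ever holds a change the others lack.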
The most exciting part of the process is the data visualisation. It is accomplished in part by the Google Visualization API and by specially designed “gadgets,” blocks of code that quickly render certain types of data in a preset visual form, such as a pie chart, which can then be embedded in a web page or blog post. (5) Because data is turned into a visual form automatically, websites can embed graphical components that are easy to keep up to date, without requiring any programming skill from the user.
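As a rough sketch of what a chart gadget needs as input, the Python function below counts rows per category and emits them in the DataTable JSON literal format used by the Google Visualization API: a `cols` list describing each column, plus `rows` whose cells are `{"v": value}` objects. The function name `pie_chart_table`, the `borough` column, and the sample rows are hypothetical, and the sketch only prepares data; it does not render anything.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable


def pie_chart_table(rows: Iterable[Dict[str, str]], category: str) -> Dict[str, Any]:
    """Count rows per category and return a DataTable-style JSON object.

    A pie chart expects one string column (slice label) and one number
    column (slice size); slices are sorted by label for stable output.
    """
    counts = Counter(row[category] for row in rows)
    return {
        "cols": [
            {"id": "label", "label": category, "type": "string"},
            {"id": "count", "label": "Count", "type": "number"},
        ],
        "rows": [{"c": [{"v": label}, {"v": n}]} for label, n in sorted(counts.items())],
    }


# Hypothetical rows, standing in for an uploaded CSV file.
restaurants = [
    {"name": "Golden Fry", "borough": "Westminster"},
    {"name": "Chip Inn", "borough": "Camden"},
    {"name": "Cod Almighty", "borough": "Westminster"},
]
print(json.dumps(pie_chart_table(restaurants, "borough"), indent=2))
```

In a browser, an object of this shape can be passed to the `google.visualization.DataTable` constructor and drawn as a pie chart; re-running the aggregation whenever the underlying table changes is what keeps such an embedded chart current.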
This functionality has been used to great effect in the integration of Google Fusion Tables with the Google Maps infrastructure. Although Fusion Tables is not limited to creating maps, it makes it easy to gather a large amount of separately entered information and present it in an easily viewable form. (6) Google Maps displays information as a set of “layers.” To avoid overwhelming the server with too much information at any given time, each layer is broken down into smaller units known as “tiles.” Each tile displays only a limited number of features, and which features appear depends on the “zoom level” the user requests from the server: at a low zoom level a single tile covers a large area, so only a subset of its features can be shown. (7) Selecting that subset is known as “sampling.” (8)
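The tiling and sampling steps can be sketched in Python as follows. The tile formula is the standard Web Mercator scheme used by web maps, in which zoom level z has 2^z by 2^z tiles. The per-tile limit (`MAX_FEATURES_PER_TILE`), the hash-based ranking, the function names, and the sample locations are all assumptions made for illustration; the paper's actual sampling strategy may differ.

```python
import hashlib
import math
from typing import Dict, List, Tuple

MAX_FEATURES_PER_TILE = 3  # hypothetical limit, kept tiny for the demo


def tile_for(lat: float, lon: float, zoom: int) -> Tuple[int, int]:
    """Web Mercator tile (x, y) containing a point; there are 2**zoom tiles per side."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp points on the map's outer edges into the border tiles.
    return max(0, min(x, n - 1)), max(0, min(y, n - 1))


def rank(feature_id: str) -> int:
    """Deterministic pseudo-random priority, identical at every zoom level."""
    return int.from_bytes(hashlib.sha256(feature_id.encode("utf-8")).digest()[:8], "big")


def sample_tiles(
    features: Dict[str, Tuple[float, float]], zoom: int
) -> Dict[Tuple[int, int], List[str]]:
    """Group features by tile, then keep the first few by rank in each tile."""
    tiles: Dict[Tuple[int, int], List[str]] = {}
    for fid, (lat, lon) in features.items():
        tiles.setdefault(tile_for(lat, lon, zoom), []).append(fid)
    return {t: sorted(ids, key=rank)[:MAX_FEATURES_PER_TILE] for t, ids in tiles.items()}


# Hypothetical restaurant locations in central London.
restaurants = {
    "rest-017": (51.5098, -0.1352),
    "rest-042": (51.5155, -0.0922),
    "rest-051": (51.5033, -0.1195),
    "rest-063": (51.5136, -0.1365),
    "rest-078": (51.5194, -0.1270),
}
for zoom in (8, 15):
    visible = sum(len(ids) for ids in sample_tiles(restaurants, zoom).values())
    print(f"zoom {zoom}: {visible} of {len(restaurants)} features drawn")
```

Ranking by a fixed hash has a useful side effect: each tile at zoom z + 1 holds a subset of its parent tile's features, so any feature drawn at a coarse zoom level stays visible as the user zooms in.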
In summary, Google Fusion Tables transforms disparate files into consistently organised data, and Google Maps turns that data into an accessible visualisation.
(1) Page 1, Abstract to Figure 1.
(2) Pages 1-2, Figure 2 to 2.2.
(3) Page 2, Figure 2.2.
(4) Page 3, Figure 4.
(5) Page 4, Figure 5.2.
(6) Page 4, Figure 5 to 5.1.
(7) Pages 4-5, Figure 6 to 6.2.
(8) Page 5, Figure 6.3.