Escaping your history
posted by: James Muir // 11:59 PM // March 14, 2006 // ID TRAIL MIX
Imagine that every search phrase you have ever typed into Google from your home computer was recorded and stored in a user-profile on one of Google's servers. What would this profile say about you? No doubt you would consider some of this information private. It might alarm you to realize that this information is now out of your control. Perhaps you trust Google not to divulge it, but there may be legal circumstances that would force them to do so.
You don't have to imagine this scenario -- Google does in fact keep a record of your search history, and they are currently under legal pressure to release a subset of this data to the U.S. government. Some surprising facts about Google's user-profiling are discussed in a recent CNET article (D. McCullagh, 3 Feb 2006). One of the questions that Google's data collection practices raise is the following: Is it possible for a user to use a search engine anonymously from their home computer? For instance, is it possible to do a Google search for "picking magic mushrooms" without having this tied to your identity and possibly used against you at a later date? There is a very brief discussion of this question in the CNET article. It makes two specific recommendations: 1) regularly delete any Cookies your browser collects, and 2) proxy your web browsing through an anonymizing service like Tor. In this note, we explain just what these two instructions mean and argue that they alone may not suffice to anonymize your Google searches.
We begin by recalling some basic facts about the Internet. Every computer connected to the Internet is identified by a unique number called its IP address. An IP (version 4) address is a sequence of four numbers in the range 0 to 255 separated by dots (e.g., 192.168.0.1). Your home computer's IP address is assigned by your ISP, which keeps track of which IP addresses are assigned to which customers. If your ISP is subpoenaed, it can be forced to match a customer's identity to a given IP address. When you surf the web normally, your IP address is submitted to the web sites you visit so that their content can be routed back to your computer and displayed in your browser. You can check what IP address you are advertising by visiting here.
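To make the dotted format concrete, here is a small Python sketch that uses the standard `ipaddress` module to split an IPv4 address into its four numbers and to show the single 32-bit number they encode. The helper name `describe_ipv4` is ours and is only for illustration.

```python
import ipaddress

def describe_ipv4(text):
    """Return the four octets and the 32-bit integer behind a dotted IPv4 address."""
    addr = ipaddress.IPv4Address(text)  # raises ValueError if the text is malformed
    octets = list(addr.packed)          # four numbers, each in the range 0..255
    return octets, int(addr)

octets, number = describe_ipv4("192.168.0.1")
print(octets)   # [192, 168, 0, 1]
print(number)   # 3232235521
```

An address such as "256.1.1.1" is rejected, since each of the four numbers must fit in a single byte.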
Each time a user carries out a Google search, Google can record their IP address and their search phrase (as well as the current date and time). Thus, they can form a history of the search phrases which originate from a particular IP address. However, these IP address search histories are not necessarily the same as user search histories. There are two main reasons for this: 1) ISPs sometimes change the IP addresses of their customers; 2) the customers of some ISPs, like AOL, access the web through caching HTTP proxies which effectively results in many users advertising the same IP address to a web site. These issues can be overcome by using Cookies. A Cookie is a small data-file that a web site generates and stores in your browser. When you first visit Google, they set a Cookie in your browser which serves as a unique user-id. This Cookie can be subsequently read by Google each time you do a search through their web site and so it can be used to track your behaviour, even if your ISP happens to change your IP address.
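The difference between an IP-address history and a Cookie-based history can be sketched with a toy model. This is only an illustration under our own assumptions, not a description of Google's actual systems: the class `SearchLog`, its method names, and the example addresses are all hypothetical.

```python
from collections import defaultdict
import uuid

class SearchLog:
    """Toy model of a search site's log (hypothetical, for illustration)."""

    def __init__(self):
        self.by_ip = defaultdict(list)      # search phrases grouped by IP address
        self.by_cookie = defaultdict(list)  # search phrases grouped by Cookie id

    def handle_search(self, ip, cookie_id, phrase):
        # No Cookie presented (first visit, or Cookies deleted): issue a fresh id.
        if cookie_id is None:
            cookie_id = uuid.uuid4().hex
        self.by_ip[ip].append(phrase)
        self.by_cookie[cookie_id].append(phrase)
        return cookie_id  # the browser stores this and sends it back next time

log = SearchLog()
cookie = log.handle_search("203.0.113.5", None, "first query")
# The ISP hands out a new address, but the browser still presents the Cookie.
cookie = log.handle_search("203.0.113.77", cookie, "second query")

print(len(log.by_ip))          # 2 separate IP-address histories
print(log.by_cookie[cookie])   # ['first query', 'second query']
```

In this model the two searches land in different IP-address histories but in the same Cookie history, which is why a Cookie is the more reliable tracking handle. Passing `None` again (as after deleting Cookies) would start a new, unlinked history.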
Deleting Cookies regularly removes data that Google uses to track you and your web browser. Note that the Firefox browser can be set to delete its Cookies each time you close it. This explains the first recommendation. You may be wondering if there is a way to carry out a Google search while keeping your IP address hidden. This is where Tor fits in.
Tor is a network of 250+ Internet computers in various countries which run freely available software designed to facilitate low-latency anonymous communication. Tor has several interesting features, but what is most relevant to our discussion is that it allows anyone to surf the web without revealing their IP address. To start using Tor, you simply download a client program and then configure your browser to send its traffic to the client. Once the client is activated, it negotiates an encrypted pathway through the Tor network which will carry your browser's traffic. The pathway consists of three Tor servers, and these are changed every minute or so. When your web traffic travels through the Tor network en route to Google, it appears to Google as though it originated from the last server in the pathway. In particular, the IP address recorded by Google will be the IP address of the last server in the pathway. So, if you use Tor, your search phrases will likely be bound to an IP address other than your own.
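The effect of the three-server pathway on what the destination sees can be sketched with a toy model. This is not Tor's software or its real path-selection algorithm, and it models no encryption at all; the relay addresses and the function names `build_path` and `address_seen_by_destination` are made up for illustration.

```python
import random

# Hypothetical relay addresses (taken from a range reserved for documentation).
RELAYS = ["198.51.100.%d" % i for i in range(1, 11)]

def build_path(relays, rng, hops=3):
    """Pick `hops` distinct relays at random; the last one faces the destination."""
    return rng.sample(relays, hops)

def address_seen_by_destination(client_ip, path):
    # Each hop forwards the traffic onward, so the destination sees only the
    # final hop. With an empty path the client connects directly.
    return path[-1] if path else client_ip

rng = random.Random(2006)
path = build_path(RELAYS, rng)
print(path)
print(address_seen_by_destination("203.0.113.5", path))
```

Whatever path is chosen, the address logged at the destination is that of the last relay rather than the client's own.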
However, the story doesn't end there. Even if you disable Cookies and surf through Tor, it may still be possible to maintain a profile of your web searches. If you take a look here, then you will see several examples of information that can be extracted about your browser and computer even when you have followed the two recommendations. For example, it is possible to learn what browser you are using, its version, what operating system you run, your preferred language, what timezone you are in, what plugins you have installed, and what the current settings of your display are. Google could compute a digest of this information and record it along with any search phrase you have submitted. It's not clear whether this information would suffice to uniquely identify a user, but users with less common browsers and operating systems are more at risk of being singled out in this way.
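A digest of this kind could be computed in many ways. The following Python sketch shows one possibility using SHA-256 from the standard `hashlib` module; the function name `browser_digest`, the attribute names, and the sample values are our own assumptions, and nothing here is known to reflect what Google actually does.

```python
import hashlib
import json

def browser_digest(attributes):
    """Hash a dict of browser attributes into a fixed-length hex identifier."""
    # Sort the keys so the same attributes always serialise to the same string.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical values for one visitor.
visitor = {
    "browser": "Firefox 1.5",
    "os": "Linux",
    "language": "en-CA",
    "timezone": "UTC-5",
    "plugins": ["Flash", "Java"],
    "display": "1280x1024x24",
}

print(browser_digest(visitor))
```

The same set of attributes always yields the same digest, with or without Cookies and regardless of the IP address presented, while changing any single attribute yields a different one. How well such a digest distinguishes users depends on how unusual their configuration is.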
James Muir is a Postdoctoral Fellow in the School of Computer Science at Carleton University.
Posted by: jason at March 15, 2006 04:22 PM
Scroogle looks like a dead simple way of carrying out anonymous Google searches -- although I suppose you have to trust them to delete their server logs. It's too bad this type of simple solution doesn't generalize to other web sites.
Another issue is that Google doesn't just profile user behaviour on www.google.com; many web sites (e.g., slashdot.org) gather user information using Google Analytics (www.google.com/analytics/). If you ever notice cookies in your browser that have fields titled "utm" then you have picked up a Google Analytics tracking cookie. "utm" stands for Urchin Tracking Module (Urchin was the name of the company Google purchased to form Google Analytics).
Some of the information collected by this script includes:
-screen properties (e.g., width, height)
-whether or not you have Java enabled
-whether or not you have Flash installed
Posted by: James at March 15, 2006 06:57 PM