With a Little Help from my Friends (and Colleagues): The Multidisciplinary Requirement for Privacy
posted by: Carlisle Adams // 11:59 PM // February 28, 2006 // ID TRAIL MIX
It is probably not unfair to say that many subjects of academic interest do not force, or even encourage, their researchers to look outside the confines of a fairly narrow field of study. An example with which I am familiar, but which is only one example of many that could be cited, is the field of cryptography. It is quite possible for a researcher in this field to spend his or her entire career – indeed, a rewarding and productive career – making, breaking, and repairing encryption algorithms and security protocols without ever thinking very deeply about where and how these might be used in the real world. The beauty, the elegance, and the mystery of the underlying mathematics can be more than sufficient to fill the researcher’s attention (providing both “stick” and “carrot”) without all those messy peripheral areas of implementation details, environmental considerations (how the surrounding applications and operating systems will make use of these algorithms and protocols), and user issues (interfaces, usability, performance, and so on). A researcher in cryptography does not have to be confined so narrowly (and many are not), but nothing inherent in the field requires this broader view.
Privacy differs from such subjects in that thinking about implementation details, the surrounding environment, and user issues is of the utmost importance. Furthermore, not only are these aspects important, but they also force us to recognize the multidisciplinary nature of this field: implementation details often fall into the domain of the technological; the surrounding environment leads to a consideration of applicable laws and policies; and user issues have to do with the social understanding and desire for privacy. It is difficult (perhaps impossible?) to successfully look at privacy through a purely technical set of glasses; researchers with a primarily technical focus must also think about the context of a situation and about the human users involved. Cryptography is about protecting data, and one can think in completely abstract terms about an equation that will fail to hold true if a single bit in a data stream is flipped from 1 to 0, or from 0 to 1. Privacy, however, is about protecting personal data from other people. It is a person or a legal system (not an equation!) that draws the distinction between “data” and “personal data”, and it is our social and legal understanding of privacy that determines when another person has inappropriately learned or used some personal data.
An example may help to illustrate the need for multiple disciplines in privacy. Consider a “tip line” to the police department. Tips can be helpful in crime prevention and, especially, in solving criminal cases, but many people (out of fear or a simple desire to “not get involved”) would prefer not to use a tip line if the tip could be traced back to them. Consequently, many police departments have established anonymous tip lines. Such lines are often implemented using a telephone number, but let us imagine that a police department would instead like to implement this as an anonymous e-mail service.
Technical Solutions for E-mail Tip Line Anonymity
Thinking about a solution from a purely technical point of view might lead us in one of two possible directions. The first direction is what we might call “anonymizing the channel”. Say a user named Alice would like to send a crime tip to the police department anonymously. As we know, Alice’s computer uses two important protocols, the Transmission Control Protocol (TCP) and the Internet Protocol (IP), to send data from her machine to any other machine on the Internet. Any data that she sends will be broken into small packets (typically no more than 1,500 bytes); each packet is put into an envelope that contains a number of pieces of information, including the sender address, the destination address, a sequence number (so that all the packets can be reassembled into the right order at the destination), and a checksum that can be used at the destination to see if any errors have been introduced into the packet during transmission. TCP is responsible for breaking data into packets, putting packets into envelopes, and recombining packets into the original data message at the receiving end. IP is responsible for routing each packet through the network so that it arrives at the destination as quickly as possible (note that each packet, because it has full addressing information in its envelope, can be routed independently of all the others and may therefore take its own individual path to the destination).
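To make the packetization idea concrete, here is a toy Python sketch. It is not real TCP/IP (actual headers are binary and far richer); the dictionary fields are simplified stand-ins for the “envelope” contents described above, and the checksum is a deliberately naive one:

```python
# Toy illustration of TCP-style segmentation and reassembly.
# Field names and the checksum are simplified stand-ins, not real headers.

MAX_PAYLOAD = 1500  # bytes per packet, roughly the common Ethernet limit

def packetize(data: bytes, src: str, dst: str) -> list[dict]:
    """Split data into packets, each wrapped in a simplified 'envelope'."""
    packets = []
    for seq, start in enumerate(range(0, len(data), MAX_PAYLOAD)):
        payload = data[start:start + MAX_PAYLOAD]
        packets.append({
            "src": src,                        # sender address
            "dst": dst,                        # destination address
            "seq": seq,                        # sequence number for reordering
            "checksum": sum(payload) % 65536,  # toy error-detection value
            "payload": payload,
        })
    return packets

def reassemble(packets: list[dict]) -> bytes:
    """Verify each packet's checksum, then recombine in sequence order."""
    for p in packets:
        if sum(p["payload"]) % 65536 != p["checksum"]:
            raise ValueError(f"packet {p['seq']} corrupted in transit")
    ordered = sorted(packets, key=lambda p: p["seq"])
    return b"".join(p["payload"] for p in ordered)

tip = b"X" * 4000  # a 4,000-byte message becomes three packets
pkts = packetize(tip, src="alice.example", dst="police.example")
assert len(pkts) == 3
assert reassemble(pkts[::-1]) == tip  # arrival order doesn't matter
```

The last assertion shows why the sequence number matters: because each packet may take its own path, packets can arrive in any order, and the receiver uses the sequence numbers to restore the original message. Note that the `src` field travels with every packet, which is exactly the privacy problem discussed next.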
The source address in the packet envelope is the obvious enemy in the battle for privacy. Techniques for anonymizing the channel seek to strip this identifying information from data without requiring massive changes to the way the Internet currently works (that is, without having to change the universally-deployed TCP and IP protocols). The idea behind the “onion routing” approach to this is simple and elegant: Alice’s machine will take her tip for the police department and put this inside a message destined for some other machine (say Machine X). When Machine X receives its message, it will find something inside for the police department and will send this to the police department. The police will receive their tip, but the IP packets of the tip will have a source address of Machine X (not Alice’s machine). In real onion routing networks (see, for example, Tor [1]), many such intermediate machines are used, and encryption is employed at each layer so that the contents of a layer can only be read by the intended recipient for that layer. Each recipient has no way of knowing whether the machine from which it received the message was the original sender or just some other intermediate node, so Alice’s identifying address is effectively hidden from all machines.
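The layered-encryption idea can be sketched in a few lines of Python. This is only a conceptual toy: a repeating-key XOR stands in for real encryption (Tor uses proper ciphers and a circuit-construction protocol), and the hop names, keys, and `|`-delimited header format are invented here purely for illustration:

```python
# Toy onion-layering sketch: NOT real cryptography. A repeating-key XOR
# stands in for encryption, purely to show how layers nest and peel.

from itertools import cycle

def xor(data: bytes, key: bytes) -> bytes:
    """XOR data against a repeating key; applying it twice recovers data."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def wrap(message: bytes, route: list[tuple[str, bytes]]) -> bytes:
    """Encrypt innermost layer first, so each hop can peel exactly one layer.
    route = [(hop_name, that_hop's_key), ...], ending at the destination."""
    onion = message
    for name, key in reversed(route):
        header = name.encode() + b"|"   # who should receive this layer
        onion = xor(header + onion, key)
    return onion

# Alice builds the onion for: her machine -> MachineX -> MachineY -> police.
route = [("MachineX", b"kx"), ("MachineY", b"ky"), ("police", b"kp")]
onion = wrap(b"tip: check the warehouse", route)

# Each node strips one layer with its own key; it learns only the next hop.
for _, key in route:
    peeled = xor(onion, key)
    next_hop, onion = peeled.split(b"|", 1)

# Only after the final layer is removed do the police see the plaintext tip.
assert onion == b"tip: check the warehouse"
```

The key property the sketch illustrates: each intermediate machine can decrypt only its own layer, so it sees just the previous hop and the next hop, never both Alice’s address and the tip’s content together.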
The other possible direction for protecting Alice may be called “anonymizing the source”. A popular technique in this area is the public Internet café. Alice can simply go to an Internet café in a large city and send her crime tip in the clear from one of the machines there. Because anyone in the world (theoretically) could have gone to the café and sent a message from that machine to the police department, the message cannot be traced to Alice. This is the Internet equivalent of Alice going to a public telephone in a busy shopping center to call in a crime tip.
A truly paranoid user might of course choose both alternatives: Alice can go to a popular Internet café and send her crime tip from that machine through an anonymizing channel such as Tor. Alice may then feel quite confident that her data packets cannot be traced back to her by the police department or by anyone else sniffing packets on the Internet.
The Insufficiency of Technology
Research [2, 3] has shown, however, that the above approaches may be insufficient, even if they work perfectly at a technical level. In particular, classical stylometry (the study of linguistic style, typically in written language; see, for example, [4, 5]) can be used to analyze the content of a message in order to link the message with its author. Such analysis includes not just preferred words and phrases, but also sentence construction, spelling patterns (both correct and incorrect), grammatical idiosyncrasies, and other syntactic and semantic features of the message content.
In empirical tests, the authors of these studies have found that reasonable analysis requires about 6,500 words known to be authored by a single individual. In other words, once an analyst has about 6,500 words authored by Alice, he or she has a reasonable chance of determining whether or not an article of unknown authorship was indeed written by Alice. The unfortunate implication of this is that those who frequently publicize their work (including, for example, researchers who publish articles, conference papers, and book chapters on the importance of, and need for, privacy) are the very ones who may find it most difficult to post an anonymous letter. In an interesting twist of irony, those who desire privacy most and have worked most actively to achieve it in our society may have unintentionally thrown it away for themselves.
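A minimal stylometric comparison can be sketched as follows. This toy uses only one classic feature set, relative frequencies of common “function words”, and cosine similarity to pick the likeliest author; the corpora here are invented miniature examples, whereas real analyses like those in [2–5] need thousands of words per author and far richer features:

```python
# Minimal stylometry sketch: function-word frequency profiles compared
# by cosine similarity. The texts below are invented toy examples.

import math
from collections import Counter

FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "it", "for", "with", "as", "but", "not", "on"]

def profile(text: str) -> list[float]:
    """Relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

known = {  # writing samples known to come from each candidate author
    "Alice": "the rise of the network and the growth of the web",
    "Bob": "a tip in a city in a cafe in a hurry",
}
anonymous = "the strength of the protocol and the design of the system"

# Attribute the anonymous text to the author with the most similar profile.
best = max(known, key=lambda a: cosine(profile(known[a]), profile(anonymous)))
assert best == "Alice"
```

Even this crude measure attributes the anonymous line correctly, which is the essay’s point: an anonymizing channel hides the envelope, but the writing style inside the message can still betray its author.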
Note that the group of people who “frequently publicize their work” includes, for example, the growing number of otherwise hidden individuals that have decided to make their personal blogs available on the Internet. With movie stars, politicians, sports heroes, and others, we have come to recognize that people who choose a public life now may, to their regret, find it exceedingly difficult to have a private life in the future. The same appears to hold true for our public and private digital lives, and it may be worth taking this into consideration before deciding to post our musings and opinions to the world.
Help from our Friends?
The above discussion serves to remind us that technology alone cannot be a solution: privacy requires more than a group of technical researchers inventing Internet cafés and “anonymous channels”. Privacy is an attitude, a state of mind, a conscious decision. Sending an anonymous communication may require us, like a professional actor playing a role well, to step outside ourselves (our personalities and habits) in order to create a piece of writing that says what we intend, but is truly distinct from all our other writing. Thus, for those that lead public lives (including privacy researchers), anonymity may be a form of role-playing, or at least may be an activity requiring focused attention and determined effort. Technologists may be able to “unlink” a message from our machines, but it will probably take the sociologists, psychologists, and philosophers to understand what it means to “unlink” a message from ourselves, and it will probably take the lawyers and policymakers to set out constraints for when and where this is socially acceptable behaviour.
If we wish to have effective privacy, therefore, it is clear that we need the perspectives and contributions of many different research communities. This can be challenging, but it is also what makes this field so stimulating and so interesting. In the end, it is the only recipe for success. As Lennon and McCartney said, “I get by with a little help from my friends, with a little help from my friends.”
[1] “Tor: an anonymous Internet communication system”; see http://tor.eff.org
[2] J. R. Rao and P. Rohatgi, “Can Pseudonymity Really Guarantee Privacy?”, Proceedings of the Ninth USENIX Security Symposium, Aug. 2000, pp. 85–96. Available at http://www.usenix.org/publications/library/proceedings/sec2000/full_papers/rao/rao.pdf
[3] J. Novak, P. Raghavan, and A. Tomkins, “Anti-Aliasing on the Web”, Proceedings of the 13th International Conference on the World Wide Web, 2004, pp. 30–39.
[4] Wikipedia: Stylometry; see http://en.wikipedia.org/wiki/Stylometry
[5] The Signature Stylometric System; see http://www.etext.leeds.ac.uk/signature
Carlisle Adams is Associate Professor at the School of Information Technology and Engineering (SITE), University of Ottawa.