understanding the importance and impact of anonymity and authentication in a networked society

Giving it up for Free: Teens, Blogs, and Marketers’ Lucky Break

posted by: Jackie Strandberg // 11:38 PM // January 24, 2006 // ID TRAIL MIX


In 2004, Merriam-Webster named “blog” its word of the year. This is only one indication of the increasing popularity and visibility of blogs in the media; blogs have become a viable and highly accessible source of information and opinion for all internet users. The very fact that I’m writing a blog for work attests to their newfound respect and acceptance in academic culture.

However, we earnest bloggers are not the only people in cyberspace making good use of the free publishing services offered by sites such as Blogger or LiveJournal. Because blogs are published on the internet, a public forum, they offer marketers a rich source of consumer information, one they can access entirely for free.

There is no question that personal information about consumer identity and online spending habits is highly valuable to marketers. Typically, much of this information is collected by web sites through the registration forms required for site access. “Lately there has been a tendency to trade information quid pro quo (See Custers 2000), and web users often have to fill out a form to simply gain access to a site” (Van Wel and Royakkers, 2004, p. 132). In many jurisdictions, including Canada, this information can be collected and used only with the explicit and ongoing consent of the consumer, and this requirement puts considerable onus on those collecting and using consumer information to ensure that consumers have given their informed consent.

The type of information collected through blog mining is especially useful when the target demographic is teenagers, one of the marketing industry’s most sought-after consumer groups. Digital Marketing, a Toronto-based industry magazine, advocates mining teenagers’ blogs because they

“are important … not only as a tool to reach those groups [teenagers], but as a tool to get into their heads. I’m completely selling out my generation here by typing this, but so much money is spent trying to figure out what the average teenager is thinking and here on the Internet you have several thousand examples of a teenager’s open diary.” (Crittenden, 2003)

There are a lot of blogs out there produced by teenagers. In fact, teenagers make up more than half of the blogging population (Bocij, 2004, p. 17). Traditional safeguards used to protect privacy in the face of marketers are effectively defeated in a blogging context because teen bloggers are actively posting their consumer information, often of a highly personal nature, of their own volition. Marketers no longer need to conduct pesky focus groups, obtain parental consent, or concern themselves with ethical issues regarding intrusion in order to mine the rich data published on teenagers’ blogs. Teenagers who blog are essentially giving it up for free.

Marketers also use web mining, a subtler set of “techniques … used to automatically discover and extract information from web documents and services” (Van Wel and Royakkers, 2004, p. 129). When these documents and services are in the public domain, no explicit consent is required for data mining; furthermore, bloggers are often not even aware that their blogs are being mined.

Blog mining is the use of computer programs to seek out blogs and search their content for information useful to a corporation: contact information (including email addresses and other identifiers), social network information (contact information for connections within a social network), product likes and dislikes, recent purchases, opinions and attitudes, and so on. Blog mining gives marketers a unique opportunity to track and capitalize on trends (see Domingos 2005 and Morinaga et al. 2002), and the resulting information contributes significantly to viral marketing practices. It also gives marketers information about products or services directly from the source, acquired faster and at lower cost than through focus groups, surveys, or interviews (see Van Wel and Royakkers, 2004).
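To make the mechanics a little more concrete, here is a minimal Python sketch of the kind of program described above. It is an illustration under deliberately simple assumptions, not any marketer's actual method: the product names, the positive and negative word lists, and the `mine_post` function are all hypothetical, and commercial systems are far more sophisticated. It pulls email addresses out of a post and gives each mentioned product a crude like/dislike score.

```python
import re

# Hypothetical sentiment word lists; real systems use much richer models.
POSITIVE = {"love", "great", "awesome", "best"}
NEGATIVE = {"hate", "awful", "worst", "broken"}

# Simple email pattern; each dot-separated domain part must be non-empty,
# so a sentence-ending period is not swallowed into the address.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mine_post(text, products):
    """Extract contact details and crude product opinions from one blog post.

    Assumes single-word product names; multi-word names are not handled.
    """
    emails = EMAIL_RE.findall(text)
    opinions = {}
    # Score each sentence: +1 per positive word, -1 per negative word.
    for sentence in re.split(r"[.!?]+", text):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        for product in products:
            if product.lower() in words:
                score = len(words & POSITIVE) - len(words & NEGATIVE)
                opinions[product] = opinions.get(product, 0) + score
    return {"emails": emails, "opinions": opinions}


if __name__ == "__main__":
    post = ("I love my GlowPhone! The new SkateMax is the worst. "
            "Email me at teen.blogger@example.com.")
    print(mine_post(post, ["GlowPhone", "SkateMax"]))
```

Run on the sample post (with the made-up products "GlowPhone" and "SkateMax"), the sketch returns the blogger's email address along with a positive score for one product and a negative score for the other, and at no point does it ask for, or need, the author's consent.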

This new approach to information gathering has opened up the hitherto untapped information resource of blogs. In fact, blog mining is now virtually de rigueur: according to one industry analyst

“like unstructured content captured on Web forms that never really gets used, blogs' explosive growth is generating raw data sets that your company really can't afford to ignore.” (Columbus, 2005)

The legality of web mining as a whole is, admittedly, a grey area within Canadian federal law. The Personal Information Protection and Electronic Documents Act (PIPEDA) received royal assent in 2000, well before data mining had come into its own. An updated version of PIPEDA in the summer of 2004 made no mention of web mining or the legal questions surrounding it. Given the relative novelty of blogging, it is unlikely that the government will address it, or the concerns surrounding the invasion of teenagers’ online privacy, in the near future. The reality, however, is that bloggers willingly publish these jewels of information and make them publicly accessible, making it “debatable whether this kind of information deserves to be protected at all” (Van Wel and Royakkers, 2004, p. 131).

On the technological side, there is no guaranteed way for bloggers to prevent their blogs from being indexed by a robot or spider, an essential step for effective data mining. LiveJournal does advise its users on how best to avoid being indexed. Its recommendations include using the option to block spiders and robots, posting Friends-only or Private entries, and keeping personal and contact information off the publicly viewable areas of their blogs. LiveJournal, however, is quick to point out that

“Not all robots respect the rules, although most of the popular search sites' robots do. LiveJournal cannot guarantee that the indexing option will keep your journal from being indexed by search engines. If your journal has been listed in a search engine, you will need to contact the search engine to have it removed from their listings” (Frequently Asked Questions).
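LiveJournal’s caveat reflects how robot exclusion works: a “block robots” option typically amounts to rules (for example in a robots.txt file) that a crawler is expected to check before fetching a page, but that check runs inside the crawler’s own code. The Python sketch below uses the standard library’s `urllib.robotparser` to illustrate this. The robots.txt contents, the blog URLs, the `ExampleMiner` agent name, and the `may_crawl` function are hypothetical, and this is not a description of how LiveJournal actually implements its option.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a blog host might serve once a user
# switches on a "block robots" option for their journal.
ROBOTS_TXT = """\
User-agent: *
Disallow: /users/example_teen/
"""


def may_crawl(url, respect_robots=True, agent="ExampleMiner"):
    """Decide whether a (hypothetical) mining crawler fetches `url`."""
    if not respect_robots:
        # robots.txt is advisory: nothing enforces it, so a crawler
        # that skips this check simply fetches the page anyway.
        return True
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://blog.example.com/users/example_teen/2006/01/24.html"
    print(may_crawl(page))                        # polite crawler
    print(may_crawl(page, respect_robots=False))  # crawler ignoring the rules
```

With `respect_robots=True`, a request for the hypothetical `/users/example_teen/` journal is refused; with `respect_robots=False`, the same request goes ahead, and nothing in the exclusion mechanism itself prevents it.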

Thus, the onus is placed directly on users to ensure their own privacy and security, and users who do not take these steps leave the content of their blogs open to web mining.

This may leave teens particularly vulnerable to blog mining. Given that most teenagers have shown little or no concern about security or privacy when using the internet, it seems unlikely that they would bother to change their blogs’ default settings to protect their privacy (see Lenhart, Rainie, and Lewis, 2001).

In the end, teens must be encouraged to weigh the benefits of keeping a blog against the cost of having their personal information mined without their knowledge. Given blogging’s popularity, it is difficult to imagine teenagers giving it up entirely. All we can do is encourage teens to consider the privacy implications of posting information on publicly available blogs, and to exercise their rights with respect to privacy protection.

Special thanks to Jacquie Burkell for all her editorial assistance.


Bocij, P. (2004). Camgirls, blogs and wish lists: how young people are courting danger on the internet. Community Safety Journal, 3 (3), 16-23.

Columbus, L. (2005). Blog Mining Gets Real. CRM Buyer [online]. Available: http://www.crmbuyer.com/story/43483.html

Crittenden, S. (2003). He wuz a sk8er boi... . Marketing Magazine, 108 (10), 20.

Domingos, P. (2005). Mining social networks for viral marketing. IEEE Intelligent Systems, 20 (1), 80-82.

Frequently Asked Questions: What can I do to help stop my journal from ending up on search engines? (2005). LiveJournal [online]. Available: http://www.livejournal.com/support/faqbrowse.bml?faqid=50

Lenhart, A., Rainie, L., & Lewis, O. (2001). Teenage Life Online: The rise of the instant-message generation and the Internet's impact on friendships and family relationships. Pew Internet and American Life Project [online]. Available:

Morinaga, S., Yamanishi, K., Tateishi, K., & Fukushima, T. (2002). Mining product reputations on the WEB. Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 341-349.

Office of the Privacy Commissioner of Canada (2005). The Personal Information Protection and Electronic Documents Act. [online]. Available: http://www.privcom.gc.ca/legislation/02_06_01_e.asp

Van Wel, L., & Royakkers, L. (2004). Ethical issues in data mining. Ethics and Information Technology, 6 (2), 129-140.



This is a SSHRC funded project:
Social Sciences and Humanities Research Council of Canada