Post 6: Data Mine

Words by Colette Duong

In the global village movement, social networking platforms have become hosts for the opinions of the individuals who gather within them. Twitter in particular is a popular online avenue where users share their perspectives in masses of short comments, using hashtags and contributing to trending topics. Using Twitter as a medium for web scraping, I’ve attempted to collect data surrounding the topic of online privacy.

Pioneered in 2006 by American computer programmer Jack Dorsey, the social media powerhouse Twitter is unique for its 140-character post limit, its retweet sharing function, and its streamlined culture of voicing opinions into the void of the web. Moreover, Twitter is characterised by its timeliness and “of-the-moment brevity” (Dorsey 2016), which compels users to express their musings concisely. It functions like an archive of thoughts, which makes it a valuable source of data. Approximately 7,000 tweets are made per second, amounting to over 500 million in an average day (Internet Live Stats 2016). Tweets can be attached to timestamps, locations and hashtags, which can be used to measure trends specific to groups of people and, when viewed on a larger scale, to reflect the social climate and attitudes of the interacting online community.

Twitter Usage
Twitter usage demographic in 2015 (WeRSM 2016)

With online occupants typically coming from the younger generation, Twitter noticeably attracts a large demographic of youth and young adults. Because it is accessible on desktop and mobile devices, tweets can be sent out quickly and with ease. In 2014, Pew Research Center published a report analysing thousands of conversations on Twitter, sorting them into six archetypes that reveal the broad uses of the platform.

Twitter Structures
The Six Structures of Conversation Networks (Pew Research Center 2014)

“Polarized Crowds, where opposed groups talk about the same topic but mostly just to other group members; Tight Crowds, made up of people bound together by some common interest (such as hobbies or professional pursuits); Brand Clusters, large groups that form around particular products, events or celebrities; Community Clusters, multiple small to medium-sized groups that typically form around big news events; Broadcast Networks, where many people follow and retweet a particular news source or commentator but don’t interact much with each other; and Support Networks, usually created when companies, agencies and other organizations respond to customer inquiries and complaints.” (Desilver 2016)

My Twitter
Aptly named @collectdog, my Twitter was previously a place where I would occasionally post trivial comments about mundane life situations. Every few months, I would typically delete all of my tweets as a form of ‘cleansing myself’. Analysing my behaviour and level of usage on Twitter, I found it to be one of my less useful forms of social media, and I would usually neglect it or feel embarrassed by my previous tweets even though I used a private account. I view Twitter as an informal platform of interaction, sometimes used to talk to online pals that I wouldn’t add on Facebook because that’s ‘too personal’. I remember my early perception of tweeting being “like Facebook statuses but less intimidating to post on”. (Twitter 2016)

As the issue of online privacy directly inhabits the digital space of the Internet, scraping data from the source seemed appropriate (and perhaps partially reflective of the behaviour of data surveillance itself). Interested in the personal perspectives behind the controversy, I originally set out to investigate how individuals felt about having their privacy compromised.

Twitter Archiver web scraping process (Colette Duong 2016)

Through my earlier research, I often found the dichotomy between private and public life a thought-provoking topic. Now, with the task of viewing public tweets, I began using the Twitter Archiver spreadsheet add-on with simple search rules built from phrases like “my privacy is” and “my online privacy”.

However, my findings weren’t too successful, as the tweets either lacked context regarding the emergent issue or were contaminated by spam tweets, retweets and external links rather than offering a mass of personal opinions.
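One way to reduce this noise would be to filter the exported tweets before reading through them. The sketch below is a minimal Python example, assuming the tweet text has been exported from the spreadsheet into a plain list of strings. The function names and the sample tweets are hypothetical placeholders for illustration, not part of Twitter Archiver and not taken from my actual results.

```python
import re

# Matches http:// or https:// links anywhere in a tweet.
URL_PATTERN = re.compile(r"https?://\S+")


def is_personal_opinion(text):
    """Rough heuristic: keep tweets that are neither retweets nor link shares."""
    stripped = text.strip()
    if stripped.upper().startswith("RT @"):  # old-style retweet prefix
        return False
    if URL_PATTERN.search(stripped):  # tweet points to an external page
        return False
    return True


def clean_tweets(tweets):
    """Drop retweets, link shares and exact duplicates, keeping original order."""
    seen = set()
    kept = []
    for text in tweets:
        key = text.strip().lower()
        if key in seen or not is_personal_opinion(text):
            continue
        seen.add(key)
        kept.append(text.strip())
    return kept


# Invented placeholder tweets, not real search results.
sample = [
    "RT @someone: my privacy is important",
    "my privacy is none of your business",
    "Read our guide to online privacy https://example.com/guide",
    "my privacy is none of your business",
]

print(clean_tweets(sample))
```

A filter like this is deliberately crude: it would also discard genuine opinions that happen to include a link, so it only narrows the pile rather than replacing a manual read-through.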

I conducted the following searches, in chronological order:

  • my privacy is (63 tweets)
  • my online privacy (368 tweets)
  • online privacy personal OR my OR feel OR protect (812 tweets)
  • i feel privacy policy (3 tweets)
  • google privacy feel OR my OR i (176 tweets)
  • hacking privacy #qanda (1 tweet)
  • privacy “i am” online (6 tweets)
  • privacy national OR security OR personal OR data (10774 tweets)
  • social media privacy (1062 tweets)

I didn’t get any results for:

  • online hacking webcam privacy
  • feel my OR privacy #dataveillance
  • #dataveillance
  • government surveillance “i feel”

I found this method of scraping data from the web quite interesting. I avoided the use of hashtags and location because I didn’t want to focus on a specific event (e.g. #censusfail or #auspol), which didn’t intrigue me much. Attempting a more concentrated approach to the assigned demographic, I changed my inputs to the topic of “social media privacy” to target a particular user audience. The results that were most relevant to the demographic came from this search. Many of the users seemed to be teenagers complaining about the situation or desiring a more private life.

As a potential visual exploration, I feel it would be interesting to map the emotion (or levels of anger) towards the online privacy issue against location, or even to plot the different sides of the argument and highlight the humorous contradiction of complaining about social media on social media.

Examples of some successful visualisations that appeal to me.

As a side activity, I also quickly asked Cleverbot (a responsive online artificial intelligence bot that learns from previous conversations with people) for its perspective on the issue. The results were interesting but probably not too relevant.

An intimate chat with Cleverbot (Cleverbot 2016)

Five Point Summary

  • Twitter results are not reflective of the world’s opinions on the issue.
  • Insights are mostly provided by an English-speaking audience.
  • Web scraping with Twitter Archiver is useful for gleaning interesting insights, although they are sometimes difficult to find amongst retweets and links.
  • Many people advocate for their own privacy to be respected.
  • There are some differing perspectives on privacy on social media which may offer potential to explore.

Dorsey, J. 2016, ‘140 characters ‘is staying,’ CEO says while looking at Twitter’s history’, TODAY, <>.

Internet Live Stats, 2016, Twitter Usage Statistics, <>.

Simos, G. 2015, ‘2015 Social Media Demographics For Marketers’, WeRSM, <>.

Rainie, L. 2014, ‘The six types of Twitter conversations’, Pew Research Center, <>.

Desilver, D. 2016, ‘5 facts about Twitter at age 10’, Pew Research Center, <>.

Aizenberg, D. 2013, Atlas of the World Wide Web, portfolio, <>.

Carter, J. 2013, Eyes on the Sky, portfolio, <>.
