Words by Colette Duong
In the “global village”, social networking platforms have become hosts for the opinions of the individuals who inhabit it. Twitter is a popular online avenue where users share their perspectives in masses of short comments, using hashtags and contributing to trending topics. Using Twitter as a source for web scraping, I attempted to collect data on the topic of online privacy.
Pioneered in 2006 by American computer programmer Jack Dorsey, the social media powerhouse Twitter is distinctive for its 140-character post limit, its retweet sharing function, and its streamlined culture of voicing opinions into the void of the web. Twitter is also characterised by its timeliness and “of-the-moment brevity” (Dorsey 2016), which compels users to express their musings concisely. It functions like an archive of thoughts, which makes it a valuable source of data. Approximately 7,000 tweets are sent per second, adding up to over 500 million on an average day (Internet Live Stats 2016). Tweets can carry timestamps, locations and hashtags, which can be used to measure trends specific to groups of people and, viewed on a larger scale, to reflect the social climate and attitudes of the online community.

With online audiences skewing towards the younger generation, Twitter attracts a large demographic of youth and young adults. Tweets can be sent quickly and easily from desktop and mobile devices. In 2014, Pew Research Center published an analysis of thousands of conversations on Twitter, categorising them into six archetypes to reveal the broad uses of the platform (Rainie 2014).


As the issue of online privacy lives in the digital space of the Internet, scraping data from the source seemed appropriate (and perhaps partially reflective of data surveillance itself). Interested in the personal perspectives behind the controversy, I originally set out to investigate how individuals felt about having their privacy compromised.

In my earlier research, I often found the dichotomy between private and public life a thought-provoking topic. Now, with the task of viewing public tweets, I began using the Twitter Archiver spreadsheet add-on with simple search rules built on phrases like “my privacy is” and “my online privacy”.
However, my findings weren’t very successful: the tweets either lacked context on the emerging issue or were contaminated by spam, retweets and external links rather than offering a mass of personal opinions.
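One way to cut down that noise would be to filter the exported tweets before reading through them. The following is only a minimal sketch in Python, assuming the tweet text has been exported from the spreadsheet as plain strings. The function names, the filtering rules and the sample strings are all made up for illustration and are not part of Twitter Archiver.

```python
import re

# Hypothetical rules: treat a tweet as noise if it is a retweet or carries a link.
RETWEET_PATTERN = re.compile(r"^RT @\w+", re.IGNORECASE)
LINK_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def looks_like_personal_tweet(text: str) -> bool:
    """Return True if the tweet is neither a retweet nor a link share."""
    stripped = text.strip()
    if RETWEET_PATTERN.match(stripped):
        return False
    if LINK_PATTERN.search(stripped):
        return False
    return True


def keep_personal_tweets(tweets: list[str]) -> list[str]:
    """Drop retweets, link shares and exact duplicates, keeping the original order."""
    seen = set()
    kept = []
    for text in tweets:
        key = text.strip().lower()
        if key in seen or not looks_like_personal_tweet(text):
            continue
        seen.add(key)
        kept.append(text)
    return kept


if __name__ == "__main__":
    # Made-up placeholder strings, not real tweets from the searches.
    sample = [
        "RT @someone: my online privacy matters",
        "Read this article on privacy http://example.com/article",
        "my privacy is important to me",
        "my privacy is important to me",
    ]
    for tweet in keep_personal_tweets(sample):
        print(tweet)
```

A filter like this is deliberately crude: it would also discard personal opinions that happen to include a link, so the rules would need adjusting after checking a sample by hand.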
I conducted the following searches, in chronological order:
- my privacy is (63 tweets)
- my online privacy (368 tweets)
- online privacy personal OR my OR feel OR protect (812 tweets)
- i feel privacy policy (3 tweets)
- google privacy feel OR my OR i (176 tweets)
- hacking privacy #qanda (1 tweet)
- privacy “i am” online (6 tweets)
- privacy national OR security OR personal OR data (10,774 tweets)
- social media privacy (1062 tweets)
I didn’t get any results for:
- online hacking webcam privacy
- feel my OR privacy #dataveillance
- #dataveillance
- government surveillance “i feel”
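The two lists above can also be kept as data, which makes the search rules easier to compare. This is a minimal sketch in Python: the tweet counts are the ones reported above, while the variable and function names are placeholders of my own.

```python
# Tweet counts per search rule, copied from the lists above.
# Rules that returned nothing are recorded as 0.
search_results = {
    "my privacy is": 63,
    "my online privacy": 368,
    "online privacy personal OR my OR feel OR protect": 812,
    "i feel privacy policy": 3,
    "google privacy feel OR my OR i": 176,
    "hacking privacy #qanda": 1,
    'privacy "i am" online': 6,
    "privacy national OR security OR personal OR data": 10774,
    "social media privacy": 1062,
    "online hacking webcam privacy": 0,
    "feel my OR privacy #dataveillance": 0,
    "#dataveillance": 0,
    'government surveillance "i feel"': 0,
}


def rank_rules(results):
    """Return (rule, count) pairs sorted from most to fewest tweets."""
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


def empty_rules(results):
    """Return the rules that produced no tweets."""
    return [rule for rule, count in results.items() if count == 0]


if __name__ == "__main__":
    for rule, count in rank_rules(search_results):
        print(f"{count:>6}  {rule}")
    print("Total rows collected:", sum(search_results.values()))
    print("Rules with no results:", len(empty_rules(search_results)))
```

The total counts rows across all searches, so a tweet matched by more than one rule would be counted more than once.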
I found this method of scraping data from the web quite interesting. I avoided hashtags and locations, since I didn’t want to focus on a specific event (e.g. #censusfail, #auspol). Attempting a more concentrated approach to the assigned demographic, I changed my input to “social media privacy” to target a particular audience of users. The results most relevant to the demographic came from this search. Many of the tweets seemed to come from teenagers complaining about the situation or wishing for a more private life.
As a potential visual exploration, I feel it would be interesting to map emotion (or levels of anger) towards the online privacy issue against location, or even to plot the different sides of the argument and highlight the humorous contradiction of complaining about social media on social media.
Examples of some successful visualisations that appeal to me.

Another side activity involved quickly asking Cleverbot (a responsive online artificial intelligence bot that learns from its previous conversations with people) for its perspective on the issue. The results were interesting but probably not too relevant.

Five Point Summary
- Twitter results are not reflective of the world’s opinions on the issue.
- Insights are mostly provided by an English-speaking audience.
- Web scraping with Twitter Archiver is useful for gleaning interesting insights, although they are sometimes difficult to find amongst retweets and links.
- Many people advocate for respect of their own privacy.
- There are some differing perspectives on privacy on social media which may offer potential to explore.
References
Dorsey, J. 2016, ‘140 characters “is staying,” CEO says while looking at Twitter’s history’, TODAY, <http://www.today.com/video/140-characters-is-staying-ceo-says-while-looking-at-twitter-s-history-647319107566>.
Internet Live Stats, 2016, Twitter Usage Statistics, <http://www.internetlivestats.com/twitter-statistics/>.
Simos, G. 2015, ‘2015 Social Media Demographics For Marketers’, WeRSM, <http://wersm.com/2015-social-media-demographics-for-marketers/>.
Rainie, L. 2014, ‘The six types of Twitter conversations’, Pew Research Center, <http://www.pewresearch.org/fact-tank/2014/02/20/the-six-types-of-twitter-conversations/>.
Desilver, D. 2016, ‘5 facts about Twitter at age 10’, Pew Research Center, <http://www.pewresearch.org/fact-tank/2016/03/18/5-facts-about-twitter-at-age-10/>.
Aizenberg, D. 2013, Atlas of the World Wide Web, portfolio, <http://www.aizendaf.com/Atlas-of-the-World-Wide-Web>.
Carter, J. 2013, Eyes on the Sky, portfolio, <http://jedcarter.co.uk/blog/my-work-2/eyes-on-the-sky>.