Post 6—Scraping the web and the inevitable #CensusFail

As online technologies have rapidly developed in recent years, social media has become a seamless and powerful platform for individuals to voice their opinions. Social media is often used to document real-time events, so it is interesting to examine how comments shared in the cyber sphere affect physical relationships.


Engaging in a web scraping exercise, I chose to explore Twitter for a number of reasons. It has become one of the world’s largest and most popular social media platforms, with over 313 million active users each month (Twitter 2016). Its functionality encourages a consolidated and succinct approach to language: within the unique constraint of 140-character text “tweets”, users share information, vent their frustration and often make jokes. Twitter accounts are usually publicly accessible to anyone, which makes it possible to connect with others in a universal context and observe a wide scope of responses to current affairs.

Built into the platform are features including the “ReTweet” and mentions, which allow a user to share a tweet posted by someone else and to directly address a specific user. Both functions serve as tools to facilitate connection, conversation and debate. The most useful function of Twitter is the ever-popular hashtag, which categorises a tweet into a topic, e.g. #auspol. Following a hashtag allows a user to see what others have shared on a similar topic, again facilitating dialogue about a particular event or issue. All tweets posted are instantly archived, so Twitter is something of a communal, participatory online diary.

Scraping the web via Twitter archiver

In the wake of the recent ABS online census debacle, I was interested to learn how Twitter users react to large-scale data breaches, system failures and hacking. Using a Twitter archiver plug-in, I began by creating a search rule which fished out tweets containing the phrase “data breach”.


This Twitter-wide search returned over 10,000 tweets. Analysing the first ten results by handle, content and location, it became clear that media coverage of Dropbox’s data breach is the current topic of conversation in the online privacy world. Unfortunately these tweets were not necessarily Australia-specific, but this raises the question: are physical borders relevant in the digital realm, especially if a multinational service provider is compromised?

Looking to gauge the Australian public’s viewpoint, I revised my search rule to focus directly on the census. In terms of online privacy and security in Australia, the census is the most recent incident to affect the entire population, which led me to assume there would be a long, colourful list of tweets documenting and responding to what unfolded on census night. After I hit enter, the Twitter archiver compiled a list of 221 tweets which contained some or all of the following keywords and hashtags: ABS census, #CensusFail, #auspol, #Census2016.
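The “some or all of the following keywords” matching can be roughly sketched in Python. This is only an illustration of the idea, not how the Twitter archiver works internally; the function name and the sample tweets are invented for the example.

```python
# Keywords and hashtags from the revised search rule.
SEARCH_TERMS = ["ABS census", "#CensusFail", "#auspol", "#Census2016"]


def matched_terms(tweet_text, terms=SEARCH_TERMS):
    """Return the search terms that appear in the tweet, ignoring case."""
    lowered = tweet_text.lower()
    return [term for term in terms if term.lower() in lowered]


# Invented sample tweets, for illustration only (not from the scraped data).
sample_tweets = [
    "Still can't log in. #CensusFail #auspol",
    "Lovely weather in Sydney today",
    "The ABS census site is down again",
]

# Keep any tweet that matches at least one of the terms.
hits = [tweet for tweet in sample_tweets if matched_terms(tweet)]
print(len(hits), "of", len(sample_tweets), "sample tweets match")
```

A plain substring match like this is deliberately loose: it would also catch a longer hashtag that merely begins with one of the terms, so a real rule may need to be stricter.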


Upon analysis, this broad set of data allowed me to observe trends among Twitter users. The general consensus appears to be a strong bias against the ABS, the collection of users’ information and the poor handling of the clean-up after the 9th of August.

Observing the location of users, it is apparent that a majority are based in large townships or cities, which raises questions around education and awareness. In larger cities the use of online technologies is well integrated and talked about, whilst living in a densely populated urban area may also be cause to care far more about security and privacy. Perhaps there is still something of a disconnect between regional/rural communities and online technology. People in these regions may be more concerned about other social and political issues.

173 of the 221 tweets posted were ReTweets. Many Twitter users may be concerned about their online privacy and how the 2016 census ended up, yet may not have a personal opinion to put forward. Using the ReTweet feature is therefore a viable way to communicate their stance and share awareness with others on Twitter. Lastly, just over half of the tweets were posted via mobile device, which suggests that many Twitter users access information on the go and are likely to keep up with current affairs as events unfold in real time.
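To put the ReTweet figure in proportion, here is a minimal Python sketch. It assumes the archiver’s export includes each tweet’s text and that ReTweets start with the classic “RT @” prefix; both are assumptions, and the helper names are hypothetical.

```python
def is_retweet(tweet_text):
    """Treat a tweet as a ReTweet if it starts with the 'RT @' prefix (assumed convention)."""
    return tweet_text.strip().startswith("RT @")


def share(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    if total == 0:
        return 0.0
    return round(100 * part / total, 1)


# Figures reported above: 173 ReTweets out of 221 tweets.
retweet_share = share(173, 221)
print(retweet_share)  # 78.3
```

In other words, roughly four in five of the collected tweets were ReTweets rather than original posts.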

Scraping the web via Twitter advanced search

The search results that the Twitter archiver provided were less interesting than I had anticipated, and I soon realised I was more interested in opinionated and emotional responses to the situation. The second web scrape involved using the advanced search function on the Twitter website. Using the same keywords and hashtags I was able to navigate a very different landscape, seeing a range of angry, concerned and humorous text and image-based responses. Compared with the first scraping method, it becomes clear that the Twitter archiver filters out a lot of content. The advanced search feature, however, is subjective, as I can pick out and take more interest in a certain person’s tweet, whilst the archiver is an objective quantifier of data. These are some of the more interesting comments and responses:


As the Twitter archiver revealed in the initial searches, the advanced search confirmed that there is strong opposition to the ABS, certain politicians and other government agencies. It revealed emotional responses ranging from frustration and anger to helplessness, which have continued from census night until early September. Other users comment on the #CensusFail using sarcastic humour and memes. However, most active users are not specifically venting about their rights to online privacy or about surveillance; rather, their frustration and concern lie in the ABS and the Australian government’s inability to deliver on previously made promises. This may be because census data was not extracted or breached when the website was taken offline. Considering that the census is usually understood as a progressive and necessary survey of the population, people may not be totally opposed to their data being collected, but rather to how and when it is collected.


Outcomes—emotional responses

It appears that within Australia the conversation about online privacy and surveillance only extends so far in the digital realm. Most users concerned with this issue are saying the same thing in differing ways. It is also evident that many responses are personal and emotionally influenced. Analysing a user’s tone of voice, subject matter and attachments (images, links, GIFs and videos) would be a way to understand how certain demographics feel about the issue. For instance, from scraping the web it seems probable that younger generations are more likely to use humour to mock the #CensusFail (e.g. share a meme), whilst older users seem to vent their frustration by ReTweeting and sharing external media links. With this in mind, I am interested in visualising emotional responses to online privacy by demographic. I believe that understanding an emotional response allows for a tailored approach to raising further awareness around online privacy. This type of visualisation may help to bridge the gap between polarised demographics of users. Below is a potential visualisation:


This map takes into account 6 generalised emotional responses by a demographic of 18 to 30 year old users. The size of each circle represents the quantity of responses which fall under each category, and the overlap represents how certain emotions share a relationship or exist in the same space. This type of map is flexible, and further layers, textures or colour could be added as the complexity of the issue becomes better understood.
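As a rough sketch of how the circle sizes could be derived from counts, the Python below scales each radius so that the circle’s area (rather than its radius) is proportional to the number of responses. That scaling choice is an assumption, and the category names and counts are placeholders invented for illustration, not measured results.

```python
import math


def circle_radius(count, max_count, max_radius=100.0):
    """Scale a radius so that circle *area* is proportional to count."""
    if max_count <= 0:
        return 0.0
    return max_radius * math.sqrt(count / max_count)


# Placeholder categories and counts (hypothetical, not taken from the scrape).
counts = {
    "frustration": 40,
    "anger": 25,
    "helplessness": 10,
    "concern": 20,
    "humour": 30,
    "sarcasm": 15,
}

largest = max(counts.values())
radii = {emotion: circle_radius(n, largest) for emotion, n in counts.items()}

for emotion, radius in radii.items():
    print(f"{emotion}: radius {radius:.1f}")
```

Scaling by area keeps a category with four times the responses from looking sixteen times as large, which is what happens when the radius itself is made proportional to the count.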

by Samson Ossedryver 


Twitter 2016, About, Twitter, viewed 2 September 2016, <>.

Smith, B. 2012, The Beginner’s Guide to Twitter, Mashable, viewed 2 September 2016, <>.

