Post 6: Scraping Twitter to create a data set

Molly Grover

Functioning at its most basic level as an online messaging service, Twitter provides users from all over the world with a platform to communicate, engage with one another and express ideas. By limiting posts to 140 characters or fewer, the platform encourages quick-fire exchanges about an endlessly wide range of topics, such as news, current affairs, popular culture, humour, beliefs and personal matters.

Trending topic categories, hashtags and the re-tweet function all reinforce the platform’s emphasis on the rapid public dissemination of thought and opinion. A single user’s post has the power to spark a large-scale international debate within minutes, gaining momentum every time the message is re-tweeted, engaged with or replied to by another user.

The sidebar of a user’s Twitter browser displays a constantly updated list of trending topics (Twitter 2016).

With 313 million active users worldwide, the Twitter community can be argued to consist primarily of people who value the right to express their opinion. More than a profile picture or a one-line biography, identity in the Twittersphere is primarily constructed by the opinions, interests and ideas with which one chooses to align. Due to the inevitably polarising nature of this kind of open discussion, expressions of outrage are as common as positive affirmation in the ‘platform where all voices can be heard’ (Twitter 2016).

For an issue as complex and controversial as the Australian refugee and asylum seeker influx, it can thus be argued that Twitter is the perfect platform from which to scrape and analyse direct, passionate public sentiment.

Scraping Attempt 1

For my first scraping exercise, I used Google’s Twitter Archiver to collect tweets containing the hashtag #Refugees or #Asylumseekers or #Auspol.


Unfortunately, the breadth of these topics resulted in an extremely large and not particularly relevant data set. The archiver collected almost 25,000 tweets from a period of only one week, and a skim through the first few revealed that many were unrelated to Australia’s treatment of refugees and asylum seekers. For example, the #Auspol hashtag collected sentiments regarding other areas of Australian politics, whilst the #Refugees and #AsylumSeekers hashtags collected discussion on refugee crises in other parts of the world.

My first attempt at scraping Twitter provided me with results which were not particularly relevant (Copyright 2016 Molly Grover).

Scraping Attempt 2

In order to refine my search to collect data of greater relevance, I decided to conduct a preliminary search on Twitter to gauge which terms and hashtags were likely to return the most interesting information. I created an advanced search for the words “Nauru” or “Manus” or “Detention”, and the hashtags #Nauru or #Manus or #Detention.
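The search described above can be sketched as a single query string. This is only an illustration: the function name `build_query` is hypothetical and is not part of Twitter or the Twitter Archiver; the only thing assumed about Twitter is that its search accepts terms joined with the `OR` operator.

```python
def build_query(words, hashtags):
    """Join plain words and hashtags into one OR-separated search string."""
    # Normalise hashtags so that "Nauru" and "#Nauru" both become "#Nauru".
    terms = list(words) + ["#" + tag.lstrip("#") for tag in hashtags]
    return " OR ".join(terms)


# The three terms used in the search, as plain words and as hashtags.
places = ["Nauru", "Manus", "Detention"]
query = build_query(places, places)
print(query)
```

The resulting string could then be pasted into a search box or a search rule, instead of filling in each advanced search field by hand.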


Reading through a handful of the tweets collected by this search, I was pleasantly surprised by the increased relevance of the results. By hashtagging the locations of Australia’s detention centres, I had successfully narrowed the topic from all those seeking asylum to only those seeking asylum in Australia. Furthermore, upon reading these tweets I was amazed by the number of users from other countries who were engaging in discussion surrounding Australia’s offshore detention policies.


Tweets posted by users in London and New Zealand about Australia’s detention centres in Nauru (El-Enany & RNZ International 2016). 

Scraping Attempt 3

Fascinated by this, I returned to the Twitter Archiver and replicated this search, collecting nearly 9,000 tweets containing #Nauru or #Manus or #Detention from the last ten days. Interested in capturing the sentiment and discourse coming from outside Australia, I looked to the locations specified in each user’s Twitter profile, using conditional formatting to hide all those that contained an Australian location, such as Sydney, QLD, or Gippsland.

Combing through the tweets, I used conditional formatting with light-coloured text to hide all Australian locations (Copyright 2016 Molly Grover).

Excluding users who listed non-specific or even fictional locations (e.g. Global Citizen or Gaia), I combed through the leftover results, copying and pasting the first 100 tweets into a separate spreadsheet.

The new data set of 100 tweets from non-Australian locations (Copyright 2016 Molly Grover).
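The same filtering could also be done in code rather than with conditional formatting. The sketch below is a rough equivalent under stated assumptions: the marker lists, the `location` field name and the sample rows are all hypothetical placeholders, and a real list of Australian place names would need to be built by reading through the data.

```python
# Hypothetical marker lists; these are illustrative, not exhaustive.
AUSTRALIAN_MARKERS = ["australia", "sydney", "melbourne", "qld", "nsw", "gippsland"]
NON_SPECIFIC = {"", "global citizen", "gaia"}


def is_overseas(location):
    """True if the profile location is specific and has no Australian marker."""
    loc = location.strip().lower()
    if loc in NON_SPECIFIC:
        return False
    return not any(marker in loc for marker in AUSTRALIAN_MARKERS)


def first_overseas(tweets, limit=100):
    """Keep the first `limit` tweets whose user location passes is_overseas."""
    kept = [t for t in tweets if is_overseas(t["location"])]
    return kept[:limit]


# Placeholder rows standing in for the exported spreadsheet.
sample = [
    {"text": "example tweet A", "location": "Sydney, NSW"},
    {"text": "example tweet B", "location": "London"},
    {"text": "example tweet C", "location": "Global Citizen"},
    {"text": "example tweet D", "location": "Auckland, New Zealand"},
]
overseas = first_overseas(sample)
print([t["location"] for t in overseas])
```

Simple substring matching like this is blunt: a short marker such as "nsw" could match inside an unrelated word, so the kept rows would still need a manual check.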

Process Flowchart

A summary of my scraping processes (Copyright 2016 Molly Grover).


Upon reading the contents of these 100 tweets, I found 29 to be irrelevant, containing the hashtag #Detention yet not relating specifically to the Australian issue. From the remaining 71 tweets, I manually compiled a list of user locations, in order to gauge the pervasiveness of the issue in an international context.

The 25 countries from which users in the sample tweeted (Copyright 2016 Molly Grover).

The 71 tweets in the sample were posted from a total of 25 different countries. Dominating the sample were England and the United States, with totals of 15 and 11 tweets respectively. After these followed New Zealand and Ireland, with a total of 5 tweets each. These results are not surprising given the active relationships between these countries and Australia, which stem from cultural similarities and the shared English language.
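A tally like this can be produced with a simple counter rather than by hand. The sketch below is illustrative only: `tally_countries` is a hypothetical helper, the short list passed to it is a placeholder, and the arithmetic uses only the figures reported above (the other countries are not itemised here).

```python
from collections import Counter


def tally_countries(countries):
    """Count tweets per country, most frequent first."""
    return Counter(countries).most_common()


# Placeholder input showing the shape of the result, not the real data.
print(tally_countries(["England", "Ireland", "England"]))

# Figures reported in the text for the four most frequent countries.
reported = {"England": 15, "United States": 11, "New Zealand": 5, "Ireland": 5}
total_tweets = 71
total_countries = 25

top_four = sum(reported.values())                      # tweets from the top four
remaining_tweets = total_tweets - top_four             # tweets from everywhere else
remaining_countries = total_countries - len(reported)  # countries not itemised
print(top_four, remaining_tweets, remaining_countries)
```

On these figures the four leading countries account for 36 of the 71 tweets, leaving 35 tweets spread across the other 21 countries.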

The remaining 21 countries were spread across Europe, Asia, Polynesia and the Middle East, revealing a much greater level of global pervasiveness than I had expected. Interestingly, whilst coming from such a diverse range of locations, the vast majority of these tweets were actually re-tweets of popular statements regarding only a select few issues. These included the recent incident of Danish politicians being denied access to Nauru, the 169 consecutive days of peaceful asylum seeker protests on the island, the leaking of the Nauru files, and a message of support to male detainees on Father’s Day.

Such similarity within this group of tweets does much to highlight the power of the re-tweet function as a disseminator of knowledge within the platform, echoing and spreading ideas at a rapid pace throughout the international Twitter community.
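Recurring statements of this kind can be surfaced by grouping re-tweets under the text they repeat. The following is a minimal sketch, assuming the exported tweet text marks re-tweets with a leading "RT @user:" prefix; the helper names and the sample strings are hypothetical placeholders, not tweets from the data set.

```python
import re
from collections import Counter

# Assumed re-tweet marker at the start of the text, e.g. "RT @someone: ".
RT_PREFIX = re.compile(r"^RT @\w+:\s*")


def original_text(tweet_text):
    """Strip a leading 'RT @user: ' so re-tweets match their source text."""
    return RT_PREFIX.sub("", tweet_text).strip()


def most_echoed(tweet_texts, top=4):
    """Return the `top` most repeated statements with their counts."""
    counts = Counter(original_text(t) for t in tweet_texts)
    return counts.most_common(top)


# Placeholder strings standing in for real tweet text.
texts = [
    "RT @example_user: placeholder statement one",
    "RT @example_user: placeholder statement one",
    "placeholder statement one",
    "RT @other_user: placeholder statement two",
]
echoed = most_echoed(texts)
print(echoed)
```

Exact matching only groups tweets whose text is identical after the prefix is removed, so quoted or truncated re-tweets would be counted separately.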

Furthermore, all of these tweets were positioned in objection to Australia’s current systems of offshore detention, echoing the dissatisfaction that is increasingly expressed within Australia, by both the public and the media, regarding the inhumane treatment of refugees on Manus Island and Nauru.

The four main tweets which were re-tweeted by the sample of non-Australian users (Bochenek, Karapanagiotidis, Insurrection News & TheDetentionForum 2016).

5 Point Summary

  • Australia’s detainment of refugees is a topic of international discussion within the Twitter platform, evidenced by tweets posted by many users who identify themselves as being outside Australia.
  • Tweets were sampled from 25 countries in total, spread across Europe, Asia, Polynesia and the Middle East.
  • Only a small handful of statements were re-tweeted and echoed between this large spread of users and locations.
  • From the sample taken, England and the United States of America were the two countries most involved in the conversation, which may be attributable to their active relationships with Australia.
  • The overwhelming consensus from international users was a dissatisfaction with Australia’s treatment of asylum seekers and refugees.

Possible visual responses

From these exercises came a number of rich possibilities for a design response. Immediately, I imagined a data visualisation in the form of a world map, in which all tweets about the Australian refugee and asylum seeker issue could be plotted according to the locations of their users. Illuminating both the geographical and proportional spread of discussion surrounding the issue, this visualisation could include interactive functionality, allowing the user to click on a particular country to see the range of opinions expressed there.

My first idea for a design response: a world map of tweets related to the Australian issue (Copyright 2016 Molly Grover).

Another form of data visualisation could involve the chronological plotting of related tweets along a timeline, categorised by country of origin. Visualising temporal frequency in the same manner as a heart rate monitor, this design response would communicate the constant spreading and shifting of conversation over time and place.

My next idea for a design response: a temporal visualisation of international tweets (Copyright 2016 Molly Grover).
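To feed a timeline like this, the tweets would first need to be grouped by time and country. The sketch below shows one possible way to do that; the timestamp format, the `daily_counts` helper and the sample rows are assumptions made for illustration and do not come from the collected data.

```python
from collections import Counter
from datetime import datetime


def daily_counts(tweets):
    """Count tweets per (date, country) pair, ready to plot as a timeline."""
    counts = Counter()
    for created_at, country in tweets:
        # Assumed timestamp format; an actual export may differ.
        day = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S").date()
        counts[(day.isoformat(), country)] += 1
    return counts


# Placeholder rows: (timestamp, country of the user's profile location).
sample = [
    ("2016-09-04 09:15:00", "England"),
    ("2016-09-04 21:40:00", "England"),
    ("2016-09-05 03:05:00", "Ireland"),
]
timeline = daily_counts(sample)
for (day, country), n in sorted(timeline.items()):
    print(day, country, n)
```

Grouping by day is an arbitrary choice; a heart-rate style display would probably call for smaller intervals, such as hours or minutes.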

Lastly, a design response could visualise the geographical trajectory of a single tweet as it is re-tweeted over and over again by users of different origins. Inspired by the small handful of recurring statements present in the data sample I collected, this response would comment on the methods by which information related to the issue is disseminated throughout the global Twitter platform.

My final idea for a design response: a visualisation of a single tweet’s global trajectory (Copyright 2016 Molly Grover).


Bochenek, M. 2016, Tweet, Twitter, London, viewed 5 September 2016.

Cooper, E. 2016, Millennials respond excellently to #HowToConfuseAMillenial Hashtag, Australia, viewed 5 September 2016.

El-Enany, N. 2016, Tweet, Twitter, London, viewed 5 September 2016.

Google 2016, Twitter Archiver, California, viewed 5 September 2016.

Insurrection News 2016, Tweet, Twitter, viewed 5 September 2016.

Karapanagiotidis, K. 2016, Tweet, Twitter, Wurundjeri Land, viewed 5 September 2016.

RNZ International 2016, Tweet, Twitter, New Zealand, viewed 5 September 2016.

The Detention Forum 2016, Tweet, Twitter, London, viewed 5 September 2016.

Twitter 2016, Careers, San Francisco, viewed 5 September 2016.

Twitter 2016, Twitter, San Francisco, viewed 5 September 2016.