Post 6: Scraping Twitter to create a data set

Molly Grover

Functioning at its most basic level as an online messaging service, Twitter provides users from all over the world with a platform to communicate, engage with one another and express ideas. Limiting posts to 140 characters or fewer, the platform encourages quick-fire exchanges of dialogue and conversation about an endlessly wide range of topics, such as news, current affairs, popular culture, humour, beliefs and personal matters.

Trending topic categories, hashtag and re-tweet functions all reinforce the platform’s emphasis on the rapid public dissemination of thought and opinion. A single user’s post has the power to spark a large-scale international debate in a matter of minutes, gaining momentum every time the message is re-tweeted, engaged with or replied to by another user.

The sidebar of a user’s Twitter browser displays a constantly updated list of trending topics (Twitter 2016).

With 313 million active users worldwide, the Twitter community can be argued to consist primarily of people who value the right to express their opinion. More than a profile picture or a one-line biography, identity in the Twittersphere is primarily constructed from the opinions, interests and ideas with which one chooses to align. Due to the inevitably polarising nature of this kind of open discussion, expressions of outrage are as common as positive affirmation in the ‘platform where all voices can be heard’ (Twitter 2016).

For an issue as complex and controversial as the Australian refugee and asylum seeker influx, it can thus be argued that Twitter is the perfect platform from which to scrape and analyse direct, passionate public sentiment.

Scraping Attempt 1

For my first scraping exercise, I used Google’s Twitter Archiver to collect tweets containing the hashtag #Refugees or #AsylumSeekers or #Auspol.


Unfortunately, the breadth of these topics resulted in an extremely large and not particularly relevant data set. The search collected almost 25,000 tweets from a period of only one week, and a skim through the first few tweets in the data set revealed that many were unrelated to Australia’s treatment of refugees and asylum seekers. For example, the #Auspol hashtag collected sentiments regarding other areas of Australian politics, whilst the #Refugees and #AsylumSeekers hashtags collected discussion on refugee crises in other parts of the world.

My first attempt at scraping Twitter provided me with results which were not particularly relevant (Copyright 2016 Molly Grover).

Scraping Attempt 2

In order to refine my search to collect data of greater relevance, I decided to conduct a preliminary search on Twitter to gauge which terms and hashtags were likely to return the most interesting information. I created an advanced search for the words “Nauru” or “Manus” or “Detention”, and the hashtags #Nauru or #Manus or #Detention.
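As a rough illustration, the search terms above can be combined into a single query string. This is a minimal sketch, assuming Twitter's `OR` search operator; the helper name `build_or_query` is hypothetical and is not part of Twitter or the Twitter Archiver.

```python
def build_or_query(words, hashtags):
    """Join plain words and hashtags into one OR-separated search query."""
    # Prefix each hashtag with '#' unless it already has one.
    tags = [t if t.startswith("#") else "#" + t for t in hashtags]
    return " OR ".join(list(words) + tags)


# The three terms named in the post, used both as words and as hashtags.
terms = ["Nauru", "Manus", "Detention"]
query = build_or_query(terms, terms)
print(query)
```

The resulting string could be pasted into a search box or a collection tool, so that the preliminary search and the later scrape use exactly the same terms.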


Reading through a handful of the tweets collected by this search, I was pleasantly surprised by the increased relevance of the results. By searching for the locations of Australia’s detention centres, I had successfully narrowed the topic from all those seeking asylum to only those seeking asylum in Australia. Furthermore, upon reading these tweets I was amazed by the number of users from other countries who were engaging in discussion surrounding Australia’s offshore detention policies.


Tweets posted by users in London and New Zealand about Australia’s detention centres in Nauru (El-Enany & RNZ International 2016). 

Scraping Attempt 3

Fascinated by this, I returned to the Twitter Archiver and replicated this search, collecting nearly 9,000 tweets containing #Nauru or #Manus or #Detention from the last ten days. Interested in capturing the sentiment and discourse coming from outside Australia, I looked to the locations specified in each user’s Twitter profile, using conditional formatting to hide all those that contained an Australian location, such as Sydney, QLD, or Gippsland.

Combing through the tweets, I used conditional formatting with light coloured text to hide all Australian locations (Copyright 2016 Molly Grover).

Excluding users who listed non-specific or even fictional locations (e.g. Global Citizen or Gaia), I combed through the leftover results, copying and pasting the first 100 tweets into a separate spreadsheet.

The new data set of 100 tweets from non-Australian locations (Copyright 2016 Molly Grover).
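The location filter described above was done by hand with conditional formatting, but the same idea could be sketched in code. This is only an illustration: the marker lists, the function names `is_overseas` and `first_overseas`, and the sample rows are all hypothetical, and plain substring matching is a crude stand-in for reading each profile location.

```python
# Hypothetical marker lists; a real filter would need far more entries.
AUSTRALIAN_MARKERS = ("australia", "sydney", "qld", "gippsland")
NON_SPECIFIC = ("global citizen", "gaia")


def is_overseas(location):
    """Return True if a profile location looks specific and non-Australian."""
    loc = location.strip().lower()
    if not loc or loc in NON_SPECIFIC:
        return False
    return not any(marker in loc for marker in AUSTRALIAN_MARKERS)


def first_overseas(tweets, limit=100):
    """Keep the first `limit` tweets whose user location passes the filter."""
    kept = [t for t in tweets if is_overseas(t["location"])]
    return kept[:limit]


# Made-up rows for illustration only, not taken from the collected data.
sample = [
    {"text": "tweet A", "location": "Sydney"},
    {"text": "tweet B", "location": "London"},
    {"text": "tweet C", "location": "Gaia"},
    {"text": "tweet D", "location": "Brisbane, QLD"},
    {"text": "tweet E", "location": "Auckland, New Zealand"},
]
print([t["text"] for t in first_overseas(sample)])
```

A filter like this would still need a manual check afterwards, since profile locations are free text and a short marker such as "qld" can match unrelated words.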

Process Flowchart

A summary of my scraping processes (Copyright 2016 Molly Grover).


Upon reading the contents of these 100 tweets, I found 29 to be irrelevant, containing the hashtag #Detention yet not relating specifically to the Australian issue. From the remaining 71 tweets, I manually compiled a list of user locations, in order to gauge the pervasiveness of the issue in an international context.

The 25 countries from which users in the sample tweeted (Copyright 2016 Molly Grover).

The 71 tweets in the sample were posted from a total of 25 different countries. Dominating the sample were England and the United States, with totals of 15 and 11 tweets respectively. After these followed New Zealand and Ireland, with a total of 5 tweets each. These results are not surprising due to the active relationships between these countries and Australia, as a result of cultural similarities and the shared English language.
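To put these counts in proportion, the share of the sample held by each of the four leading countries can be worked out directly from the figures above. This is a small sketch using only the numbers reported in this post; the variable names are my own.

```python
SAMPLE_SIZE = 71  # relevant tweets left after removing 29 of the 100

# Counts reported in the post for the four most frequent countries.
top_counts = {"England": 15, "United States": 11, "New Zealand": 5, "Ireland": 5}

for country, count in top_counts.items():
    print(f"{country}: {count} tweets, {count / SAMPLE_SIZE:.1%} of the sample")

# Whatever is left belongs to the other 21 of the 25 countries.
top_total = sum(top_counts.values())
remaining = SAMPLE_SIZE - top_total
print(f"Other 21 countries: {remaining} tweets")
```

The four leading countries account for 36 of the 71 tweets, which leaves 35 tweets shared between the other 21 countries.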

The remaining 21 countries were spread across Europe, Asia, Polynesia and the Middle East, revealing a much greater level of global pervasiveness than I had expected. Interestingly, whilst coming from such a diverse range of locations, the vast majority of these tweets were actually re-tweets of popular statements regarding only a select few issues. These included the recent incident of Danish politicians being denied access to Nauru, the 169 consecutive days of peaceful asylum seeker protests on the island, the leaking of the Nauru files, and a message of support to male detainees on Father’s Day.

Such similarity within this group of tweets does much to highlight the power of the re-tweet function as a disseminator of knowledge within the platform, echoing and spreading ideas at a rapid pace throughout the international Twitter community.

Furthermore, all of these tweets were positioned in objection to Australia’s current systems of offshore detention, echoing the dissatisfaction that is increasingly expressed within Australia, by both the public and the media, regarding the inhumane treatment of refugees on Manus Island and Nauru.

The four main tweets which were re-tweeted by the sample of non-Australian users (Bochenek, Karapanagiotidis, Insurrection News & The Detention Forum 2016).

5 Point Summary

  • Australia’s detainment of refugees is a topic of international discussion within the Twitter platform, evidenced by tweets posted by many users who identify themselves as being outside Australia.
  • Tweets were sampled from 25 countries in total, including the United States and countries across Europe, Asia, Polynesia and the Middle East.
  • Only a small handful of statements were re-tweeted and echoed between this large spread of users and locations.
  • From the sample taken, England and the United States were the two countries most involved in the conversation, which may be attributable to their active relationships with Australia.
  • The overwhelming consensus from international users was a dissatisfaction with Australia’s treatment of asylum seekers and refugees.

Possible visual responses

From these exercises came a number of rich possibilities for a design response. Immediately, I imagined a data visualisation in the form of a world map, in which all tweets about the Australian refugee and asylum seeker issue could be plotted according to the locations of their users. Illuminating both the geographical and proportional spread of discussion surrounding the issue, this visualisation could include interactive functionality, allowing the user to click on a particular country to see the range of opinions expressed there.

My first idea for a design response: a world map of tweets related to the Australian issue (Copyright 2016 Molly Grover).

Another form of data visualisation could involve the chronological plotting of related tweets along a timeline, categorised by country of origin. Visualising temporal frequency in the same manner as a heart rate monitor, this design response would communicate the constant spreading and shifting of conversation over time and place.

My next idea for a design response: a temporal visualisation of international tweets (Copyright 2016 Molly Grover).

Lastly, a design response could visualise the geographical trajectory of a single tweet as it is re-tweeted over and over again by users of different origins. Inspired by the small handful of recurring statements present in the data sample I collected, this response would comment on the methods by which information related to the issue is disseminated throughout the global Twitter platform.

My final idea for a design response: a visualisation of a single tweet’s global trajectory (Copyright 2016 Molly Grover).


Bochenek, M. 2016, Tweet, Twitter, London, viewed 5 September 2016.

Cooper, E. 2016, Millennials respond excellently to #HowToConfuseAMillenial Hashtag, Australia, viewed 5 September 2016.

El-Enany, N. 2016, Tweet, Twitter, London, viewed 5 September 2016.

Google 2016, Twitter Archiver, California, viewed 5 September 2016.

Insurrection News 2016, Tweet, Twitter, viewed 5 September 2016.

Karapanagiotidis, K. 2016, Tweet, Twitter, Wurundjeri Land, viewed 5 September 2016.

RNZ International 2016, Tweet, Twitter, New Zealand, viewed 5 September 2016.

The Detention Forum 2016, Tweet, Twitter, London, viewed 5 September 2016.

Twitter 2016, Careers, San Francisco, viewed 5 September 2016.

Twitter 2016, Twitter, San Francisco, viewed 5 September 2016.


Blog 6: Mental Health in Twitter Timeline

By Marcella K. Handoko Kwee

Current Twitter logo (flaticon 2015)

The social media platform I chose is Twitter. I could have used other social media, such as Facebook, Instagram or blogs, since I have not logged into Twitter for a very long time. However, I thought that the more unfamiliar I was with ‘what is going on in Twitter’, the more interesting Twitter would become.


Twitter is a social media platform that allows users worldwide to write about the things they care about in posts of up to 140 characters each. Each post is referred to as a ‘tweet’. Twitter is commonly used for both personal and business purposes. It also allows users to follow other users with the same or different interests, to follow ‘stakeholders’ such as actors and actresses, news agencies and companies, to retweet other users’ tweets, to mark other users’ tweets as favourites, to reply to tweets and so on. Furthermore, Twitter allows users to link their other social media accounts to their Twitter accounts, so that other Twitter users can see their posts from those platforms without interruption and without having to log into each one.

Through Twitter, hard-liners are able to advertise an idealised image of their beliefs and to express disagreement with their competitors non-verbally. Stakeholders such as companies, media agencies and governments compete with one another by sharing their best content in order to gain popularity, measured by the number of retweets, replies and followers. The general population shows its support for the things it cares about.

The Process of Collecting Data


GSpreadsheet 1

In order to document the findings associated with mental health stigma on the Twitter platform, the first step was to use the Twitter Archiver Google Spreadsheet to search for tweets with the hashtag ‘stigma’, followed by the keywords ‘mental health’ and ‘disorder’. Two flow chart graphics are provided below: one showing the number of followers and follows of the Twitter users, and one showing the count of tweet text. The next step was to use Twitter ‘Advanced Search’ to search for tweets with the same hashtag and the same keywords; this time, however, the keywords were combined into ‘mental health disorder’.

Flow Spreadsheet 1

Flow Chart 1

Flow Spreadsheet 2

Flow Chart 2

Comparing the two ways of collecting data on mental health stigma, I found that Google Twitter Archiver gave me much more structured and complete data, whereas Twitter Advanced Search gave me insight into what users were actually talking about in their posts, and showed the aim and tone of the conversations clearly in the full screen design. To dig out more interesting and relevant information on mental health stigma, the best approach is to scroll through the Twitter timeline.

Advanced Search 1

Advanced Search 2

Here is a summary of the results found:

  • Stigma does not always relate to mental health. Stigma exists in a number of different areas, including community services, physical health and disabilities. When “stigma” is typed into the hashtag search field, the results include many topics that are irrelevant to mental health. Most of the topics discussed are associated with lung cancer, HIV, physical disabilities and postcode prejudice in the region.
  • Regardless of the area of issue, stigma has always been controversial and negative.
  • Supportive statements about mental health were found. A few examples are a tweet by @MHCNSW, “Glad to see Aussie men getting on board with #ItsOkayToTalk, breaking down #stigma around #mentalhealth and #suicide”, which has been retweeted 4 times; a tweet by @AllanSparkes, “Speak up, stop the deathly silence.Thank U Men’s Health for helping break the stigma. @MensHealthAU #LiveStronger”, which has been retweeted 18 times; and a tweet by @DestroyerMariko, “We’re getting there, but we still need to break the #stigma of #mentalillness. This is awful: #mentalhealth #bipolar”. There is a sense of relief, pride, courage and gratitude throughout the words. The good thing about Twitter is that it can also be used as a medium to increase awareness globally.
  • Current news reports were also found. A few examples are a tweet by @Pawsitivehills, “Our Gold Sponsor Medibank Private Castle Hill are helping us fight the stigma of mental illness in the hills.”, which has been retweeted 5 times, and a tweet by @KBoydell, “Mural art – making a difference increasing awareness decreasing stigma enhancing community relationships @themhsorg”, which has been retweeted twice. The Twitter platform can be used to share information on current situations as well as to show support. However, there is a chance that the subjects of these tweets are being advertised. Thus the Twitter platform is also an alternative way to gain popularity.
  • One-on-one conversation tweets were also found, for example a tweet by @juntei, “@helisalmiakki yeah. they aren’t seeing past stigma. maybe uncomfortable w family being MI and/or disabled? unconscious associated shame”. However, there may be concerns about the privacy of personal opinions or views.
  • The retweet function is one good point. It helps users to share good content much more easily; however, there is a chance that false news can spread quickly over the timeline.
  • Top tweets often come from users who are considered professionals and who are interested in politics and law, journalism, health, public speaking, human rights, psychology and similar areas.


I managed to take another step. I adjusted the keywords and hashtags slightly, entering ‘stigma’ in the ‘this exact phrase’ field, ‘mental condition struggles uncertainty’ in the ‘any of these words’ field and ‘mental health’ in the ‘these hashtags’ field, in both Twitter Advanced Search and Google Twitter Archiver, in order to leave out irrelevant topics and generate more interesting and relevant results. The results now shown are much more relevant to the issue, including quotes about mental health stigma without any other areas of issue, but they are fewer in number.