Scraping Twitter for data

Twitter is a social media platform designed around the central feature of sharing 140-character posts with followers, while also following other users in order to receive their posts in a simple, easy-to-digest feed. This functionality has led to Twitter becoming one of the most active discussion-based communities on the web, where users cluster around issues and hashtags while also interacting with organisations and news content.

The important difference between Twitter and other social platforms is that circles and connections form around issues rather than around social circles or friendship groups. This leads to far more vigorous discussion of issues and more interaction with news content and news organisations. This is also seen in how users select who to follow, with many using their Twitter feeds as a personally curated news source contributed to by any number of organisations or individuals they are interested in.

Twitter is also one of the most popular tools for interaction between celebrities and other high-profile figures and their fans, and the platform has often gathered negative attention due to attacks on high-profile figures in the potentially anonymous environment. (Twitter 2016)


In my Twitter web scraping, I set myself the goal of finding data relating to parental leave for fathers, or paternity leave, in Australia. The first search I was able to retrieve data with was based on the search rule:

> paternity OR leave OR feminism OR fathers OR gender OR equality #genderequality

This generated 2791 tweets over a two-week period. The tweets from this search were not specific to Australia, and only a few of them included the words paternity leave. Out of these tweets only 3 contained the word ‘paternity’, and none concerned Australia.
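A count like the one above can be checked with a few lines of Python once the tweet texts have been exported. This is only a minimal sketch: the function name `count_mentions` and the sample texts are hypothetical, and it assumes the scraped tweets are available as a plain list of strings.

```python
import re

def count_mentions(tweets, term):
    """Count tweets whose text contains `term` as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return sum(1 for text in tweets if pattern.search(text))

# Hypothetical sample texts, not taken from the scraped data.
sample_tweets = [
    "Paternity leave should be the norm, not the exception",
    "Happy to leave work early today",
    "Gender equality starts at home #genderequality",
]

print(count_mentions(sample_tweets, "paternity"))
print(count_mentions(sample_tweets, "leave"))
```

Matching on whole words keeps a broad term like ‘leave’ from being confused with a specific one like ‘paternity’, which is the gap the first search rule ran into.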

Twitter Advanced Search (Ahlstrom 2016)

Before this, I had attempted a few searches intended to collect data much more specific to Australia. I experimented with terms like parental leave, fathers, #genderequality etc. These area-specific searches gave me no results, so I decided to continue with global Twitter searches, but more channelled towards parental leave:

> fathers parental OR leave OR gender OR equality

This generated 891 tweets. Again I tried to restrict this search to Australia only, but this gave no results. There were tweets within the successful search which were from Australia, so I became unsure whether I was conducting the area search correctly. I tried a more concise search with the rule:

> parental #genderequality

This generated only 18 tweets, one of which had been retweeted over the last couple of weeks:

Tweet by Empower Women (2016)

I explored the #IAmParent campaign, an initiative from UN Women based in the Empower Women organisation. The campaign is more specific to the current situation and the push for change in the US, where there is zero federal financial support for mothers and fathers, which was also the case in Australia before 2011 (Department of Social Services 2016).

I am Parent Campaign by Empower Women (2016)


Most of the tweets I found that related to my topic originated in the United States or India, and I realised a large part of them related to Father's Day. I made a last attempt to scrape for more relevant data, with a search rule which excluded tweets addressing the debate in India, as well as most tweets relating to Father's Day:

> fathers paternity OR parental OR australia -YNoLeave4Papa -India -day

This definitely resulted in a scrape more closely related to my issue, and I found a few interesting accounts worth exploring.
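The same exclusions can also be applied after collection, to tweets already gathered by the earlier, broader searches. The sketch below is an assumption-laden illustration, not how Twitter itself evaluates the rule: `passes_exclusions` and the sample texts are hypothetical, and the excluded terms are the three negated terms from the search rule above.

```python
import re

def tokens(text):
    """Split text into lowercase word tokens; '#' and punctuation are dropped."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def passes_exclusions(text, excluded):
    """Return True if none of the excluded terms appears as a word in the text."""
    words = tokens(text)
    return not any(term.lower() in words for term in excluded)

# Negated terms from the search rule above.
excluded_terms = ["YNoLeave4Papa", "India", "day"]

# Hypothetical sample texts, not taken from the scraped data.
sample_tweets = [
    "Happy Father's Day to all the dads",
    "India debates paternity leave #YNoLeave4Papa",
    "Australia should extend parental leave for fathers",
]

kept = [t for t in sample_tweets if passes_exclusions(t, excluded_terms)]
print(kept)
```

Excluding a common word such as ‘day’ is a blunt instrument: it removes Father's Day tweets but would also drop any relevant tweet that happens to use the word.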

(Fathers 4 Equality 2013)

The search led me to an organisation under the name Fathers4Equality (2013). A lot of the content created by this user related to laws regarding custody and divorce, and family in the event of separation between parents. It is definitely an organisation worth investigating to see how they position themselves on paternity leave and in what areas they experience difficulties.

Flow chart of automated task (Ahlstrom 2016)

It would be very interesting to see a visualisation of this data showing the gender split of people who are active around these topics. I was seeing a larger representation of men than I had presumed, and due to the nature of the medium perhaps views and opinions are expressed more truthfully on Twitter.

Learning Outcome

  • Twitter is a powerful tool for real-time collection of opinions across a number of issues and from a huge variety of perspectives.
  • Successful scraping and collection of this data relies on having a clear understanding of the key terms of your issue and trial and error relating to queries.
  • The data scraped showed a wide variety of opinions from across the globe and a surprisingly high number of male voices, which tends not to be the case in the mainstream media debate.
  • Twitter users tended to gravitate to either extreme in their opinions rather than representing a balanced point of view.
  • Twitter also allows users to present views and positions that may represent a small minority and therefore would not be otherwise seen in the media.
By Camilla Ahlström

Department of Social Services 2016, Paid Parental Leave Scheme, Australian Government, viewed 5 September 2016, <>
Empower Women 2016, I am Parent Campaign, viewed 6 September 2016, <>
Empower Women 2016, ‘These countries have the best parental leave policies in the world’, Twitter post, 28 August, viewed 5 September 2016, <>
Fathers 4 Equality 2013, Fathers 4 Equality Australia, viewed 5 September 2016 <>
Twitter 2016, Twitter Inc, viewed 6 September, <>