Blog 6 – Data Scraping

Eugene Alberts

Described as ‘the SMS of the internet’, Twitter allows users to post short messages in either an open public forum or a select one. Its appeal as a mode of communication lies in its simplicity. It also revolves around the premise of being able to ‘follow’ a person of interest without having to know them personally, unlike Facebook.

Noah Glass, one of the company’s founders, says of the creation of Twitter’s name that its definition is ‘a short burst of inconsequential information’ and ‘chirps from birds’. It’s interesting to note that from the very beginning the platform didn’t have lofty intentions and wasn’t meant to communicate anything consequential. This is perhaps built into the system through the 140-character limit, which doesn’t allow anything to be written about in depth. It forces the user to simplify, be succinct, and narrow what they want to say down to its essence. Its actual informative value is debatable.

Market research firm Pear Analytics categorised a cross-section of tweets in August 2009 and found the following percentages:

  • Pointless babble – 40%
  • Conversational – 38%
  • Pass-along value – 9%
  • Self-promotion – 6%
  • Spam – 4%
  • News – 4%

[Image: flow chart]


Collecting Twitter data with the keywords ‘negative gearing’ and ‘election’ between June 30 and July 3 (the few days around the federal election), with the location set to Australia-wide, returned only 16 posts.
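As a rough illustration of what a search like this does, here is a minimal Python sketch of the filter. It is not the tool I used for the scrape. The tweet fields (`text`, `posted`, `country`), the sample tweets and the year in the dates are all made up for the example.

```python
from datetime import date

# Search settings taken from the post. The year is a placeholder assumption,
# since only the day and month are given.
KEYWORDS = ("negative gearing", "election")
START, END = date(2016, 6, 30), date(2016, 7, 3)


def matches(tweet, keywords=KEYWORDS, start=START, end=END, country="Australia"):
    """Return True if the tweet contains every keyword, falls inside the
    date window (inclusive) and is located in the given country."""
    text = tweet["text"].lower()
    return (
        all(keyword in text for keyword in keywords)
        and start <= tweet["posted"] <= end
        and tweet["country"] == country
    )


# Invented sample data with a hypothetical structure, not real tweets.
sample = [
    {"text": "Negative gearing will decide this election",
     "posted": date(2016, 7, 1), "country": "Australia"},
    {"text": "Negative gearing explained",
     "posted": date(2016, 7, 1), "country": "Australia"},
    {"text": "Election night!",
     "posted": date(2016, 7, 2), "country": "Australia"},
]

hits = [tweet for tweet in sample if matches(tweet)]
print(len(hits), "of", len(sample), "sample tweets match both keywords")
```

Because every keyword has to appear in the text, each extra keyword can only shrink the result set. Calling `matches(tweet, keywords=("negative gearing",))` on the same sample lets a second tweet through.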

The posts included investment advice doubling as self-promotion from Money magazine, The Guardian, various property developers and other real estate commentators with a business front. Promotion seemed to be key for these companies, which linked to their own websites with information on the topic and used the election period as an opportunity to redirect traffic.

There were also quite a few posts with pessimistic political outlooks on an ALP election victory, given the party’s proposed changes to negative gearing. Interestingly, there were not many outspoken defenders of the negative gearing policy. Perhaps this hints at a degree of guilt among the older generations, who are reluctant to speak out for it.

I was surprised there weren’t as many tweets matching these keywords as I had imagined during the lead-up to the election. This led me to omit the keyword ‘election’ from the search terms, which brought around 200 more results. That is still only a tiny percentage of Australia’s average of 7 million tweets per day. The negative gearing issue should have been at its hottest during these few days, but the scrape suggests otherwise. Perhaps the answer lies in Twitter’s demographic? Or perhaps in an indifference to the issue?
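To put a rough number on ‘tiny’, the figures above can be plugged into a quick Python calculation. This is only a back-of-envelope sketch: it treats ‘around 200’ as exactly 200 and counts June 30 to July 3 as four full days at the average rate, both of which are my assumptions.

```python
# Back-of-envelope share of Australian tweets that matched the search.
TWEETS_PER_DAY = 7_000_000            # Australia's average, as quoted in the post
DAYS_IN_WINDOW = 4                    # June 30, July 1, July 2, July 3 (assumed inclusive)
matches_with_election = 16            # first search, both keywords
extra_matches_without_election = 200  # "around 200 more", treated as exact


def share_of_total(matches, tweets_per_day=TWEETS_PER_DAY, days=DAYS_IN_WINDOW):
    """Return matches as a percentage of all tweets posted in the window."""
    return 100 * matches / (tweets_per_day * days)


total_matches = matches_with_election + extra_matches_without_election
print(f"{share_of_total(total_matches):.5f}% of tweets in the window")
```

On these assumptions the matching tweets come to less than a thousandth of one percent of everything posted in Australia over those days.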

[Image: neg gearing]