Data Scraping



Twitter was the social media platform chosen for this data scraping task. Unlike most other platforms, which include many sophisticated features, Twitter is the most minimalist of them all. It is essentially a platform that lets users post statements and content limited to 140 characters. The platform also allows reposting, known as retweeting, which shares other users' content with a wider audience.

The platform's most distinctive quality is not really a feature at all, but the strict character limit on each post. Unlike other platforms, where posts are effectively unlimited in length, Twitter only allows 140 characters per post. This forces the author of each post to be creative in how they use each character and how they manipulate language to present their content.

The Scrape

[Screenshot: Twitter Archiver search rules, 6 September 2016]


The chart above shows the search rules, which were:
• The tweet must include the hashtags Sydney and housing
• It can include any of the following words: Investing, Invest, Affordable, Affordability
• It can include the phrase: Housing Bubble
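The rules above can be sketched as a small filter over tweet text. This is an illustrative sketch, not how Twitter Archiver works internally. It assumes the two hashtags are mandatory and matched case-insensitively. Because the rules only say a tweet "can include" the words and the phrase, it reports those terms rather than enforcing them. The names `REQUIRED_HASHTAGS`, `matches_rules` and `optional_terms_found` are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical encoding of the search rules listed above.
REQUIRED_HASHTAGS = ["sydney", "housing"]
OPTIONAL_WORDS = ["investing", "invest", "affordable", "affordability"]
OPTIONAL_PHRASES = ["housing bubble"]


def hashtags_in(text: str) -> set[str]:
    """Return the lower-cased hashtags (without the '#') found in a tweet."""
    return {tag.lower() for tag in re.findall(r"#(\w+)", text)}


def matches_rules(text: str) -> bool:
    """True if the tweet carries every required hashtag, ignoring case."""
    tags = hashtags_in(text)
    return all(tag in tags for tag in REQUIRED_HASHTAGS)


def optional_terms_found(text: str) -> list[str]:
    """List the optional words and phrases that appear as whole words in the tweet."""
    lowered = text.lower()
    found = []
    for term in OPTIONAL_WORDS + OPTIONAL_PHRASES:
        # \b word boundaries stop "invest" from matching inside "investing"
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.append(term)
    return found


tweet = "Is the #Sydney #Housing bubble about to burst? Investing now looks risky"
print(matches_rules(tweet))         # True
print(optional_terms_found(tweet))  # ['investing', 'housing bubble']
```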

Using Twitter Archiver, I was able to collect a large amount of data from the platform with ease. Turning to the question of housing, I wanted to know what people thought about the housing market, whether there was a bubble, and how affordable houses were. With this in mind, I filled in the advanced search form as follows:


Hashtags
• Sydney – I included the Sydney hashtag because I only wanted tweets directed at the Sydney housing market. Although many users would have put Sydney in the location box, many others don't specify their location or give a false one, which makes searching by location unreliable.
• Housing – The housing tag ensured that the data retrieved would be largely, or at least somewhat, related to the topic of housing.

Include any of these words
• Investing – I included this word mainly because investing has a significant effect on the housing market, and is said to have an even greater effect on the housing bubble, which is an ongoing issue.
• Housing – I added this word to the search as a backup in case users do not tag their posts. This would still bring up posts related to housing instead of missing many of them.
• Affordable
• Market
• Invest – Since the search looks into the housing market, and investing is a significant part of that topic, I included this word as well.
• Affordability – Looking into the aspect of housing affordability, I placed affordability in the search to bring up posts relating to how affordable housing is.

Include any of these phrases
• Housing Bubble – There is continuing speculation about a housing bubble and its effects on the market, hence its inclusion in the search.
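The form fields above could be combined into a single search string. This sketch makes two assumptions. First, it assumes the form is translated into standard Twitter search syntax, where space-separated terms must all match, `OR` separates alternatives, and double quotes mark an exact phrase. Second, it treats the words and the phrase as one "any of" group. `build_query` is a hypothetical helper and is not part of Twitter Archiver.

```python
from __future__ import annotations


def build_query(hashtags: list[str], any_terms: list[str]) -> str:
    """Build a search string in (assumed) standard Twitter search syntax.

    Every hashtag is required (space-separated terms are ANDed), the optional
    terms are grouped with OR, and multi-word terms are quoted as exact phrases.
    """
    parts = ["#" + tag for tag in hashtags]
    if any_terms:
        quoted = [f'"{term}"' if " " in term else term for term in any_terms]
        parts.append("(" + " OR ".join(quoted) + ")")
    return " ".join(parts)


query = build_query(
    ["Sydney", "Housing"],
    ["Investing", "Housing", "Affordable", "Market", "Invest", "Affordability", "Housing Bubble"],
)
print(query)
# #Sydney #Housing (Investing OR Housing OR Affordable OR Market OR Invest OR Affordability OR "Housing Bubble")
```

If the form actually requires the phrase rather than offering it as an alternative, the quoted phrase would move out of the `OR` group and stand on its own as a required term.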

The Data

[Screenshot of the scraped data, 6 September 2016]

The Twitter scrape retrieved an astonishing 9,100 tweets, with the majority of the content coming from overseas through retweets. This suggests that many overseas Twitter users follow accounts that discuss the Sydney housing market.
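A claim like this can be checked against the exported rows. The sketch below is a rough heuristic that rests on three assumptions:

• Each row is a hypothetical dict with `text` and `location` keys.
• A tweet whose text starts with `RT @` counts as a retweet.
• A profile location counts as Australian if it contains one of a couple of hand-picked keywords.

Substring matching is crude. For example, it would treat Sydney, Nova Scotia as Australian. Blank locations are kept separate as unknown.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical keywords used to guess that a free-text profile location is in Australia.
AUSTRALIAN_KEYWORDS = ("australia", "sydney")


def is_retweet(text: str) -> bool:
    """Treat a tweet whose text starts with 'RT @' as a retweet (assumed convention)."""
    return text.startswith("RT @")


def classify_location(location: str) -> str:
    """Bucket a profile location as 'australia', 'overseas' or 'unknown'."""
    loc = location.strip().lower()
    if not loc:
        return "unknown"  # many users leave the location field blank
    if any(keyword in loc for keyword in AUSTRALIAN_KEYWORDS):
        return "australia"
    return "overseas"


def summarise(rows: list[dict]) -> dict:
    """Count tweets per location bucket and compute the share that are retweets.

    Each row is assumed to be a dict with 'text' and 'location' keys.
    """
    buckets = Counter(classify_location(row["location"]) for row in rows)
    retweets = sum(is_retweet(row["text"]) for row in rows)
    share = retweets / len(rows) if rows else 0.0
    return {"buckets": dict(buckets), "retweet_share": share}
```

Calling `summarise(rows)` on the exported rows returns the number of tweets in each location bucket and the fraction that are retweets.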

Visual Design

After collecting the data, I found a massive variation in the locations these posts came from. Many of the posts came from outside Australia, which was interesting given that this is largely an Australian, Sydney-specific issue. With this in mind, I planned to plot where each tweet and retweet came from on a map, giving a much clearer picture of where most of the content comes from.
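Before plotting, the tweets need to be grouped by place and given coordinates. This sketch relies on two assumptions:

• A hypothetical `COORDINATES` lookup table stands in for a real geocoding service.
• A naive normalisation keeps only the text before the first comma of a profile location.

Places it cannot resolve, including blank locations, are dropped. Each resulting `(place, latitude, longitude, count)` tuple could then drive one map marker, sized by its tweet count.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical stand-in for a geocoding service: approximate city-centre coordinates.
COORDINATES = {
    "sydney": (-33.87, 151.21),
    "london": (51.51, -0.13),
    "new york": (40.71, -74.01),
}


def normalise(location: str) -> str:
    """Reduce a free-text location such as 'Sydney, NSW' to its first part, lower-cased."""
    return location.split(",")[0].strip().lower()


def map_points(locations: list[str]) -> list[tuple[str, float, float, int]]:
    """Return (place, latitude, longitude, tweet count), most frequent place first.

    Blank locations and places missing from COORDINATES are skipped.
    """
    counts = Counter(normalise(loc) for loc in locations if loc.strip())
    points = []
    for place, n in counts.most_common():
        if place in COORDINATES:
            lat, lon = COORDINATES[place]
            points.append((place, lat, lon, n))
    return points
```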