Post 6: Web scraping: a fine art *not yet mastered

Scraping Twitter as a means of understanding social perceptions or misconceptions

Caitlin Kerr

Twitter is a social sharing platform that people use to share small snippets of information. This information can sometimes seem trivial or random. Mostly, people share their opinions and emotions, comment on events, or pass on interesting facts. A particular feature is ‘following’, in which people tend to follow people or organisations they personally align with or find interesting. Within this function, many followers retweet tweets because they feel a strong connection with, or agreement with, their messages and views.
Twitter is set up conversationally: it encourages people to voice their opinions and discuss them, which gives great scope and insight for research into views on specific issues. Each tweet carries a range of information, such as date, time, location, bio and user name, which you can then extract and refine to gain further information from data sets built from Twitter posts. This makes Twitter a very interesting and flexible tool for finding out common perceptions.
Geotagging and user location are interesting and unique features that can be linked to a post or to a person’s account. They can be used to separate and compare the opinions of people from different countries or regions.
Screen Shot 2016-09-07 at 1.32.27 am.png
This tweet, which I stumbled upon while trawling the data, expresses exactly what I want my research to break through: the way opinions snowball into truths by being repeated many times, and by people with power and knowledge.
Knowing how people use Twitter to express their ideas and opinions, I wanted to use my Twitter bot to gauge perceptions surrounding asylum seekers, in particular around the main issues that come up, such as ‘turn back the boats’. Through the bot I want to understand:
  • whether people’s views were positive or negative
  • in particular, the common misconceptions that people base their ideas and views on
  • with a particular focus on Australia
In the flow chart below you can see the common phrases I brainstormed that could lead me to these misconceptions. These phrases became the basis for my searches into the attitudes behind the key ideas below:
  • Take Australian Jobs
  • No refugee tow backs
  • #loveitorleaveit
  • turn back the boats
  • taking our jobs
  • illegal immigrants
  • security risk
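The phrase list above works as a set of search terms. As a minimal sketch of how collected tweets could be checked against it, assuming the tweets have already been gathered as plain text (the `SEARCH_PHRASES` constant and the `matching_phrases` helper are hypothetical, not part of any Twitter tool):

```python
import re

# The brainstormed phrases above, lower-cased for matching (hypothetical name).
SEARCH_PHRASES = (
    "take australian jobs",
    "no refugee tow backs",
    "#loveitorleaveit",
    "turn back the boats",
    "taking our jobs",
    "illegal immigrants",
    "security risk",
)


def normalise(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def matching_phrases(tweet_text: str, phrases=SEARCH_PHRASES) -> list:
    """Return each search phrase that occurs in the tweet, in list order."""
    text = normalise(tweet_text)
    return [phrase for phrase in phrases if phrase in text]


if __name__ == "__main__":
    sample = "We must TURN BACK the   boats before they start taking our jobs"
    print(matching_phrases(sample))  # ['turn back the boats', 'taking our jobs']
```

This is plain substring matching, so it finds the exact wording only; a tweet saying “stop the boats” would not match any phrase.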

14274290_10157524135140093_871710574_o copy.png

Screen Shot 2016-09-07 at 2.08.31 am.png

Screen Shot 2016-09-05 at 5.28.25 pm.png
Many of these initial searches using the Twitter advanced search tool returned no results, particularly searches that included the word Australia, http, or hashtags such as #loveitorleaveit (above).
I began my initial searches by combining the common words used for asylum seekers with these phrases. These first searches returned over 4,000 results, and this is where I really started to work through the problems I would have to overcome to get a more concise data set. My first hurdle was repetition caused by retweeting. In the data below you can see how, on this one page alone, popular tweets are retweeted over and over, which clutters the data set (each colour represents the same tweet being retweeted).
Screen Shot 2016-09-07 at 12.48.49 am.png
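One way to reduce this clutter is to collapse retweets onto the tweet they copy and keep a count instead. The sketch below assumes, as an illustration only, that the results have been exported as text where a retweet carries an old-style “RT @user:” prefix; `underlying_text` and `collapse_retweets` are hypothetical helpers, not a Twitter feature.

```python
import re

# A leading old-style retweet marker such as "RT @someone: " (assumed format).
RT_PREFIX = re.compile(r"^RT @\w+:\s*", re.IGNORECASE)


def underlying_text(tweet_text: str) -> str:
    """Strip a leading 'RT @user:' marker, then normalise case and whitespace."""
    stripped = RT_PREFIX.sub("", tweet_text.strip())
    return re.sub(r"\s+", " ", stripped).strip().lower()


def collapse_retweets(tweets: list) -> list:
    """Group tweets sharing the same underlying text.

    Returns (first copy seen, number of copies) pairs in first-seen order,
    so each original appears once however often it was retweeted.
    """
    groups = {}
    for text in tweets:
        key = underlying_text(text)
        if key not in groups:
            groups[key] = [text, 0]
        groups[key][1] += 1
    return [(first, count) for first, count in groups.values()]


if __name__ == "__main__":
    page = [
        "Turn back the boats!",
        "RT @abc: Turn back the boats!",
        "RT @xyz: turn back the boats!",
        "Refugees welcome",
    ]
    print(collapse_retweets(page))
    # [('Turn back the boats!', 3), ('Refugees welcome', 1)]
```

The count doubles as a rough popularity measure, so the repetition becomes information rather than noise. Retweets that were truncated or edited would still slip through as separate entries.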
The other problem I encountered was location. Most people do not list a location, or list a fictitious one such as “my own little world” or “under your bed” (like Quinn Kirby below), which did not show whether they were Australian or not. Refining the search to Australia only would therefore miss users with no location or a made-up one, and since most people do not list a location, this filter excluded too much of the data.
Screen Shot 2016-09-07 at 12.58.23 am.png
The only success I had with adjusting for location was with the phrase “turn back the boats”, where I sorted the results into no location, Melbourne, Sydney and other. This was a small data set, so it was possible to do by hand (below).
Screen Shot 2016-09-06 at 10.57.37 pm.png
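The hand sorting above could be automated for larger data sets. A minimal sketch, assuming each user’s profile location is available as free text (or missing); `location_bucket` and `bucket_counts` are hypothetical helpers that reproduce the same four groups:

```python
from collections import Counter
from typing import Optional


def location_bucket(location: Optional[str]) -> str:
    """Sort a free-text profile location into one of four buckets."""
    if location is None or not location.strip():
        return "no location"
    place = location.lower()
    if "melbourne" in place:
        return "Melbourne"
    if "sydney" in place:
        return "Sydney"
    # Everything else, including made-up places like "under your bed".
    return "other"


def bucket_counts(locations: list) -> Counter:
    """Count how many profiles fall into each bucket."""
    return Counter(location_bucket(loc) for loc in locations)
```

For example, `bucket_counts(["Melbourne, Victoria", "", None, "Sydney NSW", "under your bed", "London"])` gives two profiles with no location, two in “other”, and one each in Melbourne and Sydney. The “other” bucket mixes invented places with real overseas ones, which is exactly the ambiguity described above.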
Location filtering also exposed a further problem: people from other countries clouding the data. Whether it was people in Italy or the UK talking about turning back boats, or many Americans talking about illegal immigrants in reference to Mexicans and the Trump debate (below), the data set was becoming littered with information irrelevant to my search for Australian perceptions of asylum seekers.
Screen Shot 2016-09-06 at 10.37.36 pm 2.png
Screen Shot 2016-09-07 at 1.13.30 am.png
Screen Shot 2016-09-07 at 1.18.21 am.png
My other major problem was tweets that did not make sense. These were particularly difficult because they shared no common theme, which made them hard to exclude through search refinement. In the image below, for example, the text read on its own gives one impression, while coupled with the photo it tells a different story, or is meant as humour. This is an example of how data on its own can become misleading and skewed.
Screen Shot 2016-09-07 at 1.25.12 am.png
As I was unable to solve the problems of refining by location and of tweets recurring through retweets, the data remained clouded with other information. As I started filtering my searches to try to solve these problems, I became wary of a warning from Kate Sweetapple: by restricting your search, you restrict what the data can tell you, and the golden nuggets can be lost. More reading of search results and comparison of refinements should show whether that is the case here.
Some of the interesting finds include a tweet from Sassy Little Hobbit, originally from New Zealand, which is one of the few examples of these misconceptions posted to Twitter, and Sanjay Patel’s comment above on the Kiwi press secretary’s offer to take more refugees, which clearly disagrees with the offer. These examples show the types of tweets I was seeking out. Overall, though, such tweets were few, and far fewer than I had originally assumed.
Screen Shot 2016-09-07 at 1.36.31 am.png
Screen Shot 2016-09-06 at 10.19.26 pm.png
In this data set, from a search for ‘illegal immigrants’, I filtered for tweeters with more than 10,000 followers and tweets with more than 300 retweets, to start to understand which tweets were gaining the most circulation and whether this was related to the tweeter’s follower base.
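The same two thresholds can be applied to collected results outside the search tool. This sketch assumes each result has been recorded with its author’s follower count and its retweet count; `TweetRecord` is a hypothetical structure, not a Twitter API type. The second helper is one rough way to ask whether circulation simply tracks audience size.

```python
from dataclasses import dataclass


@dataclass
class TweetRecord:
    """Hypothetical record for one search result (not a Twitter API type)."""
    user: str
    followers: int
    retweets: int
    text: str


def high_circulation(tweets, min_followers=10_000, min_retweets=300):
    """Keep tweets whose author has more than `min_followers` followers and
    that were retweeted more than `min_retweets` times, most retweeted first."""
    selected = [
        t for t in tweets
        if t.followers > min_followers and t.retweets > min_retweets
    ]
    return sorted(selected, key=lambda t: t.retweets, reverse=True)


def retweets_per_thousand_followers(tweet: TweetRecord) -> float:
    """Retweets relative to the author's audience; 0.0 if there are no followers."""
    if tweet.followers == 0:
        return 0.0
    return tweet.retweets / (tweet.followers / 1000)
```

A tweet with a high ratio spread well beyond its author’s own followers, while a low ratio suggests the follower base did most of the work.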


Screen Shot 2016-09-06 at 10.31.00 pm.png
The Twitter search
Overall, my mastery of Twitter bots has not yet been a success… My main aim was to find tweets displaying incorrect statements around these issues, and I was unable to do so. Yes, people would state whether they agreed or disagreed with a point such as turning back boats or not allowing illegal immigrants, but few took the further step of saying why they believed this or what fear lay behind it. Perhaps my first mistake was assuming too much: I expected to find answers along the lines of boat people taking our jobs or posing a threat to security. Whether the problem lies in my search outline, or such tweets are simply not there, is yet to be answered.
With this assumption falling short, I turned to the information I do have that could be used: the language, and the misuse of language, around these issues. People within Australia use terms such as ‘illegal immigrants’ mainly to refer to asylum seekers. Could this bot be used to reply to and correct the misconceptions that stem from wrongly used language? Claiming that asylum seekers are illegal immigrants is incorrect, as asylum seekers fleeing in fear have the right to claim asylum, including by setting foot on Australian land, which makes these acts not illegal. Yet it is through language that frames them as illegal, and their actions therefore as wrong, that such negative stigma towards these groups of people is created.
In response to this idea, and drawing on service design, a Twitter bot could seek out these language misconceptions: wherever the term ‘illegal immigrants’ is used, it could reply and correct the mistake, for example “Do you mean asylum seekers, (username)? People fleeing persecution are within their rights to seek asylum in Australia.” Another potential reply: “Hey (username), did you know 9 out of 10 boat people are found to be genuine refugees, and are therefore not illegal immigrants?” I do feel a touch of humour is needed to give the statement more punch and to gain more attention and reflection (but I was unable to channel my inner Robin Williams at the time of writing this).
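The detection and reply step of such a bot can be sketched without touching any real Twitter API. In this sketch, `draft_reply` and `REPLY_TEMPLATES` are hypothetical; the templates reuse the two replies above, and actually posting the reply is left out.

```python
import re
from typing import Optional

# Matches "illegal immigrant" or "illegal immigrants", any capitalisation.
ILLEGAL_TERM = re.compile(r"\billegal\s+immigrants?\b", re.IGNORECASE)

# The two candidate replies from the text above; {user} is the tweeter's handle.
REPLY_TEMPLATES = (
    "Do you mean asylum seekers, @{user}? People fleeing persecution "
    "are within their rights to seek asylum in Australia.",
    "Hey @{user}, did you know 9 out of 10 boat people are found to be "
    "genuine refugees, and are therefore not illegal immigrants?",
)


def draft_reply(user: str, tweet_text: str, template_index: int = 0) -> Optional[str]:
    """Return a correcting reply if the tweet uses the term, otherwise None."""
    if not ILLEGAL_TERM.search(tweet_text):
        return None
    template = REPLY_TEMPLATES[template_index % len(REPLY_TEMPLATES)]
    return template.format(user=user)
```

On its own this matcher would also reply to the American tweets about Mexico described earlier, so in practice it would need to be combined with some form of location or context filtering before any reply is sent.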
In terms of visualising this data, I think it would be interesting to visualise the flow of popular tweets on specific topics within this issue, mapping each tweet from its source to its retweeters and on to further retweets, and looking at the branches where the information spreads. This would show how information flows around the ‘twittersphere’ and what the original sources have in common: whether these users always have more than 1,000 followers, or are usually news organisations, and whether these factors help tweets spread further. This will help in understanding how and where information is bred. The aim behind this visualisation is to trace who feeds the rumour mill, and how.
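Before drawing such a map, each cascade could be summarised by its size and depth. The sketch assumes the “who retweeted whom” links are already known; this is an assumption, since Twitter’s data typically links a retweet back to the original tweet rather than to the retweet it was seen through, so the branches would have to be inferred. `cascade_stats` is a hypothetical helper and expects a tree with no cycles.

```python
from collections import defaultdict, deque


def cascade_stats(edges):
    """Summarise retweet cascades.

    `edges` is a list of (node, parent) pairs, where parent is None for an
    original tweet. Returns {root: (size, depth)}: size counts every node in
    the cascade including the root, depth is the longest chain of hops.
    """
    children = defaultdict(list)
    roots = []
    for node, parent in edges:
        if parent is None:
            roots.append(node)
        else:
            children[parent].append(node)

    stats = {}
    for root in roots:
        size, depth = 0, 0
        queue = deque([(root, 0)])  # breadth-first walk from the source
        while queue:
            node, level = queue.popleft()
            size += 1
            depth = max(depth, level)
            for child in children[node]:
                queue.append((child, level + 1))
        stats[root] = (size, depth)
    return stats
```

With hypothetical edges where “news_org” is retweeted by “alice” and “bob”, “alice” by “carol”, and “carol” by “dave”, the news_org cascade has size 5 and depth 3. A wide, shallow cascade points to one large audience; a deep one points to the rumour mill passing a message along.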