post six: scraping web data

by zena dakkak

For this exercise I decided to focus on Twitter in order to gather data about the public’s view on homelessness. Twitter, created in March 2006 by Jack Dorsey, Evan Williams, Biz Stone, and Noah Glass, is an online social networking service that enables users to send and read short 140-character messages called “tweets”. These tweets can be shared and viewed publicly or privately. Users can also add hashtags, which help a tweet reach a wider audience when other users search for that hashtag. Users can read and post tweets and access Twitter through the website interface, SMS or the mobile app. Other features have been added to enhance the user’s experience despite the character limit. These include the Twitter timeline, pinned Tweets, polls, mention Tweets, lists, messages and cards, as well as click-to-Tweet links that extend conversations.

Essentially, Twitter is used to connect people of all ages who share the same interests. It can be used as both a social and a professional platform where users voice their opinions, share breaking news, raise awareness of social issues, promote businesses, use educational tools, and share their thoughts, feelings and experiences through photos or tweets.


Data Scrapping Flow chart.jpg


At first I was very specific with my Twitter search, which did not return what I was expecting.

Twitter Search
youth homeless social OR australia OR youth OR homeless OR smelly OR privacy OR people OR alone OR mental OR health OR depression lang:en.
Most of the results surprised me, as they validated some points that I had about social exclusion.

Screen Shot 2016-09-18 at 5.40.19 PM.png

A lot of the results consisted of LGBT-related tweets, suggesting that many young people around the world feel socially excluded and are homeless. Although these results were interesting, they did not provide enough data, so I generated a new search. To continue my research I excluded LGBT to see what the results would show.
Twitter Search
homeless social OR youth OR homeless OR smelly OR privacy OR people OR alone OR mental OR health -LBGT lang_en –
Interestingly, this search showed recurring views regarding homelessness. One of these related to the issue of refugees versus homeless citizens. Most of the tweets explored the problem of a country having to choose between supporting refugees and supporting its homeless citizens.



Other tweets took a political view, relating to the new agreement for the US to send $38 billion to Israel.

Dr. Craig Considine – @CraigCons
US govt. sends $38,000,000,000 to the Israel govt, yet this morning I walked my 3 homeless people on the way to work. This makes no sense.


Twitter search
homeless  “hobo ” social OR australia OR youth OR homeless OR smelly OR privacy OR people OR hobo -LGBT lang:en
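
The three searches in this post follow the same pattern: plain keywords, a run of alternatives joined with OR, a term excluded with a leading minus sign, and a language filter. As a rough illustration, a query like this could be assembled from word lists instead of being typed by hand. This is only a sketch: `build_query` and its parameter names are hypothetical helpers written for this example, not part of Twitter or of the Twitter Archiver tool, and the word lists are shortened versions of the ones used above.

```python
def build_query(required, any_of=(), exclude=(), lang=None):
    """Compose a Twitter-style search string from keyword lists.

    Hypothetical helper for illustration only; it just joins strings
    using the search-box operators seen in this post (OR, -term, lang:xx).
    """
    parts = []
    for term in required:
        # Wrap multi-word phrases in quotes so they are searched as a phrase.
        parts.append(f'"{term}"' if " " in term else term)
    if any_of:
        parts.append(" OR ".join(any_of))
    # A leading minus sign excludes tweets containing that term.
    parts.extend(f"-{term}" for term in exclude)
    if lang:
        parts.append(f"lang:{lang}")
    return " ".join(parts)


query = build_query(
    required=["homeless"],
    any_of=["social", "youth", "privacy", "people", "alone"],
    exclude=["LGBT"],
    lang="en",
)
print(query)  # homeless social OR youth OR privacy OR people OR alone -LGBT lang:en
```

Building the string from lists makes it easier to add or drop one keyword at a time between searches, and it keeps the exclusion and language operators spelled consistently.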

Finally, drawing upon the exercise in class, we focused on the word hobo and its connection with homelessness. To explore this further I added the word hobo to my search. I wanted to investigate what hobo means and the assumptions and different views the public holds. To start off I looked up the definition of ‘hobo’. It is defined as a homeless person; a tramp or vagrant. When narrowing down my search I kept this meaning in mind and compared tweets.
Most tweets referred to the physical appearance of homeless people; others made fun of them, showing a lack of empathy for the homeless community.


Fashion brands, on the other hand, used the word as the name of a fashion item or to describe clothing that imitates the garments of a homeless person. This in a way mocks the homeless population, is misleading, and gives the word a new meaning that society sees fit.


design proposition

In the next couple of weeks I hope not only to raise awareness about homelessness but also to explore the desensitisation of society’s perspective on homelessness. I will be creating a service design that enables members of the public and the homeless community to interact with each other, in order to break down the barriers and assumptions of society.

summary points

  1. Twitter, together with Twitter Archiver, is a great online tool for gathering data and understanding how a wider audience perceives a certain topic.
  2. When researching data, sometimes the simpler the better. Specific phrases can be very limiting, and one must be open to exploring other options, which can lead to an improved result.
  3. People’s views can be interpreted in different ways, and most of them are based on assumptions rather than facts.
  4. Very few posts reflected people’s motivation to help the homeless community. Rather, it is all talk and no action. (I did not see any movements or protests for the homeless community.)
  5. People use the word hobo for their own benefit, not knowing the true meaning behind it and lacking empathy towards the homeless community.







Web-scraping technique: #Online privacy

Written by Jiahui Li (nancy)

In order to gain a broader understanding of how online privacy affects people’s lives, I explored Twitter using a web-scraping technique. As a social network, Twitter simply brings people closer to their interests, and it is still evolving with various options for its users. The network lets users create a profile, choose whom they want to follow, and post tweets that share their mood and insights on the platform, as well as build conversations with people around the world. All the tweets of the people you follow appear as a shuffled list on your main Twitter page. Businesses have found Twitter to be an effective means of communicating with their customers; the network connects them anytime, anywhere. However, it still places limits on messages, following and followers.

On the other hand, Twitter has built a unique function called “Twitter Advanced Search”, which allows users to tailor search results to specific date ranges, people and more. This makes it easier to find specific Tweets. It is used by people looking for specific topics and areas, so that they can easily focus on conversations about the same topic. Based on my research, people have started to push back against the use of their private information in order to protect it, and they believe this is the most expedient way. People therefore share representative images, their own experiences, articles and videos on Twitter, not only to express their positions but also to encourage more people to protect their own online privacy. Businesses also make up a large part of the Twitter Advanced Search results, offering to help people deal with their privacy issues. (See the image below)

Screen Shot 2016-09-03 at 9.52.22 PM
(Twitter Advanced Search with key words “commercial online privacy”)


Screen Shot 2016-09-03 at 4.48.39 PM
(Twitter Advanced Search with key words “commercial online privacy”)

To get started with web-scraping, I set the keywords to “web personal online privacy”. Most of these tweets revolve around suggestions and experiences of how to protect personal information online and avoid commercial websites. Twitter does not have many conversations that address the issue as specifically as my keywords did; most of the matching tweets were shared between 2009 and 2016. I also identified how hashtags and keywords trended over time: between 2009 and 2010, most people were starting to think about their online privacy and asking how they could protect their information from being used; after 2010, people described the issue and listed “how to control” it; and the most recent post offered “should you tape over your webcam? personal guide to online privacy”.
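
The way keywords shift from year to year can also be counted rather than read off by eye. The following is a minimal sketch, assuming the tweets have already been exported as (year, text) pairs. The sample tweets are invented placeholders that only echo the themes described above, and `keyword_counts_by_year` is a hypothetical helper name, not part of any Twitter tool.

```python
from collections import Counter, defaultdict

# Placeholder data: invented (year, text) pairs, not real tweets.
tweets = [
    (2009, "How can I protect my online privacy?"),
    (2010, "How do I stop sites using my personal information"),
    (2012, "How to control your online privacy settings"),
    (2016, "Should you tape over your webcam? A personal guide to online privacy"),
]

# Keywords to track; chosen here purely for illustration.
keywords = ["protect", "control", "webcam"]


def keyword_counts_by_year(tweets, keywords):
    """Count how many tweets per year mention each keyword (case-insensitive)."""
    counts = defaultdict(Counter)
    for year, text in tweets:
        lowered = text.lower()
        for keyword in keywords:
            if keyword in lowered:
                counts[year][keyword] += 1
    return counts


trend = keyword_counts_by_year(tweets, keywords)
for year in sorted(trend):
    print(year, dict(trend[year]))
```

Years in which none of the keywords appear are simply left out of the result, so a gap in the printed list is itself a sign that the conversation moved on to other words.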

Screen Shot 2016-09-03 at 10.34.33 PM
(Twitter Advanced Search, “web personal online privacy”, 2016)


Then I changed the keywords to “web use privacy”. In these tweets, personal opinions and positions are clearly far fewer than advertisements, which explain how people can deal with privacy issues. In other words, only a few tweets took up the position that “People Limit Web Use Due To Privacy Concerns” in America. Concerns about privacy and security are discouraging people from posting to social networks, expressing controversial opinions, conducting online banking and shopping from online retailers.

Screen Shot 2016-09-03 at 10.59.30 PM
(Twitter Advanced Search, “web use privacy”, 2016)


Screen Shot 2016-09-03 at 11.18.11 PM
(Twitter Advanced Search, “web use privacy”, 2016)

It is also interesting that a video shared on Twitter reveals online privacy secrets that some big companies do not tell you. The video uses creepy, intense music, and its text creates a serious atmosphere, engaging viewers and warning them to protect their online privacy.

(Online Privacy Secrets EXPOSED Commercial 2010)

In the end, the positions and insights of different people demonstrated the wealth of suggestions and experience available through social media. They can help us gain a deeper understanding of the seriousness of the issues, as well as provide more viable solutions. For future exploration and my design project, I would like to use this data-scraping technique to generate a range of privacy data flows and visualise them to express the seriousness of online privacy.

5 point summary:

  • Twitter Advanced Search helps people easily gather information and research on social media.
  • People have stood up against companies using their personal information without their consent.
  • More effective solutions and suggestions surrounding online privacy can be found with the web-scraping technique.
  • Concerns about privacy and security are discouraging people from posting to social networks.
  • A commercial website needs to post a privacy policy if it collects personally identifiable information.


Reference

Online Privacy Secrets EXPOSED Commercial – What Google Isn’t Telling Us 2010, video recording, YouTube, viewed 3 September 2016, <>.










POST 6: Scraping the web for data

Twitter is a social media platform that allows users to send and receive Tweets. Tweets are short messages of up to 140 characters that can also contain images, videos and links. Tweets are limited to 140 characters so they can be sent via SMS. This ensures that users can stay engaged with the service, even if they do not have access to the internet (Twitter 2016). Although Twitter may function similarly to an instant messaging client, it is far from it. Unlike a messaging application where the messages become unavailable when the program is closed, communications sent through Twitter are permanently archived (Smith 2012). Moreover, unlike other social media platforms such as Facebook and LinkedIn, every profile on Twitter is set to public by default. This means that users can view the activity of almost anyone on the site (Twitter 2016). This makes Twitter a great tool to engage in a conversation with people from around the world. Twitter helps facilitate this dialogue through the inclusion of hashtags: metadata keyword labels that allow users to filter tweets by a specific theme (Smith 2012). This functionality is crucial to Twitter’s success and makes it easy for disparate groups of people to come together to discuss issues in an open and approachable way.

Post 6, image 2-02
Open ended queries like this returned far better results than narrow searches

People are generally unaware of how much information they contribute to digital systems. This was something I identified from my primary research, and was able to explore further through data scraping techniques. Using Twitter’s advanced search I began by investigating how people responded to targeted advertisements, such as those found on Facebook and Google. In this instance the overwhelming majority of Twitter users expressed concern over how these companies were able to connect seemingly unrelated events to serve them highly targeted advertisements. This not only helped identify some of the ways in which companies can collect data, but also identified an important cognitive bias: users are unlikely to understand or question the ramifications of granting access to their personal data until they see how powerfully it can be harnessed. With that being said, a small number of users also chalked up uncannily specific advertisements to coincidence. This indicates that these individuals don’t believe companies have the power or authority to collect and analyse data on such a scale. In addition to text analysis, I was also able to discover that the majority of people engaged in this debate were from the US, which is not surprising given their widespread access to telecommunications technology.

Post 6, image 1
A sample of the results returned from my Twitter advanced search

To develop this research visually, it would be interesting to look at how connecting data points can reveal new information about an individual. As stated above, this is something that is generally not well understood, and it would be interesting, albeit challenging, to explore visually. One way this could be achieved is through overlaying a person’s electronic footprint on a map. Examples of information that could be used to paint a picture of this person’s day include EFTPOS purchases, Opal card activity and the use of a student ID card and its associated electronic login. As you can see, with just a few data points it would be very easy to begin to piece together a very detailed picture of this person and their activity over the course of the day. Alternatively, it might also be interesting to look at the variety of different ways your phone or laptop could be spying on you. Examples of this include GPS signals to track your location, cookies to track your activity online and accelerometer data to track your movement. While this is not as strongly connected to the idea of unwittingly contributing data to digital systems, it does relate to my earlier secondary research on WikiLeaks and the NSA.
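
To show how few data points are needed, the electronic footprint described above can be sketched as a simple merge of records from separate systems into one chronological timeline. This is only an illustration under stated assumptions: every record below is invented, and the `Event` class and `build_timeline` function are hypothetical names created for this example rather than any real EFTPOS, Opal or university data format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    time: datetime  # when the record was created
    source: str     # which system recorded it
    place: str      # what the record reveals about location or activity


# Invented sample records, one list per data source.
eftpos = [Event(datetime(2016, 9, 3, 8, 15), "EFTPOS", "cafe near the station")]
opal = [
    Event(datetime(2016, 9, 3, 7, 50), "Opal card", "tap on at home station"),
    Event(datetime(2016, 9, 3, 17, 40), "Opal card", "tap on at campus station"),
]
student_id = [Event(datetime(2016, 9, 3, 9, 5), "Student ID", "library login")]


def build_timeline(*sources):
    """Merge events from every source and order them by time."""
    events = [event for source in sources for event in source]
    return sorted(events, key=lambda event: event.time)


timeline = build_timeline(eftpos, opal, student_id)
for event in timeline:
    print(event.time.strftime("%H:%M"), event.source, "-", event.place)
```

Each record means little on its own, but once sorted together the four entries already outline a commute, a purchase and a study session, which is the kind of picture a map overlay would make visible.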

  • People are growing increasingly suspicious of how much data is being collected about them without their consent.
  • The commercial value of data profiling is driving the development of more invasive collection techniques.
  • Most users do not read the terms of service outlining data collection policies when registering for online services.
  • Targeted advertising is becoming increasingly accurate as technology permeates more and more aspects of our lives.
  • Online privacy is very much a western issue at the moment, although it will become more relevant as developing countries become more connected.

Reference list

Smith, B. 2012, The beginner’s guide to Twitter, Mashable, viewed 3 September 2016, <>.

Twitter 2016, Getting started with Twitter, viewed 3 September 2016, <>.