{post 6} the scraping of data.

data scraping. analysis. judith tan.

background texture (MFT 2013)

‘Data scraping’ is a new term for me. Before this, I was not aware that the word ‘scrape’ could mean more than its physical, tangible sense, e.g. to scrape paint off a surface.

The next phase of research was to scrape data from the web. I chose Facebook as my research platform, as I wanted to gather data on the effects of different methods of communication and media, including text, images and links to external sites. My focus was on identifying how the public views the issue of homelessness. I also wanted to identify which methods are most effective in engaging an audience.

As I sought to collect not only technical specifications but also more human, subjective data such as opinions and emotions, I decided to scrape manually rather than use a computer-programmed method. I made a list of the specific information I wanted to gather, and scraped a succession of public posts containing the word “homeless” on the evening of 7 September 2016.

The list of questions I asked of each post is as follows:

  • nature of post: is it shared, or is it an original post?
  • date: what day was it posted?
  • time: approximately how long had it been online at the time of the scrape?
  • shared by: is it shared by an individual or by a group?
  • location: country the post came from
  • summary: headline/summary of post
  • type of content: what does the post contain? (text counted only if written by the author of the post)
  • if text, amount: if it contains text, how much has the author of the post written?
  • user engagement: how many likes/other reactions/comments/shares has it attracted?
  • content of post: is it informative, does it advocate for a cause, does it appeal to the audience, does it show the undertaking of a project/service?
  • perspectives communicated: what is the perspective of the author/shared post?
  • emotions expressed: what emotions are expressed in the author/shared post?

I also recorded the URL of each Facebook post, in case I needed to come back to it later for further data gathering.

Table of the first scrape (Tan 2016)

After the first scrape, I realised the comparison was not fair: some posts had been online for only a few minutes, while others had been online for an hour, so the former had had little opportunity to garner audience engagement. Also, the privacy settings of some authors prevented audience engagement from being displayed.

So I decided to scrape again, this time gathering data from a succession of posts that had been online for 3-6 hours. This allowed each post ample time to garner a reasonable number of reactions, comments and shares. I excluded posts that did not display audience engagement, as the purpose of my data gathering was to examine that engagement. My second scrape was done early in the morning of 15 September 2016.
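The selection rule for the second scrape can be expressed as a simple filter. The sketch below is only an illustration, assuming each manually scraped post is recorded with its approximate hours online and whether its engagement was visible; the `ScrapedPost` type, its field names, the `eligible` function, the inclusive 3 and 6 hour boundaries and the sample posts are all hypothetical, not data from either scrape.

```python
from dataclasses import dataclass


@dataclass
class ScrapedPost:
    """One row of a manual scrape (hypothetical field names)."""
    url: str
    hours_online: float        # approximate time online when scraped
    engagement_visible: bool   # False if privacy settings hide reactions
    reactions: int = 0
    comments: int = 0
    shares: int = 0

    @property
    def engagement(self) -> int:
        # Total engagement count: reactions + comments + shares.
        return self.reactions + self.comments + self.shares


def eligible(posts, min_hours=3.0, max_hours=6.0):
    """Keep posts whose engagement is displayed and that have been
    online for min_hours to max_hours (boundaries inclusive, an assumption)."""
    return [
        p for p in posts
        if p.engagement_visible and min_hours <= p.hours_online <= max_hours
    ]


# Hypothetical examples, not real scrape data.
sample = [
    ScrapedPost("post-a", 0.2, True, reactions=1),      # too new
    ScrapedPost("post-b", 4.0, True, reactions=12, comments=3, shares=2),
    ScrapedPost("post-c", 5.5, False),                  # engagement hidden
    ScrapedPost("post-d", 7.0, True, reactions=40),     # too old
]

kept = eligible(sample)
print([p.url for p in kept])
```

Only `post-b` passes both conditions here. Fixing a time window, rather than comparing posts of very different ages, is what makes raw engagement counts roughly comparable.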

Table of the second scrape (Tan 2016)

The findings from this second scrape are illustrated in the graphs below.


Graph 1. Most posts were from the US, and none were from Australia. This was true for both scrapes, even though they were conducted at different times of day (first scrape: evening; second scrape: morning, AEST).


Graph 2.1. A link with some text is the most common form of Facebook post about homelessness. This is most likely because it is quick and convenient. It also does not require authors to express their own voice strongly: they can simply share someone else’s opinion, or use it to support their own.


Graph 2.2. Original posts are more effective, especially when paired with a photo. Although text + link is the most common type of post, the engagement it garners shows it is the least effective. However, text + link is the only type of post that received shares, so it is effective for widespread distribution.


Graph 3. The findings on the amount of text in posts and the resulting engagement were interesting. I expected posts with few words to garner more engagement and posts with many words to garner little. In fact, engagement varies widely and is inconsistent within each text-amount category. For example, posts with 20+ words range from 8 to 270 engagements.

This suggests that the length of a post does not, on its own, determine engagement. Rather, engagement depends on the audience and the content: people will take time to read what interests them.

Note that the averages shown in the data are not reliable, as I did not scrape enough posts to have a good sample in each text-amount category.
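One way to make this caveat visible is to report, next to each average, how many posts it is based on and the range of values. The sketch below is a minimal illustration using Python's standard `statistics` and `collections` modules; the `summarise` function, the category labels and the engagement values are hypothetical placeholders, not the scraped data.

```python
from collections import defaultdict
from statistics import mean


def summarise(rows):
    """Group (category, engagement) pairs and report n, min, max and mean."""
    groups = defaultdict(list)
    for category, engagement in rows:
        groups[category].append(engagement)
    return {
        category: {
            "n": len(values),
            "min": min(values),
            "max": max(values),
            "mean": round(mean(values), 1),
        }
        for category, values in groups.items()
    }


# Hypothetical (text amount, engagement) pairs, not real scrape data.
rows = [
    ("0-9 words", 15), ("0-9 words", 40),
    ("10-19 words", 22),
    ("20+ words", 10), ("20+ words", 150), ("20+ words", 35),
]

for category, stats in summarise(rows).items():
    print(category, stats)
```

A category with `n` of 1 has a "mean" that is really a single post, and a wide gap between `min` and `max` shows that the mean hides a lot of variation, which is exactly the situation described above.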



Graphs 4.1 & 4.2. The types of content posted most often broadly match the types that garner the most engagement. The three most commonly posted types are information, advocacy and appeal; the three that receive the most engagement are appeal, information and goodwill.



Graphs 5.1 & 5.2. Posts by homeless people expressing relief, boldness and/or need seem to attract the most engagement. Next come posts by charities or homeless people expressing gratitude for others’ help. After that, the most popular posts are those by members of the public expressing frustration with government systems.

A sad find during the second scrape revealed something about people’s priorities. One of the posts returned by the search “homeless” was not about people, but about cats. When I came upon it, it had been up for only 37 minutes, but had already garnered 20 likes, 8 loves and 4 comments. This was in stark contrast to other posts, which took much longer to garner this much engagement, if they did at all.

A sad find during the second scrape (Tan 2016)
(Facebook 2016) This post of homeless kittens garnered far more engagement than the other posts, both in total count and in how quickly it accumulated.
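The contrast is easier to see as a rate. The sketch below divides the kitten post's engagement (20 likes and 8 loves as reactions, plus 4 comments) by its 37 minutes online. The `engagement_per_hour` function is hypothetical, and shares are taken as 0 because none were recorded for this post, which is an assumption.

```python
def engagement_per_hour(reactions: int, comments: int, shares: int,
                        minutes_online: float) -> float:
    """Total engagement divided by time online, in engagements per hour."""
    total = reactions + comments + shares
    return total / (minutes_online / 60)


# Kitten post: 20 likes + 8 loves, 4 comments, shares assumed to be 0,
# 37 minutes online when found.
cat_rate = engagement_per_hour(reactions=20 + 8, comments=4, shares=0,
                               minutes_online=37)
print(f"{cat_rate:.1f} engagements per hour")
```

That works out to roughly 52 engagements per hour. If that pace held, the post would pass about 156 engagements within 3 hours, the lower bound of the second scrape's window; real posts rarely sustain a constant rate, so this is only a rough comparison.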

{findings in 5 points}

  1. Australians do not seem to care as much about the issue, perhaps because they are desensitised or less vocal
  2. The voices of individuals have more weight
  3. People will take time to read what interests them
  4. Photos, appeals, information about the issue and feel-good stories are what appeal most to the public
  5. The closer the author of the post/information is connected with the issue, the more engagement is garnered


The data collected does not reflect the Australian public, as none of the posts were from people living in Australia. So while the findings give a good general idea, they may not help target the intended demographic precisely. With this in mind, point 5 of the summary above could potentially reflect the Australian cultural value of being real and down to earth.


{title image}

My Free Textures. 2013, White wood with peeling paint, My Free Textures, viewed 28 September 2016, <http://www.myfreetextures.com/wp-content/uploads/2012/05/2011-06-11-09339-494x296.jpg>.


Tan, J. 2016, Pie and bar graphs.

{images under graphs}

1. Pears, W. 2013, Road, Vagabondbond, viewed 28 September 2016, <http://www.vagabondbond.com/wp-content/uploads/2012/03/IMG_1926.jpg>.

2.1, 2.2. Unsplash. 2016, Cellphone, Pixabay, viewed 28 September 2016, <https://pixabay.com/static/uploads/photo/2015/12/08/00/59/cellphone-1082246_960_720.jpg>.

3. 123HDWallpapers. 2015, Book pages bokeh, 123HDWallpapers, viewed 28 September 2016, <http://imgview.info/download/20150630/photography-book-pages-bokeh-1920x1080.jpg>.

4.1, 4.2. BossFight. 2015, Mac, BossFight, viewed 28 September 2016, <http://bossfight.co/wp-content/uploads/2015/04/boss-fight-stock-images-photos-free-photography-closeup-computer-keyboard.jpg>.

5.1, 5.2. Shutterstock. 2016, Silhouette, Shutterstock, viewed 28 September 2016, <http://il5.picdn.net/shutterstock/videos/7874599/thumb/4.jpg>.

{final image}

Anonymous. 2016, Homeless cats, Facebook, viewed 28 September 2016, <https://www.facebook.com>. (direct link withheld to protect the author’s identity)