Scraping the web for data [POST 6]

To better understand the complexity surrounding big data and online privacy, I carried out a research task using Google as a data collection tool. The aims were to find out how many online articles over the last 12 months discussed the subject ‘online privacy’, who published them, and whether the articles’ publication dates coincided with any major event related to big data.

On Google’s Advanced Search page, I set my search criteria so that any online article published over the last year containing the words ‘online privacy’ would show up. I also filtered the search to display only English-language results from Australian websites.

[Screenshot: Google Advanced Search settings, taken 6 September 2016]

The list below shows the search results.

[Screenshot: search results, taken 6 September 2016]

From that data, I created two graphs: one showing the number of articles published each month (graph 1) and one showing the websites that published most frequently (graph 2).

[Screenshot, taken 6 September 2016: graph 1 (articles published per month) and graph 2 (websites that published most frequently)]
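Building the two graphs comes down to counting the articles by month and by website. The Python sketch below shows one way to produce those tallies, assuming the date and website of each search result have been copied by hand into a list. The records are hypothetical placeholders, not my actual search results, and the names `count_by_month` and `count_by_site` are illustrative only.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (publication date, website) pairs copied by hand
# from a search results page. These are placeholders, not real results.
articles = [
    (date(2015, 10, 13), "example-news.com.au"),
    (date(2015, 10, 28), "example-blog.com.au"),
    (date(2016, 3, 2), "example-news.com.au"),
]


def count_by_month(records):
    """Graph 1: number of articles per (year, month)."""
    return Counter((d.year, d.month) for d, _ in records)


def count_by_site(records):
    """Graph 2: number of articles per website."""
    return Counter(site for _, site in records)


if __name__ == "__main__":
    # Print monthly totals in chronological order.
    for (year, month), n in sorted(count_by_month(articles).items()):
        print(f"{year}-{month:02d}: {n}")
    # Print websites from most to least frequent publisher.
    for site, n in count_by_site(articles).most_common():
        print(f"{site}: {n}")
```

The counts could then be pasted into any spreadsheet or charting tool to draw the bar charts.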


  1. Of the 93 articles shown in my search results, 92 raised concerns about the collection of data or took a position against it.
  2. A similar number of online articles was published in the last four months of 2015 as in the first eight months of 2016.
  3. The start of the data retention laws last October influenced the number of articles written.
  4. According to Google, The Conversation was the website that published most often about online privacy.
  5. Once again, the results of my search make me concerned about what information is available to us when using Google. It is hard to believe that fewer than 100 online articles were written in a year in which so much happened in Australia in relation to big data and online privacy. As mentioned in previous blog posts, Google is certainly one of the companies with a strong interest in collecting data on the population.