Blog Post 6: Scraping the web for data

During week four, I scraped data to collect specific information from people of different ages and areas, so that I could build a more comprehensive observation and analysis of the problem. Web scraping can significantly shorten the time it takes to find data, which makes it a useful tool for extracting data from the web.

At the beginning, I used Twitter to Excel to collect research information. Twitter to Excel is a simple online research tool, and the great thing is that it does not require a Google account, so I can keep using it when I go back to my hometown. I searched for the keyword "online privacy" and got over 500 results.



Many of the results discuss how to protect online data, and some of them connect the topic with "Age Verification for Online Pornography and Privacy". I had never thought about the age aspect before. When I researched this part, I found that young people pay more attention to privacy, while they are also a high-risk group.



Then I switched to the Twitter Advanced Search tool, because it allows users to limit search results to specific date ranges, which can give more precise information. It also makes it easier to focus on people's conversations about the same topic. I also excluded the word "VPN" from the results.
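The same filters (an exact phrase, an excluded word and a date range) can also be typed straight into the search box as a query. The sketch below assumes Twitter's search operators `since:` and `until:` (dates written as YYYY-MM-DD) and a leading `-` to exclude a word; the helper names `build_search_query` and `build_search_url` and the 2013-2015 date range are hypothetical, chosen only for illustration, since this post does not record the exact ranges I used.

```python
from urllib.parse import urlencode


def build_search_query(phrase: str, exclude: list[str], since: str, until: str) -> str:
    """Combine an exact phrase, excluded words and a date range into one query string."""
    parts = [f'"{phrase}"']                      # quotes ask for the exact phrase
    parts += [f"-{word}" for word in exclude]    # a leading minus excludes a word
    parts.append(f"since:{since}")               # start date (YYYY-MM-DD)
    parts.append(f"until:{until}")               # end date (YYYY-MM-DD)
    return " ".join(parts)


def build_search_url(query: str) -> str:
    """Encode the query into a twitter.com search URL (assumed URL layout)."""
    return "https://twitter.com/search?" + urlencode({"q": query, "src": "typed_query"})


if __name__ == "__main__":
    q = build_search_query("online privacy", ["VPN"], "2013-01-01", "2015-12-31")
    print(q)                    # "online privacy" -VPN since:2013-01-01 until:2015-12-31
    print(build_search_url(q))
```

Writing the query out this way makes it easy to repeat the same search later, or to swap in the second keyword, "data leak", without clicking through the form again.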



(Twitter Advanced Search with key words “online privacy”)


(Twitter Advanced Search with key words “data leak”)

I ran two searches with two keywords, "online privacy" and "data leak". I found many articles about data leaks posted between 2013 and 2015. In 2015, the number of people worried about data leaking over Wi-Fi rose significantly, and many people pointed out that they faced the same problem. At the same time, some people did not realize the problem until they read about it in the news.

On the other hand, many articles about how to deal with online privacy have been published since 2010, and the largest number were published between 2014 and 2015.

This once again shows how much concern the issue attracts. I also found that the authors of these articles were between 20 and 40 years old, which suggests that this problem mostly affects young people.
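Finding the busiest years comes down to tallying the dates of the collected results. This is a minimal sketch, assuming the dates have already been exported (for example from the Twitter to Excel spreadsheet) as `YYYY-MM-DD` strings; the function names and the sample dates are hypothetical and are not the real results from my searches.

```python
from collections import Counter
from datetime import date


def count_by_year(dates: list[str]) -> Counter:
    """Tally how many results fall in each year, given ISO dates (YYYY-MM-DD)."""
    return Counter(date.fromisoformat(d).year for d in dates)


def busiest_years(counts: Counter, n: int = 2) -> list[int]:
    """Return the n years with the most results, most frequent first."""
    return [year for year, _ in counts.most_common(n)]


if __name__ == "__main__":
    # Hypothetical sample dates, NOT the actual search results from this post.
    sample = ["2010-05-01", "2013-02-11", "2014-03-02", "2014-07-19",
              "2015-01-08", "2015-06-30", "2015-11-12", "2016-04-22"]
    counts = count_by_year(sample)
    print(sorted(counts.items()))   # [(2010, 1), (2013, 1), (2014, 2), (2015, 3), (2016, 1)]
    print(busiest_years(counts))    # [2015, 2014]
```

Counting by year like this turns "lots of articles" into numbers that can be compared directly, which makes a claim such as "the largest number was published between 2014 and 2015" easier to check.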