Blog 6: Scraping the web for data

To understand how climate change and people influence each other, we need more than the findings of academic articles and the reports of professional associations. We also need the statements and opinions of ordinary people.

In the week 5 class, we were taught how to collect data from social media. One example is Twitter, an online social networking service that lets users send and read short 140-character messages called “tweets”. Registered users can read and post tweets, while unregistered users can only read them. Users access Twitter through the website, SMS or a mobile app.

Firstly, Twitter is a form of micro-blogging. It is easy to post a quick tweet telling the world what you are doing, how good your morning coffee tastes or how badly your lunch went. Secondly, you can ask all sorts of questions on Twitter, and the more friends you have, the more detailed the answers you will receive, even for a small question such as what people think of a particular brand of baby food. Thirdly, you can keep up with the news and become involved in politics. On Twitter, you can give your own opinion on any question and discuss it with other people.


In class, we learned how to use Twitter Archiver to collect data. You create a search rule, and the tool finds tweets that match the keywords you set. I used “climate change” and “water scarcity” as keywords, and the search returned around ten thousand tweets. The Excel table shows the date, who posted the tweet, the author’s name, the tweet text, the app the author used to post it, and even how many followers and follows the author has. What I found most useful is that the table tells me whether a tweet is verified or not. This reminds me that statements in unverified tweets can serve as a reference but not as evidence.
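
To make the verified check concrete, here is a minimal Python sketch of how the exported table could be split into verified and unverified tweets once it is saved as CSV. The column names ("Screen Name", "Tweet Text", "Verified") and the sample rows are my own assumptions for illustration; the real export may label its columns differently.

```python
import csv
import io

# Hypothetical sample of the exported sheet saved as CSV.
# The column names and the rows are assumptions, not the real export.
SAMPLE_CSV = """Screen Name,Tweet Text,Verified
water_org,Water scarcity affects billions of people,Yes
some_user,I think climate change is exaggerated,No
climate_lab,New report on drought and climate change,Yes
"""


def split_by_verified(csv_text):
    """Return (verified_rows, unverified_rows) parsed from CSV text."""
    verified, unverified = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Treat "Yes" or "True" (any letter case) as verified.
        if row["Verified"].strip().lower() in {"yes", "true"}:
            verified.append(row)
        else:
            unverified.append(row)
    return verified, unverified


verified, unverified = split_by_verified(SAMPLE_CSV)
print(len(verified), "verified,", len(unverified), "unverified")
```

With a real export, the sample text would be replaced by the contents of the downloaded file.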


In the Excel table, you can also see how many people retweeted a tweet and how many marked it as a favorite. This information can help you decide which tweets are more useful to you.
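
As a rough illustration of using those counts, the sketch below ranks tweets by retweets plus favorites. The column names ("Tweet Text", "Retweets", "Favorites") and the sample rows are assumptions, and adding the two counts together is only one simple way to score a tweet.

```python
import csv
import io

# Hypothetical sample rows; column names are assumptions.
SAMPLE_CSV = """Tweet Text,Retweets,Favorites
Drought warning issued for the region,120,45
My coffee is great today,0,2
Water scarcity explained in one chart,80,150
"""


def top_tweets(csv_text, n=2):
    """Return the n rows with the highest retweets + favorites total."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(
        key=lambda r: int(r["Retweets"]) + int(r["Favorites"]),
        reverse=True,
    )
    return rows[:n]


for row in top_tweets(SAMPLE_CSV):
    print(row["Tweet Text"], row["Retweets"], row["Favorites"])
```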


There is another tool for collecting data from Twitter, which I find easier and clearer to use. The tool is called Datapipeline. One advantage over Twitter Archiver is that Datapipeline can run several searches for different keywords, while Twitter Archiver lets you create only one search rule; if you want more, you need to pay.


This tool gives more detail: you can see the top retweets, top favorites, top hashtags, top mentions, top URLs and top users. If you want to see more, you can download the Excel table, which displays more information. One thing I like about this Excel table is that it has a column showing whether each tweet is a retweet, true or false. This can give people a sense of the topic.
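
I do not know how Datapipeline computes its summaries, but a "top hashtags" list combined with the retweet column could be approximated along these lines. The field names ("text", "is_retweet") and the sample tweets are hypothetical.

```python
import re
from collections import Counter

# Hypothetical tweets; the field names are assumptions for illustration.
TWEETS = [
    {"text": "Save water! #climatechange #water", "is_retweet": False},
    {"text": "RT @someone: Save water! #climatechange #water", "is_retweet": True},
    {"text": "Drought is getting worse #ClimateChange", "is_retweet": False},
]


def top_hashtags(tweets, n=3, include_retweets=False):
    """Count hashtags (case-insensitive), optionally skipping retweets."""
    counts = Counter()
    for tweet in tweets:
        if tweet["is_retweet"] and not include_retweets:
            continue  # skip retweets so repeated text is not counted twice
        tags = re.findall(r"#(\w+)", tweet["text"])
        counts.update(tag.lower() for tag in tags)
    return counts.most_common(n)


print(top_hashtags(TWEETS))
print(top_hashtags(TWEETS, include_retweets=True))
```

Skipping retweets shows which hashtags people wrote themselves, while including them shows which hashtags spread the most.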


  • Some people lack awareness of saving water, especially during holidays. People have more free time during holidays, so those who have bought expensive diesel SUVs with foreign funds need water to wash their SUVs every day, and some secular people who care so much for animals need a lot of water to wash animals during secular festivals (Guru 2016).
  • Social media is a quick way to find the information you are looking for.
  • With the help of social media, we can find more opinions from the different people who use services such as Facebook and Twitter, including students. We can find different angles on the same topic and discuss them with others over the Internet.
  • One problem with doing research on social media is that you cannot know whether the information is right or wrong; some of it is just people’s guesses.
  • One of the main advantages of Twitter is its reach. It boasts 200m active users (the other 300m are lurkers), and provides a platform for more than 400m tweets every day. I’d be the first to admit that 399m of those are drivel, but the remainder comprises a rich source of information to the hungry researcher. It’s also now a valid academic source, with most major referencing styles including a format for tweets (Catherine 2014).



