POST 6: Politically Scraping the Web

James Meland-Proctor

Alongside learning how to collect data ethnographically, social media can play a very important role in finding the same kind of data you might find in real life, only more quickly and in greater variety. Twitter is a platform on which people can discuss topics publicly, voice their opinions, talk to high-profile public figures and essentially shape a conversation digitally. As on most social media platforms, a user on the site will use hashtags to flag or demarcate a topic of interest for ease of search, and to have their say on a given issue. While hashtags feature ubiquitously on a lot of social media platforms, Twitter originated the use of hashtags purely for the purpose of categorization (Scott 2015). The platform is centred on expressing opinions and having discussions about topical subjects. And while it is primarily human-centred, some users create bots to carry out functions in place of human intervention. These bots can be programmed to do tasks like reply to, retweet or direct message a user who has tweeted about a topic the bot was programmed to find.

Screen Shot 2016-09-05 at 3.25.42 PM.png

It was with these bots that we undertook a web scraping exercise to find data on an issue. In such an exercise, refinement is key, simply because of the immensity of information available. Narrowing down your search terms, related topics, hashtag usage and even the location of the post is vital. This is a real case of trial and error: multiple attempts to refine the search are necessary to get the best results. In my case, I got a lot of spammy or unrelated results when I was not careful with my search queries. This is the process I undertook to pin down what I sought.

Screen Shot 2016-09-05 at 7.24.45 PM.png

In my case, I wanted to look at social inequalities, and to ease my search I focused on the US. I started with search terms like ‘american’ and ‘politics’, which admittedly are very broad. My search terms were then refined from ‘terrorism’, ‘trump’ and ‘obama’ to ‘wall’, ‘#makeamericagreatagain’, ‘white’ and ‘genocide’. I note this as important because I was homing in on the language I knew would be used in certain contexts. Trump is a man with a lengthy presence in the media, and, like him, issues are topical and evolve over time. Homing in on polemical words helps to hook into an issue happening currently, like the Black Lives Matter movement or Americans favouring Trump in power.
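To make the refinement step concrete, here is a minimal sketch in Python of filtering posts by search terms. It is not the bot we used in the exercise; the function names `matches_all` and `refine` are my own, and the sample posts are made-up placeholders rather than scraped data.

```python
import re

def matches_all(text, terms):
    """Return True if every term appears in the text as a whole word or hashtag."""
    # Split the lowercased post into words and hashtags, e.g. "wall", "#politics".
    tokens = set(re.findall(r"#?\w+", text.lower()))
    return all(term.lower() in tokens for term in terms)

def refine(posts, terms):
    """Keep only the posts that contain all of the given search terms."""
    return [post for post in posts if matches_all(post, terms)]

# Made-up placeholder posts, for illustration only (not real scraped tweets).
sample_posts = [
    "American politics is exhausting today",
    "Build the wall #MakeAmericaGreatAgain",
    "Politics fans: win a free phone, click here",
    "The wall debate dominates American politics",
]

# A broad term lets unrelated or spammy posts through.
broad = refine(sample_posts, ["politics"])

# A narrower combination of terms pins down the topic.
narrow = refine(sample_posts, ["wall", "#makeamericagreatagain"])

print(len(broad), "posts match the broad search")
print(len(narrow), "post matches the refined search")
```

Requiring every term to match is only one possible design; matching any term, or weighting terms, would give different results and may suit other issues better.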


This is obviously a very dull format in which to present data, and the level of contention and other variables within each post are not easily registered. To visualize my findings, I would opt for a graph that maps the tone of the post, which and how many politicians are mentioned, the topics, and perhaps a location to underpin a meaning behind the posts. As shown below, this example does a good job of mapping these factors to show an aspect of American politics that would otherwise go unexamined.

(Visualizing Economic Inequality 2016)
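Before any graph can be drawn, the posts would need to be reduced to counts. The sketch below shows one way that tallying might look in Python, counting how often each politician and hashtag appears. The `POLITICIANS` list, the `tally` function and the sample posts are all hypothetical; tone and location are left out because they would need extra data or tools.

```python
import re
from collections import Counter

# Names taken from the search terms earlier in this post; extend as needed.
POLITICIANS = ["trump", "obama"]

def tally(posts):
    """Count the posts mentioning each politician and the uses of each hashtag."""
    names = Counter()
    hashtags = Counter()
    for post in posts:
        lowered = post.lower()
        words = set(re.findall(r"\w+", lowered))
        for name in POLITICIANS:
            if name in words:
                names[name] += 1  # counted once per post
        hashtags.update(re.findall(r"#\w+", lowered))
    return names, hashtags

# Made-up placeholder posts, for illustration only (not real scraped tweets).
sample_posts = [
    "Trump promises a wall #MakeAmericaGreatAgain",
    "Obama responds to Trump on immigration",
    "Rally tonight #MakeAmericaGreatAgain #politics",
]

names, hashtags = tally(sample_posts)
print(names.most_common())
print(hashtags.most_common())
```

Counts like these could then be passed to whatever charting tool is preferred.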


Scott, K. (2015). “The pragmatics of hashtags: Inference and conversational style on Twitter”. Journal of Pragmatics 81: 8–20. doi:10.1016/j.pragma.2015.03.015

Visualizing Economic Inequality 2016, YouTube, viewed 30 August 2016.