Post 6: Data Scraping

Louis Johanson

In 2006 Jack Dorsey co-founded Twitter, now one of the most widely used social media platforms. It allows users to post and receive messages of 140 characters or fewer, and it is used by global news sources, celebrities and those who simply want their opinion heard. The application fosters conversation and debate between users. The character limit forces users to deliver their tweets in a concise and succinct manner. The hashtag tool allows users to assign a tweet to a category, which makes the tweet easy to find for anyone searching around that topic.

Semantic satiation, also known as semantic saturation, is a phenomenon whereby the uninterrupted repetition of a word eventually leads to a sense that the word has lost its meaning. The same effect shows up across social media, on platforms such as Twitter and Facebook. All too often you see “I’m so OCD”, “the weather is so bipolar” and the overuse of “so depressed right now”. These phrases are usually used to describe mundane everyday situations, undermining the terms’ very real meaning and normalising the illnesses.

I recently collected a set of data through a web-scraping exercise. I used two methods to quickly categorise tweets. The first method used the Google Sheets add-on Twitter Archiver. The first word I entered was “OCD”.

After entering the word “OCD” I was given a spreadsheet of over 4,000 tweets from all over the world that used the term. The tweets consistently used the word in a light-hearted manner to describe a mundane activity or situation.
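A spreadsheet of this size could also be sifted with a short script rather than by eye. The sketch below is only an illustration, not the method used for this post: it assumes the sheet has been exported as CSV with the tweet text in a column called `text` (a hypothetical name), and it runs on a tiny made-up sample instead of the real data. The pattern for "flippant" usage is also an assumption and would miss plenty of real cases.

```python
import csv
import io
import re

# Assumed pattern: an intensifier followed by a clinical term, e.g. "so OCD".
FLIPPANT = re.compile(
    r"\b(?:so|such|very|really)\s+(ocd|bipolar|depressed)\b",
    re.IGNORECASE,
)

def flag_flippant(rows, text_column="text"):
    """Return (matched term, tweet text) pairs for tweets matching the pattern."""
    flagged = []
    for row in rows:
        text = row.get(text_column) or ""
        match = FLIPPANT.search(text)
        if match:
            flagged.append((match.group(1).lower(), text))
    return flagged

# Made-up sample standing in for the exported spreadsheet.
SAMPLE_CSV = """text
I'm so OCD about my desk today
OCD awareness week starts on Monday
The weather is so bipolar lately
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
flagged = flag_flippant(rows)
for term, text in flagged:
    print(term, "|", text)
```

A keyword pattern like this only narrows the pile. Whether a tweet is actually flippant still needs a human reader, since a phrase such as "so depressed" can also be meant sincerely.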

Next I looked at the word “bipolar”. Bipolar disorder, formerly known as manic depression, is a mental disorder marked by severe bouts of depression and periods of elevated mood. Trawling through the spreadsheet, I was constantly surprised by the thoughtless and flippant nature of the tweets. Again, the disconnect between language and meaning was all too obvious.