Post Six: Data Scraping

Data Scraping (Week 4)
Joy Li

What is Twitter for?

As one of the many Internet users who hadn’t used this service until this particular blog task, I feel it is an appropriate time to answer this longstanding question.

I thought I would skip the common-knowledge ‘information network made up of 140-characters’ part and instead explain it from my own understanding.

It’s the year 2015.

Let’s say that you’re a relatively successful rapper, born and bred in Philadelphia. You recently dropped your second studio album, Dreams Worth More Than Money. Fans dig it. It debuts at number one on the Billboard 200 chart and sells over 200,000 copies in its first week. Everything is going your way.

At the same time, you happen to be dating a famous musical artist. It turns out this artist (in addition to being the greatest female rapper of our time, and the pioneer of the sexualisation of reptiles) is caught up in a media sandstorm over an awards snub with a country-turned-pop artist, who later realised they had misinterpreted your girlfriend’s comments as a direct attack on their own ego.

Out of nowhere, you decide you want a slice of this beef. Or in this case, your own cow.

So where would you go?

Luckily, the question isn’t rhetorical. You decide to start your beef with your friend, a musical artist whom you are apparently sick of being compared to. And you are entitled to be! After all, this ‘friend’ didn’t do enough at the time to promote your latest album, and you know for a fact that he doesn’t write his own lyrics. You are empowered to present your side of the story and to let everyone know.

In the olden days, you may never have had the opportunity to confront the loss of your status, and quite possibly would have had no platform through which to voice your accusations. This is where a well-designed Internet product can be so satisfying…

and dangerous.

Its public nature and fertile atmosphere for breeding pettiness make it an acceptable forum for anyone to trade insults. But given the scale of the beef and the combined fame of everyone who got involved, it didn’t go down the way you hoped. Everyone got pissed (mostly at you).

As for the friend whose livelihood you questioned, he channelled his anger towards you into art rather than comebacks. He released two diss tracks, the second of which was nominated for a Grammy. Obviously, so as not to lose face, you followed up with your own diss track, and everyone dissed it. The Internet did its part by providing us with a lifetime supply of hilarious memes. All from a single tweet.

And that’s what Twitter is for.

Five-point summary:

  1. Who knows what you really mean
  2. It is known that people can lose respect for you
  3. Your enemy’s diss track of you will be nominated for a Grammy
  4. A lifetime of memes will be spawned from your own actions
  5. Use Twitter at your own discretion

Data Scraping off Twitter

On another (more serious) note, the Internet and its related technologies have undoubtedly reconfigured all aspects of life, and mental health has not been immune to these changes. While the digital revolution has rapidly improved people’s lives by facilitating communication and allowing instant access to information, those same qualities have also contributed to the undesirable effects of our interaction with digital technologies. At the same time, many of the subtleties and niceties of virtual interplay can be gleaned by observing the particular culture, habits, communities and language that form around the digital space.

Using the online social networking platform Twitter as a data bank of insights into individual lives, a data scraper was used to filter the vast number of tweets into discernible patterns and relevant details.


The data scraping process was a series of open-ended questions and refinements around different points of interest, with varying success. The process involves identifying a point of interest within mental health and associating a particular behaviour with it, which reveals the potential constants (search terms) that can be fed into a data scraper.

Data Scraping: Process Flowchart

Initially, I was interested in looking at the subtleties hidden within the ways users communicate online. For instance, how punctuation symbols, capitalisation, emoticons and acronyms shape how we interpret our digital words and give them tonal qualities.

This interest does not stem from our ability to carefully string characters together, from the understanding that an exclamation mark carries more weight than a colon in our 140-character existence, or from the way words suddenly feel hyperbolic because we’ve taken away all audible cues (totally?????!!!!!!). While all of that is true, it is more that these minute details, found in the strangest places, seem to reveal more about the person on the other end than the words themselves (at least I think they do).

In theory, it seemed like a viable data scraping option. However, in practice it became apparent that the thousands of possible variables and individual traits were impossible to sort through on a large scale; instead, they call for deciphering on a tweet-by-tweet basis. Moreover, without specifying the relevant context or the words that accompany each tweet, the data ultimately becomes meaningless.

This led me to investigate the other, more tangible extreme: more obvious displays of sentiment. The aim was to reveal simple reflections of individuals’ emotions shared within a public domain.

Attempt 1: Using ‘my emotions’ as the exact phrase and ‘today’ as a constant to retrieve all tweets containing those words. Retweets (RT) and @mentions were excluded, since the data required individuals to describe their own daily reflections without external intervention.
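The scraper I used isn’t named here, so as a rough illustration only: the Attempt 1 filter could be sketched in Python like this, assuming the tweets have already been collected into a list of dictionaries with a hypothetical `text` field. The sample tweets are made up, and the retweet check (text starting with “RT ”) is a simplifying assumption.

```python
import re

# Hypothetical sample data: the real scraper's output format is not described,
# so each tweet is assumed to be a dict with a "text" field.
tweets = [
    {"text": "My emotions are all over the place today"},
    {"text": "RT @friend: my emotions are everywhere today"},
    {"text": "@bestie my emotions got the best of me today"},
    {"text": "Feeling great today"},
]

MENTION = re.compile(r"@\w+")


def matches_attempt_1(text: str) -> bool:
    """Keep tweets with the exact phrase 'my emotions' plus the word 'today',
    excluding retweets and tweets that mention other users."""
    lowered = text.lower()
    if "my emotions" not in lowered:
        return False
    if not re.search(r"\btoday\b", lowered):
        return False
    if lowered.startswith("rt "):
        return False  # retweet: not the author's own reflection
    if MENTION.search(text):
        return False  # @mention: part of a conversation with someone else
    return True


kept = [t["text"] for t in tweets if matches_attempt_1(t["text"])]
print(kept)  # ['My emotions are all over the place today']
```

Only the first sample tweet survives: the second is a retweet, the third mentions another user and the fourth lacks the exact phrase.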

Results from this were varied. The recurrent themes throughout these tweets were of emotions being on a ‘rollercoaster’ (23 tweets), ‘everywhere’ (66 tweets), ‘all over the place’ (78 tweets) or ‘getting the best of me’ (14 tweets), or of individuals being ‘unable to control’ (33 tweets) their emotions. Some 43 other tweets were food-related. Without much context, the vast majority read as either pessimistic or wildly frantic. Those with context showed a contrastingly clear cause-and-effect rationale for their emotional state. Interestingly, judging by usernames and display names alone, females appeared to be the predominant gender among those who had expressed their thoughts in this data set. This insight hints at the longstanding stigma of unmanliness surrounding the expression of emotions, as well as the stereotyped querulousness of women.
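Tallies like the ones above could be produced by checking each tweet for the recurring phrases. This is a minimal sketch, not how the counts above were actually made; the sample tweets are invented, and note that one tweet can count towards more than one theme.

```python
from collections import Counter

# Recurring phrases observed in the Attempt 1 results.
THEMES = ["rollercoaster", "everywhere", "all over the place",
          "getting the best of me", "unable to control"]


def tally_themes(texts):
    """Count how many tweets contain each theme phrase (case-insensitive)."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for theme in THEMES:
            if theme in lowered:
                counts[theme] += 1
    return counts


# Invented sample tweets, for illustration only.
sample = [
    "my emotions are all over the place today",
    "my emotions are on a rollercoaster today",
    "my emotions are everywhere today, all over the place",
    "had pizza, my emotions today are fine",
]
counts = tally_themes(sample)
for theme in THEMES:
    print(f"{theme}: {counts[theme]}")
```

A plain substring check is crude (it would also count “everywhere” used in an unrelated sense), which is part of why the context around each tweet still mattered.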

Unaddressed by this data, however, are the types of emotions exhibited by individuals. To explore the specific emotions in more detail, the constants needed rethinking so that they pointed towards ‘emotions’ without explicitly using the term. Hence ‘feeling’ became the substitute word adopted and implemented into the scraper. This round returned considerably more tweets than the first attempt, but with less relevant results.

Attempt 2: Using ‘feeling like’ as the exact phrase, along with the constant ‘today’, to reveal daily insights into individuals’ moods.

Turns out, ‘feeling like’ is generally followed by nouns rather than adjectives. Hence the results were geared towards people’s everyday actions or their personal likeness to human excrement, rather than their inner psychological states.

After further refinement, ‘I am feeling … today’ proved to be the most successful of the three attempts.

Attempt 3: Using ‘I am feeling’ as the exact phrase personalises the tweet and sets it up to be followed by an adjective or descriptor.
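Because ‘I am feeling … today’ tends to sandwich a descriptor, the words in the gap can be pulled out with a regular expression. This is a minimal sketch under that assumption; the `extract_feeling` helper and the sample tweets are hypothetical, and the capture keeps intensifiers like “really” rather than trying to isolate a single adjective.

```python
import re

# Captures whatever sits between "I am feeling" and "today" (non-greedy),
# e.g. "I am feeling really hopeful today" -> "really hopeful".
FEELING = re.compile(r"\bI am feeling\s+(.+?)\s+today\b", re.IGNORECASE)


def extract_feeling(text):
    """Return the lower-cased descriptor, or None if the pattern is absent."""
    match = FEELING.search(text)
    return match.group(1).lower() if match else None


# Hypothetical sample tweets.
examples = [
    "I am feeling really hopeful today!",
    "i am feeling anxious today about exams",
    "I am feeling like a nap",
]
print([extract_feeling(t) for t in examples])
# ['really hopeful', 'anxious', None]
```

The last example returns `None` because it has no ‘today’, which is the same kind of off-target tweet that made ‘feeling like’ less useful.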

The results were a vast expression of the everyday joys, struggles, successes and anxieties that encapsulate the human experience. Interestingly, most tweets with this phrase were optimistic and showed a greater degree of sincerity, assertiveness and reflective thinking.

Ironically, over a quarter of the tweets were posted by a bot that tweets the phrase ‘I am feeling ____ today’ with a random adjective filling the blank. At times, the bot appears to have a better grasp of human emotions than a real human does.

Yet beyond this lies a recurring motif that runs deeper than the pleasantries between friends. Social media has become an outlet for individuals to express both their frustrations and their affirmations. One can see how the Internet has fostered a community of people who experience everyday emotional cycles, just like you and me, and who gather within the pseudo-chatroom experience that is Twitter.

Potential Design Response

Feelings of the World / A Moody World: a rough sketch of a potential design response that takes the emotional adjective and geolocation information of each tweet and aggregates the tweets by location to determine the predominant mood of each country and its cities.
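The aggregation step behind this idea could look something like the sketch below. It assumes each tweet has already been reduced to a (location, adjective) pair, and it relies on a tiny hypothetical lexicon (`MOOD_OF`) that groups adjectives into broad moods; the place names and pairs are invented sample data, not scraped results.

```python
from collections import Counter, defaultdict

# Hypothetical lexicon grouping adjectives into broad moods; a real design
# would need a much larger, carefully chosen mapping.
MOOD_OF = {
    "happy": "joy", "hopeful": "joy", "grateful": "joy",
    "anxious": "anxiety", "stressed": "anxiety",
    "sad": "sadness", "lonely": "sadness",
}


def predominant_moods(tweets):
    """tweets: iterable of (location, adjective) pairs. Returns the most
    common mood per location, skipping adjectives the lexicon lacks."""
    by_place = defaultdict(Counter)
    for place, adjective in tweets:
        mood = MOOD_OF.get(adjective.lower())
        if mood is not None:
            by_place[place][mood] += 1
    return {place: counts.most_common(1)[0][0]
            for place, counts in by_place.items()}


# Invented sample pairs, for illustration only.
sample = [
    ("Sydney", "happy"), ("Sydney", "hopeful"), ("Sydney", "stressed"),
    ("Melbourne", "anxious"), ("Melbourne", "stressed"), ("Melbourne", "grateful"),
    ("Perth", "tired"),
]
print(predominant_moods(sample))
# {'Sydney': 'joy', 'Melbourne': 'anxiety'}
```

Perth drops out because “tired” isn’t in the lexicon. The same function works at country level if the location key is the country instead of the city.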

