The application of machine learning, big data techniques, and criminology to the analysis of racist tweets.

PhD Thesis


Day, E. 2018. The application of machine learning, big data techniques, and criminology to the analysis of racist tweets. PhD Thesis. Canterbury Christ Church University, Faculty of Social and Applied Sciences.
Authors: Day, E.
Type: PhD Thesis
Qualification name: PhD
Abstract

Racist tweets are ubiquitous on Twitter. This thesis explores the creation of an automated system to identify racist tweets and tweeters, while also developing a theoretical understanding of the tweets. To do this, a mixed-methods approach was employed: machine learning was used to identify racist tweets and tweeters, and grounded theory and other qualitative techniques were used to gain an understanding of the tweets’ content.

A total of 84 million tweets, each containing racist words, were collected from Twitter. Of these, 84,000 were hand-annotated as racist or not.

The machine learning was performed in a Hadoop cluster, utilising Spark and Hive. To identify racist tweets, a systematic comparison was performed of seven different algorithms and a large number of textual, user-derived and geographical features. Two new features, time of day and day of week, were also evaluated. The 84,000 hand-annotated tweets were used as input to the supervised classification processes. The combination of support vector machines with hour of day as an additional feature was found to be optimal for both accuracy (0.93) and AUPRC (0.86).
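As a rough illustration of the time-derived features and the two reported metrics, the sketch below uses plain Python rather than the Spark pipeline described above, and is not the thesis's implementation. The timestamp layout, the one-hot encoding, the function names and the toy labels and scores are all assumptions made for illustration. AUPRC is approximated here by average precision, which is one common estimator of the area under the precision-recall curve.

```python
from datetime import datetime

# Assumed timestamp layout (the classic Twitter API "created_at" style).
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def time_features(created_at):
    """Return a one-hot hour-of-day (24 slots) plus day-of-week (7 slots) vector."""
    moment = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
    hour = [0] * 24
    hour[moment.hour] = 1
    day = [0] * 7
    day[moment.weekday()] = 1  # Monday is index 0
    return hour + day


def accuracy(labels, predictions):
    """Fraction of predictions that match the annotated labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def average_precision(labels, scores):
    """Estimate AUPRC as the mean precision at the rank of each positive item."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical toy data: 1 = annotated positive, 0 = annotated negative.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]  # classifier confidence for the positive class
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

features = time_features("Wed Oct 10 20:19:24 +0000 2018")
print(len(features), accuracy(toy_labels, toy_predictions))
print(round(average_precision(toy_labels, toy_scores), 4))
```

The timestamp here is interpreted in UTC. Whether the thesis used UTC or the tweeter's local time for the hour-of-day feature is not stated in the abstract. In a full pipeline the time vector would be concatenated with the textual features before being passed to the classifier.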

A qualitative exploration of tweets was also performed, including a grounded theory analysis.

A novel machine learning system to identify racist accounts was created using three sets of feature inputs: metrics from the racist tweets, concepts from the grounded theory, and a combination of the two. All three sets of features achieved an accuracy of at least 0.82.

The ambiguity of the tweets meant they were difficult to classify, for both humans and machines, as to whether the tweeter’s intentions were racist or not, the word ‘nigga’ being particularly problematic.

Grounded theory analysis of the tweets showed extremely narrow rhetoric that could be summarised in a single theoretical concept: the defence of the in-group.

Year: 2018
Supplemental file
File Access Level
Restricted
Publication process dates
Deposited: 06 Jun 2019
Accepted: 2018
Output status: Unpublished
Accepted author manuscript
Permalink: https://repository.canterbury.ac.uk/item/88zx2/the-application-of-machine-learning-big-data-techniques-and-criminology-to-the-analysis-of-racist-tweets
