The application of machine learning, big data techniques, and criminology to the analysis of racist tweets.

PhD Thesis

Day, E. 2018. The application of machine learning, big data techniques, and criminology to the analysis of racist tweets. PhD Thesis. Canterbury Christ Church University, Faculty of Social and Applied Sciences.

Publication process dates
Authors	Day, E.
Type	PhD Thesis
Qualification name	PhD
Abstract	Racist tweets are ubiquitous on Twitter. This thesis explores the creation of an automated system to identify racist tweets and tweeters while also developing a theoretical understanding of the tweets. A mixed methods approach was employed: machine learning was used to identify racist tweets and tweeters, and grounded theory and other qualitative techniques were used to understand the tweets’ content. In total, 84 million tweets containing racist words were collected from Twitter, of which 84,000 were hand-annotated as racist or not. The machine learning was performed on a Hadoop cluster using Spark and Hive. To identify racist tweets, a systematic comparison was performed of seven different algorithms and a large number of textual, user-derived and geographical features. Two new features, time of day and day of week, were also evaluated. The 84,000 hand-annotated tweets were used as input to the supervised classification processes. The combination of support vector machines with hour of day as an additional feature was found to be optimal for accuracy (0.93) and AUPRC (0.86). A qualitative exploration of the tweets was also performed, including a grounded theory analysis. A novel machine learning system to identify racist accounts was created using, as feature inputs, metrics from the racist tweets, concepts from the grounded theory, and a combination of the two. All three sets of features gave an accuracy of at least 0.82. The ambiguity of the tweets made it difficult for both humans and machines to classify whether the tweeter’s intentions were racist or not, the word ‘nigga’ being particularly problematic. Grounded theory analysis of the tweets showed extremely narrow rhetoric that could be summarised in a single theoretical concept: the defence of the in-group.
Year	2018
Supplemental file	File Access Level Restricted
Deposited	06 Jun 2019
Accepted	2018
Output status	Unpublished
Accepted author manuscript	Final thesis (002).pdf

Permalink: https://repository.canterbury.ac.uk/item/88zx2/the-application-of-machine-learning-big-data-techniques-and-criminology-to-the-analysis-of-racist-tweets


Total views	117
Total downloads	947
Views this month	1
Downloads this month	5
