The Limits of My Computational Analysis

Data Analysis, Design and Development
Project Overview
Many researchers are looking for ways to detect hate speech effectively. The difficulty is that hatred is often abstract and subjective, so it is hard to give a computer a clear and simple definition of hate speech. As a journalist with an experimental spirit, nascent programming skills, and a little knowledge of machine learning, I challenged myself to build a Naive Bayes classifier to detect hate speech in tweets.
My Contributions
I downloaded tweets containing hateful terms aimed at Chinese people through the Twitter API to explore the bias and limits of this model. I conducted the text analysis in Python and used Naive Bayes as the machine-learning algorithm for the classification tasks. Then I visualized the results with D3.js.
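To make the classification step concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier in plain Python. It is not the code used in this project: the training sentences, the three label names, the tokenizer, and the smoothing value are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing strength (assumed value)
        self.class_counts = Counter()           # number of documents per label
        self.word_counts = defaultdict(Counter) # word frequencies per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def log_scores(self, text):
        """Return log P(label) + sum of log P(word | label) for each label."""
        total_docs = sum(self.class_counts.values())
        words = [w for w in tokenize(text) if w in self.vocab]  # skip unseen words
        scores = {}
        for label, n_docs in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy training set; real training data would be labeled tweets.
train_texts = [
    "i hate those people they should leave",
    "those people are disgusting and should leave",
    "this traffic is damn awful",
    "damn this weather is awful today",
    "lovely dinner with friends tonight",
    "great dumplings for dinner today",
]
train_labels = ["hate", "hate", "offensive", "offensive", "neither", "neither"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("those people should leave"))
print(model.predict("lovely dinner tonight"))
```

The classifier only counts which words appear under which label. It has no notion of word order, speaker, or target, which is where the limits described in the findings come from.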

My qualitative analysis of a small sample of the prediction results shows that only a few of the tweets labeled as hate are actually "hate speech." Most of them express vicious prejudices and stereotypes about the behavior of Chinese people.

The algorithm labeled many tweets containing political words as hate speech. Most of those tweets aggressively condemn the Chinese Communist Party or the Chinese government rather than Chinese people.

Tweets containing racial slurs like “Ching Chong” or “chink” are more likely to be labeled as “hate speech.” However, the presence of a racial slur doesn’t necessarily make a tweet derogatory. Many Twitter users use these slurs as in-group slang to address each other.

Tweets that mention both Chinese food and aggressive words, which happens a lot when people are hungry, are more likely to be labeled as hate speech or offensive.
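One way to see why this happens is that a Naive Bayes score is a sum of per-word contributions, each computed without regard to context. The sketch below uses a hypothetical toy corpus with two classes and made-up sentences (not my actual data) to show how an aggressive word raises a harmless food tweet's score toward the hate class.

```python
import math
from collections import Counter

# Hypothetical toy corpus, for illustration only.
hate_docs = ["i hate those people", "kill them all i hate them"]
other_docs = ["chinese food for dinner", "i love chinese food", "dinner with friends"]


def word_counts(docs):
    """Count how often each word appears across a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    return counts


hate_counts = word_counts(hate_docs)
other_counts = word_counts(other_docs)
vocab = set(hate_counts) | set(other_counts)


def log_odds(word, alpha=1.0):
    """Log of P(word | hate) / P(word | other), with add-alpha smoothing."""
    p_hate = (hate_counts[word] + alpha) / (sum(hate_counts.values()) + alpha * len(vocab))
    p_other = (other_counts[word] + alpha) / (sum(other_counts.values()) + alpha * len(vocab))
    return math.log(p_hate / p_other)


def score(text):
    """Sum of per-word log-odds; higher means the text leans toward 'hate'."""
    return sum(log_odds(w) for w in text.split() if w in vocab)


for tweet in ["i would love chinese food", "i would kill for chinese food"]:
    per_word = {w: round(log_odds(w), 2) for w in tweet.split() if w in vocab}
    print(tweet, per_word, round(score(tweet), 2))
```

In this toy setup the word "kill" always contributes a positive amount toward the hate class, whether the tweet is about people or about dinner, because the model treats every word independently of its neighbors.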

My model is based on a supervised machine learning method, which is fast and simple but not sophisticated enough to understand the subtle meanings of language in different contexts. Many researchers and experts train their algorithms with more carefully designed methods, which might produce better results in detecting hate speech.
Personal Project
Sep 2019 — Nov 2019