Can LLMs assist humans in assessing online misogyny? Experiments with GPT-3.5
Conference proceedings contribution
Publication date:
2023
Abstract:
Today's social media landscape is flooded with unfiltered content, ranging from hate speech to cyberbullying and cyberstalking. Locating and eliminating such toxic language is therefore a significant challenge and an active research area. In this paper we focus on detecting hate speech against women, i.e. misogyny, exploiting a "prompt-based learning" paradigm, with the aim of providing a first assessment of a recently developed LLM (OpenAI's GPT-3.5-turbo). We experiment with a benchmark dataset of Reddit posts and evaluate different prompt types with respect to response stability, classification accuracy, and inter-annotator agreement. Our experiments show that GPT's zero-shot detection capabilities, evaluated against human annotations, outperform supervised baselines on our evaluation dataset, and that ensembling different prompts may further improve accuracy, up to 91%. We also found that responses to a given prompt are quite stable, while slightly more variation and lower agreement are observed when the question is asked in different ways.
CRIS type:
4.1 Conference proceedings contribution
Keywords:
GPT; online misogyny detection; pre-trained language model; prompt-based learning; text classification
Authors:
Morbidoni, C.; Sarra, A.
Link to full record:
Book title:
CEUR Workshop Proceedings
Published in: