Publication Date:
2023
abstract:
Today's social media landscape is flooded with unfiltered content, ranging from hate speech to cyberbullying and cyberstalking. As a result, locating and eliminating such toxic language presents a significant challenge and is an active research area. In this paper we focus on detecting hate speech against women, i.e. misogyny, exploiting a "prompt-based learning" paradigm with the aim of providing a first assessment of a recently developed LLM (OpenAI's GPT-3.5-turbo). We experiment with a benchmark dataset of Reddit posts and evaluate different prompt types w.r.t. response stability, classification accuracy and inter-annotator agreement. Our experiments show that GPT's zero-shot detection capabilities, measured against human annotations, outperform supervised baselines on our evaluation dataset, and that ensembling different prompts can further improve accuracy, up to 91%. We also found that responses to a specific prompt are quite stable, while slightly more variation and less agreement is observed when the question is asked in different ways.
Iris type:
4.1 Contributo in Atti di convegno
Keywords:
GPT; online misogyny detection; pre-trained language model; prompt-based learning; text classification
List of contributors:
Morbidoni, C.; Sarra, A.
Book title:
CEUR Workshop Proceedings
Published in: