Assessing and Comparing Free Large Language Models’ Responses to a Clinical Case: Accuracy, Safety, and Reliability

Contribution in conference proceedings

Publication date:

2025

Abstract:

This research examines how five free-of-charge large language models (LLMs), namely Zephyr, Mistral 7B, LLAMA 2 7B, ChatGPT 3.5 and Copilot Precise, perform when faced with a nursing clinical scenario involving a neuropsychiatric emergency. Their responses were evaluated against established guidelines through a Delphi consensus, using a 5-point Likert scale to rate safety, accuracy, reliability and potential for improvement. The findings underscore the paramount importance of safety and accuracy metrics. LLAMA 2 7B exhibits balanced but poor performance, scoring 3 out of 5 in Safety, Accuracy and References, and 4 out of 5 in providing Improvement suggestions. ChatGPT 3.5 demonstrates adequate performance, scoring 4 out of 5 in each of Safety, Accuracy and References, which indicates proficiency in generating accurate, reliable content and in ensuring patient safety, although its improvement suggestions leave room for enhancement (3 out of 5). Copilot Precise shows a distinctive profile, with balanced scores of 3 out of 5 in Safety, Accuracy and Improvements, and a perfect score of 5 out of 5 only in References, highlighting its high accuracy in generating references. Reliability was reported both in terms of reference precision criteria and in terms of consistency over time, the latter computed through automated assessment. These preliminary results highlight the importance of developing language models that prioritize safety and precision in clinical decision-making scenarios. Further studies should aim to improve the accuracy and dependability of these models by examining a range of clinical situations and incorporating real-time feedback mechanisms from experts, which would enhance their usefulness in clinical environments.

CRIS type:

4.1 Contribution in conference proceedings

Keywords:

accuracy; clinical decision-making; large language models; reliability; safety

Authors:

Sblendorio, Elena; Lo Cascio, Alessio; Napolitano, Daniele; Germini, Francesco; Dentamaro, Vincenzo; Piredda, Michela; Cicolini, Giancarlo

University authors:

CICOLINI GIANCARLO

Link to the full record:

https://ricerca.unich.it/handle/11564/867177

Book title:

Lecture Notes in Computer Science

Published in:

LECTURE NOTES IN COMPUTER SCIENCE (journal; series)