The Art of Defending

A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness

The paper presents the Safety and Over-Defensiveness Evaluation (SODE) benchmark, which assesses Large Language Models' (LLMs) defense strategies against safety concerns. It systematically evaluates and compares a range of strategies, revealing key findings such as the trade-off between safety improvement and over-defensiveness in self-checking techniques, and the effectiveness of safety instructions in reducing...
Tags: NLP LLMs Safety AI