Bias Detection Scanner
(Output scanner)
This scanner inspects the outputs generated by Large Language Models (LLMs) to detect and evaluate potential bias. Its primary function is to ensure that LLM outputs remain neutral and do not exhibit unwanted or predefined biases.
The underlying model is specifically trained to detect biased statements in text. By comparing a text's classification and score against a predefined threshold, the scanner determines whether the text is biased.
Bias Detection: If the text is classified as Bias, the bias score corresponds to the model's confidence in this classification.
Threshold-Based Flagging: Text is flagged as Bias if the Bias score exceeds a predefined threshold (default: 0.5).
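The sketch below illustrates this classify-then-threshold flow in Python. It is a minimal example, not the scanner's actual implementation: the Hugging Face model name is an assumption used for illustration, and the label names returned by your deployment's model may differ.

```python
# Minimal sketch of threshold-based bias flagging.
# Assumptions: the model "valurank/distilroberta-bias" is a stand-in for
# whatever bias-classification model your deployment actually uses, and
# its positive label contains the word "bias".
from transformers import pipeline

BIAS_THRESHOLD = 0.5  # default threshold used by the scanner

classifier = pipeline(
    "text-classification",
    model="valurank/distilroberta-bias",  # example/assumed model
)

def scan_output(llm_output: str) -> tuple[bool, float]:
    """Return (is_flagged, bias_score) for a single LLM output."""
    result = classifier(llm_output)[0]  # e.g. {"label": "BIASED", "score": 0.87}
    # The score is the model's confidence in the predicted label; treat it as
    # the bias score only when the predicted label indicates bias.
    bias_score = result["score"] if "BIAS" in result["label"].upper() else 0.0
    return bias_score > BIAS_THRESHOLD, bias_score

flagged, score = scan_output("Members of that group are naturally worse at math.")
print(f"flagged={flagged}, score={score:.2f}")
```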
Bias Detection Policy for AI Chatbot
Create a new policy following the same steps shown in LLM Guardrails Policy. For bias detection, select the Bias scanner.
Optionally, run a test to confirm the policy is functioning as intended: check that biased content is detected and blocked as specified. A quick local check is sketched below.
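This is a hedged sketch of such a check, reusing the scan_output helper from the earlier example; the sample texts are illustrative only, and you should substitute outputs produced by your own chatbot when validating the policy.

```python
# Quick policy sanity check: run one neutral and one biased sample through the
# scan_output() helper sketched above and confirm that only the biased text
# crosses the 0.5 threshold.
samples = {
    "neutral": "The meeting is scheduled for 3 PM on Thursday.",
    "biased": "People from that country are all lazy and untrustworthy.",
}

for name, text in samples.items():
    flagged, score = scan_output(text)
    print(f"{name}: flagged={flagged}, score={score:.2f}")

# Expected behaviour: the neutral sample stays below the threshold, while the
# biased sample is flagged so the policy can block it.
```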