Bias Detection Scanner
(Output scanner)
This scanner inspects the outputs generated by Large Language Models (LLMs) to detect and evaluate potential bias. Its primary function is to ensure that LLM outputs remain neutral and do not exhibit unwanted or predefined biases.
The underlying model is specifically trained to detect biased statements in text. By comparing a text's classification and score against a predefined threshold, the scanner determines whether the text is biased.
Bias Detection: If the text is classified as Bias, the bias score corresponds to the model's confidence in this classification.
Threshold-Based Flagging: Text is flagged as Bias if the Bias score exceeds a predefined threshold (default: 0.5).
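The sketch below illustrates this classify-then-threshold flow in Python. It is a minimal example, not the scanner's actual implementation: the Hugging Face model name is an assumption used for illustration, and the label names returned by your deployment's model may differ.

```python
# Minimal sketch of threshold-based bias flagging.
# Assumptions: the model "valurank/distilroberta-bias" is a stand-in for
# whatever bias-classification model your deployment actually uses, and
# its positive label contains the word "bias".
from transformers import pipeline

BIAS_THRESHOLD = 0.5  # default threshold used by the scanner

classifier = pipeline(
    "text-classification",
    model="valurank/distilroberta-bias",  # example/assumed model
)

def scan_output(llm_output: str) -> tuple[bool, float]:
    """Return (is_flagged, bias_score) for a single LLM output."""
    result = classifier(llm_output)[0]  # e.g. {"label": "BIASED", "score": 0.87}
    # The score is the model's confidence in the predicted label; treat it as
    # the bias score only when the predicted label indicates bias.
    bias_score = result["score"] if "BIAS" in result["label"].upper() else 0.0
    return bias_score > BIAS_THRESHOLD, bias_score

flagged, score = scan_output("Members of that group are naturally worse at math.")
print(f"flagged={flagged}, score={score:.2f}")
```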
Bias Detection Policy for AI Chatbot
Create a new policy following the same steps shown in LLM Guardrails Policy. For bias detection, select the Bias scanner.
Optionally, run a test to confirm the policy is functioning as intended: check that biased content is detected and blocked as specified. A quick local check is sketched below.
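This is a hedged sketch of such a check, reusing the scan_output helper from the earlier example; the sample texts are illustrative only, and you should substitute outputs produced by your own chatbot when validating the policy.

```python
# Quick policy sanity check: run one neutral and one biased sample through the
# scan_output() helper sketched above and confirm that only the biased text
# crosses the 0.5 threshold.
samples = {
    "neutral": "The meeting is scheduled for 3 PM on Thursday.",
    "biased": "People from that country are all lazy and untrustworthy.",
}

for name, text in samples.items():
    flagged, score = scan_output(text)
    print(f"{name}: flagged={flagged}, score={score:.2f}")

# Expected behaviour: the neutral sample stays below the threshold, while the
# biased sample is flagged so the policy can block it.
```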