Toxicity Scanner

(Input and Output scanner)

The Toxicity Scanner provides a mechanism to analyze and mitigate the toxicity of text content, playing a crucial role in maintaining the health and safety of online interactions. This tool is instrumental in preventing the dissemination of harmful or offensive content.

How it works

Toxicity Detection: The scanner classifies the text as toxic or non-toxic. If the text is classified as toxic, the toxicity score corresponds to the model's confidence in that classification.

Threshold-Based Flagging: Text is flagged as toxic if the toxicity score exceeds a predefined threshold (default: 0.5).
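The threshold check described above can be sketched as follows. This is an illustrative sketch only, not the scanner's actual implementation: `score_toxicity` is a hypothetical placeholder for the underlying classification model, which would normally return a confidence in [0, 1].

```python
def score_toxicity(text: str) -> float:
    # Placeholder heuristic for illustration only; the real scanner uses a
    # trained classification model to produce a confidence score in [0, 1].
    toxic_markers = {"idiot", "stupid", "hate"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = sum(1 for w in words if w in toxic_markers)
    return min(1.0, hits / max(len(words), 1) * 5)

def is_toxic(text: str, threshold: float = 0.5) -> bool:
    # Flag the text as toxic when its score exceeds the threshold (default 0.5).
    return score_toxicity(text) > threshold
```

Raising the threshold makes the scanner more permissive (only high-confidence toxic text is blocked); lowering it makes the scanner stricter.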

Match Types:

  • Sentence Type: In this mode, the scanner checks each sentence for toxicity.

  • Full Text Type: In this mode, the entire text is scanned.
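The difference between the two match types can be sketched as below. The sentence splitter and scorer here are simplified placeholders (assumptions for illustration), not the scanner's real components; the point is that sentence mode flags the text if any single sentence exceeds the threshold, while full-text mode scores the text as a whole, where a short toxic sentence can be diluted by surrounding benign text.

```python
import re

def score_toxicity(text: str) -> float:
    # Placeholder scorer for illustration; the real scanner uses a trained model.
    toxic_markers = {"idiot", "stupid", "hate"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = sum(1 for w in words if w in toxic_markers)
    return min(1.0, hits / max(len(words), 1) * 5)

def scan(text: str, match_type: str = "full", threshold: float = 0.5) -> bool:
    if match_type == "sentence":
        # Sentence mode: flag if ANY sentence exceeds the threshold.
        sentences = re.split(r"(?<=[.!?])\s+", text)
        return any(score_toxicity(s) > threshold for s in sentences)
    # Full-text mode: score the entire text at once.
    return score_toxicity(text) > threshold
```

For example, a long message containing one toxic sentence may pass in full-text mode but be flagged in sentence mode, so sentence mode is generally the stricter of the two.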

Toxicity Detection Policy for AI Chatbot

Create a new policy in the same way as shown in LLM Guardrails Policy; for toxicity detection, select the Toxicity scanner.

Optionally, perform a test to ensure the policy is functioning as intended. Check that Toxicity is detected and blocked as specified.
