Toxicity Scanner
(Input and Output scanner)
The Toxicity Scanner provides a mechanism to analyze and mitigate the toxicity of text content, playing a crucial role in maintaining the health and safety of online interactions. This tool is instrumental in preventing the dissemination of harmful or offensive content.
Toxicity Detection: If the text is classified as toxic, the toxicity score corresponds to the model's confidence in this classification.
Threshold-Based Flagging: Text is flagged as toxic if the toxicity score exceeds a predefined threshold (default: 0.5).
Match Types:
Sentence Type: In this mode, the scanner checks each sentence individually for toxicity.
Full Text Type: In this mode, the entire text is scanned as a whole. The threshold check and both match types are illustrated in the sketch below.
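The following is a minimal, illustrative sketch of the flagging logic described above, not the scanner's actual implementation. The `toxicity_model` callable, the naive sentence splitter, and the function names are assumptions introduced for illustration; a real deployment relies on the scanner's built-in classifier.

```python
from typing import Callable, List

def split_sentences(text: str) -> List[str]:
    # Naive splitter for illustration only; a production scanner would use
    # a proper sentence tokenizer.
    parts = text.replace("!", ".").replace("?", ".").split(".")
    return [p.strip() for p in parts if p.strip()]

def is_toxic(
    text: str,
    toxicity_model: Callable[[str], float],  # hypothetical: returns a score in [0, 1]
    threshold: float = 0.5,                  # default threshold from the docs
    match_type: str = "full",                # "sentence" or "full"
) -> bool:
    """Flag text as toxic when the model's confidence exceeds the threshold."""
    if match_type == "sentence":
        # Sentence mode: flag if any individual sentence scores above the threshold.
        return any(toxicity_model(s) > threshold for s in split_sentences(text))
    # Full-text mode: score the entire text in one pass.
    return toxicity_model(text) > threshold
```

With the default threshold of 0.5, sentence mode can block a message that contains a single toxic sentence even when the text as a whole would score below the threshold, which makes it the stricter of the two modes.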
Toxicity Detection Policy for AI Chatbot
Create a new policy in the same way as shown in LLM Guardrails Policy; for toxicity detection, select the Toxicity scanner.
Optionally, perform a test to ensure the policy is functioning as intended. Check that toxic content is detected and blocked as specified.
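A hedged sketch of such a test, assuming the policy sits in front of an HTTP gateway: the URL, the request payload, and the `blocked` response field are hypothetical placeholders and should be replaced with whatever your deployment actually exposes.

```python
import requests

# Hypothetical gateway URL; substitute the endpoint your policy protects.
GUARDRAIL_URL = "https://gateway.example.com/v1/chat"

def is_blocked(prompt: str) -> bool:
    """Send a prompt through the guarded endpoint and report whether it was blocked."""
    response = requests.post(GUARDRAIL_URL, json={"prompt": prompt}, timeout=30)
    body = response.json()
    # Assumed response field; adjust to your policy's actual response schema.
    return bool(body.get("blocked", False))

if __name__ == "__main__":
    # A clearly toxic prompt should be blocked by the Toxicity scanner...
    assert is_blocked("You are a worthless idiot and everyone hates you.")
    # ...while an ordinary prompt should pass through untouched.
    assert not is_blocked("What is the weather like today?")
    print("Toxicity policy is detecting and blocking as expected.")
```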