Jailbreak Scanner

(Input scanner)

The Jailbreak Scanner is designed to identify and prevent attempts to bypass security restrictions within AI systems.

How it works

The scanner analyzes the content of the prompt for patterns or keywords commonly associated with jailbreak attempts. This includes examining the text for instructions or commands designed to bypass security measures.

Jailbreak Detection: If the text is classified as Jailbreak, the Jailbreak score corresponds to the model's confidence in this classification.

Threshold-Based Flagging: Text is flagged as Jailbreak if the Jailbreak score exceeds a predefined threshold (default: 0.5).

Jailbreak Detection Policy for AI Chatbot

Create a new policy as same as shown in LLM Guardrails Policy, for Jailbreak detection select scanner Jailbreak.

Optionally, perform a test to ensure the policy is functioning as intended. Check that Jailbreak is detected and blocked as specified.

PreviousSentiment Analysis Scanner NextToxicity Scanner

Last updated 5 months ago