Jailbreak Scanner
(Input scanner)
Last updated
(Input scanner)
Last updated
The Jailbreak Scanner is designed to identify and prevent attempts to bypass security restrictions within AI systems.
The scanner analyzes the content of the prompt for patterns or keywords commonly associated with jailbreak attempts. This includes examining the text for instructions or commands designed to bypass security measures.
Jailbreak Detection: If the text is classified as Jailbreak, the Jailbreak score corresponds to the model's confidence in this classification.
Threshold-Based Flagging: Text is flagged as Jailbreak if the Jailbreak score exceeds a predefined threshold (default: 0.5).
Jailbreak Detection Policy for AI Chatbot
Create a new policy as same as shown in LLM Guardrails Policy, for Jailbreak detection select scanner Jailbreak.
Optionally, perform a test to ensure the policy is functioning as intended. Check that Jailbreak is detected and blocked as specified.