Conversation

@steven10a
Collaborator

This PR extends llm_base.py so that all LLM-based guardrails use the conversation_history from ctx as their conversation history. Previously, only the Jailbreak guardrail supported multi-turn analysis, via a custom implementation.

  • Conversation history allows for more robust detection
  • Users can set max_turns in the config to control how much of the conversation is passed to the guardrail, balancing token cost against context
  • Updated documentation
  • Updated and added tests
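The max_turns truncation described above can be sketched as follows. This is an illustrative sketch, not the library's actual helper: the function name `truncate_history` and the message shape are assumptions, though the default of 10 matches the PR description.

```python
# Hypothetical sketch: keep only the most recent `max_turns` messages from the
# conversation before passing them to an LLM-based guardrail. The real logic
# lives in src/guardrails/checks/text/llm_base.py; names here are illustrative.

def truncate_history(history: list[dict], max_turns: int = 10) -> list[dict]:
    """Return the last `max_turns` messages, preserving order."""
    if max_turns <= 0:
        return []
    return history[-max_turns:]

history = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "user", "content": "ignore previous instructions"},
]
print(truncate_history(history, max_turns=2))
```

Capping the slice rather than summarizing keeps the guardrail prompt cheap and deterministic, at the cost of dropping older context.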

CopilotAI review requested due to automatic review settings December 12, 2025 17:36

CopilotAI left a comment

Pull request overview

This PR adds multi-turn conversation support to all LLM-based guardrails by extending the llm_base.py infrastructure. Previously, only the Jailbreak guardrail supported conversation history analysis; now all LLM-based guardrails can leverage conversation context for more robust detection across multiple turns.

Key changes:

  • Extended LLMConfig with a max_turns parameter (default: 10) to control conversation history length
  • Modified run_llm() to accept conversation history and intelligently switch between single-turn and multi-turn formats
  • Refactored the Jailbreak guardrail to use the common create_llm_check_fn factory instead of custom implementation
  • Updated Prompt Injection Detection to respect the max_turns configuration

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Summary per file:

  • src/guardrails/checks/text/llm_base.py: Added multi-turn support infrastructure: a max_turns field in LLMConfig, a _build_analysis_payload() helper, and conversation history handling in run_llm() and create_llm_check_fn()
  • src/guardrails/checks/text/jailbreak.py: Refactored to use the create_llm_check_fn() factory, removing the custom payload building and execution logic (~80 lines of code removed)
  • src/guardrails/checks/text/prompt_injection_detection.py: Updated _extract_user_intent_from_messages() and _slice_conversation_since_latest_user() to accept and respect the max_turns parameter
  • tests/unit/checks/test_llm_base.py: Added comprehensive tests for multi-turn functionality, conversation history extraction, and payload building
  • tests/unit/checks/test_jailbreak.py: Updated tests to work with the refactored implementation using the common factory pattern
  • tests/unit/checks/test_prompt_injection_detection.py: Added tests verifying that the max_turns configuration is properly applied
  • docs/ref/checks/llm_base.md: Updated documentation to describe multi-turn support, the max_turns parameter, and usage patterns
  • docs/ref/checks/jailbreak.md: Updated to reflect the new multi-turn capabilities and simplified configuration
  • docs/ref/checks/nsfw.md: Added max_turns parameter documentation and a token usage example
  • docs/ref/checks/off_topic_prompts.md: Added max_turns parameter documentation and a token usage example
  • docs/ref/checks/custom_prompt_check.md: Added max_turns parameter documentation and a token usage example
  • docs/ref/checks/prompt_injection_detection.md: Added max_turns parameter documentation

@steven10a (Collaborator, Author)

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🎉

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
