
OpenAI Guardrails

Overview

OpenAI Guardrails is a Python package for adding robust, configurable safety and compliance guardrails to LLM applications. It provides a drop-in wrapper for OpenAI's Python client, enabling automatic input/output validation and moderation using a wide range of guardrails.

Documentation

For full details, advanced usage, and the API reference, see the OpenAI Guardrails Documentation.

Quick Start: Using OpenAI Guardrails (Python)

  1. Generate your guardrail spec JSON

    • Use the Guardrails web UI to create a JSON configuration file describing which guardrails to apply and how to configure them.
    • The wizard outputs a file like guardrail_config.json (the name used in the example below).
  2. Install

    pip install openai-guardrails
  3. Wrap your OpenAI client with Guardrails

    from guardrails import GuardrailsOpenAI, GuardrailTripwireTriggered
    from pathlib import Path

    # guardrail_config.json is generated by the configuration wizard
    client = GuardrailsOpenAI(config=Path("guardrail_config.json"))

    # Use as you would the OpenAI client, but handle guardrail exceptions
    try:
        response = client.chat.completions.create(
            model="gpt-5",
            messages=[{"role": "user", "content": "..."}],
        )
        print(response.llm_response.choices[0].message.content)
    except GuardrailTripwireTriggered as e:
        # Handle blocked or flagged content
        print(f"Guardrail triggered: {e}")

    # ---
    # Example: Using the new OpenAI Responses API with Guardrails
    try:
        resp = client.responses.create(
            model="gpt-5",
            input="What are the main features of your premium plan?",
            # Optionally, add file_search or other tool arguments as needed
        )
        print(resp.llm_response.output_text)
    except GuardrailTripwireTriggered as e:
        print(f"Guardrail triggered (responses API): {e}")
    • The client will automatically apply all configured guardrails to inputs and outputs.
    • If a guardrail is triggered, a GuardrailTripwireTriggered exception will be raised. You should handle this exception to gracefully manage blocked or flagged content.

Note: The Guardrails web UI is hosted here. You do not need to run the web UI yourself to use the Python package.


What Does the Python Package Provide?

  • GuardrailsOpenAI and GuardrailsAsyncOpenAI: Drop-in replacements for OpenAI's OpenAI and AsyncOpenAI clients, with automatic guardrail enforcement (a brief async sketch follows this list).
  • GuardrailsAzureOpenAI and GuardrailsAsyncAzureOpenAI: Drop-in replacements for Azure OpenAI clients, with the same guardrail support. (See the documentation for details.)
  • Automatic input/output validation: Guardrails are applied to all relevant API calls (e.g., chat.completions.create, responses.create, etc.).
  • Configurable guardrails: Choose which checks to enable, and customize their parameters via the JSON spec.
  • Tripwire support: Optionally block or mask unsafe content, or just log/flag it for review.
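For async applications, GuardrailsAsyncOpenAI follows the same pattern as the synchronous Quick Start example. Below is a minimal sketch, assuming the async client mirrors the synchronous call signatures and config argument shown above:

    import asyncio
    from pathlib import Path

    from guardrails import GuardrailsAsyncOpenAI, GuardrailTripwireTriggered

    async def main() -> None:
        # Same wizard-generated config file as the synchronous client
        client = GuardrailsAsyncOpenAI(config=Path("guardrail_config.json"))
        try:
            response = await client.chat.completions.create(
                model="gpt-5",
                messages=[{"role": "user", "content": "..."}],
            )
            print(response.llm_response.choices[0].message.content)
        except GuardrailTripwireTriggered as e:
            # Tripwire handling works the same as in the synchronous client
            print(f"Guardrail triggered: {e}")

    asyncio.run(main())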

Available Guardrails

Below is a list of all built-in guardrails you can configure. Each can be enabled, disabled, and customized in your JSON spec; a hypothetical config sketch follows the list.

  • Keyword Filter: Triggers when any keyword appears in the text.
  • Competitors: Checks whether the model output mentions any competitors from the provided list.
  • Jailbreak: Detects attempts to jailbreak or bypass AI safety measures using techniques such as prompt injection, role-playing requests, system prompt overrides, or social engineering.
  • Moderation: Flags text containing disallowed content categories (e.g., hate, violence, sexual) using OpenAI's moderation API.
  • NSFW Text: Detects NSFW (Not Safe For Work) content in text, including sexual content, hate speech, violence, profanity, illegal activities, and other inappropriate material.
  • Contains PII: Checks that the text does not contain personally identifiable information (PII) such as SSNs, phone numbers, or credit card numbers, based on configured entity types.
  • Secret Keys: Checks that the text does not contain potential API keys, secrets, or other credentials.
  • Off Topic Prompts: Checks that the content stays within the defined business scope.
  • URL Filter: Flags URLs in the text unless they match entries in the allow list.
  • Custom Prompt Check: Runs a user-defined guardrail based on a custom system prompt, allowing flexible content moderation for specific requirements.
  • Anti-Hallucination: Detects potential hallucinations in AI-generated text using the OpenAI Responses API with file search; validates claims against actual documents and flags factually incorrect, unsupported, or potentially fabricated information.
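The authoritative JSON schema comes from the web UI, so treat the shape below as a hypothetical illustration only: the guardrail names match the list above, but the "guardrails" and "config" field names and their parameters are assumptions, not the package's documented schema.

    import json
    from pathlib import Path

    # Hypothetical spec shape for illustration only; generate the real file
    # with the Guardrails web UI rather than writing it by hand.
    spec = {
        "guardrails": [  # field name is an assumption
            {"name": "Keyword Filter", "config": {"keywords": ["internal-codename"]}},
            {"name": "Contains PII", "config": {"entities": ["US_SSN", "PHONE_NUMBER"]}},
            {"name": "URL Filter", "config": {"allow_list": ["openai.com"]}},
        ],
    }
    Path("guardrail_config.json").write_text(json.dumps(spec, indent=2))

Whatever the real schema looks like, the resulting file is passed to GuardrailsOpenAI via the config argument exactly as in the Quick Start.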

License

For the duration of this early access alpha, Guardrails is distributed under the Alpha Evaluation Agreement that your organization signed with OpenAI.

The Python package is intended to be released under the MIT license in the future, though this is subject to change.

Disclaimers

Please note that Guardrails may use Third-Party Services such as the Presidio open-source framework, which are subject to their own terms and conditions and are not developed or verified by OpenAI.

Developers are responsible for implementing appropriate safeguards to prevent storage or misuse of sensitive or prohibited content (including but not limited to personal data, child sexual abuse material, or other illegal content). OpenAI disclaims liability for any logging or retention of such content by developers. Developers must ensure their systems comply with all applicable data protection and content safety laws, and should avoid persisting any blocked content generated or intercepted by Guardrails.
