OpenAI Guardrails is a Python package for adding robust, configurable safety and compliance guardrails to LLM applications. It provides a drop-in wrapper for OpenAI's Python client, enabling automatic input/output validation and moderation using a wide range of guardrails.
For full details, advanced usage, and the API reference, see the OpenAI Guardrails Documentation.
Generate your guardrail spec JSON
- Use the Guardrails web UI to create a JSON configuration file describing which guardrails to apply and how to configure them.
- The wizard outputs a file like `guardrail_specs.json`.
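The authoritative schema comes from the wizard itself; purely as an illustrative sketch (the field names below are assumptions, not the actual schema), a generated file might look something like this:

```json
{
  "version": 1,
  "output": {
    "version": 1,
    "guardrails": [
      { "name": "Moderation", "config": { "categories": ["hate", "violence"] } }
    ]
  }
}
```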
Install
```bash
pip install openai-guardrails
```
Wrap your OpenAI client with Guardrails
```python
from guardrails import GuardrailsOpenAI, GuardrailTripwireTriggered
from pathlib import Path

# guardrail_config.json is generated by the configuration wizard
client = GuardrailsOpenAI(config=Path("guardrail_config.json"))

# Use as you would the OpenAI client, but handle guardrail exceptions
try:
    response = client.chat.completions.create(
        model="gpt-5",
        messages=[{"role": "user", "content": "..."}],
    )
    print(response.llm_response.choices[0].message.content)
except GuardrailTripwireTriggered as e:
    # Handle blocked or flagged content
    print(f"Guardrail triggered: {e}")

# ---
# Example: Using the new OpenAI Responses API with Guardrails
try:
    resp = client.responses.create(
        model="gpt-5",
        input="What are the main features of your premium plan?",
        # Optionally, add file_search or other tool arguments as needed
    )
    print(resp.llm_response.output_text)
except GuardrailTripwireTriggered as e:
    print(f"Guardrail triggered (responses API): {e}")
```
- The client will automatically apply all configured guardrails to inputs and outputs.
- If a guardrail is triggered, a `GuardrailTripwireTriggered` exception will be raised. You should handle this exception to gracefully manage blocked or flagged content; one way to do so is sketched below.
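For example, a thin wrapper can convert a tripwire into a safe fallback reply instead of surfacing an error to the user. This is a minimal sketch built on the synchronous example above; the `safe_chat` helper and the fallback text are illustrative, not part of the package:

```python
from guardrails import GuardrailsOpenAI, GuardrailTripwireTriggered

FALLBACK = "Sorry, I can't help with that request."

def safe_chat(client: GuardrailsOpenAI, user_message: str) -> str:
    """Return the model's reply, or a fallback if a guardrail blocks it."""
    try:
        response = client.chat.completions.create(
            model="gpt-5",
            messages=[{"role": "user", "content": user_message}],
        )
        return response.llm_response.choices[0].message.content
    except GuardrailTripwireTriggered:
        # Blocked or flagged content: degrade gracefully.
        return FALLBACK
```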
Note: The Guardrails web UI is hosted here. You do not need to run the web UI yourself to use the Python package.
- GuardrailsOpenAI and GuardrailsAsyncOpenAI: Drop-in replacements for OpenAI's `OpenAI` and `AsyncOpenAI` clients, with automatic guardrail enforcement (a minimal async sketch follows this list).
- GuardrailsAzureOpenAI and GuardrailsAsyncAzureOpenAI: Drop-in replacements for Azure OpenAI clients, with the same guardrail support. (See the documentation for details.)
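Because GuardrailsAsyncOpenAI mirrors `AsyncOpenAI`, the synchronous example above translates directly. This sketch assumes the same `llm_response` surface shown earlier:

```python
import asyncio
from pathlib import Path

from guardrails import GuardrailsAsyncOpenAI, GuardrailTripwireTriggered

async def main() -> None:
    client = GuardrailsAsyncOpenAI(config=Path("guardrail_config.json"))
    try:
        response = await client.chat.completions.create(
            model="gpt-5",
            messages=[{"role": "user", "content": "..."}],
        )
        print(response.llm_response.choices[0].message.content)
    except GuardrailTripwireTriggered as e:
        print(f"Guardrail triggered: {e}")

asyncio.run(main())
```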
- Automatic input/output validation: Guardrails are applied to all relevant API calls (e.g., `chat.completions.create`, `responses.create`).
- Configurable guardrails: Choose which checks to enable, and customize their parameters via the JSON spec.
- Tripwire support: Optionally block or mask unsafe content, or just log/flag it for review (a sketch of the flag-and-review pattern follows this list).
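A blocking tripwire surfaces as the exception shown earlier; non-blocking checks can instead be logged for later review. The sketch below assumes flagged results are attached to the returned response object (the `guardrail_results` attribute name is an assumption; consult the documentation for the actual result surface):

```python
import logging

from guardrails import GuardrailTripwireTriggered

logger = logging.getLogger("guardrail_review")

# `client` is the GuardrailsOpenAI instance from the earlier example.
try:
    response = client.responses.create(model="gpt-5", input="...")
except GuardrailTripwireTriggered as e:
    # A blocking guardrail tripped: record the event for review.
    logger.warning("Blocked by guardrail: %s", e)
else:
    # Non-blocking checks may attach results to the response object.
    # NOTE: `guardrail_results` is an assumed attribute name, for illustration.
    for result in getattr(response, "guardrail_results", []):
        logger.info("Guardrail result: %s", result)
```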
Below is a list of all built-in guardrails you can configure. Each can be enabled/disabled and customized in your JSON spec.
| Guardrail Name | Description |
|---|---|
| Keyword Filter | Triggers when any keyword appears in text. |
| Competitors | Checks if the model output mentions any competitors from the provided list. |
| Jailbreak | Detects attempts to jailbreak or bypass AI safety measures using techniques such as prompt injection, role-playing requests, system prompt overrides, or social engineering. |
| Moderation | Flags text containing disallowed content categories (e.g., hate, violence, sexual) using OpenAI's moderation API. |
| NSFW Text | Detects NSFW (Not Safe For Work) content in text, including sexual content, hate speech, violence, profanity, illegal activities, and other inappropriate material. |
| Contains PII | Checks that the text does not contain personally identifiable information (PII) such as SSNs, phone numbers, credit card numbers, etc., based on configured entity types. |
| Secret Keys | Checks that the text does not contain potential API keys, secrets, or other credentials. |
| Off Topic Prompts | Checks that the content stays within the defined business scope. |
| URL Filter | Flags URLs in the text unless they match entries in the allow list. |
| Custom Prompt Check | Runs a user-defined guardrail based on a custom system prompt. Allows for flexible content moderation based on specific requirements. |
| Anti-Hallucination | Detects potential hallucinations in AI-generated text using OpenAI Responses API with file search. Validates claims against actual documents and flags factually incorrect, unsupported, or potentially fabricated information. |
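Several of the checks above take parameters. Purely as an illustrative sketch (the `keywords` and `entities` field names are assumptions; the wizard emits the authoritative schema), an input-side configuration might resemble:

```json
{
  "version": 1,
  "input": {
    "version": 1,
    "guardrails": [
      { "name": "Keyword Filter", "config": { "keywords": ["confidential", "internal only"] } },
      { "name": "Contains PII", "config": { "entities": ["US_SSN", "CREDIT_CARD"] } }
    ]
  }
}
```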
For the duration of this early access alpha, Guardrails is distributed under the Alpha Evaluation Agreement that your organization signed with OpenAI.
The Python package is intended to be released under the MIT license in the future; this is subject to change.
Please note that Guardrails may use Third-Party Services such as the Presidio open-source framework, which are subject to their own terms and conditions and are not developed or verified by OpenAI.
Developers are responsible for implementing appropriate safeguards to prevent storage or misuse of sensitive or prohibited content (including but not limited to personal data, child sexual abuse material, or other illegal content). OpenAI disclaims liability for any logging or retention of such content by developers. Developers must ensure their systems comply with all applicable data protection and content safety laws, and should avoid persisting any blocked content generated or intercepted by Guardrails.