OpenAI Adds Inline Moderation Scores to API Generation Requests

Q: Which OpenAI API endpoints support inline moderation scores?

OpenAI lists the Responses API and Chat Completions API for generated-content moderation.

Q: What moderation model does OpenAI show for inline moderation?

The documentation examples use `omni-moderation-latest` through `moderation.model`.

Q: Do moderation scores stream with each token?

No. OpenAI says streaming moderation scores arrive after the full generated output is available.

GitHub-generated Open Graph image for the OpenAI Python SDK repository.GitHub / OpenAI Python SDK

AI & AutomationJun 9, 2026

@ZachasAuthorADMIN

OpenAI now lets developers request moderation scores inside Responses API and Chat Completions generation calls, reducing the need for a separate moderation round trip while leaving policy decisions to the application.

OpenAI added inline moderation scores to the Responses API and Chat Completions API on June 4, 2026. Developers can pass a top-level moderation object in a generation request and receive moderation signals for both the input and generated output in the same response. The feature reduces separate moderation calls, but it does not automatically block content; the application still has to decide what to show, log, review, or stop.

Key takeaways

The new API path covers generated-content moderation for both Responses API and Chat Completions requests.
OpenAI's documentation says to set moderation.model, with omni-moderation-latest shown in the examples.
Responses API returns moderation data at response.moderation.input and response.moderation.output.
Chat Completions returns moderation containers at completion.moderation.input and completion.moderation.output.
Streaming apps must account for moderation scores arriving after the full generated output is available.

Practical LinkLoot angle

This is useful for production teams that already wrap LLM calls with pre- or post-generation safety checks. Inline scores can simplify request tracing because the generation result and moderation result live under the same API interaction. That helps when a support team needs to explain why a response was hidden, why a conversation was routed to review, or why a policy event appeared in the logs.

It is not a replacement for product-specific rules. OpenAI's docs frame moderation scores as policy signals, not final decisions. A customer-support bot, an internal coding assistant, and a public content generator may need different thresholds, escalation paths, and retention rules.

Use case	What inline moderation changes	Limitation to keep
Public chatbot	One generation call can return input and output safety signals	You still need thresholds and user-facing handling
Internal assistant	Easier logging for risky prompts and generated responses	Internal policy may allow context that public apps block
Streaming UI	Moderation can still be attached to the generation flow	Scores arrive after the complete output, not token by token
Tool-calling app	Tool-call arguments and tool outputs in conversation content can be covered	Tool names, descriptions, schemas, and response-format schemas are not covered

What to verify before you act

Check whether your current stack calls the standalone Moderation API before generation, after generation, or both. If you switch to inline moderation, test the response shape separately for Responses API and Chat Completions because the access paths differ. For streaming interfaces, verify whether your product can wait for full-output moderation before display, or whether you need a staged UI that marks unverified output until the final moderation result arrives.

Also review logging. Category scores can be useful for audit trails, but they may contain sensitive context about user input. Store only what your policy and retention model justify.

Source check

OpenAI's release notes confirm the June 4, 2026 API release and state that moderation scores were added to Responses API and Chat Completions generation requests. OpenAI's moderation guide explains the top-level moderation object, where to read input and output results, how streaming behaves, and which tool-call surfaces are covered. Production AI Institute independently summarizes the same release and frames the production impact as lower moderation-path complexity rather than automatic safety enforcement.

FAQ

Does OpenAI inline moderation block unsafe output automatically?

No. OpenAI says the model still generates normally; your application must review moderation results before display or downstream action.

Which OpenAI API endpoints support inline moderation scores?

What moderation model does OpenAI show for inline moderation?

Do moderation scores stream with each token?

For more production patterns around connected automations and policy-controlled AI systems, see LinkLoot's guide to AI workflow automation.

Sources & links

References, demos, and supporting links.

OpenAI release notesopenai.comPrimary OpenAI moderation documentationdevelopers.openai.com Production AI Institute analysisproductionai.institute