Public Preview

Safety Filtering and Post-Processing

Abstract

Apply automated safety checks and transformations to model outputs before delivery: content moderation, PII redaction, text normalization, and policy enforcement.

Motivation

  • Reduce risk of harmful or disallowed content
  • Enforce compliance requirements (PII handling, intellectual property, NSFW content)
  • Improve robustness via normalization and guard transforms

Architectures

  • Inline moderation API calls (text/image/video/audio)
  • Rule engines and classifiers; allow/deny with reasons
  • Redaction (PII), watermark detection where available
  • Post-generation correction: regex fixes, JSON validation, policy-driven stripping
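The redaction and post-generation correction steps above can be sketched as a small pipeline. This is a minimal illustration, not a production detector: the regex patterns and placeholder names (`PII_PATTERNS`, `redact_pii`, `postprocess`) are hypothetical, and real deployments typically use dedicated PII-detection services rather than hand-rolled regexes.

```python
import json
import re

# Hypothetical PII patterns for illustration only; real systems use
# dedicated PII detectors with far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Mask matched PII spans with a category placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}_REDACTED]", text)
    return text

def validate_json(text: str):
    """Return parsed JSON, or None if the model output is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

def postprocess(raw_output: str) -> str:
    """Redact PII, then normalize whitespace as a guard transform."""
    cleaned = redact_pii(raw_output)
    return " ".join(cleaned.split())
```

Each stage is independent, so stages can be reordered or swapped (e.g., replacing the regex redactor with a vendor PII service) without changing the surrounding pipeline.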

Design Choices

  • Model-native moderation vs. external services
  • Thresholds and actions (block, mask, review)
  • False-positive/negative trade-offs; user appeal flows
  • Regional and data-handling requirements (zero-retention options, logging governance)
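The threshold-to-action mapping above can be made concrete with per-category cut points. A minimal sketch, assuming hypothetical category names and threshold values that would in practice be tuned against labeled safety sets:

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    MASK = "mask"
    REVIEW = "review"
    BLOCK = "block"

# Hypothetical thresholds; tune per category against labeled data.
THRESHOLDS = {
    "hate": {"review": 0.40, "block": 0.80},
    "pii":  {"mask": 0.50, "block": 0.95},
}

def decide(category: str, score: float) -> Action:
    """Map a classifier score to an action via per-category thresholds."""
    t = THRESHOLDS.get(category)
    if t is None:
        return Action.REVIEW  # unknown category: fail toward human review
    if score >= t.get("block", 1.1):   # 1.1 = threshold never reached
        return Action.BLOCK
    if score >= t.get("mask", 1.1):
        return Action.MASK
    if score >= t.get("review", 1.1):
        return Action.REVIEW
    return Action.ALLOW
```

Defaulting unknown categories to REVIEW rather than ALLOW biases the system toward false positives, which is usually the safer trade-off noted above.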

Pros/Cons

  • Pros: Reduced risk, auditability, consistent policy
  • Cons: Latency cost, possible overblocking, maintenance burden

Evaluation Metrics

  • Precision/recall on labeled safety sets; violation rate
  • Time-to-decision, reviewer workload
  • Policy coverage by category (hate, violence, self-harm, sexual content, etc.)
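Precision and recall on a labeled safety set reduce to counting true/false positives per flagged item. A minimal sketch for a binary "violation" label (the function name and argument shapes are illustrative):

```python
def precision_recall(predictions, labels):
    """Compute precision and recall for binary violation flags.

    predictions, labels: equal-length sequences of bools,
    where True means 'flagged as violating'.
    """
    tp = sum(p and l for p, l in zip(predictions, labels))
    fp = sum(p and not l for p, l in zip(predictions, labels))
    fn = sum((not p) and l for p, l in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Computing these per policy category (hate, violence, self-harm, etc.) rather than in aggregate is what makes the coverage gaps mentioned above visible.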

Vendor/Tooling

  • OpenAI: omni-moderation-latest for text/image
  • Google: Gemini Safety and policy docs
  • Microsoft: Azure AI Content Safety (text/image/video) with categories
  • Anthropic: Safety spec and guidance; model guardrails
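Whichever vendor is used, the application must map the moderation response into an allow/deny decision. The sketch below assumes a response shape modeled loosely on common moderation APIs (a `flagged` boolean plus per-category scores); the field names and the `sample_response` are assumptions to be checked against the actual vendor schema, and the threshold is illustrative.

```python
# Assumed response shape; verify field names against your vendor's schema.
sample_response = {
    "results": [{
        "flagged": True,
        "category_scores": {"hate": 0.91, "violence": 0.02},
    }]
}

def summarize(response: dict, block_at: float = 0.8) -> dict:
    """Collapse a moderation response into a compact decision summary."""
    result = response["results"][0]
    blocked = [c for c, s in result["category_scores"].items()
               if s >= block_at]
    return {
        "flagged": result["flagged"],
        "blocked_categories": sorted(blocked),
    }
```

Keeping this mapping in one place makes it easier to swap vendors: only the response-parsing layer changes, not the downstream block/mask/review logic.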

Design Checklist

  • Define categories, thresholds, and escalation paths
  • Log decisions with reasons; support appeals
  • Localize policies for regions; document data handling
  • Test with red-team prompts and regularly refresh sets
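The "log decisions with reasons" item above implies an auditable, structured record per decision. A minimal sketch, with a hypothetical record shape and a generic list-like sink standing in for a real log pipeline:

```python
import json
import time

def log_decision(log, item_id, action, category, score, reason):
    """Append one auditable decision record to a log sink.

    'log' is any object with .append(); in production this would be a
    structured-logging or audit-trail backend instead of a list.
    """
    record = {
        "item_id": item_id,
        "action": action,
        "category": category,
        "score": round(score, 3),
        "reason": reason,       # human-readable, supports appeals review
        "ts": time.time(),
    }
    log.append(json.dumps(record, sort_keys=True))
    return record
```

Recording the reason alongside the score is what makes appeals workable: a reviewer can see which threshold fired, not just that the item was blocked.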

References

  • Safety Spec and Moderation (OpenAI): https://platform.openai.com/docs/guides/safety-best-practices (accessed 2025-08-14)
  • omni-moderation-latest (OpenAI): https://platform.openai.com/docs/models/omni-moderation-latest (accessed 2025-08-14)
  • Safety guidance, Gemini API (Google): https://ai.google.dev/gemini-api/docs/safety (accessed 2025-08-14)
  • Azure AI Content Safety (Microsoft): https://learn.microsoft.com/azure/ai-services/content-safety/overview (accessed 2025-08-14)
  • Safety overview (Anthropic): https://docs.anthropic.com/en/docs/build-with-claude/safety (accessed 2025-08-14)