Safety Filtering and Post-Processing
Abstract
Apply automated safety checks and transformations to model outputs before delivery: moderation, redaction, normalization, and policy enforcement.
Motivation
- Reduce risk of harmful or disallowed content
- Enforce compliance requirements (PII handling, intellectual property, NSFW content)
- Improve robustness via normalization and guard transforms
Architectures
- Inline moderation API calls (text/image/video/audio)
- Rule engines and classifiers; allow/deny with reasons
- Redaction (PII), watermark detection where available
- Post-generation correction: regex fixes, JSON validation, policy-driven content strips
Design Choices
- Model-native moderation vs. external services
- Thresholds and actions (block, mask, review)
- False-positive/negative trade-offs; user appeal flows
- Region/data handling (no retention, logging governance)
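The threshold-and-action choice can be made explicit as a small policy table. The numeric thresholds and the per-category override below are placeholders; real values come from policy review and measured false-positive/negative trade-offs, not these defaults.

```python
# Illustrative global thresholds (hypothetical values).
THRESHOLDS = {"block": 0.90, "review": 0.60, "mask": 0.40}

# Per-category overrides (hypothetical): e.g. block self-harm at a lower score.
OVERRIDES = {"self-harm": {"block": 0.70}}

def decide_action(category: str, score: float) -> str:
    """Map a classifier score for a category to an enforcement action,
    checked in order of severity: block > review > mask > allow."""
    t = {**THRESHOLDS, **OVERRIDES.get(category, {})}
    if score >= t["block"]:
        return "block"
    if score >= t["review"]:
        return "review"
    if score >= t["mask"]:
        return "mask"
    return "allow"
```

Keeping the table as data rather than branching logic makes it easy to localize per region and to log the exact policy version alongside each decision.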
Pros/Cons
- Pros: Reduced risk, auditability, consistent policy
- Cons: Latency cost, possible overblocking, maintenance burden
Evaluation Metrics
- Precision/recall on labeled safety sets; violation rate
- Time-to-decision, reviewer workload
- Policy coverage by category (hate, violence, self-harm, sexual content, etc.)
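Precision, recall, and violation rate over a labeled safety set reduce to a few counts. A minimal sketch, assuming boolean flags where `True` means "violating/flagged":

```python
def safety_metrics(predictions, labels):
    """Precision/recall for the violation class over a labeled safety set.

    predictions, labels: equal-length sequences of booleans
    (True = flagged by the filter / labeled as a violation).
    """
    tp = sum(p and y for p, y in zip(predictions, labels))
    fp = sum(p and not y for p, y in zip(predictions, labels))
    fn = sum(not p and y for p, y in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Base rate of violations in the eval set, for context when reading the above.
    violation_rate = sum(labels) / len(labels)
    return {"precision": precision, "recall": recall,
            "violation_rate": violation_rate}
```

In practice these would be computed per policy category, since coverage gaps hide inside aggregate numbers.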
Vendor/Tooling
- OpenAI: omni-moderation-latest for text/image
- Google: Gemini Safety and policy docs
- Microsoft: Azure AI Content Safety (text/image/video) with categories
- Anthropic: Safety spec and guidance; model guardrails
Design Checklist
- Define categories, thresholds, and escalation paths
- Log decisions with reasons; support appeals
- Localize policies for regions; document data handling
- Test with red-team prompts and regularly refresh sets
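The red-team testing item can be automated as a regression check that runs on every policy or model change. The `moderate` function below is a stub standing in for a real moderation backend, and the red-team cases are placeholders; the point is the harness shape, not the classifier.

```python
# Hypothetical red-team regression set: each case pairs a prompt with the
# decision the filter is expected to make.
RED_TEAM_SET = [
    {"prompt": "<known jailbreak>", "expected": "block"},
    {"prompt": "benign question", "expected": "allow"},
]

def moderate(prompt: str) -> str:
    """Stub classifier standing in for a real moderation backend."""
    return "block" if "jailbreak" in prompt else "allow"

def run_regression(cases):
    """Return every case where the current filter disagrees with the
    expected decision; an empty list means no regressions."""
    return [c for c in cases if moderate(c["prompt"]) != c["expected"]]
```

Wiring this into CI, and refreshing `RED_TEAM_SET` on a schedule, keeps the checklist item testable rather than aspirational.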
References
- Safety Spec and Moderation — OpenAI. https://platform.openai.com/docs/guides/safety-best-practices (accessed 2025-08-14, provider-reported)
- omni-moderation-latest — OpenAI. https://platform.openai.com/docs/models/omni-moderation-latest (accessed 2025-08-14, provider-reported)
- Safety guidance (Gemini API) — Google. https://ai.google.dev/gemini-api/docs/safety (accessed 2025-08-14, provider-reported)
- Azure AI Content Safety — Microsoft. https://learn.microsoft.com/azure/ai-services/content-safety/overview (accessed 2025-08-14, provider-reported)
- Safety overview — Anthropic. https://docs.anthropic.com/en/docs/build-with-claude/safety (accessed 2025-08-14, provider-reported)