Module 02 · ContentOS Engine
The content factory. Every step measured, every variant judged.
Three variants from independent LLM chains. An LLM-judge tournament picks the winner on 7 axes. AEO/GEO score-driven refine. Knowledge-graph factcheck. Schema markup generated, not retrofitted. Brief to publish-ready in 8-15 minutes per piece on Scale.
Optimized for
- ChatGPT
- Claude
- Perplexity
- Gemini
- Grok
- DeepSeek
- Kimi
- Google AIO
- Copilot
Why a pipeline beats a prompt
Single-shot AI content reads single-shot. Variance kills citation rate.
A one-prompt-one-output workflow can produce something readable; it cannot produce something AI engines will cite. Citation requires structural conformance to the question, third-party authority signals, factual accuracy, and a human-pass result on AI detectors. Each is a separate axis. A pipeline measures and gates each axis; a prompt hopes.
ContentOS shipped 10,000+ pieces of content in 2025-2026 with measured human-pass rates above 85% and AEO scores averaging 78 across categories, all without manual rewrite. That is the measured benchmark behind the Scale tier.
The pipeline · ten checkpoints from blank to publish
Each step measured. Each variant scored. Publish-readiness is a verdict, not a vibe.
- Deep research — multi-source SERP + scraper-stack fetch with cited evidence (not LLM hallucination).
- Question generation — what readers actually search for around this topic.
- Keyword cluster — semantic SEO cluster + entity graph anchors.
- Brief — audience, intent, structure, citation targets. KB-grounded when knowledge folder is available.
- Pre-write readiness gate (90/100) — blocks generation if brief is incomplete or off-target. Catches bad assumptions before they consume model budget.
- 3 variants — independent generation from 3 model chains (Claude Sonnet, Kimi K2, Cerebras Llama 3) so prose fingerprints differ.
- LLM-judge tournament — a 4th model scores all 3 variants on 7 axes: factual accuracy, structure, brand-voice match, citation density, AEO readiness, prose, no-slop. Winner ships.
- AEO/GEO score + refine — score against per-engine readiness (target 78+). Below threshold = refine pass against the failing axes.
- Factcheck — three-pass: knowledge graph → NLI batch → deep-verify for P0 numeric/risky claims. Disputed claims block publish.
- Schema + publish-readiness verdict — generates Article + FAQ + Speakable JSON-LD; final go/no-go.
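As an illustration of the last checkpoint, here is a minimal sketch of generating Article, FAQ, and Speakable markup alongside the content. The function name `build_jsonld` and its parameters are hypothetical, not the ContentOS API; the field names follow the public schema.org vocabulary.

```python
import json

def build_jsonld(headline, url, faqs, speakable_selectors):
    """Return Article and FAQPage JSON-LD objects for one piece.

    faqs is a list of (question, answer) pairs; speakable_selectors is a
    list of CSS selectors marking the passages suited to voice read-out.
    """
    article = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": list(speakable_selectors),
        },
    }
    faq_page = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in faqs
        ],
    }
    return [article, faq_page]

# Example values are placeholders.
markup = build_jsonld(
    headline="What is answer engine optimization?",
    url="https://example.com/aeo",
    faqs=[("What is AEO?", "Optimizing content so AI engines cite it.")],
    speakable_selectors=[".summary"],
)
jsonld_text = json.dumps(markup, indent=2)
```

The serialized `jsonld_text` would go inside a `<script type="application/ld+json">` tag in the published page.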
Premium chain
Claude Sonnet → Kimi → Cerebras
For brand-critical content, the premium chain (the "complex" route) goes through Anthropic Claude Sonnet (or the flat-rate CLI), then falls back to Kimi K2, then to Cerebras Llama as the final fallback. Quality > token economy. Direct paid-API calls are disabled to prevent runaway spend.
Simple chain
Free-tier gateway pool
For utility tasks (briefs, classification, retries), the simple chain routes through HWAI gateway (LiteLLM proxy) → Cerebras free → Gemini free → Groq free. $0 marginal cost for the bulk of pipeline operations.
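The fallback order described above can be sketched as a simple loop over providers. This is an assumption-laden illustration: `call_with_fallback`, the stub providers, and the chain names are hypothetical, and a real gateway such as a LiteLLM proxy would handle retries and routing itself.

```python
from typing import Callable, Sequence

# (name, function that takes a prompt and returns text); both hypothetical.
Provider = tuple[str, Callable[[str], str]]

class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has errored."""

def call_with_fallback(prompt: str, chain: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order; return (provider_name, reply) from the first success."""
    errors = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception as exc:  # any provider error triggers the next fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))

# Stub providers standing in for real API clients.
def gateway(prompt: str) -> str:
    raise TimeoutError("gateway unavailable")

def cerebras_free(prompt: str) -> str:
    return f"[cerebras] {prompt}"

SIMPLE_CHAIN = [("hwai_gateway", gateway), ("cerebras_free", cerebras_free)]
used, reply = call_with_fallback("classify: pricing page", SIMPLE_CHAIN)
```

Here the gateway stub fails, so the call lands on the second provider and `used` records which one answered.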
Cost discipline
Per-day per-provider tracker
A persistent token ledger tracks usage per provider per day. Daily-quota awareness prevents one runaway script from exhausting the free-tier pool. Production benchmarks take the same route as live endpoints, so no quota burns silently.
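One way to picture a per-day, per-provider ledger is a small SQLite table keyed on (day, provider). The class name `TokenLedger`, the table layout, and the quota numbers are assumptions for illustration, not the production implementation.

```python
import sqlite3
from datetime import date

class TokenLedger:
    """Tracks tokens spent per provider per day and refuses over-quota spends."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to persist across runs.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS usage ("
            "day TEXT, provider TEXT, tokens INTEGER, "
            "PRIMARY KEY (day, provider))"
        )

    def used(self, provider, day=None):
        day = day or date.today().isoformat()
        row = self.db.execute(
            "SELECT tokens FROM usage WHERE day = ? AND provider = ?",
            (day, provider),
        ).fetchone()
        return row[0] if row else 0

    def try_spend(self, provider, tokens, daily_quota, day=None):
        """Record the spend and return True, or return False if it would exceed the quota."""
        day = day or date.today().isoformat()
        if self.used(provider, day) + tokens > daily_quota:
            return False
        self.db.execute(
            "INSERT OR IGNORE INTO usage VALUES (?, ?, 0)", (day, provider)
        )
        self.db.execute(
            "UPDATE usage SET tokens = tokens + ? WHERE day = ? AND provider = ?",
            (tokens, day, provider),
        )
        self.db.commit()
        return True
```

A caller would check `try_spend` before each request and move to the next provider in the chain when it returns False.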
FAQ
ContentOS Engine — common questions.
What does the pipeline produce?
Per request: a deep research bundle (sources + citations), a brief (audience, intent, structure, citation targets), 3 variants from independent LLM chains, an LLM-judge tournament pick, a refine pass against AEO/GEO score, a factcheck against ContentOS knowledge graph + live SERP, schema markup (Article + FAQ + Speakable), and a publish-readiness verdict. Average wall-clock: 8-15 minutes per piece on Scale tier, brief to publish-ready.
How do you avoid AI-detection flags?
Two layers. (1) Generation routes through frontier model chains (Claude Sonnet, Kimi, Cerebras Llama, Anthropic Claude CLI), not consumer ChatGPT, so the prose has heterogeneous fingerprints. (2) The anti-slop v2 ruleset (20 detection rules: sycophantic openers, "delve into", em-dash overuse, etc.) runs as a hard gate before publish. We re-check with Originality.ai / GPTZero / Copyleaks and see a >85% human-pass rate on shipped content.
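To make the hard gate concrete, here is a minimal sketch covering only the three rule types named above. The patterns, the rule names, and the em-dash threshold are illustrative assumptions; the actual v2 ruleset has 20 rules that are not listed here.

```python
import re

# Hypothetical patterns for two of the named rule types.
RULES = {
    "sycophantic_opener": re.compile(
        r"^\s*(great question|certainly|absolutely)\b", re.IGNORECASE
    ),
    "delve_into": re.compile(r"\bdelv(e|es|ing) into\b", re.IGNORECASE),
}

# Hypothetical threshold: more than one em-dash per 100 words counts as overuse.
MAX_EM_DASHES_PER_100_WORDS = 1.0

def slop_violations(text):
    """Return the names of all rules the text violates (empty list = pass)."""
    hits = [name for name, pattern in RULES.items() if pattern.search(text)]
    words = max(len(text.split()), 1)
    em_dashes = text.count("\u2014")
    if em_dashes * 100 / words > MAX_EM_DASHES_PER_100_WORDS:
        hits.append("em_dash_overuse")
    return hits

def passes_gate(text):
    """Hard gate: any violation blocks publish."""
    return not slop_violations(text)
```

Returning the rule names rather than a bare boolean lets a refine pass target the specific failures.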
EN and RU only? What about other languages?
EN + RU pipelines are production-grade today. Arabic pipeline is next (Dubai team — native coverage). German, Spanish, French: bespoke on Done-for-you. We do not ship machine-translated content in any language; every variant is generated natively in the target language with native-speaker review on Scale+ tiers.
What is the LLM-judge tournament?
Three variants are generated independently from different model chains (e.g. one through Anthropic Claude, one through Kimi K2, one through Cerebras Llama 3). A 4th model (the judge, typically Claude Opus) scores each on 7 axes: factual accuracy, structural fit, brand voice match, citation density, AEO readiness, prose quality, no-slop. The winner ships; the others are archived as alternatives. Selection moves from "vibes" to measurement.
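The selection step can be sketched as follows, assuming the judge returns a numeric score per axis and that axes are weighted equally. Both are assumptions: the source does not state the scale or the weighting, and `pick_winner` is a hypothetical name.

```python
AXES = (
    "factual_accuracy",
    "structure",
    "brand_voice",
    "citation_density",
    "aeo_readiness",
    "prose",
    "no_slop",
)

def pick_winner(scores):
    """Pick the variant with the highest total across all 7 axes.

    scores maps variant name -> {axis: judge score}. Equal weighting is an
    assumption; ties resolve to the first variant in insertion order.
    """
    totals = {}
    for variant, by_axis in scores.items():
        missing = set(AXES) - set(by_axis)
        if missing:
            raise ValueError(f"{variant} is missing axes: {sorted(missing)}")
        totals[variant] = sum(by_axis[axis] for axis in AXES)
    winner = max(totals, key=totals.get)
    return winner, totals

# Made-up judge scores on a 0-10 scale, for illustration only.
judge_scores = {
    "claude": {axis: 8 for axis in AXES},
    "kimi": {**{axis: 7 for axis in AXES}, "factual_accuracy": 10},
    "cerebras": {axis: 6 for axis in AXES},
}
winner, totals = pick_winner(judge_scores)
```

Keeping `totals` alongside the winner makes it straightforward to archive the losing variants with their scores.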
How is factchecking done?
Three-pass. Pass 1 = factcheck against ContentOS Knowledge Graph (~600 entity facts maintained from Wikidata SPARQL + manual curation). Pass 2 = NLI factcheck batch via factcheck_l3 (sentence-level entailment against retrieved sources). Pass 3 (P0 numeric / risky claims only) = deep-verify against fresh SERP fetch with multi-source agreement check. Verdicts: VERIFIED / DISPUTED / INSUFFICIENT_DATA. Disputed claims block publish.
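The three passes and their verdicts might combine as in the sketch below. The combination rule (any DISPUTED wins, then any VERIFIED, otherwise INSUFFICIENT_DATA) is an assumption, and the pass functions are stand-ins supplied by the caller rather than the real `factcheck_l3` interface.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    VERIFIED = "VERIFIED"
    DISPUTED = "DISPUTED"
    INSUFFICIENT_DATA = "INSUFFICIENT_DATA"

@dataclass
class Claim:
    text: str
    p0: bool = False  # True for numeric or risky claims that need deep-verify

def check_claim(claim, kg_pass, nli_pass, deep_pass):
    """Run pass 1 and 2 on every claim, pass 3 only on P0 claims."""
    passes = [kg_pass, nli_pass]
    if claim.p0:
        passes.append(deep_pass)
    verdicts = [run(claim) for run in passes]
    # Assumed precedence: a single dispute outweighs any verification.
    if Verdict.DISPUTED in verdicts:
        return Verdict.DISPUTED
    if Verdict.VERIFIED in verdicts:
        return Verdict.VERIFIED
    return Verdict.INSUFFICIENT_DATA

def publish_blocked(verdicts):
    """Disputed claims block publish."""
    return any(v is Verdict.DISPUTED for v in verdicts)
```

For example, a P0 claim that the knowledge graph verifies but the deep-verify pass disputes would come back DISPUTED under this rule and block the piece.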
Pre-write readiness gate: what blocks generation?
A pre-write check runs BEFORE generation to ensure the brief is shippable. Gate threshold: 90/100. Failure conditions: incomplete audience definition, missing citation targets, brief too short, no SERP signal for the topic. This catches the "looks reasonable but no chance of citation" briefs before they consume generation budget. An operator can lower the threshold with an explicit reason (rare).
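A minimal sketch of such a gate, using the four failure conditions listed above. The equal 25-point weighting, the `MIN_BRIEF_WORDS` value, and all names are hypothetical; the real scoring rubric is not described here.

```python
from dataclasses import dataclass, field

MIN_BRIEF_WORDS = 150   # hypothetical minimum length
DEFAULT_THRESHOLD = 90  # gate threshold from the text above

@dataclass
class Brief:
    audience: str = ""
    citation_targets: list = field(default_factory=list)
    body: str = ""
    serp_results: int = 0  # number of SERP results found for the topic

def readiness(brief):
    """Score the brief out of 100, 25 points per check (assumed weighting)."""
    checks = {
        "audience_defined": bool(brief.audience.strip()),
        "has_citation_targets": len(brief.citation_targets) > 0,
        "long_enough": len(brief.body.split()) >= MIN_BRIEF_WORDS,
        "has_serp_signal": brief.serp_results > 0,
    }
    score = 100 * sum(checks.values()) // len(checks)
    failed = [name for name, ok in checks.items() if not ok]
    return score, failed

def gate(brief, threshold=DEFAULT_THRESHOLD, override_reason=None):
    """Return (passed, score, failed_checks); lowering the threshold needs a reason."""
    if threshold < DEFAULT_THRESHOLD and not override_reason:
        raise ValueError("lowering the threshold requires an explicit reason")
    score, failed = readiness(brief)
    return score >= threshold, score, failed
```

With four equally weighted checks, a single failure drops the score to 75 and blocks generation at the default threshold.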
Can I use my own brand voice profile?
Yes. Voice profile auto-ingest pulls from your existing site, past articles, and style guide. The generation chain receives the profile as a system prompt; the LLM judge scores brand-voice match as one of the 7 axes. The profile is updated quarterly (or on demand) as your voice evolves.
See ContentOS on your category. Free 30-min strategy call.
You see the actual output quality before signing. We tell you which tier fits — and honestly which would not make a measurable difference for your category.