Microsoft’s Copilot Now Uses Two Models to Fact-Check One Answer
Microsoft’s Wave 3 Copilot update includes a “Critique” system: OpenAI and Anthropic models checking each other’s work before an answer reaches the user. One model generates, a second verifies, and only then does the output appear.
The practical upside is real. Cross-model verification catches a class of confident errors that self-review misses entirely. If you’ve been building RAG pipelines for enterprise clients, you’ve probably already wired in some version of this manually — a second pass, a validator, a structured re-check. Microsoft just productized it.
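The generate-then-verify pattern described above can be sketched in a few lines. This is a minimal illustration of the idea, not Microsoft’s implementation: `call_generator` and `call_verifier` are hypothetical stand-ins for real API calls to two different providers.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    approved: bool
    critique: str


def call_generator(prompt: str) -> str:
    # Placeholder for a real API call to provider A (e.g. an OpenAI model).
    return f"Draft answer to: {prompt}"


def call_verifier(prompt: str, draft: str) -> Verdict:
    # Placeholder for a real API call to provider B (e.g. an Anthropic model).
    # A real verifier would check the draft for unsupported or incorrect claims.
    if draft.startswith("Draft answer to:"):
        return Verdict(approved=True, critique="")
    return Verdict(approved=False, critique="Answer did not address the prompt.")


def answer(prompt: str, max_retries: int = 1) -> str:
    """Generate with one model, verify with a second, retry on rejection."""
    draft = call_generator(prompt)
    for _ in range(max_retries + 1):
        verdict = call_verifier(prompt, draft)
        if verdict.approved:
            return draft
        # Feed the critique back into the generator for another attempt.
        draft = call_generator(f"{prompt}\nFix this issue: {verdict.critique}")
    return draft  # surface the last attempt once retries are exhausted
```

The manual "second pass, validator, structured re-check" wiring mentioned above is essentially this loop with real model calls behind the two functions.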
But the architecture tells a story. When you need two separate models from two separate providers to produce one reliable answer, you’ve implicitly acknowledged that neither model alone clears the bar for enterprise use. That’s not a criticism — it’s an accurate read of where the technology sits. Anyone who’s deployed these systems seriously already knows it.
The questions that matter now are cost and latency. Running every query through two frontier models doesn’t come free. Microsoft can absorb that at scale, but the pattern will push upmarket — toward workflows where accuracy justifies the overhead, and away from casual productivity use cases where speed matters more.
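Back-of-envelope arithmetic makes the overhead concrete. The per-token prices below are purely illustrative, not real vendor rates; the point is that a verification pass roughly doubles token spend and adds a second full model round-trip to latency.

```python
# Hypothetical, illustrative prices -- not actual vendor pricing.
GEN_PRICE_PER_1K_TOKENS = 0.01  # generator model
VER_PRICE_PER_1K_TOKENS = 0.01  # verifier model


def query_cost(prompt_tokens: int, answer_tokens: int, verified: bool) -> float:
    """Cost of one query, optionally with a second-model verification pass."""
    gen = (prompt_tokens + answer_tokens) / 1000 * GEN_PRICE_PER_1K_TOKENS
    if not verified:
        return gen
    # The verifier re-reads the prompt plus the draft and emits a short verdict.
    ver = (prompt_tokens + answer_tokens + 50) / 1000 * VER_PRICE_PER_1K_TOKENS
    return gen + ver


single = query_cost(500, 500, verified=False)
double = query_cost(500, 500, verified=True)
print(f"{double / single:.2f}x")  # verification roughly doubles per-query cost
```

That multiplier is why the pattern fits high-stakes workflows better than casual ones: the accuracy gain has to be worth paying for every answer twice.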
The “just drop in GPT” crowd will keep doing what they’re doing. But enterprises choosing between AI vendors now have another axis to evaluate: not just which model, but what happens after the first answer.