LLMs hallucinate. This is a known property of the technology, not a bug that will be patched in the next release. The question for any system built on top of LLMs is: what do you do about it?
Our answer is the critic agent, a second model whose sole job is to review answers before they reach you.
What the critic checks
After the synthesis model produces an answer, the critic agent receives:
- The original question
- The draft answer
- All source passages that were retrieved
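The critic's input can be pictured as a simple structure. This is a sketch with hypothetical names (`Passage`, `CriticInput`), not our actual schema:

```python
from dataclasses import dataclass

@dataclass
class Passage:
    doc_id: str  # which document the passage was retrieved from
    text: str    # the retrieved text itself

@dataclass
class CriticInput:
    question: str            # the original user question
    draft_answer: str        # the synthesis model's draft
    passages: list[Passage]  # every passage retrieved for this query
```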
It runs three checks:
Faithfulness
For every factual claim in the answer, the critic verifies that the claim is supported by at least one cited passage. If a claim appears in the answer but cannot be traced to any source, the critic marks it as unsupported.
Unsupported claims trigger a re-retrieval loop: the system fetches additional passages and attempts to either verify the claim or remove it from the answer.
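The shape of that loop looks roughly like the sketch below. The word-overlap support check is a deliberately naive stand-in (our production critic is a fine-tuned verification model), and `retrieve_more` is a hypothetical hook for the re-retrieval step:

```python
def is_supported(claim: str, passages: list[str]) -> bool:
    """Naive support check: a claim counts as supported if most of its
    content words appear in a single passage. A real system would use
    a trained verification model here, not word overlap."""
    words = {w.lower() for w in claim.split() if len(w) > 3}
    if not words:
        return True
    return any(
        len(words & {w.lower() for w in p.split()}) / len(words) >= 0.6
        for p in passages
    )

def check_faithfulness(claims, passages, retrieve_more):
    """Return (kept, dropped) claims after one re-retrieval round."""
    kept, dropped = [], []
    for claim in claims:
        if is_supported(claim, passages):
            kept.append(claim)
            continue
        # Re-retrieval loop: fetch additional passages for this claim,
        # then either verify it or remove it from the answer.
        extra = retrieve_more(claim)
        (kept if is_supported(claim, passages + extra) else dropped).append(claim)
    return kept, dropped
```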
Completeness
The critic checks whether all parts of the original question have been addressed. For decomposed questions (multi-hop queries), it verifies that each sub-question has a corresponding answer.
If a sub-question is unanswered, usually because retrieval found no relevant passages, the critic adds a note to the answer: “No relevant information found in your knowledge base for: [sub-question].”
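In sketch form, the completeness check reduces to appending that note for every unanswered sub-question (the function name and signature here are illustrative, not our actual code):

```python
NOTE = "No relevant information found in your knowledge base for: {}"

def check_completeness(sub_questions: list[str], answered: set[str], answer: str) -> str:
    """Append an explicit note for every sub-question retrieval could
    not answer, rather than silently omitting it from the answer."""
    notes = [NOTE.format(q) for q in sub_questions if q not in answered]
    return answer if not notes else answer + "\n" + "\n".join(notes)
```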
Contradiction detection
Sometimes two source passages say different things about the same fact. This happens frequently with contracts that have been amended, or policies that have been updated without the old version being deleted.
The critic checks for inter-source contradictions. When it finds one, instead of silently choosing the more recent or higher-confidence passage, it surfaces both in the answer with their respective sources, flagging the contradiction explicitly.
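One way to picture contradiction detection: if claims have been extracted as (fact, value, source) triples, a fact whose sources report more than one distinct value is flagged, and every conflicting pair is kept so both can be shown. This is a simplified sketch, assuming an upstream extraction step the snippet doesn't show:

```python
from collections import defaultdict

def find_contradictions(extracted):
    """extracted: (fact_key, value, source) triples pulled from the
    retrieved passages, e.g. ("termination_notice", "30 days", "contract_v1").
    Returns each fact whose sources disagree, with all conflicting
    (value, source) pairs so both can be surfaced in the answer."""
    by_key = defaultdict(list)
    for key, value, source in extracted:
        by_key[key].append((value, source))
    return {
        key: pairs
        for key, pairs in by_key.items()
        if len({v for v, _ in pairs}) > 1  # more than one distinct value
    }
```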
When the critic disagrees
If the critic finds significant issues (more than one unsupported claim, or a completeness score below a threshold), it rejects the draft answer and triggers a new retrieval-synthesis cycle with modified queries.
In practice, first-pass rejection happens on about 12% of queries. Of those, the second pass resolves the issues ~90% of the time. About 1% of queries end up returning a “low confidence” answer with explicit caveats rather than a confident synthesised response.
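The accept/retry/fall-back control flow can be sketched as follows. The threshold values and the two callback signatures are assumptions for illustration, not our actual configuration:

```python
MAX_UNSUPPORTED = 1    # assumed threshold: at most one unsupported claim
MIN_COMPLETENESS = 0.8 # assumed completeness threshold
MAX_PASSES = 2         # one retry with modified queries, then give up

def review_loop(run_pipeline, critique):
    """run_pipeline(pass_no) -> draft answer (later passes use modified
    queries); critique(draft) -> (n_unsupported, completeness_score).
    Accepts the draft, retries once, or falls back to a low-confidence
    answer with explicit caveats."""
    for pass_no in range(1, MAX_PASSES + 1):
        draft = run_pipeline(pass_no)
        n_unsupported, completeness = critique(draft)
        if n_unsupported <= MAX_UNSUPPORTED and completeness >= MIN_COMPLETENESS:
            return draft, "ok"
    return draft, "low_confidence"
```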
We think this is the right trade-off. A system that tells you it’s uncertain is more useful than one that confidently gives you the wrong answer.
Why a separate model
We could run the critic checks as a second pass using the same synthesis model. We don’t, for two reasons.
First, the synthesis model has a systematic bias toward the answer it just produced. Asking it to critique its own output is less reliable than asking a separate model with no attachment to the draft.
Second, the critic needs to be fast. We run a smaller, fine-tuned verification model specifically for the three checks above. It adds ~200ms to end-to-end latency, a worthwhile trade for the reliability improvement.
Mandatory citations
Every claim in a RenBase answer is linked to a specific passage in a specific document. This is not optional. The synthesis model is instructed to produce no claims without citations, and the critic will reject answers that include uncited content.
This means you can always trace an answer back to its source and verify that the source actually says what the answer claims.
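Mechanically, the critic's side of this rule is a simple scan for uncited sentences. The bracketed `[n]` citation marker here is an assumed format for the sake of the sketch:

```python
import re

CITATION = re.compile(r"\[\d+\]")  # assumed inline citation style, e.g. "[3]"

def uncited_sentences(answer: str) -> list[str]:
    """Return the sentences in a draft answer that carry no citation
    marker; the critic rejects the draft if this list is non-empty."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [s for s in sentences if not CITATION.search(s)]
```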