Guide · CS · 14 min read

Reply Judge — what to gate, what to ship.

How the multi-dimension gates work (policy grounding, factual grounding, brand voice), what to always route to a human, and how to check for drift monthly.

Reply Judge scores every drafted support reply on separate dimensions and holds any draft that fails one of them for a human. It's deliberately not a single fidelity dial: a draft has to clear each gate independently, so an on-brand tone can't smuggle a wrong policy claim through.

The gates, per dimension

Policy grounding (≥ 0.7): every policy claim must trace to the merchant's actual policy text.
Factual grounding (≥ 0.7): every stated fact must trace to the order and signal data.
Brand-voice similarity (≥ 0.6): cosine similarity against the anchor built from your sent, human-approved replies.

A draft that falls below the threshold on any one dimension is held, however well it scores on the others. The hold names the failing dimension and the fix.
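A minimal sketch of that gate logic, assuming each dimension already arrives as a score in [0, 1]. The thresholds are the ones listed above; everything else is hypothetical: the names `GATES`, `FIXES`, `judge`, and `anchor_centroid`, the fix wording, and the assumption that the voice anchor is the mean embedding of approved replies. How the grounding scores are produced, and which embedding model feeds the voice score, isn't covered here.

```python
import math
from dataclasses import dataclass, field

# Per-dimension thresholds from the guide. Each is checked on its own;
# there is deliberately no blended score.
GATES = {
    "policy_grounding": 0.7,
    "factual_grounding": 0.7,
    "brand_voice": 0.6,
}

# Hypothetical fix hints attached to a hold (illustrative wording only).
FIXES = {
    "policy_grounding": "trace each policy claim to the merchant's policy text",
    "factual_grounding": "trace each stated fact to the order and signal data",
    "brand_voice": "rewrite closer to your approved replies",
}


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def anchor_centroid(approved_embeddings):
    """Hypothetical anchor: element-wise mean of approved-reply embeddings."""
    n = len(approved_embeddings)
    return [sum(column) / n for column in zip(*approved_embeddings)]


@dataclass
class Verdict:
    send: bool
    # (dimension, score, threshold, fix) for every gate the draft failed
    holds: list = field(default_factory=list)


def judge(scores):
    """Hold the draft if ANY dimension scores below its own threshold."""
    holds = [
        (dim, scores[dim], threshold, FIXES[dim])
        for dim, threshold in GATES.items()
        if scores[dim] < threshold
    ]
    # Largest shortfall first, so the hold leads with the weakest gate.
    holds.sort(key=lambda hold: hold[1] - hold[2])
    return Verdict(send=not holds, holds=holds)


# Usage: on-brand and factually fine, but with a wrong policy claim.
anchor = anchor_centroid([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]])
scores = {
    "policy_grounding": 0.55,
    "factual_grounding": 0.92,
    "brand_voice": cosine_similarity([0.7, 0.3, 0.1], anchor),
}
verdict = judge(scores)
print(verdict.send)          # False
print(verdict.holds[0][0])   # policy_grounding
```

The usage case is the one the per-dimension design exists for: the plain average of the three scores is above 0.7, so a single dial would have sent this draft, but the policy gate alone holds it.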

What to always gate to a human

Anything that reads as emotional, grief-adjacent, or an escalation — regardless of score.
First contact from a high-LTV customer.
Replies that would commit to a non-standard refund or exception.
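These rules sit in front of the scores rather than beside them. A minimal sketch, assuming each condition has already been detected upstream; the `DraftContext` fields, the `HIGH_LTV_THRESHOLD` cutoff, and the function names are hypothetical, since the guide doesn't say how emotion, escalation, or LTV are determined.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff: the guide doesn't define "high-LTV", so set your own.
HIGH_LTV_THRESHOLD = 1000.0


@dataclass
class DraftContext:
    # Hypothetical flags, assumed to be detected upstream.
    reads_emotional: bool             # emotional, grief-adjacent, or an escalation
    is_first_contact: bool
    customer_ltv: float
    commits_nonstandard_refund: bool  # non-standard refund or exception


def always_gate_reason(ctx: DraftContext) -> Optional[str]:
    """Why this draft must go to a human regardless of score, or None."""
    if ctx.reads_emotional:
        return "emotional, grief-adjacent, or escalation"
    if ctx.is_first_contact and ctx.customer_ltv >= HIGH_LTV_THRESHOLD:
        return "first contact from a high-LTV customer"
    if ctx.commits_nonstandard_refund:
        return "commits to a non-standard refund or exception"
    return None


def route(ctx: DraftContext, passed_all_gates: bool) -> str:
    """Hard rules run first; the dimension gates only decide what's left."""
    reason = always_gate_reason(ctx)
    if reason is not None:
        return f"human: {reason}"
    return "send" if passed_all_gates else "human: failed a dimension gate"


# Usage: a high-LTV customer's first message is held even with perfect scores.
vip_first = DraftContext(reads_emotional=False, is_first_contact=True,
                         customer_ltv=4200.0, commits_nonstandard_refund=False)
print(route(vip_first, passed_all_gates=True))
# human: first contact from a high-LTV customer
```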

Drift-detect monthly

Voice drifts as your brand and team evolve. Once a month, review the replies the voice gate held: if most are genuinely off-voice, your anchor is healthy; if most are false positives, re-index the anchor on recent approved replies so the voice gate measures against who you sound like now.
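One way to turn the monthly review into a rule, sketched under assumptions: each voice-gate hold gets a reviewer label (real catch or false positive), "most" means strictly more than half, and re-indexing keeps approved replies from a trailing window. The `HeldReview` type, the 30-day review period, and the 90-day `window_days` default are hypothetical choices, not documented settings. Rebuilding the anchor embedding from the returned replies isn't shown.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class HeldReview:
    held_on: date
    genuinely_off_voice: bool  # reviewer verdict: True = real catch, False = noise


def false_positive_rate(reviews):
    """Share of reviewed holds the reviewer judged to be false positives."""
    if not reviews:
        return 0.0
    noise = sum(1 for r in reviews if not r.genuinely_off_voice)
    return noise / len(reviews)


def monthly_drift_check(reviews, approved, today, window_days=90):
    """Return replies to re-index the anchor on, or None if the anchor is healthy.

    reviews:  HeldReview items for voice-gate holds
    approved: (sent_on, reply_text) pairs of sent, human-approved replies
    """
    this_month = [r for r in reviews if today - r.held_on <= timedelta(days=30)]
    # "Most" read as strictly more than half; an even split counts as healthy.
    if false_positive_rate(this_month) <= 0.5:
        return None
    cutoff = today - timedelta(days=window_days)
    return [text for sent_on, text in approved if sent_on >= cutoff]


# Usage: three of four June holds were noise, so re-index on recent replies.
today = date(2024, 6, 30)
reviews = [
    HeldReview(date(2024, 6, 3), False),
    HeldReview(date(2024, 6, 9), False),
    HeldReview(date(2024, 6, 15), True),
    HeldReview(date(2024, 6, 21), False),
]
approved = [(date(2024, 1, 10), "old reply"), (date(2024, 6, 1), "recent reply")]
print(monthly_drift_check(reviews, approved, today))  # ['recent reply']
```

Logging `false_positive_rate` each month also gives a running view of whether held replies are real catches or noise.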

The metric to watch isn't auto-send rate — it's whether your held replies are real catches or noise.
— Reply Judge field notes


Want a guide written about your store?

We'll ghost-write the 'how we shipped Phase 2 in 14 days' case study for any operator who switches on Phase 2 within their first month. Your data, your prose, our editorial bar.

Book a 20-min demo
