Reply Judge scores every drafted support reply against your brand-voice anchor and holds anything below the floor for a human. The whole game is choosing that floor: set it too low and off-voice replies ship; set it too high and humans drown in false positives.
Start at 0.78, move with evidence
0.78 is a sensible default for most stores. Raise it to 0.85 during a launch, a PR-sensitive moment, or for any category where tone is the product (wellness, luxury, anything emotionally loaded).
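As a rough illustration of that policy, here is a minimal Python sketch. The helper names (`pick_floor`, `should_hold`), the category labels, and the `launch_or_pr_window` flag are all hypothetical; they are not Reply Judge's actual settings or API, only one way to write the rule down.

```python
# Hypothetical sketch: these names are illustrative assumptions,
# not Reply Judge's real configuration surface.
DEFAULT_FLOOR = 0.78  # sensible default for most stores
STRICT_FLOOR = 0.85   # launches, PR-sensitive moments, tone-critical categories

# Example categories where tone is the product (taken from the text above).
TONE_SENSITIVE = {"wellness", "luxury"}


def pick_floor(category: str, launch_or_pr_window: bool = False) -> float:
    """Return the score floor below which a reply is held for a human."""
    if launch_or_pr_window or category.lower() in TONE_SENSITIVE:
        return STRICT_FLOOR
    return DEFAULT_FLOOR


def should_hold(score: float, floor: float) -> bool:
    """Hold anything that scores below the floor."""
    return score < floor
```

Under these assumptions, a reply scoring 0.80 would auto-send for an ordinary category but be held for a wellness store, because the stricter 0.85 floor applies there.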
What to always gate to a human
- Anything that reads as emotional, grief-adjacent, or an escalation — regardless of score.
- First contact from a high-LTV customer.
- Replies that would commit to a non-standard refund or exception.
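The key property of these gates is that they run before the score is consulted. A minimal sketch, assuming a hypothetical `Draft` record whose flags are set by some upstream step (how those flags get detected is outside the scope of this example):

```python
from dataclasses import dataclass


@dataclass
class Draft:
    """Hypothetical draft record; field names are assumptions for illustration."""
    score: float
    emotional_or_escalation: bool = False  # emotional, grief-adjacent, or escalating
    first_contact_high_ltv: bool = False   # first contact from a high-LTV customer
    nonstandard_commitment: bool = False   # non-standard refund or exception


def route(draft: Draft, floor: float = 0.78) -> str:
    """Return "human" or "auto". Hard gates are checked before the score."""
    if (
        draft.emotional_or_escalation
        or draft.first_contact_high_ltv
        or draft.nonstandard_commitment
    ):
        return "human"  # gated regardless of score
    return "human" if draft.score < floor else "auto"
```

With this ordering, a draft scoring 0.95 that is flagged as an escalation still goes to a human, while an unflagged draft at 0.90 auto-sends.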
Drift-detect monthly
Voice drifts as your brand and team evolve. Once a month, review the held replies: if most are genuinely off-voice, your anchor is healthy; if most are false positives, re-index the anchor on recent approved replies and consider loosening the floor.
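The monthly review can be reduced to a simple tally once a human has labeled each held reply. The sketch below is an assumption-laden illustration: `review_held` is a hypothetical helper, and reading "most" as "more than half" is this example's interpretation, not a documented threshold.

```python
def review_held(labels: list[bool]) -> str:
    """Summarize a month of held replies.

    labels[i] is True if held reply i was genuinely off-voice (a real catch)
    and False if it was a false positive. Labels come from human review.
    """
    if not labels:
        return "no held replies to review"

    catch_rate = sum(labels) / len(labels)

    # "Most" is interpreted here as strictly more than half (an assumption).
    if catch_rate > 0.5:
        return "anchor healthy"
    if catch_rate < 0.5:
        return "re-index anchor on recent approved replies; consider loosening floor"
    return "inconclusive; review a larger sample"
```

For example, three real catches out of four held replies would report a healthy anchor, while one out of four would suggest re-indexing.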
The metric to watch isn't auto-send rate; it's whether your held replies are real catches or noise.
— Reply Judge field notes
