// cs specialist
Eight skills. One inbox. Order Detective grounds every reply in the real order; Policy Grounder cites your real policies; Reply Judge scores draft quality before send. Refunds, replacements, address changes, gift cards — executed inside your trust threshold.
from : klant@protonmail.com
subject : Bestelling 88421 niet aangekomen
lang : nl-NL (in allowed list ✓)
order_match : o_88421 (cosine 0.98)
product : Brushed Linen Throw — Sand
ship_status : in transit, NL (carrier ping 6h ago)
supplier : on-time batch, no flags
policy_cite : cs_policy_chunks#214
"EU shipping ETA — 5–8 business days"
draft : "Hoi! Goed nieuws — je throw is
onderweg en zou tussen vrijdag en
maandag aankomen. Hier is de track-link…"
judge_score : 0.91 → APPROVED → sent
cycle_time : 11s · 1 LLM call · €0.004
// eight skills
Heuristic + embedding match from inbound message → exact Shopify order. Resolves typos, partial names, hashed identifiers, multilingual subject lines.
// output
order_match
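A minimal sketch of how a heuristic-then-embedding match could be structured, assuming a regex pass for explicit order numbers and a cosine-similarity fallback over precomputed order embeddings. The function names, the `o_` id prefix rule, the 0.80 floor, and the toy vectors are illustrative assumptions, not the product's actual implementation.

```python
import math
import re
from typing import Optional

# Assumption: order numbers appear as runs of 4+ digits in the subject line.
ORDER_NUMBER = re.compile(r"\b(\d{4,})\b")


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_order(
    subject: str,
    message_vec: list[float],
    orders: dict[str, list[float]],
    floor: float = 0.80,  # hypothetical minimum similarity for a match
) -> Optional[tuple[str, float]]:
    """Return (order_id, score), or None when nothing clears the floor."""
    # Heuristic pass: an explicit order number that exists wins outright.
    for number in ORDER_NUMBER.findall(subject):
        candidate = f"o_{number}"
        if candidate in orders:
            return candidate, 1.0

    # Embedding fallback: pick the most similar known order.
    best_id, best_score = None, 0.0
    for order_id, vec in orders.items():
        score = cosine(message_vec, vec)
        if score > best_score:
            best_id, best_score = order_id, score

    if best_id is not None and best_score >= floor:
        return best_id, best_score
    return None
```

Running the cheap heuristic first means the embedding comparison is only needed when the message carries no usable identifier.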
Pulls the product context — variant, fulfillment status, shipment trail, supplier — so the reply is grounded in what the customer actually bought.
// output
context_payload
Cross-checks supplier ETA, batch quality flags, and lane delay history. If the supplier is the cause, the reply says so honestly, not generically.
// output
supplier_signal
pgvector RAG over cs_policy_chunks — your shipping, refund, returns, warranty pages. Every reply cites the chunk that justifies its decision.
// output
policy_citation
Language-gated by languages_allowed_for_send. Brand-voice embeddings hold tone. Drafts only — until Reply Judge clears them for send.
// output
cs_outbound_drafts
Expected-value math: refund cost vs replacement cost vs escalation risk vs lifetime value. The cheapest right answer, never the easiest.
// output
ev_recommendation
issue_refund · issue_replacement · cancel_order · change_shipping_address · issue_gift_card. Idempotent. Rate-limited. Every action a row.
// output
shopify_mutation
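One way to get "idempotent" and "every action a row" together is to derive a key from the action and its arguments and record each executed action in an append-only ledger. The sketch below assumes exactly that; `MutationLedger`, its methods, and the `execute` callback are hypothetical names, no real Shopify API call is made, and rate limiting is left out for brevity.

```python
import hashlib
import json
from typing import Callable

ALLOWED_ACTIONS = {
    "issue_refund",
    "issue_replacement",
    "cancel_order",
    "change_shipping_address",
    "issue_gift_card",
}


class MutationLedger:
    """Append-only record of executed actions, keyed for idempotent replay."""

    def __init__(self) -> None:
        self.rows: list[dict] = []          # one row per executed action
        self._by_key: dict[str, dict] = {}  # idempotency key -> row

    @staticmethod
    def key(action: str, order_id: str, payload: dict) -> str:
        # Same action + order + payload always hashes to the same key.
        raw = json.dumps([action, order_id, payload], sort_keys=True)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def apply(
        self,
        action: str,
        order_id: str,
        payload: dict,
        execute: Callable[[str, str, dict], str],
    ) -> dict:
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        k = self.key(action, order_id, payload)
        if k in self._by_key:
            # Replay: return the recorded row instead of executing again.
            return self._by_key[k]
        result = execute(action, order_id, payload)
        row = {"key": k, "action": action, "order_id": order_id,
               "payload": payload, "result": result}
        self.rows.append(row)
        self._by_key[k] = row
        return row
```

A retried request with the same arguments returns the stored row, so a network retry cannot refund the same order twice.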
When confidence < threshold or risk class = high, hand off cleanly to a human inbox with the full reasoning trail attached — never a cold ticket.
// output
human_handoff
// reply judge
Does the draft only promise things your published policy actually allows?
e.g. Promised return window matches the cited cs_policy_chunk.
Does every claim trace back to the order data, supplier signal, or catalog?
e.g. Tracking ETA cited from the actual carrier event, not invented.
Does the draft sound like you, scored against your brand-voice embeddings?
e.g. Tone within 0.15 cosine distance of brand_voice_centroid.
Is the request inside the trust window, or does it need a human?
e.g. Refund >$500 → auto-escalate regardless of draft quality.
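The four checks can be sketched as a small scoring function. This version assumes the overall score is the plain mean of the policy, fact, and brand-voice sub-scores, and that the risk check acts as a hard gate. The weighting is not stated on this page (a plain mean does reproduce the 0.91 in the sample trace), and the 0.85 approval bar plus the `ESCALATE` and `HOLD` verdict names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JudgeInput:
    policy: float               # does the draft stay inside published policy?
    fact: float                 # do claims trace back to order/supplier data?
    brand_voice: float          # closeness to the brand-voice centroid
    refund_amount: float = 0.0  # monetary exposure of the requested action


def judge(
    draft: JudgeInput,
    approve_at: float = 0.85,       # hypothetical approval bar
    refund_ceiling: float = 500.0,  # from the ">500 auto-escalate" example
) -> tuple[float, str]:
    """Return (score, verdict) for a draft."""
    score = round((draft.policy + draft.fact + draft.brand_voice) / 3, 2)

    # Risk is a gate, not a score: a large refund escalates
    # regardless of how good the draft is.
    if draft.refund_amount > refund_ceiling:
        return score, "ESCALATE"

    verdict = "APPROVE_SEND" if score >= approve_at else "HOLD"
    return score, verdict
```

Keeping risk outside the average means a polished draft cannot buy its way past the trust window.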
thread_id : t_4912
draft_id : d_1180
language : nl-NL (allowed ✓)
judge_score : 0.91
├─ policy : 0.94
├─ fact : 0.92
├─ brand_voice : 0.88
└─ risk : low (refund €18.50)
policy_cite : cs_policy_chunks#214
"EU returns — 30d, prepaid label"
fact_cite : order_payload#o_88421
tracking: 'in transit, NL'
verdict : APPROVE_SEND
sent_at : 2026-05-25T11:02:18Z
// brand voice embeddings
Drag in 20–50 of your best CS replies — historic emails work fine. We strip identifiers and embed the prose.
20–50 examples
reply_drafter conditions on the brand-voice centroid and never freestyles tone. Even a new policy sounds like you.
drafter conditioning
Reply Judge scores cosine distance to the centroid. If a draft drifts above 0.15, the send is held and you review the variance.
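A minimal sketch of the centroid check, assuming embeddings are plain lists of floats and the centroid is the element-wise mean of the example embeddings. The embedding model is not specified here, and the helper names are illustrative.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of the brand-voice example embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; 0.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def voice_check(
    draft_vec: list[float],
    brand_centroid: list[float],
    max_distance: float = 0.15,  # threshold quoted in this section
) -> tuple[float, bool]:
    """Return (distance, within_threshold) for a draft embedding."""
    distance = cosine_distance(draft_vec, brand_centroid)
    return distance, distance <= max_distance
```

A draft whose check returns `False` would be held for review rather than sent.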
cosine 0.15 threshold
// refund_ev
// example scenario
Customer received the wrong color throw. €89 order. LTV €642 (high decile). Supplier swap-batch flag = true (root cause acknowledged).
Magistry picks gift_card 120% — positive EV, retains the relationship, supplier swap-batch covers the cost.
Top-decile customers get more flex; first-order customers get the policy default.
If the batch is the root cause, the EV credits the supplier-recovery line.
Public review risk, repeat-complaint history, dispute history — all priced in.
Magistry never resolves above the per-action ceiling without an escalation row.
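The expected-value comparison can be sketched as below. Only the €89 order value, the €642 LTV, and the 120% gift card come from the scenario above; the retention probabilities, the EV formula itself, and the €150 ceiling are hypothetical placeholders chosen to illustrate the mechanics.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    cost: float                     # direct cost of the resolution, EUR
    retain_prob: float              # hypothetical chance the customer stays
    supplier_recovery: float = 0.0  # amount recoverable from the supplier


def expected_value(option: Option, ltv: float) -> float:
    """Assumed model: retained lifetime value minus net resolution cost."""
    return option.retain_prob * ltv - option.cost + option.supplier_recovery


def recommend(options: list[Option], ltv: float, ceiling: float) -> tuple[str, bool]:
    """Return (best option name, whether it needs an escalation row)."""
    best = max(options, key=lambda o: expected_value(o, ltv))
    # Anything above the per-action ceiling is never resolved silently.
    return best.name, best.cost > ceiling


ORDER_VALUE = 89.0
LTV = 642.0
GIFT_CARD = ORDER_VALUE * 1.2  # the 120% gift card from the scenario

options = [
    Option("issue_refund", cost=ORDER_VALUE, retain_prob=0.5),
    Option("issue_replacement", cost=ORDER_VALUE, retain_prob=0.6),
    # Supplier swap-batch flag is true, so the supplier covers the cost.
    Option("gift_card", cost=GIFT_CARD, retain_prob=0.8,
           supplier_recovery=GIFT_CARD),
]

choice, needs_escalation = recommend(options, LTV, ceiling=150.0)
```

Under these assumed inputs the gift card ranks first and stays under the ceiling. Changing the retention estimates or removing the supplier recovery can change the ranking.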
// proactive outbound
trigger : fulfillment_stall (>96h carrier silence)
customer : jane.k@***.com (LTV €312)
order : o_88421 — 'Brushed Linen Throw'
draft_subj : "Quick update on your order #88421"
draft_body : Hi Jane — your throw shipped on the 20th
and the carrier hasn't pinged us in 4 days,
which is unusual. We've already opened a
trace with the courier and will refund the
full amount on the 28th if it hasn't moved.
No action needed on your side.
judge_score : 0.93 (policy ✓ fact ✓ voice 0.91)
status : PENDING_OPERATOR_APPROVAL
auto_send : 2026-05-25T18:00Z (if no override)
trust_mode : assisted (proactive sends require approval)
// cs specialist
Connect your inbox via OAuth, paste your policy pages, and drop in 20 brand-voice examples. First dry-run drafts arrive within the day, ready for you to approve or watch run.
Language-gated · Brand-voice scored · Append-only