// cs specialist

Handle inbound, end-to-end.

Eight skills. One inbox. Order Detective grounds every reply in the real order; Policy Grounder cites your real policies; Reply Judge scores draft quality before send. Refunds, replacements, address changes, gift cards — executed inside your trust threshold.

IMAP / SMTP · Shopify Orders · Reply Judge · Proactive Outbound

thread · t_4912
from        : klant@protonmail.com
subject     : Bestelling 88421 niet aangekomen
lang        : nl-NL (in allowed list ✓)

order_match : o_88421 (cosine 0.98)
product     : Brushed Linen Throw — Sand
ship_status : in transit, NL (carrier ping 6h ago)
supplier    : on-time batch, no flags

policy_cite : cs_policy_chunks#214
              "EU shipping ETA — 5–8 business days"

draft       : "Hoi! Goed nieuws — je throw is
              onderweg en zou tussen vrijdag en
              maandag aankomen. Hier is de track-link…"

judge_score : 0.91 → APPROVED → sent
cycle_time  : 11s · 1 LLM call · €0.004

// eight skills

Every reply has a reasoning trail.

Eight specialist skills compose every reply, every refund, every escalation. Each skill is independently observable in cs_thread_evaluations — you can read why the agent thought it knew the answer, not just what it said.
01

order_detective

Heuristic + embedding match from inbound message → exact Shopify order. Resolves typos, partial names, hashed identifiers, multilingual subject lines.

// output

order_match
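
A minimal sketch of how a heuristic pass and an embedding pass might be layered. Everything here is an assumption for illustration: the `Order` type, the `match_order` helper, the `min_cosine` cutoff, and the toy two-dimensional vectors are hypothetical and are not Magistry's actual implementation.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str           # internal id, e.g. "o_88421"
    number: str             # customer-facing order number
    embedding: list[float]  # hypothetical precomputed embedding of the order text


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_order(subject, message_embedding, orders, min_cosine=0.8):
    """Return (order_id, score), or (None, score) when nothing is close enough."""
    # Heuristic pass: a run of digits in the subject that equals an order number.
    for digits in re.findall(r"\d{4,}", subject):
        for order in orders:
            if order.number == digits:
                return order.order_id, 1.0
    # Embedding pass: nearest order by cosine similarity (assumed fallback).
    if not orders:
        return None, 0.0
    best = max(orders, key=lambda o: cosine(message_embedding, o.embedding))
    score = cosine(message_embedding, best.embedding)
    return (best.order_id, score) if score >= min_cosine else (None, score)
```

A below-cutoff result returns no order id, which is the kind of case that would route to escalate rather than to a guess.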

02

catalog_xref

Pulls the product context — variant, fulfillment status, shipment trail, supplier — so the reply is grounded in what the customer actually bought.

// output

context_payload

03

supplier_reality

Cross-checks supplier ETA, batch quality flags, and lane delay history. If the supplier is the cause, the reply says so honestly, not generically.

// output

supplier_signal

04

policy_grounder

pgvector RAG over cs_policy_chunks — your shipping, refund, returns, warranty pages. Every reply cites the chunk that justifies its decision.

// output

policy_citation
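
For readers curious what a pgvector lookup over cs_policy_chunks could look like, here is a hedged sketch that only builds the query and its parameters. The column names `id`, `content`, and `embedding`, and the helper name `build_policy_query`, are assumptions; the `<=>` operator is pgvector's cosine-distance operator.

```python
def build_policy_query(query_embedding, k=3):
    """Build a parameterised nearest-neighbour query for cs_policy_chunks.

    Column names are hypothetical. The embedding is passed as a pgvector
    text literal such as "[0.1,0.25]" and cast with ::vector.
    """
    vector_literal = "[" + ",".join(repr(float(x)) for x in query_embedding) + "]"
    sql = (
        "SELECT id, content, embedding <=> %s::vector AS distance "
        "FROM cs_policy_chunks "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    # The literal appears twice: once for the selected distance, once for ordering.
    return sql, (vector_literal, vector_literal, k)
```

The returned pair can be handed to any PostgreSQL driver that uses `%s` placeholders. The top row's `id` is what a reply would cite, for example `cs_policy_chunks#214`.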

05

reply_drafter

Language-gated by languages_allowed_for_send. Brand-voice embeddings hold tone. Drafts only — until Reply Judge clears them for send.

// output

cs_outbound_drafts
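
The language gate can be pictured as a simple membership check. This sketch assumes `languages_allowed_for_send` is a list of BCP 47 tags and compares them case-insensitively; the function name `is_send_allowed` is hypothetical.

```python
def is_send_allowed(detected_lang, languages_allowed_for_send):
    """True if the detected language tag is on the allowed list.

    Language tags are compared case-insensitively ("nl-nl" equals "nl-NL").
    """
    allowed = {tag.lower() for tag in languages_allowed_for_send}
    return detected_lang.lower() in allowed
```

With an allowed list of `["en-GB", "nl-NL"]`, a thread detected as `nl-NL` passes the gate and one detected as `de-DE` does not.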

06

refund_ev

Expected-value math: refund cost vs replacement cost vs escalation risk vs lifetime value. The cheapest right answer, never the easiest.

// output

ev_recommendation

07

action_executor

issue_refund · issue_replacement · cancel_order · change_shipping_address · issue_gift_card. Idempotent. Rate-limited. Every action a row.

// output

shopify_mutation
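
"Idempotent. Rate-limited. Every action a row." can be sketched in a few lines. This is an in-memory illustration under stated assumptions: the `ActionExecutor` class, the idempotency-key scheme, and the per-minute limit are hypothetical, and no Shopify call is made.

```python
import time


class ActionExecutor:
    ALLOWED = {
        "issue_refund", "issue_replacement", "cancel_order",
        "change_shipping_address", "issue_gift_card",
    }

    def __init__(self, max_per_minute=30, clock=time.monotonic):
        self.rows = []        # append-only action log: every action a row
        self._by_key = {}     # idempotency key -> previously written row
        self._recent = []     # timestamps of actions in the last 60 seconds
        self.max_per_minute = max_per_minute
        self._clock = clock

    def execute(self, action, order_id, idempotency_key):
        if action not in self.ALLOWED:
            raise ValueError(f"unknown action: {action}")
        # Idempotent: a repeated key returns the original row, no new mutation.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        # Rate-limited: sliding 60-second window.
        now = self._clock()
        self._recent = [t for t in self._recent if now - t < 60]
        if len(self._recent) >= self.max_per_minute:
            raise RuntimeError("rate limit exceeded")
        self._recent.append(now)
        row = {"action": action, "order_id": order_id,
               "key": idempotency_key, "status": "executed"}
        self.rows.append(row)
        self._by_key[idempotency_key] = row
        return row
```

Retrying the same refund with the same key returns the first row instead of refunding twice, which is the property that makes retries safe.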

08

escalate

When confidence < threshold or risk class = high, hand off cleanly to a human inbox with the full reasoning trail attached — never a cold ticket.

// output

human_handoff

// reply judge

A second pass before any reply ships.

Reply Judge is a second model that scores every draft on policy adherence, factual grounding, and brand-voice alignment, and checks the request against your risk limits. Below the score floor, the draft is bounced back to the drafter. Above the kill ceiling, the thread escalates to a human instead of sending a reply.

policy

Does the draft only promise things your published policy actually allows?

e.g. Promised return window matches the cited cs_policy_chunk.

fact

Does every claim trace back to the order data, supplier signal, or catalog?

e.g. Tracking ETA cited from the actual carrier event, not invented.

brand_voice

Does the draft sound like you, scored against your brand-voice embeddings?

e.g. Tone within 0.15 cosine distance of brand_voice_centroid.

risk

Is the request inside the trust window, or does it need a human?

e.g. Refund >$500 → auto-escalate regardless of draft quality.

cs_thread_evaluations
thread_id   : t_4912
draft_id    : d_1180
language    : nl-NL (allowed ✓)

judge_score : 0.91
  ├─ policy      : 0.94
  ├─ fact        : 0.92
  ├─ brand_voice : 0.88
  └─ risk        : low (refund €18.50)

policy_cite : cs_policy_chunks#214
              "EU returns — 30d, prepaid label"
fact_cite   : order_payload#o_88421
              tracking: 'in transit, NL'

verdict     : APPROVE_SEND
sent_at     : 2026-05-25T11:02:18Z
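
One way to read the example row as decision logic is sketched below. The aggregation is an assumption: an unweighted mean of the three subscores happens to round to the 0.91 shown, but the page does not state how judge_score is computed. The `score_floor` value and the verdict names other than APPROVE_SEND are also hypothetical; the 500 refund ceiling comes from the risk example.

```python
def judge_verdict(policy, fact, brand_voice, refund_amount,
                  score_floor=0.85, refund_ceiling=500.0):
    """Return (judge_score, verdict) for a draft.

    Assumed rules, in order:
      1. A refund above the ceiling escalates regardless of draft quality.
      2. A score below the floor bounces the draft back to the drafter.
      3. Otherwise the draft is approved for send.
    """
    score = round((policy + fact + brand_voice) / 3, 2)  # assumed aggregation
    if refund_amount > refund_ceiling:
        return score, "ESCALATE"
    if score < score_floor:
        return score, "BOUNCE_TO_DRAFTER"
    return score, "APPROVE_SEND"
```

Checking risk before quality mirrors the "regardless of draft quality" wording: a perfectly written reply still escalates when the amount is outside the trust window.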

// brand voice embeddings

Tone is a tensor, not a prompt.

Upload 20–50 examples of replies you would write, and Magistry generates cs_brand_voice_embeddings, a vector representation of your voice. The drafter steers toward it, the judge measures distance from it, and you can hire new operators against it.

Seed it once

Drag in 20–50 of your best CS replies — historic emails work fine. We strip identifiers and embed the prose.

20–50 examples

Steer every draft

reply_drafter conditions on the brand-voice centroid, never freestyles tone. Even a new policy sounds like you.

drafter conditioning

Measure every send

Reply Judge scores each draft's cosine distance to the centroid. If the distance exceeds 0.15, the send is held and you review the variance.

cosine 0.15 threshold
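
The centroid-and-distance check can be sketched as follows. It assumes the centroid is the plain mean of the example embeddings and that cosine distance is defined as 1 minus cosine similarity; the function names are hypothetical, and the 0.15 threshold is the one quoted above.

```python
import math


def centroid(vectors):
    """Element-wise mean of the example embeddings (assumed centroid definition)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; treats a zero-length vector as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def should_hold(draft_embedding, voice_centroid, max_distance=0.15):
    """Hold the send when the draft drifts further than max_distance."""
    return cosine_distance(draft_embedding, voice_centroid) > max_distance
```

A draft whose embedding points the same way as the centroid has a distance near 0 and is sent. A draft pointing elsewhere exceeds 0.15 and is held for review.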

// refund_ev

The math behind every yes.

refund_ev computes the expected value of every possible action — refund, replace, gift card, escalate, deny — given the order, the customer's lifetime value, the supplier reality, and the policy. Picks the cheapest right answer.

// example scenario

Customer received the wrong color throw. €89 order. LTV €642 (high decile). Supplier swap-batch flag = true (root cause acknowledged).

  • issue_refund        −€89.00  (loss + lost LTV: −€71)
  • issue_replacement   −€34.20  (supplier covers, retains LTV)
  • gift_card 120%      +€18.40  (retention model: +0.22 repurchase)
  • escalate            −€42.80  (handle time + churn risk)
  • deny               −€612.00  (policy breach, public review risk)

Magistry picks gift_card 120% — positive EV, retains the relationship, supplier swap-batch covers the cost.
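
The selection step in this scenario reduces to taking the action with the highest expected value. The sketch below uses the five figures from the example. How each figure is derived is not shown on this page, so the dictionary is simply the scenario's numbers, and `pick_action` and the key `gift_card_120` are hypothetical names.

```python
# Expected value in euros per action, copied from the example scenario.
SCENARIO_EV = {
    "issue_refund": -89.00,
    "issue_replacement": -34.20,
    "gift_card_120": 18.40,
    "escalate": -42.80,
    "deny": -612.00,
}


def pick_action(ev_by_action):
    """Return the action with the highest expected value."""
    return max(ev_by_action, key=ev_by_action.get)


print(pick_action(SCENARIO_EV))  # gift_card_120
```

A production version would also need the per-action ceiling described under "Bounded", which this sketch leaves out.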

LTV-aware

Top-decile customers get more flex; first-order customers get the policy default.

Supplier-aware

If the batch is the root cause, the EV credits the supplier-recovery line.

Risk-aware

Public review risk, repeat-complaint history, dispute history — all priced in.

Bounded

Magistry never resolves above the per-action ceiling without an escalation row.

// proactive outbound

Replies before the customer asks.

When fulfillment stalls, a batch ships late, or a payment fails, Magistry drafts the outbound message before the customer writes in. The draft sits in cs_outbound_drafts until your trust mode says send.

cs_outbound_drafts · row #d_4810
trigger     : fulfillment_stall (>96h carrier silence)
customer    : jane.k@***.com (LTV €312)
order       : o_88421 — 'Brushed Linen Throw'

draft_subj  : "Quick update on your order #88421"
draft_body  : Hi Jane — your throw shipped on the 20th
              and the carrier hasn't pinged us in 4 days,
              which is unusual. We've already opened a
              trace with the courier and will refund the
              full amount on the 28th if it hasn't moved.
              No action needed on your side.

judge_score : 0.93 (policy ✓ fact ✓ voice 0.91)
status      : PENDING_OPERATOR_APPROVAL
auto_send   : 2026-05-25T18:00Z (if no override)

trust_mode  : assisted (proactive sends require approval)
Switch to autonomous trust mode in /workspace/cs/settings
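
The trigger and the trust-mode gate in the row above can be sketched like this. The 96-hour window and the PENDING_OPERATOR_APPROVAL status come from the example. The function names and the `QUEUED_FOR_SEND` status for autonomous mode are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

STALL_AFTER = timedelta(hours=96)  # ">96h carrier silence" from the example row


def is_fulfillment_stalled(last_carrier_ping, now):
    """True when the carrier has been silent for longer than the stall window."""
    return now - last_carrier_ping > STALL_AFTER


def draft_status(trust_mode):
    """Map trust mode to the initial status of a proactive draft.

    Any mode other than "autonomous" defaults to requiring approval.
    "QUEUED_FOR_SEND" is a hypothetical status name.
    """
    if trust_mode == "autonomous":
        return "QUEUED_FOR_SEND"
    return "PENDING_OPERATOR_APPROVAL"


now = datetime(2026, 5, 25, 12, 0, tzinfo=timezone.utc)
last_ping = now - timedelta(hours=97)
if is_fulfillment_stalled(last_ping, now):
    print(draft_status("assisted"))  # PENDING_OPERATOR_APPROVAL
```

Defaulting unknown modes to approval keeps a misconfigured workspace from sending proactively by accident.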

// cs specialist

Reply faster than the customer can finish typing.

Connect your inbox via OAuth, paste your policy pages, and drop in 20 brand-voice examples. The first dry-run drafts arrive within the day, ready for you to approve or to watch run.

Language-gated · Brand-voice scored · Append-only