TL;DR
AI lets teams ship outputs faster than they can agree on what “good” means. The result: lots of activity, little predictable quality. The fix isn’t stricter rules or longer reviews, but short, repeatable calibration processes: define exemplars, run fast comparison exercises, publish local rules, and measure acceptance rates. This edition gives a founder/PM playbook (templates, a 30/60/90 plan, rubrics, and the three metrics that actually prove you raised the bar).
A quick note
Amazon just announced 30,000 layoffs. UPS cut 48,000. Both cited AI as a key driver, and it’s not stopping there.
If you were recently affected and want to pivot into an AI-related role, I want to help. I’m putting together a small effort to connect people with resources, mentors, and companies hiring for AI-skilled roles.
I’ve been in the AI space for 8+ years, worked with ML systems at Meta, founded an AI education non-profit that reached 70,000 people, and now run an AI testing platform where I see firsthand how companies are implementing AI and reshaping their approach to business.
If that sounds useful, you can fill out the form below. I’ll share what I learn as I help people navigate this shift.
The problem in one sentence
You can ship 10x faster with AI, but if your team lacks a shared sense of quality, you’ll ship 10x broken things faster.
Why “quality blindness” exploded with AI
A few short mechanisms explain it:
Volume without vetting. AI multiplies outputs; review capacity didn’t scale.
No exemplar culture. People have never seen great, consistent examples of this new AI-enabled work.
Shallow heuristics. Teams use “looks good” or “sounds right” instead of precise success signals.
Divergent thresholds. Senior engineers, PMs, and customer-facing teams have different invisible standards.
Diffusion of ownership. When everyone edits, nobody defines “ship-ready.”
The result is an org that feels busier but is less reliable.
The core insight
Quality is social — not purely technical. You can’t code your way to taste. You have to grow it, explicitly and fast.
The 5-step calibration loop (the shortest path to shared taste)
Do this weekly for any output class you care about (docs, PRs, marketing copy, model responses).
Collect (2 examples) — Gather two recent outputs: one “good-ish” and one “bad-ish.”
Compare (10 mins) — Three people score both against the rubric. Keep it fast.
Extract (1 rule) — Agree on one concrete rule (“always cite source X for claims about customers”; “no more than 2 API calls per response”).
Document (30 seconds) — Add the rule to a one-line team guide + exemplar snippet.
Enforce (next week) — Require the rule be checked in the next 5 PRs/outputs; measure compliance.
Repeat. Small rhythms compound.
A founder-friendly 30 / 60 / 90 plan
0–30 days — make taste explicit
Run 4 calibration sessions (one per core function: product, support, marketing, data). Use the 5-step loop.
Create an exemplar library (one-paragraph examples of “good” outputs for each function). Store these where people work.
Publish “1-rule” team guides (one sentence + 1 exemplar). No more than 3 per team.
30–60 days — instrument & coach
Measure acceptance rate: % of outputs accepted without rewrite on first review. Baseline it.
Start micro-coaching: 10-minute pairing sessions for those with low acceptance rates.
Run cross-team calibration: pick one output class (e.g., user-facing summaries) and align product + support + marketing.
60–90 days — scale & anchor
Embed exemplars in PR templates and ticket forms (auto-suggest the relevant exemplar; see the sketch after this list).
Make calibration part of onboarding: new hires must review 5 exemplars and pass a quick rubric check.
Report weekly: publish acceptance rate + one new rule in the company digest.
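The auto-suggest step doesn’t need anything fancy. Here’s a minimal Python sketch, assuming exemplars live in a small dictionary keyed by output class with a few trigger keywords each; the exemplar text, keywords, and function name are illustrative, not a prescribed implementation:

```python
# Minimal exemplar auto-suggest sketch. The exemplar text, keywords, and
# structure below are illustrative, not real team guides.
EXEMPLARS = {
    "support_summary": {
        "keywords": {"customer", "ticket", "refund"},
        "exemplar": "Include the customer quote + one concrete next step; max 3 bullets.",
    },
    "product_spec": {
        "keywords": {"spec", "feature", "launch"},
        "exemplar": "Include a 2-line success metric and one failing example.",
    },
}

def suggest_exemplar(draft: str) -> str | None:
    """Return the exemplar whose trigger keywords best overlap the draft, if any."""
    words = set(draft.lower().split())
    best = max(EXEMPLARS.values(), key=lambda e: len(e["keywords"] & words))
    return best["exemplar"] if best["keywords"] & words else None

# Example: a ticket form or PR template could call this and pre-fill the suggestion.
print(suggest_exemplar("Draft spec for the new refund feature launch"))
# -> "Include a 2-line success metric and one failing example."
```

Even a crude keyword match like this is enough to surface the right exemplar where people are already writing; swap in your own storage and matching once the exemplar library stabilizes.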
Two practical templates (copy-paste)
Calibration session agenda (15 minutes)
0:00 — pick 2 examples (team lead)
0:02 — silent score (rubric below)
0:05 — rapid compare (each person gives 30s rationale)
0:11 — agree on one rule or change (owner assigned)
0:14 — record exemplar + rule in team guide
0:15 — finished
One-line team guide (example)
Support summaries: Always include the customer quote + one concrete next step; don’t exceed 3 bullets. (Exemplar linked)
Simple rubrics that scale
Free-text output rubric (0–3)
0 — Unsafe / unusable
1 — Useful with major edits
2 — Good, small edits only
3 — Ready to ship
Structured output rubric (0–4)
0 — Missing required fields
1 — Partial, many errors
2 — Complete but inconsistent with examples
3 — Matches examples, minor polish needed
4 — Exemplary, publishable
Use the rubric during calibration and require a score in PRs/tickets.
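If your PRs run through CI, the “require a score” step can be automated in a few lines. A minimal sketch, assuming the convention is a line like “Rubric: 3” in the PR description and that your CI exposes the description in a PR_BODY environment variable (both the convention and the variable name are assumptions, not a standard):

```python
import os
import re
import sys

# Assumed convention: the PR description contains a line like "Rubric: 3".
# PR_BODY is an assumed environment variable your CI job would populate
# with the PR description; neither is a standard.
body = os.environ.get("PR_BODY", "")
match = re.search(r"^Rubric:\s*([0-4])\s*$", body, re.MULTILINE)

if not match:
    sys.exit("No rubric score found. Add a line like 'Rubric: 2' to the PR description.")

score = int(match.group(1))
print(f"Rubric score recorded: {score}")
if score < 2:
    # Below 2 means 'major edits needed' on either rubric; block the merge.
    sys.exit(f"Rubric score {score} is below the ship threshold (2).")
```

The threshold of 2 is illustrative; pick whichever rubric level your team treats as ship-ready.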
The three metrics that prove you’ve raised the bar
Pick these and publish them weekly.
Acceptance rate (first-pass): % of outputs accepted without edits. If it’s going up, taste is synchronizing.
Rewrite burden: average number of edits per accepted item. Lower is better.
Exemplar usage: % of outputs that reference or match a published exemplar. If usage rises, adoption is happening.
Benchmark: aim for a first-pass acceptance rate above 60% within 90 days for targeted workflows.
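If reviews are logged anywhere (a spreadsheet export, a ticket dump), all three metrics reduce to simple ratios. A minimal Python sketch, assuming a hypothetical review record with accepted, edit_count, and matched_exemplar fields (the field names and record shape are assumptions):

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Review:
    accepted: bool          # output eventually shipped
    edit_count: int         # edits made before acceptance (0 = first-pass accept)
    matched_exemplar: bool  # referenced or matched a published exemplar

def weekly_metrics(reviews: list[Review]) -> dict[str, float]:
    """Reduce one week's review log to the three calibration metrics."""
    total = len(reviews)
    accepted = [r for r in reviews if r.accepted]
    if not total:
        return {"acceptance_rate": 0.0, "rewrite_burden": 0.0, "exemplar_usage": 0.0}
    return {
        # first-pass acceptance: accepted with zero edits
        "acceptance_rate": sum(r.accepted and r.edit_count == 0 for r in reviews) / total,
        # average edits per accepted item (lower is better)
        "rewrite_burden": mean(r.edit_count for r in accepted) if accepted else 0.0,
        # share of outputs that referenced a published exemplar
        "exemplar_usage": sum(r.matched_exemplar for r in reviews) / total,
    }

# Example: three reviews this week
log = [Review(True, 0, True), Review(True, 2, False), Review(False, 4, False)]
print(weekly_metrics(log))
# acceptance_rate ~0.33, rewrite_burden 1.0, exemplar_usage ~0.33
```

Publishing the output of something like this once a week is all the instrumentation most teams need at first.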
Quick examples — what this looks like in practice
Product spec alignment
Problem: PMs write specs that engineers interpret very differently.
Fix: calibration session with 3 cross-functional teammates. Outcome: new rule — “Every spec must include a 2-line success metric and one failing example.” Result: fewer clarification threads and faster implementation.
AI-generated customer replies
Problem: an AI-drafted reply is fluent but legally risky.
Fix: add rule — “All refunds and commitments must include an approval code from policy doc X.” A pairing session trains reps to spot risky phrasing. Result: no compliance incidents in 60 days.
Common traps & how to avoid them
Trap: “We’ll solve this with more QA.” → QA delays feedback. Prefer fast calibration + exemplar publication.
Trap: “Leadership sets taste.” → Top-down rules don’t stick. Let teams co-author exemplars.
Trap: “We’ll automate exemplars later.” → Capture live human judgment now; automation can follow.
Taste is social muscle — it grows from practice, not policy memos.
One-page playbook for leaders (do this Monday)
Schedule four 15-minute calibration sessions this week for different teams.
Collect two examples per team (1 good, 1 bad).
Run the sessions using the agenda above. Publish one-line guides.
Add a rubric score to PRs/tickets for the next 30 days.
Report acceptance rate and one exemplar in next Monday’s update.
Do this once — it will change what people consider acceptable almost immediately.
🔚 Final note
Speed without a shared sense of “good” is chaos disguised as progress. The companies that win in the AI era won’t be the fastest at producing outputs — they’ll be the best at agreeing what matters and teaching their teams to meet that standard, fast. Build exemplars. Run tiny calibration loops. Measure acceptance. Repeat.
👉 If you found this issue useful, share it with a teammate or founder navigating AI adoption.
And subscribe to AI Ready for weekly lessons on how leaders are making AI real at scale.
Until next time,
Haroon
