TL;DR
AI lets teams ship outputs faster than they can agree on what “good” means. The result: lots of activity, little predictable quality. The fix isn’t stricter rules or longer reviews, but short, repeatable calibration loops: define exemplars, run fast comparison exercises, publish local rules, and measure acceptance rates. This edition gives a founder/PM playbook (templates, a 30/60/90 plan, rubrics, and the three metrics that actually prove you raised the bar).

A quick note

Amazon just announced 30,000 layoffs. UPS cut 48,000. Both cited AI as a key driver, and it’s not stopping there.

If you were recently affected and want to pivot into an AI-related role, I want to help. I’m putting together a small effort to connect people with resources, mentors, and companies hiring for AI-skilled roles.

I’ve been in the AI space for 8+ years, worked with ML systems at Meta, founded an AI education non-profit that reached 70,000 people, and now run an AI testing platform where I see firsthand how companies are implementing AI and reshaping their approach to business.

If that sounds useful, you can fill out the form below. I’ll share what I learn as I help people navigate this shift.

The problem in one sentence

You can ship 10x faster with AI, but if your team lacks a shared sense of quality, you’ll just ship broken things 10x faster.

Why “quality blindness” exploded with AI

A few simple mechanisms explain it:

  • Volume without vetting. AI multiplies outputs; review capacity didn’t scale.

  • No exemplar culture. People have never seen great, consistent examples of new AI-enabled work.

  • Shallow heuristics. Teams use “looks good” or “sounds right” instead of precise success signals.

  • Divergent thresholds. Senior engineers, PMs, and customer-facing teams have different invisible standards.

  • Diffusion of ownership. When everyone edits, nobody defines “ship-ready.”

The result is an org that feels busier but is less reliable.

The core insight

Quality is social — not purely technical. You can’t code your way to taste. You have to grow it, explicitly and fast.

The 5-step calibration loop (the shortest path to shared taste)

Do this weekly for any output class you care about (docs, PRs, marketing copy, model responses).

  1. Collect (2 examples) — Gather two recent outputs: one “good-ish” and one “bad-ish.”

  2. Compare (10 mins) — Three people rate both against your current standard, using the rubrics below. Keep it fast.

  3. Extract (1 rule) — Agree on one concrete rule (“always cite source X for claims about customers”; “no more than 2 API calls per response”).

  4. Document (30 seconds) — Add the rule to a one-line team guide + exemplar snippet.

  5. Enforce (next week) — Require the rule be checked in the next 5 PRs/outputs; measure compliance.

Repeat. Small rhythms compound.

A founder-friendly 30 / 60 / 90 plan

0–30 days — make taste explicit

  • Run 4 calibration sessions (one per core function: product, support, marketing, data). Use the 5-step loop.

  • Create an exemplar library (one-paragraph examples of “good” outputs for each function). Store these where people work.

  • Publish “1-rule” team guides (one sentence + 1 exemplar). No more than 3 per team.

30–60 days — instrument & coach

  • Measure acceptance rate: % of outputs accepted without rewrite on first review. Baseline it.

  • Start micro-coaching: 10-minute pairing sessions for those with low acceptance rates.

  • Run cross-team calibration: pick one output class (e.g., user-facing summaries) and align product + support + marketing.

60–90 days — scale & anchor

  • Embed exemplars in PR templates and ticket forms (auto-suggest relevant exemplar).

  • Make calibration part of onboarding: new hires must review 5 exemplars and pass a quick rubric check.

  • Report weekly: publish acceptance rate + one new rule in the company digest.

Two practical templates (copy-paste)

Calibration session agenda (15 minutes)

0:00 — pick 2 examples (team lead)
0:02 — silent score (rubric below)
0:05 — rapid compare (each person gives 30s rationale)
0:11 — agree on one rule or change (owner assigned)
0:14 — record exemplar + rule in team guide
0:15 — finished

One-line team guide (example)

Support summaries: Always include the customer quote + one concrete next step; don’t exceed 3 bullets. (Exemplar linked)

Simple rubrics that scale

Free-text output rubric (0–3)

0 — Unsafe / unusable

1 — Useful with major edits

2 — Good, small edits only

3 — Ready to ship

Structured output rubric (0–4)

0 — Missing required fields

1 — Partial, many errors

2 — Complete but inconsistent with examples

3 — Matches examples, minor polish needed

4 — Exemplary, publishable

Use the rubric during calibration and require a score in PRs/tickets.
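
If you want “require a score” to be more than a request, a lightweight automated check helps. Below is a minimal sketch in Python, assuming a purely illustrative convention where every PR or ticket description carries a line like “Rubric: 2”; the pattern, field name, and ship threshold are assumptions to adapt to your own tooling.

```python
import re

# Hypothetical convention: every PR/ticket description includes a line like "Rubric: 2".
# The ([0-4]) range covers both rubrics above (0-3 free-text, 0-4 structured).
RUBRIC_PATTERN = re.compile(r"^Rubric:\s*([0-4])\s*$", re.MULTILINE)


def extract_rubric_score(description: str) -> int | None:
    """Return the rubric score declared in a PR/ticket description, or None if missing."""
    match = RUBRIC_PATTERN.search(description)
    return int(match.group(1)) if match else None


def check_description(description: str, ship_threshold: int = 2) -> bool:
    """Fail the check when no score is declared or the score is below the ship threshold."""
    score = extract_rubric_score(description)
    if score is None:
        print("Missing rubric score (add a line like 'Rubric: 2').")
        return False
    if score < ship_threshold:
        print(f"Rubric score {score} is below the ship threshold of {ship_threshold}.")
        return False
    return True


if __name__ == "__main__":
    example = "Adds AI-drafted support summary.\nRubric: 3\nExemplar: support-summary-v2"
    print(check_description(example))  # True
```

Run it wherever you already run checks; the point is that a missing score blocks the item the same way a failing test would.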

The three metrics that prove you’ve raised the bar

Pick these and publish them weekly; a short calculation sketch follows the list.

  1. Acceptance rate (first-pass): % of outputs accepted without rewrite on first review. If it’s rising, taste is synchronizing.

  2. Rewrite burden: average number of edits per accepted item (first-pass accepts count as zero). Lower is better.

  3. Exemplar usage: % of outputs that reference or match a published exemplar. If usage rises, adoption is happening.

Benchmark: aim for a first-pass acceptance rate above 60% within 90 days on the workflows you target.
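
To make the arithmetic concrete, here is a minimal sketch of how the three metrics fall out of simple weekly review records. The record fields (accepted_first_pass, edit_count, matched_exemplar) are illustrative assumptions, not a prescribed schema; the math is just the definitions above.

```python
from dataclasses import dataclass


@dataclass
class ReviewRecord:
    """One reviewed output (PR, support reply, spec, ...). Field names are illustrative."""
    accepted_first_pass: bool  # accepted without rewrite on first review
    edit_count: int            # edits made before the item finally shipped
    matched_exemplar: bool     # referenced or matched a published exemplar


def weekly_metrics(records: list[ReviewRecord]) -> dict[str, float]:
    """Compute the three bar-raising metrics for one week of reviewed outputs."""
    total = len(records)
    if total == 0:
        return {"acceptance_rate": 0.0, "rewrite_burden": 0.0, "exemplar_usage": 0.0}
    return {
        # % accepted with no rewrite on first review (higher is better)
        "acceptance_rate": sum(r.accepted_first_pass for r in records) / total,
        # average edits per accepted item (lower is better)
        "rewrite_burden": sum(r.edit_count for r in records) / total,
        # % of outputs that reference or match a published exemplar (higher is better)
        "exemplar_usage": sum(r.matched_exemplar for r in records) / total,
    }


if __name__ == "__main__":
    week = [
        ReviewRecord(True, 0, True),
        ReviewRecord(False, 3, False),
        ReviewRecord(True, 0, True),
        ReviewRecord(False, 2, True),
    ]
    print(weekly_metrics(week))
    # {'acceptance_rate': 0.5, 'rewrite_burden': 1.25, 'exemplar_usage': 0.75}
```

Whatever you compute with, keep the definitions stable week to week so the trend, not the absolute number, is what you read.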

Quick examples — what this looks like in practice

Product spec alignment

Problem: PMs write specs that engineers interpret very differently.

Fix: calibration session with 3 cross-functional teammates. Outcome: new rule — “Every spec must include a 2-line success metric and one failing example.” Result: fewer clarification threads and faster implementation.

AI-generated customer replies

Problem: AI drafts replies that are fluent but legally risky.

Fix: add a rule — “All refunds and commitments must include an approval code from policy doc X.” A pairing session trains reps to spot risky phrasing. Result: no compliance incidents in 60 days.

Common traps & how to avoid them

  • Trap: “We’ll solve this with more QA.” → QA delays feedback. Prefer fast calibration + exemplar publication.

  • Trap: “Leadership sets taste.” → Top-down rules don’t stick. Let teams co-author exemplars.

  • Trap: “We’ll automate exemplars later.” → Capture live human judgment now; automation can follow.

Taste is social muscle — it grows from practice, not policy memos.

One-page playbook for leaders (do this Monday)

  1. Schedule four 15-minute calibration sessions this week for different teams.

  2. Collect two examples per team (1 good, 1 bad).

  3. Run the sessions using the agenda above. Publish one-line guides.

  4. Add a rubric score to PRs/tickets for the next 30 days.

  5. Report acceptance rate and one exemplar in next Monday’s update.

Do this once — it will change what people consider acceptable almost immediately.

🔚 Final note

Speed without a shared sense of “good” is chaos disguised as progress. The companies that win in the AI era won’t be the fastest at producing outputs — they’ll be the best at agreeing what matters and teaching their teams to meet that standard, fast. Build exemplars. Run tiny calibration loops. Measure acceptance. Repeat.

👉 If you found this issue useful, share it with a teammate or founder navigating AI adoption.

And subscribe to AI Ready for weekly lessons on how leaders are making AI real at scale.

Until next time,
Haroon
