The AI-Native
Developer Playbook
A spec-driven, AI-native way of building software, optimised for Kanban flow and compound engineering. Work moves through clearly defined stages — Requirements, PRD, Design, Code, Tests, Docs, Deployment, Operations, and Modernization — guided by reusable specs and AI agents at every step.
A field-ready guide for developers, architects, and product managers — moving from AI-assisted habits to a systematic, compound way of building software, built on the Plan → Work → Review → Compound loop.
Why This Playbook Exists
If you ship code with a copilot and use a chat model to draft tests or docs, you're already AI-assisted. That's the easy part. The harder, more valuable part — the one this playbook is about — is becoming AI-native: making AI a structural part of how you design, build, review, ship, and operate software, in a way that gets better with every loop instead of plateauing. The SAND Framework is the system that makes that happen.
You'll know the practice is working when…
Specs live in git
Your team's specifications are versioned alongside code, reviewed in PRs, and referenced in stories — not stashed in someone's notebook.
Stories direct agents
Every user story names its inputs, target artifacts, governing specs, and expected compound outcome — clear enough that an agent can act without a meeting.
Retros produce diffs
Every retro ends with a list of artifacts you wrote back into the system: spec updates, new patterns, regression tests, refined prompts.
The Shift, in Plain Terms
The move from AI-assisted to AI-native is mostly about replacing two old habits with two new ones. Hold these in your head as you read the rest of the playbook.
Where most teams are today
- Free-form prompts, written fresh per task
- AI output is a one-off draft you copy-paste
- Tests, docs, and diagrams are downstream chores
- What worked last sprint is in someone's head
- Reviews catch defects but rarely improve the next loop
Where you're going with SAND
- Versioned specs govern every agent invocation
- AI output is a reviewable artifact in the pipeline
- Tests, docs, and diagrams are first-class deliverables
- What worked is written back into specs and patterns
- Each loop leaves the system measurably better
The Three Terms You'll See Throughout
Spec
A versioned, reviewable document that constrains what an agent does. Lives in git. Examples: prd_spec, code_spec, qa_docs_spec.
Agent
A narrowly scoped AI worker invoked under a spec, producing reviewable artifacts (PRDs, code, tests, docs, diagrams). Logged, costed, replayable.
Compound
The step where you write learnings back into specs, patterns, agents, and tests so the next loop starts ahead of where this one ended.
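To make "logged, costed, replayable" concrete, here's a hypothetical run record for one agent invocation. Every field name and value is illustrative, not something the framework mandates:

```yaml
# Hypothetical agent run record; names and values are illustrative
run_id: "2025-03-02-build-agent-0042"
agent: build_agent
model_tier: tier-2                  # routed per "routing matches risk"
governing_specs: ["code_spec §7.3"]
story: "Add tenant-scoped audit log to feature flag service"
inputs: ["PRD §4.3", "ADR-014"]
outputs: ["code diff + unit tests (one reviewable PR)"]
cost_usd: 1.84
iterations: 2
```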
The Loop You'll Live In
Every unit of work — a story, a bugfix, a refactor, a release — runs the same four-step loop. Steps 1–3 deliver the change. Step 4 is what makes the system get better.
Plan
Frame the work. Name the artifact being transformed, the spec that governs it, and what success looks like.
Work
Agents produce the primary output under spec control. You contribute what they're bad at.
Review
Multi-agent critics and human reviewers assess against spec and constraints. Defects are fixed in place.
Compound
Write learnings back: spec updates, new patterns, new tests, refined prompts, documented anti-patterns.
What "Compound" Actually Looks Like
Compound is not a discussion or a retrospective sticky note. It produces concrete artifacts. After every story, you should be able to point at a diff against one of these:
Tangible compound deliverables
- A new section, example, or rule added to a spec
- A reusable snippet promoted into a shared library
- A new regression test added to the cross-team suite
- A refined prompt, system message, or agent definition
- A documented anti-pattern with explicit rationale
The honest exception
If a story ends with no diff against any of these, ask: was this loop genuinely identical to one we've run before? If yes, fine — but that should be rare. If you're skipping Compound every story, you're not compounding; you're just delivering.
Eight Operating Principles
These are the principles you cite in design reviews and PR threads. They're how disagreements get settled without re-litigating philosophy every time.
1. Specs over prompts
Prompts are tactical. Specs are versioned, reviewed, reused. If something matters, it goes in a spec.
2. Every loop compounds
Work isn't done until the Compound step has produced a concrete improvement to the system.
3. Humans decide. Agents produce.
Architectural choices and approvals stay with humans. Generation, refactoring, and repetitive work move to agents.
4. Reviewable diffs only
Agents produce changes small enough to review with clear blast radius. Large rewrites are decomposed.
5. Routing matches risk
Frontier models for ambiguous, high-risk, cross-artifact work. Cheaper models for routine, repetitive work.
6. Default to determinism
Where outputs can follow schemas or templates, prefer that over open-ended generation. A sketch of what this looks like follows this list.
7. Reuse beats regeneration
If a pattern, snippet, or spec exists, the agent must use it. Regenerating from scratch is a smell.
8. Cost is first-class
Cost per story and per stage are tracked alongside lead time and quality. Surprise bills are bugs.
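Principle 6 in practice: where the deliverable has a known shape, hand the agent a schema to fill instead of a blank page. A minimal sketch, assuming a hypothetical release-notes agent; the field names are illustrative, not part of any spec in this playbook:

```yaml
# Hypothetical output contract for a release-notes agent: the agent fills
# these fields rather than writing free-form prose.
output_schema:
  type: object
  required: [summary, changes, rollback_notes]
  properties:
    summary:
      type: string
      maxLength: 280
    changes:
      type: array
      items:
        type: object
        required: [pr, description, risk]
        properties:
          pr: { type: string }                  # e.g. "#1234"
          description: { type: string }
          risk: { enum: [low, medium, high] }
    rollback_notes:
      type: string
```

Validating the output against a schema turns "looks plausible" into "passes or fails", which is the point of the principle.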
Specs are the Source of Truth
If you take one habit from this playbook, take this one: write a spec, not a prompt. A prompt is a snowflake — it works for a moment and disappears. A spec is a contract — it lives in git, it's reviewed, it's versioned, it's reused, and it gets sharper every time someone uses it.
The Specs You'll Touch Most
| Spec | What It Encodes | Owned By | Used By |
|---|---|---|---|
prd_spec | Templates, domain language, NFR patterns, acceptance-criteria style | Product | PRD agent, consistency agent |
code_spec | Tech stack, architecture style, coding standards, security & observability norms | Architecture group + all engineers | Build agent, multi-agent reviewers |
qa_docs_spec | Test strategies, coverage rules, doc tone and structure | QA + Tech-writing | Verification agent, doc agent |
design_system_spec | Tokens, components, accessibility rules, microcopy guidelines | Design | Wireframe / microcopy agents |
deployment_spec | Pipeline templates, rollout strategies, risk classification rules | SRE | Pipeline agent, risk-classification agent |
ops_spec | Alerts, SLOs, runbooks, remediation policies | SRE | Monitoring & remediation agents |
modernization_spec | Refactor patterns, migration playbooks, deprecation policies | Architects | Modernization agent |
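One plausible way to make "lives in git" literal is a specs directory with owners wired into PR review. The paths and approver handles below are illustrative, not prescribed by the framework:

```yaml
# Illustrative only: where each spec might live and who must approve changes
specs:
  prd_spec:        { path: specs/prd_spec.md,        approvers: [product-leads] }
  code_spec:       { path: specs/code_spec.md,       approvers: [architecture-group, staff-engineers] }
  qa_docs_spec:    { path: specs/qa_docs_spec.md,    approvers: [qa-leads, tech-writing] }
  deployment_spec: { path: specs/deployment_spec.md, approvers: [sre] }
  ops_spec:        { path: specs/ops_spec.md,        approvers: [sre] }
review_policy:
  min_approvers_for_shared_specs: 2
```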
What a Real Spec Snippet Looks Like
This is a redacted slice of a code_spec — the section that governs how cache layers should be built for multi-tenant services. Notice it's not a blob of free-form prose; it's structured enough that an agent can actually use it.
```yaml
# Section 7.3 — Multi-tenant cache layers
applies_to: [service, library]
when: "component reads/writes cached data scoped to a tenant"
required_patterns:
  - name: tenant-keyed cache keys
    rule: "all keys MUST embed tenant_id as the first segment"
    example: "flag:{tenant_id}:{flag_key}"
  - name: bounded invalidation fan-out
    rule: "invalidation MUST NOT exceed N keys per call (N=1000)"
    rationale: "see incident I-2024-117 — unbounded fan-out caused regional outage"
  - name: audit on write
    rule: "every write emits an audit event with actor, tenant, before/after"
anti_patterns:
  - "global cache keys without tenant prefix (cross-tenant leak risk)"
  - "cache-aside without negative-result caching (thundering herd)"
  - "premature interfaces around the cache client (over-engineering)"
reusable_components:
  - "@platform/tenant-cache (preferred)"
  - "@platform/audit-emitter"
tests_required:
  - "isolation: tenant A cannot read tenant B's keys"
  - "invalidation bound: fan-out > N raises error before execution"
  - "audit completeness: every write produces matching audit event"
```
How to Write Your First Spec Section
1. Wait for the second instance
Don't speculate. The first time you do something, just do it. The second time, ask: is this a pattern? If yes, write it down.
2. Start with structure, not prose
Rules, examples, anti-patterns, required tests. Agents can act on lists; they can't act on essays.
3. Cite the incident or PR
Every rule has a reason. Link to it. "See incident I-2024-117" beats "this is important."
4. Get it reviewed
Specs go through PR review like code. Two approvers minimum for shared specs. Treat updates as first-class deliverables — tag them in your PR description.
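Applied to a brand-new pattern, those four steps can produce something much smaller than the Section 7.3 example above: one rule, one example, one anti-pattern, one required test. A sketch of a plausible first section; the pattern and wording are hypothetical:

```yaml
# A deliberately small first section; grow it when the third instance appears
# Section 1.1 - Idempotent event handlers (hypothetical)
applies_to: [service]
when: "component consumes events from a shared queue"
required_patterns:
  - name: de-duplicate on consume
    rule: "handlers MUST check event_id before producing side effects"
    example: "seen(event_id) lookup backed by a TTL store"
    rationale: "cite the incident or PR that made this a rule"
anti_patterns:
  - "relying on broker delivery settings instead of handler-side de-duplication"
tests_required:
  - "replaying the same event twice produces exactly one side effect"
```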
The Pipeline, Stage by Stage
The SAND Framework breaks every feature into nine stages. At each stage there's a spec, an artifact going in, and an artifact coming out. The agents change; the loop is the same. We'll trace a single feature — a tenant-scoped audit log for a feature flag service — through every stage so you can see the actual hand-offs.
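Before walking the stages, it helps to read each one as a small contract: governing spec, inputs, outputs, agents, and the human gate. The sketch below shows the Implementation stage in that shape; the field names are ours for illustration, not mandated by SAND:

```yaml
# Illustrative stage contract; field names are for illustration only
stage: Implementation
governing_specs: [code_spec]
input_artifacts: [approved PRD, ADRs, design artifacts]
output_artifacts: [reviewable code diffs, unit tests, contract tests]
agents: [build agent, multi-agent reviewers, test-scaffold agent]
human_gate: "an engineer reviews and approves each diff"
compound_hook: "corrected patterns are written back into code_spec"
```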
Discovery & Requirements
Capture business goals, user needs, constraints, and risks at a level of clarity sufficient to drive PRD generation.
You do
- Articulate goals, target users, success metrics
- Identify hard constraints (regulatory, integration, performance)
- Surface known risks and dependencies
Agents do
- Discovery agent produces a structured brief from notes
- Gap-analysis agent flags missing NFRs and edge cases
- Risk agent surfaces likely failure modes
Compound
- New question categories added to requirements_spec
- Domain glossary grows
- Missed patterns from human reviewers added to gap-analysis checklist
Product Requirements Document
Convert the brief into a structured, reviewable PRD that downstream stages can act on directly.
You do
- Author goal narrative, success metrics, prioritization
- Approve the structured PRD
Agents do
- PRD agent generates the structured PRD
- Consistency agent checks against organizational standards
- Traceability agent links sections to upstream and downstream artifacts
Compound
- Sections reviewers had to add by hand become new templates
- Ambiguous phrasings caught downstream get blacklisted
Design & Architecture
Decide the architectural shape, key patterns, and UX direction; produce design and architecture artifacts that constrain implementation.
You do
- Architects make decisions and write ADRs
- Designers own user journeys and IA
- Security architects do threat modeling
Agents do
- Architecture agent generates 2–3 candidates with trade-offs
- Diagram agent produces C4 views
- Design agent produces wireframes against the design system
- Threat-model agent drafts STRIDE analysis
Compound
- Chosen pattern becomes a reference architecture
- Rejected candidates with hot-spot risks become documented anti-patterns in architecture_spec
Implementation
Produce the code, IaC, and initial tests that realize the approved design.
You do
- Frame each unit of work; decompose into reviewable diffs
- Review agent-generated code critically
- Handle complex debugging, novel algorithms, performance work
Agents do
- Build agent generates diffs from PRD + design + code_spec + repo context
- Multi-agent review runs in parallel: security, performance, over-engineering, style
- Test-scaffold agent produces unit and contract tests
Compound
- Corrected patterns added to code_spec
- Anti-patterns documented with rationale
- Reusable code goes into shared platform libraries
Testing & Quality
Verify the implementation meets the PRD and NFRs; grow the regression net.
You do
- Design test strategy and risk-based coverage
- Identify edge cases agents miss
- Design what cannot be automated
Agents do
- Verification agent generates unit, integration, contract tests
- Property-based agent proposes invariants
- Flakiness detector and coverage agent run continuously
Compound
- Edge cases become templates for similar components
- Regression suite grows with every loop
- Property invariants become checklists for similar work
Documentation & Diagrams
Produce and maintain docs, diagrams, and operational artifacts.
You do
- Tech writers and architects review for tone, accuracy, audience fit
- SREs own runbooks
- Approve customer-facing docs
Agents do
- Doc agent generates API reference, READMEs, FAQs, changelogs
- Diagram agent regenerates C4 views from code
- Runbook agent drafts initial runbooks
- Customer-facing doc agent works against a voice-tuned spec
Compound
- Frequent FAQs are pulled into qa_docs_spec
- Confusing phrasings caught in support tickets become things-to-avoid in content_spec
Deployment & Release
Promote changes safely with appropriate gates and rollback paths.
You do
- Release managers approve promotion
- EMs own release decisions
- SREs own the deployment platform
Agents do
- Pipeline agent generates and maintains CI/CD, IaC, manifests
- Risk-classification agent assigns risk levels and recommends rollout
- Release-notes agent composes notes from PRs and ADRs
Compound
- Successful canary patterns become "safe templates"
- Metrics that should have been rollback triggers but weren't get added
Operations & Incidents
Keep production healthy, detect incidents early, convert every incident into durable improvement.
You do
- SREs own SLOs, on-call, postmortems
- Engineering teams own service health
- Incident commanders run major incidents
Agents do
- Monitoring agent correlates signals, surfaces anomalies
- Remediation agent proposes diagnoses and fixes
- Postmortem agent drafts timelines
Compound
- Every incident produces durable updates: alerts, SLOs, runbooks, tests
- Repeat-incident-class rate trends to zero
Continuous Modernization
Keep systems healthy and changeable: upgrade dependencies, remove dead code, refactor toward simpler designs.
You do
- Architects and tech leads decide scope and risk appetite
- EMs integrate modernization into the backlog
Agents do
- Modernization agent proposes incremental refactors
- Dependency-graph agent maintains the system view
- Migration-plan agent generates step-by-step plans with rollback paths
Compound
- Each refactor must leave the system simpler or test net stronger
- Migration playbooks become near-templated for the next service
AI-Ready User Stories
An AI-ready story gives an agent enough to start without a meeting. The familiar "As a... I want... so that..." stays. Four things get added: the input artifact, the target artifacts, the spec sections that govern the work, and the loop position.
A Real Story, in the Format
```yaml
# Add tenant-scoped audit log to feature flag service
user_narrative: |
  As a release manager, I want every flag change recorded with actor, tenant,
  before/after value, and timestamp, so that I can audit changes during incidents.
inputs:
  - PRD §4.3 (audit)
  - ADR-014 (audit-log architecture)
  - code_spec §7 (audit logging)
  - qa_docs_spec §3 (async event tests)
target_artifacts:
  - code change in flag-admin-service
  - contract test (audit-log API)
  - integration test (write path → audit emission)
  - API doc update + runbook update
acceptance_criteria:
  - Given a tenant admin updates a flag, when the update is committed, then an audit record is written within 100ms.
  - The record contains all required fields (actor, tenant, flag_key, before, after, timestamp).
  - Queries by tenant return only that tenant's records.
loop_position: Work + Review
compound_expectation: |
  If audit-emission helpers are reused, promote into @platform/audit-emitter
  and update code_spec §7.
cost_budget: "$8 / story (est.)"
```
The Discipline Behind the Format
Inputs are explicit
The agent shouldn't have to guess which spec applies. If you can't list the inputs, the story isn't ready.
Target artifacts are listed up front
Code is rarely the only deliverable. Tests, docs, runbook updates are part of "done."
Acceptance criteria are testable
"Given/When/Then" or similar. If a human can't write a test from it, neither can an agent.
Compound expectation is named
If you can predict what should be promoted into a spec or library, write it down. Otherwise flag the story as exploratory.
Cost budget sets a ceiling
If the agent burns through it, that's a signal to pause and re-plan, not to keep going.
Loop position is named
Tells reviewers what to look for. A "Plan" story is reviewed differently from a "Work" story.
Reviewing Agent Output
Reviewing agent-generated work is a different skill from reviewing human-generated work. Agents are confident, prolific, and locally consistent — which means defects are often plausible. Your job isn't to read every line. It's to ask the questions that catch what plausibility hides.
The Six Questions, in Order
Run these in order on every agent-produced PR. The order matters: cheap checks first.
1. Does the diff follow the specs cited in the story? Open the PR alongside those spec sections and walk through them. Is every required pattern actually applied? If a spec section is missing from the PR, ask why before reading further.
2. What's the blast radius? Data, security, public APIs, infra, or internal-only? Match review depth to blast radius. A pure docs PR doesn't need a 90-minute review. A change to the auth path does.
3. What's missing? Agents tend to omit the unglamorous: error paths, partial-failure handling, observability hooks, audit logging, edge cases on inputs. If you don't see them, ask explicitly.
4. Is it over-engineered? Look for suspiciously clean abstractions, premature interfaces, and novel patterns where boring ones would do. Over-engineering is the most common AI-generated defect. Push back hard.
5. What should this PR compound into? Before approving: what spec, library, or test should this PR feed? If nothing, why not? The Compound deliverable is part of the PR, not a follow-up.
6. Did the critics actually push back? Look at the multi-agent review output. If every critic returned "looks good," be suspicious; they may be too lenient. Tighten their prompts in the Compound step.
The Compound Step
The Compound step is where AI-native delivery diverges from "AI-assisted faster." It's also the step under the most pressure to be skipped. The story is shipped, the reviewer is satisfied, the next story is waiting. Twenty minutes spent updating a spec feels like a tax. It isn't. It's the principal.
End-of-Story Compound Check
- A pattern emerged that's likely to recur — added to the relevant spec.
- A reusable snippet was extracted into a shared library or module.
- A new edge case was caught — regression test added to the suite.
- An anti-pattern was rejected — documented in the spec's anti-patterns section with rationale.
- An agent prompt or system message was refined — change committed and noted.
- A spec gap was found — issue filed for the spec owner with a concrete proposal.
- An incident or near-miss occurred — postmortem entry made with monitoring/runbook updates.
- A cost surprise occurred — routing rule or batching strategy updated.
- Nothing applies — story is genuinely a repeat of one we've shipped before. (Be suspicious of this answer.)
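To keep the check honest and auditable, the outcome can be recorded in the PR itself. A hypothetical compound record; the format and field names are illustrative:

```yaml
# Hypothetical compound record pasted into the PR description
compound:
  story: "Add tenant-scoped audit log to feature flag service"
  deliverables:
    - type: spec_update
      target: "code_spec §7"
      change: "audit-emission helper promoted to @platform/audit-emitter; usage rule added"
    - type: regression_test
      target: cross-team suite
      change: "audit completeness test for flag writes"
  nothing_applies: false   # set true only for a genuine repeat, and say why
```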
Roles & What Changes for You
The shift looks different from each seat. Pick yours below; the others are useful too — knowing what your teammates are leaning into is half of working well together.
Developers
Your craft doesn't disappear — it concentrates. The judgment calls about decomposition, debugging, and design get more of your hours. The boilerplate gets less. The Compound step is where you make your team faster, not just yourself.
✓ Lean into
- Decomposing work into reviewable units
- Reviewing agent output critically
- Debugging hard, novel problems
- Shaping code_spec
- Mentoring & capturing patterns
- Performance and integration work
✕ Step away from
- Boilerplate and scaffolding
- Repetitive test writing
- Mechanical refactoring
- Manual changelog maintenance
- Hand-rolled doc updates
- Acting as a typist for the agent
Tech Leads
Your team's speed is now bounded by how well it runs the loop, not by how fast you review. Spend your hours on Plan and Compound. Make the rest of the team great at Work and Review.
✓ Lean into
- Architectural intent at story level
- Resolving cross-cutting questions
- Running Plan and Compound for the team
- Ensuring team work feeds shared specs
- Calibrating multi-agent review
✕ Step away from
- Reviewing every routine PR
- Manually maintaining team docs
- Re-explaining patterns in chat
- Status-collection meetings
Architects
The diagrams now generate themselves from code. What doesn't generate itself is the judgment encoded in architecture_spec and code_spec. That's where your hours go. You're the steward of the constraints under which everyone else's agents operate.
✓ Lean into
- Architecture decisions and ADRs
- Owning architecture_spec & code_spec
- Steering Compound across BUs
- Reviewing high-impact agent proposals
- Cross-team pattern promotion
✕ Step away from
- Drawing diagrams by hand
- One-off architecture documents that go stale
- Reviewing every routine PR
- Being the only person who knows the why
QA Engineers
Agents write the repetitive tests. You design the strategy: what level, what edge cases, what risks justify what coverage. Your most valuable artifact isn't a test suite — it's a richer qa_docs_spec that everyone's verification agent reads from.
✓ Lean into
- Test strategy and risk-based coverage
- Designing edge cases agents miss
- Owning qa_docs_spec
- Auditing the regression library
- Property-based testing design
✕ Step away from
- Writing repetitive unit/integration tests
- Maintaining fixtures and mocks by hand
- Manual regression sweeps
- Status reports the dashboard already shows
SREs
Every incident is now a Compound opportunity. The remediation agent does the rote work; you design what it does and what it doesn't, and you write incidents back into ops_spec so the same class doesn't recur.
✓ Lean into
- SLO design and incident command
- Postmortem authorship
- Owning ops_spec & deployment_spec
- Designing remediation policies
- Toil-reduction agent calibration
✕ Step away from
- Manual signal correlation
- Hand-writing every runbook
- Repetitive remediations
- Pipeline plumbing
Product Managers
Your PRD is no longer a document — it's an input to a pipeline. Quality goes up when the PRD is structured enough that the PRD agent and the consistency agent can do most of the drafting. Your hours move toward goal-setting, prioritization, and growing prd_spec.
✓ Lean into
- Goal articulation and success metrics
- Customer empathy and prioritization
- Owning prd_spec
- Structured requirements briefs
- Cross-team domain glossary
✕ Step away from
- Drafting boilerplate PRD sections
- Manual traceability matrices
- Re-typing the same NFR templates
- Status-collection meetings
Designers
Agents will produce wireframes and microcopy. The design system, the tokens, and design_system_spec are what make those outputs good. Your most leveraged work is in the system, not the screen.
✓ Lean into
- Owning the design system & tokens
- Encoding accessibility into specs
- Crafting content_spec for voice
- Reviewing and refining agent UI
- User-research synthesis & journeys
✕ Step away from
- Producing every wireframe by hand
- Re-writing similar microcopy from scratch
- Maintaining the design library manually
- One-off prototype builds
SDLC & Kanban Alignment
The SAND Framework can work with any SDLC approach — including Waterfall and various Agile frameworks. But it is principally aligned with Kanban. The small, reviewable, spec-governed increments that SAND produces are a natural fit with Kanban's core philosophy of continuous flow, limited WIP, and relentless cycle-time optimisation.
Why SAND and Kanban Are a Natural Fit
Limit Work in Progress
Each SAND stage is a discrete, bounded column. Stories can't proceed until their stage artifact is reviewed and accepted. Agents producing reviewable diffs — not massive rewrites — keep each card genuinely small and completable within WIP limits.
Faster Cycle Time
Agents compress the Work phase. Specs eliminate the planning ramp-up on repeat patterns. The Compound step means the second similar story starts faster than the first. Cycle time doesn't just stay flat — it actively trends down.
Optimise for Flow
Blockers in traditional Kanban often come from waiting for humans to draft things. SAND moves that wait time to the agent, which is non-blocking. Human review is focused and fast because reviewable diffs have clear blast radius.
Continuous Improvement
Kanban requires you to make the process visible and improve it. The Compound step is the structural mechanism: every loop writes its improvement back into specs, libraries, and tests — exactly what a Kanban retrospective should produce.
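If your tooling supports board-as-configuration, the stage columns and WIP limits can be written down rather than implied. A sketch; the column set and limits are placeholders, not recommendations:

```yaml
# Illustrative board definition: SAND stages as columns; limits are placeholders
columns:
  - { name: Requirements,   wip_limit: 3 }
  - { name: PRD,            wip_limit: 3 }
  - { name: Design,         wip_limit: 2 }
  - { name: Implementation, wip_limit: 4 }
  - { name: Review,         wip_limit: 3 }   # see the WIP-overflow anti-pattern
  - { name: Testing,        wip_limit: 3 }
  - { name: Docs,           wip_limit: 2 }
  - { name: Deployment,     wip_limit: 2 }
blocked_card_sla_hours: 48                    # matches the definition-of-done WIP item
```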
Framework Compatibility Overview
🎯 Kanban
SAND's primary alignment. Small increments, WIP limits, flow optimisation, and the Compound step map directly to Kanban principles. The stagewise pipeline is a natural Kanban board layout.
⚡ Scrum / Agile Sprints
Works well. Map stages to sprint ceremonies. The sprint cadence replaces Kanban's continuous flow — compound deliverables happen at sprint retro. WIP limits require explicit enforcement.
🏗️ Waterfall / Stage-Gate
Compatible at the stage level — each SAND stage aligns with a waterfall phase. Compound is harder to enforce at pace. Large batch sizes reduce the benefit of agent-generated reviewable diffs.
Selective Model Usage — Routing AI to the Right Work
Not all work is equal, and not all AI models are equal in price or capability. Cost optimisation is a first-class SAND principle. Here's the routing logic.
Complex & Novel Work
- Ambiguous requirements needing deep reasoning
- First-iteration architecture on greenfield problems
- Cross-artifact traceability (PRD ↔ code ↔ tests)
- High-blast-radius security or performance reviews
- Novel algorithm or domain-specific logic generation
- Postmortem root-cause analysis
Standard Development Work
- PRD generation from a structured brief
- 2nd and 3rd iteration on established patterns
- Diagram generation from code
- Test generation for known component types
- Routine PR review against existing spec rules
- API documentation from OpenAPI schema
Routine & Repetitive Tasks
- Changelog generation from commit messages
- Boilerplate code from templates
- Formatting and linting correction
- Simple unit test scaffolding (4th+ iteration)
- FAQ generation from support tickets
- Translation/localisation of known strings
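As a sketch, the three tiers above can live as a routing table instead of per-story judgment calls. Tier labels are placeholders; map them to whatever models your organisation has approved:

```yaml
# Illustrative routing rules; tier labels are placeholders
routing:
  frontier:
    use_for:
      - ambiguous requirements needing deep reasoning
      - first-iteration architecture on greenfield problems
      - high-blast-radius security or performance reviews
  standard:
    use_for:
      - PRD generation from a structured brief
      - test generation for known component types
      - routine PR review against existing spec rules
  light:
    use_for:
      - changelog generation from commit messages
      - boilerplate code from templates
      - formatting and linting correction
  default: standard
  escalate_when: "the story is flagged high-risk or the agent reports low confidence"
```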
When to Let AI Iterate — and When to Hand Off to a Human
AI iteration is powerful but not infinite. The quality of AI output typically follows a curve: significant gains on early iterations, diminishing returns by iteration 3–4, and potential quality erosion after that. Know when to stop the loop.
| Iteration | Owner | Typical Focus | Signal to Proceed / Escalate |
|---|---|---|---|
| 1st iteration | AI (Tier 1–2) | Initial draft — scaffold, structure, happy path. High variance is expected. | Proceed if spec coverage is ≥ 70%. Escalate if the agent is hallucinating APIs or misreading context. |
| 2nd iteration | AI (Tier 2) | Address review feedback, fill in error paths, add observability hooks and edge cases. | Proceed if spec coverage is ≥ 90% and no security findings remain. Escalate if same defects recur. |
| 3rd iteration | AI or Human (assess) | Fine-tuning: performance, subtle logic bugs, complex multi-system interactions. | If 3 iterations haven't resolved the core issue, a human should diagnose root cause before a 4th AI pass. |
| 4th+ iteration | Human (preferred) | Persistent defects often signal a misunderstanding of context, system constraints, or spec gaps. | Human fixes the issue, then updates the spec so the same pattern doesn't repeat on the next story. |
| Always human | Human only | Architecture decisions, ADRs, postmortem conclusions, regulatory sign-off, incident command. | N/A — these are non-delegable by design. |
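The thresholds in the table can also be enforced as policy rather than remembered. A sketch that restates them as configuration; the schema is hypothetical:

```yaml
# Hypothetical iteration policy; thresholds mirror the table above
iteration_policy:
  max_ai_iterations: 3
  proceed_gates:
    after_iteration_1: "spec coverage >= 70%, no hallucinated APIs or misread context"
    after_iteration_2: "spec coverage >= 90%, no open security findings"
  on_fourth_attempt: "a human diagnoses root cause before any further AI pass"
  after_human_fix: "update the governing spec so the pattern does not repeat"
  always_human:
    - architecture decisions and ADRs
    - postmortem conclusions
    - regulatory sign-off
    - incident command
```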
Greenfield vs Brownfield
The loop is the same, but the emphasis shifts hard depending on whether you're building on empty ground or modifying a system that already has users. Treat them as different sports.
🌱 Greenfield · Generate
- Heavy use of scaffolding agents — full service skeletons from PRD + specs
- Architecture agent proposes candidates; you choose and ADR
- Reuse platform scaffolds aggressively
- Front-load design and architecture; constraints stick for the system's life
- Speed of iteration matters more than reversibility
- Compound win: each project contributes back to scaffolds and reference architectures
🏗️ Brownfield · Comprehend, then change
- Codebase-comprehension agent first — build a knowledge model before any change
- Characterization tests required before refactor
- Small, reversible diffs only. No big-bang rewrites
- Feature-flag-controlled cutovers; explicit rollback paths
- Domain-expert review where agent confidence is low
- Compound win: modernization_spec grows; the second migration is faster than the first
The Decision Rule
| Question | Greenfield treatment | Brownfield treatment |
|---|---|---|
| Existing codebase to integrate with? | No, or only at well-defined boundaries | Yes, with deep coupling |
| Current system understood? | N/A | Imperfectly; comprehension is part of the work |
| Cost of breaking existing behaviour? | Low | High; users and SLAs depend on it |
| Test coverage of affected area? | Build from scratch with verification agent | Often thin; characterization tests required first |
| Primary AI emphasis | Generation and scaffolding | Comprehension, characterization, incremental change |
| Spec emphasis | prd_spec, architecture_spec, code_spec | modernization_spec, ops_spec, code_spec evolution |
| Risk posture | Speed of iteration | Reversibility and small blast radius |
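For brownfield work, the comprehend-then-change posture can be captured as a migration plan the agents must follow. A sketch reusing names from earlier examples; the steps and percentages are illustrative, not a template from modernization_spec:

```yaml
# Illustrative brownfield migration plan; steps and percentages are made up
migration: "flag-admin-service cache layer -> @platform/tenant-cache"
preconditions:
  - characterization tests cover current cache behaviour
  - codebase-comprehension agent has produced a dependency map
steps:
  - introduce @platform/tenant-cache behind a feature flag (dark launch)
  - dual-write to old and new caches; compare reads offline
  - cut over the read path for 5% of tenants; watch the error budget
  - complete the cutover; keep the rollback flag for two releases
rollback: "disable the feature flag; the old path remains intact"
compound: "promote this plan into modernization_spec as a near-template"
```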
Anti-Patterns to Spot Early
These are failure modes we've seen across the industry and inside our own teams. Each is a path of least resistance — easy to fall into, expensive to walk back.
| Anti-Pattern | What it looks like | The fix |
|---|---|---|
| 🚩 The prompt sprawl | Every developer writes their own prompts in their own style. Outputs drift. Nothing compounds because there's nothing to compound into. | Promote good prompts into shared spec sections. Treat one-off prompts as drafts on the way to specs. |
| 🚩 The skipped Compound | "We're behind, let's just ship and circle back." Two months later: still shipping, still behind, system hasn't improved. | Compound is part of definition-of-done, not after it. If you can't compound today, decide explicitly that this story is a repeat — don't drift. |
| 🚩 The over-eager agent | The agent ships a 1500-line PR that "refactors while we're at it." Reviewable diffs become unreviewable diffs. | Bound the scope in the story. Reject scope creep at review. Decompose large rewrites; don't accept them in one PR. |
| 🚩 The plausible mistake | The agent invents a function, an API, or a library that doesn't exist — but it looks right. You merge. CI catches it. Or worse: it doesn't. | Always run generated code against the actual project. Trust nothing that hasn't compiled or linted in your environment. |
| 🚩 The frontier-model default | Routine tasks hit the most expensive model. Costs balloon. Latency degrades. Routine work blocks behind frontier-model queue. | Tier the routing. Smaller models for repetitive work. Frontier only for genuinely complex, ambiguous, or cross-artifact work. |
| 🚩 The tribal spec | One person owns the spec, edits it from gut feel, doesn't review changes. The spec becomes their preferences in document form. | Specs go through PR review. Changes cite incidents, PRs, or evidence. Two approvers for shared specs. Quarterly pruning. |
| 🚩 The skill atrophy | Engineers stop debugging hard problems because the agent always tries first. Real expertise erodes; the team can't operate without agents. | Rotate engineers through "no-agent" debugging weeks. Require human authorship of high-impact ADRs and postmortems. |
| 🚩 The infinite iteration trap | The same defect is retried 5+ times with slight prompt tweaks. No human diagnoses the root cause. Time is lost; spec gaps compound silently. | Cap AI iterations at 3. On the 4th pass, a human diagnoses first. The insight goes back into the spec before the next story. |
| 🚩 The WIP overflow | Agents generate so fast that review queues overwhelm the team. Ten stories are "in review" simultaneously; none are truly done. | Apply Kanban WIP limits to the review column, not just to development. Agent speed without review discipline creates the illusion of progress. |
Metrics That Matter
Four families of metrics: delivery (familiar), AI-native, compounding, and Kanban flow. Track them all. The compounding metrics are how you'll know the practice is working — delivery metrics alone can be misleading in early phases.
What Good Looks Like at 12 Months
The Four Metric Families
| Family | Metric | Phase 1 target | Phase 3 target |
|---|---|---|---|
| Delivery | Lead time per feature (vs baseline) | −20 to −30% | −70% or more |
| Delivery | Change failure rate | Flat | Flat or better |
| Delivery | MTTR | Flat | −50% |
| Delivery | Deployment frequency | +50% | +200% |
| AI-Native | % AI-generated code/tests/docs | 40–60% | 70–85% |
| AI-Native | Human review time per PR | Flat | −30% |
| AI-Native | Agent rework rate | <25% | <10% |
| AI-Native | AI cost per story | Tracked | Below phase 1 baseline |
| AI-Native | Avg. AI iterations per story | Tracked | ≤2.5 (signal of spec quality) |
| Compounding | Reuse rate | Tracked | >60% |
| Compounding | Spec updates per sprint (impact-weighted) | ≥3 per team | ≥5 per team |
| Compounding | Time to ship Nth similar feature vs first | Tracked | −50% by N=3 |
| Compounding | Repeat-incident-class rate | Tracked | Trending to zero |
| Kanban | Cycle time per stage | Tracked by stage | WIP-limit violations trending to zero |
| Kanban | Review queue depth | <5 cards simultaneously | <3 cards simultaneously |
Self-Assessment
Eight questions about your team. Answer honestly — there's no scoring police. The result will tell you whether you should focus on foundations, scaling, or refinement.
Team Readiness Diagnostic
~3 minutes. Answer for your immediate team, not the whole company.
The 8-Week Path to Your First Compound
If your team is starting from AI-assisted today, here's a concrete path. Don't try to do everything at once. The point is to ship a real loop and feel the compound, not to roll out a framework.
Weeks 1–2 · Adopt the language
- Read this playbook end-to-end as a team
- Pick one feature for the pilot loop
- Identify the spec sections you'll need
- Set up cost reporting per story
- Set up your Kanban board with SAND stage columns
Weeks 3–4 · Run the first loop
- Convert the pilot story to AI-ready format
- Run Plan → Work → Review with explicit ownership
- Force the Compound step at the end
- Tag every artifact in the PR description
- Set a WIP limit on the Review column
Weeks 5–6 · Scale the loop
- Run 3–5 stories under the model
- Add multi-agent review on non-trivial PRs
- Measure: lead time, rework rate, cost, AI iterations
- Update specs with what worked
- Tier your model routing for the first time
Weeks 7–8 · Demonstrate compound
- Ship a second instance of a similar story
- Measure how much faster it was
- Demo the spec diffs at sprint review
- Onboard one neighbouring team
Definition of Done
The team's definition of done evolves to match what an AI-native loop is expected to produce. Print this. Stick it next to your sprint board. Argue from it.
A story is done when…
- Code is merged and meets code_spec.
- Tests cover the acceptance criteria and any new edge cases; risk-weighted coverage is adequate.
- Documentation is regenerated and reviewed; customer-facing docs are reviewed by tech-writing where applicable.
- Diagrams reflecting the change are current.
- Multi-agent review has run on non-trivial PRs with no unresolved findings; human review is recorded.
- Compound deliverables are explicit: spec updates, new patterns, new tests — or a recorded note that nothing applies (and why).
- Cost is recorded and within budget for the story.
- The PR description names the inputs (specs, ADRs) and the agent runs that produced the change.
- Model routing is recorded: which tier was used, and why (captures cost and complexity signals).
- AI iteration count is recorded; if ≥ 4 iterations were required, a human diagnosis note is included.
- WIP limits respected throughout: the story did not sit blocked in any stage column beyond the agreed SLA.
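Several of these checks can be enforced mechanically before merge. A hypothetical gate file a CI job might evaluate; this is not the syntax of any real CI product:

```yaml
# Hypothetical merge gates derived from the definition of done above;
# not the syntax of any real CI system
done_gates:
  - check: spec_compliance
    require: "PR cites governing spec sections; multi-agent review has no unresolved findings"
  - check: tests
    require: "acceptance criteria and new edge cases are covered"
  - check: docs_and_diagrams
    require: "regenerated and reviewed"
  - check: compound_deliverable
    require: "spec/pattern/test diff linked, or a recorded 'nothing applies' note with reason"
  - check: cost_and_routing
    require: "cost within budget; model tier and AI iteration count recorded"
  - check: wip_sla
    require: "story did not sit blocked in any stage beyond the agreed SLA"
```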
Governance & Continuous Learning
This playbook is a living system. It should improve every quarter based on what we ship, what breaks, and what we learn. The compound principle applies to the playbook itself.
For Individual Contributors
- Follow the Plan → Work → Review → Compound loop on every story
- Treat spec updates as first-class deliverables, not optional follow-ups
- Cite inputs (specs, ADRs) and agent runs in every PR description
- Surface anti-patterns and spec gaps the moment you spot them
- Spend an hour a week reading other teams' Compound diffs
- Record model tier used per story; flag routing anomalies
For Tech Leads & Architects
- Run Plan and Compound for the team — these are not delegable
- Calibrate multi-agent reviewers quarterly; tighten when too lenient
- Audit one randomly selected agent run per week
- Promote patterns across teams; resist forking
- Coach on judgment in reviews, not just activity
- Review WIP limits and stage cycle times monthly; adjust to team capacity