What you are really building
You are not just building an app.
You are building a repeatable engineering system where:
- requirements are clarified before code
- architecture is documented before implementation
- agents work from written contracts instead of fuzzy memory
- implementation is done in slices
- every slice is tested and reviewed
- deployment is a controlled release, not a gamble
This tutorial gives you that system.
Part 1 — The right mental model
Why most AI-assisted app builds fail
Most teams fail because they do one of these:
- ask one agent to “build everything”
- skip requirements and jump to code
- let the agent modify too many services at once
- don’t define what “done” means
- don’t force tests, validation, linting, and review
- deploy without a staging gate
- assume “code generated” means “production ready”
That creates fake speed.
Real speed comes from this sequence:
- define
- constrain
- implement
- verify
- deploy safely
- harden
Part 2 — Divide responsibilities between Claude and Codex
Use Claude for these jobs
Use Claude when you need:
- idea clarification
- product requirement discovery
- architecture planning
- user flow and admin flow design
- repo audit and codebase understanding
- gap analysis
- “what is missing?”
- writing CLAUDE.md
- writing SPEC.md
- reviewing whether a feature is actually complete
- finding what is broken, risky, missing, or inconsistent
Claude Code is explicitly built around an agentic loop of gathering context, taking action, and verifying results, and it reads CLAUDE.md at the start of sessions as persistent project guidance. (Claude)
Use Codex for these jobs
Use Codex when you need:
- implementing a clearly-scoped slice
- editing multiple files safely
- generating tests
- running lint/typecheck/test commands
- validating acceptance criteria
- reviewing diffs
- repeating a stable workflow with consistency
Official Codex guidance emphasizes durable repo instructions in AGENTS.md, giving verification steps, running lint and pre-commit checks, and breaking complex work into smaller focused tasks. (OpenAI Developers)
The simplest split
Use this rule:
- Claude decides what should happen
- Codex makes it happen
- Claude checks whether it happened correctly
That is the cleanest operating model.
Part 3 — The file system of control
For a serious project, do not rely only on chat history.
Create these files in the repository root:
Myhospital/
  CLAUDE.md
  AGENTS.md
  SPEC.md
  RELEASE_READINESS.md
  svc1/
    CLAUDE.md
  svc2/
    CLAUDE.md
  svc3/
    CLAUDE.md
  svc4/
    CLAUDE.md
What each file does
SPEC.md
This is the contract.
It defines:
- what is being built
- why it matters
- who uses it
- what flows must work
- acceptance criteria
- test plan
- rollout plan
- rollback plan
CLAUDE.md
This is Claude’s persistent repo memory that you write.
Official Claude Code docs say CLAUDE.md is loaded at the start of every session and should contain things Claude cannot infer on its own, such as bash commands, code style, workflow rules, architecture decisions, and review expectations. They also recommend keeping it concise and using more specific files deeper in the directory tree when needed. (Claude)
AGENTS.md
This is Codex’s durable repo guide.
OpenAI’s Codex docs recommend using AGENTS.md as the place to encode repo layout, run instructions, build/test/lint commands, engineering conventions, constraints, and what “done” means. (OpenAI Developers)
RELEASE_READINESS.md
This is the truth document before deployment.
It states:
- what works
- what does not work
- what is partial
- what was tested
- remaining risks
- release blockers
This file is not mandated by the vendors. It is a high-value engineering practice.
Part 4 — From idea initiation to a real requirement set
This is where your original notes are already strong.
You wrote:
- Authentication
- Authorisation
- User Flow
- Admin Flow
- UX Spec
- Search / Filter / Pagination
- Validation
- Feature 1
- Feature 2
- Feature 3
That is exactly the right backbone.
Step 1: Turn the idea into a requirement brief
Before any code, answer these:
Product
- What problem are we solving?
- Who is the primary user?
- What outcome should improve?
Business
- What does success look like?
- What is MVP?
- What is explicitly out of scope?
Security
- How do users authenticate?
- How are permissions enforced?
- What data is sensitive?
UX
- What is the primary user journey?
- What is the admin journey?
- What are the loading, empty, validation, and error states?
Data
- What entities exist?
- What relationships exist?
- What changes the database?
Operations
- Where will this run?
- What environments exist?
- What does rollback look like?
Step 2: Ask Claude to turn the rough idea into a structured brief
Prompt:
I want to build [APP NAME].
Turn this into a complete product brief.
Include:
1. problem statement
2. user personas
3. user flow
4. admin flow
5. authentication model
6. authorization model
7. UX requirements
8. search/filter/pagination needs
9. validation rules
10. feature list grouped into MVP / v2 / later
11. non-functional requirements
12. security concerns
13. deployment assumptions
14. open questions
Do not write code yet.
Be strict and identify what is missing.
This is discovery, not implementation.
Part 5 — If the project already exists, audit the repository first
Your note said:
CLAUDE – READ my all of code without missing each file
That is exactly the right instinct.
But make it systematic.
The correct repo-audit method
Do not simply say “read the codebase.”
Instead:
- generate a tracked file list
- exclude generated/build/vendor files
- ask Claude to reconcile its audit against the file list
- force a summary of:
- working features
- broken features
- unknown features
- missing tests
- missing validation
- missing lint/type rules
- service ownership confusion
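The inventory step can be done mechanically before you hand anything to Claude. A minimal Python sketch, assuming you feed it the output of `git ls-files`; the exclusion set is a placeholder — tune it to your actual stack:

```python
from pathlib import PurePosixPath

# Directories treated as generated/build/vendor output. Placeholder set --
# adjust for your languages and build tools.
EXCLUDED_PARTS = {"node_modules", "dist", "build", ".next", "vendor", "__pycache__"}

def audit_inventory(tracked_files):
    """Split a tracked-file list into files to audit and files to exclude,
    so the audit can be reconciled against an explicit inventory."""
    audit, excluded = [], []
    for path in tracked_files:
        parts = set(PurePosixPath(path).parts)
        (excluded if parts & EXCLUDED_PARTS else audit).append(path)
    return audit, excluded
```

Paste both lists into the audit prompt so Claude can account for every file.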
Claude Code is explicitly designed to search files, understand codebases, and follow project instructions from CLAUDE.md; Claude docs also recommend using concise persistent instructions and path-specific guidance where necessary. (Claude)
Strong audit prompt for Claude
Audit this repository completely.
Rules:
- Start from the tracked source-file inventory.
- Ignore generated/build/vendor files unless referenced by source code.
- Account for every important source file.
- Produce:
1. top-level architecture summary
2. directory layout
3. service-by-service responsibility map
4. feature inventory
5. which features appear working
6. which features appear broken
7. which features are unknown or unverified
8. missing test coverage
9. missing validation
10. missing lint/type enforcement
11. security risks
12. API or schema mismatches
13. priority-ranked recommendations
At the end include:
- total reviewed files
- excluded files
- files not reviewed
- confidence level by service
Part 6 — Write CLAUDE.md the right way
Claude’s official docs say CLAUDE.md should contain instructions that matter every session, stay concise, and be used for rules Claude cannot infer from the code alone. They also support placing CLAUDE.md files at multiple levels, where more specific local guidance takes precedence. (Claude)
What belongs in CLAUDE.md
Put in:
- repo purpose
- service boundaries
- coding conventions
- commands to run
- validation expectations
- test expectations
- review checklist
- “do not” rules
- definition of done
Do not put in:
- giant architecture essays
- temporary ticket notes
- long one-off procedures
- duplicated information from README unless it changes behavior
Root CLAUDE.md template
# Project: MyHospital
## Purpose
Multi-service hospital platform.
## Repo layout
- svc1: auth and identity
- svc2: patient and appointment flows
- svc3: billing and payments
- svc4: admin, analytics, reporting
## Global rules
- Do not change public contracts without updating SPEC.md first.
- Prefer small, reviewable changes.
- Preserve backward compatibility unless SPEC.md explicitly allows breaking change.
- Every code change must include validation, tests, and lint/type correctness.
- Do not bypass security checks for convenience.
## Commands
- install: [command]
- dev: [command]
- test: [command]
- lint: [command]
- typecheck: [command]
- integration-test: [command]
## Security rules
- All protected endpoints require server-side authorization checks.
- Never trust client-provided roles.
- Validate all input at the request boundary.
- Sensitive data must not be logged.
## Output format for implementation tasks
For each task, report:
1. files changed
2. what changed
3. tests added or updated
4. risk introduced
5. follow-up work
## Definition of done
A task is done only if:
- implementation is complete
- validation is enforced
- tests pass
- lint/typecheck pass
- affected user/admin flows still work
- SPEC.md remains accurate
Service-level CLAUDE.md
For svc1/CLAUDE.md:
# svc1: auth and identity
## Responsibility
Authentication, authorization, session handling, roles, permissions.
## Must not break
- login
- logout
- session renewal
- password reset
- role enforcement
- unauthorized access handling
## Required tests
- auth success
- auth failure
- invalid credentials
- expired token
- insufficient permissions
- session timeout behavior
## Required validation
- email
- password policy
- token format
- permission checks
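To make the "password policy" item concrete — a minimal validator sketch. The specific rules here are illustrative assumptions, not part of the template above; substitute your real policy:

```python
import re

def validate_password(password: str) -> list[str]:
    """Return a list of policy violations; an empty list means the password
    passes. The rules below are placeholder assumptions -- use your real policy."""
    errors = []
    if len(password) < 12:
        errors.append("must be at least 12 characters")
    if not re.search(r"[A-Z]", password):
        errors.append("must contain an uppercase letter")
    if not re.search(r"[0-9]", password):
        errors.append("must contain a digit")
    return errors
```

Returning a list instead of a boolean lets the UI show every violation at once, which is what the validation-feedback requirement in svc2 expects.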
For svc2/CLAUDE.md:
# svc2: patient and appointment workflows
## Responsibility
Patient search, appointment booking, rescheduling, cancellation, provider assignment.
## Must not break
- patient search
- appointment creation
- reschedule flow
- cancellation flow
- pagination and filtering
- validation feedback
## Required validations
- required fields
- date/time validity
- provider availability
- duplicate booking prevention
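"Duplicate booking prevention" usually reduces to an interval-overlap check per provider. A sketch of that rule (the function names are my own; a real service would run this inside a transaction against the database):

```python
from datetime import datetime

def overlaps(start_a, end_a, start_b, end_b):
    """Two half-open intervals [start, end) overlap iff each starts
    before the other ends."""
    return start_a < end_b and start_b < end_a

def is_duplicate_booking(new_start, new_end, existing):
    """existing: iterable of (start, end) tuples already booked for the
    same provider. True means the new slot must be rejected."""
    return any(overlaps(new_start, new_end, s, e) for s, e in existing)
```

Note the half-open convention: a slot ending at 10:00 does not conflict with one starting at 10:00.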
Part 7 — Write AGENTS.md for Codex
OpenAI’s docs position AGENTS.md as the durable place for Codex repo instructions, including repo layout, commands, constraints, and what “done” means. They also note that /init can scaffold a starter AGENTS.md, but that you should refine it to match real team workflows. (OpenAI Developers)
AGENTS.md template
# AGENTS.md
## Repo overview
MyHospital is a multi-service application with service boundaries that must be preserved.
## Services
- svc1: auth and authorization
- svc2: appointments and patient flows
- svc3: billing and invoicing
- svc4: admin dashboards and reporting
## Commands
- install: [command]
- dev: [command]
- unit-tests: [command]
- integration-tests: [command]
- lint: [command]
- typecheck: [command]
## Engineering conventions
- Prefer minimal diffs.
- Reuse existing patterns before introducing new abstractions.
- Do not rename files or functions unless necessary.
- Keep API changes backward compatible unless SPEC.md says otherwise.
## Constraints
- Do not change infra or deployment manifests unless task explicitly asks.
- Do not touch unrelated services.
- Do not skip tests.
- Do not leave TODOs in completed work unless listed in follow-up.
## Validation rules
- Implement request validation and business-rule validation.
- Add negative test cases for invalid input and permission failures.
## Done means
- code compiles
- tests pass
- lint/typecheck pass
- changed behavior is verified
- risk is stated clearly
Part 8 — Create SPEC.md before you write code
This is the center of the whole workflow.
If SPEC.md is weak, the whole build will drift.
What SPEC.md must contain
# SPEC: [Feature Name]
## Objective
What are we building and why?
## Scope
Included:
Excluded:
## Users
- user
- admin
- support/internal ops
## Functional requirements
1. Authentication
2. Authorization
3. User flow
4. Admin flow
5. Search/filter/pagination
6. Validation
7. Feature 1
8. Feature 2
9. Feature 3
## UX requirements
- loading states
- empty states
- error states
- unauthorized states
- form validation states
## Data model impact
- tables
- fields
- migrations
- compatibility concerns
## API/contracts
- endpoints
- request schema
- response schema
- failure modes
## Security
- role rules
- audit logging
- sensitive data handling
## Test plan
- unit
- integration
- end-to-end
- negative cases
- regression cases
## Rollout plan
- dev
- staging
- production
## Rollback plan
- revert method
- migration reversal or mitigation
- feature flag fallback
## Definition of done
- implementation complete
- tests passing
- lint/typecheck passing
- staging validated
- docs updated
Part 9 — The approval gate: “Claude yes/no”
This part from your notes is gold.
After SPEC.md is written, stop.
Do not code yet.
Ask Claude for a hard gate.
Prompt
Review SPEC.md and repository context.
Answer only in this format:
READY: yes/no
If no, list only blockers.
Check for:
- missing requirements
- missing edge cases
- missing validation rules
- missing security rules
- missing error states
- missing test cases
- unclear service ownership
- missing deployment or rollback concerns
If Claude says READY: no, fix the blockers first.
That one step prevents a huge amount of bad implementation work.
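Because the gate response has a fixed format, it is easy to check mechanically if you ever script this step. A small sketch, assuming the exact `READY: yes/no` format defined above (the function name is my own):

```python
def parse_gate(response: str):
    """Return (ready, blockers) from a gate response in the
    'READY: yes/no' format, with blockers listed on following lines."""
    lines = [ln.strip() for ln in response.strip().splitlines() if ln.strip()]
    ready = lines[0].upper().replace(" ", "") == "READY:YES"
    blockers = [] if ready else lines[1:]
    return ready, blockers
```

A wrapper script can then refuse to start implementation while `ready` is false.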
Part 10 — Break implementation into slices
This is where Codex becomes powerful.
OpenAI’s prompting guidance says Codex performs better when tasks are small, focused, and verifiable, and when prompts include reproduction steps, validation steps, and lint/pre-commit checks. (OpenAI Developers)
Never do this
Build the whole MyHospital app end to end.
Always do this instead
Break the work into slices like:
- auth foundation
- permission model
- appointment data model
- appointment search/filter/pagination
- booking UX
- admin approval UX
- billing integration
- tests
- observability and release prep
Then implement one slice at a time.
Example slice prompt for Codex
Implement SPEC section: Appointment Search in svc2.
Scope:
- touch only svc2 and shared contracts if needed
- do not modify svc1 auth logic
- do not modify billing
- preserve existing API style and patterns
Requirements:
- search by patient name, doctor, status, date range
- pagination
- sorting
- input validation
- unauthorized access checks
- empty state handling
- error handling
Required verification:
- add or update unit tests
- add integration tests for search and pagination
- run lint
- run typecheck
- run affected test suites
Final output must include:
1. files changed
2. commands run
3. test results
4. known risks
5. follow-up items
That is the level of prompt quality that makes Codex reliable.
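For reference, the core of that search slice is small. An in-memory sketch of filter-then-paginate with input validation (field names and limits are assumptions; a real implementation would push the filtering into a database query):

```python
def search_appointments(appointments, patient=None, status=None, page=1, page_size=20):
    """Filter appointments, then paginate. Rejects invalid pagination
    params instead of silently clamping them."""
    if page < 1 or not (1 <= page_size <= 100):
        raise ValueError("invalid pagination params")
    results = [
        a for a in appointments
        if (patient is None or patient.lower() in a["patient"].lower())
        and (status is None or a["status"] == status)
    ]
    start = (page - 1) * page_size
    return {"items": results[start:start + page_size], "total": len(results), "page": page}
```

Returning `total` alongside the page lets the UI render pagination controls and a correct empty state.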
Part 11 — The right implementation loop
Here is the exact loop you should run.
Loop A — Claude prepares the work
Claude:
- understands the repo
- maps the target service
- drafts or refines the spec
- identifies risks and missing items
Loop B — Codex implements the slice
Codex:
- edits the right files
- adds tests
- runs commands
- reports results
Loop C — Claude reviews the slice
Claude:
- compares implementation against SPEC.md
- checks whether validation is real
- checks whether user and admin flows still work
- checks for missing edge cases
- checks for architecture drift
Repeat until the release is complete.
Part 12 — What “missing” really means
You wrote:
What is missing?
test cases
validation
lining
which feature is working
which feature is not
I assume “lining” means linting.
This should become a standing checklist.
The Production Readiness Checklist
For every feature, always ask:
1. Is the feature functionally implemented?
Not “did code change?”
But “can the user complete the flow?”
2. Is validation complete?
At least two levels:
- request or schema validation
- business-rule validation
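The two levels are different functions with different inputs. A minimal sketch — field names and rules are placeholder assumptions for an appointment-booking request:

```python
def validate_request(payload: dict) -> list[str]:
    """Level 1: schema-shape validation at the request boundary.
    Needs only the payload itself."""
    errors = []
    if not isinstance(payload.get("patient_id"), int):
        errors.append("patient_id must be an integer")
    if not payload.get("start"):
        errors.append("start is required")
    return errors

def validate_business_rules(payload: dict, provider_is_available) -> list[str]:
    """Level 2: business-rule validation. Needs domain state (here injected
    as a callable) and runs only after level 1 passes."""
    errors = []
    if not provider_is_available(payload.get("provider_id"), payload.get("start")):
        errors.append("provider is not available at that time")
    return errors
```

The point of the split: level 1 can never touch the database, and level 2 can assume well-formed input.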
3. Are authorization checks enforced server-side?
Not only hidden in the UI.
4. Are negative cases tested?
Examples:
- invalid input
- expired token
- missing permissions
- duplicate operations
- empty search results
- invalid pagination params
5. Are lint and type checks green?
This is the minimum quality floor.
6. Is the status of the feature known?
Each feature must be labeled:
- working
- partial
- broken
- unverified
That is how adults ship software.
Part 13 — Create RELEASE_READINESS.md
Use this before staging and before production.
# RELEASE_READINESS
## Release name
[release name]
## Working
- feature A
- feature B
- feature C
## Partial
- feature D: works except [case]
## Broken
- feature E: blocked by [reason]
## Unverified
- feature F
## Validation status
- request validation: complete / partial / missing
- business validation: complete / partial / missing
## Test status
- unit: [status]
- integration: [status]
- e2e: [status]
## Quality checks
- lint: pass/fail
- typecheck: pass/fail
- security review: done/pending
## Known risks
- high
- medium
- low
## Deployment blockers
- [list]
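If you want CI to enforce this document, the blocker section is the part worth checking mechanically. A sketch, assuming a `## Deployment blockers` section like the template's and treating `[list]` or `none` as empty placeholders (both assumptions):

```python
def release_blocked(readiness_md: str) -> bool:
    """Return True if the '## Deployment blockers' section of a
    RELEASE_READINESS document lists anything beyond an empty placeholder."""
    in_section = False
    for line in readiness_md.splitlines():
        if line.strip().lower().startswith("## deployment blockers"):
            in_section = True
            continue
        if in_section:
            if line.startswith("## "):  # next section ends the scan
                break
            item = line.strip().lstrip("-").strip()
            if item and item.lower() not in ("[list]", "none"):
                return True
    return False
```

A pipeline step can then fail the release job whenever this returns True, which keeps the document honest.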
Part 14 — Testing strategy that actually works
Do not wait until the end for testing.
Testing should exist at every layer.
Unit tests
Use for:
- pure logic
- validators
- transformers
- helpers
- permission rules
Integration tests
Use for:
- API routes
- DB interactions
- service contracts
- auth middleware
- pagination and filtering behavior
End-to-end tests
Use for:
- login flow
- admin approval flow
- key user journey
- one or two critical revenue or compliance flows
Negative tests
These are mandatory:
- malformed requests
- unauthorized access
- forbidden actions
- invalid filters
- missing required fields
- duplicate submissions
- bad state transitions
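Negative tests are cheap to write once the rule under test is isolated. A sketch using a toy permission rule — the roles and actions are placeholders, not the real MyHospital model:

```python
def can_perform(role: str, action: str) -> bool:
    """Toy permission rule used to demonstrate negative tests.
    Unknown roles get an empty permission set, never an exception."""
    allowed = {"admin": {"approve", "refund", "view"}, "user": {"view"}}
    return action in allowed.get(role, set())

# Negative cases: forbidden actions and unknown roles must be denied cleanly.
def test_forbidden_action_is_denied():
    assert not can_perform("user", "refund")

def test_unknown_role_is_denied():
    assert not can_perform("intruder", "view")
```

The design choice worth copying: deny-by-default via `allowed.get(role, set())`, so a missing role can never widen access.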
Official Codex guidance specifically recommends asking it to write or update tests, run relevant checks, and review the work rather than stopping at code generation. (OpenAI Developers)
Part 15 — The review phase: Claude as auditor
Once Codex finishes a slice, ask Claude to review it.
Prompt
Review the implementation against SPEC.md.
Classify findings as:
- blocker
- major
- minor
- follow-up
Check:
- feature completeness
- missing edge cases
- missing validation
- missing auth/authz checks
- UX-state coverage
- pagination/search/filter correctness
- schema/API mismatches
- regression risk
- docs drift
This is where Claude shines.
Claude is especially strong at asking:
- “Did you really cover the whole flow?”
- “Does this contradict the spec?”
- “Is this secure?”
- “What did we forget?”
Part 16 — Staging before production
Do not let AI-generated confidence replace release discipline.
Before production:
- deploy to staging
- run smoke tests
- verify login/authz
- verify primary user flow
- verify admin flow
- verify search/filter/pagination with realistic data
- verify migration compatibility
- verify logs and alerts
- verify rollback
Staging checklist
- Does login still work?
- Does role enforcement still work?
- Are old records still readable?
- Does pagination behave at scale?
- Do validation messages make sense?
- Are empty states and error states usable?
- Do audit logs contain what you need?
- Is performance acceptable?
Part 17 — Production deployment
Production should be boring.
If it feels exciting, the process is wrong.
Production release sequence
- freeze the release branch
- confirm CI is green
- confirm migrations
- confirm environment variables
- deploy lowest-risk components first if possible
- run health checks
- smoke test critical journeys
- monitor logs, errors, metrics
- keep rollback ready
The production questions you must answer in writing
- What exactly changed?
- What can fail?
- How will we detect failure?
- How fast can we roll back?
- Does rollback require DB mitigation?
- Which flows must be tested immediately after deploy?
That belongs either in SPEC.md or your deployment runbook.
Part 18 — Post-production hardening
A release is not done when it reaches production.
It is done when the first production learning cycle is complete.
After deploy, ask Claude:
We deployed [release name].
Analyze:
- likely failure modes
- weak validation areas
- missing monitors
- missing alerts
- rollback risks
- assumptions that were not actually verified
Produce a hardening backlog ranked by risk.
This turns production into learning, not panic.
Part 19 — Your 10-hour operating model
You wrote:
1 PM --> 11 PM
Effective - 2 Hours
30 mins Step 1 Requirement
That can become a superb delivery cadence.
A serious 10-hour build day
1:00 PM – 1:30 PM
Requirements and scope lock
Output:
- feature brief
- user flow
- admin flow
- auth/authz rules
- search/filter/pagination rules
- validation rules
- MVP scope
1:30 PM – 2:15 PM
Claude repo audit or architecture design
Output:
- repo/service map
- risk map
- identified gaps
- what already exists
- what must not break
2:15 PM – 3:00 PM
Write control docs
Output:
- CLAUDE.md
- AGENTS.md
- SPEC.md
3:00 PM – 3:15 PM
Claude readiness gate
Output:
- READY: yes/no
- blocker list if no
3:15 PM – 6:00 PM
Codex implements slices 1–3
Examples:
- auth improvements
- data model
- first API or first UI flow
6:00 PM – 7:00 PM
Claude review and gap fixing
Output:
- missing tests
- missing validation
- broken user/admin flows
- major risks
7:00 PM – 9:00 PM
Codex implements slices 4–6
Examples:
- pagination/filtering
- admin flow
- edge-case handling
- tests and lint/type cleanup
9:00 PM – 10:00 PM
Staging prep and release-readiness report
Output:
- RELEASE_READINESS.md
- known risks
- blockers
- staging checklist
10:00 PM – 11:00 PM
Staging validation and production decision
Output:
- go / no-go
- rollback plan
- post-deploy watchlist
Your “effective 2 hours” are the decisions:
- scope
- tradeoffs
- final review
- go/no-go
That is exactly where humans should spend time.
Part 20 — The single biggest mistake to avoid
Do not run Claude and Codex live on the same files at the same time without isolation.
Codex guidance explicitly warns against running live threads on the same files without using git worktrees. (OpenAI Developers)
So use one of these models:
Safe model A
Claude plans, Codex edits, Claude reviews after
Safe model B
Claude works in one branch/worktree, Codex in another
Safe model C
Claude only reads/reviews, Codex only writes
This one rule will save you a lot of pain.
Part 21 — The best prompt library
Prompt 1 — Claude requirement discovery
You are the product architect for this application.
Turn my rough idea into a production-grade feature brief.
Be strict.
Do not write code.
Include:
- problem
- users
- user flow
- admin flow
- auth/authz
- UX states
- validation
- search/filter/pagination
- API assumptions
- data model assumptions
- edge cases
- security concerns
- rollout and rollback concerns
- what is missing
Prompt 2 — Claude repo audit
Audit this repository completely.
I want:
- architecture summary
- directory and service map
- feature inventory
- working vs broken vs unverified features
- missing test cases
- missing validation
- lint/type issues
- risky modules
- service ownership confusion
- recommended next actions
Prompt 3 — Claude writes SPEC.md
Write SPEC.md for this feature.
Use these sections:
- objective
- scope
- users
- functional requirements
- UX requirements
- validation
- security
- API/contracts
- data model impact
- test plan
- rollout plan
- rollback plan
- definition of done
Prompt 4 — Claude readiness gate
Review SPEC.md and answer only:
READY: yes/no
If no, list blockers only.
Prompt 5 — Codex implementation prompt
Implement the approved SPEC section for [feature].
Scope:
- allowed directories: [...]
- forbidden directories: [...]
Requirements:
- [...]
- [...]
- [...]
Verification:
- add/update tests
- run lint
- run typecheck
- run affected tests
Final report:
1. files changed
2. commands run
3. test results
4. known risks
5. follow-up items
Prompt 6 — Claude review prompt
Review this implementation against SPEC.md.
Classify issues as:
- blocker
- major
- minor
- follow-up
Check:
- completeness
- validation
- auth/authz
- search/filter/pagination
- user flow
- admin flow
- regression risk
- docs drift
Part 22 — How this applies specifically to Myhospital
Your directory layout example was:
Myhospital
  svc1
  svc2
  svc2
  svc3
I assume one svc2 is a typo. A better normalized version:
Myhospital/
  svc1-auth/
  svc2-patients/
  svc3-appointments/
  svc4-billing/
  svc5-admin/
If renaming is not possible, keep the existing names but define ownership clearly in docs.
A clean ownership model
svc1
- authentication
- authorization
- sessions
- roles
svc2
- patient records
- patient search
svc3
- appointments
- schedules
- rescheduling
- cancellation
svc4
- billing
- invoices
- payments
svc5
- admin dashboards
- reports
- audit tools
Now every feature can be mapped clearly.
Part 23 — The golden rules for production-grade AI development
Here are the rules I would insist on for any serious team:
Rule 1
No implementation before SPEC.md.
Rule 2
No “done” without tests, validation, lint, and type checks.
Rule 3
No production deploy without a rollback plan.
Rule 4
No large prompts for large features.
Split into slices.
Rule 5
Claude reviews Codex output before acceptance.
Rule 6
Every feature must be labeled:
- working
- partial
- broken
- unverified
Rule 7
Never trust UI-only validation or UI-only permissions.
Rule 8
Use persistent repo instructions.
Claude docs recommend CLAUDE.md; Codex docs recommend AGENTS.md. (Claude)
Rule 9
Keep those instruction files concise and practical.
Both Claude and Codex docs emphasize that durable guidance should stay short, specific, and based on real repeated friction rather than vague rule dumps. (Claude)
Rule 10
Don’t automate a workflow until it works manually.
Codex best practices explicitly recommend turning repeated stable work into skills first and automating only once the process is predictable. (OpenAI Developers)
Final takeaway
The best way to build applications with Claude and Codex is not:
“Ask both to build the app.”
It is:
- Claude turns ambiguity into a clear plan
- Claude audits the codebase and identifies gaps
- You write durable repo guidance
- You lock the feature contract in SPEC.md
- Claude gives a strict readiness gate
- Codex implements one approved slice at a time
- Codex runs tests, lint, and validation
- Claude audits the implementation
- You produce a release-readiness report
- You deploy through staging to production with rollback ready
- Claude creates the hardening backlog after release