How to Build an Application Using Claude and Codex

What you are really building

You are not just building an app.

You are building a repeatable engineering system where:

  • requirements are clarified before code
  • architecture is documented before implementation
  • agents work from written contracts instead of fuzzy memory
  • implementation is done in slices
  • every slice is tested and reviewed
  • deployment is a controlled release, not a gamble

This tutorial gives you that system.


Part 1 — The right mental model

Why most AI-assisted app builds fail

Most teams fail because they do one of these:

  • ask one agent to “build everything”
  • skip requirements and jump to code
  • let the agent modify too many services at once
  • don’t define what “done” means
  • don’t force tests, validation, linting, and review
  • deploy without a staging gate
  • assume “code generated” means “production ready”

That creates fake speed.

Real speed comes from this sequence:

  1. define
  2. constrain
  3. implement
  4. verify
  5. deploy safely
  6. harden

Part 2 — Divide responsibilities between Claude and Codex

Use Claude for these jobs

Use Claude when you need:

  • idea clarification
  • product requirement discovery
  • architecture planning
  • user flow and admin flow design
  • repo audit and codebase understanding
  • gap analysis
  • “what is missing?”
  • writing CLAUDE.md
  • writing SPEC.md
  • reviewing whether a feature is actually complete
  • finding what is broken, risky, missing, or inconsistent

Claude Code is explicitly built around an agentic loop of gathering context, taking action, and verifying results, and it reads CLAUDE.md at the start of sessions as persistent project guidance. (Claude)

Use Codex for these jobs

Use Codex when you need:

  • implementing a clearly scoped slice
  • editing multiple files safely
  • generating tests
  • running lint/typecheck/test commands
  • validating acceptance criteria
  • reviewing diffs
  • repeating a stable workflow with consistency

Official Codex guidance emphasizes durable repo instructions in AGENTS.md, giving verification steps, running lint and pre-commit checks, and breaking complex work into smaller focused tasks. (OpenAI Developers)

The simplest split

Use this rule:

  • Claude decides what should happen
  • Codex makes it happen
  • Claude checks whether it happened correctly

That is the cleanest operating model.


Part 3 — The file system of control

For a serious project, do not rely only on chat history.

Create these files in the repository root:

Myhospital/
  CLAUDE.md
  AGENTS.md
  SPEC.md
  RELEASE_READINESS.md
  svc1/
    CLAUDE.md
  svc2/
    CLAUDE.md
  svc3/
    CLAUDE.md
  svc4/
    CLAUDE.md
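If you want to bootstrap this skeleton mechanically, a small script can create any control files that are missing. This is a sketch, not vendor tooling; the service names are placeholders taken from the example layout above, so adjust them to your repo.

```python
from pathlib import Path

# Placeholder names from the example layout above; adjust to your repo.
SERVICES = ["svc1", "svc2", "svc3", "svc4"]
ROOT_DOCS = ["CLAUDE.md", "AGENTS.md", "SPEC.md", "RELEASE_READINESS.md"]

def scaffold(root: str) -> list[str]:
    """Create any missing control files; return the paths created."""
    created = []
    base = Path(root)
    for name in ROOT_DOCS:
        path = base / name
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(f"# {name}\n\nTODO: fill in.\n")
            created.append(str(path))
    for svc in SERVICES:
        path = base / svc / "CLAUDE.md"
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(f"# {svc}\n\nTODO: fill in.\n")
            created.append(str(path))
    return created
```

Running it twice is safe: existing files are never overwritten, so your real CLAUDE.md content survives.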

What each file does

SPEC.md

This is the contract.

It defines:

  • what is being built
  • why it matters
  • who uses it
  • what flows must work
  • acceptance criteria
  • test plan
  • rollout plan
  • rollback plan

CLAUDE.md

This is Claude’s persistent repo memory that you write.

Official Claude Code docs say CLAUDE.md is loaded at the start of every session and should contain things Claude cannot infer on its own, such as bash commands, code style, workflow rules, architecture decisions, and review expectations. They also recommend keeping it concise and using more specific files deeper in the directory tree when needed. (Claude)

AGENTS.md

This is Codex’s durable repo guide.

OpenAI’s Codex docs recommend using AGENTS.md as the place to encode repo layout, run instructions, build/test/lint commands, engineering conventions, constraints, and what “done” means. (OpenAI Developers)

RELEASE_READINESS.md

This is the truth document before deployment.

It states:

  • what works
  • what does not work
  • what is partial
  • what was tested
  • remaining risks
  • release blockers

This file is not mandated by the vendors. It is a high-value engineering practice.


Part 4 — From idea initiation to a real requirement set

This is where your original notes are already strong.

You wrote:

  • Authentication
  • Authorisation
  • User Flow
  • Admin Flow
  • UX Spec
  • Search / Filter / Pagination
  • Validation
  • Feature 1
  • Feature 2
  • Feature 3

That is exactly the right backbone.

Step 1: Turn the idea into a requirement brief

Before any code, answer these:

Product

  • What problem are we solving?
  • Who is the primary user?
  • What outcome should improve?

Business

  • What does success look like?
  • What is MVP?
  • What is explicitly out of scope?

Security

  • How do users authenticate?
  • How are permissions enforced?
  • What data is sensitive?

UX

  • What is the primary user journey?
  • What is the admin journey?
  • What are the loading, empty, validation, and error states?

Data

  • What entities exist?
  • What relationships exist?
  • What changes the database?

Operations

  • Where will this run?
  • What environments exist?
  • What does rollback look like?

Step 2: Ask Claude to turn the rough idea into a structured brief

Prompt:

I want to build [APP NAME].

Turn this into a complete product brief.

Include:
1. problem statement
2. user personas
3. user flow
4. admin flow
5. authentication model
6. authorization model
7. UX requirements
8. search/filter/pagination needs
9. validation rules
10. feature list grouped into MVP / v2 / later
11. non-functional requirements
12. security concerns
13. deployment assumptions
14. open questions

Do not write code yet.
Be strict and identify what is missing.

This is discovery, not implementation.


Part 5 — If the project already exists, audit the repository first

Your note said:

CLAUDE – READ my all of code without missing each file

That is exactly the right instinct.

But make it systematic.

The correct repo-audit method

Do not simply say “read the codebase.”

Instead:

  1. generate a tracked file list
  2. exclude generated/build/vendor files
  3. ask Claude to reconcile its audit against the file list
  4. force a summary of:
    • working features
    • broken features
    • unknown features
    • missing tests
    • missing validation
    • missing lint/type rules
    • service ownership confusion
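Steps 1 and 2 are easy to mechanize. A minimal sketch, assuming a git repository and a typical set of generated directories (the `EXCLUDE_PREFIXES` list is an assumption; adjust it to your stack):

```python
import subprocess

# Assumed generated/vendored paths; tune this list per repository.
EXCLUDE_PREFIXES = ("node_modules/", "dist/", "build/", "vendor/", ".venv/")

def filter_inventory(paths: list[str]) -> list[str]:
    """Drop empty lines and generated/vendor paths from a file list."""
    return [p for p in paths if p and not p.startswith(EXCLUDE_PREFIXES)]

def tracked_source_files() -> list[str]:
    """List git-tracked files, minus generated/vendor paths."""
    out = subprocess.run(
        ["git", "ls-files"], capture_output=True, text=True, check=True
    ).stdout
    return filter_inventory(out.splitlines())
```

Paste the resulting list into the audit prompt so Claude can reconcile its review against a concrete inventory instead of whatever it happened to open.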

Claude Code is explicitly designed to search files, understand codebases, and follow project instructions from CLAUDE.md; Claude docs also recommend using concise persistent instructions and path-specific guidance where necessary. (Claude)

Strong audit prompt for Claude

Audit this repository completely.

Rules:
- Start from the tracked source-file inventory.
- Ignore generated/build/vendor files unless referenced by source code.
- Account for every important source file.
- Produce:

1. top-level architecture summary
2. directory layout
3. service-by-service responsibility map
4. feature inventory
5. which features appear working
6. which features appear broken
7. which features are unknown or unverified
8. missing test coverage
9. missing validation
10. missing lint/type enforcement
11. security risks
12. API or schema mismatches
13. priority-ranked recommendations

At the end include:
- total reviewed files
- excluded files
- not reviewed files
- confidence level by service

Part 6 — Write CLAUDE.md the right way

Claude’s official docs say CLAUDE.md should contain instructions that matter every session, stay concise, and be used for rules Claude cannot infer from the code alone. They also support placing CLAUDE.md files at multiple levels, where more specific local guidance takes precedence. (Claude)

What belongs in CLAUDE.md

Put in:

  • repo purpose
  • service boundaries
  • coding conventions
  • commands to run
  • validation expectations
  • test expectations
  • review checklist
  • “do not” rules
  • definition of done

Do not put in:

  • giant architecture essays
  • temporary ticket notes
  • long one-off procedures
  • duplicated information from README unless it changes behavior

Root CLAUDE.md template

# Project: MyHospital

## Purpose
Multi-service hospital platform.

## Repo layout
- svc1: auth and identity
- svc2: patient and appointment flows
- svc3: billing and payments
- svc4: admin, analytics, reporting

## Global rules
- Do not change public contracts without updating SPEC.md first.
- Prefer small, reviewable changes.
- Preserve backward compatibility unless SPEC.md explicitly allows a breaking change.
- Every code change must include validation, tests, and lint/type correctness.
- Do not bypass security checks for convenience.

## Commands
- install: [command]
- dev: [command]
- test: [command]
- lint: [command]
- typecheck: [command]
- integration-test: [command]

## Security rules
- All protected endpoints require server-side authorization checks.
- Never trust client-provided roles.
- Validate all input at the request boundary.
- Sensitive data must not be logged.

## Output format for implementation tasks
For each task, report:
1. files changed
2. what changed
3. tests added or updated
4. risk introduced
5. follow-up work

## Definition of done
A task is done only if:
- implementation is complete
- validation is enforced
- tests pass
- lint/typecheck pass
- affected user/admin flows still work
- SPEC.md remains accurate

Service-level CLAUDE.md

For svc1/CLAUDE.md:

# svc1: auth and identity

## Responsibility
Authentication, authorization, session handling, roles, permissions.

## Must not break
- login
- logout
- session renewal
- password reset
- role enforcement
- unauthorized access handling

## Required tests
- auth success
- auth failure
- invalid credentials
- expired token
- insufficient permissions
- session timeout behavior

## Required validation
- email
- password policy
- token format
- permission checks

For svc2/CLAUDE.md:

# svc2: patient and appointment workflows

## Responsibility
Patient search, appointment booking, rescheduling, cancellation, provider assignment.

## Must not break
- patient search
- appointment creation
- reschedule flow
- cancellation flow
- pagination and filtering
- validation feedback

## Required validations
- required fields
- date/time validity
- provider availability
- duplicate booking prevention

Part 7 — Write AGENTS.md for Codex

OpenAI’s docs position AGENTS.md as the durable place for Codex repo instructions, including repo layout, commands, constraints, and what “done” means. They also note that /init can scaffold a starter AGENTS.md, but that you should refine it to match real team workflows. (OpenAI Developers)

AGENTS.md template

# AGENTS.md

## Repo overview
MyHospital is a multi-service application with service boundaries that must be preserved.

## Services
- svc1: auth and authorization
- svc2: appointments and patient flows
- svc3: billing and invoicing
- svc4: admin dashboards and reporting

## Commands
- install: [command]
- dev: [command]
- unit-tests: [command]
- integration-tests: [command]
- lint: [command]
- typecheck: [command]

## Engineering conventions
- Prefer minimal diffs.
- Reuse existing patterns before introducing new abstractions.
- Do not rename files or functions unless necessary.
- Keep API changes backward compatible unless SPEC.md says otherwise.

## Constraints
- Do not change infra or deployment manifests unless the task explicitly asks.
- Do not touch unrelated services.
- Do not skip tests.
- Do not leave TODOs in completed work unless listed in follow-up.

## Validation rules
- Implement request validation and business-rule validation.
- Add negative test cases for invalid input and permission failures.

## Done means
- code compiles
- tests pass
- lint/typecheck pass
- changed behavior is verified
- risk is stated clearly

Part 8 — Create SPEC.md before you write code

This is the center of the whole workflow.

If SPEC.md is weak, the whole build will drift.

What SPEC.md must contain

# SPEC: [Feature Name]

## Objective
What are we building and why?

## Scope
Included:
Excluded:

## Users
- user
- admin
- support/internal ops

## Functional requirements
1. Authentication
2. Authorization
3. User flow
4. Admin flow
5. Search/filter/pagination
6. Validation
7. Feature 1
8. Feature 2
9. Feature 3

## UX requirements
- loading states
- empty states
- error states
- unauthorized states
- form validation states

## Data model impact
- tables
- fields
- migrations
- compatibility concerns

## API/contracts
- endpoints
- request schema
- response schema
- failure modes

## Security
- role rules
- audit logging
- sensitive data handling

## Test plan
- unit
- integration
- end-to-end
- negative cases
- regression cases

## Rollout plan
- dev
- staging
- production

## Rollback plan
- revert method
- migration reversal or mitigation
- feature flag fallback

## Definition of done
- implementation complete
- tests passing
- lint/typecheck passing
- staging validated
- docs updated

Part 9 — The approval gate: “Claude yes/no”

This part from your notes is gold.

After SPEC.md is written, stop.

Do not code yet.

Ask Claude for a hard gate.

Prompt

Review SPEC.md and repository context.

Answer only in this format:

READY: yes/no

If no, list only blockers.

Check for:
- missing requirements
- missing edge cases
- missing validation rules
- missing security rules
- missing error states
- missing test cases
- unclear service ownership
- missing deployment or rollback concerns

If Claude says READY: no, fix the blockers first.

That one step prevents a huge amount of bad implementation work.
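Because the gate reply has a fixed format, you can also enforce it in a script or CI step. A sketch, assuming the exact `READY: yes/no` format above and blockers listed as `-` bullets:

```python
def parse_gate(report: str) -> tuple[bool, list[str]]:
    """Parse a 'READY: yes/no' gate reply into (ready, blockers)."""
    ready = False
    blockers: list[str] = []
    for line in report.splitlines():
        line = line.strip()
        if line.upper().startswith("READY:"):
            ready = line.split(":", 1)[1].strip().lower() == "yes"
        elif line.startswith("-"):
            blockers.append(line.lstrip("- ").strip())
    return ready, blockers
```

If `ready` is false, stop the pipeline and surface the blocker list instead of letting implementation start.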


Part 10 — Break implementation into slices

This is where Codex becomes powerful.

OpenAI’s prompting guidance says Codex performs better when tasks are small, focused, and verifiable, and when prompts include reproduction steps, validation steps, and lint/pre-commit checks. (OpenAI Developers)

Never do this

Build the whole MyHospital app end to end.

Always do this instead

Break the work into slices like:

  1. auth foundation
  2. permission model
  3. appointment data model
  4. appointment search/filter/pagination
  5. booking UX
  6. admin approval UX
  7. billing integration
  8. tests
  9. observability and release prep

Then implement one slice at a time.

Example slice prompt for Codex

Implement SPEC section: Appointment Search in svc2.

Scope:
- touch only svc2 and shared contracts if needed
- do not modify svc1 auth logic
- do not modify billing
- preserve existing API style and patterns

Requirements:
- search by patient name, doctor, status, date range
- pagination
- sorting
- input validation
- unauthorized access checks
- empty state handling
- error handling

Required verification:
- add or update unit tests
- add integration tests for search and pagination
- run lint
- run typecheck
- run affected test suites

Final output must include:
1. files changed
2. commands run
3. test results
4. known risks
5. follow-up items

That is the level of prompt quality that makes Codex reliable.


Part 11 — The right implementation loop

Here is the exact loop you should run.

Loop A — Claude prepares the work

Claude:

  • understands the repo
  • maps the target service
  • drafts or refines the spec
  • identifies risks and missing items

Loop B — Codex implements the slice

Codex:

  • edits the right files
  • adds tests
  • runs commands
  • reports results

Loop C — Claude reviews the slice

Claude:

  • compares implementation against SPEC.md
  • checks whether validation is real
  • checks whether user and admin flows still work
  • checks for missing edge cases
  • checks for architecture drift

Repeat until the release is complete.


Part 12 — What “missing” really means

You wrote:

What is missing?
test cases
validation
lining
which feature is working
which feature is not

I assume “lining” means linting.

This should become a standing checklist.

The Production Readiness Checklist

For every feature, always ask:

1. Is the feature functionally implemented?

Not “did the code change?”
But “can the user complete the flow?”

2. Is validation complete?

At least two levels:

  • request or schema validation
  • business-rule validation
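As a sketch of what the two levels look like in practice, here is a hypothetical appointment-booking example. The field names and rules are illustrative, not from any real MyHospital service:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class BookingRequest:
    patient_id: str
    day: date

def validate_request(payload: dict) -> BookingRequest:
    """Level 1: request/schema validation at the boundary."""
    if not isinstance(payload.get("patient_id"), str) or not payload["patient_id"]:
        raise ValueError("patient_id is required")
    try:
        day = date.fromisoformat(payload.get("day", ""))
    except ValueError:
        raise ValueError("day must be an ISO date")
    return BookingRequest(payload["patient_id"], day)

def validate_business(req: BookingRequest, today: date) -> None:
    """Level 2: business-rule validation against domain state."""
    if req.day < today:
        raise ValueError("cannot book an appointment in the past")
```

The point of the split: level 1 rejects malformed input before it touches the domain, and level 2 enforces rules that only make sense once the data is well-formed.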

3. Are authorization checks enforced server-side?

Not only hidden in the UI.

4. Are negative cases tested?

Examples:

  • invalid input
  • expired token
  • missing permissions
  • duplicate operations
  • empty search results
  • invalid pagination params

5. Are lint and type checks green?

This is the minimum quality floor.

6. Is the status of the feature known?

Each feature must be labeled:

  • working
  • partial
  • broken
  • unverified

That is how adults ship software.


Part 13 — Create RELEASE_READINESS.md

Use this before staging and before production.

# RELEASE_READINESS

## Release name

[release name]

## Working
- feature A
- feature B
- feature C

## Partial
- feature D: works except [case]

## Broken
- feature E: blocked by [reason]

## Unverified
- feature F

## Validation status
- request validation: complete / partial / missing
- business validation: complete / partial / missing

## Test status
- unit: [status]
- integration: [status]
- e2e: [status]

## Quality checks
- lint: pass/fail
- typecheck: pass/fail
- security review: done/pending

## Known risks
- high
- medium
- low

## Deployment blockers
- [list]
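Because this file gates deployment, it is worth checking mechanically before a release. A sketch that scans the blockers section, assuming the `## Deployment blockers` heading from the template and `-` bullets:

```python
def has_blockers(markdown: str) -> bool:
    """True if the '## Deployment blockers' section lists anything real."""
    in_section = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            in_section = stripped.lower() == "## deployment blockers"
            continue
        if in_section and stripped.startswith("-"):
            item = stripped.lstrip("- ").strip()
            if item and item != "[list]":
                return True
    return False
```

Wire this into CI so a non-empty blocker list fails the release job instead of relying on someone remembering to read the file.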


Part 14 — Testing strategy that actually works

Do not wait until the end for testing.

Testing should exist at every layer.

Unit tests

Use for:

  • pure logic
  • validators
  • transformers
  • helpers
  • permission rules

Integration tests

Use for:

  • API routes
  • DB interactions
  • service contracts
  • auth middleware
  • pagination and filtering behavior

End-to-end tests

Use for:

  • login flow
  • admin approval flow
  • key user journey
  • one or two critical revenue or compliance flows

Negative tests

These are mandatory:

  • malformed requests
  • unauthorized access
  • forbidden actions
  • invalid filters
  • missing required fields
  • duplicate submissions
  • bad state transitions
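To make this concrete, here is a sketch of negative tests around a hypothetical search-parameter validator. Everything here (the function, the status values, the limits) is illustrative; the pattern is what matters: every invalid input must be rejected, not silently accepted.

```python
def search_params(params: dict) -> dict:
    """Hypothetical query validator: rejects bad pagination and filters."""
    page = params.get("page", 1)
    size = params.get("size", 20)
    if not isinstance(page, int) or page < 1:
        raise ValueError("page must be a positive integer")
    if not isinstance(size, int) or not 1 <= size <= 100:
        raise ValueError("size must be between 1 and 100")
    status = params.get("status")
    if status is not None and status not in {"booked", "cancelled", "completed"}:
        raise ValueError("unknown status filter")
    return {"page": page, "size": size, "status": status}

def raises(fn, *args) -> bool:
    """Helper: True if fn(*args) raises ValueError."""
    try:
        fn(*args)
        return False
    except ValueError:
        return True

# Negative cases: invalid input must fail loudly.
assert raises(search_params, {"page": 0})
assert raises(search_params, {"size": 1000})
assert raises(search_params, {"status": "archived"})
assert search_params({})["page"] == 1
```

When you hand a slice to Codex, ask for this style of test explicitly; agents rarely write negative cases unless told to.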

Official Codex guidance specifically recommends asking it to write or update tests, run relevant checks, and review the work rather than stopping at code generation. (OpenAI Developers)


Part 15 — The review phase: Claude as auditor

Once Codex finishes a slice, ask Claude to review it.

Prompt

Review the implementation against SPEC.md.

Classify findings as:
- blocker
- major
- minor
- follow-up

Check:
- feature completeness
- missing edge cases
- missing validation
- missing auth/authz checks
- UX-state coverage
- pagination/search/filter correctness
- schema/API mismatches
- regression risk
- docs drift

This is where Claude shines.

Claude is especially strong at asking:

  • “Did you really cover the whole flow?”
  • “Does this contradict the spec?”
  • “Is this secure?”
  • “What did we forget?”

Part 16 — Staging before production

Do not let AI-generated confidence replace release discipline.

Before production:

  1. deploy to staging
  2. run smoke tests
  3. verify login/authz
  4. verify primary user flow
  5. verify admin flow
  6. verify search/filter/pagination with realistic data
  7. verify migration compatibility
  8. verify logs and alerts
  9. verify rollback

Staging checklist

  • Does login still work?
  • Does role enforcement still work?
  • Are old records still readable?
  • Does pagination behave at scale?
  • Do validation messages make sense?
  • Are empty states and error states usable?
  • Do audit logs contain what you need?
  • Is performance acceptable?
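The mechanical part of this checklist can be a small smoke-test script. A sketch, with hypothetical endpoint paths and an injected `fetch` callable (so the check stays testable offline; in staging you would pass a real HTTP client that returns the status code):

```python
# Hypothetical endpoints; replace with your real critical paths.
SMOKE_CHECKS = [
    ("login page", "/login"),
    ("health", "/healthz"),
    ("patient search", "/api/patients?q=smith&page=1"),
]

def run_smoke(fetch) -> list[str]:
    """Return the names of failed checks; an empty list means green."""
    failures = []
    for name, path in SMOKE_CHECKS:
        try:
            status = fetch(path)
        except Exception:
            status = None
        if status != 200:
            failures.append(name)
    return failures
```

The judgment items on the checklist (do validation messages make sense, are empty states usable) still need a human; the script only proves the flows respond at all.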

Part 17 — Production deployment

Production should be boring.

If it feels exciting, the process is wrong.

Production release sequence

  1. freeze the release branch
  2. confirm CI is green
  3. confirm migrations
  4. confirm environment variables
  5. deploy lowest-risk components first if possible
  6. run health checks
  7. smoke test critical journeys
  8. monitor logs, errors, metrics
  9. keep rollback ready

The production questions you must answer in writing

  • What exactly changed?
  • What can fail?
  • How will we detect failure?
  • How fast can we roll back?
  • Does rollback require DB mitigation?
  • Which flows must be tested immediately after deploy?

That belongs either in SPEC.md or your deployment runbook.


Part 18 — Post-production hardening

A release is not done when it reaches production.

It is done when the first production learning cycle is complete.

After deploy, ask Claude:

We deployed [release name].

Analyze:
- likely failure modes
- weak validation areas
- missing monitors
- missing alerts
- rollback risks
- assumptions that were not actually verified

Produce a hardening backlog ranked by risk.

This turns production into learning, not panic.


Part 19 — Your 10-hour operating model

You wrote:

1 PM --> 11 PM
Effective - 2 Hours
30 mins Step 1 Requirement

That can become a superb delivery cadence.

A serious 10-hour build day

1:00 PM – 1:30 PM

Requirements and scope lock

Output:

  • feature brief
  • user flow
  • admin flow
  • auth/authz rules
  • search/filter/pagination rules
  • validation rules
  • MVP scope

1:30 PM – 2:15 PM

Claude repo audit or architecture design

Output:

  • repo/service map
  • risk map
  • identified gaps
  • what already exists
  • what must not break

2:15 PM – 3:00 PM

Write control docs

Output:

  • CLAUDE.md
  • AGENTS.md
  • SPEC.md

3:00 PM – 3:15 PM

Claude readiness gate

Output:

  • READY: yes/no
  • blocker list if no

3:15 PM – 6:00 PM

Codex implements slices 1–3

Examples:

  • auth improvements
  • data model
  • first API or first UI flow

6:00 PM – 7:00 PM

Claude review and gap fixing

Output:

  • missing tests
  • missing validation
  • broken user/admin flows
  • major risks

7:00 PM – 9:00 PM

Codex implements slices 4–6

Examples:

  • pagination/filtering
  • admin flow
  • edge-case handling
  • tests and lint/type cleanup

9:00 PM – 10:00 PM

Staging prep and release-readiness report

Output:

  • RELEASE_READINESS.md
  • known risks
  • blockers
  • staging checklist

10:00 PM – 11:00 PM

Staging validation and production decision

Output:

  • go / no-go
  • rollback plan
  • post-deploy watchlist

Your “effective 2 hours” are the decisions:

  • scope
  • tradeoffs
  • final review
  • go/no-go

That is exactly where humans should spend time.


Part 20 — The single biggest mistake to avoid

Do not run Claude and Codex live on the same files at the same time without isolation.

Codex guidance explicitly warns against running live threads on the same files without using git worktrees. (OpenAI Developers)

So use one of these models:

Safe model A

Claude plans, Codex edits, Claude reviews after

Safe model B

Claude works in one branch/worktree, Codex in another

Safe model C

Claude only reads/reviews, Codex only writes
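For model B, git worktrees give each agent its own checkout of the same repository. A sketch that builds the `git worktree add` invocation (the branch and path names are illustrative; run the command yourself or via `subprocess.run`):

```python
def worktree_add(path: str, branch: str) -> list[str]:
    """Build a `git worktree add -b <branch> <path>` invocation."""
    return ["git", "worktree", "add", "-b", branch, path]

# One worktree per agent keeps live edits from colliding:
claude_cmd = worktree_add("../myhospital-claude", "claude/plan-review")
codex_cmd = worktree_add("../myhospital-codex", "codex/slice-1")
```

Each agent then works against its own directory and branch, and you merge through normal review rather than through two agents fighting over the same files.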

This one rule will save you a lot of pain.


Part 21 — The best prompt library

Prompt 1 — Claude requirement discovery

You are the product architect for this application.

Turn my rough idea into a production-grade feature brief.
Be strict.
Do not write code.

Include:
- problem
- users
- user flow
- admin flow
- auth/authz
- UX states
- validation
- search/filter/pagination
- API assumptions
- data model assumptions
- edge cases
- security concerns
- rollout and rollback concerns
- what is missing

Prompt 2 — Claude repo audit

Audit this repository completely.

I want:
- architecture summary
- directory and service map
- feature inventory
- working vs broken vs unverified features
- missing test cases
- missing validation
- lint/type issues
- risky modules
- service ownership confusion
- recommended next actions

Prompt 3 — Claude writes SPEC.md

Write SPEC.md for this feature.

Use these sections:
- objective
- scope
- users
- functional requirements
- UX requirements
- validation
- security
- API/contracts
- data model impact
- test plan
- rollout plan
- rollback plan
- definition of done

Prompt 4 — Claude readiness gate

Review SPEC.md and answer only:

READY: yes/no

If no, list blockers only.

Prompt 5 — Codex implementation prompt

Implement the approved SPEC section for [feature].

Scope:
- allowed directories: [...]
- forbidden directories: [...]

Requirements:
- [...]
- [...]
- [...]

Verification:
- add/update tests
- run lint
- run typecheck
- run affected tests

Final report:
1. files changed
2. commands run
3. test results
4. known risks
5. follow-up items

Prompt 6 — Claude review prompt

Review this implementation against SPEC.md.

Classify issues as:
- blocker
- major
- minor
- follow-up

Check:
- completeness
- validation
- auth/authz
- search/filter/pagination
- user flow
- admin flow
- regression risk
- docs drift

Part 22 — How this applies specifically to Myhospital

Your directory layout example was:

Myhospital
  svc1
  svc2
  svc2
  svc3

I assume one svc2 is a typo. A better normalized version:

Myhospital/
  svc1-auth/
  svc2-patients/
  svc3-appointments/
  svc4-billing/
  svc5-admin/

If renaming is not possible, keep the existing names but define ownership clearly in docs.

A clean ownership model

svc1

  • authentication
  • authorization
  • sessions
  • roles

svc2

  • patient records
  • patient search

svc3

  • appointments
  • schedules
  • rescheduling
  • cancellation

svc4

  • billing
  • invoices
  • payments

svc5

  • admin dashboards
  • reports
  • audit tools

Now every feature can be mapped clearly.


Part 23 — The golden rules for production-grade AI development

Here are the rules I would insist on for any serious team:

Rule 1

No implementation before SPEC.md.

Rule 2

No “done” without tests, validation, lint, and type checks.

Rule 3

No production deploy without a rollback plan.

Rule 4

No large prompts for large features.
Split into slices.

Rule 5

Claude reviews Codex output before acceptance.

Rule 6

Every feature must be labeled:

  • working
  • partial
  • broken
  • unverified

Rule 7

Never trust UI-only validation or UI-only permissions.

Rule 8

Use persistent repo instructions.
Claude docs recommend CLAUDE.md; Codex docs recommend AGENTS.md. (Claude)

Rule 9

Keep those instruction files concise and practical.
Both Claude and Codex docs emphasize that durable guidance should stay short, specific, and based on real repeated friction rather than vague rule dumps. (Claude)

Rule 10

Don’t automate a workflow until it works manually.
Codex best practices explicitly recommend turning repeated stable work into skills first and automating only once the process is predictable. (OpenAI Developers)


Final takeaway

The best way to build applications with Claude and Codex is not:

“Ask both to build the app.”

It is:

  1. Claude turns ambiguity into a clear plan
  2. Claude audits the codebase and identifies gaps
  3. You write durable repo guidance
  4. You lock the feature contract in SPEC.md
  5. Claude gives a strict readiness gate
  6. Codex implements one approved slice at a time
  7. Codex runs tests, lint, and validation
  8. Claude audits the implementation
  9. You produce a release-readiness report
  10. You deploy through staging to production with rollback ready
  11. Claude creates the hardening backlog after release
