The Socratic Method With Claude Code

I tracked my debugging sessions over six weeks on a NestJS monorepo. The ones where I let Claude just fix the problem averaged 4 minutes. The ones where I used the Socratic approach — Claude questioning me, or me questioning Claude — averaged 7 minutes. Three minutes slower per session. But by week four, the repeat-bug rate had dropped by roughly 60%. I was hitting the same class of issue (cookie mismatches, missing validation, partial-write consistency) far less often, because I’d actually understood the underlying pattern the first time around.

Most developers use Claude Code the same way: paste a problem, get a solution, move on. It works. But it also turns you into a consumer of answers instead of a builder of understanding. Six months later, when a similar bug appears in a different shape, you’re starting from scratch.

There’s an alternative. Instead of telling Claude Code what to do, you structure your interaction so that Claude asks you questions that guide you toward the answer. You become the learner; Claude becomes the questioner. Or you flip it: you ask Claude probing questions instead of requesting solutions directly.

Both directions work. The key is replacing “do this for me” with a dialogue that builds understanding.

Why This Matters

The difference between a developer who asks AI to fix their bugs and one who uses AI to understand their bugs shows up over time:

You learn the codebase instead of outsourcing comprehension
You catch flawed assumptions before they become flawed code
You retain the knowledge for next time
You make better architectural decisions because you reasoned through them

The fix takes a few minutes either way. But the understanding compounds.

The Five Methods

Method 1: Instruct Claude to Question You

Add Socratic instructions to your CLAUDE.md or give them inline.

Persistent (in CLAUDE.md):

## Interaction Style

When I ask for help with a bug or feature, do not jump to a solution. Instead:
1. Ask me what I think the root cause is
2. Ask me what I've already tried
3. Ask clarifying questions about my assumptions
4. Guide me toward the answer with progressively more specific questions
5. Only provide the direct answer if I explicitly ask you to stop questioning

When I say "just tell me", switch to direct mode.

Per-conversation:

I have a bug where the auth token isn't being set. Don't fix it for me.
Instead, ask me questions that help me find the root cause myself.

Method 2: You Drive the Questions

Instead of asking Claude to do things, ask it to explain things. Then act on what you learn.

Instead of this	Try this
"Fix the failing test"	"What is this test actually asserting? Walk me through it line by line."
"Add a new endpoint"	"What existing endpoint is most similar to what I need? What would I need to change?"
"Why is this slow?"	"What SQL query does this endpoint execute? What indexes exist on that table?"
"Refactor this"	"What are the code smells here? Which one causes the most concrete harm?"

The pattern: ask Claude to explain or analyze, then use that understanding to act yourself.

Method 3: The Rubber Duck Escalation

A structured approach that starts with self-reliance and escalates only as needed:

1. State the problem out loud to Claude: “I think X is happening because Y.”
2. Ask Claude to poke holes: “What am I missing? What assumptions am I making?”
3. Ask for a hint, not a solution: “Point me to the right file or function, but don’t tell me the fix.”
4. Ask for the smallest useful nudge: “Is the issue in the query or the mapper?”
5. Only then, if stuck: “OK, show me the fix and explain the reasoning.”

Each step is another chance to solve it yourself before Claude gives away more of the answer.

Method 4: Hypothesis-Driven Debugging

Frame every debugging session as a hypothesis to test:

My hypothesis: the 401 error happens because the cookie isn't being sent
on cross-origin requests.

What evidence would confirm or refute this? What should I check first?

Claude responds with diagnostic questions and checks. You run them. You update your hypothesis. Repeat.

This mirrors how experienced developers actually debug — you’re just making the internal monologue explicit.

Method 5: Architecture by Inquiry

Before building a feature, have Claude question your design:

I'm planning to add a caching layer in front of the job-ad-formats endpoint.
Play devil's advocate. What questions should I be able to answer before I
start writing code?

Claude might ask:

What’s the cache invalidation strategy?
How stale can the data be before it causes problems?
Is the bottleneck actually the database query, or is it somewhere else?
Have you measured the current response time?

If you can’t answer one of these, that’s the thing to investigate first.
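The last two questions are usually the cheapest to answer empirically. As a rough sketch, you could time each stage of the request separately before deciding what to cache. The `timeStages` and `slowestStage` helpers and the stage names are illustrative assumptions, not part of any framework, and the stages are synchronous only to keep the example short:

```typescript
// Hypothetical helper: time each named stage to see where a request spends its time.
type Clock = () => number;

interface StageTiming {
  stage: string;
  ms: number;
}

function timeStages(
  stages: Array<[string, () => void]>,
  now: Clock = Date.now, // injectable so the helper can be tested with a fake clock
): StageTiming[] {
  return stages.map(([stage, run]) => {
    const start = now();
    run();
    return { stage, ms: now() - start };
  });
}

// Returns the stage that took longest, or undefined for an empty list.
function slowestStage(timings: StageTiming[]): StageTiming | undefined {
  return timings.reduce<StageTiming | undefined>(
    (worst, t) => (worst === undefined || t.ms > worst.ms ? t : worst),
    undefined,
  );
}
```

If the slowest stage turns out to be the mapper rather than the query, a cache in front of the database would not address the problem.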

Setting a Difficulty Level

Tell Claude how much help you want:

Hard mode: “Only ask questions. Never show code.”
Medium mode: “Ask questions first. Show code snippets only as hints.”
Easy mode: “Ask one clarifying question, then show the solution with explanation.”

When to Use It (And When Not To)

The Socratic method is a learning tool, not a productivity tool. Use it when:

You’re in an unfamiliar part of the codebase
You’re making an architectural decision you’ll live with for months
You keep hitting the same class of bug

Use direct mode when:

You understand the problem and just need implementation speed
The task is mechanical (rename, format, migrate)
You’re under time pressure and the learning can wait

Examples in Practice

These five examples show the Socratic method applied to real development scenarios. Each one demonstrates how questioning leads to deeper understanding than a direct answer would.

Debugging an Auth Bug

Users are getting logged out randomly after deploying a new feature. Instead of asking Claude to fix it, you use the Socratic method to find the root cause yourself.

Turn 1 — State the problem

You

Users are getting logged out randomly after we deployed the org switcher. Don't fix it -- help me find it.

Claude

Let's narrow it down. Two questions: (1) Does this happen to all users, or only users with specific characteristics? (2) What does "randomly" mean -- is it after a specific action, or truly at unpredictable times?

Turn 2 — Narrow the scope

You

It's users with multiple organizations. It happens when they switch orgs.

Claude

OK, so it's tied to the switch action. When a user switches orgs, what does the backend do with their auth state? Walk me through the endpoint.

Turn 3 — Trace the code

You

POST /auth/switch-organization calls clearCookie('auth_token') then setCookie('auth_token', newToken).

Claude

When you clear a cookie and set it again, the browser matches cookies by name, path, and domain. If the clear call's attributes differ from those on the stored cookie, the clear misses it and the old cookie survives. Are the cookie options identical in both calls?

Turn 4 — Discovery

You

...no. clearCookie uses path: '/' but setCookie uses path: '/api'. So the clear never touches the cookie that the set writes. The browser treats them as two separate cookies, and which token it sends depends on the request path.

Claude

Exactly. Before you fix it, one more question: where are the cookie options defined? Is there a shared config, or are they hardcoded in each call?

Turn 5 — Fix with full context

You

They're hardcoded in two different places. I'll extract them into AUTH_COOKIE_OPTIONS in the config and use it everywhere.
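The fix from Turn 5 might look roughly like the sketch below. It assumes a reply object with `setCookie` and `clearCookie` methods, as in the dialogue. The `CookieReply` interface, the `replaceAuthCookie` helper, and the specific option values are hypothetical stand-ins, not the project's real code:

```typescript
// Hypothetical minimal shapes; a real framework exposes richer types.
interface CookieOptions {
  path: string;
  httpOnly: boolean;
  secure: boolean;
  sameSite: "lax" | "strict" | "none";
}

interface CookieReply {
  setCookie(name: string, value: string, options: CookieOptions): void;
  clearCookie(name: string, options: CookieOptions): void;
}

const AUTH_COOKIE_NAME = "auth_token";

// Single source of truth: every clear and every set reads from here.
// The values are assumptions chosen for illustration.
const AUTH_COOKIE_OPTIONS: Readonly<CookieOptions> = {
  path: "/",
  httpOnly: true,
  secure: true,
  sameSite: "lax",
};

// Clears and rewrites the auth cookie with identical attributes,
// so both calls always target the same cookie.
function replaceAuthCookie(reply: CookieReply, newToken: string): void {
  reply.clearCookie(AUTH_COOKIE_NAME, AUTH_COOKIE_OPTIONS);
  reply.setCookie(AUTH_COOKIE_NAME, newToken, AUTH_COOKIE_OPTIONS);
}
```

Routing every cookie write through one helper means a future change to the path or domain cannot drift between the clear and the set.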

Conversation Flow

flowchart TD
    A["You: Users are getting logged out randomly after the last deploy"] --> B{"Claude asks:\nWhat changed in the last deploy?\nDoes it affect all users or a subset?"}
    B --> C["You: We added the organization switcher.\nIt seems to affect users with multiple orgs."]
    C --> D{"Claude asks:\nWhat happens to the auth token\nwhen a user switches orgs?\nDoes the cookie get rewritten?"}
    D --> E["You: Let me check...\nThe switchOrganization endpoint\ncalls res.clearCookie() then res.setCookie()"]
    E --> F{"Claude asks:\nAre the cookie options identical in both calls?\nWhat if clearCookie uses different options?"}
    F --> G["You: Found it! clearCookie uses '/'\nbut setCookie uses '/api'.\nThe old cookie persists and conflicts."]
    G --> H["You fix it yourself with full understanding"]
    style A fill:#1a1a2e,stroke:#e94560,color:#eee
    style B fill:#16213e,stroke:#0f3460,color:#eee
    style C fill:#1a1a2e,stroke:#e94560,color:#eee
    style D fill:#16213e,stroke:#0f3460,color:#eee
    style E fill:#1a1a2e,stroke:#e94560,color:#eee
    style F fill:#16213e,stroke:#0f3460,color:#eee
    style G fill:#1a1a2e,stroke:#53d769,color:#eee
    style H fill:#0d7377,stroke:#14ffec,color:#eee

What You Learned

mindmap
  root((Auth Bug))
    Cookie mechanics
      Browser matches by name + path + domain
      Mismatched options = zombie cookies
    Codebase insight
      Cookie options were hardcoded in 2 places
      No shared config for auth cookie settings
    Prevention
      Extract shared cookie config
      One source of truth for cookie options
    Debugging pattern
      Trace the exact HTTP headers
      Check attribute consistency

Contrast — direct mode:

If you'd said "fix the logout bug," Claude would have found and fixed the path mismatch. You'd have working code but no understanding of why cookie path matching matters. Next time a cookie bug appears in a different form, you'd start from scratch.

Designing a Caching Layer

You want to add Redis caching to a slow endpoint. Instead of asking Claude to implement it, you ask Claude to challenge your design — and discover you don't need Redis at all.

Turn 1 — Propose a solution

You

I want to add Redis caching in front of the job-ad-formats endpoint. It's returning in 800ms. Help me think through the design -- don't just build it.

Claude

Before we talk about Redis, two questions: (1) What's your target response time? (2) Have you identified where the 800ms is spent? Is it the database query, the mapping logic, or network latency?

Turn 2 — Examine assumptions

You

Target is under 100ms. I haven't profiled it, but the query joins 4 tables.

Claude

Worth profiling first -- if the query itself is 50ms and the rest is mapping, caching the raw query won't help much. But let's assume the query is the bottleneck. How often does the underlying data change?

Turn 3 — Question the tool choice

You

Admins update pricing in the legacy CMS. Maybe once a month.

Claude

Data that changes monthly is a strong candidate for caching. But does it need to be Redis? Redis adds infrastructure, connection management, serialization overhead, and a new failure mode. What are the tradeoffs of in-memory caching here?

Turn 4 — Think through tradeoffs

You

Simpler, but what about multiple API instances? They'd have different caches.

Claude

How many instances do you run? And for this specific data -- pricing tiers that change monthly -- can two instances serve slightly different cached values for a few minutes without causing problems?

Turn 5 — Arrive at a decision

You

Two instances. And yeah, if one instance shows the old price for 5 minutes after an update, nobody would notice. OK, in-memory with a TTL it is.
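To make the decision concrete, here is a minimal sketch of an in-memory cache with a TTL. The `TtlCache` class is a hypothetical illustration rather than a library API, and the loader is synchronous only to keep the example short; a real loader for this endpoint would be asynchronous:

```typescript
// Minimal in-memory TTL cache. Each API instance holds its own copy,
// so two instances may briefly serve different values after an update.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, useful for tests
  }

  getOrLoad(key: string, load: () => V): V {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value; // still fresh
    }
    const value = load(); // missing or expired: reload and restart the TTL
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;
```

With a five-minute TTL, the worst case after a pricing update is that one instance serves the old value for up to five minutes, which matches the staleness you decided was acceptable.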

Decision Tree You Explored

flowchart TD
    A["Endpoint is slow (800ms)"] --> B{Where is the time spent?}
    B -->|Query| C{How often does data change?}
    B -->|Mapping| M["Optimize mapper, not cache"]
    B -->|Network| N["Check connection pooling"]
    C -->|Frequently| D{Consistency requirements?}
    C -->|Rarely| E{Multiple instances?}
    D -->|Strict| F["Redis with invalidation"]
    D -->|Eventual OK| G{Infrastructure budget?}
    E -->|Single| H["In-memory cache + TTL"]
    E -->|Multiple| I{Can instances diverge briefly?}
    I -->|Yes| H
    I -->|No| F
    G -->|Have Redis| F
    G -->|Keep it simple| H
    style A fill:#1a1a2e,stroke:#e94560,color:#eee
    style H fill:#0d7377,stroke:#14ffec,color:#eee
    style F fill:#16213e,stroke:#0f3460,color:#eee
    style M fill:#2d132c,stroke:#ee4540,color:#eee
    style N fill:#2d132c,stroke:#ee4540,color:#eee

What You Learned

mindmap
  root((Caching Decision))
    Profile before caching
      Know where the time goes
      Cache the right layer
    Match tool to problem
      Monthly changes != Redis
      In-memory is infrastructure-free
    Multi-instance awareness
      Staleness tolerance drives the choice
      5-min TTL acceptable for pricing
    Avoided
      Unnecessary Redis dependency
      Over-engineering for the problem size

Contrast — direct mode:

If you'd said "add Redis caching to job-ad-formats," Claude would have implemented a full Redis cache-aside pattern. You'd ship a working but over-engineered solution, adding operational complexity for data that changes once a month.

Understanding Unfamiliar Code

You're new to the project and need to understand how the payment flow works. Instead of asking Claude to explain it all, you explore it through guided questions — and discover a real bug along the way.

Turn 1 — Start from the user's perspective

You

I need to understand the payment flow. Don't explain it to me -- guide me through it with questions. I'll read the code as we go.

Claude

Let's start from the user's side. What action does a user take that initiates a payment? Look at the frontend -- what component contains the payment UI?

Turn 2 — Trace the frontend

You

PaymentStep.tsx has a "Complete Purchase" button. It calls useCompletePayment().

Claude

Good. What does that hook do? Specifically: what HTTP request does it make, and what data does it send?

Turn 3 — Cross the boundary

You

It POSTs to /api/payment/complete with { jobAdId, formatId, paymentMethod }.

Claude

Now follow that request to the backend. Which controller handles it, and what service method does it delegate to? Don't read the whole service yet -- just find the entry point.

Turn 4 — Predict then verify

You

PaymentController.complete() calls PaymentService.completePayment(). The service method is about 60 lines.

Claude

Before reading all 60 lines: based on the method name and what you know about payments, what must this method do? List the essential steps.

Turn 5 — Discover a real issue

You

It does all three steps I expected (create the transaction record, link services, update ad status). But... there's no database transaction wrapping them. If linking services fails, the transaction record exists but the ad never activates. The user paid but got nothing.

Claude

Now you understand the flow and you've found a real consistency risk. One more question: how would you verify whether this has actually happened in production? What would the data look like?
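One way to reason about that last question is to write down what an orphaned record would look like. The sketch below works on in-memory rows purely for illustration; every row shape and field name is an assumption, and against a real database the same check would be a query over the corresponding tables:

```typescript
// All row shapes are hypothetical; the real schema may differ.
interface PaymentTransactionRow {
  id: number;
  jobAdId: number;
}

interface ServiceLinkRow {
  transactionId: number;
}

interface JobAdRow {
  id: number;
  status: "pending" | "active";
}

// A transaction is orphaned when step 1 succeeded (the row exists)
// but step 2 (service link) or step 3 (ad activation) did not.
function findOrphanedTransactions(
  transactions: PaymentTransactionRow[],
  serviceLinks: ServiceLinkRow[],
  jobAds: JobAdRow[],
): PaymentTransactionRow[] {
  const linkedIds = new Set(serviceLinks.map((link) => link.transactionId));
  const activeAdIds = new Set(
    jobAds.filter((ad) => ad.status === "active").map((ad) => ad.id),
  );
  return transactions.filter(
    (tx) => !linkedIds.has(tx.id) || !activeAdIds.has(tx.jobAdId),
  );
}
```

Any row this returns is a candidate for a user who paid without getting an active ad.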

Mental Model You Built

flowchart LR
    subgraph Frontend
        A["PaymentStep.tsx"] --> B["useCompletePayment()"]
    end
    subgraph API
        B -->|"POST /api/payment/complete"| C["PaymentController"]
        C --> D["PaymentService"]
    end
    subgraph Database Operations
        D --> E["1. Insert transaction"]
        E --> F["2. Link services to ad"]
        F --> G["3. Update ad status"]
    end
    E -.->|"No DB transaction wrapping"| H["Risk: partial completion"]
    style H fill:#e94560,stroke:#e94560,color:#fff
    style A fill:#1a1a2e,stroke:#0f3460,color:#eee
    style B fill:#1a1a2e,stroke:#0f3460,color:#eee
    style C fill:#16213e,stroke:#0f3460,color:#eee
    style D fill:#16213e,stroke:#0f3460,color:#eee
    style E fill:#0d7377,stroke:#14ffec,color:#eee
    style F fill:#0d7377,stroke:#14ffec,color:#eee
    style G fill:#0d7377,stroke:#14ffec,color:#eee

The Exploration Path

mindmap
  root((Payment Flow))
    Entry point
      PaymentStep.tsx button
      useCompletePayment hook
      POST /api/payment/complete
    Backend path
      PaymentController
      PaymentService.completePayment
      ~60 lines of orchestration
    Three critical steps
      Create transaction record
      Link services to job ad
      Update ad status
    Discovery
      No DB transaction wrapper
      Partial failure = user pays but gets nothing
      Need to check production for orphaned transactions

Contrast — direct mode:

If you'd said "explain the payment flow," Claude would have dumped a summary. You'd understand the structure but not the reasoning. The bug discovery came directly from the Socratic process — predicting what the code should do, then comparing it to what it actually does.

Code Review by Questioning

You've written a new API endpoint and want Claude to review it — through questions that make you evaluate your own code, rather than a list of "fix this" comments.

Security Layer

Claude

Let's start with access control. Who is allowed to call this endpoint? What mechanism enforces that?

You

Any authenticated user can call it. I used @UseGuards(JwtAuthGuard).

Claude

Should any authenticated user be able to create a transaction for any organization? Or should they only create transactions for orgs they belong to?

You

...only their own orgs. I'm not checking organizationId against the user's memberships. That's an IDOR vulnerability.
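The missing check is small once it has a name. A minimal sketch, assuming the authenticated user object carries a list of organization IDs; the `AuthenticatedUser` shape, `ForbiddenError`, and `assertOrgMembership` are hypothetical, and in a NestJS app this logic would more likely live in a guard:

```typescript
// Hypothetical shape of the user attached to the request after JWT validation.
interface AuthenticatedUser {
  id: string;
  organizationIds: string[];
}

class ForbiddenError extends Error {
  readonly statusCode = 403;
}

// Rejects requests that target an organization the caller does not belong to.
function assertOrgMembership(user: AuthenticatedUser, organizationId: string): void {
  if (!user.organizationIds.includes(organizationId)) {
    throw new ForbiddenError("User does not belong to this organization");
  }
}
```

Calling this before any write means the `organizationId` in the request body is never trusted on its own.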

Input Validation

Claude

Look at your DTO. For each numeric field, ask yourself: what's the smallest value that makes business sense? What's the largest? Does your validation enforce those bounds?

You

quantity has @IsNumber() but no min/max. Someone could POST quantity: -5 and create a negative transaction.
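In a class-validator DTO the bounds would be expressed with `@Min` and `@Max` decorators. The same rule written as a plain function, so it runs without any library, could look like the sketch below. The `validateQuantity` name is hypothetical, and the bounds of 1 and 100 are the ones used in the fixes diagram for this example, not a general recommendation:

```typescript
const MIN_QUANTITY = 1;
const MAX_QUANTITY = 100;

// Returns a list of validation errors; an empty list means the value is acceptable.
function validateQuantity(quantity: unknown): string[] {
  if (typeof quantity !== "number" || !Number.isInteger(quantity)) {
    return ["quantity must be an integer"];
  }
  const errors: string[] = [];
  if (quantity < MIN_QUANTITY) {
    errors.push(`quantity must be at least ${MIN_QUANTITY}`);
  }
  if (quantity > MAX_QUANTITY) {
    errors.push(`quantity must be at most ${MAX_QUANTITY}`);
  }
  return errors;
}
```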

Error Handling

Claude

Imagine the database is down. A user clicks "Purchase." What exactly appears on their screen?

You

They'd get a raw 500 with Drizzle's error object. That leaks internal details and gives no useful message.
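A minimal sketch of the idea behind the fix: log the full error on the server and hand the client a generic message. The `runSafely` helper and the `Result` type are hypothetical, and the action is synchronous only to keep the example short:

```typescript
interface ClientError {
  statusCode: number;
  message: string;
}

type Result<T> = { ok: true; value: T } | { ok: false; error: ClientError };

// Runs an action and converts any thrown error into a generic client-safe response.
function runSafely<T>(action: () => T, logInternal: (err: unknown) => void): Result<T> {
  try {
    return { ok: true, value: action() };
  } catch (err) {
    logInternal(err); // full detail stays in server-side logs
    return {
      ok: false,
      error: { statusCode: 500, message: "Something went wrong. Please try again later." },
    };
  }
}
```

The user still sees a 500, but the response body no longer depends on what the database driver happened to throw.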

Idempotency

Claude

Last question. The user has a flaky connection. They click "Purchase" and their browser retries the request. What happens?

You

Two identical transactions get created. I need either an idempotency key or a unique constraint on [userId, jobAdId, formatId].
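The idempotency-key option can be sketched in a few lines. This version keeps results in memory, which is an assumption made for illustration: it only protects a single process, so a real implementation would back it with the database (for example the unique constraint mentioned above). The `IdempotentCreator` class is hypothetical:

```typescript
// Remembers the result of each key so a retried request returns the
// original result instead of creating a second record.
class IdempotentCreator<T> {
  private readonly results = new Map<string, T>();

  run(idempotencyKey: string, create: () => T): T {
    if (this.results.has(idempotencyKey)) {
      return this.results.get(idempotencyKey) as T; // retry: reuse the first result
    }
    const created = create();
    this.results.set(idempotencyKey, created);
    return created;
  }
}
```

The client generates the key once per purchase attempt and sends it with every retry, so the server can tell a retry from a genuinely new purchase.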

Issues Found Through Questions

flowchart TB
    subgraph "Your Code"
        EP["POST /api/transactions"]
    end
    subgraph "Issues You Discovered"
        direction TB
        S["Missing org membership check\n(IDOR vulnerability)"]
        V["No min/max on numeric fields\n(negative transactions possible)"]
        E["Raw DB errors exposed to client\n(information leak)"]
        I["No idempotency protection\n(duplicate transactions)"]
    end
    EP --> S
    EP --> V
    EP --> E
    EP --> I
    subgraph "Fixes You Designed"
        S --> S2["Add @OrgMembership() guard"]
        V --> V2["Add @Min(1) @Max(100)"]
        E --> E2["Wrap in try/catch,\nreturn generic 500"]
        I --> I2["Add unique constraint +\nidempotency key header"]
    end
    style S fill:#e94560,stroke:#e94560,color:#fff
    style V fill:#e94560,stroke:#e94560,color:#fff
    style E fill:#e94560,stroke:#e94560,color:#fff
    style I fill:#e94560,stroke:#e94560,color:#fff
    style S2 fill:#0d7377,stroke:#14ffec,color:#eee
    style V2 fill:#0d7377,stroke:#14ffec,color:#eee
    style E2 fill:#0d7377,stroke:#14ffec,color:#eee
    style I2 fill:#0d7377,stroke:#14ffec,color:#eee
    style EP fill:#1a1a2e,stroke:#0f3460,color:#eee

Review Question Categories

mindmap
  root((Code Review\nQuestions))
    Security
      Who can call this?
      Can they access other users' data?
      What's the authorization model?
    Validation
      What inputs are valid?
      What are the boundary values?
      What does the DTO enforce?
    Error handling
      What if the DB is down?
      What does the user see on failure?
      Do error responses leak internals?
    Consistency
      Is it idempotent?
      What if called twice?
      What if called concurrently?
    Performance
      Will this N+1 query?
      What if there are 10K records?
      Is there an index on the WHERE clause?

Contrast — direct mode:

A direct review would say "add @Min(1), add an org check, catch DB errors, add idempotency key." You'd apply all four fixes mechanically. The Socratic review made you think through each category. Next time you write an endpoint, you'll run through security, validation, error handling, and consistency yourself.

Making Refactoring Decisions

You have a 400-line service file that feels messy. You want to refactor it but aren't sure how to split it. Instead of asking Claude to refactor it, you use questions to find the natural seams.

Turn 1 — Name the pain

You

JobAdService is 400 lines and growing. I want to refactor it but I'm not sure how to split it. Ask me questions.

Claude

Before splitting, let's understand it. Without looking at the code, list every thing this service does. Not methods -- responsibilities.

Turn 2 — List responsibilities

You

It handles: (1) creating job ads, (2) updating job ads, (3) publishing/unpublishing, (4) duplicating ads, (5) fetching ads with filters, (6) managing ad-related file uploads.

Claude

Six things. Which of these change for the same reason? Group them by "if X changes, which of these also change?"

Turn 3 — Find natural groups

You

Create + Update + Duplicate all share validation logic
Publish/Unpublish is about status transitions and has its own rules
Fetching is read-only with completely different dependencies
File uploads are their own world (S3, file types, size limits)

Claude

So you have 3-4 natural groups. Now look at the actual methods. Draw me the call graph -- which methods call which other methods within this service?

Turn 4 — Evaluate options

Claude

You have three clusters with minimal cross-talk. Two options:

1. Split by domain: JobAdCommandService, JobAdLifecycleService, JobAdQueryService
2. Split by layer: Extract JobAdValidator, JobAdFileService, keep the rest

Which split would make the next feature you add easier to implement?

Turn 5 — Decide with clarity

You

Next feature is adding scheduled publishing. That's clearly lifecycle. If lifecycle is its own service, I add schedulePublish() there without touching CRUD. Option 1.

Claude

Last question: does option 1 require any shared state between the new services, or can they be fully independent?

You

They all need the repository, but that's injected. No shared mutable state. Clean split.
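The shape of that split can be sketched as three small classes that each receive the repository through their constructor. Everything here is heavily simplified and hypothetical: the in-memory `JobAdRepository`, the `JobAd` fields, and the single method per service stand in for the real code:

```typescript
type AdStatus = "draft" | "published";

interface JobAd {
  id: number;
  title: string;
  status: AdStatus;
}

// Hypothetical in-memory stand-in for the injected repository.
class JobAdRepository {
  private readonly ads = new Map<number, JobAd>();
  private nextId = 1;

  insert(title: string): JobAd {
    const ad: JobAd = { id: this.nextId++, title, status: "draft" };
    this.ads.set(ad.id, ad);
    return ad;
  }
  get(id: number): JobAd | undefined {
    return this.ads.get(id);
  }
  save(ad: JobAd): void {
    this.ads.set(ad.id, ad);
  }
  list(): JobAd[] {
    return Array.from(this.ads.values());
  }
}

// Create, update, duplicate: everything that shares input validation.
class JobAdCommandService {
  private readonly repo: JobAdRepository;
  constructor(repo: JobAdRepository) {
    this.repo = repo;
  }
  create(title: string): JobAd {
    if (title.trim() === "") throw new Error("title is required");
    return this.repo.insert(title);
  }
}

// Status transitions and their rules.
class JobAdLifecycleService {
  private readonly repo: JobAdRepository;
  constructor(repo: JobAdRepository) {
    this.repo = repo;
  }
  publish(id: number): JobAd {
    const ad = this.repo.get(id);
    if (ad === undefined) throw new Error(`job ad ${id} not found`);
    if (ad.status === "published") throw new Error("job ad is already published");
    const updated: JobAd = { ...ad, status: "published" };
    this.repo.save(updated);
    return updated;
  }
}

// Read-only access.
class JobAdQueryService {
  private readonly repo: JobAdRepository;
  constructor(repo: JobAdRepository) {
    this.repo = repo;
  }
  findAll(status?: AdStatus): JobAd[] {
    const all = this.repo.list();
    return status === undefined ? all : all.filter((ad) => ad.status === status);
  }
}
```

The services never reference each other; the repository is the only thing they share, which is what makes the split independent.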

Method Call Graph

graph LR
    subgraph "CRUD Group"
        create --> validate
        create --> uploadFiles
        update --> validate
        update --> uploadFiles
        duplicate --> create
    end
    subgraph "Lifecycle Group"
        publish --> validateStatus["validate (status)"]
        publish --> notifyAdmin
        unpublish --> validateStatus
    end
    subgraph "Query Group"
        findAll
        findById
    end
    style create fill:#0d7377,stroke:#14ffec,color:#eee
    style update fill:#0d7377,stroke:#14ffec,color:#eee
    style duplicate fill:#0d7377,stroke:#14ffec,color:#eee
    style validate fill:#0d7377,stroke:#14ffec,color:#eee
    style uploadFiles fill:#0d7377,stroke:#14ffec,color:#eee
    style publish fill:#e94560,stroke:#e94560,color:#fff
    style unpublish fill:#e94560,stroke:#e94560,color:#fff
    style validateStatus fill:#e94560,stroke:#e94560,color:#fff
    style notifyAdmin fill:#e94560,stroke:#e94560,color:#fff
    style findAll fill:#16213e,stroke:#0f3460,color:#eee
    style findById fill:#16213e,stroke:#0f3460,color:#eee

The Refactoring Plan

flowchart TB
    subgraph Before["Before: JobAdService (400 lines)"]
        ALL["Everything in one file"]
    end
    subgraph After["After: 3 focused services"]
        CMD["JobAdCommandService\n~150 lines\ncreate, update, duplicate,\nvalidate, uploadFiles"]
        LCM["JobAdLifecycleService\n~100 lines\npublish, unpublish,\nschedulePublish (new),\nvalidateStatus, notifyAdmin"]
        QRY["JobAdQueryService\n~80 lines\nfindAll, findById,\nfilters, pagination"]
    end
    ALL --> CMD
    ALL --> LCM
    ALL --> QRY
    CMD ---|"shares"| REPO["JobAdRepository"]
    LCM ---|"shares"| REPO
    QRY ---|"shares"| REPO
    style ALL fill:#e94560,stroke:#e94560,color:#fff
    style CMD fill:#0d7377,stroke:#14ffec,color:#eee
    style LCM fill:#0d7377,stroke:#14ffec,color:#eee
    style QRY fill:#0d7377,stroke:#14ffec,color:#eee
    style REPO fill:#16213e,stroke:#0f3460,color:#eee

Decision Rationale

mindmap
  root((Refactoring\nDecision))
    Why split at all
      400 lines and growing
      6 responsibilities, 3 change-reasons
      Next feature would make it worse
    Why domain split over layer split
      Next feature is scheduled publishing
      Lifecycle service is the natural home
      Layer split would scatter the feature
    Why 3 services not 4
      File uploads only used by CRUD
      Not worth extracting until it grows
      Split the seam that hurts most first
    Shared dependency
      All 3 share JobAdRepository via DI
      No shared mutable state
      Clean, independent modules

Contrast — direct mode:

Asking "refactor JobAdService" would produce a split — but based on Claude's heuristics, not your upcoming roadmap. The Socratic process surfaced the scheduled publishing requirement, which made the right split obvious.

The Underlying Principle

Every example above follows the same pattern: understanding before action. The Socratic method costs a few extra minutes per session, but it makes the first solution you try more likely to be the right one. And it leaves you better equipped for the next problem that looks like this one.

The fix takes a few minutes either way. But next time a similar issue appears, you won’t need to ask.