AI Engineering

What It Actually Costs to Ship a Feature Through OpenClaw

Every feature I ship through OpenClaw, I know exactly what it cost in tokens. Here's how the planning/execution model split keeps development costs at pennies per task.

Geoff Bevans
March 6, 2026
5 min read

Every feature I ship through my agentic coding setup, I know exactly what it cost me in tokens. Not roughly. Exactly.

Yesterday's deploy: $0.12.

That's not a typo. Here's how the model economics work, and why the planning/execution split is the most important architectural decision in my stack.

The Setup

This post assumes OpenClaw is already running. If you're not familiar: OpenClaw is a Telegram-based interface that routes tasks to OpenCode, which handles the actual coding work autonomously on a remote server. It's an always-on coding assistant.

The only marginal cost of running tasks through it is what the models consume. And that comes down to one decision I made early: don't use the same model for everything.

Two Models, Two Jobs

Most people running an agentic coding setup pick one model and run everything through it. That works, but it's wasteful: frontier models charge a premium for output tokens, and you don't need frontier reasoning to write boilerplate.

Here's what I use instead:

Planning: Kimi K2.5 (Moonshot AI, via OpenRouter)

Kimi handles the hard part. Given a task description, it reads the codebase context, reasons through the architecture, identifies which files need to change, and writes precise instructions for the execution step.

This requires genuine reasoning quality. Multi-step instructions, understanding of dependencies, awareness of side effects. Kimi K2.5 is built for this — it sits at #1 on the OpenClaw community leaderboard, where the community votes based on real agent workload performance.

Pricing via OpenRouter: $0.45/M input, $2.20/M output

Execution: GLM 4.7 Flash (Z.ai, via OpenRouter)

GLM handles the implementation. It takes the structured instructions from the planning step and writes the actual code, makes the file edits, and commits. This is faster, more repetitive work — the kind that doesn't require a $15/M output model.

GLM 4.7 Flash is #2 on the OpenClaw leaderboard. Fast and reliable for execution tasks at a fraction of the cost.

Pricing via OpenRouter: $0.06/M input, $0.40/M output

The division of labor is familiar: a senior architect designs, a junior dev builds. You pay for senior judgment only where it matters.
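In code, the split is just two calls in sequence. A minimal sketch, with a stub `call_model` helper standing in for a real OpenRouter client, and illustrative (unverified) model slugs:

```python
# Sketch of the plan/execute split. call_model is a stub standing in for an
# OpenAI-compatible chat-completions call via OpenRouter; the model slugs
# below are illustrative, not verified identifiers.

PLANNER = "moonshotai/kimi-k2.5"   # frontier reasoning: reads context, writes the plan
EXECUTOR = "z-ai/glm-4.7-flash"    # cheap execution: turns the plan into edits

def call_model(model: str, prompt: str) -> str:
    # Stub for demonstration; swap in your real API client here.
    return f"[{model}] response to: {prompt[:40]}"

def ship_feature(task: str, context: str) -> str:
    # The expensive model does the thinking once...
    plan = call_model(PLANNER, f"Plan this change:\n{task}\n\nContext:\n{context}")
    # ...and the cheap model does the typing.
    return call_model(EXECUTOR, f"Implement exactly this plan:\n{plan}")
```

The structure is the whole point: the frontier model's pricey output tokens go to a short plan, while the bulk of the generated tokens come from the cheap model.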

The Math

A typical feature task through OpenClaw looks like this:

Planning phase (Kimi K2.5)

  • Input: ~1,000 tokens (task description + relevant file context)
  • Output: ~2,000 tokens (structured implementation plan)
  • Cost: ~$0.005

Execution phase (GLM 4.7 Flash)

  • Input: ~5,000 tokens (plan + file context)
  • Output: ~10,000 tokens (actual code edits)
  • Cost: ~$0.004

Total per task: ~$0.009

Less than a cent per feature.

Sensitivity Analysis

What does a full month of development look like?

Monthly tasks    API spend
10 tasks         ~$0.09
50 tasks         ~$0.45
100 tasks        ~$0.90
500 tasks        ~$4.50
1,000 tasks      ~$9.00

OpenCode is open source. No subscription, no per-task fee. The only marginal cost is what the models actually consume. Even at 1,000 tasks a month, you're under $10 in API spend.
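The table is straight multiplication. As a sanity check, recomputing from the OpenRouter prices and token estimates listed above:

```python
# Recompute per-task and monthly cost from the OpenRouter prices above.
# Prices are USD per million tokens.

def call_cost(tokens_in: int, tokens_out: int, price_in: float, price_out: float) -> float:
    """Cost of one model call in USD."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

planning = call_cost(1_000, 2_000, 0.45, 2.20)    # Kimi K2.5
execution = call_cost(5_000, 10_000, 0.06, 0.40)  # GLM 4.7 Flash
per_task = planning + execution

print(f"planning:  ${planning:.4f}")
print(f"execution: ${execution:.4f}")
print(f"per task:  ${per_task:.4f}")
for n in (10, 50, 100, 500, 1_000):
    print(f"{n:>5} tasks/month ≈ ${n * per_task:.2f}")
```

Adjust the token estimates to your own tasks; the shape of the result doesn't change much, because output tokens dominate and the execution model's output is cheap.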

Why This Model Pairing

The OpenClaw community leaderboard at pricepertoken.com/leaderboards/openclaw aggregates community votes on which models perform best for agent workloads. Kimi K2.5 leads with 291 votes, GLM 4.7 Flash is second at 223. This is practitioners reporting what actually works in production.

The reasons are consistent:

  • Kimi K2.5 follows complex, multi-step instructions reliably. It doesn't lose track of context across a long task. It handles architectural reasoning well and produces structured, actionable output.
  • GLM 4.7 Flash is fast and cheap. For structured execution tasks where the plan is already written, it doesn't need to reason from scratch — it needs to execute correctly. And it does.

Worth noting: GLM 4.7 Flash is free directly through Z.ai. Via OpenRouter, you pay a small margin for the routing infrastructure. Either way, the execution cost is near-zero.

The Bigger Principle

The planning/execution split is not specific to agentic coding. It's how good teams work. You don't pay a senior engineer to do data entry. You pay them to make the decisions that a junior engineer then implements.

The mistake most people make with LLMs is treating them as a monolith: one API call, one model, one price for everything. The economics improve significantly when you route by task type.
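To put a rough number on that improvement: run the same workload (~6,000 input and ~12,000 output tokens, per the estimates above) once through the split and once through the frontier model alone. This is an approximation, since a single-model run wouldn't have a separate planning step:

```python
# Rough monolith-vs-split cost comparison using the token estimates above.
PRICES = {  # USD per million tokens: (input, output)
    "planner": (0.45, 2.20),   # Kimi K2.5 via OpenRouter
    "executor": (0.06, 0.40),  # GLM 4.7 Flash via OpenRouter
}

def cost(tier: str, tokens_in: int, tokens_out: int) -> float:
    price_in, price_out = PRICES[tier]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

split = cost("planner", 1_000, 2_000) + cost("executor", 5_000, 10_000)
monolith = cost("planner", 6_000, 12_000)  # same workload, frontier model only

print(f"split:    ${split:.4f}")
print(f"monolith: ${monolith:.4f}")
print(f"ratio:    {monolith / split:.1f}x")
```

Roughly a 3x difference on this workload, and the gap widens as the volume of generated code grows relative to the plan.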

This is the same principle behind Clawdacious: AI that's sized to the job. Not the most expensive option running everything indiscriminately.


Geoff Bevans is the founder of Clawdacious, a done-for-you AI assistant service for GTA professionals. Starts at $299.

#opencode #openclaw #kimi-k2-5 #glm-4-7-flash #openrouter #agentic-coding #llm-economics
