What AI Product Development Actually Costs

Pricing pages give you a number. This guide gives you the cost drivers, the hidden costs, and the questions that decide whether you are paying for a feature or paying for a problem.

13 min read · Updated 2026-05-04 · By Clarvia Team
TL;DR

'How much does AI cost?' is the wrong question. The right question is what drives cost in your specific situation, and the answer is usually scope, data complexity, integration depth, and the governance bar required. This guide covers the cost drivers, the hidden costs that surface in months three through six, and the questions to ask any AI partner before signing.

Why 'how much does AI cost' is the wrong question

The question hides three different cost questions inside one. There is the cost to build a first AI feature, the cost to operate that feature in production, and the cost to evolve the feature as models, data, and requirements change. They are different orders of magnitude and they are paid by different budgets in most companies.

Pricing pages from agencies usually answer the first question and downplay the other two, which is why finance teams discover, six months in, that actual costs are running well ahead of the proposal. The honest answer to a leadership team is the total cost over a meaningful time horizon (12-24 months), not the headline number on a statement of work.

We do not publish pricing because every project is bespoke. What we can publish is what drives cost, so you can compare proposals on substance rather than on a single number that hides where the money actually goes.

What actually drives cost

Five drivers explain most of the variance in AI project cost. The first is scope. A feature that handles five intents costs roughly a fifth of one that handles fifty, all else equal. Most cost overruns trace to scope expansion during build, not to the original scope being underpriced.

The second is data complexity. Clean structured data with good labels is cheap. Unstructured documents in mixed formats with inconsistent labelling are expensive. Sensitive data (PII, regulated content) is expensive in a different way: it adds compliance, security, and architectural overhead.

The third is integration depth. A standalone AI feature that runs against a clean API is cheap to integrate. A feature that has to write into a legacy ERP, respect existing permissions models, and coexist with a deterministic rules engine is expensive, often more expensive than the AI itself.

The fourth is the governance bar. A feature for an internal team that can tolerate the occasional miss is cheap to govern. A feature for external customers in a regulated industry needs evaluation, monitoring, audit trails, human-review queues, and incident response, all of which cost real money to build and operate.

The fifth is iteration velocity. A team that ships once and stops is cheap. A team that intends to iterate continuously needs the platform components (eval, monitoring, deployment) that make iteration safe, and those components are mostly fixed cost regardless of how many features run on them.

Cost archetypes: discovery, build, ongoing

Discovery is usually the cheapest phase and the highest-leverage one. A focused two-week discovery surfaces the scope, data, integration, and governance constraints that decide whether the build is sensible. Skipping discovery to save its cost almost always increases build cost by a multiple.

Build cost varies enormously based on the five drivers. A scoped first feature in the four-to-eight-week range is the most common shape we see. Features that look larger than that usually benefit from being broken into multiple sequential builds.

Ongoing cost is the one most teams underestimate. It includes model API costs, infrastructure (vector databases, monitoring, eval), and the engineering time to operate and evolve the feature. Operating cost is usually 30 to 60 percent of build cost on an annualised basis once a feature is in production. Plan for it.
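The 30-to-60-percent rule of thumb above can be turned into a quick back-of-envelope model for total cost over a horizon. A minimal sketch with entirely hypothetical figures; the operating ratio and time horizon are the assumptions to replace with your own numbers.

```python
def total_cost(build_cost, operating_ratio=0.45, months=24):
    """Rough total cost of ownership over a time horizon.

    build_cost      -- one-off build cost (hypothetical figure)
    operating_ratio -- annual operating cost as a fraction of build
                       cost; 0.30-0.60 per the rule of thumb above
    months          -- horizon; 12-24 months is a meaningful window
    """
    annual_operating = build_cost * operating_ratio
    return build_cost + annual_operating * (months / 12)

# Hypothetical: a 200k build at a 45% operating ratio over 24 months
print(round(total_cost(200_000)))  # -> 380000
```

The point of the exercise is not precision; it is that the operating term is the same order of magnitude as the build term, which is exactly what month-three budget conversations tend to miss.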

The hidden costs that surface in months three to six

Model API spend at production volume. The cost per request is small; the volume at scale is not. Production teams that did not model API spend before launch often face an unwelcome conversation with finance in month three.
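A pre-launch API spend model can be as simple as requests per day times tokens per request times price per token. A sketch of that arithmetic; every figure below (volumes, token counts, per-million-token prices) is a placeholder assumption, not a quote for any real model or provider.

```python
def monthly_api_spend(requests_per_day, avg_input_tokens, avg_output_tokens,
                      input_price_per_m, output_price_per_m, days=30):
    """Estimate monthly model API spend. Prices are per million tokens
    and are placeholders; substitute your provider's actual rates."""
    per_request = (avg_input_tokens * input_price_per_m +
                   avg_output_tokens * output_price_per_m) / 1_000_000
    return requests_per_day * days * per_request

# Hypothetical: 50k requests/day, 2k input / 500 output tokens,
# at $1.00 / $4.00 per million tokens
print(round(monthly_api_spend(50_000, 2_000, 500, 1.00, 4.00), 2))  # -> 6000.0
```

Running this once before launch, with pessimistic volume assumptions, is usually enough to avoid the month-three conversation with finance.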

Eval and monitoring infrastructure. The cost of vector databases, observability tools, and eval platforms is rarely budgeted upfront. It is real and it scales with usage. Build it into the operating model from day one.

Drift remediation. Models change, data changes, prompts decay. Every quarter or two, a feature that was working starts working worse, and someone has to investigate and fix it. Budget engineering capacity for this; it is not optional.

Compliance overhead. The first AI feature carries the cost of building the compliance pattern. The second and third features amortise that cost. The first cost is real, large, and easy to forget when scoping.

Human-in-the-loop operations. Most production AI features have a human review queue for low-confidence cases. The people staffing that queue cost money. They are also part of the value proposition, so do not skip them, but do budget for them.

Build versus buy economics

Buying an off-the-shelf AI tool that solves your problem is almost always cheaper than building, when an off-the-shelf tool exists that genuinely solves your problem. The mistake is assuming a tool solves the problem when it solves a similar but different problem.

We tell clients to bias toward buy. The cases where building wins are: when the problem is core to your competitive position, when off-the-shelf tools genuinely do not exist, or when the cost of integration with existing systems makes a generic tool more expensive than a custom build. Those cases are rarer than people assume.

A useful diagnostic: write down what an off-the-shelf tool would do, and what you would need to add or change. If the additions are minimal, buy. If the additions are substantial, you are effectively building anyway and you might as well build cleanly rather than fight a tool that does not fit.

Common cost traps

The 'we will figure out evaluation later' trap. Teams that defer evaluation save build cost and pay for it in operating cost, where the lack of evaluation makes every change risky and every regression invisible.

The 'we just need a pilot' trap. Pilots that are not built to production standards rarely transition to production. Either build to production standards from week one or build a throwaway prototype with the explicit understanding that it is throwaway. Building a 'pilot' that is neither produces a feature you can neither ship nor scrap.

The 'we will use the cheapest model' trap. Using a weaker model often saves on API costs but increases other costs (more retrieval, more prompting, more human review) by more than the savings. Optimise the system, not any one component.

The 'fixed price for an unclear scope' trap. Fixed-price engagements work when the scope is genuinely fixed. They fail when scope is loose and the partner is incentivised to minimise work to protect margin. We use fixed-price for discovery (where scope is small and clear) and milestone-based pricing for builds (where scope evolves).

How to scope a sensible first project

The right size for a first AI project is whatever fits in eight weeks of execution time, including the platform components you will need for subsequent projects. Smaller is risky because it might not justify the platform overhead; larger is risky because the scope drifts before the build lands.

A useful test: can you describe the feature, the data it uses, and what good looks like in under five minutes? If yes, the scope is probably right. If you find yourself adding qualifiers and exceptions, the scope is too broad and needs to be narrowed before any partner builds anything.

The most successful first projects we ship have one named owner, one well-defined workflow, one source of data, and one measurable outcome. The simplicity is the feature, not a limitation.

What to ask any AI partner about cost

What is the build cost, and what is included? Surface the assumptions about scope, data, integration, and governance. Disagreements about what is in scope are how cost overruns happen.

What is the expected operating cost, broken down by API, infrastructure, and engineering time? Push past 'it depends' answers; partners who have shipped production AI know roughly what each component costs.

How are change orders priced? Scope will change; the question is whether change is priced fairly or weaponised.

What is the exit cost? If we want to take the build in-house in 12 months, what does that look like and what does it cost? Partners who duck this question are the ones who plan to make exit expensive.

What does year two look like, by component? Most projects have meaningful year-two costs that the original proposal did not mention. The right partner will tell you what those are without being asked.

AI Project Cost Worksheet

A working worksheet that walks through the five cost drivers and the year-one and year-two cost components for a specific AI project. Built to compare proposals from different partners on a like-for-like basis.
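A like-for-like comparison amounts to summing the same line items for each proposal over the same horizon. A sketch of that structure; the line-item names mirror the components discussed above, and all figures and partner names are invented for illustration.

```python
# Line items per the worksheet: discovery, build, platform components,
# year-one operating, year-two operating. All figures are invented.
proposals = {
    "Partner A": {"discovery": 15_000, "build": 180_000,
                  "platform": 40_000, "ops_y1": 90_000, "ops_y2": 70_000},
    "Partner B": {"discovery": 0, "build": 140_000,
                  "platform": 0, "ops_y1": 120_000, "ops_y2": 110_000},
}

def two_year_total(items):
    """Total cost over the first two years: build plus operating."""
    return sum(items.values())

# Rank proposals by two-year total rather than headline build cost
for name, items in sorted(proposals.items(),
                          key=lambda kv: two_year_total(kv[1])):
    print(f"{name}: {two_year_total(items):,}")
```

Note how the cheaper headline build (Partner B) is not automatically the cheaper engagement once operating costs are included; that is the comparison the worksheet exists to force.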

Common questions

Why do you not publish your prices?

Because every project is bespoke and a published price would either be misleadingly low (excluding work most projects need) or misleadingly high (including work many projects do not need). What we can publish is the scope, deliverables, and timeline of each engagement, which is what actually decides whether the engagement is worth the price.

How do we compare proposals from different partners on cost?

Compare on the line items, not the total. Get each partner to break down: discovery, build, platform components, ongoing operating cost, change order pricing, and exit cost. Partners who can do this cleanly have shipped production AI before. Partners who cannot have not.

What is the cheapest first AI project that is still worth doing?

A scoped automation of a single high-volume workflow with measurable cost savings. Examples: invoice extraction for a single supplier set, support deflection for a fixed list of intents, classification of a high-volume document type. The cheapest projects that succeed share the property of being small enough to evaluate cleanly and useful enough to justify the operating overhead.

Should we expect costs to come down as models get cheaper?

Model API costs have come down substantially and continue to do so. Build cost has not, because build cost is mostly engineering time. Operating cost on the API component drops; operating cost on engineering, monitoring, and human review does not. Plan accordingly.

Is in-house cheaper than an agency?

It depends on what you are doing. In-house is cheaper for ongoing iteration of a feature you have already built, because the marginal engineering hour is cheap. An agency is cheaper for the first build, because the agency has shipped the pattern before and brings platform components you would otherwise build from scratch. The hybrid that works best is: agency builds the first feature and the platform; in-house operates and iterates.

Get a real cost picture for your AI project.

Book a free 15-minute call. We will walk through the cost drivers for your specific situation and tell you what range to expect.
