Who still gets to keep 75%+ gross margins?
The margin software took for granted — and why AI has to earn it back, one outcome at a time.
For twenty years, "software margin" meant one number: 75% and up. It was so reliable that investors stopped treating it as an achievement and started treating it as a definition — if your gross margin had a 7 or an 8 in front of it, you were software; if it didn't, you were something lesser, valued on a lesser multiple. AI has quietly made that number a question again. This essay is about who gets to keep it, and what they have to do that the last generation never did.
The 75% was a gift
Start with where the number came from, because almost everyone misremembers it. The classic SaaS gross margin wasn’t a reward for writing good code. It was the arithmetic of a cost base that barely moved when you added a customer.
Lay out a mature SaaS cost of revenue and you get something like this (ranges, because it varies with scale and segment): cloud hosting at 6–12% of revenue, customer support and success at 8–10%, payment processing at 1–3%, and professional services and onboarding, run at or near break-even, at 0–5%. Stack those and cost of revenue can span 15–30%. Because the lines rarely all sit at their ceilings at once, mature companies typically land at 15–25%, a 75–85% gross margin, with the public-company median sitting around 77%.
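Here is a minimal sketch that adds up the stack. It uses only the ranges quoted above, and the line-item labels are illustrative. It sums the low ends and the high ends separately, which shows what the stack spans when every line sits at the same end of its range.

```python
# Cost-of-revenue stack for a mature SaaS business, as (low, high) shares of revenue.
# The ranges are the ones quoted in the text; the dictionary keys are illustrative labels.
COST_STACK = {
    "cloud_hosting": (0.06, 0.12),
    "support_and_success": (0.08, 0.10),
    "payment_processing": (0.01, 0.03),
    "services_and_onboarding": (0.00, 0.05),
}


def cost_of_revenue_range(stack):
    """Sum the low ends and the high ends of each line item."""
    low = sum(lo for lo, _ in stack.values())
    high = sum(hi for _, hi in stack.values())
    return low, high


def gross_margin(cost_share):
    """Gross margin is whatever share of revenue is left after cost of revenue."""
    return 1.0 - cost_share


low, high = cost_of_revenue_range(COST_STACK)
print(f"cost of revenue: {low:.0%} to {high:.0%}")
print(f"gross margin:    {gross_margin(high):.0%} to {gross_margin(low):.0%}")
```

As written, it reports a cost of revenue of 15% to 30% and a gross margin of 70% to 85%. The 75–85% band reflects lines that don't all peak together. A business with every item at its ceiling would sit nearer 70%.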
But the percentages undersell the actual magic, which was operating leverage on the cost of revenue itself. Every one of those lines fell as a share of revenue as you scaled: you negotiated better cloud rates, you deflected more support tickets into self-serve, you spread onboarding tooling across more accounts. And the marginal cost of the next customer — the thousandth, the millionth — rounded to nothing. One more login. One more row in a database. The gift wasn’t 77%; it was that the gift got bigger as you grew, and the next unit was free.
Now drop inference into that P&L. Every other line is still there, maybe somewhat more efficient. But there’s a new one, and it behaves like nothing software has carried before: the model call. It is real money, it is incurred every time the product does work, and — this is the part that breaks the old model — it does not round to zero on the next unit. The heaviest user, the one a SaaS business served almost for free, is now the one quietly costing you the most. Software has, for the first time, acquired a genuine variable cost of delivery. That’s the whole of the COGS essay in one sentence, and it’s why the 75% stopped being automatic.
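A toy model makes the heaviest-user problem concrete. It assumes a flat per-seat price, a fixed per-seat cost for hosting and support, and a hypothetical inference cost per action. None of the dollar figures are benchmarks; they are placeholders chosen to make the shape visible.

```python
def seat_gross_margin(seat_price, fixed_cost, actions, cost_per_action):
    """Gross margin for one customer on a flat per-seat price.

    fixed_cost covers hosting, support, etc. that barely move with usage;
    cost_per_action is the variable (inference) cost of each unit of work.
    """
    cost = fixed_cost + actions * cost_per_action
    return (seat_price - cost) / seat_price


# Hypothetical monthly figures for one seat.
SEAT_PRICE = 100.0  # dollars per seat per month
FIXED_COST = 20.0   # hosting, support, processing

for label, actions in [("light user", 200), ("heavy user", 5_000)]:
    # cost_per_action=0.0 stands in for the classic SaaS "next unit is free" case.
    saas = seat_gross_margin(SEAT_PRICE, FIXED_COST, actions, cost_per_action=0.0)
    ai = seat_gross_margin(SEAT_PRICE, FIXED_COST, actions, cost_per_action=0.02)
    print(f"{label}: classic SaaS {saas:.0%}, with inference {ai:.0%}")
```

With zero cost per action, both users earn 80%. Add two cents per action and the outcomes diverge. The light user barely notices (76%), while the heavy user goes underwater (-20%): the same seat price produces opposite economics.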
Two honest qualifiers, because the doom version is wrong. First, inference unit cost is deflating fast — the same task gets cheaper every few months as models and hardware improve. So this is not a permanent tax; it’s a variable cost on a downward slope. Second, that’s exactly what makes it dangerous to price around naively: a cost that’s both variable and falling tempts you into the worst possible pricing reflex.
Cost-plus is the dead end
The reflex is to price off the cost. Inference costs you a few cents, so you mark it up and charge a few cents more. It feels prudent. It is a trap, and it caps your margin at exactly the moment you’re trying to defend it — because if your price is a markup on a falling cost, your price falls too, and you’ve volunteered for a commodity business with a commodity’s margin.
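This sketch shows why the markup caps the margin. It uses three assumed parameters: an inference cost that falls 30% per period, a 50% cost-plus markup, and a flat value-anchored price. All three are hypothetical.

```python
def cost_plus_price(unit_cost, markup):
    """Price set as a fixed markup on unit cost."""
    return unit_cost * (1 + markup)


def margin(price, unit_cost):
    """Gross margin on a single unit."""
    return (price - unit_cost) / price


# Hypothetical inputs.
START_COST = 0.05   # dollars of inference per task
DEFLATION = 0.30    # fraction of unit cost shaved off each period
MARKUP = 0.50       # cost-plus markup
VALUE_PRICE = 0.50  # price anchored to the task's value, held flat

cost = START_COST
for period in range(4):
    cp = cost_plus_price(cost, MARKUP)
    print(
        f"period {period}: cost ${cost:.4f} | "
        f"cost-plus price ${cp:.4f}, margin {margin(cp, cost):.0%}, profit ${cp - cost:.4f} | "
        f"value price ${VALUE_PRICE:.2f}, margin {margin(VALUE_PRICE, cost):.0%}"
    )
    cost *= 1 - DEFLATION  # inference gets cheaper each period
```

The cost-plus margin is pinned at markup ÷ (1 + markup), which is 33% here, while its price and per-task profit shrink every period. The value-anchored price starts at a 90% margin and widens as cost falls.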
Here’s the thing the cost-plus reflex forgets: SaaS never got to 75% by cost-plus either. No one priced a CRM seat at “hosting plus support plus 20%.” They priced it at what a salesperson’s productivity was worth, and the cost base was an afterthought that happened to be small. The 75% was the residue of value-based pricing meeting a tiny cost — not the goal of a markup. The cost being small was luck. The pricing being value-based was the craft.
So the escape from the inference squeeze is not a cleverer markup. It is the same craft, applied harder: untether your price from your cost and tie it to the value you create. That is the only door to a software margin in the AI era. The difficulty is that the door sits in a wall that took twenty years to build, called “SaaS pricing.”
You have to come out of SaaS’s shadow
The buyer’s mind is trained. A decade of SaaS has taught every procurement team what software costs and how it’s priced: per seat, per month, a few hundred to a couple thousand dollars per user per year, billed annually at a discount. That training is now the single biggest obstacle to AI margins, because it anchors the buyer to the wrong number.
Consider what SaaS anchored its value on: visibility, clarity, workflow orchestration, and a productivity lift. Good things, but soft things, and the industry priced them at a reasonable ratio. The rule of thumb was to capture roughly 5–10% of the value you create, a 1:20 to 1:10 price-to-value peg. Over time, the categories matured, the ROI conversation faded, and the price became conventional: nobody asks a CRM to prove its return anymore; they just pay the going per-seat rate. Comfortable and capped.
If AI prices inside that shadow, as “premium software” or a higher per-seat add-on, it inherits the ceiling. A 20–50% price bump on a seat is the most you can get away with before the buyer’s SaaS-trained instinct says, “That’s expensive software.” To earn the 5–10× higher ACV that AI’s cost base and its promise require, the value has to be pegged to something substantially larger than a productivity lift. And there’s only one pool big enough.
The value pool is labor, not software
Here are the two numbers that matter. Global software spend in 2026 is about $1.4 trillion. Global labor compensation is in the tens of trillions — labor is half of world output, and inside a typical company, payroll is 50–60% of everything it spends, while software is low single digits. The pool of money tied to work being done is on the order of 25–40× the pool tied to software.
That gap is the entire opportunity. Software that helps a person work competes for the software budget and inherits its ceiling. Software that does the work competes for the labor budget, and even a thin slice of a pool 30× larger amounts to a large share of the entire old one.
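A few lines make the size of that slice concrete. The inputs are only the figures above: about $1.4 trillion of software spend and a labor pool 25–40× larger.

```python
SOFTWARE_SPEND_T = 1.4  # global software spend, trillions of dollars (figure from the text)


def labor_pool(multiple):
    """Labor pool implied by a given labor-to-software multiple, in trillions."""
    return SOFTWARE_SPEND_T * multiple


def slice_matching_software(multiple):
    """Share of the labor pool that equals the entire software budget."""
    return 1 / multiple


for multiple in (25, 30, 40):
    print(
        f"{multiple}x: labor pool ~${labor_pool(multiple):.0f}T; "
        f"{slice_matching_software(multiple):.1%} of it equals all software spend; "
        f"a 1% slice is {0.01 * multiple:.0%} of software spend"
    )
```

At these multiples, 2.5–4% of the labor pool equals the entire software market, and a 1% slice is worth 25–40% of it.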
But three things keep that slice honest:
- You can’t take it all. If your product does $50,000 of a person’s annual work, you cannot charge $50,000; the buyer only moves if they keep most of the gain. Capture rates stay modest: expect 10–20% of the value created, maybe more where the result is indisputable. Ten to twenty percent of a labor line is still 5–10× a software seat. The math works; the discipline is not to mistake the value created for the price you can charge.
- Augmentation isn’t displacement. The $50,000 is a ceiling that assumes you actually remove the labor. Most AI today augments: it makes one person do the work of two, and the buyer redeploys the freed-up hours rather than cutting the role. The layoffs in the headlines are real but narrow: overwhelmingly tech companies unwinding their own pandemic over-hiring, not a broad cross-industry handover to machines. Out in the wider economy AI has barely begun to spread, and where it lands the default is still redeploy, not let go. So the realized, attributable value is a fraction of the displacement headline, and your price has to survive that gap. Price to the labor you genuinely take off the table, not the labor you could take in theory.
- The arithmetic is “how many people’s work, times what share.” That’s the real pricing model under the hood: estimate the fraction of a role you actually perform, multiply by the loaded cost of that role, and take your 10–20%. Do that, and you arrive — every time — at a number per unit of work that looks small in absolute terms, even cheap next to a human, but aggregates to many multiples of a software contract.
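The “how many people’s work, times what share” arithmetic can be sketched as two small functions. The $50,000 role comes from the example above. The share of the role performed, the units of work per year, and the $1,000 seat used for comparison are assumptions for illustration.

```python
def annual_price(loaded_role_cost, share_of_role_done, capture_rate):
    """Annual price: capture a slice of the labor value actually taken off the table."""
    return loaded_role_cost * share_of_role_done * capture_rate


def price_per_unit(annual, units_per_year):
    """Spread the annual price across the units of work delivered."""
    return annual / units_per_year


# Hypothetical inputs: a $50,000 role, of which the product genuinely performs half.
ROLE_COST = 50_000
SHARE_DONE = 0.5
UNITS_PER_YEAR = 5_000  # units of work behind that half of the role (assumed)
SEAT_PRICE = 1_000      # a SaaS seat per year, for comparison (assumed)

for capture in (0.10, 0.20):
    annual = annual_price(ROLE_COST, SHARE_DONE, capture)
    print(
        f"capture {capture:.0%}: ${annual:,.0f}/year "
        f"(~{annual / SEAT_PRICE:.1f}x a seat), "
        f"${price_per_unit(annual, UNITS_PER_YEAR):.2f} per unit"
    )
```

Performing half the role at a 10–20% capture rate gives $2,500–5,000 a year. That is 2.5–5× the assumed seat, or $0.50–1.00 per unit of work. Set `SHARE_DONE` to 1.0 and the full-displacement 5–10× reappears. The gap between the two is the augmentation haircut the price has to survive.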
You can’t sell the layoff
Which leaves one problem, and it’s the reason outcome pricing exists at all. You cannot walk into the room and price this way out loud. “We do the work of three of your support reps, here’s our invoice for a slice of their salaries” is both politically radioactive and strategically dumb — it makes you the villain in the buyer’s reorg and invites them to argue about exactly which jobs and how many.
So you need a metric that prices labor value without ever naming the labor. A unit that the buyer can accept, meter, and expense — that quietly indexes to work performed while you never say the word “headcount.” That unit is the delivered outcome. “$0.99 per resolved conversation” is, underneath, “a small share of what a support interaction costs you in human time” — but it’s stated as a clean, dignified, per-result price. Outcome pricing isn’t just the purest alignment of price and value (it is that, and that’s its own essay). It’s also the only socially acceptable way to bill the labor pool. That’s why the high-margin frontier and the outcome frontier are the same place.
Two gates, not one
So outcome pricing is the door to the labor pool, and the labor pool is the door to a software margin. But the door is gated twice, and almost no one clears both.
Gate one is measurement and attribution. To bill a result, you must define it cleanly and prove you caused it — “a resolved ticket” qualifies, “a better-designed slide” does not, and most AI output is too entangled with the user’s own judgment to bill as a discrete, attributable outcome. This is the gate the outcomes essay is about, and it’s why only a handful of products price this way.
Gate two is the one almost everyone misses, and it’s where the 75% actually lives: you must be able to deliver the outcome at software’s marginal cost. It is entirely possible to clear gate one and still have a terrible business. If producing each billed outcome requires a human in the loop — a reviewer, an escalation agent, a “human-in-the-loop for quality” — or compute that scales linearly and refuses to deflate, then your cost of revenue behaves like a services firm’s, and services firms run 30–50% gross margins, not 75%. And don’t assume AI rescues that number: how far automation can actually push a services margin is the open question, and even a heavily AI-leveraged services business is still a services business — it carries the hiring, training, attrition, and management drag that never fully goes away. You will have built a labor-arbitrage business at a software multiple, and the market will eventually re-rate you to what you are.
This is the trap hiding inside the most exciting category in AI — “agents that do the whole job.” Doing the whole job is precisely what lets you charge for the outcome (gate one) and precisely what tempts you to throw humans behind the curtain to guarantee it (failing gate two). The two gates pull against each other. The rare, valuable businesses are the ones that clear gate one with an outcome legible enough to bill, and clear gate two by delivering it with software — no human behind the curtain, compute that falls with the cost curve. That intersection is small. It is also the only place a real software margin survives.
And even there, it’s fragile
Clearing both gates buys you the margin; it doesn’t let you stop. Value-based pricing holds only while two things are true: the buyer can’t easily see your cost, and can’t easily get the outcome elsewhere. AI erodes both. Inference cost becomes public and falls; a dozen competitors can deliver the same resolved ticket; and the buyer — trained, remember, to anchor on cost — starts doing the cents-per-call math and asking why a result that costs you three cents is priced at ninety-nine.
The defense isn’t to hide the cost forever; it’s to keep moving the value faster than the price commoditizes. When inference gets 5× cheaper, the winners don’t simply bank the savings as margin. They reinvest the deflation dividend into doing more of the job, a harder part of the job, or a more attributable outcome, so the price stays tied to value that’s still rising. The 75% in AI is not a resting state you reach. It’s a treadmill you stay on. That’s the part the “software margins are back” crowd skips.
Twilio already ran this experiment
We don’t have to imagine what happens to software with a real, usage-based COGS. One company has lived it in public for a decade: Twilio.
Twilio’s product is a developer’s dream — a few lines of code to text or call anyone on Earth. But under that API sits a cost software never used to carry: it pays the telecom carriers for every message and minute. Those carrier fees are a genuine, variable, usage-scaling COGS paid to an upstream oligopoly — the exact shape of an inference bill. The result is a gross margin that has sat near 50% for years, not 75% — most recently around 49.6%, and still sliding as U.S. carriers hike messaging fees (an extra ~$190–235M of pass-through in 2026 alone, shaving roughly two points off margin). Three lessons fall straight out, and each is already in this essay.
The pass-through is a trap, exactly as predicted. Carrier fees get passed to customers “at cost” — and passing a cost through dilutes your margin percentage rather than defending it. Twilio is the cost-plus dead end made real: you cannot mark your way to a software margin on top of a visible, commoditized input.
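Here is why pass-through dilutes rather than defends. If a new input cost is billed to customers at cost, revenue and cost of revenue rise by the same amount. The sketch below uses round illustrative numbers in millions, not Twilio’s reported figures.

```python
def gross_margin(revenue, cost_of_revenue):
    """Gross margin as a share of revenue."""
    return (revenue - cost_of_revenue) / revenue


def add_pass_through(revenue, cost_of_revenue, fees):
    """Pass a new input cost to customers at cost: revenue and COGS rise by the same amount."""
    return revenue + fees, cost_of_revenue + fees


# Illustrative figures in millions of dollars (not any company's reported numbers).
revenue, cogs = 5_000.0, 2_500.0  # a 50% gross-margin business
print(f"before: {gross_margin(revenue, cogs):.1%}, gross profit ${revenue - cogs:,.0f}M")

revenue2, cogs2 = add_pass_through(revenue, cogs, fees=200.0)
print(f"after:  {gross_margin(revenue2, cogs2):.1%}, gross profit ${revenue2 - cogs2:,.0f}M")
```

Gross profit stays at $2,500M, but the margin drops from 50.0% to about 48.1%, roughly a two-point hit. The only cause is that the denominator grew.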
The market tolerates a low margin while you grow — then it doesn’t. Through the 2020–21 surge, investors handed Twilio a full software multiple despite the 50% margin. When growth normalized, the re-rating was savage — the stock fell on the order of 80%+ from its peak — because at lower growth, a 50% gross margin doesn’t leave enough behind to fund the sales-and-R&D engine the way 80% does. That’s the re-rating this essay keeps warning about, observed in the wild.
The only real defense was moving up-stack. Twilio’s answer wasn’t a cleverer markup on texts; it was to build and buy higher-value software on top — Segment, Flex, Engage — and to split the company into a low-margin “Communications” base and a higher-margin “Software” layer, betting the mix drifts upward over time. That is this essay’s “reinvest into value the buyer can’t replicate,” in corporate-strategy form. It works slowly; the blended margin is still around 50% because the pass-through base is so large.
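The slow climb follows from revenue-weighted averaging. The sketch below uses hypothetical segment margins: 45% for the pass-through base and 75% for the software layer. Neither is a reported figure.

```python
def blended_margin(segments):
    """Revenue-weighted gross margin across segments given as (revenue, gross_margin)."""
    total_revenue = sum(rev for rev, _ in segments)
    gross_profit = sum(rev * gm for rev, gm in segments)
    return gross_profit / total_revenue


# Hypothetical two-segment mix: a large low-margin base and a small software layer.
BASE_MARGIN, SOFTWARE_MARGIN = 0.45, 0.75

for software_share in (0.10, 0.20, 0.30):
    # Revenue expressed as shares of the total, so the weights sum to 1.
    mix = [(1 - software_share, BASE_MARGIN), (software_share, SOFTWARE_MARGIN)]
    print(f"software at {software_share:.0%} of revenue -> blended {blended_margin(mix):.1%}")
```

Under these assumptions the blend goes from 48% to 51% to 54%. Each ten points of mix shift moves it by only three points, because the blend is a weighted average and the software layer starts small.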
Which answers the wrapper question directly. Twilio is, in part, a wrapper on the carriers — and yet it commands a real, durable premium over raw carrier rates, because one global API, guaranteed deliverability, compliance plumbing, and reliability at scale are worth paying for and hard to rebuild. A “wrapper” can absolutely be a great business; Twilio is a multi-billion-dollar-revenue one. What it can’t be is a 75% business, and the market has priced it as exactly what it is. The lesson for every “thin layer on a model provider” isn’t wrappers die. It’s: you earn the premium your integration and distribution genuinely add — and the margin, and the multiple, of the cost structure you actually have. Be honest about which that is.
One difference cuts in AI’s favor, and it matters. Twilio’s key input rises (carriers keep hiking) while inference falls. AI sellers have the one lever Twilio never did: the deflation dividend. Whether that hardens into durable margin or just gets competed away is the whole game, but at least the wind is at their back, where Twilio has always had it in its face.
So who keeps 75%?
A narrow band, and it’s worth naming the spec exactly: a product whose value is measurable and attributable (gate one), deliverable at software marginal cost with no human behind the curtain (gate two), priced to a slice of the labor pool via an outcome metric rather than to a markup on inference, and re-investing deflation into new value fast enough to stay ahead of commoditization. Hit all four, and you’ve earned a margin that looks like software’s old gift but is nothing like a gift — it’s a thing you defend every quarter.
Everyone else faces an honest fork, and the only real sin is pretending you’re not at it. You are either a software business with a thinner margin than the last generation — still good, still 50–70%, just no longer the automatic 80s — or you are a services business with software in the loop, running 30–50% and worth what services are worth. Both are real businesses. Neither is shameful. What’s fatal is selling investors the software multiple while quietly running the services P&L, because the gap closes eventually and it closes on you.
Which margin is yours is not something we can tell you from the outside, and we won’t pretend to: it depends on your costs, your resolution rate, your model mix, and how much human you’ve hidden in the loop. That’s a model you have to run on your own numbers. (That’s exactly what the pricing calculator is for: your costs, your model, your margin. We’ll never attach a margin to anyone else’s name.)
A what-if: what would it take for Fin to keep 75%?
To make the two gates concrete, take the cleanest public example of an outcome price and reason about it — strictly as a hypothetical. We don’t know Intercom’s costs and we won’t invent them; the only real number here is the public one: Fin charges $0.99 per resolved conversation. Everything below is our own back-of-the-envelope estimate of what would have to be true for that price to yield a 75% gross margin. Treat it as a worked exercise in arithmetic, not a claim about anyone’s books.
A 75% gross margin on $0.99 means all-in cost of revenue per resolved conversation must come in under ~$0.25. Into that quarter must fit: the inference for the whole conversation, retrieval and guardrail calls, infrastructure, support, payment processing — and, critically, the cost of every conversation that didn’t resolve.
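Written as a formula, the budget is below. P is the price per resolved conversation and m is the target gross margin; both symbols are introduced here for convenience.

```latex
C_{\max} = P\,(1 - m) = 0.99 \times (1 - 0.75) = 0.2475 \text{ dollars per resolved conversation}
```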
That last item is the lever everyone underestimates. Fin only earns the $0.99 when it resolves. Every unresolved attempt still burns compute and returns zero revenue. So the cost that matters isn’t compute-per-conversation, it’s compute-per-billed-resolution = compute-per-attempt ÷ resolution rate. If Fin resolves 70% of what it attempts, every billed resolution carries ~1.4 conversations’ worth of compute (1 ÷ 0.7). If it resolves 35%, it carries ~2.9×: the same model cost, double the effective COGS. Before the inference price even enters, the resolution rate sets the ceiling on the margin.
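Here is a minimal sketch of the divide-by-resolution penalty, assuming a hypothetical $0.10 of compute per attempt:

```python
def compute_per_billed_resolution(compute_per_attempt, resolution_rate):
    """Unresolved attempts still burn compute, so spread all attempts over the billed ones."""
    if not 0 < resolution_rate <= 1:
        raise ValueError("resolution_rate must be in (0, 1]")
    return compute_per_attempt / resolution_rate


COMPUTE_PER_ATTEMPT = 0.10  # hypothetical dollars of inference per attempt

for rate in (0.70, 0.35):
    multiplier = 1 / rate  # attempts paid for per billed resolution
    cost = compute_per_billed_resolution(COMPUTE_PER_ATTEMPT, rate)
    print(f"resolution {rate:.0%}: {multiplier:.2f}x attempts per billed resolution, ${cost:.3f} compute")
```

At 35%, even that modest $0.10 attempt costs about $0.29 per billed resolution. That exceeds the ~$0.25 budget on compute alone.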
Now the compute itself. Assume a resolved conversation runs a few turns (retrieval, reasoning, a drafted answer), maybe 30,000–80,000 tokens all in. At frontier-model prices (say ~$3–10 per million blended tokens in 2026), that’s roughly $0.09–0.80 of raw inference per attempt. That is comfortable at the light end but well past the $0.25 line on the heavier end, before you’ve divided by the resolution rate. So the economics only close if several things are simultaneously true:
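Multiplying out the quoted ranges (30,000–80,000 tokens at ~$3–10 per million blended tokens) is a one-liner. The function name is just a label for this sketch.

```python
def inference_cost(tokens, price_per_million):
    """Raw inference cost in dollars for a token count at a blended per-million-token price."""
    return tokens / 1_000_000 * price_per_million


# Ranges from the text: 30k-80k tokens per conversation, ~$3-10 per million blended tokens.
low = inference_cost(30_000, 3.0)
high = inference_cost(80_000, 10.0)
print(f"raw inference per attempt: ${low:.2f} to ${high:.2f}")
```

This prints $0.09 to $0.80. The heavy end is more than three times the ~$0.25 budget before any division by the resolution rate.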
- Cheap-enough models for most of the work — route the easy 80% to small/cheap models, reserve frontier models for the hard residue, and cache aggressively. Pull blended compute down to ~$0.05–0.12 per attempt.
- A high resolution rate: call it 60–70%+, so the divide-by-resolution penalty stays around 1.4–1.7×, not ~3×.
- Zero human in the loop on billed resolutions — this is gate two, in arithmetic form. Put even a cheap human reviewer on 10% of conversations at a few dollars each, and your average COGS blows straight through $0.25. The 75% requires the machine to finish the job on its own.
- Deflation working for you — at unchanged $0.99, every drop in model price is pure margin expansion. Time is on the seller’s side here, which is also why the price will eventually face pressure.
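The four conditions can be combined into one back-of-the-envelope model. Every input is hypothetical except the public $0.99 price. For simplicity, payment processing is folded into a flat per-attempt overhead. Human review is charged per attempt, matching the “10% of conversations” framing in the list.

```python
from dataclasses import dataclass


@dataclass
class OutcomeEconomics:
    """Hypothetical per-conversation cost inputs for an outcome-priced support agent."""

    price: float                     # dollars per billed resolution
    compute_per_attempt: float       # blended inference + retrieval + guardrails, per attempt
    resolution_rate: float           # share of attempts that resolve (and get billed)
    overhead_per_attempt: float      # infrastructure, support, processing, per attempt
    human_review_share: float = 0.0  # share of attempts a human reviews
    human_review_cost: float = 0.0   # dollars per human review

    def cogs_per_resolution(self) -> float:
        per_attempt = (
            self.compute_per_attempt
            + self.overhead_per_attempt
            + self.human_review_share * self.human_review_cost
        )
        # Every attempt costs money; only resolved ones bring in revenue.
        return per_attempt / self.resolution_rate

    def gross_margin(self) -> float:
        return 1 - self.cogs_per_resolution() / self.price


# Hypothetical scenarios; only the $0.99 price is a public number.
SCENARIOS = {
    "routed, 65% resolved, no human": OutcomeEconomics(0.99, 0.08, 0.65, 0.03),
    "same, 10% human review at $3": OutcomeEconomics(0.99, 0.08, 0.65, 0.03, 0.10, 3.0),
    "routed, 35% resolved": OutcomeEconomics(0.99, 0.08, 0.35, 0.03),
    "frontier only, 65% resolved": OutcomeEconomics(0.99, 0.30, 0.65, 0.03),
}

for name, scenario in SCENARIOS.items():
    print(f"{name}: COGS ${scenario.cogs_per_resolution():.3f}, margin {scenario.gross_margin():.1%}")
```

Under these made-up inputs, only the routed, high-resolution, no-human case clears 75% (about 83%). A $3 review on 10% of conversations drops it to about 36%, inside the services band. A 35% resolution rate lands at about 68%, and unrouted frontier compute at about 49%.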
Put it together, and the honest read is: 75% on $0.99 is achievable, but it sits on a knife’s edge — it needs large volumes, a high resolution rate, disciplined model routing, and no human backstop, and it gets easier every quarter inference gets cheaper and harder every quarter a competitor will resolve the same ticket for less. Which is the whole thesis in one product: the margin is real, it lives at the intersection of both gates, and it has to be defended, not banked.
The margin AI can’t take for granted
Software’s 75% was a gift of a cost base that barely moved. AI took the gift back and replaced it with a job: tie your price to the labor you actually displace, find the outcome metric that bills for it without naming it, deliver that outcome with software and not with people, and keep moving the value faster than the price commoditizes. Do all of that and you keep the number. Do some of it and you’re a good business with a humbler margin. Pretend you’ve done it when you haven’t, and the re-rating is just a matter of time.
None of this makes the lower-margin road the wrong one to start on. Some of the best bets in AI are deliberate bridges: run humans in the loop now to learn the job and train the system that retires them later; accept a sub-software margin now to land the category and earn the right to raise it; let venture dollars fund a multi-year climb from services economics toward software ones. Those can be exactly the right calls. But they are bridges with a clock on them — and the graveyard is full of companies that swore they’d automate the humans away “next year” until the humans became the company, or that planned to raise margins “once we have scale” until the runway ran out first. Make the bridge explicit, put a date on it, and be honest about whether you’re building toward the far bank or just standing comfortably in the river.
The old margin was a fact about software. The new one is a verdict on your pricing.
Related questions
- Can AI companies sustain 75%+ gross margins?
- Some can, but it is no longer automatic the way it was for SaaS. AI carries a real, usage-based cost of delivery — inference — that classic software didn't, so a software-grade margin now has to be earned by clearing two gates: the outcome you sell must be measurable and attributable (so you can price to value, not cost), and you must be able to deliver that outcome at software's marginal cost, with no human in the loop. Products that clear both, price to a slice of the labor they replace, and keep reinvesting falling compute costs into more value can hold 75%+. Most will land lower — a thinner-margin software business or, if humans stay in the loop, a services business at 30–50%.
- Why is outcome-based pricing linked to gross margin?
- Because it untethers price from cost. If you price as a markup on inference, your price falls as compute falls and your margin is capped at a commodity's. Pricing for a delivered outcome lets you charge for the value created — a slice of the labor the result replaces — which is a pool an order of magnitude larger than the software budget. That is how SaaS reached 75% in the first place (value-based pricing meeting a tiny cost), and it is the only door to a software margin once AI has a real cost of goods sold.
- Is building an AI 'wrapper' a bad business?
- Not necessarily — but be honest about the margin and the multiple it deserves. Twilio is the classic precedent: a layer on top of telecom carriers that still earns a real, durable premium for global connectivity, reliability, and integration, yet runs a ~50% gross margin because the carrier cost is a visible, pass-through input. A wrapper on a model provider can likewise be a strong business if its integration and distribution add value buyers can't cheaply rebuild. What it can't do is mark up a commoditized input to a 75% margin, or claim a software multiple while running a pass-through cost structure.