Free Isn't Free
In AI, every free user costs you money. Price for that.
In SaaS, an extra free user costs you almost nothing. In AI, every free user costs you money.
Every time they hit Enter, your GPUs fire and your cash burns. Growth explodes. Then your bills arrive.
So the old freemium playbook breaks. “Give away the basics, gate the best features” assumes free is cheap. In AI, your free tier is your single biggest compute bill, and your best feature might be the most expensive thing you own.
A huge thank-you to Vikas Kansal and the Google AI team, whose paywall framework this is built on. 🙏
The trap: one premium tier
Google AI hit this wall in public. Their first move was the classic play: a single $20 Gemini Advanced tier, pay for the smartest model.
Two things broke. The free tier was already, in users’ own words, “smarter than I am,” so most saw no reason to upgrade. And the power users who did upgrade burned so much compute the unit economics were terrifying.
In other words: one tier can’t price a cost that scales with every prompt. You have to gate on what users value and what the company pays for, at the same time.
So what do you gate? Three things.
Gate the volume
Price the work pumped through the system, not the smarts. Google split one tier into three: Plus, Pro and Ultra, each a level of usage intensity, up to a 1M-token context window. Midjourney does the same with Fast Mode (instant GPU, metered hours) versus Relax Mode (unmetered, but you queue). Light users stay free. Heavy users pay because they cost more, not just because they value it more.
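A minimal sketch of a volume gate, assuming a simple monthly prompt quota per tier. The tier names come from the Google example above; the "free" tier, the quota numbers and the names `Account` and `try_consume` are made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical monthly prompt quotas; the numbers are illustrative only.
MONTHLY_PROMPT_QUOTA = {"free": 50, "plus": 500, "pro": 2_000, "ultra": 10_000}


@dataclass
class Account:
    tier: str
    prompts_used: int = 0


def try_consume(account: Account, prompts: int = 1) -> bool:
    """Record usage and return True if quota remains, otherwise return False."""
    quota = MONTHLY_PROMPT_QUOTA[account.tier]
    if account.prompts_used + prompts > quota:
        return False  # over quota: show the upgrade prompt instead of running the job
    account.prompts_used += prompts
    return True
```

The gate sits in front of the GPU, so a blocked prompt costs you nothing.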
Gate the outcome
Stop selling answers. Start selling hours saved.
The free tier gives the right answer, then leaves you to copy, paste, reformat and re-prompt. Put the paywall in front of the features that finish the job: automation, agents, integrations, export. Intercom’s Fin agent is the cleanest version. It’s free to let the AI try, and you pay $0.99 only when the problem is actually resolved.
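Outcome pricing is simple arithmetic. Here is a rough sketch, assuming each conversation record carries a `resolved` flag (a hypothetical field, not Intercom's actual data model). Prices are in cents to avoid floating-point drift.

```python
# Price per resolved conversation, in cents, taken from the Fin example above.
PRICE_PER_RESOLUTION_CENTS = 99


def outcome_bill_cents(conversations: list[dict]) -> int:
    """Charge only for conversations flagged as resolved; attempts are free."""
    resolved = sum(1 for c in conversations if c["resolved"])
    return resolved * PRICE_PER_RESOLUTION_CENTS


conversations = [
    {"id": 1, "resolved": True},
    {"id": 2, "resolved": False},  # the AI tried and failed: no charge
    {"id": 3, "resolved": True},
]
print(outcome_bill_cents(conversations))  # 198
```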
Gate the heaviest compute
Some features melt the servers. When Google built Genie 3, its real-time world model, the internal joke was that “the TPUs were melting on every prompt.” Serving that to every free user wasn’t just a bad business move. It was physically impossible.
So it went to the top tier only. Make text and basic images universally free to pull people in. Set a hard gate the moment someone wants cinematic video, a real-time simulation, or a 3D world.
Tiers capture the budget. The ecosystem keeps it.
Right now, users are pouring experimental dollars into AI. Those budgets won’t last. Well-priced tiers capture that money. An ecosystem around them is what keeps it.
Convert at the moment of intent. The upsell is timing, not packaging. Google watches three signals: a user who refines the same output five times in one session (that’s real work), a user on both desktop and mobile within 48 hours (the tool is now part of their day), and the “continue this chat” soft paywall, where a shared conversation needs a Pro model and proves its value before anyone pays.
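The first two signals can be checked from an event log. This is a sketch under assumptions: the event fields (`type`, `output_id`, `device`, `at`) and both function names are hypothetical, and the thresholds simply mirror the numbers in the text.

```python
from datetime import datetime, timedelta

REFINEMENT_THRESHOLD = 5                    # refinements of one output in one session
CROSS_DEVICE_WINDOW = timedelta(hours=48)   # desktop and mobile seen this close together


def refined_heavily(session_events: list[dict]) -> bool:
    """True if any single output was refined at least REFINEMENT_THRESHOLD times."""
    counts: dict[str, int] = {}
    for event in session_events:
        if event["type"] == "refine":
            counts[event["output_id"]] = counts.get(event["output_id"], 0) + 1
    return any(n >= REFINEMENT_THRESHOLD for n in counts.values())


def cross_device(events: list[dict]) -> bool:
    """True if a desktop event and a mobile event fall within the 48-hour window."""
    desktop = [e["at"] for e in events if e["device"] == "desktop"]
    mobile = [e["at"] for e in events if e["device"] == "mobile"]
    return any(abs(d - m) <= CROSS_DEVICE_WINDOW for d in desktop for m in mobile)
```

Either check returning True is the moment to show the upgrade offer, not before.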
Bundle for month two. AI churn is brutal because the habit isn’t formed yet. So tie the subscription to something stickier. Google bundled AI with Google One cloud storage, and nobody cancels the thing holding their photos, so the AI habit survives almost by accident. Cursor did it by indexing your codebase: churning means tearing down your own setup.
Route prompts so cheap stays cheap. You can’t afford to run your biggest model on every free prompt. “What’s the capital of France?” should hit a tiny, fast model. A logic puzzle routes to a heavy reasoner with token metering. The user still gets instant magic, and the routing that protects your margin stays invisible.
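A deliberately crude sketch of that routing idea. The model names, the keyword list and the word-count cutoff are all assumptions for illustration; a production router would more likely use a trained classifier than keyword matching.

```python
# Hypothetical model names; swap in whatever your stack actually serves.
SMALL_MODEL = "small-fast"
LARGE_MODEL = "large-reasoner"

# Hypothetical hints that a prompt needs heavier reasoning.
REASONING_HINTS = ("prove", "puzzle", "step by step", "derive", "debug")


def route(prompt: str, max_small_words: int = 40) -> str:
    """Pick the cheap model unless the prompt looks long or reasoning-heavy."""
    text = prompt.lower()
    looks_hard = any(hint in text for hint in REASONING_HINTS)
    is_long = len(text.split()) > max_small_words
    return LARGE_MODEL if (looks_hard or is_long) else SMALL_MODEL


print(route("What's the capital of France?"))         # small-fast
print(route("Solve this logic puzzle step by step"))  # large-reasoner
```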
The traps underneath
Three margin traps will catch you even when the gates are right.
Don’t break trust. AI usage spikes around projects and exams, then stops. Bury the cancel button and a temporary pause becomes a permanent exit. Offer a one-click pause that keeps their saved prompts, and they come back for the next heavy week.
Don’t lock your tiers in stone. Today’s premium model is next quarter’s commodity. Lock the boundaries and you bleed margin (giving away $5,000 of compute on a $200 plan) or bleed users (a rival gives your “premium” away free). Audit unit costs constantly. Keep room at the top for the next breakthrough.
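The audit itself can start as a few lines of arithmetic. This sketch assumes you can attribute monthly compute cost per user; the figures are hypothetical, and the last row mirrors the $5,000-on-a-$200-plan case above.

```python
def gross_margin(price_usd: float, compute_cost_usd: float) -> float:
    """Gross margin as a fraction of price; negative means the plan loses money."""
    return (price_usd - compute_cost_usd) / price_usd


# Hypothetical per-user monthly figures.
users = [
    {"user": "light", "price": 20.0, "compute": 4.0},
    {"user": "typical", "price": 200.0, "compute": 60.0},
    {"user": "heavy", "price": 200.0, "compute": 5000.0},
]

for u in users:
    margin = gross_margin(u["price"], u["compute"])
    flag = "LEAK" if margin < 0 else "ok"
    print(f'{u["user"]}: margin {margin:.0%} ({flag})')
```

Run it per tier every time a model price or a usage pattern shifts, and the leaks show up before the bill does.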
Don’t ship peak-agnostic pricing. A flat 24/7 rate while your GPUs redline on weekday afternoons and sit idle on Sundays trains people to run heavy jobs at your most expensive hour. Add usage multipliers or compute “happy hours” to smooth the load.
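One way to sketch a usage multiplier, assuming jobs are billed in credits and that the peak is weekday afternoons, as in the example above. The multiplier values, the 12:00 to 18:00 window and the function names are assumptions, not anyone's published pricing.

```python
from datetime import datetime

# Hypothetical multipliers: weekday afternoons cost more, Sundays cost less.
PEAK_MULTIPLIER = 1.5
OFF_PEAK_MULTIPLIER = 0.5


def usage_multiplier(at: datetime) -> float:
    """Return the credit multiplier for a job started at `at` (server local time)."""
    if at.weekday() == 6:                         # Sunday: "happy hour" all day
        return OFF_PEAK_MULTIPLIER
    if at.weekday() < 5 and 12 <= at.hour < 18:   # Monday to Friday, 12:00 to 17:59
        return PEAK_MULTIPLIER
    return 1.0


def credits_charged(base_credits: float, at: datetime) -> float:
    """Scale the base credit cost of a job by the multiplier for its start time."""
    return base_credits * usage_multiplier(at)
```

A job that costs 10 credits on a Saturday costs 15 on a Wednesday afternoon and 5 on a Sunday, which gives heavy users a reason to shift their batch work.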
Gate the volume. Gate the outcome. Gate the heaviest compute.
Open your pricing page next to your compute bill. If your most expensive feature sits in your cheapest tier, you’ve found the leak. Tell me which feature it is. I read every reply. 🙏
Source Notes
AI freemium cost structure. Vikas Kansal’s Google AI essay via Lenny’s Newsletter supplies the central claim: SaaS freemium assumes low marginal cost, while AI freemium has real inference cost on every active user.
Volume, outcome, and compute gates. The Google AI examples, Gemini tiering, and Genie 3 compute constraint come from Kansal’s pricing framework. Source:
Aha moment without unlimited compute. Elena Verna’s freemium analysis supports the point that the free tier still has to deliver enough value for activation, even when cost must be controlled. Source:
Outcome pricing. Intercom’s Fin pricing is the clean example of charging when the AI resolves the customer problem, not merely when it generates an answer. Source: https://www.intercom.com/help/en/articles/8205718-fin-ai-agent-outcomes




