<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Product Theatre]]></title><description><![CDATA[Honest writing on AI, product leadership, and the gap between hype and reality — one pattern, every Tuesday.]]></description><link>https://www.producttheatre.com</link><image><url>https://substackcdn.com/image/fetch/$s_!kIaA!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b557c19-2676-4227-9c7c-37092ec14f3f_512x512.png</url><title>Product Theatre</title><link>https://www.producttheatre.com</link></image><generator>Substack</generator><lastBuildDate>Fri, 05 Jun 2026 17:14:09 GMT</lastBuildDate><atom:link href="https://www.producttheatre.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Cam]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[producttheatre@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[producttheatre@substack.com]]></itunes:email><itunes:name><![CDATA[Cam]]></itunes:name></itunes:owner><itunes:author><![CDATA[Cam]]></itunes:author><googleplay:owner><![CDATA[producttheatre@substack.com]]></googleplay:owner><googleplay:email><![CDATA[producttheatre@substack.com]]></googleplay:email><googleplay:author><![CDATA[Cam]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The Taste Tax]]></title><description><![CDATA[AI made execution cheap. 
Now judgment is the bottleneck.]]></description><link>https://www.producttheatre.com/p/the-taste-tax</link><guid isPermaLink="false">https://www.producttheatre.com/p/the-taste-tax</guid><dc:creator><![CDATA[Cam]]></dc:creator><pubDate>Thu, 04 Jun 2026 11:14:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zbAK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7a8dd0b-8b9b-40c7-95b9-9a33b43f716c_1376x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zbAK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7a8dd0b-8b9b-40c7-95b9-9a33b43f716c_1376x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zbAK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7a8dd0b-8b9b-40c7-95b9-9a33b43f716c_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!zbAK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7a8dd0b-8b9b-40c7-95b9-9a33b43f716c_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!zbAK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7a8dd0b-8b9b-40c7-95b9-9a33b43f716c_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!zbAK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7a8dd0b-8b9b-40c7-95b9-9a33b43f716c_1376x768.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!zbAK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7a8dd0b-8b9b-40c7-95b9-9a33b43f716c_1376x768.png" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a7a8dd0b-8b9b-40c7-95b9-9a33b43f716c_1376x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:951408,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://producttheatre.substack.com/i/200433973?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7a8dd0b-8b9b-40c7-95b9-9a33b43f716c_1376x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zbAK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7a8dd0b-8b9b-40c7-95b9-9a33b43f716c_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!zbAK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7a8dd0b-8b9b-40c7-95b9-9a33b43f716c_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!zbAK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7a8dd0b-8b9b-40c7-95b9-9a33b43f716c_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!zbAK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7a8dd0b-8b9b-40c7-95b9-9a33b43f716c_1376x768.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>AI did not remove the need for product judgment.</p><p>It removed the excuse for not having any.</p><p>When agents can produce five flows, three prototypes, two launch plans, and a passable landing page by Friday, the hard question changes. It is no longer &#8220;Can we build it?&#8221; It is &#8220;Which version is worth shipping?&#8221;</p><p>That scarce work is taste.</p><p>By taste, I do not mean aesthetics.
I mean product discrimination: the ability to tell useful from noisy, sharp from generic, finished from merely working, and necessary from extra.</p><p>The taste tax is the bill that comes due when execution gets cheap.</p><h2>When building gets cheap, choosing gets exposed</h2><p>At Notion, the AI shift did not start with designers making prettier static mocks.</p><p>It moved the work into a shared prototype playground.</p><p>Brian Lovin, a designer on Notion AI, built an environment where designers could turn ideas into working prototypes with Claude Code. The point was not that every designer should become an engineer. The point was that AI products are hard to judge from static screens.</p><p>You have to feel the system behave.</p><p>A prototype that answers can still be wrong. It can use the wrong tone. It can hide the moment where trust breaks. It can complete the task and still feel generic in the user&#8217;s hands.</p><p>That is the right scene for the taste tax.</p><p>AI made the first version cheaper. It did not decide whether the interaction was good.</p><p>The new bottleneck is not making the artifact.</p><p>It is judging it while there is still time to change it.</p><h2>Taste is not decoration</h2><p>Taste is often treated as a soft word.
That makes it easy to ignore.</p><p>Do not ignore it.</p><p>Taste is the operating standard underneath product work:</p><ul><li><p>What should we cut?</p></li><li><p>What should we polish?</p></li><li><p>What should we call this?</p></li><li><p>Which version helps the user think less?</p></li><li><p>Where does the feature cross from useful into trying too hard?</p></li><li><p>What would make this feel like us if the logo disappeared?</p></li></ul><p>Those are not decoration questions. They are product questions.</p><p>Before AI, weak taste was partially hidden by execution cost. Building was expensive, so fewer bad ideas made it all the way to users.</p><p>Now the gate is lower.</p><blockquote><p>Bad taste ships faster too.</p></blockquote><h2>The failure mode is generic abundance</h2><p>The first-order effect of AI is more output.</p><p>More drafts. More screens. More tests. More strategy docs. More prototypes. More &#8220;pretty good&#8221; versions of almost everything.</p><p>That sounds like leverage. Sometimes it is.</p><p>But abundance has a failure mode: teams stop discriminating. The roadmap only adds. The interface gets busier. The product starts to look like every other product built with the same models, the same templates, and the same unwillingness to say no.</p><p>This is the taste tax in the wild:</p><ul><li><p>The team can name twenty things to build and nothing to remove.</p></li><li><p>&#8220;The model wrote it&#8221; becomes a defense.</p></li><li><p>Code review catches bugs but not mediocrity.</p></li><li><p>The definition of done ends at &#8220;it works.&#8221;</p></li><li><p>Nobody can name a product they admire and explain the standard behind it.</p></li></ul><p>The product is not broken.</p><p>That is the problem. Broken work gets fixed. Generic work gets shipped.</p><h2>The signal is what the team refuses</h2><p>The easiest way to see taste is to ask what the team refused.</p><p>A team with taste can show you the versions they rejected. 
They can explain why a feature was removed, why a label changed, why one interaction got another hour and another got killed.</p><p>A team without taste can only show you the output.</p><p>This is why &#8220;move fast&#8221; became more dangerous in the agent era. Speed without discrimination converges to the mean. If everyone has access to similar generation tools, the difference is not who can produce more.</p><p>The difference is who can choose better.</p><p>James Bessen&#8217;s work on automation points to the broader pattern: technology often shifts work rather than simply removing it. When the task gets cheap, the judgment around the task becomes more valuable.</p><p>The artifact got cheaper.</p><p>The standard did not.</p><h2>Install a quality bar, not a taste lecture</h2><p>Do not tell teams to &#8220;have better taste.&#8221; That is not a process.</p><p>Make the standard visible at the moment work is about to ship.</p><p>Use one review with three questions:</p><ul><li><p>What did we refuse? Tests whether the team can cut. A good answer names a rejected version, feature, message, workflow, or interaction &#8212; with a clear reason.</p></li><li><p>What did we elevate? Tests whether the team improved beyond &#8220;it works.&#8221; A good answer names one change that made the output clearer, simpler, safer, more useful, or more distinctive.</p></li><li><p>What standard did we apply? Tests whether taste is shared or personal. A good answer is a reusable principle the next team can apply without the original reviewer in the room.</p></li></ul><p>This keeps the advice industry-agnostic. The artifact might be a product flow, pricing change, policy answer, data workflow, onboarding moment, service script, or internal tool. The standard is the same: the team can explain why this version deserves to exist.</p><p>Over time, save the best answers. 
That becomes the team&#8217;s taste library: not a mood board, but a record of good decisions.</p><h2>What changes Monday</h2><p>Pick one AI-assisted feature your team is about to ship.</p><p>Do one thing before launch: run a refusal review.</p><p>The launch decision should not be: &#8220;Does it work?&#8221;</p><p>It should be:</p><p><strong>Can the team explain what it refused, what it elevated, and what standard it used?</strong></p><p>If the answer is mostly silence, do not call the work finished.</p><p>Call it generated.</p><p>The market will not punish you because your team is slow.</p><p>It will punish you because everyone is fast now, and your product looks like theirs.</p><h2>Sources</h2><ul><li><p>Lenny&#8217;s Newsletter &#8212; I haven&#8217;t written a single line of code in six months: https://www.lennysnewsletter.com/p/i-havent-written-a-single-line-of</p></li><li><p>Lenny&#8217;s Newsletter &#8212; Why cultivating agency matters more than cultivating skills: https://www.lennysnewsletter.com/p/why-cultivating-agency-matters-more</p></li><li><p>James Bessen &#8212; Toil and Technology: https://www.imf.org/external/pubs/ft/fandd/2015/03/bessen.htm</p></li><li><p>Paul Graham &#8212; Taste for Makers: https://www.paulgraham.com/taste.html</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Straw Prompting]]></title><description><![CDATA[A demo is not an eval. That is the AI rollout mistake hiding in plain sight.]]></description><link>https://www.producttheatre.com/p/straw-prompting</link><guid isPermaLink="false">https://www.producttheatre.com/p/straw-prompting</guid><dc:creator><![CDATA[Cam]]></dc:creator><pubDate>Thu, 04 Jun 2026 11:03:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!jUTo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6f3e97-d4d5-4f77-9a97-24995db7a785_1376x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jUTo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6f3e97-d4d5-4f77-9a97-24995db7a785_1376x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jUTo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6f3e97-d4d5-4f77-9a97-24995db7a785_1376x768.png 424w, https://substackcdn.com/image/fetch/$s_!jUTo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6f3e97-d4d5-4f77-9a97-24995db7a785_1376x768.png 848w,
https://substackcdn.com/image/fetch/$s_!jUTo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6f3e97-d4d5-4f77-9a97-24995db7a785_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!jUTo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6f3e97-d4d5-4f77-9a97-24995db7a785_1376x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jUTo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6f3e97-d4d5-4f77-9a97-24995db7a785_1376x768.png" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f6f3e97-d4d5-4f77-9a97-24995db7a785_1376x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1060924,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://producttheatre.substack.com/i/200433382?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6f3e97-d4d5-4f77-9a97-24995db7a785_1376x768.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jUTo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6f3e97-d4d5-4f77-9a97-24995db7a785_1376x768.png 424w, 
https://substackcdn.com/image/fetch/$s_!jUTo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6f3e97-d4d5-4f77-9a97-24995db7a785_1376x768.png 848w, https://substackcdn.com/image/fetch/$s_!jUTo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6f3e97-d4d5-4f77-9a97-24995db7a785_1376x768.png 1272w, https://substackcdn.com/image/fetch/$s_!jUTo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f6f3e97-d4d5-4f77-9a97-24995db7a785_1376x768.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>An AI demo is not an eval.</p><p>That is the mistake behind a lot of brittle enterprise AI rollouts. The demo proves the model can answer one polished question in one controlled room. An eval &#8212; a test set of real cases, scored against a written standard &#8212; tells you whether the system can survive actual users.</p><p>Most teams still ship the demo.</p><p>That is what I am calling <strong>straw prompting</strong>: building an AI feature out of a hand-tuned prompt, testing it on the cases that already work, and launching before the messy cases arrive.</p><p>It looks like a house.</p><p>It is not ready for weather.</p><h2>The happy path is not the product</h2><p>Air Canada learned this in public.</p><p>In 2022, a customer used Air Canada&#8217;s website chatbot while booking travel after a death in the family. The chatbot told him he could buy a full-price ticket, travel, and then apply later for the airline&#8217;s bereavement fare.</p><p>That was wrong.</p><p>The airline&#8217;s actual bereavement policy said the discount could not be requested after travel had already happened. The chatbot even linked to the page with the correct policy, but the answer it gave in the conversation was still misleading.</p><p>The customer relied on the chatbot, travelled, and later asked for the fare difference. Air Canada refused.
In 2024, British Columbia&#8217;s Civil Resolution Tribunal found Air Canada liable for negligent misrepresentation and ordered it to pay damages, interest, and fees.</p><p>The product lesson is not &#8220;chatbots are bad.&#8221;</p><p>It is sharper than that.</p><p>Air Canada had the correct information on its own website. The failure was that the interactive answer surface did not reliably behave like the policy it was supposed to represent.</p><p>That is the straw-prompting pattern in the wild.</p><p>The company did not just need a chatbot that could answer bereavement questions. It needed a system that could notice policy conflict, ground answers in the current source of truth, and escalate when the answer affected money, rights, or obligations.</p><p>A working answer was not enough.</p><p>The answer had to be governed like part of the product.</p><h2>The wolf is variance</h2><p>The wolf is not a smarter model from a competitor.</p><p>The wolf is variance: the messy spread of real-world inputs that your demo never saw.</p><p>Users misspell things. They paste half a policy. They use internal acronyms. They ask two questions at once. They omit context. They ask in anger. They ask in formats your prompt writer did not imagine.</p><p>OpenAI&#8217;s eval guidance makes the point plainly: AI systems need task-specific tests that reflect real-world conditions, edge cases, and continuous change. &#8220;Looks good to me&#8221; is an anti-pattern, not a launch gate.</p><blockquote><p>That is the trap. A demo that works ten times in a row does not tell you how the system behaves on the eleven-thousandth.</p></blockquote><h2>The launch review should expose the failure mode</h2><p>You do not need a 40-point governance checklist to spot straw prompting.</p><p>You need a launch review that separates symptoms from causes. Run it against five patterns:</p><ul><li><p>Polished examples, not error rates &#8212; the team tested the demo, not the system.
Require a real-input eval set before launch.</p></li><li><p>No one can name the top failure modes &#8212; the risk model is missing. Write the failure list before tuning the prompt.</p></li><li><p>Output quality judged by vibes &#8212; the team has no shared standard. Create a rubric before reviewing answers.</p></li><li><p>&#8220;Guardrails later&#8221; in the plan &#8212; controls are being treated as cleanup. Define refusal, escalation, and source-of-truth rules now.</p></li><li><p>The prompt writer also wrote the tests &#8212; the eval is likely overfit to the happy path. Add messy cases from real usage, edge conditions, policy boundaries, and expert review.</p></li></ul><p>The point is not to punish the team for moving quickly. It is to find the root cause while the cost of fixing it is still low.</p><h2>Build the test before the prompt</h2><p>The fix is not a giant governance program. It is a different order of operations.</p><p>Start with the test.</p><p>For any AI feature, collect 100 to 300 real or realistic inputs from the hardest part of the workflow. Include short requests, long requests, ambiguous requests, boundary cases, policy conflicts, and cases where the right answer is &#8220;I do not know.&#8221;</p><p>Then write the rubric.</p><p>What counts as correct? What counts as unsafe? When should the model refuse? When should it escalate to a human? What evidence should it cite? What must it never invent?</p><p>Only then tune the prompt.</p><p>The prompt has to earn its way through the mess. Otherwise you are not improving the product. You are decorating the demo.</p><h2>The brick house has gates, owners, and reruns</h2><p>Once the root cause is clear, prevention is boring on purpose.</p><p>Brick-house AI rollouts put four controls in place before launch:</p><ol><li><p><strong>A golden set.</strong> The small, trusted collection of examples that defines what &#8220;good&#8221; looks like. 
It includes normal cases, edge cases, and failures you never want to see twice.</p></li><li><p><strong>A named owner.</strong> Someone owns the eval set, the rubric, and the launch threshold. Without an owner, quality becomes a meeting mood.</p></li><li><p><strong>Escalation rules.</strong> The system knows when to refuse, when to cite the source of truth, and when to send the user to a person.</p></li><li><p><strong>Scheduled reruns.</strong> Models change. Prompts change. Retrieval changes. Product policy changes. The system that passed Tuesday may not pass next Tuesday.</p></li></ol><h2>What changes Monday</h2><p>Pick the next AI feature waiting for launch approval.</p><p>Do one thing: replace the demo review with an eval review.</p><p>The launch decision should not be: &#8220;Did the examples look good?&#8221;</p><p>It should be:</p><p><strong>Did the system pass the real cases, against the written rubric, with clear escalation rules and an owner for reruns?</strong></p><p>If not, do not approve the launch.</p><p>Approve the demo, if you want.</p><p>Just do not mistake it for a brick house.</p><h2>Sources</h2><ul><li><p>OpenAI &#8212; Evals: https://platform.openai.com/docs/guides/evals</p></li><li><p>OpenAI Cookbook &#8212; Evals design guide: https://cookbook.openai.com/examples/evaluation/evals_design_guide</p></li><li><p>NIST &#8212; AI Risk Management Framework: https://www.nist.gov/itl/ai-risk-management-framework</p></li><li><p>Deeth Williams Wall &#8212; BC Tribunal Finds Air Canada Liable: https://www.dww.com/articles/bc-tribunal-finds-air-canada-liable-for-inaccurate-advice-given-by-website-chatbot</p></li><li><p>American Bar Association &#8212; BC Tribunal Confirms Companies Remain Liable: https://www.americanbar.org/groups/business_law/resources/business-law-today/2024-february/bc-tribunal-confirms-companies-remain-liable-information-provided-ai-chatbot/</p></li></ul>]]></content:encoded></item></channel></rss>