The thesis behind the Retail Decision Stack and the Cost Per Member-Year piece is the same at the core: feed Performance Max a value-based conversion signal that reflects what the business actually earns, and Google's AI will bid toward higher-value cohorts. The argument is structural and the math works on a whiteboard. The question this experiment exists to answer is narrower and harder: how fast does the audience mix actually move once you change the signal?

The reason this matters: every account I have ever audited has had a version of the same conversation. "We will switch PMax to value-based bidding next quarter, once we have the data plumbing right." The team is right that the plumbing matters. They are usually wrong about the timeline they assume Google's AI needs to respond. If the AI moves in two weeks, the cost of waiting another quarter is real. If it needs ninety days, the prep work is the right priority. I want a calibrated answer, not a guess.

Audience mix shifts measurably within 14 days. Reporting lags.

The hypothesis has two parts. First, that a single value-based conversion signal — passed to PMax as a custom conversion value weighted by predicted retention — produces a measurable shift in audience mix toward higher-AOV cohorts within fourteen days, holding creative, budget, targeting, and asset groups constant. Second, that the shift will appear in the audience-signal-strength reporting before it appears in the cleaner cohort dashboards a CMO would actually look at. The thesis is that the AI moves faster than the reporting layer that lets you see the AI move.

Concretely: I am predicting that the share of conversions coming from cohorts in the top quartile of predicted member-year value (or, in a retail variant, predicted retention probability) rises by at least 8 percentage points within 14 days, with no other change to the campaign. If it rises by less than 4 points, the hypothesis fails. The 4–8 point band is the calibration zone: informative either way, but not enough to make the case that the signal is doing the heavy lifting on its own.
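The pass/fail bands above are simple enough to write down as a decision rule, which keeps the read-out honest once the window closes. This is a minimal sketch, not part of the experiment's tooling: the function name, the labels it returns, and the example shares are all hypothetical. Shares are given in percent so the difference is in percentage points.

```python
def classify_shift(baseline_share_pct: float, test_share_pct: float) -> str:
    """Classify the change in top-quartile conversion share.

    Both inputs are percentages (e.g. 22.0 for 22%), so the difference
    is in percentage points. Labels are hypothetical names for the three
    bands described in the text.
    """
    delta_points = test_share_pct - baseline_share_pct
    if delta_points >= 8:
        return "supported"      # rose by at least 8 points
    if delta_points < 4:
        return "failed"         # rose by less than 4 points (or fell)
    return "calibration"        # the 4-8 point band


# Illustrative numbers only, not results from the experiment.
print(classify_shift(22.0, 31.5))
print(classify_shift(22.0, 28.0))
print(classify_shift(22.0, 25.0))
```

Writing the rule down before the data arrives makes it harder to move the bar after the fact.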

A holdout structure inside one PMax campaign.

The cleanest test would be a concurrent holdout (two identical campaigns, one with the signal, one without), but PMax does not give you that without burning real budget on a placebo. Instead, the structure is a temporal holdout against the campaign's own pre-test baseline, with the audience composition reported through three different lenses to check whether the shift is real or a reporting artifact.

Test surface: One Performance Max campaign on a retail account, established 60+ days, mature signal data, no recent structural changes.
Variable changed: The conversion value model. From flat purchase value to purchase value × predicted retention coefficient, applied at the cohort level.
Held constant: Budget, asset groups, creative, audience signals, geographic targeting, bidding strategy type.
Measurement window: 14 days post-change, with a 14-day pre-change baseline. Cohort tagging on the conversion side runs in parallel in both periods.
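The variable being changed, flat purchase value replaced by purchase value times a cohort-level retention coefficient, can be sketched in a few lines. Everything specific here is an assumption for illustration: the cohort names, the coefficient values, and the fallback to a coefficient of 1.0 for an untagged cohort are hypothetical, and the real coefficients would come from the account's own retention model. How the resulting value is sent to Google Ads is out of scope for this sketch.

```python
# Hypothetical cohort-level retention coefficients (assumed values).
RETENTION_COEFFICIENT = {
    "high_retention": 1.4,
    "mid_retention": 1.0,
    "low_retention": 0.6,
}


def weighted_conversion_value(purchase_value: float, cohort: str) -> float:
    """Return purchase value scaled by the cohort's retention coefficient.

    Unknown cohorts fall back to 1.0, i.e. the flat purchase value.
    That fallback is a design assumption, not something the source specifies.
    """
    coefficient = RETENTION_COEFFICIENT.get(cohort, 1.0)
    return round(purchase_value * coefficient, 2)


# Illustrative order values only.
print(weighted_conversion_value(100.0, "high_retention"))
print(weighted_conversion_value(100.0, "low_retention"))
```

Falling back to the flat value for untagged conversions keeps the signal conservative: a missing cohort tag neither inflates nor deflates what the bidder sees.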

The reporting layer is the trickier part. PMax's own audience reporting is averaged and lagged. To see the mix shift early, the conversion data has to be enriched downstream — every conversion tagged with the cohort it came from at the moment it fired, then rolled up daily outside the platform. That instrumentation is the actual cost of running the experiment honestly. Without it, the test becomes "did total revenue go up," which is not what is being tested.
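The downstream roll-up described above amounts to counting, per day, how many conversions carried a top-quartile cohort tag. A minimal sketch follows, assuming each conversion has already been exported as a `(day, is_top_quartile)` pair; that record shape, the function name, and the sample rows are hypothetical and stand in for whatever the real export looks like.

```python
from collections import defaultdict
from datetime import date


def daily_top_quartile_share(conversions):
    """Roll cohort-tagged conversions up into a daily top-quartile share.

    `conversions` is an iterable of (day, is_top_quartile) pairs, one per
    conversion, tagged at the moment the conversion fired.
    Returns {day: share in percent}, ordered by day.
    """
    totals = defaultdict(int)
    top = defaultdict(int)
    for day, is_top_quartile in conversions:
        totals[day] += 1
        if is_top_quartile:
            top[day] += 1
    return {day: 100 * top[day] / totals[day] for day in sorted(totals)}


# Made-up sample rows, not data from the experiment.
sample = [
    (date(2024, 1, 1), True),
    (date(2024, 1, 1), False),
    (date(2024, 1, 1), False),
    (date(2024, 1, 1), False),
    (date(2024, 1, 2), True),
    (date(2024, 1, 2), True),
    (date(2024, 1, 2), False),
    (date(2024, 1, 2), False),
]
for day, share in daily_top_quartile_share(sample).items():
    print(day.isoformat(), share)
```

Keeping the roll-up daily, rather than weekly, is the point: a shift that shows up on day five is invisible in a number averaged over the whole window.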

"The hardest part of this experiment is not the AI. It is the discipline of measuring the right thing, on a schedule fast enough to catch the shift before it gets averaged away."

In progress. Mid-run notes below.

The experiment is currently in its second week. I will publish the full result here, with the daily mix-shift chart, when the 14-day window closes. What I can say at the mid-point: the audience-signal-strength reporting shifted on day five — earlier than I expected, and earlier than the cohort dashboard caught it. That confirms half of the hypothesis (the AI moves faster than the cleaner reporting) regardless of where the final number lands. I do not yet know whether the magnitude will clear the 8-point bar. If it does not, the result is still useful: it means the signal works but the timeline assumption I have been arguing for needs to lengthen.

I want to flag something I did not predict. The first three days after the signal change saw conversion volume drop by ~6 percent before recovering. That is consistent with the AI re-exploring its bid space rather than a real demand drop, but it is a real phenomenon CMOs should expect, and one a quarterly review cycle would never catch. It is an argument for running this kind of switch when the team has bandwidth to hold the line for a week, not the week before AEP or the week of a launch.

What changed in the thinking

The signal-change shock window is real and short. The next experiment should isolate it.

Going in, I treated the post-change shock as noise. Mid-run, it looks more like signal — a short, predictable re-exploration window that is probably worth its own experiment to characterize. Experiment #002 will test whether the shock window can be shortened by warming the signal in a held-out asset group first, before flipping the campaign. The original hypothesis is still in flight. The unexpected finding is more interesting than the original question.

Status: in progress. Final result and mix-shift chart will be added to this entry when the 14-day window closes. Subscribers to Stay Sharp get the update before it's indexed.