Google / Anthropic cross study: stop guessing after each Google update


With each core update, the teams rewrite their story after the fact. The Google leak provides a mathematical grid that explains increases and decreases. Enough to move from conjecture to decision.

After each Google update, the same ritual: everyone explains the day after what no one knew the day before. It’s expensive and unreliable. The Google leak opens another avenue: a formula that predicts why a site rises or falls, tested on a real case. For a manager, the issue is not technical; it is to decide on a measurable basis.

The scenario returns with each core update.

Traffic moves. The team meets. Someone offers an explanation. Another disputes it. We build a story. We present it to the management committee. And three months later, with the next update, we rewrite the story.

For years, SEO management has been based on collective intuition. We observed an effect, we invented a cause for it. No one could prove the connection. Google’s engine was a black box, and each team told it their own way.

In May 2024, this black box cracked open. Google’s ContentWarehouse leak exposed 14,014 internal attributes. At the same time, Anthropic published the claude-code repository as open source, with 18 submodules documented in a PARITY.md file. Two rare windows into the actual mechanics of a ranking system.

We cross-referenced these two sources against four real cases. And we derived a measurable grid from them. Not an opinion. A formula.

For a manager, the issue is not technical. It is decision-making. We can finally move from conjecture to a decision grid.

The hidden cost of guesswork

Let’s state the real problem. It’s not that SEO moves. It’s that we don’t know why it moves.

When a team does not have a reading grid, it produces stories. And a story constructed after the fact has a very real cost, which can be read on three lines of the income statement.

Leadership time. Each update triggers its cycle. Crisis meetings, analysis notes, back and forth between marketing and management. We mobilize the most expensive people in the organization to produce an unverifiable explanation. That time is never recovered. And the cycle starts again with each update.

Budget allocation. This is the heaviest cost, because it is invisible. If we attribute a decline to the wrong cause, we correct the wrong thing. Let’s take a hypothetical example: a team loses a large part of its traffic, concludes that there is an internal linking problem, and launches a redesign project lasting several months. If the real cause was elsewhere, this budget is lost twice. Once on a useless project. Once on the real problem, which continues to bleed while we look elsewhere.

The credibility of the team. By dint of explaining each movement with a new theory, we end up explaining nothing. The management committee feels it. When the SEO manager says “this time it’s because of X”, after having said “it was because of Y” in the previous quarter, trust erodes. And a team that no longer inspires confidence loses its budget, then its influence.

Three lines, therefore: time, money, trust. Conjecture drains all three of them, silently, with each update.

The cycle makes everything worse. An after-the-fact account is never verified, so it is never corrected. The team repeats the same loop, update after update. We don’t learn. We tell stories.

A measurable grid acts on exactly these three lines. It reduces the time spent guessing. It directs the budget towards the right cause. And it gives the team a voice that can be verified.

What a formula changes for management

This is the heart of our work.

By cross-referencing the 14,014 attributes of the leak with the 18 submodules of Anthropic, we isolated 38 dimensions which structure the quality signal. Not 38 fuzzy factors. 38 identifiable dimensions, which can be named and tracked.

Among them, the quality embeddings system, called NSR in the leak, is based on 17 distinct modules. This is the first time that we can put down a map rather than an intuition.

But the most useful output for a leader is not the list. It’s the mechanics.

A page’s quality score does not add up. It multiplies.

We formalized it like this:

Q = baseline × ∏(1 − δ_k)

Let’s just read it. Each page starts from a basic score. Then each identified weakness, each δ_k, removes a fraction of this score. And these fractions do not add up: they multiply with each other.
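As a minimal sketch of the formula above, the product can be computed in a few lines of Python. The function name, the penalty values and the baseline of 1.0 are illustrative assumptions, not values taken from the leak.

```python
from math import prod


def quality_score(baseline: float, penalties: list[float]) -> float:
    """Q = baseline * product of (1 - delta_k) over all active penalties."""
    for delta in penalties:
        if not 0.0 <= delta <= 1.0:
            raise ValueError(f"penalty must be in [0, 1], got {delta}")
    return baseline * prod(1.0 - delta for delta in penalties)


# Hypothetical page: baseline 1.0 and three made-up weaknesses.
q = quality_score(1.0, [0.5, 0.2, 0.1])
print(round(q, 2))  # 0.36, i.e. 0.5 * 0.8 * 0.9
```

A page with no active penalty simply keeps its baseline, since an empty product equals 1.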

This is quite the contrast to the usual mental model.

The additive model reassures: “I have ten minor defects, so a small loss”. False. In a multiplicative model, ten minor defects compound and can collapse a score. A page that is correct on 37 dimensions but weak on just one, the one that matters, can drop suddenly.

For management, it is a change of doctrine. Three concrete consequences for whoever decides.
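To make the contrast concrete, here is a small comparison. It assumes the "additive" mental model means averaging the per-dimension scores across the 38 dimensions; that reading, like the penalty values, is an assumption made for illustration.

```python
from math import prod

N_DIMENSIONS = 38  # number of dimensions cited in the article


def averaged_score(penalties: list[float], n: int = N_DIMENSIONS) -> float:
    """Additive reading: mean of per-dimension scores; flawless dimensions count as 1."""
    scores = [1.0 - d for d in penalties] + [1.0] * (n - len(penalties))
    return sum(scores) / n


def multiplicative_score(penalties: list[float]) -> float:
    """Multiplicative reading: product of (1 - delta_k), with a baseline of 1."""
    return prod(1.0 - d for d in penalties)


ten_minor = [0.05] * 10  # ten small, hypothetical defects
one_severe = [0.90]      # a single, hypothetical severe weakness

for label, penalties in (("ten minor", ten_minor), ("one severe", one_severe)):
    print(
        f"{label}: averaged={averaged_score(penalties):.2f} "
        f"multiplicative={multiplicative_score(penalties):.2f}"
    )
```

Under these assumptions the averaged score stays close to 0.99 and 0.98, while the multiplicative score falls to about 0.60 and 0.10. The same defects look harmless in one model and costly in the other.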

We prioritize instead of doing everything. If not all weaknesses weigh the same, then dealing first with the one that multiplies the loss the most yields more than dealing with ten small flaws. The grid tells where to put the effort. The budget goes to the real lever, not the most visible one.
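One way to sketch this prioritization: in a multiplicative model, removing a penalty δ_k multiplies Q by 1 / (1 − δ_k), so penalties can be ranked by that uplift factor. The dimension names and values below are placeholders, not attributes from the leak.

```python
def rank_penalties(penalties: dict[str, float]) -> list[tuple[str, float]]:
    """Return (name, uplift) pairs, largest uplift first.

    Uplift is the factor by which Q grows if that single penalty is fully removed.
    """
    ranked = []
    for name, delta in penalties.items():
        if not 0.0 <= delta < 1.0:
            raise ValueError(f"{name}: penalty must be in [0, 1), got {delta}")
        ranked.append((name, 1.0 / (1.0 - delta)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical active penalties on one page.
page_penalties = {"dimension_a": 0.5, "dimension_b": 0.1, "dimension_c": 0.2}

for name, uplift in rank_penalties(page_penalties):
    print(f"{name}: fixing it multiplies Q by {uplift:.2f}")
```

In this made-up case, fixing dimension_a alone doubles the score, which is more than fixing the other two combined. The ranking ignores the cost of each fix, which a real budget decision would also weigh.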

We decide before the update, not after. A page is graded continuously. We no longer wait for Google’s verdict to know where we are weak. We know it before. Management becomes anticipatory instead of reactive.

We speak a common language. A Q score is a number that a marketing director, a manager and a developer read the same way. No more debates of opinion in meetings. We look at the grid.

It’s no longer about “checking boxes”. This involves identifying the multiplicative penalties that weigh the most, and treating them as a priority. A decision grid, not a checklist.

The proof: the HouseFresh case

A formula is worthless without testing. We applied it to a known and documented case: HouseFresh.

HouseFresh is an independent editorial site that lost most of its visibility during an update. The usual narrative would have been: “Google favored big brands.” Explanation a posteriori, unverifiable.

We did the opposite. We applied the formula, dimension by dimension.

The result: a Q score of 0.09.

On a scale where 1 represents full quality, 0.09 describes an almost completely neutralized page. And this result was not invented to fit the case. It arises from the product of the penalties identified via the leak.

That is the shift. The fall of HouseFresh ceases to be a story. It becomes a reproducible calculation. The formula, fed by the attributes of the leak, recovers the real extent of the collapse.
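The individual penalties behind the 0.09 are not listed here, so the following is only a back-of-the-envelope sketch. Assuming a baseline of 1 and n penalties of equal size, it computes how large each one would have to be to reach a given Q. The helper name and the equal-size assumption are hypothetical.

```python
def equivalent_uniform_penalty(q: float, n: int, baseline: float = 1.0) -> float:
    """Return delta such that baseline * (1 - delta) ** n == q."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0.0 < q <= baseline:
        raise ValueError("q must be in (0, baseline]")
    return 1.0 - (q / baseline) ** (1.0 / n)


# How severe would n equal penalties need to be to produce Q = 0.09?
for n in (1, 2, 5):
    delta = equivalent_uniform_penalty(0.09, n)
    print(f"n={n}: each penalty = {delta:.2f}")
```

For instance, two penalties of 0.70 each are enough, since 0.3 × 0.3 = 0.09. A score that low does not require dozens of failures, only a few heavy ones.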

A leader can do the same thing on their own pages. Identify the active δ_k. Calculate the Q. And decide on that basis, not on a meeting intuition. The story says “we were downgraded”. The grid says “here are the penalties that lowered the score, and here is which one to deal with first”. The difference is entirely in the action that follows.

Data governance as an advantage

Here is the strategic consequence that few teams see.

If the score multiplies, then a company’s own data on its own pages becomes an asset. Not a technical detail. A competitive advantage.

Most organizations manage their SEO without an internal grid. They react. They wait for the update, then tell a story. The one that equips itself with a measurable grid does the opposite: it evaluates its pages continuously, before Google decides.

This shifts the advantage. The goal is no longer to “produce more content” but to “know the real state of each page on the dimensions that matter”. The company that knows where it is losing fractions of its score corrects it before the fall. The others discover the fall after the fact.

Data governance therefore becomes a management decision, and not a subject for experts. It raises questions that naturally go up to the committee.

Who measures the 38 dimensions, and at what rate? An annual measurement is useless; a continuous measurement becomes a dashboard.

Who decides on correction priorities when the grid points to several? It is a budget trade-off, therefore a management decision.

Who owns the data? An internal grid, maintained over time, becomes an asset in its own right. It accumulates the history of scores, update after update. It remembers what conjecture forgets.
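As one possible shape for such a score history, here is a minimal sketch: one record per page and measurement date, plus a check that flags pages whose latest Q dropped sharply. The record fields, the URLs, the dates and the 20% threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScoreRecord:
    url: str
    measured_on: date
    q: float


def flag_drops(history: list[ScoreRecord], threshold: float = 0.2) -> list[str]:
    """Return URLs whose latest Q fell by more than `threshold` (relative)
    compared with the previous measurement of the same URL."""
    by_url: dict[str, list[ScoreRecord]] = {}
    for record in history:
        by_url.setdefault(record.url, []).append(record)

    flagged = []
    for url, records in by_url.items():
        records.sort(key=lambda r: r.measured_on)
        if len(records) >= 2 and records[-2].q > 0:
            previous, latest = records[-2], records[-1]
            if (previous.q - latest.q) / previous.q > threshold:
                flagged.append(url)
    return flagged


# Hypothetical measurements for two pages.
history = [
    ScoreRecord("/guide-a", date(2024, 3, 1), 0.80),
    ScoreRecord("/guide-a", date(2024, 4, 1), 0.40),
    ScoreRecord("/guide-b", date(2024, 3, 1), 0.70),
    ScoreRecord("/guide-b", date(2024, 4, 1), 0.68),
]
print(flag_drops(history))  # ['/guide-a']
```

Keeping the raw records rather than only the latest score is what lets the grid accumulate history from one update to the next.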

These questions relate to budget allocation and traffic predictability. They are no longer technical. They are strategic.

A measurable grid transforms SEO from an uncertain expense item into a manageable system. This is exactly what a leader is looking for: predictability, and an asset that strengthens over time.

What this doesn’t explain

Honesty is part of the method. A credible grid names its own limits. And it is precisely this honesty that makes it usable in committee.

The leak doesn’t reveal everything.

We measured its coverage rate. The Google leak explains approximately 57% of the observed signal. A majority, therefore, but not all. Part of the ranking remains out of reach, in systems not exposed by the leak.

The cross-comparison with Anthropic confirms this caution. By comparing the submodules of claude-code to the dimensions of the leak, we measured a parity of 0.63 on module OP_11. A real convergence between two independent systems, but a partial one. The architectures are similar without fully overlapping.

What should management conclude?

That the grid is not a crystal ball. It doesn’t predict every move. It does not replace judgment.

This 57% coverage is not a weakness to hide. It is a governance strength. It imposes a healthy humility on decision-making.

A manager who knows that the grid covers 57% of the signal steers better than a manager who thinks everything is under control. That manager knows where to rely on measurement, and where to keep a margin of caution. There is no overreaction to the 43% that stays out of sight. Decisions focus where the data is solid.

This is the exact opposite of conjecture. Conjecture promises total and unverifiable certainty. The grid offers a partial and verifiable measurement. One part known, and one part acknowledged as unknown.

This known part is already enough to change decisions. When 57% of the signal becomes readable and reproducible, we stop arbitrating blindly on the whole. We act on what we measure. We monitor the rest with lucidity. It is a mature way of steering, where conjecture remained a permanent bet.

And a 57% measure, reproducible and tested, is infinitely better than a 100% story invented after the fact.

This is the real shift for a leader: we no longer try to explain everything. We seek to measure what is measurable, to decide on this basis, and to remain lucid about the rest.

Conjecture promises false certainty. The grid offers an honest measurement. Which one would you prefer to put before your management committee?
