Everyone across the marketing universe is talking about the Google leaks, whether in our LinkedIn feeds flooded with reposts from marketing influencers, our very own Rival Amp community, or the dozens of podcasts screaming about ‘gaming the system’. Some believe Google did this intentionally, to take the limelight away from Bard’s unsurprisingly inaccurate answers, while others feel that none of this is valid anymore and that’s why it was leaked. In this article, I’m going to share my POV on the recent #GoogleLeaks and what it might mean for you as a marketer.
TLDR of the Google Leak:
- A massive leak of over 2,500 internal Google documents revealed sensitive information about how Google's search engine ranks websites.
- The leaked documents contradict previous statements from Google that factors like click data, site authority, and online brand mentions are not used for ranking websites in search results.
- Google appears to have whitelists in place for certain authoritative sites and domains, like travel sites, local COVID-19 authorities, election sites, etc. This goes against Google's claims of ranking solely based on relevance.
- The leaked documents suggest Google uses various click-related metrics like click lengths, impressions, and "good vs bad" clicks to influence search rankings. This contradicts previous statements from Google that clicks are not used directly for ranking.
- The leak mentions factors like "smallPersonalSite" which implies Google may categorize and potentially disadvantage small personal websites in rankings compared to larger, more established publishers.
- While Google has confirmed the leak of internal documents, it remains unclear if the leaked data is current or specifically used for live search rankings.
Let’s Get Right Into It
Generally, the majority of these leaks are unsurprising.
It underlines a point that has always been true: Google’s ranking algorithm is not a monolithic entity, but a series of multiple functions with many inputs and signals. This makes sense. Google, way back when, became the dominant search engine in part because of its breakthrough programming model, MapReduce, which parallelizes the indexing and prioritization of data across multiple servers. That’s how Google got so good at serving up information: by solving a programming problem that balances multiple highly complex inputs at once.
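To make the MapReduce idea concrete, here is a toy, single-process sketch of the model. This is not Google's actual implementation (which distributes the map and reduce phases across thousands of machines); the documents and helper names are invented for illustration. It shows the core pattern: map each document to key/value pairs, then reduce the pairs into an inverted index, the structure a search engine uses to look up which documents contain a term.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    # Emit one (term, doc_id) pair per word. In a real cluster,
    # this step runs in parallel across many machines.
    return [(word.lower(), doc_id) for word in text.split()]

def reduce_phase(pairs):
    # Group values by key into an inverted index: term -> set of doc ids.
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return index

# Hypothetical mini-corpus for illustration.
docs = {1: "cheap flights to paris", 2: "paris travel guide"}
pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
index = reduce_phase(pairs)
# index["paris"] -> {1, 2}; index["cheap"] -> {1}
```

The value of the pattern is that the map and reduce steps are independent per key, so the same code scales from one laptop to a fleet of servers.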
This approach and competency have continued in their core product for a long time. It is not surprising, and it is one of the reasons we should never take SEO or other practitioners seriously when they refer to a monolithic 'algorithm' and simple hacks to game systems that exist within a highly complex ecosystem of functions.
Implications for SEO and Transparency
This is also a lesson we can apply to other platforms, even if we are not programmers ourselves: algorithms that recommend content in any context on the internet are not one-dimensional, and there is no single metric, tactic, or approach that will game “the algorithm” for long.
Secondly, it underscores the point that Google is not, and has never been, a transparent entity. We work in marketing, and we understand the power of a brand. While Google is a long way removed from the bubble-font logo and “Don’t Be Evil” vibe, a large part of its brand has long been that of a friendlier kind of company. This ranges from employee benefits to pioneering stable, generalized second-price (GSP) auctions that charge per click, seemingly aligning its incentives with advertiser outcomes. While this is true at least in part, that does not mean it is true in its entirety.
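For readers unfamiliar with why second-price auctions feel "friendlier", here is a simplified sketch of the GSP mechanism. This is a deliberately stripped-down model: real Google ad auctions also weight bids by quality score and ad rank thresholds, which are omitted here. The key property shown is that each winner pays the next bidder's price, not their own bid.

```python
def gsp_auction(bids, slots):
    """Simplified generalized second-price auction.

    bids:  dict mapping advertiser -> max bid per click
    slots: number of ad positions available
    Returns a list of (advertiser, price_paid_per_click), best slot first.
    """
    # Rank purely by bid; real auctions rank by bid * quality score.
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for i in range(min(slots, len(ranked))):
        advertiser, _bid = ranked[i]
        # Each winner pays the bid of the advertiser ranked just below them
        # (zero if nobody is below), not their own bid.
        price = ranked[i + 1][1] if i + 1 < len(ranked) else 0.0
        results.append((advertiser, price))
    return results

# Example: three bidders competing for two slots.
winners = gsp_auction({"A": 4.0, "B": 2.5, "C": 1.0}, slots=2)
# -> [("A", 2.5), ("B", 1.0)]: A bids 4.0 but only pays B's 2.5.
```

Because you pay the next price down rather than your own bid, advertisers have less incentive to shade bids or constantly game the price, which is the "aligned incentives" appeal mentioned above.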
The majority of the ranking factors and scoring characteristics disclosed in this leak seem to be aimed, at least in part, at surfacing genuinely high-quality content for a good user experience, but Google has been explicitly and intentionally misdirecting on these factors for a long time. Whether that is better for users or simply protects their trade secrets is debatable, and I won’t ascribe any malicious intent here, but it leads to the conclusion that platform recommendations cannot and should not ever form the entirety of our position on how to use that platform. A corollary to this, which some of you may have heard me say: best-practice recommendations will unlock 70% of the value of a platform, and our testing, strategy, and curiosity must do the rest. This is going to be particularly true for another Google product that relies heavily on a complex system of signals and machine learning: Performance Max ads.
Much like with the SEO leaks here, we need to focus on our own testing to develop our POV; many practitioners have been reproducibly testing their conjectures about the now-disclosed factors in this leak for a long time.
A Few Other Interesting Tidbits
Beyond the conclusions we can draw about the nature of the beast, there are a few specific things we can take from the technical details included in this leak.
- Small brands and sites are likely inherently disadvantaged: This is true in the abstract, unrelated to this leak (there is extensive research showing that when small brands with little mental availability advertise, people tend to misattribute that advertising to the dominant brands in their category), but specific factors in this leak, like the identified object ‘smallPersonalSite’ or the 'Helpful Content Update' backlash from small businesses, lend some evidentiary credence to it.
- Distinctiveness and high-quality content actually matter: Several attributes impacting rankings recognise user-experience factors, the authorship of documents, and whether content has been automatically annotated. There are already the beginnings of flags to identify human-authored content versus automatically annotated content, which will have big impacts on brands using AI to generate a high volume of content. It’s already known that ChatGPT and other LLMs, while indistinguishable from human content in many small-sample cases or to individual human judges, overuse certain words and sentence structures; for example, ChatGPT is 300x more likely to use the word “delve” than a human being is. While it is early days, the ability of content recommendation algorithms to recognise human-annotated content (referred to in this leak as the object GoldStandardDocument) will likely improve and impact rankings. I’m reaching a bit here, as some of the documentation for what this object constitutes is light and missing context, but I’m willing to make that guess.
- Recency matters: Fresh updates to highly trafficked pages yield better results, and Google does in fact prioritize more recent content. Great news: we all still have jobs and constantly need to feed the machine. But this makes sense; the closeness of content subject matter to the search query and how long people spend on the page matter as well, and as searcher behaviors change subtly over time, so will the relevance of content. Stay on top of what your customers and visitors are interested in and keep your content fresh.
In Summary
The Google leaks underscore what we already knew: Google's business is intricate and opaque, but it has some impressive capabilities in serving up content that users want (AI Search Overview results aside, which is a subject for a different column).
While some insights validate our existing knowledge, the most important takeaway is that we cannot simply follow Google's recommendations or rely on simplistic tactics. Anyone who sells their close relationship with Google or any other media owner, without a critical investigation into the nature of that platform, is doing you a massive disservice, either for profit or out of ignorance. Rigorous testing, strategic thinking, and curiosity have always been, and will always remain, imperative to unlocking any platform’s true value.