Are fragment identifiers that change content cloaking?

A slightly technical and complex question

The Magic SEO Ball has recently become aware of a large ecommerce website with extensive faceted navigation (currently managed with robots.txt disallows, rel=canonical and Search Console’s query parameter management feature) that is considering replacing its query parameters with URL fragments. To be more specific, consider these pretend URLs:

http://domain.tld/category/subcategory/attribute/ is a category + attribute filter; typically, adding a single attribute results in a page that is crawlable and canonical.
http://domain.tld/category/subcategory/attribute/?second-attribute is the same category + attribute except with a second attribute added; all 2+ attribute combinations get a query parameter, and all the query parameters are currently disallowed, with rel=canonical pointing to http://domain.tld/category/subcategory/attribute/.

Replacing query parameters with URL fragments might look like this:

http://domain.tld/category/subcategory/attribute/ remains crawlable and canonical.
http://domain.tld/category/subcategory/attribute/#second-attribute gets generated with two or more attributes, and http://domain.tld/category/subcategory/attribute/?second-attribute redirects to it.
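
As a rough illustration of the redirect described above, here is a minimal Python sketch that maps a query-parameter URL to its fragment equivalent. The helper name to_fragment_url is made up for this example, and it assumes the entire query string simply becomes the fragment; a real site would likely need more careful rules.

```python
from urllib.parse import urlsplit, urlunsplit


def to_fragment_url(url: str) -> str:
    """Move the query string of a URL into its fragment (hypothetical helper)."""
    parts = urlsplit(url)
    if not parts.query:
        # Nothing to move: single-attribute pages stay exactly as they are.
        return url
    # Assumption: the whole query string becomes the fragment,
    # replacing any fragment the URL already had.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.query))


print(to_fragment_url("http://domain.tld/category/subcategory/attribute/?second-attribute"))
```

The server would send the returned URL as the redirect target for the old query-parameter URL.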

Why consider this? Only because it addresses:

Crawl budget: #second-attribute will not be requested, and therefore it will not be crawled.
Duplication and canonicalization: there is no need to signal that #second-attribute is canonical to anything else, because #second-attribute will never be recognized as a duplicate in the first place.
Internal link equity: all filters pointing at #second-attribute will just be perceived as internal links pointing to the parent crawlable page.
External link equity: any backlink pointing to #second-attribute will just pass equity to the parent crawlable page.
(Bonus) Thinness: filtered combinations that are currently crawlable, but use noindex due to very low inventory counts, can be replaced with fragments as above.
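
The mechanics behind the points above can be sketched in a few lines of Python. The link list is invented for illustration, and the sketch only shows how URLs collapse once fragments are dropped; how a search engine actually weighs those links is not something this code can demonstrate.

```python
from collections import Counter
from urllib.parse import urldefrag

# Hypothetical filter links found on a category page.
links = [
    "http://domain.tld/category/subcategory/attribute/",
    "http://domain.tld/category/subcategory/attribute/#second-attribute",
    "http://domain.tld/category/subcategory/attribute/#third-attribute",
]

# urldefrag() splits off the fragment; .url is the part that can be requested.
requestable = Counter(urldefrag(link).url for link in links)

for url, count in requestable.items():
    print(url, count)
```

All three links resolve to the same requestable URL, which is the sense in which the fragment versions never exist as separate pages to crawl or canonicalize.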

In the words of one developer, upon learning of this plan and its intricate elegance: “It seems like cheating!” Doesn’t it, though? Others on the website’s engineering team have resisted this change on the grounds that doing it could be considered cloaking by Google. Wait, did they say cloaking? Yes, cloaking (seriously).

Are they right?

Magic SEO Ball says: my reply is no.

First, some gratitude

One thing the Magic SEO Ball would like to make very clear is that he or it appreciates software developers’ concern about cloaking and its SEO risks, which can be real and serious. Usually this sort of conversation goes precisely the opposite way: devs want to do something fancy-like, SEO says that it is concerning, and then SEO gets steamrolled by the opposition. So this kind of partnership is rare and welcome.

What is cloaking?

What Google says about cloaking is, “Cloaking refers to the practice of presenting different content or URLs to human users and search engines,” but what they mean is something like this: “Cloaking refers to the practice of presenting different content or URLs to human users and search engines in order to trick search engines into making a page rank better than it otherwise would.”

(The last part was left out, no doubt, because they assumed it would be totally obvious.)

For the classic case, consider http://domain.tld/blue-widgets/, a product listing page that displays dozens of blue widgets but that fails to rank for the query [blue widgets]. An SEO strategy might be to display that product listing page to users on its URL, but to display to Googlebot, using user-agent detection or IP range detection, an 800-word essay on the subject of blue widgets on the same URL. This is cloaking because it is fundamentally an attempt to deceive Google into letting a URL rank in search results for a query when it otherwise would not.

The essence of cloaking is in the final five words of this sentence, “different results than they expected”:

Cloaking is considered a violation of Google’s Webmaster Guidelines because it provides our users with different results than they expected.

Again, if Google lets http://domain.tld/blue-widgets/ rank for [blue widgets] because they have indexed it with an essay about blue widgets that only they can see, and searchers reach it but it only has products and no essay, then that is cloaking.

What is not cloaking?

On the other hand, if the company were to redirect http://domain.tld/blue-widgets/ to http://domain.tld/#blue-widgets/, Google would never consider that cloaking, because the company would in essence be trying not to rank for [blue widgets].

In this case, with http://domain.tld/blue-widgets/ replaced by http://domain.tld/#blue-widgets/, the fragment URL would simply never be indexed and therefore would never get search traffic. It is a strategy to keep something out of Google’s index, rather than a strategy to change the appearance of something in Google’s index, so it is not and could never be considered cloaking.

Here’s some Matt Cutts doing some of his famous subtlety:

Now let’s have Googlebot come and ask for a page as well. And you give Googlebot a page…. Cloaking is when you show different content to users and to Googlebot….

Did you catch that? In order for it to be cloaking, Googlebot has to ask for (i.e., request) a page – but Googlebot cannot and will not request a URL fragment, and no fragment will be served to it: it will only ever request either http://domain.tld/ or http://domain.tld/blue-widgets/, but never http://domain.tld/#blue-widgets/.
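
To make that concrete, here is a small Python sketch of what an HTTP client puts on the request line for each of those URLs. The helper request_target is hypothetical and deliberately simplified (it ignores headers, proxies and so on); the point is only that the fragment is never part of what gets sent to the server.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path-and-query part a client sends for a URL (hypothetical helper)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is deliberately ignored: it stays on the client side.
    return target


for url in (
    "http://domain.tld/",
    "http://domain.tld/blue-widgets/",
    "http://domain.tld/#blue-widgets/",
):
    print(url, "->", "GET " + request_target(url))
```

The first and third URLs produce the same request, so from the server's side there is no separate fragment page to serve differently to anyone.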

Cutts continues with a superficially uninteresting, but ultimately revealing, example about pornography:

It’s a hugely bad experience [for searchers to land on pornography when that is not what they intended]; people complain about it. It’s an awful experience for users.

If http://domain.tld/blue-widgets/ changed to http://domain.tld/#blue-widgets/, then Google would never send search traffic to http://domain.tld/#blue-widgets/. Because no searchers would reach it, there could never be a negative experience for searchers like in the Cutts porn example.

Back to our example site

In context, the question about cloaking is even more bizarre because the site in question currently uses robots.txt disallows to prevent search engines from seeing URLs with query parameters. As a result of those URLs not being crawled, they are not indexed and do not get search traffic. They still exist, though: users can navigate to them and, in the most literal possible way, they show different content to those users than they show to search engines. The users see filtered product listing pages and the search engines see only inaccessible URLs.

Using robots.txt disallows and/or rel=nofollows to prevent search engines from seeing pages is not any less cloaking than removing those pages and replacing them with fragments.

Common guidance on faceted navigation

Faceted navigation is a very complex subject, but like most SEO guidance around the web, guidance on this subject is effective primarily for beginner to intermediate SEOs, working on small to medium sized sites. For a site with a dozen categories and maybe a half dozen facets across each, using Search Console’s parameter management tool and rel=canonical or meta noindex is likely to be satisfactory.

It’s only when sites get rather more complex – thousands of categories, hundreds of thousands of desired category + attribute combinations, millions of product SKUs and backlinks – that the basic advice starts to break down and strategic technical SEO leadership must be consulted to devise a plan that will work for search engines as well as for humans. Otherwise, expect bots to get lost in a spider trap of trillions of URLs, or to throw away a substantial amount of link equity pointing to disallowed URLs.
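
For a sense of how quickly the URL count explodes, here is a back-of-the-envelope Python sketch. The numbers (1,000 categories, 30 attributes each) are assumptions chosen for illustration, not figures from the site in question, and the count ignores attribute ordering, which would inflate it further.

```python
from math import comb


def multi_attribute_combinations(n_attributes: int) -> int:
    """Count unordered combinations of 2 or more attributes out of n_attributes."""
    return sum(comb(n_attributes, k) for k in range(2, n_attributes + 1))


# Assumed, illustrative numbers only.
categories = 1_000
attributes_per_category = 30

per_category = multi_attribute_combinations(attributes_per_category)
print(f"{per_category:,} combinations per category")
print(f"{categories * per_category:,} filtered URLs in total")
```

Under these assumptions the total comes to just over a trillion filtered URLs, none of which would need to be crawled if the 2+ attribute combinations lived in fragments.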

Why aren’t other large, complex sites using URL fragments for faceted navigation?

Because they don’t care about SEO, of course, or because they don’t care enough about SEO to invest in it.

Or they don’t have a substantial amount of backlinks pointing to disallowed URLs, and are able to address the internal link equity issue satisfactorily by using JavaScript to hide links (good for them!).

Or because they don’t have the engineering talent to pull it off.

Or because they thought seriously about it and looked around the web, but couldn’t find anyone else doing it, and their normal practice is to bury any unusual proposal with requests for “comps,” so it didn’t get done.

Or because they don’t care about users sharing URLs, so they went with the idea of not updating URLs when filters are applied (I see you, TripAdvisor!).

Or some combination of these reasons.

Or they will, when they finally are able to hire a new SEO lead who tells them to do it.

Another objection

This isn’t how fragment identifiers were meant to be used. They were supposed to be URL anchors only.

Let the Magic SEO Ball’s operator tell you something about how things were meant to be. He remembers what JavaScript was like when it was first invented, because he used it then on his personal website in high school to annoy visitors with interminable popups (for real – the only option was to click “ok,” but clicking “ok” just meant more and more popups until the browser had to be force-quit). JavaScript, in short, was meant to add interactivity to World Wide Web pages, which were static HTML documents.

A decade later, he was pretty shocked to see entire user interfaces – “web applications” – built out of JavaScript with URLs like this popular one: https://mail.google.com/mail/u/0/#inbox/.

A decade after that, JavaScript had actually managed to migrate from browsers to web servers, where it was generating entire sites, backend and frontend. And in that form, it’s used now for a variety of applications that aren’t even directly related to the web at all (narrowly defining “the web”).

Should technologies be used in the ways that they were originally intended? Maybe they should – it depends.

Should technologies also be used for new and different things? Maybe they should – it depends.

But objecting to an idea on the conservative grounds that it uses a technology in a way that’s different from its intended use is not likely to be a recipe for long term success in a competitive industry like ecommerce.