A slightly technical and complex question *
The Magic SEO Ball has recently become aware of a large ecommerce website with extensive faceted navigation (currently managed with robots.txt disallows, rel=canonical and Search Console’s query parameter management feature) that is considering replacing its query parameters with URL fragments. To be very specific, consider these pretend URLs:
http://domain.tld/category/subcategory/attribute/ is a category + attribute filter; typically, adding a single attribute results in a page that is crawlable and canonical.
http://domain.tld/category/subcategory/attribute/?second-attribute is the same category + attribute except with a second attribute added; all 2+ attribute combinations get a query parameter, and all the query parameters are currently disallowed, with rel=canonical pointing to
Replacing query parameters with URL fragments would look like this:
http://domain.tld/category/subcategory/attribute/ remains crawlable and canonical.
http://domain.tld/category/subcategory/attribute/#second-attribute gets generated with two or more attributes, and
http://domain.tld/category/subcategory/attribute/?second-attribute redirects to it.
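The redirect mapping described above can be sketched in a few lines. This is only an illustration under assumptions: the function name query_to_fragment is hypothetical, the URLs are the pretend ones from this post, and a real site would need its own rules for multi-parameter query strings and for parameters that should stay as parameters.

```python
from urllib.parse import urlsplit, urlunsplit


def query_to_fragment(url: str) -> str:
    """Return the fragment-based equivalent of a query-parameter URL.

    Hypothetical helper: it moves the whole query string into the
    fragment position and leaves URLs without a query untouched.
    """
    parts = urlsplit(url)
    if not parts.query:
        return url  # nothing to move, e.g. the single-attribute page
    # Same scheme, host and path; empty query; old query becomes the fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.query))


old_url = "http://domain.tld/category/subcategory/attribute/?second-attribute"
print(query_to_fragment(old_url))
# http://domain.tld/category/subcategory/attribute/#second-attribute
```

The value returned here is what the redirect from the old query-parameter URL would point to.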
Why consider this? Only because it addresses:
- Crawl budget: #second-attribute will not be requested, and therefore it will not be crawled.
- Duplication and canonicalization: there is no need to signal that #second-attribute is canonical to anything else, because #second-attribute will never be recognized as a duplicate in the first place.
- Internal link equity: all filters pointing at #second-attribute will just be perceived as internal links pointing to the parent crawlable page.
- External link equity: any backlink pointing to #second-attribute will just pass equity to the parent crawlable page.
- (Bonus) Thinness: filtered combinations that are currently crawlable, but use noindex due to very low inventory counts, can be replaced with fragments as above.
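The common thread in the list above is that the fragment is dropped before any request is made, so every fragment variant collapses into the parent URL. A small sketch with Python's standard library shows the effect. The helper name requested_url and the #third-attribute link are made up for illustration, and this models only the URL handling, not how any search engine actually counts links.

```python
from collections import Counter
from urllib.parse import urldefrag


def requested_url(link: str) -> str:
    """Return the part of a link that an HTTP client would request.

    The fragment is handled client-side, so it is removed here.
    """
    return urldefrag(link).url


# Hypothetical internal links found on the site.
links = [
    "http://domain.tld/category/subcategory/attribute/",
    "http://domain.tld/category/subcategory/attribute/#second-attribute",
    "http://domain.tld/category/subcategory/attribute/#third-attribute",
]

# Group links by the URL that would really be fetched.
link_targets = Counter(requested_url(link) for link in links)
print(link_targets)
```

All three links end up counted against the single parent page, which is the behavior the crawl budget and link equity points rely on.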
In the words of one developer, upon learning of this plan and its intricate elegance: “It seems like cheating!” Doesn’t it, though? Others on the website’s engineering team have resisted this change on the grounds that doing it could be considered cloaking by Google. Wait, did they say cloaking? Yes, cloaking (seriously).
Are they right?
Magic SEO Ball says: my reply is no.
First, some gratitude *
One thing the Magic SEO Ball would like to make very clear is that he or it appreciates software developers’ concern about cloaking and its SEO risks, which can be real and serious, because usually this sort of conversation goes precisely the opposite way: devs want to do something fancy-like, SEO says that this is concerning, and then gets steamrolled by opposition. So this kind of partnership is rare and welcome.
What is cloaking? *
What Google says about cloaking is, “Cloaking refers to the practice of presenting different content or URLs to human users and search engines,” but what they mean is something like this: “Cloaking refers to the practice of presenting different content or URLs to human users and search engines in order to trick search engines into making a page rank better than it otherwise would.”
(The last part was left out, no doubt, because they assumed it would be totally obvious.)
In its classic presentation, consider http://domain.tld/blue-widgets/, a product listing page that displays dozens of blue widgets but that fails to rank for the query [blue widgets]. An SEO strategy might be to display that product listing page to users on its URL, but instead to display to Googlebot, using user-agent detection or IP range detection, an 800-word essay on the subject of blue widgets on the same URL. This is cloaking because it is fundamentally an attempt to deceive Google into letting a URL rank in search results for a query when it otherwise would not.
The essence of cloaking is in the final five words of this sentence (emphasis added):
Cloaking is considered a violation of Google’s Webmaster Guidelines because it provides our users with different results than they expected.
Again, if Google lets http://domain.tld/blue-widgets/ rank for [blue widgets] because they have indexed it with an essay about blue widgets that only they can see, and searchers reach it but it only has products and no essay, then that is cloaking.
What is not cloaking? *
On the other hand, if the company were to redirect http://domain.tld/blue-widgets/ to http://domain.tld/#blue-widgets/, Google would never consider that cloaking, because the company would in essence be trying not to rank for [blue widgets].
In this case, with http://domain.tld/blue-widgets/ replaced by http://domain.tld/#blue-widgets/, the fragment URL would simply never be indexed and therefore never get search traffic. It’s a strategy to keep something out of Google’s index, rather than a strategy to change the appearance of something in Google’s index, so it is not and could never be considered cloaking.
Here’s some Matt Cutts doing some of his famous subtlety:
Now let’s have Googlebot come and ask for a page as well. And you give Googlebot a page…. Cloaking is when you show different content to users and to Googlebot….
Did you catch that? In order for it to be cloaking, Googlebot has to ask for (i.e., request) a page, but Googlebot cannot and will not request a URL fragment, and no fragment will be served to it: it will only ever request URLs such as http://domain.tld/blue-widgets/, but never http://domain.tld/#blue-widgets/.
Cutts continues with a superficially uninteresting, but ultimately revealing, example about pornography:
It’s a hugely bad experience [for searchers to land on pornography when that is not what they intended]; people complain about it. It’s an awful experience for users.
If http://domain.tld/blue-widgets/ changed to http://domain.tld/#blue-widgets/, then Google would never send search traffic to http://domain.tld/#blue-widgets/. Because no searchers would reach it, there could never be a negative experience for searchers like in the Cutts porn example.
Back to our site *
In context, the question about cloaking is even more bizarre because the site in question currently uses robots.txt disallows to prevent search engines from seeing URLs with query parameters. As a result of those URLs not being crawled, they are not indexed and do not get search traffic. They still exist, though; users can navigate to them and in the most literal possible way, they show different content to those users than they show to search engines: the users see filtered product listing pages and the search engines see only inaccessible URLs.
Using robots.txt disallows and/or rel=nofollows to prevent search engines from seeing pages is not any less cloaking than removing those pages and replacing them with fragments.
Common guidance on faceted navigation *
Faceted navigation is a very complex subject, but like most SEO guidance around the web, guidance on this subject is effective primarily for beginner to intermediate SEOs, working on small to medium sized sites. For a site with a dozen categories and maybe a half dozen facets across each, using Search Console’s parameter management tool and rel=canonical or meta noindex is likely to be satisfactory.
It’s only when sites get rather more complex – thousands of categories, hundreds of thousands of desired category + attribute combinations, millions of product SKUs and backlinks – that the basic advice starts to break down and strategic technical SEO leadership must be consulted to devise a plan that will work for search engines. Otherwise, expect bots to get lost in a spider trap of trillions of URLs, or to throw away a substantial amount of link equity pointing to disallowed URLs.
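To get a feel for how quickly a spider trap reaches that scale, here is a rough back-of-the-envelope calculation. The figures are invented for illustration (1,000 categories, 30 filter values per category), and the sketch assumes that any set of filter values can be combined freely and that order does not matter. Real sites will differ, usually in both directions.

```python
from math import comb


def combination_count(filter_values: int, min_selected: int = 2) -> int:
    """Count the ways to select at least `min_selected` of the filter values."""
    return sum(comb(filter_values, k) for k in range(min_selected, filter_values + 1))


# Hypothetical numbers, not taken from the site discussed in this post.
categories = 1_000
filter_values_per_category = 30

per_category = combination_count(filter_values_per_category)
total = categories * per_category

print(f"{per_category:,} multi-attribute URLs per category")
print(f"{total:,} multi-attribute URLs across the site")
```

With these made-up inputs, each category alone yields over a billion 2+ attribute combinations, and the site as a whole lands above a trillion.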
Why aren’t other large, complex sites using URL fragments for faceted navigation? *
Another objection *
This isn’t how fragment identifiers were meant to be used. They were supposed to be URL anchors only.
Should technologies be used in the ways that they were originally intended? Maybe they should – it depends. Should they also be used for new and different things? Maybe they should – it depends. But objecting to an idea because it uses a technology in a way that’s different from its intended use is pretty uninspired.