Natan Gesher

Are fragment identifiers that change content cloaking?

March 1, 2019 By Natan Gesher

A slightly technical and complex question

The Magic SEO Ball has recently become aware of a large ecommerce website with extensive faceted navigation (currently managed with robots.txt disallows, rel=canonical and Search Console’s query parameter management feature) that is considering replacing its query parameters with URL fragments. To be more specific, consider these pretend URLs:

http://domain.tld/category/subcategory/attribute/ is a category + attribute filter; typically, adding a single attribute results in a page that is crawlable and canonical.
http://domain.tld/category/subcategory/attribute/?second-attribute is the same category + attribute except with a second attribute added; all 2+ attribute combinations get a query parameter, and all the query parameters are currently disallowed, with rel=canonical pointing to http://domain.tld/category/subcategory/attribute/.

Replacing query parameters with URL fragments might look like this:

http://domain.tld/category/subcategory/attribute/ remains crawlable and canonical.
http://domain.tld/category/subcategory/attribute/#second-attribute gets generated with two or more attributes, and http://domain.tld/category/subcategory/attribute/?second-attribute redirects to it.

Why consider this? Only because it addresses:

Crawl budget: #second-attribute will not be requested, and therefore it will not be crawled.
Duplication and canonicalization: there is no need to signal that #second-attribute is canonical to anything else, because #second-attribute will never be recognized as a duplicate in the first place.
Internal link equity: all filters pointing at #second-attribute will just be perceived as internal links pointing to the parent crawlable page.
External link equity: any backlink pointing to #second-attribute will just pass equity to the parent crawlable page.
(Bonus) Thinness: filtered combinations that are currently crawlable, but use noindex due to very low inventory counts, can be replaced with fragments as above.

In the words of one developer, upon learning of this plan and its intricate elegance: “It seems like cheating!” Doesn’t it, though? Others on the website’s engineering team have resisted this change on the grounds that doing it could be considered cloaking by Google. Wait, did they say cloaking? Yes, cloaking (seriously).

Are they right?

Magic SEO Ball says: my reply is no.

First, some gratitude

One thing the Magic SEO Ball would like to make very clear is that he or it appreciates software developers’ concern about cloaking and its SEO risks, which can be real and serious, because usually this sort of conversation goes precisely the opposite way: devs want to do something fancy-like, SEO says that this is concerning, and then gets steamrolled by opposition. So this kind of partnership is rare and welcome.

What is cloaking?

What Google says about cloaking is, “Cloaking refers to the practice of presenting different content or URLs to human users and search engines,” but what they mean is something like this: “Cloaking refers to the practice of presenting different content or URLs to human users and search engines in order to trick search engines into making a page rank better than it otherwise would.”

(The last part was left out, no doubt, because they assumed it would be totally obvious.)

In its classic presentation, consider http://domain.tld/blue-widgets/, a product listing page that displays dozens of blue widgets but that fails to rank for the query [blue widgets]. An SEO strategy might be to display that product listing page to users on its URL, but instead to display to Googlebot, using user-agent detection or IP range detection, an 800 word essay on the subject of blue widgets on the same URL. This is cloaking because it is fundamentally an attempt to deceive Google into letting a URL rank in search results for a query when it otherwise would not.

The essence of cloaking is in the final five words of this sentence (emphasis added):

Cloaking is considered a violation of Google’s Webmaster Guidelines because it provides our users with different results than they expected.

Again, if Google lets http://domain.tld/blue-widgets/ rank for [blue widgets] because they have indexed it with an essay about blue widgets that only they can see, and searchers reach it but it only has products and no essay, then that is cloaking.

What is not cloaking?

On the other hand, if the company were to redirect http://domain.tld/blue-widgets/ to http://domain.tld/#blue-widgets/, Google would never consider that cloaking, because the company would in essence be trying not to rank for [blue widgets].

In this case, with http://domain.tld/blue-widgets/ replaced by http://domain.tld/#blue-widgets/, then http://domain.tld/#blue-widgets/ would simply never be indexed and therefore never get search traffic – it’s a strategy to keep something out of Google’s index, rather than a strategy to change the appearance of something in Google’s index – so is not and could never be considered cloaking.

Here’s some Matt Cutts doing some of his famous subtlety:

Now let’s have Googlebot come and ask for a page as well. And you give Googlebot a page…. Cloaking is when you show different content to users and to Googlebot….

Did you catch that? In order for it to be cloaking, Googlebot has to ask for (ie, request) a page – but Googlebot can not and will not request a URL fragment, and no fragment will be served to them: they will only ever request either http://domain.tld/ or http://domain.tld/blue-widgets/, but never http://domain.tld/#blue-widgets/.

Cutts continues with a superficially uninteresting, but ultimately revealing, example about pornography:

It’s a hugely bad experience [for searchers to land on pornography when that is not what they intended]; people complain about it. It’s an awful experience for users.

If http://domain.tld/blue-widgets/ changed to http://domain.tld/#blue-widgets/, then Google would never send search traffic to http://domain.tld/#blue-widgets/. Because no searchers would reach it, there could never be a negative experience for searchers like in the Cutts porn example.

Back to our example site

In context, the question about cloaking is even more bizarre because the site in question currently uses robots.txt disallows to prevent search engines from seeing URLs with query parameters. As a result of those URLs not being crawled, they are not indexed and do not get search traffic. They still exist, though; users can navigate to them and in the most literal possible way, they show different content to those users than they show to search engines: the users see filtered product listing pages and the search engines see only inaccessible URLs.

Using robots.txt disallows and/or rel=nofollows to prevent search engines from seeing pages is not any less cloaking than removing those pages and replacing them with fragments.

Common guidance on faceted navigation

Faceted navigation is a very complex subject, but like most SEO guidance around the web, guidance on this subject is effective primarily for beginner to intermediate SEOs, working on small to medium sized sites. For a site with a dozen categories and maybe a half dozen facets across each, using Search Console’s parameter management tool and rel=canonical or meta noindex is likely to be satisfactory.

It’s only when sites get rather more complex – thousands of categories, hundreds of thousands of desired category + attribute combinations, millions of product SKUs and backlinks – that the basic advice starts to break down and strategic technical SEO leadership must be consulted to devise a plan that will work for search engines as well as for humans. Otherwise, expect bots to get lost in a spider trap of trillions of URLs, or to throw away a substantial amount of link equity pointing to disallowed URLs.

Why aren’t other large, complex sites using URL fragments for faceted navigation?

Because they don’t care about SEO, of course, or because they don’t care enough about SEO to invest in it.

Or they don’t have a substantial amount of backlinks pointing to disallowed URLs, and are able to address the internal link equity issue satisfactorily by using javascript to hide links (good for them!).

Or because they don’t have the engineering talent to pull it off.

Or because they thought seriously about it and looked around the web, but couldn’t find anyone else doing it, and their normal practice is to bury any unusual proposal with requests for “comps,” so it didn’t get done.

Or because they don’t care about users sharing URLs, so they went with the idea of not updating URLs when filters are applied (I see you, TripAdvisor!).

Or some combination of these reasons.

Or they will, when they finally are able to hire a new SEO lead who tells them to do it.

Another objection

This isn’t how fragment identifiers were meant to be used. They were supposed to be URL anchors only.

Let the Magic SEO Ball’s operator tell you something about how things were meant to be. He remembers what javascript was like when it was first invented, because he used it then on his personal website in high school to annoy visitors with interminable popups (for real – the only option was to click “ok,” but clicking “ok” just meant more and more popups until the browser had to be force-quit). Javascript, in short, was meant to add interactivity to World Wide Web pages, which were static HTML documents.

A decade later, he was pretty shocked to see entire user interfaces – “web applications” – built out of javascript with URLs like this popular one: https://mail.google.com/mail/u/0/#inbox/.

A decade after that, javascript had actually managed to migrate from browsers to web servers, where it was generating entire sites, backend and frontend. And in that form, it’s used now for a variety of applications that aren’t even directly related to the web at all (narrowly defining “the web”).

Should technologies be used in the ways that they were originally intended? Maybe they should – it depends.

Should technologies also be used for new and different things? Maybe they should – it depends.

But objecting to an idea on the conservative grounds that it uses a technology in a way that’s different from its intended use is not likely to be a recipe for long term success in a competitive industry like ecommerce.

Does UTM tracking on inbound links affect SEO?

August 27, 2017 By Natan Gesher

The Magic SEO Ball has been asked to comment on the advice of a soi-disant SEO professional, to wit, that UTM parameters on incoming links do not affect SEO:

Not if you use the rel=”canonical” tag to specify the URL that you’d like Google to rank.

To support his case, he quotes Gary “Real Time Panda” Illyes:

Yeah, although if they are not canonical, they’ll funnel the PageRank as well as other signals to the canonical URL

Is he right?

Magic SEO Ball says: concentrate and ask again.

As the Magic SEO Ball often says, SEO is easy to do, but difficult to do well.

This question is a perfect example of one that separates the people who learned a little bit about SEO by reading some blogs from the people who understand SEO very well and generate immense amounts of money for their clients and employers.

Let’s consider this a few different ways:

Ranking

Suppose you have a page on your site like https://domain.tld/page/`, with a self-referencing rel=canonical. Suppose you receive an incoming link that points tohttps://domain.tld/page/?utm_source=google`, which correctly also has a rel=canonical pointing to `https://domain.tld/page/`.

What will the effect of that link be? Will it cause the URL with the tracking parameter to outrank the clean URL? Almost certainly not. 99 times out of 100, search engines will understand and respect webmasters’ URL canonicalization.

On the rare occasions that they don’t, it’s usually because something else is wrong: perhaps rel=canonical has been implemented incorrectly sitewide, and therefore is being ignored. Or perhaps a non-canonical URL is getting the overwhelming majority of links, both internal and external.

So in this narrow case, the advice above is right, and UTM parameters will not affect your SEO.

Crawl resources

Suppose your site domain.tld has on the order of 1,000 pages, of which 900 have been crawled and 800 are currently in Google’s index.

Now suppose you’ve built out a product catalogue with 1,000,000 SKUs, each of which is to get its own page. You need to get all of those URLs crawled, right? But you haven’t even gotten the full 1,000 existing URLs crawled.

So you submit XML sitemaps in Search Console and, in order to improve your site’s crawl budget, suppose you are doing the obvious: investing in improvements to page load time and acquiring backlinks. Now suppose that all of those backlinks point to non-canonical UTM-parameter versions of your product URLs.

Do the links help you? Of course they do, but because these links don’t point to canonical URLs, they do not help you as much as they could. At a time when you need to focus on getting the bulk of your product catalogue crawled so it can be indexed and start ranking for things, your site’s crawl budget is being spent on Googlebot crawling URLs that are not meant to rank for anything, rather than URLs that are meant to rank for things.

Making the best use of a site’s crawl budget, and improving its crawl budget, are not issues that matter for tiny sites in uncompetitive niches. For big sites, though, it matters a lot.

Diffusion of link equity

Building links to your site will help your pages rank for things, and building links to the specific pages that you want to rank (with keyword-rich anchor text) will help more.

But what happens if your page https://domain.tld/blue-widgets/` doesn't get a link, buthttps://domain.tld/blue-widgets/?utm_source=google` (with correct rel=canonical) does? Search engines will pass the link equity from the non-canonical URL to the canonical URL, just as their representatives say they will. But there is no commitment, and no reason to believe, that all of the link equity will be passed. Estimates vary, but I have a difficult time believing that more than 90% of the PageRank from https://domain.tld/blue-widgets/?utm_source=google` would be passed tohttps://domain.tld/blue-widgets/`.

As with crawl budget management, diffusion of link equity is highly unlikely to be a problem for a small site, or a site in an uncompetitive niche. But when you graduate to making million-dollar SEO decisions on a daily basis, on sites with millions of pages and millions of visits, this sort of thing can be ruinous.

The advice therefore wasn’t quite entirely wrong: in many cases, it’s correct to say that links with UTM parameters will not affect your SEO. But as with a lot of SEO advice that you can read from people’s blogs who learned SEO entirely from reading other people’s blogs, it’s not quite entirely right, either: in some cases, including the cases where the most is at stake, links with UTM parameters absolutely can affect your SEO.

Listening to Google

Let’s return to what was actually said.

Gary Illyes was asked:

do utms neutralize backlink value?

(I don’t know what “neutralize” is supposed to mean in this case, but it’s hardly germane.)

He answered:

Yeah, although if they are not canonical, they’ll funnel the PageRank as well as other signals to the canonical URL

“They’ll funnel the PageRank.” Did he say that they’d funnel all the PageRank? No, he did not.

“as well as other signals.” Did he say that they’d funnel all the other signals? No, he did not.

Twitter is loads of fun, but it is not possible to answer complex SEO questions in 280 characters or fewer. My advice: don’t assume that a twitter answer is complete, especially when it appears to confirm what you think you know from very superficial experience.

Did Thumbtack really break Google’s rules?

June 9, 2015 By Natan Gesher

Recently a service called Thumbtack was penalized by Google for unnatural links, presumably as a result of this blog post.

In brief: Thumbtack does some fairly aggressive and overt link-building; Google is an investor in Thumbtack. This exposes Google to the criticism that they are allowing a business of which they are a partial owner to violate their rules, while punishing competing sites for the same types of practices.

Thumbtack’s president said: “To be clear, we do not now, nor have we ever, paid for links … We have always strived to work within Google’s guidelines.”

Was Thumbtack really breaking Google’s stated rules about building links?

Magic SEO Ball says: Yes, definitely.

It’s a common myth among inexperienced SEOs and people who don’t know very much about SEO that Google is primarily concerned with preventing people from buying and selling links. While it is true that this practice is extremely bothersome to Google, they will take action against any manipulation of the link graph that they find.

What is the link graph?

The “link graph” is a concept that search engines use to conceptualize and to visualize the connections of all hyperlinks around the web. It’s important for a lot of aspects of how search engines operate: discovery of new domains and pages, crawling them, indexing them and ranking them (as an authority metric and as a relevance metric).

PageRank, famously, was invented by Larry Page and Sergey Brin and was the backbone of what later became Google. It’s a numerical value that can be understood as representing how popular, important, valuable and trustworthy a certain web page is – based on how many other pages link to it, and which pages link to it.

What is manipulation of the link graph?

When search engines’ representatives use terms like “manipulation of the link graph,” they mean the practice of adding a lot of helpful links to a site in order to get it to rank higher, or the practice of adding a lot of harmful links to a site in order to get it to rank lower.

Because of the extreme importance to search engines of the link graph, and to Google in particular of PageRank, anyone who could successfully manipulate the link graph, causing what Google sees as the “wrong” ranking for a query, is pretty directly harming Google.

Hence, their rules:

Any links intended to manipulate PageRank or a site’s ranking in Google search results may be considered part of a link scheme and a violation of Google’s Webmaster Guidelines. This includes any behavior that manipulates links to your site or outgoing links from your site.

One example they cite is, “Buying or selling links that pass PageRank.” But several other examples include no mention of money whatsoever, such as, “Keyword-rich, hidden or low-quality links embedded in widgets that are distributed across various sites.”

While Thumbtack seems to have known better than to try the automated approach of using an embeddable – and, to Google, easily detectable – widget, their program of asking members for links from their own sites, using exact match commercial anchor text, was the same in effect.

For that reason, this half-assed defense on twitter is pretty weak:

@garthobrien not buying links, points have nothing to do w/ $$

— Sander Daniels (@sanderdaniels)

June 5, 2015

Is HTTPS a tie-breaker?

May 14, 2015 By Natan Gesher

So… last summer, Google announced that they would begin using HTTPS as a ranking signal:

…over the past few months we’ve been running tests taking into account whether sites use secure, encrypted connections as a signal in our search ranking algorithms. We’ve seen positive results, so we’re starting to use HTTPS as a ranking signal. For now it’s only a very lightweight signal — affecting fewer than 1% of global queries, and carrying less weight than other signals such as high-quality content — while we give webmasters time to switch to HTTPS. But over time, we may decide to strengthen it, because we’d like to encourage all website owners to switch from HTTP to HTTPS to keep everyone safe on the web.

Now Googler Gary Illyes has stated that HTTPS isn’t a ranking factor, but actually is just a tie-breaker:

HTTPS is a tie breaker (attribute, not a ranking factor) @methode #smxsydney #smx

— Woj Kwasi (@WojKwasi)

May 14, 2015

More analysis here:

Google uses HTTPS as a tie breaker when it comes to the search results. If two sites are virtually identical when it comes to where Google would rank them in the search results, HTTPS acts as a tie breaker. Google would use HTTPS to decide which site would appear first, if one site is using HTTPS while the other is not.

Does this make any sense?

Magic SEO Ball says: very doubtful.

There’s been some dispute among SEOs about just how important, if at all, HTTPS actually is as a ranking factor. Here’s an SEO arguing that there are seven specific things on which a site should focus before worrying about HTTPS. Her list:

Consistent URLs Everywhere
Short, Non-Parameter Heavy URLs
Real, Relevant Content, Not “SEO Copy”
Live Text High on the Page, Not Stuffed at the Bottom
Links
Strong CTA-Friendly Title Tags
Speed Up Your Load Time

For what it’s worth, I think she’s basically right, though this varies a lot based on the niche and the size of the site. For much bigger sites it may make sense to tackle other technical issues first; for certain types of sites it may make sense to improve the mobile experience first. And of course, for e-commerce or any site that takes people’s money, HTTPS does matter.

So what exactly is a tie-breaker?

Tie-breaking is a special system to determine the outcome of a contest when all the other inputs are exactly equal. A tie-breaker only gets used if it will be necessary to break a tie; if a certain input is calculated all the time, then it is not actually a tie-breaker.

One good example of a tie-breaker is the vote of the Vice President in the United States Senate. He doesn’t typically sit as a member of the Senate, but is officially its president, and he doesn’t vote at all unless his vote would change the outcome (ie, unless the Senate’s vote is tied).

Another example of a tie-breaker is how professional football teams are ranked at the end of a season. The NFL has a complex set of rules for determining which of two teams with identical win-loss records will advance to the playoffs and which will not (and it gets even more complex when there are three teams that are tied, and so on):

Head-to-head (best won-lost-tied percentage in games between the clubs).
Best won-lost-tied percentage in games played within the division.
Best won-lost-tied percentage in common games.
Best won-lost-tied percentage in games played within the conference.
Strength of victory.
Strength of schedule.
Best combined ranking among conference teams in points scored and points allowed.
Best combined ranking among all teams in points scored and points allowed.
Best net points in common games.
Best net points in all games.
Best net touchdowns in all games.
Coin toss

Critically, these rules only get invoked if two teams finish with the same record of wins and losses.

So, is HTTPS – which Google has previously used its official blog to describe as a “ranking signal” – not actually a ranking signal, but actually just a tie-breaker that gets used in the rare scenario that two pages rank exactly equally?

Almost certainly not. In order for HTTPS to be a tie-breaker, Google would have to compute all the ranking signals for a certain query, and then only in cases of a perfect 50-50 tie would they then tip the scale in favor of the site with HTTPS. But what if both sites use HTTPS, or what if neither site uses HTTPS? Then invoking the tie-breaker wouldn’t even break the tie.

HTTPS is, as John Mueller has said and as SEOs have confirmed, a relatively weak and – in most cases – somewhat insignificant ranking signal. For the sake of comparison, another relatively weak and somewhat insignificant ranking signal is keyword use in H2s.

Being a small ranking signal does not mean that it’s a tie-breaker.

Incidentally, this is not the first time that Mr. Illyes has made public statements about how Google works that have beggared belief among SEOs. Just a couple of months ago, he said that Panda is updated in real time, which made absolutely no sense at the time; we now know this to be false.

Matt Cutts often said things that might be true only in some extremely literal or extremely general sense, and which needed to be parsed carefully so their meanings could be teased out, and John Mueller often seems to be – I want to put this charitably – filling in the gaps in his knowledge with educated guesses. But Gary Illyes’s record of disputable and provably false statements suggest that he should be prevented from continuing to offer them.

Is there a duplicate content penalty?

April 10, 2015 By Natan Gesher

Some sites seem to do really well in search engines just by ripping off other sites’ content and republishing it, possibly with an additional image or a changed headline, either with visible attribution or without it, using a correct rel=canonical or not.

Other sites that do this even minimally seem to suffer for it, whether by an algorithmic factor like Panda or a manual action or just by not being able to rank well.

Does Google penalize sites that have a lot of duplicate content?

Magic SEO Ball says: Reply hazy, try again.

One of the differences between professional SEOs and Google search quality engineers is that SEOs tend to think specifically and speak broadly, while Googlers tend to think broadly and speak specifically.

When an SEO – especially one who isn’t very good and doesn’t know enough to use precise language – says that his site was “penalized” or “punished” by Google, there are a few things he might mean:

I did something to harm my site (eg, server errors, page speed, blocking crawlers, indexation issues, canonicalization issues), which caused me to lose rankings, which caused me to lose traffic.
My competitors improved their sites, which caused them to gain rankings, which caused my site to decline by the same amount, which caused me to lose traffic.
Google changed its organic ranking algorithm to favor something that I’m not doing, or not doing well, which caused me to lose rankings, which caused me to lose traffic.
Google did something else completely different with search results pages, like knowledge graph or answer cards or seven-result SERPs or rich snippets or … which didn’t cause me to lose rankings at all, but which did cause me to lose traffic.
Google released an algorithmic ranking factor (eg, Panda, Penguin) and this apparently suppressed my site in search results, even though nobody at Google will ever be able to confirm this for me, causing me to lose rankings, which caused me to lose traffic.
I got caught doing something that violated Google’s webmaster guidelines, or I got caught not preventing someone from using my site to do something that violated Google’s webmaster guidelines, and Google put a manual action on my site, which caused me to lose rankings, which caused me to lose traffic.

When Google says “penalty,” however, they are talking about only one possible thing: a manual action (#6).

For instance: There is No Duplicate Content Penalty in Google. Here we have a representative of Google answering a question in the narrowest possible way to be able to say that there is not a penalty for duplicate content, which is technically true as long as you define “penalty” very narrowly, but which doesn’t even come close to answering the question.

Calling duplicate content a “filter” instead of a penalty is helpful for the five percent of SEOs who understand the difference – and there is a big and meaningful difference – between the two, and likely to be received as completely obfuscatory nonsense from the ninety-five percent of SEOs who just want an answer to this question: Is duplicate content bad?

So we will answer their question. Duplicate content is bad, for several reasons.

There actually is a duplicate content penalty

A later article in the same site linked above states very clearly that, in some specific cases, there can be a domain-wide duplicate content penalty, in the event that a certain site overwhelmingly uses other sites’ content without offering much unique material of any value. This is an actual penalty – a manual action, in Google’s words – that requires first fixing the problem and then submitting a reconsideration request to resolve.

The aforementioned duplicate content filter

As we all have seen, sometimes duplicate content is relegated to some index-below-the-index that isn’t even visible to searchers unless they click a link to view all the results.

Duplicate content opens the door to unnecessary canonicalization issues

There are a lot of ways to handle duplicate content on a site or among sites. One popular and recommended way is using rel=canonical to send the signal that a certain version is the preferred one, and that it should get the link equity of the others. The canonical tag does basically work most of the time, but it is only a good solution to a problem that’s fundamentally avoidable.

There are also a great many cases where the canonical doesn’t work as intended. For example, if the second domain has a vastly higher domain authority than the first, or if Google crawled the second version earlier and saw it there before it saw the original, or if the second gets far more links and shares than the first, the rel=canonical pointing from second to first may be ignored.

rel=canonical also does not send as much link equity as a 301 redirect, which means losing pagerank whenever it is used. It also needs to be engineered and tested and maintained, which can be a challenge for huge sites because it isn’t visible to users. Faulty implementations of rel=canonical, while now rare, are scary enough (imagine being the webmaster of the site that lost 98% of its traffic because every page suddenly had a canonical pointing to the home page) that one needs to act with caution.

Duplicate content should be avoided

Don’t avoid it at all costs, because there are some scenarios where it’s perfectly useful. But try finding a way to engineer elegant solutions that allow your site not to use duplicate content, wherever possible.

Does Google use Gmail for URL discovery?

March 26, 2015 By Natan Gesher

I’ve heard an SEO person say that Google can scan my Gmail and use it to discover new URLs to crawl. Does this really happen?

Magic SEO Ball says: My sources say no.

This is actually something that we have believed for some time, and something that we have told to a bunch of people, including on interviews for serious SEO positions at really excellent companies (Oops!).

So it’s a bit surprising to learn that we were almost certainly incorrect.

The IMEC Lab group ran a pretty good test, Does Google Sniff Your Gmail to Discover URLs?:

We posted 4 total pages … and then asked different groups of users to email links to those pages… We asked 20 to 22 people to send gmails sharing the links for each article to the various pages. One group was asked to share article 1, a different group was asked to share article 2, and so forth. The goal was to see if Google would spot these links in the gmails, and then crawl and index those URLs… there was very little to see. The results were wholly unremarkable, and that’s the most remarkable thing about them!

The test only lasted for less than two weeks, so it’s possible that those URLs would eventually have gotten crawled.

It’s also possible that Google will use Gmail to discover new domains to crawl, but not specific individual URLs.

So it would definitely be worthwhile to try repeating the test using different parameters, but until we see evidence demonstrating otherwise, it seems fair to say that Google does not crawl the URLs they see in Gmail.

Does Google “cheat” by favoring its own products in search results?

March 19, 2015 By Natan Gesher

For many years, we have heard claims from some very smart people and from some less-than-smart people that Google manipulates its own search results, or perhaps its own organic search ranking algorithm, to favor its own sites.

Is this true?

Magic SEO Ball says: Signs point to yes.

We have defended Google for many years – and not just because we make our living as a professional SEO, in an entire ecosystem that exists because Google, and other search engines like it, are both good enough to attract billions of people to use it and because Google, and other search engines like it, are notoriously complex and difficult to game.

In particular, we have been known to cite articles like this one: 5 Times Google Penalized Itself For Breaking Its Own SEO Rules.

To be sure, we often see Google acting like a bully and we often take to Twitter to tell Google to honor its own motto: Don’t be evil…

@dannysullivan “Don’t be evil.”

— Natan Gesher (@gesher)

January 1, 2015

Don’t be evil: http://t.co/EAygTVZ0tO

Also, open always wins.

— Natan Gesher (@gesher)

November 8, 2014

Don’t be evil. http://t.co/Xfpf6OJDEd

— Natan Gesher (@gesher)

June 30, 2014

Google is quietly shutting down its Wallet API for digital goods (don’t be evil): http://t.co/Ir69PaOMwv

— Natan Gesher (@gesher)

November 15, 2014

Google is now letting people create Gmail accounts without being forced to use Google plus: http://t.co/9E0VaBsj0v

Don’t be evil.

— Natan Gesher (@gesher)

September 21, 2014

The Feature For Which Google Killed The + Command, Direct Connect, Is Now Dead: http://t.co/qfnD8diIBc

Don’t be evil.

— Natan Gesher (@gesher)

August 23, 2014

Still, this article in the Wall Street Journal is just too much to ignore: How Google Skewed Search Results ^[1].

The article describes an FTC report that nails Google for doing all sorts of unsavory and classless things, like intentionally promoting its own services in organic search results, and intentionally demoting its competitors’ services.

It’s unclear whether Google did this by manipulating the ranking algorithm (that is, by starting with what they wanted the results to be, and then working backwards until they had an algorithm that could reliably produce those results) or by letting the ranking algorithm operate independently and then just manipulating the search results pages when the results themselves weren’t satisfactory. Either way, Googlers like Matt Cutts and John Mueller should be ashamed of themselves.

Google is just a company. They act in the interests of their shareholders and their employees and their users, and when the interests of the third group are in conflict with the interests of the first group, it’s not a surprise that the second group is tasked with ensuring that the first group’s interests take precedence.

But Google’s mission is quite unlike other companies’ missions, at least insofar as Google is uniquely in a position to turn its dreams into reality. And we should all be scared and concerned at what Google is capable of doing.

References[+]

References
↑1	If you need help accessing the article, try doing a Google search for its headline, and then accessing it through Google News. That’s right: in some cases, Google strong-arms publishers into giving away their articles for free to readers coming from Google searches.

Will the 21st April 2015 mobile algorithm update really be bigger than Panda and Penguin?

March 17, 2015 By Natan Gesher

Last month, Google announced a change to the way they’ll rank mobile search results, to begin on 21 April.

Now Googler Zineb Ait Bahajji is stating that this change will be bigger than Panda and Penguin.

Zineb from Google at #smx Munich about the mobile ranking update: is going to have a bigger effect than penguin and panda!

— Aleyda Solis (@aleyda)

March 17, 2015

Is she right?

(Or is this like when Googler Gary Illyes claimed incorrectly that Panda was already being updated in real time?)

Magic SEO Ball says: Don’t count on it.

The strange thing to consider about the upcoming change to mobile search rankings is the way Google announced it: they may never have given as much information, as far in advance, about a genuinely meaningful update to their organic ranking algorithm. It was truly an unprecedented event.

Unless, that is, the change is actually going to be relatively minor, and the announcement and all the Twitter hype and hoopla are really just a way to get webmasters and SEOs to do what Google wants them to do, which is to make mobile-friendly websites, preferably of the responsive or adaptive varieties.

We try not to be too skeptical and we definitely don’t believe that Google is lying about the mobile rankings change, but we have to wonder whether Google’s search quality team is really going to shoot themselves in the foot by providing worse search results in some cases, just because the pages happen to be mobile optimized. Tl;dr: they aren’t.

Panda and Penguin have been, at best, mixed successes for Google. Completely aside from the pissed off webmaster and SEO communities, we are aware of many SERPs that are lower quality as a result of Google’s attempts to use machine learning to create new algorithmic ranking factors.

After 21 April, expect to see changes, but don’t expect the world to end for great sites whose pages aren’t mobile friendly, and don’t expect garbage sites with responsive pages to start crushing their authoritative, very relevant, high-quality competitors.

Is Panda updated in real time?

March 13, 2015 By Natan Gesher

At SMX West, a Googler named Gary Illyes claimed that the Panda algorithm is now running constantly, and that sites affected by it only need to fix their problems and be recrawled, after which they should regain their traffic.

.@methode says panda happens “pretty much instantly” cc @rustybrick

— Rae Hoffman (@sugarrae)

March 5, 2015

@portentint @sugarrae … @methode did qualify that once the “pages were re-processed” … However long that takes cc @rustybrick

— Eric Wu ( ･ㅂ･)و ̑̑ (@eywu)

March 6, 2015

Is he correct / telling the truth?

Magic SEO Ball says: very doubtful.

First, this claim is completely in conflict with the evidence at hand. We have firsthand knowledge of at least one large site that got hit by Panda around 24-25 October 2014, eliminated the problem almost immediately by noindexing a quarter of its URLs ^[1], watched those URLs drop from Google’s index, and still has not recovered from Panda.

Second… well, there isn’t much more to say about this. While the Panda algorithm itself might have been updated at some point since late October, there is zero reason to believe that its data has been refreshed. And there’s also no reason to think that Google would run an updated Panda algorithm with stale data. So, almost certainly there’s been neither an algorithm update nor a data refresh.

SEOs who do high quality work are generally in agreement about this.

So does this mean that Mr. Illyes was misleading us or lying to us, or does it mean that he was mistaken or confused?

We think the latter explanation is far more likely. His excuse that he “caused some confusion by saying too much and not saying enough the same time” sounds like a nice try to save face, which is understandable. It’s probably an internal goal at Google to get to a point where Panda can be run in real time, but this requires two things:

The quality of the algorithm has to be high enough. This means that false negatives need to be reduced, and false positives need to be eliminated.
The logistics of running the algorithm have to be workable. This means that the computing complexity has to be manageable enough that Google’s engineers and infrastructure can handle it on a constant basis, rather than just on a periodic basis.

While the second issue is the kind of problem that Google is pretty good about solving – more engineers, better engineers, more hardware, more powerful hardware, whatever – the first issue is something that may not be possible in the near future.

References[+]

References
↑1	Without going into details, we have 95+% certainty that those URLs caused the Panda problem in the first place.

Should I create articles on my website for the purpose of syndicating them?

December 22, 2014 By Natan Gesher

This question comes to us via Twitter:

@gesher What’s your take on creating articles on your site for the purposes of syndication?

— Jesse Semchuck (@jessesem)

December 20, 2014

Magic SEO Ball says: my reply is no.

Content syndication as an audience development strategy

Creating articles specifically with the intent of having them syndicated on other sites can be a fine way to expose those sites’ different audiences to your product, service, ideas or own website. When doing so, you should take care of the following concerns.

Audience

Every website has a different audience. Some are huge and some are tiny; some are general and some are specific; some are overlapping and some are distinct. Take care to ensure that your articles are appearing in the right places online by taking the time to understand the audience profiles of the sites where they will be syndicated. Failing to do so may cause your content to be ignored at best, or resented and marked as spam at worst.

Volume

How much is too much? If your syndicated content overwhelms the unique content on your own site, you are syndicating too much. If your syndicated content overwhelms the original content of the sites on which it appears, you are syndicating too much.

Repetition

Many people have a certain set of content websites that they visit on a regular basis, daily or weekly; or an RSS feed reader that they check on a regular schedule; or the expectation that they’ll be able to use Twitter and Facebook to find out what’s happening. Some people use all three methods. If they follow your site and another site that syndicates your site’s content, or multiple sites that syndicate your site’s content, they’re going to start seeing your articles repeatedly. While that may strike you as desirable, it may also backfire by bothering this extended audience, preventing people from ever becoming your customers or followers.

In summary, what many syndication issues – with audience, volume and repetition – have in common is that they are caused by a casual “If you build it, they will come” approach that discounts the users’ interests, wishes, and experience. This may result from a surfeit of technical ability to effect syndication (viz., by RSS) and a deficit of concern for other web denizens.

Consider, instead of a push method, a pull method by which you publish your own material on your own site, and allow it to be republished by request by other webmasters on an article by article basis.

Content syndication as an SEO strategy

In general, the main reason to be interested in content syndication as an SEO strategy is for link building: the idea being that you can create feeds with your articles, including followed links back to your own site, and allow other sites to use the articles with proper canonical tags.

While it would be a stretch to say that Google’s policies about link building have historically been clear, one trend that has emerged and that can be stated with clarity is that Google does not want to allow any link building strategy that can scale. In effect, this means that asking for links and giving them is fine, but that manipulation of the overall link graph is not fine.

Does content syndication for SEO purposes (i.e., for the purposes of increasing your articles’ link authority and your site’s domain authority) work? Yes, but you’d better assume that links added without any human effort by other sites’ webmasters can be devalued without any human effort by Google’s search quality engineers.

And that doesn’t even touch on the risks involved, which I outlined briefly in this Quora question: Can my search engine ranking be hurt if I post my blog articles on my own site and on several other blogging sites?

… if you publish the same article on your own site and on other sites, you’re running the risk that it will rank on the other sites but not on your own… employing this practice at scale may expose your site to Panda… Instead, consider creating original content for different sites that is useful to each site’s audience.

So if you’re thinking about audience development and want to do content syndication, I think it is ok but also that you should consult an SEO and seriously weigh the SEO concerns, along with the possibility that syndicating content in the wrong ways may do more harm than good. And if you’re thinking specifically about a content syndication strategy for SEO, there are much better ideas out there.