What are canonical and hreflang tags and how to use them?

If you have ever visited a website from the Benelux, Switzerland, or anywhere in Europe for that matter, you have probably been exposed to the use (or lack of use) of hreflang or canonical tags. Both send signals to Google crawlers, but they do not have the same effect. Where one indicates which page is more relevant for your users depending on language and/or location, the other simplifies Google’s job by telling it which page to actually look at.

Hreflang tags

What are they, why do we use them, and what do they look like?

They’re small strings of code found in the <head> section of a page.

Google crawlers use them as a signal for two things: language and location.

Let’s imagine the website of an umbrella seller that is only present in Belgium, France, and the Netherlands. Its homepage exists in 4 versions:

<link rel="alternate" href="https://www.belgianumbrellas.com/be/fr" hreflang="fr-be" />
<link rel="alternate" href="https://www.belgianumbrellas.com/be/nl" hreflang="nl-be" />
<link rel="alternate" href="https://www.belgianumbrellas.com/fr" hreflang="fr-fr" />
<link rel="alternate" href="https://www.belgianumbrellas.com/nl" hreflang="nl-nl" />

This basically tells Google: this page exists in 4 versions to be served to different users, so don’t panic, this is not duplicate content. This example only covers the homepage, but the same logic applies to the rest of the pages. The fr-be version of the product page should have hreflang tags pointing to itself and to the nl-be, fr-fr, and nl-nl product pages, and so on. If a page in French doesn’t have an exact equivalent in Dutch, then point to the most relevant Dutch page.
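To make that concrete, here is a sketch of what the <head> of the fr-be product page could contain. The /pocket-models paths are assumptions made for illustration, so swap in your own URLs. Each of the four versions of the page would carry this same set of tags, including the one pointing to itself.

```html
<!-- <head> of https://www.belgianumbrellas.com/be/fr/pocket-models (illustrative URLs) -->
<!-- The first tag is the self-reference; the other three point to the alternates -->
<link rel="alternate" href="https://www.belgianumbrellas.com/be/fr/pocket-models" hreflang="fr-be" />
<link rel="alternate" href="https://www.belgianumbrellas.com/be/nl/pocket-models" hreflang="nl-be" />
<link rel="alternate" href="https://www.belgianumbrellas.com/fr/pocket-models" hreflang="fr-fr" />
<link rel="alternate" href="https://www.belgianumbrellas.com/nl/pocket-models" hreflang="nl-nl" />
```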

Our example has 2 dimensions.

Language: the first part (a 2-letter code from ISO 639-1) tells Google crawlers the language of the page. That way, Google knows the page should be served for searches made from browsers set to French, or made from IP addresses in French-speaking countries. A language code alone works if you are only offering content to be read. In our case, the delivery costs, contact phone numbers, and legal addresses differ across the 3 markets. That’s why there’s a second part to the hreflang, the one after the hyphen. Yes, a hyphen, not an underscore. Never an underscore.

Location: the second part (a 2-letter code from ISO 3166-1 Alpha 2) tells Google which page to serve to someone in a specific country. In our example, if the hreflang tags only contained languages, which page would be shown to a user based in Belgium whose browser is set to French? Google would have to choose by itself between the Belgian French and French French versions of the page, based on other signals. But as the hreflang contains the location as well, Google knows to serve the Belgian French version to that user.

Hreflang tags can also work cross-domain. So if you use several sites for your language/location versions, you can still implement hreflang tags:

<link rel="alternate" href="https://www.belgianumbrellas.fr" hreflang="fr-fr" />
<link rel="alternate" href="https://www.belgianumbrellas.be/fr" hreflang="fr-be" />

So, if you’re only displaying a few short stories you wrote on your website, you don’t need hreflang tags. If you’ve translated your stories, and/or offer to print and deliver them worldwide, you’ll want to use hreflang tags containing language and/or location.
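For the translated-stories case, a language-only set might look like the sketch below. The example.com domain and the paths are placeholders, not a real site. Because nothing on these pages depends on the country, the hreflang value stops at the 2-letter language code.

```html
<!-- <head> of every language version of the story (placeholder URLs) -->
<!-- Language only: no hyphen and no country code -->
<link rel="alternate" href="https://www.example.com/en/my-story" hreflang="en" />
<link rel="alternate" href="https://www.example.com/fr/my-story" hreflang="fr" />
<link rel="alternate" href="https://www.example.com/nl/my-story" hreflang="nl" />
```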

Canonical tags

Like the hreflang tag, the canonical is a small piece of code talking to Google crawlers.

The difference is the message. Hreflang tags are used to inform Google that a page is intended for users of a certain language and location. Canonical tags, on the other hand, are used only to tell Google: this page is almost identical to another one, but please, I’d like the other one to rank higher. The canonicalized page may still show in the search results in some cases.

This is what a canonical tag looks like:

<link rel="canonical" href="https://www.thepageyouwanttorank.com" />

When talking about ‘almost identical’ pages, we mean duplicate content, which comes in many forms:

1) URL variations of the exact same page. The simpler way is to redirect all of these to the one of your choosing, but if for some obscure reason you can’t do that, you can canonicalize them:

https://www.belgianumbrellas.com/
https://www.belgianumbrellas.com
http://www.belgianumbrellas.com/
https://belgianumbrellas.com/

…you get the point

2) The mobile or desktop versions of the same page

3) Parameters in your URL that indicate a small change in content. For instance, you are viewing a page with umbrellas, but filtered on the red ones only:

https://www.belgianumbrellas.com/products
https://www.belgianumbrellas.com/products?color=red

4) You’ve got B2B and B2C sections on your website, but some pages are identical

5) Etc etc
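As a sketch of case 1, every URL variation would carry the same canonical tag. Choosing the https, www, trailing-slash version as the one to keep is an assumption made for the example; any of the four works as long as you pick one and stick to it.

```html
<!-- Same tag in the <head> of all four URL variations listed under 1) -->
<!-- The target below is an arbitrary pick for illustration -->
<link rel="canonical" href="https://www.belgianumbrellas.com/" />
```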

So the question now is: OK, I’ve got “duplicate” content, what’s wrong with that exactly? Why would I want to point Google crawlers to only one version? The answer is twofold.

Taking the umbrella example again, let’s say you’ve got a main page selling pocket umbrellas, and 3 different subpages, based on the color of the product.

https://www.belgianumbrellas.com/pocket-models
https://www.belgianumbrellas.com/pocket-models?color=red
https://www.belgianumbrellas.com/pocket-models?color=green
https://www.belgianumbrellas.com/pocket-models?color=blue

First, to make things clearer and easier for Google. Each subpage is ranking nicely on its own color search term, but you want to consolidate that ranking on the pocket umbrella main page. Using canonicals, you’ll point Google to the main page to have it rank higher, and benefit from the ranking of all 3 subpages.
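In practice, that means each of the three color subpages would carry a canonical pointing to the main page, along the lines of this sketch:

```html
<!-- <head> of .../pocket-models?color=red, ?color=green and ?color=blue -->
<!-- All three subpages point to the same main page -->
<link rel="canonical" href="https://www.belgianumbrellas.com/pocket-models" />
```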

Second, the same principle applies to link equity (the principle by which your page authority benefits from internal or external links). If you’ve got different external links landing on your subpages, it might be a good idea to bring that link equity to the main page. People will still land on your subpages, but your main page will gain authority.

Side note on syndicated pages: canonical tags are not limited to your own domain. Let’s say you wrote an article, which you then shared to be republished on another website. If that site ranks better than yours overall, their page with your syndicated article will probably show up higher in search results. What you want to do then is add a cross-domain canonical tag to the syndicated content, and a self-referencing canonical tag to the original content. As a baseline, self-referencing tags are recommended, as they show Google which page you want to be indexed, and how the URL should look when indexed. For example, articles on our blog have self-referencing canonicals.
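A rough sketch of the syndication setup, using placeholder domains and paths (these are not our blog’s real URLs): both the republished copy and the original point to the original article.

```html
<!-- <head> of the republished copy on the partner site, e.g. https://www.partner-site.example/your-article -->
<!-- Cross-domain canonical: points back to the original -->
<link rel="canonical" href="https://www.example.com/blog/your-article" />

<!-- <head> of the original at https://www.example.com/blog/your-article -->
<!-- Self-referencing canonical: points to itself -->
<link rel="canonical" href="https://www.example.com/blog/your-article" />
```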

Hreflang & canonicals used together

All in all, adding hreflang and canonical tags to your website isn’t that complicated. However, you might want to be careful if you need both types of tags.

Let’s say Google tries to analyze the following page:

https://www.belgianumbrellas.com/be/fr/pocket-models?color=red

And you’ve added to that page both hreflang tags for all language/location versions and a canonical directing Google to the main page.

As a reminder:

With the hreflang, you signal to Google that the page should be indexed and served to people in Belgium with a browser set to French.

With the canonical, you’re telling Google to go look at the following page instead:

https://www.belgianumbrellas.com/be/fr/pocket-models

You can understand that this doesn’t really work. The tags are telling Google at the same time to index that page and other versions, but not to look at it, and go index another page.

The solution: don’t use hreflang tags on pages that are canonicalized. That way, you make the crawler go in steps. First, the crawler is directed to the page you actually want it to look at with the canonical, then it discovers the language and location of the page, as well as its translations.
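Put together, the two pages could look like the sketch below. The nl-be, fr-fr, and nl-nl paths are assumed for illustration, and the self-referencing canonical on the main page follows the general recommendation above rather than being strictly required.

```html
<!-- <head> of https://www.belgianumbrellas.com/be/fr/pocket-models?color=red -->
<!-- Canonicalized page: canonical only, no hreflang tags -->
<link rel="canonical" href="https://www.belgianumbrellas.com/be/fr/pocket-models" />

<!-- <head> of https://www.belgianumbrellas.com/be/fr/pocket-models -->
<!-- Main page: self-referencing canonical plus the full hreflang set -->
<link rel="canonical" href="https://www.belgianumbrellas.com/be/fr/pocket-models" />
<link rel="alternate" href="https://www.belgianumbrellas.com/be/fr/pocket-models" hreflang="fr-be" />
<link rel="alternate" href="https://www.belgianumbrellas.com/be/nl/pocket-models" hreflang="nl-be" />
<link rel="alternate" href="https://www.belgianumbrellas.com/fr/pocket-models" hreflang="fr-fr" />
<link rel="alternate" href="https://www.belgianumbrellas.com/nl/pocket-models" hreflang="nl-nl" />
```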