How to Audit Hreflang Tags with Screaming Frog

By Geoff Griffiths @mmatraining1980

To crawl hreflang and include the Hreflang reports:

Menu Bar > Configuration > Spider > Crawl – check the box to store Hreflang, and also check “Crawl” if you want the hreflang URLs crawled so you can confirm they all return a 200 status code

To Filter on the Hreflang Tab

Click the slider icon in the search box at the top right

You then get “Excel style” filtering options:

  • Find URLs without all of the relevant hreflang attributes:
  • In the Hreflang tab, select “All” in the drop-down menu and filter out any parameter URLs
  • Export to Excel
  • Filter so that only “Indexable” URLs are shown
  • Find any URLs without the expected number of hreflangs – e.g. if you have 8 sites in different languages/regions, you’ll probably want most of your Indexable URLs to have 8 “occurrences” of hreflang
  • Check the Non-200 Hreflang URLs drop-down filter for any errors
  • Unlinked Hreflang URLs – perform a Crawl Analysis to check this
  • Missing Return Links

Use the search function near the top right (click the slider icon) and filter to show only “Indexable” URLs. This leaves the URLs that should have return links but are missing them.
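The return-link check can also be reasoned about outside the tool. Below is a minimal Python sketch, assuming you already have the hreflang annotations for each page in a dictionary (the `pages` structure and the function name are made up for illustration and are not a Screaming Frog export format). It lists every source and target pair where the target page does not point back at the source.

```python
from typing import Dict, List, Tuple

# Hypothetical crawl data: page URL -> {hreflang value: alternate URL}.
HreflangMap = Dict[str, Dict[str, str]]


def find_missing_return_links(pages: HreflangMap) -> List[Tuple[str, str]]:
    """Return (source, target) pairs where target does not link back to source."""
    missing = []
    for source, alternates in pages.items():
        for target in alternates.values():
            if target == source:
                continue  # self-reference, nothing to reciprocate
            # A target that is absent from the data has no known annotations,
            # so it is treated here as lacking a return link.
            target_alternates = pages.get(target, {})
            if source not in target_alternates.values():
                missing.append((source, target))
    return missing


pages = {
    "https://example.com/en-gb": {
        "en-gb": "https://example.com/en-gb",
        "en-us": "https://example.com/en-us",
    },
    "https://example.com/en-us": {
        "en-us": "https://example.com/en-us",
        # no en-gb entry, so the return link is missing
    },
}

print(find_missing_return_links(pages))
# [('https://example.com/en-gb', 'https://example.com/en-us')]
```

URLs are compared as exact strings here, so trailing slashes or http/https differences would show up as missing return links.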

Notes from Screaming Frog’s website

1) Select ‘Crawl’ and ‘Store’ Hreflang under ‘Config > Spider > Crawl’

2) To crawl hreflang in XML sitemaps, select ‘Crawl Linked XML Sitemaps’ under ‘Config > Spider > Crawl’

3) Crawl the website

4) View the Hreflang tab

5) View the different Hreflang reports using the drop-down menu

6) Perform a “crawl analysis” to populate the report that depends on it (Unlinked Hreflang URLs)

Reports

  • Contains Hreflang – URLs that have the rel=”alternate” markup
  • Non-200 Hreflang URLs – URLs within the rel=”alternate” markup that don’t result in a 200 status code
  • Unlinked Hreflang URLs – Pages that contain one or more hreflang URLs that are only linked to by a hreflang tag and not by a link in the actual web pages
  • Missing Return Links – Hreflang should be reciprocal.
  • Inconsistent Language & Region Return Links – This filter includes URLs with inconsistent language and regional return links to them. This is where a return link has a different language or regional value than the URL is referencing itself
  • Non Canonical Return Links – URLs with non canonical hreflang return links. Hreflang should only include canonical versions of URLs.
  • Noindex Return Links – Return links which have a ‘noindex’ meta tag. All pages within a set should be indexable.
  • Incorrect Language & Region Codes – This simply verifies the language (in ISO 639-1 format) and optional regional (in ISO 3166-1 Alpha 2 format) code values are valid
  • Missing Self Reference – URLs missing their own self referencing rel=”alternate” hreflang annotation. It was previously a requirement to have a self-referencing hreflang, but Google has updated their guidelines to say this is optional. It is however good practice and often easier to include a self referencing attribute.
  • Not Using Canonical – URLs not using the canonical URL on the page in their own hreflang annotation. Hreflang should only include canonical versions of URLs.
  • Missing – URLs missing an hreflang attribute completely. These might be valid of course, if there aren’t multiple versions of a page.
  • Outside <head> – Pages with an hreflang link element that is outside of the head element in the HTML. The hreflang link element should be within the head element, or search engines will ignore it.
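To illustrate what the “Outside <head>” report is looking for, here is a rough Python sketch using only the standard library. The class name and sample HTML are invented for the example, and this is not how Screaming Frog itself detects the issue. It only tracks explicit head tags, whereas a real browser parser closes the head implicitly in some cases, so treat it as a simplified approximation.

```python
from html.parser import HTMLParser


class HreflangLinkParser(HTMLParser):
    """Collect hreflang link elements and note whether each sits inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.links = []  # list of (hreflang, href, inside_head)

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link":
            attr = dict(attrs)
            rel = (attr.get("rel") or "").lower().split()
            if "alternate" in rel and attr.get("hreflang"):
                self.links.append((attr["hreflang"], attr.get("href"), self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


html = """
<html>
<head>
<link rel="alternate" href="https://example.com/en-gb" hreflang="en-gb" />
</head>
<body>
<link rel="alternate" href="https://example.com/en-us" hreflang="en-us" />
</body>
</html>
"""

parser = HreflangLinkParser()
parser.feed(html)

# Keep only the annotations found outside the head element.
outside = [link for link in parser.links if not link[2]]
print(outside)
# [('en-us', 'https://example.com/en-us', False)]
```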

To bulk export details of source pages that contain hreflang errors or issues, use the ‘Reports > Hreflang’ options, for example the ‘Reports > Hreflang > Non-200 Hreflang URLs’ export.
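Once the data is exported, the occurrence-count check described earlier (e.g. 8 hreflangs expected for 8 sites) can be scripted instead of filtered by hand in Excel. The Python sketch below is only an illustration: the column names `Address`, `Indexability` and `Occurrences` are assumptions, so check them against the headers in your own export before relying on it.

```python
import csv
import io

EXPECTED_OCCURRENCES = 8  # e.g. 8 language/region sites

# Stand-in for an exported file; the column names are assumptions.
sample_export = io.StringIO(
    "Address,Indexability,Occurrences\n"
    "https://example.com/en-gb/,Indexable,8\n"
    "https://example.com/en-gb/about,Indexable,6\n"
    "https://example.com/en-gb/?sort=price,Non-Indexable,0\n"
)


def urls_with_unexpected_count(rows, expected):
    """Return indexable, non-parameter URLs whose hreflang count differs from expected."""
    flagged = []
    for row in rows:
        if row["Indexability"] != "Indexable":
            continue
        if "?" in row["Address"]:
            continue  # skip parameter URLs
        count = int(row["Occurrences"])
        if count != expected:
            flagged.append((row["Address"], count))
    return flagged


rows = list(csv.DictReader(sample_export))
print(urls_with_unexpected_count(rows, EXPECTED_OCCURRENCES))
# [('https://example.com/en-gb/about', 6)]
```

For a real file, replace the `io.StringIO` stand-in with `open("your_export.csv", newline="")`.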

Notes from Search Engine Land

Irrelevant Hreflang Values

Sometimes, the language and region values in a hreflang tag are not properly aligned with the page’s relevant languages or countries. This error can be trickier to handle as tools won’t be able to identify it, so a manual review will be needed to detect if the hreflang values are really showing the correct language and/or country for the page in question.

Remember, hreflang attributes require a language to be specified, but region is optional and should only be used when necessary (for example, if you want to serve different pages to Spanish speakers in Mexico and Spanish speakers in Spain).
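The “language required, region optional” rule can be expressed as a simple shape check. This Python sketch is deliberately limited: it only confirms that a value looks like a two-letter language code with an optional two-letter region code (or is x-default). It does not confirm that the codes exist in the ISO lists, it ignores script subtags such as zh-Hant, and it cannot tell whether the value matches the page’s real language, which still needs the manual review described above. The function name is made up for the example.

```python
import re

# Two-letter language code, optionally followed by a hyphen and a two-letter region.
# Shape only: "en-uk" passes even though "uk" is not the ISO region code for the UK.
HREFLANG_SHAPE = re.compile(r"[a-z]{2}(-[a-z]{2})?", re.IGNORECASE)


def has_valid_shape(value: str) -> bool:
    """Return True if the hreflang value is x-default or looks like lang[-region]."""
    if value.lower() == "x-default":
        return True
    return HREFLANG_SHAPE.fullmatch(value) is not None


for value in ["es", "es-MX", "es-ES", "x-default", "es_MX", "-mx", "spa"]:
    print(value, has_valid_shape(value))
```

The first four values pass and the last three fail: an underscore instead of a hyphen, a region with no language, and a three-letter code.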

It’s critical to verify, before implementing anything, whether the site is language or country targeted (or if there’s a mix of approaches that you need to be aware of). The hreflang values will need to be generated according to this targeting.

Another scenario I’ve found is that, in some cases, the language (or country) code hasn’t been correctly implemented and always specifies the same language (or country) for each alternate URL. In one example from Audible, the home pages for France and Germany had been tagged as English language pages, even though they’re really in French and in German, respectively.

Irrelevant URLs

Similar to the previous example, sometimes the hreflang attributes are showing the right language and/or country values, but the URLs have not been correctly specified.

For example, in the case of Skype, you can see that the English language version URL is always specified instead of the relevant language URL for each case. (Similarly, the canonical tag is always showing the English URL instead of the relevant one, as in the case of the Spanish language page below).

Full URLs Missing the Protocol Prefix, e.g. www. instead of https://www. in hreflang

There are also situations where URLs that are meant to be absolute do not include the “http://” or “https://” at the start, which turns them into relative URLs that don’t point to the correct page.
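A quick way to spot this in a list of hreflang URLs is to parse each one and check for a scheme and a host. This is a minimal Python sketch using the standard library’s `urllib.parse`; the function name and sample URLs are made up for illustration.

```python
from urllib.parse import urlparse


def is_absolute_http_url(href: str) -> bool:
    """Return True only if href has an http(s) scheme and a host."""
    parsed = urlparse(href)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


hrefs = [
    "https://www.example.com/fr/",  # absolute, fine
    "www.example.com/fr/",          # no scheme, treated as a relative path
    "//www.example.com/fr/",        # protocol-relative, no scheme
    "/fr/",                         # relative path
]

for href in hrefs:
    print(href, is_absolute_http_url(href))
```

Only the first URL passes. A value such as `www.example.com/fr/` has no scheme, so a browser or crawler resolves it relative to the current page rather than treating `www.example.com` as the host.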

Notes from LinkedIn article

In some cases, the same page may contain information for people speaking different languages, so hreflang tags alone may not be sufficient. Schema.org markup can help search engines recognize the language of individual parts of a web page more accurately. For example, the inLanguage property defines the language of the content or performance, or the language used in an action, on types such as Event, CreativeWork, BroadcastService and others.

There are multiple free online tools available for testing. My favorite is https://technicalseo.com/tools/hreflang/. Google Search Console deprecated its country-targeting feature in September 2022; however, third-party crawl tools such as Screaming Frog and Ryte.com can uncover site-wide language and regional targeting issues fairly well.

If you use a tool and get the message:

“Missing region-independant link for that language (en)”

With the Technical SEO tool, for example, this can mean that we need a generic URL for English-speaking visitors, regardless of what region/country they come from.

In practice, it’s often recommended to have a ‘fallback’ or a default hreflang tag for each language. For English, this would be a tag with the language code “en” without a country code. This tag acts as a catch-all for English speakers in regions not specifically targeted by other tags (like en-GB or en-US).

For example, if your website has English pages specifically for the US and the UK, your hreflang tags might look something like this:

  • <link rel="alternate" href="http://example.com/en-gb" hreflang="en-gb" /> for English speakers in the UK
  • <link rel="alternate" href="http://example.com/en-us" hreflang="en-us" /> for English speakers in the US

To resolve the error, you would add a tag for a generic English version:

  • <link rel="alternate" href="http://example.com/en" hreflang="en" /> for English speakers in general, regardless of region

This setup ensures that search engines know which page to show to English-speaking users based on their location, and also have a default page to show to English speakers in locations not covered by your region-specific tags.
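The same fallback check can be sketched in a few lines of Python. This is an illustration of the idea only, not the logic of the Technical SEO tool or any other validator, and the function name is invented. It takes the hreflang values found on a page and reports any language that has region-specific variants but no region-independent value.

```python
def languages_missing_fallback(hreflang_values):
    """Return languages that have regional variants but no region-independent value."""
    generic = set()
    regional = set()
    for value in hreflang_values:
        value = value.lower()
        if value == "x-default":
            continue  # x-default is not tied to a language
        language, _, region = value.partition("-")
        if region:
            regional.add(language)
        else:
            generic.add(language)
    return sorted(regional - generic)


print(languages_missing_fallback(["en-gb", "en-us"]))        # ['en']
print(languages_missing_fallback(["en-gb", "en-us", "en"]))  # []
```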