Sitemap.xml can be an asset in your SEO strategy and help search engines, like Google, to discover your pages. Once Google is aware of your pages, it may crawl, render, index and rank them in search results, if applicable. Ensure you follow the sitemap best practices to maximise your efforts.
What is a sitemap.xml?
Sitemap.xml is a file that contains information about the pages, videos, images and other resources on your website and the relationship between them. The file is written in eXtensible Markup Language (XML), a markup language similar to HTML that was designed to store and transport data. Remember, an XML sitemap is designed to help search engines such as Google; to help your users navigate around your website, you can create an HTML sitemap.
Why is a sitemap.xml important for SEO?
With a sitemap.xml, search engines can read your URLs and crawl them more efficiently. As a result, Google and other search engines become aware of the core pages on your site and can index them, provided the sitemap.xml can be fetched and processed and contains only indexable URLs.
Who needs a sitemap.xml?
Sitemap.xml is not a requirement and some websites may not need one. Your organic performance will probably benefit from a sitemap:
- If your site is large and not all your pages are internally linked, it may be difficult for search engine crawlers to discover them.
- If your site is new and does not have external incoming links, Google cannot find your pages by following links from other sites.
- If your site utilises video and/or images, a sitemap.xml can support Google in obtaining more information about your content.
- If your site is shown in Google News, a sitemap.xml file can provide additional information for Google.
A sitemap.xml file may not be necessary, if:
- your website is relatively small (Google Search Central puts this at roughly 500 indexable pages or fewer)
- your internal linking is comprehensive and there are no orphan pages on your site, allowing Google to discover new pages by crawling links between them
- your site does not have many video and/or image files
- your site does not have news pages which you’d like to appear in search results
Best Practices for SEO
Create a sitemap.xml
You can generate a sitemap automatically if you’re using a CMS which provides this feature, e.g. WordPress, Wix, or Blogger, or if you are using a sitemap generator tool, e.g. XML sitemap or Screaming Frog. Alternatively, you can create a sitemap manually by using a text editor and the correct syntax. If you’re creating a sitemap manually, make sure it follows Google’s recommendations and the sitemap protocol.
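To illustrate what the "correct syntax" looks like, here is a minimal sketch that writes a sitemap with Python's standard library. The `build_sitemap` helper and the example.com URLs are assumptions made for illustration; a real sitemap may also carry optional tags such as `<lastmod>`.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as a string) listing the given absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # <loc> holds the full, absolute address of the page.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages, used only as sample input.
sitemap_xml = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/blog-post-example",
])
print(sitemap_xml)
```

Saving the printed output as sitemap.xml at the root of the site would give search engines a file to fetch.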
Create multiple sitemaps
If you have more than 50,000 indexable URLs, split your sitemap file into multiple sitemaps (the sitemap protocol limits a single file to 50,000 URLs and 50 MB uncompressed). You can then create a sitemap index file that lists the locations of all your individual sitemaps. Additionally, you may want to create separate sitemap files to help with tracking the search performance of a specific sitemap in Google Search Console, e.g. a seasonal products sitemap.
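The splitting step can be sketched as follows. This is an illustrative outline rather than a complete generator: `split_urls`, `build_sitemap_index`, the file naming scheme and the example.com URLs are all assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemap protocol


def split_urls(urls, chunk_size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks that each fit in one sitemap file."""
    return [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document pointing at each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical site with 120,000 product URLs.
all_urls = [f"https://www.example.com/product-{n}" for n in range(120_000)]
chunks = split_urls(all_urls)

# One child sitemap per chunk; the naming scheme is an assumption.
child_sitemaps = [
    f"https://www.example.com/sitemap-{i + 1}.xml" for i in range(len(chunks))
]
index_xml = build_sitemap_index(child_sitemaps)
print(len(chunks), "sitemap files needed")
```

Each chunk would be written to its own sitemap file, and only the index file needs to be submitted.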
Create Dynamic XML sitemaps
If your site is large, consider creating dynamic sitemaps, which are regenerated automatically as pages are added, changed or removed, to ensure your sitemap.xml files stay up to date.
Only include pages you want Google to know about
When creating your sitemap file, ensure you include only ‘valuable’ URLs to prevent Google from crawling URLs which do not provide any value to users. If your site is large, including non-indexable URLs in your sitemap may lead to crawl budget issues.
Only include absolute URLs in your sitemap
Google will crawl the URLs within your sitemap exactly as listed. If you include a relative URL, e.g. /blog-post-example, instead of the full address including the protocol and domain, Google won’t be able to crawl your page.
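A quick way to catch relative URLs before they reach the sitemap is sketched below. The helper names `is_absolute` and `to_absolute`, and the example.com base URL, are assumptions for illustration.

```python
from urllib.parse import urljoin, urlparse


def is_absolute(url):
    """True if the URL has both an http(s) scheme and a host name."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def to_absolute(base_url, url):
    """Resolve a relative URL against the site's base URL; leave absolute URLs alone."""
    return url if is_absolute(url) else urljoin(base_url, url)


# Hypothetical base URL and candidate sitemap entries.
base = "https://www.example.com/"
candidates = ["/blog-post-example", "https://www.example.com/about"]

fixed = [to_absolute(base, u) for u in candidates]
print(fixed)
```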
Do not include ‘weak’ pages in sitemap
Weak pages or pages without any real value to the end user should be removed from your sitemap to prevent Google from crawling them unnecessarily.
Do not include ‘noindex’ pages in sitemap
Pages with a ‘noindex’ tag won’t be indexed, so there is no need to include them in the sitemap.xml file. Make sure the ‘noindex’ tag has been implemented correctly and that valuable pages are not being excluded from the index by mistake, as a misplaced tag will prevent those pages from appearing in the SERPs and performing organically.
Do not include broken pages in sitemap
If a page does not return a 200 status code, remove it from your sitemap as the page cannot be indexed and encouraging crawling of it is a waste of resources.
Do not include redirected pages in sitemap
Including redirected URLs in your sitemap.xml file results in Google crawling multiple URLs before reaching the final URL, which can eat into the crawl budget. If implementing permanent redirects on your website is a regular occurrence, consider automating the sitemap updates so that redirected URLs are removed as soon as the redirects go live.
Do not include canonicalised pages in sitemap
Pages which do not include a self-referencing canonical link tell Google that another version of the page should be indexed, so the canonicalised page is crawled but not indexed. By removing canonicalised pages from your sitemap, you can stop Google from crawling them unnecessarily.
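The exclusion rules above (broken, redirected, ‘noindex’ and canonicalised pages) can be combined into a single filter. The sketch below assumes you already have crawl data for each page, for example from a crawler export; the `Page` record and its field names are hypothetical, and a page with no canonical link at all is treated here as acceptable, which is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Crawl data for one URL (hypothetical structure for illustration)."""
    url: str
    status_code: int
    noindex: bool = False
    canonical_url: Optional[str] = None


def belongs_in_sitemap(page):
    """Apply the exclusion rules: keep only indexable, self-canonical 200 pages."""
    if page.status_code != 200:
        # Covers broken pages (4xx/5xx) and redirected pages (3xx).
        return False
    if page.noindex:
        return False
    if page.canonical_url is not None and page.canonical_url != page.url:
        # The page points to another URL as its canonical version.
        return False
    return True


# Sample crawl results using made-up example.com URLs.
pages = [
    Page("https://www.example.com/", 200, canonical_url="https://www.example.com/"),
    Page("https://www.example.com/old-page", 301),
    Page("https://www.example.com/missing", 404),
    Page("https://www.example.com/thank-you", 200, noindex=True),
    Page("https://www.example.com/shoes?sort=price", 200,
         canonical_url="https://www.example.com/shoes"),
]

sitemap_urls = [p.url for p in pages if belongs_in_sitemap(p)]
print(sitemap_urls)
```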
Submit your sitemap.xml to Google
Once your sitemap has been created, submit it to Google via Google Search Console. By submitting it, you’re making Google aware of any new and/or updated content on your website.
Include sitemap.xml location in your robots.txt file
It is also best practice to include your sitemap.xml location in the robots.txt file to point search engines directly to your sitemap. Make sure that the sitemap URL in your robots.txt file is a full (absolute) URL. Most CMSs will include the sitemap location in your robots.txt file automatically.
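As a rough check, the sketch below reads the `Sitemap:` lines from a robots.txt file and flags any that are not absolute URLs. The function name, the sample robots.txt content and the example.com URLs are assumptions for illustration.

```python
from urllib.parse import urlparse


def sitemap_urls_from_robots(robots_txt):
    """Collect the values of all 'Sitemap:' lines in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def is_absolute(url):
    """True if the URL has both an http(s) scheme and a host name."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical robots.txt content; the second Sitemap line is deliberately wrong.
robots_txt = """\
User-agent: *
Disallow: /cart/
Sitemap: https://www.example.com/sitemap.xml
Sitemap: /sitemap-blog.xml
"""

declared = sitemap_urls_from_robots(robots_txt)
relative = [u for u in declared if not is_absolute(u)]
print("Declared:", declared)
print("Needs fixing:", relative)
```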
Monitor status of submitted sitemap(s)
After submitting your sitemap.xml file to Google Search Console (GSC), monitor the status of submitted sitemaps to troubleshoot potential issues. For Google to be able to crawl pages within a sitemap, the status should be marked as ‘Success’. If any other status is returned, resolve the underlying issue to allow Google to crawl your pages.
Resolve sitemaps report issues in Google Search Console
If your submitted sitemap.xml file(s) returns either:
- couldn’t fetch or
- sitemap had X errors
resolve the issue(s) to allow search engines to fetch and process your sitemap. You can find more information about the root causes of issues with your sitemap(s) in GSC > Indexing > Sitemaps > Submitted sitemaps (click the affected sitemap).
Troubleshoot indexation issues for pages submitted via sitemap.xml
If your website has multiple sitemaps, e.g. a category sitemap, a product sitemap, a blog post sitemap etc., troubleshoot indexation issues one sitemap at a time.
By isolating individual sitemaps you’ll be able to identify root causes of indexation issues as well as prioritise accordingly to resolve issues with the most impact. Read how to resolve indexation issues or reach out to discuss your technical SEO issues.