Tune The Web

Sitemaps

This page was originally created on and last edited on .

Introduction

A sitemap is a list of all the web pages on your site, which both users and search engines can use to find your content. There are two types of sitemaps:

  1. An XML sitemap is a list of all your pages in a particular format that is easily read by computers. The format is defined at the sitemaps.org website, and you can see the Sitemap for Tune The Web here. These sitemaps are critical for SEO purposes and should be created and submitted to the search engines' webmaster tools so that the search engines can find your pages (and in particular any new pages) quickly.
  2. An HTML sitemap is a standard web page with a list of links. Think of it as a table of contents for a book. These were fairly common in the past but have fallen out of favour recently, thanks to improved navigation on most sites and the addition of site search. This site does not currently have an HTML sitemap, for example, as I think it's unnecessary given our search functionality. In the absence of an XML sitemap, Google and other search engines can use an HTML sitemap, as they will see it as a normal HTML page and simply follow the links. In this case you are benefiting from having a page with the links, rather than from having a sitemap page specifically.
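
To make the XML format more concrete, here is a minimal sketch in Python that builds a sitemap using the `urlset`, `url`, `loc` and `lastmod` elements from the sitemaps.org protocol. The function name `build_sitemap` and the example.com URLs are illustrative assumptions, not something this site uses.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as a string for (url, last_modified) pairs.

    build_sitemap is a hypothetical helper written for this example.
    last_modified may be None, because <lastmod> is optional.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # special characters are escaped for us
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder URLs, purely for illustration.
print(build_sitemap([
    ("https://www.example.com/", "2024-01-15"),
    ("https://www.example.com/about", None),
]))
```

The output is a single `urlset` element containing one `url` entry per page, which is all a search engine needs to discover the listed addresses.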

How to set it up

This depends on how you publish your website. If you use a CMS, it will most likely have a sitemap generator built in that will automatically create your sitemap, and that can be set to update it either periodically or every time you publish your website. This is the best option, as the CMS knows exactly which pages make up your website and so can include them all. Most will also offer an option to exclude from the sitemap any pages that should not be searchable.

If that's not an option for you, there are many online sitemap generators (just search Google for "xml sitemap generator"). The downside of these is that they crawl your website in much the same way as Google would anyway, so you lose a lot of the benefits of an auto-generated sitemap. You can of course review the sitemap and add missing URLs or remove unwanted ones as you see fit, but that is a manual task. Additionally, you will need to rerun the process to keep your sitemap up to date.

The third option is to do it by hand. This is the most manual method and is not very scalable, but it does give you complete control over what goes in your sitemap, and may be useful if you have a number of pages which are not linked from other pages but which you want in the sitemap. However, we would strongly recommend scripting this process so you can keep the sitemap up to date, or using one of the many scripts already available on the internet. If you have access to all the files on your server, it's a fairly easy job for a programmer to write a script to loop through them and generate a sitemap.
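
As a rough illustration of such a script, the sketch below loops through the files under a web root and writes out one sitemap entry per `.html` file. It assumes that each file's path relative to the web root is also its URL path, which will not hold for every site; `sitemap_from_directory`, `web_root` and `base_url` are hypothetical names chosen for this example.

```python
from pathlib import Path
from urllib.parse import quote
from xml.sax.saxutils import escape


def sitemap_from_directory(web_root, base_url):
    """Build sitemap XML listing every .html file found under web_root.

    Assumption: a file at <web_root>/blog/post.html is served at
    <base_url>/blog/post.html. Adjust the mapping if your server differs.
    """
    root = Path(web_root)
    prefix = base_url.rstrip("/") + "/"
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for page in sorted(root.rglob("*.html")):
        relative = page.relative_to(root).as_posix()
        # Percent-encode the path, then escape it for use inside XML.
        location = escape(prefix + quote(relative))
        lines.append(f"  <url><loc>{location}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines)
```

Calling `sitemap_from_directory("public_html", "https://www.example.com")` (both arguments are placeholders) returns the XML as a string, which you could then write to a `sitemap.xml` file as part of your release process.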

Support

Google and Bing both allow (and recommend) sitemaps to be submitted through their webmaster tools. Once you have created your sitemap, you should log in to the search engine's webmaster tools, verify your site, and then submit the sitemap. It can take a week or two for Google or Bing to give feedback on your sitemap, but after that they should automatically check the sitemap periodically and pick up new pages as they are added to it.

The Downsides

The main downside is in setting this up: if your web release process doesn't currently support sitemaps, they can be a pain to generate. While the online tools are helpful, they don't add much that the search engines wouldn't find anyway. However, search engines index pages from sitemaps much faster, which means that a manually generated sitemap may still be worthwhile if an automatic sitemap generator is not possible.

Another concern is that pages that should not be found can be added to a sitemap and suddenly be indexed by a search engine. Of course no web page should be put on a public website if you do not want it publicly available, and there are other, better ways to flag a page as not indexable (using meta tags). But if a web page is added to the sitemap, it will be picked up more quickly by search engines (as is the point!). It is not unknown for web pages to become available earlier than intended (e.g. a special offer page that is still a work in progress but is already on the product website).
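
One way to reduce this risk, sketched below, is to have your sitemap script skip any page carrying a robots meta tag with a `noindex` directive. This is only an illustration using Python's standard `html.parser` module; `RobotsMetaParser`, `is_noindex` and `indexable_pages` are hypothetical names, and the sketch does not look at HTTP headers or robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a page has <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        directives = [d.strip().lower() for d in attributes.get("content", "").split(",")]
        if "noindex" in directives:
            self.noindex = True


def is_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


def indexable_pages(pages):
    """Given a {url: html} mapping, return the URLs safe to list in a sitemap."""
    return [url for url, html in pages.items() if not is_noindex(html)]
```

Filtering the page list through `indexable_pages` before generating the sitemap keeps the two signals consistent, so a page flagged as not indexable is not also advertised to search engines.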

When sitemaps originally came on the scene, there were concerns that using them masked hidden pages that were not linked from other pages. Without a sitemap, such a page would be more obvious because it would be missing from the search engine index, but a sitemap gets it indexed anyway. While I think navigation is important on a website and does require a lot of thought, I don't think intentionally not publishing a sitemap for this reason is the right answer, as there are better ways to find such unlinked pages.
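
As one example of such a check, the sketch below compares the URLs in a sitemap with a set of URLs known to be linked from other pages (for instance, gathered by a crawl of your own site, which is not shown here). Anything in the sitemap but not in the linked set is a candidate unlinked page. The names `sitemap_urls` and `unlinked_pages` are hypothetical, and the approach assumes both sources write URLs in exactly the same form.

```python
import xml.etree.ElementTree as ET

# Namespace prefix mapping for the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the set of <loc> values found in a sitemap XML string."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text}


def unlinked_pages(sitemap_xml, linked_urls):
    """Return sitemap URLs that do not appear in linked_urls, sorted."""
    return sorted(sitemap_urls(sitemap_xml) - set(linked_urls))
```

Running a comparison like this from time to time lets you keep the indexing benefits of the sitemap while still spotting pages that your navigation has left behind.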

Summary

Sitemaps are a great way to tell search engines about new pages and should be implemented by any site where possible. Indexing has been shown to be significantly faster when using a sitemap, which allows web pages to be found more quickly.

If you are using Google Custom Search for your on-site search (as this website does), then you want your pages to be searchable as quickly as possible, even if you think they are unlikely to rank on the main Google page.
