[How-to] Boost your traffic with Sitemaps and Robots.txt
April 18, 2007 12:47 am
In November 2006, Yahoo, Google and Microsoft joined forces to support Sitemaps 0.90, giving webmasters an easy way to submit their site content to the three major search engines. In an article published by Yahoo, Priyank Garg, Yahoo Search Product Manager, stated that:
By offering an open standard for web sites, webmasters can use a single format to create a catalog of their site URLs and to notify changes to the major search engines. This should make it easier for web sites to provide search engines with content and metadata. And in turn, search engines can spend less time crawling unchanged pages and can update indexes faster as new content is discovered.
There are a number of WordPress plugins (we use this one) that allow you to easily generate a sitemap for your site. Since the November announcement, the three companies have been discussing ways to make submitting a site to all the search engines as quick as possible (instead of going through each search engine's own submission tool). Last week, Yahoo announced that the companies have agreed on using robots.txt as the mechanism for webmasters to share their sitemaps with all participating search engines. Google's coverage of the story is here.
Robots.txt is a regular text file that defines rules instructing search engine robots and crawlers which files or directories within your site they should or should not index. You might have, say, an image folder containing a bunch of images; to save bandwidth, you might not want the Google bots to index them and make them searchable online.
To create a robots.txt, all you have to do is create a regular text file, name it robots.txt and upload it to the root directory of your site (i.e., the path should be http://www.yoursite.com/robots.txt). A basic robots.txt might look like this (note that this particular example tells every bot to stay out of the entire site; leave the Disallow value empty if you want to allow everything):
User-agent: *
Disallow: /
Say you only want to stop all bots from crawling your cgi-bin folder; you would then modify the file to look like this:
User-agent: *
Disallow: /cgi-bin/
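Before uploading a rule set like this, it can help to check what it actually blocks. The following is a minimal sketch using the robots.txt parser from Python's standard library; the helper name is_allowed and the two test URLs are made up for illustration, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The cgi-bin rules from the example above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if user_agent may fetch url under the given robots.txt text.

    is_allowed is a hypothetical helper written for this post.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative URLs only; swap in paths from your own site.
    for url in (
        "http://www.yoursite.com/cgi-bin/form.pl",
        "http://www.yoursite.com/index.html",
    ):
        status = "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked"
        print(url, status)
```

With these rules, the cgi-bin URL is reported as blocked and the index page as allowed. Passing the text "User-agent: *" followed by "Disallow: /" instead blocks both.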
To learn more about robots.txt, visit this FAQ.
With the new announcement, all you have to do to get Yahoo, Google, Microsoft and other participating search engines (including Ask.com) to crawl and index your site quickly is add this line to your robots.txt:
Sitemap: http://www.yoursite.com/sitemap.xml
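The sitemap.xml file that this line points to is an XML list of your URLs in the Sitemaps 0.90 format. If you are not using a plugin, here is a rough sketch of generating one with Python's standard library. The function name build_sitemap, the page paths and the dates are all invented for illustration; only the urlset, url, loc and lastmod elements and the namespace come from the Sitemaps protocol itself.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the Sitemaps 0.90 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap document from (url, last_modified) pairs.

    build_sitemap is a hypothetical helper; dates use the YYYY-MM-DD form.
    """
    # Emit the namespace as the default xmlns rather than an ns0: prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        ET.SubElement(url, "{%s}loc" % SITEMAP_NS).text = loc
        ET.SubElement(url, "{%s}lastmod" % SITEMAP_NS).text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Example pages and dates are placeholders, not real content.
    pages = [
        ("http://www.yoursite.com/", "2007-04-18"),
        ("http://www.yoursite.com/about/", "2007-04-01"),
    ]
    with open("sitemap.xml", "w", encoding="utf-8") as handle:
        handle.write(build_sitemap(pages))
```

Running the script writes sitemap.xml to the current directory; you would then upload it to the location named in the Sitemap line.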
You can see the robots.txt for this site here. Since we implemented the new robots.txt, our referral clicks from Google have skyrocketed, and our posts are consistently ranked in the top 3 of the search results (try searching "extend dojo dialog" on Google).
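If you want to double-check that a robots.txt file really advertises its sitemap, a few lines of Python will pull out the Sitemap entries. This is only a sketch: sitemap_urls is a made-up helper, the sample file reuses the placeholder yoursite.com address from above, and the comment handling is deliberately simplistic.

```python
# A sample robots.txt combining the cgi-bin rule with the new Sitemap line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Sitemap: http://www.yoursite.com/sitemap.xml
"""


def sitemap_urls(robots_txt):
    """Return the URLs listed on Sitemap lines, in file order.

    sitemap_urls is a hypothetical helper written for this post.
    """
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments; this would also cut a URL containing '#'.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        # Field names are matched case-insensitively.
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


if __name__ == "__main__":
    print(sitemap_urls(ROBOTS_TXT))
```

An empty list means the file has no Sitemap line yet.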

