17 November 2009 2 Comments

Robots.txt SEO Techniques

This post is a long but important one. I recommend you grab a cup of hot chocolate before you start :)

If you have not heard of the robots.txt file, it is simply a small file located in your website root directory that tells search engines what they can and can’t do. Although the rules are not strictly enforced, search engine bots will generally respect them. With a properly configured robots.txt file you can, for example, attempt to fend off spam bots, tell Google not to index your images or instruct bots to skip pages that may contain duplicate content.

Bots are pieces of software used by search engine companies, spammers and content accumulators to crawl the internet to find new or modified content. A bot’s job is to follow links on a website, crawling from page to page and site to site. It’s kind of like a Six Degrees of Kevin Bacon thing. Follow enough links and you should eventually find all the content on the net. This is why backlinks are so important. The more backlinks you have, the easier it is for search engines to find your content. There are literally millions of bot instances trawling the net at any one time. The official term for a bot is a user-agent, of which there are thousands. Let’s take Google for example. Google has many different user-agents used to index your site, extract images and videos, find news feeds, find mobile phone content, check your site for AdSense quality and so on. This site details a complete list of known user-agents.

The robots.txt file has been around for ages. It was proposed by Martijn Koster in 1994 and remains a staple food for web spiders. For a complete description of the file and its standard notation, visit here. In short, a robots.txt file can restrict specific bots from crawling your entire site or part thereof. To make this possible, each bot identifies itself with a special signature. For example, Google’s index bot is called Googlebot, Bing’s bot is called MSNbot, and Yahoo’s bot is called Yahoo! Slurp.

An entry in the Robots.txt file may look like this:

User-agent: Slurp
Allow: /public*/
Disallow: /*_print*.html

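Here we are telling the Slurp user agent that it can access all pages located in any directory starting with “public”, but may not access pages with “_print” in the URI. Note that the * wildcard is an extension honoured by the major search engines rather than part of the original standard, so smaller bots may ignore it.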
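To make the wildcard behaviour concrete, here is a minimal Python sketch of how a single rule could be tested against a URL path. It assumes the extended syntax used by the big search engines (* matches any run of characters, a trailing $ anchors the end of the path, everything else is a prefix match). The function name rule_matches and the sample paths are my own inventions for illustration, not part of any crawler.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if one robots.txt path pattern matches a URL path.

    Assumption: extended wildcard syntax, where '*' matches any run of
    characters and a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' wherever a '*' stood.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives the usual prefix behaviour.
    return re.match(regex, path) is not None


# Hypothetical paths, checked against the two rules from the entry above.
allowed_public = rule_matches("/public*/", "/public_html/index.html")
blocked_print = rule_matches("/*_print*.html", "/articles/seo_print_v2.html")
normal_page = rule_matches("/*_print*.html", "/articles/seo.html")
```

The sketch only answers whether one pattern matches. Which rule wins when an Allow and a Disallow both match differs between crawlers, so check the documentation of the bots you care about.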

Below is a complete robots.txt file for one of my experimental WordPress sites (I’ll post an article explaining what I mean by experimental site another day). Astute readers may note that I am disallowing all user agents from specific directories, while explicitly allowing a handful of Google user agents. A bot that finds a group naming it follows that group instead of the * group, so those agents are not bound by the Disallow rules. A more recent extension to the standard also allows me to list the location of my site map to help search engines find all of my pages.

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content
Disallow: /search/*/feed
Disallow: /search/*/*

User-agent: Mediapartners-Google
Allow: /

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Image
Allow: /

User-agent: Googlebot-Mobile
Allow: /

Sitemap: http://beginnerchess.org/sitemap.xml
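If you want to check a file like this before uploading it, Python's standard library ships a robots.txt parser. The sketch below feeds it a trimmed copy of the file above and asks what two bots may fetch. The bot name SomeOtherBot and the page paths are made up for the example. Also note that this parser does plain prefix matching and does not interpret the * wildcards, which is why the /search/ rules are left out here.

```python
from urllib.robotparser import RobotFileParser

# A trimmed copy of the file above (wildcard rules omitted on purpose).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content

User-agent: Googlebot-Image
Allow: /

Sitemap: http://beginnerchess.org/sitemap.xml
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
site = "http://beginnerchess.org"

# A bot with no group of its own falls back to the '*' rules.
generic_blocked = parser.can_fetch("SomeOtherBot", site + "/wp-admin/")
generic_allowed = parser.can_fetch("SomeOtherBot", site + "/opening-moves/")

# Googlebot-Image has its own group, so the '*' Disallow lines do not apply.
image_bot_allowed = parser.can_fetch(
    "Googlebot-Image", site + "/wp-content/uploads/board.png"
)
```

Here generic_blocked comes back False and the other two come back True, which matches the intent of the file: everyone is kept out of the WordPress internals except the bots that are named explicitly.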

Disallowing bots from accessing content not intended for consumption helps keep every crawled page on your site keyword optimized, which in turn helps promote your site within the search engine rankings. Say, for example, you have worked hard at optimizing all pages for the keyword “weight gain” and the various long tails. Your work may be diluted in the eyes of the search engine if it is able to crawl your login page, privacy page and contact form.

Some SEO experts also argue that Google punishes young websites in favor of older more established sites. Google apparently uses the Internet Archive (found here) to determine the age of a site. If it cannot find the site in the archive, it apparently assumes the site is a certain age. For this reason, many people actively stop the Internet Archive user-agent from indexing their site. This can be done by including the following lines:

User-agent: ia_archiver
Disallow: /

You may also want to stop image bots from accessing your pictures if you have borrowed non-stock images from other sites. This can be done like so:

User-agent: Googlebot-Image
Disallow: /

Finally, robots.txt can be used to exclude bots from specific pages whose content is also available on other sites or pages. It is often argued that Google will punish your rankings for displaying duplicate content. I personally do not see this as a big issue and believe that duplicate content can actually help your site’s rating in some instances (more about this another day). Anyway, to stop a bot from accessing a specific page, add the following lines:

User-agent: *
Disallow: */my-duplicate-page.html

Note that this is not a fool-proof method. If your disallowed page has links to it from another site, well-behaved bots will still not crawl it, but its URL can still turn up in the search index.

I could keep going, but I’m sure you are all bored by now. Feel free to comment below or contact me directly if you wish to know more.

Happy roboting.

13 November 2009 8 Comments

10 Minute Search Engine Optimized Website

The following guide is a comprehensive step-by-step procedure for configuring an SEO (Search Engine Optimization) friendly site starting from a default WordPress installation. I’ve assumed you have already installed WordPress and have logged in to the administrator dashboard. I’ve also assumed that you know how to install and configure new plugins. If not, see here.

All instructions were written for WordPress 2.8.6, but should work equally well for most versions. Before we start, you must have already researched your target key phrase, written a 300+ word keyword optimized article and chosen a domain name that includes your key phrase. Here’s an article on keyword research to get you started.
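Before pasting your article into WordPress, it can help to check the word count and how often the key phrase actually appears. The Python sketch below is one rough way to do that. The function name keyword_stats and the density formula (words belonging to key phrase hits divided by total words) are my own assumptions; there is no single agreed definition of keyword density.

```python
import re


def keyword_stats(text: str, key_phrase: str) -> dict:
    """Count words and key phrase occurrences in an article.

    Assumption: density = (hits * words in phrase) / total words.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    phrase_words = key_phrase.lower().split()
    n = len(phrase_words)
    hits = 0
    if n:
        # Slide a window of n words over the article and compare.
        hits = sum(
            1
            for i in range(len(words) - n + 1)
            if words[i:i + n] == phrase_words
        )
    density = (hits * n / len(words)) if words else 0.0
    return {"words": len(words), "phrase_hits": hits, "density": density}


# Hypothetical sample text; a real article should be 300+ words.
sample = "Weight gain tips: healthy weight gain takes time."
stats = keyword_stats(sample, "weight gain")
long_enough = stats["words"] >= 300
```

For the sample sentence, stats reports 8 words and 2 hits, and long_enough is False, so there would be plenty more writing to do.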

  1. Expand the Appearance menu and select Add New Themes. Browse all available themes and choose one that best represents your key phrase. The theme you choose should have a minimum set of features that make it SEO friendly. They include:

    • A navigation menu that appears on all pages. This ensures that if the search bot stumbles on any of your pages, there is a path to all other pages for it to follow.
    • A text title and title description that appear at the top of all pages. The title should be enclosed within H1 tags. Search engines cannot use optical character recognition to retrieve your blog’s name, so ensure it is written in text.
    • A left sidebar. Studies show that people click ads that are displayed on the left more often than ads on the right.
  2. Edit the default WordPress settings:

    • Expand Settings and select General. Set the blog title to your key phrase. Set the Tagline to a keyword long tail or a phrase that includes one or more keywords. Ensure the E-mail address is filled in.
    • Expand Settings and select Permalinks. Set Common Settings to Custom Structure and in the entry field, type “/%postname%/” (without the quotes). This will give your post page the same name as your post title. As your post title should be keyword optimized, so too will your post URL.
  3. Install and activate the Akismet plugin. Follow the instructions given in the Akismet Configuration page to acquire a WordPress.com API Key. Akismet will automatically remove spam comments. The last thing you want is spam comments reducing your keyword density.
  4. Install and activate the WP-Sticky plugin. WP-Sticky will allow you to stick your keyword optimized post to the top of your home page.
  5. Install and activate the Broken Link Checker plugin. Some argue that Google will lower your authority if your site contains numerous broken links. It makes sense, as sites with broken links are generally outdated or of low quality. Use this plugin to periodically check for broken links within any page or post.
  6. Install and activate the Contact Form plugin. Follow the instructions to create a Contact page. People will not be comfortable purchasing goods from your site if there is no means to contact you should something go wrong.
  7. Install and activate the Easy Privacy Policy plugin. Follow the instructions to create a privacy policy page. Note that if you are intending to display Google ads, your site MUST include an accessible privacy policy to meet Google’s requirements.
  8. Install and activate the SEO No Duplicate plugin. Google awards the links for duplicate content to the site with the highest page ranking. Many argue that Google also punishes websites that have many pages with the same content. This plugin will simply point all duplicate content on your site back to a single permalink.
  9. Install and activate the Social Bookmarks plugin. If your posts are worth sharing, you should encourage your readers to socially bookmark them. Doing so will assist in driving traffic to your site.
  10. Install and activate the Table of Contents Creator plugin. Follow the instructions to create a site map page. This plugin will help the search engine bots by exposing all pages within a single list.
  11. Install and activate the Ultimate Google Analytics plugin. Follow the instructions to acquire a Google Analytics account ID. This is beneficial as you will be able to track the number of users that visit your site.
  12. Install and activate the Google XML Sitemaps plugin. Follow the instructions to obtain a Yahoo Application ID. Search engines use site maps to determine how often your pages change.
  13. Install and activate the All in One SEO Pack plugin. Take your time when filling in the plugin options. Ensure that:

    • The Home Title is your keyword phrase.
    • The Home Description is keyword optimized and designed to grab the attention of any would-be visitor. This description is displayed under your site’s listing in many search engines.
    • The Home Keywords should include your key phrase, the individual key words and any key word long tails.
    • Apply for webmaster accounts at Google, Yahoo and Bing. Add your site to each account and copy all three authorization meta tags into the Additional Post Headers, Additional Page Headers and Additional Home Headers found in the SEO Pack options page.
  14. Create a new post and paste in your keyword optimized article. Ensure the name of your post is the key phrase. Create post tags and categories also matching your key phrase and key words. Fill in the SEO Pack options at the bottom of the page, ensuring that the Title is your key phrase, the Description is keyword optimized and the Keywords include your key phrase and all keywords. Set the Post Sticky Status to Sticky.
  15. Use one of the many Search Engine submission tools (such as this one) to submit your site to all the major search engines.
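To see why the “/%postname%/” permalink setting from step 2 matters, here is a rough Python sketch of how a post title can be turned into a URL slug. This is only an approximation of what WordPress does internally (its own rules differ in the details, for example around punctuation), and the function names slugify and permalink as well as the example.com address are hypothetical.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Approximate a WordPress-style post slug from a post title."""
    # Drop accents so that, for example, 'é' becomes 'e'.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, then collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower())
    return slug.strip("-")


def permalink(site_url: str, title: str) -> str:
    """Build the URL a '/%postname%/' permalink structure would produce."""
    return f"{site_url.rstrip('/')}/{slugify(title)}/"


# Hypothetical site and key phrase.
url = permalink("http://example.com/", "Healthy Weight Gain")
```

With these inputs, url is http://example.com/healthy-weight-gain/, so the key phrase from the post title ends up in the address itself.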

That’s it. You now have a 4 page website. The first site probably took way longer than 10 minutes, but subsequent sites will be quicker as you will already have the Akismet API Key, Yahoo Application ID, and webmaster accounts. Note that you may have to wait anywhere up to 1 month before your site is first indexed. To speed up the process, social bookmark your site to Clipmarks, StumbleUpon, Reddit, Digg and so on. Using this technique, your site should be indexed in as little as 2 weeks. Don’t get carried away though. If you create hundreds of bookmarks before your site is indexed, Google will get suspicious. I have seen people wait 3 months or more before their site appears.

Also hold off on adding Google AdSense until 2 weeks after your site is indexed. Some argue that adding AdSense too early will degrade your site’s rating.

There are also some other plugins you should consider adding. They are:

  1. Pretty Link. This plugin will let you track link hits and tidy long URLs.
  2. WP-phpMyAdmin. For the advanced user. This plugin allows direct access to the WordPress database.
  3. WP Super Cache. If your site has a large amount of traffic, WP Super Cache will speed up the user’s experience by caching the HTML result of a page rather than re-running the server side PHP script.
  4. WP-DBManager. Performs automated routine maintenance of the WordPress database and can also be configured to email a database backup at regular intervals.
  5. WordPress Backup. Emails a backup of the WordPress plugin, uploads and theme directories at regular intervals.

I hope this article has been of some use.