16 April 2010 0 Comments

The Basics Of SEO Explained

Search engine optimization (or just SEO) is the practice of getting ranked in Google, Yahoo and Bing on the basis of the quality of your website. SEO is preferable to sponsored links as a method of internet marketing because, following the initial investment, you continue to get visitors to your website for very little outlay, and quite possibly even for free.

To succeed with SEO, the first step is always to establish the exact search terms, also known as keywords or keyphrases, that your potential customers use when searching for whatever it is that you are offering. Keyword research also allows you to identify and target specific niches. For example, nearly three times as many people type ‘kitchen door handles’ into a search engine each month as type ‘internal door handles’, even though they are basically the exact same item.

The next step is on-site optimization. This process involves altering page names, headings and tags to include your chosen keywords. This is necessary so that the search engine robots (known as spiders) can determine what your website is about. The content of the website is also rewritten to include relevant keywords throughout the site. The best approach is to have each page of a website target one specific keyword or phrase.

Still, the rewritten content must also read well for any human visitor. Not only will it convince potential customers to make their purchase, but informative, exciting content is often shared around the web via blog and article sites, generating back links to your website from other people’s websites. Each back link from another website is regarded by the search engines as an indication that your website is a quality one, and the more links you get, the higher you will rank for your chosen keywords.

17 November 2009 2 Comments

Robots.txt SEO Techniques

This post is a long but important one. I recommend you grab a cup of hot chocolate before you start :)

If you have not heard of the robots.txt file, it is simply a small file located in your website root directory that instructs search engines on what they can and can’t do. Although not strictly enforced, search engine bots will generally respect the rules set out in the robots.txt file. With a properly configured robots.txt file you can, for example, attempt to fend off spam bots, tell Google not to index your images or instruct bots to skip pages that may contain duplicate content.

Bots are pieces of software used by search engine companies, spammers and content aggregators to crawl the internet for new or modified content. A bot’s job is to follow links on a website, crawling from page to page and site to site. It’s kind of like a Six Degrees of Kevin Bacon thing. Follow enough links and you should eventually find all the content on the net. This is why backlinks are so important. The more backlinks you have, the easier it is for search engines to find your content. There are literally millions of bot instances trawling the net at any one time. The official term for a bot is a user-agent, of which there are thousands. Let’s take Google for example. Google has many different user-agents used to index your site, extract images and videos, find news feeds, find mobile phone content, check your site for AdSense quality and so on. This site details a complete list of known user-agents.
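The link-following idea above can be sketched as a breadth-first walk over a link graph. This is only a toy illustration in Python, not how any real search engine bot is built: the LINKS dictionary and its page names are made up for the example, and nothing is fetched from the network.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/post"],
    "c.example/post": [],
    "orphan.example/": ["a.example/"],  # links out, but nothing links to it
}


def crawl(start, links):
    """Return pages in the order a breadth-first bot would discover them."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:  # never queue the same page twice
                seen.add(target)
                queue.append(target)
    return order


found = crawl("a.example/", LINKS)
print(found)
print("orphan.example/" in found)  # False: no backlinks, so never discovered
```

The orphan page links out to the rest of the graph, but because no page links to it, the walk never reaches it. That is the backlink point in miniature.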

The robots.txt file has been around for ages. It was actually proposed by Martijn Koster in 1994, and it remains a staple food for web spiders. For a complete description of the file and its standard notation, visit here. In short, a robots.txt file can restrict specific bots from crawling your entire site or part thereof. To make this possible, every bot has a special signature. For example, Google’s index bot is called Googlebot, Bing’s bot is called MSNbot, and Yahoo’s bot is called Yahoo! Slurp.

An entry in the robots.txt file may look like this:

User-agent: Slurp
Allow: /public*/
Disallow: /*_print*.html

Here we are telling the Slurp user agent that it can access all pages located in any directory starting with “public”, but has no access to pages with “_print” in the URI.
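To see how patterns like the two above are matched against a URL path, here is a minimal Python sketch. It assumes the common wildcard extension in which * matches any run of characters and a trailing $ anchors the end of the path; the function names are hypothetical, and real bots may differ in details such as how they resolve conflicts between Allow and Disallow.

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then let '*' stand for "anything".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern, path):
    """True if the pattern matches the path from its first character."""
    return rule_to_regex(pattern).match(path) is not None


print(rule_matches("/public*/", "/public_html/index.html"))          # True
print(rule_matches("/public*/", "/private/index.html"))              # False
print(rule_matches("/*_print*.html", "/articles/seo_print_v2.html")) # True
print(rule_matches("/*_print*.html", "/articles/seo.html"))          # False
```

Matching is a prefix match by default, which is why "/public*/" covers everything underneath a matching directory and not just the directory itself.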

Below is a complete robots.txt file for one of my experimental WordPress sites (I’ll post an article explaining what I mean by experimental site another day). Astute readers may note that I am disallowing all user agents from specific directories, while giving a few specific user agents access to the whole site. A more recent extension to the standard also allows me to list the location of my site map to help search engines find all of my pages.

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content
Disallow: /search/*/feed
Disallow: /search/*/*

User-agent: Mediapartners-Google
Allow: /

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Image
Allow: /

User-agent: Googlebot-Mobile
Allow: /

Sitemap: http://beginnerchess.org/sitemap.xml
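If you want to check how a file like this will be read before uploading it, Python's standard urllib.robotparser module can parse the rules from a string. The sketch below uses a cut-down copy of the file above. The standard-library parser works on plain path prefixes, so the two wildcard /search/ rules are left out, and the bot name SomeBot is made up for the example. Treat the results as an approximation, since each real bot has its own parser.

```python
from urllib.robotparser import RobotFileParser

# Cut-down copy of the file above (wildcard rules omitted).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content

User-agent: Googlebot-Image
Allow: /

Sitemap: http://beginnerchess.org/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from memory, no network access


def allowed(agent, path):
    """Ask the parser whether the given user agent may fetch the path."""
    return parser.can_fetch(agent, "http://beginnerchess.org" + path)


print(allowed("SomeBot", "/wp-admin/"))                          # False
print(allowed("SomeBot", "/about/"))                             # True
print(allowed("Googlebot-Image", "/wp-content/uploads/a.png"))   # True
print(parser.site_maps())  # requires Python 3.8 or later
```

A bot with its own section ignores the catch-all section, which is why Googlebot-Image can reach /wp-content here while other bots cannot.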

Disallowing bots from accessing content not intended for consumption will help ensure that your site remains keyword optimized on all pages, thus helping promote your site within the search engine rankings. Say, for example, you have worked hard at optimizing all pages for the keyword “weight gain” and the various long tails. Your work may be watered down in the eyes of the search engine if it is able to crawl your login page, privacy page and contact form.

Some SEO experts also argue that Google punishes young websites in favor of older, more established sites. Google apparently uses the Internet Archive to determine the age of a site. If it cannot find the site in the archive, it apparently assumes the site is a certain age. For this reason, many people actively stop the Internet Archive user-agent from indexing their site. This can be done by including the following lines:

User-agent: ia_archiver
Disallow: /

You may also want to stop image bots from accessing your pictures if you have borrowed non-stock images from other sites. This can be done like so:

User-agent: Googlebot-Image
Disallow: /

Finally, robots.txt can be used to exclude bots from specific pages that display content also available on other sites or pages. It is often argued that Google will punish your rankings for displaying duplicate content. I personally do not see this as a big issue and believe that duplicate content can actually help your site’s ranking in some instances (more about this another day). Anyway, to stop a bot from accessing a specific page, add the following lines:

User-agent: *
Disallow: /*my-duplicate-page.html

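Note that this is not a fool-proof method. The rule only stops well-behaved bots from crawling the page. If your disallowed page has links to it from another site, search engines can still discover its URL and may list it in their results without ever reading its content.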

I could keep going, but I’m sure you are all bored by now. Feel free to comment below or contact me directly if you wish to know more.

Happy roboting.