iRobots.txt SEO
Download the latest version: 1.1.1 (updated on 18 February 2010)
Version 1.1.1 is now released! It brings several improvements over the previous release, most notably the ability to edit the robots.txt file directly. For a complete list of changes, please refer to the Revision History section below.
Please help! Assistance with interpretation, or simply suggesting new features, would be greatly appreciated. All contributors will be acknowledged on the settings page with a link to their site. Please register your interest via the Contact Me page or in a comment below. Thank you.
If you find the plugin useful, please vote for it here.
Features
iRobots.txt SEO (IRSEO) is a fully customizable robots.txt virtual file generator. IRSEO creates a highly optimized and secure robots.txt file straight out of the box. Users may choose to enable or disable specific user agents, directories or files using intuitive options, all of which include detailed instructions.
The robots.txt file is a text file located in the root directory of a website. Its purpose is to direct user agents (AKA bots) away from or towards specific files or directories. Keeping bots away from specific pages helps your website remain keyword optimized and keeps every indexed page relevant to your potential customers.
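To make the idea concrete, the sketch below uses Python's standard urllib.robotparser module to ask, the way a well-behaved crawler would, whether a bot may fetch two paths under a two-rule robots.txt. The domain example.com and the sample paths are placeholders, not part of the plugin. Note that this standard-library parser only does prefix matching and does not understand the wildcard extensions discussed further down.

```python
from urllib.robotparser import RobotFileParser

# A tiny robots.txt: every bot ("*") is kept out of two WordPress system folders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler performs this check before requesting each URL.
for path in ("/wp-admin/options.php", "/2010/02/hello-world/"):
    allowed = parser.can_fetch("Googlebot", "http://example.com" + path)
    print(f"{path}: {'crawl' if allowed else 'skip'}")
```

The admin path is skipped because it starts with a disallowed prefix; the post URL matches no rule and is therefore crawlable.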
IRSEO also inhibits several WordPress system directories and files by default. This keeps search bots from including security-sensitive pages in search results. For example, searching Google for inurl:wp-content name size description produces a list of sites whose content directories are open and indexed.
Note that IRSEO creates a virtual robots.txt file: nothing is written to disk, and the generated content is served whenever robots.txt is requested.
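The virtual-file idea can be sketched as a request filter. This is a hypothetical Python model, not the plugin's PHP code; `robots_response` and `build_robots_txt` are made-up names. It reflects two behaviours stated elsewhere on this page: the path comparison is case insensitive and ignores the host (so www and non-www URLs behave alike), and a physical robots.txt in the web root always takes precedence over the virtual one.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Optional


def robots_response(request_path: str, docroot: Path,
                    build_robots_txt: Callable[[], str]) -> Optional[str]:
    """Return the robots.txt body to send, or None if the request is not ours.

    Hypothetical model of a virtual robots.txt: nothing is written to disk.
    The host name is never inspected, so www and non-www requests behave alike.
    """
    if request_path.lower() != "/robots.txt":
        return None  # some other URL: let the CMS handle it as usual
    physical = docroot / "robots.txt"
    if physical.is_file():
        # A real file wins: the web server serves it before the CMS
        # (and therefore the plugin) ever sees the request.
        return physical.read_text()
    return build_robots_txt()  # generated on the fly from the current settings
```

For example, `robots_response("/ROBOTS.TXT", Path("/var/www"), lambda: "User-agent: *\nDisallow: /wp-admin/\n")` returns the generated text unless /var/www/robots.txt exists. Because the text is built per request, settings changes show up immediately.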
Download
Latest version: 1.1.1
Requires at least: WordPress 2.7
Tested up to: WordPress 2.9.2
The plugin can also be downloaded from the WordPress plugin repository.
You are free to use the plugin under the terms of the GPL.
Example robots.txt File
```
#######################################################
# iRobots.txt SEO

# Google Image
User-agent: Googlebot-Image
Allow: /
Disallow:

# Google Adsense
User-agent: Mediapartners-Google*
Allow: /
Disallow:

# Internet Archiver Wayback Machine
User-agent: ia_archiver*
Allow: /
Disallow:

# Digg Mirror
User-agent: duggmirror
Disallow: /

# All Bots
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /search/*/feed
Disallow: /search/*/*
Disallow: /*?*
Disallow: /*?
Disallow: /readme.html
Disallow: /license.txt
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Allow: /wp-content/uploads/
Allow: /*?page=*

# Sitemap
Sitemap: http://markbeljaars.com/sitemap.xml.gz

#######################################################
#
# Robots.txt file generated by iRobots.txt SEO v1.0
# by Mark Beljaars
#
# http://markbeljaars.com/plugins/irobotstxt-seo
#
#######################################################
#
# Note:
# The Allow directive and wildcards (*) in filenames are
# not standard robots.txt syntax, however they are
# supported by most new search engines.
```
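The note above says wildcards and Allow are extensions. The sketch below is a small, hypothetical matcher for Google-style patterns, where `*` matches any run of characters and a trailing `$` anchors the end of the URL. For resolving conflicts it assumes the precedence Google documents (and RFC 9309 later adopted): the longest matching pattern wins, and Allow wins a tie. Other search engines may resolve conflicts differently, and this is not the plugin's own code. The rule list is a subset of the "All Bots" record in the example.

```python
from __future__ import annotations

import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Google-style robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    Everything else is matched literally, as a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Decide whether `path` (including any query string) may be crawled.

    The longest matching pattern wins; on a tie, "allow" beats "disallow".
    A path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:  # an empty Disallow blocks nothing
            continue
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and directive == "allow"):
                best_len, allowed = length, directive == "allow"
    return allowed


# A subset of the "All Bots" record from the example file.
ALL_BOTS_RULES = [
    ("disallow", "/cgi-bin"),
    ("disallow", "/wp-admin/"),
    ("disallow", "/wp-includes/"),
    ("disallow", "/wp-content/"),
    ("disallow", "/*?*"),
    ("disallow", "/*.php$"),
    ("disallow", "/*.gz$"),
    ("allow", "/wp-content/uploads/"),
    ("allow", "/*?page=*"),
]

for path in ("/wp-content/uploads/2010/02/photo.jpg", "/?page=2",
             "/?s=robots", "/index.php", "/sitemap.xml.gz"):
    print(path, "allowed" if is_allowed(path, ALL_BOTS_RULES) else "blocked")
```

Under these assumptions, uploads and paginated URLs stay crawlable because their Allow patterns are longer than the competing Disallow patterns. The same logic reproduces the conflict Kirk reports in the comments: /sitemap.xml.gz is caught by Disallow: /*.gz$, and adding the longer Allow: /sitemap.xml.gz lets it through.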
Installation
- Download the plugin from http://markbeljaars.com/download/current/plugins/irobotstxt-seo.zip.
- Extract and upload the plugin to your /wp-content/plugins/ directory and activate it.
- Edit the plugin settings using the admin page located under Settings.
Screenshots
Configuration Page

Configuration
General Options
- Use strict robots.txt standard definition: The official robots.txt standard only identifies the directories or files a search engine may not index; it has no directive for stating which files a search engine may index. Google has extended the definition with an Allow directive and with wildcards in file names. Although it is not part of the official standard, the amended syntax is understood by most search engines.
- Automatically add the website sitemap to the robots.txt file: Sitemaps inform search engines of your site structure and also let you indicate how often your pages are likely to change. Search engines naturally find this sort of information useful. The sitemap protocol is defined here. Sitemaps can be generated automatically by WordPress plugins such as Google XML Sitemaps Generator.
- Inhibit indexing of WordPress system folders: WordPress system folders such as the plugin and content directories are not keyword optimized and therefore should not be indexed by a search engine. Indexing system folders may also present a security risk.
- Do not allow duplicate content: WordPress has many ways of displaying the same post, including by tag, by category or by author. To Google, this looks like multiple pages with the same content. Some argue that Google penalizes sites with a lot of duplicate content; others argue that Google favors sites with many pages. Use this option to inhibit or allow some duplicate content.
- Allow Google Adsense to access entire site: Google Adsense automatically determines which ads are relevant for your audience by crawling the contents of your site. Giving Adsense full access to your site may result in more targeted advertisements. Ignore this option if you have not implemented Adsense.
- Inhibit indexing by the Internet Archive: The Internet Archive is a not-for-profit organization that aims to archive all information on the Internet at regular intervals. It is speculated that Google uses the Internet Archive to determine the age of a website when assessing the site’s authority. Some SEO experts recommend that young websites block the Internet Archive. The Internet Archive also raises issues of document control (old versions of your posts may be archived), intellectual property rights and privacy.
- Inhibit image indexing: You may wish to stop search engines from indexing your images if they are copyrighted, have been dubiously obtained (that is, they infringe copyright), are unrelated to your site, or are unlikely to generate traffic. Affiliate marketers may also find that images generate untargeted traffic, which lowers a site’s conversion ratio.
- Inhibit indexing by the Dugg Mirror: Duggmirror provides a mirror for the most popular stories on Digg.com. Sites are often overloaded by the traffic Digg sends their way, causing the page to become unavailable. To alleviate this so-called “digg effect”, Duggmirror hosts a mirror of the most popular stories so they remain available to Digg users. The problem is that Google may index the Duggmirror page before the source and in turn divert traffic from your site to the mirror.
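Options such as the Internet Archive and Dugg Mirror settings work through per-agent records: a bot obeys the record naming it and falls back to the `*` record otherwise. The sketch below assumes each inhibited bot gets `Disallow: /` (as the duggmirror record in the example does) and checks the result with Python's urllib.robotparser. It uses the plain name ia_archiver rather than the example's ia_archiver*, because that parser matches user-agent names by substring, not by wildcard.

```python
from urllib.robotparser import RobotFileParser

# Assumed output with "Inhibit indexing by the Internet Archive" and
# "Inhibit indexing by the Dugg Mirror" both switched on.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: duggmirror
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://example.com/2010/02/hello-world/"  # placeholder post URL
for agent in ("ia_archiver", "duggmirror", "Googlebot"):
    # Each bot uses its own record if one names it, else the "*" record.
    allowed = parser.can_fetch(agent, url)
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

The two named bots are blocked from every page, while Googlebot, which has no record of its own, only loses access to /wp-admin/.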
Advanced Configuration
Custom records can be added to or deleted from the robots.txt file using this form. A complete list of user agents can be found at http://www.user-agents.org/. Examples of robots.txt directive strings (the text that follows the Allow or Disallow directive) can be found at http://www.robotstxt.org/robotstxt.html. Google’s non-official extensions are described in detail in this blog post. Note that all Allow records, and all directive strings that use wildcard globbing, are ignored if Use strict robots.txt standard definition is selected.
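The effect of the strict option on records can be sketched as a filter applied when the file is rendered. This is a hypothetical Python model; the record format, the function names, and the choice to treat `$` as a wildcard are assumptions, not the plugin's internals. In strict mode it drops every Allow line and every path containing `*` or `$`, omits a record left with no rules, and appends a Sitemap line when a sitemap URL is configured.

```python
from __future__ import annotations

from typing import List, Optional, Tuple

# Hypothetical record format: (user_agent, [(directive, path), ...]).
Record = Tuple[str, List[Tuple[str, str]]]


def is_strict_rule(directive: str, path: str) -> bool:
    """True if a rule uses only original-standard syntax: Disallow, no wildcards."""
    return directive.lower() == "disallow" and "*" not in path and "$" not in path


def render_robots_txt(records: List[Record], strict: bool = False,
                      sitemap_url: Optional[str] = None) -> str:
    lines: List[str] = []
    for agent, rules in records:
        if strict:
            rules = [(d, p) for d, p in rules if is_strict_rule(d, p)]
        if not rules:
            continue  # assumption: omit a record that has no rules left
        lines.append(f"User-agent: {agent}")
        lines.extend(f"{d}: {p}".rstrip() for d, p in rules)
        lines.append("")  # a blank line ends each record
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


records = [
    ("Googlebot-Image", [("Allow", "/"), ("Disallow", "")]),
    ("*", [("Disallow", "/wp-admin/"), ("Disallow", "/*.php$"),
           ("Allow", "/wp-content/uploads/")]),
]
print(render_robots_txt(records, strict=True,
                        sitemap_url="http://example.com/sitemap.xml.gz"))
```

In strict mode the Googlebot-Image record keeps only its empty Disallow line, which still means "allow everything", so dropping Allow: / loses nothing there. The uploads exception, however, disappears, which is why strict mode can block more than the extended syntax does.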
View Robots.txt
View or edit the complete virtual robots.txt file.
- Enable free form editing: Enables manual editing of the robots.txt file. Use caution, as a badly formed robots.txt file may seriously affect search engine rankings. Note that once free form editing is enabled, the general and advanced configuration settings can no longer be modified. Further, when free form editing is disabled again, any manual changes to the robots.txt file are lost.
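Because a badly formed file can do real damage, it may be worth sanity-checking hand-edited content before saving it. The following is a hypothetical lint in Python; the plugin is not stated to perform any such check, and the list of recognised directives is an assumption kept deliberately short.

```python
from __future__ import annotations

# Assumption: a short, non-exhaustive list of directives worth accepting.
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def lint_robots_txt(text: str) -> list[str]:
    """Return human-readable warnings for obviously malformed lines."""
    problems: list[str] = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing ':' separator")
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_DIRECTIVES:
            problems.append(f"line {number}: unknown directive {field!r}")
        elif field == "user-agent":
            seen_agent = True
        elif field in ("allow", "disallow") and not seen_agent:
            problems.append(f"line {number}: {field} before any User-agent line")
    return problems
```

An empty list means no obvious problems were found; it does not prove the rules do what you intend, so checking important URLs against the result is still advisable.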
FAQs
Does iRobots.txt SEO create or modify any files?
No. The robots.txt file served by IRSEO is virtual only. Your site will remain unmodified once the plugin is removed.
Where can I learn more about robots.txt?
The official robots.txt information site is http://www.robotstxt.org/. The Google robots.txt extensions are documented here.
Can I edit the robots.txt file freely?
Yes. In the “View Robots.txt” admin setting panel, select the “Enable free form editing” option. You will now be able to directly modify the robots.txt file from within this pane.
Revision History
1.0 - Initial public release.
- Modified admin setting section headers to expand the section if clicked anywhere within the header.
- Fixed defines, function names and l10n strings conflicting with the TOCC plugin.
- Replaced the PHP 5 function stream_get_contents with the backwards-compatible fgets, as suggested by Jay.
- Now detects the presence of XML Sitemap Generator and, if it is installed, posts a warning explaining that XML Sitemap Generator also generates a virtual robots.txt file. XML Sitemaps has an option for disabling robots.txt file generation. Again, thanks to Jay for this feedback.
- Fixed a bug that stopped the admin page loading on some systems.
- Added a “Settings” link to the plugin menu using code provided by Jay.
- It is now possible to free edit the robots.txt file from within the plugin admin panel.
- The robots.txt file is now served whether or not the URL contains the ‘www’ prefix. URL comparison is now also case insensitive.
- Added ‘sitemap.xml.gz’ to the robots allow all section.
- The XML Sitemap plugin warning is now hidden if the virtual robots.txt file is served correctly.
- The PHP code is now fully commented.
- Added nonce and admin checks to all administration panel settings changes (for security purposes).
- Moved all options into a single associative array, resulting in smaller, easier-to-follow code with fewer calls to the option table.
- Fixed a file close bug in the irseo_file_exists function that caused an error on some blogs.
- Added option to allow or filter duplicate content.
Comments
Please let me know what you think. Leave a comment if you have a requested feature, found a bug or need some help. All are welcome.
23 Responses to “iRobots.txt SEO”
Trackbacks/Pingbacks

05. Feb, 2010
[...] iRobots.txt SEO [...]

29. Jan, 2010
Social comments and analytics for this post…
This post was mentioned on Twitter by buildweb: iRobots.txt SEO – MarkBeljaars.com: Dec 9, 2009 … iRobots.txt SEO is a fully customizable robots.txt vir.. http://bit.ly/8NVwso #seo…

23. Dec, 2009
[...] iRobots.txt SEO (IRSEO) [...]

13. Dec, 2009
[...] iRobots.txt SEO (IRSEO) [...]

13. Dec, 2009
[...] I recommend trying it. Download iRobots.txt SEO. Share with your [...]

13. Dec, 2009
[...] using intuitive options, all of which include detailed instructions. Download and more info at: iRobots.txt SEO – MarkBeljaars.com [...]

12. Dec, 2009
[...] iRobots.txt SEO (IRSEO) [...]

12. Dec, 2009
[...] iRobots.txt SEO by Mark Beljaars is a virtual robots.txt file creator. The features of the WordPress plugin include [...]

11. Dec, 2009
[...] SEO, 11 December 2009 | Author: KHK. iRobots.txt SEO by Mark Beljaars is a virtual robots.txt file creator. Its features include, among [...]

09. Dec, 2009
[...] more: iRobots.txt SEO – MarkBeljaars.com [...]

09. Dec, 2009
[...] Full details of the plugin can be found at http://markbeljaars.com/plugins/irobotstxt-seo/. [...]

Hi Mark,
I have installed your plugin but cannot see it anywhere in the source.
If I don’t see it does that mean it is not working?
Hi Natalie,
Can you explain this a little more? Do you have iRobots.txt options in the admin settings pane? You can check whether the plugin is working by typing the following into a browser:
http://yoururl/robots.txt
For example, my site’s robots.txt file is located at: http://markbeljaars.com/robots.txt.
If the robots.txt file does not exist, there are three likely reasons:
1. The plugin failed to install for some reason. Remove the previous install, then download the plugin from my site. In the admin panel, select Plugins > Add New and click the Upload link. Upload the zip file and activate it.
2. Your theme is overriding virtual file requests.
3. Your .htaccess file is blocking the robots.txt file.
Mark.
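The browser check described above can also be scripted. This is a minimal Python sketch; the function names are hypothetical, and it assumes the plugin's banner comment (as shown in the example file near the top of this page) is still present in the output, which free-form edits may remove.

```python
from urllib.request import urlopen

# Banner text from the example file; assumed to be present in generated output.
SIGNATURE = "generated by iRobots.txt SEO"


def fetch_robots_txt(site: str, timeout: float = 10.0) -> str:
    """Download <site>/robots.txt; a 404 raises urllib.error.HTTPError."""
    with urlopen(site.rstrip("/") + "/robots.txt", timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def looks_like_irseo(robots_txt: str) -> bool:
    """True if the text carries the plugin's banner comment."""
    return SIGNATURE.lower() in robots_txt.lower()
```

For example, `looks_like_irseo(fetch_robots_txt("http://markbeljaars.com"))`. An HTTPError with status 404 corresponds to the "file does not exist" case above, while a file that loads but lacks the banner suggests a physical robots.txt or another plugin is answering instead.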
Mark,
I just installed your plugin and I get the following error.
Warning: fclose(): supplied argument is not a valid stream resource in /Library/WebServer/Documents/xxxxxxx/wp-content/plugins/irobotstxt-seo/irobotstxt-seo.php on line 322
Running WP 2.9.1. Any ideas?
Andy
Hi Andy,
I haven’t seen this error before. What version of PHP are you running? If you are happy editing PHP files, you can simply delete this line and the plugin should start working (in PHP, files are closed automatically if the programmer forgets to close them). In any case, I will fix the code tonight (Australian time) and upload a new revision.
Mark.
Hi Andy,
I just released an update that fixes this bug and adds a new option to disallow duplicate content. Hopefully it works OK for you now.
Mark.
Hi, I have one question: when I use your plugin to create the robots.txt file, does your plugin upload it to the website's directory, or do I have to copy the file and upload it via FTP?
Thanks, the best robots.txt plugin so far.
Hi Jadah,
The robots.txt file is actually a virtual file. The plugin detects when somebody (or somebot) tries to display the http://www.website/robots.txt file and intercepts this request. The plugin then outputs a text stream that makes it look as though the file is being sent. In this way, there is nothing you need to upload. Also, any changes to the robots.txt file are reflected immediately. There is one catch with this method, though. If a physical robots.txt file exists (i.e. one that you have uploaded via FTP), it will always be displayed instead of the virtual file. This means that you will need to delete (or rename) any physical robots.txt files to use this plugin.
Hope this helps,
Mark.
I have one question. I noticed in my Google Webmaster Tools account that Google cannot access the “sitemap.xml.gz” file because it’s restricted by the robots.txt file generated by your plugin. (My sitemaps are generated by the “Google XML Sitemaps” plugin, and “Add sitemap to virtual robots.txt” is unchecked as you instruct.)
Checking the text of the generated robots.txt file it lists the location of the sitemap as “just-thinkin.net/sitemap.xml.gz” (I have both “sitemap.xml” and “sitemap.xml.gz” in my root directory) yet it blocks access to this very sitemap by the entry “Disallow: /*.gz$”.
Although Google can access my “sitemap.xml” file okay I’m still confused as to why the sitemap location listed in the generated robots.txt is purposely blocked to all bots at the same time. Is this actually how things are supposed to work? Google was always able to access both sitemaps when using my old manual robots.txt (the old one had a problem with validity that I could never track down).
Thanks for a fine plugin BTW.
Hi Kirk. Great detective work! You are of course completely correct. The disallow does stop any .gz files from being crawled. The reason .gz files are normally disallowed is to ensure that backup files (gz is an extension used for compressed files) are not indexed. I will fix this conflict in the next release. This should be due in less than a week and will enable free-form editing of the resulting robots.txt file.
In the interim, you can fix this by opening the iRobots SEO settings page and selecting “Advanced Configuration”. In the User Agent text box, type “*”. Select “allow:” in the pull down and type “/sitemap.xml.gz” in the text box next to the pulldown. Finally press the “Add Custom Record” button.
When you view the resulting robots.txt file, you should see a couple of lines like this:
# Custom Records
User-agent: *
Allow: /sitemap.xml.gz
Hope this helps.
Good deal and thanks for working on a new release. I’ve added the custom record. Works fine.
I support you. It would be best if it allowed free editing.
Good idea Sevi. Will add free editing to the next release.
Mark.