#%% Telling robots to bypass certain URLs
#
# This sample file is written to the "robots exclusion protocol"
# or "robots exclusion standard". Well-behaved robots (that's
# all the important ones!) use this file to check where they are
# unwelcome ... and they should then only crawl / use your other
# pages.
#
# robots.txt file for www.wellho.net and www.wellho.co.uk
# See
#   http://en.wikipedia.org/wiki/Robots.txt
#   http://www.robotstxt.org/robotstxt.html
# and checker at
#   http://tool.motoricerca.info/robots-checker.phtml
#
# Why do you want to exclude certain URLs when the whole point
# of having a web site is to give the public access to the
# information it contains? You'll see in my example that I've
# put a note beside each of the URLs listed.
#
# * I do NOT want search results within our site indexed, as they
#   would just hide the real pages
#
# * There is no point in the search engines trying to index all
#   possible accessibility combinations
#
# * CGI program outputs differ every time - no point in indexing them
#
# * The "happens" directory is our staff short cuts - not really a place
#   for new visitors to land!
#
# * The unique.html file is automatically generated from all the other
#   pages on our site and contains a list of possible spelling mistakes on
#   other pages - NOT what we want to be indexed under!
User-agent: *
Disallow: /cgi-bin/                  # Disallow cgi programs
Disallow: /net/unique.html           # Unique words
Disallow: /happens/                  # Our Staff Short Cuts
Disallow: /resources/mywellho.html   # Accessibility Options
Disallow: /net/search.php4           # Searches
Disallow: /demo/poc01.php?item       # Also searches
Disallow: /illust/                   # Short cuts to images
Disallow: /net/recents.html          # Pointless to index
Disallow: /net//                     # Suppress recursive pages in /net
Disallow: /resources//               # Suppress recursive pages in /resources
# Disallow: /net/maps.html
# Disallow: /net//maps.html
# Disallow: /net///maps.html
# Disallow: /net////maps.html
# Disallow: /net/////maps.html
# Disallow: /net//////maps.html
# Disallow: /resources/maps.html
# Disallow: /resources//maps.html
# Disallow: /resources///maps.html

User-agent: TurnitinBot
Disallow: /

# Note that blank lines are NOT allowed within a User-agent block!
# Under the original standard a blank line ends the record, so blank
# lines should only appear between blocks.