Spiders and bots don't always follow RFCs or standards, but if you can block some malicious spiders with little effort, it might be worth setting up.
You can't block all of them, but you can keep your server load down and your access streamlined to your target audience. For instance, putting this in an .htaccess file will block a good number of them:
SetEnvIfNoCase User-Agent "^abot" bad_bot
SetEnvIfNoCase User-Agent "^aipbot" bad_bot
SetEnvIfNoCase User-Agent "^asterias" bad_bot
SetEnvIfNoCase User-Agent "^EI" bad_bot
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot
# SetEnvIfNoCase matches case-insensitively, so one pattern covers both LWP and lwp
SetEnvIfNoCase User-Agent "^LWP" bad_bot
SetEnvIfNoCase User-Agent "^MSIECrawler" bad_bot
SetEnvIfNoCase User-Agent "^nameprotect" bad_bot
SetEnvIfNoCase User-Agent "^PlantyNet_WebRobot" bad_bot
.....
# Apache 2.2-style access control (on Apache 2.4 this requires mod_access_compat)
Order Allow,Deny
Allow from all
Deny from env=bad_bot
Get the full list at Brontobytes (the second link below lists even more).
Bonus: The top 10 spam bot user agents you MUST block. NOW.
Bonus: A very long list of user agents to block from 0x000000.com
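The Order/Allow/Deny directives above use the Apache 2.2 access-control syntax. On Apache 2.4 the same idea can be expressed with the Require directives from mod_authz_core. The following is only a sketch, assuming Apache 2.4 with mod_setenvif enabled and an AllowOverride setting that permits FileInfo and AuthConfig in .htaccess; the two user-agent patterns are simply reused from the list above.

```apache
# Flag requests whose User-Agent starts with a known bad prefix (case-insensitive)
SetEnvIfNoCase User-Agent "^abot" bad_bot
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot

# Apache 2.4 syntax: allow everyone except requests flagged as bad_bot
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

Flagged requests should get a 403 Forbidden response. As with the original rules, this only stops bots that send an honest User-Agent header.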
Wednesday
Blocking Bots with .htaccess Ban List
Security4all Blog
1 comment:
Thanks for linking to my blog!