Wednesday

Blocking Bots with .htaccess Ban List

Spiders and bots don't always follow RFCs or standards, but if you can block even some of the malicious ones with little effort, a ban list might be worth setting up.

You can’t block all of them, but you CAN keep your server load down and your access streamlined to your target audience. For instance, putting this in an .htaccess file will block a good number of them:

SetEnvIfNoCase User-Agent "^abot" bad_bot
SetEnvIfNoCase User-Agent "^aipbot" bad_bot
SetEnvIfNoCase User-Agent "^asterias" bad_bot
SetEnvIfNoCase User-Agent "^EI" bad_bot
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot
SetEnvIfNoCase User-Agent "^LWP" bad_bot
SetEnvIfNoCase User-Agent "^MSIECrawler" bad_bot
SetEnvIfNoCase User-Agent "^nameprotect" bad_bot
SetEnvIfNoCase User-Agent "^PlantyNet_WebRobot" bad_bot
.....

order allow,deny
allow from all
deny from env=bad_bot
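
The order/allow/deny lines above use the Apache 2.2 access-control syntax, which Apache 2.4 only supports through the mod_access_compat module. As a rough sketch, assuming an Apache 2.4 server with mod_setenvif and mod_authz_core available and .htaccess overrides that permit these directives, the same rule might be written like this (the single SetEnvIfNoCase line is only there to make the fragment self-contained):

```apache
# Flag a request when its User-Agent starts with "libwww-perl" (case-insensitive).
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot

# Apache 2.4 style: let everyone in except requests flagged as bad_bot.
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

Either way, a flagged request should get a 403 Forbidden response. Since the User-Agent header is easy to fake, this only stops bots that identify themselves honestly.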

Get the full list at Brontobytes (the second link below lists even more).

Bonus: The top 10 spam bot user agents you MUST block. NOW.
Bonus: A very long list of user agents to block from 0x000000.com

1 comment:

Johann said...

Thanks for linking to my blog!