Wednesday, July 15, 2009

How To Block Bots, Ban IP Addresses With .Htaccess

Got a spambot or scraper constantly showing up in your server logs? Or maybe there's another site that's leeching all your bandwidth? Perhaps you just want to ban a user from a certain IP address? In this article, I'll show you how to use .htaccess to do all of that and more!


Finding BadBots:


So you've noticed a certain user-agent keeps showing up in your logs, but you're not sure what it is, or if you want to ban it? There are a few ways to find out:

Once you've determined that the bot is something you want to block, the next step is to add it to your .htaccess file.

Blocking Bots by using .htaccess:

This example, and all of the following examples, can be placed at the bottom of your .htaccess file.

If you don't already have a file called .htaccess in your site's root directory, you can create a new one.

# Get rid of the bad bot
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^BadBot
RewriteRule ^(.*)$ http://go.away/

So, what does this code do? It's simple: the above lines tell your webserver to check for any bot whose user-agent string starts with "BadBot". When it sees a bot that matches, it redirects them to a non-existent site called "go.away".
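If you'd rather not send the bot anywhere at all, the same condition can refuse the request outright. The following is only a sketch of that alternative, not part of the original example: it swaps the redirect for the [F] flag, which returns a 403 Forbidden, and adds the optional [NC] flag so the match ignores upper and lower case.

```apache
# Alternative sketch: answer the bad bot with 403 Forbidden instead of redirecting
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^BadBot [NC]
RewriteRule ^ - [F]
```

The "-" in the rule means "leave the URL unchanged", so nothing is rewritten; the request is simply rejected.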

Now, that's great to start with, but what if you want to block more than one bot?

# Get rid of bad bots
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^BadBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EvilScraper [OR]
RewriteCond %{HTTP_USER_AGENT} ^FakeUser
RewriteRule ^(.*)$ http://go.away/

The code above shows the same thing as before, but this time I'm blocking 3 different bots.

Note the "[OR]" flag after the first two bot names: by default the server requires every condition to match at once, and "[OR]" tells it that matching any one of them is enough.
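When the list is short, the separate conditions can also be folded into one. This is a sketch of that variation, assuming the same three made-up bot names: the regex alternation (a|b|c) does the job of the "[OR]" flags.

```apache
# Sketch: one condition covering all three bots via regex alternation
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^(BadBot|EvilScraper|FakeUser)
RewriteRule ^(.*)$ http://go.away/
```

For a long list, one condition per bot is probably easier to read and maintain.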

Blocking Bandwidth Leeches:

Say there's a certain forum that's always hotlinking your images, and it's eating up all your bandwidth.

You could replace the image with something really gross, but in some countries that might get you sued!

The best way to deal with this problem is simply to block the site, like so:

RewriteEngine on
RewriteCond %{HTTP_REFERER} ^https?://.*somebadforum\.com [NC]
RewriteRule .* - [F]

This code will return a 403 Forbidden error for every request that arrives with somebadforum.com as its referrer, which includes any images hotlinked from that site.

The end result: users on that site will see a broken image, and your bandwidth is no longer being stolen.
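If you still want visitors to be able to follow ordinary links from that forum to your pages, the rule can be narrowed to image files only. The sketch below assumes your images end in .gif, .jpg, .jpeg or .png; adjust the extension list to match what you actually serve.

```apache
# Sketch: block only image requests referred from the offending forum
RewriteEngine on
RewriteCond %{HTTP_REFERER} ^https?://.*somebadforum\.com [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```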

Here's the code for blocking more than one site:

RewriteEngine on
RewriteCond %{HTTP_REFERER} ^https?://.*somebadforum\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*example\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://.*lastexample\.com [NC]
RewriteRule .* - [F]

If you want to block hotlinking completely, so that no one can hotlink your files, take a look at my article on using .htaccess to block hotlinkers.

IP address banning with .htaccess:

Sometimes you just don't want a certain person (or bot) accessing your website at all.

One simple way to block them is to ban their IP address:

order allow,deny
deny from 192.168.44.201
deny from 224.39.163.12
deny from 172.16.7.92
allow from all

The example above shows how to block 3 different IP addresses.

Sometimes you might want to block a whole range of IP addresses:

order allow,deny
deny from 192.168.
deny from 10.0.0.
allow from all

The above code will block any IP address starting with "192.168." or "10.0.0." from accessing your site.
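Partial addresses only work on whole-octet boundaries. For anything finer, or just to be explicit, the deny directive also accepts CIDR notation. As a sketch, the same two ranges would be written like this:

```apache
# Sketch: the same two ranges written as CIDR blocks
order allow,deny
deny from 192.168.0.0/16
deny from 10.0.0.0/24
allow from all
```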

Finally, here's the code to block a specific ISP by its hostname:

order allow,deny
deny from some-evil-isp.com
deny from subdomain.another-evil-isp.com
allow from all
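Keep in mind that blocking by hostname makes the server do a reverse DNS lookup on each visitor, which can slow requests down.

The order, deny and allow directives used in this article come from Apache 2.2; on Apache 2.4 they are only available through the mod_access_compat module. Assuming a 2.4 server, a rough equivalent using the Require directive might look like the sketch below, which reuses the placeholder addresses and hostnames from the examples above.

```apache
# Sketch for Apache 2.4: allow everyone except the listed addresses and hosts
<RequireAll>
    Require all granted
    Require not ip 192.168.44.201
    Require not ip 10.0.0.0/24
    Require not host some-evil-isp.com
</RequireAll>
```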
