An even better way to stop bad bots is to trap them with PHP. Here's the code should anyone fancy using it:
index.php:
<?php
header("Location: http://www.yourwebsite.com/index.php");
exit; // stop the script once the redirect has been sent
?>
.htaccess:
IndexIgnore *
Now make a "blackhole" to trap them by creating the directories /sandtrap and /ban-ip.
Create /sandtrap/index.php:
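<?php
// Log the visitor's IP address, one per line.
// $_SERVER replaces the old register_globals variable $REMOTE_ADDR.
$ip = $_SERVER['REMOTE_ADDR'] . "\n";
$banip = '/path/to/ban-ip/ban-ip.txt';
$fp = fopen($banip, "a");
if ($fp) {
    fputs($fp, $ip);
    fclose($fp);
}
?>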
Anyone coming in to /sandtrap/index.php will have their IP address written to the file
/ban-ip/ban-ip.txt. What you are doing is setting a trap:
Place a hidden link somewhere on your home index page that points to /sandtrap/index.php. It should not be so far up the page that the indexers use it, but as high as you can get it. DO NOT place any readable email addresses on that page, though: harvesters work through links in order, so the first link logs the IP to our real-time blackhole list before the harvester can reach the remaining links.
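The trap script appends an address on every hit, so a bot that keeps coming back fills ban-ip.txt with duplicates. Here is a minimal sketch of a logger that writes each address only once. The helper name record_trapped_ip is made up for illustration and is not part of the original code; it assumes PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the original post): append $ip to the ban
// list unless it is already there. Returns true only when a new line is written.
function record_trapped_ip(string $ip, string $banFile): bool
{
    // Refuse anything that is not a syntactically valid IPv4/IPv6 address.
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return false;
    }

    // Read the current list; a missing or unreadable file counts as empty.
    $known = is_file($banFile)
        ? file($banFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
        : [];
    if ($known === false) {
        $known = [];
    }

    if (in_array($ip, $known, true)) {
        return false; // already listed
    }

    // LOCK_EX keeps two simultaneous hits from interleaving their writes.
    return file_put_contents($banFile, $ip . "\n", FILE_APPEND | LOCK_EX) !== false;
}

// Intended use inside /sandtrap/index.php (path as in the post):
// record_trapped_ip($_SERVER['REMOTE_ADDR'], '/path/to/ban-ip/ban-ip.txt');
```

Re-reading the whole file on each hit is fine for a small list. A very large ban list would probably want a database or a keyed store instead.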
Stick this in robots.txt:
User-agent: *
Disallow: /sandtrap
Disallow: /ban-ip
Ban anyone from reading your ban-ip.txt in .htaccess:
SetEnvIfNoCase Request_URI ban-ip\.txt ban
<Files ~ "^.*$">
order allow,deny
allow from all
deny from env=ban
</Files>
Force everyone to go through your home page by placing this code in all the child pages:
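<?php
// $_SERVER replaces the long-removed $HTTP_SERVER_VARS array.
$engine = file('/path/to/indexer.txt');
$ref = $_SERVER['HTTP_REFERER'] ?? '';
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
$home = "yourwebsite.com";
$browse = 0;
// Visitors arriving from one of our own pages may browse.
if (stristr($ref, $home))
{
    $browse = 1;
}
// So may the good bots listed in indexer.txt.
foreach ($engine as $indexer)
{
    $indexer = rtrim($indexer);
    // Skip blank lines: an empty needle is an error or matches everything,
    // depending on the PHP version.
    if ($indexer != "" && stristr($ua, $indexer))
        $browse = 1;
}
// Everyone else is sent to the home page first.
if ($browse == 0)
{
    header("Location: http://www.yourwebsite.com/index.php");
    exit;
}
?>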
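The same decision can be pulled out into a function, which makes it easy to try against sample referers and user agents without a web server. This is a sketch only: may_browse is a hypothetical name, and the list of indexers is passed in as an array rather than read from indexer.txt.

```php
<?php
// Hypothetical helper (not from the original post): decide whether a request
// may view a child page. $home is the site's host name; $indexers is the
// list of good-bot names, e.g. the lines of indexer.txt.
function may_browse(string $referer, string $userAgent, string $home, array $indexers): bool
{
    // Visitors who arrived via a link on our own site are let through.
    if ($referer !== '' && stripos($referer, $home) !== false) {
        return true;
    }

    // Known crawlers are let through, matched case-insensitively on the user agent.
    foreach ($indexers as $indexer) {
        $indexer = trim($indexer);
        if ($indexer !== '' && stripos($userAgent, $indexer) !== false) {
            return true;
        }
    }
    return false;
}
```

Bear in mind that both the referer and the user agent are sent by the client and can be forged, so this check only deters the lazier bots.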
Stick this at the top of the index page:
<?php
$sandtrap = file('/path/to/ban-ip/ban-ip.txt');
$ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
$ip = $_SERVER['REMOTE_ADDR'];
$punish = 0;
// No user agent at all is treated as a bad bot.
if ($ua == "")
{
    $punish = 1;
}
foreach ($sandtrap as $blockip)
{
    $blockip = rtrim($blockip);
    // Exact match: a substring test would let "1.2.3.4" also ban "11.2.3.45".
    if ($ip === $blockip)
        $punish = 1;
}
if ($punish == 1)
{
    echo "<html><head><title>Access Denied</title></head>
<body><p>The software you are using to access our website is not allowed.</p></body></html>";
    // Stop here so the rest of the page is not sent.
    exit;
}
?>
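When checking an address against the ban list it is safer to compare whole addresses than to search for substrings, since a substring search for 1.2.3.4 also hits 11.2.3.45. A minimal sketch of the check as a standalone function follows. The name should_block is hypothetical, and the ban list is passed in as an array of lines.

```php
<?php
// Hypothetical helper (not from the original post): should this request get
// the "Access Denied" page? $banned is the ban list as an array of lines,
// e.g. the result of file('/path/to/ban-ip/ban-ip.txt').
function should_block(string $ip, string $userAgent, array $banned): bool
{
    // An empty user agent is treated as a bad bot, as in the snippet above.
    if ($userAgent === '') {
        return true;
    }

    foreach ($banned as $line) {
        // Compare whole addresses; trim() drops the trailing newline.
        if (trim($line) === $ip) {
            return true;
        }
    }
    return false;
}
```

On the index page this would be called with $_SERVER['REMOTE_ADDR'] and the user agent string, printing the denial page and exiting when it returns true.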
In the file indexer.txt, list all the good bots, one per line:
Ask Jeeves
FAST-WebCrawler
GoogleBot
ia_archiver
IBM_Planetwide
Inktomi
Scooter
Slurp
WISENutbot