Ravens PHP Scripts: Forums
 

 

Ravens PHP Scripts And Web Hosting Forum Index -> NukeSentinel™
bergman
New Member


Joined: Mar 07, 2005
Posts: 10

Posted: Mon Mar 07, 2005 8:38 pm

A few days ago I tested downloading my site with HTTrack to see if everything worked, but Sentinel didn't block it.
Now I have just upgraded to Sentinel 2.0 and nothing has changed. You can still download the whole site with this kind of harvesting software.
Here is my configuration for harvesters...

[link hidden from unregistered users]

Does anyone know what the problem might be?
 
sixonetonoffun
Spouse Contemplates Divorce


Joined: Jan 02, 2003
Posts: 2496

Posted: Mon Mar 07, 2005 9:12 pm

Since it's based on the user agent, there is really only so much that can be done. If the user agent is spoofed or changed, it will get around the filter list.
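
To illustrate the point about user-agent filtering, here is a minimal PHP sketch. The function name agentIsBlocked, the two-entry block list and both user-agent strings are made up for illustration; this is not NukeSentinel's actual code.

```php
<?php
// Hypothetical helper, not NukeSentinel's real implementation:
// returns true when the user-agent string contains any blocked name.
function agentIsBlocked($userAgent, array $blockedNames)
{
    foreach ($blockedNames as $name) {
        // stripos() does a case-insensitive substring search.
        if ($name !== '' && stripos($userAgent, $name) !== false) {
            return true;
        }
    }
    return false;
}

// Illustrative block list and user-agent strings (assumptions).
$blocked = array('httrack', 'wget');
$honest  = 'Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)';
$spoofed = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/1.0';

var_dump(agentIsBlocked($honest, $blocked));   // bool(true)
var_dump(agentIsBlocked($spoofed, $blocked));  // bool(false)
```

The second call returns false even if the request comes from the same harvester, because the filter only ever sees the string the client chooses to send.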

southern
Client


Joined: Jan 29, 2004
Posts: 591
Location: Texas

Posted: Mon Mar 07, 2005 11:47 pm

Makes sense. Besides, I like HTTrack lol

_________________
Computer Science is no more about computers than astronomy is about telescopes.
- E. W. Dijkstra 
bergman
Posted: Tue Mar 08, 2005 5:34 am

No, I am sure that previous versions of Sentinel (I don't remember which one) could block harvesting software. I had also tested with different programs and they all got blocked.

Is there a way to block it then?

I am tired of users who want to download my site. I get about 3000 HTTrack hits per day.
 
sixonetonoffun
Posted: Tue Mar 08, 2005 7:29 am

I guess what I would do is take the exact user agent from your Apache log and compare it to the entries in your block list.

You can test different user agents with K-Meleon (kmeleon.sourceforge.net), Sam Spade's text browser or an online test to find what is working and what is not.
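
The comparison described above can be sketched in PHP. This assumes the Apache "combined" log format, where the user agent is the last double-quoted field on each line; the function names, the sample log line and the block-list entries are all hypothetical.

```php
<?php
// Pull the user agent out of one line of an Apache "combined" format log.
// Assumes the user agent is the last double-quoted field on the line.
function userAgentFromLogLine($line)
{
    if (preg_match('/"([^"]*)"\s*$/', $line, $m)) {
        return $m[1];
    }
    return null; // no trailing quoted field, so not a combined-format line
}

// Return the block-list entries found in the user agent (case-insensitive).
function matchingEntries($userAgent, array $blockList)
{
    $hits = array();
    foreach ($blockList as $entry) {
        if ($entry !== '' && stripos($userAgent, $entry) !== false) {
            $hits[] = $entry;
        }
    }
    return $hits;
}

// Made-up sample line; substitute a real line from your own access log.
$line = '192.0.2.10 - - [08/Mar/2005:07:29:00 -0600] "GET /index.php HTTP/1.0" 200 5120 "-" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"';
$agent = userAgentFromLogLine($line);
print $agent . "\n";
print_r(matchingEntries($agent, array('WebZIP', 'HTTrack')));
```

An empty result from matchingEntries would mean that none of the block-list entries appear in the logged user agent, which is one possible reason for a harvester getting through.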
 
southern
Posted: Tue Mar 08, 2005 12:13 pm

OK a bit of research finds

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (Tele|WebZIP|Crawl|Control|Offline|Fetch|Miner|HTTrack|Ninja|Online|Fresh|NetAnts|Reaper|Wget|archiver|GetRight|Copier|DA|Stripper|Pockey|Flash)
RewriteCond %{REQUEST_URI} !^/noleech\.html$
RewriteRule .* /noleech.html [L]

from a page whose link is hidden from unregistered users.

What you need to do is put a rewrite rule in your .htaccess so any harvester not covered by Sentinel gets stopped before it reaches your site. You can then redirect it to any page you want. Try a Google search on the HTTrack user agent if you crave more info...
 
southern
Posted: Tue Mar 08, 2005 1:21 pm

Luckily for you, bergman, I'm interested in this issue myself lol
Okeydoke, a bit more fun time on Google reveals

Code:

<?php
  //filename: websiteguard.php
  //--------------------------------------------------------------//
  // Purpose: To deny access to spambots, spybots and other bad agents.
  //          A good user agent is allowed through; for a bad one the
  //          PHP page stops executing, which protects the website
  //          from bad bots.
  // Inputs: UserAgent string
  // Author: Vivek [ webmaster AT allthewebsites DOT org ]
  // Version: 1.0.0
  //---------------------------------------------------------------//
  // $_SERVER replaces the old $HTTP_SERVER_VARS array, which current PHP
  // versions no longer provide. The header may be missing, so default to ''.
  $thisAgent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";
  //--- Call the function
  WebsiteGuard($thisAgent);
  //---------------------------------------------------------------//
  function WebsiteGuard($agent)
  {
    // Case-insensitive list of harvester and bad-bot names, split over
    // several lines so that no name is broken by a line wrap.
    $badAgents = "/webzip|httrack|wget|FlickBot|downloader|production bot|superbot"
               . "|PersonaPilot|NPBot|WebCopier|vayala|imagefetch|Microsoft URL Control"
               . "|mac finder|emailreaper|emailsiphon|emailwolf|emailmagnet|emailsweeper"
               . "|Indy Library|FrontPage|cherry picker|netzip|Share Program|TurnitinBot"
               . "|full web bot|zeus/i";
    if (preg_match($badAgents, $agent))
    {
      // Customize this message :-)
      print("Do not disturb...Zzz...\n");
      exit();
    }
  }
  //--------------------------------------------------------------//
?>

How to use this script?
Just "include" the script at the top of every PHP page.*

Advantages

It allows good bots like googlebot, webcrawler, ia_archiver etc. to crawl your website and list it in search engines.
It prevents human users from downloading your entire website with offline browsers.
It does not use .htaccess, so the code is portable to almost any platform that can run PHP scripts (Windows, Unix, Linux etc.).
It can be used with Apache (Linux, *nix) as well as IIS (Windows).
It is simple and lightweight.

A few limitations and disadvantages

If the user agent is unknown, empty or not provided, the script lets the client view the page. So protect your email addresses with a JavaScript method as well.
The list of bad bots is not complete. It is only a partial list.
If every known bad bot were added, it would slightly degrade the performance of your script.
It will not protect static HTML (.html, .htm) pages from bad bots. Use .htaccess (Apache) or some other script for those.
IPs can be blocked too, but that part of the code is not shown, since the addresses are specific to each website.
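
Two of the limitations listed above (empty user agents are let through, and the IP blocking code is not shown) could be handled along these lines. This is only a sketch and not part of the original websiteguard.php: the function name shouldDeny, the pattern and the IP address are placeholders, and refusing empty user agents is a policy choice that may also turn away some legitimate clients.

```php
<?php
// Sketch of a combined check; the IP list and the pattern are placeholders.
function shouldDeny($userAgent, $remoteIp, array $deniedIps, $badAgentPattern)
{
    // Treat a missing or blank user agent as suspicious (a policy choice).
    if (trim((string) $userAgent) === '') {
        return true;
    }
    // Exact-match IP deny list.
    if (in_array($remoteIp, $deniedIps, true)) {
        return true;
    }
    // Otherwise fall back to the user-agent pattern.
    return preg_match($badAgentPattern, $userAgent) === 1;
}

// Possible usage at the top of a page:
// $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
// $ip = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';
// if (shouldDeny($ua, $ip, array('192.0.2.10'), '/httrack|wget|webzip/i')) {
//     exit("Do not disturb...Zzz...\n");
// }
```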

*Just include it at the top of mainfile.php, AFTER the Sentinel code, as that file is called by every other PHP page.
Hope this helps, it did me :)
 
southern
Posted: Tue Mar 08, 2005 4:21 pm

The above code works fine. I included websiteguard.php in my mainfile.php, downloaded and installed HTTrack, ran it against my own website, and it didn't get anything. Of course, I have all my 'critical' files plus all directories in my robots.txt, and HTTrack respects that unless it is turned off in Options... Anyway, I got a 'Get out of here!', which is the message I put in the websiteguard code, so I'm sure it blocks HTTrack as well as other harvesters. Voila, bergman, your problem is solved, and mine lol
 
bergman
Posted: Tue Mar 08, 2005 5:16 pm

I shall put it in my mainfile and test it now...
 
bergman
Posted: Tue Mar 08, 2005 5:26 pm

Yes, it really works. This is really a must. Thank you southern.
 
montego
Site Admin


Joined: Aug 29, 2004
Posts: 9455
Location: Arizona

Posted: Tue Mar 08, 2005 5:46 pm

Sounds like a good addition to a future Sentinel release????

hitwalker
Sells PC To Pay For Divorce


Joined:
Posts: 5661

Posted: Tue Mar 08, 2005 6:31 pm

mmm :?
So what does this mean...?
Nice try Bob, but your "I block harvesters" doesn't work?

Has anyone ever come up with the idea of going to the websites of the authors who create the harvester scripts?

Personally I think the attacks are fewer now than months ago...

I still get friendly bots and they behave, but 3000 hits by harvesters a day?
And they can't even grab anything useful...
 
southern
Posted: Tue Mar 08, 2005 9:06 pm

You're a cynic, hitwalker lol. Sentinel does block a lot of harvesters, but there are some it doesn't catch. I've gone to the HTTrack site, which is interesting, and by the way, in its EULA HTTrack asks that it NOT be used for harvesting, so if it is, it's a misuse... Anyway, it is open source under GPL licensing, same as Nuke and FileZilla, so it can't be all bad. I dunno where other harvesters come from, but if I find out I'll use HTTrack to download their site. :)

Glad to help, bergman :)
 
hitwalker
Posted: Wed Mar 09, 2005 3:58 am

Cynic?... ha... ha... it's a Dutch thing I think... lol
I'm not too concerned about website harvesters.
Maybe 2 a month get banned, that's all.
And by visiting the websites of those who create the programs that copy websites, I mean...
it would be easier to get to know the program and find ways to reject it on visit.
 
bergman
Posted: Wed Mar 09, 2005 9:27 am

Code:
Has anyone ever come up with the idea of going to the websites of the authors who create the harvester scripts?


??????????????????

Code:
And by visiting the websites of those who create the programs that copy websites, I mean...


What is it about?
 
hitwalker
Posted: Wed Mar 09, 2005 9:40 am

Anything not clear?
 
southern
Posted: Wed Mar 09, 2005 2:31 pm

I think it's a good idea to visit harvester sites and I'll do it if I can find where to go... ah, a new project lol. I don't get too many harvesters either, like 2 or 3 a month, so I got into this as a kind of knowledge for knowledge's sake. ;)
 
hitwalker
Posted: Wed Mar 09, 2005 2:37 pm

Well, I mean the sites that create the software...
the grabbers...
the site copiers...
If you can understand the technique they use, it should be simple enough to write something to reject them... ban them or whatever...
 
bergman
Posted: Wed Mar 09, 2005 5:33 pm

Southern, Sentinel 2.2 has been updated, if you read the news on nukescripts.net. This one really blocks harvesters, without websiteguard.php.
 
All times are GMT - 6 Hours
 