Ravens PHP Scripts: Forums
Serafim
Worker

Joined: Mar 25, 2006
Posts: 109
Location: Delaware, USA

PostPosted: Sat Apr 01, 2006 10:05 am

Well, being a newb, I once again assumed that this file would just work. Problem is, I have to set it up. What I would like to do is get this site's input, since you understand exactly how these things work. I would like to have a robots.txt that keeps bots out of all areas except the main index. How would this be accomplished? TIA

Raven
Site Admin/Owner

Joined: Aug 27, 2002
Posts: 17088

PostPosted: Sat Apr 01, 2006 11:17 am

Here is pretty much the base standard. Note that you will only have /abuse/ if you have NukeSentinel(tm) installed.

User-agent: *
Disallow: /abuse/
Disallow: /admin/
Disallow: /blocks/
Disallow: /cgi-bin/
Disallow: /db/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /modules/
Disallow: /themes/
Disallow: /admin.php
Disallow: /config.php
 
Serafim

PostPosted: Sat Apr 01, 2006 11:35 am

OK, this is what I have:


User-agent: Mediapartners-Google*
Disallow:
User-agent: *
Disallow: admin.php
Disallow: /admin/
Disallow: /images/
Disallow: /includes/
Disallow: /themes/
Disallow: /blocks/
Disallow: /modules/
Disallow: /language/

Can something be added here to make this better? Or how can I just disallow all bots in general from everything but index.php? I would assume that search engines need to have something to probe.

And thanks Raven for responding
 
kguske
Site Admin

Joined: Jun 04, 2004
Posts: 6432

PostPosted: Sat Apr 01, 2006 3:22 pm

I'd go with Raven's suggestion. You don't want spiders hitting your admin and config files. Google will respect the robots.txt file, so there's no need to have a special user-agent for that.
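
For reference, a rough sketch of the stricter "everything except the front page" variant Serafim asked about might look like the lines below. This is illustrative only: Allow: and the $ end-anchor are extensions honoured by the major crawlers such as Google rather than part of the original robots.txt standard, so Raven's directory list is the more portable choice.

User-agent: *
# let crawlers fetch the bare site root and the front page script
Allow: /$
Allow: /index.php
# keep them out of everything else
Disallow: /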

_________________
I search, therefore I exist...
nukeSEO - nukeFEED - nukePIE - nukeSPAM - nukeWYSIWYG
 
Serafim

PostPosted: Sat Apr 01, 2006 3:28 pm

So the code that he posted, I just copy and paste into the file called robots.txt?
 
kguske

PostPosted: Sat Apr 01, 2006 3:57 pm

That will work. If you don't have all the directories, it won't matter since you're telling spiders not to look there anyway...
 
Serafim

PostPosted: Sat Apr 01, 2006 4:05 pm

OK, thanks, I will use that instead of what's there. And I assume that when you list something as a folder and want to protect all its contents, you use /folder/?
I have a few other areas in my root for test sites and things that I wish to include.
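
As a quick illustration of that (the folder names below are hypothetical), extra lines for test areas just get added to the existing User-agent: * block, and anything beneath a disallowed folder is covered:

# made-up test areas -- the trailing slash protects the whole directory
Disallow: /testsite/
Disallow: /old/
# a single file is listed without a trailing slash
Disallow: /config.php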
 
Guardian2003
Site Admin

Joined: Aug 28, 2003
Posts: 6799
Location: Ha Noi, Viet Nam

PostPosted: Sat Apr 01, 2006 4:33 pm

Serafim - you may need to look at your file again; it is missing the slash before admin.php.
If you copied and pasted the example Raven gave, that one is correct.

If you wanted to automatically block bots that ignore the robots.txt file, that is slightly more complicated Wink

There are example scripts if you Google, but a method I have found which is very dirty but effective is to place a URL in your robots.txt that will trigger Sentinel - so when a bot ignores the robots.txt instruction not to visit that URL, Sentinel is triggered and blocks the IP Smile
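
As a rough sketch of the idea (the path below is entirely made up; the point is only that it is a URL no legitimate visitor or crawler would ever request, and one that a NukeSentinel blocker has been set up to react to):

User-agent: *
# well-behaved bots read this line and stay away; a bot that requests the URL
# anyway trips the Sentinel blocker watching for it and gets its IP banned
Disallow: /modules.php?name=bot-trap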
 
Serafim

PostPosted: Sat Apr 01, 2006 6:17 pm

Sweet. Well, since they got dirty and ignored robots.txt, then fair is fair. Can you give an example that I may use? Thanks for all the help.
 
Guardian2003

PostPosted: Sun Apr 02, 2006 3:57 am

See the first disallow in my file.
 
montego
Site Admin

Joined: Aug 29, 2004
Posts: 9457
Location: Arizona

PostPosted: Sun Apr 02, 2006 7:12 am

That is just too funny! How many "bots" have you caught this way?

I personally would rather know about a true exploit vs. this "dirty bot" one, so I'd have to keep a close eye on the bans, and if I find one is from a bot, I would probably just add it to my .htaccess file in my "bad bot" section, then unban it.
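
For anyone unfamiliar with that approach, a typical "bad bot" section in .htaccess looks something like the sketch below (the user-agent string is a placeholder; substitute whatever the trapped bot identified itself as):

# flag requests whose User-Agent matches a known bad bot (name is hypothetical)
SetEnvIfNoCase User-Agent "EvilScraper" bad_bot
<Limit GET POST HEAD>
  Order Allow,Deny
  Allow from all
  Deny from env=bad_bot
</Limit>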

I would never have thought of that solution! Thanks Guardian Wink

_________________
Where Do YOU Stand?
HTML Newsletter::ShortLinks::Mailer::Downloads and more... 
Guardian2003

PostPosted: Sun Apr 02, 2006 7:35 am

Montego - yes I see your point and a very valid one it is too.
For your own purposes, you can add something like
Quote:
?id=TRAPPED
to the end of that string so it can be identified.
i.e. when the URL string is emailed via Sentinel or viewed from your server logs, if the URL does not have the ?id=TRAPPED at the end, it was a true exploit attempt. Wink

I have only 'trapped' about 6 bots using this method. Two of them were mass 'website downloader' type progs, so it has certainly been a worthwhile experiment.
I know 6 isn't that many, but when you consider that Sentinel is blocking a lot by default, 6 is quite a lot for the three months I have been using this.

I suppose one could even create a unique string pointing to an image file, which you could trap by creating a 'script blocker' in Sentinel for bad bots that are looking specifically for image extensions.
Hmm, now that's a thought...
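
Putting Guardian's earlier trap suggestion and this marker together, the trap line (the path itself is still hypothetical) would look something like the line below - written &id=TRAPPED here only because this example path already contains a query string:

User-agent: *
# the trailing TRAPPED marker distinguishes a sprung trap from a genuine exploit
# attempt when the hit shows up in Sentinel's email or the server logs
Disallow: /modules.php?name=bot-trap&id=TRAPPED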
 
montego

PostPosted: Sun Apr 02, 2006 7:42 am

This is just so simplistic! I absolutely love it!
 
Guardian2003

PostPosted: Sun Apr 02, 2006 7:53 am

A new tag line.........
Sentinel does the hard work, so you don't have to.
 
Serafim

PostPosted: Sun Apr 02, 2006 9:42 am

LOL, OK, you lost me, but I will add that to my robots.txt and see what happens, and just add the ?id=TRAPPED to the end of the string so I will know the dirty bot got trapped... When you do catch one, can they be reported, or is that a moot point?
 
Guardian2003

PostPosted: Sun Apr 02, 2006 9:59 am

You can report them if you wish, but unless you suddenly start getting lots of bans I wouldn't worry about it.
The main thing is, you are now automatically banning bad bots and saving precious (to some) bandwidth.
If you get the time, it is always worth following up by doing an IP trace and noting the results somewhere.
As I am quite lazy at times, I just open the email notification from Sentinel, reply to myself adding any notes, then save that email into a special folder in my mail software (Outlook).
 
Serafim

PostPosted: Sun Apr 02, 2006 12:26 pm

I wish to thank you for all your helpful tips and tricks. Within moments of installing that string I busted 2 dirty bots and they were banned. That is too funny ROTFL
 
kguske

PostPosted: Sun Apr 02, 2006 12:33 pm

Elegant idea, Guardian. Well done!
 
Guardian2003

PostPosted: Sun Apr 02, 2006 2:29 pm

Thank you kguske. If it helps this community and other nukers fight back in their war against such 'visitors' then I'm a happy chappy.
 
montego

PostPosted: Wed Apr 05, 2006 7:06 am

A word of caution that I thought of after posting to this thread: You MUST have NukeSentinel's UNION blocker turned ON at ALL times with "Block" turned on. Otherwise, you may have just had that "bot" cache your superadmin password!!!!!!!!!

Use with extreme caution or "dummy down" the "exploit" so that it still trips NukeSentinel but does NOT display anything meaningful to the bot if for some odd reason you accidentally have this turned off.
 
Serafim

PostPosted: Wed Apr 05, 2006 2:25 pm

Dummy down?? Was that a crack at me, lol? No, really, could you explain the dummy-down thing or give some sort of example? I have the union blocker on and set to email, block and forward. The forward goes to pc killer.
 
zzb
New Member

Joined: Jun 05, 2005
Posts: 22
Location: USA

PostPosted: Sat Apr 29, 2006 7:09 pm

Here are a couple of links that involve using the rewrite engine and two trapping directories... in addition, they trap mail harvesters:

http://ars.net/bots/

http://www.fleiner.com/bots/#identify

I have caught a few with this method as well!

Cheers.
ZZ
 
montego

PostPosted: Sun Apr 30, 2006 9:04 am

Quote:

Use with extreme caution or "dummy down" the "exploit" so that it still trips NukeSentinel but does NOT display anything meaningful to the bot if for some odd reason you accidentally have this turned off.


Sorry, Serafim, I must have missed your original question above about my comment. What I was referring to was not having the union which shows your nuke_authors table data. Instead of doing a union on that table, I'd try something "benign" like one of your empty tables, or create a new table with nothing in it and have the union select go against that table instead.

It is just a cautionary measure to ensure you are not inadvertently giving up admin users and passwords to be cached by the search engine. (Not sure if it will, but why take the chance? In fact, a human hacker could use this as an exploit if one happens to forget and leave that blocker off.)
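
To illustrate montego's point, a "dummied down" trap might look like the line below. Every name here is invented (nuke_trap_dummy would be an empty table you create yourself): the URL still contains the UNION pattern Sentinel reacts to, but even if every blocker were accidentally switched off, a crawler caching the response could only ever see an empty result, never nuke_authors data.

User-agent: *
# hypothetical trap: trips the UNION blocker but targets an empty placeholder
# table, so nothing sensitive can be returned or cached
Disallow: /modules.php?name=News&file=article&sid=1+UNION+SELECT+null+FROM+nuke_trap_dummy&id=TRAPPED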
 
Serafim

PostPosted: Sun Apr 30, 2006 9:14 am

No problem, Montego. I have all blockers active except flood protection. (Still waiting on the fix for that.) I am the only admin that has access to Sentinel, so the chances that the blockers will be shut off are slim. However, I am still new to the whole PHP-Nuke world. Do you perhaps have alternative code that I may add to my robots.txt that poses less of a threat to my database info? TIA
 
zzb

PostPosted: Sun Apr 30, 2006 9:14 am

This thread is very interesting. I suspect that by combining the power of Apache with some of the trapping methods above for violations of the robots.txt protocol, one might come up with a method of also notifying the admin by email when the trap has been sprung, rather than checking through server logs. That would confirm a bad robot indeed! In other words, if the robot is trapped, it deserves to be banned regardless of what it is used for. At least that is my opinion.

There was an Apache script I recall that I will try and share here at this site. Perhaps those here with a better understanding of the internals of NukeSentinel might be able to customize it to set up a foolproof set of traps that would leave no doubt you would want to ban the offending bot. If I can find it, I will post the code for you guys.
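
In the meantime, a bare-bones sketch of the mod_rewrite side of that idea (the /bot-trap/ directory and bot-trap.php script are made up; the script would be whatever logs the hit, emails the admin and issues the ban):

RewriteEngine On
# /bot-trap/ is only ever mentioned in robots.txt, so the sole visitors are
# robots that ignore it; hand them to a script that can notify the admin and ban the IP
RewriteCond %{REQUEST_URI} ^/bot-trap/ [NC]
RewriteRule .* /bot-trap.php [L]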
 