Ravens PHP Scripts: Forums
Ravens PHP Scripts And Web Hosting Forum Index -> Security - PHP Nuke
Guardian2003
Site Admin
Joined: Aug 28, 2003
Posts: 6799
Location: Ha Noi, Viet Nam

Posted: Fri Oct 28, 2005 2:46 pm

I moved my site to a new dedicated server a couple of weeks ago and have just noticed some quite high server resource usage.
Checking the server logs, there are a heck of a lot of requests for
Quote:
GET / HTTP/1.1

Should this be of any concern?
Not having seen this before, I am unsure whether it is a search engine bot or something else.
I have also noticed this request being attributed to another Nuke site on the server, so I think it might possibly be some server misconfiguration on my part?
Operating system: Linux
Kernel version: 2.6.13.1.dn1.p4
Apache version: 1.3.33 (Unix)
Perl version: 5.8.7
PHP version: 4.3.11
MySQL version: 4.1.13-standard


Last edited by Guardian2003 on Sat Oct 29, 2005 11:11 pm; edited 1 time in total 
Raven
Site Admin/Owner
Joined: Aug 27, 2002
Posts: 17088

Posted: Fri Oct 28, 2005 2:54 pm

Looks like an intentional flood to me, based strictly on what you have posted. They are probably using an automated program to flood you with HEAD requests.
 
Guardian2003

Posted: Fri Oct 28, 2005 3:02 pm

Yes, that could very well be the case.
Most of the GET requests seem to follow immediately after a call to the main domain URL, with just the odd URL here and there being appended with the GET request.
I think I should take a closer look at the incoming IPs - but there are sooo many lol. Though I think some, if not all, could well be spoofed anyway - the logs show over 800 Google bots on my site at 13:30 today, which is way, way too many for my site, which normally only sees about 100 at a time.

Thanks for your insight Raven, I will do some more digging.
 
Guardian2003

Posted: Sat Oct 29, 2005 3:07 pm

It seems the culprit is a hundred or so inktomisearch.com bots crawling hundreds of pages each.

I have added a crawl delay to my robots.txt file to see if that will slow them down enough; if not, I will have to start banning them.

*Update - the Crawl-delay directive had little effect; Yahoo just sent more bots, so I am banning Slurp entirely until I can resolve this issue.
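For reference, a minimal robots.txt sketch of the two approaches mentioned above - the non-standard Crawl-delay directive (which Yahoo's Slurp crawler recognises) and an outright disallow of the Slurp user agent. The 10-second delay is just an example value:

```
# Ask Yahoo's crawler to pause between requests
# (Crawl-delay is a non-standard extension, ignored by many bots)
User-agent: Slurp
Crawl-delay: 10

# Or ban it from the whole site instead:
# User-agent: Slurp
# Disallow: /
```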
 
j_felosi
Regular
Joined: Oct 17, 2005
Posts: 51

Posted: Sat Oct 29, 2005 5:41 pm

Oh yeah, them Inktomi bots are vicious, but they will obey robots.txt so you can disallow them. I had compiled a list of all the bad bots over the last 2 years and they are number 1. I've heard some stories about them suckers draining 100 MB of bandwidth in a day. I'm just happy with Google; I don't know what the deal with Yahoo is. You're not the only person this has happened to.
 
Guardian2003

Posted: Sat Oct 29, 2005 11:16 pm

Looks like I may have to contact Yahoo over its bots' behaviour; even with a deny-all in robots.txt it is STILL sending new bots to the site, and according to one link popularity checker I use, the number of pages listed at Yahoo has gone from 400 to over 32,000 in a couple of days.

Another naughty bot I spotted is
81.177.7.37 cp29.agava.net
This one also seems to be making requests for several hundred pages at a time - though the range has now been banned.
 
j_felosi

Posted: Sun Oct 30, 2005 12:22 am

You must not have Sentinel, huh? You can ban their user agent with Sentinel, or just ban their ranges. That would be a big help; don't waste your time with Yahoo. I've heard stories about people thinking they were under DDoS attack because of them bots. I have them denied in my robots.txt and in Sentinel. If it ignores robots.txt then that is almost illegal, I think; I always thought they obeyed mine. But yeah, if you have Sentinel you can ban their user agent, and each bot that comes will get banned. I'd say they have ranges of IPs just for them bots, so it probably wouldn't keep out any legitimate users to ban every range they come in on. Good luck. By the way, what type of site do you have?
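As an illustration of the user-agent approach (independent of Sentinel), here is a sketch of an .htaccess fragment for the Apache 1.3 server mentioned earlier in the thread, assuming the default mod_setenvif and mod_access modules are available; "Slurp" is the Yahoo user agent discussed above:

```apache
# Deny any request whose User-Agent header contains "Slurp"
BrowserMatchNoCase "Slurp" bad_bot
<Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Limit>
```

Unlike robots.txt, this is enforced by the server, so it works even against bots that ignore robots.txt.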
 
Guardian2003

Posted: Sun Oct 30, 2005 5:21 am

Yes, I have Sentinel, and the first range block I did resulted in 4 emails from registered users saying they were banned lol.
I have not looked at banning the user agent in Sentinel, so perhaps that might work better.

As for the type of site I have, just click the WWW link below this post.
 
j_felosi

Posted: Sun Oct 30, 2005 5:04 pm

OK, cool. Well, I say if you block that user agent it will do it; also, there are other user agents for Yahoo bots. I think Inktomi or whatever is some company Yahoo has contracted. I wish you the best; I saw your admin message but your site was flying, so it looks like it may be over. Thanks for the new user agent on that bad bot you posted above.

I have a little trick I like to use to find out the bad bots, or rather the ones that actually read robots.txt to look for stuff to crawl. What you do is, at the end of robots.txt you add
User-agent: *
Disallow: /email_addresses/
Then you make a page there with a spambot trap - my favourite is attrition.org's spambot trap, which is the email addresses of all the people in charge of anti-spam organizations, so the spambots harvest their addresses and spam them; that will usually take care of them. But anyway, check your awstats, see what IPs are hitting the email addresses page, and ban 'em immediately. Those kinds of bots are bad - they will crawl stuff like admin pages and everything, I don't know how but they do. That's where the term "googledork" comes from: people who don't do their robots and permissions right, so their sensitive info ends up on the net. Hope that helps. I have compiled a list of all the bad bots I've seen do that, and just overall bad behaviour; if you want it I'll email it to you.
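The awstats check described above can also be done straight from the raw Apache access log. A minimal sketch, assuming a common/combined log format (field 1 is the client IP, field 7 the requested path) and the hypothetical /email_addresses/ trap directory from the robots.txt trick:

```shell
# List, by hit count, every IP that requested the disallowed trap
# directory - only bots that ignore robots.txt should ever reach it.
awk '$7 ~ /^\/email_addresses\// { print $1 }' access_log | sort | uniq -c | sort -rn
```

The resulting IPs are candidates for banning in Sentinel or .htaccess.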
 
Guardian2003

Posted: Sun Oct 30, 2005 11:22 pm

Thanks for the additional information.
For now I have banned all bots, including Google - any that return will be receiving a Sentinel special LOL.

I have also sent Yahoo a bill for my bandwidth costs, just to see what they do.

At its peak, they were consuming nearly 1 GB of bandwidth an hour, and although they didn't manage to crash the server, it did come pretty close.
 
j_felosi

Posted: Sun Oct 30, 2005 11:26 pm

Good luck, although you probably want to keep the Google bots; they behave well and help promote your site.
 
Steptoe
Involved
Joined: Oct 09, 2004
Posts: 293

Posted: Mon Oct 31, 2005 12:57 pm

Quote:

they were consuming nearly 1 GB of bandwidth an hour

Hmm, I have LookSmart, Yahoo, SearchNZ, Google, MSN and a couple of others crawling our site almost non-stop.
Total bandwidth for our modest site is about 1.5 to 2.5 GB a month, up and down.
Where do you take your bandwidth readings from? Are you sure you are only measuring Apache and mail server usage?
With good meta tags, robots.txt and a Google sitemap you can restrict them to just the areas they need to go. With a bit of editing of files, links that are not needed for those not logged in can be removed: poster/author details, group CP, contact details etc.
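A sketch of the kind of robots.txt scoping described above, using paths that exist on a typical PHP-Nuke install; adjust the list to your own layout, and remember that well-behaved crawlers will honour it while bad bots may not:

```
# Keep all crawlers out of areas that don't need indexing
User-agent: *
Disallow: /admin.php
Disallow: /includes/
Disallow: /blocks/
Disallow: /themes/
Disallow: /language/
```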

_________________
My Spelling is NOT incorrect, it's Creative 
Guardian2003

Posted: Mon Oct 31, 2005 2:22 pm

The reading was taken from cPanel/awstats for the domain, not the server's WHM, so I guess there is room for some discrepancy.
Even so, there was no excuse for the Inktomi bot behaving in such an unfriendly manner. robots.txt has been used to ban all bots so I can see more clearly which remaining bots are ignoring it, so I can ban them.
I have had over 100 Google bots on site many times but they never gave me any problems.
I have removed the Google bot ban already as, quite frankly, I don't want to stop them crawling the site - I just want to get a better feel for the ones that are ignoring robots.txt.

Normally I only see around 7 to 8 GB of bandwidth a month for this domain, though it has been growing steadily for the last 3 or 4 months.

Good suggestion about restricting access to some stuff to registered users only; that's something I should definitely look at. Obviously there is stuff that I want crawled, so it has to be 'public', but equally there is stuff that doesn't need crawling, which badly behaved bots can be restricted from accessing.
 
Steptoe

Posted: Mon Oct 31, 2005 2:36 pm

Quote:

Good suggestion about restricting access to some stuff to registered users only
[ Only registered users can see links on this board! Get registered or login! ]
 


Powered by phpBB © 2001-2007 phpBB Group
All times are GMT - 6 Hours
 