Ravens PHP Scripts: Forums
 

 

Ravens PHP Scripts And Web Hosting Forum Index -> Raven's RavenNuke(tm) v2.02.02 Distro
Bluezzz
Involved



Joined: Feb 08, 2005
Posts: 290
Location: USA

Posted: Fri Sep 22, 2006 4:59 pm

I have the same robots.txt in the /public_html/ directory as I do in my /nuke/ directory. However, these directories obviously do not contain the same folders and files ... nuke is a directory (folder) under the public_html directory. Am I to assume that each needs to list its own folders/files that I want to disallow?

One site I saw said to do a robots.txt file disallowing the following (which obviously is for the site's /public_html/ folder):
/_private/
/_vti_bin/
/_vti_cnf/
/_vti_log/
/_vti_pvt/
/_vti_txt/
/cgi-bin/
Being private folders and not being in my nuke folder, I assume I need a whole new robots.txt for the public_html folder listing those folders above that I don't want accessed? Do I also need one for the www folder?

Should googlebot be able to view ... ?
--> Your_Account
--> Private_Messages
My IP_Tracker says some of these are being accessed by googlebot even tho the /Modules/ and /Admin/ folders are listed as Disallow in the robots.txt

Should I do these two steps as indicated by a site I was reading for the robots.txt?
Ban Alta-Vista *Scooter* bot altogether
--> User-Agent: Scooter
--> Disallow: /

Ban *Googlebot-Image* bot altogether
--> User-Agent: Googlebot-Image
--> Disallow: /

Finally, should I allow or disallow these two folders in the nuke robots.txt?
--> /Experimental/
--> /import/

IP_Tracker is also showing these, who are they???
164.82.146.3 - gw1.dc.gov
162.58.0.224 - nat.jccbi.gov
I know they're US Gov't sites ... I'm just not sure why they'd be visiting my lil space on the web LOL Are they ok or should I ban em?

_________________
Bluezzz
~ Stop & smell the roses, while you can! ~ 
Guardian2003
Site Admin



Joined: Aug 28, 2003
Posts: 6799
Location: Ha Noi, Viet Nam

Posted: Fri Sep 22, 2006 5:42 pm

Bluezzz wrote:
I have the same robots.txt in the /public_html/ directory as I do in my /nuke/ directory. However, these directories obviously do not contain the same folders and files ... nuke is a directory (folder) under the public_html directory. Am I to assume that each needs to list its own folders/files that I want to disallow?

One site I saw said to do a robots.txt file disallowing the following (which obviously is for the site's /public_html/ folder):
/_private/
/_vti_bin/
/_vti_cnf/
/_vti_log/
/_vti_pvt/
/_vti_txt/
/cgi-bin/
Being private folders and not being in my nuke folder, I assume I need a whole new robots.txt for the public_html folder listing those folders above that I don't want accessed?

You would only need those extra directories listed if they exist. You would normally only find those on a Windows server or a server which has had the FrontPage extensions added.
Quote:

Should googlebot be able to view ... ?
--> Your_Account
--> Private_Messages
My IP_Tracker says some of these are being accessed by googlebot even tho the /Modules/ and /Admin/ folders are listed as Disallow in the robots.txt

There is nothing to be gained by letting Google or other bots access those, and it will just waste bandwidth.

Quote:

Should I do these two steps as indicated by a site I was reading for the robots.txt?
Ban Alta-Vista *Scooter* bot altogether
--> User-Agent: Scooter
--> Disallow: /

Ban *Googlebot-Image* bot altogether
--> User-Agent: Googlebot-Image
--> Disallow: /

That is entirely a personal preference thing - Sentinel will block a lot of bad bots by default anyway.
Quote:

Finally, should I allow or disallow these two folders in the nuke robots.txt?
--> /Experimental/
--> /import/

They shouldn't be left on the server, so no.
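
For anyone who wants to see how a per-bot ban like the Googlebot-Image one behaves before uploading it, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt text, the example.com host and the image path are made up for illustration; only the two directives come from the question above. Keep in mind that crawlers only request /robots.txt at the root of the site, so the copy that counts is the one in the web root.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ban Googlebot-Image from the whole site,
# keep every other bot out of /cgi-bin/ only.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# example.com and /images/logo.gif are placeholders, not real site paths.
image_bot_allowed = parser.can_fetch("Googlebot-Image", "http://example.com/images/logo.gif")
googlebot_allowed = parser.can_fetch("Googlebot", "http://example.com/images/logo.gif")

print("Googlebot-Image allowed:", image_bot_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

With these rules the image bot is refused everywhere while the regular Googlebot can still fetch the same file, which is the effect being asked about. This only models well-behaved bots; a bot that ignores robots.txt is a job for something like Sentinel.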
 
Bluezzz







Posted: Fri Sep 22, 2006 6:40 pm

The server I'm on is linux.

I don't want google picking up my images so I will add that one to both public and nuke directories ... altavista one I'm not sure why it would matter so I'll leave it out.

I've removed these two folders ... thank you for telling me : o}

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
More questions ...

1) So then make a new robots.txt file to keep googlebot (and others) out of the main public_html directory blocking access to the _private folders?

2) Why is googlebot getting access to Your_Account and Private_Messages when both are covered under /modules/ for Disallow? I noticed that someone used /modules (no trailing slash) ... which is correct in robots.txt ... ?
/modules/
or
/modules

3) What about those two gov't sites that IP_Tracker is showing? Are these ok to leave on or should they be banned or what? Normal or not?
 
Guardian2003







Posted: Fri Sep 22, 2006 11:54 pm

You don't need to make a new robots.txt; you simply need to add those folder paths to it if you do not want bots to crawl those folders.

I need to explain a bit more fully for others whilst replying to your point 2 above.
The normal robots.txt directive for /modules is there to stop well-behaved bots from crawling inside the actual folder itself. As all Nuke sites have links to SOME of the files inside that folder, such as the index.php files in all of the different modules, those will quite naturally and legitimately get crawled and indexed.
So to put it in really simple terms, it tells the bots not to go 'snooping' in that folder, but they are allowed to follow other links even if they go to files within that folder.
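
On the /modules/ versus /modules question, a quick way to see the difference is to test both forms. Disallow values are matched as a prefix of the URL path, so the trailing slash matters. The sketch below uses Python's standard urllib.robotparser and assumes (this is an assumption about a typical Nuke install, not something taken from this site) that module pages are requested as modules.php?name=Your_Account, with the files themselves living under /modules/Your_Account/. The example.com host is a placeholder.

```python
from urllib.robotparser import RobotFileParser


def allowed(disallow_path, url, agent="Googlebot"):
    """Return True if a robots.txt holding one Disallow rule lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_path}"])
    return parser.can_fetch(agent, url)


# Assumed URL styles for illustration only.
MODULE_URL = "http://example.com/modules.php?name=Your_Account"  # page as linked in menus
FILE_URL = "http://example.com/modules/Your_Account/index.php"   # file inside the folder

for rule in ("/modules/", "/modules"):
    print(f"Disallow: {rule}")
    print("  modules.php?name=... allowed:", allowed(rule, MODULE_URL))
    print("  /modules/.../index.php allowed:", allowed(rule, FILE_URL))
```

Under that assumption, "Disallow: /modules/" keeps bots out of the folder but does not cover modules.php?name=Your_Account, because that path does not start with /modules/. The form without the trailing slash covers both, at the cost of also hiding every other module page from the bots.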


As for the gov't sites, I'm going to leave you to research those, but you would need to find out the IP for those URLs, then also cross-check to see if the IPs give you back the web address. If not, why not?
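
That cross-check (IP to hostname, then hostname back to IP) can be scripted. The following is a rough sketch using Python's standard socket module; the function name forward_confirmed and its reverse/forward parameters are my own invention so the lookups can be swapped out, and it handles IPv4 only.

```python
import socket


def forward_confirmed(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve `ip`, then check that the hostname resolves back to it.

    Returns (hostname, confirmed). hostname is None when the reverse
    lookup fails; confirmed is True only when the forward lookup of the
    hostname lists the original IP.
    """
    try:
        hostname = reverse(ip)[0]        # (hostname, aliases, addresses)
    except OSError:                      # socket.herror / socket.gaierror
        return None, False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return hostname, False
    return hostname, ip in addresses
```

Calling forward_confirmed("164.82.146.3") does live DNS lookups, so the answer depends on what the DNS says on the day you run it. A hostname that does not resolve back to the same IP is not proof of anything bad, but it is a reason to look closer.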
 
Bluezzz







Posted: Sat Sep 23, 2006 1:36 am

Thanks for the clarification. I'll leave that bot to wander for now and if I find out it's cataloging those particular folders I'll add them in the Disallow. I did go ahead and add the _private ones also tho.

As for the two gov't sites they are:

164.82.146.3 gw1.dc.gov
162.58.0.224 nat.jccbi.gov

Seems they show up in others' tracking also, as I did a Google search on them. The 1st one returns nothing with gw1, but without that it's the Washington, DC gov't page ... the second one (jccbi.gov) returned failed pages for the URL with and without nat. I dunno why they'd be interested in my site LOL ... it's just a lil ole thang and very new yet! Man they're quick!

0.o @ Big Brother LOL
 
All times are GMT - 6 Hours