Ravens PHP Scripts: Forums
 

 

Ravens PHP Scripts And Web Hosting Forum Index -> General/Other Stuff
hitwalker
Sells PC To Pay For Divorce



Joined:
Posts: 5661

Posted: Thu Aug 11, 2005 4:17 pm

OK, here's the story.

I created my Google sitemap with CoffeeCup SiteMapper.
It works perfectly when you have a plain HTML site or when you use URL rewriting (as with GoogleTap).
So it would work great with Raven's forums and site (enjoy the program, Raven :) ).
The program can create the Google sitemap locally or it can index your site.
But my URLs look like this:
Code:
http://www.hitwalker.nl/phpx/html/modules.php?name=cpaneluserguide-addingAnonymousFTPmessage.html

But when I let the program index my site locally, it created links like:
modules/CPanel_User_Guide/addingAnonymousFTPmessage.htm
That's not what I wanted, because those links address the page file directly, and I'd rather avoid that.

So once the files were created, I did a mass search-and-replace of the URLs in the index.html and XML files.
You can see the result here: [link visible to registered users only]

The XML version is here: http://www.hitwalker.nl/sitemap.xml
As you can see, it has an error.
I checked it with the validator: [link visible to registered users only]

Direct link to the validator result: http://www.feedvalidator.org/check.cgi?url=http%3A%2F%2Fwww.hitwalker.nl%2Fsitemap.xml

What I don't understand is that the error marker points to the "=" in sitemap.xml.

Can anyone give me a hand solving this?
 
Guardian2003
Site Admin



Joined: Aug 28, 2003
Posts: 6799
Location: Ha Noi, Viet Nam

Posted: Thu Aug 11, 2005 9:57 pm

hitwalker, please PM me a link to a text file of your sitemap and I'll take a look for you.
The link you provided reports an error on line 56. I also see this line:
Code:
<loc>http://www.hitwalker.nl/phpx/html/modules.php?name=CPanel_User_Guide&page=addingAnonymousFTPmessage.htm&l...

Every '&' character needs to be written as '&amp;' (without the single quotes) for Google to accept the sitemap; an unescaped '&' makes the XML malformed.
CoffeeCup should have converted those automatically if it was going to produce compliant XML, which makes me wonder whether it simply 'indexes' files without any thought to converting special characters into XML entities.
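A minimal Python sketch of the escaping described above, assuming the sitemap entries are written out as plain strings. It uses the standard library's `xml.sax.saxutils.escape`, which always escapes `&`, `<` and `>`; the extra mapping for quote characters is an addition of this sketch. `loc_element` is a hypothetical helper, and the sample URL is only illustrative (the base path from the first post joined with the HOWTO page name).

```python
from xml.sax.saxutils import escape

# escape() always handles &, < and >; this sketch also escapes quote characters.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}

def loc_element(raw_url: str) -> str:
    """Wrap a raw (not yet escaped) URL in a <loc> element, escaping XML special characters."""
    return "<loc>" + escape(raw_url, EXTRA_ENTITIES) + "</loc>"

# Illustrative URL only: base path from the first post plus the HOWTO page name.
url = "http://www.hitwalker.nl/phpx/html/modules.php?name=PHP-Nuke_HOWTO&page=xoops-vs-post-nuke.html"
print(loc_element(url))
# <loc>http://www.hitwalker.nl/phpx/html/modules.php?name=PHP-Nuke_HOWTO&amp;page=xoops-vs-post-nuke.html</loc>

# Caution: escaping a URL that was already escaped escapes it a second time.
print(loc_element("a&amp;b"))  # <loc>a&amp;amp;b</loc>
```

The second call is why it matters to escape only raw URLs: mixing an escaping step with a manual search-and-replace can turn `&amp;` into `&amp;amp;`.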

I couldn't check much of your XML file, because the error halts parsing and stops me from getting the full source as text, but in the part I did check I found 24 errors.
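To find where a sitemap stops being well-formed, here is a rough Python sketch using the standard `xml.etree.ElementTree` parser. `first_error` is a hypothetical helper and the `example.com` URL is a placeholder. An unescaped `&page=` is read as the start of an entity reference named `page` that never gets its closing `;`, so parsing fails on that line; that may be why the validator's marker seemed to land on the `=`. The exact column the parser reports is not relied on here.

```python
import xml.etree.ElementTree as ET

def first_error(xml_text: str):
    """Return None if xml_text is well-formed, else the (line, column) of the first parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return err.position
    return None

# Placeholder sitemap fragment with an unescaped & on line 3.
bad = (
    "<urlset>\n"
    "<url>\n"
    "<loc>http://example.com/modules.php?name=PHP-Nuke_HOWTO&page=xoops-vs-post-nuke.html</loc>\n"
    "</url>\n"
    "</urlset>\n"
)
good = bad.replace("&page=", "&amp;page=")

print(first_error(bad))   # (line, column) tuple; the line is 3
print(first_error(good))  # None
```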

You might also want to try the free SoftPlus GSiteCrawler software discussed in another forum thread; I have found it the best so far. If you use GSiteCrawler, though, make sure your IP is protected in Sentinel if you have DDoS prevention turned on :)
 
hitwalker







Posted: Fri Aug 12, 2005 3:50 am

Sent a PM.
 
Guardian2003







Posted: Fri Aug 12, 2005 6:30 am

Done! I've PM'd you.
 
hitwalker







Posted: Fri Aug 12, 2005 6:33 am

PM? Huh?
Didn't get anything.
 
Guardian2003







Posted: Fri Aug 12, 2005 6:43 am

Just sent it LOL
 
hitwalker







Posted: Fri Aug 12, 2005 6:55 am

Got it, thanks. Almost didn't.
There seems to be a problem with the internet reaching parts of the UK. :(
But it does indeed work now.
Amazing how simple things can mess up an XML file.
 
softplus
New Member



Joined: Jul 31, 2005
Posts: 8
Location: Switzerland

Posted: Wed Aug 24, 2005 1:42 am

Hi hitwalker
Did the file come out of CoffeeCup like that? Ouch. I see you replaced the & with &amp;, BUT you still have another (typically American) problem. Your lastmod dates are incorrect:
<lastmod>2005-08-09T14:32:38+01,00</lastmod>
is not a correctly formatted ISO 8601 date/time.

It seems that CoffeeCup only works correctly when your regional settings are set to US :) :). This is something very many US-based companies get wrong: "let's think about the rest of the world later".

The tag should be:
<lastmod>2005-08-09T14:32:38+01:00</lastmod>

Note that the "," in the UTC offset should be a ":". To be on the safe side, my GSiteCrawler writes all times in UTC (easier to check). Perhaps you can set CoffeeCup to store dates only? You usually don't really need the exact time a URL was modified.
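A small Python sketch of both fixes mentioned above: repairing a `+01,00` offset to `+01:00`, and writing `lastmod` in UTC (or as a date only). The helper names `fix_offset` and `lastmod_utc` are hypothetical, the regex assumes the offset is always the last part of the stamp, and `datetime.fromisoformat` needs Python 3.7 or later.

```python
import re
from datetime import datetime, timezone

# Matches a trailing UTC offset written with a comma, e.g. "+01,00" or "-05,00".
COMMA_OFFSET = re.compile(r"([+-]\d{2}),(\d{2})$")

def fix_offset(stamp: str) -> str:
    """Turn '...+01,00' into '...+01:00'; stamps without a comma offset pass through unchanged."""
    return COMMA_OFFSET.sub(r"\1:\2", stamp)

def lastmod_utc(dt: datetime) -> str:
    """Format a timezone-aware datetime in UTC, e.g. '2005-08-09T13:32:38Z'."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

fixed = fix_offset("2005-08-09T14:32:38+01,00")
print(fixed)                  # 2005-08-09T14:32:38+01:00

dt = datetime.fromisoformat(fixed)  # parses the "+01:00" offset
print(lastmod_utc(dt))        # 2005-08-09T13:32:38Z
print(dt.date().isoformat())  # 2005-08-09 (date-only lastmod)
```

Emitting UTC with a trailing `Z` sidesteps locale-dependent separators entirely, which is the "easier to check" point made above.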

Good luck with the search-and-replace... every time you make a sitemap :(

If you want, I can send you a GSiteCrawler-generated sitemap file to compare. Or just download it and try it yourself :)

Cheers
John
 
hitwalker







Posted: Wed Aug 24, 2005 3:14 am

Hi, thanks for the reply.

Well, the huge search-and-replace was my own decision, because CoffeeCup works OK but generated the links in a way I don't like, especially the ones for the PHP-Nuke HOWTO module.
For example, a URL like:
modules.php?name=PHP-Nuke_HOWTO&page=xoops-vs-post-nuke.html
would come out as:
modules/PHP-Nuke_HOWTO/xoops-vs-post-nuke.html
which links directly to the HTML file.
But I will try yours and let you know.
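The search-and-replace described in this post can be sketched in Python as a rewrite from the direct form back to the `modules.php` form. This assumes every direct link has exactly the two-segment `modules/<name>/<page>` shape shown in the example; `to_modules_php` is a hypothetical helper, and other modules (such as the CPanel guide in the first post, which uses a different URL scheme) would need their own mapping.

```python
import re

# Assumed shape of the direct links CoffeeCup produced: modules/<name>/<page>
DIRECT_PATH = re.compile(r"^modules/([^/?#]+)/([^/?#]+)$")

def to_modules_php(path: str) -> str:
    """Rewrite modules/<name>/<page> to modules.php?name=<name>&page=<page>; leave other paths alone."""
    match = DIRECT_PATH.match(path)
    if match is None:
        return path
    name, page = match.groups()
    return f"modules.php?name={name}&page={page}"

print(to_modules_php("modules/PHP-Nuke_HOWTO/xoops-vs-post-nuke.html"))
# modules.php?name=PHP-Nuke_HOWTO&page=xoops-vs-post-nuke.html
print(to_modules_php("index.php"))  # index.php (unchanged)
```

The result is a raw URL: its `&` still has to be written as `&amp;` when it goes into the XML sitemap.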
 
softplus







Posted: Wed Aug 24, 2005 3:55 am

One thing you need to be aware of: if a crawler finds a link like that (modules/PHP-Nuke_HOWTO/xoops-vs-post-nuke.html), then it's linked that way somewhere on your website. So if you submit a sitemap with nukemanual-xoops-vs-post-nuke.html (which is usually better; that's why you use URL rewriting :) ), a search engine might find both (the first by crawling, the second from your sitemap).

The problem is that Google (I'm not sure about the others) will then have two URLs for the same content: "duplicate content". That CAN, and usually WILL, cause problems, especially when you submit sitemap files. Google will notice and either pick one and drop the other, or discard both (not good!), and it might even mark your site as "bad" (I doubt it, not because of a few duplicate-content URLs).

You should really check your site to make sure all internal links point to the rewritten URL. Otherwise you WILL run into this problem; I've seen many sites that have had to fight with something like that.

I would check the site with a crawler (CoffeeCup should do this OK, or whatever you prefer) and fix all the pages that link to the "incorrect" URL. Make sure that when you crawl the site, the crawler can only find the correct URLs! An alternative, if you can't fix them all: add rel="nofollow" to those links. However, if you do that, search engines won't follow those links, and Google PageRank won't be passed on to the subpages (not a good idea, but a temporary solution until you fix it properly).
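A rough Python sketch of the link check suggested above: collect the `href` values from a page with the standard `html.parser` module and flag any that still use the direct `modules/<name>/<page>.html` form. The `DIRECT_LINK` pattern and the `direct_links` helper are assumptions for illustration, not part of CoffeeCup or GSiteCrawler; a real check would run this over every page a crawler fetches.

```python
import re
from html.parser import HTMLParser

# Assumed shape of a "direct" link: .../modules/<name>/<page>.html (or .htm)
DIRECT_LINK = re.compile(r"(^|/)modules/[^/?#]+/[^/?#]+\.html?$")

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def direct_links(html_text: str) -> list:
    """Return the hrefs in html_text that point straight at a module's HTML file."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return [href for href in collector.hrefs if DIRECT_LINK.search(href)]

page = (
    '<p><a href="modules/PHP-Nuke_HOWTO/xoops-vs-post-nuke.html">old link</a> '
    '<a href="nukemanual-xoops-vs-post-nuke.html">rewritten link</a></p>'
)
print(direct_links(page))  # ['modules/PHP-Nuke_HOWTO/xoops-vs-post-nuke.html']
```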

Good luck,
John
 


Powered by phpBB © 2001-2007 phpBB Group
All times are GMT - 6 Hours
 
Forums ©