NoFantasy (Worker)
Joined: Apr 26, 2005 · Posts: 114
Posted: Thu Dec 14, 2006 8:11 am
Some time back my web hosting provider messed up my site, and now I'm stuck with loads of links like these in Google's search results (and probably the other engines' as well):
Code:
backend.php?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f
forums.html?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f
A 301 redirect should take care of this, but I don't know how to strip PHPSESSID from ALL the cached links on my site in one go.
A "normal" way to implement a 301 is with
Code:
RedirectMatch 301 /someoldlink-([0-9]*)\.html http://www.domain.com/goodlink-$1.html
...right? But that won't work when a ? is involved in the link, because RedirectMatch only matches the URL path, not the query string. So I figured I have to use a format similar to
Code:
# The trailing ? in the target drops the old query string from the redirect
RewriteCond %{QUERY_STRING} (name=Forums&file=index) [NC]
RewriteRule ^.*$ /forums.html? [R=301,L]
...but what should the rule look like so that it works for any cached link containing PHPSESSID=(session id) without messing up anything else?
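To pin down what "without messing up anything else" means, here is a minimal sketch in Python (not Apache syntax) of the redirect being asked for: drop only the PHPSESSID parameter, keep every other parameter, and redirect only when something was actually removed. The function name `strip_phpsessid` is hypothetical, and comparing the parameter name case-insensitively (like mod_rewrite's `[NC]` flag) is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_phpsessid(url):
    """Hypothetical helper: return the 301 target for url with the
    PHPSESSID parameter removed, or None if there is nothing to strip."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Compare parameter names case-insensitively (assumption, like [NC]).
    kept = [(k, v) for k, v in params if k.lower() != "phpsessid"]
    if len(kept) == len(params):
        return None  # no session id in the URL: serve the page normally
    # Rebuild the URL with the remaining parameters (if any).
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

print(strip_phpsessid("/forums.html?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f"))
# /forums.html
print(strip_phpsessid("/modules.php?name=Forums&PHPSESSID=abc&file=index"))
# /modules.php?name=Forums&file=index
```

Keeping the other parameters matters for dynamic pages like `modules.php?name=...`, where dropping the whole query string would send visitors to the wrong page.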

montego (Site Admin)
Joined: Aug 29, 2004 · Posts: 9135 · Location: Arizona
Posted: Fri Dec 15, 2006 6:59 am
Maybe try something like this:
RewriteRule ^([a-zA-Z0-9._- ]*)?PHPSESSID=([a-z0-9]*)$ $1 [R=301,L]
It looks like you are already using some form of URL rewriting (such as GoogleTap, GTNG or ShortLinks), so you may need to add more characters to the first () group. You may also have to review your cached links to check that all of them are covered by this.

evaders99 (Former Moderator in Good Standing)
Joined: Apr 30, 2004 · Posts: 3221
Posted: Fri Dec 15, 2006 9:05 am
I don't think RewriteRule works with parameters; its pattern is only matched against the URL path, not the query string. I think you're going to have to use RewriteCond on QUERY_STRING. That has never really worked for me, so I use THE_REQUEST.
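To illustrate the point about parameters: a RewriteRule pattern is tested against the URL path only (in a .htaccess file, the path with the directory prefix removed), while the query string is exposed to RewriteCond as %{QUERY_STRING}, and %{THE_REQUEST} holds the full request line, e.g. `GET /forums.html?PHPSESSID=... HTTP/1.1`. The minimal Python sketch below mimics that split with `urllib.parse.urlsplit`; it illustrates what each side gets to see, not mod_rewrite itself, and `SESSID` is a simplified stand-in for the session-id part of the rule above.

```python
import re
from urllib.parse import urlsplit

# Simplified stand-in for the session-id part of the rule above.
SESSID = re.compile(r"PHPSESSID=([a-z0-9]+)", re.IGNORECASE)

def split_like_mod_rewrite(request_uri):
    """Hypothetical helper: return (path, query) the way mod_rewrite
    exposes them. A RewriteRule pattern only ever sees the path;
    the query string is only available as %{QUERY_STRING}."""
    parts = urlsplit(request_uri)
    return parts.path, parts.query

uri = "/forums.html?PHPSESSID=c6ee420bed2deb5b047ac722e46a440f"
path, query = split_like_mod_rewrite(uri)
the_request = f"GET {uri} HTTP/1.1"  # roughly what %{THE_REQUEST} holds

print(SESSID.search(path))                     # None: a RewriteRule pattern cannot see it
print(SESSID.search(query) is not None)        # True: RewriteCond %{QUERY_STRING} can
print(SESSID.search(the_request) is not None)  # True: so can RewriteCond %{THE_REQUEST}
```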

hitwalker (Sells PC To Pay For Divorce)
Posts: 5661
Posted: Fri Dec 15, 2006 9:11 am
Well, let me be the bearer of bad news...
A friend and I built a new site, which came with new links as well, and Google is only now replacing the old links with the new ones...
That took 5 months...
Yeah, Google is a lazy engine.

NoFantasy (Worker)
Joined: Apr 26, 2005 · Posts: 114
Posted: Fri Dec 15, 2006 10:06 am
Yes, I'm using the ShortLinks mod, which does the job very well.
The initial problem with the PHPSESSID was this post:
...I've solved that problem now, but Google did pick up quite a few of those links.
I'll go google THE_REQUEST and see if anyone already has a solution. Thanks for the suggestion.
Oh, lol... 5 months? I have three-year-old links in Google from before I started out with PHP-Nuke; Google still believes those pages are around.

hitwalker (Sells PC To Pay For Divorce)
Posts: 5661
Posted: Fri Dec 15, 2006 10:12 am
Yeah, Google is terrible...

montego (Site Admin)
Joined: Aug 29, 2004 · Posts: 9135 · Location: Arizona
Posted: Sat Dec 16, 2006 6:18 am
Well, Evaders, I'm going to have to research that, because I could have sworn it worked for me. I even tested it locally and checked the Apache access logs to see which status codes were returned. One thing I did not do, though, was check whether I could still find the "offending links" in Google's cache.

NoFantasy (Worker)
Joined: Apr 26, 2005 · Posts: 114
Posted: Sun Dec 17, 2006 9:07 am
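Code:
# Only fire when the query string starts with phpsessid= ([NC] = ignore case)
RewriteCond %{QUERY_STRING} ^phpsessid=.*$ [NC]
# Redirect permanently to the same path; the trailing ? drops the whole query string
RewriteRule .* %{REQUEST_URI}? [R=301,L]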
Don't ask me what it actually does, but it does remove the crap from inbound Google links! Let's hope I didn't break something else.
Now I'm eager to see whether they are gone from the Google cache in a month or two!
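For anyone wondering what those two lines do, here is a minimal Python emulation under simplifying assumptions (the path and query string are already split apart, one request at a time; `emulate_rule` is a hypothetical name). The RewriteCond fires only when the query string *starts* with phpsessid=, ignoring case because of [NC]. `RewriteRule .*` matches any path, and the substitution `%{REQUEST_URI}?` sends a 301 back to the same path, where the trailing ? tells mod_rewrite to drop the query string.

```python
import re

# Emulates: RewriteCond %{QUERY_STRING} ^phpsessid=.*$ [NC]
QUERY_COND = re.compile(r"^phpsessid=.*$", re.IGNORECASE)

def emulate_rule(path, query):
    """Hypothetical helper: return (status, location path) if the redirect
    fires, else None.

    Emulates: RewriteRule .* %{REQUEST_URI}? [R=301,L]
    '.*' matches every path, so only the condition decides; the target
    is the unchanged path with the query string removed.
    """
    if QUERY_COND.match(query):
        return 301, path
    return None

print(emulate_rule("/forums.html", "PHPSESSID=c6ee420bed2deb5b047ac722e46a440f"))
# (301, '/forums.html')
print(emulate_rule("/modules.php", "name=Forums&PHPSESSID=abc"))
# None: PHPSESSID is not the first parameter
print(emulate_rule("/modules.php", "PHPSESSID=abc&name=Forums"))
# (301, '/modules.php'): name=Forums is dropped as well
```

Two consequences follow from this reading: a link where PHPSESSID is not the first parameter is left alone, and when it is first, any other parameters are thrown away along with it.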
Btw, how will this addition to robots.txt work?
Code:
Disallow: /*phpsessid
I was thinking it can't hurt to have the block in robots.txt, right? Or will it work the other way, and stop the crawler from even reaching my redirect because of the block in robots.txt?

montego (Site Admin)
Joined: Aug 29, 2004 · Posts: 9135 · Location: Arizona
Posted: Mon Dec 18, 2006 5:46 am
Quote:
Now I'm eager to see whether they are gone from the Google cache in a month or two!

Me too.
Regarding robots.txt, I don't know enough about it to know whether it can block query strings...

hitwalker (Sells PC To Pay For Divorce)
Posts: 5661
Posted: Mon Dec 18, 2006 6:35 am
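How about ...
// See if the user agent is Googlebot (the header may be missing entirely)
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isGoogle = stripos($userAgent, 'Googlebot');
// If it is, only accept the session ID from a cookie, never from the URL.
// Session settings have to be changed before session_start() is called.
if ($isGoogle !== false) {
    ini_set('session.use_only_cookies', '1');
}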

hitwalker (Sells PC To Pay For Divorce)
Posts: 5661
Posted: Mon Dec 18, 2006 6:46 am
And...
Google’s Hidden Protocol
Google’s URL removal page contains a little bit of handy information that’s not found on their webmaster info pages where it should be.
Google supports the use of “wildcards” in robots.txt files.
This isn’t part of the original 1994 robots.txt protocol, and as far as I know, is not supported by other search engines.
To make it work, you need to add a separate section for Googlebot in your robots.txt file.
An example:
User-agent: Googlebot
Disallow: /*sort=
This would stop Googlebot from reading any URL that included the string “sort=” no matter where that string occurs in the URL.
So if you have a shopping cart, and use a variable called “sort” in some URLs, you can stop Googlebot from reading the sorted (but basically duplicate) content that your site produces for users.
Every search engine should support this. It would make real life a lot easier for folks with dynamic sites, and artificial life a lot easier for spiders.
So you could easily use "phpsessid" the same way.
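Here is a minimal Python sketch of the wildcard matching described above, written from the rules as Google documents them: `*` matches any run of characters, a trailing `$` anchors the end, and matching starts at the beginning of the path, query string included. The helper names are hypothetical, and Allow/Disallow precedence is not modeled. One detail worth knowing: Google treats the path in a rule as case-sensitive, so `Disallow: /*phpsessid` would not match links containing the uppercase `PHPSESSID=`. Also, as far as I understand it, a disallowed URL is not fetched at all, so the crawler would not see a 301 redirect on it either.

```python
import re

def robots_rule_to_regex(rule_path):
    """Translate a Google-style robots.txt path rule into a compiled regex.

    '*' matches any sequence of characters, a trailing '$' anchors the
    end of the URL, and everything else is matched literally and
    case-sensitively from the start of the path (query string included).
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    body = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(url_path, disallow_rule):
    """True if url_path (path plus query string) matches the Disallow rule."""
    return robots_rule_to_regex(disallow_rule).match(url_path) is not None

print(is_blocked("/shop.php?cat=2&sort=price", "/*sort="))       # True
print(is_blocked("/forums.html?PHPSESSID=c6ee420b", "/*phpsessid"))  # False: case differs
print(is_blocked("/forums.html?PHPSESSID=c6ee420b", "/*PHPSESSID"))  # True
```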

NoFantasy (Worker)
Joined: Apr 26, 2005 · Posts: 114
Posted: Mon Dec 18, 2006 7:48 am
Hm, thanks Hitwalker, good information.
I did a bit of research based on this and found these:
Basically they say that wildcards (and some other extensions) ARE supported by at least the three big engines: Google, Yahoo and MSN. Guess your wish just came true (happy x-mas, lol).
...shopping cart? Yeah, and what about the reviews, web links and calendar modules? Lol, they suck... now we know how to actually stop them from indexing 12,000 pages when they should only index 100.
The worst part seems to be getting rid of the duplicates that are already indexed and showing up as supplemental results.
When the time comes, and if this works, I really, really hope someone (that means Montego) implements this in a mod_rewrite package... and I'm more than willing to help out as best I can, even if my knowledge of PHP and programming is rather limited.

hitwalker (Sells PC To Pay For Divorce)
Posts: 5661
Posted: Mon Dec 18, 2006 7:56 am
Nobody said this is easy, but the fact is, Google may be good, but it's the world's No. 1 lazy engine.
Others like Yahoo are much faster at updating...
I've seen links to sites that were closed 6 months ago still showing up in Google...
But that's what you get when you bet everything on one stupid, lazy horse...

montego (Site Admin)
Joined: Aug 29, 2004 · Posts: 9135 · Location: Arizona
Posted: Tue Dec 19, 2006 6:25 am
Quote:
When the time comes, and this hopefully works, I really, really hope someone (that means Montego) implements this in a mod_rewrite package

I'm not quite sure what you are looking for. The problem this thread addresses is an error by your hosting company that forced all the URLs to show the session var/ID. You are the only one I have heard of this happening to.
The rest of the thread is devoted to getting these "bogus" URLs removed from the search engine caches. Again, all due to this one issue.
I'd be glad to discuss the specifics of any enhancements you might like to see in ShortLinks. Just add them to my forum and we'll talk them through.

NoFantasy (Worker)
Joined: Apr 26, 2005 · Posts: 114
Posted: Tue Dec 19, 2006 8:24 am
...yeah, I'm fully aware that I went off-topic somewhere up there; it should have been separate threads. Getting rid of duplicates is a general topic; everyone who implements any mod_rewrite package will suffer from it.
Anyway, I'd like to continue this on your forum, which is probably just as good, since all of this really concerns the way we rewrite links.

montego (Site Admin)
Joined: Aug 29, 2004 · Posts: 9135 · Location: Arizona
Posted: Wed Dec 20, 2006 7:49 am
Quote:
Getting rid of duplicates is a general topic; everyone who implements any mod_rewrite package will suffer from it.

According to what you have posted (even on my site), you are suggesting that there is still an issue with duplicate content even without rewritten URLs.
I don't mind talking about the duplicate content issue here on Raven's site. You had mentioned a possible enhancement for ShortLinks, and that is why I suggested discussing that specifically over at my site. No problem either way; it's just that discussing it here on Raven's site will get more traffic and more people to weigh in.

NoFantasy (Worker)
Joined: Apr 26, 2005 · Posts: 114
Posted: Wed Dec 20, 2006 8:22 pm
montego wrote:
According to what you have posted (even on my site), you are suggesting that there is still an issue with duplicate content even without rewritten URLs.

Yes, indeed... with or without rewritten links, it will create duplicates like mad, so it's not an issue that comes from ShortLinks; it's an issue in general.