Rick
Mar 17, 2025, 11:46 AM
Forwarded from another channel:
GSC is showing me a bunch of noncanonical URLs with hashbangs appended to them. I gave this to engineering, and they said they found and fixed it. That was 2-3 weeks ago, and noncanonical hashbang URLs are still coming in to GSC. I've tried validating the fix for these twice over 3-ish weeks, and the fix does not validate. URL inspection does not offer a referrer.
Removing Google’s ability to crawl hashbang URLs will decrease the number of URLs it's crawling by about 33%. Since engineering can't find any referrers to these pages, I'm considering these two scenarios...
1. GSC data is stale, and these are going to show up for a while
2. These are in the site somewhere, and we can resolve this with a permanent redirect until engineering can find the bug.
Any help here will be much appreciated.
Henning
Mar 17, 2025, 2:51 PM
Have you crawled the website with a tool like ScreamingFrog? Sometimes such tools find links that the devs do not expect.
Google is very curious about links. Sometimes Google finds structures in JS or JSON files that they think are URLs.
Rick
Mar 17, 2025, 2:54 PM
Yes. I have crawled the site with a tool like ScreamingFrog. Believe it or not, that was the first thing I did. I also looked at server logs, and no hashbangs there either.
Tom Gregan
Mar 17, 2025, 3:47 PM
I expect these are just queued URLs being visited.
When you click on one of the URLs, it will show you when they discovered the URL. If you don't have any discovery dates after the date of the release fixing the issue, then you can be confident it will eventually end.
If it is simply queued URLs, there's not much you can do but wait.
Rick
Mar 17, 2025, 3:48 PM
Yeah...checked that too. Thanks.
Tom Gregan
Mar 17, 2025, 3:49 PM
Does it not provide a first seen or discovered on date?
Rick
Mar 17, 2025, 3:53 PM
Although...this is the first time I noticed that it says indexing is allowed, even though the page is noncanonical.
Tom Gregan
Mar 17, 2025, 4:01 PM
Only other idea is to disallow the hash in the robots.txt file.
Rick
Mar 17, 2025, 4:02 PM
Can a hash be disallowed successfully?
Tom Gregan
Mar 17, 2025, 4:03 PM
Think so lol. I can throw it up on my site and test it if you want?
Rick
Mar 17, 2025, 4:03 PM
Fuck yeah bro. That would be super helpful.
Tom Gregan
Mar 17, 2025, 4:04 PM
For the record, I do think it's fine having these URLs crawled. It's wasteful, but I don't think you'll be scored down for it.
Umer Sohail
Mar 18, 2025, 5:13 AM
Aren't hashtags already set up so that Google only reads the path before them?
Since Google doesn't read anything after the # sign, should we really bother worrying about the URLs having canonicals?
If it were a '?' making an additional string, then canonicalization would have made sense.
What do you think?
Umer Sohail
Mar 18, 2025, 5:16 AM
Also, Google might not necessarily follow the disallow directive. It can generally still index pages that it finds through other links. Doesn't apply to URL versions after the hashbang!
Dave Smart
Mar 18, 2025, 5:45 AM
#! is an awkward one, a hangover from the old JavaScript crawling days,
robots.txt can't block them,
effectively is the same as
Dave Smart
Mar 18, 2025, 5:49 AM
> this is the first time I noticed that it says indexing is allowed, even though the page is noncanonical.
That's kinda standard, any `Page is not indexed: Alternative page with proper canonical tag` will report that way, and it looks like that in search console.
Given canonicals seem to be respected, I'd probably not worry too much about it, given the lack of real control you otherwise have, and that canonicals seem to be doing the job.
I'd suspect, like you, that this is something historical surfacing for a speculative check to see what's happening these days with those URLs, or perhaps external sites, but the lack of a referrer on any of them would point me to the former.
Dave Smart
Mar 18, 2025, 5:53 AM
@umerseo #! is a little different from other # URLs. It's a leftover from the old Ajax crawling scheme Google used to support:
There are occasionally a few hangovers for some sites.
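
For anyone who hasn't run into the old Ajax crawling scheme: as far as I recall, the crawler rewrote a `#!` URL into an `_escaped_fragment_` query parameter before requesting it, and the server answered with an HTML snapshot. The Python sketch below illustrates that rewrite. The function name `escaped_fragment_url` and the example URLs are hypothetical, and the percent-encoding is simplified rather than the exact character set from the retired spec.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escaped_fragment_url(url: str) -> str:
    """Map a #! URL to its ?_escaped_fragment_= form (simplified sketch)."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # not a hashbang URL, so leave it untouched
    # Simplified encoding: keep "=" and "/" readable, escape the rest.
    state = quote(parts.fragment[1:], safe="=/")
    param = "_escaped_fragment_=" + state
    # Append to any existing query string instead of replacing it.
    query = parts.query + "&" + param if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(escaped_fragment_url("https://example.com/page#!state=1"))
print(escaped_fragment_url("https://example.com/page?a=b#!state=1"))
```

That history is one plausible reason `#!` URLs can linger in Google's systems in a way ordinary `#` fragments do not.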
Umer Sohail
Mar 18, 2025, 6:18 AM
Oh, I didn't know that.
Tom Gregan
Mar 18, 2025, 7:05 AM
I can confirm that
Effectively is the same as just "/" - it inline comments out everything after the hash I imagine.
Good spot @dave297
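
A quick way to see the comment behaviour locally is Python's standard-library robots.txt parser, which also treats `#` as the start of a comment. This is only a sketch: it shows how Python's parser reads the rule, not how Googlebot does, and the `/#!` rule and `example.com` URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(lines):
    """Build a parser from robots.txt lines without fetching anything."""
    rp = RobotFileParser()
    rp.parse(lines)
    return rp


# Attempt to block only hashbang URLs. Everything from "#" onward is
# dropped as a comment, so the rule is read as "Disallow: /".
hashbang_rule = parse_rules(["User-agent: *", "Disallow: /#!"])

# Control: an ordinary path rule that only blocks /private.
path_rule = parse_rules(["User-agent: *", "Disallow: /private"])

page = "https://example.com/any/page"
print(hashbang_rule.can_fetch("*", page))  # False: the whole site is blocked
print(path_rule.can_fetch("*", page))      # True: only /private is blocked
```

So a rule like this would not target the hashbang URLs; under this parser it blocks everything.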
Rick
Mar 18, 2025, 11:34 AM
@dave297 This is what I'm trying to solve. These hashbang URLs are about 1/3 of nonindexable pages crawled. I want to shorten the threshold of gray to green. Also, my total crawl requests are dropping at a similar clip.
Dave Smart
Mar 18, 2025, 11:49 AM
I hear you, but there's very little you can actually do here I.M.H.O.
You can't block or redirect these server-side, as the # and ! don't get passed as part of a network request.
You can't use robots.txt either, firstly because of how it's parsed, but also because they would request the URL without the #!
So the two realistic options really are:
1. Rely on canonicals, like you're currently doing, and live with the reporting
2. You could, at a push, use client-side JavaScript redirects, but I think you'd be worse off in terms of it consuming crawl budget (it would have to crawl this, render, AND then redirect), all to achieve what would at the end of the day amount to moving them from one reason they aren't indexed to a different reason in the report.
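
To illustrate why the server never sees the `#!` part, here is a small Python sketch that splits a URL the way an HTTP client does before sending a request. The helper `request_target` and the example URL are hypothetical; the point is only that the fragment stays on the client side.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path and query a client would put on the request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target  # the fragment is never included


url = "https://example.com/products#!/category/shoes"
print(urlsplit(url).fragment)  # !/category/shoes
print(request_target(url))     # /products
```

Because both the hashbang URL and the clean URL produce the same request, a server-side rule or redirect has nothing to match on, and server logs would show no hashbangs either.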
Rick
Mar 18, 2025, 11:53 AM
That's a shame. Thanks a lot for the reply.
Scott Dodge
Mar 18, 2025, 4:05 PM
I'm also seeing a similar thing on one of our clients that used to use HTML snapshots (back before SSR was a thing).
GSC is reporting these URLs as ranking in search results and getting clicks, so I'm concerned it's more than just a nuanced crawl efficiency issue. Confirmed they aren't linked to anywhere on the site.
Tom Gregan
Mar 18, 2025, 4:51 PM
@scott.t.dodge, can you provide an example of what the issue is here? Is it hashbang'ed?
