Noah
Jan 19, 2025, 10:38 AM
Forwarded thread from another channel:
Dan Taylor
Nov 20, 2024, 2:34 PM
So, client decided to add `User-agent: Googlebot-News` to their robots.txt file, as they've been trying to get Google News traffic for a couple of months... The day after adding this, Google News traffic appeared.

Prior to this, Googlebot-News wasn't blocked in any way, shape, or form, and had visited the site...

I'm trying to find any rationale other than "coincidence". Has anyone ever seen this before?
Dan Taylor
Nov 20, 2024, 2:44 PM
By a couple of months, I meant a year. Everything has been "right" from a technical POV.
That was the literal, only change, then in 2 days ~150k clicks from News... From nothing.
RP
Nov 20, 2024, 2:48 PM
Interesting! I'm thinking what other bot could we add for testing purposes!
Mark Alves
Nov 20, 2024, 2:51 PM
Did it have Allow: * or similar specification?
Dan Taylor
Nov 20, 2024, 2:53 PM
So, it's always had
Allow: /
and 16 named user-agents
Oh crikey that's it.
Dan Taylor
Nov 20, 2024, 2:55 PM
Oh no. Sorry thought I had found something.
So yeah, it has always had:
Allow: /
And a Disallow for `*`, but all Google user agents pass through because of the preceding `Allow: /` rule (e.g. Googlebot-Video works, but isn't specified)
Dan Taylor
Nov 20, 2024, 3:23 PM
Ok, update.
About 3-4 days prior, in crawl stats, Googlebot "other agent type" was going mad: 17.5m requests in the 90-day window, dropping to 5k-10k per day 3-4 days before Google News kicked up. Reading the documentation verbatim, Googlebot-News would fall under this, and this is all ~2 weeks prior to the latest core update, and anecdotally we always see big crawl spikes beforehand...
So this could be... _algorithmic_? And the robots.txt a happy coincidence
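To put the crawl-stats figures above on a common scale, here is a rough back-of-the-envelope sketch. It assumes the 17.5m requests were spread evenly over the 90-day window, which the thread does not state (the real peak days were likely higher), so the percentages are only indicative.

```python
# Figures quoted in the message above; the even spread over the window
# is an assumption made purely for illustration.
total_requests = 17_500_000   # "other agent type" requests in the window
window_days = 90

avg_per_day = total_requests / window_days

for later_rate in (5_000, 10_000):  # the quoted post-drop range, per day
    drop = 1 - later_rate / avg_per_day
    print(f"{later_rate:>6,} req/day is a {drop:.1%} drop "
          f"from an average of ~{avg_per_day:,.0f} req/day")
```

Under that assumption the window averages roughly 194k requests per day, so falling to 5k-10k per day is a drop of around 95% or more.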
Mark Alves
Nov 20, 2024, 3:45 PM
Welp, was hoping for the Easy Button solution.
Dawood Ahmad
Nov 20, 2024, 9:44 PM
Did that happen for a blog? Or an e-commerce store?
Dan Taylor
Nov 21, 2024, 2:30 AM
Massive news aggregator
Dan Taylor
Nov 21, 2024, 3:08 AM
So that pretty much confirms it's pure coincidence?
Will Critchlow
Nov 21, 2024, 4:24 AM
I love a robots.txt mystery tho. Can you share the full robots.txt file just to check? (DM is fine if needed)
Will Critchlow
Nov 21, 2024, 4:24 AM
There could be some odd interplay between UAs
Dan Taylor
Nov 21, 2024, 4:29 AM
I can share a cleansed version (which would be about 25% of it)
Will Critchlow
Nov 21, 2024, 4:31 AM
That'd do - if we can see the UA *, UA googlebot, and UA googlebot-news bits, I'd certainly be interested
Dan Taylor
Nov 21, 2024, 4:33 AM
```
Disallow: /
Sitemap: 
Sitemap: 
Sitemap:
Sitemap: 
User-agent: Googlebot
User-agent: Googlebot-News
User-agent: Googlebot-Image
User-agent: Googlebot-Mobile
User-agent: Storebot-Google
User-agent: AdsBot-Google
User-agent: AdsBot-Google-Mobile
User-agent: Bingbot
User-agent: Yandex
User-agent: Google-InspectionTool
User-agent: Google-Site-Verification
User-agent: Googlebot-Image
User-agent: Googlebot-News
User-agent: Slurp
User-agent: DuckDuckBot
User-agent: Applebot
User-agent: Naverbot
Allow: /
Disallow: /*/*/2009
Disallow: /*/*/2008
Disallow: /*/*/2007
Disallow: /*/*/2006
```
Then there are around 623 rows of Disallows beneath this to specific URLs
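How `Allow: /` interacts with the wildcard Disallow lines inside a single group can be sketched as follows. This is a minimal sketch, assuming the longest-matching-pattern precedence that Google documents for robots.txt, with ties going to Allow. The helper names `pattern_to_regex` and `is_allowed` and the example URL paths are hypothetical, made up for illustration and not taken from the client's site.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list) -> bool:
    """Evaluate (directive, pattern) pairs taken from ONE user-agent group."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if not pattern or not pattern_to_regex(pattern).match(path):
            continue
        is_allow = directive.lower() == "allow"
        # Assumed precedence: the longer pattern wins; on a tie, Allow wins.
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len, allowed = len(pattern), is_allow
    return allowed


# Rules copied from the shared excerpt (only two of the year blocks shown).
rules = [
    ("Allow", "/"),
    ("Disallow", "/*/*/2009"),
    ("Disallow", "/*/*/2006"),
]

# Hypothetical URL paths, for illustration only.
print(is_allowed("/world/politics/2024/some-story", rules))
print(is_allowed("/world/politics/2009/some-story", rules))
```

Under these assumptions the 2024 path is allowed (only `Allow: /` matches) and the 2009 path is blocked, because `/*/*/2009` is a longer, more specific pattern than `/`.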
Dan Taylor
Nov 21, 2024, 4:34 AM
Oh, and I removed the XML sitemap URLs, but left the Sitemap lines in so you could see the placement
Will Critchlow
Nov 21, 2024, 4:35 AM
Just out of interest, and not going to cause the issues you describe: you have Googlebot-Image and Googlebot-News in there twice
Dan Taylor
Nov 21, 2024, 4:36 AM
Yeah. We've raised that.
Weirdly, the developers seem to be able to add to it in 30 seconds, but need a full change request process and testing to remove...
Will Critchlow
Nov 21, 2024, 4:36 AM
Stupid Q, but the relevant content isn't to be found in the `/*/*/2009` folders etc.?
Will Critchlow
Nov 21, 2024, 4:38 AM
(I don't think that should even matter, since googlebot-news would be using the googlebot directives beforehand)
Dan Taylor
Nov 21, 2024, 4:38 AM
No such thing as a stupid Q.
No, it's not.
The site publishes maybe 75k URLs daily (on a good day), and has URLs for content as far back as 2006. There's a weird intricacy between the website and the app. We've had those blockers in place for a couple of years, and it's a "short term" fix to curb crawling of outdated URIs that drive nothing
Will Critchlow
Nov 21, 2024, 4:43 AM
In short, I can't see anything that would explain any change in Google News behaviour based on whether `googlebot-news` is or isn't in that list - because it'd use the googlebot entry.
Considerations
• If googlebot-news were anywhere else in the robots.txt file, there could be some conflicts / issues with adding it here
• As mentioned upthread, googlebot-news isn't actually a crawler, so the effect would at most be on inclusion in news, and not seen in crawling activity
Thanks for indulging my trip down a rabbit hole, and sorry I can't solve the mystery
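Will's reasoning about which group `googlebot-news` ends up using can be sketched like this. It is a rough illustration, not Google's actual parser: `parse_groups` and `select_group` are hypothetical helpers, the `User-agent: *` line in the sample files is an assumption (the shared excerpt starts at `Disallow: /`), and the fallback from `googlebot-*` tokens to the `googlebot` group follows what is described in the thread.

```python
from typing import Optional


def parse_groups(robots_txt: str) -> dict:
    """Map each lower-cased user-agent token to the rules of its group(s)."""
    groups: dict = {}
    agents: list = []
    seen_rule = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a rule line closed the previous group
                agents, seen_rule = [], False
            token = value.lower()
            if token not in agents:  # ignore a token repeated in one group
                agents.append(token)
            groups.setdefault(token, [])
        elif field in ("allow", "disallow"):
            seen_rule = True
            for token in agents:
                groups[token].append((field, value))
        # Other fields (e.g. Sitemap) are ignored and do not end a group.
    return groups


def select_group(crawler: str, groups: dict) -> Optional[str]:
    """Pick the group a crawler token would obey."""
    token = crawler.lower()
    if token in groups:
        return token
    # Assumption from the thread: specialised Google tokens fall back to
    # the generic "googlebot" group before falling back to "*".
    if token.startswith("googlebot-") and "googlebot" in groups:
        return "googlebot"
    return "*" if "*" in groups else None


def rules_for(crawler: str, robots_txt: str) -> list:
    groups = parse_groups(robots_txt)
    chosen = select_group(crawler, groups)
    return groups[chosen] if chosen is not None else []


# Simplified stand-ins for the file before and after the change.
BEFORE = """\
User-agent: *
Disallow: /
User-agent: Googlebot
User-agent: Bingbot
Allow: /
Disallow: /*/*/2009
"""

AFTER = """\
User-agent: *
Disallow: /
User-agent: Googlebot
User-agent: Googlebot-News
User-agent: Bingbot
Allow: /
Disallow: /*/*/2009
"""

for label, text in (("before", BEFORE), ("after", AFTER)):
    chosen = select_group("Googlebot-News", parse_groups(text))
    print(label, chosen, rules_for("Googlebot-News", text))
```

In this sketch the selected group name changes from `googlebot` to `googlebot-news`, but the rules applied are identical, which matches the conclusion that adding the token to the existing list should not change behaviour.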
Dan Taylor
Nov 21, 2024, 4:44 AM
Based on the crawl stats behavior, and correlations with updates... I just think the robots.txt input was a coincidence.
I just thought I'd share this, as it's a nice problem to have that makes no sense at all haha
Dan Taylor
Nov 21, 2024, 5:42 AM
By nice problem, I mean we're getting the News traffic in substantial volume versus before
