
The Best Of the SEO Community

Noah
Jan 19, 2025, 10:58 AM
Forwarded thread from another channel:
Joe
Jan 16, 2025, 2:34 PM
Howdy. Long time listener, first time caller. In the last couple of months, has anyone noticed their site being crawled to a massive degree? For us, normal is around 15k daily crawls; we hit 292k on January 9th. At first it looked like general spam targeting a WP search endpoint, but lately we've seen a ludicrous amount of traffic hitting just 2-3 pages and showing up in GA4.
Could be unrelated, but we also see a ton of Google crawling hitting 3 different pages, recorded as direct traffic in GA4 with a screen resolution of 412x732. I checked with our internal engineering teams and they're not using it. (Cloudflare logs strongly point to a Google Cloud endpoint with Vertex AI in the user agent, not the official Googlebot.)
Does this look typical? Proper way to mitigate this?
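To see which URLs the Vertex AI traffic is concentrating on, the Cloudflare logs can be grouped by user agent, ASN, and path. A minimal sketch, assuming newline-delimited JSON records that use Cloudflare Logpush `http_requests` field names (`ClientASN`, `ClientRequestUserAgent`, `ClientRequestPath`); the `"Vertex"` substring and the optional ASN filter are illustrative assumptions, so check both against your own log fields before trusting the counts.

```python
import json
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def top_paths_for_agent(
    log_lines: Iterable[str],
    ua_substring: str = "Vertex",  # assumption: based on the "Vertex AI" user agent in the thread
    asn: Optional[int] = None,     # optional ASN filter (look up the Google Cloud ASN yourself)
    limit: int = 5,
) -> List[Tuple[str, int]]:
    """Count request paths for records whose user agent contains ua_substring."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        user_agent = record.get("ClientRequestUserAgent", "")
        if ua_substring.lower() not in user_agent.lower():
            continue
        if asn is not None and record.get("ClientASN") != asn:
            continue
        counts[record.get("ClientRequestPath", "")] += 1
    # Most-requested paths first, e.g. the 2-3 support URLs being hammered.
    return counts.most_common(limit)
```

If a handful of paths account for nearly all matching requests, that supports the "hammering 2-3 URLs" reading rather than broad crawling.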
Shawn Huber
Jan 16, 2025, 2:45 PM
That is an official Google bot
Google crawling your site is going to fluctuate based on various factors. In my experience, when there are a lot of non-200 status codes, I see a drop in crawling, especially if they are 5xx or some 4xx responses like 403 and 429 - they don't want to overburden a system, so those status codes will cause the crawlers to slow down.
If you can validate that the IP and bot type are truly Google (and if you're seeing this in the crawl stats, they likely are), there isn't anything to worry about or mitigate.
Here is a site I work on just for reference
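Validating that an IP is truly Google usually means the two-step DNS check Google documents for its crawlers: reverse-resolve the IP, confirm the hostname sits under an accepted Google domain, then forward-resolve that hostname and confirm it maps back to the same IP. A minimal sketch, assuming Python's standard `socket` resolvers; the accepted suffixes in `DEFAULT_SUFFIXES` are an assumption to check against Google's current verification docs (some Google fetchers use other domains, and a generic Google Cloud VM can reverse-resolve under `googleusercontent.com`, so that domain alone says little about who sent the request).

```python
import socket
from typing import Callable, Sequence

# Assumption: domains accepted for Google's common crawlers; check Google's
# crawler-verification documentation for the current list before relying on it.
DEFAULT_SUFFIXES = ("googlebot.com", "google.com")


def hostname_matches(hostname: str, suffixes: Sequence[str] = DEFAULT_SUFFIXES) -> bool:
    """True if hostname is one of the suffixes or a subdomain of one."""
    host = hostname.rstrip(".").lower()  # PTR answers may carry a trailing dot
    return any(host == s or host.endswith("." + s) for s in suffixes)


def is_verified_google_ip(
    ip: str,
    reverse_lookup: Callable[[str], tuple] = socket.gethostbyaddr,
    forward_lookup: Callable[[str], tuple] = socket.gethostbyname_ex,  # IPv4 only
    suffixes: Sequence[str] = DEFAULT_SUFFIXES,
) -> bool:
    """Reverse-resolve ip, check the domain, then forward-resolve and confirm the IP."""
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # socket.herror / socket.gaierror: no usable PTR record
        return False
    if not hostname_matches(hostname, suffixes):
        return False  # e.g. a googleusercontent.com or unrelated hostname
    try:
        _name, _aliases, forward_ips = forward_lookup(hostname)
    except OSError:
        return False
    # A spoofed PTR record won't round-trip back to the original IP.
    return ip in forward_ips
```

The resolvers are parameters so the logic can be exercised with fake lookups. `gethostbyname_ex` only returns IPv4 addresses, so IPv6 clients would need `socket.getaddrinfo` instead.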
Shawn Huber
Jan 16, 2025, 2:46 PM
Was there a recent update or news story about a product or your brand? I've also seen sudden increases to the crawl stats when there was something in the news that caused more interest than usual.
Shawn Huber
Jan 16, 2025, 2:49 PM
If you want to keep more data than the 90 days you get from GSC, there is this plugin that lets you export and store the data so you can keep a longer history.
a helper to efficiently download crawl data from GSC
Joe
Jan 16, 2025, 2:49 PM
It shot up from about 15k daily crawls to well over 200k, with no major news or logical explanation for the jump. And the crawling activity is hitting 2-3 URLs in GA4; it _never_ did this until mid-December. Orders of magnitude larger than typical direct traffic.
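For a rough sense of scale, 292,000 / 15,000 ≈ 19.5, so January 9th ran at about 19.5× the normal daily volume. A simple way to flag days like that from exported daily crawl counts (GSC crawl stats or your own logs) is to compare each day against the median of the days before it. A minimal sketch; the 7-day window and 3× factor are arbitrary assumptions, not thresholds from the thread.

```python
from statistics import median
from typing import List


def flag_crawl_spikes(daily_counts: List[int], window: int = 7, factor: float = 3.0) -> List[int]:
    """Return indexes of days whose count exceeds `factor` x the median of the previous `window` days."""
    flagged = []
    for i in range(window, len(daily_counts)):
        # Median of the trailing window; one spike day inside it won't move this much.
        baseline = median(daily_counts[i - window:i])
        if baseline > 0 and daily_counts[i] > factor * baseline:
            flagged.append(i)
    return flagged
```

Using the median rather than the mean keeps a single spike day from inflating the baseline, so the day after a spike is still judged against normal traffic.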
Joe
Jan 16, 2025, 2:51 PM
It's only hitting a support directory, no other pages. Just very odd for an enterprise crawler, right?
Shawn Huber
Jan 16, 2025, 2:51 PM
Maybe a little, but there is something there - you're saying that in GA4 it is showing as direct traffic as well?
Shawn Huber
Jan 16, 2025, 2:52 PM
Other than the spikes, those charts don't look out of the ordinary
Shawn Huber
Jan 16, 2025, 2:54 PM
Is the support section a sub-domain?
Joe
Jan 16, 2025, 2:54 PM
Absolutely. That's what prompted my research into this. Here's a snapshot of the last 90 days, filtered to the same details.
Funny thing, I was just doing general research into screen resolutions and stumbled into this.
Shawn Huber
Jan 16, 2025, 2:55 PM
The traffic to that path will cause crawling to increase for sure - I've seen this on sites before
Joe
Jan 16, 2025, 2:55 PM
Not seen, but the hostname is filtered in as well. No subdomains.
Shawn Huber
Jan 16, 2025, 2:56 PM
GSC isn't where I'd focus. I'd try to find where/why direct traffic to that path is increasing, which will be a PITA since direct is pretty much a catch-all
Shawn Huber
Jan 16, 2025, 2:56 PM
I'd post this in <#C05HC70HNLF|>
Joe
Jan 16, 2025, 2:57 PM
Our Cloudflare enterprise logs show this is legit traffic hitting those URLs.
Shawn Huber
Jan 16, 2025, 2:57 PM
for sure, but it won't be Google Bot
Shawn Huber
Jan 16, 2025, 2:57 PM
I've never seen bots count as users, but maybe
Joe
Jan 16, 2025, 2:58 PM
I'm less familiar here: is there an escalation path to have someone at Google Cloud (it comes out of their ASN) research this?
Joe
Jan 16, 2025, 2:58 PM
Could totally be two separate problems, but when I see the traffic patterns, it does look a little sus.
Shawn Huber
Jan 16, 2025, 2:59 PM
great question on the escalation path. We don't have a cloud channel here to ask the hive mind, but my head goes to the <#C05HC70HNLF|> channel to see what the smarter folks there can help you with. Niko, Dana, and Brie are all wicked smart and might be able to point you in a better direction
Shawn Huber
Jan 16, 2025, 3:00 PM
to my knowledge, bots don't show in GA4 so I don't believe that is what is causing the direct traffic to spike
Shawn Huber
Jan 16, 2025, 3:01 PM
but like I mentioned, I have seen a spike in traffic to a site cause a sudden spike in Google crawling those URLs
Shawn Huber
Jan 16, 2025, 3:01 PM
that led me to an interesting idea: adding an internal linking widget to pages that see sudden traffic, to get Google to crawl and spider out to more URLs from that crawl session
Shawn Huber
Jan 16, 2025, 3:01 PM
Which was implemented and does work really nicely
Joe
Jan 16, 2025, 3:02 PM
Thank you, it is amazing to ask smarter minds than myself on this. I'll share my question there and see. But I also know a lot of GSC updates were made recently, who knows if it's a bug.
Shawn Huber
Jan 16, 2025, 3:03 PM
fair point, but I haven't seen enough from the sites I have access to that makes me think there is an issue
Shawn Huber
Jan 16, 2025, 3:03 PM
They are supposed to be two independent tools/platforms......
Shawn Huber
Jan 16, 2025, 3:04 PM
If you're lucky @John Mueller might pop in and offer some thoughts on this
John Mueller
Jan 16, 2025, 3:10 PM
The genai crawlers are sometimes a bit eccentric. If it's too much load, send it 503s. If it's just too wild, use the form on the bottom of
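Sending a crawler 503s when load is a problem can be done at the application layer. A minimal sketch, assuming a Python WSGI app; `throttle_crawler`, the `ua_token` match, and the `is_overloaded` hook are hypothetical names, and how long a given Google crawler waits before retrying is not something the thread specifies. `Retry-After` is a standard HTTP response header that tells clients when to come back.

```python
from typing import Callable


def throttle_crawler(app, ua_token: str, is_overloaded: Callable[[], bool], retry_after_s: int = 3600):
    """WSGI middleware (hypothetical): answer 503 + Retry-After to a matching crawler while overloaded."""
    token = ua_token.lower()

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if token in user_agent and is_overloaded():
            start_response(
                "503 Service Unavailable",
                [("Content-Type", "text/plain"), ("Retry-After", str(retry_after_s))],
            )
            return [b"Temporarily overloaded, please retry later.\n"]
        # Everyone else (and the crawler when load is fine) gets the normal response.
        return app(environ, start_response)

    return middleware
```

The overload check is a callable so it is evaluated per request; the crawler only sees 503s while load is actually a problem, and all other traffic passes through to the wrapped app untouched.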
Joe
Jan 16, 2025, 3:13 PM
The load isn't a concern, thankfully; the crawler is just hammering about 3 URLs and executing JS (hence the GA4 traffic). Wouldn't Google exclude its own crawlers from GA4 visits?
That form is a bit daunting. We still wanna be crawled and indexed to a healthy degree.
Victor M Pan
Jan 17, 2025, 7:39 AM
I have the opposite problem, btw: our security team sees a rise in bot activity, and Cloudflare's IP protection/threat scores group Googlebot as "malicious bots". It won't be the first time a CDN and/or humans made a change and the CDN started treating a search engine bot as malicious, so you're good!
1. If good bots are looking at resources that just really don't matter, remember you have robots.txt in your arsenal.
2. You have Cloudflare enterprise logs; do you have access to Cloudflare analytics as well? It'll help you catch scary things like search engine bots that get 403s (usually the sign of a bot challenge). It'll also help with #1, though technically identifying malicious bot traffic and blocking it isn't your job (I know it's annoying that it sometimes escapes into analytics).
3. Every now and then we'll get a crawler that is actually Googlebot but not on their list of ranges, but it's still a great idea to allowlist all of Google's published IPs. CDNs really should be returning 5xx or 429 status codes when they're overloaded, but they don't always do that (something I'm actually working on).
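Following point 1, robots.txt can keep a specific crawler out of a low-value directory while leaving regular Googlebot crawling alone. A minimal sketch using Python's standard `urllib.robotparser` to sanity-check such rules before deploying them; the `/support/` path and the `Google-CloudVertexBot` token are assumptions (check Google's crawler documentation for the exact robots.txt token of the crawler you're seeing). `robotparser` applies the first matching rule in file order rather than Google's most-specific-rule precedence, so it's a quick check for simple files like this one, not a replica of Google's parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: normal crawling stays open, but one (assumed)
# crawler token is kept out of a low-value directory.
ROBOTS_TXT = """\
User-agent: Google-CloudVertexBot
Disallow: /support/

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Pass the robots.txt product token, not the full UA string:
    # robotparser only looks at the part before the first "/".
    for agent in ("Googlebot", "Google-CloudVertexBot"):
        print(agent, rp.can_fetch(agent, "https://example.com/support/page"))
    # Googlebot True
    # Google-CloudVertexBot False
```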
Hope that helps!
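For point 3, Google publishes its crawler IP ranges as JSON files, which makes an allowlist check straightforward with the standard `ipaddress` module. A minimal sketch, assuming the file shape `{"prefixes": [{"ipv4Prefix": ...}, {"ipv6Prefix": ...}]}` and that you download the file separately; `load_prefixes` and `in_published_ranges` are hypothetical names. Since real Googlebot occasionally shows up outside the published ranges, a reverse-DNS check is a reasonable fallback for addresses that miss.

```python
import ipaddress
import json
from typing import List


def load_prefixes(ranges_json: str) -> List[ipaddress._BaseNetwork]:
    """Parse a Google-style range file (assumed shape) into network objects."""
    data = json.loads(ranges_json)
    networks = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def in_published_ranges(ip: str, networks: List[ipaddress._BaseNetwork]) -> bool:
    """True if ip falls inside any published prefix."""
    addr = ipaddress.ip_address(ip)
    # Testing an IPv4 address against an IPv6 network (or vice versa) is simply False.
    return any(addr in net for net in networks)
```

Load the ranges once at startup and reuse the parsed networks rather than re-parsing the file on every request.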
