Noah
Jan 19, 2025, 11:11 AM
Forwarded thread from another channel:
Darren Shelley
Oct 31, 2024, 3:31 AM
Good Morning All,
Desperate for a bit of help; I believe you technical SEO experts may be my only hope.
Has anyone ever experienced Googlebot stubbornly crawling their site only over `HTTP/1.1`?
I've searched the channel/Slack and it doesn't appear to have ever come up.
My site is built in Next.js and has a lot of JavaScript chunks, so `HTTP/2` is much faster!
All other bots (e.g. Bing) are crawling us on at least `HTTP/2`.
Google crawling via `HTTP/1.1` means I'm getting an `average response time (ms)` of `776ms` versus my other sites which are at `91ms`, so my crawl budget doesn't stretch as far as it should.
Google crawls us a minimum of 25k times a day. In the last 6 months they crawled the homepage just once over `HTTP/2` and got a 200 response.
My site is served by Cloudflare, is HTTPS only, has an HSTS header, supports `http/2` and `http/3`, and has a minimum TLS version of 1.2.
I've raised a ticket via the form and had nothing back from that.
If I perform a "Test Live URL" in Google Search Console, the resulting "More info" tab also shows `http/1.1`.
Any and all advice welcome and greatly appreciated!
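One way to confirm what Darren describes from the server side is to log the protocol Cloudflare reports for Googlebot requests. A minimal sketch only, assuming a pass-through Worker sits in front of the Pages deployment and that `@cloudflare/workers-types` is installed for the types; the logging sink is a placeholder:

```typescript
// Minimal pass-through Worker sketch that records which HTTP version
// Googlebot negotiates at the edge. request.cf.httpProtocol is Cloudflare's
// reported protocol for the client connection (e.g. "HTTP/1.1", "HTTP/2").
export default {
  async fetch(request: Request, env: unknown, ctx: ExecutionContext): Promise<Response> {
    const ua = request.headers.get("user-agent") ?? "";
    const proto = (request.cf as { httpProtocol?: string } | undefined)?.httpProtocol ?? "unknown";

    if (ua.toLowerCase().includes("googlebot")) {
      // Placeholder sink: in practice you'd push this to Logpush or an
      // analytics endpoint rather than console.log.
      console.log(JSON.stringify({ url: request.url, userAgent: ua, httpProtocol: proto }));
    }

    return fetch(request); // hand off to the origin / Pages as normal
  },
};
```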
Andy Drinkwater
Oct 31, 2024, 3:36 AM
Has it always done this or has it only started recently?
Darren Shelley
Oct 31, 2024, 3:40 AM
It's always crawled on HTTP/1.1.
We've been on Cloudflare since 2016, so have supported http/2 for 8 years and http/3 for 3 years.
Reading the docs implies they choose when to switch, and they've never chosen to for us:
```
In our evaluations we found little to no benefit for certain sites (for example, those with very low qps) when crawling over h2. Therefore we have decided to switch crawling to h2 only when there's clear benefit for the site. We'll continue to evaluate the performance gains and may change our criteria for switching in the future.
```
Darren Shelley
Oct 31, 2024, 3:40 AM
We support it, but there appears to be no route to encourage it.
Richard Gargan
Oct 31, 2024, 3:40 AM
I have nothing helpful to add other than this gif in reply to “you technical SEO experts may be my only hope”
Andy Drinkwater
Oct 31, 2024, 3:42 AM
I assume you have checked all the basics? No issues with indexation? Nothing else pointing towards issues in Search Console? Is it a standalone server with just this site on it?
Darren Shelley
Oct 31, 2024, 3:46 AM
Thanks @Andy Drinkwater, I appreciate the questions!
The Next.js code which runs the pages is served by Cloudflare Pages on the edge, so it looks like this:
Cloudflare > Cloudflare Workers > Cloudflare Pages
Cloudflare itself proxies it all through their DNS, so it's not like we're getting marked down by someone/something else on the same server.
But it also means no scalability issues, so it's not like Google is hitting anything like 429 or 5XX errors.
We're fairly well indexed and well crawled; Googlebot just takes longer to do it because HTTP1.1 limits the number of concurrent downloads from the same host, which in turn restricts the number of pages Googlebot can crawl within a given timeframe.
There are no warnings, so to speak, in Google Search Console.
Andy Drinkwater
Oct 31, 2024, 3:51 AM
Not seen many (if any) issues quite like this without there being a rogue directive or clear reason for it, which makes it a bit strange. Have you done a Screaming Frog crawl to look for clues?
Darren Shelley
Oct 31, 2024, 3:58 AM
We've run several, but it never hurts to go back to basics and re-run, so I will do a fresh one today just in case.
The only thing I can presume is:
The site has been around 10 years.
Up to 6 months ago, the site potentially would have performed as efficiently in HTTP/1.1 as HTTP/2, so Google would have adjudicated "why bother".
(We've spent a lot of time this year making the site faster.)
And despite 'refresh' and index updates of pages which should/do load faster in HTTP/2, the bot does not routinely check whether the site should be moved to HTTP/2 crawling.
Like you're indicating, my hope is we've missed a configuration change OR there's an issue that we can fix which instantly flips it; my fear is that even if it was flawless, we'd have to wait for some arbitrary point in the future where it might change.
Darren Shelley
Oct 31, 2024, 4:10 AM
For performance reasons, the site (which is built using the Next.js framework) has JavaScript (to handle things like analytics and Google autocomplete) which is split into as many chunks as possible.
That's done to improve page speed for those who support http/2+ (most people), as with http/2 those chunks can be loaded concurrently.
It's my understanding that HTTP1.1 is limited to downloading 6-8 of these at a time.
The site attempts to preload as many of these as possible, and I believe having to grab a bunch and then go back for the next bunch is a contributing factor to why:
• Googlebot average response time is bad
• TTFB is good when using tools like GTmetrix/Lighthouse to check site performance
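To put a rough number on the chunking point above, one can count how many script and preload references the HTML carries and estimate how many "waves" an HTTP/1.1 client limited to roughly six connections per host would need. A sketch for Node 18+ run as an ES module; the URL is a placeholder and the regexes are deliberately naive:

```typescript
// Rough sketch (Node 18+, ES module): count script chunks and preloads
// referenced by a page, then estimate how many fetch "waves" an HTTP/1.1
// client with ~6 connections per host would need.
const PAGE = "https://example.com/"; // placeholder, not the real site

const html = await (await fetch(PAGE)).text();

const scriptSrcs = [...html.matchAll(/<script[^>]+src="([^"]+)"/g)].map((m) => m[1]);
const preloadHrefs = [...html.matchAll(/<link[^>]+rel="preload"[^>]+href="([^"]+)"/g)].map((m) => m[1]);

const assets = new Set([...scriptSrcs, ...preloadHrefs]);
const CONNECTIONS = 6; // typical HTTP/1.1 per-host connection limit in browsers

console.log(`assets referenced: ${assets.size}`);
console.log(`approx. HTTP/1.1 waves at ${CONNECTIONS} connections: ${Math.ceil(assets.size / CONNECTIONS)}`);
```

As Dave notes further down, Googlebot's fetch scheduling is not browser-like, so this only approximates what a browser stuck on HTTP/1.1 would experience.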
Dave Smart
Oct 31, 2024, 4:44 AM
Re the TTFB: average response time in Search Console != TTFB.
The GSC metric is the time it takes to get the resource, so you could think of it as time to last byte. It's not unusual, therefore, for it to be longer. (Often mitigated by the fact Googlebot has hella good broadband.)
For the http/2 vs 1.1 thing, the way the network layer works in the crawling/rendering pipeline is very different to the network layer in your browser; they are tuned for different things.
A browser is concerned with showing you a webpage as quickly as possible.
Googlebot needs to get stuff at the scale of the Internet in an efficient way.
So pages and resources are queued and crawled, not necessarily at the same time like they would be in a browser. They may have some resources cached, and they need to check if it's blocked by robots.txt.
So the performance http/2 brings in multiplexing isn't really relevant for making a page crawled, rendered and indexed faster.
But what they can and do seem to do, looking at logs, is use http/2 to grab a few, possibly even unrelated, URLs (a page, a resource needed elsewhere, etc.) that are in the queue with one connection.
But http/2 does have processing overheads, both for the site being accessed and the client fetching, so there's a balancing act as to which is actually the most efficient for a given site. And for something that doesn't need to be very frequently crawled, or is very cacheable, http/1.1 might well remain the best method.
I've never seen any link between http/2 and better ranking, and as CWV aren't coming from Googlebot, there's no implication as to how performant Google would see your site.
Caveat: this is just what I've picked up from talks/docs/observation over time.
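Dave's distinction between TTFB and GSC's average response time (closer to time to last byte) can be eyeballed with a quick timing script. A sketch for Node 18+ as an ES module; the URL is a placeholder, and connection setup is lumped into the first number, so treat the values as rough:

```typescript
// Sketch (Node 18+, ES module): separate "headers received" (~TTFB) from
// "body fully downloaded" (~time to last byte) for a single URL.
const TARGET = "https://example.com/"; // placeholder

const start = performance.now();
const res = await fetch(TARGET);
const headersAt = performance.now(); // headers in: roughly TTFB (plus connection setup)
await res.arrayBuffer();             // drain the body
const bodyAt = performance.now();    // body done: roughly time to last byte

console.log(`status: ${res.status}`);
console.log(`headers after ${Math.round(headersAt - start)} ms`);
console.log(`full body after ${Math.round(bodyAt - start)} ms`);
```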
Darren Shelley
Oct 31, 2024, 5:01 AM
Thanks @Dave Smart!
Appreciate you taking the time to do that writeup.
Those sound like some great insights into what makes up Search Console response time.
That understanding will certainly help me better interrogate the difference between my average response time (ms) of 776ms and my other sites, which are at 91ms.
HTTP1.1 vs HTTP2 seemed like the most apparent difference there, but I will check for others.
I'm not anticipating/targeting better ranking. My goal at the moment is to identify why average response time is so slow for Googlebot 2.1 mobile in San Fran versus similar sites.
Andy Drinkwater
Oct 31, 2024, 5:02 AM
Keep us posted Darren - happy to have a look over it myself if you want a fresh pair of eyes.
Dave Smart
Oct 31, 2024, 5:55 AM
@Darren Shelley what sort of resources are having bigger response times than you'd expect? Is it HTML? JavaScript? Everything?
Do you have logging on your side you can check TTFB with, to see what sort of TTFB vs. average response time you're getting for different types of resources, and whether there's a huge disparity across everything?
Darren Shelley
Oct 31, 2024, 8:04 AM
Seems to be very heavy on the HTML side of things, looking at the breakdown on GSC:
• HTML average from GSC: 904ms
• Image average from GSC: 470ms
• JS average from GSC: 70ms
• CSS average from GSC: 119ms
We test TTFB via tooling like GTmetrix:
• TTFB: 84ms
• TTFB: 47ms
Thanks for your help. Will do some more exploration + see if I can optimise some more for http1.1, given that's what Google is deciding to crawl us on.
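To compare your own logs against the GSC breakdown above, one option is to bucket TTFB by content type. A sketch that assumes a newline-delimited JSON log with hypothetical `contentType` and `ttfbMs` fields; map those names onto whatever your Logpush or origin log schema actually provides:

```typescript
// Sketch: average TTFB per content type from an NDJSON access log.
// The file name and the contentType / ttfbMs fields are assumptions --
// adjust them to your real log schema.
import { readFileSync } from "node:fs";

const lines = readFileSync("access-log.ndjson", "utf8").trim().split("\n");

const buckets = new Map<string, number[]>();
for (const line of lines) {
  const rec = JSON.parse(line) as { contentType?: string; ttfbMs?: number };
  if (!rec.contentType || rec.ttfbMs == null) continue;
  const key = rec.contentType.split(";")[0].trim(); // strip charset etc.
  const values = buckets.get(key) ?? [];
  values.push(rec.ttfbMs);
  buckets.set(key, values);
}

for (const [type, values] of buckets) {
  const avg = values.reduce((a, b) => a + b, 0) / values.length;
  console.log(`${type}: avg TTFB ${avg.toFixed(0)} ms over ${values.length} requests`);
}
```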
Darren Shelley
Oct 31, 2024, 8:29 AM
nominative determinism
Darren Shelley
Oct 31, 2024, 8:32 AM
Thanks @John Mueller, very helpful advice.
We're London, UK based and have a bias towards checking/validating our page speed there (where we're getting Lighthouse scores of 98/99).
Our use of a CDN can blind us to our performance for a bot in San Fran.
I ran a test from San Fran GTmetrix + saw an erratic total blocking time between 364ms and 1.1s (London was 55ms), so there's evidently something wrong there which I can start to explore. Potentially a 3rd party that does not respond well to San Francisco.
Darren Shelley
Oct 31, 2024, 8:33 AM
It's good to know the http/1 vs 2 thing is not the be-all and end-all, and is likely the result of a problem itself.
Darren Shelley
Oct 31, 2024, 9:18 AM
Thanks @John Mueller, really appreciate the time.
Yes, we're using Next.js on the edge within Cloudflare Pages, which could account for some fluctuation, which I'll explore.
Correct, I was discussing addtoevent; thank you for taking a gander. Good to hear fetch time is down + requests are up from your visibility.
Some strong angles for me to explore, and you've removed some red herrings for me!
Mika Lepistö
Oct 31, 2024, 2:05 PM
Biologically speaking, I suspect Dave was Smart before they named him Dave.
Dave Smart
Nov 1, 2024, 7:35 AM
I'm going to print this thread out so my mum can stick it on the fridge.
Dave Smart
Nov 1, 2024, 7:42 AM
> Yes we're using next.js on the edge within cloudflare pages
There can be issues around cold start times that can bring latency, i.e. if no-one has accessed it from a given region, it goes to sleep and needs to start up again once someone tries to access it.
They can be tricky to spot and optimise for.
Luke Marsh
Nov 8, 2024, 2:49 AM
@Dave Smart It's very interesting what you mentioned here around cold start times. I thought Cloudflare Workers would help reduce that.
I wonder if setting Edge cache rules in Cloudflare would help here?
Dave Smart
Nov 8, 2024, 2:56 AM
I do believe Workers are better here; could be a red herring, just something to check.
> I wonder if setting Edge cache rules in Cloudflare would help here?
Possible, I guess it depends how much is dynamic vs. static and cacheable.
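For the edge-cache question, one way to experiment without touching dashboard Cache Rules is the Workers Cache API. A sketch only, assuming the HTML is safe to cache briefly and ignoring cookies, auth and custom cache keys; the 5-minute TTL is a placeholder:

```typescript
// Sketch: serve GET responses from the edge cache when present, otherwise
// fetch from origin, tag a short edge TTL, and store a copy asynchronously.
// Whether this is safe depends on how dynamic the pages really are.
export default {
  async fetch(request: Request, env: unknown, ctx: ExecutionContext): Promise<Response> {
    if (request.method !== "GET") return fetch(request); // only cache idempotent requests

    const cache = caches.default;
    const hit = await cache.match(request);
    if (hit) return hit;

    const origin = await fetch(request);
    if (!origin.ok) return origin;

    const toCache = new Response(origin.body, origin); // copy so headers are mutable
    toCache.headers.set("Cache-Control", "public, s-maxage=300"); // placeholder TTL
    ctx.waitUntil(cache.put(request, toCache.clone()));
    return toCache;
  },
};
```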
Luke Marsh
Nov 28, 2024, 9:21 AM
Hey all - I thought I'd come back to this as we are starting to dig a little deeper into caching on the edge. We've enabled this on some of our pages and have started to make requests and essentially "warm" the cache. Is this a recommended strategy? Has anyone done this effectively before?
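On the warming question, one pattern is a Worker cron trigger that periodically refetches a handful of key URLs. A sketch with placeholder URLs and schedule, with the caveat that a cron trigger runs from a single location, so it won't necessarily warm caches in every colo that Googlebot hits:

```typescript
// Sketch: scheduled Worker that refetches a few key URLs to keep them warm.
// URLs are placeholders; the cron schedule lives in wrangler.toml, e.g.
//   [triggers]
//   crons = ["*/15 * * * *"]
const WARM_URLS = [
  "https://example.com/",               // placeholder
  "https://example.com/some-key-page",  // placeholder
];

export default {
  async scheduled(controller: ScheduledController, env: unknown, ctx: ExecutionContext): Promise<void> {
    ctx.waitUntil(
      Promise.all(
        WARM_URLS.map((url) => fetch(url, { headers: { "x-warmup": "1" } }).catch(() => undefined))
      )
    );
  },
};
```

It trades a small amount of synthetic traffic for fewer cold hits; whether that's worthwhile depends on how cacheable the pages are in the first place.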
