The Best Of

Braden Becker
Aug 28, 2023, 11:33 AM
Forwarded from another channel:
Hey everyone! Excited to contribute here. Got a weird rendering question first:
Has anyone ever worked on a site where Google spent a _ton_ of time crawling an internal API? We use various API paths as part of the page rendering process, so some of this is expected. But we're trying to figure out if we can free up some of this budget, and Engineering is hesitant to block more.
Helpful context: the site is heavily JS, and _most_ indexable pages are prerendered using Prerender (a stopgap for a long-term solution). Below is a sample log period. The pink section is API crawls.
Boris Kuslitskiy
Aug 28, 2023, 11:34 AM
Does the output from the API change from crawl to crawl?
Eric Wu
Aug 28, 2023, 11:44 AM
If the site is heavy JS, I’m not surprised that you have a lot of API requests
I’m assuming all those API requests are used to render the content of the page
I know you say that Prerender is caching a lot of the pages, but in my experience that service isn’t reliable. I’ve had to move several sites away from it given how buggy it is
Another thing to check is the Content-Type HTTP header, to make sure the API responses are coming back as JSON and not plain text or HTML
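The Content-Type check above can be run over a batch of logged responses with a small classifier. This is a minimal sketch, assuming the raw header values are already available (for example from server logs or a crawler export); the function name `is_json_content_type` and the sample values are hypothetical.

```python
def is_json_content_type(header_value: str) -> bool:
    """Return True if a Content-Type header value declares a JSON media type."""
    # Drop parameters such as "; charset=utf-8" and normalise case and whitespace.
    media_type = header_value.split(";", 1)[0].strip().lower()
    # Accept application/json plus "+json" suffixed types like application/ld+json.
    return media_type == "application/json" or media_type.endswith("+json")


# Hypothetical header values, as they might appear in a log export.
samples = [
    "application/json; charset=utf-8",
    "Application/JSON",
    "text/html; charset=utf-8",
    "text/plain",
    "application/ld+json",
]

# Anything listed here is an API response that is not declared as JSON.
flagged = [value for value in samples if not is_json_content_type(value)]
print(flagged)
```

Any API path that shows up in `flagged` is worth a closer look, since it is being served as plain text or HTML rather than JSON.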
Eric Wu
Aug 28, 2023, 11:45 AM
As far as freeing up the budget … the best way is to statically render the pages and cache them … and while Google doesn’t want you to dynamically serve anymore … you just dynamically serve them to Googlebot, similar to how you’re using Prerender
Dave Smart
Aug 28, 2023, 11:46 AM
Perhaps worth looking in Search Console > Crawl Stats. Are these mostly in page resources or in discovery/refresh? If resources, probably best left unblocked, and dealt with more holistically by optimising what's loaded for users and bots alike.
If you absolutely know an API response is heavy and adds nothing to the page for Google though, that's normally fine to block.
Braden Becker
Aug 28, 2023, 2:41 PM
Appreciate it, all!
They're all definitely coming back as JSON, and all are logged as resources, not content refreshes. Had a feeling this was mostly rendering.
@e - Can you clarify RE: Prerender - are you assuming the API hits are from pages that are not successfully being cached in Prerender, so Google is hitting the API as a fallback? Would we not spend time hitting it with better cache coverage?
Boris Kuslitskiy
Aug 28, 2023, 2:42 PM
Ideally, the API hit would be done by Prerender once, and then the final product pushed to users by Prerender.
Boris Kuslitskiy
Aug 28, 2023, 2:43 PM
Maybe Prerender isn't doing its job, and every user (or at least a larger percentage than expected) is rendering the page with API calls instead of being served the final product.
Justin Briggs
Aug 28, 2023, 3:50 PM
We've seen issues when APIs have a unique parameter for each session/user. For example, tracking cart item count for logged out users (`/rest/cart/count.json?_=[session_id]`). We'd end up with a handful of unique API hits for every page load, generating a huge number of "unique" URLs.
We audited each request type and disallowed the biggest offenders that didn't affect rendering.
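To see how much a per-session parameter inflates the crawl, the logged URLs can be grouped by path with the query string ignored. This is a rough sketch, assuming the Googlebot request URLs have already been extracted from the access logs; `count_by_path` and the sample URLs (including `/api/products/42`) are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_by_path(urls):
    """Count requests per path, ignoring query strings such as session IDs."""
    return Counter(urlsplit(url).path for url in urls)


# Hypothetical Googlebot requests pulled from access logs.
crawled = [
    "/rest/cart/count.json?_=a1b2",
    "/rest/cart/count.json?_=c3d4",
    "/rest/cart/count.json?_=e5f6",
    "/api/products/42",
]

by_path = count_by_path(crawled)
unique_urls = len(set(crawled))

print(unique_urls, "unique URLs collapse to", len(by_path), "paths")
# The most-requested path is the first candidate for an audit.
print(by_path.most_common(1))
```

Paths with many distinct URLs but only one underlying endpoint are the likely "biggest offenders" to audit first.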
Eric Wu
Aug 28, 2023, 4:05 PM
I’ve seen Prerender have poor caching logic and also really bad timeouts
As a result, you see Google coming back to re-crawl assets or just simply being passed through to origin
Prerender, in my experience, has poor support and transparency on how they cache and how they monitor their timeouts. They may not even know themselves how bad the system is … but they always respond that everything is working fine, even though it’s clearly not
Braden Becker
Aug 28, 2023, 4:06 PM
Yeah we're even seeing it exceed our recache frequency, which is maddening when launching updates
Eric Wu
Aug 28, 2023, 4:06 PM
While most sites I see don’t have that many JSON requests in Google Crawl Stats, there are a handful where I do see anywhere from 25% to 35% of crawl spend on the JSON file type
Eric Wu
Aug 28, 2023, 4:07 PM
So it’s not entirely unheard of to have a lot of JSON requests depending on how the site is structured
Braden Becker
Aug 28, 2023, 4:08 PM
Ultimately we're blocked by other teams who are building a server-side solution. I'm intercepting them when I can to make sure they prioritize the right pages. Until then, any third-party alternatives you'd recommend?
Boris Kuslitskiy
Aug 28, 2023, 4:09 PM
My company went Prerender -> custom-built server-side rendering too. Can't offer advice on alternatives, but nobody at my company likes Prerender.
Eric Wu
Aug 28, 2023, 4:25 PM
I haven’t found a 3rd party that I’ve liked. When you can’t monitor the stability of the platform yourself, it’s never good
The interim solution, until you have SSR, is simply to prerender yourself
It still requires dev time but might be easier depending on your stack
I tend to use Playwright to render
Fast and reliable end-to-end testing for modern web apps | Playwright
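As an illustration of what "prerender yourself" could look like with Playwright, here is a minimal sketch and not anyone's production setup. It assumes Playwright for Python and its Chromium build are installed; `render_with_playwright`, `prerender_to_cache`, and the file-per-URL snapshot layout are all hypothetical choices.

```python
import hashlib
from pathlib import Path


def render_with_playwright(url: str) -> str:
    """Render a URL in headless Chromium and return the final HTML."""
    # Imported lazily so the caching logic below works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Wait for network activity to settle so API-driven content has loaded.
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html


def prerender_to_cache(url: str, cache_dir: Path, render=render_with_playwright) -> Path:
    """Render `url` once and store the HTML snapshot; reuse it on later calls."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so any URL maps to a safe, stable file name.
    snapshot = cache_dir / (hashlib.sha256(url.encode()).hexdigest() + ".html")
    if not snapshot.exists():
        snapshot.write_text(render(url), encoding="utf-8")
    return snapshot
```

Calling `prerender_to_cache("https://example.com/some-page", Path("snapshots"))` would render the page once and return the same snapshot file on later calls. The sketch has no recache or expiry logic, no timeouts, and no monitoring, which are the parts the thread flags as pain points, so they would need to be added before relying on it.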
Braden Becker
Aug 28, 2023, 4:53 PM
Appreciate it!
ah
Aug 28, 2023, 5:41 PM
Surprised this isn’t more ironed out at this point.
Ian Cappelletti
Aug 28, 2023, 6:15 PM
yeah get off of dynamic rendering ASAP, esp.
Geoff
Aug 28, 2023, 9:01 PM
To Justin’s point, if you can determine the APIs that don’t impact rendering or aren’t relevant to the pages outside of prerender, you’re probably safe.
At a previous role we had SSR for the first pageview, and subsequent pageviews were client-side rendered. We had Googlebot spending a ton of resources on API URLs. We were able to block these without any negative impact.
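Before shipping a block like this, the robots.txt rules can be sanity-checked against sample URLs. The sketch below uses Python's standard `urllib.robotparser`; the rules, the `example.com` host, and the `/api/products/42` path are hypothetical, with `/rest/cart/` borrowed from Justin's example. As far as I know, the stdlib parser matches rules by simple path prefix, so treat it as a rough stand-in for how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block the cart-count endpoint, leave everything else crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /rest/cart/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def googlebot_may_fetch(url: str) -> bool:
    """Return True if the parsed rules allow Googlebot to fetch `url`."""
    return parser.can_fetch("Googlebot", url)


# An endpoint that does not affect rendering should come back blocked...
cart_allowed = googlebot_may_fetch("https://example.com/rest/cart/count.json?_=a1b2")
# ...while an endpoint the page needs for rendering should stay fetchable.
products_allowed = googlebot_may_fetch("https://example.com/api/products/42")

print(cart_allowed, products_allowed)
```

A check like this only confirms the rules match the intended paths. Whether a blocked endpoint affects rendering still has to be verified separately, for example with a rendered-HTML comparison.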
Braden Becker
Aug 29, 2023, 1:38 PM
> _At a previous role we had SSR for the first pageview, and subsequent pageviews were client-side rendered._
Yeah I've heard that ^ approach before - definitely appealing if we're blocked on full SSR coverage.
I'm 99% sure every API path is a JS resource right now, but I triple check every time crawls spike.
