Hey all, I looked at a site that's behind Cloudflare and thought it was suffering from indexation issues (it has over 4 million pages, so crawl budget was a concern of theirs).
I have a few questions:
• When I crawl the site at 0.5 URLs/second with 1 thread, I get 429s after 10 pages (a rough sketch of my crawl setup is below the list).
• I have, in theory, been whitelisted, but I still run into that problem.
• If Cloudflare is blocking a site from being crawled by Google, how could one tell if they don't have access to the Cloudflare dashboard? Does Cloudflare let you look at X number of pages and then block you?
• Once I get access, what settings should I check to see whether it's blocking bots (including Googlebot)?
• And how can one look at the logs inside Cloudflare to see activity that would indicate Googlebot is being blocked? (There's a quick Googlebot-verification sketch at the bottom of this post.)
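
For context, this is roughly what my crawl loop looks like, as a minimal Python/requests sketch (example.com, the URL pattern, and the User-Agent string are placeholders for the real values): one thread, a 2-second sleep between requests, and a back-off whenever a 429 comes back.

```python
import time
import requests

# Placeholders: the real crawl uses the site's own URL list and the
# crawler User-Agent string that was supposedly whitelisted.
URLS = [f"https://example.com/page/{i}" for i in range(1, 51)]
HEADERS = {"User-Agent": "ExampleCrawler/1.0 (+https://example.com/bot-info)"}

session = requests.Session()
session.headers.update(HEADERS)

for url in URLS:
    resp = session.get(url, timeout=30)
    if resp.status_code == 429:
        # Honor Retry-After if the server sends one; otherwise back off a minute.
        retry_after = resp.headers.get("Retry-After", "")
        wait = int(retry_after) if retry_after.isdigit() else 60
        print(f"429 on {url}, sleeping {wait}s before continuing")
        time.sleep(wait)
    else:
        print(resp.status_code, url)
    time.sleep(2)  # 0.5 URLs/second, single thread
```

Even with that pacing, the 429s kick in almost immediately, which is why I suspect a Cloudflare rule rather than origin-server rate limiting.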
And as a follow-up: if anyone wants to write a thread on "I think my site is blocking Googlebot from crawling it, how do I fix it?", please jump in.
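
As a starting point for that thread: whatever shows up in Cloudflare's logs, step one is confirming that the blocked hits really are Googlebot and not a spoofed user agent. Google's documented check is a reverse DNS lookup on the requesting IP (it should resolve to a googlebot.com or google.com hostname), followed by a forward lookup that maps back to the same IP. A quick Python sketch of that check (the sample IP at the bottom is just an illustration):

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Reverse-DNS the IP, confirm the hostname is on googlebot.com or
    google.com, then forward-resolve that hostname and check it maps
    back to the same IP."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)
    except socket.herror:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        _, _, addrs = socket.gethostbyname_ex(host)
    except socket.gaierror:
        return False
    return ip in addrs

# Example: run this against IPs pulled from the blocked-request entries in the logs.
print(is_verified_googlebot("66.249.66.1"))
```

If the blocked IPs pass this check, the problem really is Cloudflare blocking Googlebot; if they don't, you're looking at fake-Googlebot traffic and the real crawler may be fine.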