Noah
Jan 19, 2025, 11:18 AM
Forwarded thread from another channel:
Mason Nelson
Dec 17, 2024, 3:10 PM
Hey all, for sites that block Screaming Frog, what is the best workaround?
Shawn Huber
Dec 17, 2024, 3:11 PM
I've had some success with custom user agents and VPNs
If you know them, and they know you're crawling - they can put your UA on the approved list so that you don't get blocked.
If they don't - then you'll need to get creative
Richard Barrett
Dec 17, 2024, 3:12 PM
Highlighted passages from the linked guide: "you can use an … agent to get around it." and "The SEO Spider will then … other bots will remain blocked."
Find out how to crawl a staging or development website, considering robots.txt, authentication, and the SEO Spider configuration.
Screaming Frog: How To Crawl A Staging Website
Shawn Huber
Dec 17, 2024, 3:13 PM
What is the status code you're receiving when you crawl?
Mason Nelson
Dec 17, 2024, 3:14 PM
@Shawn Huber 403 Forbidden Non-Indexable Client Error
Richard Barrett
Dec 17, 2024, 3:14 PM
Could be blocking the user agent or your IP
Mason Nelson
Dec 17, 2024, 3:15 PM
I've VPN'd and it hasn't solved it, so I imagine it's blocking the user agent, right? Is there a way to bypass?
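One way to tell a user-agent block from an IP block is to request the same URL with two different User-Agent headers over the same connection and compare the status codes. Below is a minimal Python sketch of that idea, intended only for sites you are authorized to crawl. The function names `fetch_status` and `diagnose`, the example user-agent strings, and the set of status codes treated as "blocked" are assumptions made for illustration; none of them come from the thread or from Screaming Frog.

```python
from urllib import error, request

# Status codes treated as "blocked" here (an assumption; adjust as needed).
BLOCK_CODES = {401, 403, 429}


def fetch_status(url, user_agent, timeout=10.0):
    """Return the HTTP status code the server sends for this User-Agent."""
    req = request.Request(url, headers={"User-Agent": user_agent})
    try:
        with request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except error.HTTPError as exc:
        # urllib raises on 4xx/5xx; the code is what we want to compare.
        return exc.code


def diagnose(statuses):
    """Guess the kind of block from a {user_agent_label: status_code} mapping."""
    blocked = {label for label, code in statuses.items() if code in BLOCK_CODES}
    if not blocked:
        return "no block observed"
    if blocked == set(statuses):
        return "every user agent blocked: likely IP or rate based"
    return "only some user agents blocked: likely user-agent filtering"


# Example usage (hypothetical URL and user-agent strings, not run here):
# statuses = {
#     "crawler": fetch_status("https://example.com/", "ExampleCrawler/1.0"),
#     "browser": fetch_status("https://example.com/", "Mozilla/5.0 (X11; Linux x86_64)"),
# }
# print(diagnose(statuses))
```

If both requests come back 403 even after switching networks, the block is probably not about the user agent at all, and more elaborate fingerprinting may be involved.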
Victor M Pan
Dec 17, 2024, 3:15 PM
Is this your site or a competitor's site?
Shawn Huber
Dec 17, 2024, 3:16 PM
You can try something like this
MasonBot/1.0 (compatible; ScreamingFrogSEO/21.0; +https://www.masonnelson.com/bot)
Shawn Huber
Dec 17, 2024, 3:16 PM
though "ScreamingFrog" in the string might itself be a trigger, so not sure
Victor M Pan
Dec 17, 2024, 3:17 PM
If it's your own, getting on an allowlist/whitelist is the fix, and usually the CDN is doing the blocking. If it's a competitor and just a few URLs, I like to have Google get the information I need from those URLs
Shawn Huber
Dec 17, 2024, 3:17 PM
There is an amount of trial and error when it comes to getting around sites that block
Victor M Pan
Dec 17, 2024, 3:18 PM
This could be via Google suite products - and no, you wouldn't be using SF at this point
Mason Nelson
Dec 17, 2024, 3:18 PM
Yeah, it's a site that I don't have access to.
Shawn Huber
Dec 17, 2024, 3:18 PM
Some sites will find your IP/location and 403 you very quickly. I've even been blocked from viewing those sites via Chrome
Victor M Pan
Dec 17, 2024, 3:19 PM
Though you could, haha, but ask someone else who has had to do this
Samuel Lavoie
Dec 17, 2024, 3:19 PM
Keep in mind that you might be in some legal trouble if the website is not your own and they are actively blocking your IP or user-agent. At least be gentle.
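As one concrete reading of "be gentle", a crawler can consult the site's robots.txt before fetching anything and honor any crawl delay it declares. The sketch below uses Python's standard `urllib.robotparser` for that. The helper name `build_parser`, the sample robots.txt content, and the `MasonBot` token are assumptions for illustration, and respecting robots.txt is not a substitute for permission or legal advice.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt content that has already been downloaded as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical robots.txt content, used only to show the calls.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = build_parser(SAMPLE_ROBOTS)

# Is this user agent allowed to fetch these URLs?
allowed_public = parser.can_fetch("MasonBot", "https://example.com/blog/")
allowed_private = parser.can_fetch("MasonBot", "https://example.com/private/report")

# Seconds the site asks crawlers to wait between requests (None if unset).
delay = parser.crawl_delay("MasonBot")
```

In a real crawl the text would come from the site's own `/robots.txt`, and the returned delay would set the minimum pause between requests.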
Shawn Huber
Dec 17, 2024, 3:19 PM
VPN solves this, but again, only for a short amount of time.
Then your crawl speed also plays a factor in how quickly they find you
Shawn Huber
Dec 17, 2024, 3:20 PM
Just depends on how strictly they configure their CDN/security to prevent DDoS or other nefarious activities
Victor M Pan
Dec 17, 2024, 3:20 PM
Most common problem folx have with CDNs is that 429 (Too Many Requests) - change the max URLs/thread and you'll usually find a sweet spot
Shawn Huber
Dec 17, 2024, 3:21 PM
Yup, that's the most common - one site did 403 me though after a while lol
Shawn Huber
Dec 17, 2024, 3:21 PM
That's the one that I can't even view their site in Chrome
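The "sweet spot" search for 429 responses discussed above can also be automated in a scripted crawl: slow down when the server answers 429, honor a numeric `Retry-After` header when one is sent, and speed up again gradually on success. The sketch below is one possible policy and not how Screaming Frog works internally. The function name `next_delay`, the doubling and 0.9 factors, and the 0.5 to 60 second bounds are all assumptions.

```python
def next_delay(current_delay, status, retry_after=None,
               min_delay=0.5, max_delay=60.0):
    """Return the pause (in seconds) to use before the next request.

    current_delay: pause used before the request that just completed
    status:        HTTP status code of that request
    retry_after:   raw Retry-After header value, if the server sent one
    """
    if status == 429:
        # Prefer the server's own hint when it is a plain number of seconds.
        if retry_after is not None and retry_after.strip().isdigit():
            wanted = max(float(retry_after), current_delay)
        else:
            wanted = current_delay * 2  # otherwise back off exponentially
        return min(wanted, max_delay)
    if status < 400:
        # Success: creep back toward the fastest allowed pace.
        return max(current_delay * 0.9, min_delay)
    # Other errors (for example 403) are not a rate signal; keep the pace.
    return current_delay


# Example: a 429 with no hint doubles the pause, a success shortens it.
after_429 = next_delay(1.0, 429)                 # 2.0
after_hint = next_delay(1.0, 429, retry_after="30")  # 30.0
after_ok = next_delay(2.0, 200)                  # 1.8
```

A steady 403, as opposed to a 429, usually means the site has decided to block rather than throttle, so slowing down further is unlikely to help.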
Mason Nelson
Dec 17, 2024, 3:21 PM
Haha yeah, not worth heavy lifts or legal trouble - just an interesting scenario that I've run into, and I figured you guys would know workarounds. You guys are smart, thank you
Victor M Pan
Dec 17, 2024, 3:21 PM
If you want a poor person's crawl you might be able to 301 redirect to a full copy of their site from your site and then submit an XML sitemap on some weird subdomain for your project
Victor M Pan
Dec 17, 2024, 3:22 PM
Again, I don't know what you need your SF crawl for
Shawn Huber
Dec 17, 2024, 3:22 PM
Happy to help try things if you want to DM the site
Victor M Pan
Dec 17, 2024, 3:23 PM
If it's just a URL or two, will render single pages no problem
Victor M Pan
Dec 17, 2024, 3:23 PM
Sometimes~ bypassing different walls