The Best Of

Noah
Jan 19, 2025, 11:16 AM
Forwarded from another channel:
RP
Dec 4, 2024, 8:36 AM
Please educate me on bot traffic and how to stop being hit so much. We don't use Cloudflare; rules are done in-house. How can we stop or restrict malicious bots, and ones that don't respect crawl-delay? We assume Cloudflare at Enterprise level is too expensive.
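Bots that ignore Crawl-delay have to be slowed down on the server side. A common technique is per-client rate limiting; the sketch below is a minimal in-memory token bucket keyed by client IP, in Python. The class name, rate and burst values are hypothetical, and on IIS a built-in option such as Dynamic IP Restrictions, or an edge WAF, would normally do this before requests reach the application.

```python
import time
from typing import Callable, Dict, Tuple


class RateLimiter:
    """Per-client token bucket: each client may make `rate` requests/second
    on average, with bursts of up to `burst` requests."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        # client id -> (tokens remaining, time of last update)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client: str) -> bool:
        now = self.clock()
        tokens, last = self.buckets.get(client, (float(self.burst), now))
        # Refill tokens for the time elapsed since the last request, capped at burst.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[client] = (tokens - 1.0, now)
            return True
        self.buckets[client] = (tokens, now)
        return False


# Usage with a fake clock: 1 request/second, bursts of 3.
t = [0.0]
limiter = RateLimiter(rate=1.0, burst=3, clock=lambda: t[0])
results = [limiter.allow("203.0.113.7") for _ in range(5)]
print(results)                        # [True, True, True, False, False]
t[0] = 2.0                            # two seconds later, two tokens have refilled
print(limiter.allow("203.0.113.7"))   # True
```

Requests that are refused would typically get an HTTP 429 response rather than being served.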
RP
Dec 4, 2024, 8:42 AM
For many months we've been hit by bots. Our devs use code and server rules to mitigate this, but lots of bad hits still get through and grind the site to a halt.
I've been asked to find out: 1. Does OpenAI have a crawl-control setting similar to Bing Webmaster Tools (I believe not)? 2. Is there any SEO tactic that helps with these "attacks"?
If they are genuine bots we can disallow them in robots.txt, but it appears that the non-genuine ones cause the problem.
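For genuine crawlers, robots.txt is the control: a compliant bot fetches it and honours `Disallow`, and some crawlers (Bingbot, for example) also honour `Crawl-delay`, while Googlebot ignores it. The sketch uses Python's standard `urllib.robotparser` to show how a compliant crawler would read a hypothetical robots.txt that blocks OpenAI's GPTBot. It has no effect on bots that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot entirely, slow down and fence off /search for everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("GPTBot", "https://example.com/products/"))    # False
print(parser.can_fetch("bingbot", "https://example.com/products/"))   # True
print(parser.can_fetch("bingbot", "https://example.com/search"))      # False
print(parser.crawl_delay("bingbot"))                                  # 10
```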
I suggested Cloudflare, but according to our head dev we would need the Enterprise level, and the cost is way out of budget.
I don't have access to the log files. I was going to suggest Screaming Frog Log File Analyser, Cloudflare, or Sucuri.
I've found the list of OpenAI bots and their IP addresses, and also Amazonbot, as that was mentioned along with Alexa. But my research suggests the Amazon Alexa bot has been retired.
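A published IP list makes it possible to separate the real crawler from impostors: a request whose user agent says GPTBot but whose IP falls outside the operator's published ranges can be treated as spoofed. A minimal sketch with Python's `ipaddress` module; the CIDR blocks are RFC 5737 documentation placeholders, not OpenAI's real ranges, and the user-agent string is only an example.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Placeholder ranges from the RFC 5737 documentation blocks, NOT OpenAI's real
# ranges; in practice load them from the list the operator publishes.
PUBLISHED_GPTBOT_RANGES = ["192.0.2.0/24", "198.51.100.0/25"]


def load_networks(cidrs: Iterable[str]) -> List[Network]:
    return [ipaddress.ip_network(c) for c in cidrs]


def is_genuine_gptbot(user_agent: str, client_ip: str, networks: List[Network]) -> bool:
    """True only if the request claims to be GPTBot AND comes from a published range."""
    if "GPTBot" not in user_agent:
        return False
    addr = ipaddress.ip_address(client_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa), so mixed lists are safe.
    return any(addr in net for net in networks)


nets = load_networks(PUBLISHED_GPTBOT_RANGES)
ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
      "GPTBot/1.1; +https://openai.com/gptbot)")
print(is_genuine_gptbot(ua, "192.0.2.44", nets))   # True: claimed and in range
print(is_genuine_gptbot(ua, "203.0.113.9", nets))  # False: spoofed user agent
```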
Any help or pointers would be much appreciated.
Usually we get 5,000 "hits" per day; last night it was almost 60,000.
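Whoever does have the logs can find the source of a spike like that without a dedicated tool. IIS writes W3C extended log files in which a `#Fields:` directive names the space-separated columns; the sketch reads that directive and counts requests per client IP and per user agent. It assumes the `c-ip` and `cs(User-Agent)` fields are enabled in the site's logging settings, and the sample lines are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def top_talkers(lines: Iterable[str], n: int = 5) -> Tuple[list, list]:
    """Count requests per client IP and per user agent in a W3C extended log."""
    fields: list = []
    ips: Counter = Counter()
    agents: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # e.g. "#Fields: date time c-ip cs-method cs-uri-stem sc-status cs(User-Agent)"
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # skip other directives, blanks, and data before any #Fields line
        row = dict(zip(fields, line.split(" ")))
        ips[row.get("c-ip", "-")] += 1
        agents[row.get("cs(User-Agent)", "-")] += 1
    return ips.most_common(n), agents.most_common(n)


# IIS replaces spaces inside the user agent with "+", so each field stays one token.
SAMPLE = """\
#Software: Microsoft Internet Information Services 10.0
#Fields: date time c-ip cs-method cs-uri-stem sc-status cs(User-Agent)
2024-12-03 23:01:02 203.0.113.7 GET /products 200 python-requests/2.31.0
2024-12-03 23:01:02 203.0.113.7 GET /search 200 python-requests/2.31.0
2024-12-03 23:01:03 198.51.100.20 GET / 200 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)
2024-12-03 23:01:03 203.0.113.7 GET /basket 200 python-requests/2.31.0
"""

ips, agents = top_talkers(SAMPLE.splitlines(), n=2)
print(ips)     # [('203.0.113.7', 3), ('198.51.100.20', 1)]
print(agents)
```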
Anne Hennegar
Dec 4, 2024, 8:43 AM
My host now includes CF, but when I used the $20/month plan I could set WAF rules. And I think they recently implemented something for AI bots.
Anne Hennegar
Dec 4, 2024, 8:44 AM
To help preserve a safe Internet for content creators, we’ve just launched a brand new “easy button” to block all AI bots. It’s available for all customers, including those on our free tier.
The Cloudflare Blog: Declare your AIndependence: block AI bots, scrapers and crawlers with a single click
Mika Lepistö
Dec 4, 2024, 9:00 AM
You don't need enterprise for most of CF bot protection.
Also, something is better than nothing. I wonder if the dev mindset is focused on perfection?
You're on an MS server, right? There has to be a local firewall product available. However, it would still have to evaluate the requests at the server, so it doesn't have the benefit of reducing server load or stopping malicious activity at the edge. And I can definitely see it costing much more than the CF Pro plan.
Ultimately, finding a solution isn't primarily an SEO responsibility. It's just SEO's responsibility to make sure the solution doesn't negatively affect marketing.
RP
Dec 4, 2024, 9:34 AM
Thanks for your input, that link is interesting!
RP
Dec 4, 2024, 9:38 AM
Thanks Mika. Yes, I'm sure it's an MS Windows server (IIS web server).
Possibly, re: perfection, as that is the priority here!
Not Enterprise? Interesting, I'll think of a way to mention that. My understanding is that the head dev has reported to management that we need Enterprise, so I need to be tactful.
David Schargel
Dec 4, 2024, 12:36 PM
As @Mika Lepistö suggests, your best low-cost solution is to go with Cloudflare (free) and take advantage of the 5 free WAF rules they give you. Here are mine, which are tuned to WordPress and fairly aggressive:
David Schargel
Dec 4, 2024, 12:36 PM
#### 1 Bypass
This should SKIP all other items except other rules. The whitelist in my case is Patchstack and CleanTalk IPs.
(ip.src in $whitelist) or (cf.client.bot) or (cf.verified_bot_category in {"Search Engine Crawler" "Search Engine Optimization" "Monitoring & Analytics" "Advertising & Marketing" "Page Preview" "Academic Research" "Security" "Accessibility" "Webhooks" "Feed Fetcher"}) or (http.user_agent contains "Uptime-Kuma")
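`cf.client.bot` relies on Cloudflare's list of known good bots. Without Cloudflare, the method Google and Microsoft document for confirming that a visitor claiming to be Googlebot or Bingbot is genuine is forward-confirmed reverse DNS: resolve the IP to a hostname, check the hostname's domain, then resolve that hostname and confirm it maps back to the same IP. A minimal Python sketch; the suffix list should be checked against the crawler operators' current docs, resolvers are passed in so it can be tested offline, and `gethostbyname_ex` only covers IPv4.

```python
import socket
from typing import Callable, List, Sequence

# Domain suffixes documented for Googlebot and Bingbot reverse-DNS hostnames.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def reverse_lookup(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> List[str]:
    return socket.gethostbyname_ex(host)[2]  # IPv4 addresses only


def is_verified_crawler(
    ip: str,
    suffixes: Sequence[str] = TRUSTED_SUFFIXES,
    rdns: Callable[[str], str] = reverse_lookup,
    fdns: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS: IP -> hostname -> IPs must include the IP."""
    try:
        host = rdns(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    if not host.endswith(tuple(suffixes)):
        return False
    try:
        return ip in fdns(host)
    except OSError:
        return False


# Offline usage with fake resolvers (documentation IPs, invented hostnames).
FAKE_PTR = {"192.0.2.10": "crawl-192-0-2-10.googlebot.com", "192.0.2.99": "evil.example.net"}
FAKE_A = {"crawl-192-0-2-10.googlebot.com": ["192.0.2.10"]}


def fake_rdns(ip: str) -> str:
    if ip not in FAKE_PTR:
        raise socket.herror(1, "Unknown host")
    return FAKE_PTR[ip]


def fake_fdns(host: str) -> List[str]:
    return FAKE_A.get(host, [])


print(is_verified_crawler("192.0.2.10", rdns=fake_rdns, fdns=fake_fdns))  # True
print(is_verified_crawler("192.0.2.99", rdns=fake_rdns, fdns=fake_fdns))  # False
```

Real lookups are slow, so results would normally be cached per IP.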
David Schargel
Dec 4, 2024, 12:36 PM
#### 2 Bad Bots
This is set to Managed Challenge (or Block). Remove any bots you use (e.g., Yandex, Ahrefs, Semrush).
(http.user_agent contains "yandex") or (http.user_agent contains "sogou") or (http.user_agent contains "semrush") or (http.user_agent contains "ahrefs") or (http.user_agent contains "baidu") or (http.user_agent contains "python-requests") or (http.user_agent contains "neevabot") or (http.user_agent contains "CF-UC") or (http.user_agent contains "sitelock") or (http.user_agent contains "crawl" and not cf.client.bot) or (http.user_agent contains "bot" and not cf.client.bot) or (http.user_agent contains "Bot" and not cf.client.bot) or (http.user_agent contains "Crawl" and not cf.client.bot) or (http.user_agent contains "spider" and not cf.client.bot) or (http.user_agent contains "mj12bot") or (http.user_agent contains "ZoominfoBot") or (http.user_agent contains "mojeek") or (ip.geoip.asnum in {135061 23724 4808} and http.user_agent contains "siteaudit") or (http.user_agent contains "virusdie") or (http.user_agent contains "CCBot") or (http.user_agent contains "petalbot") or (http.user_agent contains "wpscan") or (http.user_agent contains "seznambot") or (http.user_agent contains "dotbot") or (http.user_agent contains "python") or (http.user_agent contains "BLEXBot/1.0") or (http.user_agent contains "ALittle Client") or (http.user_agent contains "axios") or (http.user_agent contains "Bytedance") or (http.user_agent contains "Bytespider") or (http.user_agent contains "GPTBot") or (http.user_agent contains "ChatGPT-User") or (http.user_agent contains "ClaudeBot") or (http.user_agent contains "Google-Extended") or (http.user_agent contains "Amazonbot") or (http.user_agent contains "Go-http-client" and not cf.client.bot) or (http.user_agent contains "uptime" and cf.verified_bot_category ne "Monitoring & Analytics")
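Cloudflare's `contains` operator is case-sensitive, which is why this rule needs both "bot" and "Bot"; wrapping the field in Cloudflare's `lower()` function and writing every pattern in lowercase avoids listing variants. Also note that `Google-Extended` is a robots.txt product token rather than a user-agent string, so matching it in the user agent will not catch anything. A minimal Python sketch of case-insensitive matching for a self-hosted filter; the pattern list is a hypothetical subset of the rule, and `verified` stands in for whatever good-bot verification is in place.

```python
from typing import Iterable

# Hypothetical subset of the substrings from the Cloudflare rule, stored lowercase.
BAD_UA_SUBSTRINGS = ("python-requests", "go-http-client", "gptbot", "ccbot",
                     "bytespider", "mj12bot", "petalbot", "dotbot")
GENERIC_BOT_WORDS = ("bot", "crawl", "spider")


def is_bad_bot(user_agent: str, verified: bool = False,
               patterns: Iterable[str] = BAD_UA_SUBSTRINGS) -> bool:
    """Case-insensitive equivalent of `lower(http.user_agent) contains ...`."""
    ua = user_agent.lower()
    if any(p in ua for p in patterns):
        return True
    # Generic bot words only count against bots that were NOT verified,
    # mirroring the `and not cf.client.bot` clauses in the rule.
    return not verified and any(w in ua for w in GENERIC_BOT_WORDS)


print(is_bad_bot("CCBot/2.0 (https://commoncrawl.org/faq/)"))                # True
print(is_bad_bot("Mozilla/5.0 (compatible; bingbot/2.0)", verified=True))   # False
print(is_bad_bot("Mozilla/5.0 (compatible; bingbot/2.0)"))                  # True
print(is_bad_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))              # False
```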
David Schargel
Dec 4, 2024, 12:37 PM
#### 3 WordPress Login Protection
Set to Managed Challenge
(http.request.uri.path contains "/wp-login.php" and not http.request.uri.query contains "action=logout") or (http.request.uri.path contains "/login/")
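In a logout URL such as `/wp-login.php?action=logout`, `action=logout` sits in the query string rather than the path, so a logout exclusion has to test `http.request.uri.query`; `http.request.uri.path` never contains it. Python's standard `urllib.parse` shows the split; `needs_login_challenge` is a hypothetical helper mirroring the rule's logic.

```python
from urllib.parse import parse_qs, urlsplit


def needs_login_challenge(uri: str) -> bool:
    """Mirror of the login rule: challenge wp-login.php and /login/, except logout."""
    parts = urlsplit(uri)
    is_logout = "logout" in parse_qs(parts.query).get("action", [])
    return ("/wp-login.php" in parts.path and not is_logout) or "/login/" in parts.path


print(urlsplit("/wp-login.php?action=logout").path)    # /wp-login.php
print(urlsplit("/wp-login.php?action=logout").query)   # action=logout
print(needs_login_challenge("/wp-login.php"))                 # True
print(needs_login_challenge("/wp-login.php?action=logout"))   # False
```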
David Schargel
Dec 4, 2024, 12:37 PM
#### 4 WordPress
Set to Block. Remove the ASN for xmlrpc if you don't use Jetpack. Update the referenced domain to your domain.
(http.request.uri contains "xmlrpc" and ip.geoip.asnum ne 2635) or (http.request.uri.path contains "/wp-content/plugins/" and http.request.uri contains ".php" and not http.referer contains "") or (http.request.uri.path contains "/wp-content/themes/" and http.request.uri contains ".php" and not http.referer contains "") or (http.request.uri.path contains "/wp-content/" and http.request.uri.path contains ".php") or (http.request.uri.path contains "/wp-includes/" and http.request.uri contains ".php" and not http.referer contains "") or (http.request.uri contains "/wp-admin/admin-ajax.php" and http.request.method eq "POST" and not http.referer contains "" and not http.user_agent contains "MainWP" and not http.user_agent contains "") or (http.request.uri contains "/wp-comments-post.php" and http.request.method eq "POST" and not http.referer contains "") or (http.request.uri.path contains "/install.php") or (http.request.uri.path contains "readme") or (http.request.uri.path contains "license") or (http.request.uri.path contains "wp-config") or (http.request.uri.path contains "wlwmanifest") or (http.request.uri.path contains "wp-cron" and not http.host contains "")
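The referer clauses here compare `http.referer` against the site's own domain with `contains`. A substring test also accepts a referer such as `https://example.com.evil.net/`, and the header is client-supplied, so it is a weak signal at best. If the same check is done in application code, comparing the parsed hostname is stricter; in this sketch `example.com` is a hypothetical stand-in for your domain.

```python
from urllib.parse import urlsplit

SITE_HOST = "example.com"  # hypothetical: replace with your own domain


def referer_is_own_site(referer: str, site_host: str = SITE_HOST) -> bool:
    """True if the Referer's hostname is the site itself or one of its subdomains."""
    host = (urlsplit(referer).hostname or "").lower()
    return host == site_host or host.endswith("." + site_host)


print(referer_is_own_site("https://example.com/wp-admin/"))      # True
print(referer_is_own_site("https://www.example.com/contact/"))   # True
print(referer_is_own_site("https://example.com.evil.net/"))      # False
print(referer_is_own_site(""))                                   # False
```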
David Schargel
Dec 4, 2024, 12:37 PM
(You can see that I do not like "". LOL.)
David Schargel
Dec 4, 2024, 12:38 PM
#### 5 Nuisance Protection
Set to Block. You can adjust to allow Tor.
(http.request.full_uri contains "passwd") or (http.request.uri.query contains "vuln.") or (http.request.uri.query contains "base64") or (http.request.uri.query contains "%3Cscript") or (http.cookie contains "%3Cscript") or (http.request.uri.path contains "phpmyadmin") or (http.request.uri.path contains "mysqladmin") or (http.request.uri.path contains "magento_version") or (http.request.uri.path contains "/phpunit") or (http.request.uri contains "/dfs/") or (http.request.uri contains "/autodiscover/") or (http.request.uri contains "/wpad.") or (http.request.full_uri contains "webconfig.txt") or (http.request.full_uri contains "?author") or (http.request.uri.path contains "/.htaccess") or (http.request.uri.path contains ".htpasswd") or (http.request.uri.path contains "/setup-config.php") or (http.request.uri.path contains "error_log") or (http.request.uri.path contains "/installer.php") or (http.request.uri.path contains "/installer-log.txt") or (http.request.uri.path contains "/installer-data.sql") or (http.request.uri.path contains "/database.sql") or (http.request.uri.path contains "/wpad.dat") or (http.request.uri.path contains ".php.suspected") or (http.request.uri.path contains ".php5.suspected") or (http.request.uri.path contains "autodiscover.xml") or (http.request.uri.path contains "assetlinks.json") or (http.request.uri.path contains "/cpanel") or (http.request.uri.path contains "/whm") or (http.request.uri.path contains "/cgialfa") or (http.request.uri.path eq "/shell.php") or (http.request.uri.path eq "/config.php") or (http.request.uri.path eq "/shells.php") or (http.request.uri.path contains "/.env") or (http.request.uri.path contains "/phpinfo") or (http.request.uri.path contains "/c/version.js") or (http.request.uri.path contains "/ALFA_DATA") or (http.request.uri.path contains "/alfacgiapi") or (http.request.uri.path contains ".php.PhP") or (cf.verified_bot_category in {"AI Crawler" "Other"}) or (ip.geoip.country in {"T1"})
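This rule mixes two operators: `eq` for exact paths such as `/shell.php`, and `contains` for fragments such as `/.env` or `phpmyadmin` that can appear anywhere in a path. A minimal Python sketch of the same distinction for a self-hosted filter, using a hypothetical subset of the paths above.

```python
# Hypothetical subset of the probe paths from the Cloudflare rule.
EXACT_PATHS = {"/shell.php", "/config.php", "/shells.php"}
PATH_FRAGMENTS = ("/.env", "phpmyadmin", "/.htaccess", "/phpunit", ".htpasswd")


def is_probe(path: str) -> bool:
    """`eq` semantics for EXACT_PATHS, `contains` semantics for PATH_FRAGMENTS."""
    return path in EXACT_PATHS or any(f in path for f in PATH_FRAGMENTS)


print(is_probe("/shell.php"))               # True  (exact match)
print(is_probe("/blog/shell.php"))          # False (eq does not match sub-paths)
print(is_probe("/app/.env"))                # True  (fragment anywhere in the path)
print(is_probe("/vendor/phpunit/phpunit"))  # True
print(is_probe("/products/"))               # False
```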
David Schargel
Dec 4, 2024, 12:39 PM
Also, you might take further inspiration from nginx-bad-bot-blocker (which I currently run on two servers).
GitHub: mitchellkrogza/nginx-ultimate-bad-bot-blocker: Nginx Block Bad Bots, Spam Referrer Blocker, Vulnerability Scanners, User-Agents, Malware, Adware, Ransomware, Malicious Sites, with anti-DDOS, Wordpress Theme Detector Blocking and Fail2Ban Jail for Repeat Offenders
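The "Fail2Ban Jail for Repeat Offenders" idea carries over to a self-hosted stack: count how often each IP trips a blocking rule within a window and ban it for a while once it crosses a threshold. Fail2Ban itself does this by scanning log files and adding firewall rules, with `maxretry`, `findtime` and `bantime` settings; the sketch below is an in-memory Python analogue that borrows those names, with hypothetical default values.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class RepeatOffenderJail:
    """Ban an IP for `bantime` seconds after `maxretry` strikes within `findtime` seconds."""

    def __init__(self, maxretry: int = 5, findtime: float = 600.0, bantime: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.maxretry = maxretry
        self.findtime = findtime
        self.bantime = bantime
        self.clock = clock
        self.strikes: Dict[str, Deque[float]] = defaultdict(deque)
        self.banned_until: Dict[str, float] = {}

    def is_banned(self, ip: str) -> bool:
        return self.clock() < self.banned_until.get(ip, float("-inf"))

    def record_strike(self, ip: str) -> None:
        """Call whenever `ip` trips a blocking rule (bad UA, probe path, ...)."""
        now = self.clock()
        hits = self.strikes[ip]
        hits.append(now)
        # Forget strikes older than the findtime window.
        while hits and now - hits[0] > self.findtime:
            hits.popleft()
        if len(hits) >= self.maxretry:
            self.banned_until[ip] = now + self.bantime
            hits.clear()


t = [0.0]
jail = RepeatOffenderJail(maxretry=3, findtime=60, bantime=300, clock=lambda: t[0])
for _ in range(3):
    jail.record_strike("203.0.113.50")
    t[0] += 1
print(jail.is_banned("203.0.113.50"))   # True
t[0] += 300
print(jail.is_banned("203.0.113.50"))   # False: ban expired
```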
RP
Dec 4, 2024, 11:36 PM
Thanks so much, all! Much appreciated as usual.
Bhagyesh Patel
Dec 5, 2024, 11:38 PM
Please let us know how it goes.
Paul Thompson
Dec 8, 2024, 4:12 PM
Your biggest challenge is going to be getting the senior dev to step back from his public assertion to others that Cloudflare can't work "except with Enterprise, which is too expensive".
As others have pointed out, he's simply wrong: there's significant protection available in the free version. And it takes 15 or 20 minutes to set up, even for someone not directly experienced with CF's interface, so there's essentially zero barrier to at least testing it.
But senior devs HATE taking advice like this from SEOs in the first place, much less being pushed to walk back their original claims.
So you're going to need a plan for getting the dev onside to reconsider this solution.
Sidenote: in my experience, one of the root causes of devs (especially Microsoft devs) tossing out Cloudflare solutions is their training/innate desire to maintain full control of their DNS. And there _are_ some sophisticated network DNS configurations that are a challenge to integrate with Cloudflare.
So the first thing to do is confirm with him/her that there are no legit showstopping reasons why Cloudflare couldn't be used to manage the DNS for the site in question.
Sidenote 2: There is *account-wide* WAF functionality in CF that is only available in Enterprise. It's possible, if the dev doesn't have direct experience with Cloudflare, that they've seen this stipulation and not realised there are still many site-specific options available in the free version, including built-in bot blocking and AI bot blocking as one-click options.
RP
Jan 8, 2025, 9:57 AM
Thanks, guys, and sorry for the slow response. This has been taken well out of my hands. Annoying. Yes, possibly some stubborn, non-collaborative decision-making.
I did provide some detailed and useful information in my email to them, so my conscience is clear!
The company has been quoted about £4,000 per month for CF Enterprise.
By all accounts, we were attacked from Romania last week. The latest plan is to do in-house digital fingerprinting in .NET.
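In-house fingerprinting generally means hashing request attributes that stay stable for one client but differ between clients, so a scraper that rotates IP addresses still shows up under one fingerprint. The sketch is in Python rather than .NET and uses only HTTP headers (selected values plus the order they arrived in); the header choice is an assumption, it presumes the server exposes headers in arrival order, and real systems combine many more signals, such as TLS handshake details.

```python
import hashlib
from typing import List, Tuple

# Headers whose values tend to be stable per client (an assumption; tune as needed).
FINGERPRINT_HEADERS = ("user-agent", "accept", "accept-language", "accept-encoding")


def request_fingerprint(headers: List[Tuple[str, str]]) -> str:
    """Hash selected header values plus the order in which headers were sent.

    `headers` is the raw list of (name, value) pairs in arrival order, so two
    clients sending identical values in a different order get different hashes.
    """
    names = [name.lower() for name, _ in headers]
    values = {name.lower(): value for name, value in headers}
    parts = [f"{h}={values.get(h, '')}" for h in FINGERPRINT_HEADERS]
    parts.append("order=" + ",".join(names))
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()[:16]


a = [("Host", "example.com"), ("User-Agent", "python-requests/2.31.0"),
     ("Accept-Encoding", "gzip, deflate"), ("Accept", "*/*")]
b = [("Host", "example.com"), ("Accept", "*/*"),
     ("User-Agent", "python-requests/2.31.0"), ("Accept-Encoding", "gzip, deflate")]
print(request_fingerprint(a) == request_fingerprint(list(a)))  # True: same client shape
print(request_fingerprint(a) == request_fingerprint(b))        # False: header order differs
```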
