Caveats: these are all guesswork; they might be incorrect or may block more than intended, but they work for me.
NB: I have a robots.txt file specifying a crawl rate of one request every 5 seconds; the crawlers below appear to be ignoring it. Generally I turn a blind eye to anything that's not hitting a server-generated URL multiple times a second - the ones below are being excessive and causing undue load on relatively modest servers.
User-agent: *
Allow: /
Crawl-delay: 5
Found this crawling our jobs pages at multiple requests per second, with a spoofed user agent made to look like a normal browser and some sneaky DNS: all the requests come from hostnames of the form random-alnums.setaptr.net, all of which resolve to the same single IP address on nslookup - but that isn't the IP address the requests actually originate from. I think I've managed to block it with the following CIDR subnets added to the firewall (these rules are almost certainly too broad, but each /24 is only 256 IPs, to keep the collateral damage down); example firewall commands follow the list:
- 173.244.208.0/24
- 173.244.209.0/24
- 173.244.210.0/24
- 173.244.211.0/24
- 209.95.51.0/24
- 209.95.56.0/24
- 107.182.230.0/24
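If the firewall in question is plain iptables, the equivalent block rules would look something like this (just a sketch - adapt for nftables, ufw, or whatever firewall you actually run):

# Drop everything from the offending /24 ranges
for subnet in 173.244.208.0/24 173.244.209.0/24 173.244.210.0/24 173.244.211.0/24 \
              209.95.51.0/24 209.95.56.0/24 107.182.230.0/24; do
    iptables -A INPUT -s "$subnet" -j DROP
done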
Crawling and making requests too quickly - I blocked the following two IP addresses (the access-log one-liner after the list is how this sort of offender shows up):
- 67.212.239.134
- 67.212.239.133
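For reference, offenders like this are easy to spot by counting requests per client IP in the access log - a rough sketch assuming an Apache-style combined log (the log path will vary by setup):

# Top 20 client IPs by request count
awk '{print $1}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -20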
Bytespider (Bytedance's spider - they're the TikTok owner) - too many requests per second. The suspicion from googling around is that they're trying to build their own LLM, so they're hoovering up as much web content as possible as quickly as possible.
In the .htaccess file I added the following - bear in mind I could see half the requests weren't identifying themselves in the user agent, but given the pattern I'm pretty certain it was still them (FYI they crawl from AWS):
RewriteCond %{HTTP_USER_AGENT} "(Bytespider)" [NC]
RewriteRule "^.*$" - [F,L]
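Once a rule like that is live it's easy to sanity-check from the command line - any request whose user agent contains the blocked string should come back as a 403 (hypothetical hostname, substitute your own site):

curl -I -A "Mozilla/5.0 (compatible; Bytespider)" https://www.example.com/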
In the .htaccess file I added the following to block ffuf ("Fuzz Faster U Fool", a web fuzzing tool whose default user agent contains "Fuzz Faster"):
RewriteCond %{HTTP_USER_AGENT} "(Fuzz Faster)" [NC]
RewriteRule "^.*$" - [F,L]
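If you'd rather keep these in one place, the two user-agent rules can be combined into a single block - a sketch, assuming mod_rewrite is enabled and RewriteEngine On isn't already set elsewhere in the file:

# Return 403 to anything whose user agent contains either string
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "(Bytespider|Fuzz Faster)" [NC]
RewriteRule "^.*$" - [F,L]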