high resource usage notice

The "high resource usage" notice I have suddenly been receiving on my account at bumperpress.com since July 9 comes from a Chinese bot that is hitting me with 468,000 kbytes several times every minute. This is not my normal usage, which is not very much. I have added several lines of code to my robots.txt and .htaccess but cannot get rid of this nuisance from "ptr.cnsat.com.cn", a Chinese spammer bot. Does anyone have any idea what I can do besides blocking their IP addresses, which doesn't seem to work for long? They keep coming back! Thanks!
JacobIMH
Hello BumperPress,

You would probably find my guide on how to block unwanted users from your site helpful in this case. A robots.txt file only works on bots that choose to follow its rules, so a malicious bot can bypass it altogether. Blocking bots with a .htaccess file lets you enforce your rules on the server side. In your case the most effective block would be a User-agent block; that way, even if the bot's IP address changes, its requests still won't get through.

Looking at the period from 7/9, when the resource usage started spiking, until today, here are the bot User-agents I'm seeing, by number of requests:

3077 Mozilla/5.0 (compatible; spbot/4.1.0; +http://OpenLinkProfiler.org/bot )
2353 Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
2099 Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
1800 Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
1290 Mozilla/5.0 (compatible; freefind/2.1; +http://www.freefind.com/spider.html)
 873 Mozilla/5.0 (compatible; 007ac9 Crawler; http://crawler.007ac9.net/)
 612 Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, [email protected])
 452 Mozilla/5.0 (compatible; Plukkie/1.5; http://www.botje.com/plukkie.htm)
 324 Baiduspider-image+(+http://www.baidu.com/search/spider.htm)
  83 Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)
  80 Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)
  58 Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)
  54 NerdyBot
  40 Mozilla/5.0 (compatible; 200PleaseBot/1.0; +http://www.200please.com/bot)
  28 Mozilla/5.0 (compatible; SISTRIX Crawler; http://crawler.sistrix.net/)

If, for instance, you wanted to block all of these bots outright, here is a .htaccess rule you could use:

BrowserMatchNoCase "spbot" bots
BrowserMatchNoCase "EasouSpider" bots
BrowserMatchNoCase "YandexBot" bots
BrowserMatchNoCase "Baiduspider" bots
BrowserMatchNoCase "freefind" bots
BrowserMatchNoCase "007ac9" bots
BrowserMatchNoCase "DotBot" bots
BrowserMatchNoCase "Plukkie" bots
BrowserMatchNoCase "MJ12bot" bots
BrowserMatchNoCase "AhrefsBot" bots
BrowserMatchNoCase "200PleaseBot" bots
BrowserMatchNoCase "SISTRIX Crawler" bots

Order Allow,Deny
Allow from ALL
Deny from env=bots
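The Order, Allow and Deny directives above are the Apache 2.2 access-control syntax; on Apache 2.4 they only work when mod_access_compat is loaded. As a rough sketch, assuming the server runs Apache 2.4 with mod_setenvif and mod_authz_core available (the thread does not say which version is in use), the same kind of block might be written as follows, shown with only two of the bots for brevity:

```apache
# Tag requests whose User-Agent contains one of these strings (case-insensitive).
BrowserMatchNoCase "spbot" bots
BrowserMatchNoCase "EasouSpider" bots

# Apache 2.4 syntax (assumed version): allow everyone except requests
# that were tagged with the "bots" environment variable above.
<RequireAll>
    Require all granted
    Require not env bots
</RequireAll>
```

Mixing the old and new styles in one .htaccess file tends to be confusing, so it is probably best to pick whichever matches the installed version.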

I also noticed that you have Deny from rules in your .htaccess file in this format:

Deny from 157.55.*

You don't actually need the asterisk *:

Deny from 157.55
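For reference, Apache's access-control documentation lists both partial IP addresses and network/CIDR notation as accepted forms. A small sketch, assuming the intent is to cover every address from 157.55.0.0 through 157.55.255.255:

```apache
# Partial address: matches any client IP whose first two octets are 157.55.
Deny from 157.55

# The same range written in CIDR notation.
Deny from 157.55.0.0/16
```

Either line on its own should be enough; both are shown only for comparison.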

Also, you typically don't want to block based on the PTR address or hostname in your .htaccess file. Blocking a direct IP address or, in the case of the Chinese Baidu crawler, simply blocking by User-agent is more effective. We have also blocked specific requests with &user=202.46 in the URL using this code:

ErrorDocument 503 "Temporarily unavailable"
RewriteEngine on
RewriteCond %{QUERY_STRING} ^.*user=202.46.*$
RewriteRule .* - [R=503,L]
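One thing to note about the pattern above: an unescaped dot in a regular expression matches any character, and nothing ties user= to the start of a query parameter. A slightly stricter variant, offered only as a sketch and assuming the unwanted requests always carry a query parameter named exactly user whose value begins with 202.46., could look like this:

```apache
ErrorDocument 503 "Temporarily unavailable"
RewriteEngine on
# (^|&) requires user= to begin a query parameter; \. matches a literal dot.
# Assumption: the unwanted requests use a value starting with "202.46.".
RewriteCond %{QUERY_STRING} (^|&)user=202\.46\.
# Leave the URL untouched and answer with a 503 status.
RewriteRule ^ - [R=503,L]
```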

Your site has blocked 208 of those requests so far today, and it looks like your resource usage has dropped a bit. You can cut your resource usage even further by blocking the bots that you don't need crawling your site; Yandex, for instance, is a Russian search engine and Baidu is a Chinese one. As always, you can view the CPU graphs in cPanel to make sure that your usage isn't spiking again. Hope that helps, and please let us know if you have any other questions at all! - Jacob
JeffMa
If they're hitting you from a hostname of ptr.cnsat.com.cn, you may simply add the following line to your .htaccess file:

deny from ptr.cnsat.com.cn
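A caveat on this approach: Apache resolves hostname rules with a reverse DNS lookup on each client address followed by a forward lookup to confirm it, so it costs more per request than an IP or User-agent rule. A rule matches when the client's name equals, or ends in, the given string. Assuming the server runs Apache 2.4 (not stated in the thread), a sketch of the equivalent rule would be:

```apache
<RequireAll>
    Require all granted
    # Denies clients whose verified reverse DNS name is, or ends in,
    # ptr.cnsat.com.cn (hostname taken from the question above).
    Require not host ptr.cnsat.com.cn
</RequireAll>
```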