In this article, we'll review how to locate potentially problematic user agents in the requests hitting your site, which could be causing additional resource usage on your server.

This guide is meant for VPS or dedicated server customers who have SSH access to their server. If you have set up a server load monitoring bash script, or you're using one of the tools mentioned in our advanced server load monitoring article, and you see that your server's load average has been spiking, it's a good idea to check whether any particular user agents in your access logs seem to be causing it.
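
For a quick spot check before digging into the logs, you can print the current load averages directly from your SSH session. The uptime command is standard on Linux servers:

    # Print how long the server has been up, along with the
    # 1-, 5-, and 15-minute load averages
    uptime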

Locate large numbers of requests from the same user agent

You can look at your Apache access logs to see whether a single user agent is responsible for a large number of duplicate requests using the steps below.

  1. Log in to your server via SSH.
  2. Navigate to the access logs directory for the website you'd like to investigate. In this example, our cPanel username is userna5 and our domain name is example.com:

    cd /home/userna5/access-logs

  3. You can use the awk command to print only the user agent column of the Apache log. We then pipe (|) that to the sort command so that identical user agents are grouped together, pipe that to uniq -c to count how many times each user agent occurs, and finally pipe it all to sort -n so that the user agents are sorted by their total number of requests (if your log is very large, see the variation after this list):

    awk -F"\"" '{print $(NF-1)}' example.com | sort | uniq -c | sort -n

    You should get back something similar to this:

    1308 facebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php)
    1861 facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
    1931 msnbot-media/1.1 (+http://search.msn.com/msnbot.htm)
    3293 Mozilla/5.0 (compatible; AhrefsBot/4.0; +http://ahrefs.com/robot/)

  4. Now we can see that the AhrefsBot/4.0 search engine crawler currently has far more requests than any other user agent. In this case, let's say this website doesn't necessarily want to be indexed by that search engine, and it only cares about Google and Bing (MSN) crawling it. We could then use the robots.txt file to stop the search engine from crawling the website, as shown in the robots.txt sketch after this list.
  5. If requests from this user agent continue to flood in and are causing an ongoing issue on your server, the robots.txt rules won't stop them until the crawler requests the rules again. You can use our guide on how to block bad users based on their user agent string to immediately stop them from accessing your site; a minimal .htaccess sketch follows below.
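
On a busy site the access log can be very large, and you may only care about recent traffic. Assuming the same combined log format as above, you can limit the count to the last 10,000 requests with tail:

    # Count user agents in only the most recent 10,000 requests
    tail -n 10000 example.com | awk -F"\"" '{print $(NF-1)}' | sort | uniq -c | sort -n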
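
Here is a minimal robots.txt sketch for step 4, assuming the file lives in the site's document root (for example, /home/userna5/public_html/robots.txt) and that the crawler honors robots.txt, as well-behaved bots like AhrefsBot do:

    # Ask AhrefsBot not to crawl any part of the site
    User-agent: AhrefsBot
    Disallow: /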
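
For step 5, here is a minimal sketch of an immediate block using Apache's mod_rewrite in an .htaccess file. This assumes mod_rewrite is enabled on your server; it returns a 403 Forbidden response to any request whose user agent string contains AhrefsBot, and you can swap in whichever user agent you need to block:

    # Return 403 Forbidden to any request whose User-Agent contains "AhrefsBot"
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC]
    RewriteRule .* - [F,L]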

You should now understand how to locate potentially problematic user agents hitting your site and stop them from causing issues.
