In this article we'll discuss how you can block unwanted users or bots from accessing your website via .htaccess rules. The .htaccess file is a hidden configuration file on the server that can be used to control access to your website, among other features.

The steps below walk through several different ways to block unwanted users from accessing your website.

Getting to your .htaccess file

  1. Log in to your cPanel.
  2. Under the Files section, click on File Manager.
  3. Select the Document Root for: option, and choose your domain from the drop-down.
  4. Ensure that Show Hidden Files is selected.
  5. Then click Go.
  6. Right-click on the .htaccess file and select Edit.
  7. If your .htaccess file doesn't exist yet, click on New File at the top-left, name the file .htaccess, and set the directory in which it is created to /public_html/ or the document root of your site.
  8. If a text editor encoding dialog box pops up, simply click on Edit.

Making your .htaccess edits

  1. Now that you have your .htaccess file opened and ready to edit, you can use one of the following types of blocks:

    Block a single IP address

    deny from 123.123.123.123

    Block a range of IP addresses

    You can leave off the trailing octets of an IP address to block every address that begins with the remaining octets. The following example blocks 123.123.123.0 - 123.123.123.255:

    deny from 123.123.123

    You can also use CIDR (Classless Inter-Domain Routing) notation for the IPs. For instance, 123.123.123.0/24 blocks the range 123.123.123.0 - 123.123.123.255, and 123.123.123.0/18 blocks the range 123.123.64.0 - 123.123.127.255.

    deny from 123.123.123.0/24
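
The deny from syntax above is the Apache 2.2 style, which Apache 2.4 only supports through the mod_access_compat module. On a 2.4 server the same blocks can be written with Require directives. The following is a minimal sketch, assuming Apache 2.4 with mod_authz_core and mod_authz_host loaded and that your host allows these directives in .htaccess; the addresses are the same placeholders used above:

```apache
# Apache 2.4 style access control.
# Everyone is allowed except the addresses listed with "Require not ip".
<RequireAll>
    Require all granted
    # Single address
    Require not ip 123.123.123.123
    # Partial address: matches 123.123.123.0 - 123.123.123.255
    Require not ip 123.123.123
    # CIDR notation for the same /24 network
    Require not ip 123.123.123.0/24
</RequireAll>
```

Pick one style per file where you can, since mixing the older deny from directives with Require directives makes the resulting access rules harder to reason about.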

    Block bad users based on their user-agent string

    The following code turns on the Apache RewriteEngine, and the next line checks the user-agent string of each request. If any of the words Baiduspider, HTTrack, or Yandex appear anywhere in the string (the [NC] flag makes the match case-insensitive), the RewriteRule applies. It leaves the requested URL unchanged and answers with a 403 (Forbidden) response, which is what the R=403 flag does:

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^.*(Baiduspider|HTTrack|Yandex).*$ [NC]
    RewriteRule .* - [R=403,L]
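
A slightly shorter variant of the same block is sketched below. It relies on two things: RewriteCond patterns are unanchored regular expressions, so the surrounding ^.* and .*$ are not needed to match a word anywhere in the string, and mod_rewrite provides an [F] flag that returns 403 Forbidden directly. The bot names are the ones from the example above.

```apache
RewriteEngine On
# Match any of the listed names anywhere in the User-Agent header, ignoring case.
RewriteCond %{HTTP_USER_AGENT} (Baiduspider|HTTrack|Yandex) [NC]
# "-" leaves the URL untouched; [F] sends 403 Forbidden and stops rule processing.
RewriteRule ^ - [F]
```

Both versions should behave the same way for visitors, so which one you use is mostly a matter of readability.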
    

     

    Temporarily block bad bots

    In some cases you might not want to send a visitor a 403 response, which is simply an access denied message. For example, say your site is getting a large spike in traffic for the day from a promotion you're running. You may not want good search engine bots like Google or Yahoo to come along and start indexing your site while the server is already stressed by the extra traffic.

    The following code sets up a basic error document for a 503 response, which is the standard way to tell a search engine that its request is temporarily blocked and that it should try again later. This is different from denying access via a 403 response: with a 503 response, Google has confirmed they will come back and try to index the page again instead of dropping it from their index.

    The code catches any request from a user-agent that contains the words bot, crawl, or spider, which matches most of the major search engines. The second RewriteCond line still allows these bots to request the robots.txt file to check for new rules, but any other request simply gets a 503 response with the message "Site temporarily disabled for crawling".

    Typically you don't want to leave a 503 block in place for longer than 2 days. Otherwise Google might start to interpret this as an extended server outage and could begin to remove your URLs from their index.

    ErrorDocument 503 "Site temporarily disabled for crawling"
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^.*(bot|crawl|spider).*$ [NC]
    RewriteCond %{REQUEST_URI} !^/robots\.txt$
    RewriteRule .* - [R=503,L]
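
HTTP also defines a Retry-After response header that a server can send with a 503 to suggest when the client should try again. The following is a sketch only, assuming mod_headers is enabled and the server runs Apache 2.4.10 or later (needed for the expr= condition on the Header directive). The 3600 second value is an arbitrary placeholder, and whether a particular crawler honors the header is up to that crawler.

```apache
ErrorDocument 503 "Site temporarily disabled for crawling"
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (bot|crawl|spider) [NC]
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteRule ^ - [R=503,L]

# Assumption: mod_headers is available; the block is skipped if it is not.
<IfModule mod_headers.c>
    # Add Retry-After (in seconds) only to responses whose status is 503.
    Header always set Retry-After "3600" "expr=%{REQUEST_STATUS} == 503"
</IfModule>
```

After adding this, it is worth requesting a page with a test user-agent such as "testbot" and checking the response headers to confirm that the 503 and the Retry-After header are both sent on your server.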
    

    This method is useful if you notice new bots crawling your site and causing excessive requests, and you want to block them or slow them down via your robots.txt file. It lets you answer their requests with a 503 until they read your new robots.txt rules and start obeying them. You can read about how to stop search engines from crawling your website for more information regarding this.

You should now understand how to use a .htaccess file to help block access to your website in multiple ways.


This article was written by Jacob Nicholson, Community Support technician.
