How to Stop Search Engines from Crawling your Website

Updated on June 12, 2025 by InMotion Hosting Contributor


Search engine crawlers, sometimes referred to as bots or spiders, crawl your website looking for new text and links so they can update their search indexes. This is how other people find your website in search results.

How to Control search engine crawlers with a robots.txt file

Website owners can instruct search engines on how they should crawl a website, by using a robots.txt file.

When a search engine crawls a website, it requests the robots.txt file first and then follows the rules within.
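Seen from the crawler's side, that check can be sketched in Python with the standard library's `urllib.robotparser`. This is only an illustration, not how any particular search engine implements it. The rules, the `ExampleBot` name, the `is_allowed` helper, and the URLs are all hypothetical, and the rules are parsed from a string rather than downloaded so the sketch runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would download it from
# https://example.com/robots.txt before requesting any other page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved crawler asks this question before every request.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/index.htm"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/a.htm"))
```

The first call should report that the page may be fetched and the second that it may not. Nothing enforces the answer: a crawler that never runs this check is not stopped by the file.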

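It’s important to know that robots.txt rules are guidelines: bots don’t have to follow them, and not every crawler supports every rule. For instance, Googlebot ignores Crawl-delay, so the crawl rate for Google has to be set in Google Webmaster Tools (now called Google Search Console) rather than in robots.txt.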

For bad bots that abuse your site you should look at how to block bad users by User-agent in .htaccess.

Edit or create robots.txt file

The robots.txt file needs to be at the root of your site. If your domain was example.com it should be found:

On your website:

https://example.com/robots.txt

On your server:

/home/userna5/public_html/robots.txt

You can also create a new file and call it robots.txt as just a plain-text file if you don’t already have one.

Search engine User-agents

The most common rule you’d use in a robots.txt file is based on the User-agent of the search engine crawler.

Search engine crawlers use a User-agent to identify themselves when crawling. Here are some common examples:

Top 3 US search engine User-agents:

Googlebot

Yahoo! Slurp

bingbot

Common search engine User-agents blocked:

AhrefsBot 

Baiduspider 

Ezooms 

MJ12bot 

YandexBot

Search engine crawler access via robots.txt file

There are quite a few options when it comes to controlling how your site is crawled with the robots.txt file.

The User-agent: rule specifies which User-agent the rule applies to, and * is a wildcard matching any User-agent.

Disallow: sets the files or folders that are not allowed to be crawled.

Here are some of the most common uses of the robots.txt file:

Set a crawl delay for all search engines
Allow all search engines to crawl website
Disallow all search engines from crawling website
Disallow one particular search engine from crawling website
Disallow all search engines from particular folders
Disallow all search engines from particular files
Disallow all search engines but one

Set a crawl delay for all search engines:

If you had 1,000 pages on your website, a search engine could potentially crawl your entire site in a few minutes.

However, this could cause high system resource usage, with all of those pages loaded in a short time period.

A Crawl-delay: of 30 seconds would let crawlers get through your entire 1,000-page website in about 8.3 hours (1,000 × 30 seconds).

A Crawl-delay: of 500 seconds would let crawlers get through your entire 1,000-page website in about 5.8 days (1,000 × 500 seconds).

You can set the Crawl-delay: for all search engines at once with:

User-agent: * 
Crawl-delay: 30
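To double-check the rule and the arithmetic, here is a small Python sketch. It assumes a crawler that honors Crawl-delay and makes exactly one request per delay period. The `ExampleBot` name and the `full_crawl_seconds` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Crawl-delay: 30
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Seconds a compliant crawler should wait between requests.
delay = parser.crawl_delay("ExampleBot")


def full_crawl_seconds(pages: int, delay_seconds: int) -> int:
    """Minimum time to fetch every page once at one request per delay."""
    return pages * delay_seconds


print(delay)
print(f"{full_crawl_seconds(1000, delay) / 3600:.1f} hours")
print(f"{full_crawl_seconds(1000, 500) / 86400:.1f} days")
```

The results should match the 8.3 hours and 5.8 days quoted above.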

Allow all search engines to crawl website:

By default, search engines should be able to crawl your website, but you can also specify they are allowed with:

User-agent: *
Disallow:

Disallow all search engines from crawling website:

You can disallow any search engine from crawling your website, with these rules:

User-agent: *
Disallow: /
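The allow-all and disallow-all rules differ by a single slash, so it is worth testing which one you have. A minimal Python sketch, using the standard library's `urllib.robotparser` and a hypothetical `ExampleBot` crawler and `can_crawl` helper:

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty Disallow: nothing is blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # "/" matches every path on the site


def can_crawl(rules: str, path: str, agent: str = "ExampleBot") -> bool:
    """Check one path on the hypothetical site example.com against the rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, "https://example.com" + path)


print(can_crawl(ALLOW_ALL, "/store.htm"))
print(can_crawl(BLOCK_ALL, "/store.htm"))
```

The same page should come back as crawlable under the first rule set and blocked under the second.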

Disallow one particular search engine from crawling website:

You can disallow just one specific search engine from crawling your website, with these rules:

User-agent: Baiduspider 
Disallow: /

Disallow all search engines from particular folders:

If we had a few directories like /cgi-bin/, /private/, and /tmp/ we didn’t want bots to crawl we could use this:

User-agent: * 
Disallow: /cgi-bin/ 
Disallow: /private/ 
Disallow: /tmp/
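Each Disallow: value is matched against the start of the URL path, which is why the trailing slashes matter. The Python sketch below checks a few hypothetical paths against the folder rules above. The `ExampleBot` name and the paths are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

FOLDER_RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(FOLDER_RULES.splitlines())

# The first two paths sit inside a disallowed folder.
# The last two only look similar and do not start with a disallowed prefix.
paths = ["/private/report.htm", "/tmp/cache/a.txt", "/private.htm", "/blog/tmp/"]
results = {
    path: parser.can_fetch("ExampleBot", "https://example.com" + path)
    for path in paths
}

for path, allowed in results.items():
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

Only the first two paths should be reported as blocked.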

Disallow all search engines from particular files:

If we had files like contactus.htm, index.htm, and store.htm we didn’t want bots to crawl we could use this:

User-agent: *
Disallow: /contactus.htm
Disallow: /index.htm 
Disallow: /store.htm

Disallow all search engines but one:

If we only wanted to allow Googlebot access to our /private/ directory and disallow all other bots we could use:

User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow:

When Googlebot reads our robots.txt file, it follows the group that names it specifically instead of the * group, so it will see it is not disallowed from crawling any directories.
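The per-crawler behaviour can be checked with a short Python sketch that applies the same two rule groups using the standard library's `urllib.robotparser`. The URL is hypothetical, and bingbot simply stands in for any crawler other than Googlebot.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "https://example.com/private/report.htm"

# Googlebot matches its own group, which disallows nothing.
googlebot_ok = parser.can_fetch("Googlebot", url)
# Any crawler without its own group falls back to the * group.
other_ok = parser.can_fetch("bingbot", url)

print(f"Googlebot: {googlebot_ok}, bingbot: {other_ok}")
```

Googlebot should be allowed into /private/ while bingbot is not.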


InMotion Hosting Contributor Content Writer

InMotion Hosting contributors are highly knowledgeable individuals who create relevant content on new trends and troubleshooting techniques to help you achieve your online goals!

166 thoughts on “How to Stop Search Engines from Crawling your Website”

hans says:

December 19, 2022 at 5:45 am

Hi,
Is there a way to let crawl your website only at nights? and not due the day.

Thank you
1. John-Paul Briones says:
  
  December 19, 2022 at 9:25 am
  
  I could not find a way to set specified crawl times, but you can use a crawl delay as described in the guide. This may help reduce the number of resources used overall.
Sehdev Packers and Movers says:

November 4, 2022 at 1:01 am

How can I assist Google to revisit the following page everyday?
sehdevpackers.com/packers-movers-gurgaon
1. Arnel Custodio says:
  
  November 4, 2022 at 5:07 pm
  
  Hello Sehdev Packers and Movers – Realistically, you can’t force Google to re-crawl/visit your page every day. What you can do is publish new content regularly, so that Google notices your site is changing and comes back to update its index. That is probably the most realistic way to get Google to review your site. But when your site is new, it won’t happen immediately. Check out this article for more information on forcing Google to recrawl your site: https://www.searchenginewatch.com/2018/04/20/how-to-force-google-to-recrawl-your-website/
goodmood says:

February 28, 2022 at 5:09 am

I would like to disallow semalt and semalt-semalt crawlers from wreaking havoc on my bounce rate.
1. John-Paul Briones says:
  
  February 28, 2022 at 5:23 pm
  
  You should be able to use the option above to Disallow one particular search engine from crawling your site.
Sam Paul says:

July 7, 2020 at 6:28 am

HI,

I am getting in Admin work result Google Analytics. This location /admin/ is counting pageview. I don’t want to crawl my Admin work by Google. What exactly I have to write in Robots.txt file to stop crawling all admin work from back end? Can anyone help me out with this problem?
1. Alyssa Kordek says:
  
  July 7, 2020 at 2:28 pm
  
  Hello Sam,
  
  Unfortunately, it is sometimes impossible to keep Google from indexing certain pages, even with robots.txt blocks in place. You may want to contact the developer of the site to see if there is a way to avoid the indexing of that page.
TK says:

March 30, 2020 at 5:26 am

Hello,
I want to remove deleted pages from search results of google .
Can i do that ???
1. InMotion Hosting S. says:
  
  March 30, 2020 at 3:49 pm
  
  Hello and thanks for contacting us. I recommend you contact Google directly and ensure all website metadata is updated accordingly.
Luis says:

March 11, 2020 at 6:14 pm

According to an e-mail I received from Google, blocking google bots will be penalized as an error. Since most crawlers do not care what the the robots.txt suggests this article is practically obsolete. Use X-Robots-Tag instead or better move your files bellow the public_html.
1. Stormy Scott says:
  
  March 12, 2020 at 2:17 pm
  
  Hi, Luis — Thank you so much for your comment. We’ll certainly review the article and make the appropriate changes.
Janus Buch says:

March 3, 2020 at 4:04 am

Hi, thanks for this detailed description. I’m not sure you can answer this question but here goes.

I’m trying to monitor an offline marketing campaign and wanna get as precise results as possible. What I was thinking was to make a copy of my website, publish it and not allow any bots to crawl it.

The idea is to have a website that can only be found by people who have been reached by the offline marketing and at the same time avoid my original website (that is crawled and ranks well) being punished for duplicate content as I would simply copy me site.

Would the above method accomplish this goal?

Once again, thanks for your help
1. Alyssa Kordek says:
  
  March 3, 2020 at 1:31 pm
  
  Hello Janus,
  
  Thank you for your comment. Yes, you can set up a cloned version for this purpose and block bots from crawling it, however you will likely need to use a subdomain such as dev.example.com as you cannot host two versions of a live site on the same domain name.
  
  Best Regards,
  Alyssa K.
Slim X3 says:

November 1, 2019 at 5:37 pm

here is my robots.txt file after edited and updated:
” User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php ”

and here is it when i check https://domain/robots.txt
” User-agent: *
Disallow: / ”

i seem website can’t update any change, so google can’t index my website. Please help me.
1. InMotion Hosting says:
  
  November 4, 2019 at 9:59 am
  
  Hello and thanks for contacting us. I recommend checking out this online robots.txt file checker for more information. Or you can contact our Live Support for further assistance.
Saily says:

July 5, 2019 at 9:05 pm

Thanks for sharing your knowledge and information, this must help us, appreciate your post. Saily from TechRecur
Mr Alexander says:

July 3, 2019 at 8:44 pm

Thanks for the detailed guide on how to block search engines from indexing the site and the surrounding values.
InMotionFan says:

March 18, 2019 at 4:06 pm

It may affect how your website shows in search engine results, but it shouldn’t affect your users’ experience negatively. It may make the website faster.
Himanshu Saxena says:

November 13, 2018 at 9:25 am

I have added robots.txt file with certain guidelines in my web-app. Now , I want to serve my new robots.txt file. How could I do so ?? Help : Urgent
1. InMotionFan says:
  
  November 13, 2018 at 6:18 pm
  
  Placing the file in your document root is sufficient to “serve” it. Is that what you meant?
Michael says:

September 5, 2018 at 7:31 pm

Please help me. Google bot stopped crawling my site for a very long time now. It used to crawl it before but eventually stopped.
1. InMotionFan says:
  
  September 5, 2018 at 9:30 pm
  
  Hello – sorry for the issue with your site not being crawled by Google. You can go to WebMaster tools (from Google) and make sure that your site is being searched. Make sure that you do NOT have a Robots.TXT file that is blocking their crawler as per the instructions in this article.
InMotionFan says:

August 22, 2018 at 10:09 pm

The article above provides information on how to stop bots from crawling your site. If you are unable to use the information above, then I recommend speaking with a website developer for further assistance.
S Chakraborty says:

June 13, 2018 at 1:33 pm
In my robots.txt file I have written the following code:
```
User-agent: *
Disallow: /
```
But this is not working. I am still seeing my website in search engine.
1. InMotionFan says:
  
  June 13, 2018 at 5:20 pm
  
  If your website was already in the search engine, this rule does not remove it. The ROBOTS.TXT file suggests that the search engine not use it. Google supposedly does listen to this file, but remember that it is only a recommendation, not a requirement for search engines to follow the Robots.txt. If you want the search result removed, you will need to contact the search engine directly. They(the search engines) typically have a procedure to have the search results removed.
Mr Van says:

June 12, 2018 at 5:33 am

Hello, I want block bots facebook by url . Help?
1. InMotionFan says:
  
  June 12, 2018 at 3:37 pm
  
  You can use a combination of the above to disallow Facebook’s bots, listed here.
Aslam says:

March 2, 2018 at 8:04 am

In crawl-delay, whether it will be taken in seconds or milliseconds? I got some biased answers from internet, can you make it clear?
1. InMotionFan says:
  
  March 2, 2018 at 6:13 pm
  
  Crawl delay is measured in seconds.
Ron Kennedy says:

December 28, 2017 at 3:15 pm

When I see user-agent: * (does this mean Googlebot is automatically there or do I have to type in Googlebot)

Also If I see Disallow: / (could I remove the line and make it ‘allow?’ If so, where do I go to do this? I’m using WordPress platform.
1. InMotionFan says:
  
  December 28, 2017 at 10:18 pm
  
  You should specify Googlebot as shown in the example above. We are happy to help with a disallow rule but will need more information on what you are attempting to accomplish.
  
  Thank you,
  John-Paul
Welly says:

December 20, 2017 at 3:39 am

Hi. I want to block all crawlers on my site (forum).

But for a some reason, my command in “robots.txt” file don’t take any effect.

Actually, all is pretty same with, or without it.

I have constantly at least 10 crawlers (bots) on my forum…

Yes. I done a right command. I made sure that nothing is wrong, it’s pretty simple.

User-agent: *

Disallow: /

And still on my forum, I have at least 10 bots (as guests) and they keep visiting my site. I tried banning some IP’s (wich are very similar to each other). They are banned, but they still coming… And I’m receiving notification in my admin panel because of them.

Example: https://prntscr.com/hptzz3 ;

I at least tried to write mail to hosting provider of that IP adress for abuse. They replied me that “that” is only a crawler… Now… Any recommendations? 🙂 Thanks.
1. InMotionFan says:
  
  December 20, 2017 at 3:56 am
  
  Unfortunately, robots.txt rules don’t have to be followed by bots, and they are more like guidelines. However, if you have a specific bot that you find is abusive in nature to your site and affecting the traffic, you should look at how to block bad users by User-agent in your .htaccess file. I hope that helps!
fahad says:

November 20, 2017 at 12:20 pm

Hello,

My Robot.txt is
User-agent: *
Disallow: /profile/*

because i dont want anybot to crawl the user’s profile, why? because it was bringing many unusual traffic to the website, and high Bounce rate,

after i uploaded the robot.txt, i noticed a steep drop in the traffic to my website, and i am not getting relevant traffic as well, please advise what should i do?
i have done audit process as well and can’t find the reason whats holding it back.
1. InMotionFan says:
  
  November 20, 2017 at 7:14 pm
  
  If the only change you made was to the robots.txt file then there should be no reason for the sudden drop-off in traffic. My suggestion is that you remove the robots.txt entry and then analyze the traffic that you are receiving. If it continues to be an issue, then you should speak with an experienced web developer/analyst in order to help you determine what could be affecting the traffic on your site.
Mansoor Alam says:

November 16, 2017 at 3:59 pm

I want to block my main domain name from being crawled, but add on domains to be crawled. The main domain is just a blank site that I have with my Hosting Plan. If I put robot.txt in public_html to prevent crawlers, will it affect my clients’ add on domains hosted inside sub folder of public_html? So, main domain is at public_html and sub domains are at public_html/clients/abc.com

Any response will be appreciated.
1. InMotionFan says:
  
  November 16, 2017 at 4:16 pm
  
  You can disallow search engines from crawling specific files as described above. This would allow search engines to successfully crawl everything that is not listed in the rule.
  
  Thank you,
  John-Paul
ParthPatel says:

October 18, 2017 at 2:37 pm

I have to block my website for only google austelia. i have 2 domain one for india (.com) and one for austria (.com.au) but still i found my indian domain in google.com.au so let me know what is the best solution to block only google.com.au for my website.
1. InMotionFan says:
  
  October 18, 2017 at 3:37 pm
  
  Using the robots.txt file remains one of the better ways to block a domain from being crawled by search engines, including Google. However, if you’re still having trouble with it, then paradoxically, the best way to keep your website out of Google is to let Google crawl the page and then use a meta tag to tell Google not to display your page(s) in its search results. You can find a good article on this topic here.
viahal samota says:

July 3, 2017 at 5:14 am

Google blocked my site, but I never put any robots.txt file to disallow google. I’m confused. Why would Google not be tracking my page if I didn’t use a robots file?
1. InMotionFan says:
  
  July 3, 2017 at 4:12 pm
  
  You may want to double-check your analytics tracking code. Make sure that Google’s tracking code is visible on your site for each page you want to track.
Anny Watson says:

April 13, 2017 at 3:15 pm

Hello scott,

Can you explain that if my domain or subdomain are in the same root file. so how can i block the perticular subdomain by robots or etc.
1. InMotionFan says:
  
  April 13, 2017 at 6:48 pm
  
  When you create a subdomain it will create a separate document root. This is where the files (and robots.txt) for the subdomain should be stored. You can view your document root in cPanel.
  
  Thank you,
  John-Paul
Bns Babu says:

April 8, 2017 at 9:51 pm

How can I block my site in Google Search Engine?

But I want to index my site other search engine without google.

which code I paste in robot.txt file?

thanks advance
1. InMotionFan says:
  
  April 10, 2017 at 6:12 pm
  
  You will need to block the Googlebot user agent as described above.
vikram rathore says:

April 7, 2017 at 12:00 pm

hy can you help me i want remove this link for google search. www.complaintboard.in/complaints-reviews/capital-cow-l427964.html

i do search in google capital cow than this url show in 2nd possion but i want to remove or shift to next page for google so what to do? please suggest me..thanks
1. InMotionFan says:
  
  April 7, 2017 at 8:14 pm
  
  Vikram, you should be able to request that Google not crawl that site using Google Webmaster Tools.
Sumit Kumar says:

March 20, 2017 at 3:51 pm

user agent: *

disallow: /

Is it means it stops all bots to crwal our site?

Please update me because i got confused between

disllow: /abc.com/ and disallow: /
1. InMotionFan says:
  
  March 20, 2017 at 10:13 pm
  
  Yes, the code:
  User-agent: *
  Disallow: /
  
  is a request for the search engine to not crawl your site. They may ignore it if they choose.
Muhilan says:

January 4, 2017 at 10:11 am

Does the robots.txt prevent the website from all the browsers?
1. InMotionFan says:
  
  January 4, 2017 at 4:37 pm
  
  No, robots.txt file is to limit bots on the site. This prevents them from crawling. It does not block traffic. Traffic can be blocked by the htaccess file.
Lars says:

October 6, 2016 at 6:04 pm

I have a website wtih pages that are restricted with user/passw. On some of these restricted pages I call up PDF files. However, Google etc, finds and displays the contents of the file that was intended to restricted.

Question: If I make a robot.txt file to block the PDF directory, will google forget the old index after a while. Or do I have to recreate the file with another name?
1. InMotionFan says:
  
  October 6, 2016 at 6:22 pm
  
  If a folder is password protected correctly, it should not be accessible to be crawled by Google. So the robots.txt file shouldn’t make a difference. Even if they are listed in search results, it should not be accessible as long as they are password protected.
  
  After google re-crawls your site, it should update the links and no longer list the pdfs. If they are not crawling your site, you can request they reconsider crawling your site.
  
  Thank you,
  John-Paul
Nilesh says:

July 20, 2016 at 5:37 am

Hello Everyone I have read all the above but still not able to get it so please reply me

how can I disallow spiders crawlers and robots of search engines like google and bing to see my web page but I also want them not to block me or assume that I am a malware or something. I want to run a PPC campaign on Google and also want to redirect my link from www.example.com to www.example.com/test

or if I can change the whole url like from www.example.com to www.xyz.com

The catch is that I don’t want the bots to see my redirected domain.

Any help will be appriciated as I have seen above that you people have resolved almost everyone’s issue. hope mine will be resolved too
1. InMotionFan says:
  
  July 20, 2016 at 5:32 pm
  
  Hello Nilesh,
  
  The robots.txt files are merely GUIDES for the search engine bots. They are not required to follow the robots.txt file. That being said, you can use the directions above to direct typical bots (e.g. Google, Bing) not to scan parts (or all) of your website. So, if you don’t want them to go through a redirected site, then you simply have to create a robots.txt file FOR that site. If that site is not under your control, then you will not have a way to do that.
  
  If you have any further questions or comments, please let us know.
  
  Regards,
  Arnel C.
2. InMotionFan says:
  
  August 2, 2016 at 1:50 am
  
  I get a lot of spam mails. I tried adding a captcha , but still i get spam mails . Now I tried editing my robot.txt and disallowed access to contact-us page. I guess this might happen as my mail id is still there in clickable format. Did I do it right, Would this effect the SEO. Please suggest me a solution.
  
  How should I get rid of spam mails in future?!
3. InMotionFan says:
  
  August 2, 2016 at 3:18 pm
  
  Bots do not have to comply with the robots.txt directives. Legitimate bots typically will, but spam bots do not. So is the spam coming from the form on the contact page, or is it just coming to your email address? If it’s the form getting filled out, a captcha should help. If it’s simply email spam coming through, not from the form directly, you should look at changing the code so your email address is not exposed.
Elias Akel says:

July 6, 2016 at 6:40 pm

Web crawlers crawl your site to Allows potential customers to find your website. Blocking search engine spiders from accessing your website makes your website less visible. Am I right? Why are people trying to block search engine spiders? What am I missing?
1. InMotionFan says:
  
  July 7, 2016 at 1:54 pm
  
  Hello Elias,
  
  Yes, you are correct. However, sometimes, there are many files that you do NOT want a search engine to index (e.g. library of internal files). Spiders can also cause a load on the site. So, you can use a ROBOTS file to help control the search indexing of your site.
  
  I hope that helps to answer your question! If you require further assistance, please let us know!
  
  Regards,
  Arnel C.
Sunil says:

June 16, 2016 at 10:28 am

Hi, I am new to robots.txt. I would like to build a web crawler that only crawles a local site. Is it a rule that crawlers should crawl only through the alowed domains? What if my crawler ignores robots.txt file? Will there be any legal issues in doing so? Any help would be appreciated. Thanks!
1. InMotionFan says:
  
  June 16, 2016 at 2:57 pm
  
  Hello Sunil,
  
  The Robots.txt file’s purpose was to allow website owners to lessen the impact of search crawlers on their sites. If you were to ignore it, then they may consider putting something else up to block you or consider your crawler malware.
  
  If you have any further questions, please let us know.
  
  Kindest regards,
  Arnel C.
2. InMotionFan says:
  
  June 16, 2016 at 7:05 pm
  
  Hello Marnix,
  
  Thank you for contacting us. Here is a link to our guide on how to Block a country from your site using htaccess.
  
  To remove your site from the specific search engines, I recommend setting up accounts with them (such as Webmaster Tools from Google), and requesting that they do not crawl your sites.
  
  Thank you,
  John-Paul
Marnix Garner says:

June 15, 2016 at 6:36 pm

I’m wanting to block a website from being listed in only the UK search engines or being listed in the UK, e.g. google.co.uk, google.com, bing.co.uk, bing.com not to show a website when searching for it in the UK.

How can this be done please?

Best Regards,

Marnix
Michael says:

May 29, 2016 at 5:47 am

Apologies if this has been answered already. I couldn’t locate an answer…

Greetings – I have a WordPress site, and will redevelop it in a separate file and then move the redeveloped site to the root directory. I want to block the www.example.com/dev/ file from being crawled until the new site is completed.

Should the robots.txt file look like this, and will the live site www.example.com NOT be blocked while the /dev/ file will be blocked?

User-agent: *

Disallow: /example.com/dev/
1. InMotionFan says:
  
  May 31, 2016 at 5:49 pm
  
  You only have to include the folder name, like below:
  
  User-agent: *
  Disallow: /dev/
  
  This will keep the ‘dev’ folder from being crawled.
sharey says:

May 20, 2016 at 10:44 am

Very good work but see if my sites robots.txt is correct

https://suntechapps.com/
1. InMotionFan says:
  
  May 20, 2016 at 4:09 pm
  
  Hello Sharey,
  
  You have a Disallow: line in your robots.txt that has nothing past it. I would suggest to fix that part but other than that it looks great.
  
  Best Regards,
  TJ Edens
arun says:

May 10, 2016 at 4:50 am

For crawling do I need static Ip address or dynamic IP address which is best practise.
1. InMotionFan says:
  
  May 10, 2016 at 3:26 pm
  
  Websites do not have dynamic ip’s but maybe I’m not understanding your question. Are you asking if your website should have a static IP address to be crawled?
Brian Murphy says:

May 1, 2016 at 6:21 am

Thanks for the reply, Arn.

I didn’t see anything about wildcards in that thread about htacess. Anyhow, htaccess files are way too complicated for me.

What I want to do is tell spiders to not look at .asp and .exe files. Can *.asp and *.exe be used in a robots.txt file?
1. InMotionFan says:
  
  May 2, 2016 at 11:30 am
  
  To block specific file extensions, use the format below:
  
  User-agent: *
  Disallow: /*.gif$
  
  So in your case, you could have:
  User-agent: *
  Disallow: /*.asp$
  
  User-agent: *
  Disallow: /*.exe$
Sunil Dangi says:

April 26, 2016 at 1:48 pm

i want to stop my url crawling in yahoo

please any one suggest me
1. InMotionFan says:
  
  April 26, 2016 at 3:42 pm
  
  Place a robots.txt file in your public_html or www directory and place the following code in the robots.txt file:
  
  User-agent: Yahoo! Slurp
  Disallow: /
Brian M. says:

April 20, 2016 at 4:13 pm

Can wildcards be used to specify files to disallow? Like all .asp and .exe files?

Disallow: /*.asp

Disallow: /*.exe

If the above would work, would it apply only to files in the root folder?

Thanks
1. InMotionFan says:
  
  April 20, 2016 at 5:00 pm
  
  Hello Brian,
  
  The Robots.txt file is specifically used for controlling what robots can or cannot see. You would need to access the .htaccess file in order to add rules about certain files. Check out this forum post about the subject .
  
  If you have any further questions or comments, please let us know.
  
  Regards,
  Arnel C.
P C says:

April 19, 2016 at 6:43 pm

This is very old but cannot resist responding 🙂

You can probably write a rewrite rule to detect host header HTTP_HOST and return a 404 response for robots.txt for the site you want to allow search engines.
1. InMotionFan says:
  
  April 19, 2016 at 7:11 pm
  
  We do not have a robots.txt file at the root level that would be conflicting with yours. It’s possibly a path error. Nevertheless, I recommend doing your development with a hosts file modification, so you can use the proper domain name.
SRIKUMAR says:

April 8, 2016 at 6:28 pm

sirthe site how can i.

i have in problem after google search one web site of ekikrat.in enter into another site which crawling by google wantto stop or hide
1. InMotionFan says:
  
  April 10, 2016 at 10:33 pm
  
  Hello,
  
  You can follow the guide above which will prevent googlebot from crawling your website.
  
  Best Regards,
  TJ Edens
Sanjay Adhikari says:

April 6, 2016 at 7:46 am

For an e-commerce B2B site, price is different for different user. So I want search engine not to index the price of the product. Is it possible?

waiting for the response.

Thanking you,

San
1. InMotionFan says:
  
  April 6, 2016 at 5:53 pm
  
  Hello Sanjay,
  
  If you kept that information on separate page, then you could use Robots.txt to ignore that page. You could also encode that information on the page, but it would probably be best to simply not publish it and ask your customers to contact you for that information. Here’s a good post on keeping your content hidden.
  
  If you have any further questions or comments, please let us know.
  
  Regards,
  Arnel C.
Patrick says:

March 14, 2016 at 5:49 pm

Hi,

Is there a way to have an page indexed but not have one aspect of it crawled? We’d like to add an into box to the top of a page, but we don’t want the intro box crawled.

Thanks,

Pat
1. InMotionFan says:
  
  March 15, 2016 at 12:13 am
  
  No, a page crawled means the entire content is seen and indexed.
Harry says:

February 29, 2016 at 12:59 pm

Great post. Thanks a lot.

I have one question please. I have the domain www.test.com right? and at the same time, i have this URL: https://mail.test.com

how can I, through robots.txt, to block the https://mail.test.com from appearing in search results i.e. not to be crawled?

thanks in advance
1. InMotionFan says:
  
  February 29, 2016 at 11:12 pm
  
  You just need to create a robots.txt file in the root folder of the subdomain and enter the following code:
  
  User-agent: *
  Disallow: /
  
  This will block the entire subdomain from being crawled.
Jim says:

February 16, 2016 at 2:13 am

I noticed that, on my server — ecres161 — when you’re developing a site and working with temp urls like this:

https://ecres161.servconfig.com/~username/welcome

… if you try to do anything that needs robots.txt, if won’t work.

For example, Google’s various testing tools or sitemap software that looks at robots.txt. Both of those things fail for me, citing being prevented by robots.txt, even if I do not have a robots.txt file in my public_html dir.

However, once I launch a site and the url is like: https://www.mydomain.com/welcome, then it *does* find the local robots.txt file and works fine.

So, I suspect servconfig.com has its own robots.txt and is disallowing everything, which I understand may be good. But, it makes it tough to do any pre-testing work prior to launching a site. So, is this done on purpose, or is it something tht can be changed on Inmotion’s server to allow us to do testing prior to launching a site?
srikumarchaterjee says:

February 13, 2016 at 3:49 pm

i hav eposted a review on ******.in web site on google search it is seen on other site i want to hide
1. InMotionFan says:
  
  February 16, 2016 at 12:03 am
  
  You would use the rule above for restricting specific files on your site.
innocent cyril says:

January 29, 2016 at 10:53 pm

thanks for the great post
Will says:

September 8, 2015 at 5:22 pm

Hi, I have createed the appropriate Robots.txt and it has stopped indexing. The website in question is go.xxxxx.com. It is an internal CRM that we do not want visisble, all indexing has stopped except when I googe “go company name” or “company name go.” Then the site link pops up with no description because it says Robots.txt will not allow the crawler. Is there a way to get rid of it from indexing even the link to the page when searching that specific word. I assume it is finding it because it is in the URL?
1. InMotionFan says:
  
  September 8, 2015 at 5:28 pm
  
  Hello Will,
  
  Robots.txt is basically a request for robots to not crawl the site; it does not remove a URL from the index. Google can still list a blocked URL, without a description, when other pages link to it. To get the listing itself removed, check the removal options in Google’s Webmaster tools.
  
  Kindest Regards,
  Scott M
2. InMotionFan says:
  
  October 25, 2015 at 1:06 am
  
  Hello.
  I had a similar problem. Because I receive a high number of crawlers and spiders on my website, I decided to redirect them to another domain name. Right now I see an improvement, but not all of them are gone. I see some Chinese spiders that are still crawling my website.
  What can I do to stop them, and how can they avoid the redirection?
  Thank you!
3. InMotionFan says:
  
  October 26, 2015 at 4:23 pm
  
  Hello Andru,
  
  Robots.txt is a request, but only good bots will listen to it. Bad bots will not listen to the robots.txt. Chinese bots are very often on the side that do not listen to the file. You may need to set up specific redirects or blocks for the ones that are more persistent.
  
  Kindest Regards,
  Scott M
Babak says:

July 27, 2015 at 1:41 pm
wow very nice article!

I wanted to block my forum like www.site.com/forum

so I am using this:
```
User-agent: *
Disallow: /forum
```

Thanks :X
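One detail worth checking with a rule like this is the trailing slash. The sketch below uses Python's standard-library `urllib.robotparser` to compare `Disallow: /forum` with `Disallow: /forum/`; the sample paths are made up for illustration, and it assumes the crawler does simple prefix matching the way Python's parser does.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(disallow: str, paths: list[str]) -> list[str]:
    """Return the paths blocked by a 'User-agent: *' group with one Disallow rule."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow}"])
    return [p for p in paths if not parser.can_fetch("*", p)]


# Hypothetical paths, chosen to show the difference between the two rules.
PATHS = ["/forum", "/forum/", "/forum/topic-1", "/forums-help.html", "/about.html"]

if __name__ == "__main__":
    # Without the slash, anything whose path starts with "/forum" is matched.
    print(blocked_paths("/forum", PATHS))
    # With the slash, only URLs inside the /forum/ directory are matched.
    print(blocked_paths("/forum/", PATHS))
```

If the intent is to block only the forum directory, the version with the trailing slash is the narrower rule.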
Monica says:

June 13, 2015 at 9:56 pm

Are you saying that I am blocking just files, but not the HTML Pages themselves?

Monica
1. InMotionFan says:
  
  June 15, 2015 at 9:34 pm
  
  Hi Monica,
  
  The html files are the individual pages, so yes, you would be blocking those particular pages from being crawled by the search engines that honor the request.
  
  Kindest Regards,
  Scott M
Sameer says:

June 12, 2015 at 2:13 pm

Thanks a lot. It makes sense.
Monica says:

April 30, 2015 at 12:14 am

We are using a program called Rapid Weaver, a mac program.

How do I create a robots.txt file for just certain pages that we do not want to have crawled?

I understand it needs to be in the root directory?

If possible tell me if I am understanding correctly:

Create a page for example: https://www.amrtax.com/robot.Txt ( or robots.txt with an S ?)

On that page before header:

User-agent: *

Dissallow:/findrefund.html

Disallow:/whattobring.html

Dissallow:/worksheets.htm

Dissallow:/services.html

Dissallow:/Staff.html

Dissallow/enrolledagent.html

Do I have the hang of it? If I upload that page, although it is not added to the menu, would this work?

Trying to work it out in my head!
1. InMotionFan says:
  
  April 30, 2015 at 8:53 pm
  
  Hello Monica,
  
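  You’re blocking individual files from being crawled with the rules above, but each line must be spelled Disallow: (one “s”, followed by a colon). Several of the lines you posted are misspelled or missing the colon, and a crawler may ignore them. And, yes, it’s robots.txt. Just follow the directions in the article above to complete the file properly.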
  
  I hope this helps to answer your question, please let us know if you require any further assistance.
  
  Regards,
  Arnel C.
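To see why the spelling matters, here is a small sketch that feeds the directives as typed in the question (with the blank lines removed) to Python's standard-library `urllib.robotparser`. It shows how this one parser behaves; other crawlers may be more forgiving of misspellings, so treat it as an illustration and not a statement about any specific search engine.

```python
from urllib.robotparser import RobotFileParser

# Directives as typed in the question above, blank lines removed.
AS_POSTED = """\
User-agent: *
Dissallow:/findrefund.html
Disallow:/whattobring.html
Dissallow:/worksheets.htm
Dissallow:/services.html
Dissallow:/Staff.html
Dissallow/enrolledagent.html
"""

PAGES = [
    "/findrefund.html",
    "/whattobring.html",
    "/worksheets.htm",
    "/services.html",
    "/Staff.html",
    "/enrolledagent.html",
]

# The same rules with every directive spelled "Disallow:".
CORRECTED = "User-agent: *\n" + "".join(f"Disallow: {page}\n" for page in PAGES)


def blocked_pages(robots_txt: str, pages: list[str]) -> list[str]:
    """Return the pages that robots_txt blocks for a generic crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [page for page in pages if not parser.can_fetch("*", page)]


if __name__ == "__main__":
    print("as posted:", blocked_pages(AS_POSTED, PAGES))
    print("corrected:", blocked_pages(CORRECTED, PAGES))
```

With this parser, only the one correctly spelled line takes effect in the posted version, while the corrected file blocks all six pages.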
Suraj says:

April 9, 2015 at 10:24 am

I am getting lots of HTTP requests on my website for a particular page, which consume a lot of my CPU and memory. I want to block access to that page and drop HTTP requests for it.

Kindly suggest
1. InMotionFan says:
  
  April 9, 2015 at 5:26 pm
  
  Hello Suraj,
  
  If they’re hitting a particular page on your website, you do have the option of removing that page if it’s not necessary. Otherwise, you can use the .htaccess file to create a redirect for that specific page. Check out the list of things you can do with the .htaccess file here.
  
  Regards,
  Arnel C.
advent says:

February 26, 2015 at 11:05 am

thanks a lot bro.
Manny says:

February 18, 2015 at 4:06 am

They are in different lines only, somehow they were bunched together when I posted a comment here.

<IfModule mod_rewrite.c>
RewriteCond %{HTTP_USER_AGENT} ^Yandex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Baidu [NC]
RewriteRule ^.* - [F,L]
</IfModule>
1. InMotionFan says:
  
  February 18, 2015 at 4:28 am
  
  Hello Manny,
  
  Can you please provide your domain so we may investigate the issue further?
  
  Best Regards,
  TJ Edens
Manny says:

February 18, 2015 at 2:07 am

These are the new entries from Baidu spider after all the entries made to block them.

80.76.6.233 - - [18/Feb/2015:10:05:22 +1100] "GET /link/id/zzzz5448e5b9546e4300/page.html HTTP/1.1" 403 505 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +https://www.baidu.com/search/spider.html)"
180.76.5.151 - - [18/Feb/2015:10:05:30 +1100] "GET /link/id/b57de3ecb30f9dc35741P8c23b17d6c9e0d8b4d5a/page.html HTTP/1.1" 403 521 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +https://www.baidu.com/search/spider.html)"
123.125.71.109 - - [18/Feb/2015:10:05:34 +1100] "GET /media/dynamic/id/57264034bd6461d9b091zzzz52312bad5cc09124/interface.gif HTTP/1.1" 403 529 "-" "Baiduspider-image+(+https://www.baidu.com/search/spider.htm)\\nReferer: https://image.baidu.com/i?ct=503316480&z=0&tn=baiduimagedetail"
1. InMotionFan says:
  
  February 18, 2015 at 2:17 am
  
  Hello Manny,
  
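  What .htaccess did you put these in? Please be sure to put it in the one located in your domain’s document root. Also, these should all be separated line by line and not bunched together. Note that the log entries you posted show a 403 status: the requests are still logged, but the server is refusing them, which is what a block looks like in the access log.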
  
  Best Regards,
  TJ Edens
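When checking whether a block is working, the status code in the access log is the thing to look at: a bot that is being refused still produces log entries, just with a 403. The sketch below assumes the Apache "combined" log format seen in this thread; the sample lines are adapted from the entries above, with the first request path shortened for readability.

```python
import re

# Apache "combined" format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def summarize(lines, agent_keyword):
    """Count status codes for requests whose User-agent contains agent_keyword."""
    counts = {}
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match and agent_keyword.lower() in match["agent"].lower():
            status = int(match["status"])
            counts[status] = counts.get(status, 0) + 1
    return counts


# Sample entries adapted from this thread (request path shortened).
SAMPLE = [
    '180.76.5.151 - - [18/Feb/2015:10:05:30 +1100] "GET /link/page.html HTTP/1.1" 403 521 "-" '
    '"Mozilla/5.0 (compatible; Baiduspider/2.0; +https://www.baidu.com/search/spider.html)"',
    '100.43.91.14 - - [16/Feb/2015:04:04:35 +1100] "GET /robots.txt HTTP/1.1" 200 1071 "-" '
    '"Mozilla/5.0 (compatible; YandexBot/3.0; +https://yandex.com/bots)"',
]

if __name__ == "__main__":
    print("Baiduspider:", summarize(SAMPLE, "baiduspider"))
    print("YandexBot:", summarize(SAMPLE, "yandexbot"))
```

A count made up only of 403 responses for a given bot suggests the .htaccess rules are being applied, even though the bot keeps knocking.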
Manny says:

February 18, 2015 at 2:05 am

Hello John,

Thanks for your response. I have added the Rewrite rules as mentioned but still I see the baidu spider entries in the access.log

180.76.5.64 - - [18/Feb/2015:08:17:31 +1100] "GET /link/id/zzzz547fe1b77394d419/page.html HTTP/1.1" 403 505 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +https://www.baidu.com/search/spider.html)"

I have the following entries in the .htaccess file.

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*(baidu|Baiduspider|HTTrack|Yandex|Majestic).*$ [NC]
RewriteRule .* - [F,L]
</IfModule>

BrowserMatchNoCase baidu banned
Deny from env=banned

BrowserMatchNoCase "Baiduspider" bots
BrowserMatchNoCase "HTTrack" bots
BrowserMatchNoCase "Yandex" bots
BrowserMatchNoCase "Baidu" bots

Order Allow,Deny
Allow from ALL
Deny from env=bots

Then I found the baidu spider requests are mostly from 180.76.5.x and 180.76.6.x IP addresses and then I blocked these IP range in .htaccess.

Order Allow,Deny
Allow from ALL
Deny from env=bots

order allow,deny
allow from all
# Block access to Baiduspider
deny from 180.76.5.0/24 180.76.6.0/24

But still I see the baidu spider entries in the access.log.

Please help me to get rid of this asap. Thank you.
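Before relying on an IP range block, it can help to check which of the logged addresses the ranges actually cover. This is a minimal sketch using Python's standard-library `ipaddress` module; the two ranges are the ones from the .htaccess lines above, and the sample addresses are taken from log entries posted in this thread.

```python
import ipaddress

# The ranges denied in the .htaccess snippet above.
BLOCKED_RANGES = [
    ipaddress.ip_network("180.76.5.0/24"),
    ipaddress.ip_network("180.76.6.0/24"),
]


def is_covered(ip: str) -> bool:
    """Return True if ip falls inside one of the blocked ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_RANGES)


if __name__ == "__main__":
    # Addresses that appear in the Baiduspider log entries in this thread.
    for ip in ("180.76.5.64", "180.76.6.135", "123.125.71.109"):
        print(ip, "covered" if is_covered(ip) else "not covered")
```

Crawlers often use more than one network, so an address outside the listed ranges (such as the 123.125.x.x one here) would only be stopped by the User-agent rules, not by the IP block.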
Manny says:

February 16, 2015 at 6:12 am
Hi, This is really useful post. I have pasted my robots.txt file below. But still, I see the crawling from Yandex and Baiduspider. Please help me to fix this.

```
User-agent: Googlebot
Disallow: 
User-agent: Adsbot-Google
Disallow: 
User-agent: Googlebot-Image
Disallow: 
User-agent: Googlebot-Mobile
Disallow: 
User-agent: MSNBot
Disallow: 
User-agent: bingbot
Disallow: 
User-agent: Slurp
Disallow: 
User-Agent: Yahoo! Slurp
Disallow: 
User-agent: MJ12bot
Disallow: /
User-agent: moget
Disallow: /
User-agent: ichiro
Disallow: /
User-agent: Yeti
Disallow: /
User-agent: NaverBot
Disallow: /
User-agent: sogou spider
Disallow: /
User-agent: YoudaoBot
Disallow: /
User-agent: Baiduspider
Disallow: /
User-agent: Baiduspider-video
Disallow: /
User-agent: Baiduspider-image
Disallow: /
User-agent: Yandex
Disallow: /
```

180.76.6.135 - - [15/Feb/2015:13:12:15 +1100] "GET / HTTP/1.1" 403 984 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +https://www.baidu.com/search/spider.html)"
I see that the crawling from Yandex.com refers the robots.txt file and seems it was not allowed to crawl my website. The crawling from Yandex.ru looks like it was allowed.

2.93.117.172 - - [16/Feb/2015:03:54:17 +1100] "GET / HTTP/1.1" 200 11289 "https://yandex.ru/yandsearch?text=e.bom.gov.au&lr=213" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0; MASPJS)"

100.43.91.14 - - [16/Feb/2015:04:04:35 +1100] "GET /robots.txt HTTP/1.1" 200 1071 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +https://yandex.com/bots)"

100.43.91.14 - - [16/Feb/2015:04:07:09 +1100] "GET /robots.txt HTTP/1.1" 200 1071 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +https://yandex.com/bots)"

95.221.127.107 - - [16/Feb/2015:04:08:28 +1100] "GET / HTTP/1.1" 200 9908 "https://yandex.ru/yandsearch?text=asa.i-events.info&lr=213" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.74 Safari/537.36 MRCHROME"
1. InMotionFan says:
  
  February 16, 2015 at 8:12 pm
  
  Hello Manny,
  
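  Thank you for your question. Some bots will ignore your robots.txt rules. For bad bots that abuse your site you should look at how to block bad users by User-agent in .htaccess. Note that the two entries with yandex.ru in the referrer field come from ordinary browsers (MSIE and Chrome) arriving from a Yandex search results page; they are visitors, not the crawler itself.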
  
  If you have any further questions, feel free to post them below.
  
  Thank you,
  John-Paul
Tina Black says:

February 3, 2015 at 5:50 am

Really technical but useful post. My site is built using WordPress, so I initially tried the method from https://wpmatter.com/how-to-prevent-search-engine-index-page/. However, what that post shares is too basic for me, so for days I have been searching for something more advanced about how to prevent search engines from indexing some of my low-quality posts. What you share is really helpful for me. Thanks a lot.
Darshita Patel says:

January 31, 2015 at 1:16 pm

Hello,

domain is domain.com and subdomain is sub.domain.com

I want to deindex sub.domain.com

Any solutions?

Thanks.
1. InMotionFan says:
  
  February 2, 2015 at 9:12 pm
  
  Hello Darshita,
  
  To our knowledge, the only way to get a URL delisted from Google is to request it via Webmaster tools.
  
  Kindest Regards,
  Scott M
John says:

January 5, 2015 at 10:48 pm

Happy New Year Scott,

Consider the Marketing Advantage of user defined servers. The buy-in is a simple form <– aka which regions and what level of outside access do you wish to allow.

No one is doing this and its an Obvious advantage for clients!

Please email me if inMotion catches a clue in the future.

Best Regards!
John says:

January 5, 2015 at 10:21 pm

Thank You for this great article!

My current host/website is getting pounded by crawlers, spam bots, and spiders. I’m seeing hits from wankers in Asia, France, Egypt, and morons in the US.

It occurs to me, all of this nonsense can be rejected at the hosting server/router level before it hits a specific website user account on the host server.

Does inmotionhosting.com offer a hosting option which denies access to all but a white list for those of us who could care less about a global audience and simply seek a testbench?

Thanks for the help!
1. InMotionFan says:
  
  January 5, 2015 at 10:32 pm
  
  Hello John,
  
  We do not normally block things on a level prior to reaching an account, though we do block bots that have been identified as being malicious. Most of the bots are other search engines such as Yandex, Baidu, etc and many of our customers do not mind being in those engines as well. We also cannot tell what bots one account wants and another doesn’t so we leave it up to each account to decide who they want to visit or not.
  
  Kindest Regards,
  Scott M
Touqir Abbas says:

December 26, 2014 at 6:52 am

It’s a very helpful article for me. One of my accounts was suspended last night due to heavy traffic ( https://***********.in ). Now I have applied it to robots.txt, and it is working for me…

thanks
ExpertWebWorld says:

December 19, 2014 at 7:56 am

Yes, you are right, but I am a little surprised that none of my directory listing pages show in Google search, for example https://directory.expertwebworld.com/search.php?cn=Computer+leasing+-+rental. Other pages like About Us, Blog, and Portfolio are showing, but not the records submitted by visitors in the different categories. I hope you understand what I mean.
1. InMotionFan says:
  
  December 19, 2014 at 4:00 pm
  
  Hello ExpertWebWorld,
  
  Thanks for getting back with us. Google has its own policies and algorithms for indexing and does not disclose them, so that they cannot be manipulated. The best anyone can do is to work on SEO, and likely with Google Webmaster Tools, to help themselves in the ranking. SEO is a relationship between web pages and Google.
  
  Kindest Regards,
  Scott M
ExpertWebWorld says:

December 18, 2014 at 4:06 pm

None of my website’s pages show in Google listings when I search for directory.expertwebworld.com, and I don’t know why. I checked the robots rules and they do not disallow Google’s main robot. Even the meta tags are set to index,follow.
1. InMotionFan says:
  
  December 18, 2014 at 5:22 pm
  
  Hello ExpertWebWorld,
  
  I can see your website is indexed in Google. From here you can just focus on SEO for specific keywords for your pages. But it is definitely visible in the index, so Google has noticed it.
  
  Kindest Regards,
  Scott M
ExpertWebWorld says:

December 18, 2014 at 4:05 pm

my website not showing any page in google
D says:

December 17, 2014 at 7:41 pm

Is there an easy way to implement crawl delays serverwide for all domains?
1. InMotionFan says:
  
  December 17, 2014 at 8:30 pm
  
  Hello D,
  
  Thanks for the question. Each robots.txt file applies to each domain. If you want to apply a crawl delay for each domain, then simply use the instructions above, then copy the file to each domain where you need the crawl delay to apply. You can’t do it from one location.
  
  I hope this helps to answer your question, please let us know if you require any further assistance.
  
  Regards,
  Arnel C.
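Copying the file to each domain can be scripted. The sketch below is one possible approach under stated assumptions: the document root paths you pass in are your own (none are taken from this thread), the delay value is a placeholder, and existing robots.txt files are skipped so they can be reviewed by hand instead of being overwritten.

```python
import tempfile
from pathlib import Path


def write_robots(docroots, crawl_delay_seconds):
    """Write a Crawl-delay robots.txt into each document root that lacks one.

    Returns the list of files that were created.
    """
    content = f"User-agent: *\nCrawl-delay: {crawl_delay_seconds}\n"
    written = []
    for root in docroots:
        target = Path(root) / "robots.txt"
        if target.exists():
            # Leave existing files alone; merge the delay into them manually.
            continue
        target.write_text(content)
        written.append(target)
    return written


if __name__ == "__main__":
    # Demonstration against throwaway directories standing in for domain roots.
    with tempfile.TemporaryDirectory() as tmp:
        roots = [Path(tmp) / "site-a", Path(tmp) / "site-b"]
        for root in roots:
            root.mkdir()
        for path in write_robots(roots, 10):
            print("created", path)
```

Keep in mind that Crawl-delay is only a request, and some crawlers do not honor it.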
john carl says:

November 27, 2014 at 7:17 pm

Thanks for your post. I also applied it to my site and it works perfectly.
1. InMotionFan says:
  
  November 29, 2014 at 5:10 am
  
  Hello Lybear,
  
  Sorry that you’re having issues with the redirect. You may want to use the official Apache documentation on mod_rewrite to determine how best to write your rule. We can’t write the rule for you, unfortunately. It does appear that you have modified the original WordPress .htaccess rule. You may want to remove the rule. Check out the rewrite rules in the articles listed for .htaccess files.
  
  Apologies that we cannot give you a direct answer on the issue. Hopefully, this will help direct you to a more appropriate answer.
  
  Regards,
  Arnel C.
lybear says:

November 27, 2014 at 7:12 am

Hello Arn !

I followed these steps and got an error.

Here is my error:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^2014/11/(.*)$ $1 [L,QSA]
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{QUERY_STRING} !lp-variation-id
# This condition was met
RewriteRule ^go/([^/]*)? /wp-content/plugins/landing-pages/modules/module.redirect-ab-testing.php?permalink_name=$1 [QSA,L]
RewriteRule ^landing-page=([^/]*)? /wp-content/plugins/landing-pages/modules/module.redirect-ab-testing.php?permalink_name=$1 [QSA,L]
RewriteCond %{REQUEST_FILENAME} !-f
# This variable is not supported: %{REQUEST_FILENAME}
RewriteCond %{REQUEST_FILENAME} !-d
# This variable is not supported: %{REQUEST_FILENAME}
RewriteRule . /index.php [L]
# This rule was not met because one of the conditions was not met
</IfModule>
# END WordPress

This rule was met, the new url is https://dbmakemoney.com/other-advertising-networks-besides-google-adsense/
lybear says:

November 27, 2014 at 5:05 am

Hello Arn,

Thanks for your reply .

Anyway, in the ROBOTS.TXT I have already done it this way:

Disallow: /2014/11/
but the browser still shows site.com/2014/11/mysties
I only want it to show site.com/mysties

Best regards,
1. InMotionFan says:
  
  November 27, 2014 at 5:40 am
  
  Hello Lybear,
  
  The ROBOTS.TXT file will NOT block you from accessing that folder. It only prevents search bots from going into the folder. In order to prevent your browser from using /2014/11, then you will need to create a rewrite rule in your .htaccess file. Try reviewing this forum for a rewrite rule that may help in your case.
  
  Kindest regards,
  Arnel C.
lybear says:

November 27, 2014 at 3:14 am

Hello Scott ,

I want all my sites under 2014/11/mysites to show only mysites, without the 2014/11 folder.
Actually, I don’t know what the difference is between blocking access to the 2014/11 folder and a redirect of some sort.
If possible, could you show me both ways?

Best Regards,
1. InMotionFan says:
  
  November 27, 2014 at 4:04 am
  
  Hello Lybear,
  
  Thanks for the question. If you are trying to prevent search engines from accessing the directory you’re indicating, then you can use the ROBOTS.TXT tutorial above for this purpose. A redirect is used to change the path of a URL from one location to another. If you have other things that rely on that URL and the files at that location, then you may not want to do the redirect. If you want more information on creating a redirect, try reviewing Setting a 301 Redirect in your HTACCESS.
  
  I hope this helps to provide the answer that you seek. If you require further assistance, please let us know.
  
  Regards,
  Arnel C.
lybear says:

November 26, 2014 at 3:49 pm

Hi !

how can I block folder /2014/11/ ?

Here is my current site located
https://dbmakemoney.com/2014/11/other-advertising-networks-besides-google-adsense/

I want to

https://dbmakemoney.com/other-advertising-networks-besides-google-adsense/

Thanks in advance!
1. InMotionFan says:
  
  November 26, 2014 at 4:19 pm
  
  Hello Lybear,
  
  What exactly are you asking? Do you want to block access to 2014/11 folder? Or are you looking to set up a redirect of some sort?
  
  Kindest Regards,
  Scott M
Neil says:

November 24, 2014 at 7:43 pm

Thanks John-Paul. Until a few hours ago I did not have any robots.txt rules. A few hours ago I created the robots.txt file for each site with more restrictive disallow rules instructing bots to not crawl the wp-includes folder, the theme and plugin folders and wp-admin. I’m hoping this reduces the scope and impact of the bots on the server each evening. If not, then perhaps a crawl delay would at least spread the impact out and not take down the server…
1. InMotionFan says:
  
  November 24, 2014 at 7:47 pm
  
  Hello Neil,
  
  While using robots.txt and setting delays may help, not every search engine honors every directive in the file. Google, for one, ignores Crawl-delay; you set its crawl rate from within Google’s webmaster tools instead. For other search engines, setting the delays and requests not to crawl in robots.txt is done with the expectation and hope that they will listen.
  
  Kindest Regards,
  Scott M
Neil says:

November 24, 2014 at 7:01 pm

Hi there,

I have about 40 WordPress websites on one hosting account and every evening around the same time, my hosting gets sluggish and goes down for about 20 to 30 minutes. I have looked at the server logs and it looks like that’s when sites are getting crawled by Google. Previously, I haven’t had any specific robots.txt files on each site (shame on me, yes). I have added robots.txt files for all the sites with fairly restrictive disallow settings that really only give access to the wp-content folder (minus the theme and plugins). Will reducing the access to the bots significantly reduce the impact on my server when the sites are being crawled or do I also need to set a crawl delay?

Also, only a couple of the sites are blogs and those are the only ones with a significant amount of pages. The rest are small, static sites. Would you recommend just setting a crawl delay on the large blogs that have 1,000+ pages and posts?

Thanks!
1. InMotionFan says:
  
  November 24, 2014 at 7:31 pm
  
  Hello Neil,
  
  Thank you for your question. While setting a crawl delay may help, we would need to see the nature of the requests to provide a detailed answer.
  
  This is because you may be getting crawled by bots that are not following your robots.txt rules. In this case a robots.txt file will not help. Instead, identify and block the specific bots from your site.
  
  Thank you,
  John-Paul
Greg says:

November 17, 2014 at 6:28 pm

Thanks Jean-Paul,

Just a couple of further questions:

I setup a subdomain to build the new site which I want to block from the search engines.

So what is a bit confusing is – at what level do you set the password protect?

Should it be at the /public_html/abcdirectory/ which is the document root?

Also, how do you test to see that the password is actually working? I set the password as above and then was immediately able to log in to the WP dashboard without having to enter a username and password….

Am I missing something?

Appreciate your help..

Regards

Greg
1. InMotionFan says:
  
  November 17, 2014 at 8:15 pm
  
  Hello Greg,
  
  If you have the WordPress site in a subfolder, say example.com/test, then you would set the password at the folder level for ‘test’. This way no one would see the site while you were developing. You may be interested in our articles on password protecting a folder within the cPanel. You can also ask your questions about passwords on that article since it is relevant.
  
  As for checking to see if it is working, use a browser in incognito mode so it appears to be a new visitor. You should see it ask for a username and password then. Once you have logged in with a browser in normal mode, it remembers you for a time.
  
  Kindest Regards,
  Scott M
Greg says:

November 17, 2014 at 12:48 pm

Hi there,

We are faced with a situation where we have to rebuild and replace a client’s existing website with a new site. Going from static html to WordPress…

What is the best way to completely block the new site while in development?

Should we use a password protect method?

Regards

greg
1. InMotionFan says:
  
  November 17, 2014 at 5:43 pm
  
  Hello Greg,
  
  Thank you for your question. You can easily block access to your new site by using the Password Protect tool in cPanel.
  
  That tool adds the .htaccess rules for you.
  
  If you have any further questions, feel free to post them below.
  
  Thank you,
  John-Paul
GaryS says:

October 17, 2014 at 4:30 pm

How do I stop robots coming from a particular IP range with the robots.txt file?
1. InMotionFan says:
  
  October 17, 2014 at 4:35 pm
  
  Nearly all bots that are not reputable search engines will completely ignore the robots.txt file and continue to crawl. Your best solution would be to block the IP range using .htaccess.
Blaine P Johnston says:

October 7, 2014 at 9:44 pm

Thanks for the quick reply. I’ll give that a shot.
Fred says:

October 7, 2014 at 9:21 pm
Google is including my shopping cart pages in its searches. They are not in a folder that I can block like
```
User-agent: *
Disallow: /cgi-bin/
```
Is there a way to block files that all begin with:

/addtocart.sc?productld=13&quantity=1

/addtocart.sc?productld=14&quantity=1

/addtocart.sc?productld=23&quantity=1

etc.?

Thank you
1. InMotionFan says:
  
  October 7, 2014 at 9:40 pm
  
  To do so, you could do something like this:
  
  User-agent: *
  Disallow: /addtocart.sc
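The rule works because Disallow values are matched as path prefixes, so the query string that follows does not need to be listed. Here is a minimal sketch that checks this with Python's standard-library `urllib.robotparser`, using the URLs exactly as typed in the question; `/index.html` is an added placeholder to show that an unrelated page stays crawlable.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /addtocart.sc
"""

# URLs as typed in the question above, plus one unrelated placeholder page.
CART_URLS = [
    "/addtocart.sc?productld=13&quantity=1",
    "/addtocart.sc?productld=14&quantity=1",
    "/addtocart.sc?productld=23&quantity=1",
]
OTHER_URL = "/index.html"


def is_blocked(url: str) -> bool:
    """Return True if ROBOTS_TXT asks a generic crawler not to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch("*", url)


if __name__ == "__main__":
    for url in CART_URLS + [OTHER_URL]:
        print(url, "blocked" if is_blocked(url) else "allowed")
```

Pages that are already in the index may take some time to drop out after the rule is added.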
Ankit says:

October 3, 2014 at 7:57 am

Thanks Scott M 🙂

Have a great day!!
Ankit says:

October 1, 2014 at 10:31 am

Guys, I am having more problems related to SEO. My website is built in ASP.NET with the 3.5 framework, and I want a solution to the www and home.aspx 301 redirection problem: what exactly should the code be for my website (www.rasavgems.com), and in which file should I use it? Please explain it in detail with steps.

Thanks

Ankit
1. InMotionFan says:
  
  October 1, 2014 at 3:56 pm
  
  Hello Ankit,
  
  It is fine if you do not want a search engine to crawl your site. If it does not, however, it means those pages may not get updated in the search engine or even show at all. If you wish, allow your favorite search engines to crawl your site at a reasonable delay if you want to show up in them. You can certainly set the file to block the others.
  
  Kindest Regards,
  Scott M
2. InMotionFan says:
  
  October 1, 2014 at 3:54 pm
  
  Hello Ankit,
  
  I am not sure exactly what it is you are asking. Please try to be a bit more detailed and give us some steps if you can. Also, as this does not seem to be related to the robots.txt file, please reply with a new question.
  
  Kindest Regards,
  Scott M

Hey Johnpaulb

i used following kind of the methods :

# robots.txt generated for google
User-agent: Googlebot
Disallow: /
User-agent: *
Disallow: / 


# robots.txt generated for yahoo
User-agent: Slurp
Disallow: /
User-agent: *
Disallow: /


# robots.txt generated for Msn
User-agent: MSNBot
Disallow: /
User-agent: *
Disallow: /


# robots.txt generated for ask
User-agent: Teoma
Disallow: /
User-agent: *
Disallow: /


# robots.txt generated for bingbot
User-agent: bingbot
Disallow: /
User-agent: *
Disallow: /


Please tell me whether it is okay for my site to stop the search engines from crawling it. I uploaded a robots.txt file using all of the above methods together in one robots.txt file.

Ankit says:

September 30, 2014 at 12:34 pm

Hello guys, I want to stop Yahoo, Google, and Bing from crawling my site. How can that be done?
1. InMotionFan says:
  
  September 30, 2014 at 4:09 pm
  
  Hello Ankit,
  
  This article above is about just that. You can Disallow all search engines from crawling website, or just block the specific user-agents for yahoo, google, and bing (the user agents are listed above).
  
  Are you having trouble with a specific step?
  
  Thank you,
  John-Paul
Jay says:

August 29, 2014 at 7:14 pm

Thanks Scott for a great tip!
Jay says:

August 29, 2014 at 4:49 pm

Hello!

I am currently developing a larger website and while it is still in development I’d prefer that search engines do not crawl through it, that is until I am finished. This way I can post the site so that multiple developers can code and test without the world knowing the site exists on Google and such. It seems to me that the code above would do that. Am I correct in my assessment?

Thanks,

Jay
1. InMotionFan says:
  
  August 29, 2014 at 6:30 pm
  
  Hello Jay,
  
  Unfortunately, robots.txt is only a request: it asks crawlers to stay away but does not hide the site, and a blocked URL can still appear in results if other pages link to it. The best way to prevent anyone else from seeing the site, or having the search engines index it until you are ready, is to password protect the site via the cPanel.
  
  Kindest Regards,
  Scott M
n/a says:

August 11, 2014 at 9:53 pm

buena informacion gracias

great inf thanks
Andy Turner says:

July 26, 2014 at 10:06 am

With regards to the crawl delay, so do i understand this correctly, if you introduce a longer delay for a bot to crawl your site, it doesn’t reduce the cpu load, merely spreads it out over a longer period ?
1. InMotionFan says:
  
  July 26, 2014 at 5:48 pm
  
  Hello Andy,
  
  Yes, you understand the crawl delay for robots correctly: it just causes the robot’s requests to be spread out over a longer time period. Much like a highway dealing with traffic jams, high usage during short intervals can cause backups and delays. If the same usage is spread out over the course of a day, it’s not as noticeable on the highway or the server, and that’s typically what you’re trying to achieve with a crawl delay.
  
  Please let us know if you had any further questions at all.
  
  – Jacob
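The trade-off can be put in numbers: the total work stays the same, but the time it is spread over grows with the delay. The sketch below reads a Crawl-delay value with Python's standard-library `urllib.robotparser` and estimates how long one full crawl would take; the page count and the 30-second delay are made-up figures for illustration, and it assumes a bot that fetches one page per delay interval.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every bot to wait 30 seconds between requests.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 30
"""


def crawl_duration_hours(page_count: int, robots_txt: str, user_agent: str = "*") -> float:
    """Estimate hours for one bot to fetch page_count pages at the requested delay."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay_seconds = parser.crawl_delay(user_agent) or 0  # None means no delay was set
    return page_count * delay_seconds / 3600


if __name__ == "__main__":
    # 1,200 pages is an assumed site size, not a figure from this thread.
    hours = crawl_duration_hours(1200, ROBOTS_TXT)
    print(f"A full crawl of 1200 pages would be spread over about {hours:.1f} hours")
```

The same number of requests still arrives; they just stop arriving all at once.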
Mark S says:

July 24, 2014 at 5:35 pm

I would like to disallow semalt and semalt-semalt crawlers from wreaking havoc on my bounce rate. If I use the code to disallow one particular search engine, do I need to write this code twice? Once for each individual crawler? Or maybe a comma between them? Thank you
1. InMotionFan says:
  
  July 24, 2014 at 5:54 pm
  Hello Mark,
  
  Thank you for your question. It seems to be a common problem, judging by the amount of search results.
  
  I found the following solution via online search, where it is blocked by referrer:
```
# block visitors referred from semalt.com
RewriteEngine on
RewriteCond %{HTTP_REFERER} semalt\.com [NC]
RewriteRule .* - [F]
```
  If you have any further questions, feel free to post them below.
  Thank you,
  
  -John-Paul
Kingsley O. says:

July 18, 2014 at 1:28 pm

Thanks for a detailed explanation on this all important topic. God bless you.
Kim Huff says:

June 16, 2014 at 7:13 pm

I have looked for info about robot.txt on the web numerous times and this is the only one that made sense. thank you so much!!!
sem says:

June 2, 2014 at 12:29 pm

I have two websites pointed to same folder. How can I disallow one website.
1. InMotionFan says:
  
  June 2, 2014 at 3:32 pm
  
  As the robots.txt file only determines what files are able to be accessed, unfortunately you would not be able to block a specific domain if it uses the same files as another site that you do want to be accessed.
Abhi says:

April 18, 2014 at 1:46 am

That idea of blocking search engines worked perfect on my site.

Thanks for the precise example you have in lower half.

Abhi

Comments are closed.


166 thoughts on “How to Stop Search Engines from Crawling your Website”

Need More Help?