
AI SEO – Robots.txt, Markdown, and How AI Providers are Crawling Your Sites

Explore how InMotion Hosting’s new AI SEO Helper helps websites stay visible in evolving AI-driven search patterns. Learn how to prepare your site for LLM crawlers and future-proof your SEO strategy.

Written by: Todd Robinson

Please note: this article documents a vision of a product and a standard we see emerging in the market. It is intended to help both customers and ourselves understand how to respond to and leverage the power of new AI systems and evolving search patterns. It’s a work in progress! With that, here is our announcement.

We are launching a new service to help our customers and other professional website managers navigate the changes brought on by AI providers increasingly handling search queries. We already use this process ourselves, and we want to share it to help ensure your site is AI-ready. For now, we’re calling it the InMotion AI SEO Helper.

In this post, I will refer both to our own website and to a set of anonymized websites. As a hosting company, we can see aggregate patterns across many sites, and those patterns closely match what is happening on inmotionhosting.com.

You will be able to use a partial version of the AI SEO Helper right from our website at inmotionhosting.com/services/ai-seo-helper to get an idea of how it works. If you need more than that, you can sign up for free to use the full AI SEO Helper. Please note that in times of resource contention, our customers have first priority in the system.

Under the current plan, Version 2 of the tool will check your website and do the following (Version 1 will cover a subset):

  • Ensure the site has a robots.txt file and identify what is missing
  • Ensure the site has a sitemap.xml and identify what is missing
  • Check for the presence of .md files
  • Check whether the site includes an llms.txt file (see the caveat in the llms.txt section below)
  • Verify that the site is not unintentionally blocking LLM crawlers
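The last check can be approximated offline with Python's standard `urllib.robotparser`. The sketch below is an illustration, not the AI SEO Helper's actual code: it parses a robots.txt body and reports which AI crawler tokens would be refused a given URL. The agent list is our assumption, drawn from names providers have published; verify current names against each provider's documentation. Note that Python's parser applies the first matching rule rather than the longest-match rule of RFC 9309, so treat edge cases with care.

```python
from urllib.robotparser import RobotFileParser

# Illustrative AI crawler tokens (an assumption); check each provider's
# documentation for the current names before relying on this list.
AI_AGENTS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "CCBot")


def blocked_ai_agents(robots_txt, page_url, agents=AI_AGENTS):
    """Return the agents that this robots.txt would refuse for page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text we already have; no network
    return [agent for agent in agents if not parser.can_fetch(agent, page_url)]


robots = """User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_ai_agents(robots, "https://example.com/about-us"))  # ['GPTBot']
```

A site that means to welcome AI crawlers should get an empty list back for its important pages.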

As mentioned above, the tool identifies what may be missing. Because this is an evolving standard, it is not yet fully known what needs to be done.

Our view of “what should be done” to help AI tools’ crawlers is based on our ongoing experience. We’ll link to supporting resources as they’re published, so pardon the lack of links for now.

 

Crawling, Training, Searching – Plus New Sales

Let’s start with this: sales are already coming in from these new search patterns. People go to their favorite AI chatbot, do research with the intent to purchase, and then come to our sites to complete the purchase. I have seen this firsthand. The pattern is not yet well understood, and it is also not clear how much of that purchase flow will shift from Google searches to ChatGPT and similar tools.

The information below outlines what we’re seeing. I am not addressing whether websites, papers, books, etc. should be used to train LLMs without the LLMs attributing what they were trained on. That is a legitimate concern, and I will publish my views on it another time. For this discussion, I am talking about websites that have already accepted that Google and its peers will crawl and ingest their content in order to send visitors to their site for monetary gain.

Many “AI companies” are already crawling sites. Several major players, including OpenAI and Anthropic, have published guidance on how they respect robots.txt and which User-Agent strings their crawlers present to your web server. We’ve observed this activity in server logs.
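To see this activity in your own logs, a quick pass over an Apache or Nginx access log is enough. This sketch assumes the common "combined" log format and uses an illustrative, incomplete list of User-Agent tokens; the providers document their own strings, and those are the authority.

```python
import re
from collections import Counter

# Request line, status, size, referrer, and User-Agent of a "combined" log line.
COMBINED_RE = re.compile(r'"\S+ (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"')

# Illustrative tokens only (an assumption); not an exhaustive list.
AI_AGENTS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-User", "PerplexityBot", "CCBot")


def count_ai_hits(lines):
    """Count log lines whose User-Agent contains one of AI_AGENTS."""
    counts = Counter()
    for line in lines:
        match = COMBINED_RE.search(line)
        if not match:
            continue  # skip lines in other formats
        ua = match["ua"]
        for agent in AI_AGENTS:
            if agent in ua:
                counts[agent] += 1
                break  # attribute each request to one agent only
    return counts
```

Passing an open log file, for example `count_ai_hits(open("access.log"))`, returns a `Counter`, and `.most_common()` lists the busiest AI crawlers first.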

What is not clear is whether crawls for inclusion in training data sets will follow a different pattern than crawls driven by “right now” information needs. We define “right now” information needs as:

  • Parallel Page Crawls – when a user of Claude or ChatGPT asks a feature such as Deep Research to perform searches, the service visits many pages in parallel for the LLM to evaluate.
  • Recent Data Needed – when a user is seeking information that is not likely to be current in the LLM’s working data set, the LLM checks websites on the fly to collect recent information.
  • Specific Request – when a user specifically asks for certain content, like a webpage or video, to be ingested by the LLM and summarized.
  • Other reasons

“Right now” crawls happen with a level of urgency that shows up as rapid, parallel page requests to your website. We may wish these services would meter their requests more, but realistically they are trying to meet a user-experience goal, and speeding up data collection is an easy way to do that.
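One rough way to spot these bursts is to compute, for each crawler, the largest number of its requests that land inside any one-second window. The sketch assumes you have already extracted `(timestamp, agent)` pairs from your logs; the function name and the one-second default are our own choices, not a standard, and what counts as "too many" is your call.

```python
from collections import defaultdict, deque


def peak_requests(events, window=1.0):
    """events: (unix_timestamp, agent) pairs in any order.

    Returns {agent: highest number of that agent's requests falling inside
    any `window`-second span}.
    """
    times_by_agent = defaultdict(list)
    for ts, agent in events:
        times_by_agent[agent].append(ts)

    peaks = {}
    for agent, times in times_by_agent.items():
        times.sort()
        recent = deque()  # timestamps within (t - window, t]
        best = 0
        for t in times:
            recent.append(t)
            while recent[0] <= t - window:
                recent.popleft()  # older than the window; drop it
            best = max(best, len(recent))
        peaks[agent] = best
    return peaks
```

A "right now" crawl tends to show a high peak even when the agent's daily total is modest, which is the pattern described above.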

Either way, when a page is crawled, the main purpose is to ingest it and convert it to a machine-ready format. In the simplest case, it is converted to Markdown. Markdown is a text-based representation of the page’s content, including text representations of tables and images. Several popular systems do this conversion, but each crawling tool does it a bit differently. The open-source ones are available for us to evaluate. The ones running behind the scenes at AI services are less visible, but we expect them to use one of the popular libraries.
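To make the conversion concrete, here is a deliberately tiny HTML-to-Markdown converter built on Python's standard `html.parser`. It is not how any of the crawlers mentioned in this post work internally. It handles only headings, paragraphs, lists (ordered lists become bullets), links, and images, and it drops script and style content, which is the core of what those tools produce.

```python
import re
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Tiny HTML-to-Markdown converter: headings, paragraphs, lists, links,
    and images only. Real converters handle far more (tables, code, nesting)."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.href = None
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self.href = attrs.get("href") or ""
            self.parts.append("[")
        elif tag == "img":
            alt, src = attrs.get("alt") or "", attrs.get("src") or ""
            self.parts.append(f"![{alt}]({src})")  # alt text is what the LLM "sees"

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth -= 1
        elif tag in ("ul", "ol"):
            self.parts.append("\n")
        elif tag == "a" and self.href is not None:
            self.parts.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(re.sub(r"\s+", " ", data))  # collapse whitespace runs


def html_to_markdown(html):
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    # Tidy each line, then allow at most one blank line between blocks.
    lines = [re.sub(r" +", " ", line).strip() for line in "".join(parser.parts).splitlines()]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```

Feeding it a small "about-us" page shows how much of the visual richness disappears and how little text remains for the model to read.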

In addition to single-page crawls, we see crawlers designed to read the sitemap.xml file. From it, a crawler can fetch each URL and produce a matching Markdown file, typically one .md file per crawled page.
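Reading a sitemap takes only the standard library. This sketch assumes a plain `<urlset>` sitemap in the sitemaps.org namespace (a sitemap index file, which points to other sitemaps, would need one more level of fetching) and returns each page URL with its optional `lastmod` value. Python's documentation warns that `xml.etree` is not hardened against maliciously constructed XML, so untrusted input deserves a safer parser.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_entries(xml_text):
    """Return (loc, lastmod) pairs from a <urlset> sitemap; lastmod may be None."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = (url.findtext("sm:loc", default="", namespaces=NS) or "").strip()
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if loc:  # skip malformed entries without a <loc>
            entries.append((loc, lastmod.strip() if lastmod else None))
    return entries
```

Each returned URL is a candidate for one .md file, and `lastmod` hints at which pages have changed since the last crawl.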

For example, take a page called “about-us”. It could be a static page, a page generated by a web app, or a page created server side by something like WordPress. Either way, it is ultimately rendered in the browser. The page is rich in graphics, colors, layout, images, etc. for a person to read and absorb. For the most common use cases, LLMs need this rich content translated to Markdown so they can absorb it easily.

Our system will publish some of these as public-facing URLs, likely with the following file structure:

  • /inmotion-ai-helper/openai/directory/about-us.md
  • /inmotion-ai-helper/claude/directory/about-us.md
  • /inmotion-ai-helper/gemini/directory/about-us.md
  • /inmotion-ai-helper/opencrawl/directory/about-us.md
  • /inmotion-ai-helper/crawl4ai/directory/about-us.md
  • /inmotion-ai-helper/docling/directory/about-us.md

As you can see, several crawlers are popular. We will cover a few of them in future technical evaluation videos and posts as our evaluations continue. The main point, though, is that we plan to use each individual crawler to produce a .md file specific to it. That crawler can then simply read its own .md file. This will make crawling much, much faster and will spare every company using that crawler from having to convert the same page to Markdown.
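Here is a sketch of how a page URL could map onto the layout listed above. It rests on guesses about the planned structure: that `directory` is a fixed path segment, that nested pages keep their own path beneath it, and that the site root maps to `index`. The function `md_path` is hypothetical.

```python
from urllib.parse import urlparse

# Crawler/converter names taken from the planned layout.
CRAWLERS = ("openai", "claude", "gemini", "opencrawl", "crawl4ai", "docling")


def md_path(crawler, page_url, root="/inmotion-ai-helper"):
    """Hypothetical mapping from a page URL to its per-crawler .md path.

    Assumes "directory" is a fixed segment and nested pages keep their path.
    """
    if crawler not in CRAWLERS:
        raise ValueError(f"unknown crawler: {crawler!r}")
    page = urlparse(page_url).path.strip("/") or "index"  # site root -> index
    return f"{root}/{crawler}/directory/{page}.md"
```

Because the path is derived only from the crawler name and the page URL, a crawler that knows the convention can find its own copy without any lookup.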

On our side, we will watch for major crawler updates and can occasionally trigger updates to the .md files. We are still deciding how often that should happen, and whether the crawler itself could trigger a fresh update of the .md files with a simple API call to our service.
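The refresh decision could be reduced to a few rules: regenerate when the crawler version changes, when the page has changed since the .md file was produced, or when a refresh was explicitly requested. This is a hypothetical sketch; `MdRecord`, `needs_refresh`, and the `refresh_requested` flag stand in for whatever data model and API the service ends up using.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MdRecord:
    """What we remember about one generated .md file (hypothetical schema)."""
    generated_on: date
    crawler_version: str


def needs_refresh(record, page_lastmod, current_version, refresh_requested=False):
    """Decide whether a page's .md file should be regenerated."""
    if refresh_requested:  # e.g. a crawler asked for a fresh copy via an API call
        return True
    if record.crawler_version != current_version:  # crawler had a major update
        return True
    # Page changed after the .md was produced (lastmod may be unknown).
    return page_lastmod is not None and page_lastmod > record.generated_on
```

A sitemap `lastmod` string such as `"2025-01-15"` can be converted with `date.fromisoformat` before it is passed in.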

Of note, we will also be working with the crawler providers themselves to see what might help them out.

 

LLMs.txt vs Robots.txt

A while back, the concept emerged of putting LLM-specific guidance into a new llms.txt file, similar to the robots.txt file. The debate now is whether a separate file is the right choice. Crawlers are robots, and the well-written ones already respect robots.txt. The idea of an llms.txt made sense to me the first time I read about it, but after thinking about the issue, it feels like the problem is either already solved by robots.txt or should be solved with some minor additions to robots.txt.
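For reference, the llms.txt proposal describes a Markdown file with an H1 title, a blockquote summary, and H2 sections containing lists of links in the form `- [name](url): notes`. A minimal parser for that shape might look like this. It is a sketch, not a validator: it keeps only the first summary line and ignores free-form prose between sections.

```python
import re

# One link line in an llms.txt file list: "- [name](url): optional notes".
LINK_RE = re.compile(r"^\s*-\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<notes>.*))?$")


def parse_llms_txt(text):
    """Extract title, summary, and per-section link lists from llms.txt text."""
    doc = {"title": None, "summary": None, "sections": {}}
    section = None
    for line in text.splitlines():
        if line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()  # H1: site or project name
        elif line.startswith("> ") and doc["summary"] is None:
            doc["summary"] = line[2:].strip()  # blockquote: short summary
        elif line.startswith("## "):
            section = line[3:].strip()  # H2: start of a file list
            doc["sections"][section] = []
        elif section is not None:
            match = LINK_RE.match(line)
            if match:
                doc["sections"][section].append((match["name"], match["url"], match["notes"]))
    return doc
```

The links in such a file often point at .md versions of pages, which is where llms.txt and the per-page Markdown idea meet.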

Here are some examples from our llms.txt on the inmotionhosting.com site. I will stay out of the argument for now and let the usage pattern guide us. Currently, requests for that file are negligible compared to site traffic and robots.txt requests. So for now, let’s call it “not a thing,” but we will keep watching it. The idea is right, though, so hopefully crawlers start respecting one file or the other.

Example of InMotion Hosting's LLMs.txt file
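Measuring this yourself comes down to counting requests for the discovery files against total traffic. A sketch, assuming combined-format access logs; the watched paths are the three files discussed in this post.

```python
import re
from collections import Counter

# Captures the path from the quoted request line, e.g. "GET /llms.txt HTTP/1.1".
REQUEST_RE = re.compile(r'"[A-Z]+ (?P<path>[^\s?"]+)[^"]*"')

# Discovery files discussed in this post.
WATCHED = ("/robots.txt", "/llms.txt", "/sitemap.xml")


def count_discovery_files(lines):
    """Return (per-file request counts, total parsed requests)."""
    counts = Counter({path: 0 for path in WATCHED})
    total = 0
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # not a recognizable request line
        total += 1
        if match["path"] in WATCHED:  # query strings are already excluded
            counts[match["path"]] += 1
    return counts, total
```

Comparing `counts["/llms.txt"]` with `counts["/robots.txt"]` and `total` over a few weeks is a simple way to see whether the file is becoming "a thing."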

 

Intentional or Accidental Blocking of Crawlers

It is important to know whether your website is crawlable. If you want to block crawlers, this isn’t the post for that. You can check out this page for possible methods, but in the end it is not really possible to cut off access to public content.

For this post, we are focusing on knowing whether your pages are crawlable, because you want your content in the major LLMs both during training and during “right now” lookups. My quick spot check is to go into my top four AI chatbots and ask each one to access a page on our site. If one can’t, we have a problem.
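The chatbot spot check can be supplemented with a direct request that presents a crawler's User-Agent string. This only tests rules keyed on the User-Agent header; blocking based on IP ranges or bot-detection services (for example at a CDN) will not show up, so a 200 here does not prove the real crawler gets through. The `opener` parameter is our own addition so the function can be exercised without network access, and the exact User-Agent strings to test should come from each provider's documentation.

```python
import urllib.error
import urllib.request


def fetch_as(url, user_agent, opener=urllib.request.urlopen, timeout=10):
    """Request url while presenting user_agent; return the HTTP status code.

    `opener` defaults to urllib's urlopen and can be swapped out for testing.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 403, 429, 503, etc. are answers too, not crashes
```

A 403 or 503 for a crawler's User-Agent, while a normal browser string gets a 200, points to a User-Agent-based block worth investigating.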

Cloudflare is also trying a few things that I am concerned about. I’ll post more about this and ways to test crawlability.

 

Next Steps and Open Questions

This space is rapidly evolving, and we’re taking an interactive approach. Here are a few questions we’re still working through:

  • Which Markdown outputs should we support?
  • How much of this is already done by the big AI bots? They are likely already caching Markdown for popular sites. The tools are definitely doing on-demand site crawls right now, though, so for now it matters.
  • Should this content simply be hosted by us, for example at ai-helper-cdn.inmotionhosting.com/sitename/openai/directory/filename.md?
  • llms.txt – we are tracking this and will include it for now. Later we can either double down or deprecate it if crawlers stick with robots.txt.
  • When a customer publishes new pages to their site, how often should we audit that and update the .md and .xml files?
  • Should we integrate with a Git-based workflow to make this easier?
  • How can we best support WordPress users? Should this integrate with our Total Cache plugin?

We have a lot to work through, but we wanted to share our direction and raise awareness: sales are already coming in from these tools. They are important already and there will be increased importance for years to come.

Copyright © 2002-2026 InMotion Hosting, Inc. All Rights Reserved. InMotion Hosting® is a registered trademark of InMotion Hosting, Inc.
