
The 4 Technical SEO Files Every Developer Should Have: robots.txt, sitemap.xml, llms.txt, and security.txt

10 min read

When we build websites, we spend enormous effort on the human-readable layer. Typography, layout, color systems, interactive components. All of it designed for the person sitting in front of a screen. But there’s another audience for your website that most developers underserve: machines.

Search engine crawlers, AI agents, security researchers, and automated tools all visit your site constantly. They don’t care about your font choices. They care about small, plaintext files sitting at well-known paths in your root directory. These files are, effectively, the user manual for the machine-readable web. They tell bots what to crawl, where to find content, how to contact you about vulnerabilities, and (increasingly) how to understand your site’s purpose.

Think of these files as your site’s public API for non-human visitors. Just like a well-documented API makes integration easy, these files make your site legible and navigable for the bots that determine your search rankings, security posture, and (soon) your presence in AI-generated answers.

We’re going to walk through the four files every developer should have in place: robots.txt, sitemap.xml, security.txt, and a newer, more speculative one called llms.txt. Three of these are established standards with broad adoption. The fourth is a forward-looking idea that’s worth understanding even if it hasn’t reached that status yet.

Let’s get into it.

File 1: robots.txt — The Bouncer at the Door

The robots.txt file lives at yoursite.com/robots.txt and has one job: tell web crawlers what they’re allowed (or not allowed) to access. It follows the Robots Exclusion Protocol, which has been a foundational part of how the web works since the mid-1990s and was formally standardized as RFC 9309 in 2022.

Every major search engine respects it. If you don’t have one, crawlers will assume everything is fair game. If you have a misconfigured one, you can accidentally block search engines from indexing your most important pages. We’ve seen entire sites disappear from search results because of a single misplaced Disallow directive.

What a basic robots.txt looks like

User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /staging/
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

A few key things to know:

  • User-agent: * applies rules to all crawlers. You can also target specific bots by name (like Googlebot or Bingbot); see the example after this list.
  • Disallow blocks paths. Allow explicitly permits them (useful for overriding broader disallow rules).
  • Always include your sitemap URL at the bottom. It’s one of the simplest ways to connect your two most important technical SEO files.
  • robots.txt is a suggestion, not enforcement. Well-behaved bots follow it. Malicious scrapers won’t. Don’t treat it as a security mechanism.
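
For example, a crawler-specific configuration might look like this (GPTBot is OpenAI’s crawler; the blocked paths are purely illustrative):

User-agent: GPTBot
Disallow: /

User-agent: Googlebot
Disallow: /experiments/

User-agent: *
Disallow: /admin/

One subtlety: a crawler obeys only the most specific User-agent group that matches it, so Googlebot here follows its own group and ignores the User-agent: * rules entirely. If a bot-specific group should also keep the general rules, repeat them inside that group.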

Common mistakes

The biggest one: using Disallow: / without realizing it blocks your entire site. This happens more often than you’d think, especially when staging sites go live with their test robots.txt still in place.

Another frequent problem is blocking CSS and JavaScript files. Modern search engines render pages to evaluate content, and if they can’t load your stylesheets or scripts, they may not index your pages correctly.
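
The problematic pattern usually looks something like this (directory names are illustrative):

User-agent: *
Disallow: /assets/
Disallow: /static/

If those directories hold your stylesheets and scripts, a rendering crawler sees an unstyled, possibly broken page, and your rankings can suffer for it.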

Our recommendation: review your robots.txt every time you deploy a major site change. Treat it like a config file, because that’s exactly what it is.
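
That review is easy to automate. Here’s a minimal sketch using Python’s standard urllib.robotparser module; the URLs are placeholders, so adapt the lists to your own site:

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt (replace with your domain).
parser = RobotFileParser("https://yoursite.com/robots.txt")
parser.read()

# Pages that must stay crawlable, and paths that must stay blocked.
must_allow = ["https://yoursite.com/", "https://yoursite.com/blog/"]
must_block = ["https://yoursite.com/admin/settings"]

for url in must_allow:
    assert parser.can_fetch("*", url), f"robots.txt blocks {url}"
for url in must_block:
    assert not parser.can_fetch("*", url), f"robots.txt allows {url}"

print("robots.txt checks passed")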

File 2: sitemap.xml — The Map for Crawlers

If robots.txt tells bots where they can’t go, sitemap.xml tells them where they should go. It’s an XML file (usually at yoursite.com/sitemap.xml) that lists all the URLs you want search engines to discover, along with optional metadata like when each page was last modified and how frequently it changes.

You don’t technically need a sitemap for search engines to find your pages. If your internal linking is solid, crawlers will discover most of your content through link-following alone. But sitemaps make discovery faster, more reliable, and more comprehensive. For large sites, sites with deep page hierarchies, or sites with content that isn’t well-linked internally, a sitemap is practically mandatory.

What a basic sitemap.xml looks like

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/</loc>
    <lastmod>2026-04-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://yoursite.com/blog/technical-seo-files</loc>
    <lastmod>2026-04-19</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

Best practices

  • Keep it current. A stale sitemap with broken URLs or outdated lastmod dates is worse than no sitemap. Automate generation if possible.
  • Only include canonical URLs. Don’t list duplicate pages, paginated variations, or URLs that return non-200 status codes.
  • Use sitemap index files for large sites. The sitemap protocol caps an individual file at 50,000 URLs and 50 MB uncompressed. If your site exceeds that, split it into multiple sitemaps and reference them from a sitemap index file (example after this list).
  • Submit it to search engines. Reference it in your robots.txt (as shown above) and submit it directly through Google Search Console and Bing Webmaster Tools.
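
A sitemap index is just another small XML file pointing at the others. Here’s what one might look like, with illustrative filenames:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yoursite.com/sitemap-pages.xml</loc>
    <lastmod>2026-04-19</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/sitemap-blog.xml</loc>
    <lastmod>2026-04-19</lastmod>
  </sitemap>
</sitemapindex>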

One thing developers frequently overlook: dynamic content. If your site has programmatically generated pages (product listings, user profiles, etc.), make sure your sitemap generation process captures them. A sitemap that only lists your static marketing pages while ignoring thousands of dynamic pages is leaving a lot of discoverability on the table.
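
As a rough sketch of that generation step, here’s a small Python script that merges static paths with programmatically discovered ones. fetch_product_slugs is a hypothetical stand-in for whatever query your site actually needs:

from datetime import date
from xml.sax.saxutils import escape

def fetch_product_slugs():
    # Hypothetical: replace with a real database or API query.
    return ["api-tester", "uptime-monitor"]

BASE = "https://yoursite.com"
paths = ["/", "/pricing", "/docs"]
paths += [f"/products/{slug}" for slug in fetch_product_slugs()]

# Ideally, use each page's real modification date rather than the build date.
entries = "\n".join(
    "  <url>\n"
    f"    <loc>{escape(BASE + path)}</loc>\n"
    f"    <lastmod>{date.today().isoformat()}</lastmod>\n"
    "  </url>"
    for path in paths
)

with open("public/sitemap.xml", "w") as f:
    f.write(
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n</urlset>\n"
    )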

The changefreq and priority debate

Some SEO practitioners argue that changefreq and priority are largely ignored by modern search engines, and there’s truth to that. Google has publicly stated it treats these as hints at best. That said, they don’t hurt anything, and lastmod in particular remains genuinely useful for signaling fresh content.

Our take: include lastmod religiously, include the other fields if your generator supports them, and don’t lose sleep over it.

File 3: security.txt — The Contact Page for Security Researchers

Here’s a scenario that happens all the time: a security researcher finds a vulnerability on your site and wants to report it responsibly. They look for a security contact. There’s no obvious email. The “Contact Us” form goes to a sales queue. They try security@yoursite.com and it bounces.

security.txt solves this. It’s a plaintext file placed at yoursite.com/.well-known/security.txt that tells security researchers exactly how to report vulnerabilities. The format is defined by RFC 9116, which was published as an informational RFC by the Internet Engineering Task Force (IETF). It’s a real, documented standard with backing from the broader security community.

What a basic security.txt looks like

Contact: mailto:security@yoursite.com
Contact: https://yoursite.com/security/report
Expires: 2027-04-19T00:00:00.000Z
Preferred-Languages: en
Policy: https://yoursite.com/security-policy

Key fields

  • Contact (required): How to reach you. Can be an email, a URL, or a phone number. Multiple contacts are fine.
  • Expires (required): When this file should be considered stale. This forces you to review and update it periodically, which is a good habit.
  • Policy: A link to your vulnerability disclosure policy. If you have one, include it. If you don’t, creating one is worth the effort.
  • Preferred-Languages: What languages your security team can work in.
  • Encryption: A link to your PGP key if you want researchers to encrypt their reports.

Why this matters beyond security

Having a security.txt file signals organizational maturity. It tells researchers (and anyone else who checks) that you take security seriously enough to have a process. Many organizations reportedly now look for this file as part of vendor security assessments. It’s a small thing that punches well above its weight.

Plus, it’s trivially easy to implement. There’s no reason not to have one.

A Future-Facing Idea: The llms.txt Proposal

Now we need to talk about something different from the three files above. While robots.txt, sitemap.xml, and security.txt are established conventions with broad adoption and, in the case of robots.txt and security.txt, formal standardization (RFC 9309 and RFC 9116 respectively), llms.txt is something else entirely: a speculative, niche proposal that hasn’t achieved anything close to that status.

We’re including it here because the idea behind it is interesting and relevant, not because it’s an established standard. Treat this section accordingly.

What’s the idea?

The concept is straightforward. As large language models (LLMs) become a more prominent way people discover and interact with web content, there’s no standardized way for a website to communicate its structure and purpose to these models. robots.txt tells crawlers where they can go. A proposed llms.txt file would tell AI systems what your site is and what content matters most, in a format optimized for language model consumption.

A hypothetical llms.txt file might sit at your site’s root and contain a structured, markdown-like summary of your site’s purpose, key content areas, and important URLs, all written in natural language rather than XML or directive syntax.

What it might look like

# YourSite

> A developer tools company focused on API testing and monitoring.

## Key Pages

- [Documentation](https://yoursite.com/docs): Complete API reference and guides
- [Blog](https://yoursite.com/blog): Technical articles on API design and testing
- [Pricing](https://yoursite.com/pricing): Plans and features comparison

## Topics We Cover

- API testing best practices
- REST and GraphQL monitoring
- Developer workflow automation

Why we’re watching this

The core insight is sound: LLMs process and understand information differently than traditional search crawlers, and as AI-powered search and answer engines grow in prominence, having a machine-friendly summary of your site could become valuable. The gap between what robots.txt and sitemap.xml communicate (URL structure and access rules) and what an LLM needs to understand (semantic meaning and content relationships) is real.

That said, we want to be direct about the current reality. As of this writing, llms.txt does not have a formal specification from any standards body. There’s no broad industry adoption. No major search engine or AI platform has announced support for it. It exists primarily as a concept discussed in certain developer and SEO circles.

If a convention like this were to gain traction and receive backing from major AI platforms, early adopters could potentially have an edge in how their content surfaces in AI-generated answers. But that’s a hypothetical, not a guarantee.

Our honest take

We think the problem llms.txt tries to solve is legitimate and will need a solution eventually. Whether the solution looks like the current proposal, gets absorbed into existing standards like robots.txt or structured data markup, or takes an entirely different form, we genuinely don’t know. If you want to experiment with creating one for your site, go for it. The effort is minimal. Just don’t prioritize it over the three proven files above, and don’t expect it to move any needles today.

Putting It All Together: Your Implementation Checklist

Here’s what we recommend for every site, in priority order:

Do these immediately:

  1. Audit your robots.txt. Make sure it’s not accidentally blocking important content. Confirm it references your sitemap.
  2. Generate and submit a sitemap.xml. If you’re using a CMS or static site generator, there’s almost certainly a plugin or built-in feature for this. Automate it so it stays current.
  3. Create a security.txt file. It takes five minutes. Put it at /.well-known/security.txt. Set an expiration date and a real contact address.

Consider for later:

  1. Experiment with an llms.txt file if you’re curious about how AI systems interact with your content. Keep expectations realistic and revisit as the landscape evolves.

Beyond just creating these files, build them into your deployment process. Your CI/CD pipeline should validate that robots.txt isn’t blocking your site. Your build step should regenerate your sitemap. Your security.txt expiration date should trigger a reminder before it lapses.
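
For instance, a small scheduled job could fetch the live security.txt and fail when the Expires date gets close. A minimal sketch, assuming the standard path and an ISO 8601 timestamp like the example earlier:

from datetime import datetime, timedelta, timezone
from urllib.request import urlopen

# Fetch the live file (replace with your domain).
with urlopen("https://yoursite.com/.well-known/security.txt") as resp:
    body = resp.read().decode("utf-8")

# Find the Expires field and parse its timestamp.
expires_field = next(
    line for line in body.splitlines() if line.startswith("Expires:")
)
stamp = expires_field.split(":", 1)[1].strip().replace("Z", "+00:00")
expires = datetime.fromisoformat(stamp)

# Fail loudly if the file lapses within 30 days.
if expires - datetime.now(timezone.utc) < timedelta(days=30):
    raise SystemExit("security.txt expires within 30 days; update it")
print("security.txt is current")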

Common Challenges

The biggest challenge isn’t creating these files. It’s maintaining them.

Configuration drift is the silent killer. Your site evolves, new sections get added, old ones get restructured, but these root-level files stay frozen from the day they were first created. A sitemap.xml that hasn’t been updated in months is actively misleading search engines. A security.txt with an expired date or a dead email address defeats its own purpose.

Inconsistency across environments is another trap. Your staging site should have a restrictive robots.txt that blocks all crawlers. Your production site should not. We’ve seen more than one team accidentally push staging rules to production and wonder why their traffic cratered.
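
A staging robots.txt is just two lines:

User-agent: *
Disallow: /

The safer pattern is to generate or swap this file per environment at deploy time, rather than keeping one hand-edited copy that can slip into the wrong build.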

Ownership ambiguity also causes problems. Is robots.txt the SEO team’s responsibility? The dev team’s? DevOps? When nobody owns it, nobody maintains it. Assign clear ownership.

Future Outlook

The machine-readable layer of the web is only going to get more important. Search engines are getting more sophisticated in how they crawl and render content. AI systems are becoming a primary way people discover information. Security threats continue to grow in scale and complexity.

These four files (three established, one speculative) represent the current state of how websites communicate with machines. We expect this surface area to expand. More structured ways for sites to describe their content semantically. More standardized methods for security communication. Possibly new conventions for AI interaction that we can’t predict today.

The developers who treat these files as first-class parts of their site architecture, not afterthoughts, will be better positioned regardless of how the landscape shifts. It’s a small investment with compounding returns.

Start with what’s proven. Keep an eye on what’s emerging. And for the love of good engineering, don’t ship a Disallow: / to production.