Your framework generates a sitemap every time you build. Next.js writes one from your route table. Astro emits one from your page files. Nuxt, SvelteKit, Remix, WordPress, Ghost — they all produce a sitemap that declares, in machine-readable XML, every page your application says it has. Your test suite does not read it.
A sitemap is a structured declaration of every URL an application intends to be publicly accessible. When auto-generated by a framework, it is the closest thing to a single source of truth for the application’s intended surface area.
This is not an SEO article. This is about the denominator problem in test coverage.
The sitemap contract
The Sitemaps protocol defines a file format for listing URLs that a site makes available to crawlers. Every major web framework now generates sitemaps automatically or with minimal configuration.
Next.js supports sitemaps natively through a sitemap.ts file convention in the App Router. The function returns an array of URL objects, and Next.js renders it as XML. For larger applications, generateSitemaps splits the output into multiple indexed files. The popular next-sitemap package automates generation for static, dynamic, and server-rendered routes, producing index sitemaps that reference child sitemap files.
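For a sense of how little this requires, a minimal App Router sitemap.ts looks roughly like this (a sketch; the example.com URLs are placeholders):

```ts
// app/sitemap.ts — Next.js App Router convention, rendered as XML at /sitemap.xml
import type { MetadataRoute } from "next";

export default function sitemap(): MetadataRoute.Sitemap {
  return [
    { url: "https://example.com/", lastModified: new Date() },
    { url: "https://example.com/pricing/", changeFrequency: "monthly" },
    { url: "https://example.com/blog/", priority: 0.8 },
  ];
}
```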
Astro generates sitemaps via the official @astrojs/sitemap integration, which reads the route table and emits XML at build time. Nuxt uses the @nuxtjs/sitemap module. SvelteKit exposes a +server.ts pattern for programmatic generation. WordPress has shipped core sitemap generation since version 5.5. Ghost generates sitemaps automatically from published content.
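The Astro integration is similarly terse (a minimal sketch; the site URL is a placeholder):

```ts
// astro.config.ts — @astrojs/sitemap reads the route table and emits the sitemap at build time
import { defineConfig } from "astro/config";
import sitemap from "@astrojs/sitemap";

export default defineConfig({
  site: "https://example.com",
  integrations: [sitemap()],
});
```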
In every case, the framework is making a statement: “These are the pages I have.” The sitemap is not aspirational. It is declarative. When auto-generated from the route table or content database, it reflects what the build actually produced.
Why nobody uses sitemaps for testing
Sitemaps are treated as an SEO artefact — something you submit to Google Search Console, not something you feed to your CI pipeline.
Testing tools start from one of two places: authored test scripts or autonomous discovery. Script-based tools like Playwright and Cypress execute the tests a human wrote. Autonomous tools like crawlers discover pages by following links from a starting URL. Neither consults the sitemap.
This creates a circular coverage problem. Discovery-based tools measure what they find, not what they miss. If a page has no inbound link, the crawler never reaches it, never reports it, and the absence is invisible. Script-based tools measure what the author remembered to write, not what they forgot. If nobody wrote a test for /settings/billing/, the route is untested and the gap is silent.
A handful of tools use sitemaps as inputs, but they solve a narrower problem. Siteprobe fetches every sitemap URL, checks for HTTP 200 responses, and records performance metrics. Cypress can be configured to iterate over sitemap URLs and verify each returns a valid page. These are health checks — “does the page exist?” — not coverage checks. They confirm that sitemap URLs are live. They do not answer whether those URLs are reachable through the application’s navigation.
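A minimal version of such a health check, sketched here with Playwright’s request fixture (the sitemap URL is a placeholder, and the regex extraction stands in for a proper XML parser):

```ts
import { test, expect } from "@playwright/test";

// Health check: every sitemap URL responds with HTTP 200.
// This confirms the pages exist; it says nothing about whether a user can navigate to them.
test("sitemap URLs are live", async ({ request }) => {
  const xml = await (await request.get("https://example.com/sitemap.xml")).text();
  const urls = [...xml.matchAll(/<loc>\s*(.*?)\s*<\/loc>/g)].map((m) => m[1]);

  for (const url of urls) {
    const response = await request.get(url);
    expect(response.status()).toBe(200);
  }
});
```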
No tool in the current testing landscape uses the sitemap as a coverage denominator: the complete list of pages that should be tested, against which the test suite’s actual coverage is measured.
The denominator problem
Every coverage metric needs a denominator. The sitemap provides one that no other source can.
Code coverage uses “lines of code” as its denominator. Test coverage uses “authored test cases.” Both denominators are internally defined — they measure the test suite against itself. A suite with 200 tests and 100% pass rate tells you nothing about whether those 200 tests cover the full application. The denominator is the test suite’s own scope, which is whatever someone decided to write.
Navigation coverage needs a different denominator: “pages the application declares it has.” The sitemap provides exactly this. The numerator is “pages a user can reach by clicking links from a starting point.” The ratio between the two is the coverage metric.
This framing reveals three distinct gap categories.
Sitemap pages unreachable by navigation are features your users cannot find. The page exists. The sitemap tells Google about it. But no link in your navigation, sidebar, footer, or content body points to it. A user would need to type the URL directly to reach it. This is the canonical “shipped but unreachable” failure that navigation coverage is designed to detect.
Navigation-discovered pages absent from the sitemap are orphan routes your sitemap generator does not know about. The user can reach them by clicking, but search engines cannot find them through the sitemap. These often indicate stale routes from previous features, debug pages left in production, or dynamically generated pages that the sitemap configuration excludes.
Sitemap pages that return non-200 responses are stale entries telling search engines you have pages you do not. These are the pages that health-checking tools like Siteprobe already detect. They represent the most basic failure: the sitemap promises a page, and the server cannot deliver it.
How to run the audit yourself
The conceptual approach requires three steps and a Saturday morning.
Step 1: Fetch your sitemap. Download sitemap.xml from your production URL. Parse the XML to extract every <loc> entry. If your sitemap uses an index structure (common with next-sitemap and larger applications), follow each referenced sitemap file and collect all URLs.
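A sketch of this step in TypeScript, assuming Node 18+ for the global fetch API (the regex stands in for a real XML parser):

```ts
// Collect every <loc> URL from a sitemap, following index sitemaps recursively.
async function collectSitemapUrls(sitemapUrl: string): Promise<string[]> {
  const xml = await (await fetch(sitemapUrl)).text();
  const locs = [...xml.matchAll(/<loc>\s*(.*?)\s*<\/loc>/g)].map((m) => m[1]);

  // In an index sitemap, the <loc> entries point at child sitemap files, not pages.
  if (xml.includes("<sitemapindex")) {
    const nested = await Promise.all(locs.map(collectSitemapUrls));
    return nested.flat();
  }
  return locs;
}
```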
Step 2: Crawl your site from the homepage. Starting at your root URL, follow every internal link. Record each URL you reach. Continue recursively until no new URLs are discovered. This produces the set of pages reachable through actual navigation — the user’s view of your application.
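The crawl can be sketched as a breadth-first traversal. The version below uses plain fetch and a naive href regex; a headless browser such as Playwright is needed for pages whose links are rendered client-side, and a real crawler would also skip non-HTML resources and normalize trailing slashes:

```ts
// Follow every same-origin link from the homepage and record each page reached.
async function crawlSite(startUrl: string): Promise<Set<string>> {
  const origin = new URL(startUrl).origin;
  const reached = new Set<string>([startUrl]);
  const queue = [startUrl];

  while (queue.length > 0) {
    const page = queue.shift()!;
    const html = await (await fetch(page)).text();

    for (const [, href] of html.matchAll(/href="([^"#]+)"/g)) {
      const url = new URL(href, page).href; // resolve relative links against the current page
      if (url.startsWith(origin) && !reached.has(url)) {
        reached.add(url);
        queue.push(url);
      }
    }
  }
  return reached;
}
```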
Step 3: Diff the two lists. URLs in the sitemap but not in the crawl are unreachable pages. URLs in the crawl but not in the sitemap are orphan routes. The ratio of crawl-discovered sitemap URLs to total sitemap URLs is your Navigation Coverage.
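The diff itself is a set comparison (assuming both lists have already been normalized for protocol and trailing slashes):

```ts
// Compare declared pages (sitemap) against reachable pages (crawl).
function navigationCoverage(sitemapUrls: string[], crawled: Set<string>) {
  const unreachable = sitemapUrls.filter((url) => !crawled.has(url));      // declared but not reachable
  const orphans = [...crawled].filter((url) => !sitemapUrls.includes(url)); // reachable but not declared
  const coverage = (sitemapUrls.length - unreachable.length) / sitemapUrls.length;
  return { coverage, unreachable, orphans };
}
```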
The diff is where the insight lives. A sitemap with 200 URLs and a crawl that discovers 160 of them means 40 pages are declared but unreachable. Those 40 pages are the features your users cannot find, the translations nobody can navigate to, the settings pages hidden behind a broken conditional render.
What this means for CI
A pre-deploy gate that fetches the staging sitemap, crawls the staging site, and fails the build if coverage drops below a threshold would catch an entire class of deployment failure that no existing CI check addresses.
The sitemap is already there. Every major framework generates it as part of the build. The crawl is automatable — headless browsers are a solved problem. The gap between the two is the signal.
The gate does not need to demand 100% coverage. Complex applications legitimately have pages that are conditionally rendered, role-gated, or available only after specific user actions. The threshold is configurable. What matters is that the metric exists, that it is measured on every build, and that regressions are visible.
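A gate built from the sketches above might look like this. The helper names and the STAGING_URL variable are hypothetical, and the script assumes an ESM entry point so top-level await is available:

```ts
// gate.ts — fail the build when navigation coverage drops below a threshold.
const THRESHOLD = 0.9; // configurable; 1.0 is rarely realistic for role-gated or conditional pages

const base = process.env.STAGING_URL ?? "https://staging.example.com";
const sitemapUrls = await collectSitemapUrls(`${base}/sitemap.xml`);
const crawled = await crawlSite(base);
const { coverage, unreachable } = navigationCoverage(sitemapUrls, crawled);

console.log(`Navigation coverage: ${(coverage * 100).toFixed(1)}%`);
if (coverage < THRESHOLD) {
  console.error(`${unreachable.length} sitemap URLs are unreachable by navigation:`);
  unreachable.forEach((url) => console.error(`  ${url}`));
  process.exit(1); // block the deploy
}
```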
When we shipped our own i18n deployment with 5.6% Navigation Coverage, a sitemap-anchored gate would have blocked the release immediately. The sitemap declared 170 locale pages. The crawl would have discovered 10. The gate would have failed. Four deploys and thirty hours of debugging would have been compressed to one conversation about why the numbers did not match.
The sitemap is the ground truth. The crawl is the test. The gap is the insight.
Glia Quest automates this analysis on every scan. Test your site at glia.quest.
Frequently asked questions
Do auto-generated sitemaps actually list every page? In most cases, yes. Framework-generated sitemaps reflect the route table or content database at build time. The more common problem is sitemaps that list too many pages (including stale or redirected URLs) rather than too few. Manual sitemaps are less reliable — they require human maintenance and drift from the actual application over time.
Can I use Playwright or Cypress to crawl my sitemap? You can iterate over sitemap URLs and verify each returns HTTP 200, which is a useful health check. However, this does not test navigation coverage. To measure coverage, you need to crawl the site by following links from the homepage and then compare the discovered URLs against the sitemap. The distinction is between “does this page exist?” and “can a user reach this page?”
What about authenticated pages that are not in the sitemap? Authenticated pages are typically excluded from public sitemaps, which means the sitemap denominator only covers the public surface. For authenticated applications, the denominator must come from a different source — an internal route registry, an API endpoint listing, or a framework-level route enumeration. The five-minute route audit covers this approach.
How often should I run a sitemap coverage check? Before every deploy to production. The check is fast — parsing a sitemap and running a headless crawl takes minutes, not hours. Running it as a post-build, pre-deploy gate ensures that navigation regressions are caught before users encounter them.