Technical SEO Audit: Step-by-Step Guide to Finding and Fixing Crawl Errors

Key Takeaways

  • A technical SEO audit covers six areas: crawlability, indexation, Core Web Vitals, site architecture, duplicate content, and structured data. A problem in any one of them can cap your rankings, no matter how good everything else looks.
  • Screaming Frog and Google Search Console catch about 90% of technical issues between them, and both are free at the scale most businesses operate.
  • Crawl errors, redirect chains, and broken internal links pile up on each other. Fix them in order: server errors and 4xx errors first, then redirect chains, then canonicalisation. Skip the order and you’ll end up redoing work.
  • Pages marked “Crawled - currently not indexed” in GSC usually have a quality or duplicate-content problem, not a crawl problem, so the fix is better content, not another technical tweak.
  • Run a full audit quarterly and keep an eye on high-traffic pages in between. Do it once and never again, and the gains usually fade within three months.

Technical SEO is the foundation everything else sits on. Write the best content on the internet, earn backlinks from authoritative sources, and Google will still suppress your rankings if it can’t crawl the page reliably, if the canonical is pointing to the wrong version, or if the page speed score puts you in the “Poor” bucket on Core Web Vitals. We’ve audited sites where three technical fixes moved pages from outside the top 50 to the first page within eight weeks, before a single word of new content was written.

Five steps. The tools, the sequence, and why the order matters.

What a technical SEO audit actually covers

Technical SEO is the layer search engines have to work through before they can even consider ranking you. Can they find your pages? Can they crawl them without hitting walls? If this layer is broken, the content investment and the backlinks barely move the needle.

Six areas, roughly in order of importance. Crawlability and indexation matter most. Get those two wrong and the rest barely matters.

Crawlability first. Can Googlebot reach the pages you need indexed? Most failures here trace back to robots.txt blocking more than intended, a noindex tag left behind from a staging environment, or pages that exist on the site but have no internal links pointing to them. Google can’t index what it can’t find.
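
One quick way to sanity-check robots.txt against the pages you need indexed is Python’s standard-library parser. This is a minimal sketch under stated assumptions: the robots.txt content and the URLs are made up, and `urllib.robotparser` does plain prefix matching, so it can disagree with Googlebot on wildcard rules. Treat it as a first pass, not a verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in a real audit you would load the live file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /staging/
Disallow: /products
"""


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs that robots_txt disallows for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical list of pages that must stay crawlable.
must_index = [
    "https://example.com/",
    "https://example.com/products/blue-widget",
    "https://example.com/blog/audit-guide",
]

for url in blocked_urls(ROBOTS_TXT, must_index):
    print("Blocked for Googlebot:", url)
```

Here the `Disallow: /products` rule was probably meant for a single page but catches every product URL, which is the “blocking more than intended” failure in miniature.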

Indexation is different. Google reached the page. Now does it think the page deserves a spot in the index? We see the worst bloat on ecommerce sites where every filter combination spawns its own indexable URL, and on WordPress sites where tag archives and paginated pages have been left completely unmanaged.

Site architecture gets less attention than it deserves. Count the clicks from your homepage to your most important pages. Three is fine. Four is borderline. Past four? That page is leaking crawl budget and link equity, quietly, with no warning in any report.
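
Click depth is easy to compute once you have the internal link graph from a crawl. The sketch below assumes a hypothetical dictionary mapping each page to the pages it links to, and runs a breadth-first search from the homepage. Pages the search never reaches are orphans.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search: fewest clicks needed to reach each page from home."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link graph: page -> pages it links to.
site = {
    "/": ["/services", "/blog"],
    "/services": ["/services/seo"],
    "/blog": ["/blog/page/2"],
    "/blog/page/2": ["/blog/page/3"],
    "/blog/page/3": ["/blog/page/4"],
    "/blog/page/4": ["/blog/old-post"],
    "/orphan": [],
}

depths = click_depths(site, "/")
too_deep = sorted(page for page, depth in depths.items() if depth > 4)
orphans = sorted(set(site) - set(depths))
print("More than four clicks deep:", too_deep)
print("No internal links pointing in:", orphans)
```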

Page speed needs checking template by template, not just the homepage. One slow blog post template can drag down forty articles simultaneously. LCP, INP, and CLS are what you’re measuring.

Then there’s duplicate content and canonicalisation, which trips up almost every CMS-driven site. Are there multiple indexable URLs pointing at essentially the same page, all competing with each other?

And structured data: does your schema markup actually validate, or does it just look fine until you run it through Google’s testing tool?
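
Before reaching for Google’s tool, a cheap syntax pass catches the worst schema failures. The sketch below is an assumption-laden first filter, not a substitute for the Rich Results Test: it only checks that each JSON-LD block parses as JSON and declares an `@type` (or an `@graph`). The class and function names are illustrative.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def jsonld_problems(html: str) -> list[str]:
    """Return a list of syntax-level problems found in the page's JSON-LD."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    problems = []
    for number, raw in enumerate(extractor.blocks, start=1):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"block {number}: invalid JSON ({exc.msg})")
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict) or ("@type" not in item and "@graph" not in item):
                problems.append(f"block {number}: missing @type")
    return problems


# Hypothetical page: the second block has a trailing comma, a common CMS bug.
PAGE = """
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Article"}</script>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "FAQPage",}</script>
"""
print(jsonld_problems(PAGE))
```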

The five steps below follow this sequence deliberately. Get out of order and you end up redoing work.

Step 1: Crawl your site with Screaming Frog

The tool most people use is Screaming Frog SEO Spider. Free up to 500 URLs. After that, $259/year. If the site is built on React or Vue, you need the paid version anyway since JavaScript rendering is locked behind the licence. More than 500 pages to audit? The licence pays for itself before you finish step one.

Before you run anything, set the Spider to crawl as Googlebot under Configuration > User-Agent. If the site is built in React or Vue, turn on JavaScript rendering (Configuration > Spider > Rendering > JavaScript), since skipping this step makes a perfectly healthy site look almost empty in the results. On shared hosting, cap the crawl speed at around 4 requests per second so you don’t trip the host’s rate limiter and get the whole audit blocked mid-crawl.

Once the crawl finishes, pull these reports:

  • Response Codes > 4xx: Every 404 and 410 on internal links. These burn crawl budget for nothing.
  • Response Codes > 3xx: Every redirect on the site. Multi-hop chains (A to B to C) are surprisingly common after a migration and bleed link equity at every step. These go near the top of the fix list.
  • Page Titles > Missing / Duplicate / Over 60 Characters: Duplicate titles are the most common and the most ignored. Missing ones matter most on pages with GSC impressions.
  • H1 > Missing / Duplicate: Either signals the page topic is unclear, which tends to show up in rankings eventually.
  • Images > Missing Alt Text: Every image without an alt attribute.
  • Directives > Noindex: Every page carrying a noindex tag. Check this against your intended index list, since leftover noindex tags from a migration or CMS update are a classic cause of pages quietly dropping out of rankings.
  • Canonicals > Non-canonical pages: Pages where the canonical points elsewhere. Confirm each one is deliberate.

Export the lot to CSV, sort by issue volume, and work through it using the priority order in Step 4 below.
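
If the export is too big to eyeball, a few lines of Python will rank the problem status codes by volume. This is a sketch that assumes the export has `Address` and `Status Code` columns; check the header row of your own file, since column names vary by report. The sample data is invented.

```python
import csv
import io
from collections import Counter

# Invented sample standing in for a crawl export; real files are far larger.
SAMPLE_EXPORT = """\
Address,Status Code
https://example.com/,200
https://example.com/old-page,404
https://example.com/moved,301
https://example.com/gone,410
https://example.com/api,500
https://example.com/missing,404
"""


def issue_volume(csv_text: str) -> list[tuple[str, int]]:
    """Count non-2xx responses by status class, largest group first."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = row["Status Code"].strip()
        if not code or code.startswith("2"):
            continue  # skip healthy pages and rows with no response code
        counts[code[0] + "xx"] += 1
    return counts.most_common()


for status_class, total in issue_volume(SAMPLE_EXPORT):
    print(status_class, total)
```

Volume tells you where the bulk of the work is, not what to fix first. The priority order still comes from Step 4.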

Step 2: Audit indexation in Google Search Console

Go to GSC’s Page indexing report (Indexing > Pages in the current interface), the successor to the old Index Coverage report. It shows every URL Google has attempted to process, sorted by what happened to it. The old report grouped URLs into four statuses: Error, Valid, Excluded, and Warning. The current interface collapses these into “Indexed” and “Not indexed” with a reason attached to each URL, but the four-way split is still a useful way to triage, so the rest of this step follows it.

Pages marked “Error” have a technical block stopping them from being indexed. “Server error (5xx)” is the most common: your server returned an error or timed out when Googlebot came calling. “Redirect error” is usually a loop or a chain that went too long. “Submitted URL blocked by robots.txt” means you submitted a URL for indexing, usually through your sitemap, that your own robots.txt file is simultaneously telling Google to skip. Fix errors before anything else on this list.

The “Valid” bucket is what you want most of your important pages in. Worth cross-checking against your Screaming Frog crawl though: if GSC shows a page as indexed but Screaming Frog found a noindex tag on it, something is conflicting between the server response and the meta tag, and it needs sorting before you trust the data.

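Most “Excluded” pages are excluded deliberately. Canonical exclusions, noindex pages, that kind of thing. The one worth actually investigating is “Crawled - currently not indexed”: Google got there, looked at the page, and chose not to include it, without explanation. Nine times out of ten: quality. Thin content, too similar to something already indexed, or not matching what searchers actually want from that query. The fix is better content, not a technical tweak.

“Warning” pages are indexed but flagged with an issue. “Indexed, though blocked by robots.txt” is the common one: robots.txt blocks the URL, but Google indexed it anyway because other pages link to it, without being able to read the content. If the page should rank, remove the robots.txt block. If it shouldn’t be in the index at all, remove the block and add a noindex tag, since Google can only see a noindex on a page it is allowed to crawl. Separately, check that robots.txt isn’t blocking CSS or JS files, which can mess with rendering even when the HTML page itself made it into the index.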

While you’re in GSC, pull the Core Web Vitals report under Experience. Pages get sorted into Good, Needs Improvement, and Poor. Whatever’s in “Poor” is your input list for Step 3.

Step 3: Check Core Web Vitals by page group

Three numbers: LCP under 2.5 seconds, INP under 200ms, CLS under 0.1. Core Web Vitals have been a Google ranking signal since 2021. For this step, start from the CWV report inside GSC rather than the lab score in PageSpeed Insights. GSC draws on field data from real Chrome users, while the Lighthouse section of PageSpeed Insights runs a synthetic lab test. When they disagree, trust the field data.

Check LCP by template, not page by page. Fix the template and every page using it improves at once, which is where the leverage is. The culprits are nearly always one of: a hero image over 200KB (often lazy-loaded when it should load eagerly, which delays LCP further), render-blocking JavaScript sitting in the <head>, or server response times above 600ms. Usually one of those three. Check them in that order.
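
To bucket field measurements the same way the report does, a small lookup is enough. The “Good” limits below come from the paragraph above. The “Poor” limits (4 seconds, 500ms, 0.25) are Google’s published thresholds rather than something stated in this guide, and the metric keys are illustrative names.

```python
# (good_limit, poor_limit) per metric. Values at or below good_limit are "Good",
# values above poor_limit are "Poor", anything between is "Needs Improvement".
THRESHOLDS = {
    "lcp_s": (2.5, 4.0),   # Largest Contentful Paint, seconds
    "inp_ms": (200, 500),  # Interaction to Next Paint, milliseconds
    "cls": (0.1, 0.25),    # Cumulative Layout Shift, unitless
}
ORDER = ["Good", "Needs Improvement", "Poor"]


def rate(metric: str, value: float) -> str:
    """Classify a single metric value."""
    good_limit, poor_limit = THRESHOLDS[metric]
    if value <= good_limit:
        return "Good"
    if value <= poor_limit:
        return "Needs Improvement"
    return "Poor"


def overall(metrics: dict[str, float]) -> str:
    """A page group is only as good as its worst metric."""
    return max((rate(name, value) for name, value in metrics.items()), key=ORDER.index)


# Hypothetical field data for one template.
blog_template = {"lcp_s": 2.1, "inp_ms": 350, "cls": 0.05}
print(overall(blog_template))
```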

CLS problems almost always come from images without explicit width and height attributes, which causes the page to jump as they load. Web fonts that render late can do the same thing. The fix for images is just adding those dimensions. Fonts need font-display: swap set plus preloading – that handles most of the text shift. Ads injected above the content after load are the messiest case, and usually need reserved space in the layout to handle cleanly.
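
Finding the offending images is scriptable. The sketch below scans a template’s HTML for `<img>` tags missing a `width` or `height` attribute. It only looks at attributes, so images sized purely through CSS (for example with `aspect-ratio`) will show up as false positives. The sample markup is invented.

```python
from html.parser import HTMLParser


class ImageDimensionAudit(HTMLParser):
    """Record the src of every <img> that lacks a width or height attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if not attributes.get("width") or not attributes.get("height"):
            self.missing.append(attributes.get("src") or "(no src)")


def images_missing_dimensions(html: str) -> list[str]:
    audit = ImageDimensionAudit()
    audit.feed(html)
    return audit.missing


# Invented template fragment.
TEMPLATE = """
<img src="/hero.jpg" width="1200" height="600" alt="Hero">
<img src="/author.png" alt="Author photo">
<img src="/chart.svg" width="640" alt="Chart" />
"""
print(images_missing_dimensions(TEMPLATE))
```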

INP, which replaced FID as a Core Web Vital in March 2024, measures how long the page takes to respond to interactions across the whole visit, not just the first one. On WordPress, a poor score is almost always down to too much third-party JavaScript (chat widgets, analytics, social share buttons) competing for the main thread. Go through your third-party scripts and cut or defer anything that isn’t actually making you money. On one high-traffic page we tested, removing a single chat widget that nobody used shaved 60 to 100ms off INP immediately.

Step 4: Fix crawl errors in priority order

Combine the Screaming Frog export with your GSC data into one fix list. The sequence below matters: work out of order and you’ll end up redoing things.

Server errors (5xx) come first. Any page returning a 500 is burning crawl budget and returning nothing to Google. Get it to your hosting or dev team the same day, and check the server error rate in GSC weekly after that so you catch new ones fast.

Next: 404s that still have internal links pointing to them. Screaming Frog’s inlinks view shows you these. Before deciding what to do, pull the GSC impression history for each one. If it had any ranking history at all, it’s worth restoring. If not, update the pages linking to it to point somewhere relevant instead. Redirecting the 404 to the homepage is the lazy option and the wrong one: it discards both the topical relevance and the anchor text signal.
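
That decision rule is simple enough to run over the whole 404 list at once. A minimal sketch, assuming you have already built two hypothetical lookups: impressions per URL from GSC, and linking pages per URL from the inlinks view. The third branch, for 404s with no internal links at all, is our own addition for completeness.

```python
def triage_404(url: str, impressions: dict[str, int], inlinks: dict[str, list[str]]) -> str:
    """Suggest an action for one 404 URL using the rule of thumb above."""
    if impressions.get(url, 0) > 0:
        return "restore"  # any ranking history means the page is worth bringing back
    if inlinks.get(url):
        return "update links on: " + ", ".join(sorted(inlinks[url]))
    return "no internal links, leave as 404"


# Hypothetical audit data.
impressions = {"/old-guide": 1240, "/retired-offer": 0}
inlinks = {
    "/old-guide": ["/blog", "/resources"],
    "/retired-offer": ["/pricing", "/"],
}

for url in ["/old-guide", "/retired-offer", "/typo-url"]:
    print(url, "->", triage_404(url, impressions, inlinks))
```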

Third: redirect chains longer than one hop. Pull every 3xx URL from Screaming Frog and check the chain length. If A redirects to B which redirects to C, point A straight at C. If A still gets external backlinks, fixing this recovers a real chunk of that link equity. Chains are exactly the kind of compounding technical issue you need to clear up before content work pays off and your visibility in search improves.
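
Flattening chains by hand gets error-prone past a few dozen redirects. The sketch below assumes a hypothetical source-to-target mapping built from the 3xx export. It follows each chain to its final destination, counts the hops, and raises an error if it finds a loop.

```python
def resolve_redirects(redirects: dict[str, str]) -> dict[str, tuple[str, int]]:
    """Map each redirecting URL to (final destination, number of hops)."""
    resolved = {}
    for source, first_target in redirects.items():
        seen = {source}
        target, hops = first_target, 1
        while target in redirects:  # the target itself redirects, so keep following
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
            hops += 1
        resolved[source] = (target, hops)
    return resolved


# Hypothetical redirect map left over from a migration.
redirects = {
    "/a": "/b",
    "/b": "/c",
    "/old-pricing": "/pricing",
}

for source, (final, hops) in resolve_redirects(redirects).items():
    if hops > 1:
        print(f"{source}: {hops} hops, point it straight at {final}")
```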

After that, check for noindex tags on pages that should be indexed. Run your noindex report against what should actually be in the index. Leftover noindex tags from migrations or CMS template changes are one of the most common reasons pages disappear from rankings without any obvious explanation. Pull the tag and request indexing for those URLs in GSC.
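
A noindex can arrive through a meta tag or through an `X-Robots-Tag` response header, so it pays to check both. This is a simplified sketch: it treats any `noindex` or `none` directive as blocking and ignores per-crawler nuances. The page data is invented, and in practice you would fetch the HTML and headers yourself.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {key: (value or "") for key, value in attrs}
        if attributes.get("name", "").lower() in {"robots", "googlebot"}:
            content = attributes.get("content", "")
            self.directives.update(part.strip().lower() for part in content.split(","))


def is_noindexed(html: str, headers: dict[str, str]) -> bool:
    """True if the meta tags or the X-Robots-Tag header carry noindex (or none)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    normalised = {key.lower(): value.lower() for key, value in headers.items()}
    header_value = normalised.get("x-robots-tag", "")
    in_meta = bool(parser.directives & {"noindex", "none"})
    in_header = "noindex" in header_value or "none" in header_value
    return in_meta or in_header


# Invented pages that are all supposed to be indexed: url -> (html, response headers).
pages = {
    "https://example.com/services": (
        '<html><head><meta name="robots" content="noindex, follow"></head></html>', {}),
    "https://example.com/blog": ("<html><head><title>Blog</title></head></html>", {}),
    "https://example.com/pricing": ("<html></html>", {"X-Robots-Tag": "noindex"}),
}

unintended = [url for url, (html, headers) in pages.items() if is_noindexed(html, headers)]
print(unintended)
```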

Missing title tags and H1s are the easiest fixes on this list. Start with pages that already get impressions in GSC but have a low click-through rate. A rewritten title tag on those pages can move traffic within days, before anything else changes.

Step 5: Resolve duplicate content and canonicalisation

Duplicate content is everywhere on CMS-driven sites, and it’s one of the harder problems to spot, because the pages can look completely different to a person but identical to Google.

The most common sources on WordPress and similar CMSs:

  • HTTP vs HTTPS versions both live (the non-HTTPS one should 301 to HTTPS)
  • Trailing slash vs no trailing slash (pick one, redirect the other)
  • www vs non-www (same idea)
  • Category and tag archive pages pulling in the same posts
  • Pagination pages (/page/2/, /page/3/) without proper canonical handling (Google no longer uses rel=next/prev as an indexing signal)
  • URL parameters (/?sort=price, /?colour=red) spinning up extra indexable versions of the same page

For most of the sources above, a canonical tag is the right tool. Pick the primary URL, point everything else at it. Caveat: Google treats canonicals as guidance, not a command, and will sometimes ignore them. That’s acceptable for faceted navigation and product variants. HTTP vs HTTPS, www vs non-www: use a 301 redirect instead. Harder for Google to override, and it removes the extra URL entirely rather than leaving it in the index.
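
To see which crawled URLs collapse into the same page, normalise them and group the results. The policy in this sketch (HTTPS, non-www, no trailing slash, and a hand-picked list of parameters to ignore) is an arbitrary assumption for illustration. Substitute whichever variant your site has chosen as primary.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that never change the page content on this site.
IGNORED_PARAMS = {"sort", "colour", "utm_source", "utm_medium", "utm_campaign"}


def canonical_form(url: str) -> str:
    """Normalise a URL to the (assumed) primary variant."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/") or "/"
    kept = [(key, value) for key, value in parse_qsl(parts.query) if key not in IGNORED_PARAMS]
    return urlunsplit(("https", host, path, urlencode(kept), ""))


def duplicate_groups(urls: list[str]) -> dict[str, list[str]]:
    """Group crawled URLs by canonical form, keeping groups with more than one member."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_form(url)].append(url)
    return {primary: members for primary, members in groups.items() if len(members) > 1}


# Hypothetical crawl output.
crawled = [
    "http://www.example.com/shoes/",
    "https://example.com/shoes?sort=price",
    "https://example.com/shoes",
    "https://example.com/about",
]
print(duplicate_groups(crawled))
```

Each group with more than one member needs either a canonical tag or a 301, following the split described above.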

Update your XML sitemap while you’re at it. Canonical, indexable URLs only. Paginated pages, filtered parameter URLs, anything that 301 redirects: all of it comes out. Submit the cleaned version in GSC and request a recrawl. And if the site is being rebuilt rather than patched, this is worth getting right before launch: a site built with SEO baked in from the start doesn’t need this kind of cleanup three months after going live.
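
The sitemap filter is mechanical once the crawl data is in hand. The sketch below assumes a hypothetical list of page records with `url`, `status`, `indexable`, and `canonical` fields, and writes only the URLs that return 200, are indexable, and are their own canonical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[dict]) -> str:
    """Return sitemap XML containing only canonical, indexable, 200-status URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        is_clean = (
            page["status"] == 200
            and page["indexable"]
            and page["canonical"] == page["url"]
        )
        if is_clean:
            entry = ET.SubElement(urlset, "url")
            ET.SubElement(entry, "loc").text = page["url"]
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical crawl records.
pages = [
    {"url": "https://example.com/", "status": 200, "indexable": True,
     "canonical": "https://example.com/"},
    {"url": "https://example.com/blog/page/2/", "status": 200, "indexable": True,
     "canonical": "https://example.com/blog/"},
    {"url": "https://example.com/old", "status": 301, "indexable": False,
     "canonical": "https://example.com/old"},
]
print(build_sitemap(pages))
```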

Common findings and fix priority

Issue | Impact | Priority | Fix
Server errors (5xx) | Pages unreachable to Google | Critical | Escalate to hosting/dev immediately
404s with inbound links | Lost equity on linking pages | High | Restore or redirect to correct destination
Redirect chains (2+ hops) | Equity dilution | High | Update source to point directly to final URL
Noindex on ranked pages | Rankings lost entirely | High | Remove noindex, request indexing in GSC
Core Web Vitals: Poor LCP | Ranking suppressed vs competitors | High | Compress hero images, reduce TTFB
Duplicate content (no canonical) | Index bloat, split signals | Medium | Add canonical tags or 301 redirect
Missing title tags | Google rewrites them; CTR loss | Medium | Write unique 50-60 character titles
Missing alt text on images | Image search and accessibility | Low | Add descriptive alt attributes
Schema markup errors | Rich results ineligible | Low | Validate and fix via Rich Results Test

Work down in order. Skip ahead and you’ll often have to redo a fix because something higher up the list changed the situation. For where technical audits fit into a wider quarterly SEO plan, see our guide to building a results-driven SEO plan.

Frequently asked questions

How long does a technical SEO audit actually take?

Mostly depends on the site. Under 500 pages: most of a morning, using Screaming Frog and GSC. 500 to 5,000: clear a full day, maybe two. Past 50,000 URLs, you need enterprise tools like Botify or ContentKing, and the review phase alone stretches to a week. Either way, the audit time is rarely the bottleneck. Implementation is. On sites where every fix needs a ticket and dev sign-off, three to eight weeks for changes to actually go live is about right.

What’s the difference between a crawl error and an indexation error?

Two different failure modes. Crawl errors: Googlebot never got to the page at all. Server timed out, robots.txt blocked it, redirect chain went in a loop. Indexation errors: Googlebot got there, read it, decided not to include it. The fixes don’t overlap. Crawl problems go to your hosting team or dev. Indexation problems are a content and canonical conversation. GSC’s coverage report separates them by status, which is where to start.

How do I fix “Crawled - currently not indexed” pages?

Google got there, read it, passed on it. Not a technical problem. The content didn’t clear the bar. Narrow the topic, get more specific, or pull several thin pages together into one useful resource. Once the page is updated: bump the last-modified date in your sitemap, request re-indexing in GSC. One thing to avoid: don’t apply a noindex out of frustration. Google is already not indexing it. Adding a noindex just locks that decision in until someone remembers to remove it.

What tools do you actually need?

Start with Screaming Frog (free under 500 URLs) and Google Search Console. Those two free tools cover roughly 90% of what the audit needs. For Core Web Vitals: PageSpeed Insights plus the CWV report in GSC. JavaScript-heavy site? Get Screaming Frog’s paid rendering mode or switch to Sitebulb – crawling without JS rendering shows you a version of the page Googlebot doesn’t actually see. Ahrefs Site Audit adds the backlink layer, which matters for triage: it tells you which broken pages have links pointing at them and are actually worth fixing.

How often should I run a technical SEO audit?

Quarterly for the full audit. If the site publishes often or the CMS configuration changes a lot, check monthly too. Most regressions between audits come down to the same things: CMS updates, plugin updates (WordPress especially), or someone on the content team accidentally setting a noindex or canonical at the page level. GSC emails site owners about some new indexing issues but has no configurable threshold alerts, so log the indexed-page count weekly and investigate any drop over 5% week-over-week. You’ll catch most of these before they turn into a ranking problem.
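
The 5% week-over-week check is a few lines once the counts are logged. This sketch assumes you record the indexed-page count from GSC each week, by hand or by export. The week labels and numbers are invented.

```python
def coverage_drop_alerts(weekly_counts: list[tuple[str, int]],
                         threshold: float = 0.05) -> list[tuple[str, float]]:
    """Return (week, percentage drop) for each week that fell by more than threshold."""
    alerts = []
    for (_, previous), (week, current) in zip(weekly_counts, weekly_counts[1:]):
        if previous <= 0:
            continue  # nothing to compare against
        drop = (previous - current) / previous
        if drop > threshold:
            alerts.append((week, round(drop * 100, 1)))
    return alerts


# Invented weekly log of indexed-page counts.
indexed_pages = [
    ("2024-W01", 1000),
    ("2024-W02", 990),
    ("2024-W03", 900),
]

for week, percent in coverage_drop_alerts(indexed_pages):
    print(f"{week}: indexed pages down {percent}% on the previous week")
```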

Want a technical SEO audit done on your site, with a prioritised fix list back within the week? The Sky Storm Digital team runs full crawl audits and hands you a fix list ranked by impact on rankings, not just by how many issues turn up.
