How to use a site crawl without turning it into a report dump

Screaming Frog SEO Spider is a desktop crawler for finding technical SEO problems across a website. It can reveal broken URLs, redirects, canonical signals, duplicate metadata, weak internal linking, missing alt text, structured data issues, and other problems that are hard to see by clicking around manually.

For a Thailand business website, the useful question is not “how many warnings did the crawl find?” The useful question is whether important pages - service pages, villa pages, tour pages, booking flows, contact pages, multilingual routes, or location pages - are crawlable, indexable, internally linked, and technically consistent.

[Image: technical SEO crawler dashboard with URL rows, status signals, redirect paths, internal link nodes, metadata cards, and Thailand website context]

The short answer

Use Screaming Frog when you need to see the website as a crawlable system rather than as isolated pages.

It is especially useful for:

  • Finding broken internal links and unexpected status codes
  • Checking redirect chains and old URLs after migrations
  • Reviewing titles, meta descriptions, h1s, canonicals, and indexability
  • Finding duplicate or near-duplicate page patterns
  • Auditing internal links, crawl depth, and page groups
  • Checking image alt text, oversized assets, and basic on-page signals
  • Combining crawl data with Search Console, analytics, or PageSpeed data when access is available

The mistake is treating every warning as equal. A missing meta description on an old tag page is not the same problem as a canonical mistake on a booking page that should generate leads.

What Screaming Frog does well

Screaming Frog is strong because it crawls many URLs consistently and lets you filter the results by technical signal.

For small and medium business websites, that helps with practical questions:

  • Are important URLs returning 200 OK?
  • Do internal links point through redirects?
  • Are canonical tags self-referencing, missing, conflicting, or pointing to the wrong page?
  • Are important pages marked noindex by mistake?
  • Do duplicate titles or h1s reveal repeated templates?
  • Are service pages buried too deep in the internal link structure?
  • Does the XML sitemap contain old, blocked, redirected, or duplicate URLs?
  • Does JavaScript rendering change what the crawler can see?
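
The sitemap question is easy to check outside the crawler once you have a sitemap file and a list of crawled status codes. The sketch below is a minimal illustration, not a Screaming Frog feature: the function names, the sample sitemap, and the sample status codes are all hypothetical, and it only covers duplicated, redirected, and uncrawled entries (not robots.txt blocking).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard namespace used by XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value from a plain urlset sitemap (not a sitemap index)."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def sitemap_problems(urls: list[str], status_by_url: dict[str, int]) -> dict[str, str]:
    """Flag sitemap entries that are duplicated, missing from the crawl, or not 200."""
    problems = {}
    for url, count in Counter(urls).items():
        status = status_by_url.get(url)
        if count > 1:
            problems[url] = "listed more than once"
        elif status is None:
            problems[url] = "not found in crawl"
        elif status != 200:
            problems[url] = f"returns {status}"
    return problems


# Hypothetical sample data for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/villas/</loc></url>
  <url><loc>https://example.com/old-tours/</loc></url>
  <url><loc>https://example.com/villas/</loc></url>
</urlset>"""
SAMPLE_CRAWL = {
    "https://example.com/": 200,
    "https://example.com/villas/": 200,
    "https://example.com/old-tours/": 301,
}

print(sitemap_problems(sitemap_urls(SAMPLE_SITEMAP), SAMPLE_CRAWL))
```

In practice the status codes would come from a crawl export rather than a hand-written dictionary.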

The free version is useful for small checks, but it is limited to 500 URLs and does not include all advanced features. For larger sites, migration checks, JavaScript rendering, scheduled crawls, integrations, saving crawl files, and deeper configuration, the paid version is usually the practical option.

What it does not solve

A crawl is evidence, not a strategy.

Screaming Frog can tell you that 300 titles are duplicated. It cannot decide whether those pages should exist, be merged, be rewritten, redirected, noindexed, or left alone. It can show crawl depth, but it cannot know which pages matter commercially unless you bring that context.

It also does not replace:

  • Google Search Console or Bing Webmaster Tools for search-engine-specific indexing and query data
  • Server logs for what search crawlers actually requested
  • PageSpeed Insights, Lighthouse, and WebPageTest for performance evidence
  • Analytics and conversion tracking for user behaviour
  • Manual review for content quality, design, forms, booking flows, and accessibility

That is why a useful crawl should be interpreted next to other signals, not exported as a long issue list.

How to set up the crawl

Start with scope. Decide whether you are checking the whole public site, a section, a list of important URLs, a staging site, or a migration map.

For a Thailand website, I usually separate the crawl into groups that match the business:

  • Homepage and main service pages
  • Location or market pages
  • Villa, room, tour, menu, product, or booking pages
  • Contact, enquiry, checkout, and form pages
  • Language versions and alternate routes
  • Blog or guide content
  • Old URLs from migrations, redesigns, or WordPress cleanup
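
One way to apply these groups to a crawl export is to label each URL by its path. This is a rough sketch under stated assumptions: the path prefixes and the `/th` and `/en` language routes are invented for illustration, and a real site will need its own rules.

```python
from urllib.parse import urlparse

# Hypothetical path prefixes; replace with the site's real URL structure.
GROUP_RULES = [
    ("booking", ("/booking", "/checkout", "/contact", "/enquiry")),
    ("listing", ("/villas", "/rooms", "/tours", "/menu", "/products")),
    ("location", ("/locations",)),
    ("service", ("/services",)),
    ("content", ("/blog", "/guides")),
]
LANGUAGE_PREFIXES = ("/th", "/en")  # assumed language routes


def strip_language(path: str) -> str:
    """Remove a leading language segment such as /th so rules match every language version."""
    for prefix in LANGUAGE_PREFIXES:
        if path == prefix or path.startswith(prefix + "/"):
            return path[len(prefix):] or "/"
    return path


def classify(url: str) -> str:
    """Return the first matching group name, or 'home' / 'other'."""
    path = strip_language(urlparse(url).path.rstrip("/") or "/")
    if path == "/":
        return "home"
    for group, prefixes in GROUP_RULES:
        if any(path == p or path.startswith(p + "/") for p in prefixes):
            return group
    return "other"


for example in ("https://example.com/th/villas/sunset", "https://example.com/blog/post/"):
    print(example, "->", classify(example))
```

Anything that lands in "other" is worth a manual look, because it often turns out to be old migration URLs or utility pages.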

Then configure the crawl to match the site. A mostly static Astro, Laravel, or WordPress site may only need a normal HTML crawl. A JavaScript-heavy site may need rendered crawling, because the raw HTML and the browser-rendered page can expose different links, headings, canonical tags, or content.

If the site is protected, rate limited, or behind Cloudflare rules, crawl carefully. A crawler should not look like a load test, and it should not hit booking, checkout, or form endpoints in a way that creates fake records.

How to read the crawl

The useful workflow is to move from technical access to page meaning.

First, check indexability. Important pages should be crawlable, return a successful status code, avoid accidental noindex, and point canonical signals at the intended URL. If this layer is wrong, metadata and content work may not matter.
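
The indexability layer can be expressed as a short checklist per page. The sketch below assumes a hypothetical `Page` record that you would fill from crawl data; the field names are mine, not the crawler's. It compares canonicals by exact string match, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical record; populate it from whatever crawl export you use.
    url: str
    status: int
    meta_robots: str = ""          # e.g. "noindex, follow"
    canonical: str = ""            # empty string means no canonical tag
    blocked_by_robots: bool = False


def indexability_problems(page: Page) -> list[str]:
    """Return the reasons this page may not be indexed as intended (empty list = looks fine)."""
    problems = []
    if page.blocked_by_robots:
        problems.append("blocked by robots.txt")
    if page.status != 200:
        problems.append(f"status {page.status}")
    if "noindex" in page.meta_robots.lower():
        problems.append("noindex")
    if page.canonical and page.canonical != page.url:
        problems.append(f"canonical points to {page.canonical}")
    return problems


villa = Page(
    url="https://example.com/villas/",
    status=200,
    meta_robots="noindex, follow",
    canonical="https://example.com/",
)
print(indexability_problems(villa))
```

A canonical that points elsewhere is not always a mistake, so treat the output as a list of pages to review rather than a list of errors.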

Second, check status codes and redirects. Broken links, soft 404s, redirect chains, and internal links that still point at old URLs create friction for users and crawlers. After migrations, this section is often the fastest way to find lost link equity and unnecessary redirect hops.
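
To make the redirect check concrete, here is a minimal sketch that follows a redirect map to its final destination and lists internal links that should be updated. The redirect map, the link pairs, and the function names are hypothetical; in a real audit they would come from crawl data.

```python
def redirect_chain(start: str, redirects: dict[str, str], max_hops: int = 10) -> list[str]:
    """Follow redirects from start and return every URL visited, start included."""
    chain = [start]
    while chain[-1] in redirects and len(chain) <= max_hops:
        nxt = redirects[chain[-1]]
        chain.append(nxt)
        if nxt in chain[:-1]:  # redirect loop: stop once a URL repeats
            break
    return chain


def links_to_update(links: list[tuple[str, str]], redirects: dict[str, str]) -> list[tuple[str, str, str, int]]:
    """Return (source, linked URL, final URL, hop count) for links that pass through redirects."""
    fixes = []
    for source, target in links:
        chain = redirect_chain(target, redirects)
        if len(chain) > 1:
            fixes.append((source, target, chain[-1], len(chain) - 1))
    return fixes


# Hypothetical data: old URL -> where it redirects.
REDIRECTS = {"/old-villas": "/villas-2022", "/villas-2022": "/villas"}
LINKS = [("/", "/old-villas"), ("/", "/contact"), ("/blog/guide", "/villas-2022")]

for fix in links_to_update(LINKS, REDIRECTS):
    print(fix)
```

Links with two or more hops are the ones to fix first, ideally by pointing the link straight at the final URL.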

Third, check canonicals and duplicates. Duplicate titles, repeated h1s, duplicate content patterns, parameter URLs, print views, tag archives, and filtered pages can all confuse the site structure. The answer is not always to delete pages. Sometimes the right fix is better internal linking, clearer canonical rules, rewritten templates, or noindex on low-value utility pages.
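
Duplicate titles and h1s are easier to judge when they are grouped, because the group usually points at a template. A minimal sketch, assuming each crawled page is a dictionary with hypothetical `url`, `title`, and `h1` keys:

```python
from collections import defaultdict


def duplicate_groups(rows: list[dict], field: str) -> dict[str, list[str]]:
    """Group URLs that share the same value for field, ignoring case and outer whitespace."""
    groups = defaultdict(list)
    for row in rows:
        value = row.get(field, "").strip().lower()
        if value:  # empty values are a "missing" problem, not a duplicate
            groups[value].append(row["url"])
    return {value: urls for value, urls in groups.items() if len(urls) > 1}


# Hypothetical rows for illustration only.
ROWS = [
    {"url": "/villas/a", "title": "Luxury Villa", "h1": "Villa A"},
    {"url": "/villas/b", "title": "Luxury Villa ", "h1": "Villa B"},
    {"url": "/villas/a?print=1", "title": "luxury villa", "h1": "Villa A"},
    {"url": "/contact", "title": "Contact us", "h1": "Contact"},
]

print(duplicate_groups(ROWS, "title"))
print(duplicate_groups(ROWS, "h1"))
```

Here the shared title suggests a template default, while the shared h1 comes from a print-view parameter URL, which is a canonical question rather than a writing one.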

Fourth, check internal links. A page that earns enquiries should not be hidden five clicks deep with weak anchor text. Screaming Frog can show inlinks, outlinks, crawl depth, and orphan-like patterns when combined with sitemap, Search Console, or analytics sources.
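
Crawl depth is just the shortest click path from the start page, so it can be illustrated with a breadth-first search over the internal link graph. The graph, the sitemap list, and the function names below are hypothetical; the point is the idea, not a replacement for the crawler's own report.

```python
from collections import deque


def crawl_depths(start: str, links: dict[str, list[str]]) -> dict[str, int]:
    """Breadth-first search: minimum number of clicks from start to each reachable URL."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def orphan_candidates(known_urls: list[str], depths: dict[str, int]) -> list[str]:
    """URLs known from a sitemap or analytics that no internal link path reaches."""
    return sorted(set(known_urls) - set(depths))


# Hypothetical internal link graph: page -> pages it links to.
LINKS = {
    "/": ["/services", "/blog"],
    "/services": ["/services/web"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/enquiry"],
}
SITEMAP = ["/", "/services", "/enquiry", "/villas/hidden"]

depths = crawl_depths("/", LINKS)
print(depths)
print(orphan_candidates(SITEMAP, depths))
```

In this example the enquiry page sits three clicks deep and is only reachable through a blog post, which is the kind of pattern worth fixing with a direct link from navigation or service pages.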

Finally, review metadata and headings. Titles and h1s should be aligned but not identical by default. The title earns the search click; the h1 confirms the page topic after the visitor arrives. Repeated titles, thin descriptions, and generic h1s often reveal template problems rather than isolated writing mistakes.

What to prioritize

A useful crawl becomes a work plan only after prioritization.

For a small Thailand business website, I would usually prioritize:

  1. Important pages blocked from crawling or indexing
  2. Broken internal links on high-value pages
  3. Redirect chains on navigation, service pages, booking flows, or contact paths
  4. Canonical mistakes that point away from important content
  5. Duplicate or generic metadata on commercial pages
  6. Weak internal links to pages that should generate enquiries
  7. Old sitemap URLs that no longer represent the site
  8. Template-level issues that can be fixed once and improve many pages

Lower-priority warnings still matter, but they should not distract from problems that affect visibility, leads, bookings, or maintainability.
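
If it helps to make the ordering explicit, findings can be scored by issue type and page group. The weights below are assumptions that loosely follow the list above, and the issue and group labels are hypothetical; they should be tuned for each project rather than copied.

```python
# Assumed weights that loosely follow the priority order above; tune per project.
ISSUE_WEIGHT = {
    "blocked_or_noindex": 8,
    "broken_internal_link": 7,
    "redirect_chain": 6,
    "canonical_mismatch": 5,
    "duplicate_metadata": 4,
    "weak_internal_links": 3,
    "stale_sitemap_url": 2,
    "missing_meta_description": 1,
}
# Assumed business value of each page group.
GROUP_WEIGHT = {"booking": 3, "service": 3, "listing": 2, "content": 1, "tag_archive": 0.5}


def score(finding: dict) -> float:
    """Issue severity multiplied by page value; unknown labels default to 1."""
    return ISSUE_WEIGHT.get(finding["issue"], 1) * GROUP_WEIGHT.get(finding["group"], 1)


def prioritise(findings: list[dict]) -> list[dict]:
    """Return findings sorted from most to least urgent."""
    return sorted(findings, key=score, reverse=True)


# Hypothetical findings for illustration only.
FINDINGS = [
    {"url": "/tag/old", "issue": "missing_meta_description", "group": "tag_archive"},
    {"url": "/booking", "issue": "canonical_mismatch", "group": "booking"},
    {"url": "/blog/guide", "issue": "redirect_chain", "group": "content"},
]

for finding in prioritise(FINDINGS):
    print(score(finding), finding["url"], finding["issue"])
```

The canonical mistake on the booking page comes out on top and the missing description on the old tag page comes last, which matches how I would order the work by hand.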

Use it with other tools

Screaming Frog is strongest when it is one source in a wider analysis workflow.

Use Google Search Console and Bing Webmaster Tools to see what search engines report after they process the site. Use the crawlability and indexation guide to understand the SEO concepts behind the findings. Use PageSpeed Insights, Lighthouse, and WebPageTest when the crawl exposes slow templates, large images, or script-heavy pages.

For the wider diagnostic process, start with website analysis for SEO and development.

Official references

Screaming Frog changes over time, so exact interface details should be checked against the current Screaming Frog documentation.

From crawl data to fixes

A crawl is only useful if it leads to a clean implementation plan. For many sites, the best fixes are ordinary development work: updating templates, cleaning redirects, improving internal links, correcting canonical rules, removing obsolete routes, tightening sitemaps, or fixing WordPress output that has accumulated over time.

If you have a Thailand business website and want the crawl interpreted as practical work rather than a generic audit, send me the URL and what you want checked. I can review the crawl in context and point to useful fixes.
