
Allow 42flows to crawl your site

42flows reads your public marketing pages to understand what you do, who your customers are, and what content gaps exist on your site. We do this once during onboarding and periodically during maintenance runs. If your robots.txt blocks our crawler, the whole pipeline stops — the analysis fails, and we can't generate content for you.

This page shows exactly what to add.

Check what you have

Visit https://yoursite.com/robots.txt in a browser. Common blocking patterns:

# Blocks everyone — this is what causes the "crawl completely disallowed" error
User-agent: *
Disallow: /

Or:

# Blocks everyone except a hand-picked allowlist — 42flows-bot isn't on it
User-agent: Googlebot
Allow: /
User-agent: *
Disallow: /

Either of these will block us.

The fix — one block to add

Add this to your robots.txt at the root of your site:

User-agent: 42flows-bot
Allow: /

This explicitly allows our crawler even if User-agent: * is blocked. Our bot identifies itself with this User-Agent string:

Mozilla/5.0 (compatible; 42flows-bot/1.0; +https://42flows.com)
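If you want to confirm the fix before deploying it, you can run your robots.txt through Python's standard-library parser, urllib.robotparser. Treat this as a local approximation: it is not necessarily the parser our crawler or Cloudflare uses, and it does not implement every detail of RFC 9309. The sketch feeds it the two blocking patterns shown earlier, with and without the 42flows-bot block added at the top. It matches on the product token 42flows-bot. If you pass the full Mozilla/5.0 User-Agent string instead, this parser falls back to the User-agent: * group. "OtherBot" is a hypothetical agent name, used only to show that the catch-all block still applies to everyone else.

```python
from urllib.robotparser import RobotFileParser

# Product token from the bot's User-Agent string. This parser keeps only the
# part before the first "/", so the full "Mozilla/5.0 (...)" string would not match.
BOT = "42flows-bot"

# The two blocking patterns from this guide.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

ALLOWLIST_ONLY = """\
User-agent: Googlebot
Allow: /
User-agent: *
Disallow: /
"""

# The fix: a group that names 42flows-bot, followed by a blank line.
FIX = """\
User-agent: 42flows-bot
Allow: /

"""


def allowed(robots_txt: str, agent: str, path: str = "/") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for name, text in (("block-all", BLOCK_ALL), ("allowlist-only", ALLOWLIST_ONLY)):
        fixed = FIX + text
        print(
            f"{name}: {BOT} before={allowed(text, BOT)} after={allowed(fixed, BOT)}; "
            f"OtherBot after={allowed(fixed, 'OtherBot')}"
        )
```

For both patterns, 42flows-bot flips from blocked to allowed once the block is added, and OtherBot stays blocked.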

If you want to be selective

You can allow some paths and disallow others — same as any other crawler:

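User-agent: 42flows-bot
Disallow: /admin
Disallow: /internal
Allow: /

At minimum we need access to your homepage, /about, and any blog or landing pages you want analyzed. Listing the Disallow lines before Allow: / gives the same result whether a parser applies the longest matching rule (as the current standard, RFC 9309, specifies) or simply the first one it finds.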
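Rule order inside a group matters for some parsers. Under RFC 9309, the rule with the longest matching path wins, so Disallow: /admin beats Allow: / wherever it appears. Python's urllib.robotparser instead applies the first rule that matches, so an Allow: / at the top of the group lets every path through. Tools that read robots.txt differ on this point, so putting the Disallow lines first is the safer habit. The sketch below compares both evaluations on the selective example. longest_match_allowed is a simplified illustration written for this page: it assumes a single group of plain-prefix rules with no * or $ wildcards and is not a complete RFC 9309 parser.

```python
from urllib.robotparser import RobotFileParser

# Selective rules written with the broad Allow first.
ALLOW_FIRST = """\
User-agent: 42flows-bot
Allow: /
Disallow: /admin
Disallow: /internal
"""

# The same rules with the Disallow lines first.
DISALLOW_FIRST = """\
User-agent: 42flows-bot
Disallow: /admin
Disallow: /internal
Allow: /
"""


def first_match_allowed(robots_txt: str, path: str) -> bool:
    """Python's parser: the first rule in the group whose path is a prefix wins."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("42flows-bot", path)


def longest_match_allowed(robots_txt: str, path: str) -> bool:
    """Simplified RFC 9309 evaluation: longest matching prefix wins, Allow wins ties.

    Assumes the text holds a single group of plain-prefix rules (no * or $).
    """
    best_len, allow = -1, True  # no matching rule means the path is allowed
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key not in ("allow", "disallow") or not value or not path.startswith(value):
            continue
        if len(value) > best_len or (len(value) == best_len and key == "allow"):
            best_len, allow = len(value), key == "allow"
    return allow


if __name__ == "__main__":
    for path in ("/", "/about", "/admin", "/internal/report"):
        print(
            path,
            "first-match:", first_match_allowed(ALLOW_FIRST, path), first_match_allowed(DISALLOW_FIRST, path),
            "longest-match:", longest_match_allowed(ALLOW_FIRST, path), longest_match_allowed(DISALLOW_FIRST, path),
        )
```

With Allow: / first, the first-match parser lets /admin through; with the Disallow lines first, both evaluations block /admin and /internal/report and allow / and /about.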

Verify

After you update robots.txt, go back to the 42flows connect form and click Re-check on the preflight panel. The "robots.txt allows crawling" check should flip to green. If it doesn't:

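  1. Make sure the file is at the exact path https://yoursite.com/robots.txt (not /robots/txt). robots.txt applies per host, so it must be served by the same host as the site you connected; a copy on a subdomain such as blog.yoursite.com does not cover yoursite.com.
  2. Confirm the new version is actually live. Some CDNs cache robots.txt, so hard-reload the page or purge the cached copy.
  3. Check that your 42flows-bot block appears before any catch-all User-agent: * block. Standards-compliant parsers pick the group that names the crawler no matter where it appears, but putting the specific block first avoids ambiguity with older parsers.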
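To check the live file from your own machine before clicking Re-check, a small script can download robots.txt from the site root and test the paths we need. This is a sketch, assuming Python 3.9 or later. It sends the 42flows-bot User-Agent string shown above and uses Python's standard-library parser, which may not evaluate every rule exactly as our crawler does. REQUIRED_PATHS is a hypothetical list in which /blog stands in for whatever blog or landing pages you want analyzed, and check_robots.py is just an example filename. HTTP errors, including a missing robots.txt, are raised rather than interpreted.

```python
import sys
import urllib.request
from urllib.robotparser import RobotFileParser

BOT_TOKEN = "42flows-bot"
BOT_UA = "Mozilla/5.0 (compatible; 42flows-bot/1.0; +https://42flows.com)"
# Hypothetical path list: replace /blog with the blog or landing pages you care about.
REQUIRED_PATHS = ["/", "/about", "/blog"]


def fetch_robots(site: str) -> str:
    """Download robots.txt from the root of `site` (e.g. https://yoursite.com)."""
    url = site.rstrip("/") + "/robots.txt"
    request = urllib.request.Request(url, headers={"User-Agent": BOT_UA})
    # Raises urllib.error.HTTPError for 4xx/5xx responses, including a missing file.
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def blocked_paths(robots_txt: str, paths: list[str]) -> list[str]:
    """Return the paths that robots.txt does not allow 42flows-bot to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(BOT_TOKEN, path)]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python check_robots.py https://yoursite.com")
    else:
        blocked = blocked_paths(fetch_robots(sys.argv[1]), REQUIRED_PATHS)
        print("All required paths allowed" if not blocked else f"Blocked for {BOT_TOKEN}: {blocked}")
```

If the script reports blocked paths but your edited file looks correct, the copy being served is probably stale; see step 2.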

Why we respect robots.txt

We crawl via Cloudflare's Browser Rendering service, which respects robots.txt by design. This is the internet-standard contract between website owners and automated visitors — we honor it. If you explicitly disallow us, we stop. The fix is to explicitly allow us.


Still stuck? The "Site crawlability" panel on the connect form shows the exact robots.txt content we saw and which rule matched. Email [email protected] with a screenshot and we'll help.