Allow 42flows to crawl your site
42flows reads your public marketing pages to understand what you do, who your customers are, and what content gaps exist on your site. We do this once during onboarding and periodically during maintenance runs. If your robots.txt blocks our crawler, the whole pipeline stops — the analysis fails, and we can't generate content for you.
This page shows exactly what to add.
Check what you have
Visit https://yoursite.com/robots.txt in a browser. Common blocking patterns:
# Blocks everyone — this is what causes the "crawl completely disallowed" error
User-agent: *
Disallow: /
Or:
# Blocks everyone except a hand-picked allowlist — 42flows-bot isn't on it
User-agent: Googlebot
Allow: /
User-agent: *
Disallow: /
Either of these will block us.
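To test a robots.txt without reading it rule by rule, you can run the two patterns above through Python's standard-library parser. This is a rough sketch, not the check 42flows itself runs: `is_blocked` is a hypothetical helper name, and `urllib.robotparser` may differ in small details from the parser our crawler uses.

```python
from urllib.robotparser import RobotFileParser

# The two blocking patterns shown above, as plain text.
BLOCK_EVERYONE = """\
User-agent: *
Disallow: /
"""

ALLOWLIST_ONLY = """\
User-agent: Googlebot
Allow: /
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt, agent="42flows-bot", url="https://yoursite.com/"):
    """Return True if `agent` may not fetch `url` under this robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


print(is_blocked(BLOCK_EVERYONE))                     # True
print(is_blocked(ALLOWLIST_ONLY))                     # True
print(is_blocked(ALLOWLIST_ONLY, agent="Googlebot"))  # False
```

To check your own file, paste its contents into a string and pass it to `is_blocked`.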
The fix — one block to add
Add this to your robots.txt at the root of your site:
User-agent: 42flows-bot
Allow: /
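This explicitly allows our crawler even if the User-agent: * group disallows everything, because a crawler follows the group that names it rather than the catch-all. The name to use in robots.txt is the 42flows-bot token, not the full header. Our bot identifies itself with this User-Agent string: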
Mozilla/5.0 (compatible; 42flows-bot/1.0; +https://42flows.com)
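The reason the extra block works is group selection: a crawler looks for a group that names it and falls back to `User-agent: *` only if none does. The sketch below illustrates that idea under simplifying assumptions. It matches a group name as a case-insensitive substring of the User-Agent string, and `parse_groups` and `rules_for` are hypothetical helpers written for this page, not part of any 42flows or Cloudflare API.

```python
UA = "Mozilla/5.0 (compatible; 42flows-bot/1.0; +https://42flows.com)"

ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: 42flows-bot
Allow: /
"""


def parse_groups(robots_txt):
    """Split robots.txt text into (agent_names, rules) groups."""
    groups, agents, rules = [], [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if not sep:
            continue  # blank or malformed line
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if rules:  # rules already seen, so this line starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def rules_for(robots_txt, user_agent):
    """Return the rules of the group(s) naming this agent, else the '*' rules."""
    groups = parse_groups(robots_txt)
    ua = user_agent.lower()
    named = [rules for agents, rules in groups
             if any(a not in ("", "*") and a in ua for a in agents)]
    chosen = named or [rules for agents, rules in groups if "*" in agents]
    return [rule for rules in chosen for rule in rules]


print(rules_for(ROBOTS_TXT, UA))                                    # [('allow', '/')]
print(rules_for(ROBOTS_TXT, "Mozilla/5.0 (compatible; OtherBot)"))  # [('disallow', '/')]
```

Because the named group is chosen first, the catch-all `Disallow: /` never applies to 42flows-bot here.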
If you want to be selective
You can allow some paths and disallow others — same as any other crawler:
User-agent: 42flows-bot
Allow: /
Disallow: /admin
Disallow: /internal
At minimum we need access to your homepage, /about, and any blog/landing pages you want analyzed.
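Within a group, RFC 9309 resolves overlapping rules by taking the longest matching path, with Allow winning a tie. That is why `Allow: /` does not cancel the two `Disallow` lines above. The sketch below shows that rule under a simplifying assumption: plain prefix matching only, with no `*` or `$` wildcards. `is_allowed` is a hypothetical helper.

```python
# Rules from the selective example above, as (kind, path-prefix) pairs.
RULES = [("allow", "/"), ("disallow", "/admin"), ("disallow", "/internal")]


def is_allowed(rules, path):
    """Longest matching prefix wins; Allow wins a tie; no match means allowed."""
    best_len = -1
    allowed = True
    for kind, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty or non-matching rule has no effect
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_for_allow:
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


for path in ["/", "/about", "/blog/launch", "/admin/users", "/internal"]:
    print(path, is_allowed(RULES, path))
```

Two caveats. Matching is by prefix, so `Disallow: /admin` also covers a path such as `/administrator`. And some parsers (Python's `urllib.robotparser` is one) apply the first matching rule instead of the longest, so they would stop at `Allow: /` and permit `/admin`; listing the `Disallow` lines before `Allow: /` gives the same answer under both approaches.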
Verify
After you update robots.txt, go back to the 42flows connect form and click Re-check on the preflight panel. The "robots.txt allows crawling" check should flip to green. If it doesn't:
- Make sure the file is at the exact path https://yoursite.com/robots.txt (not /robots/txt, not on a subdomain).
- Confirm the new version is actually live (some CDNs cache robots.txt; hard-reload or purge).
- Check that your 42flows-bot block appears before any catch-all User-agent: * block. robots.txt matching uses the most specific agent group, but having the specific block first avoids ambiguity with older parsers.
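The three checks above can also be scripted. The following is a minimal sketch with hypothetical helper names (`robots_url`, `fetch_robots`, `bot_block_comes_first`). It assumes your site is reachable from where you run it, and the `Cache-Control: no-cache` request header is only a hint that a CDN may ignore.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen

UA = "Mozilla/5.0 (compatible; 42flows-bot/1.0; +https://42flows.com)"


def robots_url(site_url):
    """Build the root-level robots.txt URL for any page URL on the site."""
    parts = urlsplit(site_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(site_url, timeout=10):
    """Download the live robots.txt, asking caches for a fresh copy."""
    request = Request(
        robots_url(site_url),
        headers={"User-Agent": UA, "Cache-Control": "no-cache"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def bot_block_comes_first(robots_txt, token="42flows-bot"):
    """True if a User-agent line for `token` exists and precedes any '*' line."""
    first_seen = {}
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            first_seen.setdefault(value.strip().lower(), number)
    token = token.lower()
    if token not in first_seen:
        return False
    return "*" not in first_seen or first_seen[token] < first_seen["*"]


print(robots_url("https://yoursite.com/blog/post?ref=nav"))
```

To check the live file, call `bot_block_comes_first(fetch_robots("https://yoursite.com"))` with your own domain in place of the placeholder.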
Why we respect robots.txt
We crawl via Cloudflare's Browser Rendering service, which respects robots.txt by design. robots.txt is the internet-standard contract between website owners and automated visitors (the Robots Exclusion Protocol, RFC 9309), and we honor it. If you explicitly disallow us, we stop. The fix is to explicitly allow us.
Still stuck? The "Site crawlability" panel on the connect form shows the exact robots.txt content we saw and which rule matched. Email [email protected] with a screenshot and we'll help.