Sites — add, verify, crawl
Sites are the unit of content ownership in WordBinder. Briefs, drafts, refresh data, internal links, and pages all hang off a site.
Adding a site
From the dashboard, click Add site. You'll need:
- Name — short label that shows in the sidebar (e.g. "Acme Plumbing")
- Domain — bare hostname only (acmeplumbing.com, not https://acmeplumbing.com/)
- Skill pack — vertical the site belongs to. Skill packs
- Business details — name, address, phone, hours. Used in brief intake defaults and schema recommendations.
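If you are preparing domains in bulk before adding them, a small helper can reduce pasted URLs to the bare hostname the Domain field expects. This is a minimal sketch using only the Python standard library; the function name `bare_hostname` is hypothetical and is not part of WordBinder.

```python
from urllib.parse import urlsplit


def bare_hostname(value: str) -> str:
    """Reduce user input to a bare hostname: no scheme, port, or path."""
    value = value.strip()
    # urlsplit only populates .hostname when the input has a scheme
    # or starts with "//", so add the prefix for inputs like "example.com".
    if "//" not in value:
        value = "//" + value
    host = urlsplit(value).hostname or ""
    # Drop a trailing dot from fully qualified names such as "example.com."
    return host.rstrip(".")


print(bare_hostname("https://acmeplumbing.com/"))
print(bare_hostname("acmeplumbing.com"))
```

Both calls print `acmeplumbing.com`. The helper leaves a leading `www.` in place, since whether to keep it depends on how the site is actually served.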
Verification
WordBinder won't crawl an unverified site. There are two verification methods:
- DNS TXT record — drop a token at _wordbinder.<your-domain>
- File upload — drop a token file at /.well-known/wordbinder/<token>.txt
Verification is checked on demand from the site overview. Once verified, the first crawl runs automatically.
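To make the two token locations concrete, the sketch below builds the DNS record name and the file URL for a given domain and token. The function name, the example token, and the use of `https://` for the file URL are assumptions for illustration; what the token file should contain is not covered here.

```python
def verification_targets(domain: str, token: str) -> dict:
    """Return where the token is expected for each verification method."""
    return {
        # TXT record name, following the _wordbinder.<your-domain> pattern
        "dns_txt_name": f"_wordbinder.{domain}",
        # Token file path under /.well-known/wordbinder/ (https is assumed)
        "file_url": f"https://{domain}/.well-known/wordbinder/{token}.txt",
    }


# "abc123" is a placeholder token, not a real value.
targets = verification_targets("acmeplumbing.com", "abc123")
print(targets["dns_txt_name"])
print(targets["file_url"])
```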
The first crawl
The crawl pipeline:
- Discovers URLs from sitemap.xml + the homepage
- Fetches each page through Jina Reader to extract clean content
- Stores a Page row with title, H1, content text, word count, and a content hash
- Discovers further links from each page (capped to your plan's site page limit)
- Once complete, dispatches keyword discovery (one DataForSEO Labs request per page) and link-opportunity scoring (semantic similarity)
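The discovery, fetch, and store steps above can be sketched as a breadth-first crawl with a page cap. This is an illustrative sketch, not WordBinder's implementation: the `Page` fields mirror the list above, but the `fetch` callable (standing in for the Jina Reader step), the SHA-256 hash, and the whitespace word count are all assumptions.

```python
import hashlib
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Page:
    url: str
    title: str
    h1: str
    content: str
    word_count: int
    content_hash: str


# Hypothetical fetcher: returns (title, h1, content text, outbound links).
Fetcher = Callable[[str], tuple[str, str, str, list[str]]]


def crawl(seed_urls: Iterable[str], fetch: Fetcher, page_limit: int) -> list[Page]:
    """Fetch pages breadth-first from the seeds until page_limit is reached."""
    # Seeds would come from sitemap.xml plus the homepage; dedupe, keep order.
    queue = deque(dict.fromkeys(seed_urls))
    seen = set(queue)
    pages: list[Page] = []
    while queue and len(pages) < page_limit:
        url = queue.popleft()
        title, h1, content, links = fetch(url)
        pages.append(Page(
            url=url,
            title=title,
            h1=h1,
            content=content,
            word_count=len(content.split()),
            content_hash=hashlib.sha256(content.encode("utf-8")).hexdigest(),
        ))
        # Queue links that have not been seen yet for later fetching.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing the fetcher in as a parameter keeps the sketch testable offline, for example with a dictionary lookup in place of a network call.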
A typical 80-page site finishes its first crawl in 4–8 minutes including keyword discovery. Larger sites scale roughly linearly — there's a 2-second polite delay between keyword API calls.
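As a rough worked example of the linear scaling, the polite delay alone sets a floor on keyword discovery time. The sketch below assumes one keyword request per page and a delay only between consecutive calls; the function name is hypothetical, and request latency and crawl time are not modeled.

```python
def keyword_delay_seconds(pages: int, delay: float = 2.0) -> float:
    """Total polite delay: one gap between each pair of consecutive calls."""
    return max(pages - 1, 0) * delay


for n in (80, 400):
    seconds = keyword_delay_seconds(n)
    print(f"{n} pages: {seconds:.0f} s of delay (about {seconds / 60:.1f} min)")
```

For an 80-page site this gives 158 seconds, a little over two and a half minutes of the quoted 4–8 minute total.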
Subsequent crawls
Recrawls are user-initiated from the site overview. Each crawl creates a new PageVersion snapshot per page, which is what powers the Refresh pillar's "changed" and "shrunk" flags.
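One way to picture how snapshots could drive those flags is a comparison between two consecutive versions of the same page. This is a sketch under stated assumptions: that a PageVersion carries the content hash and word count, and that "shrunk" means the word count fell below a ratio of the previous value. The 0.8 ratio is a hypothetical threshold, not a documented WordBinder setting.

```python
from dataclasses import dataclass


@dataclass
class PageVersion:
    content_hash: str
    word_count: int


def refresh_flags(previous: PageVersion, current: PageVersion,
                  shrink_ratio: float = 0.8) -> dict:
    """Compare two snapshots of one page and derive the two flags."""
    return {
        # Any difference in the content hash counts as a change.
        "changed": previous.content_hash != current.content_hash,
        # Assumed rule: shrunk when fewer than shrink_ratio of the old words remain.
        "shrunk": current.word_count < previous.word_count * shrink_ratio,
    }


old = PageVersion(content_hash="aaa", word_count=1000)
new = PageVersion(content_hash="bbb", word_count=700)
print(refresh_flags(old, new))
```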
Recrawls are not currently scheduled automatically — that's intentional, since each crawl costs API credits. Run a recrawl when you've made changes to the site or want to refresh the decay snapshot.
Pages list
Every fetched page lives at Site nav → Pages. From here you can:
- View page details — title, H1, content, version history, tracked keywords
- Refresh this page — opens a brief intake pre-filled with the page's top tracked keyword and a guessed archetype
- Exclude a page from analysis (e.g. legal pages you don't want to refresh)
- Retry crawl for pages that failed to fetch