Website sources let BoundBot discover, fetch, and index pages from your site. Use them when your public content changes often and you want the bot to stay in sync with it.

Add a source

Open Knowledge -> Websites and click Add Website. You can add a source in three modes:
  • Crawl links: discover pages by following internal links
  • Sitemap: import URLs from sitemap.xml
  • Individual link: index one specific page

Advanced options

For crawl and sitemap sources, you can also set:
  • Include path prefix to limit the crawl to part of a site
  • Exclude path prefix to skip sections such as /admin
  • Max links to control crawl size
  • Auto recrawl if you want the source refreshed on a schedule
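
The options above act as filters on the set of URLs a crawl will visit. As a rough sketch of that behavior (not BoundBot's actual implementation; the function name and exact matching rules here are illustrative), a prefix filter combined with a link cap works like this:

```python
from urllib.parse import urlparse

def filter_urls(urls, include_prefix=None, exclude_prefix=None, max_links=None):
    """Illustrative filter: keep URLs whose path starts with include_prefix,
    drop those under exclude_prefix, and cap the result at max_links."""
    kept = []
    for url in urls:
        path = urlparse(url).path
        if include_prefix and not path.startswith(include_prefix):
            continue  # outside the allowed section of the site
        if exclude_prefix and path.startswith(exclude_prefix):
            continue  # explicitly excluded section, e.g. /admin
        kept.append(url)
        if max_links is not None and len(kept) >= max_links:
            break  # crawl-size cap reached
    return kept

urls = [
    "https://example.com/docs/setup",
    "https://example.com/docs/admin/users",
    "https://example.com/blog/launch",
]
print(filter_urls(urls, include_prefix="/docs", exclude_prefix="/docs/admin"))
# → ['https://example.com/docs/setup']
```

With an include prefix of /docs and an exclude prefix of /docs/admin, only the public docs page survives; the blog page falls outside the include prefix and the admin page is excluded.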

Manage a source

Each source card shows:
  • current status
  • number of discovered links
  • number of crawled links
  • total stored size
  • last crawl time
  • last error, if one occurred
You can also run two key actions:
  • Fetch links to discover or refresh the list of URLs, without indexing any content
  • Crawl pending links to scrape and index the content of discovered pages
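
The split between fetching links and crawling them is a two-phase pattern: discovery first, content second. A minimal sketch of the state a source card tracks (the class and method names are illustrative, not BoundBot's API):

```python
class WebsiteSource:
    """Toy model of a source card: tracks discovered vs. crawled links."""

    def __init__(self):
        self.discovered = set()  # URLs found by "Fetch links"
        self.crawled = set()     # URLs indexed by "Crawl pending links"

    def fetch_links(self, urls):
        # Phase 1: record URLs without fetching their content.
        self.discovered.update(urls)

    def crawl_pending(self):
        # Phase 2: index only the links not yet crawled.
        pending = self.discovered - self.crawled
        self.crawled.update(pending)
        return len(pending)

src = WebsiteSource()
src.fetch_links(["https://example.com/a", "https://example.com/b"])
print(src.crawl_pending())  # → 2
print(src.crawl_pending())  # → 0 (nothing new is pending)
```

Running Fetch links again after your site changes grows the discovered set, and the next crawl picks up only the new pending URLs.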

Watch your limits

The top summary shows:
  • website source count
  • total links
  • stored data size
  • crawl credit cost per page
These limits vary by plan. If you hit a limit, BoundBot prompts you to upgrade before you add more sources.
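
Because each crawled page consumes credits, it is worth estimating the cost before a large crawl. A back-of-the-envelope calculation (the per-page cost shown in your summary varies by plan; the 2 credits/page below is a made-up figure):

```python
def estimated_crawl_cost(pages, credits_per_page):
    """Estimate total credits for a crawl.
    Both inputs come from your plan's summary; values here are examples."""
    return pages * credits_per_page

print(estimated_crawl_cost(500, 2))  # → 1000 credits for a 500-page crawl
```

If that estimate exceeds your remaining credits, tighten the include prefix or lower the max-links cap before crawling.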

Best practices

  • Start with the smallest useful section of your site.
  • Exclude duplicate or low-value pages such as admin routes, login pages, and legal archives if they do not help customers.
  • Re-crawl after major content updates.
  • Review errors early so broken pages do not quietly degrade your answers.
Crawling consumes credits. If your site is large, use path filters and max-link limits instead of crawling the whole domain on day one.