Introduction
Crawling automatically visits all accessible subpages of a website and converts them into structured JSON or Markdown. This guide helps you:

- Initiate a crawling session
- Check crawling session status
- List crawled pages
- Access crawled page content
Prerequisites
To get the most out of this guide, you’ll need to:

- Create a Kadoa account
- Get your API key
1. Start a Crawl
To initiate a web crawl, send a POST request to the `/crawl` endpoint with the desired configuration.
View full API reference →
Use `pathsFilterIn` and `pathsFilterOut` to include or exclude specific paths. Adjust `timeout`, `maxDepth`, and `maxPages` to refine the crawling process, as in the sketch below.
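Here is a minimal request sketch in Python. The base URL, the `x-api-key` header name, and the `url` and `sessionId` field names are assumptions made for illustration and are not confirmed by this guide; check the API reference for the exact request schema.

```python
import os
import requests

BASE_URL = "https://api.kadoa.com"          # assumed base URL -- see the API reference
API_KEY = os.environ["KADOA_API_KEY"]       # your API key from the Kadoa dashboard

payload = {
    "url": "https://example.com",           # site to crawl (assumed field name)
    "pathsFilterIn": ["/docs"],             # only crawl paths containing /docs
    "pathsFilterOut": ["/blog"],            # skip blog pages
    "maxDepth": 3,                          # how many link levels deep to follow
    "maxPages": 100,                        # stop after 100 pages
    "timeout": 300,                         # overall crawl timeout (assumed unit: seconds)
}

response = requests.post(
    f"{BASE_URL}/crawl",
    json=payload,
    headers={"x-api-key": API_KEY},         # assumed auth header name
)
response.raise_for_status()
session_id = response.json().get("sessionId")   # assumed response field
print("Crawl session started:", session_id)
```

Keep the returned session ID: the status and pages endpoints in the next steps are keyed on it.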
2. Check Crawl Status
Monitor the progress of your crawling session using the `/crawl/status/<sessionId>` endpoint.
View full API reference →
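A simple polling sketch, continuing from the previous step. The `status` field and the terminal state names are assumptions; the actual status values are listed in the API reference.

```python
import os
import time
import requests

BASE_URL = "https://api.kadoa.com"              # assumed base URL
API_KEY = os.environ["KADOA_API_KEY"]
session_id = "YOUR_SESSION_ID"                  # returned when the crawl was started

# Poll the status endpoint until the crawl reaches a terminal state.
while True:
    resp = requests.get(
        f"{BASE_URL}/crawl/status/{session_id}",
        headers={"x-api-key": API_KEY},         # assumed auth header name
    )
    resp.raise_for_status()
    status = resp.json().get("status")          # assumed response field
    print("Crawl status:", status)
    if status in ("finished", "failed"):        # assumed terminal states
        break
    time.sleep(10)                              # wait before polling again
```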
3. List Crawled Pages
Access the crawled pages using the `/crawl/<sessionId>/pages` endpoint with pagination.
View full API reference →
- `currentPage`: Positive integer, starting from 0.
- `pageSize`: Positive integer, starting from 1.
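A sketch of paging through all results using the `currentPage` and `pageSize` parameters described above. The `pages` field in the response body is an assumption; check the API reference for the actual payload shape.

```python
import os
import requests

BASE_URL = "https://api.kadoa.com"              # assumed base URL
API_KEY = os.environ["KADOA_API_KEY"]
session_id = "YOUR_SESSION_ID"

current_page = 0
page_size = 50
all_pages = []

# Fetch one batch at a time until an empty batch comes back.
while True:
    resp = requests.get(
        f"{BASE_URL}/crawl/{session_id}/pages",
        params={"currentPage": current_page, "pageSize": page_size},
        headers={"x-api-key": API_KEY},         # assumed auth header name
    )
    resp.raise_for_status()
    batch = resp.json().get("pages", [])        # assumed response field
    if not batch:
        break                                   # no more pages to fetch
    all_pages.extend(batch)
    current_page += 1

print(f"Retrieved metadata for {len(all_pages)} crawled pages")
```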
4. Retrieve Page Content
Now let’s retrieve the content of the crawled pages in our preferred format. The API can deliver the page payload directly in an LLM-ready format, such as Markdown.

View full API reference →

- `html`: Full HTML structure
- `md`: Markdown format
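The sketch below requests the crawled pages with Markdown payloads. The `format` query parameter and the `md` and `url` fields on each page object are illustrative placeholders, not confirmed by this guide; the linked API reference documents the actual way to select the output format.

```python
import os
import requests

BASE_URL = "https://api.kadoa.com"              # assumed base URL
API_KEY = os.environ["KADOA_API_KEY"]
session_id = "YOUR_SESSION_ID"

resp = requests.get(
    f"{BASE_URL}/crawl/{session_id}/pages",
    params={"currentPage": 0, "pageSize": 10, "format": "md"},  # "format" is an assumed parameter
    headers={"x-api-key": API_KEY},             # assumed auth header name
)
resp.raise_for_status()

for page in resp.json().get("pages", []):       # assumed response field
    url = page.get("url")                       # assumed page field
    markdown = page.get("md")                   # assumed field holding the Markdown payload
    print(f"--- {url} ---")
    print((markdown or "")[:500])               # preview the first 500 characters
```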