Overview
Create workflows programmatically using the Kadoa SDK or REST API:
- Create workflows with different navigation modes
- Use existing schemas or define custom ones
- Set up Custom AI Navigation with natural language instructions
- Configure monitoring and scheduling options
Prerequisites
Before you begin, you’ll need:
- A Kadoa account
- Your API key
- For SDK, install the package with one of:
  npm install @kadoa/node-sdk
  yarn add @kadoa/node-sdk
  uv add kadoa-sdk
Authentication
Extraction Methods
Choose how you want to extract data from websites.
Auto-Detection
Auto-detection uses AI to identify and extract the data on the page. It isn’t available when you call the REST API directly; in that case you must pass a data schema.
Custom Schema
Define exactly which fields you want to extract for precise control. Available data types: STRING, NUMBER, BOOLEAN, DATE, DATETIME, MONEY, IMAGE, LINK, OBJECT, ARRAY
See all data types →
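As a rough illustration of the type list above, the following Python sketch checks a set of fields before they are submitted. The field shape (dictionaries with `name` and `dataType` keys) and the `invalid_fields` helper are assumptions made for this example, not the documented Kadoa schema format; only the type names come from this section.

```python
# Data types listed in this section.
DATA_TYPES = {
    "STRING", "NUMBER", "BOOLEAN", "DATE", "DATETIME",
    "MONEY", "IMAGE", "LINK", "OBJECT", "ARRAY",
}


def invalid_fields(fields):
    """Return the names of fields whose dataType is not a listed type.

    The {"name": ..., "dataType": ...} shape is hypothetical.
    """
    return [f["name"] for f in fields if f.get("dataType") not in DATA_TYPES]


# Hypothetical product schema; "TEXT" is not in the list, so it is flagged.
fields = [
    {"name": "title", "dataType": "STRING"},
    {"name": "price", "dataType": "MONEY"},
    {"name": "summary", "dataType": "TEXT"},
]
bad = invalid_fields(fields)
print(bad)
```

Running a check like this locally can catch a misspelled type before a workflow is created.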
PDF Page Selection
When extracting from PDF URLs, you can specify which pages to process.
If pageNumbers is omitted, all pages are processed.
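The omission rule can be sketched as a small payload builder. The `build_pdf_request` helper and the `url` key are hypothetical and do not reflect the actual request format; only the `pageNumbers` name and its default behavior come from the text above.

```python
def build_pdf_request(url, page_numbers=None):
    """Build a request body for a PDF URL (hypothetical shape).

    pageNumbers is only included when the caller selects pages;
    leaving it out means all pages are processed.
    """
    body = {"url": url}
    if page_numbers:
        # Deduplicate and sort so the selection is stable.
        body["pageNumbers"] = sorted(set(page_numbers))
    return body


all_pages = build_pdf_request("https://example.com/report.pdf")
some_pages = build_pdf_request("https://example.com/report.pdf", [3, 1, 3])
```

In this sketch an empty list is treated the same as no selection, so the key is left out in both cases.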
Raw Content Extraction
Extract unstructured content as HTML, Markdown, or plain text:
- HTML: Raw HTML content
- MARKDOWN: Markdown formatted text
- PAGE_URL: URLs of extracted pages
Classification
Automatically categorize content into predefined classes.
Navigation Modes
Kadoa supports five navigation modes to handle different website structures. For detailed information about each mode and its parameters, see Navigation Modes.

| Mode | Value | Best For |
|---|---|---|
| Single Page | single-page | Extract data from a single page |
| List | paginated-page | Navigate through lists with pagination |
| List + Details | page-and-detail | Navigate lists then open each item for details |
| All Pages | all-pages | Crawl all pages or up to maxPages pages and extract matching entities |
| Custom AI Navigation | agentic-navigation | AI-driven navigation using natural language |
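One way to avoid typos in the mode values is to keep them in an enum. This is a minimal sketch: the `NavigationMode` class and `parse_mode` helper are names invented for this example and are not part of the Kadoa SDK; the string values are copied from the table above.

```python
from enum import Enum


class NavigationMode(str, Enum):
    """Navigation mode values from the table above."""
    SINGLE_PAGE = "single-page"
    LIST = "paginated-page"
    LIST_AND_DETAILS = "page-and-detail"
    ALL_PAGES = "all-pages"
    CUSTOM_AI = "agentic-navigation"


def parse_mode(value):
    """Return the matching mode, or raise ValueError listing the valid values."""
    try:
        return NavigationMode(value)
    except ValueError:
        valid = ", ".join(m.value for m in NavigationMode)
        raise ValueError(
            f"Unknown navigation mode {value!r}; expected one of: {valid}"
        ) from None


mode = parse_mode("page-and-detail")
```

Note that the List mode's value is paginated-page, not list, so passing the display name raises an error in this sketch.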
Navigation Mode Examples
Single Page Extraction
Extract data from a single page, such as a job posting or product page.
List Navigation
Navigate through paginated lists to extract multiple items.
List + Details Navigation
Navigate through a list and then open each item for detailed extraction.
All Pages (Crawler) Navigation
Crawl all pages, or up to maxPages pages if specified, and extract matching entities from discovered pages.
The starting URL must display the entity you want to extract.
All URLs must share the exact same hostname. For example, https://example.com and https://example.com/products are valid together, but mixing https://example.com with https://www.example.com or https://shop.example.com fails.
For the available parameters (maxPages, maxDepth, pathsFilterIn, pathsFilterOut), see Navigation Modes → All Pages.
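The same-hostname rule above can be checked locally before starting a crawl. The sketch below uses only Python's standard library; `share_hostname` is a hypothetical helper for a pre-flight check, not Kadoa's own validation.

```python
from urllib.parse import urlsplit


def share_hostname(urls):
    """Return True when every URL has the exact same hostname."""
    # urlsplit(...).hostname is lowercased, and None when no host is present.
    hostnames = {urlsplit(u).hostname for u in urls}
    return len(hostnames) == 1 and None not in hostnames


ok = share_hostname(["https://example.com", "https://example.com/products"])
mixed = share_hostname(["https://example.com", "https://www.example.com"])
print(ok, mixed)
```

Comparing full hostnames rather than registered domains matches the examples in the text, where www.example.com and shop.example.com both count as different from example.com.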
Raw Data Mode (No Schema)
Crawl a website and retrieve raw page artifacts (HTML, Markdown, screenshots) without defining an entity or schema. Useful for LLM ingestion, site archival, or content analysis. For output options and parameters, see Navigation Modes → Raw Data Mode.
Custom AI Navigation
Custom AI Navigation enables autonomous website navigation through natural language instructions. The AI understands your intent and navigates complex websites automatically. Learn more about Custom AI Navigation →
Next Steps
- Schedule & Run Workflows → Configure intervals, run workflows manually, and check status
- Custom AI Navigation → Use natural language for complex extraction tasks
- Working with Schemas → Create and manage reusable schemas
- Data Delivery → Retrieve extracted data
- API Reference →