Overview

Create workflows programmatically using the Kadoa SDK or REST API:
  • Create workflows with different navigation modes
  • Use existing schemas or define custom ones
  • Set up Custom AI Navigation with natural language instructions
  • Configure monitoring and scheduling options

Prerequisites

Before you begin, you’ll need:
  • A Kadoa account
  • Your API key
  • For the SDK: npm install @kadoa/node-sdk or yarn add @kadoa/node-sdk (Node.js), or uv add kadoa-sdk (Python)

Authentication

import { KadoaClient } from '@kadoa/node-sdk';

const client = new KadoaClient({
  apiKey: 'YOUR_API_KEY'
});

// Verify your credentials
const status = await client.status();
console.log(status.user);
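In real projects you'll usually load the key from the environment rather than hardcoding it; a minimal sketch (the KADOA_API_KEY variable name is an assumption, not something the SDK requires):

```typescript
// Reads the API key from an environment-like map and fails loudly if absent.
// KADOA_API_KEY is an assumed variable name; use whatever your deployment provides.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env.KADOA_API_KEY;
  if (!key) {
    throw new Error("KADOA_API_KEY is not set");
  }
  return key;
}

// Usage: new KadoaClient({ apiKey: requireApiKey(process.env) });
```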

Extraction Methods

Choose how you want to extract data from websites:

Auto-Detection

Auto-detection uses AI to identify and extract whatever structured data is on the page. If you're using the REST API directly, auto-detection isn't available; you must pass a data schema instead.
// SDK: AI automatically detects and extracts data
const result = await client.extraction.run({
  urls: ["https://sandbox.kadoa.com/ecommerce"],
  name: "Auto Product Extraction",
  limit: 10,
});

console.log(result.data);

Custom Schema

Define exactly what fields you want to extract for precise control:
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/ecommerce"],
    name: "Structured Product Extraction",
    extraction: (builder) =>
      builder
        .entity("Product")
        .field("title", "Product name", "STRING", {
          example: "iPhone 15 Pro",
        })
        .field("price", "Price in USD", "MONEY")
        .field("inStock", "Availability", "BOOLEAN")
        .field("rating", "Rating 1-5", "NUMBER")
        .field("releaseDate", "Launch date", "DATE"),
  })
  .create();

const result = await workflow.run({ limit: 10 });

// Use destructuring for cleaner access
const { data } = await result.fetchData({});
console.log(data);
Available Data Types: STRING, NUMBER, BOOLEAN, DATE, DATETIME, MONEY, IMAGE, LINK, OBJECT, ARRAY.
See all data types →

PDF Page Selection

When extracting from PDF URLs, you can specify which pages to process:
API
// POST https://api.kadoa.com/v4/workflows
{
  "urls": ["https://example.com/report.pdf"],
  "name": "PDF Extraction",
  "entity": "Data",
  "fields": [
    {
      "name": "content",
      "dataType": "STRING",
      "description": "Extracted content"
    }
  ],
  "pageNumbers": [1, 2, 3]  // Extract only pages 1, 2, and 3
}
If pageNumbers is omitted, all pages are processed.
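If you assemble this request body in code, the optional pageNumbers key can be added conditionally; a sketch with a hypothetical buildPdfWorkflowBody helper (not part of the SDK):

```typescript
interface Field {
  name: string;
  dataType: string;
  description: string;
}

// Hypothetical helper: builds the POST /v4/workflows body for a PDF extraction.
// When pageNumbers is undefined, the key is omitted so all pages are processed.
function buildPdfWorkflowBody(
  url: string,
  fields: Field[],
  pageNumbers?: number[],
): { urls: string[]; name: string; entity: string; fields: Field[]; pageNumbers?: number[] } {
  return {
    urls: [url],
    name: "PDF Extraction",
    entity: "Data",
    fields,
    ...(pageNumbers ? { pageNumbers } : {}),
  };
}
```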

Raw Content Extraction

Extract unstructured content as raw HTML, Markdown, or page URLs:
// Extract as Markdown
const extraction = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/news"],
    name: "Article Content",
    extraction: (builder) => builder.raw("MARKDOWN"),
  })
  .create();

const run = await extraction.run({ limit: 10 });
const data = await run.fetchData({});
console.log(data);
Available Formats:
  • HTML - Raw HTML content
  • MARKDOWN - Markdown formatted text
  • PAGE_URL - URLs of extracted pages

Classification

Automatically categorize content into predefined classes:
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/news"],
    name: "Article Classifier",
    extraction: (builder) =>
      builder
        .entity("Article")
        .field("title", "Headline", "STRING", {
          example: "Tech Company Announces New Product",
        })
        .field("content", "Article text", "STRING", {
          example: "The article discusses the latest innovations...",
        })
        .classify("sentiment", "Content tone", [
          { title: "Positive", definition: "Optimistic tone" },
          { title: "Negative", definition: "Critical tone" },
          { title: "Neutral", definition: "Balanced tone" },
        ])
        .classify("category", "Article topic", [
          { title: "Technology", definition: "Tech news" },
          { title: "Business", definition: "Business news" },
          { title: "Politics", definition: "Political news" },
        ]),
  })
  .create();
// Note: 'limit' caps the number of records extracted during the run,
// not the number fetched afterwards
const result = await workflow.run({ limit: 10, variables: {} });
console.log(result.jobId);
const { data } = await result.fetchData({ limit: 10 });
console.log(data);
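Classified fields come back as plain values on each record, so they are easy to aggregate; a sketch assuming each record exposes the sentiment classification as a string field:

```typescript
interface ClassifiedArticle {
  title: string;
  sentiment: string; // e.g. "Positive", "Negative", "Neutral"
}

// Tally how many records fell into each sentiment class.
function countBySentiment(records: ClassifiedArticle[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const record of records) {
    counts.set(record.sentiment, (counts.get(record.sentiment) ?? 0) + 1);
  }
  return counts;
}
```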
Navigation Modes

Kadoa supports five navigation modes to handle different website structures. For detailed information about each mode and its parameters, see Navigation Modes.

Mode                 | Value              | Best For
Single Page          | single-page        | Extract data from a single page
List                 | paginated-page     | Navigate through lists with pagination
List + Details       | page-and-detail    | Navigate lists then open each item for details
All Pages            | all-pages          | Crawl all pages (or up to maxPages) and extract matching entities
Custom AI Navigation | agentic-navigation | AI-driven navigation using natural language
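The mode values are plain strings, so a typo like "single-pages" only fails at request time; capturing them as a union type catches it at compile time instead. A sketch (the NavigationMode type and MODE_LABELS map mirror the table above and are not part of the SDK):

```typescript
// The five navigation mode values from the table above.
type NavigationMode =
  | "single-page"
  | "paginated-page"
  | "page-and-detail"
  | "all-pages"
  | "agentic-navigation";

// Human-readable label for each mode.
const MODE_LABELS: Record<NavigationMode, string> = {
  "single-page": "Single Page",
  "paginated-page": "List",
  "page-and-detail": "List + Details",
  "all-pages": "All Pages",
  "agentic-navigation": "Custom AI Navigation",
};
```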

Single Page Extraction

Extract data from a single page, such as a job posting or product page:
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/careers-simple"],
    name: "Job Posting Monitor",
    navigationMode: "single-page",
    extraction: (builder) =>
      builder
        .entity("Job Posting")
        .field("jobTitle", "Job title", "STRING", {
          example: "Senior Software Engineer",
        })
        .field("department", "Department or team", "STRING", {
          example: "Engineering",
        })
        .field("location", "Job location", "STRING", {
          example: "San Francisco, CA",
        }),
  })
  .setInterval({ interval: "DAILY" })
  .create();

console.log("Workflow created:", workflow.workflowId);
const result = await workflow.run({ limit: 10, variables: {} });
console.log(result.jobId);

List Navigation

Navigate through paginated lists to extract multiple items, reusing an existing schema by its ID:
const schemaId = "YOUR_SCHEMA_ID"; // ID of a previously created schema

const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/ecommerce"],
    name: "Product Catalog Monitor",
    navigationMode: "paginated-page",
    extraction: () => ({ schemaId }),
  })
  })
  .setInterval({ interval: "HOURLY" })
  .create();

// Run the workflow
const result = await workflow.run({ limit: 10 });
const response = await result.fetchData({});
console.log("Extracted items:", response.data);

List + Details Navigation

Navigate through a list and then open each item for detailed extraction:
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/ecommerce"],
    name: "Product Details Extractor",
    navigationMode: "page-and-detail",
    extraction: (builder) =>
      builder
        .entity("Product")
        .field("title", "Product name", "STRING", {
          example: "Wireless Headphones",
        })
        .field("price", "Product price", "MONEY")
        .field("description", "Full description", "STRING", {
          example: "Premium noise-cancelling headphones...",
        })
        .field("specifications", "Technical specs", "STRING", {
          example: "Battery life: 30 hours, Bluetooth 5.0...",
        }),
  })
  .create();

const result = await workflow.run({ limit: 10 });
const productDetails = await result.fetchData({});
console.log(productDetails.data);

All Pages (Crawler) Navigation

Crawl all pages or up to maxPages pages (if specified) and extract matching entities from discovered pages.
The starting URL must display the entity you want to extract.
const workflow = await client
  .extract({
    urls: ["https://sandbox.kadoa.com/ecommerce"],
    name: "Product Catalog Crawler",
    navigationMode: "all-pages",
    extraction: (builder) =>
      builder
        .entity("Product")
        .field("title", "Product name", "STRING", {
          example: "Sennheiser HD 6XX",
        })
        .field("price", "Product price", "MONEY")
        .field("reviews", "Number of reviews", "STRING", {
          example: "155 reviews",
        }),
  })
  .create();

const result = await workflow.run({ limit: 10 });
const response = await result.fetchData({});
console.log(response.data);
All URLs must share the exact same hostname. For example, https://example.com and https://example.com/products are valid, but mixing https://example.com with https://www.example.com or https://shop.example.com fails.
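You can check this constraint client-side before creating a crawler workflow; a minimal sketch using the standard URL class:

```typescript
// True only when every URL shares the exact same hostname.
// Note that www.example.com and example.com count as different hosts.
function sameHostname(urls: string[]): boolean {
  const hosts = new Set(urls.map((u) => new URL(u).hostname));
  return hosts.size <= 1;
}
```

Validating up front gives a clearer error than waiting for the API to reject the mixed-host request.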
For crawler parameters (maxPages, maxDepth, pathsFilterIn, pathsFilterOut), see Navigation Modes → All Pages.

Raw Data Mode (No Schema)

Crawl a website and retrieve raw page artifacts (HTML, Markdown, screenshots) without defining an entity or schema. Useful for LLM ingestion, site archival, or content analysis. For output options and parameters, see Navigation Modes → Raw Data Mode.
// POST https://api.kadoa.com/v4/workflows
{
  "urls": ["https://example.com"],
  "name": "Site Archive",
  "outputOptions": {
    "includeHtml": true,
    "includeMarkdown": true,
    "includeScreenshots": false,
    "includeJson": false
  },
  "maxPages": 500,
  "maxDepth": 5
}
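If you build this request body in code, the output toggles can be defaulted and overridden per call; a sketch with a hypothetical archiveBody helper (the defaults shown come from the sample request above, not from documented SDK defaults):

```typescript
// Output artifact toggles for Raw Data Mode (mirrors the JSON body above).
interface OutputOptions {
  includeHtml: boolean;
  includeMarkdown: boolean;
  includeScreenshots: boolean;
  includeJson: boolean;
}

// Hypothetical helper: builds the crawl request body, letting callers
// override individual output toggles while keeping the sample defaults.
function archiveBody(url: string, overrides: Partial<OutputOptions> = {}) {
  const outputOptions: OutputOptions = {
    includeHtml: true,
    includeMarkdown: true,
    includeScreenshots: false,
    includeJson: false,
    ...overrides,
  };
  return { urls: [url], name: "Site Archive", outputOptions, maxPages: 500, maxDepth: 5 };
}
```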

Custom AI Navigation

Custom AI Navigation enables autonomous website navigation through natural language instructions. The AI understands your intent and navigates complex websites automatically. Learn more about Custom AI Navigation →

Next Steps