# Schemas

> Create and manage reusable schemas programmatically using the SDK

For an overview of schema concepts and available data types, see [Schemas](/docs/workflows/schemas).

## Prerequisites

* A Kadoa account with an API key
* SDK installed: `npm install @kadoa/node-sdk` (Node.js) or `uv add kadoa-sdk` (Python)

## Working with Schemas

Define the structure of the data you want to extract with the builder API:

<CodeGroup>
  ```typescript Node SDK theme={null}
  const extraction = await client
    .extract({
      urls: ["https://sandbox.kadoa.com/ecommerce"],
      name: "Product Extraction",
      extraction: (builder) =>
        builder
          .entity("Product")
          .field("title", "Product name", "STRING", { example: "Laptop" })
          .field("price", "Product price", "MONEY")
          .field("inStock", "Availability", "BOOLEAN")
          .field("rating", "Star rating 1-5", "NUMBER"),
    })
    .create();
  ```

  ```python Python SDK theme={null}
  extract_options = ExtractOptions(
      urls=["https://sandbox.kadoa.com/ecommerce"],
      name="Product Extraction",
      extraction=lambda builder: (
          builder.entity("Product")
          .field("title", "Product name", "STRING", FieldOptions(example="Laptop"))
          .field("price", "Product price", "MONEY")
          .field("inStock", "Availability", "BOOLEAN")
          .field("rating", "Star rating 1-5", "NUMBER")
      ),
  )

  extraction = client.extract(extract_options).create()
  print(f"Extraction created successfully: {extraction}")
  ```
</CodeGroup>
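To show how the chained calls relate to the explicit field objects used in [Create a Schema](#create-a-schema), here is a minimal, hypothetical sketch of a fluent builder. `SketchSchemaBuilder` and its `build()` method are invented for illustration and are not the SDK's builder; the sketch only assumes that each `.field()` call corresponds to one regular field with `fieldType: "SCHEMA"`.

```python
from typing import Any, Optional


class SketchSchemaBuilder:
    """Hypothetical stand-in for the SDK builder (illustration only)."""

    def __init__(self) -> None:
        self.entity_name: Optional[str] = None
        self.fields: list[dict[str, Any]] = []

    def entity(self, name: str) -> "SketchSchemaBuilder":
        self.entity_name = name
        return self  # returning self is what makes the calls chainable

    def field(self, name: str, description: str, data_type: str,
              example: Optional[str] = None) -> "SketchSchemaBuilder":
        spec: dict[str, Any] = {
            "name": name,
            "description": description,
            "fieldType": "SCHEMA",  # assumption: builder fields are regular fields
            "dataType": data_type,
        }
        if example is not None:
            spec["example"] = example
        self.fields.append(spec)
        return self

    def build(self) -> dict[str, Any]:
        if self.entity_name is None:
            raise ValueError("call .entity() before .build()")
        return {"entity": self.entity_name, "fields": list(self.fields)}


definition = (
    SketchSchemaBuilder()
    .entity("Product")
    .field("title", "Product name", "STRING", example="Laptop")
    .field("price", "Product price", "MONEY")
    .field("inStock", "Availability", "BOOLEAN")
    .field("rating", "Star rating 1-5", "NUMBER")
    .build()
)
print(definition["fields"][0])
```

The resulting dicts have the same keys as the Node SDK `createSchema` payload, which is why the inline and saved-schema forms describe the same structure.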

### Reusable Schemas

For consistent data extraction across multiple workflows, you can create and manage schemas separately using the Schema Management API. Schemas can also be bundled into [templates](/docs/sdk/templates/overview) along with a prompt and notification settings.

## Schema Management API

The Schema Management API allows you to create, retrieve, update, and delete schemas programmatically. Saved schemas can be reused across multiple extractions, so every extraction that references one returns data with the same structure.

### When to Use Saved Schemas

Use saved schemas when you:

* Extract the same data structure from multiple websites
* Want to maintain consistent field definitions across workflows
* Need to programmatically manage schema lifecycle
* Share schemas across different parts of your application

For one-off extractions, inline schema definitions (shown above) are simpler and don't require separate schema management.

<Tip>
  If you need to apply the same schema, prompt, and notifications to multiple workflows, consider using [templates](/docs/sdk/templates/overview) instead. Templates bundle all three into a versioned configuration.
</Tip>

## Create a Schema

<CodeGroup>
  ```typescript Node SDK theme={null}
  const schema = await client.schema.createSchema({
    name: "Product Schema",
    entity: "Product",
    fields: [
      {
        name: "title",
        description: "Product name",
        fieldType: "SCHEMA",
        dataType: "STRING",
        example: "iPhone 15 Pro",
      },
      {
        name: "price",
        description: "Product price",
        fieldType: "SCHEMA",
        dataType: "MONEY",
      },
      {
        name: "inStock",
        description: "Availability",
        fieldType: "SCHEMA",
        dataType: "BOOLEAN",
      },
      {
        name: "rating",
        description: "Star rating",
        fieldType: "SCHEMA",
        dataType: "NUMBER",
      },
    ],
  });

  console.log("Schema created:", schema.id);
  ```

  ```python Python SDK theme={null}
  # Create field objects
  fields = [
      SchemaField(
          actual_instance=DataField(
              name="title",
              description="Product name",
              fieldType="SCHEMA",
              dataType="STRING",
              example=FieldExample(actual_instance="iPhone 15 Pro"),
          )
      ),
      SchemaField(
          actual_instance=DataField(
              name="price",
              description="Product price",
              fieldType="SCHEMA",
              dataType="MONEY",
          )
      ),
      SchemaField(
          actual_instance=DataField(
              name="inStock",
              description="Availability",
              fieldType="SCHEMA",
              dataType="BOOLEAN",
          )
      ),
      SchemaField(
          actual_instance=DataField(
              name="rating",
              description="Star rating",
              fieldType="SCHEMA",
              dataType="NUMBER",
          )
      ),
  ]

  # Create schema request
  create_request = CreateSchemaRequest(
      name="Product Schema",
      entity="Product",
      fields=fields,
  )

  schema = client.schema.create_schema(create_request)

  print("Schema created:", schema.id)
  ```
</CodeGroup>

## Get a Schema

Retrieve an existing schema by ID:

<CodeGroup>
  ```typescript Node SDK theme={null}
  const schema = await client.schema.getSchema(schemaId);

  console.log(schema.name); // 'Product Schema'
  console.log(schema.entity); // 'Product'
  console.log(schema.schema); // Array of field definitions
  ```

  ```python Python SDK theme={null}
  schema = client.schema.get_schema(schema_id)

  print(schema.name)
  print(schema.entity)
  print(schema.var_schema)  # Array of field definitions
  ```
</CodeGroup>

## Delete a Schema

Remove a schema when it's no longer needed:

<CodeGroup>
  ```typescript Node SDK theme={null}
  await client.schema.deleteSchema(schemaId);
  ```

  ```python Python SDK theme={null}
  client.schema.delete_schema(schema_id)
  ```
</CodeGroup>

<Note>
  Deleting a schema does not affect existing workflows or extractions that were created using it.
</Note>
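For short-lived jobs or tests, pairing creation and deletion keeps your account free of orphaned schemas. The sketch below is a hypothetical helper, `temporary_schema`, built only on the `client.schema.create_schema` and `client.schema.delete_schema` calls shown in this guide. It runs against an in-memory `FakeSchemaApi` (also hypothetical) so it works without network access; with the real SDK you would pass your client and a `CreateSchemaRequest` instead.

```python
from contextlib import contextmanager
from types import SimpleNamespace
from typing import Any, Iterator


@contextmanager
def temporary_schema(client: Any, create_request: Any) -> Iterator[Any]:
    """Create a schema, yield it, and delete it when the block exits.

    Hypothetical helper: uses only client.schema.create_schema and
    client.schema.delete_schema as shown in this guide.
    """
    schema = client.schema.create_schema(create_request)
    try:
        yield schema
    finally:
        # Runs on normal exit and on exceptions, so no schema is left behind.
        client.schema.delete_schema(schema.id)


class FakeSchemaApi:
    """In-memory stand-in so the helper can run without network access."""

    def __init__(self) -> None:
        self.store: dict[str, Any] = {}
        self.deleted: list[str] = []

    def create_schema(self, request: dict[str, Any]) -> SimpleNamespace:
        schema_id = f"schema-{len(self.store) + len(self.deleted) + 1}"
        schema = SimpleNamespace(id=schema_id, **request)
        self.store[schema_id] = schema
        return schema

    def delete_schema(self, schema_id: str) -> None:
        del self.store[schema_id]
        self.deleted.append(schema_id)


fake_client = SimpleNamespace(schema=FakeSchemaApi())
with temporary_schema(fake_client, {"name": "Scratch Schema"}) as scratch:
    print("Using schema", scratch.id, scratch.name)
print("Remaining schemas:", fake_client.schema.store)
```

This assumes the object returned by `create_schema` exposes `.id`, as in the examples above. Because deleting a schema does not affect workflows already created from it, cleaning up right after use should be safe.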

## Update a Schema

Modify an existing schema's name, entity, or fields:

<CodeGroup>
  ```typescript Node SDK theme={null}
  const updated = await client.schema.updateSchema(schemaId, {
    name: "Updated Product Schema",
    entity: "Product",
    fields: [
      {
        name: "title",
        description: "Product name",
        fieldType: "SCHEMA",
        dataType: "STRING",
        example: "MacBook Pro",
      },
      {
        name: "price",
        description: "Product price in USD",
        fieldType: "SCHEMA",
        dataType: "MONEY",
      },
      {
        name: "sku",
        description: "Stock keeping unit",
        fieldType: "SCHEMA",
        dataType: "STRING",
        example: "SKU-123",
      },
    ],
  });

  console.log("Schema updated:", updated.id);
  ```

  ```python Python SDK theme={null}
  update_request = UpdateSchemaRequest(
      name="Updated Product Schema",
      entity="Product",
      fields=[
          SchemaField(
              actual_instance=DataField(
                  name="title",
                  description="Product name",
                  fieldType="SCHEMA",
                  dataType="STRING",
                  example=FieldExample(actual_instance="MacBook Pro"),
              )
          ),
          SchemaField(
              actual_instance=DataField(
                  name="price",
                  description="Product price in USD",
                  fieldType="SCHEMA",
                  dataType="MONEY",
              )
          ),
          SchemaField(
              actual_instance=DataField(
                  name="sku",
                  description="Stock keeping unit",
                  fieldType="SCHEMA",
                  dataType="STRING",
                  example=FieldExample(actual_instance="SKU-123"),
              )
          ),
      ],
  )

  updated = client.schema.update_schema(schema_id, update_request)
  print("Schema updated:", updated.id)
  ```
</CodeGroup>
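The update payload above lists `title`, `price`, and `sku` but not `inStock` or `rating`, so it helps to compare field lists before sending an update. The sketch below is a hypothetical `diff_fields` helper that works on plain dicts (trimmed here to a few keys) and reports which fields would be added, removed, or changed. Whether omitted fields are actually dropped depends on the API's update semantics, so treat the "removed" list as a prompt to double-check.

```python
from typing import Any


def diff_fields(old: list[dict[str, Any]],
                new: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Compare two field lists by field name (hypothetical helper)."""
    old_by_name = {f["name"]: f for f in old}
    new_by_name = {f["name"]: f for f in new}
    return {
        "added": sorted(new_by_name.keys() - old_by_name.keys()),
        "removed": sorted(old_by_name.keys() - new_by_name.keys()),
        # Same name on both sides but any differing attribute.
        "changed": sorted(
            name for name in old_by_name.keys() & new_by_name.keys()
            if old_by_name[name] != new_by_name[name]
        ),
    }


created = [
    {"name": "title", "dataType": "STRING", "example": "iPhone 15 Pro"},
    {"name": "price", "dataType": "MONEY", "description": "Product price"},
    {"name": "inStock", "dataType": "BOOLEAN"},
    {"name": "rating", "dataType": "NUMBER"},
]
updated = [
    {"name": "title", "dataType": "STRING", "example": "MacBook Pro"},
    {"name": "price", "dataType": "MONEY", "description": "Product price in USD"},
    {"name": "sku", "dataType": "STRING", "example": "SKU-123"},
]
changes = diff_fields(created, updated)
print(changes)
# {'added': ['sku'], 'removed': ['inStock', 'rating'], 'changed': ['price', 'title']}
```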

## Use a Saved Schema

Reference a saved schema in your extraction:

<CodeGroup>
  ```typescript Node SDK theme={null}
  const extraction = await client
    .extract({
      urls: ["https://sandbox.kadoa.com/ecommerce"],
      name: "Product Extraction",
      extraction: () => ({ schemaId }),
    })
    .create();

  const result = await extraction.run();
  ```

  ```python Python SDK theme={null}
  extract_options = ExtractOptions(
      urls=["https://sandbox.kadoa.com/ecommerce"],
      name="Product Extraction",
      extraction=lambda _: {"schemaId": schema_id},
  )

  extraction = client.extract(extract_options).create()

  run_result = extraction.run()
  print("Extraction completed:", run_result)
  ```
</CodeGroup>

## Field Types

Schemas support three types of fields:

1. **Regular fields** (`fieldType: "SCHEMA"`) - Extract structured values with a specific `dataType` (shown above)
2. **Classification fields** (`fieldType: "CLASSIFICATION"`) - Categorize content into predefined labels
3. **Metadata fields** - Extract raw page content (HTML, Markdown, URLs)

### Available Data Types

For regular fields, specify the `dataType`:

`STRING` • `NUMBER` • `BOOLEAN` • `DATE` • `DATETIME` • `MONEY` • `IMAGE` • `LINK` • `OBJECT` • `ARRAY`

[See data type details and examples →](/docs/workflows/schemas#data-types)
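Catching typos in `dataType` values before calling `createSchema` can save a round trip. The following is a minimal local pre-flight sketch, assuming field definitions as plain dicts in the shape used above. `check_fields` is a hypothetical helper, and its rules (unique names, a listed `dataType` for regular fields, non-empty `categories` for classification fields) are assumptions of this sketch, not documented API constraints; the API performs its own validation.

```python
from typing import Any

# Data types listed under "Available Data Types" (regular fields only).
DATA_TYPES = {"STRING", "NUMBER", "BOOLEAN", "DATE", "DATETIME", "MONEY",
              "IMAGE", "LINK", "OBJECT", "ARRAY"}


def check_fields(fields: list[dict[str, Any]]) -> list[str]:
    """Return problems found in a field list; an empty list means none found.

    Other field types (for example metadata fields) are not checked here.
    """
    problems: list[str] = []
    seen: set[str] = set()
    for i, f in enumerate(fields):
        name = f.get("name")
        label = name or f"field {i}"
        if not name:
            problems.append(f"field {i}: missing name")
        elif name in seen:
            problems.append(f"{name!r} (field {i}): duplicate name")
        else:
            seen.add(name)
        kind = f.get("fieldType")
        if kind == "SCHEMA" and f.get("dataType") not in DATA_TYPES:
            problems.append(f"{label}: unknown dataType {f.get('dataType')!r}")
        elif kind == "CLASSIFICATION" and not f.get("categories"):
            problems.append(f"{label}: CLASSIFICATION field has no categories")
    return problems


draft = [
    {"name": "title", "fieldType": "SCHEMA", "dataType": "STRING"},
    {"name": "price", "fieldType": "SCHEMA", "dataType": "CURRENCY"},  # should be MONEY
    {"name": "title", "fieldType": "SCHEMA", "dataType": "STRING"},    # duplicate name
    {"name": "category", "fieldType": "CLASSIFICATION", "categories": []},
]
for problem in check_fields(draft):
    print(problem)
```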

## Classification Fields

Categorize extracted content into predefined labels:

<CodeGroup>
  ```typescript Node SDK theme={null}
  const schema = await client.schema.createSchema({
    name: "Article Schema",
    entity: "Article",
    fields: [
      {
        name: "title",
        description: "Article headline",
        fieldType: "SCHEMA",
        dataType: "STRING",
        example: "Breaking News",
      },
      {
        name: "category",
        description: "Article category",
        fieldType: "CLASSIFICATION",
        categories: [
          { title: "Technology", definition: "Tech news and updates" },
          { title: "Business", definition: "Business and finance" },
          { title: "Sports", definition: "Sports coverage" },
        ],
      },
    ],
  });
  ```

  ```python Python SDK theme={null}
  # Create categories
  categories = [
      Category(title="Technology", definition="Tech news and updates"),
      Category(title="Business", definition="Business and finance"),
      Category(title="Sports", definition="Sports coverage"),
  ]

  # Create fields
  fields = [
      SchemaField(
          actual_instance=DataField(
              name="title",
              description="Article headline",
              fieldType="SCHEMA",
              dataType="STRING",
              example=FieldExample(actual_instance="Breaking News"),
          )
      ),
      SchemaField(
          actual_instance=ClassificationField(
              name="category",
              description="Article category",
              fieldType="CLASSIFICATION",
              categories=categories,
          )
      ),
  ]

  # Create schema request
  create_request = CreateSchemaRequest(
      name="Article Schema",
      entity="Article",
      fields=fields,
  )

  schema = client.schema.create_schema(create_request)
  print("Schema with classification created:", schema.id)
  ```
</CodeGroup>
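Writing the labels as a `{title: definition}` mapping keeps category titles unique and makes missing definitions easy to spot. The sketch below is a hypothetical `classification_field` helper that produces a plain dict in the same shape as the Node SDK `category` field above; with the Python SDK you would still convert each entry into a `Category(title=..., definition=...)` as shown. Rejecting blank definitions is a choice of this sketch, not a documented API rule.

```python
from typing import Any


def classification_field(name: str, description: str,
                         labels: dict[str, str]) -> dict[str, Any]:
    """Build a CLASSIFICATION field dict from {title: definition} (hypothetical)."""
    if not labels:
        raise ValueError("a classification field needs at least one category")
    blank = [title for title, definition in labels.items() if not definition.strip()]
    if blank:
        raise ValueError(f"categories without a definition: {blank}")
    return {
        "name": name,
        "description": description,
        "fieldType": "CLASSIFICATION",
        # Dict keys are unique, so duplicate category titles cannot occur.
        "categories": [{"title": t, "definition": d} for t, d in labels.items()],
    }


category_field = classification_field(
    "category",
    "Article category",
    {
        "Technology": "Tech news and updates",
        "Business": "Business and finance",
        "Sports": "Sports coverage",
    },
)
print([c["title"] for c in category_field["categories"]])
```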
