# REST API

> Pull data from Kadoa on-demand using the REST API

The REST API lets you control when and how you retrieve your extracted data. It is well suited to batch processing, scheduled jobs, and on-demand access.

## Basic Usage

### Get Latest Data

Retrieve the most recent data from a workflow:

[View full API reference →](/api-reference/workflows/get-workflow-data-by-id)

```bash theme={null}
GET https://api.kadoa.com/v4/workflows/{workflowId}/data
```
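
As a minimal sketch of calling this endpoint from Python using only the standard library: the function names `build_latest_data_request` and `fetch_latest_data` are illustrative, not part of any Kadoa SDK, and the `x-api-key` header is taken from the curl example in the Parquet section of this page.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.kadoa.com/v4"


def build_latest_data_request(workflow_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a workflow's latest data."""
    url = f"{API_BASE}/workflows/{quote(workflow_id, safe='')}/data"
    # Authentication header name as used in the curl example on this page.
    return urllib.request.Request(url, headers={"x-api-key": api_key}, method="GET")


def fetch_latest_data(workflow_id: str, api_key: str) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    request = build_latest_data_request(workflow_id, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the URL and header logic easy to check without touching the network.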

<CodeGroup>
  ```json Response theme={null}
  {
    "workflowId": "workflow-123",
    "runId": "run-456",
    "executedAt": "2024-01-15T10:30:00Z",
    "data": [
      {
        "id": "123",
        "title": "Product Name",
        "price": 99.99,
        "extractedAt": "2024-01-15T10:30:00Z"
      }
    ],
    "pagination": {
      "totalCount": 150,
      "page": 1,
      "totalPages": 6,
      "limit": 25
    }
  }
  ```

</CodeGroup>

The total row count is `pagination.totalCount`. There is no `hasMore` flag; to check whether further pages exist, test whether `pagination.page < pagination.totalPages`.

**Key field deduplication**: If your workflow schema marks one or more fields as key fields (`isKey: true`), Kadoa automatically deduplicates results so there is at most one record per unique key combination. Key fields should be scalar `STRING`, `NUMBER`, or `LINK` fields. Records where any key field is missing or empty are treated as distinct and are not merged.
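
The `page < totalPages` check can be sketched as a generator that walks every page. This is an illustration under stated assumptions: `iter_rows` and the injected `fetch_page` callable are hypothetical names, and `fetch_page(page)` is assumed to return the decoded JSON body for that page in the shape shown above.

```python
from typing import Any, Callable, Dict, Iterator


def iter_rows(fetch_page: Callable[[int], Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every row across all pages, starting from page 1."""
    page = 1
    while True:
        body = fetch_page(page)
        yield from body.get("data", [])
        pagination = body.get("pagination", {})
        # There is no hasMore flag: stop once the current page is the last one.
        if pagination.get("page", page) >= pagination.get("totalPages", 0):
            break
        page += 1
```

Passing the page fetcher in as a callable means the same loop works with any HTTP client and can be exercised with canned responses.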

## Pagination and Filtering

### Handle Large Datasets

Use pagination for efficient data retrieval:

```bash theme={null}
GET https://api.kadoa.com/v4/workflows/{workflowId}/data?page=1&limit=100
```

### Query Parameters

| Parameter | Type    | Default | Description                                                        |
| --------- | ------- | ------- | ------------------------------------------------------------------ |
| `page`    | integer | `1`     | Page number                                                        |
| `limit`   | integer | `25`    | Rows per page. Set to `0` to stream all rows without paging        |
| `sortBy`  | string  | —       | Field name to sort by                                              |
| `order`   | string  | `asc`   | Sort order: `asc` or `desc`                                        |
| `filters` | string  | —       | JSON-encoded array of filter objects (see below)                   |
| `runId`   | string  | —       | Retrieve data from a specific historical run instead of the latest |
| `format`  | string  | `json`  | Response format: `json` or `csv`                                   |

### Filtering

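Pass a URL-encoded JSON array to `filters`. Each entry specifies a `field`, an `operator`, and a `value`. The example below is shown unencoded for readability; encode the JSON before sending a real request: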

```bash theme={null}
GET https://api.kadoa.com/v4/workflows/{workflowId}/data?limit=50&filters=[{"field":"jobTitle","operator":"CONTAINS","value":"Manager"},{"field":"postedDate","operator":"AFTER","value":"2024-01-01"}]
```
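
One way to produce the URL-encoded form of the request above is to serialize the filter list with `json.dumps` and let `urllib.parse.urlencode` handle the escaping. The helper name `build_filtered_url` is hypothetical, and the field names are the ones from the example.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_filtered_url(workflow_id: str, filters: list, limit: int = 50) -> str:
    """Return a data URL whose `filters` parameter is URL-encoded JSON."""
    query = urlencode({
        "limit": limit,
        # Compact separators keep the encoded query string short.
        "filters": json.dumps(filters, separators=(",", ":")),
    })
    return f"https://api.kadoa.com/v4/workflows/{workflow_id}/data?{query}"


filters = [
    {"field": "jobTitle", "operator": "CONTAINS", "value": "Manager"},
    {"field": "postedDate", "operator": "AFTER", "value": "2024-01-01"},
]
url = build_filtered_url("workflow-123", filters)

# Round trip: decoding the query string recovers the original filter list.
decoded = json.loads(parse_qs(urlsplit(url).query)["filters"][0])
```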

Available operators:

| Operator                                                                      | Description                                                               |
| ----------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
| `EQUALS` / `NOT_EQUALS`                                                       | Exact match                                                               |
| `CONTAINS` / `NOT_CONTAINS`                                                   | Substring match (case-insensitive)                                        |
| `STARTS_WITH` / `ENDS_WITH`                                                   | Prefix / suffix match                                                     |
| `GREATER_THAN` / `LESS_THAN` / `GREATER_THAN_OR_EQUAL` / `LESS_THAN_OR_EQUAL` | Numeric or date comparison                                                |
| `IN` / `NOT_IN`                                                               | Value must (or must not) be in an array: `"value": ["Sales","Marketing"]` |
| `IS_NULL` / `IS_NOT_NULL`                                                     | Field presence check                                                      |
| `IS_EMPTY` / `IS_NOT_EMPTY`                                                   | Null or empty string check                                                |
| `BEFORE` / `AFTER`                                                            | Date field comparison                                                     |
| `WITHIN_LAST_DAYS`                                                            | Date field within the last N days: `"value": 7`                           |

## Data Formats

### JSON (Default)

By default, the response is a JSON object containing the rows and the pagination metadata:

```json theme={null}
{
  "data": [...],
  "pagination": {...}
}
```

### CSV Format

Add `?format=csv` to receive a CSV file instead of JSON:

```bash theme={null}
GET https://api.kadoa.com/v4/workflows/{workflowId}/data?format=csv
```

All pagination and filter parameters apply. For large exports that would exceed the response timeout, use `download=link`:

```bash theme={null}
GET https://api.kadoa.com/v4/workflows/{workflowId}/data?format=csv&download=link
```

This materializes the file in object storage and returns a `downloadPath` you can fetch separately.
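
Once the CSV body has been retrieved, it can be read with Python's standard `csv` module. The sketch below assumes the first line is a header row; the column names in the usage comment mirror the JSON sample earlier on this page and are an assumption, since the actual columns depend on your workflow schema. `parse_csv_rows` is an illustrative name.

```python
import csv
import io
from typing import Dict, List


def parse_csv_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Usage with a hypothetical export body:
# rows = parse_csv_rows("id,title,price\r\n123,Product Name,99.99\r\n")
# rows[0]["title"] is then "Product Name"
```

Every value comes back as a string, so numeric fields such as a price need an explicit conversion after parsing.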

### Parquet Format

Create a signed URL for workflow data as a typed Parquet file:

[View full API reference →](/api-reference/workflows/create-workflow-data-export)

```bash theme={null}
GET https://api.kadoa.com/v4/workflows/{workflowId}/data/export?format=parquet
```

Use Parquet when you want to load Kadoa output directly into analytical tools such as DuckDB, Spark, Polars, Snowflake, BigQuery, or pandas. The export endpoint materializes the requested result set and returns a signed `url`. Pass `runId` to export a specific historical run:

```bash theme={null}
curl "https://api.kadoa.com/v4/workflows/{workflowId}/data/export?format=parquet&runId={runId}" \
  -H "x-api-key: YOUR_API_KEY"
```

Filtering, sorting, and row selection use the same query parameters as CSV and JSON exports. For direct streaming of the complete run artifact, use `GET /v4/workflows/{workflowId}/data/parquet`.

Typed columns are preserved where the workflow schema provides native types, including booleans, numbers, dates, timestamps, JSON-compatible objects, and arrays. Older workflow runs may still reflect the type information available when that run was produced.
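
A small sketch of preparing the export request and reading the signed URL from its response. The helper names are hypothetical, and treating `url` as a top-level key of the JSON response is an assumption based on the description above; check the API reference for the exact response shape.

```python
from typing import Optional
from urllib.parse import urlencode


def build_parquet_export_url(workflow_id: str, run_id: Optional[str] = None) -> str:
    """Build the export URL, optionally pinned to a specific historical run."""
    params = {"format": "parquet"}
    if run_id is not None:
        params["runId"] = run_id
    return f"https://api.kadoa.com/v4/workflows/{workflow_id}/data/export?{urlencode(params)}"


def signed_url_from_response(body: dict) -> str:
    """Extract the signed download URL from the decoded export response."""
    # Assumption: the signed URL is returned under a top-level "url" key.
    return body["url"]
```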

## Error Handling

### Common HTTP Status Codes

| Code | Meaning      | Action                 |
| ---- | ------------ | ---------------------- |
| 200  | Success      | Process data normally  |
| 400  | Bad Request  | Check query parameters |
| 401  | Unauthorized | Verify API key         |
| 404  | Not Found    | Check workflow ID      |
| 429  | Rate Limited | Wait and retry         |
| 500  | Server Error | Contact support        |
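
The "wait and retry" action for `429` can be sketched as a retry loop with exponential backoff. The delay schedule and attempt limit here are assumptions, not documented Kadoa limits, and `get_with_retry` and the injected `send` callable (returning a status code and body) are illustrative names.

```python
import time
from typing import Callable, Tuple


def get_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `send` until it returns 200, backing off only on 429 responses."""
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status == 429 and attempt < max_attempts - 1:
            # Rate limited: wait 1x, 2x, 4x, ... the base delay, then retry.
            sleep(base_delay * 2 ** attempt)
            continue
        # Other errors, or a 429 on the final attempt, are not retried.
        raise RuntimeError(f"Request failed with HTTP {status}")
    raise RuntimeError("No attempts were made")
```

Only `429` is retried in this sketch; `400`, `401`, and `404` indicate a problem with the request itself, so repeating it unchanged would not help.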

## API Reference

For complete API documentation, see:

* [Get workflow data by ID](/api-reference/workflows/get-workflow-data-by-id)
* [Authentication](/api-reference/introduction)
