# Universal Endpoint

Developer instructions for sending data into OrcaSheets Data Lake through the universal HTTP ingestion API.

For a product overview and early adopter integration support, see the [OrcaSheets Data Lake](/datalake) page.

## What you can send

OrcaSheets Data Lake accepts rows from databases, applications, scheduled jobs, and one-off backfills. Each request sends a logical dataset name plus one or more records.

Use the universal endpoint when you want to:

- Push database rows from an internal service or script.
- Sync product, customer, revenue, or operations events.
- Backfill historical data in batches.
- Make ingested data available for OrcaSheets queries, dashboards, and AI Reports.

## Quick start

1. Request a workspace JWT from OrcaSheets (`hello@orcasheets.io`).
2. Set environment variables (see [Authentication](#authentication)).
3. Build a JSON payload with `event_type` and `records`.
4. POST to the universal endpoint with your JWT.
5. Repeat in batches until all rows are sent.

Minimal end-to-end request:

```bash
curl -sS -X POST "https://api.orcasheets.ai/v1/logs" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ORCASHEETS_JWT" \
  -d '{
    "event_type": "ecommerce_sync",
    "records": [
      {
        "metric_name": "orders",
        "metric_value": {
          "order_id": "1001",
          "amount": 49.99,
          "status": "shipped"
        }
      }
    ]
  }'
```

## Endpoint

| Property | Value |
| --- | --- |
| URL | `https://api.orcasheets.ai/v1/logs` |
| Method | `POST` |
| Content-Type | `application/json` |

### Required headers

Send these headers on every request:

```http
POST /v1/logs HTTP/1.1
Host: api.orcasheets.ai
Content-Type: application/json
Authorization: Bearer <your-jwt>
```

| Header | Value |
| --- | --- |
| `Content-Type` | `application/json` |
| `Authorization` | `Bearer <your-jwt>` |

## Authentication

Every request must include a workspace-scoped JWT in the `Authorization` header.

```http
Authorization: Bearer <your-jwt>
```

Request a token from OrcaSheets before production use. Do not commit tokens to version control. Store them in environment variables, your deployment secret store, or your orchestration platform's secret manager.

### Environment variables

```bash
export ORCASHEETS_JWT="your-token-here"
export ORCASHEETS_EVENT_TYPE="ecommerce_sync"
```

Example `.env` file:

```bash
ORCASHEETS_JWT=your-token-here
ORCASHEETS_EVENT_TYPE=ecommerce_sync
```

| Variable | Purpose |
| --- | --- |
| `ORCASHEETS_JWT` | Bearer token for the `Authorization` header |
| `ORCASHEETS_EVENT_TYPE` | Default `event_type` when not overridden in your script |
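
The two variables above can be read once at startup so a missing token fails fast instead of producing a `401` later. The following is a minimal sketch; the helper names `build_headers` and `default_event_type` are illustrative and not part of any OrcaSheets library.

```python
import os


def build_headers(env=os.environ):
    """Build the two required headers, failing fast if the JWT is missing."""
    token = env.get("ORCASHEETS_JWT", "").strip()
    if not token:
        raise RuntimeError("ORCASHEETS_JWT is not set")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }


def default_event_type(env=os.environ):
    """Return ORCASHEETS_EVENT_TYPE, falling back to the name used on this page."""
    return env.get("ORCASHEETS_EVENT_TYPE") or "ecommerce_sync"
```

Passing a plain dict as `env` makes the helpers easy to exercise in tests without touching the real environment.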

## Request body

The request body is a JSON object with two required top-level fields.

### Top-level schema

```json
{
  "event_type": "string",
  "records": []
}
```

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `event_type` | string | Yes | Stable name for the ingest stream, dataset, or pipeline. |
| `records` | array | Yes | Non-empty list of row objects. |

### Record shape

Each item in `records`:

```json
{
  "metric_name": "string",
  "metric_value": {}
}
```

| Field | Type | Description |
| --- | --- | --- |
| `metric_name` | string | Source table, collection, entity, or event name. |
| `metric_value` | object | One row as a flat JSON object. Use source column names as keys. |
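
A client-side check of this shape can catch malformed payloads before they are sent. The sketch below only mirrors the fields described on this page; `validate_payload` is a hypothetical helper, and the server's own validation rules may be stricter or differ.

```python
def validate_payload(payload):
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []

    event_type = payload.get("event_type")
    if not isinstance(event_type, str) or not event_type:
        problems.append("event_type must be a non-empty string")

    records = payload.get("records")
    if not isinstance(records, list) or not records:
        problems.append("records must be a non-empty list")
        return problems

    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"records[{i}] must be an object")
            continue
        name = record.get("metric_name")
        if not isinstance(name, str) or not name:
            problems.append(f"records[{i}].metric_name must be a non-empty string")
        value = record.get("metric_value")
        if not isinstance(value, dict):
            problems.append(f"records[{i}].metric_value must be an object")
            continue
        # Rows are expected to be flat, so flag nested objects and arrays.
        for key, item in value.items():
            if isinstance(item, (dict, list)):
                problems.append(f"records[{i}].metric_value[{key!r}] is nested")
    return problems
```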

### Mapping from SQL

For a Postgres row from `SELECT * FROM users LIMIT 1`, you might have:

```json
{
  "user_id": "u_1042",
  "email": "buyer@example.com",
  "plan": "pro",
  "lifetime_value": 249.5
}
```

Send it as one OrcaSheets record:

```json
{
  "metric_name": "users",
  "metric_value": {
    "user_id": "u_1042",
    "email": "buyer@example.com",
    "plan": "pro",
    "lifetime_value": 249.5
  }
}
```

Set `metric_name` to the table name and copy each column into `metric_value`.
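
The same mapping can be sketched in code. The example below uses Python's built-in `sqlite3` module with an in-memory table as a stand-in for Postgres so that it runs without a database server; the helper name `rows_to_records` is illustrative.

```python
import sqlite3


def rows_to_records(conn, table_name):
    """Read every row of one table and wrap each as an OrcaSheets record."""
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    # table_name is interpolated into SQL, so pass only trusted, hard-coded names.
    cursor = conn.execute(f'SELECT * FROM "{table_name}"')
    return [
        {"metric_name": table_name, "metric_value": dict(row)}
        for row in cursor
    ]


# Stand-in data matching the users row shown above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id TEXT, email TEXT, plan TEXT, lifetime_value REAL)"
)
conn.execute(
    "INSERT INTO users VALUES ('u_1042', 'buyer@example.com', 'pro', 249.5)"
)
records = rows_to_records(conn, "users")
```

With a Postgres driver the pattern is the same: fetch rows as dictionaries keyed by column name and wrap each one with the table name.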

### Value types

Values in `metric_value` may be strings, numbers, or `null`.

Recommended coercion when reading from CSV or database text fields:

| Input string | Send as |
| --- | --- |
| `""` | `null` |
| `"42"` | `42` (integer) |
| `"3.5"` | `3.5` (float) |
| `"3.0"` | `3` (integer) |
| `"ACTIVE"` | `"ACTIVE"` (string) |

Custom integrations can send native JSON types directly. Keep `metric_value` flat unless OrcaSheets support has confirmed a nested shape for your workspace.
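
If a source row contains nested objects, one option is to flatten it before sending. The sketch below joins nested keys with an underscore; that naming scheme is an assumption made for illustration, not an OrcaSheets convention, and lists are left untouched and would need separate handling.

```python
def flatten(value, prefix="", sep="_"):
    """Collapse nested dicts into a single-level dict with joined keys."""
    flat = {}
    for key, item in value.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(item, dict):
            # Recurse into nested objects, carrying the joined key as the prefix.
            flat.update(flatten(item, prefix=name, sep=sep))
        else:
            flat[name] = item
    return flat
```

For example, `{"order_id": "1001", "address": {"city": "Austin"}}` becomes `{"order_id": "1001", "address_city": "Austin"}`.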

## Payload examples

### Minimal (single row)

```json
{
  "event_type": "ecommerce_sync",
  "records": [
    {
      "metric_name": "orders",
      "metric_value": {
        "order_id": "1001",
        "amount": 49.99,
        "status": "shipped"
      }
    }
  ]
}
```

### Multi-table batch

Several ecommerce tables can share one `event_type` in a single POST. Use one record per row; mix tables freely in the same batch:

```json
{
  "event_type": "ecommerce_sync",
  "records": [
    {
      "metric_name": "orders",
      "metric_value": {
        "order_id": "1001",
        "amount": 49.99,
        "status": "shipped"
      }
    },
    {
      "metric_name": "users",
      "metric_value": {
        "user_id": "u_1042",
        "email": "buyer@example.com",
        "plan": "pro"
      }
    },
    {
      "metric_name": "inventory",
      "metric_value": {
        "sku": "SKU-991",
        "quantity": 42,
        "warehouse": "west-1"
      }
    },
    {
      "metric_name": "payments",
      "metric_value": {
        "payment_id": "pay_2201",
        "order_id": "1001",
        "amount": 49.99,
        "status": "captured"
      }
    },
    {
      "metric_name": "shipment-tracking",
      "metric_value": {
        "order_id": "1001",
        "carrier": "ups",
        "tracking_number": "1Z999AA10123456784",
        "status": "in_transit"
      }
    },
    {
      "metric_name": "returns",
      "metric_value": {
        "return_id": "ret_501",
        "order_id": "1002",
        "reason": "size_exchange",
        "status": "requested"
      }
    }
  ]
}
```

### Application event (one row per event)

```json
{
  "event_type": "ecommerce_sync",
  "records": [
    {
      "metric_name": "returns",
      "metric_value": {
        "return_id": "ret_502",
        "order_id": "1003",
        "reason": "damaged_item",
        "status": "approved",
        "occurred_at": "2026-05-25T10:15:00Z"
      }
    }
  ]
}
```
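
A backend can build this kind of single-event payload at the moment the event happens. The following sketch assumes an `occurred_at` field formatted as a UTC ISO 8601 timestamp, as in the example above; `event_payload` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def event_payload(event_type, metric_name, fields, occurred_at=None):
    """Wrap one business event as a single-record payload with a UTC timestamp."""
    # occurred_at should be timezone-aware; it defaults to the current time.
    moment = occurred_at or datetime.now(timezone.utc)
    stamp = moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "event_type": event_type,
        "records": [
            {
                "metric_name": metric_name,
                "metric_value": {**fields, "occurred_at": stamp},
            }
        ],
    }
```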

### Backfill job naming

Use a dated `event_type` so backfills stay separate from live syncs:

```json
{
  "event_type": "ecommerce_backfill_2026_05",
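  "records": [
    {
      "metric_name": "orders",
      "metric_value": {
        "order_id": "9001",
        "amount": 120.0,
        "status": "delivered"
      }
    }
  ]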
}
```
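
A dated name like the one above can be generated rather than typed by hand. This is a small sketch; the `<prefix>_backfill_<YYYY>_<MM>` pattern simply follows the example on this page, and `backfill_event_type` is an illustrative helper name.

```python
from datetime import date


def backfill_event_type(prefix, month):
    """Build a dated event_type such as 'ecommerce_backfill_2026_05'."""
    return f"{prefix}_backfill_{month:%Y_%m}"
```

Calling `backfill_event_type("ecommerce", date(2026, 5, 1))` returns `"ecommerce_backfill_2026_05"`.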

## curl integration

Save your payload as `payload.json`:

```json
{
  "event_type": "ecommerce_sync",
  "records": [
    {
      "metric_name": "orders",
      "metric_value": {
        "order_id": "1001",
        "amount": 49.99,
        "status": "shipped"
      }
    }
  ]
}
```

Send the request:

```bash
export ORCASHEETS_JWT="your-token-here"

curl -sS -X POST "https://api.orcasheets.ai/v1/logs" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ORCASHEETS_JWT" \
  -d @payload.json
```

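Check the HTTP status code: `2xx` means the batch was accepted. Because `curl -sS` prints only the response body, add `-w "\nHTTP %{http_code}\n"` to the command if you want to see the status.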

## Python integration

Install `requests` if needed:

```bash
pip install requests
```

Full example: map table rows into records and send them in batches.

```python
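import math
import os
import requests

API_URL = "https://api.orcasheets.ai/v1/logs"
JWT = os.environ["ORCASHEETS_JWT"]
EVENT_TYPE = os.getenv("ORCASHEETS_EVENT_TYPE", "ecommerce_sync")
BATCH_SIZE = 500


def coerce_value(value):
    """Turn CSV or database text into a string, number, or None."""
    if value is None:
        return None

    text = str(value).strip()
    if text == "":
        return None

    try:
        return int(text)  # "42" -> 42, "-7" -> -7
    except ValueError:
        pass

    try:
        number = float(text)
    except ValueError:
        return text  # "ACTIVE" stays a string
    if not math.isfinite(number):
        return text  # "nan" and "inf" are not valid JSON numbers
    return int(number) if number.is_integer() else number  # "3.0" -> 3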


def to_record(table_name: str, row: dict) -> dict:
    return {
        "metric_name": table_name,
        "metric_value": {key: coerce_value(value) for key, value in row.items()},
    }


rows = [
    ("orders", {"order_id": "1001", "amount": "49.99", "status": "shipped"}),
    ("users", {"user_id": "u_1042", "email": "buyer@example.com", "plan": "pro"}),
    ("inventory", {"sku": "SKU-991", "quantity": "42", "warehouse": "west-1"}),
    ("payments", {"payment_id": "pay_2201", "order_id": "1001", "amount": "49.99", "status": "captured"}),
    ("shipment-tracking", {"order_id": "1001", "carrier": "ups", "tracking_number": "1Z999AA10123456784", "status": "in_transit"}),
    ("returns", {"return_id": "ret_501", "order_id": "1002", "reason": "size_exchange", "status": "requested"}),
]

for start in range(0, len(rows), BATCH_SIZE):
    chunk = rows[start : start + BATCH_SIZE]
    records = [to_record(table_name, row) for table_name, row in chunk]

    response = requests.post(
        API_URL,
        json={"event_type": EVENT_TYPE, "records": records},
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {JWT}",
        },
        timeout=120,
    )
    response.raise_for_status()
    print(f"Pushed {len(records)} records (HTTP {response.status_code})")
```

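Example record built by `to_record` from the first row above. Note that `coerce_value` turns the numeric string `"1001"` into the integer `1001`; skip coercion for ID columns that must stay strings.

```json
{
  "metric_name": "orders",
  "metric_value": {
    "order_id": 1001,
    "amount": 49.99,
    "status": "shipped"
  }
}
```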

## Batching guidelines

- Put many rows in each request's `records` array.
- Use `500` rows per request as a good default.
- Keep `event_type` constant for one backfill or sync job.
- Use a new `event_type` for a logically separate dataset.
- Log response bodies before retrying failures.
- Use exponential backoff for transient server or network errors.

Example: 12,000 rows at batch size 500 creates 24 HTTP requests with the same `event_type`:

```text
12_000 rows / 500 rows per request = 24 POST requests
```
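
When the row count is not an exact multiple of the batch size, the last request carries the remainder, so the request count is the ceiling of the division. A minimal sketch, with `request_count` as an illustrative helper name:

```python
def request_count(total_rows, batch_size=500):
    """Number of POST requests needed to send total_rows in batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Integer ceiling division: a partial final batch still needs one request.
    return (total_rows + batch_size - 1) // batch_size
```

For example, 12,000 rows need 24 requests, while 12,001 rows need 25.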

Batch loop pattern:

```python
for start in range(0, len(all_rows), BATCH_SIZE):
    chunk = all_rows[start : start + BATCH_SIZE]
    records = [to_record(table, row) for table, row in chunk]
    post_batch(event_type=EVENT_TYPE, records=records)
```
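
The loop above leaves `post_batch` undefined. One way to fill it in, following the logging and backoff guidelines in this section, is a retry wrapper like the sketch below. The `send` callable is hypothetical: it is assumed to return a `(status_code, body_text)` pair and, in a real job, would wrap `requests.post` against the universal endpoint. The attempt count and delays are illustrative defaults, not OrcaSheets recommendations.

```python
import random
import time


def post_with_retry(send, payload, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send(payload) until it returns a non-retryable status or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            status, body = send(payload)
        except OSError as exc:  # network-level failure, worth retrying
            status, body = None, str(exc)

        if status is not None and status < 500:
            # 2xx was accepted; 4xx will not be fixed by retrying.
            return status, body

        # Log the response body before retrying.
        print(f"attempt {attempt} failed: {status} {body}")
        if attempt < max_attempts:
            # Exponential backoff with jitter: about 1s, 2s, 4s, ...
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))

    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

Injecting `send` and `sleep` keeps the retry logic testable without network access or real delays.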

## HTTP responses

| Status | Meaning | Action |
| --- | --- | --- |
| `2xx` | Batch accepted | Send the next batch or complete the job. |
| `401` | Invalid or missing JWT | Verify the token and `Authorization` header. |
| Other `4xx` | Client error | Fix JSON shape, required fields, or value types. |
| `5xx` | Server error | Retry with backoff. Contact support if it persists. |

Response bodies may be plain text or JSON. Example client-side handling:

```python
response = requests.post(API_URL, json=payload, headers=headers, timeout=120)

if response.status_code == 401:
    raise RuntimeError("Invalid or missing JWT")

if response.status_code >= 500:
    raise RuntimeError(f"Server error: {response.status_code} {response.text}")

response.raise_for_status()
```
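
Because the body may be plain text or JSON, client code should not assume it can always be parsed. The sketch below maps a status code to the action from the table above and parses the body only when it is valid JSON; `describe_response` and the action labels are illustrative names.

```python
import json


def describe_response(status_code, body_text):
    """Return (action, parsed_body) for a status code and raw response body."""
    try:
        parsed = json.loads(body_text)
    except ValueError:
        parsed = body_text  # plain-text body, kept as-is for logging

    if 200 <= status_code < 300:
        action = "continue"
    elif status_code == 401:
        action = "check_token"
    elif 400 <= status_code < 500:
        action = "fix_request"
    elif status_code >= 500:
        action = "retry"
    else:
        action = "inspect"
    return action, parsed
```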

## Integration patterns

### One-off backfill

Export historical rows, then POST batches with a dated `event_type`:

```json
{
  "event_type": "ecommerce_backfill_2026_05",
  "records": [
    {
      "metric_name": "orders",
      "metric_value": {
        "order_id": "9001",
        "amount": 120.0,
        "status": "delivered"
      }
    }
  ]
}
```

### Scheduled sync

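Run a cron, Airflow, Dagster, GitHub Actions, or server job on a schedule. Keep a checkpoint, and advance it only after the POSTs succeed, so retries do not skip rows:

```python
# Pseudocode: the helper functions are placeholders for your own code
last_id = load_checkpoint()
new_rows = fetch_rows_since(last_id)
if new_rows:
    post_in_batches(new_rows, event_type="ecommerce_sync")
    # Advance the checkpoint only after every batch was accepted
    save_checkpoint(max_id(new_rows))
```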
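
A runnable version of this checkpoint pattern might look like the sketch below. It assumes each row carries a monotonically increasing integer `id` and stores the checkpoint in a local JSON file; `fetch_rows_since` and `post_in_batches` are callables you supply, and all names here are illustrative.

```python
import json
from pathlib import Path


def load_checkpoint(path):
    """Return the last synced id, or 0 when no checkpoint exists yet."""
    file = Path(path)
    return json.loads(file.read_text())["last_id"] if file.exists() else 0


def save_checkpoint(path, last_id):
    """Write to a temp file, then rename, so a crash cannot leave a partial file."""
    file = Path(path)
    tmp = file.with_suffix(".tmp")
    tmp.write_text(json.dumps({"last_id": last_id}))
    tmp.replace(file)


def sync_once(path, fetch_rows_since, post_in_batches):
    """Send rows newer than the checkpoint, then advance the checkpoint."""
    last_id = load_checkpoint(path)
    new_rows = fetch_rows_since(last_id)
    if not new_rows:
        return last_id  # nothing to send, checkpoint unchanged
    post_in_batches(new_rows)  # raises on failure, so the checkpoint stays put
    newest = max(row["id"] for row in new_rows)
    save_checkpoint(path, newest)
    return newest
```

If a run fails partway through, the next run starts again from the old checkpoint, so rows are not skipped, although some may be sent twice.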

### Application events

Append one record per business event from your backend:

```json
{
  "event_type": "ecommerce_sync",
  "records": [
    {
      "metric_name": "payments",
      "metric_value": {
        "payment_id": "pay_3301",
        "order_id": "1004",
        "amount": 199.0,
        "status": "captured"
      }
    }
  ]
}
```

## Troubleshooting

| Issue | Check |
| --- | --- |
| `401` on every request | JWT is present and sent as `Authorization: Bearer <token>`. |
| Empty dataset | `records` contains at least one item. |
| Rows appear under the wrong table | `metric_name` matches the intended source table or entity. |
| Numbers appear as text | Send JSON numbers or coerce numeric strings before sending. |
| Timeouts | Use smaller batches, usually 200 to 500 rows, and set client timeout to at least 120 seconds. |

Correct authorization header:

```http
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
```
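
A quick shape check can catch common header mistakes, such as a missing `Bearer` prefix or quotes copied from a `.env` file. The sketch below assumes the workspace token is a standard signed JWT in compact form (three base64url segments separated by dots). It checks shape only and does not verify the signature or expiry; `authorization_problem` is a hypothetical helper name.

```python
import re

# One base64url segment: letters, digits, hyphen, underscore.
_SEGMENT = re.compile(r"^[A-Za-z0-9_-]+$")


def authorization_problem(header_value):
    """Return a description of what looks wrong, or None if the shape is plausible."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Bearer":
        return "header must start with 'Bearer '"
    if not token or token != token.strip():
        return "token is empty or has stray whitespace"
    parts = token.split(".")
    if len(parts) != 3 or not all(_SEGMENT.match(part) for part in parts):
        return "token is not three base64url segments separated by dots"
    return None
```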

Invalid empty payload:

```json
{
  "event_type": "ecommerce_sync",
  "records": []
}
```

## Related pages

- [OrcaSheets Data Lake](/datalake): product overview, how ingestion fits your stack, and what you can do after data lands in your workspace
- [Request integration support](/datalake#request-access): free help for the first 50 early adopters wiring the universal endpoint
- [Features](/features): plain-English queries, dashboards, and AI Reports on ingested data
- [Integrations](/integrations): connector-based imports alongside the universal HTTP API

## Support

For JWT provisioning, production access, or integration questions, contact `hello@orcasheets.io`.
