Porting without the SDK

Self-contained porting flow for non-Python runtimes (TypeScript, Go, Rust) and zero-dependency Python.

This page shows how to port your agent without the Pipelines SDK. It defines the request envelope, headers, proxy response shape, ping short-circuit behavior, inbound authentication, and rich-mode message shape.

The platform evaluates tool calls only when they are sent to:

{odyssey_proxy_url}/tools/{tool_name}
Authorization: Bearer {run_token}

Porting requires one core change: replace each tool body with an HTTP shim that posts arguments to the per-run proxy and returns the unwrapped tool result. Keep tool names, signatures, model selection, and prompt content unchanged.

A complete reference wrapper is provided in the Full reference wrapper section near the end of this page.

One concept, three labels. The user prompt uses different field names across surfaces:

Surface | Spelling
Dataset CSV column | user
Seed axis / odyssey_seed blob | user_instruction
Dispatch envelope your wrapper reads | body["input"]["user_instruction"]

Same for the others: CSV state ↔ axis initial_state, CSV behavior ↔ axis behavior_instructions. Full table on Task seeding.
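
As an illustration of the mapping above, here is a minimal sketch that renames dataset CSV columns to their seed-axis names. The helper name csv_row_to_axes is hypothetical, and only the three pairs named on this page are covered; the full table lives on Task seeding.

```python
# Hypothetical helper: rename dataset CSV columns to seed-axis names.
# Only the three pairs named on this page are included.
CSV_TO_AXIS = {
    "user": "user_instruction",
    "state": "initial_state",
    "behavior": "behavior_instructions",
}


def csv_row_to_axes(row: dict) -> dict:
    """Return a copy of row with known CSV column names swapped for axis names."""
    return {CSV_TO_AXIS.get(key, key): value for key, value in row.items()}
```

Columns that are not in the mapping pass through under their original names.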

Step 1: Inventory your tools

For each tool, record:

  • Name: must match ^[A-Za-z_][A-Za-z0-9_-]{0,127}$ and be unique per agent.
  • Input schema: JSON Schema for the arguments object.
  • (Optional) Output schema: JSON Schema for the response.
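
The naming rule can be checked locally before registration. The sketch below is one way to do that; check_tool_names is a hypothetical helper, not part of any SDK, and it only covers the name pattern and per-agent uniqueness stated above.

```python
import re

# Pattern copied from the naming rule above (anchors are supplied by fullmatch).
TOOL_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_-]{0,127}")


def check_tool_names(names: list) -> list:
    """Return human-readable problems; an empty list means the names pass."""
    problems = []
    seen = set()
    for name in names:
        # fullmatch anchors both ends, so a trailing newline is rejected too.
        if not TOOL_NAME_RE.fullmatch(name):
            problems.append(f"invalid name: {name!r}")
        if name in seen:
            problems.append(f"duplicate name: {name!r}")
        seen.add(name)
    return problems
```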

SDK mappings for generating the tools list used by the registration form Import JSON dialog:

SDK | How to generate the tools list
OpenAI Agents SDK | Build a list from build_agent().tools, mapping each tool to name, description, and parameters from params_json_schema.
Anthropic SDK | Reuse the same tools array you pass into messages.create.
LangChain | Build a list from tools, mapping each tool to name, description, and parameters from args_schema.schema().
Strands | Use the Strands adapter helper dump_tools_schema(agent) to produce the tools list.
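
A minimal sketch of the mapping in the OpenAI Agents SDK row, assuming each tool object exposes name, description, and params_json_schema attributes. The helper name tools_to_import_json is hypothetical, and the SimpleNamespace object is a stand-in for a real SDK tool so the block runs without the SDK installed.

```python
import json
from types import SimpleNamespace


def tools_to_import_json(tools, schema_attr="params_json_schema"):
    """Map tool objects to name/description/parameters entries."""
    return [
        {
            "name": tool.name,
            "description": getattr(tool, "description", "") or "",
            "parameters": getattr(tool, schema_attr),
        }
        for tool in tools
    ]


# Stand-in for a real SDK tool object, used only to show the shape.
example_tool = SimpleNamespace(
    name="get_order",
    description="Look up an order by id.",
    params_json_schema={
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
)

tools_list = tools_to_import_json([example_tool])
tools_json = json.dumps(tools_list, indent=2)  # text to paste into Import JSON
```

The LangChain row obtains the schema from a method call (args_schema.schema()) rather than an attribute, so it would need its own small adapter.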

Step 2: Replace each tool body with a proxy shim

Replace each tool body with one HTTP POST to {odyssey_proxy_url}/tools/{name}. Keep the function signature unchanged so agent, planner, and SDK behavior remains stable.

Unwrap the response payload. The proxy returns a trace envelope, not the raw tool result:

{
  "tool_name": "get_order",
  "response": { "status": "shipped", "shipped_at": "2026-04-01" },
  "source": "odyssey",
  "latency_ms": 412,
  "matched_rule_index": null
}

Your tool body must return body["response"], not the full envelope. Forwarding the envelope causes a class of "why is my agent reasoning about a tool_name field?" bugs — the LLM sees the trace metadata as if it were domain data.

response arrives already decoded: a tool whose result is a JSON object or array (web_search, get_order, …) gives you a dict/list directly — no second json.loads. Non-JSON bodies (plain text, HTML, CSV) pass through as the original string, so a scalar like "pong" stays a string.

import httpx

def _proxy_call(proxy_url, run_token, name, args):
    r = httpx.post(
        f"{proxy_url.rstrip('/')}/tools/{name}",
        json=args,
        headers={"Authorization": f"Bearer {run_token}"},
        timeout=120.0,
    )
    r.raise_for_status()
    return r.json()["response"]   # unwrap — the envelope is for the trace tab

Migration note (direct-HTTP agents). This decoding is a contract change: the proxy previously returned response as a JSON-encoded string. If your agent calls the proxy directly (not via the SDK) and did json.loads(resp["response"]) for a JSON-object/array tool, drop that second decode — resp["response"] is now already the dict/list. The SDK and the MCP shim handle this for you.

Do not keep original tool bodies as a fallback path. Doing so can diverge simulator world state from observed behavior and invalidate the trace.

Proxy contract

What | Value
URL | POST to the odyssey proxy base URL, then append /tools/ followed by the tool name.
Auth | Send Authorization: Bearer <run_token>, or X-Pipelines-Run-Token: <run_token>.
Body | Send the tool arguments object, validated against input_schema.
Rate limit | 60 requests per minute per token. Retry with backoff on 429 responses.
Body cap | 1 MiB request and response.
Latency | Sandbox mode: 3–20 s per call (LLM-backed). Passthrough: ~50–200 ms overhead.
source values | odyssey, injected, passthrough, error, or transport_error.
validation field | Present only when source equals odyssey; this is the simulator structured-output validation result.

Retryable errors

Some proxy responses represent transient platform conditions rather than tool failures. Retry these responses with backoff. The SDK proxy_call handles this automatically with up to four attempts and exponential backoff with jitter (roughly 0.5 to 2.5 s per wait, capped at 4 s).
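
For ports that cannot use the SDK, a retry loop along these lines may be enough. This is a sketch under stated assumptions, not the SDK's actual algorithm: call_with_retry is a hypothetical helper, the caller decides what counts as retryable (a 429 is the one case named on this page), and the backoff constants are chosen to stay inside the figures quoted above.

```python
import random
import time


def call_with_retry(call, is_retryable, attempts=4, base_s=0.5, cap_s=4.0,
                    sleep=time.sleep, rng=random.random):
    """Invoke call() until is_retryable(result) is False or attempts run out."""
    result = call()
    for attempt in range(1, attempts):
        if not is_retryable(result):
            return result
        # Exponential step (0.5, 1, 2 s) plus up to 50% jitter, capped at cap_s.
        wait = min(cap_s, base_s * (2 ** (attempt - 1)) * (1.0 + 0.5 * rng()))
        sleep(wait)
        result = call()
    # Either a good result or the last retryable one; the caller decides.
    return result
```

The sleep and rng parameters are injectable so the loop can be exercised in tests without real delays.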

Multi-agent attribution (optional)

If the system under test is multi-agent (for example, a supervisor delegating to specialists or peer agents), add the optional header below to attribute each proxy call to the acting sub-agent:

Header | Value
X-Pipelines-Actor-Id | The acting sub-agent's actor_id: a delimited call path (e.g. supervisor/refunds).

Proxy enforcement rules:

  • Header only. An actor_id key in the tool-arguments body is a real tool parameter and is never consumed as metadata. The header is authoritative for attribution, so a tool that legitimately takes an actor_id argument can be called under attribution — the body key flows through to your tool unchanged and never conflicts with the header.
  • A malformed label is rejected with 400 actor_id_invalid; out-of-catalog labels are accepted (the UI flags them as undeclared).
  • Absent header ⇒ single-agent shape — byte-identical to a port that never sets it.
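
As a sketch of the header-only rule, the helper below adds the attribution header when an actor is given and otherwise produces the single-agent header set. proxy_headers is a hypothetical name; the header names come from this page.

```python
from typing import Optional


def proxy_headers(run_token: str, actor_id: Optional[str] = None) -> dict:
    """Build the headers for one proxy call, with optional attribution."""
    headers = {"Authorization": f"Bearer {run_token}"}
    if actor_id is not None:
        # Attribution travels only in the header, never in the arguments body.
        headers["X-Pipelines-Actor-Id"] = actor_id
    return headers
```

A tool that takes its own actor_id argument keeps that key in the JSON body; the two never interact.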

Record control transfers between sub-agents by POSTing handoff trace events. (Python ports get all of this automatically from the SDK — this header is only for from-scratch / non-Python runtimes.)

Step 3: Wrap the agent in one HTTP endpoint

Pipelines opens one HTTP POST per dispatch to your endpoint.

Headers

Header | Value
Content-Type | application/json
X-Pipelines-Run-Token | Per-run bearer (opaque, secret).
X-Pipelines-Odyssey-Proxy-Url | Base URL of the per-run proxy.
X-Pipelines-Run-Id / X-Pipelines-Task-Id | Decimal-string ids. Safe to log.
X-Pipelines-Run-Token-Jti | Non-secret correlation id. Safe to log.

If you set Auth header name at registration, your custom auth header is sent alongside these.

Body

{
  "task_id": 42,
  "run_id": 17,
  "agent_id": 8,
  "input": {
    "task_id": 42,
    "user_instruction": "Refund order #4521 if it shipped more than 30 days ago.",
    "input": {
      /* the row's current_state, or {} */
    }
  },
  "odyssey_proxy_url": "https://api.example.com/api/odyssey-proxy/runs/abc",
  "run_token_jti": "2105977b27b84f12920087320921f1a5"
}

The nested body["input"]["input"] structure is intentional. The nesting is shown below:

{                                          // dispatch envelope
  "task_id": 42,
  "run_id": 17,
  "agent_id": 8,
  "input": {                               // task payload
    "task_id": 42,
    "user_instruction": "Refund order #4521 if ...",
    "input": {                             // row current_state (task_input)
      "customer_id": "cust_99",
      "order_id": "4521"
    }
  },
  "odyssey_proxy_url": "https://...",
  "run_token_jti": "2105977b..."
}

The SDK exposes the inner object as envelope.task_input. If you only need the current prompt, read body["input"]["user_instruction"].

The body and matching X-Pipelines-* headers carry the same correlation identifiers. Prefer values from the body when present, and fall back to headers if an edge proxy strips a field.
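
The body-first, header-fallback rule might look like the sketch below. resolve_dispatch_context is a hypothetical helper. It assumes the run token is read from the header only, since the sample body above carries the jti but not the token itself. Note that ids taken from the body are integers while ids taken from headers are decimal strings.

```python
def resolve_dispatch_context(body: dict, headers: dict) -> dict:
    """Prefer body fields; fall back to the matching X-Pipelines-* header."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}

    def pick(body_key, header_name):
        value = body.get(body_key)
        return value if value not in (None, "") else h.get(header_name)

    return {
        "proxy_url": pick("odyssey_proxy_url", "x-pipelines-odyssey-proxy-url"),
        "run_id": pick("run_id", "x-pipelines-run-id"),
        "task_id": pick("task_id", "x-pipelines-task-id"),
        "run_token_jti": pick("run_token_jti", "x-pipelines-run-token-jti"),
        # Assumption: the secret token arrives only as a header.
        "run_token": h.get("x-pipelines-run-token"),
    }
```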

For code-mode agents (sandboxed Python), the same values are also available as environment variables: PIPELINES_ODYSSEY_PROXY_URL, PIPELINES_RUN_TOKEN, PIPELINES_RUN_TOKEN_JTI, PIPELINES_RUN_ID, PIPELINES_TASK_ID, and _PIPELINES_TASK_INPUT_JSON.

What your wrapper must do

  1. Read odyssey_proxy_url from the body (falling back to the X-Pipelines-Odyssey-Proxy-Url header) and the run token from the X-Pipelines-Run-Token header.
  2. Build the agent inside the request handler so tools close over request-scoped values, because each run has its own bearer token.
  3. Run the agent using body["input"]["user_instruction"].
  4. Return a response object with a non-empty final_response value. Optionally include messages and metadata for rich trace rendering.
  5. Respond within run_timeout_s (default 300 s, maximum 1800 s).

Response envelope

Bare minimum:

{ "final_response": "The refund of $79.50 has been issued for order #4521." }

Rich mode (renders messages + reasoning blocks in the trace tab):

{
  "final_response": "The refund of $79.50 has been issued.",
  "messages": [
    { "role": "user", "content": "Refund order #4521..." },
    {
      "role": "assistant",
      "tool_calls": [{ "id": "c1", "name": "get_order", "arguments": { "order_id": "4521" } }]
    },
    { "role": "tool", "tool_call_id": "c1", "content": "{\"status\":\"shipped\"}" },
    { "role": "assistant", "content": "The refund of $79.50 has been issued." }
  ],
  "metadata": {
    "model": "claude-sonnet-4-5",
    "total_input_tokens": 1842,
    "total_output_tokens": 217,
    "agent_runtime_ms": 4815
  }
}

Malformed messages / metadata are dropped to null with a soft_warnings entry. With no response contract block, the run is still graded against final_response; with the rich transcript guard enabled, dropping messages can fail the run if no gradeable assistant message remains.
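
A small builder can keep the envelope in the shape described above. This is a sketch: build_response is a hypothetical helper, and treating a whitespace-only string as empty is an assumption that is stricter than the page states.

```python
def build_response(final_response, messages=None, metadata=None) -> dict:
    """Assemble the response envelope, omitting optional keys that are unset."""
    # Assumption: whitespace-only text is treated as empty here.
    if not isinstance(final_response, str) or not final_response.strip():
        raise ValueError("final_response must be a non-empty string")
    response = {"final_response": final_response}
    if messages is not None:
        response["messages"] = messages
    if metadata is not None:
        response["metadata"] = metadata
    return response
```

Omitting messages and metadata entirely yields the bare-minimum form.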

Response contract

By default, only final_response is required. messages is optional: if it is missing or malformed, the run is still graded and the trace falls back to the thin final-response view with a soft warning.

After your port reliably returns response messages, you can enable Require a gradeable rich transcript as a CI or regression guard. The exact UI location, API config, and valid messages examples live in Response contract.

Step 4: Add inbound auth

By default, any caller that can reach the endpoint can dispatch. Configure a static bearer token:

import os, secrets
from fastapi import Header, HTTPException

AGENT_TOKEN = os.environ["AGENT_TOKEN"]

def _require_pipelines_auth(authorization: str | None) -> None:
    expected = f"Bearer {AGENT_TOKEN}"
    if not authorization or not secrets.compare_digest(authorization, expected):
        raise HTTPException(status_code=401, detail="missing or invalid bearer token")

Generate a token:

python -c 'import secrets; print(secrets.token_urlsafe(32))'

In the registration form: Auth header name = Authorization, Auth header value = Bearer <token>.

Step 5: Short-circuit the ping probe

Important: Handle the {"ping": true} body explicitly. Otherwise, Test connection can fail with a 400 or 500 from envelope parsing, because the ping body omits regular dispatch fields.

The registration form's Test connection button (and internal health probes) POST exactly this body to your endpoint with the configured auth header:

{ "ping": true }

Run inbound auth first, then short-circuit the ping body to a 2xx. In practice, ping handling should execute after auth and before envelope parsing and dispatch:

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/dispatch")
async def dispatch(request: Request):  # other parameters elided
    _require_pipelines_auth(request.headers.get("authorization"))  # 1. auth first
    body = await request.json()
    if body == {"ping": True}:                                     # 2. then ping
        return {"ok": True}
    # ... real dispatch path below ...

Why authenticate before ping? A successful Test connection should verify both endpoint reachability and bearer configuration. The platform preflight sends ping with the configured auth header, so a correctly configured agent still returns 200. If ping runs before auth, Test connection can pass with an invalid bearer and defer the misconfiguration to the first real dispatch 401.

The Pipelines SDK's register_dispatch_route runs auth then this ping short-circuit for you (see its handler.py); from-scratch ports must add both, in this order.

Step 6: Sandbox vs passthrough

This is a registration-time setting, not runtime branching inside tool code. Decision guidance:

  • Read-only external APIs (Wikipedia, weather, web search) — passthrough.
  • Anything whose state another simulated tool reads — sandbox.
  • Real production-money mutators — sandbox (or passthrough to a non-prod clone you've registered).

Flip individual tools via PUT /api/agents/{id} after registration. Full decision content: Tools schema.

Porting into code mode (sandbox uploads) instead

The previous sections describe externally hosted HTTP dispatch, where Pipelines posts each run to your server endpoint. In code mode, no server is required. You upload source code by paste, zip, or git reference, and the platform runs it in an E2B sandbox by invoking the entrypoint directly.

The contract is a single function the platform invokes per dispatch:

def run(task_input, *, proxy_url: str, run_token: str) -> dict:
    # ... your agent runs here ...
    return {"final_response": "the agent's answer"}
    # optional extra keys: "messages" (rich-mode transcript), "metadata"

task_input is a mapping that carries:

Key | Value
user_instruction | The user's prompt, when the task seeds one (else "").
input | The task's free-form input (the row's current_state, or {}).
task_id | Decimal task id.

proxy_url and run_token are the same per-run proxy base URL and bearer you'd otherwise read from the X-Pipelines-… headers, handed to you as arguments. To scaffold a working starting point, run pipelines odyssey init --mode code.

The sandbox already has a running event loop. Code agents execute inside the sandbox's Jupyter kernel, whose asyncio loop is already running on the main thread. Calling asyncio.run(...) or openai-agents' Runner.run_sync(...) from there raises RuntimeError: asyncio.run() cannot be called from a running event loop (and Runner.run_sync raises the same, since it calls asyncio.run under the hood).

The escape is to run your async code on a fresh thread, which has no loop of its own, so asyncio.run is legal there:

import asyncio, threading

def run_async_in_thread(coro_factory):
    out = {}
    def _target():
        try:
            out["result"] = asyncio.run(coro_factory())
        except BaseException as exc:
            out["error"] = exc
    t = threading.Thread(target=_target)
    t.start(); t.join()
    if "error" in out:
        raise out["error"]
    return out["result"]

result = run_async_in_thread(lambda: Runner.run(agent, instruction))

ContextVars don't cross threads. If you vendor the Pipelines SDK and use its proxied tools, those tools read the per-run Envelope off a ContextVar. A ContextVar set on the main thread is invisible to the worker thread above — so the bind has to happen inside _target:

from pipelines.odyssey import Envelope
from pipelines.odyssey.context import set_current

def _target():
    with set_current(Envelope.from_env()):
        out["result"] = asyncio.run(coro_factory())

No offload, no manual bind. When your code does not hop to a thread — plain straight-line sync code — you don't bind anything: the platform's driver auto-binds the envelope around your entrypoint whenever the SDK is importable in the sandbox. The manual set_current is only for the thread-offload case, where that auto-bind can't reach.

The sandbox does not include the Pipelines SDK by default. There are two supported proxy access patterns:

  • Stdlib-only proxy calls. Hit the wire contract directly. Send an HTTP POST to the per-run proxy tools endpoint with a JSON body equal to the tool arguments and an Authorization bearer header carrying the run token. The reply wraps your tool output in a response field, so read the final tool payload from that response field. No SDK is required.
  • Vendor the SDK with your upload. Bundle pipelines into the source you upload (then the Envelope / set_current helpers and proxied tools above are available).
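
The stdlib-only pattern can be sketched with urllib alone. The names build_tool_request and proxy_call are hypothetical, the 120-second timeout mirrors the httpx shim earlier on this page, and get_order is the example tool used elsewhere here. The run function assumes the task's input carries an order_id.

```python
import json
import urllib.request


def build_tool_request(proxy_url: str, run_token: str, name: str, args: dict):
    """Build the POST for one tool call; kept separate so it can be inspected."""
    return urllib.request.Request(
        f"{proxy_url.rstrip('/')}/tools/{name}",
        data=json.dumps(args).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {run_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def proxy_call(proxy_url: str, run_token: str, name: str, args: dict,
               timeout: float = 120.0):
    """Send the tool call and return the unwrapped response field."""
    req = build_tool_request(proxy_url, run_token, name, args)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        envelope = json.loads(resp.read().decode("utf-8"))
    return envelope["response"]


def run(task_input, *, proxy_url: str, run_token: str) -> dict:
    # Illustrative only: look up one order and report what came back.
    order_id = (task_input.get("input") or {}).get("order_id")
    order = proxy_call(proxy_url, run_token, "get_order", {"order_id": order_id})
    return {"final_response": f"Order lookup returned: {order}"}
```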

To verify outbound connectivity from agent runtime to the per-run proxy:

curl -X POST https://api.pipelines.tech/api/odyssey-proxy/tools/__reachability_probe__ \
  -H "Authorization: Bearer fake-token" \
  -H "Content-Type: application/json" \
  -d '{}'

HTTP 401 is the expected success signal for this probe, because the proxy is reachable and rejects the fake token. Connection or DNS errors indicate that runtime egress to the proxy is unavailable.

Python note: pipelines odyssey doctor --app app:app runs these checks as one pass, including app import, ping ordering with auth, outbound proxy probe, and tools_schema validity, with optional dataset CSV checks. agents.create_http_agent(name=..., endpoint_url=..., api_key=..., tools_schema=...) registers the agent without hand-authoring the POST /api/agents body. See Agent SDK.

Full reference wrapper

The reference FastAPI wrapper below dispatches every tool through the proxy, short-circuits ping, enforces inbound authentication, returns the v1 response envelope, and unwraps proxy responses correctly. The example is approximately 75 lines and depends only on httpx and fastapi.

import json
import os
import secrets
from typing import Any

import httpx
from fastapi import FastAPI, Header, HTTPException, Request

app = FastAPI()
AGENT_TOKEN = os.environ["AGENT_TOKEN"]


def _proxy_call(proxy_url: str, run_token: str, name: str, args: dict) -> Any:
    """One tool call → POST to the per-run proxy, return the unwrapped result."""
    r = httpx.post(
        f"{proxy_url.rstrip('/')}/tools/{name}",
        json=args,
        headers={"Authorization": f"Bearer {run_token}"},
        timeout=120.0,
    )
    try:
        r.raise_for_status()
    except httpx.HTTPStatusError:
        return {"error": r.text}
    return r.json()["response"]   # NOT the full envelope


def _require_inbound_auth(authorization: str | None) -> None:
    expected = f"Bearer {AGENT_TOKEN}"
    if not authorization or not secrets.compare_digest(authorization, expected):
        raise HTTPException(status_code=401, detail="missing or invalid bearer token")


@app.post("/dispatch")
async def dispatch(
    request: Request,
    authorization: str | None = Header(default=None),
    x_pipelines_run_token: str | None = Header(default=None),
    x_pipelines_odyssey_proxy_url: str | None = Header(default=None),
):
    # 1. Inbound auth FIRST, so a wrong bearer surfaces as 401 even on
    #    the ping probe. A green Test connection then means "reachable
    #    AND authenticated" — matching what register_dispatch_route does.
    _require_inbound_auth(authorization)

    body = await request.json()

    # 2. Ping short-circuit — after auth, before envelope parsing.
    if body == {"ping": True}:
        return {"ok": True}

    # 3. Pull dispatch context out of body (preferred) or headers (fallback).
    proxy_url = body.get("odyssey_proxy_url") or x_pipelines_odyssey_proxy_url
    run_token = x_pipelines_run_token
    if not proxy_url or not run_token:
        raise HTTPException(400, "missing proxy URL or run token")

    # 4. Read the user prompt. The inner "input" is the row's current_state;
    #    most agents only need user_instruction.
    inner = body.get("input") or {}
    user_instruction = inner.get("user_instruction") or ""
    task_input = inner.get("input") or {}   # row current_state, may be {}

    # 5. Build the agent so its tool calls close over (proxy_url, run_token).
    #    run_my_agent is a placeholder: replace the call below with your real
    #    tool-use loop, calling _proxy_call(proxy_url, run_token, tool_name, args)
    #    for every tool.
    final_response = run_my_agent(
        user_instruction=user_instruction,
        task_input=task_input,
        proxy_call=lambda name, args: _proxy_call(proxy_url, run_token, name, args),
    )

    # 6. Required v1 shape. Add `messages` + `metadata` for rich-mode trace.
    return {"final_response": final_response or "(no final response)"}

For TypeScript, Go, or Rust runtimes, apply the same six steps and translate the proxy shim, ping short-circuit, and response envelope exactly. The platform contract is wire-format based.

Appendix — Rich-mode messages shape

If you return only final_response, this appendix is optional. That field alone is sufficient for grading. Rich-mode fields improve trace rendering by showing reasoning and tool calls inline.

Per-message structure inside messages[]:

{
  "role": "system" | "user" | "assistant" | "tool",
  "content": "<string>" | [<parts>] | null,
  "tool_call_id": "<string>",
  "tool_calls": [
    { "id": "<string>", "name": "<tool name>", "arguments": <object or JSON-encoded string> }
  ],
  "thinking": [ { /* free-form */ } ]
}

  • role — one of system / user / assistant / tool.
  • content — string, list of content parts, or null (e.g. an assistant turn that's purely a tool call).
  • tool_call_id — required on role: "tool" messages; matches the id of the assistant tool_calls[] entry it answers.
  • tool_calls[] — present on role: "assistant" messages that invoke tools. Accepts either the flat form above or the OpenAI Chat Completions nested form:

    { "id": "<string>", "type": "function", "function": { "name": "...", "arguments": "..." } }

  • thinking[] — opaque list of reasoning blocks (Anthropic thinking, OpenAI reasoning). The trace tab renders these collapsed under each assistant turn. Free-form contents; the most common shape is [{ "type": "text", "text": "..." }].
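
A local pre-flight check of a transcript could follow the rules above. The sketch below is not the platform's validator: normalize_tool_call and check_messages are hypothetical helpers that only cover the role set, the two accepted tool_calls forms, and the tool_call_id link.

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}


def normalize_tool_call(tool_call: dict) -> dict:
    """Return the flat {id, name, arguments} form for either accepted shape."""
    fn = tool_call.get("function")
    if isinstance(fn, dict):  # OpenAI Chat Completions nested form
        return {"id": tool_call["id"], "name": fn["name"],
                "arguments": fn["arguments"]}
    return {"id": tool_call["id"], "name": tool_call["name"],
            "arguments": tool_call["arguments"]}


def check_messages(messages: list) -> list:
    """Return a list of problems; empty means the transcript looks well-formed."""
    problems = []
    seen_call_ids = set()
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            problems.append(f"messages[{i}]: unknown role {role!r}")
        if role == "assistant":
            for tc in msg.get("tool_calls") or []:
                seen_call_ids.add(normalize_tool_call(tc)["id"])
        if role == "tool" and msg.get("tool_call_id") not in seen_call_ids:
            problems.append(
                f"messages[{i}]: tool_call_id does not match an earlier tool call"
            )
    return problems
```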

metadata first-class keys (others land in a generic key/value table):

Key | Notes
model | The model id used by the agent (e.g. claude-sonnet-4-5).
system_prompt_id | Free-form identifier for the prompt version.
total_input_tokens / total_output_tokens | Integer token counts.
agent_runtime_ms | Wall-clock duration of the agent's loop.

To forward intermediate reasoning during execution instead of returning it only at completion, use the side-channel trace events path. See Trace events.

JSON Schema for the full response envelope: /schemas/agent-response.json.