OpenResponses Client

The OpenResponsesClient uses OpenAI's Responses API (POST /v1/responses) instead of the Chat Completions API. This client is useful for providers that implement the newer Responses API format.

Key Differences from ChatCompletionsClient

| Feature | ChatCompletionsClient | OpenResponsesClient |
| --- | --- | --- |
| API endpoint | chat.completions.create() | responses.create() |
| System messages | Included in messages array | Passed as instructions parameter |
| Message format | {"role": "user", "content": [...]} | {"role": "user", "content": [{"type": "input_text", ...}]} |
| Tool call IDs | tool_call_id | call_id |
| Reasoning config | reasoning_effort param | reasoning: {"effort": ...} object |
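
The message-format difference is easiest to see side by side. A minimal sketch of the same user turn in each wire format (plain dicts, no client involved):

# Chat Completions API: generic "text" content blocks
chat_completions_message = {
    "role": "user",
    "content": [{"type": "text", "text": "Hello"}],
}

# Responses API: direction-specific "input_text" blocks
responses_input_item = {
    "role": "user",
    "content": [{"type": "input_text", "text": "Hello"}],
}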

Usage

For models that support extended thinking (like o1/o3), you can configure the reasoning effort:

import asyncio

from stirrup import Agent
from stirrup.clients import OpenResponsesClient


async def main() -> None:
    """Run an agent using the OpenResponses API with a reasoning model."""

    # Create client using OpenResponsesClient
    # Uses the OpenAI Responses API (responses.create)
    # For reasoning models, you can set reasoning_effort
    client = OpenResponsesClient(
        model="gpt-5.2",
        reasoning_effort="medium",
    )

    agent = Agent(client=client, name="reasoning-agent", max_turns=19)

    async with agent.session(output_dir="output/open_responses_example") as session:
        _finish_params, _history, _metadata = await session.run(
            "Plan a software release with these tasks: Design (5 days), Backend (10 days, needs Design), "
            "Frontend (8 days, needs Design), Testing (4 days, needs Backend and Frontend), "
            "Documentation (3 days, can start after Backend). Two developers are available. "
            "What's the minimum time to complete? Output an Excel Gantt chart with the schedule."
        )


if __name__ == "__main__":
    asyncio.run(main())

Constructor Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | required | Model identifier (e.g., "gpt-4o", "o1") |
| max_tokens | int | 64_000 | Maximum output tokens |
| base_url | str \| None | None | Custom API base URL |
| api_key | str \| None | None | API key (falls back to OPENAI_API_KEY env var) |
| reasoning_effort | str \| None | None | Reasoning effort for o1/o3 models: "low", "medium", "high" |
| timeout | float \| None | None | Request timeout in seconds |
| max_retries | int | 2 | Number of retries for transient errors |
| instructions | str \| None | None | Default system instructions |
| kwargs | dict \| None | None | Additional arguments passed to responses.create() |
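
Anything the constructor does not model directly can be forwarded through kwargs, which is merged into every responses.create() call. A hedged sketch (temperature is an assumed passthrough parameter, not one this client validates):

from stirrup.clients import OpenResponsesClient

client = OpenResponsesClient(
    model="o3",
    reasoning_effort="high",
    kwargs={"temperature": 0.2},  # forwarded verbatim to responses.create()
)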

API Reference

stirrup.clients.open_responses_client

OpenAI SDK-based LLM client for the Responses API.

This client uses the official OpenAI Python SDK's responses.create() method, supporting both OpenAI's API and any OpenAI-compatible endpoint that implements the Responses API via the base_url parameter.

__all__ module-attribute

__all__ = ['OpenResponsesClient']

LOGGER module-attribute

LOGGER = getLogger(__name__)

ChatMessage

ChatMessage = Annotated[
    SystemMessage
    | UserMessage
    | AssistantMessage
    | ToolMessage,
    Field(discriminator="role"),
]

Discriminated union of all message types, automatically parsed based on the role field.
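
Because the union is discriminated, a raw dict can be validated into the right concrete model in one step. A minimal sketch using Pydantic's TypeAdapter (the stirrup.core.models import path is assumed from the source listings on this page):

from pydantic import TypeAdapter

from stirrup.core.models import ChatMessage  # import path assumed

adapter = TypeAdapter(ChatMessage)

# The "role" discriminator selects UserMessage here without trying the other models.
msg = adapter.validate_python({"role": "user", "content": "Hello"})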

Content

Content = list[ContentBlock] | str

Message content: either a plain string or list of mixed content blocks.

ContextOverflowError

Bases: Exception

Raised when LLM context window is exceeded (max_tokens or length finish_reason).
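
A typical response is to shrink the conversation and retry. A hedged sketch (the trimming policy here is illustrative, not part of stirrup):

from stirrup.core.models import ContextOverflowError  # import path assumed


async def generate_with_trimming(client, messages, tools):
    """Retry once with a shortened history when the context window is exceeded."""
    try:
        return await client.generate(messages, tools)
    except ContextOverflowError:
        # Keep the opening message plus the most recent turns and try again.
        return await client.generate(messages[:1] + messages[-4:], tools)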

AssistantMessage

Bases: BaseModel

LLM response message with optional tool calls and token usage tracking.

AudioContentBlock

Bases: BinaryContentBlock

Audio content supporting MPEG, WAV, AAC, and other common audio formats.

to_base64_url

to_base64_url(bitrate: str = '192k') -> str

Transcode to MP3 and return base64 data URL.

Source code in src/stirrup/core/models.py
def to_base64_url(self, bitrate: str = "192k") -> str:
    """Transcode to MP3 and return base64 data URL."""
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=UserWarning, module="moviepy.*")
        with NamedTemporaryFile(suffix=".bin") as fin, NamedTemporaryFile(suffix=".mp3") as fout:
            fin.write(self.data)
            fin.flush()
            clip = AudioFileClip(fin.name)
            clip.write_audiofile(fout.name, codec="libmp3lame", bitrate=bitrate, logger=None)
            clip.close()
            return f"data:audio/mpeg;base64,{b64encode(fout.read()).decode()}"

EmptyParams

Bases: BaseModel

Empty parameter model for tools that don't require parameters.

ImageContentBlock

Bases: BinaryContentBlock

Image content supporting PNG, JPEG, WebP, PSD formats with automatic downscaling.

to_base64_url

to_base64_url(
    max_pixels: int | None = RESOLUTION_1MP,
) -> str

Convert image to base64 data URL, optionally resizing to max pixel count.

Source code in src/stirrup/core/models.py
def to_base64_url(self, max_pixels: int | None = RESOLUTION_1MP) -> str:
    """Convert image to base64 data URL, optionally resizing to max pixel count."""
    img: Image.Image = Image.open(BytesIO(self.data))
    if max_pixels is not None and img.width * img.height > max_pixels:
        tw, th = downscale_image(img.width, img.height, max_pixels)
        img.thumbnail((tw, th), Image.Resampling.LANCZOS)
    if img.mode != "RGB":
        img = img.convert("RGB")
    buf = BytesIO()
    img.save(buf, format="PNG")
    return f"data:image/png;base64,{b64encode(buf.getvalue()).decode()}"

LLMClient

Bases: Protocol

Protocol defining the interface for LLM client implementations.

Any LLM client must implement this protocol to work with the Agent class. Provides text generation with tool support and model capability inspection.
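
A minimal conforming implementation, assuming the protocol's surface is the generate() method plus the max_tokens and model_slug properties documented for OpenResponsesClient below (constructor defaults here are illustrative):

from stirrup.core.models import (  # import path assumed
    AssistantMessage,
    ChatMessage,
    TokenUsage,
    Tool,
)


class EchoClient:
    """Toy LLMClient that echoes the last message instead of calling an API."""

    def __init__(self, model: str = "echo", max_tokens: int = 1024) -> None:
        self._model = model
        self._max_tokens = max_tokens

    @property
    def max_tokens(self) -> int:
        return self._max_tokens

    @property
    def model_slug(self) -> str:
        return self._model

    async def generate(
        self, messages: list[ChatMessage], tools: dict[str, Tool]
    ) -> AssistantMessage:
        # Echo the last message's text; a real client would call an API here.
        last = messages[-1].content
        text = last if isinstance(last, str) else "(non-text content)"
        return AssistantMessage(
            content=text,
            tool_calls=[],
            token_usage=TokenUsage(input=0, output=0, reasoning=0),
        )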

Reasoning

Bases: BaseModel

Extended thinking/reasoning content from models that support chain-of-thought reasoning.

SystemMessage

Bases: BaseModel

System-level instructions and context for the LLM.

TokenUsage

Bases: BaseModel

Token counts for LLM usage (input, output, reasoning tokens).

total property

total: int

Total token count across input, output, and reasoning.

__add__

__add__(other: TokenUsage) -> TokenUsage

Add two TokenUsage objects together, summing each field independently.

Source code in src/stirrup/core/models.py
def __add__(self, other: "TokenUsage") -> "TokenUsage":
    """Add two TokenUsage objects together, summing each field independently."""
    return TokenUsage(
        input=self.input + other.input,
        output=self.output + other.output,
        reasoning=self.reasoning + other.reasoning,
    )
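
Per-turn usage can therefore be accumulated across a session, and total sums all three fields:

from stirrup.core.models import TokenUsage  # import path assumed

turn_1 = TokenUsage(input=1_200, output=300, reasoning=0)
turn_2 = TokenUsage(input=1_500, output=450, reasoning=800)

session_usage = turn_1 + turn_2
print(session_usage.total)  # 4250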

Tool

Bases: BaseModel

Tool definition with name, description, parameter schema, and executor function.

Generic over:

- P: Parameter model type (Pydantic BaseModel subclass, or EmptyParams for parameterless tools)
- M: Metadata type (should implement Addable for aggregation; use None for tools without metadata)

Tools are simple, stateless callables. For tools requiring lifecycle management (setup/teardown, resource pooling), use a ToolProvider instead.

Example with parameters:
class CalcParams(BaseModel):
    expression: str

calc_tool = Tool[CalcParams, None](
    name="calc",
    description="Evaluate math",
    parameters=CalcParams,
    executor=lambda p: ToolResult(content=str(eval(p.expression))),
)

Example without parameters (uses EmptyParams by default):

time_tool = Tool[EmptyParams, None](
    name="time",
    description="Get current time",
    executor=lambda _: ToolResult(content=datetime.now().isoformat()),
)

ToolCall

Bases: BaseModel

Represents a tool invocation request from the LLM.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| name | str | Name of the tool to invoke |
| arguments | str | JSON string containing tool parameters |
| tool_call_id | str \| None | Unique identifier for tracking this tool call and its result |

ToolMessage

Bases: BaseModel

Tool execution result returned to the LLM.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| role | Literal['tool'] | Always "tool" |
| content | Content | The tool result content |
| tool_call_id | str \| None | ID linking this result to the corresponding tool call |
| name | str \| None | Name of the tool that was called |
| args_was_valid | bool | Whether the tool arguments were valid |
| success | bool | Whether the tool executed successfully (used by finish tool to control termination) |
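
The tool_call_id is what ties a result back to the request that produced it. A small sketch pairing the two models (field defaults assumed where not documented):

from stirrup.core.models import ToolCall, ToolMessage  # import path assumed

call = ToolCall(tool_call_id="call_123", name="calc", arguments='{"expression": "2+2"}')

result = ToolMessage(
    role="tool",
    content="4",
    tool_call_id=call.tool_call_id,  # links the result to the call above
    name=call.name,
    args_was_valid=True,
    success=True,
)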

UserMessage

Bases: BaseModel

User input message to the LLM.

VideoContentBlock

Bases: BinaryContentBlock

MP4 video content with automatic transcoding and resolution downscaling.

to_base64_url

to_base64_url(
    max_pixels: int | None = RESOLUTION_480P,
    fps: int | None = None,
) -> str

Transcode to MP4 and return base64 data URL.

Source code in src/stirrup/core/models.py
def to_base64_url(self, max_pixels: int | None = RESOLUTION_480P, fps: int | None = None) -> str:
    """Transcode to MP4 and return base64 data URL."""
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=UserWarning, module="moviepy.*")
        with NamedTemporaryFile(suffix=".mp4") as fin, NamedTemporaryFile(suffix=".mp4") as fout:
            fin.write(self.data)
            fin.flush()
            clip = VideoFileClip(fin.name)
            tw, th = downscale_image(int(clip.w), int(clip.h), max_pixels)
            clip = clip.with_effects([Resize(new_size=(tw, th))])

            clip.write_videofile(
                fout.name,
                codec="libx264",
                fps=fps,
                audio=clip.audio is not None,
                audio_codec="aac",
                preset="veryfast",
                logger=None,
            )
            clip.close()
            return f"data:video/mp4;base64,{b64encode(fout.read()).decode()}"

OpenResponsesClient

OpenResponsesClient(
    model: str,
    max_tokens: int = 64000,
    *,
    base_url: str | None = None,
    api_key: str | None = None,
    reasoning_effort: str | None = None,
    timeout: float | None = None,
    max_retries: int = 2,
    instructions: str | None = None,
    kwargs: dict[str, Any] | None = None,
)

Bases: LLMClient

OpenAI SDK-based client using the Responses API.

Uses the official OpenAI Python SDK's responses.create() method. Supports custom base_url for OpenAI-compatible providers that implement the Responses API.

Includes automatic retries for transient failures and token usage tracking.

Example

Standard OpenAI usage

client = OpenResponsesClient(model="gpt-4o", max_tokens=128_000)

Custom OpenAI-compatible endpoint

client = OpenResponsesClient(
    model="gpt-4o",
    base_url="http://localhost:8000/v1",
    api_key="your-api-key",
)

Initialize OpenAI SDK client with model configuration for Responses API.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | str | Model identifier (e.g., 'gpt-4o', 'o1-preview'). | required |
| max_tokens | int | Maximum output tokens. Defaults to 64,000. | 64000 |
| base_url | str \| None | API base URL. If None, uses OpenAI's standard URL. Use for OpenAI-compatible providers. | None |
| api_key | str \| None | API key for authentication. If None, reads from the OPENAI_API_KEY environment variable. | None |
| reasoning_effort | str \| None | Reasoning effort level for extended thinking models (e.g., 'low', 'medium', 'high'). Only used with o1/o3 style models. | None |
| timeout | float \| None | Request timeout in seconds. If None, uses OpenAI SDK default. | None |
| max_retries | int | Number of retries for transient errors. Defaults to 2. | 2 |
| instructions | str \| None | Default system-level instructions. Can be overridden by SystemMessage in the messages list. | None |
| kwargs | dict[str, Any] \| None | Additional arguments passed to responses.create(). | None |

Source code in src/stirrup/clients/open_responses_client.py
def __init__(
    self,
    model: str,
    max_tokens: int = 64_000,
    *,
    base_url: str | None = None,
    api_key: str | None = None,
    reasoning_effort: str | None = None,
    timeout: float | None = None,
    max_retries: int = 2,
    instructions: str | None = None,
    kwargs: dict[str, Any] | None = None,
) -> None:
    """Initialize OpenAI SDK client with model configuration for Responses API.

    Args:
        model: Model identifier (e.g., 'gpt-4o', 'o1-preview').
        max_tokens: Maximum output tokens. Defaults to 64,000.
        base_url: API base URL. If None, uses OpenAI's standard URL.
            Use for OpenAI-compatible providers.
        api_key: API key for authentication. If None, reads from the OPENAI_API_KEY
            environment variable.
        reasoning_effort: Reasoning effort level for extended thinking models
            (e.g., 'low', 'medium', 'high'). Only used with o1/o3 style models.
        timeout: Request timeout in seconds. If None, uses OpenAI SDK default.
        max_retries: Number of retries for transient errors. Defaults to 2.
        instructions: Default system-level instructions. Can be overridden by
            SystemMessage in the messages list.
        kwargs: Additional arguments passed to responses.create().
    """
    self._model = model
    self._max_tokens = max_tokens
    self._reasoning_effort = reasoning_effort
    self._default_instructions = instructions
    self._kwargs = kwargs or {}

    # Initialize AsyncOpenAI client
    resolved_api_key = api_key or os.environ.get("OPENAI_API_KEY")

    # Strip /responses suffix if present - SDK appends it automatically
    resolved_base_url = base_url
    if resolved_base_url and resolved_base_url.rstrip("/").endswith("/responses"):
        resolved_base_url = resolved_base_url.rstrip("/").removesuffix("/responses")

    self._client = AsyncOpenAI(
        api_key=resolved_api_key,
        base_url=resolved_base_url,
        timeout=timeout,
        max_retries=max_retries,
    )

max_tokens property

max_tokens: int

Maximum output tokens.

model_slug property

model_slug: str

Model identifier.

generate async

generate(
    messages: list[ChatMessage], tools: dict[str, Tool]
) -> AssistantMessage

Generate assistant response with optional tool calls using Responses API.

Retries up to 3 times on transient errors (connection, timeout, rate limit, internal server errors) with exponential backoff.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| messages | list[ChatMessage] | List of conversation messages. | required |
| tools | dict[str, Tool] | Dictionary mapping tool names to Tool objects. | required |

Returns:

| Type | Description |
| --- | --- |
| AssistantMessage | AssistantMessage containing the model's response, any tool calls, and token usage statistics. |

Raises:

| Type | Description |
| --- | --- |
| ContextOverflowError | If the response is incomplete due to token limits. |

Source code in src/stirrup/clients/open_responses_client.py
@retry(
    retry=retry_if_exception_type(
        (
            APIConnectionError,
            APITimeoutError,
            RateLimitError,
            InternalServerError,
        )
    ),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
)
async def generate(
    self,
    messages: list[ChatMessage],
    tools: dict[str, Tool],
) -> AssistantMessage:
    """Generate assistant response with optional tool calls using Responses API.

    Retries up to 3 times on transient errors (connection, timeout, rate limit,
    internal server errors) with exponential backoff.

    Args:
        messages: List of conversation messages.
        tools: Dictionary mapping tool names to Tool objects.

    Returns:
        AssistantMessage containing the model's response, any tool calls,
        and token usage statistics.

    Raises:
        ContextOverflowError: If the response is incomplete due to token limits.
    """
    # Convert messages to OpenResponses format
    instructions, input_items = _to_open_responses_input(messages)

    # Use provided instructions or fall back to default
    final_instructions = instructions or self._default_instructions

    # Build request kwargs
    request_kwargs: dict[str, Any] = {
        "model": self._model,
        "input": input_items,
        "max_output_tokens": self._max_tokens,
        **self._kwargs,
    }

    # Add instructions if present
    if final_instructions:
        request_kwargs["instructions"] = final_instructions

    # Add tools if provided
    if tools:
        request_kwargs["tools"] = _to_open_responses_tools(tools)
        request_kwargs["tool_choice"] = "auto"

    # Add reasoning effort if configured (for o1/o3 models)
    if self._reasoning_effort:
        request_kwargs["reasoning"] = {"effort": self._reasoning_effort}

    # Make API call
    response = await self._client.responses.create(**request_kwargs)

    # Check for incomplete response (context overflow)
    if response.status == "incomplete":
        stop_reason = getattr(response, "incomplete_details", None)
        raise ContextOverflowError(
            f"Response incomplete for model {self.model_slug}: {stop_reason}. "
            "Reduce max_tokens or message length and try again."
        )

    # Parse response output
    content, tool_calls, reasoning = _parse_response_output(response.output)

    # Parse token usage
    usage = response.usage
    input_tokens = usage.input_tokens if usage else 0
    output_tokens = usage.output_tokens if usage else 0

    # Handle reasoning tokens if available
    reasoning_tokens = 0
    if usage and hasattr(usage, "output_tokens_details") and usage.output_tokens_details:
        reasoning_tokens = getattr(usage.output_tokens_details, "reasoning_tokens", 0) or 0
        output_tokens = output_tokens - reasoning_tokens

    return AssistantMessage(
        reasoning=reasoning,
        content=content,
        tool_calls=tool_calls,
        token_usage=TokenUsage(
            input=input_tokens,
            output=output_tokens,
            reasoning=reasoning_tokens,
        ),
    )
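
A minimal end-to-end call, without tools (the UserMessage import path is assumed; OPENAI_API_KEY must be set):

import asyncio

from stirrup.clients import OpenResponsesClient
from stirrup.core.models import UserMessage  # import path assumed


async def main() -> None:
    client = OpenResponsesClient(model="gpt-4o")
    reply = await client.generate(
        [UserMessage(role="user", content="Say hi.")],
        tools={},  # no tools registered, so the model can only respond with text
    )
    print(reply.content, reply.token_usage.total)


asyncio.run(main())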

_content_to_open_responses_input

_content_to_open_responses_input(
    content: Content,
) -> list[dict[str, Any]]

Convert Content blocks to OpenResponses input content format.

Uses input_text for text content (vs output_text for responses).

Source code in src/stirrup/clients/open_responses_client.py
def _content_to_open_responses_input(content: Content) -> list[dict[str, Any]]:
    """Convert Content blocks to OpenResponses input content format.

    Uses input_text for text content (vs output_text for responses).
    """
    if isinstance(content, str):
        return [{"type": "input_text", "text": content}]

    out: list[dict[str, Any]] = []
    for block in content:
        if isinstance(block, str):
            out.append({"type": "input_text", "text": block})
        elif isinstance(block, ImageContentBlock):
            out.append({"type": "input_image", "image_url": block.to_base64_url()})
        elif isinstance(block, AudioContentBlock):
            out.append(
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": block.to_base64_url().split(",")[1],
                        "format": block.extension,
                    },
                }
            )
        elif isinstance(block, VideoContentBlock):
            out.append({"type": "input_file", "file_data": block.to_base64_url()})
        else:
            raise NotImplementedError(f"Unsupported content block: {type(block)}")
    return out
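
For instance, the plain-string case produces a single input_text block:

_content_to_open_responses_input("What is in this image?")
# -> [{"type": "input_text", "text": "What is in this image?"}]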

_content_to_open_responses_output

_content_to_open_responses_output(
    content: Content,
) -> list[dict[str, Any]]

Convert Content blocks to OpenResponses output content format.

Uses output_text for assistant message content.

Source code in src/stirrup/clients/open_responses_client.py
def _content_to_open_responses_output(content: Content) -> list[dict[str, Any]]:
    """Convert Content blocks to OpenResponses output content format.

    Uses output_text for assistant message content.
    """
    if isinstance(content, str):
        return [{"type": "output_text", "text": content}]

    out: list[dict[str, Any]] = []
    for block in content:
        if isinstance(block, str):
            out.append({"type": "output_text", "text": block})
        else:
            raise NotImplementedError(f"Unsupported output content block: {type(block)}")
    return out

_to_open_responses_tools

_to_open_responses_tools(
    tools: dict[str, Tool],
) -> list[dict[str, Any]]

Convert Tool objects to OpenResponses function format.

OpenResponses API expects tools with name/description/parameters at top level, not nested under a 'function' key like Chat Completions API.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| tools | dict[str, Tool] | Dictionary mapping tool names to Tool objects. | required |

Returns:

| Type | Description |
| --- | --- |
| list[dict[str, Any]] | List of tool definitions in OpenResponses format. |

Source code in src/stirrup/clients/open_responses_client.py
def _to_open_responses_tools(tools: dict[str, Tool]) -> list[dict[str, Any]]:
    """Convert Tool objects to OpenResponses function format.

    OpenResponses API expects tools with name/description/parameters at top level,
    not nested under a 'function' key like Chat Completions API.

    Args:
        tools: Dictionary mapping tool names to Tool objects.

    Returns:
        List of tool definitions in OpenResponses format.
    """
    out: list[dict[str, Any]] = []
    for t in tools.values():
        tool_def: dict[str, Any] = {
            "type": "function",
            "name": t.name,
            "description": t.description,
        }
        if t.parameters is not EmptyParams:
            tool_def["parameters"] = t.parameters.model_json_schema()
        out.append(tool_def)
    return out
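
Applied to the calc_tool from the Tool example above, this yields a flat definition rather than a Chat Completions-style nested one:

_to_open_responses_tools({"calc": calc_tool})
# -> [{
#     "type": "function",
#     "name": "calc",
#     "description": "Evaluate math",
#     "parameters": {...},  # CalcParams.model_json_schema()
# }]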

_to_open_responses_input

_to_open_responses_input(
    msgs: list[ChatMessage],
) -> tuple[str | None, list[dict[str, Any]]]

Convert ChatMessage list to OpenResponses (instructions, input) tuple.

SystemMessage content is extracted as the instructions parameter. Other messages are converted to input items.

Returns:

| Type | Description |
| --- | --- |
| tuple[str \| None, list[dict[str, Any]]] | Tuple of (instructions, input_items) where instructions is the system message content (or None) and input_items is the list of input items. |

Source code in src/stirrup/clients/open_responses_client.py
def _to_open_responses_input(
    msgs: list[ChatMessage],
) -> tuple[str | None, list[dict[str, Any]]]:
    """Convert ChatMessage list to OpenResponses (instructions, input) tuple.

    SystemMessage content is extracted as the instructions parameter.
    Other messages are converted to input items.

    Returns:
        Tuple of (instructions, input_items) where instructions is the system
        message content (or None) and input_items is the list of input items.
    """
    instructions: str | None = None
    input_items: list[dict[str, Any]] = []

    for m in msgs:
        if isinstance(m, SystemMessage):
            # Extract system message as instructions
            if isinstance(m.content, str):
                instructions = m.content
            else:
                # Join text content blocks for instructions
                instructions = "\n".join(block if isinstance(block, str) else "" for block in m.content)
        elif isinstance(m, UserMessage):
            input_items.append(
                {
                    "role": "user",
                    "content": _content_to_open_responses_input(m.content),
                }
            )
        elif isinstance(m, AssistantMessage):
            # For assistant messages, we need to add them as response output items
            # First add any text content as a message item
            content_str = (
                m.content
                if isinstance(m.content, str)
                else "\n".join(block if isinstance(block, str) else "" for block in m.content)
            )
            if content_str:
                input_items.append(
                    {
                        "type": "message",
                        "role": "assistant",
                        "content": [{"type": "output_text", "text": content_str}],
                    }
                )

            # Add tool calls as separate function_call items
            input_items.extend(
                {
                    "type": "function_call",
                    "call_id": tc.tool_call_id,
                    "name": tc.name,
                    "arguments": tc.arguments,
                }
                for tc in m.tool_calls
            )
        elif isinstance(m, ToolMessage):
            # Tool results are function_call_output items
            content_str = m.content if isinstance(m.content, str) else str(m.content)
            input_items.append(
                {
                    "type": "function_call_output",
                    "call_id": m.tool_call_id,
                    "output": content_str,
                }
            )
        else:
            raise NotImplementedError(f"Unsupported message type: {type(m)}")

    return instructions, input_items
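
A small illustration of the split (message constructors assumed from the models documented above):

from stirrup.core.models import SystemMessage, UserMessage  # import path assumed

instructions, items = _to_open_responses_input(
    [
        SystemMessage(role="system", content="You are terse."),
        UserMessage(role="user", content="Hi"),
    ]
)
# instructions == "You are terse."
# items == [{"role": "user", "content": [{"type": "input_text", "text": "Hi"}]}]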

_get_attr

_get_attr(obj: Any, name: str, default: Any = None) -> Any

Get attribute from object or dict, with fallback default.

Source code in src/stirrup/clients/open_responses_client.py
def _get_attr(obj: Any, name: str, default: Any = None) -> Any:  # noqa: ANN401
    """Get attribute from object or dict, with fallback default."""
    if isinstance(obj, dict):
        return obj.get(name, default)
    return getattr(obj, name, default)

_parse_response_output

_parse_response_output(
    output: list[Any],
) -> tuple[str, list[ToolCall], Reasoning | None]

Parse response output items into content, tool_calls, and reasoning.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| output | list[Any] | List of output items from the response. | required |

Returns:

| Type | Description |
| --- | --- |
| tuple[str, list[ToolCall], Reasoning \| None] | Tuple of (content_text, tool_calls, reasoning). |

Source code in src/stirrup/clients/open_responses_client.py
def _parse_response_output(
    output: list[Any],
) -> tuple[str, list[ToolCall], Reasoning | None]:
    """Parse response output items into content, tool_calls, and reasoning.

    Args:
        output: List of output items from the response.

    Returns:
        Tuple of (content_text, tool_calls, reasoning).
    """
    content_parts: list[str] = []
    tool_calls: list[ToolCall] = []
    reasoning: Reasoning | None = None

    for item in output:
        item_type = _get_attr(item, "type")

        if item_type == "message":
            # Extract text content from message
            msg_content = _get_attr(item, "content", [])
            for content_item in msg_content:
                content_type = _get_attr(content_item, "type")
                if content_type == "output_text":
                    text = _get_attr(content_item, "text", "")
                    content_parts.append(text)

        elif item_type == "function_call":
            call_id = _get_attr(item, "call_id")
            name = _get_attr(item, "name")
            arguments = _get_attr(item, "arguments", "")
            tool_calls.append(
                ToolCall(
                    tool_call_id=call_id,
                    name=name,
                    arguments=arguments,
                )
            )

        elif item_type == "reasoning":
            # Extract reasoning/thinking content - try multiple possible attribute names
            # summary can be a list of Summary objects with .text attribute
            summary = _get_attr(item, "summary")
            if summary:
                if isinstance(summary, list):
                    # Extract text from Summary objects
                    thinking = "\n".join(_get_attr(s, "text", "") for s in summary if _get_attr(s, "text"))
                else:
                    thinking = str(summary)
            else:
                thinking = _get_attr(item, "thinking") or ""

            if thinking:
                reasoning = Reasoning(content=thinking)

    return "\n".join(content_parts), tool_calls, reasoning