OpenResponses Client
The OpenResponsesClient uses OpenAI's Responses API (POST /v1/responses) instead of the Chat Completions API. This client is useful for providers that implement the newer Responses API format.
Key Differences from ChatCompletionsClient
| Feature | ChatCompletionsClient | OpenResponsesClient |
|---|---|---|
| API endpoint | `chat.completions.create()` | `responses.create()` |
| System messages | Included in `messages` array | Passed as `instructions` parameter |
| Message format | `{"role": "user", "content": [...]}` | `{"role": "user", "content": [{"type": "input_text", ...}]}` |
| Tool call IDs | `tool_call_id` | `call_id` |
| Reasoning config | `reasoning_effort` param | `reasoning: {"effort": ...}` object |
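The differences in the table can be sketched as two request payloads; a minimal illustration built from plain dicts (the field values are hypothetical, only the field names follow the table):

```python
# Chat Completions: the system prompt travels inside the messages array
chat_payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are terse."},
        {"role": "user", "content": "Hello"},
    ],
}

# Responses API: the system prompt becomes the top-level `instructions`
# parameter, and user content uses typed blocks such as "input_text"
responses_payload = {
    "model": "gpt-4o",
    "instructions": "You are terse.",
    "input": [
        {"role": "user", "content": [{"type": "input_text", "text": "Hello"}]},
    ],
}
```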
Usage
For models that support extended thinking (like o1/o3), you can configure the reasoning effort:
```python
import asyncio

from stirrup import Agent
from stirrup.clients import OpenResponsesClient


async def main() -> None:
    """Run an agent using the OpenResponses API with a reasoning model."""
    # Create client using OpenResponsesClient
    # Uses the OpenAI Responses API (responses.create)
    # For reasoning models, you can set reasoning_effort
    client = OpenResponsesClient(
        model="gpt-5.2",
        reasoning_effort="medium",
    )

    agent = Agent(client=client, name="reasoning-agent", max_turns=19)

    async with agent.session(output_dir="output/open_responses_example") as session:
        _finish_params, _history, _metadata = await session.run(
            "Plan a software release with these tasks: Design (5 days), Backend (10 days, needs Design), "
            "Frontend (8 days, needs Design), Testing (4 days, needs Backend and Frontend), "
            "Documentation (3 days, can start after Backend). Two developers are available. "
            "What's the minimum time to complete? Output an Excel Gantt chart with the schedule."
        )


if __name__ == "__main__":
    asyncio.run(main())
```
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | required | Model identifier (e.g., `"gpt-4o"`, `"o1"`) |
| `max_tokens` | `int` | `64_000` | Maximum output tokens |
| `base_url` | `str \| None` | `None` | Custom API base URL |
| `api_key` | `str \| None` | `None` | API key (falls back to `OPENAI_API_KEY` env var) |
| `reasoning_effort` | `str \| None` | `None` | Reasoning effort for o1/o3 models: `"low"`, `"medium"`, `"high"` |
| `timeout` | `float \| None` | `None` | Request timeout in seconds |
| `max_retries` | `int` | `2` | Number of retries for transient errors |
| `instructions` | `str \| None` | `None` | Default system instructions |
| `kwargs` | `dict \| None` | `None` | Additional arguments passed to `responses.create()` |
API Reference
stirrup.clients.open_responses_client
OpenAI SDK-based LLM client for the Responses API.
This client uses the official OpenAI Python SDK's responses.create() method,
supporting both OpenAI's API and any OpenAI-compatible endpoint that implements
the Responses API via the base_url parameter.
ChatMessage
```python
ChatMessage = Annotated[
    SystemMessage
    | UserMessage
    | AssistantMessage
    | ToolMessage,
    Field(discriminator="role"),
]
```
Discriminated union of all message types, automatically parsed based on role field.
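As a hedged illustration of how such a discriminated union parses by `role` (a simplified two-member re-implementation for demonstration, not stirrup's actual models):

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter


class SystemMessage(BaseModel):
    role: Literal["system"] = "system"
    content: str


class UserMessage(BaseModel):
    role: Literal["user"] = "user"
    content: str


# Pydantic selects the concrete model by inspecting the "role" field
ChatMessage = Annotated[Union[SystemMessage, UserMessage], Field(discriminator="role")]

adapter = TypeAdapter(ChatMessage)
msg = adapter.validate_python({"role": "user", "content": "hi"})
```

Because the union is discriminated, validation dispatches directly on `role` instead of trying each member in turn.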
Content
```python
Content = list[ContentBlock] | str
```
Message content: either a plain string or list of mixed content blocks.
ContextOverflowError
Bases: Exception
Raised when LLM context window is exceeded (max_tokens or length finish_reason).
AssistantMessage
Bases: BaseModel
LLM response message with optional tool calls and token usage tracking.
AudioContentBlock
Bases: BinaryContentBlock
Audio content supporting MPEG, WAV, AAC, and other common audio formats.
to_base64_url
Transcode to MP3 and return base64 data URL.
Source code in src/stirrup/core/models.py
EmptyParams
Bases: BaseModel
Empty parameter model for tools that don't require parameters.
ImageContentBlock
Bases: BinaryContentBlock
Image content supporting PNG, JPEG, WebP, PSD formats with automatic downscaling.
to_base64_url
```python
to_base64_url(
    max_pixels: int | None = RESOLUTION_1MP,
) -> str
```
Convert image to base64 data URL, optionally resizing to max pixel count.
Source code in src/stirrup/core/models.py
LLMClient
Bases: Protocol
Protocol defining the interface for LLM client implementations.
Any LLM client must implement this protocol to work with the Agent class. Provides text generation with tool support and model capability inspection.
Reasoning
Bases: BaseModel
Extended thinking/reasoning content from models that support chain-of-thought reasoning.
SystemMessage
Bases: BaseModel
System-level instructions and context for the LLM.
TokenUsage
Bases: BaseModel
Token counts for LLM usage (input, output, reasoning tokens).
__add__
```python
__add__(other: TokenUsage) -> TokenUsage
```
Add two TokenUsage objects together, summing each field independently.
Source code in src/stirrup/core/models.py
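A hedged sketch of the field-wise addition described above, using a dataclass stand-in (the real model is a pydantic `BaseModel`, and the field names here are illustrative assumptions):

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    # Field names are illustrative assumptions, not stirrup's actual schema
    input_tokens: int = 0
    output_tokens: int = 0
    reasoning_tokens: int = 0

    def __add__(self, other: "TokenUsage") -> "TokenUsage":
        # Sum each field independently
        return TokenUsage(
            self.input_tokens + other.input_tokens,
            self.output_tokens + other.output_tokens,
            self.reasoning_tokens + other.reasoning_tokens,
        )


# Accumulate usage across two LLM calls
total = TokenUsage(input_tokens=100, output_tokens=20) + TokenUsage(input_tokens=50, reasoning_tokens=5)
```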
Tool
Bases: BaseModel
Tool definition with name, description, parameter schema, and executor function.
Generic over:

- `P`: Parameter model type (Pydantic `BaseModel` subclass, or `EmptyParams` for parameterless tools)
- `M`: Metadata type (should implement `Addable` for aggregation; use `None` for tools without metadata)
Tools are simple, stateless callables. For tools requiring lifecycle management (setup/teardown, resource pooling), use a ToolProvider instead.
Example with parameters
Example without parameters (uses EmptyParams by default):

```python
time_tool = Tool[EmptyParams, None](
    name="time",
    description="Get current time",
    executor=lambda _: ToolResult(content=datetime.now().isoformat()),
)
```
ToolCall
ToolMessage
Bases: BaseModel
Tool execution result returned to the LLM.
Attributes:

| Name | Type | Description |
|---|---|---|
| `role` | `Literal['tool']` | Always `"tool"` |
| `content` | `Content` | The tool result content |
| `tool_call_id` | `str \| None` | ID linking this result to the corresponding tool call |
| `name` | `str \| None` | Name of the tool that was called |
| `args_was_valid` | `bool` | Whether the tool arguments were valid |
| `success` | `bool` | Whether the tool executed successfully (used by finish tool to control termination) |
UserMessage
Bases: BaseModel
User input message to the LLM.
VideoContentBlock
Bases: BinaryContentBlock
MP4 video content with automatic transcoding and resolution downscaling.
to_base64_url
```python
to_base64_url(
    max_pixels: int | None = RESOLUTION_480P,
    fps: int | None = None,
) -> str
```
Transcode to MP4 and return base64 data URL.
Source code in src/stirrup/core/models.py
OpenResponsesClient
```python
OpenResponsesClient(
    model: str,
    max_tokens: int = 64000,
    *,
    base_url: str | None = None,
    api_key: str | None = None,
    reasoning_effort: str | None = None,
    timeout: float | None = None,
    max_retries: int = 2,
    instructions: str | None = None,
    kwargs: dict[str, Any] | None = None,
)
```
Bases: LLMClient
OpenAI SDK-based client using the Responses API.
Uses the official OpenAI Python SDK's responses.create() method. Supports custom base_url for OpenAI-compatible providers that implement the Responses API.
Includes automatic retries for transient failures and token usage tracking.
Example

Standard OpenAI usage:

```python
client = OpenResponsesClient(model="gpt-4o", max_tokens=128_000)
```

Custom OpenAI-compatible endpoint:

```python
client = OpenResponsesClient(
    model="gpt-4o",
    base_url="http://localhost:8000/v1",
    api_key="your-api-key",
)
```
Initialize OpenAI SDK client with model configuration for Responses API.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `str` | Model identifier (e.g., `'gpt-4o'`, `'o1-preview'`). | required |
| `max_tokens` | `int` | Maximum output tokens. Defaults to 64,000. | `64000` |
| `base_url` | `str \| None` | API base URL. If `None`, uses OpenAI's standard URL. Use for OpenAI-compatible providers. | `None` |
| `api_key` | `str \| None` | API key for authentication. If `None`, reads from the `OPENROUTER_API_KEY` environment variable. | `None` |
| `reasoning_effort` | `str \| None` | Reasoning effort level for extended thinking models (e.g., `'low'`, `'medium'`, `'high'`). Only used with o1/o3-style models. | `None` |
| `timeout` | `float \| None` | Request timeout in seconds. If `None`, uses the OpenAI SDK default. | `None` |
| `max_retries` | `int` | Number of retries for transient errors. Defaults to 2. | `2` |
| `instructions` | `str \| None` | Default system-level instructions. Can be overridden by a `SystemMessage` in the messages list. | `None` |
| `kwargs` | `dict[str, Any] \| None` | Additional arguments passed to `responses.create()`. | `None` |
Source code in src/stirrup/clients/open_responses_client.py
generate
async
```python
generate(
    messages: list[ChatMessage], tools: dict[str, Tool]
) -> AssistantMessage
```
Generate assistant response with optional tool calls using Responses API.
Retries up to 3 times on transient errors (connection, timeout, rate limit, internal server errors) with exponential backoff.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `messages` | `list[ChatMessage]` | List of conversation messages. | required |
| `tools` | `dict[str, Tool]` | Dictionary mapping tool names to `Tool` objects. | required |
Returns:

| Type | Description |
|---|---|
| `AssistantMessage` | `AssistantMessage` containing the model's response, any tool calls, and token usage statistics. |
Raises:

| Type | Description |
|---|---|
| `ContextOverflowError` | If the response is incomplete due to token limits. |
Source code in src/stirrup/clients/open_responses_client.py
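A hedged sketch of the retry pattern described above (not the library's actual implementation; the exception types are stand-ins for the SDK's connection/timeout/rate-limit errors):

```python
import asyncio


async def with_retries(call, max_attempts: int = 3, base_delay: float = 0.5):
    """Retry an async call on transient errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except (ConnectionError, TimeoutError):  # stand-ins for SDK error types
            if attempt == max_attempts - 1:
                raise  # exhausted: surface the last error
            # exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            await asyncio.sleep(base_delay * (2 ** attempt))
```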
_content_to_open_responses_input
Convert Content blocks to OpenResponses input content format.
Uses input_text for text content (vs output_text for responses).
Source code in src/stirrup/clients/open_responses_client.py
_content_to_open_responses_output
Convert Content blocks to OpenResponses output content format.
Uses output_text for assistant message content.
Source code in src/stirrup/clients/open_responses_client.py
_to_open_responses_tools
Convert Tool objects to OpenResponses function format.
OpenResponses API expects tools with name/description/parameters at top level, not nested under a 'function' key like Chat Completions API.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `tools` | `dict[str, Tool]` | Dictionary mapping tool names to `Tool` objects. | required |
Returns:

| Type | Description |
|---|---|
| `list[dict[str, Any]]` | List of tool definitions in OpenResponses format. |
Source code in src/stirrup/clients/open_responses_client.py
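The nesting difference described above, sketched as plain dicts (the `get_time` schema is a hypothetical example, not from the library):

```python
# Chat Completions API: the schema is nested under a "function" key
chat_completions_tool = {
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Get the current time",
        "parameters": {"type": "object", "properties": {}},
    },
}

# Responses API: name/description/parameters sit at the top level
open_responses_tool = {
    "type": "function",
    "name": "get_time",
    "description": "Get the current time",
    "parameters": {"type": "object", "properties": {}},
}
```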
_to_open_responses_input
Convert ChatMessage list to OpenResponses (instructions, input) tuple.
SystemMessage content is extracted as the instructions parameter. Other messages are converted to input items.
Returns:

| Type | Description |
|---|---|
| `tuple[str \| None, list[dict[str, Any]]]` | Tuple of `(instructions, input_items)` where `instructions` is the system message content (or `None`) and `input_items` is the list of input items. |
Source code in src/stirrup/clients/open_responses_client.py
_get_attr
Get attribute from object or dict, with fallback default.
Source code in src/stirrup/clients/open_responses_client.py
_parse_response_output
Parse response output items into content, tool_calls, and reasoning.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `output` | `list[Any]` | List of output items from the response. | required |

Returns:

| Type | Description |
|---|---|
| `tuple[str, list[ToolCall], Reasoning \| None]` | Tuple of `(content_text, tool_calls, reasoning)`. |