
ChatCompletions Client

stirrup.clients.chat_completions_client

OpenAI SDK-based LLM client for chat completions.

This client uses the official OpenAI Python SDK directly, supporting both OpenAI's API and any OpenAI-compatible endpoint via the base_url parameter (e.g., vLLM, Ollama, Azure OpenAI, local models).

This is the default client for Stirrup.

__all__ module-attribute

__all__ = ['ChatCompletionsClient']

LOGGER module-attribute

LOGGER = getLogger(__name__)

ChatMessage

ChatMessage = Annotated[
    SystemMessage
    | UserMessage
    | AssistantMessage
    | ToolMessage,
    Field(discriminator="role"),
]

Discriminated union of all message types, automatically parsed based on role field.
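A minimal sketch of parsing a raw dict through the discriminator, assuming ChatMessage is importable from stirrup.core.models (the import path, and UserMessage accepting plain string content, are assumptions):

from pydantic import TypeAdapter

from stirrup.core.models import ChatMessage  # import path assumed

adapter = TypeAdapter(ChatMessage)

# The "role" field selects the concrete class: here it yields a UserMessage.
msg = adapter.validate_python({"role": "user", "content": "Hello!"})
print(type(msg).__name__)  # UserMessage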

ContextOverflowError

Bases: Exception

Raised when the LLM context window is exceeded (finish_reason "max_tokens" or "length").
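One recovery strategy, given an existing client and conversation state, is to trim older turns and retry; a sketch (the trimming policy is illustrative, not library behavior):

try:
    reply = await client.generate(messages, tools)
except ContextOverflowError:
    # Keep the system prompt plus the most recent turns, then retry.
    messages = [messages[0], *messages[-4:]]
    reply = await client.generate(messages, tools)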

AssistantMessage

Bases: BaseModel

LLM response message with optional tool calls and token usage tracking.

LLMClient

Bases: Protocol

Protocol defining the interface for LLM client implementations.

Any LLM client must implement this protocol to work with the Agent class. Provides text generation with tool support and model capability inspection.
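As a sketch, a minimal in-memory client can satisfy the protocol for tests. The surface below (max_tokens, model_slug, generate) mirrors ChatCompletionsClient and is assumed to match the protocol; import paths are assumptions:

from stirrup.core.models import AssistantMessage, ChatMessage, TokenUsage, Tool  # paths assumed


class EchoClient:
    """Toy LLMClient implementation that echoes the last message's content."""

    @property
    def max_tokens(self) -> int:
        return 8_000

    @property
    def model_slug(self) -> str:
        return "echo-1"

    async def generate(
        self, messages: list[ChatMessage], tools: dict[str, Tool]
    ) -> AssistantMessage:
        last = messages[-1]
        return AssistantMessage(
            reasoning=None,
            content=str(getattr(last, "content", "")),
            tool_calls=[],
            token_usage=TokenUsage(input=0, output=0, reasoning=0),
        )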

Reasoning

Bases: BaseModel

Extended thinking/reasoning content from models that support chain-of-thought reasoning.

TokenUsage

Bases: BaseModel

Token counts for LLM usage (input, output, reasoning tokens).

total property

total: int

Total token count across input, output, and reasoning.

__add__

__add__(other: TokenUsage) -> TokenUsage

Add two TokenUsage objects together, summing each field independently.

Source code in src/stirrup/core/models.py
def __add__(self, other: "TokenUsage") -> "TokenUsage":
    """Add two TokenUsage objects together, summing each field independently."""
    return TokenUsage(
        input=self.input + other.input,
        output=self.output + other.output,
        reasoning=self.reasoning + other.reasoning,
    )
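For example, per-turn usages can be accumulated with + and read back via total:

turn_1 = TokenUsage(input=1_200, output=300, reasoning=0)
turn_2 = TokenUsage(input=1_500, output=450, reasoning=250)

session = turn_1 + turn_2
print(session.total)  # 3700 = (1200 + 1500) + (300 + 450) + (0 + 250)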

Tool

Bases: BaseModel

Tool definition with name, description, parameter schema, and executor function.

Generic over:

P: Parameter model type (must be a Pydantic BaseModel, or None for parameterless tools).
M: Metadata type (should implement Addable for aggregation; use None for tools without metadata).

Tools are simple, stateless callables. For tools requiring lifecycle management (setup/teardown, resource pooling), use a ToolProvider instead.

Example with parameters

class CalcParams(BaseModel):
    expression: str

calc_tool = Tool[CalcParams, None](...)

Example without parameters

time_tool = Tool[None, None](...)
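A slightly fuller sketch of the calculator tool, assuming Tool accepts name/description/parameters plus an executor callable (the executor field name and signature are assumptions):

from pydantic import BaseModel


class CalcParams(BaseModel):
    expression: str


async def run_calc(params: CalcParams) -> str:
    # eval() is for illustration only; never evaluate untrusted input.
    return str(eval(params.expression))


calc_tool = Tool[CalcParams, None](
    name="calculator",
    description="Evaluate a Python arithmetic expression.",
    parameters=CalcParams,
    executor=run_calc,  # field name assumed
)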

ToolCall

Bases: BaseModel

Represents a tool invocation request from the LLM.

Attributes:

name (str): Name of the tool to invoke.

arguments (str): JSON string containing tool parameters.

tool_call_id (str | None): Unique identifier for tracking this tool call and its result.
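Since arguments arrives as a JSON string, a typical consumer validates it against the tool's parameter model before execution; a sketch, assuming the Tool fields shown elsewhere on this page:

import json

from pydantic import BaseModel


def parse_arguments(call: ToolCall, tools: dict[str, Tool]) -> BaseModel | None:
    """Validate a ToolCall's JSON arguments against its tool's parameter model."""
    tool = tools[call.name]
    if tool.parameters is None:  # parameterless tool
        return None
    return tool.parameters.model_validate(json.loads(call.arguments or "{}"))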

ChatCompletionsClient

ChatCompletionsClient(
    model: str,
    max_tokens: int = 64000,
    *,
    base_url: str | None = None,
    api_key: str | None = None,
    supports_audio_input: bool = False,
    reasoning_effort: str | None = None,
    timeout: float | None = None,
    max_retries: int = 2,
    kwargs: dict[str, Any] | None = None,
)

Bases: LLMClient

OpenAI SDK-based client supporting OpenAI and OpenAI-compatible APIs.

Uses the official OpenAI Python SDK directly for chat completions. Supports custom base_url for OpenAI-compatible providers (vLLM, Ollama, Azure OpenAI, local models, etc.).

Includes automatic retries for transient failures and token usage tracking.

Example

Standard OpenAI usage

client = ChatCompletionsClient(model="gpt-4o", max_tokens=128_000)

Custom OpenAI-compatible endpoint

client = ChatCompletionsClient(
    model="llama-3.1-70b",
    base_url="http://localhost:8000/v1",
    api_key="your-api-key",
)
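Ollama's OpenAI-compatible endpoint is another example; it typically listens on port 11434 and ignores the API key, though the SDK requires one to be set (model name and port are illustrative):

client = ChatCompletionsClient(
    model="llama3.1",
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # placeholder; Ollama does not check it
)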

Initialize OpenAI SDK client with model configuration.

Parameters:

model (str, required): Model identifier (e.g., 'gpt-5', 'gpt-4o', 'o1-preview').

max_tokens (int, default 64000): Maximum context window size in tokens.

base_url (str | None, default None): API base URL. If None, uses OpenAI's standard URL. Use for OpenAI-compatible providers (e.g., 'http://localhost:8000/v1').

api_key (str | None, default None): API key for authentication. If None, reads from the OPENROUTER_API_KEY environment variable.

supports_audio_input (bool, default False): Whether the model supports audio inputs.

reasoning_effort (str | None, default None): Reasoning effort level for extended thinking models (e.g., 'low', 'medium', 'high'). Only used with o1/o3-style models.

timeout (float | None, default None): Request timeout in seconds. If None, uses the OpenAI SDK default.

max_retries (int, default 2): Number of retries for transient errors. The OpenAI SDK handles retries internally with exponential backoff.

kwargs (dict[str, Any] | None, default None): Additional arguments passed to chat.completions.create().
Source code in src/stirrup/clients/chat_completions_client.py
def __init__(
    self,
    model: str,
    max_tokens: int = 64_000,
    *,
    base_url: str | None = None,
    api_key: str | None = None,
    supports_audio_input: bool = False,
    reasoning_effort: str | None = None,
    timeout: float | None = None,
    max_retries: int = 2,
    kwargs: dict[str, Any] | None = None,
) -> None:
    """Initialize OpenAI SDK client with model configuration.

    Args:
        model: Model identifier (e.g., 'gpt-5', 'gpt-4o', 'o1-preview').
        max_tokens: Maximum context window size in tokens. Defaults to 64,000.
        base_url: API base URL. If None, uses OpenAI's standard URL.
            Use for OpenAI-compatible providers (e.g., 'http://localhost:8000/v1').
        api_key: API key for authentication. If None, reads from OPENROUTER_API_KEY
            environment variable.
        supports_audio_input: Whether the model supports audio inputs. Defaults to False.
        reasoning_effort: Reasoning effort level for extended thinking models
            (e.g., 'low', 'medium', 'high'). Only used with o1/o3 style models.
        timeout: Request timeout in seconds. If None, uses OpenAI SDK default.
        max_retries: Number of retries for transient errors. Defaults to 2.
            The OpenAI SDK handles retries internally with exponential backoff.
        kwargs: Additional arguments passed to chat.completions.create().
    """
    self._model = model
    self._max_tokens = max_tokens
    self._supports_audio_input = supports_audio_input
    self._reasoning_effort = reasoning_effort
    self._kwargs = kwargs or {}

    # Initialize AsyncOpenAI client
    # Read from OPENROUTER_API_KEY if no api_key provided
    resolved_api_key = api_key or os.environ.get("OPENROUTER_API_KEY")
    self._client = AsyncOpenAI(
        api_key=resolved_api_key,
        base_url=base_url,
        timeout=timeout,
        max_retries=max_retries,
    )
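Because kwargs is spread into every chat.completions.create() call (see **self._kwargs in the generate source below), sampling parameters can be forwarded through it; for example:

client = ChatCompletionsClient(
    model="gpt-4o",
    kwargs={"temperature": 0.2, "top_p": 0.9},
)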

max_tokens property

max_tokens: int

Maximum context window size in tokens.

model_slug property

model_slug: str

Model identifier.

generate async

generate(
    messages: list[ChatMessage], tools: dict[str, Tool]
) -> AssistantMessage

Generate assistant response with optional tool calls.

Retries up to 3 times on transient errors (connection, timeout, rate limit, internal server errors) with exponential backoff.

Parameters:

messages (list[ChatMessage], required): List of conversation messages.

tools (dict[str, Tool], required): Dictionary mapping tool names to Tool objects.

Returns:

AssistantMessage: The model's response, including any tool calls and token usage statistics.

Raises:

ContextOverflowError: If the context window is exceeded.
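A minimal end-to-end call, assuming UserMessage is importable from stirrup.core.models and accepts plain string content:

import asyncio

from stirrup.core.models import UserMessage  # import path assumed


async def main() -> None:
    client = ChatCompletionsClient(model="gpt-4o")
    reply = await client.generate(
        messages=[UserMessage(content="Summarize the plan in one line.")],
        tools={},  # plain completion, no tool calling
    )
    print(reply.content, reply.token_usage.total)


asyncio.run(main())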

Source code in src/stirrup/clients/chat_completions_client.py
@retry(
    retry=retry_if_exception_type(
        (
            APIConnectionError,
            APITimeoutError,
            RateLimitError,
            InternalServerError,
        )
    ),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
)
async def generate(
    self,
    messages: list[ChatMessage],
    tools: dict[str, Tool],
) -> AssistantMessage:
    """Generate assistant response with optional tool calls.

    Retries up to 3 times on transient errors (connection, timeout, rate limit,
    internal server errors) with exponential backoff.

    Args:
        messages: List of conversation messages.
        tools: Dictionary mapping tool names to Tool objects.

    Returns:
        AssistantMessage containing the model's response, any tool calls,
        and token usage statistics.

    Raises:
        ContextOverflowError: If the context window is exceeded.
    """
    # Build request kwargs
    request_kwargs: dict[str, Any] = {
        "model": self._model,
        "messages": to_openai_messages(messages),
        "max_completion_tokens": self._max_tokens,
        **self._kwargs,
    }

    # Add tools if provided
    if tools:
        request_kwargs["tools"] = to_openai_tools(tools)
        request_kwargs["tool_choice"] = "auto"

    # Add reasoning effort if configured (for o1/o3 models)
    if self._reasoning_effort:
        request_kwargs["reasoning_effort"] = self._reasoning_effort

    # Make API call
    response = await self._client.chat.completions.create(**request_kwargs)

    choice = response.choices[0]

    # Check for context overflow
    if choice.finish_reason in ("max_tokens", "length"):
        raise ContextOverflowError(
            f"Maximal context window tokens reached for model {self.model_slug}, "
            f"resulting in finish reason: {choice.finish_reason}. "
            "Reduce agent.max_tokens and try again."
        )

    msg = choice.message

    # Parse reasoning content (for o1/o3 models with extended thinking)
    reasoning: Reasoning | None = None
    if hasattr(msg, "reasoning_content") and msg.reasoning_content:
        reasoning = Reasoning(content=msg.reasoning_content)

    # Parse tool calls
    tool_calls = [
        ToolCall(
            tool_call_id=tc.id,
            name=tc.function.name,
            arguments=tc.function.arguments or "",
        )
        for tc in (msg.tool_calls or [])
    ]

    # Parse token usage
    usage = response.usage
    input_tokens = usage.prompt_tokens if usage else 0
    output_tokens = usage.completion_tokens if usage else 0

    # Handle reasoning tokens if available (for o1/o3 models)
    reasoning_tokens = 0
    if usage and hasattr(usage, "completion_tokens_details") and usage.completion_tokens_details:
        reasoning_tokens = getattr(usage.completion_tokens_details, "reasoning_tokens", 0) or 0
        output_tokens = output_tokens - reasoning_tokens

    return AssistantMessage(
        reasoning=reasoning,
        content=msg.content or "",
        tool_calls=tool_calls,
        token_usage=TokenUsage(
            input=input_tokens,
            output=output_tokens,
            reasoning=reasoning_tokens,
        ),
    )
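Callers typically feed tool calls back as ToolMessages; a sketch of one turn, where execute_tool is a hypothetical dispatcher and the ToolMessage fields mirror those read by to_openai_messages below:

reply = await client.generate(messages, tools)
messages.append(reply)

for call in reply.tool_calls:
    result = await execute_tool(call, tools)  # hypothetical dispatcher
    messages.append(
        ToolMessage(
            content=result,
            name=call.name,
            tool_call_id=call.tool_call_id,
        )
    )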

to_openai_messages

to_openai_messages(
    msgs: list[ChatMessage],
) -> list[dict[str, Any]]

Convert ChatMessage list to OpenAI-compatible message dictionaries.

Handles all message types: SystemMessage, UserMessage, AssistantMessage, and ToolMessage. Preserves reasoning content and tool calls for assistant messages.

Parameters:

msgs (list[ChatMessage], required): List of ChatMessage objects (System, User, Assistant, or Tool messages).

Returns:

list[dict[str, Any]]: List of message dictionaries ready for the OpenAI API.

Raises:

NotImplementedError: If an unsupported message type is encountered.

Source code in src/stirrup/clients/utils.py
def to_openai_messages(msgs: list[ChatMessage]) -> list[dict[str, Any]]:
    """Convert ChatMessage list to OpenAI-compatible message dictionaries.

    Handles all message types: SystemMessage, UserMessage, AssistantMessage,
    and ToolMessage. Preserves reasoning content and tool calls for assistant
    messages.

    Args:
        msgs: List of ChatMessage objects (System, User, Assistant, or Tool messages).

    Returns:
        List of message dictionaries ready for the OpenAI API.

    Raises:
        NotImplementedError: If an unsupported message type is encountered.
    """
    out: list[dict[str, Any]] = []
    for m in msgs:
        if isinstance(m, SystemMessage):
            out.append({"role": "system", "content": content_to_openai(m.content)})
        elif isinstance(m, UserMessage):
            out.append({"role": "user", "content": content_to_openai(m.content)})
        elif isinstance(m, AssistantMessage):
            msg: dict[str, Any] = {"role": "assistant", "content": content_to_openai(m.content)}

            if m.reasoning:
                if m.reasoning.content:
                    msg["reasoning_content"] = m.reasoning.content

                if m.reasoning.signature:
                    msg["thinking_blocks"] = [
                        {"type": "thinking", "signature": m.reasoning.signature, "thinking": m.reasoning.content}
                    ]

            if m.tool_calls:
                msg["tool_calls"] = []
                for tool in m.tool_calls:
                    tool_dict = tool.model_dump()
                    tool_dict["id"] = tool.tool_call_id
                    tool_dict["type"] = "function"
                    tool_dict["function"] = {
                        "name": tool.name,
                        "arguments": tool.arguments,
                    }
                    msg["tool_calls"].append(tool_dict)

            out.append(msg)
        elif isinstance(m, ToolMessage):
            out.append(
                {
                    "role": "tool",
                    "content": content_to_openai(m.content),
                    "tool_call_id": m.tool_call_id,
                    "name": m.name,
                }
            )
        else:
            raise NotImplementedError(f"Unsupported message type: {type(m)}")

    return out
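A short round-trip sketch, assuming SystemMessage and UserMessage accept plain string content:

wire = to_openai_messages(
    [
        SystemMessage(content="You are terse."),
        UserMessage(content="Hi"),
    ]
)
# [{'role': 'system', 'content': ...}, {'role': 'user', 'content': ...}]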

to_openai_tools

to_openai_tools(
    tools: dict[str, Tool],
) -> list[dict[str, Any]]

Convert Tool objects to OpenAI function calling format.

Parameters:

tools (dict[str, Tool], required): Dictionary mapping tool names to Tool objects.

Returns:

list[dict[str, Any]]: List of tool definitions in OpenAI's function calling format.

Example

tools = {"calculator": calculator_tool}
openai_tools = to_openai_tools(tools)
# Returns: [{"type": "function", "function": {"name": "calculator", ...}}]

Source code in src/stirrup/clients/utils.py
def to_openai_tools(tools: dict[str, Tool]) -> list[dict[str, Any]]:
    """Convert Tool objects to OpenAI function calling format.

    Args:
        tools: Dictionary mapping tool names to Tool objects.

    Returns:
        List of tool definitions in OpenAI's function calling format.

    Example:
        >>> tools = {"calculator": calculator_tool}
        >>> openai_tools = to_openai_tools(tools)
        >>> # Returns: [{"type": "function", "function": {"name": "calculator", ...}}]
    """
    out: list[dict[str, Any]] = []
    for t in tools.values():
        function: dict[str, Any] = {
            "name": t.name,
            "description": t.description,
        }
        if t.parameters is not None:
            function["parameters"] = t.parameters.model_json_schema()
        tool_payload: dict[str, Any] = {
            "type": "function",
            "function": function,
        }
        out.append(tool_payload)
    return out