
ChatCompletions Client

stirrup.clients.chat_completions_client

OpenAI SDK-based LLM client for chat completions.

This client uses the official OpenAI Python SDK directly, supporting both OpenAI's API and any OpenAI-compatible endpoint via the base_url parameter (e.g., vLLM, Ollama, Azure OpenAI, local models).

This is the default client for Stirrup.

__all__ module-attribute

__all__ = ['ChatCompletionsClient']

LOGGER module-attribute

LOGGER = getLogger(__name__)

ChatMessage

ChatMessage = Annotated[
    SystemMessage
    | UserMessage
    | AssistantMessage
    | ToolMessage,
    Field(discriminator="role"),
]

Discriminated union of all message types, automatically parsed based on role field.
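A minimal sketch of parsing a raw dict through the discriminator, assuming ChatMessage is importable from stirrup.core.models (the import path, and UserMessage accepting plain string content, are assumptions):

from pydantic import TypeAdapter

from stirrup.core.models import ChatMessage  # import path assumed

adapter = TypeAdapter(ChatMessage)

# The "role" field selects the concrete class: here it yields a UserMessage.
msg = adapter.validate_python({"role": "user", "content": "Hello!"})
print(type(msg).__name__)  # UserMessage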

ContextOverflowError

Bases: Exception

Raised when the LLM context window is exceeded (finish_reason "max_tokens" or "length").
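One recovery strategy, given an existing client and conversation state, is to trim older turns and retry; a sketch (the trimming policy is illustrative, not library behavior):

try:
    reply = await client.generate(messages, tools)
except ContextOverflowError:
    # Keep the system prompt plus the most recent turns, then retry.
    messages = [messages[0], *messages[-4:]]
    reply = await client.generate(messages, tools)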

AssistantMessage

Bases: BaseModel

LLM response message with optional tool calls and token usage tracking.

LLMClient

Bases: Protocol

Protocol defining the interface for LLM client implementations.

Any LLM client must implement this protocol to work with the Agent class. Provides text generation with tool support and model capability inspection.
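As a sketch, a minimal in-memory client can satisfy the protocol for tests. The surface below (max_tokens, model_slug, generate) mirrors ChatCompletionsClient and is assumed to match the protocol; import paths are assumptions:

from stirrup.core.models import AssistantMessage, ChatMessage, TokenUsage, Tool  # paths assumed


class EchoClient:
    """Toy LLMClient implementation that echoes the last message's content."""

    @property
    def max_tokens(self) -> int:
        return 8_000

    @property
    def model_slug(self) -> str:
        return "echo-1"

    async def generate(
        self, messages: list[ChatMessage], tools: dict[str, Tool]
    ) -> AssistantMessage:
        last = messages[-1]
        return AssistantMessage(
            reasoning=None,
            content=str(getattr(last, "content", "")),
            tool_calls=[],
            token_usage=TokenUsage(input=0, output=0, reasoning=0),
        )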

Reasoning

Bases: BaseModel

Extended thinking/reasoning content from models that support chain-of-thought reasoning.

TokenUsage

Bases: BaseModel

Token counts for LLM usage (input, output, reasoning tokens).

total property

total: int

Total token count across input, output, and reasoning.

__add__

__add__(other: TokenUsage) -> TokenUsage

Add two TokenUsage objects together, summing each field independently.

Source code in src/stirrup/core/models.py
def __add__(self, other: "TokenUsage") -> "TokenUsage":
    """Add two TokenUsage objects together, summing each field independently."""
    return TokenUsage(
        input=self.input + other.input,
        output=self.output + other.output,
        reasoning=self.reasoning + other.reasoning,
    )
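For example, per-turn usages can be accumulated with + and read back via total:

turn_1 = TokenUsage(input=1_200, output=300, reasoning=0)
turn_2 = TokenUsage(input=1_500, output=450, reasoning=250)

session = turn_1 + turn_2
print(session.total)  # 3700 = (1200 + 1500) + (300 + 450) + (0 + 250)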

Tool

Bases: BaseModel

Tool definition with name, description, parameter schema, and executor function.

Generic over:

P: Parameter model type (must be a Pydantic BaseModel, or None for parameterless tools).
M: Metadata type (should implement Addable for aggregation; use None for tools without metadata).

Tools are simple, stateless callables. For tools requiring lifecycle management (setup/teardown, resource pooling), use a ToolProvider instead.

Example with parameters

class CalcParams(BaseModel):
    expression: str

calc_tool = Tool[CalcParams, None](...)

Example without parameters

time_tool = Tool[None, None](...)
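A slightly fuller sketch of the calculator tool, assuming Tool accepts name/description/parameters plus an executor callable (the executor field name and signature are assumptions):

from pydantic import BaseModel


class CalcParams(BaseModel):
    expression: str


async def run_calc(params: CalcParams) -> str:
    # eval() is for illustration only; never evaluate untrusted input.
    return str(eval(params.expression))


calc_tool = Tool[CalcParams, None](
    name="calculator",
    description="Evaluate a Python arithmetic expression.",
    parameters=CalcParams,
    executor=run_calc,  # field name assumed
)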

ToolCall

Bases: BaseModel

Represents a tool invocation request from the LLM.

Attributes:

name (str): Name of the tool to invoke.

arguments (str): JSON string containing tool parameters.

tool_call_id (str | None): Unique identifier for tracking this tool call and its result.
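Since arguments arrives as a JSON string, a typical consumer validates it against the tool's parameter model before execution; a sketch, assuming the Tool fields shown elsewhere on this page:

import json

from pydantic import BaseModel


def parse_arguments(call: ToolCall, tools: dict[str, Tool]) -> BaseModel | None:
    """Validate a ToolCall's JSON arguments against its tool's parameter model."""
    tool = tools[call.name]
    if tool.parameters is None:  # parameterless tool
        return None
    return tool.parameters.model_validate(json.loads(call.arguments or "{}"))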

ChatCompletionsClient

ChatCompletionsClient(
    model: str,
    max_tokens: int = 64000,
    *,
    base_url: str | None = None,
    api_key: str | None = None,
    supports_audio_input: bool = False,
    reasoning_effort: str | None = None,
    timeout: float | None = None,
    max_retries: int = 2,
    kwargs: dict[str, Any] | None = None,
)

Bases: LLMClient

OpenAI SDK-based client supporting OpenAI and OpenAI-compatible APIs.

Uses the official OpenAI Python SDK directly for chat completions. Supports custom base_url for OpenAI-compatible providers (vLLM, Ollama, Azure OpenAI, local models, etc.).

Includes automatic retries for transient failures and token usage tracking.

Example

Standard OpenAI usage

client = ChatCompletionsClient(model="gpt-4o", max_tokens=128_000)

Custom OpenAI-compatible endpoint

client = ChatCompletionsClient(
    model="llama-3.1-70b",
    base_url="http://localhost:8000/v1",
    api_key="your-api-key",
)
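Ollama's OpenAI-compatible endpoint is another example; it typically listens on port 11434 and ignores the API key, though the SDK requires one to be set (model name and port are illustrative):

client = ChatCompletionsClient(
    model="llama3.1",
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # placeholder; Ollama does not check it
)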

Initialize OpenAI SDK client with model configuration.

Parameters:

model (str, required): Model identifier (e.g., 'gpt-5', 'gpt-4o', 'o1-preview').

max_tokens (int, default 64000): Maximum context window size in tokens.

base_url (str | None, default None): API base URL. If None, uses OpenAI's standard URL. Use for OpenAI-compatible providers (e.g., 'http://localhost:8000/v1').

api_key (str | None, default None): API key for authentication. If None, reads from the OPENROUTER_API_KEY environment variable.

supports_audio_input (bool, default False): Whether the model supports audio inputs.

reasoning_effort (str | None, default None): Reasoning effort level for extended thinking models (e.g., 'low', 'medium', 'high'). Only used with o1/o3-style models.

timeout (float | None, default None): Request timeout in seconds. If None, uses the OpenAI SDK default.

max_retries (int, default 2): Number of retries for transient errors. The OpenAI SDK handles retries internally with exponential backoff.

kwargs (dict[str, Any] | None, default None): Additional arguments passed to chat.completions.create().
Source code in src/stirrup/clients/chat_completions_client.py
def __init__(
    self,
    model: str,
    max_tokens: int = 64_000,
    *,
    base_url: str | None = None,
    api_key: str | None = None,
    supports_audio_input: bool = False,
    reasoning_effort: str | None = None,
    timeout: float | None = None,
    max_retries: int = 2,
    kwargs: dict[str, Any] | None = None,
) -> None:
    """Initialize OpenAI SDK client with model configuration.

    Args:
        model: Model identifier (e.g., 'gpt-5', 'gpt-4o', 'o1-preview').
        max_tokens: Maximum context window size in tokens. Defaults to 64,000.
        base_url: API base URL. If None, uses OpenAI's standard URL.
            Use for OpenAI-compatible providers (e.g., 'http://localhost:8000/v1').
        api_key: API key for authentication. If None, reads from OPENROUTER_API_KEY
            environment variable.
        supports_audio_input: Whether the model supports audio inputs. Defaults to False.
        reasoning_effort: Reasoning effort level for extended thinking models
            (e.g., 'low', 'medium', 'high'). Only used with o1/o3 style models.
        timeout: Request timeout in seconds. If None, uses OpenAI SDK default.
        max_retries: Number of retries for transient errors. Defaults to 2.
            The OpenAI SDK handles retries internally with exponential backoff.
        kwargs: Additional arguments passed to chat.completions.create().
    """
    self._model = model
    self._max_tokens = max_tokens
    self._supports_audio_input = supports_audio_input
    self._reasoning_effort = reasoning_effort
    self._kwargs = kwargs or {}

    # Initialize AsyncOpenAI client
    # Read from OPENROUTER_API_KEY if no api_key provided
    resolved_api_key = api_key or os.environ.get("OPENROUTER_API_KEY")
    self._client = AsyncOpenAI(
        api_key=resolved_api_key,
        base_url=base_url,
        timeout=timeout,
        max_retries=max_retries,
    )
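Because kwargs is spread into every chat.completions.create() call (see **self._kwargs in the generate source below), sampling parameters can be forwarded through it; for example:

client = ChatCompletionsClient(
    model="gpt-4o",
    kwargs={"temperature": 0.2, "top_p": 0.9},
)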

max_tokens property

max_tokens: int

Maximum context window size in tokens.

model_slug property

model_slug: str

Model identifier.

generate async

generate(
    messages: list[ChatMessage], tools: dict[str, Tool]
) -> AssistantMessage

Generate assistant response with optional tool calls.

Retries up to 3 times on transient errors (connection, timeout, rate limit, internal server errors) with exponential backoff.

Parameters:

messages (list[ChatMessage], required): List of conversation messages.

tools (dict[str, Tool], required): Dictionary mapping tool names to Tool objects.

Returns:

AssistantMessage: The model's response, including any tool calls and token usage statistics.

Raises:

ContextOverflowError: If the context window is exceeded.
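A minimal end-to-end call, assuming UserMessage is importable from stirrup.core.models and accepts plain string content:

import asyncio

from stirrup.core.models import UserMessage  # import path assumed


async def main() -> None:
    client = ChatCompletionsClient(model="gpt-4o")
    reply = await client.generate(
        messages=[UserMessage(content="Summarize the plan in one line.")],
        tools={},  # plain completion, no tool calling
    )
    print(reply.content, reply.token_usage.total)


asyncio.run(main())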

Source code in src/stirrup/clients/chat_completions_client.py
@retry(
    retry=retry_if_exception_type(
        (
            APIConnectionError,
            APITimeoutError,
            RateLimitError,
            InternalServerError,
        )
    ),
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
)
async def generate(
    self,
    messages: list[ChatMessage],
    tools: dict[str, Tool],
) -> AssistantMessage:
    """Generate assistant response with optional tool calls.

    Retries up to 3 times on transient errors (connection, timeout, rate limit,
    internal server errors) with exponential backoff.

    Args:
        messages: List of conversation messages.
        tools: Dictionary mapping tool names to Tool objects.

    Returns:
        AssistantMessage containing the model's response, any tool calls,
        and token usage statistics.

    Raises:
        ContextOverflowError: If the context window is exceeded.
    """
    # Build request kwargs
    request_kwargs: dict[str, Any] = {
        "model": self._model,
        "messages": to_openai_messages(messages),
        "max_completion_tokens": self._max_tokens,
        **self._kwargs,
    }

    # Add tools if provided
    if tools:
        request_kwargs["tools"] = to_openai_tools(tools)
        request_kwargs["tool_choice"] = "auto"

    # Add reasoning effort if configured (for o1/o3 models)
    if self._reasoning_effort:
        request_kwargs["reasoning_effort"] = self._reasoning_effort

    # Make API call
    response = await self._client.chat.completions.create(**request_kwargs)

    choice = response.choices[0]

    # Check for context overflow
    if choice.finish_reason in ("max_tokens", "length"):
        raise ContextOverflowError(
            f"Maximal context window tokens reached for model {self.model_slug}, "
            f"resulting in finish reason: {choice.finish_reason}. "
            "Reduce agent.max_tokens and try again."
        )

    msg = choice.message

    # Parse reasoning content (for o1/o3 models with extended thinking)
    reasoning: Reasoning | None = None
    if hasattr(msg, "reasoning_content") and msg.reasoning_content:
        reasoning = Reasoning(content=msg.reasoning_content)

    # Parse tool calls
    tool_calls = [
        ToolCall(
            tool_call_id=tc.id,
            name=tc.function.name,
            arguments=tc.function.arguments or "",
        )
        for tc in (msg.tool_calls or [])
    ]

    # Parse token usage
    usage = response.usage
    input_tokens = usage.prompt_tokens if usage else 0
    output_tokens = usage.completion_tokens if usage else 0

    # Handle reasoning tokens if available (for o1/o3 models)
    reasoning_tokens = 0
    if usage and hasattr(usage, "completion_tokens_details") and usage.completion_tokens_details:
        reasoning_tokens = getattr(usage.completion_tokens_details, "reasoning_tokens", 0) or 0
        output_tokens = output_tokens - reasoning_tokens

    return AssistantMessage(
        reasoning=reasoning,
        content=msg.content or "",
        tool_calls=tool_calls,
        token_usage=TokenUsage(
            input=input_tokens,
            output=output_tokens,
            reasoning=reasoning_tokens,
        ),
    )
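Callers typically feed tool calls back as ToolMessages; a sketch of one turn, where execute_tool is a hypothetical dispatcher and the ToolMessage fields mirror those read by to_openai_messages below:

reply = await client.generate(messages, tools)
messages.append(reply)

for call in reply.tool_calls:
    result = await execute_tool(call, tools)  # hypothetical dispatcher
    messages.append(
        ToolMessage(
            content=result,
            name=call.name,
            tool_call_id=call.tool_call_id,
        )
    )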

to_openai_messages

to_openai_messages(
    msgs: list[ChatMessage],
) -> list[dict[str, Any]]

Convert ChatMessage list to OpenAI-compatible message dictionaries.

Handles all message types: SystemMessage, UserMessage, AssistantMessage, and ToolMessage. Preserves reasoning content and tool calls for assistant messages.

Parameters:

msgs (list[ChatMessage], required): List of ChatMessage objects (System, User, Assistant, or Tool messages).

Returns:

list[dict[str, Any]]: List of message dictionaries ready for the OpenAI API.

Raises:

NotImplementedError: If an unsupported message type is encountered.

Source code in src/stirrup/clients/utils.py
def to_openai_messages(msgs: list[ChatMessage]) -> list[dict[str, Any]]:
    """Convert ChatMessage list to OpenAI-compatible message dictionaries.

    Handles all message types: SystemMessage, UserMessage, AssistantMessage,
    and ToolMessage. Preserves reasoning content and tool calls for assistant
    messages.

    Args:
        msgs: List of ChatMessage objects (System, User, Assistant, or Tool messages).

    Returns:
        List of message dictionaries ready for the OpenAI API.

    Raises:
        NotImplementedError: If an unsupported message type is encountered.
    """
    out: list[dict[str, Any]] = []
    for m in msgs:
        if isinstance(m, SystemMessage):
            out.append({"role": "system", "content": content_to_openai(m.content)})
        elif isinstance(m, UserMessage):
            out.append({"role": "user", "content": content_to_openai(m.content)})
        elif isinstance(m, AssistantMessage):
            msg: dict[str, Any] = {"role": "assistant", "content": content_to_openai(m.content)}

            if m.reasoning:
                if m.reasoning.content:
                    msg["reasoning_content"] = m.reasoning.content

                if m.reasoning.signature:
                    msg["thinking_blocks"] = [
                        {"type": "thinking", "signature": m.reasoning.signature, "thinking": m.reasoning.content}
                    ]

            if m.tool_calls:
                msg["tool_calls"] = []
                for tool in m.tool_calls:
                    tool_dict = tool.model_dump()
                    tool_dict["id"] = tool.tool_call_id
                    tool_dict["type"] = "function"
                    tool_dict["function"] = {
                        "name": tool.name,
                        "arguments": tool.arguments,
                    }
                    msg["tool_calls"].append(tool_dict)

            out.append(msg)
        elif isinstance(m, ToolMessage):
            out.append(
                {
                    "role": "tool",
                    "content": content_to_openai(m.content),
                    "tool_call_id": m.tool_call_id,
                    "name": m.name,
                }
            )
        else:
            raise NotImplementedError(f"Unsupported message type: {type(m)}")

    return out
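A short round-trip sketch, assuming SystemMessage and UserMessage accept plain string content:

wire = to_openai_messages(
    [
        SystemMessage(content="You are terse."),
        UserMessage(content="Hi"),
    ]
)
# [{'role': 'system', 'content': ...}, {'role': 'user', 'content': ...}]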

to_openai_tools

to_openai_tools(
    tools: dict[str, Tool],
) -> list[dict[str, Any]]

Convert Tool objects to OpenAI function calling format.

Parameters:

tools (dict[str, Tool], required): Dictionary mapping tool names to Tool objects.

Returns:

list[dict[str, Any]]: List of tool definitions in OpenAI's function calling format.

Example

tools = {"calculator": calculator_tool}
openai_tools = to_openai_tools(tools)
# Returns: [{"type": "function", "function": {"name": "calculator", ...}}]

Source code in src/stirrup/clients/utils.py
def to_openai_tools(tools: dict[str, Tool]) -> list[dict[str, Any]]:
    """Convert Tool objects to OpenAI function calling format.

    Args:
        tools: Dictionary mapping tool names to Tool objects.

    Returns:
        List of tool definitions in OpenAI's function calling format.

    Example:
        >>> tools = {"calculator": calculator_tool}
        >>> openai_tools = to_openai_tools(tools)
        >>> # Returns: [{"type": "function", "function": {"name": "calculator", ...}}]
    """
    out: list[dict[str, Any]] = []
    for t in tools.values():
        function: dict[str, Any] = {
            "name": t.name,
            "description": t.description,
        }
        if t.parameters is not None:
            function["parameters"] = t.parameters.model_json_schema()
        tool_payload: dict[str, Any] = {
            "type": "function",
            "function": function,
        }
        out.append(tool_payload)
    return out