
fix(lite_llm): preserve Anthropic thinking signatures across streaming tool-use turns#5437

Open
gurjot-05 wants to merge 4 commits into google:main from
gurjot-05:fix/anthropic-thinking-streaming-continuity

Conversation

@gurjot-05

Summary

Fixes continuity of Anthropic extended thinking across streaming responses that include tool use. After the first tool call, thinking blocks stop appearing in Claude's responses because the signature attached to the prior thinking block is silently discarded during streaming.

Related to #4801 (closed, but the underlying streaming path is still broken in v1.31.x).

Repro

A minimal agent with one tool. Claude emits thinking on turn 1 but not after the tool result.

import asyncio
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm
from google.adk.runners import InMemoryRunner
from google.adk.tools import FunctionTool
from google.genai import types


def get_weather(city: str) -> dict:
    """Returns the weather for a city."""
    return {"city": city, "temp_c": 22, "condition": "sunny"}


agent = Agent(
    model=LiteLlm(
        model="bedrock/us.anthropic.claude-opus-4-6",  # or vertex_ai/... or anthropic/claude-opus-4-6
        extra_body={"thinking": {"type": "adaptive", "display": "summarized"}},
    ),
    name="repro",
    instruction="You are a helpful assistant. Use the weather tool when asked.",
    tools=[FunctionTool(func=get_weather)],
)


async def main() -> None:
    runner = InMemoryRunner(agent=agent, app_name="repro")
    session = await runner.session_service.create_session(app_name="repro", user_id="u1")

    async for event in runner.run_async(
        user_id="u1",
        session_id=session.id,
        new_message=types.Content(
            role="user",
            parts=[types.Part.from_text(text="What's the weather in Tokyo? Think it through.")],
        ),
    ):
        if event.content and event.content.parts:
            for p in event.content.parts:
                if p.thought:
                    print(f"[THOUGHT] {p.text[:80]}... sig={p.thought_signature!r}")
                elif p.function_call:
                    print(f"[TOOL CALL] {p.function_call.name}({p.function_call.args})")
                elif p.function_response:
                    print(f"[TOOL RESULT] {p.function_response.response}")
                elif p.text:
                    print(f"[TEXT] {p.text[:80]}")


asyncio.run(main())

Expected flow (after fix):

[THOUGHT] Let me figure out the weather in Tokyo...  sig=b'EqwDCl...'
[TOOL CALL] get_weather({'city': 'Tokyo'})
[TOOL RESULT] {'city': 'Tokyo', 'temp_c': 22, 'condition': 'sunny'}
[THOUGHT] Now I have the weather data. It's sunny and 22°C...  sig=b'ErEDCl...'
[TEXT] Tokyo is currently sunny at 22°C...

Actual flow (before fix) — the second thought is missing:

[THOUGHT] Let me figure out the weather in Tokyo...  sig=None
[TOOL CALL] get_weather({'city': 'Tokyo'})
[TOOL RESULT] {'city': 'Tokyo', 'temp_c': 22, 'condition': 'sunny'}
[TEXT] Tokyo is currently sunny at 22°C...

To see the underlying transport-level difference, enable litellm.set_verbose = True and inspect the outbound message history on the second LLM call. Without the fix, the assistant turn carries reasoning_content as a flat string and no thinking_blocks; with the fix, it carries thinking_blocks: [{..., "signature": "..."}] and reasoning_content is absent.

Non-tool-use chats keep producing thinking on every turn (each turn is fresh), which masks the bug for simple conversations.

Root cause — three-stage signature loss

Anthropic streams one thinking block as:

  1. Many deltas with text but signature: "".
  2. One final delta with thinking: "" and a non-empty signature (block_stop equivalent).
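
The two delta shapes above can be sketched as plain dicts (field names follow litellm's delta representation; the payload values here are illustrative, not captured traffic), along with a hypothetical predicate that identifies the signature-only case:

```python
# Illustrative shapes of the two streamed thinking deltas described above.
# Values are made up for illustration; field names follow litellm's
# delta representation.
text_delta = {
    "content": None,
    "reasoning_content": "Let me figure out",
    "thinking_blocks": [
        {"type": "thinking", "thinking": "Let me figure out", "signature": ""}
    ],
}
signature_delta = {
    "content": None,
    "reasoning_content": None,
    "thinking_blocks": [
        {"type": "thinking", "thinking": "", "signature": "EqwDCl..."}
    ],
}


def is_signature_only(delta: dict) -> bool:
    """Hypothetical predicate: True when a delta carries nothing but a
    non-empty signature -- exactly the delta the bug drops."""
    blocks = delta.get("thinking_blocks") or []
    return (
        not delta.get("content")
        and not delta.get("reasoning_content")
        and any(b.get("signature") and not b.get("thinking") for b in blocks)
    )
```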

Three sequential problems in lite_llm.py together drop the signature:

1. _has_meaningful_signal discards signature-only deltas

The nested helper inside _model_response_to_chunk treats a delta as noise when content, tool_calls, function_call, reasoning_content, and reasoning are all falsy. The signature-only delta satisfies exactly that — only thinking_blocks is populated. The delta is set to None before _extract_reasoning_value ever runs, so the signature never enters the pipeline.
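
A minimal sketch of the corrected check (a simplified stand-in, not the actual helper body from lite_llm.py):

```python
from types import SimpleNamespace


def has_meaningful_signal(delta) -> bool:
    """Simplified stand-in for the fixed helper: thinking_blocks now
    counts as signal, so signature-only deltas are no longer nulled out."""
    return bool(
        getattr(delta, "content", None)
        or getattr(delta, "tool_calls", None)
        or getattr(delta, "function_call", None)
        or getattr(delta, "reasoning_content", None)
        or getattr(delta, "reasoning", None)
        or getattr(delta, "thinking_blocks", None)  # the fix
    )


# A signature-only delta: everything falsy except thinking_blocks.
signature_only = SimpleNamespace(
    content=None, tool_calls=None, function_call=None,
    reasoning_content=None, reasoning=None,
    thinking_blocks=[{"type": "thinking", "thinking": "", "signature": "abc"}],
)
```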

2. _convert_reasoning_value_to_parts skips empty-text blocks

Even if the delta reached it, if not thinking_text: continue would skip the block because its thinking field is empty, ignoring the non-empty signature.
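
The corrected contract can be sketched as follows (hypothetical simplified function; the real code builds genai Part objects rather than dicts):

```python
def convert_blocks_to_parts(blocks: list[dict]) -> list[dict]:
    """Keep blocks that carry text OR a non-empty signature; drop only
    fully empty blocks. Sketch of the fixed contract, using plain dicts
    in place of the real Part objects."""
    parts = []
    for block in blocks:
        text = block.get("thinking", "")
        signature = block.get("signature", "")
        if not text and not signature:
            continue  # fully empty: still skipped
        parts.append({"text": text, "thought_signature": signature or None})
    return parts
```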

3. _content_to_message_param Anthropic branch requires text+signature on the same part

The existing Anthropic branch (introduced after #4801) iterates reasoning_parts and requires part.text AND part.thought_signature on the same part. Streaming produces many text-only parts and exactly one signature-only part with empty text — so the branch always produces an empty thinking_blocks list and falls through to the legacy reasoning_content path.
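
The aggregation needed here can be sketched like this (hypothetical helper over plain dicts; the real code operates on Part objects inside _content_to_message_param):

```python
def aggregate_thought_parts(parts: list[dict]) -> list[dict]:
    """Rejoin streaming-split thought parts into a single thinking block:
    concatenate the text chunks and take the signature from whichever
    part carries one."""
    text = "".join(p.get("text") or "" for p in parts)
    signature = next(
        (p["thought_signature"] for p in parts if p.get("thought_signature")),
        None,
    )
    if not text and not signature:
        return []
    return [{"type": "thinking", "thinking": text, "signature": signature}]
```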

All three problems must be fixed together. Fixing any two without the third still leaves streaming + tool use broken.

Fix

Three coordinated changes in src/google/adk/models/lite_llm.py:

  • _has_meaningful_signal: recognize thinking_blocks as signal so signature-only deltas survive.
  • _convert_reasoning_value_to_parts: preserve blocks with a non-empty signature even when thinking text is empty; store the signature on Part.thought_signature.
  • _content_to_message_param (Anthropic branch): aggregate text across streaming-split thought parts and attach the signature from whichever part carries it, emitting a single thinking_blocks entry.

Relation to #4999

#4999 changes the serialization format (thinking blocks inside content list vs. sibling thinking_blocks key) and is valuable in its own right. But it does not address the streaming-aggregation bug: no matter which outbound format is used, the signature is gone by the time you reach it, because the signature-only delta is dropped at _has_meaningful_signal before any serialization runs.

These changes are complementary to #4999 and sufficient on their own for the sibling-key path, which is already supported end-to-end by litellm ≥ 1.82.

Tests

  • Updated test_convert_reasoning_value_to_parts_skips_empty_thinking → renamed to ..._preserves_signature_only_blocks, reflecting the new contract (blocks with either text or signature are kept; fully empty blocks are still skipped).
  • New test_content_to_message_param_anthropic_aggregates_streaming_split_thinking covers the outbound aggregation end-to-end with multiple streaming-split thought parts plus a tool call.
  • New test_model_response_to_chunk_preserves_signature_only_delta covers the streaming-path fix: a delta with empty content and reasoning_content but thinking_blocks with a signature flows into a ReasoningChunk with thought_signature set.

Results:

Scope                                      Passed
tests/unittests/models/test_litellm.py     247 (was 245, +2 new, 1 renamed)
tests/unittests/models/ (full directory)   612
tests/unittests/flows/ (adjacent area)     378

No regressions.

Verified end-to-end

Before the fix, a turn-2 outbound assistant message looks like:

{
  "role": "assistant",
  "tool_calls": [...],
  "reasoning_content": "The user wants me to:\n1. Research JavaScript\n2. ...",
  # no thinking_blocks
}

After the fix:

{
  "role": "assistant",
  "tool_calls": [...],
  "thinking_blocks": [{
    "type": "thinking",
    "thinking": "The user wants me to:\n1. Research JavaScript\n2. ...",
    "signature": "ErEDClsIDBACGAIqQEn0DQVP..."
  }]
  # reasoning_content absent (Anthropic branch taken)
}
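
A quick way to assert this shape in a debugging session or test (hypothetical helper, assuming the outbound message is a plain dict as shown above):

```python
def turn_carries_signature(message: dict) -> bool:
    """True when an outbound assistant turn has the fixed shape:
    thinking_blocks with a signature present and no flat
    reasoning_content fallback."""
    blocks = message.get("thinking_blocks") or []
    return (
        "reasoning_content" not in message
        and any(bool(b.get("signature")) for b in blocks)
    )
```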

Verified on Claude Opus 4.6 via Vertex AI through a LiteLLM proxy across 6+ assistant turns with interleaved tool calls. Thinking continues on every turn and evolves with tool results (e.g. "The user approved the plan. Let me execute it..." → "I need to gather information on the latest geopolitical news...").

@google-cla

google-cla Bot commented Apr 21, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@adk-bot adk-bot added the models [Component] Issues related to model support label Apr 21, 2026
@adk-bot
Collaborator

adk-bot commented Apr 21, 2026

Response from ADK Triaging Agent

Hello @gurjot-05, thank you for your contribution!

Before we can merge this PR, could you please sign the Contributor License Agreement (CLA)? You can find more information in the PR checks.

Additionally, since this PR addresses a bug, could you please create a new issue describing the problem and associate it with this pull request? This helps us track fixes more effectively.

Thanks!

…eservation

Updates the existing _convert_reasoning_value_to_parts test to reflect the
new contract: signature-only blocks (empty thinking text) are preserved so
the signature survives streaming aggregation.

Adds two new tests:
- test_content_to_message_param_anthropic_aggregates_streaming_split_thinking
  covers the outbound aggregation: multiple streaming-split thought parts
  (text chunks plus a final signature-only chunk) are rejoined into one
  thinking_block for Anthropic models.
- test_model_response_to_chunk_preserves_signature_only_delta
  covers the streaming-path fix: _has_meaningful_signal recognizes
  thinking_blocks as signal, so a delta with empty content/reasoning_content
  but a signature survives into a ReasoningChunk.
@gurjot-05 gurjot-05 force-pushed the fix/anthropic-thinking-streaming-continuity branch from a094230 to 03c8681 Compare April 21, 2026 17:16
@gurjot-05
Author

CLA issue is fixed
