fix(lite_llm): preserve Anthropic thinking signatures across streaming tool-use turns#5437
fix(lite_llm): preserve Anthropic thinking signatures across streaming tool-use turns#5437gurjot-05 wants to merge 4 commits intogoogle:mainfrom
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
|
Response from ADK Triaging Agent Hello @gurjot-05, thank you for your contribution! Before we can merge this PR, could you please sign the Contributor License Agreement (CLA)? You can find more information in the PR checks. Additionally, since this PR addresses a bug, could you please create a new issue describing the problem and associate it with this pull request? This helps us track fixes more effectively. Thanks! |
…eservation Updates the existing _convert_reasoning_value_to_parts test to reflect the new contract: signature-only blocks (empty thinking text) are preserved so the signature survives streaming aggregation. Adds two new tests: - test_content_to_message_param_anthropic_aggregates_streaming_split_thinking covers the outbound aggregation: multiple streaming-split thought parts (text chunks plus a final signature-only chunk) are rejoined into one thinking_block for Anthropic models. - test_model_response_to_chunk_preserves_signature_only_delta covers the streaming-path fix: _has_meaningful_signal recognizes thinking_blocks as signal, so a delta with empty content/reasoning_content but a signature survives into a ReasoningChunk.
a094230 to
03c8681
Compare
|
CLA issue is fixed |
Summary
Fixes continuity of Anthropic extended thinking across streaming responses that include tool use. After the first tool call, thinking blocks stop appearing in Claude's responses because the signature carrying the prior thinking block is silently discarded during streaming.
Related to #4801 (closed, but the underlying streaming path is still broken in v1.31.x).
Repro
A minimal agent with one tool. Claude emits thinking on turn 1 but not after the tool result.
Expected flow (after fix):
Actual flow (before fix) — the second thought is missing:
To see the underlying transport-level difference, enable
litellm.set_verbose = Trueand inspect the outbound message history on the second LLM call. Without the fix, the assistant turn carriesreasoning_contentas a flat string and nothinking_blocks; with the fix, it carriesthinking_blocks: [{..., "signature": "..."}]andreasoning_contentis absent.Non-tool-use chats keep producing thinking on every turn (each turn is fresh), which masks the bug for simple conversations.
Root cause — three-stage signature loss
Anthropic streams one thinking block as:
signature: "".thinking: ""and a non-emptysignature(block_stop equivalent).Three sequential problems in
lite_llm.pytogether drop the signature:1.
_has_meaningful_signaldiscards signature-only deltasThe nested helper inside
_model_response_to_chunktreats a delta as noise whencontent,tool_calls,function_call,reasoning_content, andreasoningare all falsy. The signature-only delta satisfies exactly that — onlythinking_blocksis populated. The delta is set toNonebefore_extract_reasoning_valueever runs, so the signature never enters the pipeline.2.
_convert_reasoning_value_to_partsskips empty-text blocksEven if the delta reached it,
if not thinking_text: continuewould skip the block because itsthinkingfield is empty, ignoring the non-emptysignature.3.
_content_to_message_paramAnthropic branch requires text+signature on the same partThe existing Anthropic branch (introduced after #4801) iterates
reasoning_partsand requirespart.text AND part.thought_signatureon the same part. Streaming produces many text-only parts and exactly one signature-only part with empty text — so the branch always produces an emptythinking_blockslist and falls through to the legacyreasoning_contentpath.All three problems must be fixed together. Any two without the third still breaks on streaming + tool use.
Fix
Three coordinated changes in
src/google/adk/models/lite_llm.py:_has_meaningful_signal: recognizethinking_blocksas signal so signature-only deltas survive._convert_reasoning_value_to_parts: preserve blocks with a non-emptysignatureeven whenthinkingtext is empty; store the signature onPart.thought_signature._content_to_message_param(Anthropic branch): aggregate text across streaming-split thought parts and attach the signature from whichever part carries it, emitting a singlethinking_blocksentry.Relation to #4999
#4999 changes the serialization format (thinking blocks inside
contentlist vs. siblingthinking_blockskey) and is valuable in its own right. But it does not address the streaming-aggregation bug: no matter which outbound format is used, the signature is gone by the time you reach it, because the signature-only delta is dropped at_has_meaningful_signalbefore any serialization runs.These changes are complementary to #4999 and sufficient on their own for the sibling-key path, which is already supported end-to-end by litellm ≥ 1.82.
Tests
test_convert_reasoning_value_to_parts_skips_empty_thinking→ renamed to..._preserves_signature_only_blocks, reflecting the new contract (blocks with either text or signature are kept; fully empty blocks are still skipped).test_content_to_message_param_anthropic_aggregates_streaming_split_thinkingcovers the outbound aggregation end-to-end with multiple streaming-split thought parts plus a tool call.test_model_response_to_chunk_preserves_signature_only_deltacovers the streaming-path fix: a delta with emptycontentandreasoning_contentbutthinking_blockswith a signature flows into aReasoningChunkwiththought_signatureset.Results:
tests/unittests/models/test_litellm.pytests/unittests/models/(full directory)tests/unittests/flows/(adjacent area)No regressions.
Verified end-to-end
Before the fix, a turn-2 outbound assistant message looks like:
{ "role": "assistant", "tool_calls": [...], "reasoning_content": "The user wants me to:\n1. Research JavaScript\n2. ...", # no thinking_blocks }After the fix:
{ "role": "assistant", "tool_calls": [...], "thinking_blocks": [{ "type": "thinking", "thinking": "The user wants me to:\n1. Research JavaScript\n2. ...", "signature": "ErEDClsIDBACGAIqQEn0DQVP..." }] # reasoning_content absent (Anthropic branch took) }Verified on Claude Opus 4.6 via Vertex AI through a LiteLLM proxy across 6+ assistant turns with interleaved tool calls. Thinking continues on every turn and evolves with tool results (e.g. "The user approved the plan. Let me execute it..." → "I need to gather information on the latest geopolitical news...").