feat(otel): instrument runtime with GenAI semantic conventions by tdabasinskas · Pull Request #2620 · docker/docker-agent

tdabasinskas · 2026-05-04T07:49:26Z

Adds end-to-end OpenTelemetry instrumentation following the GenAI semantic conventions:

Provider-layer chat/embeddings/rerank CLIENT spans with gen_ai.* attributes and the gen_ai.client.token.usage / operation.duration histograms.
Runtime spans (runtime.session, runtime.stream, runtime.fallback, runtime.tool.call, runtime.run_skill, runtime.task_transfer, runtime.handoff, background_agent.run).
MCP client + server spans with params._meta propagation, plus OAuth flow spans.
A2A endpoints wrapped with otelhttp and marked as invoke_agent.
Hook executor span with verdict/decision/reason annotation; subprocess trace context propagation for hooks, LSP servers, and sandbox docker exec.
Memory, RAG, sessiontitle, evaluation, anthropic-specific spans.
Built-in tool internals (shell, filesystem, fetch, lsp, codemode, ...) surface their work as span attributes.
W3C trace context + baggage propagation across all HTTP servers and clients.
Standard OTel resource attributes (service.*, host.*, process.*, os.type)
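
For readers unfamiliar with the header that the propagation items above rely on, here is a minimal sketch of parsing a W3C `traceparent` value (version `00`: version, 32-hex trace id, 16-hex parent span id, 2-hex flags). The type and function names are hypothetical and are not taken from this PR, which relies on the OTel SDK propagators rather than hand-written parsing.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// TraceParent holds the four fields of a W3C traceparent header.
// Hypothetical illustration type; not part of the PR.
type TraceParent struct {
	Version string
	TraceID string
	SpanID  string
	Flags   string
}

// isLowerHex reports whether s consists only of 0-9 and a-f.
func isLowerHex(s string) bool {
	for _, c := range s {
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// parseTraceParent validates a version-00 traceparent value.
func parseTraceParent(h string) (TraceParent, error) {
	parts := strings.Split(strings.TrimSpace(h), "-")
	if len(parts) != 4 {
		return TraceParent{}, errors.New("traceparent: want 4 dash-separated fields")
	}
	wantLen := [4]int{2, 32, 16, 2}
	for i, p := range parts {
		if len(p) != wantLen[i] || !isLowerHex(p) {
			return TraceParent{}, fmt.Errorf("traceparent: field %d is not %d lowercase hex chars", i, wantLen[i])
		}
	}
	tp := TraceParent{Version: parts[0], TraceID: parts[1], SpanID: parts[2], Flags: parts[3]}
	if tp.Version != "00" {
		return TraceParent{}, errors.New("traceparent: this sketch only handles version 00")
	}
	// All-zero identifiers are invalid per the specification.
	if tp.TraceID == strings.Repeat("0", 32) || tp.SpanID == strings.Repeat("0", 16) {
		return TraceParent{}, errors.New("traceparent: all-zero trace or span id")
	}
	return tp, nil
}

// Sampled reports whether the least significant flag bit is set.
func (tp TraceParent) Sampled() bool {
	v, err := strconv.ParseUint(tp.Flags, 16, 8)
	return err == nil && v&1 == 1
}

func main() {
	tp, err := parseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(tp.TraceID, tp.Sampled(), err)
}
```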

This PR wires two opt-in env vars beyond the default OTel SDK ones:

OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT — capture prompts, responses, tool arguments and tool results as span attributes. Off by default (PII surface).
OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental — emit only the spec-defined gen_ai.* keys. Default is dual-emit (both gen_ai.* and the legacy tool.name / agent / session.id keys), so existing dashboards keep working alongside spec-aware tooling.
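
A rough sketch of how the two gates described above could be evaluated. The helper names, the "only `true` enables capture" rule, and the comma-separated parsing of the opt-in list are assumptions for illustration; the PR's actual helpers in `pkg/telemetry/genai/` may differ. `gen_ai.tool.name` and the legacy `tool.name` key stand in for the dual-emit pairs.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

const (
	envCaptureContent = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"
	envStabilityOptIn = "OTEL_SEMCONV_STABILITY_OPT_IN"
)

// captureContentEnabled reports whether prompts, responses and tool payloads
// may be recorded. Assumption: only the value "true" (any case) enables it.
func captureContentEnabled(getenv func(string) string) bool {
	return strings.EqualFold(strings.TrimSpace(getenv(envCaptureContent)), "true")
}

// specOnly reports whether gen_ai_latest_experimental appears in the
// comma-separated opt-in list, which disables the legacy keys.
func specOnly(getenv func(string) string) bool {
	for _, v := range strings.Split(getenv(envStabilityOptIn), ",") {
		if strings.TrimSpace(v) == "gen_ai_latest_experimental" {
			return true
		}
	}
	return false
}

// toolNameAttrs always emits the spec key and adds the legacy key
// unless spec-only mode is on (the dual-emit default).
func toolNameAttrs(getenv func(string) string, name string) map[string]string {
	attrs := map[string]string{"gen_ai.tool.name": name}
	if !specOnly(getenv) {
		attrs["tool.name"] = name
	}
	return attrs
}

func main() {
	fmt.Println("capture content:", captureContentEnabled(os.Getenv))
	fmt.Println("attributes:", toolNameAttrs(os.Getenv, "shell"))
}
```

Passing the lookup function in, instead of calling `os.Getenv` inside each helper, keeps the gates testable without mutating process-wide state.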

The diff is large — ~50 files, ~5k lines. It's split into 10 topical commits (telemetry primitives → SDK init → providers → runtime → hooks → MCP → A2A → servers/cold-start → memory/RAG → tool internals) so each commit is independently reviewable. Most of the volume is in the new pkg/telemetry/genai/ and pkg/telemetry/mcp/ packages, which are pure helpers; the surface-area changes elsewhere are 1-3 lines per call site.

dgageot · 2026-05-04T17:33:21Z

@tdabasinskas not sure why, GitHub doesn't want to merge this one, because of hypothetical merge conflicts. Could you rebase?

- `pkg/telemetry/genai/` provides the GenAI semantic-conventions surface: span helpers (`ChatSpan`, `EmbeddingSpan`, `FallbackSpan`, `SandboxSpan`, runtime helpers), attribute / operation-name / provider-name constants per the OTel GenAI semconv, conversation-id baggage round-trippers, error classification, content-capture gating (`OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT`), stability gating (`OTEL_SEMCONV_STABILITY_OPT_IN`), `gen_ai.client.token.usage` and operation-duration histograms, the `gen_ai.evaluation.result` log emitter, and process-boundary helpers (`InjectSandboxEnv`, `InjectTraceContextEnv`)
- `pkg/telemetry/mcp/` provides MCP-specific telemetry: `ConversationIDFromBaggage`, span starters for client / server, `params._meta` propagation carrier, attribute constants, and metrics
- Test files cover content gating, stability defaults, conversation propagation, and span lifecycle invariants

- `cmd/root/otel.go`: stand up `TracerProvider` / `MeterProvider` / `LoggerProvider` from a single `initOTelSDK` entry, configure OTLP/HTTP exporters with explicit-scheme endpoint normalization, set the global W3C trace-context + baggage propagator unconditionally, flush providers in dependency order, attach `service.*` / `host.*` / `process.*` / `os.type` / `host.arch` resource attributes, and use `AlwaysSample` so local agent sessions are not dropped by an upstream sampling decision
- `pkg/httpclient/client.go`: add a `WrapWithOTel` round-tripper gated on a single `atomic.Bool` flipped by `initOTelSDK` (avoids the prior mismatch between `--otel` and the otelhttp wrap), plus `TracedDefaultClient` / `TracedClient` helpers for one-off HTTP calls
- `cmd/root/sandbox.go`: open a host-side `sandbox.exec` span and inject the active W3C trace context as `-e KEY=VALUE` flags so processes inside the container chain onto the host trace
- `cmd/root/new.go`, `cmd/root/otel_test.go`: wire tracer scope and cover the endpoint normalization / localhost detection cases
- `go.mod` / `go.sum`: pull in `go.opentelemetry.io/otel` SDK + OTLP/HTTP exporters
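
To make the `-e KEY=VALUE` injection concrete, the sketch below turns a propagation carrier (header name to value) into `docker exec` flags. The function name and the choice of upper-cased header names (`TRACEPARENT`, `TRACESTATE`, `BAGGAGE`) as environment variable names are assumptions; the PR's `InjectSandboxEnv` may use different names or ordering.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// traceEnvFlags converts a W3C propagation carrier into docker -e flags.
// Hypothetical helper: env names are assumed to be upper-cased header names.
func traceEnvFlags(carrier map[string]string) []string {
	keys := make([]string, 0, len(carrier))
	for k, v := range carrier {
		if v != "" { // skip empty headers such as an unset tracestate
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // deterministic flag order
	flags := make([]string, 0, 2*len(keys))
	for _, k := range keys {
		flags = append(flags, "-e", strings.ToUpper(k)+"="+carrier[k])
	}
	return flags
}

func main() {
	carrier := map[string]string{
		"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
		"tracestate":  "",
		"baggage":     "gen_ai.conversation.id=abc123",
	}
	args := append([]string{"exec"}, traceEnvFlags(carrier)...)
	args = append(args, "my-sandbox", "sh", "-c", "env") // container name is a placeholder
	fmt.Println("docker", strings.Join(args, " "))
}
```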

…s and metrics

- `pkg/model/provider/instrument.go`: decorator that wraps any `Provider` with a `chat {model}` CLIENT span (per OTel GenAI semconv), opt-in capture of `gen_ai.input.messages` / `gen_ai.output.messages` / `gen_ai.tool.definitions`, request/response attributes including the Anthropic spec-sum input-token computation (input + cache_read + cache_creation), `gen_ai.client.token.usage` histogram, and `gen_ai.client.operation.duration` histogram. Six wrapper variants preserve the EmbeddingProvider / RerankingProvider capability surfaces so RAG fallbacks round-trip correctly
- `pkg/model/provider/factory.go`, `factory_test.go`: route construction through the decorator
- `pkg/model/provider/anthropic/client.go`, `files.go`: add `anthropic.tokens.count` and `anthropic.files.get_or_upload` spans for the overflow-retry token-counting path and the file-upload cache-or-create path; drop the unnecessary `string(model)` cast
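
The "spec-sum" input-token computation mentioned above is plain arithmetic: Anthropic reports uncached input tokens separately from cache reads and cache writes, so the total input figure is their sum. A minimal sketch, with a hypothetical struct whose fields mirror the Anthropic usage object:

```go
package main

import "fmt"

// anthropicUsage mirrors the usage fields of an Anthropic response.
// Hypothetical type for illustration; the PR works with the SDK's own types.
type anthropicUsage struct {
	InputTokens              int64 // uncached input tokens
	CacheReadInputTokens     int64 // tokens served from the prompt cache
	CacheCreationInputTokens int64 // tokens written to the prompt cache
	OutputTokens             int64
}

// specInputTokens returns input + cache_read + cache_creation, the value
// the commit message says is recorded as the input-token count.
func specInputTokens(u anthropicUsage) int64 {
	return u.InputTokens + u.CacheReadInputTokens + u.CacheCreationInputTokens
}

func main() {
	u := anthropicUsage{InputTokens: 120, CacheReadInputTokens: 2000, CacheCreationInputTokens: 300, OutputTokens: 85}
	fmt.Println(specInputTokens(u)) // 2420
}
```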

…n, skills, and background agents

- `pkg/runtime/loop.go`: open `runtime.session` and `runtime.stream` INTERNAL spans seeded with `gen_ai.conversation.id` baggage at session start; mark the session span with `error.type=loop_detected` + `codes.Error` when the loop detector terminates
- `pkg/runtime/fallback.go`, `pkg/runtime/cache.go`: wrap the fallback chain with a `runtime.fallback` span carrying primary/final model, attempts, outcome, cooldown state; record provider-cache hit/backing on the cache span
- `pkg/runtime/agent_delegation.go`: emit `runtime.task_transfer` and `runtime.handoff` spans with `gen_ai.operation.name=invoke_agent` and `gen_ai.agent.name`
- `pkg/runtime/skill_runner.go`: emit `invoke_workflow {skill}` per spec
- `pkg/runtime/toolexec/dispatcher.go`: open `runtime.tool.call` and `runtime.tool.handler` spans with the GenAI execute_tool semconv, capture `gen_ai.tool.call.{arguments,result}` under the content-capture opt-in, and stamp `cagent.approval.{decision,source}` from `notifyApproval` so denied / canceled / read-only-allowed calls are distinguishable in trace dashboards
- `pkg/runtime/compactor/compactor.go`: wrap compaction with a span that carries summary tokens and cost
- `pkg/tools/builtin/agent/agent.go`: open a `background_agent.run` root span with a link back to the spawning context, and stamp `gen_ai.conversation.id` from baggage so the span participates in conversation-scoped queries
- `pkg/tools/startable.go`, `pkg/toolinstall/registry.go`: wrap toolset Start with a `toolset.start` span so capability discovery latency is attributable

…race context

- `pkg/hooks/executor.go`: open a single `hook.{event}` INTERNAL span per Dispatch covering every matched hook, then `annotateHookSpan` stamps the aggregated `Result` so denied / asked / allowed / modified-input / summary-provided cases are distinguishable. Verdict booleans and the structured decision/reason are unconditional; free-text `message` / `additional_context` / `system_message` / `summary` are gated on `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT`
- `pkg/hooks/handler.go`: append `genai.InjectTraceContextEnv(ctx)` to the hook subprocess env so script-driven hooks that emit OTel spans (or call instrumented CLIs / LLM endpoints) chain onto the parent `hook.{event}` span instead of producing orphaned roots

- `pkg/mcp/server.go`: route the MCP HTTP transport through `otelhttp.NewHandler` and `otelmcp.StartServer` so inbound requests carry `traceparent` / `baggage` and emit a SERVER span per call
- `pkg/tools/mcp/session_client.go`: wrap MCP client calls (`tools/list`, `tools/call`, `prompts/list`) with CLIENT spans using the params._meta propagation carrier. Iterator wrappers open the span inside the iterator closure (not at call time) so unused iterators do not leak spans, and end on every exit path including early `yield` returns
- `pkg/tools/mcp/oauth.go`, `oauth_helpers.go`, `oauth_login.go`, `oauth_server.go`: wrap interactive OAuth flow and token refresh with `oauth.flow` / `oauth.token.refresh` CLIENT spans, route metadata HTTP calls through `httpclient.TracedClient` / `TracedDefaultClient`, and emit `oauth.step` span events at each network sub-step boundary (`fetch_protected_resource_metadata`, `fetch_authorization_server_metadata`, `dynamic_client_registration`, `request_authorization_code`, `token_exchange`) so a failure can be attributed to a specific stage without descending into HTTP children
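
The iterator-wrapper rule above (open the span inside the closure, end it on every exit path) can be sketched without any OTel dependency. `Seq` is declared locally with the same shape as Go 1.23's `iter.Seq`, and `spanLog` is a hypothetical stand-in for a tracer that only counts starts and ends; neither name comes from the PR.

```go
package main

import "fmt"

// Seq has the same shape as iter.Seq, declared locally for the sketch.
type Seq[V any] func(yield func(V) bool)

// spanLog is a hypothetical tracer stand-in that counts span lifecycles.
type spanLog struct{ started, ended int }

// start "opens" a span and returns the function that ends it.
func (l *spanLog) start() func() {
	l.started++
	return func() { l.ended++ }
}

// traced wraps seq so the span starts only when iteration begins and
// ends when the closure returns, including after an early stop.
func traced[V any](l *spanLog, seq Seq[V]) Seq[V] {
	return func(yield func(V) bool) {
		end := l.start() // opened here, not when traced() is called
		defer end()      // runs on normal completion and on early exit
		seq(yield)
	}
}

// items yields each value until the consumer asks to stop.
func items(vs ...string) Seq[string] {
	return func(yield func(string) bool) {
		for _, v := range vs {
			if !yield(v) {
				return
			}
		}
	}
}

func main() {
	rec := &spanLog{}

	_ = traced(rec, items("a", "b")) // never iterated
	fmt.Println(rec.started, rec.ended) // 0 0

	// Stop after the first element, as a `break` in a range loop would.
	traced(rec, items("a", "b", "c"))(func(string) bool { return false })
	fmt.Println(rec.started, rec.ended) // 1 1
}
```

Starting the span at wrap time instead would leave it open forever whenever a caller builds the iterator and then discards it.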

…nt semconv

- `pkg/a2a/server.go`: wrap the agent-card and JSON-RPC endpoints with `otelhttp.NewHandler` so inbound A2A requests extract `traceparent` / `tracestate` / `baggage` and emit a SERVER span. The outer `agent-a2a` server wrap covers any auxiliary routes
- `pkg/a2a/adapter.go`: in `runDockerAgent`, decorate the active SERVER span with `gen_ai.operation.name=invoke_agent`, `gen_ai.agent.name`, and `cagent.agent.name`. Wires the runtime tracer scope so per-invocation `runtime.session` / `runtime.stream` / `runtime.tool.call` chain onto the inbound A2A span instead of starting fresh trace ids per request

…ints, and add cold-start spans

- `pkg/server/server.go`: wrap the agent-api Echo handler with `otelhttp.NewHandler` so inbound API requests extract `traceparent` / `tracestate` / `baggage` and the runtime spans started downstream chain onto the calling client trace
- `pkg/server/session_manager.go`: wire the runtime tracer scope into per-session runtime construction; open a `session.runtime_init` INTERNAL span on the cold path (team load + runtime construction) so per-request first-use latency is attributable. Cached hits skip the span; they are a pointer load
- `pkg/chatserver/server.go`, `pkg/chatserver/runtime_pool.go`: wrap the chat completions HTTP server with `otelhttp.NewHandler` and propagate the runtime tracer through the per-session pool
- `pkg/teamloader/teamloader.go`: open a `teamloader.load` INTERNAL span around `LoadWithConfig` so the cold-start path (config parse, model alias resolution, OCI agent pulls, toolset starts) becomes attributable
- `pkg/acp/agent.go`: wire the runtime tracer into the ACP entry point so its sub-spans share scope with CLI / API runs

- `pkg/memory/database/sqlite/sqlite.go`: open `memory.{op}` spans on `AddMemory`, `SearchMemories`, etc., with named-return error capture so failures attach to the span via `RecordError`. The search path additionally emits a `retrieval` semconv span for cross-tool dashboards
- `pkg/rag/manager.go`: open `retrieval` (semconv) spans on `Query`, plus `rag.init` / `rag.reindex` / `rag.file_watcher` for lifecycle visibility
- `pkg/sessiontitle/generator.go`: wrap title generation with a `sessiontitle.generate` span; named-return errors fold onto the span on failure
- `pkg/evaluation/judge.go`: emit `gen_ai.evaluation.result` log events from the LLM-as-judge evaluator with score / explanation / error.type, linked to the active span via context for cross-signal join
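
The "named-return error capture" pattern mentioned for the memory and session-title spans looks roughly like the sketch below. `fakeSpan` is a hypothetical stand-in exposing only `RecordError` and `End` (the two methods of the real OTel span that the pattern needs), and `addMemory` is an illustrative function, not the PR's `AddMemory`.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeSpan is a hypothetical stand-in for an OTel span.
type fakeSpan struct {
	err   error
	ended bool
}

func (s *fakeSpan) RecordError(err error) { s.err = err }
func (s *fakeSpan) End()                  { s.ended = true }

var errEmpty = errors.New("memory text is empty")

// addMemory names its error result so the deferred closure can inspect
// whatever value the function finally returns, on every return path.
func addMemory(span *fakeSpan, text string) (err error) {
	defer func() {
		if err != nil {
			span.RecordError(err) // failure attaches to the span
		}
		span.End()
	}()

	if text == "" {
		return errEmpty
	}
	// ... the real work (for example a SQL insert) would go here ...
	return nil
}

func main() {
	s := &fakeSpan{}
	err := addMemory(s, "")
	fmt.Println(err, s.err == err, s.ended)
}
```

Because the deferred closure reads the named result, new early returns added later are covered without touching the instrumentation.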

- `pkg/tools/builtin/shell.go`, `script_shell.go`: stamp `cagent.tool.{shell,script_shell}.{cmd,cwd,timeout_seconds}` on the active `runtime.tool.handler` span. Cmd ships unconditionally because it is the main signal of what the agent did; redact at the OTel collector if commands carry secrets
- `pkg/tools/builtin/filesystem.go`: stamp `cagent.tool.filesystem.{op,path,paths,path_count}` covering all file operations. Paths ship unconditionally for the same incident-response reason
- `pkg/tools/builtin/fetch.go`: stamp `cagent.tool.fetch.{urls,url_count,format}`; each fetched URL still emits its own HTTP CLIENT child span via `httpclient.WrapWithOTel`
- `pkg/tools/builtin/lsp.go`: wrap every tool from `lspTool` so each LSP RPC stamps `cagent.tool.lsp.{tool,read_only}` on the parent span
- `pkg/tools/builtin/lsp_lifecycle.go`: inject `genai.InjectTraceContextEnv(ctx)` into the LSP server spawn env so OTel-aware language servers chain onto the agent trace
- `pkg/tools/builtin/openapi.go`, `pkg/tools/builtin/api.go`: route the user-facing HTTP clients through `httpclient.WrapWithOTel(remote.NewTransport(ctx))` so each API call emits a CLIENT span and propagates `traceparent`
- `pkg/tools/codemode/exec.go`: stamp `cagent.tool.codemode.{script,script_length,tool_call_count}` so a code-mode turn is visible as "ran N lines of JS that called M tools"

… attribute

- Change `tool_call_response` parts to use `result` field instead of `content` to align with OTel GenAI semconv example schema
- Cap `cagent.tool.filesystem.paths` attribute to 32 entries to prevent backends from dropping oversized attributes on multi-hundred-path calls
- Always record `path_count` to preserve total fidelity when paths are truncated
- Fix typo in `ApprovalSourcePermissionRequestHook` constant name (add missing `Allow` suffix)
- Remove `t.Parallel()` from MCP tests that mutate global OTel state
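
The cap-plus-count behaviour for `cagent.tool.filesystem.paths` can be sketched as follows. The function and constant names are hypothetical; only the limit of 32 and the rule that `path_count` always reflects the full input come from the commit message.

```go
package main

import "fmt"

// maxPathsAttr is the cap stated in the commit message.
const maxPathsAttr = 32

// pathAttrs returns the (possibly truncated) list for the `paths` attribute
// and the untruncated total for `path_count`.
func pathAttrs(paths []string) (capped []string, count int) {
	count = len(paths)
	if count > maxPathsAttr {
		paths = paths[:maxPathsAttr]
	}
	// Copy so the attribute value does not alias the caller's slice.
	return append([]string(nil), paths...), count
}

func main() {
	paths := make([]string, 0, 100)
	for i := 0; i < 100; i++ {
		paths = append(paths, fmt.Sprintf("src/file_%03d.go", i))
	}
	capped, count := pathAttrs(paths)
	fmt.Println(len(capped), count) // 32 100
}
```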

…ttrs

- `pkg/tools/codemode/exec.go`: emit `cagent.tool.codemode.script_hash` (SHA-256) + `script_length` unconditionally so dashboards can correlate identical scripts and spot oversize submissions, but gate the full `cagent.tool.codemode.script` body behind `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT`. Codemode scripts are kilobyte-scale arbitrary JS that routinely embed auth tokens / pasted user data / inline secrets, so the bundle decision (Option B, ship body unconditionally) was the wrong call for this attribute specifically
- `pkg/tools/builtin/fetch.go`: strip query strings, fragments, and userinfo from `cagent.tool.fetch.urls` so the attribute can ship by default without leaking signed-URL tokens, OAuth codes, or inline credentials. Path stays intact so dashboards still answer "which sites/endpoints did the agent hit?". Unparseable URLs are emitted as `<unparseable>` rather than passed through verbatim

Both span attributes were flagged on the upstream PR review for the same root cause: emitting unbounded user-controlled content as a default-on telemetry attribute creates a PII/secret-exfiltration surface. The other Option B attributes (`shell.cmd`, `filesystem.path`, `script_shell.cmd`) stay unconditional: they are short, do not carry the same query-token / arbitrary-content risk, and remain decision-relevant for incident response
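
A minimal sketch of the URL scrubbing described for `cagent.tool.fetch.urls`, using only `net/url`. The function name is hypothetical and the PR's code may handle more edge cases; the sketch just drops userinfo, query and fragment, keeps scheme, host and path, and falls back to `<unparseable>` on a parse error.

```go
package main

import (
	"fmt"
	"net/url"
)

// sanitizeURL removes the parts of a URL that commonly carry secrets.
func sanitizeURL(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return "<unparseable>" // never echo the raw input back
	}
	u.User = nil // user:password@
	u.RawQuery = ""
	u.ForceQuery = false // avoid a dangling "?"
	u.Fragment = ""
	u.RawFragment = ""
	return u.String()
}

func main() {
	fmt.Println(sanitizeURL("https://user:pw@example.com/a/b?sig=abc#frag")) // https://example.com/a/b
	fmt.Println(sanitizeURL("http://[::1"))                                  // <unparseable>
}
```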

tdabasinskas · 2026-05-04T18:40:45Z

> @tdabasinskas not sure why, GitHub doesn't want to merge this one, because of hypothetical merge conflicts. Could you rebase?

Done!

tdabasinskas requested a review from a team as a code owner May 4, 2026 07:49

tdabasinskas mentioned this pull request May 4, 2026

OTEL, again #393

Open

tdabasinskas marked this pull request as draft May 4, 2026 07:58

tdabasinskas marked this pull request as ready for review May 4, 2026 08:52

tdabasinskas force-pushed the feat/otel-genai-semconv branch from fa4a01d to 2a69313 Compare May 4, 2026 11:16

tdabasinskas added 12 commits May 4, 2026 21:35

tdabasinskas force-pushed the feat/otel-genai-semconv branch from 2a69313 to 9b08feb Compare May 4, 2026 18:40

tdabasinskas wants to merge 12 commits into docker:main from cogvel:feat/otel-genai-semconv


2 participants
