
tinyagent

Requires Python 3.11+. Available on PyPI.

A minimal agent framework with first-class tracing, callbacks, MCP, and serving.

What this is

tinyagent is a small, dependency-light Python implementation of the Hugging Face Tiny Agents loop, built on any-llm. It gives you:

  • A simple agent loop you can read in one sitting.
  • Native MCP tool support (stdio, SSE, streamable HTTP).
  • OpenTelemetry-based tracing of LLM calls and tool executions, including token counts and cost.
  • A callback system for guardrails, metrics, and intentional cancellation.
  • Optional A2A and MCP serving so your agent can run as a service.

This package was extracted from any-agent, which still uses tinyagent under the hood as one of its supported framework backends.

Install

pip install mozilla-ai-tinyagent

The PyPI distribution name is mozilla-ai-tinyagent; the import name is tinyagent.

Optional extras:

pip install 'mozilla-ai-tinyagent[a2a]'       # A2A serving
pip install 'mozilla-ai-tinyagent[composio]'  # Composio tools
pip install 'mozilla-ai-tinyagent[all]'       # everything

Quickstart

from tinyagent import TinyAgent, AgentConfig
from tinyagent.tools import search_web, visit_webpage

agent = TinyAgent.create(
    AgentConfig(
        model_id="mistral:mistral-small-latest",
        instructions="Use the tools to find an answer.",
        tools=[search_web, visit_webpage],
    )
)

trace = agent.run("Which agent framework is the simplest?")
print(trace.final_output)

model_id follows the any-llm provider syntax (provider:model). Set the relevant API key (e.g. MISTRAL_API_KEY, OPENAI_API_KEY) in your environment.
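For illustration, here is how the provider prefix relates to the API key the agent needs. The split logic below is just a sketch of the naming convention, and the key-name pattern is an assumption that happens to hold for the providers mentioned above:

```python
# any-llm model identifiers follow "provider:model" syntax.
model_id = "mistral:mistral-small-latest"

provider, model = model_id.split(":", 1)
print(provider)  # mistral
print(model)     # mistral-small-latest

# The matching API key must be set in the environment before agent.run,
# e.g. os.environ["MISTRAL_API_KEY"] = "..."
required_key = f"{provider.upper()}_API_KEY"
print(required_key)  # MISTRAL_API_KEY
```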

Use MCP tools

from tinyagent import TinyAgent, AgentConfig
from tinyagent.config import MCPStdio

agent = TinyAgent.create(
    AgentConfig(
        model_id="mistral:mistral-small-latest",
        instructions="Use the available tools to answer.",
        tools=[
            MCPStdio(command="uvx", args=["duckduckgo-mcp-server"]),
        ],
    )
)

trace = agent.run("What is the capital of Pennsylvania?")
print(trace.final_output)

Tracing

Every agent.run(...) returns an AgentTrace with the final output, the spans, token counts, and cost.

trace = agent.run("...")
print(trace.duration)
print(trace.tokens)
print(trace.cost)
for span in trace.spans:
    print(span.name, span.attributes)

Callbacks

Subclass Callback to observe or control execution. Each hook receives a Context and returns it, optionally mutating shared state.

from tinyagent.callbacks import Callback, Context

class LimitToolCalls(Callback):
    def before_tool_execution(self, context: Context, *args, **kwargs) -> Context:
        context.shared["count"] = context.shared.get("count", 0) + 1
        if context.shared["count"] > 5:
            raise StopIteration("Too many tool calls")
        return context

See AgentCancel for cancellation that preserves the trace.
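As a self-contained sketch of that counting pattern, runnable without tinyagent installed: StubContext and the AgentCancel class below are stand-ins for the real Context and cancellation exception provided by tinyagent's callback machinery, shown only to illustrate the control flow:

```python
# Self-contained sketch of callback-driven cancellation. StubContext and
# AgentCancel are stand-ins so this runs without tinyagent installed.

class StubContext:
    def __init__(self):
        self.shared: dict = {}

class AgentCancel(Exception):
    """Stand-in: assumed to be an exception the agent loop catches,
    finishing the run while keeping the trace collected so far."""

MAX_TOOL_CALLS = 5

def before_tool_execution(context):
    # Count tool calls in shared state; cancel once the budget is exceeded.
    context.shared["count"] = context.shared.get("count", 0) + 1
    if context.shared["count"] > MAX_TOOL_CALLS:
        raise AgentCancel("tool-call budget exceeded")
    return context

ctx = StubContext()
for _ in range(MAX_TOOL_CALLS):
    before_tool_execution(ctx)

try:
    before_tool_execution(ctx)  # sixth call exceeds the budget
except AgentCancel as exc:
    print("cancelled:", exc)  # cancelled: tool-call budget exceeded
```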

Evaluate

tinyagent ships two evaluators in tinyagent.evaluation for grading agent runs against criteria.

LlmJudge answers a question about a fixed context with a single LLM call:

from tinyagent.evaluation import LlmJudge

judge = LlmJudge(model_id="mistral:mistral-small-latest")
result = judge.run(
    context=trace.final_output,
    question="Does the answer cite a primary source?",
)
print(result.passed, result.reasoning)

AgentJudge runs an agent that has tools to inspect the full AgentTrace (token counts, spans, tool calls), so it can answer richer questions:

from tinyagent.evaluation import AgentJudge

judge = AgentJudge(model_id="mistral:mistral-small-latest")
eval_trace = judge.run(
    trace=trace,
    question="Did the agent use the search_web tool before answering?",
)
print(eval_trace.final_output)

For deterministic checks (token budget, span shape, expected tool sequence) you can also walk the AgentTrace directly without an LLM. See docs/evaluation.md for the full guide and tradeoffs.
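A deterministic check might look like the sketch below. The span names and the "tokens" attribute key are assumptions made for illustration; inspect your own AgentTrace spans to find the real names before relying on them:

```python
# Deterministic trace checks without an LLM, over a minimal stand-in for
# trace spans. Span names and attribute keys here are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Span:  # minimal stand-in for a trace span
    name: str
    attributes: dict = field(default_factory=dict)

def tool_was_called(spans, tool_span_name):
    """True if a span with the given name appears anywhere in the trace."""
    return any(s.name == tool_span_name for s in spans)

def total_tokens(spans):
    """Sum a per-span token attribute across the whole trace."""
    return sum(s.attributes.get("tokens", 0) for s in spans)

spans = [
    Span("call_llm", {"tokens": 120}),
    Span("execute_tool search_web"),
    Span("call_llm", {"tokens": 200}),
]

print(tool_was_called(spans, "execute_tool search_web"))  # True
print(total_tokens(spans) <= 1000)  # True: within token budget
```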

Serve as a service

from tinyagent.serving import MCPServingConfig

agent = TinyAgent.create(...)
handle = await agent.serve_async(MCPServingConfig(port=8080))

serve_async is a coroutine, so await it from async code (or drive it with asyncio.run). A2A serving is available with the [a2a] extra.

Running in Jupyter

Jupyter already runs an asyncio event loop, so apply nest_asyncio before calling agent.run in a notebook:

import nest_asyncio
nest_asyncio.apply()

License

Apache 2.0.
