tinyagent is a small, dependency-light Python implementation of the HuggingFace Tiny Agents loop, built on any-llm. It gives you:
- A simple agent loop you can read in one sitting.
- Native MCP tool support (stdio, SSE, streamable HTTP).
- OpenTelemetry-based tracing of LLM calls and tool executions, including token counts and cost.
- A callback system for guardrails, metrics, and intentional cancellation.
- Optional A2A and MCP serving so your agent can run as a service.
This package was extracted from any-agent, which still uses tinyagent under the hood as one of its supported framework backends.
```shell
pip install mozilla-ai-tinyagent
```

The PyPI distribution name is `mozilla-ai-tinyagent`; the import name is `tinyagent`.
Optional extras:
```shell
pip install 'mozilla-ai-tinyagent[a2a]'       # A2A serving
pip install 'mozilla-ai-tinyagent[composio]'  # Composio tools
pip install 'mozilla-ai-tinyagent[all]'       # everything
```

```python
from tinyagent import TinyAgent, AgentConfig
from tinyagent.tools import search_web, visit_webpage

agent = TinyAgent.create(
    AgentConfig(
        model_id="mistral:mistral-small-latest",
        instructions="Use the tools to find an answer.",
        tools=[search_web, visit_webpage],
    )
)

trace = agent.run("Which agent framework is the simplest?")
print(trace.final_output)
```

`model_id` follows the any-llm provider syntax (`provider:model`). Set the relevant API key (e.g. `MISTRAL_API_KEY`, `OPENAI_API_KEY`) in your environment.
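You can sanity-check the `provider:model` split and the matching API key before constructing an agent. A minimal sketch, assuming nothing from tinyagent itself; the `PROVIDER_KEY_ENV` mapping below is illustrative, not part of the library:

```python
import os

# Illustrative mapping from any-llm provider prefix to the environment
# variable its API key is usually read from (not a tinyagent API).
PROVIDER_KEY_ENV = {
    "mistral": "MISTRAL_API_KEY",
    "openai": "OPENAI_API_KEY",
}

def check_model_id(model_id: str) -> str:
    """Split 'provider:model' and return the env var the provider needs."""
    provider, _, model = model_id.partition(":")
    if not model:
        raise ValueError(f"expected 'provider:model', got {model_id!r}")
    env_var = PROVIDER_KEY_ENV[provider]
    if env_var not in os.environ:
        raise RuntimeError(f"set {env_var} before running the agent")
    return env_var

os.environ.setdefault("MISTRAL_API_KEY", "sk-example")  # demo only
print(check_model_id("mistral:mistral-small-latest"))   # MISTRAL_API_KEY
```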
```python
from tinyagent import TinyAgent, AgentConfig
from tinyagent.config import MCPStdio

agent = TinyAgent.create(
    AgentConfig(
        model_id="mistral:mistral-small-latest",
        instructions="Use the available tools to answer.",
        tools=[
            MCPStdio(command="uvx", args=["duckduckgo-mcp-server"]),
        ],
    )
)

trace = agent.run("What is the capital of Pennsylvania?")
print(trace.final_output)
```

Every `agent.run(...)` returns an `AgentTrace` with the final output, the spans, token counts, and cost.
```python
trace = agent.run("...")

print(trace.duration)
print(trace.tokens)
print(trace.cost)

for span in trace.spans:
    print(span.name, span.attributes)
```

Subclass `Callback` to observe or control execution. Each hook receives a `Context` and returns it, optionally mutating shared state.
```python
from tinyagent.callbacks import Callback, Context

class LimitToolCalls(Callback):
    def before_tool_execution(self, context: Context, *args, **kwargs) -> Context:
        context.shared["count"] = context.shared.get("count", 0) + 1
        if context.shared["count"] > 5:
            raise StopIteration("Too many tool calls")
        return context
```

See `AgentCancel` for cancellation that preserves the trace.
tinyagent ships two evaluators in `tinyagent.evaluation` for grading agent runs against criteria.
`LlmJudge` answers a question about a fixed context with a single LLM call:

```python
from tinyagent.evaluation import LlmJudge

judge = LlmJudge(model_id="mistral:mistral-small-latest")
result = judge.run(
    context=trace.final_output,
    question="Does the answer cite a primary source?",
)
print(result.passed, result.reasoning)
```

`AgentJudge` runs an agent that has tools to inspect the full `AgentTrace` (token counts, spans, tool calls), so it can answer richer questions:
```python
from tinyagent.evaluation import AgentJudge

judge = AgentJudge(model_id="mistral:mistral-small-latest")
eval_trace = judge.run(
    trace=trace,
    question="Did the agent use the search_web tool before answering?",
)
print(eval_trace.final_output)
```

For deterministic checks (token budget, span shape, expected tool sequence) you can also walk the `AgentTrace` directly without an LLM. See docs/evaluation.md for the full guide and tradeoffs.
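As a sketch of such a deterministic check, the spans below are modeled as plain dicts so the example is self-contained; real `AgentTrace` spans are OpenTelemetry spans exposing `name` and `attributes`, and the exact span names and attribute keys used here are assumptions, not the tinyagent schema:

```python
# Stand-in for trace.spans: each span has a name and attributes.
# The "tool.name" attribute key is illustrative only.
spans = [
    {"name": "call_llm", "attributes": {}},
    {"name": "execute_tool search_web", "attributes": {"tool.name": "search_web"}},
    {"name": "call_llm", "attributes": {}},
]

def tool_sequence(spans):
    """Return the tool names in execution order."""
    return [
        s["attributes"]["tool.name"]
        for s in spans
        if "tool.name" in s["attributes"]
    ]

def used_tool(spans, tool_name):
    """Deterministic check: was the tool called at least once?"""
    return tool_name in tool_sequence(spans)

print(tool_sequence(spans))             # ['search_web']
print(used_tool(spans, "search_web"))   # True
```

The same walk works for token budgets or span-shape checks: accumulate over `trace.spans` and assert, with no LLM call and no nondeterminism.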
```python
from tinyagent.serving import MCPServingConfig

agent = TinyAgent.create(...)
handle = await agent.serve_async(MCPServingConfig(port=8080))
```

A2A serving is available with the `[a2a]` extra.
If you call the agent from an environment that already runs an event loop (e.g. a Jupyter notebook), apply nest_asyncio first:

```python
import nest_asyncio

nest_asyncio.apply()
```

Apache 2.0.