Agno 2.6.1 Adds Claude Prompt Caching and Parallel MCP Web Backend

By FintechExtra
24 Apr 2026

Agno 2.6.1 delivers a focused update aimed at improving model efficiency and web context retrieval. The release adds more granular Claude prompt caching controls, introduces a new Parallel MCP backend for web context access, and updates OpenAI model string routing so newer Responses-based behavior is selected by default.

What Changed

The biggest addition in version 2.6.1 is expanded Claude prompt caching support. Agno now supports system_prompt_blocks on Claude, letting developers break system prompts into multiple cache-aware blocks with per-block cache flags and optional TTL settings such as 5 minutes or 1 hour. This gives teams finer control over which prompt sections should remain reusable across requests.
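To make per-block caching concrete, the sketch below builds a list of system blocks in the shape the Anthropic Messages API accepts for its system field, with a cache_control marker and an optional ttl on selected blocks. The helper build_system_blocks and its tuple input format are hypothetical, chosen only for illustration. This is not Agno's API, and the actual system_prompt_blocks signature should be taken from the release notes.

```python
from typing import Optional


def build_system_blocks(blocks: list[tuple[str, bool, Optional[str]]]) -> list[dict]:
    """Turn (text, cache, ttl) tuples into Anthropic-style system blocks.

    Hypothetical helper for illustration; not part of Agno.
    """
    allowed_ttls = {"5m", "1h"}  # the two TTL choices mentioned in the article
    out = []
    for text, cache, ttl in blocks:
        block = {"type": "text", "text": text}
        if cache:
            control = {"type": "ephemeral"}
            if ttl is not None:
                if ttl not in allowed_ttls:
                    raise ValueError(f"unsupported ttl: {ttl}")
                control["ttl"] = ttl
            block["cache_control"] = control
        out.append(block)
    return out


system = build_system_blocks([
    ("You are a payments support agent.", True, "1h"),    # stable, long-lived
    ("Current fee schedule and policy summary.", True, "5m"),  # changes more often
    ("Notes specific to this request.", False, None),     # never cached
])
```

Placing the most stable text first means a change to a later block leaves the earlier cached prefix intact.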

The release also adds a cache_tools option for Claude integrations across Anthropic, AWS Bedrock, and Vertex AI. It applies cache control to the last tool definition so the tool prefix can be reused across requests. In parallel, Agno now sorts tools deterministically by name in Model._format_tools, which keeps request prefixes stable and improves prompt cache hit rates across Anthropic, OpenAI, Gemini, and Bedrock.
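The effect of deterministic ordering can be shown with a small standalone sketch. The format_tools function below is a hypothetical stand-in, not Agno's Model._format_tools, and the tool dictionaries are made-up examples. It sorts tools by name and, when cache_tools is set, attaches a cache marker to the last one, so two runs that register the same tools in a different order serialize to the same prefix.

```python
import copy
import json


def format_tools(tools: list[dict], cache_tools: bool = False) -> list[dict]:
    """Sort tool definitions by name and optionally mark the last one cacheable.

    Illustrative only; Agno's own implementation may differ in detail.
    """
    # Deep-copy so the caller's tool definitions are never mutated.
    ordered = sorted(copy.deepcopy(tools), key=lambda t: t["name"])
    if cache_tools and ordered:
        ordered[-1]["cache_control"] = {"type": "ephemeral"}
    return ordered


search = {"name": "web_search", "description": "Search the web", "input_schema": {"type": "object"}}
fetch = {"name": "web_fetch", "description": "Fetch a page", "input_schema": {"type": "object"}}

# Registration order differs between the two runs, yet the serialized output matches.
run_a = json.dumps(format_tools([search, fetch], cache_tools=True))
run_b = json.dumps(format_tools([fetch, search], cache_tools=True))
same_prefix = run_a == run_b
```

Without the sort, the two runs would serialize the tools in different orders and a prefix-based cache would treat them as different requests.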

On the web retrieval side, Agno 2.6.1 introduces ParallelMCPBackend as a new backend for WebContextProvider. It connects to Parallel’s public MCP server and exposes web_search and web_fetch with compressed markdown output. The backend works without a key by default, supports Bearer authentication via PARALLEL_API_KEY for higher rate limits, and can optionally use OAuth. The default timeout is also raised to 30 seconds to better handle large-page fetches.
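The keyless-by-default behavior with optional Bearer authentication might look roughly like the following. This is a sketch under stated assumptions: build_request_options is a hypothetical helper, not part of ParallelMCPBackend, and it models only the PARALLEL_API_KEY lookup and the 30-second default described above. OAuth and the server URL are left out.

```python
import os
from typing import Mapping, Optional

DEFAULT_TIMEOUT_SECONDS = 30  # default timeout stated in the release


def build_request_options(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return headers and timeout for an MCP connection.

    Hypothetical helper: sends no credentials unless PARALLEL_API_KEY is set.
    """
    env = os.environ if env is None else env
    headers = {}
    api_key = env.get("PARALLEL_API_KEY", "").strip()
    if api_key:
        # A key is optional; when present it is sent as a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return {"headers": headers, "timeout": DEFAULT_TIMEOUT_SECONDS}


anonymous = build_request_options({})
authenticated = build_request_options({"PARALLEL_API_KEY": "example-key"})
```

Passing the environment as an argument keeps the helper easy to test without touching real process variables.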

The release further updates OpenAI model string handling. The openai: prefix now maps to OpenAIResponses rather than OpenAIChat, while openai-chat: remains available for users who still need the previous behavior. That means a model string like openai:gpt-5.4 now resolves to the newer Responses-based interface by default.
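The routing change amounts to a different lookup result for the same prefix. The sketch below models it with a plain dictionary. The resolve_model function and PROVIDER_ROUTES table are hypothetical and stand in for whatever Agno does internally; only the prefix names and the class names OpenAIResponses and OpenAIChat come from the release.

```python
PROVIDER_ROUTES = {
    "openai": "OpenAIResponses",  # default as of 2.6.1, per the release
    "openai-chat": "OpenAIChat",  # explicit prefix for the previous behavior
}


def resolve_model(model_string: str) -> tuple[str, str]:
    """Split 'prefix:model_id' and return (class_name, model_id).

    Hypothetical resolver for illustration; not Agno's implementation.
    """
    provider, sep, model_id = model_string.partition(":")
    if not sep or not model_id:
        raise ValueError(f"expected 'provider:model_id', got {model_string!r}")
    if provider not in PROVIDER_ROUTES:
        raise ValueError(f"unknown provider prefix: {provider!r}")
    return PROVIDER_ROUTES[provider], model_id


default_route = resolve_model("openai:gpt-5.4")
legacy_route = resolve_model("openai-chat:gpt-5.4")
```

Configurations that rely on Chat Completions behavior would need to switch their strings to the openai-chat: prefix to keep it.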

Why It Matters

The update improves both efficiency and consistency for production AI agents. More precise Claude caching controls can reduce repeated prompt overhead, while deterministic tool ordering addresses a subtle issue: if tools are serialized in a different order between requests, the request prefix changes and cache reuse fails even when the prompt itself is unchanged.

The new Parallel MCP backend also broadens Agno’s options for web-grounded agent workflows. Developers building assistants that depend on search and page retrieval can now plug into another MCP-compatible source with longer timeouts and flexible authentication paths.

Finally, the OpenAI routing change makes the newer Responses API the default experience. For teams standardizing model configuration strings across environments, this aligns Agno’s behavior with OpenAI’s current platform direction while preserving backward compatibility through the explicit openai-chat: fallback.

Official Source: https://github.com/agno-agi/agno/releases/tag/v2.6.1