Agents · Medium impact · For Dev · GitHub MCP Servers · May 18, 2026
Fast and Accurate Code Search for Agents. Uses ~98% fewer tokens than grep+read
MinishLab/semble
Semble is a fast, accurate, Python-based code search tool for AI agents that, according to its authors, uses approximately 98% fewer tokens than grep+read workflows.
Signal strength: 4.5/5 · 2,004 stars
What happened
MinishLab released Semble, a code search system that combines embedding-based retrieval with the Model Context Protocol (MCP), letting AI agents find relevant code while consuming significantly fewer tokens.
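On the MCP side, clients and servers exchange JSON-RPC 2.0 messages, and a tool invocation is a `tools/call` request that names the tool and passes its arguments. The minimal sketch below only builds such a message. The tool name `search` and the `query`/`top_k` arguments are hypothetical placeholders: the real schema Semble's server advertises (through `tools/list`) is not described here, and a real client must complete the `initialize` handshake before calling any tool.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments; check the server's `tools/list` response for the real ones.
line = make_tool_call(1, "search", {"query": "where is retry backoff computed?", "top_k": 5})
print(line)
```

Over the stdio transport, each such message is written to the server as a single line of JSON.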
Why it matters
Cutting the tokens an agent spends on code search lowers prompt costs and latency, which helps agent-based development tools stay responsive as they scale to larger codebases.
The bigger picture
Semble exemplifies the next phase of AI agent tooling, in which embedding-based retrieval replaces heuristic substring matching for navigating large codebases. Rather than treating code as plain text, such tools rely on learned semantic representations that are closer to how models reason about code. The size of the token reduction suggests that prompt cost will become a first-order design criterion for agent frameworks. More broadly, embedding-based search lets agents scale to larger repositories without losing responsiveness, which could speed the adoption of AI in continuous integration, automated refactoring, and developer assistants. It points to an emerging class of AI-native infrastructure, designed from the ground up around models and their context-window limits.
Technical deep dive
Semble's core is an embedding pipeline that converts code snippets into dense vectors suited to semantic similarity search. Code is first split into logical chunks along function or class boundaries to preserve context, and each chunk is then encoded by an embedding model fine-tuned on code corpora. The resulting embeddings are stored in a vector index that supports fast approximate nearest neighbor (ANN) search, so relevance scoring stays quick even on large repositories.
When an agent issues a query, Semble returns only the code chunks whose embeddings are among the top-k nearest to the query's embedding, rather than the full text of every matching file. This sharply reduces the tokens passed to the language model for downstream reasoning. It also ties retrieval closely to the agent's context-window management: retrieved segments must be swapped in and out without exceeding token limits, and chunk granularity has to be balanced against context length to trade off precision and recall. Deploying Semble therefore means running the embedding model, the vector search service, and the integration middleware as one pipeline that hides retrieval details from the agent developer.
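The chunk, embed, index, and top-k steps described above can be illustrated with a small, self-contained Python sketch. This is not Semble's code, and it rests on several assumptions: chunks are cut at top-level function and class boundaries with the standard `ast` module; a toy hashed bag-of-words vector stands in for a learned code-embedding model; and an exact brute-force cosine search replaces the approximate nearest neighbor index. `Chunk`, `embed`, and `BruteForceIndex` are hypothetical names.

```python
import ast
import math
import re
import zlib
from collections import Counter
from dataclasses import dataclass

DIM = 1024  # size of the toy hashed embedding (hypothetical; real code embeddings are learned)


@dataclass
class Chunk:
    path: str
    name: str
    text: str


def chunk_python_source(path: str, source: str) -> list[Chunk]:
    """Split a Python file into one chunk per top-level function or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment recovers the exact source text of the node (Python 3.8+)
            chunks.append(Chunk(path, node.name, ast.get_source_segment(source, node)))
    return chunks


def tokenize(text: str) -> list[str]:
    """Lower-cased words, splitting snake_case and CamelCase identifiers."""
    return [w.lower() for w in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", text)]


def embed(text: str) -> list[float]:
    """Toy stand-in for a code embedding model: L2-normalized hashed bag of words."""
    vec = [0.0] * DIM
    for token, count in Counter(tokenize(text)).items():
        vec[zlib.crc32(token.encode()) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class BruteForceIndex:
    """Exact cosine-similarity search; a production system would use an ANN index instead."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []
        self.vectors: list[list[float]] = []

    def add(self, chunk: Chunk) -> None:
        self.chunks.append(chunk)
        self.vectors.append(embed(chunk.text))

    def search(self, query: str, k: int = 3) -> list[Chunk]:
        q = embed(query)
        # vectors are unit length, so the dot product equals cosine similarity
        scored = [(sum(a * b for a, b in zip(q, v)), c) for v, c in zip(self.vectors, self.chunks)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [c for _, c in scored[:k]]


SOURCE = '''
def load_config(path):
    """Read a YAML config file."""
    with open(path) as fh:
        return fh.read()

class RetryPolicy:
    def backoff(self, attempt):
        return 2 ** attempt

def send_request(url, retries=3):
    return url
'''

index = BruteForceIndex()
for chunk in chunk_python_source("app.py", SOURCE):
    index.add(chunk)

# Only the best-matching chunk, not the whole file, would go into the agent's prompt.
best = index.search("retry backoff policy", k=1)[0]
print(best.name)
print(best.text)
```

Here the query matches the `RetryPolicy` class, and only that chunk, rather than the whole file, would be handed to the agent. A learned embedding would also match paraphrased queries that share no words with the code, which this hashing trick cannot do.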
Real-world applications
1. An AI-powered code review assistant that leverages Semble to quickly surface relevant function implementations for context-aware commentary without incurring excessive prompt token costs.
2. A continuous integration agent that uses Semble to identify dependencies and affected modules in large monorepos, enabling fast, targeted testing and build steps.
3. A developer productivity tool that embeds Semble for instant semantic search across multiple projects, delivering precise code snippets in IDEs with minimal latency.
4. A refactoring bot that utilizes Semble’s embedding-based retrieval to analyze and suggest structurally consistent code transformations across sprawling legacy codebases.
What to do now
Integrate Semble into your AI agent pipelines and benchmark its token consumption and retrieval latency against grep-based methods.
Revisit your agent's prompt design to make use of embedding-based retrieval, balancing retrieval granularity against the model's context limits.
Assess whether your tooling and infrastructure are ready to run vector databases and embedding models at scale alongside your existing agents.
Identify cases where prompt-token costs have limited your agent's understanding of code, and pilot Semble there.
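A token benchmark of the kind suggested above can start from a harness like the following. It is a hypothetical sketch: token counts use whitespace splitting as a stand-in for the model's real tokenizer, the grep+read baseline assumes the agent reads every matching file in full, and the ranked snippets are hard-coded where a call to Semble (or any other retriever) would go. The repository is synthetic, so the resulting percentage says nothing about Semble's actual savings. `pack_snippets` shows one simple way to balance retrieval granularity against a context budget: greedy packing in rank order.

```python
import re


def count_tokens(text: str) -> int:
    """Rough tokenizer proxy: whitespace-separated words (use the target model's tokenizer in practice)."""
    return len(text.split())


def grep_read_tokens(repo: dict[str, str], pattern: str) -> int:
    """Tokens spent if an agent greps for `pattern`, then reads every matching file in full."""
    regex = re.compile(pattern)
    return sum(count_tokens(text) for text in repo.values() if regex.search(text))


def pack_snippets(ranked: list[str], budget: int) -> list[str]:
    """Greedily keep snippets in rank order, skipping any that would overflow the token budget."""
    packed, used = [], 0
    for snippet in ranked:
        cost = count_tokens(snippet)
        if used + cost > budget:
            continue  # a smaller, lower-ranked snippet may still fit
        packed.append(snippet)
        used += cost
    return packed


def reduction_pct(baseline: int, retrieved: int) -> float:
    """Percentage of tokens saved relative to the grep+read baseline."""
    return 100.0 * (1 - retrieved / baseline) if baseline else 0.0


# Synthetic repository: filler lines stand in for unrelated code in each file.
filler = "x = compute_value(a, b)\n" * 200
backoff_src = "def backoff(attempt):\n    return 2 ** attempt\n"
repo = {
    "net/retry.py": filler + backoff_src,
    "net/client.py": filler + "from net.retry import backoff\n",
    "docs/notes.md": "unrelated notes",
}

# Hard-coded ranking standing in for a retriever's output (hypothetical).
ranked = [backoff_src, repo["net/client.py"]]

baseline = grep_read_tokens(repo, r"backoff")
context = pack_snippets(ranked, budget=500)
retrieved = sum(count_tokens(s) for s in context)
print(f"grep+read: {baseline} tokens, retrieval: {retrieved} tokens, "
      f"{reduction_pct(baseline, retrieved):.1f}% fewer")
```

Swapping `count_tokens` for the target model's tokenizer and `ranked` for real retriever output turns this into a like-for-like comparison; timing both paths would cover the latency side.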