AI Prompt Router System Design
An LLM request routing system that classifies prompts by complexity and intent, then routes each to the optimal model tier (small, medium, or large) to minimize cost while maintaining quality. Includes semantic caching, response evaluation, and automatic fallback to larger models.
Requirements
Functional
- Route prompts to the optimal LLM based on complexity and intent
- Cache semantically similar prompts to reduce model calls
- Fall back to larger models when response quality is low
- Log all routing decisions and model responses for analytics
- Support multiple model providers (Anthropic, OpenAI, etc.)
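The routing and fallback requirements above can be sketched as a small escalation loop. The tier names, the complexity heuristic, and the `call_model`/`quality_check` callables are illustrative assumptions, not part of the design; a real classifier would likely be a lightweight model rather than keyword rules.

```python
# Sketch of tiered routing with quality-based fallback.
# MODEL_TIERS, the heuristic, and min_quality are assumed for illustration.

MODEL_TIERS = ["small", "medium", "large"]

def classify_complexity(prompt: str) -> int:
    """Toy heuristic: long or multi-step prompts score higher (0-2)."""
    score = 0
    if len(prompt.split()) > 50:
        score += 1
    if any(kw in prompt.lower() for kw in ("analyze", "compare", "step by step")):
        score += 1
    return score

def route(prompt: str, call_model, quality_check, min_quality: float = 0.7):
    """Route to the tier matching complexity; escalate while quality is low.

    call_model(model, prompt) -> response string (hypothetical provider hook)
    quality_check(response) -> score in [0, 1] (hypothetical evaluator)
    """
    tier = classify_complexity(prompt)
    while True:
        model = MODEL_TIERS[tier]
        response = call_model(model, prompt)
        # Accept if quality clears the bar, or we are already at the top tier.
        if quality_check(response) >= min_quality or tier == len(MODEL_TIERS) - 1:
            return model, response
        tier += 1  # fall back to the next larger model
```

In this shape, every escalation step (tier chosen, quality score, final model) is a natural place to emit the routing-decision logs the requirements call for.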
Non-Functional
- P99 latency under 200ms for cached responses
- Handle 10K routing decisions per second
- 99.9% availability with graceful degradation
- Cost optimization: reduce model spend by 40%+ vs. always using the largest model
- Horizontally scalable routing layer
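Hitting the sub-200ms cached-response target depends on the semantic cache answering without any model call. Below is a minimal in-memory sketch; a production system would use an embedding model and an approximate-nearest-neighbor index (e.g. a vector database) instead of the bag-of-words vectors and linear scan assumed here, and the 0.85 similarity threshold is a stand-in value.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: token counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Return a cached response when a prompt is similar enough to a prior one."""

    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []  # (prompt vector, response)

    def get(self, prompt: str):
        vec = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best and cosine(vec, best[0]) >= self.threshold:
            return best[1]
        return None  # cache miss: fall through to the router

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))
```

The threshold trades cost against quality: too low and dissimilar prompts get stale answers, too high and near-duplicates miss the cache.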
Published March 23, 2026
Last updated March 23, 2026