AI Prompt Router System Design

An LLM request-routing system that classifies prompts by complexity and intent, then routes each to the optimal model tier (small, medium, or large) to minimize cost while maintaining quality. It includes semantic caching, response evaluation, and automatic fallback to larger models.

Requirements

Functional

  • Route prompts to the optimal LLM based on complexity and intent
  • Cache semantically similar prompts to reduce model calls
  • Fallback to larger models when response quality is low
  • Log all routing decisions and model responses for analytics
  • Support multiple model providers (Anthropic, OpenAI, etc.)
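The core routing requirement above can be sketched as a complexity classifier feeding a tier lookup. This is an illustrative sketch only: the heuristic signals, thresholds, and tier names are assumptions, and a production router would replace `classify_complexity` with a trained classifier.

```python
def classify_complexity(prompt: str) -> float:
    """Toy heuristic: score prompt complexity in [0, 1].
    Signals (length, analytical keywords, multiple questions) are
    illustrative stand-ins for a learned classifier."""
    signals = 0
    if len(prompt) > 500:
        signals += 1
    if any(k in prompt.lower() for k in ("prove", "analyze", "refactor", "derive")):
        signals += 1
    if prompt.count("?") > 1:
        signals += 1
    return signals / 3

def route(prompt: str) -> str:
    """Map a complexity score to a model tier (small/medium/large)."""
    score = classify_complexity(prompt)
    if score < 0.34:
        return "small"
    if score < 0.67:
        return "medium"
    return "large"
```

Keeping the classifier separate from the tier mapping makes it easy to log the raw score with each routing decision, which the analytics requirement above depends on.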

Non-Functional

  • P99 latency under 200ms for cached responses
  • Handle 10K routing decisions per second
  • 99.9% availability with graceful degradation
  • Cost optimization: reduce model spend by 40%+ vs always using largest model
  • Horizontally scalable routing layer

Published March 23, 2026

Last updated March 23, 2026
