AI Prompt Router System Design

An LLM request-routing system that classifies prompts by complexity and intent, then routes each to the optimal model tier (small, medium, or large) to minimize cost while maintaining quality. It includes semantic caching, response evaluation, and automatic fallback to larger models.

Requirements

Functional

  • Route prompts to the optimal LLM based on complexity and intent
  • Cache semantically similar prompts to reduce model calls
  • Fallback to larger models when response quality is low
  • Log all routing decisions and model responses for analytics
  • Support multiple model providers (Anthropic, OpenAI, etc.)
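The core routing requirement above can be sketched as a complexity classifier feeding a tier lookup. This is an illustrative sketch only: the heuristic signals, thresholds, and tier names are assumptions, and a production router would replace `classify_complexity` with a trained classifier.

```python
def classify_complexity(prompt: str) -> float:
    """Toy heuristic: score prompt complexity in [0, 1].
    Signals (length, analytical keywords, multiple questions) are
    illustrative stand-ins for a learned classifier."""
    signals = 0
    if len(prompt) > 500:
        signals += 1
    if any(k in prompt.lower() for k in ("prove", "analyze", "refactor", "derive")):
        signals += 1
    if prompt.count("?") > 1:
        signals += 1
    return signals / 3

def route(prompt: str) -> str:
    """Map a complexity score to a model tier (small/medium/large)."""
    score = classify_complexity(prompt)
    if score < 0.34:
        return "small"
    if score < 0.67:
        return "medium"
    return "large"
```

Keeping the classifier separate from the tier mapping makes it easy to log the raw score with each routing decision, which the analytics requirement above depends on.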

Non-Functional

  • P99 latency under 200ms for cached responses
  • Handle 10K routing decisions per second
  • 99.9% availability with graceful degradation
  • Cost optimization: reduce model spend by 40%+ vs always using largest model
  • Horizontally scalable routing layer

Published March 23, 2026

Last updated March 23, 2026
