AI Application Patterns
Chatbots, copilots, content generators, classifiers, recommendation engines, and search systems.
Every AI-powered application follows one of a handful of proven architectural patterns. Understanding these patterns — chatbots, copilots, content generators, classifiers, recommendation engines, and search systems — lets you choose the right approach for your product and avoid reinventing the wheel. This module covers each pattern's architecture, key components, and the production considerations that separate a demo from a shipped product.
Why Patterns Matter
Building an AI feature without a clear architectural pattern is like constructing a building without blueprints. You might get something that stands, but it will be fragile, expensive to maintain, and hard to scale. The patterns in this module have been refined across thousands of production deployments. They encode hard-won lessons about latency, cost, reliability, and user experience.
Each pattern addresses a different user need and comes with its own set of trade-offs. The key is matching the right pattern to your problem — not forcing every AI feature into the "chatbot" mold just because it's the most familiar.
Pattern 1: Chatbots
Chatbots are conversational interfaces that handle multi-turn dialogue with users. They range from simple FAQ responders to sophisticated customer support agents that can take actions, look up account data, and escalate to human agents when needed.
Architecture Overview
Chatbot architecture components:
- User interface: Chat widget, messaging app integration, or full-page conversation view.
- Conversation manager: Tracks conversation history, manages context windows, handles session state.
- LLM layer: The language model that generates responses. Usually accessed via API (Claude, GPT, Gemini).
- System prompt: Defines the chatbot's persona, rules, boundaries, and available tools.
- Tool/action layer: Functions the chatbot can call — database lookups, API calls, order management, ticket creation.
- Knowledge base: RAG (Retrieval-Augmented Generation) pipeline connecting the bot to your documents, FAQs, and policies.
- Guardrails: Input/output filters for safety, topic boundaries, and PII redaction.
Common Use Cases
- Customer support: Handle tier-1 inquiries, look up order status, process returns, escalate complex issues to human agents
- Internal tools: IT help desks, HR policy bots, sales enablement assistants that query internal databases
- Onboarding: Guide new users through product setup with interactive, context-aware conversations
Production Considerations
- Context window management: Long conversations exceed context limits. Implement summarization of older messages or a sliding window approach that keeps the most recent turns plus a summary.
- Escalation paths: Always provide a way to reach a human. Define clear triggers: user frustration signals, repeated failures, or sensitive topics.
- Conversation persistence: Store conversation history so users can resume sessions and support agents can review context.
- Latency: Users expect sub-2-second first-token response times. Use streaming responses and consider smaller, faster models (Haiku, Flash) for simple routing decisions.
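The sliding-window-plus-summary approach can be sketched in a few lines. This is a minimal illustration, not a specific provider's API: `summarize` is a placeholder for a real LLM summarization call, and the window size of 6 turns is an arbitrary choice for the example.

```python
def summarize(messages):
    # Placeholder: in production this would be an LLM call that
    # condenses older turns into a short summary string.
    return f"[Summary of {len(messages)} earlier messages]"

def build_context(history, max_recent=6):
    """Keep the most recent turns verbatim; compress everything older."""
    if len(history) <= max_recent:
        return history
    older, recent = history[:-max_recent], history[-max_recent:]
    summary_turn = {"role": "system", "content": summarize(older)}
    return [summary_turn] + recent

history = [{"role": "user", "content": f"message {i}"} for i in range(10)]
context = build_context(history)
print(len(context))  # 7: one summary turn plus the 6 most recent messages
```

Each new turn rebuilds the context the same way, so the prompt stays bounded no matter how long the conversation runs.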
Pattern 2: Copilots
Copilots are AI assistants embedded directly within a product's workflow. Unlike chatbots, which exist as separate conversational interfaces, copilots work alongside the user in their existing tool — suggesting actions, generating content, and automating repetitive tasks in context.
Architecture Overview
Copilot architecture components:
- Context collector: Gathers the user's current state — open document, cursor position, selected data, recent actions.
- Intent detector: Determines what the user is trying to do based on their context and actions.
- Suggestion engine: Generates inline suggestions, completions, or action recommendations.
- Action executor: Applies accepted suggestions — inserts text, modifies data, triggers workflows.
- Feedback loop: Tracks acceptance/rejection rates to improve suggestion quality over time.
Common Use Cases
- Code completion: GitHub Copilot, Cursor Tab — suggest code as the developer types, aware of the full project context
- Writing assistants: Email copilots that draft replies, document editors that suggest improvements, CRM tools that generate follow-up messages
- Data analysis: Spreadsheet copilots that suggest formulas, generate charts, or flag anomalies in datasets
- Design tools: AI-powered design assistants that generate layouts, suggest color palettes, or auto-resize assets
Production Considerations
- Latency is critical: Copilot suggestions must appear in under 500ms to feel responsive. Use speculative generation — start generating before the user explicitly asks.
- Non-intrusive UX: Suggestions should be easy to accept or dismiss. Ghost text (greyed-out inline suggestions) works well for text; side panels work for complex recommendations.
- Context assembly: The quality of a copilot lives and dies by its context. Gather the right signals — current file, recent edits, project structure, user preferences — and assemble them into an effective prompt.
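Context assembly under a prompt budget can be sketched as a priority-ordered packer. The signal names and the character budget here are assumptions for illustration; real copilots would use token counts and richer signal selection.

```python
def assemble_prompt(signals, budget=2000):
    """Pack context signals into a prompt, highest priority first,
    skipping any signal that would exceed the budget."""
    parts, used = [], 0
    # Order matters: the most important signals are packed first.
    for name in ("current_file", "recent_edits",
                 "project_structure", "user_preferences"):
        text = signals.get(name, "")
        block = f"## {name}\n{text}\n"
        if used + len(block) > budget:
            continue  # signal doesn't fit; try the next (smaller) one
        parts.append(block)
        used += len(block)
    return "".join(parts)

prompt = assemble_prompt({
    "current_file": "def handler(req): ...",
    "recent_edits": "renamed process() to handler()",
    "project_structure": "api/, tests/, README.md",
})
```

The key design choice is the priority order: when the budget runs out, the least valuable signals are the ones that get dropped.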
Pattern 3: Content Generators
Content generators produce text, images, audio, or video based on user inputs. They differ from copilots in that the primary output is the generated content itself, not an augmentation of an existing workflow.
Architecture Overview
Content generator pipeline:
- Input processing: Parse user request, extract parameters (tone, length, style, format).
- Template/prompt selection: Choose the right system prompt or template based on content type.
- Generation: Call the appropriate model — LLM for text, image model for visuals, TTS for audio.
- Post-processing: Format output, apply brand guidelines, run quality checks, resize/crop media.
- Review interface: Let users edit, regenerate, or approve before publishing.
Content Types and Model Choices
| Content Type | Models | Key Considerations |
|---|---|---|
| Marketing copy | Claude, GPT-5.4 | Brand voice consistency, A/B testing variants |
| Blog articles | Claude Opus, GPT-5.4 Pro | Factual accuracy, SEO optimization, originality |
| Images | GPT Image 1.5, Midjourney V7, FLUX | Style consistency, brand assets, resolution |
| Video | Sora 2, Runway Gen-4.5, Veo 2 | Length limits, consistency across scenes, cost |
| Audio/Voice | ElevenLabs v3, OpenAI TTS | Voice cloning rights, naturalness, emotion |
Production Considerations
- Human review workflows: Always include a review step before publishing AI-generated content. Automated quality checks can catch formatting issues, but factual accuracy and brand alignment need human eyes.
- Originality and plagiarism: LLMs can reproduce training data. Run generated text through plagiarism checkers for published content. For images, be aware of style replication concerns.
- Template systems: Build a library of tested prompts (templates) for each content type. This ensures consistency and lets non-technical team members generate content reliably.
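A template library can be as simple as a dictionary of tested prompt strings. The template names and fields below are invented for illustration, not a prescribed schema.

```python
TEMPLATES = {
    "marketing_copy": (
        "Write {length}-word marketing copy for {product}. "
        "Tone: {tone}. Follow the brand voice guide."
    ),
    "blog_outline": (
        "Draft an outline for a blog post about {topic}, "
        "targeting the keyword '{keyword}'."
    ),
}

def render(template_name, **params):
    """Fill a tested template; fail loudly on unknown names or missing fields."""
    template = TEMPLATES[template_name]
    return template.format(**params)  # KeyError surfaces a missing parameter

copy_prompt = render("marketing_copy", length=50,
                     product="an AI notebook", tone="playful")
```

Because templates fail loudly on missing parameters, a non-technical teammate gets an immediate error instead of silently generating off-brand content.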
Pattern 4: Classification Systems
Classification systems use AI to categorize inputs into predefined buckets. They're one of the most mature and reliable AI patterns because the output space is constrained — the model picks from a known set of categories rather than generating free-form text.
Architecture Overview
Classification pipeline:
- Input preprocessing: Normalize text, extract relevant features, handle edge cases (empty input, multiple languages).
- Classification model: LLM with structured output (JSON mode) or fine-tuned smaller model for high-volume use.
- Confidence scoring: Output a confidence score alongside the classification to enable threshold-based routing.
- Low-confidence handling: Route uncertain classifications to human reviewers or secondary models.
- Feedback collection: Track accuracy and collect corrections to improve the system over time.
Common Use Cases
| Use Case | Categories | Approach |
|---|---|---|
| Sentiment analysis | Positive, negative, neutral, mixed | LLM with structured output or fine-tuned classifier |
| Intent detection | Buy, return, complain, inquire, etc. | LLM classifier as a router for chatbot systems |
| Content moderation | Safe, spam, toxic, NSFW, violence | Specialized models (OpenAI Moderation, Perspective API) or fine-tuned |
| Ticket routing | Billing, technical, sales, feature request | LLM with confidence scores plus rule-based fallbacks |
| Document categorization | Invoice, contract, report, correspondence | Multimodal model for scanned documents, LLM for digital text |
Production Considerations
- Use structured outputs: Force the model to return JSON with a predefined schema. This eliminates parsing errors and ensures consistent output format.
- Fine-tuned vs. prompted: For high-volume classification (10,000+ items/day), fine-tuning a smaller model is usually more cost-effective than prompting a large model. For low volume or rapidly changing categories, prompting is more flexible.
- Confidence thresholds: Set a minimum confidence score (e.g., 0.85). Route anything below the threshold to human review. This prevents silent misclassification.
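Threshold-based routing from a structured output looks like this in miniature. The JSON responses here are mocked; in production they would come from an LLM in JSON mode or a fine-tuned classifier, and the 0.85 threshold is the example value from above.

```python
import json

CONFIDENCE_THRESHOLD = 0.85  # below this, a human reviews the item

def route(raw_response):
    """Parse a structured classification and decide where it goes."""
    result = json.loads(raw_response)
    label, confidence = result["label"], result["confidence"]
    if confidence >= CONFIDENCE_THRESHOLD:
        return ("auto", label)      # act on it automatically
    return ("human_review", label)  # queue for a reviewer instead

print(route('{"label": "billing", "confidence": 0.93}'))    # ('auto', 'billing')
print(route('{"label": "technical", "confidence": 0.61}'))  # ('human_review', 'technical')
```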
Pattern 5: Recommendation Engines
Modern recommendation engines combine traditional collaborative filtering and content-based methods with LLM-enhanced understanding. The LLM adds the ability to understand nuanced preferences, generate explanations for recommendations, and handle cold-start problems where you have little data about a user.
Architecture Overview
Hybrid recommendation architecture:
- User profile: Behavioral data (clicks, purchases, time spent), stated preferences, demographic signals.
- Item catalog: Product/content embeddings — vector representations of each item using an embedding model.
- Collaborative filtering: Traditional "users like you also liked" signals from behavioral data.
- LLM enrichment: Natural-language understanding of user queries ("something cozy for a rainy day"), explanation generation, preference extraction from reviews.
- Ranking layer: Combines scores from multiple signals (collaborative, content-based, LLM) into a final ranked list.
- Diversity filter: Ensures recommendations aren't all from the same category or overly similar.
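The ranking layer and diversity filter can be sketched as a weighted blend followed by a per-category cap. The weights and the cap of two items per category are illustrative assumptions, not tuned values.

```python
WEIGHTS = {"collaborative": 0.5, "content": 0.3, "llm": 0.2}

def rank(candidates, max_per_category=2):
    """candidates: dicts with per-signal scores and a category."""
    for item in candidates:
        item["score"] = sum(WEIGHTS[s] * item[s] for s in WEIGHTS)
    ranked = sorted(candidates, key=lambda i: i["score"], reverse=True)
    # Diversity filter: cap how many items any one category contributes.
    seen, result = {}, []
    for item in ranked:
        cat = item["category"]
        if seen.get(cat, 0) < max_per_category:
            result.append(item)
            seen[cat] = seen.get(cat, 0) + 1
    return result

items = [
    {"id": "a", "category": "books", "collaborative": 0.9, "content": 0.8, "llm": 0.7},
    {"id": "b", "category": "books", "collaborative": 0.8, "content": 0.9, "llm": 0.6},
    {"id": "c", "category": "books", "collaborative": 0.7, "content": 0.7, "llm": 0.9},
    {"id": "d", "category": "music", "collaborative": 0.5, "content": 0.6, "llm": 0.8},
]
top = rank(items)  # the third "books" item is dropped for diversity
```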
How LLMs Enhance Recommendations
- Natural-language queries: Users can describe what they want in plain English instead of navigating filter menus
- Explanation generation: "We recommend this because you enjoyed X and mentioned you prefer Y" — LLMs generate human-readable reasoning
- Cold-start solutions: For new users with no behavioral data, LLMs can extract preferences from a brief onboarding conversation
- Cross-domain understanding: LLMs can connect preferences across categories — someone who likes minimalist design in furniture might appreciate clean typography in books
Pattern 6: Search Systems
AI-powered search goes far beyond keyword matching. Semantic search understands the meaning behind queries, and hybrid search combines semantic understanding with traditional keyword matching for the best of both worlds. This is the backbone of RAG (Retrieval-Augmented Generation) systems.
Architecture Overview
Hybrid search architecture:
- Ingestion pipeline: Documents are chunked, embedded (converted to vectors), and indexed in both a vector database and a keyword index.
- Embedding model: Converts text to dense vector representations. Popular choices: OpenAI text-embedding-3-large, Cohere embed-v4, Voyage AI.
- Vector database: Stores and searches embeddings for semantic similarity. Options: Pinecone, Weaviate, Qdrant, pgvector.
- Keyword index: Traditional full-text search (Elasticsearch, Typesense) for exact matches, proper nouns, and specific terms.
- Reranking: A cross-encoder model (Cohere Rerank, Jina Reranker) rescores results from both retrieval paths for higher precision.
- Answer generation: An LLM synthesizes retrieved documents into a direct answer (the "generation" in RAG).
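The chunking step of the ingestion pipeline can be as simple as fixed-size character windows with overlap, sketched below. Chunk size and overlap are illustrative; production systems often chunk on sentence or section boundaries instead.

```python
def chunk(text, size=200, overlap=50):
    """Split text into overlapping windows so context spans chunk edges."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 500
pieces = chunk(doc)  # 3 overlapping chunks covering the full document
```

The overlap matters: a sentence split across a chunk boundary still appears whole in at least one chunk, so its embedding remains meaningful.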
Semantic vs. Keyword vs. Hybrid
| Approach | Strengths | Weaknesses |
|---|---|---|
| Keyword (BM25) | Exact matches, proper nouns, product codes, fast | Misses synonyms, no semantic understanding |
| Semantic (vector) | Understands meaning, handles synonyms, works across languages | Can miss exact terms, higher latency, embedding cost |
| Hybrid (both) | Best of both — semantic understanding plus exact matching | More complex infrastructure, needs tuning of score weights |
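One common way to merge the two retrieval paths in the table above is reciprocal rank fusion (RRF), sketched here over lists of document IDs. The constant k=60 is the value conventionally used in the RRF literature; treat the whole function as a minimal illustration rather than a tuned ranker.

```python
def rrf(keyword_results, semantic_results, k=60):
    """Fuse two ranked lists of doc IDs into one, by reciprocal rank."""
    scores = {}
    for results in (keyword_results, semantic_results):
        for rank, doc_id in enumerate(results):
            # Documents ranked highly in either list accumulate more score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf(["d1", "d2", "d3"], ["d2", "d4", "d1"])
print(fused)  # ['d2', 'd1', 'd4', 'd3'] — d2 wins by appearing high in both lists
```

RRF needs no score normalization between the two systems, which is why it is a popular first choice before tuning weighted-score fusion.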
Design Principles for AI Applications
Regardless of which pattern you choose, these principles apply across all AI-powered applications:
Latency Budgets
Every AI call adds latency. Define your latency budget upfront and design around it:
| Interaction Type | Target Latency | Strategy |
|---|---|---|
| Inline autocomplete | 100–300ms | Small/fast models, speculative generation, caching |
| Chat response | 1–3s first token | Streaming responses, balanced models (Sonnet, Flash) |
| Content generation | 5–30s total | Progress indicators, background processing, webhooks |
| Batch processing | Minutes to hours | Async queues, batch APIs, progress dashboards |
Fallback Strategies
AI systems fail. Models go down, rate limits get hit, outputs come back nonsensical. Build fallbacks at every layer:
- Model fallbacks: If your primary model (e.g., Claude Opus) is unavailable, automatically route to a secondary model (e.g., Claude Sonnet or GPT-5.4). Most AI gateway services like LiteLLM or Portkey handle this.
- Graceful degradation: If the AI feature fails entirely, the application should still work. A search system should fall back to keyword search. A copilot should let the user continue working manually.
- Output validation: Check that the model's response matches expected formats before using it. Retry with a clearer prompt if validation fails.
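The fallback and validation bullets combine naturally into a single loop, sketched below. `call_model` is a stand-in for a real provider API call, and the model names are placeholders, not a specific routing configuration.

```python
def call_model(model, prompt):
    # Placeholder for a real API call; a real one raises on outages
    # and rate limits. This stub always fails, to exercise the chain.
    raise ConnectionError(f"{model} unavailable")

def generate_with_fallbacks(prompt,
                            models=("primary-large", "secondary-fast"),
                            validate=lambda out: bool(out.strip())):
    """Try each model in order; accept only output that passes validation."""
    for model in models:
        try:
            output = call_model(model, prompt)
            if validate(output):
                return output  # first valid response wins
        except Exception:
            continue  # model down or rate-limited: try the next one
    return None  # caller degrades gracefully (e.g., keyword-only search)

result = generate_with_fallbacks("Summarize this ticket")
print(result)  # None — both placeholder models "fail" in this sketch
```

Returning `None` rather than raising keeps the graceful-degradation decision with the caller, which knows what the non-AI fallback behavior should be.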
Human-in-the-Loop
The most reliable AI systems include human oversight at critical decision points:
- Review queues: Low-confidence outputs are routed to human reviewers before being acted upon
- Approval workflows: For high-stakes actions (sending emails, processing refunds, publishing content), require human approval
- Feedback mechanisms: Thumbs up/down, edit tracking, and correction workflows that feed back into system improvement
- Escalation triggers: Automatically escalate based on confidence scores, user frustration signals, or topic sensitivity
Cost Management
AI API costs can spiral quickly in production. Strategies to manage them:
- Model routing: Use cheap, fast models for simple tasks and expensive, powerful models only when needed. A classifier can determine complexity before routing.
- Caching: Cache responses for identical or near-identical inputs. Semantic caching (using embeddings to match similar queries to cached responses) can dramatically reduce API calls.
- Prompt optimization: Shorter prompts cost less. Remove unnecessary instructions, compress examples, and use references instead of inline content where possible.
- Batch APIs: For non-time-sensitive workloads, use batch processing endpoints, which are typically 50% cheaper than real-time APIs.
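The semantic caching idea above can be sketched with a toy embedding. Here `embed` is a fake bag-of-words vector over a made-up vocabulary, standing in for a real embedding model, and the 0.9 similarity threshold is an arbitrary example value.

```python
import math

def embed(text):
    # Placeholder embedding: word-count vector over a fixed vocabulary.
    vocab = ["refund", "order", "status", "cancel", "shipping"]
    return [text.lower().split().count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold=0.9):
        self.entries = []  # list of (embedding, cached response)
        self.threshold = threshold

    def get(self, query):
        q = embed(query)
        for vec, response in self.entries:
            if cosine(q, vec) >= self.threshold:
                return response  # similar enough: skip the API call
        return None

    def put(self, query, response):
        self.entries.append((embed(query), response))

cache = SemanticCache()
cache.put("what is my order status", "Your order ships tomorrow.")
hit = cache.get("order status please")  # differently worded, same intent
```

A production version would store embeddings in a vector index and expire entries, but the core idea is the same: match on meaning, not on exact strings.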
Choosing the Right Pattern
| If You Need... | Use This Pattern | Start With |
|---|---|---|
| Conversational interface for users | Chatbot | Vercel AI SDK + Claude API |
| AI embedded in existing workflows | Copilot | Context-aware suggestion engine |
| Automated text, image, or video output | Content Generator | Prompt templates + review workflow |
| Categorize or label data at scale | Classifier | LLM + structured output + confidence scores |
| Personalized suggestions for users | Recommendation Engine | Embeddings + vector search + LLM reranking |
| Find information across documents | Search System | Hybrid search + RAG pipeline |
Resources
- Anthropic's Guide to Building with Claude (Anthropic): Official documentation covering patterns for building production AI applications with Claude, including tool use, streaming, and structured outputs.
- Vercel AI SDK (Vercel): Open-source TypeScript toolkit for building AI-powered applications with React. Supports streaming, tool calling, and structured outputs across multiple model providers.
- Building AI Applications (DeepLearning.AI): Practical courses on building production AI applications including RAG, agents, and AI-powered search systems.
Key Takeaways
1. Six core patterns cover most AI applications: chatbots, copilots, content generators, classifiers, recommendation engines, and search systems.
2. Chatbots handle conversational interactions; copilots embed AI directly into existing workflows — choose based on where your users work.
3. Classification systems are the most reliable AI pattern because the output space is constrained to predefined categories.
4. Hybrid search (semantic + keyword) is the gold standard for RAG systems — use it from day one to avoid the failure modes of either approach alone.
5. Latency budgets, fallback strategies, human-in-the-loop design, and cost management separate production systems from demos.
6. Model routing — using cheap models for simple tasks and powerful models for complex ones — is the single most effective cost optimization strategy.
7. Always calculate cost per user action and project monthly spend before launching an AI feature in production.