Enterprise Agentic RAG Chatbot¶
Status: Production (deployed on VPS) | Type: Company Project (Private Repository)
Executive Summary¶
An AI-powered help assistant for an enterprise advertising management platform, built with Agentic RAG — combining LangGraph multi-agent orchestration with self-reflective retrieval for accurate, citation-backed answers from internal documentation.
Unlike traditional RAG systems, this chatbot uses autonomous agents that reason about query intent, evaluate retrieval quality, and retry with refined strategies when initial results are insufficient.
Architecture¶
┌─────────────────────────────────────────────────────────────┐
│ Agentic RAG Stack │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────────────────┐ │
│ │ Frontend │───▶│ Backend │───▶│ RAG Pipeline │ │
│ │ (React) │ │ (FastAPI)│ │ ├─ Query Analyzer │ │
│ │ :3100 │ │ :8000 │ │ ├─ Query Router │ │
│ └──────────┘ └────┬─────┘ │ ├─ Hybrid Retriever │ │
│ │ │ ├─ Self-Reflection │ │
│ │ │ ├─ Reranker │ │
│ │ │ └─ LLM Generator │ │
│ │ └──────────────────────┘ │
│ │ │ │
│ ┌────────┼────────────────────┤ │
│ ▼ ▼ ▼ │
│ ┌────────┐ ┌────────┐ ┌────────────┐ │
│ │ Redis │ │Postgres│ │ LLM APIs │ │
│ │ Cache │ │pgvector│ │ (fallback) │ │
│ └────────┘ └────────┘ └────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Observability │ │
│ │ ┌─────────┐ ┌───────────┐ ┌─────────┐ │ │
│ │ │Langfuse │ │Prometheus │ │ Grafana │ │ │
│ │ └─────────┘ └───────────┘ └─────────┘ │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
What Makes It "Agentic"¶
Traditional RAG follows a fixed retrieve → generate pipeline. This system uses LangGraph to create autonomous agents that:
- Query Analysis — Classifies intent, extracts entities, detects language (Indonesian + English)
- Query Routing — Determines optimal retrieval strategy based on query type
- Hybrid Retrieval — Dense vector search (pgvector) + BM25 keyword search, fused with RRF (Reciprocal Rank Fusion)
- Self-Reflection — Evaluates retrieval quality and automatically retries with refined queries if results are insufficient
- Reranking — Cross-encoder reranking for precision
- Generation — Streaming response with source citations
The self-reflection loop is the key differentiator — the agent can recognize when retrieved context doesn't adequately answer the question and autonomously refine its search strategy.
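The loop can be sketched in plain Python. This is a minimal illustration, not the production graph: `retrieve`, `evaluate`, and `refine_query` are hypothetical stand-ins for the real LangGraph nodes, and the corpus, threshold, and retry limit are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Toy corpus standing in for the indexed documentation.
CORPUS = {"billing": ["Invoices are issued monthly per campaign."]}

@dataclass
class RAGState:
    query: str
    docs: list = field(default_factory=list)
    attempts: int = 0

def retrieve(state: RAGState) -> RAGState:
    # Stand-in for hybrid pgvector + BM25 retrieval.
    state.docs = [d for key, docs in CORPUS.items()
                  if key in state.query.lower() for d in docs]
    return state

def evaluate(state: RAGState) -> float:
    # Stand-in for an LLM grading how well the context answers the query.
    return 1.0 if state.docs else 0.0

def refine_query(state: RAGState) -> RAGState:
    # Stand-in for an LLM rewriting the query before the next attempt.
    state.query += " (refined)"
    state.attempts += 1
    return state

def self_reflective_retrieve(query: str, max_retries: int = 2,
                             threshold: float = 0.7) -> RAGState:
    """Retrieve, grade the context, and retry with a refined query."""
    state = RAGState(query=query)
    while True:
        state = retrieve(state)
        if evaluate(state) >= threshold or state.attempts >= max_retries:
            return state
        state = refine_query(state)
```

A query that retrieves good context returns immediately; a query that keeps missing is refined up to `max_retries` times before the loop gives up and returns whatever it has.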
Technology Stack¶
| Layer | Technology |
|---|---|
| Frontend | React 18, Vite 5, TypeScript, shadcn/ui, Tailwind CSS, Zustand |
| Backend | Python 3.11+, FastAPI, Pydantic v2 |
| Agent Framework | LangChain, LangGraph |
| Database | PostgreSQL 16 + pgvector (HNSW index) |
| Vector Store | pgvector / Qdrant |
| Embeddings | BAAI/bge-m3 (1024-dim, multilingual) |
| Cache | Redis 7 (semantic caching) |
| LLM Providers | Multi-provider fallback: Groq → Cerebras → Gemini → Ollama |
| Observability | Langfuse (RAG tracing), Prometheus, Grafana |
| Deployment | Docker Compose, production VPS |
Key Features¶
Multi-Agent System (14 Specialized Agents)¶
The project uses a team of 14 specialized Claude Code agents for development, each with distinct responsibilities:
- Implementation agents: RAG architect, backend dev, frontend dev, chatbot developer, vector DB engineer, document processor, API designer, database architect, DevOps
- Quality agents: Test engineer, RAG evaluator, security auditor, code reviewer
- Coordination: Project lead
Hybrid Search with Self-Correction¶
- Dense vector search captures semantic meaning
- BM25 keyword search catches exact terminology
- RRF fusion combines both rankings, yielding a 10-15% improvement in retrieval accuracy over either method alone
- Self-reflection agent evaluates and retries when needed
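Reciprocal Rank Fusion itself is small enough to show in full: each retriever contributes 1 / (k + rank) per document, and documents are re-sorted by the summed score. k = 60 is the widely used default; a sketch:

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc IDs (best first) into one ranking.

    `rankings` holds one best-first list per retriever, e.g. the dense
    vector results and the BM25 results.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["a", "b", "c"]   # semantic hits
bm25 = ["b", "c", "d"]    # exact-keyword hits
fused = rrf_fuse([dense, bm25])
```

Documents ranked well by both retrievers ("b", "c") rise above documents seen by only one, which is exactly why fusion beats either list alone.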
Multilingual Support¶
- Indonesian and English language detection
- BGE-m3 multilingual embeddings (1024-dim)
- Language-aware prompt templates
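Language-aware templating can be sketched as detect-then-select. The toy stopword heuristic below is an assumption for illustration only; the production system presumably uses a proper language detector.

```python
# Illustrative Indonesian function words; not the real detector.
ID_STOPWORDS = {"yang", "dan", "apa", "bagaimana", "adalah", "untuk"}

PROMPTS = {
    "id": "Jawab pertanyaan berikut berdasarkan konteks:\n{question}",
    "en": "Answer the following question using the context:\n{question}",
}

def detect_language(text: str) -> str:
    """Crude heuristic: any Indonesian stopword present -> 'id'."""
    return "id" if set(text.lower().split()) & ID_STOPWORDS else "en"

def build_prompt(question: str) -> str:
    # Pick the template matching the detected language.
    return PROMPTS[detect_language(question)].format(question=question)
```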
Production Observability¶
- Langfuse: Full RAG pipeline tracing (query → retrieval → generation)
- Prometheus: System and application metrics
- Grafana: Real-time dashboards and alerting
Streaming Responses¶
- Server-Sent Events (SSE) for real-time token streaming
- Responsive UI with incremental rendering
Performance¶
| Metric | Target |
|---|---|
| Health check | <50ms |
| End-to-end RAG | 2-5s |
| Cache hit rate | 75%+ |
| Retrieval accuracy | ≥75% top-1 |
| Citation coverage | ≥90% |
| Hallucination rate | ≤5% |
Evaluation System¶
- Golden dataset with curated Q&A pairs
- Automated retrieval evaluation (Recall, MRR, Faithfulness)
- Nightly regression testing
- Human evaluation sampling
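The retrieval metrics above have compact definitions. A sketch over a golden dataset, where each item pairs the retrieved doc IDs with the set of relevant IDs; the data shapes and names are illustrative assumptions:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant docs found in the top-k retrieved."""
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant) if relevant else 0.0

def mean_reciprocal_rank(results):
    """results: list of (retrieved_ids, relevant_ids) pairs.

    Each query contributes 1/rank of its first relevant hit (0 if none).
    """
    total = 0.0
    for retrieved, relevant in results:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(results)
```

Faithfulness, by contrast, typically needs an LLM judge to check each generated claim against the retrieved context, so it is not shown here.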
Skills Demonstrated¶
- Agentic RAG with LangGraph (multi-step reasoning, self-reflection)
- Hybrid retrieval (dense + sparse + RRF fusion)
- Multi-provider LLM fallback chains
- Production observability (Langfuse, Prometheus, Grafana)
- Multilingual NLP (Indonesian + English)
- Streaming API design (SSE)
- Multi-agent development workflows