Broadcast Analytics System - MLOps Platform¶
- **Status:** Production-Ready (All 6 Phases Complete)
- **Repository:** Private Company Repository
- **Version:** 2.0
Executive Summary¶
A production-grade enterprise MLOps platform for broadcast analytics combining machine learning, LLM-powered chat, and automated deployment infrastructure. Features 24 trained models, intelligent caching, schedule optimization, and comprehensive monitoring.
System Architecture¶
```mermaid
graph LR
    subgraph Data["Data Layer"]
        PG["PostgreSQL 16<br/>ML Registry"]
        Mongo["MongoDB 7<br/>Chat History"]
        Redis["Redis 7<br/>LLM Cache"]
    end
    subgraph ML["ML Pipeline"]
        Loader["Data<br/>Loader"]
        Preprocess["Feature<br/>Engineering"]
        Training["Model<br/>Training"]
        Evaluation["Model<br/>Evaluation"]
        Registry["ML<br/>Registry"]
    end
    subgraph LLM["LLM Agent"]
        Agent["LangChain<br/>Agent"]
        LLMPrimary["Groq<br/>Llama 70B"]
        LLMSecondary["Gemini<br/>2 Flash"]
        LLMFallback["OpenRouter<br/>Fallback"]
        Cache["Response<br/>Cache"]
    end
    subgraph API["API Layer"]
        APINode["REST API<br/>OpenAPI"]
        Predictor["Prediction<br/>Service"]
        Optimizer["Schedule<br/>Optimizer"]
        Export["Data<br/>Export"]
    end
    subgraph UI["Frontend"]
        Dashboard["Monitoring<br/>Dashboard"]
        Chat["LLM Chat<br/>Interface"]
        Charts["Interactive<br/>Charts"]
    end
    subgraph Monitor["CI/CD and Monitoring"]
        GHA["GitHub<br/>Actions"]
        Prometheus["Prometheus<br/>Metrics"]
        Grafana["Grafana<br/>Dashboards"]
        Alertmanager["Alert<br/>Manager"]
    end
    Loader --> Preprocess
    Preprocess --> Training
    Training --> Evaluation
    Evaluation --> Registry
    Registry --> PG
    APINode --> Predictor
    APINode --> Optimizer
    APINode --> Export
    Predictor --> Registry
    Agent --> LLMPrimary
    LLMPrimary -.->|Fallback| LLMSecondary
    LLMSecondary -.->|Fallback| LLMFallback
    Agent --> Cache
    Cache --> Redis
    Agent --> Mongo
    Dashboard --> APINode
    Chat --> Agent
    Charts --> APINode
    APINode --> Prometheus
    Prometheus --> Grafana
    Prometheus --> Alertmanager
    classDef dataStyle fill:#FFE0B2,stroke:#E65100,stroke-width:2px
    classDef mlStyle fill:#E1BEE7,stroke:#4A148C,stroke-width:2px
    classDef llmStyle fill:#C8E6C9,stroke:#1B5E20,stroke-width:2px
    classDef apiStyle fill:#BBDEFB,stroke:#0D47A1,stroke-width:2px
    classDef frontendStyle fill:#F8BBD0,stroke:#880E4F,stroke-width:2px
    classDef cicdStyle fill:#FFF9C4,stroke:#F57F17,stroke-width:2px
    class PG,Mongo,Redis dataStyle
    class Loader,Preprocess,Training,Evaluation,Registry mlStyle
    class Agent,LLMPrimary,LLMSecondary,LLMFallback,Cache llmStyle
    class APINode,Predictor,Optimizer,Export apiStyle
    class Dashboard,Chat,Charts frontendStyle
    class GHA,Prometheus,Grafana,Alertmanager cicdStyle
```
System Overview¶
The platform provides:

- Predictive analytics for broadcast revenue and ratings
- Natural language query interface with LLM agents
- Automated ML pipeline with model registry
- Schedule optimization using genetic algorithms
- Industrial-grade monitoring stack
Technology Stack¶
Backend Framework¶
- API: FastAPI with async support
- Python: 3.12+ with modern features
- Validation: Pydantic models
- Server: Uvicorn ASGI server
- ORM: SQLAlchemy 2.0 for PostgreSQL
Frontend¶
- Framework: React 18
- Visualization: Chart.js for interactive charts
- HTTP Client: Axios
- Theme: Custom enterprise professional theme
- Build: Create React App
Databases¶
PostgreSQL 16¶
- Purpose: Structured data, ML model registry
- Features:
  - Model versioning and metadata
  - Training results and metrics
  - Performance tracking
  - ACID transactions
MongoDB 7.0¶
- Purpose: Unstructured data, chat sessions
- Features:
  - Chat history storage
  - Session management
  - Flexible document schema
  - Async access with Motor
Redis 7.0¶
- Purpose: High-performance caching
- Features:
  - LLM response cache (75%+ hit rate)
  - Session data
  - Real-time metrics
  - Pub/sub messaging
Machine Learning Frameworks¶
Scikit-learn¶
- Models: Random Forest, Gradient Boosting, ElasticNet, MLP
- Purpose: Classical ML algorithms
- Features: Feature engineering, preprocessing, evaluation
XGBoost¶
- Purpose: Gradient boosting trees
- Performance: Fast training, high accuracy
- Features: GPU support, early stopping, feature importance
CatBoost¶
- Purpose: Categorical feature handling
- Performance: State-of-the-art accuracy
- Features: Built-in categorical encoding, robust to overfitting
LLM & Agent Framework¶
LangChain¶
- Purpose: LLM application framework
- Features:
  - Agent orchestration
  - Tool integration
  - Memory management
  - Chain composition
LLM Providers¶
Primary: Groq
- Model: Llama 3.3 70B
- Free Tier: 14.4K TPM
- Speed: Fastest inference
- Use Case: Primary generation

Secondary: Google Gemini
- Model: Gemini 2.0 Flash
- Free Tier: 1.5M tokens/day
- Speed: Excellent balance
- Use Case: Fallback provider

Tertiary: OpenRouter
- Model: Various (Llama, Mixtral)
- Use Case: Final fallback
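The three-tier cascade above can be sketched as an ordered-fallback loop: try each provider in priority order and move on when one fails. The provider stubs and the `generate_with_fallback` helper below are illustrative, not the platform's actual client classes.

```python
# Illustrative sketch of the Groq -> Gemini -> OpenRouter cascade.
# Provider names and the call interface are assumptions for this example.
from typing import Callable

class ProviderError(Exception):
    """Raised when a provider fails (rate limit, timeout, outage)."""

def generate_with_fallback(
    prompt: str, providers: list[tuple[str, Callable[[str], str]]]
) -> tuple[str, str]:
    """Try each provider in priority order; return (provider_name, response)."""
    last_error: ProviderError | None = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            last_error = exc  # fall through to the next provider
    raise ProviderError(f"all providers failed: {last_error}")

# Stubs standing in for the real Groq, Gemini, and OpenRouter clients.
def groq_stub(prompt: str) -> str:
    raise ProviderError("rate limited")  # e.g. the 14.4K TPM cap was hit

def gemini_stub(prompt: str) -> str:
    return f"gemini answer to: {prompt}"

def openrouter_stub(prompt: str) -> str:
    return f"openrouter answer to: {prompt}"

chain = [("groq", groq_stub), ("gemini", gemini_stub), ("openrouter", openrouter_stub)]
provider, answer = generate_with_fallback("top programs?", chain)
# Groq raised, so the cascade fell through to Gemini.
```

In production the same loop would also record which tier answered, since the `model_used` field in chat responses exposes it.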
Optimization¶
Genetic Algorithm¶
- Library: DEAP (Distributed Evolutionary Algorithms)
- Purpose: Schedule optimization
- Features:
  - Custom fitness functions
  - Multi-objective optimization
  - Constraint handling
  - Population evolution
Infrastructure & DevOps¶
Containerization¶
- Docker: Multi-stage builds for efficiency
- Docker Compose: Full stack orchestration
- Images: Optimized layer caching
- Networking: Custom bridge networks
Orchestration¶
- Kubernetes: Production deployment
- Resources:
  - Deployments for stateless services
  - StatefulSets for databases
  - Services for load balancing
  - Ingress for external access
  - ConfigMaps for configuration
  - Secrets for credentials
CI/CD¶
- Platform: GitHub Actions
- Pipelines:
  - Automated testing (pytest)
  - Code quality (black, ruff, mypy)
  - Docker image builds
  - Deployment automation
  - ML model training pipeline
Monitoring Stack¶
Prometheus 2.48.0
- Time-series metrics collection
- 90-day data retention
- Alert rule evaluation
- Service discovery

Grafana 10.2.2
- Professional dashboards
- Real-time visualization
- Alert management
- Data source integration

Alertmanager
- Alert routing
- Email and Slack notifications
- Alert grouping and deduplication
- Silence management

Loki + Promtail
- Log aggregation
- Log querying
- Integration with Grafana
- Distributed tracing (planned)
Core Features¶
Automated ML Pipeline¶
1. Data Loading¶
- ETL from PostgreSQL
- Feature extraction
- Data validation
- Missing value handling
2. Preprocessing¶
- Feature engineering
- Categorical encoding
- Numerical scaling
- Train/test splitting
3. Model Training¶
- 6 Algorithms:
  - Random Forest
  - XGBoost
  - CatBoost
  - Gradient Boosting
  - Multi-Layer Perceptron (MLP)
  - ElasticNet
- 4 Target Variables:
  - Rating predictions
  - Revenue forecasts
  - Audience metrics
  - Engagement scores
- Total: 24 trained models (6 × 4)
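The 6 × 4 training grid reduces to a nested loop over algorithm and target. This is a stub sketch of that loop; the real pipeline instantiates scikit-learn, XGBoost, and CatBoost estimators, and the names below are labels, not the repository's actual identifiers.

```python
# Sketch of the 6-algorithm x 4-target training grid (24 models total).
# `train_stub` stands in for fit() + evaluation + registry insertion.
ALGORITHMS = [
    "random_forest", "xgboost", "catboost",
    "gradient_boosting", "mlp", "elasticnet",
]
TARGETS = ["rating", "revenue", "audience", "engagement"]

def train_stub(algorithm: str, target: str) -> dict:
    """Stand-in for training one model and returning its registry entry."""
    return {"algorithm": algorithm, "target": target, "r2": 0.0}

# One model per (algorithm, target) pair -> 24 registry entries.
registry = [train_stub(a, t) for a in ALGORITHMS for t in TARGETS]
assert len(registry) == 24
```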
4. Model Evaluation¶
- R² score calculation
- MSE, RMSE, MAE metrics
- Feature importance analysis
- Cross-validation
- Model comparison
5. Model Selection¶
- Automatic best model selection (R² > 0.7)
- Versioning and metadata tracking
- A/B testing support
- Rollback capabilities
6. Model Registry¶
- Database-backed storage
- Version control
- Metadata tracking (hyperparameters, metrics, training date)
- Production model tagging
- Model provenance
LLM-Powered Analytics¶
7 Specialized Tools¶
- get_programs: List all available programs
- get_program_details: Detailed program information
- predict_values: Generate predictions for specific programs
- get_best_model: Retrieve best model for target variable
- analyze_trends: Time-series trend analysis
- export_data: Export results (JSON/CSV/Excel)
- optimize_schedule: Run genetic algorithm optimization
Agent Architecture¶
```text
LangChain Agent
├── Chat History (MongoDB)
├── Tool Selection (dynamic)
├── LLM Generation (multi-provider)
└── Response Formatting
```
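At its core, the "Tool Selection" step maps a tool name chosen by the LLM onto a registered function. The real system uses LangChain's tool abstraction; this stdlib-only sketch shows the registration-and-dispatch idea with stub implementations (the return values are invented).

```python
# Minimal tool registry mirroring the 7-tool agent above (two tools shown).
TOOLS: dict[str, callable] = {}

def tool(name: str):
    """Decorator registering a function under a tool name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("get_programs")
def get_programs() -> list[str]:
    return ["Morning News", "Prime Drama"]  # stand-in for a DB query

@tool("predict_values")
def predict_values(program: str, target: str) -> dict:
    return {"program": program, "target": target, "value": 42.0}  # stub prediction

def dispatch(name: str, **kwargs):
    """The agent's tool-selection step reduces to a lookup like this."""
    return TOOLS[name](**kwargs)

result = dispatch("predict_values", program="Prime Drama", target="revenue")
```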
Natural Language Queries¶
Examples:

- "What's the predicted revenue for program X?"
- "Show me the top 5 programs by rating"
- "Optimize the schedule for prime time"
- "What factors affect viewership most?"
- "Export last week's data to Excel"
Response Caching System¶
Architecture:
- Redis-based caching layer
- Key: hash(query + parameters)
- TTL: Configurable (default 1 hour)
- Invalidation: Smart cache invalidation

Benefits:
- Cost Reduction: 75%+ reduction in LLM API costs
- Latency: <10ms for cache hits vs 2-5s for misses
- Scalability: Reduces backend load
- Consistency: Same query returns same answer

Metrics:
- Cache hit rate monitoring
- Cost savings tracking
- Performance analytics
Schedule Optimization¶
Genetic Algorithm Optimizer¶
Components:

1. Chromosome: Schedule representation
2. Fitness Function: Multi-objective scoring
   - Maximize total revenue
   - Maximize audience reach
   - Balance genre distribution
   - Respect time slot constraints
3. Genetic Operators:
   - Selection (tournament, roulette)
   - Crossover (single-point, uniform)
   - Mutation (swap, shuffle)
4. Evolution: 100+ generations
5. Convergence: Early stopping on plateau

Constraints:
- Time slot availability
- Content ratings (G, PG, R)
- Minimum/maximum program length
- Genre diversity requirements
- Advertiser preferences

Output:
- Optimized schedule
- Predicted performance metrics
- Visualization of improvements
- Alternative schedules (top-K)
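A toy version of the core loop: a chromosome is a permutation of program IDs over time slots, fitness is a weighted-revenue stand-in, and evolution uses tournament selection plus swap mutation. This is a deliberately minimal sketch — the production optimizer uses DEAP with crossover operators, constraints, and multi-objective scoring, and all numbers below are invented.

```python
# Toy schedule GA: place 5 programs into 5 slots to maximize weighted revenue.
import random

random.seed(0)
REVENUE = [5.0, 3.0, 8.0, 2.0, 6.0]      # per-program revenue (toy data)
SLOT_WEIGHT = [0.5, 0.8, 1.5, 1.2, 0.6]  # prime-time slots weigh more

def fitness(schedule: list[int]) -> float:
    """Single-objective stand-in for the multi-objective scorer."""
    return sum(REVENUE[p] * SLOT_WEIGHT[slot] for slot, p in enumerate(schedule))

def mutate(schedule: list[int]) -> list[int]:
    """Swap mutation: exchange the programs in two random slots."""
    child = schedule[:]
    i, j = random.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def evolve(generations: int = 200, pop_size: int = 20) -> list[int]:
    pop = [random.sample(range(5), 5) for _ in range(pop_size)]
    for _ in range(generations):
        a, b = random.sample(pop, 2)          # tournament selection (size 2)
        pop.append(mutate(max(a, b, key=fitness)))
        pop.remove(min(pop, key=fitness))     # cull the weakest (elitist)
    return max(pop, key=fitness)

best = evolve()
# `best` tends to place high-revenue programs into high-weight slots.
```

Because replacement always culls the weakest individual, the best fitness in the population is non-decreasing, which is what makes plateau-based early stopping meaningful.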
API Endpoints¶
Chat & Session Management¶
POST /api/v1/chat¶
Send message to LLM agent.
Request:
```json
{
  "message": "Predict revenue for program XYZ",
  "session_id": "uuid-..."
}
```

Response:

```json
{
  "response": "Based on historical data...",
  "session_id": "uuid-...",
  "metadata": {
    "model_used": "groq-llama-70b",
    "cache_hit": false,
    "tokens_used": 450,
    "latency_ms": 1850
  }
}
```
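A client round-trip against this endpoint reduces to serializing the request and reading the documented fields back out. There is no live server in this sketch — the response is canned in the shape shown above, and only the field names come from the documented schema.

```python
# Build a /api/v1/chat request payload and parse a response in the documented
# shape. The canned response below is illustrative, not server output.
import json

def build_chat_request(message: str, session_id: str) -> str:
    return json.dumps({"message": message, "session_id": session_id})

raw_response = json.dumps({
    "response": "Based on historical data...",
    "session_id": "uuid-1234",
    "metadata": {
        "model_used": "groq-llama-70b",
        "cache_hit": False,
        "tokens_used": 450,
        "latency_ms": 1850,
    },
})

reply = json.loads(raw_response)
cached = reply["metadata"]["cache_hit"]  # False: this call went to the LLM
```

The `metadata` block is what makes the caching layer observable per request: clients can log `cache_hit` and `latency_ms` alongside their own timings.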
POST /api/v1/sessions/create¶
Create new chat session.
POST /api/v1/chat/history¶
Get conversation history.
Models & Predictions¶
GET /api/v1/models/production¶
List all production models.
POST /api/v1/predict¶
Generate predictions.
Request:
```json
{
  "program_id": "ABC123",
  "target": "revenue",
  "features": {...}
}
```
GET /api/v1/models/{target}/best¶
Get best model for target variable.
Data & Export¶
POST /api/v1/export¶
Export data in various formats.
Formats: JSON, CSV, Excel
GET /api/v1/programs¶
List all programs with metadata.
Monitoring¶
GET /health¶
Basic health check.
GET /health/deep¶
Detailed system health:

- Database connections
- Redis status
- LLM provider status
- Disk space
- Memory usage
GET /api/v1/llm/cache/stats¶
Cache performance statistics:
```json
{
  "hit_rate": 0.78,
  "total_requests": 10000,
  "cache_hits": 7800,
  "cache_misses": 2200,
  "cost_savings": "$375",
  "avg_latency_cached_ms": 8,
  "avg_latency_uncached_ms": 2100
}
```
GET /api/v1/llm/tokens/usage¶
Token usage tracking per model.
GET /metrics¶
Prometheus metrics endpoint.
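Prometheus discovers and pulls `/metrics` via a scrape job. The fragment below is a hypothetical `prometheus.yml` entry — the job name, target host, and interval are assumptions, not the repository's actual configuration (the port matches the Docker Compose setup):

```yaml
# Hypothetical scrape job for the API's /metrics endpoint.
scrape_configs:
  - job_name: "bms-api"
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets: ["api:5000"]  # FastAPI backend (port 5000 in Docker Compose)
```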
Project Structure¶
```text
v2/
├── src/                              # Source code
│   ├── core/                         # Core application
│   │   ├── server.py                 # FastAPI server
│   │   ├── agent_service.py          # LangChain agent
│   │   ├── agent_service_cached.py   # Cached wrapper
│   │   ├── llm_cache.py              # Redis caching
│   │   ├── llm_ab_testing.py         # A/B testing
│   │   └── config.py                 # Configuration
│   │
│   └── ml/                           # ML Pipeline
│       ├── EDA.py                    # Exploratory analysis
│       ├── preprocess.py             # Feature engineering
│       ├── modeling.py               # Model training
│       ├── predictor.py              # Predictions
│       ├── optimizer.py              # Schedule optimization
│       └── data_Loader.py            # Data loading
│
├── db/                               # Database layer
│   ├── postgres_manager.py           # PostgreSQL
│   ├── mongo_manager.py              # MongoDB
│   ├── ml_registry.py                # Model registry
│   ├── models.py                     # SQLAlchemy models
│   ├── migrations/                   # SQL migrations
│   └── schemas/                      # Pydantic schemas
│
├── dashboard/                        # Frontend
│   └── src/
│       ├── Chat.js                   # Chat interface
│       ├── MonitoringDashboard.js    # Monitoring UI
│       ├── ChartComponent.js         # Visualizations
│       └── App.js                    # Main app
│
├── infrastructure/                   # Deployment
│   ├── docker/
│   │   ├── docker-compose.yml        # Full stack
│   │   ├── Dockerfile.api            # API image
│   │   └── .env.example              # Environment template
│   ├── k8s/                          # Kubernetes
│   │   ├── base/                     # Base manifests
│   │   └── README.md                 # K8s guide
│   ├── monitoring/                   # Monitoring
│   │   ├── prometheus.yml            # Prometheus config
│   │   └── alerts.yml                # Alert rules
│   └── deploy.sh                     # Deployment script
│
├── scripts/                          # Automation
│   ├── deploy_databases.sh           # DB deployment
│   ├── train_models.py               # Automated training
│   └── monitor_models.py             # Performance monitoring
│
├── tests/                            # Test suite
│   ├── unit/                         # Unit tests
│   ├── integration/                  # Integration tests
│   └── conftest.py                   # Pytest fixtures
│
├── .github/workflows/                # CI/CD
│   ├── ci.yml                        # Continuous Integration
│   ├── cd.yml                        # Continuous Deployment
│   └── ml-pipeline.yml               # ML automation
│
└── docs/                             # Documentation
    ├── getting-started/              # Quick start guides
    ├── development/                  # Development docs
    ├── production/                   # Production guides
    └── reference/                    # Technical reference
```
Performance Characteristics¶
API Performance¶
- Average Latency: 200-500ms
- P95 Latency: <2s
- P99 Latency: <5s
- Throughput: 1000+ requests/minute
Database Performance¶
- Query Latency: <100ms average
- Connection Pool: 10-50 connections
- Transaction Rate: 5000+ TPS
Cache Performance¶
- Hit Rate: 75-80% in production
- Latency: <10ms for hits
- Memory Usage: ~500MB for 10K cached responses
ML Inference¶
- Single Prediction: 50-100ms
- Batch (100): 500ms
- Model Loading: <1s on startup
Schedule Optimization¶
- Small Schedule (50 slots): 2-3 seconds
- Large Schedule (200 slots): 5-10 seconds
- Genetic Algorithm: 100-500 generations
Monitoring & Observability¶
Pre-configured Dashboards¶
System Metrics:
- CPU usage per service
- Memory utilization
- Disk I/O and space
- Network throughput

Application Metrics:
- API request rate and latency
- Error rates by endpoint
- Cache hit/miss rates
- LLM token usage

Database Metrics:
- Query performance
- Connection pool status
- Database size and growth
- Slow query log

ML Metrics:
- Model accuracy drift
- Prediction latency
- Feature distribution
- Training job status
Alert Rules¶
Critical Alerts:
- Database down (immediate)
- API error rate >5% (immediate)
- Disk space <10% (15 min)
- Memory usage >90% (5 min)

Warning Alerts:
- High latency (P95 >3s) (30 min)
- Cache hit rate <50% (1 hour)
- Model accuracy drift >10% (daily)

Notification Channels:
- Email for critical alerts
- Slack for all alerts
- PagerDuty integration (optional)
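As one concrete case, the "API error rate >5% (immediate)" policy maps onto a Prometheus alert rule like the hypothetical fragment below — the metric name and label scheme are assumptions about the instrumentation, not the repository's actual `alerts.yml`:

```yaml
# Hypothetical rule for the critical "API error rate >5%" alert above.
groups:
  - name: api-critical
    rules:
      - alert: HighAPIErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 0m                 # critical policy: fire immediately
        labels:
          severity: critical
        annotations:
          summary: "API error rate above 5%"
```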
Security Features¶
API Security¶
- Input Validation: Pydantic models enforce schemas
- SQL Injection: Parameterized queries only
- XSS Protection: Input sanitization
- Rate Limiting: Per-IP and per-user limits
- CORS: Configured per environment
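The API enforces schemas with Pydantic; the stdlib sketch below shows the same reject-at-the-boundary idea for the `/api/v1/predict` payload. Field names follow the request example earlier, but the allowed-target set and the validator itself are assumptions for illustration.

```python
# Stdlib sketch of schema enforcement (Pydantic does this in the real API).
from dataclasses import dataclass, field

ALLOWED_TARGETS = {"rating", "revenue", "audience", "engagement"}

@dataclass
class PredictRequest:
    program_id: str
    target: str
    features: dict = field(default_factory=dict)

    def __post_init__(self):
        if not self.program_id:
            raise ValueError("program_id must be non-empty")
        if self.target not in ALLOWED_TARGETS:
            raise ValueError(f"unknown target: {self.target!r}")

ok = PredictRequest(program_id="ABC123", target="revenue")
try:
    PredictRequest(program_id="ABC123", target="drop table")  # rejected
except ValueError as exc:
    rejected = str(exc)
```

Rejecting malformed input before it reaches any query builder is what makes the parameterized-queries rule above a second line of defense rather than the only one.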
Authentication & Authorization¶
- JWT Tokens: Stateless authentication
- Token Expiration: Configurable TTL
- Role-Based Access: Admin, user, read-only roles
- API Keys: For service-to-service communication
Data Security¶
- Encryption at Rest: Database encryption
- Encryption in Transit: TLS/SSL for all connections
- Secrets Management: Kubernetes Secrets, environment variables
- Audit Logging: All sensitive operations logged
Container Security¶
- Image Scanning: Trivy for vulnerability scanning
- Non-root User: Containers run as unprivileged user
- Read-only Filesystem: Where possible
- Network Policies: Kubernetes network isolation
Cost Optimization¶
Development (Free Tier)¶
- Databases: Local Docker containers
- LLMs: 100% free-tier APIs
- Hosting: Local machine
- Total: $0/month
Production (Estimated)¶
- Infrastructure: $100-200/month
  - Kubernetes cluster (3 nodes)
  - Load balancer
  - Storage (100GB)
- Databases: $100/month
  - Managed PostgreSQL
  - Managed MongoDB
  - Managed Redis
- LLM (with caching): $50-100/month
  - Free tier covers most queries
  - Paid tier for overflow
- Monitoring: Included (open source)
- Total: $300-500/month
Savings from Caching¶
- Without Cache: ~$500/month in LLM costs
- With Cache (75% hit rate): ~$125/month
- Savings: $375/month (75% reduction)
Deployment¶
Docker Compose (Quick Start)¶
```shell
# Start all services
docker-compose up -d

# Services include:
# - PostgreSQL 16
# - MongoDB 7.0
# - Redis 7.0
# - FastAPI backend (port 5000)
# - React frontend (port 3000)
```
Kubernetes (Production)¶
```shell
# Deploy to production
bash deploy.sh kubernetes production

# Verify deployment
kubectl get pods -n bms
kubectl get services -n bms

# View logs
kubectl logs -f deployment/api -n bms
```
Testing¶
Unit Tests¶
```shell
# Run all unit tests
pytest tests/unit/ -v

# With coverage
pytest tests/unit/ --cov=. --cov-report=html
```
Integration Tests¶
```shell
# Run integration tests
pytest tests/integration/ -v
# Requires running services (databases, Redis)
```
End-to-End Tests¶
```shell
# Run E2E tests
pytest tests/e2e/ -v
# Tests full workflows
```
Load Testing¶
```shell
# Using locust
locust -f tests/load/locustfile.py
# Simulates 100+ concurrent users
```
Documentation¶
Comprehensive Guides¶
Getting Started:
- Installation guide
- Quick start (5-minute setup)
- Project overview
- Architecture introduction

Development:
- Setup guide for developers
- ML pipeline documentation
- LLM agent development
- API reference
- Database schema

Production:
- Database deployment and scaling
- Application deployment
- Infrastructure setup (Docker, K8s)
- MLOps best practices
- Monitoring setup

Reference:
- Project structure
- ML pipeline technical details
- Further reading on algorithms
Completed Phases¶
All 6 project phases are complete (see the Status line at the top of this document).
Future Enhancements¶
Planned Features:
- Real-time streaming predictions
- Advanced visualization (D3.js)
- Mobile app (React Native)
- Multi-tenant support
- Advanced anomaly detection
- AutoML for hyperparameter tuning

Infrastructure:
- Multi-region deployment
- Advanced caching (CDN)
- Read replicas for databases
- Service mesh (Istio)
- Advanced tracing (Jaeger)
Links¶
- Repository: Private Company Repository
- Documentation: Comprehensive guides for development, production, and MLOps