DataPilot -- Private AI Data Analyst

Status: Live demo
Demo: datapilot.robiriu-dev.my.id (Docs)


Executive Summary

A self-hostable "code interpreter" for data analysis. The user describes what they want in plain English; DataPilot writes the Python to preprocess, analyse and visualise the data, runs it in a sandbox, and returns a report (charts, printed results, and the exact code it ran), which the user refines by prompting further.

It is built for organisations that cannot send data to a cloud AI. The numbers are produced by code running locally, and the model that writes the code can be a small local model (via Ollama). In that configuration no data leaves the machine, so the deployment can be fully on-premises and air-gapped.

How it works

request + data ─► AI writes Python (sees only column names, types, a 5-row sample)
                       │
                       ▼
        sandbox runs it against the FULL data, locally
        (no network, time + memory limits)   ──on error──► AI fixes ─► retry (x3)
                       │
        figures + printed output + the code ─► report
                       │
        user refines by prompting (history-aware)

The split is deliberate: every statistic (aggregates, anomalies, charts) is produced by the executed code, not invented by the model, so each number in the report can be checked against the code that computed it. The AI reasons about how to analyse; the figures come from real computation.
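To make the split concrete, here is a minimal sketch of how the model's view of a dataset could be built: column names, a guessed type per column, and a 5-row sample. It uses only the standard library; the function name `summarise_for_model`, the type labels, and the dictionary layout are assumptions for illustration, not DataPilot's actual code.

```python
import csv
import io


def summarise_for_model(csv_text: str, sample_rows: int = 5) -> dict:
    """Build the only view of the data the model receives (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)  # the full data is read locally and never returned whole
    columns = reader.fieldnames or []

    def guess_type(values) -> str:
        # Try the strictest type first; fall back to "string".
        present = [v for v in values if v not in ("", None)]
        if not present:
            return "empty"
        for caster, label in ((int, "int"), (float, "float")):
            try:
                for v in present:
                    caster(v)
                return label
            except ValueError:
                continue
        return "string"

    return {
        "columns": [
            {"name": c, "type": guess_type([r[c] for r in rows])} for c in columns
        ],
        "sample": rows[:sample_rows],  # small sample only, not the full table
    }
```

Only the returned dictionary would go into the prompt; the generated code would later be run against the complete file inside the sandbox.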

Key Features

Plain-English to real analysis - writes and runs pandas / numpy / matplotlib / scikit-learn code for requests such as preprocessing, statistics, correlation, charts, and simple modelling.
Live, streaming code generation - the Python is typed out in real time, then run.
Self-correction loop - if the generated code errors, the traceback is fed back to the model to fix and re-run, up to three attempts.
Switchable model, with a UI selector - cloud (Gemini) for speed, or a local model (Qwen2.5-Coder 1.5B / 0.5B, Llama, Mistral via Ollama); switch live in the dropdown.
Privacy by design - the model only ever sees the schema and a small sample; the code runs locally against the full data; with a local model the whole system is air-gappable.
Auditable - every report shows the exact code that produced it.
Packaged with Docker - a one-command, cross-platform private deployment.
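The self-correction loop in the list above can be sketched as follows. This is an illustration under assumptions rather than the project's implementation: `write_code` stands in for whichever model backend is selected, the names `run_once` and `generate_and_run` are hypothetical, and "up to three attempts" is read here as three runs in total.

```python
import subprocess
import sys
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # assumption: three runs in total, per "up to three attempts"


def run_once(code: str, timeout_s: float = 10.0) -> tuple[bool, str]:
    """Run code in a fresh, isolated interpreter; return (ok, stdout or error text)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, f"TimeoutError: exceeded {timeout_s}s"
    if proc.returncode == 0:
        return True, proc.stdout
    return False, proc.stderr  # the traceback ends up on stderr


def generate_and_run(
    request: str,
    write_code: Callable[[str, Optional[str], Optional[str]], str],
) -> dict:
    """Ask the model for code, run it, and feed any traceback back for a fix."""
    code: Optional[str] = None
    error: Optional[str] = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        # First call has no previous code or error; later calls carry both.
        code = write_code(request, code, error)
        ok, output = run_once(code)
        if ok:
            return {"ok": True, "attempts": attempt, "code": code, "output": output}
        error = output
    return {"ok": False, "attempts": MAX_ATTEMPTS, "code": code, "output": error}
```

Returning the final code alongside the output mirrors the "Auditable" feature: the report can always show exactly what was executed.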

Technology Stack

Layer	Technology
Frontend	Next.js 15 (App Router), TypeScript, Tailwind CSS
Code model	Gemini 2.5 Flash (Vertex AI) or local Ollama (Qwen2.5-Coder, Llama) -- env-switchable
Execution	Sandboxed Python subprocess (pandas, numpy, matplotlib, scikit-learn, scipy) with no network, a timeout, and a memory cap
Streaming	Server-Sent Events-style token streaming for live code generation
Packaging	Multi-stage Docker image (compiled app only) + docker-compose
Deployment	pm2 + nginx, Let's Encrypt SSL on a VPS
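As a rough illustration of the Execution row, the sketch below runs untrusted code in a subprocess with a timeout and a memory cap. It is POSIX-only (it relies on the standard `resource` module), the name `run_limited` and the default limits are assumptions, and it does not cut network access by itself; one possible way to get that, assumed here rather than taken from the source, is to run the container with Docker's `--network none`.

```python
import resource
import subprocess
import sys


def run_limited(
    code: str, timeout_s: float = 10.0, memory_mb: int = 512
) -> subprocess.CompletedProcess:
    """Run code in a child interpreter with a wall-clock timeout and a memory cap."""
    limit_bytes = memory_mb * 1024 * 1024

    def apply_limits() -> None:
        # Executed in the child just before the interpreter starts:
        # cap the address space so oversized allocations raise MemoryError.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    # subprocess.run raises subprocess.TimeoutExpired if timeout_s is exceeded.
    return subprocess.run(
        [sys.executable, "-I", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        preexec_fn=apply_limits,
    )
```

A caller would inspect `returncode`, `stdout`, and `stderr` on the result, and treat `subprocess.TimeoutExpired` as a failed run.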

Skills Demonstrated

Agentic, self-correcting code-generation pipeline
Safe execution of model-written code (sandboxing, guards, resource limits)
Local / self-hosted LLM integration (Ollama) and a swappable model layer
Grounded analysis (no hallucinated numbers; code-produced figures)
Privacy-first, on-premise architecture and Docker packaging
