EnergyLM-7B — LLM Fine-Tuning & Alignment Pipeline¶
Status: In Progress (Week 1 Complete) Organization: ForceX-AI on HuggingFace Budget: $0 — 100% free-tier compute
Executive Summary¶
End-to-end LLM training, alignment, and serving pipeline for EnergyLM-7B — a domain-adapted language model for energy systems and scientific reasoning. Fine-tunes Qwen2.5-7B using QLoRA SFT on 20K synthetic instruction samples, aligns with both DPO and ORPO for a controlled comparison, and benchmarks across 10 evaluation dimensions including domain-specific physics calculations, safety, and bilingual (Indonesian) capability.
The entire project runs on $0 budget — Kaggle T4 for training, free-tier LLM APIs for data generation, and HuggingFace Spaces for deployment.
Key Capabilities¶
| Capability | Implementation |
|---|---|
| LLM Fine-Tuning | QLoRA SFT on Qwen2.5-7B (4-bit NF4, LoRA r=64) |
| Alignment Methods | DPO vs ORPO controlled comparison study |
| Chain-of-Thought | CoT distillation with <think> tag reasoning |
| Reward Modeling | DeBERTa-v3-base binary classifier on preference pairs |
| Synthetic Data | 20K+ samples via multi-teacher round-robin (Gemini, Groq, OpenRouter) |
| Data Quality | MinHash + semantic dedup, LLM-as-judge filtering, n-gram contamination check |
| Evaluation | 10-benchmark suite: MMLU, GPQA, MATH, MBPP, IFEval + 4 custom domain evals |
| Quantization | AWQ 4-bit + GGUF (Q3/Q4/Q5/Q8) with quality retention benchmarks |
| Serving | vLLM inference benchmarking, HuggingFace Spaces deployment |
| Cost | $0 — Kaggle T4, Colab Free, free API tiers only |
Pipeline Architecture¶
flowchart TB
subgraph DATA["Data Engineering"]
T1["Gemini 2.5 Flash"]
T2["Groq Llama 3.3 70B"]
T3["OpenRouter gpt-oss-120b"]
T1 & T2 & T3 -->|"Round-Robin"| GEN["Instruction Generator<br/>20K samples"]
GEN --> MH["MinHash Dedup<br/>Jaccard 0.85"]
MH --> SEM["Semantic Dedup<br/>Cosine 0.92"]
SEM --> QF["LLM-as-Judge<br/>Quality Filter"]
QF --> CC["Contamination Check<br/>13-gram vs MMLU/GPQA/MATH"]
CC --> SPLIT["Stratified Split<br/>80/10/10"]
end
subgraph TRAIN["Training Pipeline"]
SPLIT -->|"Train Set"| SFT["QLoRA SFT<br/>Qwen2.5-7B<br/>3 epochs, r=64"]
SFT --> COT["CoT Distillation<br/>3K reasoning samples"]
SFT -->|"+ Preference Pairs"| DPO["DPO Alignment<br/>beta=0.1, sigmoid"]
SFT -->|"+ Preference Pairs"| ORPO["ORPO Alignment<br/>odds-ratio, lr=5e-6"]
SPLIT -->|"Preferences"| RM["Reward Model<br/>DeBERTa-v3-base"]
end
subgraph EVAL["Evaluation & Deployment"]
DPO & ORPO & COT --> BENCH["10-Benchmark Suite<br/>+ Bootstrap CIs"]
BENCH --> QUANT["AWQ + GGUF<br/>Quantization"]
QUANT --> SERVE["vLLM Serving<br/>+ HF Spaces Demo"]
end
style DATA fill:#e3f2fd,stroke:#1565c0
style TRAIN fill:#e8f5e9,stroke:#2e7d32
style EVAL fill:#fff3e0,stroke:#e65100
Training Configuration¶
QLoRA SFT Setup¶
Base Model: Qwen/Qwen2.5-7B
Quantization: 4-bit NF4 (double quantization)
LoRA Rank: 64 (alpha=128)
Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Effective Batch: 16 (4 x 4 gradient accumulation)
Learning Rate: 2e-4 (cosine schedule)
Max Seq Length: 2048
Compute: Kaggle T4 (16GB VRAM, free)
Alignment Comparison¶
| Parameter | DPO | ORPO |
|---|---|---|
| Loss Function | Sigmoid | Odds-Ratio |
| Learning Rate | 5e-5 | 5e-6 |
| Beta | 0.1 | 0.1 |
| LoRA Rank | 32 | 32 |
| Starting Point | SFT model | Base model |
| Reference Model | Implicit (PEFT) | Not needed |
Data Engineering Pipeline¶
flowchart LR
subgraph TEACHERS["Multi-Teacher Generation"]
direction TB
G["Gemini 2.5 Flash<br/>Free Tier"]
Q["Groq Llama 3.3 70B<br/>Free Tier"]
O["OpenRouter gpt-oss-120b<br/>Free Tier"]
end
subgraph QUALITY["Quality Pipeline"]
direction TB
D1["MinHash LSH<br/>128 permutations"]
D2["Semantic Dedup<br/>all-MiniLM-L6-v2"]
D3["LLM Judge<br/>4-criteria scoring"]
D4["N-gram Check<br/>vs benchmarks"]
D1 --> D2 --> D3 --> D4
end
subgraph OUTPUT["Datasets"]
direction TB
I["energy-instruct-20k<br/>ChatML format"]
P["energy-preferences-5k<br/>chosen/rejected pairs"]
C["energy-cot-3k<br/>think tag reasoning"]
end
TEACHERS -->|"20 energy topics<br/>4 prompt templates"| QUALITY
QUALITY --> OUTPUT
style TEACHERS fill:#fce4ec,stroke:#c62828
style QUALITY fill:#f3e5f5,stroke:#6a1b9a
style OUTPUT fill:#e0f2f1,stroke:#00695c
20 Domain Topics: Geothermal systems, nuclear reactor physics, solar PV engineering, wind turbine aerodynamics, reservoir engineering, thermodynamics, fluid dynamics, heat transfer, reactor safety, grid integration, molten salt reactors, well drilling, power plant optimization, energy storage, carbon capture, hydrogen fuel cells, thermal hydraulics, radiation shielding, seismic analysis, CFD for energy.
Evaluation Framework¶
10-benchmark evaluation with bootstrap confidence intervals:
| Benchmark | Type | Metric | Purpose |
|---|---|---|---|
| Energy QA | Custom | Accuracy | Domain knowledge |
| Physics Calculations | Custom | Numerical tolerance (5%) | Quantitative reasoning |
| Indonesian Energy | Custom | BLEU + Accuracy | Bilingual capability |
| Safety Prompts | Custom | Refusal rate | Safety alignment |
| MMLU (STEM) | Standard | 5-shot accuracy | General knowledge |
| GPQA Diamond | Standard | 0-shot accuracy | Hard science reasoning |
| MATH | Standard | 4-shot accuracy | Mathematical reasoning |
| MBPP | Standard | pass@1 | Code generation |
| IFEval | Standard | Strict accuracy | Instruction following |
| LLM-as-Judge | Gemini | 4-criteria (1-5) | Overall quality |
Model Variants¶
flowchart LR
BASE["Qwen2.5-7B<br/>Base Model"] --> SFT["EnergyLM-7B<br/>SFT"]
SFT --> COT["EnergyLM-7B<br/>SFT + CoT"]
SFT --> DPO["EnergyLM-7B<br/>DPO"]
BASE --> ORPO["EnergyLM-7B<br/>ORPO"]
DPO & ORPO --> BEST{{"Best Variant"}}
BEST --> AWQ["AWQ 4-bit"]
BEST --> Q4["GGUF Q4_K_M"]
BEST --> Q8["GGUF Q8_0"]
AWQ -->|"vLLM"| API["API Endpoint"]
Q4 -->|"llama.cpp"| SPACES["HF Spaces<br/>Demo"]
style BASE fill:#f5f5f5,stroke:#616161
style SFT fill:#c8e6c9,stroke:#2e7d32
style COT fill:#a5d6a7,stroke:#1b5e20
style DPO fill:#bbdefb,stroke:#1565c0
style ORPO fill:#b3e5fc,stroke:#0277bd
style BEST fill:#fff9c4,stroke:#f57f17
style AWQ fill:#ffe0b2,stroke:#e65100
style Q4 fill:#ffe0b2,stroke:#e65100
style Q8 fill:#ffe0b2,stroke:#e65100
Free Compute Strategy¶
| Resource | Purpose | Quota |
|---|---|---|
| Kaggle T4 | SFT, DPO, ORPO training | 30 GPU-hrs/week |
| Colab Free T4 | Reward model, eval runs | ~4 hrs/session |
| Gemini Free | Data generation, quality judge | 1500 req/day |
| Groq Free | Data generation (Llama 3.3 70B) | 30 req/min |
| OpenRouter Free | Data generation (gpt-oss-120b) | 10 req/min |
| HuggingFace Spaces | Model demo deployment | Free CPU/GPU |
Project Timeline¶
gantt
title EnergyLM-7B Development Timeline
dateFormat YYYY-MM-DD
axisFormat %b %d
section Data
Foundation & Scripts :done, w1, 2026-05-19, 7d
Data Generation (20K) :active, w1b, 2026-05-24, 7d
CoT + Preferences :w2b, after w1b, 5d
Dedup + Filter + Publish :w2c, after w2b, 2d
section Training
SFT QLoRA (3 epochs) :w3a, 2026-06-02, 3d
Ablation Studies (3 runs) :w3b, after w3a, 3d
CoT Distillation :w3c, after w3b, 1d
DPO Training :w4a, 2026-06-09, 2d
ORPO Training :w4b, after w4a, 2d
Reward Model (DeBERTa) :w4c, after w4b, 1d
section Eval & Deploy
Full Benchmark Suite :w4d, after w4c, 2d
Quantization (AWQ + GGUF) :w5a, 2026-06-16, 2d
vLLM Benchmarks :w5b, after w5a, 2d
HF Spaces Deployment :w5c, after w5b, 1d
section Documentation
Blog Post + Model Cards :w6a, 2026-06-23, 3d
OSS Contribution :w6b, after w6a, 2d
Public Release :milestone, after w6b, 0d
Planned HuggingFace Artifacts¶
All published under the ForceX-AI organization:
| Artifact | Type | Description |
|---|---|---|
ForceX-AI/energy-instruct-20k |
Dataset | 20K energy-domain instruction pairs (ChatML) |
ForceX-AI/energy-preferences-5k |
Dataset | 5K chosen/rejected preference pairs for DPO |
ForceX-AI/EnergyLM-7B-SFT |
Model | QLoRA SFT adapter + merged model |
ForceX-AI/EnergyLM-7B-DPO |
Model | DPO-aligned variant |
ForceX-AI/EnergyLM-7B-ORPO |
Model | ORPO-aligned variant |
ForceX-AI/EnergyLM-7B-GGUF |
Model | Quantized GGUF files (Q3/Q4/Q5/Q8) |
ForceX-AI/EnergyLM-RewardModel |
Model | DeBERTa-v3-base reward classifier |
Technology Stack¶
| Layer | Technologies |
|---|---|
| Base Model | Qwen/Qwen2.5-7B |
| Training | PyTorch, Transformers, TRL, PEFT, bitsandbytes |
| Data | datasketch, sentence-transformers, Gemini/Groq/OpenRouter APIs |
| Evaluation | lm-evaluation-harness, custom benchmarks, Gemini judge |
| Quantization | AutoAWQ, llama.cpp (GGUF) |
| Serving | vLLM, Gradio, HuggingFace Spaces |
| Tracking | Weights & Biases, HuggingFace Hub |
| CI/CD | GitHub Actions (lint, test, data-validate) |
| Compute | Kaggle T4, Colab Free, free LLM API tiers |
Current Status¶
Week 1: COMPLETE — All foundation code, scripts, notebooks, eval framework, and infrastructure configured. Data generation running across 3 free-tier API backends.
Next: Complete data generation, run quality pipeline, begin SFT training on Kaggle T4.