Chakra Quick Start Guide¶
This guide walks you through a complete end-to-end experiment in under 5 minutes.
One-command alternative: If you just want to see the full cycle, skip to Run Everything At Once after installing.
Prerequisites¶
- Python 3.10+ (tested with 3.13)
- Git for cloning
- (Optional) W&B account for experiment tracking
Step 1: Setup¶
# Clone the repo
git clone https://github.com/The-Harsh-Vardhan/Chakra-Autonomous-Research-System.git
cd Chakra-Autonomous-Research-System
# Create and activate a virtual environment
python -m venv .venv
# Linux/macOS
source .venv/bin/activate
# Windows PowerShell
.venv\Scripts\Activate.ps1
# Install everything
pip install -e ".[dev]"
Verify installation¶
python -m chakra list-domains
You should see three domains: hndsr_vr, nlp_lm, tabular_cls.
Step 2: (Optional) Configure W&B¶
Create a .env file in the repo root:
WANDB_API_KEY=your_api_key_here
Tip: Get your API key from wandb.ai/authorize. If you skip this step, everything still works — metrics are saved locally to JSON.
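How Chakra reads the `.env` file is not shown here; projects commonly use a library such as `python-dotenv` for this. As an illustration only, a minimal loader (the `load_env` helper below is hypothetical, not part of the Chakra API) could look like:

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: KEY=value lines; blank lines and '#' comments ignored."""
    env = {}
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    except FileNotFoundError:
        pass  # no .env is fine: metrics fall back to local JSON
    os.environ.update(env)
    return env
```

With a `.env` containing `WANDB_API_KEY=...`, calling `load_env()` makes the key visible to any tool that reads `os.environ["WANDB_API_KEY"]`.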
Step 3: Your First Experiment — Iris Classification¶
The fastest way to see the full lifecycle is with the tabular classification domain.
3a. Scaffold the version¶
python -m chakra --domain tabular_cls scaffold-version --version v1.0 --force
This creates:
- notebooks/versions/v1.0_Tabular_CLS.ipynb — Kaggle-ready notebook
- docs/notebooks/v1.0_Tabular_CLS.md — Run documentation
- reports/reviews/v1.0_Tabular_CLS.review.md — Review template
3b. Run the control baseline¶
python -m chakra.domains.tabular_cls.train_runner \
--config configs/tabular_cls/v1.0_control.yaml \
--run-name v1.0-control --device cpu
This trains a logistic regression for 1 epoch with 0 train batches — it measures the untrained baseline accuracy.
Expected output: Best val accuracy: ~16.67% (the untrained-model floor; for reference, uniform random guessing over 3 balanced classes would score ~33%)
3c. Train the MLP¶
python -m chakra.domains.tabular_cls.train_runner \
--config configs/tabular_cls/v1.0_train.yaml \
--run-name v1.0-train --device cpu
Expected output: Best val accuracy: ~93% after 30 epochs (~10 seconds on CPU)
3d. Evaluate¶
python -m chakra.domains.tabular_cls.evaluate_runner \
--config configs/tabular_cls/v1.0_train.yaml \
--run-name v1.0-eval \
--checkpoint artifacts/v1.0-train/checkpoints/v1.0_train_best.pt \
--device cpu
Expected output: Accuracy=93.33%, F1=0.9296
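If you skipped W&B, the metrics are saved locally as JSON. The exact on-disk filename and schema under `artifacts/` are assumptions here, so this sketch inlines the values reported above rather than reading a file; the `summarize_metrics` helper is hypothetical:

```python
def summarize_metrics(metrics: dict) -> str:
    """Format an eval-metrics dict (keys 'accuracy' and 'f1' assumed)."""
    return f"Accuracy={metrics['accuracy']:.2%}, F1={metrics['f1']:.4f}"

# Inlined values from the evaluation step above.
line = summarize_metrics({"accuracy": 0.9333, "f1": 0.9296})
print(line)  # Accuracy=93.33%, F1=0.9296
```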
3e. Sync, Review, and Validate¶
# Index results into a run manifest
python -m chakra --domain tabular_cls sync-run \
--version v1.0 --source-dir artifacts/v1.0-train
# Generate an automated review and roast
python -m chakra --domain tabular_cls review-run --version v1.0
# Validate the version contract (all files exist)
python -m chakra --domain tabular_cls validate-version --version v1.0
Expected output: v1.0 contract passed for domain 'tabular_cls'.
🎉 Congratulations! You've just completed a full lifecycle loop.
Run Everything At Once¶
Instead of running each stage individually, use the Chakra CLI to run the complete cycle with one command:
chakra aavart --domain tabular_cls --version v1.0 --device cpu --force
This orchestrates the full Aavart (cycle): Plan → Execute → Guard → Review → Improve.
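Conceptually, a cycle like this is just an ordered pipeline of stages threading a shared context. The sketch below is purely illustrative; the stage names come from the Aavart description above, but the functions, signatures, and context keys are hypothetical and not Chakra's actual orchestration code:

```python
# Illustrative Plan -> Execute -> Guard -> Review -> Improve loop.
def run_cycle(stages, context):
    """Run each named stage in order, threading the context dict through."""
    for name, stage in stages:
        context = stage(context)
    return context

stages = [
    ("plan",    lambda ctx: {**ctx, "plan": "train v1.0"}),
    ("execute", lambda ctx: {**ctx, "metrics": {"accuracy": 0.93}}),
    ("guard",   lambda ctx: {**ctx, "passed": ctx["metrics"]["accuracy"] > 0.5}),
    ("review",  lambda ctx: {**ctx, "review": "ok" if ctx["passed"] else "fail"}),
    ("improve", lambda ctx: {**ctx, "next_version": "v1.1"}),
]
result = run_cycle(stages, {})
```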
Step 4: Try a Different Dataset — Titanic¶
The same domain supports multiple datasets. Run the Titanic survival prediction:
# Train
python -m chakra.domains.tabular_cls.train_runner \
--config configs/tabular_cls/v2.0_train.yaml \
--run-name v2.0-train --device cpu
# Evaluate
python -m chakra.domains.tabular_cls.evaluate_runner \
--config configs/tabular_cls/v2.0_train.yaml \
--run-name v2.0-eval \
--checkpoint artifacts/v2.0-train/checkpoints/v2.0_train_best.pt \
--device cpu
Expected output: Accuracy=83.63%, F1=0.8116
Step 5: Run the NLP Domain¶
# Scaffold
python -m chakra --domain nlp_lm scaffold-version --version v1.0 --force
# Control baseline (bigram model)
python -m chakra.domains.nlp_lm.train_runner \
--config configs/nlp_lm/v1.0_control.yaml --run-name v1.0-control --device cpu
# Full training (GPT-nano, 5 epochs)
python -m chakra.domains.nlp_lm.train_runner \
--config configs/nlp_lm/v1.0_train.yaml --run-name v1.0-train --device cpu
# Evaluate
python -m chakra.domains.nlp_lm.evaluate_runner \
--config configs/nlp_lm/v1.0_train.yaml --run-name v1.0-eval \
--checkpoint artifacts/v1.0-train/checkpoints/v1.0_train_best.pt --device cpu
Expected results:
- Bigram baseline: BPB = 6.38, Perplexity = 83.28
- GPT-nano trained: BPB = 3.59, Perplexity = 12.18 (~85% reduction in perplexity)
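The two metrics are consistent with each other: for a byte-level language model (one token per byte), perplexity = 2**BPB. This is a general information-theoretic identity, not Chakra-specific code, and it approximately reproduces the numbers above (the small gap for GPT-nano is plausibly BPB rounding):

```python
# perplexity = 2 ** BPB for a byte-level LM (one token per byte).
for name, bpb in [("bigram baseline", 6.38), ("GPT-nano", 3.59)]:
    print(f"{name}: BPB={bpb} -> perplexity ~ {2 ** bpb:.2f}")
```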
Step 6: Run the Tests¶
python -m pytest tests/ -v
All 47 tests should pass.
Understanding the Config System¶
Configs use YAML with an inherits: key for layered configuration:
# configs/tabular_cls/v1.0_train.yaml
inherits: configs/tabular_cls/base.yaml  # ← inherit from base
project:
  group: v1.0-train  # ← override specific fields
training:
  epochs: 30
  checkpoint_name: v1.0_train_best.pt
Config variants per version:
- *_control.yaml — Baseline model, no training (sets the floor)
- *_smoke.yaml — Limited batches, fast pipeline validation
- *_train.yaml — Full training run
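The exact merge semantics of `inherits:` are an assumption here, but layered config systems typically resolve it as a recursive merge where the child overrides the base. A minimal sketch of that behavior (plain dicts standing in for parsed YAML):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively overlay `override` onto `base`; the child wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

base = {"project": {"group": "base"}, "training": {"epochs": 1, "lr": 0.01}}
child = {"project": {"group": "v1.0-train"}, "training": {"epochs": 30}}
config = deep_merge(base, child)
# config keeps lr from the base but takes epochs and group from the child
```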
Understanding the Domain Protocol¶
Every domain implements DomainLifecycleHooks:
class DomainLifecycleHooks:
    def version_stem(self, version: str) -> str: ...
    def resolve_version_paths(self, version: str) -> VersionPaths: ...
    def build_version_configs(self, version, parent, lineage) -> dict: ...
    def render_notebook(self, version, parent, paths) -> str: ...
    def render_doc(self, version, parent, paths) -> str: ...
    def render_review(self, version) -> str: ...
    def build_findings(self, version, manifest, eval_summary, registry): ...
    def ablation_suggestions(self, version, eval_summary, delta): ...
    def roast_lines(self) -> list[str]: ...
    def validate_version(self, version) -> list[str]: ...
The core lifecycle.py calls these hooks — it never contains domain-specific logic.
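To make the contract concrete, here is what a toy domain implementing a few of these hooks might look like. Everything below is illustrative: `ToyDomainHooks` and the `toy_demo` domain do not exist in the repo, and the naming scheme only mirrors stems like `v1.0_Tabular_CLS` seen earlier:

```python
class ToyDomainHooks:
    """Illustrative partial implementation of the domain lifecycle hooks."""
    DOMAIN = "toy_demo"  # hypothetical domain name

    def version_stem(self, version: str) -> str:
        # e.g. "v1.0" -> "v1.0_Toy_Demo", mirroring v1.0_Tabular_CLS
        return f"{version}_Toy_Demo"

    def roast_lines(self) -> list[str]:
        return ["Your baseline is doing all the work again."]

    def validate_version(self, version: str) -> list[str]:
        # Return a list of contract violations; empty means the contract passed.
        return []
```

The core lifecycle would call these methods without knowing anything about the domain, which is the point of the protocol.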
Next Steps¶
- Add W&B tracking — Set up your `.env` and rerun experiments to see them in the W&B dashboard
- Add a new domain — Follow the guide in Contributing
- Push to Kaggle — Use the `push-kaggle` and `pull-kaggle` commands for cloud execution
- Extend the framework — Add custom metrics, models, or datasets to existing domains