Bring up the orchestrator locally, run the iOS app, train a LoRA on the RTX 4080, export GGUF, run the test suite, and replay the demo. Real commands per the repo layout in plan.md §19 and the bootstrap notes in technical_spec.md §13.
All paths assume the repo root at /Users/shauryapunj/Desktop/Samsung Hack/aura.
macOS 14+ on Apple Silicon for the iOS dev host. Xcode 16 installed. Android Studio Koala for the emulator path. Homebrew for fonts and uv. The Alienware M16 R1 (RTX 4080 Laptop, 12 GB VRAM) for LoRA training; CUDA 12.x, Python 3.11, PyTorch 2.4 with CUDA wheel.
# Mac side — fonts, Python toolchain
brew install --cask font-fraunces
brew install --cask font-inter
brew install uv
# RTX 4080 box — verify CUDA
nvidia-smi
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
cd "/Users/shauryapunj/Desktop/Samsung Hack/aura"
uv venv --python 3.11 .venv
source .venv/bin/activate
uv pip install -r requirements.txt
requirements.txt is fixed in technical_spec.md §13.3.
The orchestrator can run in a Python reference implementation against synthetic fixtures, with the LLM served by MLX on Apple Silicon.
# Pull a Gemma 2B for smoke test
python -m mlx_lm.generate \
--model mlx-community/gemma-2b-it-4bit \
--prompt "respond with the word ok" \
--max-tokens 4
# Pull Phi-3-mini for the orchestrator
python -m mlx_lm.generate \
--model mlx-community/Phi-3-mini-4k-instruct-4bit \
--prompt "tool call: classify" \
--max-tokens 4
# Run a one-shot orchestrator tick from a fixture
python -m orchestrator.cli \
--once \
--fixture e2e/journeys/monday_brief.replay.json
Expected output: a JSON Reasoning Trace conforming to trace.v1.json printed to stdout. If the trace fails schema validation the CLI exits non-zero and writes the offending payload to errors/quarantine_traces.jsonl.
For interactive replay across all five user journeys:
python -m orchestrator.cli \
--replay e2e/journeys/ \
--speed 4x
open "apps/ios/Aura.xcworkspace"
In Xcode: select the Aura scheme, target an iPhone simulator (iPhone 15 or 14) or a paired physical device. Build and run.
First-launch checklist (manual):
If GoogleSignIn fails with a redirect URI error, verify GIDClientID in Info.plist matches the OAuth client in the Google Cloud Console.
For TestFlight distribution to pilot users (pilot/qual_protocol.md):
xcodebuild -workspace apps/ios/Aura.xcworkspace \
-scheme Aura \
-archivePath build/Aura.xcarchive \
archive
xcodebuild -exportArchive \
-archivePath build/Aura.xcarchive \
-exportPath build/export \
-exportOptionsPlist apps/ios/ExportOptions.plist
cd apps/android
./gradlew assembleDebug
Open Android Studio Koala, launch the Galaxy AVD (Galaxy S22 image; download via SDK Manager → System Images), install the APK:
adb install -r app/build/outputs/apk/debug/app-debug.apk
Enable Notification Access for Aura: Settings → Apps → Special access → Notification access → Aura → ON.
For Health Connect mock data on the emulator, use the Health Connect Toolbox app from the Play Store; seed HRV and sleep records before running the WellnessAgent smoke test.
Run on the Alienware M16 R1. Reference: technical_spec.md §9.
cd "/path/to/aura" # whatever path the Windows / Linux box uses
source .venv/bin/activate
# Step 1 — collect and anonymise data
python scripts/collect_comms_data.py \
--raw-glob "data/raw/*.txt" \
--out data/comms_train.jsonl \
--anonymise
# Step 2 — split
python scripts/split_dataset.py \
--in data/comms_train.jsonl \
--train data/comms_train.jsonl \
--eval data/comms_eval.jsonl \
--test data/comms_test.jsonl \
--ratio 80,10,10
# Step 3 — train
accelerate launch models/lora/train_comms.py \
--base-model google/gemma-2b-it \
--train-file data/comms_train.jsonl \
--eval-file data/comms_eval.jsonl \
--lora-r 16 \
--lora-alpha 32 \
--target-modules q_proj,v_proj,o_proj \
--epochs 3 \
--lr 2e-4 \
--bf16 true \
--max-seq-length 1024 \
--gradient-accumulation-steps 4 \
--output-dir models/lora/comms_v1
# Step 4 — eval
python models/eval/eval_comms.py \
--model-dir models/lora/comms_v1 \
--test-file data/comms_test.jsonl \
--report models/eval/reports/comms_v1.json
Expected wall-clock on RTX 4080 12 GB: ~2.5 hours per epoch with QLoRA r=16, bf16, 4-bit base. Total 5 hours for two iterations including hyperparameter tweaks.
Train loss starts ~2.4, ends ~0.6 by epoch 3; eval loss tracks within 0.2. Plateau after epoch 2 → early stop.
# Merge LoRA into base
python -m peft.scripts.merge_adapter \
--base google/gemma-2b-it \
--adapter models/lora/comms_v1 \
--output models/exports/gemma-2b-aura-comms-merged
# Convert to GGUF Q4_K_M for llama.cpp / Android MediaPipe path
python third_party/llama.cpp/convert.py \
models/exports/gemma-2b-aura-comms-merged \
--outtype q4_K_M \
--outfile models/exports/gemma-2b-aura-comms.Q4_K_M.gguf
# Convert to MLX 4-bit for iOS dev
python -m mlx_lm.convert \
--hf-path models/exports/gemma-2b-aura-comms-merged \
--mlx-path models/exports/gemma-2b-aura-comms-mlx \
-q --q-bits 4
Verify the GGUF file size is ~1.5 GB and the MLX directory is ~1.5 GB. [TEAM TO VERIFY] MediaPipe LLM Inference 0.10+ Gemma 2B + LoRA conversion (technical_spec.md §9.1).
# Unit tests
pytest agents/ -m unit
# Property-based fuzz tests
pytest agents/ -m hypothesis
# Latency assertions via pytest-benchmark
pytest agents/ -m benchmark
# Orchestrator state-machine tests
pytest orchestrator/tests/
# End-to-end journey replays (smoke set)
pytest e2e/ -m journey
# Full nightly run including all fixtures
pytest e2e/ --tb=short
Coverage target per technical_spec.md §12.1: 85% line, 75% branch per agent.
Update golden traces only with UPDATE_GOLDEN=1:
UPDATE_GOLDEN=1 pytest e2e/ -m journey
The Phase 1 / Phase 2 demo runs five segments per plan.md §28: Morning Brief, Quiet Group Chat, Closed-Loop Stress, Spend Mirror, Memory Graph.
Live device path:
# 1. Wake iPhone, ensure Apple Watch is paired and worn.
# 2. Open Aura. Verify Silence Budget = 3.
# 3. Trigger demo notification storm from Mac:
python scripts/demo_notif_storm.py --target-device <udid> --count 47 --window-min 8
# 4. Trigger synthetic HRV drop via the dev menu (long-press title bar -> "Inject HRV drop").
# 5. Send a synthetic UPI SMS via the dev menu -> "Inject UPI debit".
# 6. Open Memory tab -> Export to JSON -> Audit log roots.
Backup video path (if live fails):
open "deck/phase3_pitch/demo_backup_90s.mp4"
The 90-second backup video is cued and ready on the laptop per plan.md §28 backup protocol.
cd pilot/analysis
jupyter lab loadscore_eval.ipynb
The notebook reads anonymised raw CSVs from pilot/analysis/raw_phase2.csv, computes Spearman ρ between Load Score and self-rated stress, runs paired t-tests for the five baseline-vs-prototype tasks, and emits a slide-9 figure pack into pilot/analysis/figures/.
[TEAM TO VERIFY]technical_spec.md §6.3).technical_spec.md §9.1).technical_spec.md §8.1).End of runbook.md.