CLI · Self-hosted Platform · v0.5.0 candidate

AI Development Performance Evaluator

Two ways to use it: a local CLI for individual engineers reading their own Claude Code & Codex usage, or a self-hosted Platform with a multi-tenant gateway, evaluator, and cost budgeting for engineering teams.

An AI development performance evaluation tool: the personal CLI reads local data; the team Platform provides a multi-tenant gateway, an evaluator, and cost controls.


Two ways to use aide

Pick whichever fits — they share no runtime state.

CLI mode

v0.1.x · for individual engineers

  • Reads ~/.claude/ and ~/.codex/ on your laptop
  • Generates terminal / JSON / Markdown / HTML reports
  • Bring-your-own evaluation standard (default: OneAD R&D)
  • Monthly / quarterly KPI presets
  • 100% local, read-only — nothing leaves your machine
  • npm install -g @hanfour.huang/aide
Platform mode

v0.2 → v0.5.0 candidate · self-hosted for teams

  • Next.js + Fastify web platform with org-scoped RBAC + invites + audit log
  • Gateway proxies /v1/messages + /v1/chat/completions with a shared upstream account pool, per-user API keys, and usage + cost dashboards
  • Evaluator subsystem: opt-in content capture, rule-based + LLM scoring, custom rubrics with dry-run preview, GDPR delete workflow
  • Per-org monthly LLM cost budget (degrade / halt) + ledger + cost dashboard
  • Opt-in LLM facet extraction + 6 facet-based rubric signal types
  • Multi-arch Docker images on ghcr.io/hanfour/aide-{api,web,gateway}

CLI features

Everything an individual engineer needs to evaluate their own AI-assisted development. (Platform-mode capabilities are listed in the Platform section below.)

📊 Multi-source Extraction

Reads Claude Code session metadata, facets, SQLite cost data, JSONL conversations, and Codex thread data from local storage.

💡 Decision Pattern Detection

Detects iterative refinement, multi-task coordination, and active corrections to evaluate decision-making quality.

🛡 Risk Signal Analysis

Identifies security awareness, performance discussions, bug catching, and other risk-control behaviors.

📄 Multiple Output Formats

Generate reports as colored terminal output, JSON (machine-parseable stdout), Markdown, or HTML.

⚙️ Configurable Standards

Bring your own evaluation criteria, keywords, thresholds, and superior rules via custom JSON standard files.
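
For a sense of shape, here is a minimal sketch of a custom standard file; the file name my-standard.json and every field name inside it are illustrative assumptions, not the documented schema (see the bundled OneAD R&D standard for the real format). Only the --standard flag is documented.

bash — custom standard (illustrative)
# Sketch only: the JSON field names below are assumptions, not the real schema.
cat > my-standard.json <<'EOF'
{
  "name": "my-team-standard",
  "keywords": { "risk": ["security", "performance", "memory leak"] },
  "thresholds": { "baseline": 100, "superior": 120 }
}
EOF
aide report --standard my-standard.json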

🔒 Privacy-first & Local

All data is read locally and read-only. No data is sent to any external service. Your code stays yours.

📅 Period Presets

Built-in monthly and quarterly commands with a --previous flag for calendar-aligned KPI reviews.

📈 Management Summary

Auto-generated executive summary with headline, observations, and recommendations for KPI reviews.

🔧 Persistent Config

Set defaults for locale, theme, output format, period days, and data directories via the config command.

How It Works

Both modes share an "extract → analyze → score → report" pipeline; Platform adds storage + multi-tenancy + LLM augmentation.

CLI pipeline

  1. Extract: read JSON, JSONL, and SQLite from ~/.claude/ & ~/.codex/
  2. Analyze: aggregate metrics & detect patterns
  3. Score: compare against rubric thresholds
  4. Report: text, JSON, Markdown, or HTML to stdout

Platform pipeline

  1. Capture: gateway intercepts /v1/messages; bodies are AES-256-GCM-encrypted into request_bodies
  2. Enrich: optional LLM facet extraction → request_body_facets (per-session classification)
  3. Score: rule engine + optional LLM Deep Analysis; cost-gated by the per-org monthly budget
  4. Surface: web reports, cost dashboard, leaderboards, export, GDPR delete

Platform mode

Self-hosted multi-tenant evaluator. Released incrementally: every version is a feature-flag flip away from the previous one.

v0.2.0

Web platform foundation

Next.js UI + Fastify API on Postgres + Redis. OAuth sign-in, organization-scoped RBAC, member invites, immutable audit log.

  • Multi-org / department / team hierarchy
  • Role-based action gating throughout the API
  • Audit trail for every privileged change
v0.3.0

Gateway

Anthropic-native + OpenAI-compatible HTTP proxy that pools admin-donated upstream accounts and meters per-user usage (sample request below).

  • Pool of sk-ant-... + OAuth bundles; scheduler picks per request
  • Per-user platform API keys (ak_...) authenticated against the pool
  • usage_logs with per-call token / cost attribution
  • Dashboards: per-user, per-org, per-team usage drill-downs
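
Once a platform key is issued, a call like the following should round-trip through the gateway. This is a sketch, not the documented client setup: the host/port, model name, and Bearer auth scheme are assumptions; only the /v1/chat/completions route and the ak_ key prefix come from the list above.

bash — sample gateway call (illustrative)
# Host/port, model name, and the Bearer scheme are assumptions;
# the route and the ak_ key prefix are documented above.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer ak_your_platform_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet", "messages": [{"role": "user", "content": "ping"}]}'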
v0.4.0

Evaluator subsystem

Opt-in content capture + dual-layer evaluation (rule-based + optional LLM Deep Analysis), with admin-customizable scoring rubrics.

  • AES-256-GCM body encryption (HKDF domain key); 90-day retention with override
  • Platform default rubrics seeded for en / zh-Hant / ja
  • Custom rubrics with Zod-validated signals (15 types after v0.5.0)
  • GDPR member-initiated delete request workflow with 30-day SLA
  • Labor-law-friendly transparency: members always see their own full report
v0.5.0 — Plan 4C

Cost budget infrastructure

Per-org monthly USD cap with two enforcement modes. Spend is tracked in a dedicated ledger and visualized in a cost dashboard (spend-check sketch below).

  • llm_monthly_budget_usd + llm_budget_overage_behavior (degrade or halt)
  • llm_usage_events ledger (one row per LLM call)
  • UTC monthly rollover; halt state auto-clears at month boundary
  • Cost dashboard at /dashboard/organizations/<id>/evaluator/costs
  • Settings form fieldset for the budget knobs + cross-field validation
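
To audit spend outside the dashboard, the ledger can be queried directly. A rough sketch, assuming common column names (cost_usd, created_at): only the llm_usage_events table name and the UTC monthly rollover are documented here.

bash — month-to-date spend (illustrative)
# Column names are assumptions; only the llm_usage_events table is documented.
psql "$DATABASE_URL" -c "
  SELECT sum(cost_usd) AS month_to_date_usd
  FROM llm_usage_events
  WHERE created_at >= date_trunc('month', now() AT TIME ZONE 'utc');"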
v0.5.0 — Plan 4C

LLM facet extraction

Opt-in second LLM pass that classifies each captured session into structured JSON, persisted for rubric signals + report drill-downs.

  • Per-session classification: sessionType, outcome, claudeHelpfulness, frictionCount, bugsCaughtCount, codexErrorsCount (example record below)
  • request_body_facets table; prompt_version cache prevents duplicate LLM calls
  • 6 new rubric signal types (facet_*) wired end-to-end
  • Platform default rubrics bumped to v1.1.0 with additive facet supports
  • Report-page facet drill-down (silently hidden when no rows)
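
For a feel of what gets persisted per session, a facet record might look like the sketch below. The six field names are the documented classification; the values and their types are guesses.

bash — facet record shape (illustrative)
# Field names are documented above; the values and their types are guesses.
cat <<'EOF'
{
  "sessionType": "bugfix",
  "outcome": "success",
  "claudeHelpfulness": "high",
  "frictionCount": 2,
  "bugsCaughtCount": 1,
  "codexErrorsCount": 0
}
EOF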
v0.5.0 — Plan 4C

Operability

Production-ready observability shipped with the release. The pipeline can be deployed and supported without secondary tooling.

  • 3 Grafana dashboards (evaluator / body-capture / GDPR)
  • 11 Prometheus alert rules (DLQ / purge lag / GDPR SLA / facet failure / cost budget)
  • 9 runbooks under docs/runbooks/
  • Post-release smoke workflow auto-creates a release-blocker issue on canary failure
  • SSE → StreamTranscript integration test against MSW-mocked Anthropic
Full changelog · v0.5.0 upgrade guide · Evaluator docs · Gateway docs

CLI data sources

All data is read locally. Nothing leaves your machine. (Platform mode persists captured request bodies in Postgres with AES-256-GCM encryption — see the Platform section above.)

| Source | Path | Data |
| --- | --- | --- |
| Claude Code Sessions | ~/.claude/usage-data/session-meta/*.json | Tokens, duration, tools, languages, git commits |
| Claude Code Facets | ~/.claude/usage-data/facets/*.json | Goals, outcomes, friction, helpfulness |
| Claude Code Costs | ~/.claude/__store.db | Per-message cost (USD), model, duration |
| Claude Code JSONL | ~/.claude/projects/*/*.jsonl | Full conversations for keyword scanning |
| Codex Threads | ~/.codex/state_5.sqlite | Tokens, model, title, git info |
| Codex History | ~/.codex/history.jsonl | Full user prompts by thread |
| Codex Logs | ~/.codex/logs_2.sqlite | Tool calls and error events |
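
To preview what the CLI will read before running it, you can inspect these files yourself (everything is plain JSON/JSONL/SQLite). A minimal sketch, assuming the sqlite3 client is installed:

bash — inspect sources (read-only)
# Paths come from the table above; the databases need the sqlite3 CLI.
ls ~/.claude/usage-data/session-meta/ | head -n 3
sqlite3 ~/.codex/state_5.sqlite '.tables'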

Installation

Pick the path that matches your use case. Requires Node.js ≥ 18 for the CLI, or Docker + Postgres + Redis for Platform mode.

CLI mode (npm)

bash
# Install globally from npm
npm install -g @hanfour.huang/aide

# Verify installation
aide --help

# Update to latest
npm install -g @hanfour.huang/aide@latest

Platform mode (Docker)

Multi-arch images on ghcr.io/hanfour/aide-{api,web,gateway}. aide-web is amd64-only since v0.5.0; aide-api and aide-gateway publish both amd64 and arm64.

bash — quick start
# Clone, configure, start core stack (api + web + postgres + redis)
git clone https://github.com/hanfour/aide.git && cd aide/docker
cp .env.example .env   # fill in OAuth + bootstrap email + secrets
docker compose up -d

# Opt-in: add gateway service
docker compose --profile gateway up -d

# Opt-in: enable evaluator subsystem (v0.4.0+)
echo 'ENABLE_EVALUATOR=true' >> .env

# Opt-in: enable LLM facet extraction (v0.5.0)
echo 'ENABLE_FACET_EXTRACTION=true' >> .env

# Restart so running containers pick up the new flags
docker compose up -d

Full setup walk-through: docs/SELF_HOSTING.md · Gateway operator guide: docs/GATEWAY.md · v0.5.0 upgrade: docs/UPGRADE-v0.5.0.md

CLI usage

Generate reports with a single command. (Platform mode is operated via the web UI; see the Platform section above for screenshots / docs.)

bash
# Quick summary (last 7 days)
aide summary

# Full report (last 30 days)
aide report

# Monthly KPI report as Markdown
aide monthly --format markdown --output report.md

# HTML report for manager review
aide report --format html --output report.html

# Previous quarter, JSON for automation
aide quarterly --previous --format json | jq '.sections[].score'

# Custom date range with engineer metadata
aide report --since 2026-03-01 --until 2026-03-31 \
  --engineer "Jane Doe" --department "R&D"

# Use a custom evaluation standard
aide report --standard my-standard.json
bash — config
# View settings
aide config

# Set defaults
aide config set locale zh-TW
aide config set theme no-color
aide config set defaultFormat markdown
aide config set defaultPeriodDays 14

# Reset to defaults
aide config reset

Default Scoring Standard

Built-in OneAD R&D AI-Application Evaluation Standard. Fully customizable. CLI ships the v1.0.0 baseline; Platform mode v0.5.0 ships v1.1.0 with additive facet supports.

20% KPI · AI Interaction & Decision

  • 100%: Actively use AI for coding; clear decision notes
  • 120%: Multi-iteration guidance (A→B→C); system-constraint-aware optimization

50% KPI · AI Identification & Risk Control

  • 100%: Catch common AI errors / hallucinations; stable code
  • 120%: Identify critical risks (security, performance, memory); produce SOP/Wiki for the team