CLI · Self-hosted Platform · v0.5.0 candidate

AI Development Performance Evaluator

Two ways to use it: a local CLI for individual engineers reading their own Claude Code & Codex usage, or a self-hosted Platform with a multi-tenant gateway, evaluator, and cost budgeting for engineering teams.

An AI development performance evaluation tool: the personal CLI reads local data; the team Platform provides a multi-tenant gateway, an evaluator, and cost controls.


Two ways to use aide

Pick whichever fits — they share no runtime state.

CLI mode

v0.1.x · for individual engineers

  • Reads ~/.claude/ and ~/.codex/ on your laptop
  • Generates terminal / JSON / Markdown / HTML reports
  • Bring-your-own evaluation standard (default: OneAD R&D)
  • Monthly / quarterly KPI presets
  • 100% local, read-only — nothing leaves your machine
  • npm install -g @hanfour.huang/aide
Platform mode

v0.2 → v0.5.0 candidate · self-hosted for teams

  • Next.js + Fastify web platform with org-scoped RBAC + invites + audit log
  • Gateway proxies /v1/messages + /v1/chat/completions with a shared upstream account pool, per-user API keys, and usage + cost dashboards
  • Evaluator subsystem: opt-in content capture, rule-based + LLM scoring, custom rubrics with dry-run preview, GDPR delete workflow
  • Per-org monthly LLM cost budget (degrade / halt) + ledger + cost dashboard
  • Opt-in LLM facet extraction + 6 facet-based rubric signal types
  • Multi-arch Docker images on ghcr.io/hanfour/aide-{api,web,gateway}

CLI features

Everything an individual engineer needs to evaluate their own AI-assisted development. (Platform-mode capabilities are listed in the Platform section below.)

📊 Multi-source Extraction

Reads Claude Code session metadata, facets, SQLite cost data, JSONL conversations, and Codex thread data from local storage.

💡 Decision Pattern Detection

Detects iterative refinement, multi-task coordination, and active corrections to evaluate decision-making quality.

🛡 Risk Signal Analysis

Identifies security awareness, performance discussions, bug catching, and other risk-control behaviors.

📄 Multiple Output Formats

Generate reports as colored terminal output, JSON (machine-parseable stdout), Markdown, or HTML.

⚙️ Configurable Standards

Bring your own evaluation criteria, keywords, thresholds, and superior rules via custom JSON standard files.
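
For a sense of shape, here is a minimal sketch of a custom standard file; the file name my-standard.json and every field name inside it are illustrative assumptions, not the documented schema (see the bundled OneAD R&D standard for the real format). Only the --standard flag is documented.

bash — custom standard (illustrative)
# Sketch only: the JSON field names below are assumptions, not the real schema.
cat > my-standard.json <<'EOF'
{
  "name": "my-team-standard",
  "keywords": { "risk": ["security", "performance", "memory leak"] },
  "thresholds": { "baseline": 100, "superior": 120 }
}
EOF
aide report --standard my-standard.json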

🔒 Privacy-first & Local

All data is read locally and read-only. No data is sent to any external service. Your code stays yours.

📅 Period Presets

Built-in monthly and quarterly commands with a --previous flag for calendar-aligned KPI reviews.

📈 Management Summary

Auto-generated executive summary with headline, observations, and recommendations for KPI reviews.

🔧 Persistent Config

Set defaults for locale, theme, output format, period days, and data directories via the config command.

How It Works

Both modes share an "extract → analyze → score → report" pipeline; Platform adds storage + multi-tenancy + LLM augmentation.

CLI pipeline

  1. Extract: read JSON, JSONL, and SQLite from ~/.claude/ & ~/.codex/
  2. Analyze: aggregate metrics & detect patterns
  3. Score: compare against rubric thresholds
  4. Report: text, JSON, Markdown, or HTML to stdout

Platform pipeline

  1. Capture: gateway intercepts /v1/messages; bodies are AES-256-GCM-encrypted into request_bodies
  2. Enrich: optional LLM facet extraction → request_body_facets (per-session classification)
  3. Score: rule engine + optional LLM Deep Analysis; cost-gated by the per-org monthly budget
  4. Surface: web reports, cost dashboard, leaderboards, export, GDPR delete

Platform mode

Self-hosted multi-tenant evaluator. Released incrementally: every version is a feature-flag flip away from the previous one.

v0.2.0

Web platform foundation

Next.js UI + Fastify API on Postgres + Redis. OAuth sign-in, organization-scoped RBAC, member invites, immutable audit log.

  • Multi-org / department / team hierarchy
  • Role-based action gating throughout the API
  • Audit trail for every privileged change
v0.3.0

Gateway

Anthropic-native + OpenAI-compatible HTTP proxy that pools admin-donated upstream accounts and meters per-user usage (sample request below).

  • Pool of sk-ant-... + OAuth bundles; scheduler picks per request
  • Per-user platform API keys (ak_...) authenticated against the pool
  • usage_logs with per-call token / cost attribution
  • Dashboards: per-user, per-org, per-team usage drill-downs
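
Once a platform key is issued, a call like the following should round-trip through the gateway. This is a sketch, not the documented client setup: the host/port, model name, and Bearer auth scheme are assumptions; only the /v1/chat/completions route and the ak_ key prefix come from the list above.

bash — sample gateway call (illustrative)
# Host/port, model name, and the Bearer scheme are assumptions;
# the route and the ak_ key prefix are documented above.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer ak_your_platform_key" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet", "messages": [{"role": "user", "content": "ping"}]}'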
v0.4.0

Evaluator subsystem

Opt-in content capture + dual-layer evaluation (rule-based + optional LLM Deep Analysis), with admin-customizable scoring rubrics.

  • AES-256-GCM body encryption (HKDF domain key); 90-day retention with override
  • Platform default rubrics seeded for en / zh-Hant / ja
  • Custom rubrics with Zod-validated signals (15 types after v0.5.0)
  • GDPR member-initiated delete request workflow with 30-day SLA
  • Labor-law-friendly transparency: members always see their own full report
v0.5.0 — Plan 4C

Cost budget infrastructure

Per-org monthly USD cap with two enforcement modes. Spend is tracked in a dedicated ledger and visualized in a cost dashboard (spend-check sketch below).

  • llm_monthly_budget_usd + llm_budget_overage_behavior (degrade or halt)
  • llm_usage_events ledger (one row per LLM call)
  • UTC monthly rollover; halt state auto-clears at month boundary
  • Cost dashboard at /dashboard/organizations/<id>/evaluator/costs
  • Settings form fieldset for the budget knobs + cross-field validation
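
To audit spend outside the dashboard, the ledger can be queried directly. A rough sketch, assuming common column names (cost_usd, created_at): only the llm_usage_events table name and the UTC monthly rollover are documented here.

bash — month-to-date spend (illustrative)
# Column names are assumptions; only the llm_usage_events table is documented.
psql "$DATABASE_URL" -c "
  SELECT sum(cost_usd) AS month_to_date_usd
  FROM llm_usage_events
  WHERE created_at >= date_trunc('month', now() AT TIME ZONE 'utc');"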
v0.5.0 — Plan 4C

LLM facet extraction

Opt-in second LLM pass that classifies each captured session into structured JSON, persisted for rubric signals + report drill-downs.

  • Per-session classification: sessionType, outcome, claudeHelpfulness, frictionCount, bugsCaughtCount, codexErrorsCount (example record below)
  • request_body_facets table; prompt_version cache prevents duplicate LLM calls
  • 6 new rubric signal types (facet_*) wired end-to-end
  • Platform default rubrics bumped to v1.1.0 with additive facet supports
  • Report-page facet drill-down (silently hidden when no rows)
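
For a feel of what gets persisted per session, a facet record might look like the sketch below. The six field names are the documented classification; the values and their types are guesses.

bash — facet record shape (illustrative)
# Field names are documented above; the values and their types are guesses.
cat <<'EOF'
{
  "sessionType": "bugfix",
  "outcome": "success",
  "claudeHelpfulness": "high",
  "frictionCount": 2,
  "bugsCaughtCount": 1,
  "codexErrorsCount": 0
}
EOF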
v0.5.0 — Plan 4C

Operability

Production-ready observability shipped with the release. The pipeline can be deployed and supported without secondary tooling.

  • 3 Grafana dashboards (evaluator / body-capture / GDPR)
  • 11 Prometheus alert rules (DLQ / purge lag / GDPR SLA / facet failure / cost budget)
  • 9 runbooks under docs/runbooks/
  • Post-release smoke workflow auto-creates a release-blocker issue on canary failure
  • SSE → StreamTranscript integration test against MSW-mocked Anthropic
Full changelog · v0.5.0 upgrade guide · Evaluator docs · Gateway docs

CLI data sources

All data is read locally. Nothing leaves your machine. (Platform mode persists captured request bodies in Postgres with AES-256-GCM encryption — see the Platform section above.)

| Source | Path | Data |
| --- | --- | --- |
| Claude Code Sessions | ~/.claude/usage-data/session-meta/*.json | Tokens, duration, tools, languages, git commits |
| Claude Code Facets | ~/.claude/usage-data/facets/*.json | Goals, outcomes, friction, helpfulness |
| Claude Code Costs | ~/.claude/__store.db | Per-message cost (USD), model, duration |
| Claude Code JSONL | ~/.claude/projects/*/*.jsonl | Full conversations for keyword scanning |
| Codex Threads | ~/.codex/state_5.sqlite | Tokens, model, title, git info |
| Codex History | ~/.codex/history.jsonl | Full user prompts by thread |
| Codex Logs | ~/.codex/logs_2.sqlite | Tool calls and error events |
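
To preview what the CLI will read before running it, you can inspect these files yourself (everything is plain JSON/JSONL/SQLite). A minimal sketch, assuming the sqlite3 client is installed:

bash — inspect sources (read-only)
# Paths come from the table above; the databases need the sqlite3 CLI.
ls ~/.claude/usage-data/session-meta/ | head -n 3
sqlite3 ~/.codex/state_5.sqlite '.tables'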

Installation

Pick the path that matches your use case. Requires Node.js ≥ 18 for the CLI, or Docker + Postgres + Redis for Platform mode.

CLI mode (npm)

bash
# Install globally from npm
npm install -g @hanfour.huang/aide

# Verify installation
aide --help

# Update to latest
npm install -g @hanfour.huang/aide@latest

Platform mode (Docker)

Multi-arch images on ghcr.io/hanfour/aide-{api,web,gateway}. aide-web is amd64-only since v0.5.0; aide-api and aide-gateway publish both amd64 and arm64.

bash — quick start
# Clone, configure, start core stack (api + web + postgres + redis)
git clone https://github.com/hanfour/aide.git && cd aide/docker
cp .env.example .env   # fill in OAuth + bootstrap email + secrets
docker compose up -d

# Opt-in: add gateway service
docker compose --profile gateway up -d

# Opt-in: enable evaluator subsystem (v0.4.0+)
echo 'ENABLE_EVALUATOR=true' >> .env

# Opt-in: enable LLM facet extraction (v0.5.0)
echo 'ENABLE_FACET_EXTRACTION=true' >> .env

# Restart so running containers pick up the new flags
docker compose up -d

Full setup walk-through: docs/SELF_HOSTING.md · Gateway operator guide: docs/GATEWAY.md · v0.5.0 upgrade: docs/UPGRADE-v0.5.0.md

CLI usage

Generate reports with a single command. (Platform mode is operated via the web UI; see the Platform section above for screenshots / docs.)

bash
# Quick summary (last 7 days)
aide summary

# Full report (last 30 days)
aide report

# Monthly KPI report as Markdown
aide monthly --format markdown --output report.md

# HTML report for manager review
aide report --format html --output report.html

# Previous quarter, JSON for automation
aide quarterly --previous --format json | jq '.sections[].score'

# Custom date range with engineer metadata
aide report --since 2026-03-01 --until 2026-03-31 \
  --engineer "Jane Doe" --department "R&D"

# Use a custom evaluation standard
aide report --standard my-standard.json
bash — config
# View settings
aide config

# Set defaults
aide config set locale zh-TW
aide config set theme no-color
aide config set defaultFormat markdown
aide config set defaultPeriodDays 14

# Reset to defaults
aide config reset

Default Scoring Standard

Built-in OneAD R&D AI-Application Evaluation Standard. Fully customizable. CLI ships the v1.0.0 baseline; Platform mode v0.5.0 ships v1.1.0 with additive facet supports.

20% KPI · AI Interaction & Decision

  • 100%: Actively use AI for coding; clear decision notes
  • 120%: Multi-iteration guidance (A→B→C); system-constraint-aware optimization

50% KPI · AI Identification & Risk Control

  • 100%: Catch common AI errors / hallucinations; stable code
  • 120%: Identify critical risks (security, performance, memory); produce SOP/Wiki for the team