Local-first context cleaner for AI agents

Stop paying LLMs to read raw HTML, repeated logs, and secret-shaped noise.

ContextClean turns noisy files, web exports, terminal output, and project folders into compact, reviewable, token-budgeted context packs with reports that explain the savings.

$ ctxclean report benchmarks/fixtures/github_actions_failure_large.log \
  --mode aggressive --max-tokens 3200 --format text

ContextClean Report
input_tokens:      75,768
output_tokens:      3,200
tokens_saved:      72,568
reduction_percent:  95.8%

biggest_noise_sources:
- log_noise: 31,692 tokens
- repetition: 12,586 tokens
- stack_trace: 3,620 tokens

recommended_command:
ctxclean ... --mode aggressive --max-tokens 3200
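
The derived fields in the report follow directly from the two measured counts. A minimal Python sketch of that arithmetic, using the numbers shown above; the function name `reduction_report` is hypothetical and is not part of ctxclean:

```python
def reduction_report(input_tokens: int, output_tokens: int) -> dict:
    """Derive the savings fields from the two measured token counts."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    tokens_saved = input_tokens - output_tokens
    # Share of the original input that was removed, rounded to one decimal.
    reduction_percent = round(100 * tokens_saved / input_tokens, 1)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "tokens_saved": tokens_saved,
        "reduction_percent": reduction_percent,
    }


# Counts taken from the CI failure log report above.
report = reduction_report(75_768, 3_200)
print(report)
```

The same two inputs reproduce every row of the fixture table, so the Saved and Reduction columns can be rechecked from Before and After alone.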

Measured launch fixtures

Exact token counts, checked into the repo

Fixture            Before    After    Saved   Reduction
HTML scrape        70,571    5,874   64,697   91.7%
CI failure log     75,768    3,200   72,568   95.8%
Stack trace dump   28,189    1,850   26,339   93.4%

Fixture-backed excerpts

From noisy HTML and logs to model-ready context

HTML Before

<script>window.analytics = true;</script>
<nav>Home Pricing Login Cookie preferences</nav>
<div id="cookie-banner">Accept all cookies</div>
<main>
  <h1>API Setup &amp; Troubleshooting</h1>
  <p>Read <a href="/docs/setup">setup guide</a>.</p>
  <table><tr><th>Mode</th><th>Keeps</th></tr></table>
</main>
<footer>Newsletter signup</footer>

HTML After

# API Setup & Troubleshooting

Read [setup guide](/docs/setup).

| Mode | Keeps |
| --- | --- |

Tokens saved: 266
Reduction: 71.7%

Log Before

added 481 packages
found 0 vulnerabilities
2026-07-04T10:00:01Z warn Connection timeout to database
2026-07-04T10:00:02Z warn Connection timeout to database
2026-07-04T10:00:03Z warn Connection timeout to database
FAIL packages/api/user.test.ts
Unique failure:
TypeError: Cannot read properties of undefined
    at loadUser (/app/src/user.ts:42:13)
    at main (/app/src/main.ts:8:1)
    at loadUser (/app/src/user.ts:42:13)
    at main (/app/src/main.ts:8:1)
Final error summary: request failed after retries

Log After

[Repeated 3 times from 2026-07-04T10:00:01Z to 2026-07-04T10:00:03Z] Connection timeout to database
FAIL packages/api/user.test.ts
Unique failure:
TypeError: Cannot read properties of undefined
    at loadUser (/app/src/user.ts:42:13)
    at main (/app/src/main.ts:8:1)
[Collapsed stack frames: 2 duplicate frames removed]
Final error summary: request failed after retries

Tokens saved: 43
Reduction: 28.9%

Cleaner + Crusher + Reporter

Built for the context AI agents actually need

HTML Cleaner

Remove scripts, styles, navs, footers, comments, cookie banners, modals, ads, and tracking blocks.

Preserve Structure

Keep headings, links, paragraphs, tables, lists, inline code, and fenced code blocks readable.
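
To illustrate the idea of dropping noise containers while keeping readable structure, here is a rough Python sketch built on the standard-library `html.parser`. It is not ContextClean's implementation: the class name, the tag sets, and the "id contains cookie" heuristic are all assumptions, it handles only headings, paragraphs, and links, and it expects reasonably well-formed markup.

```python
from html.parser import HTMLParser

DROP_TAGS = {"script", "style", "nav", "footer"}  # assumed noise containers
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # have no end tag
BLOCK_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p"}


class MarkdownExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []      # finished Markdown blocks
        self.buf = None       # pieces of the block being built, or None
        self.skip_depth = 0   # > 0 while inside a dropped element
        self.href = None      # target of the link currently open

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr = dict(attrs)
        # Assumed heuristic: any element whose id mentions "cookie" is a banner.
        is_noise = tag in DROP_TAGS or "cookie" in (attr.get("id") or "")
        if self.skip_depth or is_noise:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.buf = ["#" * int(tag[1]) + " "] if tag[0] == "h" else []
        elif tag == "a" and self.buf is not None:
            self.href = attr.get("href") or ""
            self.buf.append("[")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        elif tag == "a" and self.href is not None and self.buf is not None:
            self.buf.append(f"]({self.href})")
            self.href = None
        elif tag in BLOCK_TAGS and self.buf is not None:
            text = " ".join("".join(self.buf).split())  # normalize whitespace
            if text:
                self.blocks.append(text)
            self.buf, self.href = None, None

    def handle_data(self, data):
        if not self.skip_depth and self.buf is not None:
            self.buf.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


SAMPLE = """<script>window.analytics = true;</script>
<nav>Home Pricing Login Cookie preferences</nav>
<div id="cookie-banner">Accept all cookies</div>
<main>
  <h1>API Setup &amp; Troubleshooting</h1>
  <p>Read <a href="/docs/setup">setup guide</a>.</p>
</main>
<footer>Newsletter signup</footer>"""

print(html_to_markdown(SAMPLE))
```

Tracking a skip depth rather than a single flag lets nested elements inside a dropped container close without re-enabling output too early.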

Log Crusher

Collapse repeated lines and duplicate stack frames while preserving failed tests, timestamps, and final errors.
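
The collapsing behavior can be sketched in a few lines of Python. This is an illustrative approximation, not the tool's actual algorithm: the regular expressions, the list of log levels, and the function name `crush_log` are assumptions, and it only merges consecutive repeats. Install-noise filtering is left out.

```python
import re

# Assumed line shapes: ISO-like timestamp, optional level, then the message.
TS_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\S+)\s+"
    r"(?:(?:debug|info|warn|error)\s+)?(?P<msg>.*)$"
)
FRAME = re.compile(r"^\s+at\s")  # indented JavaScript-style stack frame


def crush_log(lines):
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        first = TS_LINE.match(line)
        if first:
            # Extend the run while the following lines carry the same message.
            j, last_ts = i + 1, first["ts"]
            while j < len(lines):
                nxt = TS_LINE.match(lines[j])
                if not nxt or nxt["msg"] != first["msg"]:
                    break
                last_ts = nxt["ts"]
                j += 1
            count = j - i
            if count > 1:
                out.append(
                    f"[Repeated {count} times from {first['ts']} "
                    f"to {last_ts}] {first['msg']}"
                )
            else:
                out.append(line)
            i = j
        elif FRAME.match(line):
            # Within one run of frames, keep only the first copy of each frame.
            seen, removed = set(), 0
            while i < len(lines) and FRAME.match(lines[i]):
                frame = lines[i].strip()
                if frame in seen:
                    removed += 1
                else:
                    seen.add(frame)
                    out.append(lines[i])
                i += 1
            if removed:
                out.append(
                    f"[Collapsed stack frames: {removed} duplicate frames removed]"
                )
        else:
            # Failed tests, error messages, and summaries pass through untouched.
            out.append(line)
            i += 1
    return out


LOG = [
    "2026-07-04T10:00:01Z warn Connection timeout to database",
    "2026-07-04T10:00:02Z warn Connection timeout to database",
    "2026-07-04T10:00:03Z warn Connection timeout to database",
    "FAIL packages/api/user.test.ts",
    "TypeError: Cannot read properties of undefined",
    "    at loadUser (/app/src/user.ts:42:13)",
    "    at main (/app/src/main.ts:8:1)",
    "    at loadUser (/app/src/user.ts:42:13)",
    "    at main (/app/src/main.ts:8:1)",
    "Final error summary: request failed after retries",
]
print("\n".join(crush_log(LOG)))
```

Keeping the first and last timestamp of each run preserves the time window of the failure even though the individual lines are gone.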

Budget Packer

Use exact token counting, a --max-tokens budget, and --fit model presets sized for GPT, Claude, and Gemini context windows.
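
One way to think about fitting content into a token budget is a greedy pack by priority. The Python sketch below is only an assumption about how such a packer could work: `count_tokens` is a whitespace stand-in rather than an exact tokenizer, and the priority numbers and the function name `pack` are hypothetical.

```python
def count_tokens(text: str) -> int:
    # Stand-in only: a real packer would call the target model's tokenizer.
    return len(text.split())


def pack(sections, max_tokens):
    """Keep the most important sections that fit within max_tokens.

    sections: list of (priority, text); a lower number is more important.
    Returns the kept texts in their original order and the tokens used.
    """
    kept, used = [], 0
    # sorted() is stable, so equal priorities keep their document order.
    ranked = sorted(enumerate(sections), key=lambda item: item[1][0])
    for index, (_priority, text) in ranked:
        cost = count_tokens(text)
        if used + cost <= max_tokens:
            kept.append(index)
            used += cost
    return [sections[i][1] for i in sorted(kept)], used


sections = [
    (2, "added 481 packages"),
    (0, "FAIL packages/api/user.test.ts"),
    (1, "TypeError: Cannot read properties of undefined"),
]
texts, used = pack(sections, max_tokens=8)
print(texts, used)
```

Restoring the original order after selection keeps the packed context readable as a log, even though the selection itself is driven by priority.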

Context Reports

Run ctxclean report to see input tokens, output tokens, tokens saved, reduction percentage, the biggest noise sources, and a recommended command to run next.

Agent Integrations

Use ctxclean gha, ctxclean repo, ctxclean mcp, and ctxrun inside daily AI workflows.

Protect

Redact secret-like values by default, respect ignore files, skip generated paths, and require explicit opt-in before reading sensitive files.
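
Redaction of secret-shaped values is commonly done with pattern matching. The Python sketch below shows the general technique under stated assumptions: the patterns, the `[REDACTED]` placeholder, and the function name `redact` are illustrative choices and do not describe ContextClean's actual rules.

```python
import re

# Assumed token shapes: AWS access key IDs and classic GitHub personal tokens.
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
]
# Assumed heuristic: any key whose name mentions a secret-like word.
KEY_VALUE = re.compile(
    r"(?i)\b([\w-]*(?:password|secret|token|api[_-]?key)[\w-]*)(\s*[=:]\s*)(\S+)"
)


def redact(text: str) -> str:
    # First replace values that look like known credential formats.
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    # Then replace the value of any key=value pair with a suspicious key name,
    # keeping the key and separator so the line stays readable.
    return KEY_VALUE.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text)


print(redact("DB_PASSWORD=hunter2 retries=3"))
print(redact("using key AKIAIOSFODNN7EXAMPLE for upload"))
```

Matching on both value shape and key name is a deliberate overlap: shape patterns catch secrets pasted into free text, while key names catch values with no recognizable format.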

Run locally

No API keys. No telemetry. No cloud dependency.

git clone https://github.com/karurikwao/contextclean.git
cd contextclean
cargo install --path crates/contextclean-cli

ctxclean gha build.log --max-tokens 3200 --format markdown
ctxclean repo . --fit gpt-4.1 --output context.md
ctxrun --max-tokens 8000 npm test

Trust model

Context stays on your machine.