Privacy & Security¶
"Will this steal my code?" — No. Here's exactly what ToolWitness sees and doesn't see.
What ToolWitness Sees vs. Doesn't See¶
ToolWitness is a passive observer at the tool boundary. What it can see depends on which integration path you use:
| ToolWitness sees | ToolWitness does NOT see |
|---|---|
| Tool function name + arguments | Your source code |
| Tool return value | Other files in your project |
| Agent's text response (for verification) | Environment variables or secrets |
| Timing of tool calls | Network traffic outside tool calls |
| Nothing else | Your prompts, system messages, or history |
| ToolWitness sees | ToolWitness does NOT see |
|---|---|
tools/call request names + arguments |
Agent's text responses (these stay inside the host) |
tools/call response data |
Conversation history, prompts, or system messages |
| Timing of tool calls | Files on disk, source code, or environment variables |
| Nothing else | Any host-internal state (Cursor sessions, editor context) |
The proxy processes tools/call JSON-RPC messages only. All other protocol messages (initialize, tools/list, notifications) pass through without being recorded. The proxy runs as a local subprocess — it makes no network calls of its own.
Local-First¶
All data is stored in a local SQLite database at ~/.toolwitness/toolwitness.db.
- File permissions set to
0600on Unix (owner read/write only) - No cloud service, no accounts, no signup
- Your data never leaves your machine unless you explicitly configure external alerting (webhooks)
No Telemetry¶
Telemetry is off by default. ToolWitness does not:
- Phone home to any server
- Send analytics, crash reports, or usage data
- Ping any endpoint on startup or during operation
If future versions add optional telemetry, it will always be opt-in, never opt-out.
No Training¶
Your data is never used to train models. ToolWitness is a local verification tool, not a data collection service. Tool inputs, outputs, and agent responses stay on your machine.
Fail-Open¶
If ToolWitness itself has a bug or encounters an error:
- Your tools still work. Internal errors are caught and logged, never raised into your tool execution path.
- The affected verification is classified as
UNMONITOREDwith the error logged as a structuredTOOLWITNESS_INTERNAL_ERROR. - ToolWitness never blocks, delays, or modifies your agent's tool calls.
Narrow Scope¶
ToolWitness monitors only the tool I/O boundary:
| In scope | Out of scope |
|---|---|
| Tool function name and arguments | Source code in your repo |
| Tool return values | Files on disk |
| Agent text responses (for verification) | Environment variables and secrets |
| Execution timing | System prompts or conversation history |
| Network traffic beyond tool calls |
| In scope | Out of scope |
|---|---|
tools/call request names and arguments |
Agent text responses (host-internal) |
tools/call response data |
Conversation history, prompts, system messages |
| Execution timing | Source code, files on disk, environment variables |
| All non-tool JSON-RPC messages (pass through unrecorded) | |
| Any host-internal state (Cursor editor, Claude Desktop UI) |
This means ToolWitness cannot accidentally expose your proprietary code, configuration, or prompt engineering — it never has access to any of it. The MCP Proxy has an even narrower scope than the SDK because it never sees the agent's text responses.
Alert Privacy¶
When you configure webhook or Slack alerts, you're sending data outside your machine. Here's exactly what goes out.
What each detail level sends¶
| Field | Summary mode | Full mode |
|---|---|---|
Tool name (e.g. get_file_info) |
:material-check: | :material-check: |
Classification (e.g. fabricated) |
:material-check: | :material-check: |
Confidence score (e.g. 0.85) |
:material-check: | :material-check: |
| Session ID | :material-check: | :material-check: |
| Claimed vs actual values | :material-close: | :material-check: |
| Receipt ID | :material-close: | :material-check: |
What is NEVER sent in alerts¶
Regardless of detail level, alert payloads never include:
- Source code or file contents
- Agent prompt text, system messages, or conversation history
- Environment variables, API keys, or credentials
- Full tool output data (even in "full" mode, only the mismatched values are included — not the complete tool response)
Slack message format¶
A Slack alert looks like this:
That's it — four lines. No file contents, no code, no prompts.
Configuration¶
Default is summary — ToolWitness errs on the side of privacy.
Digest Privacy¶
The toolwitness digest --send command delivers a periodic summary to Slack or webhook. The digest payload contains aggregate counts only:
- Total verifications in the period
- Total failures (count and rate)
- Breakdown by classification (verified: 44, fabricated: 2, skipped: 1)
- Top offending tool names with failure counts
The digest never includes individual tool outputs, file contents, agent responses, or any raw data from specific verifications. It's a statistical summary — like getting "3 build failures today" without the build logs.
Threshold Alert Privacy¶
Threshold alerts fire when failure counts or rates breach a configured limit (e.g. 10+ failures in 60 minutes). The alert payload contains:
- The name of the worst-offending tool
- Its classification and confidence
- The threshold that was breached (e.g. "10 failures in 60min")
No tool output data, no agent text, no file contents. The alert tells you something is wrong and which tool — you investigate the details on your local dashboard.
Open Source¶
ToolWitness is licensed under Apache 2.0. Every line of code is inspectable:
- No hidden data collection
- No obfuscated binaries
- No proprietary components in the core
:fontawesome-brands-github: View the source code
MCP Proxy Privacy¶
The toolwitness proxy command deserves specific attention because it sits in the data path of an MCP host like Cursor or Claude Desktop:
- Local subprocess only — the proxy runs on your machine as a child process of the MCP host. It makes zero network calls.
- Records
tools/callonly — the proxy sees all JSON-RPC messages but only recordstools/callrequests and their corresponding responses. Protocol handshake messages (initialize,tools/list, notifications) pass through without being stored. - No host-internal access — the proxy cannot see your conversation history, system prompts, editor state, or anything that stays inside the MCP host. It only sees what flows over the stdio pipe between host and server.
- Same SQLite storage — recorded data goes to the same local SQLite database as SDK data, with the same
0600permissions.
Verification Bridge Privacy¶
The verification bridge (toolwitness verify CLI and tw_verify_response MCP tool) compares the agent's response text against tool outputs. Here's the data flow:
- Agent response text — you provide this (via CLI or the agent calls
tw_verify_response). It is compared against tool outputs locally and stored in the local SQLite database. It is never transmitted anywhere. - Proxy-recorded tool outputs — read from local SQLite (where the proxy recorded them). Never leave the database.
- Self-reported tool outputs (optional) — when the agent passes
tool_outputstotw_verify_response, these are compared in memory and immediately discarded. Raw tool content (file contents, shell output, search results) is never written to SQLite. Only the verification verdict (tool name, classification, confidence, evidence keys) is persisted. - Verification results — classification, confidence, and evidence are written to local SQLite. Visible on the local dashboard.
If you have alerting configured, the bridge sends alerts through the same channels with the same privacy guarantees described above — classification metadata only, never raw data.
The bridge's text grounding engine (used for long outputs like file contents) extracts claims from the agent's text and checks them against the source. This happens entirely in local memory. No external API, no cloud service, no network call.
Self-Report Privacy¶
When using full-coverage verification (the default --cursor-rule), the agent includes raw tool outputs in the tw_verify_response call. This data flows through the local MCP connection to the ToolWitness server process on your machine.
| What happens | Detail |
|---|---|
Agent passes tool outputs to tw_verify_response |
Via local MCP connection (localhost only) |
| ToolWitness compares response vs outputs | In memory — no disk write of raw content |
| Raw tool outputs are discarded | After classification completes |
| Only verdicts are persisted | Tool name, classification, confidence, evidence keys |
| No file contents stored | Not in SQLite, not in logs, not anywhere |
| No network calls | ToolWitness has no HTTP client, no telemetry |
Self-reported data receives the same security treatment as any other local tool call — it stays on your machine and is processed transiently.
Security Model Summary¶
| Property | Guarantee |
|---|---|
| Data storage | Local SQLite, 0600 permissions |
| Telemetry | Off by default, opt-in only |
| Training | Never — your data is not used |
| Failure mode | Fail-open — never blocks your agent |
| Scope | Tool I/O only — no code, no secrets, no prompts (proxy scope is even narrower) |
| Verification bridge | Compares locally, stores locally — response text never transmitted |
| Self-reported outputs | Compared in memory, discarded immediately — raw content never persisted |
| Alerts (summary) | Classification + tool name + confidence only — no raw data |
| Alerts (full) | Adds mismatched values — still no source code, prompts, or file contents |
| Digest reports | Aggregate counts only — no individual tool outputs |
| Threshold alerts | Tool name + classification + breach reason — no raw data |
| License | Apache 2.0 — fully open source |
| Dependencies | Core engine: zero external dependencies |