# Contributing
Thank you for your interest in contributing! ToolWitness is an open-source project and we welcome contributions of all kinds.
## Development setup
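The install commands themselves are missing from this section. A typical editable install, assuming a standard `pyproject.toml` with an `mcp` extra (the note below confirms that extra exists), might look like:

```shell
# Editable install with the mcp extra -- any other extra names you need
# are assumptions; check pyproject.toml for the full list.
pip install -e ".[mcp]"
```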
The `mcp` extra is required for `tests/test_mcp_server.py` (FastMCP verification server). Without it, those tests fail with `ModuleNotFoundError: No module named 'mcp'`.
## Running tests
To run only the verification scenario harness (multi-tool scenarios, structural limits, semantic pairing):
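The command itself was omitted here. Assuming the harness lives in a file such as `tests/test_verification_scenarios.py` (a hypothetical path; substitute the actual file name), the invocation is a targeted pytest run:

```shell
# Run only the scenario harness (path is hypothetical -- use the real
# test file); run plain `pytest` for the full suite.
pytest tests/test_verification_scenarios.py -v
```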
## Linting
We use ruff for linting and formatting:
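The commands were stripped from this section; ruff's standard check and format invocations are:

```shell
ruff check .    # lint (catches unused imports and import-order issues)
ruff format .   # apply formatting
```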
## Code style
- Type hints on all public functions
- Docstrings on all public classes and modules
- No unused imports (ruff enforces this)
- Imports sorted per ruff/isort conventions
## Project structure
```
src/toolwitness/
├── core/          # Types, receipt generation, monitor, classifier, detector
├── verification/  # Structural matching, schema checking, chain verification
├── adapters/      # OpenAI, Anthropic, LangChain, MCP, CrewAI
├── alerting/      # Webhook/Slack channels, alert rules and engine
├── storage/       # SQLite backend (abstract base + implementation)
├── reporting/     # HTML report generator, remediation cards, about page
├── dashboard/     # Local web dashboard server
├── proxy/         # Transparent stdio proxy for MCP servers
├── cli/           # Click-based command-line interface
└── config.py      # Configuration system (env > YAML > defaults)
```
## Adding a new framework adapter
- Create `src/toolwitness/adapters/your_framework.py`
- Follow the pattern of existing adapters (e.g., `openai.py`):
    - Accept optional `storage` and `session_id` parameters
    - Wire into `ExecutionMonitor` for receipts
    - Implement `verify()` returning `list[VerificationResult]`
- Add tests in `tests/test_adapters/test_your_framework.py`
- Add a docs page at `docs/adapters/your_framework.md`
- Update `mkdocs.yml` navigation
- Update the README with usage examples
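The checklist above can be sketched as a minimal adapter skeleton. This is illustrative only: the class name, `record_call` method, and the stubbed `VerificationResult` dataclass are stand-ins so the sketch is self-contained — the real types and `ExecutionMonitor` wiring come from ToolWitness itself.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Stand-in for ToolWitness's real VerificationResult type (illustrative only).
@dataclass
class VerificationResult:
    tool_name: str
    verified: bool
    detail: str = ""

class YourFrameworkAdapter:
    """Illustrative adapter skeleton following the checklist above."""

    def __init__(self, storage: Optional[Any] = None,
                 session_id: Optional[str] = None) -> None:
        # Optional storage backend and session identifier, per the checklist.
        self.storage = storage
        self.session_id = session_id
        self._calls: list[dict[str, Any]] = []

    def record_call(self, tool_name: str, arguments: dict[str, Any],
                    response: Any) -> None:
        # A real adapter would wire this into ExecutionMonitor for receipts.
        self._calls.append({"tool": tool_name, "args": arguments,
                            "response": response})

    def verify(self) -> list[VerificationResult]:
        # One result per recorded call; real logic does structural matching.
        return [
            VerificationResult(tool_name=c["tool"],
                               verified=c["response"] is not None)
            for c in self._calls
        ]
```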
## Adding false-positive corpus entries
If you find a legitimate response that ToolWitness incorrectly flags:
- Add a case to `tests/test_false_positives.py` in the `FALSE_POSITIVE_CORPUS` list
- If the case reveals a known structural matching limitation, document it with a comment and include `Classification.FABRICATED` in the acceptable set
- Run the full test suite to confirm the overall FP rate stays acceptable
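To make the workflow concrete, a corpus entry might look like the following. The field names here are entirely hypothetical — the real entry shape is defined in `tests/test_false_positives.py`:

```python
# Hypothetical shape of a FALSE_POSITIVE_CORPUS entry -- check
# tests/test_false_positives.py for the actual structure.
FALSE_POSITIVE_CORPUS = [
    {
        "id": "summarized-search-results",
        "tool_output": {"hits": [{"title": "ToolWitness docs"}]},
        "model_response": "I found one result: the ToolWitness docs.",
        # Known structural-matching limitation: paraphrased summaries can be
        # flagged, so FABRICATED is accepted alongside VERIFIED here.
        "acceptable": {"VERIFIED", "FABRICATED"},
    },
]
```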
## Pull requests
- One feature/fix per PR
- Include tests for new functionality
- Ensure `ruff check` and `pytest` both pass
- Update `CHANGELOG.md` with your changes
## Reporting issues
Use the GitHub issue templates:
- Bug report — for incorrect classifications, crashes, or unexpected behavior
- Feature request — for new adapters, verification strategies, or UI improvements