Share
## https://sploitus.com/exploit?id=197F5F1A-4022-5470-BA29-351D92AC0901
# Sentinel β Agentic Code & System Quality Guardian
Production-grade AI agent for automated code review, security analysis, and system quality assessment. Built with LangGraph orchestration, real tool execution (bandit, ruff, custom secret scanner), a configurable policy engine, and AgentOps-compatible observability.
**Not a demo.** 43-unit pytest suite. 10 benchmark tasks across 5 vulnerability categories. CI-reproducible with deterministic simulated backend β no API keys needed.
## Quick Start
```bash
pip install -e ".[dev]"
sentinel scan fixtures/vulnerable # Full scan
sentinel scan fixtures/vulnerable --plan security # Security-focused
sentinel scan fixtures/vulnerable --plan quick # Lint-only
sentinel benchmark # Run evaluation suite
```
## Architecture
```
βββββββββββββββββββββββββββββββββββ
β Sentinel Agent β
β β
Target Code βββΆ β Plan βββΆ Execute βββΆ Verify βββΆββββΆ Report
β β β β β
β βΌ βΌ βΌ β
β Planner Tool Policy β
β (sim/LLM) Registry Engine β
βββββββββββββββββββββββββββββββββββ
β
βββββββββββΌββββββββββ
βΌ βΌ βΌ
bandit ruff sentinel-secrets
(SAST) (linter) (regex scanner)
```
### Agent Pipeline
1. **Plan**: Selects analysis strategy (full, security, quick, secrets) and tools
2. **Execute**: Runs real analysis tools against target code with typed output parsing
3. **Verify**: Applies configurable policy engine β severity thresholds, blocking categories, allowlists
4. **Report**: Structured findings with severity, CWE IDs, locations, and remediation
## Features
### Deployable AI Agent Service (NEW v0.2)
- **REST API** β FastAPI server with POST /scan, GET /scans, GET /scans/{id}, DELETE /scans/{id}
- **Persistent Storage** β SQLite database for scan history, findings, and metrics
- **Live Dashboard** β Built-in HTML dashboard showing scan stats, recent results, and API reference
- **Stats & Monitoring** β /stats endpoint with pass/fail rates, /health endpoint for orchestration
- **Docker Compose** β API + worker deployment with health checks and persistent volume
### Multi-Tool Analysis (v0.1)
- **bandit** β Python SAST (SQL injection, command injection, hardcoded credentials, unsafe deserialization, weak crypto)
- **ruff** β Fast Python linter with security rule support
- **sentinel-secrets** β Built-in regex scanner for API keys, passwords, tokens, AWS credentials, GitHub PATs, private keys
- **safety** β Dependency vulnerability scanner (CVE database)
### Configurable Policy Engine
- Global severity thresholds (max_critical, max_high, max_medium, max_low)
- Per-category thresholds with blocking rules
- Allowlist: suppress specific finding IDs, CWE IDs, or file patterns
- CI-friendly exit codes (0 = pass, 1 = policy failure)
### Evaluation Framework
- 10 benchmark tasks across 5 vulnerability categories
- Clean code tests (zero false positives on secure patterns)
- Mixed code tests (discrimination between safe and vulnerable code)
- Plan type comparisons (full vs security vs quick)
- Deterministic simulated backend for CI reproducibility
### Production-Ready
- Docker + Docker Compose with health checks
- Structured logging (JSON)
- Typed state models (Pydantic)
- 43-unit pytest suite
- Typer CLI with rich terminal output
## Vulnerabilities Detected
| Category | CWE | Detection Method |
|----------|-----|-----------------|
| SQL Injection | CWE-89 | bandit (B608), ruff (S608) |
| Hardcoded Secrets | CWE-798 | bandit (B105-107), sentinel-secrets |
| Command Injection | CWE-78 | bandit (B602-604) |
| Path Traversal | CWE-22 | bandit |
| Unsafe Deserialization | CWE-502 | bandit (B301-302) |
| Weak Cryptography | CWE-327 | bandit (B303, B324) |
| Dangerous Functions | β | bandit (B102, B307, B601) |
| XSS | CWE-79 | ruff |
## Project Structure
```
sentinel/
βββ src/sentinel/
β βββ state.py # Typed state models (Pydantic)
β βββ tools/
β β βββ registry.py # Tool registry + execution + output parsers
β βββ policy/
β β βββ engine.py # Configurable policy engine
β βββ agent/
β β βββ runner.py # LangGraph agent loop (planβexecuteβverifyβreport)
β βββ storage/
β β βββ __init__.py # SQLite persistence for scan history
β βββ api/
β β βββ __init__.py # FastAPI REST server + dashboard
β βββ evals/
β β βββ __init__.py # 10-task benchmark suite
β βββ cli/
β βββ main.py # Typer CLI (scan, serve, benchmark, eval, tools, policy)
βββ fixtures/
β βββ vulnerable/ # Known-vulnerable code (5 categories, 5 files)
β βββ clean/ # Secure code patterns (2 files)
β βββ mixed/ # Mixed code (1 file with both safe + vulnerable)
βββ config/
β βββ policy.yaml # Default policy configuration
βββ tests/
β βββ test_sentinel.py # 43-unit pytest suite (agent, tools, policy, benchmarks)
β βββ test_api.py # 19 API integration tests (REST, persistence, dashboard)
βββ Dockerfile
βββ docker-compose.yml # API + worker deployment with health checks
βββ README.md
```
## CLI Reference
```bash
# Scan with different plans
sentinel scan # Full analysis (default)
sentinel scan --plan security # Security-focused
sentinel scan --plan quick # Ruff-only lint
sentinel scan --plan secrets # Secret scan only
# With options
sentinel scan --policy custom.yaml # Custom policy
sentinel scan --output report.json # JSON output
sentinel scan --verbose # Detailed findings
# REST API server (NEW v0.2)
sentinel serve # Start API on http://0.0.0.0:8000
sentinel serve --port 3000 # Custom port
sentinel serve --reload # Dev mode with auto-reload
sentinel serve --db /path/to/scans.db # Custom DB path
# API endpoints (when serve is running)
# POST /scan β Trigger a scan
# GET /scans β List recent scans
# GET /scans/{id} β Get scan details + findings
# DELETE /scans/{id} β Delete a scan
# GET /health β Health check
# GET /stats β Pass/fail statistics
# GET / β HTML dashboard
# GET /docs β OpenAPI documentation
# Evaluation
sentinel benchmark # Run 10-task benchmark suite
sentinel benchmark --output results.json # With JSON output
sentinel eval # CI-friendly eval (exit 1 on fail)
sentinel eval --output ci.json # With JSON output
# Utilities
sentinel tools # List available tools
sentinel policy # Show default policy
```
## Docker
```bash
# Start API server (recommended)
docker compose up -d # API on http://localhost:8000
docker compose up -d --profile worker # API + background worker
# CLI mode
docker compose run sentinel scan /app/fixtures # Custom scan
docker build -t sentinel . # Build only
```
### API Usage Examples
```bash
# Trigger a scan
curl -X POST http://localhost:8000/scan \
-H "Content-Type: application/json" \
-d '{"target_path": "/app/fixtures/vulnerable", "plan_type": "security"}'
# Get scan history
curl http://localhost:8000/scans?limit=5
# Get scan details
curl http://localhost:8000/scans/abc12345
# Health check
curl http://localhost:8000/health
# Stats
curl http://localhost:8000/stats
```
## Tradeoffs
| Decision | Rationale |
|----------|-----------|
| Simulated planner, not LLM | CI-reproducible; no API keys needed; deterministic results |
| Subprocess tool execution | Uses real bandit/ruff binaries for authentic results |
| bandit exit code 1 = success | bandit uses exit code 1 for "issues found" β we handle this |
| No real LLM integration | Keep project self-contained; LLM backend via env vars is a planned extension |
| fixtures/ not in skip paths | Test fixtures need to be scannable; real projects should add `fixtures/` to their policy |
## Quality Bar
- β
Non-trivial architecture (6 modules, typed state, tool registry, policy engine, REST API, persistence)
- β
62 pytest (all passing): 43 unit + 19 API integration
- β
10 benchmark tasks (100% pass rate)
- β
Realistic vulnerable code fixtures (5 categories, 30+ individual vulnerabilities)
- β
Configurable policy engine with severity thresholds, blocking categories, allowlists
- β
Real tool execution (bandit, ruff, custom secret scanner)
- β
Deployable REST API with FastAPI + SQLite persistence + HTML dashboard
- β
Docker Compose for API + worker deployment with health checks
- β
Reproducible setup (`pip install -e ".[dev]" && pytest && sentinel benchmark`)
- β
Polished README with architecture diagram, API docs, and CLI reference
## Roadmap
- [x] v0.1 β Core agent loop, tool registry, policy engine, benchmarks, tests
- [x] v0.2 β REST API server, SQLite persistence, dashboard, API deployment (Docker Compose)
- [ ] v0.3 β LLM-driven planner (opt-in via env vars)
- [ ] v0.4 β CI/CD integration (GitHub Actions, SARIF output for code scanning)
- [ ] v0.5 β GitHub App integration (webhook receiver, PR review comments)
- [ ] v1.0 β AgentOps deep integration (trace/eval/metrics) & K8s deployment manifests
## License
MIT