# poc2detect
Defense-first pipeline that discovers GitHub proof-of-concept repositories, statically
ingests their source, asks a configurable remote inference LLM for structured behavioral analysis, grounds
proposed indicators in source evidence, and generates Sigma, YARA, and Suricata rules.
## Safety model
- PoC repositories are treated as untrusted data and are never executed.
- Every source and binary artifact is inventoried and hashed.
- Large artifacts are analyzed through bounded chunks and hierarchical summaries.
- Executables contribute format metadata and printable strings without execution.
- Existing clones are not automatically pulled or deleted.
- LLM-proposed indicators are discarded unless they occur in ingested source.
- Live assessment uses fingerprinting and benign endpoint requests only.
- Learned paths, exploit payloads, and PoC code are never sent to or executed on targets.
- The LLM can produce intelligence only; it cannot direct live network actions.
- Assessment requires `--authorized`; public targets are blocked by default.
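
The authorization and public-target rules in the final bullet can be sketched as a small gate. This is a minimal illustration, not poc2detect's implementation. The function and exception names are hypothetical. Accepting only IP literals is an assumption made so that the check itself never resolves hostnames. "Public" is approximated with the `is_global` property from Python's `ipaddress` module, and the `allow_public_targets` flag mirrors the `assessment.allow_public_targets` setting.

```python
import ipaddress


class AssessmentRefused(Exception):
    """Raised when a live-assessment request violates the safety policy."""


def check_assessment_target(target: str, authorized: bool,
                            allow_public_targets: bool = False) -> str:
    """Return the normalized target address, or raise AssessmentRefused."""
    if not authorized:
        # Mirrors the README rule: assessment requires --authorized.
        raise AssessmentRefused("assessment requires explicit authorization")
    try:
        address = ipaddress.ip_address(target)
    except ValueError:
        # Assumption: hostnames are refused so this check never performs DNS lookups.
        raise AssessmentRefused(f"target must be an IP literal: {target!r}") from None
    if address.is_global and not allow_public_targets:
        # Public addresses stay blocked unless public targets are explicitly allowed.
        raise AssessmentRefused(f"public target blocked by default: {address}")
    return str(address)
```
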
## Docker Compose
Copy the environment template and start the complete containerized platform:
```powershell
Copy-Item .env.example .env
New-Item -ItemType Directory -Force output
docker compose up -d opensearch dashboards source-engine python-engine clang-engine binary-engine dependency-engine app
```
Configure the remote inference endpoint with environment variables. Set
`LLM_API_STYLE=openai_compatible` or `LLM_API_STYLE=ollama` to match the remote
server's protocol. Supported authentication modes are `none`, `bearer`, `api_key`,
`basic`, and `oauth2_client_credentials`. Custom gateway or tenant headers,
custom CA certificates, and mTLS client certificates are also supported.
```powershell
$env:LLM_BASE_URL = "https://inference.example.com"
$env:LLM_MODEL = "your-model"
$env:LLM_AUTH_TYPE = "bearer"
$env:LLM_BEARER_TOKEN = "token"
$env:LLM_TLS_VERIFY = "true"
docker compose up -d app
```
For API-key authentication set `LLM_AUTH_TYPE=api_key`, `LLM_API_KEY`, and
optionally `LLM_API_KEY_HEADER`. For OAuth2 client credentials set
`LLM_AUTH_TYPE=oauth2_client_credentials`, `LLM_OAUTH_TOKEN_URL`,
`LLM_OAUTH_CLIENT_ID`, `LLM_OAUTH_CLIENT_SECRET`, and optional scope/audience.
Set `LLM_HEADERS_JSON` for additional authorization gateway headers. Place CA and
mTLS files under `certs/` and set their container paths, such as
`LLM_CA_CERT=/app/certs/ca.pem`.
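
Here is a hedged sketch of how the documented `LLM_AUTH_TYPE` modes might map to request headers. `build_auth_headers` and its parameters are hypothetical. The `X-API-Key` fallback header name is an assumption, not the project's documented default. The OAuth2 branch assumes an access token has already been obtained from `LLM_OAUTH_TOKEN_URL` using the client-credentials grant. This section does not name the environment variables for `basic` credentials, so the sketch takes them as plain arguments.

```python
import base64
import json
from typing import Dict, Optional


def _require(value: Optional[str], name: str) -> str:
    if not value:
        raise ValueError(f"{name} is required for this LLM_AUTH_TYPE")
    return value


def build_auth_headers(auth_type: str, *,
                       bearer_token: Optional[str] = None,
                       api_key: Optional[str] = None,
                       api_key_header: str = "X-API-Key",  # assumed fallback name
                       username: Optional[str] = None,
                       password: Optional[str] = None,
                       oauth_access_token: Optional[str] = None,
                       headers_json: Optional[str] = None) -> Dict[str, str]:
    """Map an LLM_AUTH_TYPE value plus credentials to HTTP request headers."""
    headers: Dict[str, str] = {}
    if headers_json:
        # LLM_HEADERS_JSON: extra gateway/tenant headers as a flat JSON object.
        extra = json.loads(headers_json)
        if not isinstance(extra, dict) or not all(
                isinstance(k, str) and isinstance(v, str) for k, v in extra.items()):
            raise ValueError("LLM_HEADERS_JSON must be a JSON object of string values")
        headers.update(extra)
    # Auth headers are applied last, so they take precedence over same-named extras.
    if auth_type == "none":
        pass
    elif auth_type == "bearer":
        headers["Authorization"] = "Bearer " + _require(bearer_token, "LLM_BEARER_TOKEN")
    elif auth_type == "api_key":
        headers[api_key_header] = _require(api_key, "LLM_API_KEY")
    elif auth_type == "basic":
        pair = f"{_require(username, 'username')}:{_require(password, 'password')}"
        headers["Authorization"] = "Basic " + base64.b64encode(pair.encode("utf-8")).decode("ascii")
    elif auth_type == "oauth2_client_credentials":
        # Assumes the token was already fetched from LLM_OAUTH_TOKEN_URL.
        headers["Authorization"] = "Bearer " + _require(oauth_access_token, "access token")
    else:
        raise ValueError(f"unsupported LLM_AUTH_TYPE: {auth_type!r}")
    return headers
```
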
Enable the optional symbolic engine only when its additional CPU and memory cost is
acceptable:
```powershell
docker compose --profile symbolic up -d symbolic-engine
```
OpenSearch Dashboards is available at `http://127.0.0.1:5601`. Generated reports
and rules remain in the host `output/` directory, and OpenSearch data uses a named
Docker volume.
Stop the stack without deleting indexed data:
```powershell
docker compose down
```
The default Compose deployment disables OpenSearch security and binds service
ports to loopback. Do not expose it directly to untrusted networks. Production
deployments should enable OpenSearch TLS and authentication or place the stack
behind an authenticated private network.
Assess systems you own or are explicitly authorized to test:
```powershell
Invoke-RestMethod -Method Post `
-Uri http://127.0.0.1:8088/v1/verifications `
-ContentType application/json `
-Body '{"target":"127.0.0.1","cve_id":"CVE-2024-XXXX","ports":[80,443],"authorized":true}'
```
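
Scripted clients can build the same request with Python's standard library. This sketch only constructs the request. The function name is hypothetical, and the `X-API-Key` header is needed only when the server sets `POC2DETECT_API_KEY`.

```python
import json
import urllib.request
from typing import List, Optional


def build_verification_request(target: str, cve_id: str, ports: List[int],
                               authorized: bool,
                               api_key: Optional[str] = None,
                               base_url: str = "http://127.0.0.1:8088") -> urllib.request.Request:
    """Build, but do not send, a POST /v1/verifications request."""
    body = {"target": target, "cve_id": cve_id, "ports": ports, "authorized": authorized}
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # Only needed when the server sets POC2DETECT_API_KEY.
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/verifications",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To send the request, use `urllib.request.urlopen(request)` and parse the body with `json.load`. This section does not document the response schema beyond the three result states.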
`POST /v1/verifications` retrieves the grounded PoC analyses for the requested CVE
from OpenSearch and builds a deterministic safe-probe plan. It fingerprints only the
approved ports. Learned HTTP paths and exploit behavior are used only for offline
correlation and are never requested from targets, and the endpoint never executes
PoCs or sends exploit payloads. Results are persisted in the `poc2detect-verifications`
index and reported as `likely_affected`, `not_observed`, or `insufficient_evidence`.
The API binds to `127.0.0.1:8088` by default. Set `POC2DETECT_API_KEY` to require
the same value in the `X-API-Key` request header. Administrative CLI commands
run inside the already-running app container:
```powershell
docker compose exec app poc2detect train
docker compose exec app poc2detect scan CVE-2024-XXXX
```
Run tests only inside Docker:
```powershell
docker compose --profile test run --rm test
```
No analyzer, datastore, API, CLI workflow, or test process needs to run directly on
the host, which is used only to submit HTTP requests and manage Compose.
The inference LLM is externally managed and is not deployed by this solution.
Grounded analyses, CVE profiles, verification history, and generated-rule metadata
are indexed into `poc2detect-analyses`, `poc2detect-cves`,
`poc2detect-verifications`, and `poc2detect-rules` in OpenSearch. Reports and generated
rules are also written to `output/`. Set `assessment.allow_public_targets: true`
only within an approved engagement scope.
## Limited-context analysis strategy
The analyzer uses a hierarchical map-reduce workflow instead of placing only the
highest-ranked files into one prompt:
1. Inventory and hash every repository artifact.
2. Classify artifacts as source, executable, archive, or binary.
3. Split source and extracted binary strings into bounded overlapping chunks.
4. Produce compact chunk findings and per-artifact summaries.
5. Recursively reduce summaries until they fit the LLM context window.
6. Synthesize one repository-level attack model and ground its indicators against
the original extracted evidence.
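
Steps 3 and 5 can be sketched as follows. The sketch assumes character counts stand in for tokens and that `summarize` is a caller-supplied LLM call. Both functions are hypothetical illustrations of the map-reduce idea, not the project's code. `max_rounds` is an added guard that stops a summarizer that fails to shrink its input from looping forever.

```python
from typing import Callable, List


def chunk_text(text: str, size: int, overlap: int) -> List[str]:
    """Split text into windows of at most `size` characters sharing `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    chunks: List[str] = []
    for start in range(0, len(text), size - overlap):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def reduce_summaries(summaries: List[str],
                     summarize: Callable[[List[str]], str],
                     budget: int, max_rounds: int = 10) -> str:
    """Recursively merge summaries until a single one fits within `budget`."""
    if not summaries:
        raise ValueError("nothing to reduce")
    level = list(summaries)
    for _ in range(max_rounds):
        if len(level) == 1 and len(level[0]) <= budget:
            return level[0]
        # Pack adjacent summaries into groups whose combined size fits the budget.
        groups: List[List[str]] = []
        current: List[str] = []
        used = 0
        for item in level:
            if current and used + len(item) > budget:
                groups.append(current)
                current, used = [], 0
            current.append(item)
            used += len(item)
        groups.append(current)
        level = [summarize(group) for group in groups]
    raise RuntimeError("summaries did not converge within max_rounds")
```
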
Symbolic execution is used only as a targeted escalation for executable candidates
identified during analysis. The optional angr engine runs with strict target,
state, memory, and timeout budgets because whole-repository symbolic execution
quickly suffers from path explosion. Results and limitations are recorded in
`analysis_trace.symbolic_execution`; bounded exploration is not treated as proof
that unexplored paths are safe.
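
The budgeted-exploration caveat can be illustrated without angr. This sketch does not use angr's API. A hypothetical `successors` function stands in for symbolic stepping. Exploration stops at a state, depth, or wall-clock budget, and `complete` is true only when no budget cut anything off. This matches the rule that bounded exploration is not proof of safety.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Hashable, Iterable, List


@dataclass
class ExplorationResult:
    visited: List[Hashable]
    complete: bool  # True only when no budget cut exploration short
    limits_hit: List[str] = field(default_factory=list)


def bounded_explore(start: Hashable,
                    successors: Callable[[Hashable], Iterable[Hashable]],
                    max_states: int, max_depth: int,
                    timeout_s: float) -> ExplorationResult:
    """Breadth-first exploration that stops at state, depth, or wall-clock budgets."""
    deadline = time.monotonic() + timeout_s
    frontier = deque([(start, 0)])
    seen = {start}
    visited: List[Hashable] = []
    limits = set()
    while frontier:
        if len(visited) >= max_states:
            limits.add("max_states")
            break
        if time.monotonic() > deadline:
            limits.add("timeout")
            break
        state, depth = frontier.popleft()
        visited.append(state)
        for nxt in successors(state):
            if nxt in seen:
                continue
            if depth + 1 > max_depth:
                # A successor was cut off; its subtree is unexplored, not proven safe.
                limits.add("max_depth")
                continue
            seen.add(nxt)
            frontier.append((nxt, depth + 1))
    return ExplorationResult(visited, complete=not limits, limits_hit=sorted(limits))
```
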
## Supervisor workflow
The `scan` workflow does not use a ReAct agent. A single supervisor coordinates
an explicit, auditable LangGraph pipeline:
1. Supervisor Agent
2. Methodology Planner
3. Skill Registry
4. Task Queue
5. Specialized Workers
6. Tool MCP Layer
7. Normalizer
8. Elasticsearch-compatible OpenSearch Raw Store
9. Correlation Agent
10. Context Builder
11. Protocol Extraction Agent
12. Payload Reconstruction Agent
13. Payload Safety Agent
14. Verification Agent
15. Critic Agent
16. Evidence Agent
17. Report Agent

Correlation is deterministic and in-memory for the current analysis.
Verification is bounded and non-destructive. The Evidence Agent grounds
indicators in source evidence before the Report Agent emits the final result.
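
Grounding, as described here and in the safety model, means an LLM-proposed indicator survives only if it occurs in ingested source. The following minimal sketch assumes exact substring matching over decoded text. The function name and the evidence record shape (`path` plus the first matching `line`) are hypothetical.

```python
from typing import Dict, List


def ground_indicators(proposed: List[str], sources: Dict[str, str]) -> Dict[str, List[dict]]:
    """Keep LLM-proposed indicators only if they occur verbatim in ingested source."""
    grounded: List[dict] = []
    discarded: List[dict] = []
    for indicator in proposed:
        needle = indicator.strip()
        evidence = []
        for path, text in sorted(sources.items()):
            offset = text.find(needle) if needle else -1
            if offset >= 0:
                # Record the file and 1-based line of the first occurrence.
                evidence.append({"path": path, "line": text.count("\n", 0, offset) + 1})
        if evidence:
            grounded.append({"indicator": needle, "evidence": evidence})
        else:
            discarded.append({"indicator": indicator, "reason": "not found in ingested source"})
    return {"grounded": grounded, "discarded": discarded}
```
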
Payload reconstruction is offline and source is never executed. The versioned
`payload-ir-v1` model uses bounded static dataflow to reconstruct Python values
at known request/socket sinks and C/C++ buffers passed to known send functions,
with a literal fallback. Recognized command interpreters, destructive
operations, and callbacks are replaced with `POC2DETECT_MARKER`. Sanitized IR is
canonically serialized and reparsed before acceptance; encoded, ambiguous, or
incomplete candidates fail closed. Every candidate records hashes, provenance,
dataflow, and a replacement audit trail, and always carries
`send_allowed: false`. The live verifier never transmits reconstructed or
sanitized payloads.
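
The sanitize, canonicalize, and reparse sequence can be sketched as below. Everything here is an assumption for illustration:

- The regular expressions are a deliberately tiny stand-in for the real recognizers.
- Percent-encoding, `\x` escapes, and long base64-like runs stand in for "encoded" detection.
- Output field names other than `send_allowed` are hypothetical.

Only the marker string and the always-false `send_allowed` come from the text above.

```python
import hashlib
import json
import re
from typing import Any, Dict, List

MARKER = "POC2DETECT_MARKER"
# Hypothetical, deliberately small recognizer for command interpreters and destructive commands.
RISKY = re.compile(r"/bin/(?:ba)?sh|cmd\.exe|powershell|\brm\s+-rf\b", re.IGNORECASE)
# Hypothetical "looks encoded" heuristics: percent-encoding, \xNN escapes, long base64-like runs.
ENCODED = re.compile(r"%[0-9A-Fa-f]{2}|\\x[0-9A-Fa-f]{2}|[A-Za-z0-9+/]{24,}={0,2}")


class RejectedCandidate(Exception):
    """Raised when a candidate cannot be sanitized with confidence (fail closed)."""


def sanitize_candidate(fields: Dict[str, Any]) -> Dict[str, Any]:
    clean: Dict[str, str] = {}
    audit: List[Dict[str, Any]] = []
    for name, value in fields.items():
        if not isinstance(value, str):
            raise RejectedCandidate(f"{name}: not a reconstructed literal")
        if ENCODED.search(value):
            raise RejectedCandidate(f"{name}: appears encoded")
        replaced, count = RISKY.subn(MARKER, value)
        if count:
            audit.append({"field": name, "replacements": count})
        clean[name] = replaced
    # Canonical serialization, then reparse; any mismatch rejects the candidate.
    canonical = json.dumps(clean, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    if json.loads(canonical) != clean:
        raise RejectedCandidate("canonical round-trip mismatch")
    return {
        "model": "payload-ir-v1",
        "fields": clean,
        "sha256": hashlib.sha256(canonical.encode("ascii")).hexdigest(),
        "replacement_audit": audit,
        "send_allowed": False,  # constant: the live verifier never transmits payloads
    }
```
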
The first-class skill catalog separates auditable skill names from their backing
tools. For example, `python-static` invokes the `python` engine,
`c-cpp-static` invokes `clang`, and `binary-symbolic` invokes the bounded
`symbolic` engine. Local skills such as `protocol-extractor`,
`payload-sanitizer`, and `evidence-grounder` are independently configurable
under `skills:`. Reports include the resolved catalog and artifact routes.
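
Here is a hedged sketch of a catalog that keeps skill names separate from their backing engines. The three engine mappings and the local skill names come from the text above. The `Skill` type, the suffix-based routing table, and the `configure` helper are hypothetical. `binary-symbolic` starts disabled here only because the symbolic engine is off by default.

```python
from dataclasses import dataclass, replace
from pathlib import PurePath
from typing import Dict, Optional


@dataclass(frozen=True)
class Skill:
    name: str
    engine: Optional[str]  # backing engine; None for skills that run locally
    enabled: bool = True


CATALOG: Dict[str, Skill] = {s.name: s for s in (
    Skill("python-static", "python"),
    Skill("c-cpp-static", "clang"),
    Skill("binary-symbolic", "symbolic", enabled=False),  # optional engine, off by default
    Skill("protocol-extractor", None),
    Skill("payload-sanitizer", None),
    Skill("evidence-grounder", None),
)}

# Hypothetical suffix-based routing for source artifacts.
ROUTES = {".py": "python-static", ".c": "c-cpp-static", ".h": "c-cpp-static",
          ".cc": "c-cpp-static", ".cpp": "c-cpp-static", ".hpp": "c-cpp-static"}


def configure(catalog: Dict[str, Skill], enabled: Dict[str, bool]) -> Dict[str, Skill]:
    """Apply per-skill enable flags, e.g. parsed from a `skills:` config section."""
    unknown = set(enabled) - set(catalog)
    if unknown:
        raise KeyError(f"unknown skills: {sorted(unknown)}")
    return {name: replace(skill, enabled=enabled.get(name, skill.enabled))
            for name, skill in catalog.items()}


def resolve_route(filename: str, is_executable: bool = False,
                  catalog: Dict[str, Skill] = CATALOG) -> Optional[Skill]:
    """Return the enabled skill that should handle an artifact, or None."""
    if is_executable:
        skill: Optional[Skill] = catalog["binary-symbolic"]
    else:
        name = ROUTES.get(PurePath(filename).suffix.lower())
        skill = catalog[name] if name else None
    return skill if skill is not None and skill.enabled else None
```
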
| Engine | Default | Techniques |
|---|---:|---|
| `source-engine` | Yes | Python AST, imports, functions, calls, literals, URLs, command patterns |
| `python-engine` | Yes | Repository-wide Python index, cross-file calls, assignments, dangerous calls, heuristic taint paths |
| `clang-engine` | Yes | Clang JSON AST, C/C++ calls and literals, LLVM IR, diagnostics, static analyzer, source/sink heuristics |
| `binary-engine` | Yes | File type, headers, sections, imports, printable strings |
| `dependency-engine` | Yes | Package manifests, dependency names, versions, and scopes |
| `symbolic-engine` | No | Bounded angr entry-state exploration for selected executable candidates |
| Remote inference API | External | Hierarchical semantic analysis and repository synthesis |
| OpenSearch | Yes | Durable grounded analyses and generated-rule metadata |
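
The `binary-engine` row lists headers and printable strings. Both can be collected from raw bytes without running anything. This minimal sketch works in the spirit of `strings -n 4`. It recognizes only ELF and MZ magic numbers and ASCII runs, and it ignores UTF-16 strings. The function names are hypothetical.

```python
import re
from typing import List

# Hypothetical magic-number table; real format detection covers far more formats.
MAGIC = {b"\x7fELF": "elf", b"MZ": "pe/dos"}


def detect_format(data: bytes) -> str:
    """Identify a few executable formats from their leading magic bytes."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return "unknown"


def printable_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII of at least `min_len` bytes, without executing anything."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```
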
All analysis engines mount `output/` read-only, run without Linux capabilities,
and reject paths outside the shared analysis root. The coordinator is the only
component that writes reports and detection rules. Python and C source-flow
results are labeled heuristic; they improve prioritization but are not presented
as formal proof.
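
Rejecting paths outside the analysis root can be sketched with `pathlib`. This is an illustration under assumptions, and the function and exception names are hypothetical. `Path.resolve()` follows symlinks that already exist, so a link pointing out of the root is rejected. Absolute paths and paths that climb out with `..` both fail the parent check. The sketch does not address races between this check and a later file open.

```python
from pathlib import Path


class PathOutsideRoot(ValueError):
    """Raised when a requested path escapes the shared analysis root."""


def confine(root: str, requested: str) -> Path:
    """Resolve `requested` relative to `root`; reject results outside the root."""
    base = Path(root).resolve()
    # resolve() collapses ".." and follows existing symlinks; an absolute
    # `requested` replaces `base` entirely and is then caught by the check below.
    candidate = (base / requested).resolve()
    if candidate != base and base not in candidate.parents:
        raise PathOutsideRoot(f"{requested!r} resolves outside {base}")
    return candidate
```
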