# XFinder
> **External Attack Surface Management (EASM)**: a lightweight, production-ready Python CLI that continuously discovers, monitors, enriches, and tracks internet-facing assets.
XFinder is **not** a script; it is a modular EASM/SOC automation framework suitable for enterprise environments and advanced cybersecurity portfolios. It orchestrates industry-standard open-source tools (Subfinder, dnsx, httpx, Naabu, Nmap, Katana, Nuclei) into an optimized scan pipeline, persists everything to PostgreSQL, generates structured JSON reports, and supports scheduled rescans with change detection.
---
## Table of Contents
1. [Features](#features)
2. [Architecture](#architecture)
3. [Quick Start](#quick-start)
4. [Installation](#installation)
5. [Configuration](#configuration)
6. [Usage](#usage)
7. [Scan Workflow](#scan-workflow)
8. [Database Schema](#database-schema)
9. [JSON Output Structure](#json-output-structure)
10. [Change Detection](#change-detection)
11. [Scheduler](#scheduler)
12. [Extending XFinder](#extending-xfinder)
13. [Testing](#testing)
14. [Project Structure](#project-structure)
15. [User Guide](docs/USER_GUIDE.md)
16. [Troubleshooting](docs/TROUBLESHOOTING.md)
17. [Roadmap](#roadmap)
18. [License](#license)
---
## Features
### Core Capabilities
- **Subdomain Discovery**: passive enumeration via Subfinder (50+ sources)
- **DNS Resolution**: A, AAAA, CNAME, MX, TXT, NS, SOA records via dnsx
- **Live HTTP Detection**: status, title, server, redirect, content-length, response time via httpx
- **Port Discovery**: fast TCP scanning via Naabu (only against live hosts)
- **Service/Version/OS Detection**: Nmap (only against Naabu-discovered ports)
- **Web/API Crawling**: endpoint discovery via Katana
- **Vulnerability Scanning**: template-based via Nuclei (tech-aware template selection)
- **Cloud/CDN/WAF Detection**: AWS, Azure, GCP, Cloudflare, Fastly, Akamai, DigitalOcean, Vercel, Netlify, GitHub Pages
- **Asset Enrichment**: ASN, organization, country, hosting provider, reverse DNS, SSL certificate, WHOIS/RDAP, Shodan, VirusTotal
### Production-Grade Features
- **PostgreSQL persistence** with normalized tables and append-only history
- **Scheduled rescans** via APScheduler (configurable interval, minimum 5 minutes)
- **Change detection** between scans: new/removed subdomains, ports, technologies, DNS, cloud, vulnerabilities, API endpoints
- **Structured JSON reports** per scan, never overwritten
- **Professional logging**: rotating file handler + console
- **Plugin architecture**: new scanners can be added without touching the core engine
- **Rich CLI**: SOC-style menu, colored output, tables
- **Type-hinted, PEP8-compliant, fully documented** codebase
- **96 automated tests**: unit + integration coverage
### Performance Optimizations
- Never scans dead hosts (Naabu runs only after httpx confirms liveness)
- Nmap runs only against Naabu-discovered ports
- Nuclei runs only against live HTTP/HTTPS services
- Technology-aware Nuclei template selection (faster, fewer false positives)
- Configurable thread count, timeout, and scan rate per scan
- Batched database writes via repository pattern
- Intermediate results persisted in real time (survive crashes)
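
As an illustration of technology-aware template selection, the sketch below builds a Nuclei command line whose `-tags` filter is derived from the technologies detected earlier in the scan. The `TECH_TO_TAGS` mapping and the `build_nuclei_command` helper are hypothetical and are not taken from the XFinder codebase; only the Nuclei flags themselves (`-l`, `-severity`, `-jsonl`, `-silent`, `-tags`) are real.

```python
from typing import Iterable

# Hypothetical mapping from detected technology names to Nuclei template tags.
TECH_TO_TAGS = {
    "wordpress": ["wordpress", "wp-plugin"],
    "nginx": ["nginx"],
    "apache": ["apache"],
    "jenkins": ["jenkins"],
}


def build_nuclei_command(targets_file: str, technologies: Iterable[str],
                         severity: str = "low,medium,high,critical") -> list[str]:
    """Build a Nuclei argv restricted to tags matching the detected stack."""
    tags: list[str] = []
    for tech in technologies:
        for tag in TECH_TO_TAGS.get(tech.lower(), []):
            if tag not in tags:  # keep first-seen order, drop duplicates
                tags.append(tag)

    cmd = ["nuclei", "-l", targets_file, "-severity", severity, "-jsonl", "-silent"]
    if tags:
        # Narrow the template set only when something was recognised;
        # otherwise Nuclei's default template selection applies.
        cmd += ["-tags", ",".join(tags)]
    return cmd
```

Running fewer, more relevant templates is what makes this approach both faster and less noisy than scanning every host with the full template set.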
---
## Architecture
```text
┌──────────────────────────────────────────────────────────────────────┐
│                          XFinder CLI (Rich)                          │
│                                cli.py                                │
└──────────────┬────────────────────────────────────────┬──────────────┘
               │                                        │
               ▼                                        ▼
┌──────────────────────────────┐         ┌─────────────────────────────┐
│         Scan Engine          │         │          Scheduler          │
│      scanners/engine.py      │         │   scheduler/scheduler.py    │
│   (orchestrates the chain)   │         │        (APScheduler)        │
└──────────────┬───────────────┘         └─────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────────────────────────────┐
│               Scanner Plugins (BaseScanner subclasses)               │
│    ┌───────────┐ ┌──────┐ ┌───────┐ ┌───────┐ ┌──────┐ ┌────────┐    │
│    │ Subfinder │→│ dnsx │→│ httpx │→│ Naabu │→│ Nmap │→│ Nuclei │    │
│    └───────────┘ └──────┘ └───────┘ └───────┘ └──────┘ └────────┘    │
│                           ┌────────┐                                 │
│                           │ Katana │                                 │
│                           └────────┘                                 │
└─────────────────────┬────────────────────────────────────────────────┘
                      │
                      ▼
┌──────────────────────────────────────────────────────────────────────┐
│                          Enrichment Modules                          │
│    ┌───────┐ ┌─────┐ ┌─────┐ ┌───────┐ ┌────────┐ ┌────────────┐     │
│    │ Cloud │ │ ASN │ │ SSL │ │ WHOIS │ │ Shodan │ │ VirusTotal │     │
│    └───────┘ └─────┘ └─────┘ └───────┘ └────────┘ └────────────┘     │
└─────────────────────┬────────────────────────────────────────────────┘
                      │
                      ▼
┌──────────────────────────────────────────────────────────────────────┐
│                           Repository Layer                           │
│                        database/repository.py                        │
│        (batched writes, change-detection analytics, history)         │
└─────────────────────┬────────────────────────────────────────────────┘
                      │
                      ▼
┌──────────────────────────────────────────────────────────────────────┐
│                PostgreSQL (SQLAlchemy ORM, 12 tables)                │
│   targets · scans · subdomains · dns_records · http_information ·    │
│   cloud_assets · ip_addresses · ports · services · technologies ·    │
│   api_endpoints · vulnerabilities                                    │
└──────────────────────────────────────────────────────────────────────┘
Also writes per-scan JSON to:
output/<target>/<timestamp>/
├── subdomains.json
├── dns.json
├── http.json
├── cloud.json
├── ports.json
├── services.json
├── technologies.json
├── api.json
├── vulnerabilities.json
├── changes.json
└── full_scan.json
```
### Optimized Scan Workflow
```text
Target
  │
  ▼
Subfinder (passive subdomain enumeration)
  │
  ▼
dnsx (DNS resolution: A/AAAA/CNAME/MX/TXT/NS/SOA)
  │
  ▼
httpx (live HTTP detection + fingerprinting)
  │
  ├───────────────┬──────────────────┬──────────────────┐
  ▼               ▼                  ▼                  ▼
Naabu      Cloud Detection   HTTP Fingerprint   Technology Detect
  │               │                  │                  │
  ▼               ▼                  ▼                  ▼
Nmap      Asset Enrichment     Server Header       Tech Stack
  │               │                  │                  │
  ├───────────────┴──────────────────┴──────────────────┘
  │
  ▼
Katana Crawl (endpoints & APIs)
  │
  ▼
Nuclei (tech-aware template scan)
  │
  ▼
PostgreSQL + JSON Export + Change Detection
```
---
## Quick Start
```bash
# 1. Clone the repository
git clone https://github.com/your-org/xfinder.git
cd xfinder
# 2. Create & activate virtual environment
python3 -m venv venv
source venv/bin/activate
# 3. Install Python dependencies
pip install -r requirements.txt
# 4. Copy and edit the environment file
cp .env.example .env
# Edit .env: set DB credentials and API keys
# 5. Verify all system tools are installed
python install.py
# 6. Initialize PostgreSQL database
sudo -u postgres createdb xfinder
sudo -u postgres createuser -P xfinder
sudo -u postgres psql -c "GRANT ALL ON DATABASE xfinder TO xfinder;"
# IMPORTANT (PostgreSQL 15+): grant CREATE on the public schema, otherwise
# XFinder cannot create its tables. The default public-schema privileges
# were tightened in PG 15 for security.
sudo -u postgres psql -d xfinder -c "GRANT ALL ON SCHEMA public TO xfinder;"
# 7. Launch XFinder
python main.py
```
---
## Installation
### Prerequisites
| Component | Version | Purpose |
| ------------------ | -------- | -------------------------------------------------- |
| Python | 3.13+ | Runtime |
| PostgreSQL | 14+ | Database |
| Subfinder | latest | Subdomain discovery |
| dnsx | latest | DNS resolution |
| httpx (PD) | latest | Live HTTP detection |
| Naabu | latest | Port discovery |
| Nmap | 7.92+ | Service/OS detection |
| Katana | latest | Web crawling |
| Nuclei | latest | Vulnerability scanning |
### Automated Verification
```bash
python install.py
```
This script checks every dependency and prints actionable installation instructions for anything missing. It **never crashes**: instead, it exits with code 1 if any required dependency is absent, so it can be used in CI pipelines.
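
The check-and-exit behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the actual contents of `install.py`; the `find_missing` and `report` helpers and the exact message text are assumptions.

```python
import shutil
from typing import Callable, Iterable, Optional

# Binary names of the external tools listed in the prerequisites table.
REQUIRED_TOOLS = ["subfinder", "dnsx", "httpx", "naabu", "nmap", "katana", "nuclei"]


def find_missing(tools: Iterable[str],
                 which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def report(missing: list[str]) -> int:
    """Print one line per missing tool and return a process exit code."""
    for tool in missing:
        print(f"[MISSING] {tool}: install it and make sure it is on PATH")
    # A non-zero exit code lets a CI job fail when something is absent.
    return 1 if missing else 0
```

A script entry point would then end with `sys.exit(report(find_missing(REQUIRED_TOOLS)))`. Passing `which` as a parameter keeps the lookup easy to fake in tests.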
### Installing the System Tools
The ProjectDiscovery tools (Subfinder, dnsx, httpx, Naabu, Katana, Nuclei) are installed with Go:
```bash
# Install Go (https://go.dev/doc/install)
sudo apt-get install -y golang-go
# Set GOPATH (if not already)
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
# Install all PD tools
go install -v github.com/projectdiscovery/subfinder/v2/cmd/subfinder@latest
go install -v github.com/projectdiscovery/dnsx/cmd/dnsx@latest
go install -v github.com/projectdiscovery/httpx/cmd/httpx@latest
go install -v github.com/projectdiscovery/naabu/v2/cmd/naabu@latest
go install -v github.com/projectdiscovery/katana/cmd/katana@latest
go install -v github.com/projectdiscovery/nuclei/v3/cmd/nuclei@latest
# Nmap via apt
sudo apt-get install -y nmap
# Initialize Nuclei templates
nuclei -update-templates
```
---
## Configuration
All configuration is driven by environment variables loaded from `.env`:
```bash
cp .env.example .env
```
| Variable | Default | Description |
| ----------------------- | -------------- | -------------------------------------- |
| `DB_HOST` | `localhost` | PostgreSQL host |
| `DB_PORT` | `5432` | PostgreSQL port |
| `DB_NAME` | `xfinder` | Database name |
| `DB_USER` | `xfinder` | Database user |
| `DB_PASSWORD` | *(empty)* | Database password |
| `SHODAN_API_KEY` | *(empty)* | Shodan API key (optional) |
| `VIRUSTOTAL_API_KEY` | *(empty)* | VirusTotal API key (optional) |
| `DEFAULT_THREADS` | `20` | Default thread count |
| `HTTPX_TIMEOUT` | `15` | httpx timeout (seconds) |
| `DNSX_TIMEOUT` | `10` | dnsx timeout (seconds) |
| `NAABU_TIMEOUT` | `15` | Naabu timeout (seconds) |
| `NMAP_TIMEOUT` | `60` | Nmap timeout (seconds) |
| `KATANA_TIMEOUT` | `120` | Katana timeout (seconds) |
| `NUCLEI_TIMEOUT` | `180` | Nuclei timeout (seconds) |
| `SCAN_RATE` | `1000` | Packets/requests per second |
| `SCAN_INTERVAL_MINUTES` | `60` | Default rescan interval |
| `OUTPUT_DIR` | `./output` | JSON output directory |
| `LOG_LEVEL` | `INFO` | Logging verbosity |
| `NUCLEI_SEVERITY` | `low,medium,high,critical` | Nuclei severity filter |
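
To show how the defaults in this table might be applied, here is a standard-library sketch of a settings loader. The project itself lists a Pydantic-based loader in `config/settings.py`; the `Settings` dataclass and `load_settings` function below are hypothetical stand-ins that cover only a subset of the variables and assume the `.env` values have already been exported into the environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    db_host: str
    db_port: int
    db_name: str
    default_threads: int
    scan_interval_minutes: int
    nuclei_severity: tuple[str, ...]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read a subset of the documented variables, applying the documented defaults."""
    severity = env.get("NUCLEI_SEVERITY", "low,medium,high,critical")
    return Settings(
        db_host=env.get("DB_HOST", "localhost"),
        db_port=int(env.get("DB_PORT", "5432")),
        db_name=env.get("DB_NAME", "xfinder"),
        default_threads=int(env.get("DEFAULT_THREADS", "20")),
        scan_interval_minutes=int(env.get("SCAN_INTERVAL_MINUTES", "60")),
        # Split the comma-separated filter and drop empty entries.
        nuclei_severity=tuple(s.strip() for s in severity.split(",") if s.strip()),
    )
```

Converting values to typed fields at load time means a malformed `DB_PORT` fails at startup rather than in the middle of a scan.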
---
## Usage
### Interactive CLI
```bash
python main.py
```
Renders the SOC-style menu:
```text
========================================
XFinder
External Attack Surface Management
========================================
1. Subdomain Discovery
2. DNS Enumeration
3. Cloud Discovery
4. Port Discovery
5. Web/API Discovery
6. Vulnerability Scan
7. Full Scan
8. View Previous Scans
9. Configuration
10. Exit
```
After every scan, XFinder prompts:
```text
Run this scan automatically every 60 minutes?
[Y] Yes [N] No
```
### Example Scans
```bash
# Launch the CLI then choose:
# 7. Full Scan
# Enter: example.com
# Threads: 50
# Timeout: 60
# Or schedule recurring scans from within the menu (option Y after a scan)
```
---
## Scan Workflow
XFinder enforces an **optimized pipeline** to avoid wasting resources:
| Step | Tool | Runs when | Output cached for next step |
| ------------ | ---------- | -------------------------------------- | --------------------------- |
| 1. Subdomains| Subfinder | Always | `ctx.subdomains` |
| 2. DNS | dnsx | After step 1 | `ctx.cache["dns_records"]` |
| 3. Live HTTP | httpx | After step 2 (only resolved hosts) | `ctx.live_hosts` |
| 4a. Ports | Naabu | After step 3 (only live hosts) | `ctx.ports` (ip → ports) |
| 4b. Cloud | Cloud detect | After step 3 | `ctx.cache["http_results"]` |
| 5. Services | Nmap | After step 4a (only on found ports) | `ctx.cache["nmap_results"]` |
| 6. Crawl | Katana | After step 3 | `ctx.cache["katana_results"]` |
| 7. Vulns | Nuclei | After step 3 (tech-aware templates) | `ctx.cache["nuclei_results"]` |
The order of scanner classes in `scanners/registry.py` controls the chain. The engine never skips ahead: if step 3 finds zero live hosts, steps 4-7 are no-ops.
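
The gating rule can be sketched as an ordered list of steps, each with a precondition on the shared context. The `Ctx` class, the `Step` tuple, and `run_chain` are hypothetical simplifications for illustration; the real engine and `ScanContext` may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Ctx:
    """Hypothetical stand-in for the scan context shared between steps."""
    target: str
    subdomains: list[str] = field(default_factory=list)
    live_hosts: list[str] = field(default_factory=list)
    executed: list[str] = field(default_factory=list)


# (name, precondition, action); the order of the list is the order of the chain.
Step = tuple[str, Callable[[Ctx], bool], Callable[[Ctx], None]]


def run_chain(ctx: Ctx, steps: list[Step]) -> Ctx:
    """Run steps in order; a step whose precondition fails is a no-op."""
    for name, should_run, action in steps:
        if should_run(ctx):
            action(ctx)
            ctx.executed.append(name)
    return ctx


# Example: httpx finds no live hosts, so the port and vulnerability steps are skipped.
steps: list[Step] = [
    ("subfinder", lambda c: True, lambda c: c.subdomains.append("a.example.com")),
    ("httpx", lambda c: bool(c.subdomains), lambda c: None),
    ("naabu", lambda c: bool(c.live_hosts), lambda c: None),
    ("nuclei", lambda c: bool(c.live_hosts), lambda c: None),
]
result = run_chain(Ctx("example.com"), steps)
```

Because each step reads only what earlier steps wrote to the context, an empty intermediate result naturally turns every later step into a no-op.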
---
## Database Schema
12 normalized tables, all scoped by `scan_id` for append-only history:
```text
targets (id, domain, created_at, is_active)
│
└─► scans (id, target_id, scan_type, status, started_at,
    │      finished_at, duration_seconds, error, output_dir)
    │
    ├─► subdomains (id, scan_id, target_id, name, is_resolved,
    │   │           is_live_http, source, created_at)
    │   │
    │   ├─► dns_records (id, scan_id, subdomain_id,
    │   │                record_type, value, ttl)
    │   ├─► http_information (id, scan_id, subdomain_id, url,
    │   │   │                 status_code, title, server_header,
    │   │   │                 content_length, response_time_ms,
    │   │   │                 scheme, webserver, tech_blob)
    │   │   │
    │   │   └─► technologies (id, scan_id, http_info_id,
    │   │                     category, name, version)
    │   ├─► cloud_assets (id, scan_id, subdomain_id,
    │   │                 provider, cdn, waf, is_cloud_hosted)
    │   └─► ip_addresses (id, scan_id, subdomain_id, address,
    │       │             version, reverse_dns, asn, asn_org,
    │       │             country, hosting_provider)
    │       │
    │       └─► ports (id, scan_id, ip_address_id, port,
    │           │      protocol, state)
    │           │
    │           └─► services (id, scan_id, port_id,
    │                         name, product, version, os)
    ├─► api_endpoints (id, scan_id, source_host, method, url,
    │                  body, tag)
    └─► vulnerabilities (id, scan_id, template_id, name, severity,
                         description, matched_url, matched_at,
                         evidence, reference_urls, tags,
                         cvss_score, discovered_at)
```
### Schema Initialization
On first run, `python main.py` calls `init_db()` which runs `Base.metadata.create_all(...)`. This is idempotent, so it is safe to call repeatedly. For production migrations, use Alembic.
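
The two properties described in this section, idempotent initialization and append-only history scoped by `scan_id`, can be demonstrated with the standard library. The sketch below uses `sqlite3` as a stand-in for PostgreSQL and SQLAlchemy, with a deliberately reduced two-table schema; `record_scan` is a hypothetical helper, not part of XFinder's repository layer.

```python
import sqlite3

# Reduced schema: the real database has 12 tables and a separate targets table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scans (
    id INTEGER PRIMARY KEY,
    target TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS subdomains (
    id INTEGER PRIMARY KEY,
    scan_id INTEGER NOT NULL REFERENCES scans(id),
    name TEXT NOT NULL
);
"""


def init_db(conn: sqlite3.Connection) -> None:
    """Create tables if they are missing; repeated calls change nothing."""
    conn.executescript(SCHEMA)


def record_scan(conn: sqlite3.Connection, target: str, names: list[str]) -> int:
    """Insert a new scan and its rows; rows from earlier scans are never updated."""
    cur = conn.execute("INSERT INTO scans (target) VALUES (?)", (target,))
    scan_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO subdomains (scan_id, name) VALUES (?, ?)",
        [(scan_id, name) for name in names],
    )
    conn.commit()
    return scan_id
```

Since every row carries its own `scan_id`, comparing two scans is a matter of selecting the same table twice with different `scan_id` values.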
---
## JSON Output Structure
Every scan produces a timestamped folder:
```text
output/
└─ example.com/
   └─ 2026-07-01_10-00-00/
      ├─ subdomains.json        # All discovered subdomains
      ├─ dns.json               # DNS records per subdomain
      ├─ http.json              # HTTP fingerprint per live host
      ├─ cloud.json             # Cloud/CDN/WAF classification
      ├─ ports.json             # Open ports per IP
      ├─ services.json          # Nmap service/version/OS
      ├─ technologies.json      # Detected web technologies
      ├─ api.json               # Crawled endpoints
      ├─ vulnerabilities.json   # Nuclei findings
      ├─ changes.json           # Diff vs previous scan
      └─ full_scan.json         # Consolidated summary
```
Reports are **never overwritten**: each scan writes to its own timestamped folder, so the historical record is preserved.
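
One way to get this guarantee is to create the folder with `exist_ok=False`, so that a name collision raises an error instead of reusing an existing directory. The `scan_output_dir` helper below is a hypothetical sketch that follows the folder naming shown above; it is not XFinder's actual export code.

```python
from datetime import datetime
from pathlib import Path


def scan_output_dir(base: Path, target: str, started_at: datetime) -> Path:
    """Create <base>/<target>/<YYYY-MM-DD_HH-MM-SS>/ and refuse to reuse it."""
    folder = base / target / started_at.strftime("%Y-%m-%d_%H-%M-%S")
    # exist_ok=False raises FileExistsError rather than silently
    # writing into a folder that belongs to an earlier scan.
    folder.mkdir(parents=True, exist_ok=False)
    return folder
```

With second-level timestamps, two scans of the same target started within the same second would collide; the error makes that case visible instead of overwriting a report.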
---
## Change Detection
After each scan, XFinder compares the current scan with the most recent previous completed scan for the same target. The diff is persisted to `changes.json` and stored in the `summary` of `full_scan.json`.
Detected change types:
- **New / Removed Subdomains**
- **New Open Ports / Closed Ports**
- **Technology Changes** (added/removed techs per HTTP service)
- **DNS Changes** (per-subdomain record additions/removals)
- **Cloud Changes** (provider/CDN/WAF transitions)
- **New / Resolved Vulnerabilities** (by template ID + matched URL)
- **New / Removed API Endpoints**
Each category is summarized in a `summary` block with counts.
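
At its core, this kind of comparison is a set difference per category. The sketch below is a simplified illustration with hypothetical function names and output keys; XFinder's actual `changes.json` layout may differ.

```python
from typing import Iterable


def diff_sets(previous: Iterable, current: Iterable) -> dict:
    """Return sorted additions and removals between two scans."""
    prev, curr = set(previous), set(current)
    return {"added": sorted(curr - prev), "removed": sorted(prev - curr)}


def detect_changes(previous: dict, current: dict) -> dict:
    """Diff each category and attach a summary block of counts."""
    changes = {}
    for category in sorted(set(previous) | set(current)):
        changes[category] = diff_sets(previous.get(category, []),
                                      current.get(category, []))
    # The comprehension is fully evaluated before "summary" is added,
    # so the summary only counts the per-category diffs above.
    changes["summary"] = {
        f"{category}_{kind}": len(items)
        for category, delta in changes.items()
        for kind, items in delta.items()
    }
    return changes
```

Vulnerabilities can be passed as `(template_id, matched_url)` tuples so that the same template firing on a different URL counts as a new finding, matching the identity rule in the list above.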
---
## Scheduler
XFinder uses APScheduler's `BackgroundScheduler` for recurring scans:
```python
from scheduler.scheduler import get_scheduler
sched = get_scheduler()
sched.start()
sched.schedule(
    target="example.com",
    scan_type="full",
    interval_minutes=60,
)
```
Features:
- **No `while True` loops**: `BackgroundScheduler` runs jobs from its own background thread
- **Coalesce + max_instances=1**: prevents overlapping runs of the same target
- **Misfire grace time = 300s**: a run that is up to 300 seconds late still executes, which covers short downtimes
- **Replace semantics**: scheduling the same (target, scan_type) replaces the existing job
- **Minimum interval = 5 minutes**: protects against accidental DoS of your own infrastructure
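
These settings correspond to keyword arguments of APScheduler's `add_job()`. The helper below only builds that argument dictionary, so it runs without APScheduler installed. The function name, the job-ID format, and the choice to raise on a too-short interval (rather than clamp it) are assumptions, not XFinder's confirmed behaviour.

```python
MIN_INTERVAL_MINUTES = 5


def build_job_kwargs(target: str, scan_type: str, interval_minutes: int) -> dict:
    """Keyword arguments for APScheduler's add_job(), following the rules above."""
    if interval_minutes < MIN_INTERVAL_MINUTES:
        raise ValueError(f"interval must be at least {MIN_INTERVAL_MINUTES} minutes")
    return {
        "trigger": "interval",
        "minutes": interval_minutes,
        "id": f"{scan_type}:{target}",  # one job per (target, scan_type)
        "replace_existing": True,       # rescheduling replaces the old job
        "coalesce": True,               # collapse a backlog of missed runs into one
        "max_instances": 1,             # never overlap runs of the same job
        "misfire_grace_time": 300,      # still run if up to 300 seconds late
    }
```

With a real scheduler this would be used as `scheduler.add_job(run_scan, **build_job_kwargs("example.com", "full", 60))`, where `run_scan` is a placeholder for the function that performs the scan.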
---
## Extending XFinder
### Adding a New Scanner
1. Subclass `BaseScanner`:
```python
# scanners/my_tool.py
from scanners.base import BaseScanner, ScanResult
from utils.helpers import run_subprocess


class MyToolScanner(BaseScanner):
    """Example scanner that wraps the `my_tool` command-line binary."""

    name = "my_tool"
    description = "Does something cool"
    required_tools = ["my_tool"]

    def run(self) -> ScanResult:
        # Run the external tool against the scan target, bounded by the scan timeout.
        res = run_subprocess(["my_tool", self.ctx.target],
                             timeout=self.ctx.timeout)
        if not res.ok:
            return ScanResult(
                scanner=self.name, success=False,
                duration_seconds=0.0, error=res.stderr,
            )
        return ScanResult(
            scanner=self.name, success=True,
            duration_seconds=0.0,
            data={"result": res.stdout},
        )
```
2. Register it in `scanners/registry.py`:
```python
from scanners.my_tool import MyToolScanner

SCANNERS = {
    # ... existing entries ...
    "my_scan": [SubfinderScanner, MyToolScanner],
}
SCAN_LABELS["my_scan"] = "My Custom Scan"
```
3. Add persistence logic in `scanners/engine.py::_persist_result` if you want DB storage.
That's it. The CLI menu and scheduler will pick up the new scan type automatically.
### Adding a New Enrichment Module
Create a new file in `enrichment/` with an `enrich(target)` function returning a dict. See `enrichment/shodan.py` for the pattern.
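
As a rough idea of the shape such a module could take, the sketch below implements an `enrich(target)` function that needs no network access or API key. The module itself, the returned field names, and the convention of returning an empty dict for unsupported input are all assumptions; check `enrichment/shodan.py` for the conventions the project actually follows.

```python
import ipaddress


def enrich(target: str) -> dict:
    """Return basic facts about an IP address; empty dict if target is not an IP."""
    try:
        ip = ipaddress.ip_address(target)
    except ValueError:
        # Not an IP literal (for example a hostname): nothing to add.
        return {}
    return {
        "address": str(ip),
        "version": ip.version,
        "is_private": ip.is_private,
        "is_global": ip.is_global,
    }
```

Returning a plain dict keeps the result directly serializable into the per-scan JSON reports.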
---
## Testing
XFinder ships with **96 automated tests**:
```bash
# Run all tests
python -m pytest tests/ -v
# With coverage
python -m pytest tests/ --cov=. --cov-report=term-missing
```
### Test Categories
| File | Coverage |
| ------------------------------- | ------------------------------------------- |
| `test_validators.py` | Domain/IP/URL validation |
| `test_helpers.py` | Subprocess, JSON parsing, helpers |
| `test_settings.py` | Configuration loader |
| `test_cloud.py` | Cloud/CDN/WAF detection |
| `test_change_detection.py` | Change diff logic |
| `test_scanners.py` | BaseScanner + registry |
| `test_database.py` | Repository layer (SQLite) |
| `test_scheduler.py` | APScheduler integration |
| `test_engine_integration.py` | End-to-end scan with mocked scanners |
| `test_install.py` | Dependency detection logic |
### Test Results
```text
======================= 96 passed, 33 warnings in 0.67s ========================
```
The warnings are deprecation notices from `datetime.utcnow()`, which has been deprecated since Python 3.12. It still works in Python 3.13 but is scheduled for removal in a future version.
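
The usual replacement is a timezone-aware call to `datetime.now(timezone.utc)`. The two helper names below are hypothetical; the naive variant is only useful where existing code or columns expect a datetime without `tzinfo`.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Timezone-aware replacement for the deprecated datetime.utcnow()."""
    return datetime.now(timezone.utc)


def utc_now_naive() -> datetime:
    """Same instant with tzinfo dropped, matching what utcnow() used to return."""
    return utc_now().replace(tzinfo=None)
```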
---
## Project Structure
```text
XFinder/
├── main.py                     # Entry point
├── cli.py                      # Rich CLI menu
├── config.py                   # (Re-export of config.settings)
├── install.py                  # Dependency verifier
├── requirements.txt
├── README.md
├── .env.example
├── .gitignore
│
├── config/
│   ├── __init__.py
│   ├── settings.py             # Pydantic settings loader
│   └── database.py             # SQLAlchemy engine + session scope
│
├── scanners/
│   ├── __init__.py
│   ├── base.py                 # BaseScanner + ScanContext + ScanResult
│   ├── registry.py             # Scan-type → scanner-class mapping
│   ├── engine.py               # Orchestration engine
│   ├── subfinder.py
│   ├── dnsx.py
│   ├── httpx.py
│   ├── naabu.py
│   ├── nmap.py
│   ├── katana.py
│   └── nuclei.py
│
├── enrichment/
│   ├── __init__.py
│   ├── cloud.py                # Cloud/CDN/WAF detection
│   ├── asn.py                  # ASN/org/country via Team Cymru DNS
│   ├── ssl.py                  # SSL certificate metadata
│   ├── whois.py                # RDAP + WHOIS fallback
│   ├── shodan.py               # Shodan API
│   └── virustotal.py           # VirusTotal v3 API
│
├── database/
│   ├── __init__.py
│   ├── models.py               # 12 SQLAlchemy ORM models
│   └── repository.py           # Data-access layer (batched writes)
│
├── scheduler/
│   ├── __init__.py
│   └── scheduler.py            # APScheduler wrapper
│
├── reports/
│   ├── __init__.py
│   └── json_export.py          # Per-scan JSON + change detection
│
├── utils/
│   ├── __init__.py
│   ├── logger.py               # Rotating file + console logger
│   ├── helpers.py              # Subprocess, JSON, iteration helpers
│   └── validators.py           # Domain/IP/URL validation
│
├── tests/                      # 96 tests (unit + integration)
│   ├── conftest.py
│   ├── test_validators.py
│   ├── test_helpers.py
│   ├── test_settings.py
│   ├── test_cloud.py
│   ├── test_change_detection.py
│   ├── test_scanners.py
│   ├── test_database.py
│   ├── test_scheduler.py
│   ├── test_engine_integration.py
│   └── test_install.py
│
├── docs/
│   ├── USER_GUIDE.md
│   ├── TROUBLESHOOTING.md
│   ├── ARCHITECTURE.md         # This README's architecture section, expanded
│   └── architecture_diagram.md
│
├── samples/
│   ├── scan_examples/          # Sample JSON output
│   └── db_records/             # Sample DB row examples
│
└── logs/                       # Runtime logs (auto-created)
```
---
## Roadmap
- [ ] Alembic migration scripts for production schema evolution
- [ ] Web UI (FastAPI + React) for browsing scan history
- [ ] Slack/Discord/Teams alerts on new vulnerabilities
- [ ] CVSS-based risk scoring (currently disabled per spec)
- [ ] Multi-target batch scans from a CSV file
- [ ] GraphQL API for programmatic access
- [ ] Docker image (for users who want it, despite the no-Docker spec)
- [ ] Plugin marketplace (install community scanners via pip)
---
## License
MIT License. See [LICENSE](LICENSE) for details.
---
## Disclaimer
XFinder is intended for **authorized security testing only**. Always obtain written permission before scanning infrastructure you do not own or operate. The authors are not responsible for misuse of this tool.