# XFinder

> **External Attack Surface Management (EASM)**: a lightweight, production-ready Python CLI that continuously discovers, monitors, enriches, and tracks internet-facing assets.

[![Python 3.13+](https://img.shields.io/badge/python-3.13%2B-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
[![Tests: 96 passing](https://img.shields.io/badge/tests-96%20passing-success.svg)](#testing)

XFinder is **not** a single script; it is a modular EASM/SOC automation framework suitable for enterprise environments and advanced cybersecurity portfolios. It orchestrates industry-standard open-source tools (Subfinder, dnsx, httpx, Naabu, Nmap, Katana, Nuclei) into an optimized scan pipeline, persists all results to PostgreSQL, generates structured JSON reports, and supports scheduled rescans with change detection.

---

## Table of Contents

1. [Features](#features)
2. [Architecture](#architecture)
3. [Quick Start](#quick-start)
4. [Installation](#installation)
5. [Configuration](#configuration)
6. [Usage](#usage)
7. [Scan Workflow](#scan-workflow)
8. [Database Schema](#database-schema)
9. [JSON Output Structure](#json-output-structure)
10. [Change Detection](#change-detection)
11. [Scheduler](#scheduler)
12. [Extending XFinder](#extending-xfinder)
13. [Testing](#testing)
14. [Project Structure](#project-structure)
15. [User Guide](docs/USER_GUIDE.md)
16. [Troubleshooting](docs/TROUBLESHOOTING.md)
17. [Roadmap](#roadmap)
18. [License](#license)

---

## Features

### Core Capabilities
- **Subdomain Discovery**: passive enumeration via Subfinder (50+ sources)
- **DNS Resolution**: A, AAAA, CNAME, MX, TXT, NS, SOA records via dnsx
- **Live HTTP Detection**: status, title, server, redirect, content-length, response time via httpx
- **Port Discovery**: fast TCP scanning via Naabu (only against live hosts)
- **Service/Version/OS Detection**: Nmap (only against Naabu-discovered ports)
- **Web/API Crawling**: endpoint discovery via Katana
- **Vulnerability Scanning**: template-based via Nuclei (tech-aware template selection)
- **Cloud/CDN/WAF Detection**: AWS, Azure, GCP, Cloudflare, Fastly, Akamai, DigitalOcean, Vercel, Netlify, GitHub Pages
- **Asset Enrichment**: ASN, organization, country, hosting provider, reverse DNS, SSL certificate, WHOIS/RDAP, Shodan, VirusTotal

### Production-Grade Features
- **PostgreSQL persistence** with normalized tables and append-only history
- **Scheduled rescans** via APScheduler (configurable interval, minimum 5 minutes)
- **Change detection** between scans: new/removed subdomains, ports, technologies, DNS, cloud, vulnerabilities, API endpoints
- **Structured JSON reports** per scan, never overwritten
- **Professional logging**: rotating file handler + console
- **Plugin architecture**: new scanners can be added without touching the core engine
- **Rich CLI**: SOC-style menu, colored output, tables
- **Type-hinted, PEP8-compliant, fully documented** codebase
- **96 automated tests**: unit + integration coverage
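
The "rotating file handler + console" logging setup can be built from the standard library alone. The following is a minimal sketch, not XFinder's actual `utils/logger.py`; the function name `build_logger`, the 1 MB size limit, and the backup count of 5 are assumptions chosen for illustration.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_logger(name: str, log_dir: Path, level: str = "INFO") -> logging.Logger:
    """Return a logger that writes to a rotating file and to the console."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if logger.handlers:  # already configured: avoid attaching duplicate handlers
        return logger

    formatter = logging.Formatter("%(asctime)s %(levelname)-8s %(name)s: %(message)s")
    file_handler = RotatingFileHandler(
        log_dir / f"{name}.log",
        maxBytes=1_000_000,  # assumed limit: rotate after roughly 1 MB
        backupCount=5,       # assumed retention: keep five rotated files
        encoding="utf-8",
    )
    console_handler = logging.StreamHandler()
    for handler in (file_handler, console_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Calling `build_logger("xfinder", Path("logs"))` twice returns the same logger with two handlers, so repeated imports do not duplicate log lines.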

### Performance Optimizations
- Never scans dead hosts (Naabu runs only after httpx confirms liveness)
- Nmap runs only against Naabu-discovered ports
- Nuclei runs only against live HTTP/HTTPS services
- Technology-aware Nuclei template selection (faster, fewer false positives)
- Configurable thread count, timeout, and scan rate per scan
- Batched database writes via repository pattern
- Intermediate results persisted in real time (survive crashes)
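
Technology-aware template selection can be pictured as a lookup from detected technologies to Nuclei tags. The sketch below is illustrative only: the `TECH_TO_TAGS` mapping and the `build_nuclei_command` helper are hypothetical and not taken from XFinder's source. It relies on Nuclei's `-u`, `-severity`, and `-tags` options.

```python
# Hypothetical mapping from detected technology names to Nuclei template tags.
TECH_TO_TAGS: dict[str, list[str]] = {
    "wordpress": ["wordpress", "wp-plugin"],
    "nginx": ["nginx"],
    "apache": ["apache"],
    "jenkins": ["jenkins"],
}


def build_nuclei_command(
    url: str,
    technologies: list[str],
    severity: str = "low,medium,high,critical",
) -> list[str]:
    """Build a Nuclei argv restricted to tags that match the detected tech stack."""
    tags: list[str] = []
    for tech in technologies:
        for tag in TECH_TO_TAGS.get(tech.lower(), []):
            if tag not in tags:  # keep first-seen order, drop duplicates
                tags.append(tag)

    command = ["nuclei", "-u", url, "-severity", severity]
    if tags:
        # Only narrow the template set when at least one technology is recognized.
        command += ["-tags", ",".join(tags)]
    return command
```

When no technology is recognized, the sketch falls back to an unfiltered run rather than skipping the scan; whether XFinder does the same is not stated in this README.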

---

## Architecture

```text
┌──────────────────────────────────────────────────────────────────────┐
│                          XFinder CLI (Rich)                          │
│                              cli.py                                  │
└──────────────┬───────────────────────────────────────┬───────────────┘
               │                                       │
               ▼                                       ▼
┌──────────────────────────────┐         ┌─────────────────────────────┐
│      Scan Engine             │         │      Scheduler              │
│   scanners/engine.py         │         │   scheduler/scheduler.py    │
│  (orchestrates the chain)    │         │     (APScheduler)           │
└──────────────┬───────────────┘         └─────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────────────────────────────┐
│              Scanner Plugins (BaseScanner subclasses)                │
│  ┌─────────────┐ ┌─────────┐ ┌────────┐ ┌────────┐ ┌──────┐ ┌──────┐ │
│  │ Subfinder   │→│  dnsx   │→│ httpx  │→│ Naabu  │→│ Nmap │→│Nuclei│ │
│  └─────────────┘ └─────────┘ └────────┘ └────────┘ └──────┘ └──────┘ │
│                                          ┌────────┐                  │
│                                          │ Katana │                  │
│                                          └────────┘                  │
└─────────────────────┬────────────────────────────────────────────────┘
                      │
                      ▼
┌──────────────────────────────────────────────────────────────────────┐
│                     Enrichment Modules                               │
│  ┌───────┐ ┌─────┐ ┌─────┐ ┌───────┐ ┌────────┐ ┌────────────┐       │
│  │ Cloud │ │ ASN │ │ SSL │ │ WHOIS │ │ Shodan │ │ VirusTotal │       │
│  └───────┘ └─────┘ └─────┘ └───────┘ └────────┘ └────────────┘       │
└─────────────────────┬────────────────────────────────────────────────┘
                      │
                      ▼
┌──────────────────────────────────────────────────────────────────────┐
│                     Repository Layer                                 │
│                  database/repository.py                              │
│       (batched writes, change-detection analytics, history)          │
└─────────────────────┬────────────────────────────────────────────────┘
                      │
                      ▼
┌──────────────────────────────────────────────────────────────────────┐
│             PostgreSQL (SQLAlchemy ORM, 12 tables)                   │
│   targets · scans · subdomains · dns_records · http_information ·    │
│   cloud_assets · ip_addresses · ports · services · technologies ·    │
│   api_endpoints · vulnerabilities                                    │
└──────────────────────────────────────────────────────────────────────┘

                      Also writes per-scan JSON to:
                      output/<target>/<timestamp>/
                          ├── subdomains.json
                          ├── dns.json
                          ├── http.json
                          ├── cloud.json
                          ├── ports.json
                          ├── services.json
                          ├── technologies.json
                          ├── api.json
                          ├── vulnerabilities.json
                          ├── changes.json
                          └── full_scan.json
```

### Optimized Scan Workflow

```text
Target
  │
  ▼
Subfinder          (passive subdomain enumeration)
  │
  ▼
dnsx               (DNS resolution: A/AAAA/CNAME/MX/TXT/NS/SOA)
  │
  ▼
httpx              (live HTTP detection + fingerprinting)
  │
  ├──────────────┬─────────────────┬─────────────────┐
  ▼              ▼                 ▼                 ▼
Naabu        Cloud Detection   HTTP Fingerprint   Technology Detect
  │              │                 │                 │
  ▼              ▼                 ▼                 ▼
Nmap         Asset Enrichment   Server Header     Tech Stack
  │              │                 │                 │
  └──────────────┴─────────────────┴─────────────────┘
                   │
                   ▼
              Katana Crawl      (endpoints & APIs)
                   │
                   ▼
                Nuclei           (tech-aware template scan)
                   │
                   ▼
         PostgreSQL + JSON Export + Change Detection
```

---

## Quick Start

```bash
# 1. Clone the repository
git clone https://github.com/your-org/xfinder.git
cd xfinder

# 2. Create & activate virtual environment
python3 -m venv venv
source venv/bin/activate

# 3. Install Python dependencies
pip install -r requirements.txt

# 4. Copy and edit the environment file
cp .env.example .env
# Edit .env: set DB credentials and API keys

# 5. Verify all system tools are installed
python install.py

# 6. Initialize PostgreSQL database
sudo -u postgres createdb xfinder
sudo -u postgres createuser -P xfinder
sudo -u postgres psql -c "GRANT ALL ON DATABASE xfinder TO xfinder;"
# IMPORTANT (PostgreSQL 15+): grant CREATE on the public schema, otherwise
# XFinder cannot create its tables. The default public-schema privileges
# were tightened in PG 15 for security.
sudo -u postgres psql -d xfinder -c "GRANT ALL ON SCHEMA public TO xfinder;"

# 7. Launch XFinder
python main.py
```

---

## Installation

### Prerequisites

| Component          | Version  | Purpose                                            |
| ------------------ | -------- | -------------------------------------------------- |
| Python             | 3.13+    | Runtime                                            |
| PostgreSQL         | 14+      | Database                                           |
| Subfinder          | latest   | Subdomain discovery                                |
| dnsx               | latest   | DNS resolution                                     |
| httpx (PD)         | latest   | Live HTTP detection                                |
| Naabu              | latest   | Port discovery                                     |
| Nmap               | 7.92+    | Service/OS detection                               |
| Katana             | latest   | Web crawling                                       |
| Nuclei             | latest   | Vulnerability scanning                             |

### Automated Verification

```bash
python install.py
```

This script checks every dependency and prints actionable installation instructions for anything missing. It **never crashes**; instead, it exits with code 1 if any required dependency is absent, so it can be used in CI pipelines.
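
The core of such a check can be sketched with `shutil.which`, which looks up an executable on `PATH`. This is an illustrative outline under stated assumptions, not the real `install.py`: the `REQUIRED_TOOLS` list simply mirrors the prerequisites table, and `find_missing` and `main` are hypothetical names.

```python
import shutil

# Assumed list, copied from the prerequisites table above.
REQUIRED_TOOLS = ["subfinder", "dnsx", "httpx", "naabu", "nmap", "katana", "nuclei"]


def find_missing(tools: list[str]) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def main() -> int:
    """Report missing tools and return a process exit code (0 = all present)."""
    missing = find_missing(REQUIRED_TOOLS)
    for tool in missing:
        print(f"[MISSING] {tool}: install it and make sure it is on PATH")
    return 1 if missing else 0
```

A script would finish with `raise SystemExit(main())`, so a CI job fails whenever a tool is absent.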

### Installing the System Tools

The ProjectDiscovery tools (Subfinder, dnsx, httpx, Naabu, Katana, Nuclei) are installed with Go:

```bash
# Install Go (https://go.dev/doc/install)
sudo apt-get install -y golang-go

# Set GOPATH (if not already)
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin

# Install all PD tools
go install -v github.com/projectdiscovery/subfinder/v2/cmd/subfinder@latest
go install -v github.com/projectdiscovery/dnsx/cmd/dnsx@latest
go install -v github.com/projectdiscovery/httpx/cmd/httpx@latest
go install -v github.com/projectdiscovery/naabu/v2/cmd/naabu@latest
go install -v github.com/projectdiscovery/katana/cmd/katana@latest
go install -v github.com/projectdiscovery/nuclei/v3/cmd/nuclei@latest

# Nmap via apt
sudo apt-get install -y nmap

# Initialize Nuclei templates
nuclei -update-templates
```

---

## Configuration

All configuration is driven by environment variables loaded from `.env`:

```bash
cp .env.example .env
```

| Variable                | Default        | Description                            |
| ----------------------- | -------------- | -------------------------------------- |
| `DB_HOST`               | `localhost`    | PostgreSQL host                        |
| `DB_PORT`               | `5432`         | PostgreSQL port                        |
| `DB_NAME`               | `xfinder`      | Database name                          |
| `DB_USER`               | `xfinder`      | Database user                          |
| `DB_PASSWORD`           | *(empty)*      | Database password                      |
| `SHODAN_API_KEY`        | *(empty)*      | Shodan API key (optional)              |
| `VIRUSTOTAL_API_KEY`    | *(empty)*      | VirusTotal API key (optional)          |
| `DEFAULT_THREADS`       | `20`           | Default thread count                   |
| `HTTPX_TIMEOUT`         | `15`           | httpx timeout (seconds)                |
| `DNSX_TIMEOUT`          | `10`           | dnsx timeout (seconds)                 |
| `NAABU_TIMEOUT`         | `15`           | Naabu timeout (seconds)                |
| `NMAP_TIMEOUT`          | `60`           | Nmap timeout (seconds)                 |
| `KATANA_TIMEOUT`        | `120`          | Katana timeout (seconds)               |
| `NUCLEI_TIMEOUT`        | `180`          | Nuclei timeout (seconds)               |
| `SCAN_RATE`             | `1000`         | Packets/requests per second            |
| `SCAN_INTERVAL_MINUTES` | `60`           | Default rescan interval                |
| `OUTPUT_DIR`            | `./output`     | JSON output directory                  |
| `LOG_LEVEL`             | `INFO`         | Logging verbosity                      |
| `NUCLEI_SEVERITY`       | `low,medium,high,critical` | Nuclei severity filter      |

---

## Usage

### Interactive CLI

```bash
python main.py
```

Renders the SOC-style menu:

```text
========================================

           XFinder

 External Attack Surface Management

========================================

1. Subdomain Discovery
2. DNS Enumeration
3. Cloud Discovery
4. Port Discovery
5. Web/API Discovery
6. Vulnerability Scan
7. Full Scan
8. View Previous Scans
9. Configuration
10. Exit
```

After every scan, XFinder prompts:

```text
Run this scan automatically every 60 minutes?
[Y] Yes   [N] No
```

### Example Scans

```bash
# Launch the CLI then choose:
# 7. Full Scan
# Enter: example.com
# Threads: 50
# Timeout: 60

# Or schedule recurring scans from within the menu (option Y after a scan)
```

---

## Scan Workflow

XFinder enforces an **optimized pipeline** to avoid wasting resources:

| Step         | Tool       | Runs when                              | Output cached for next step |
| ------------ | ---------- | -------------------------------------- | --------------------------- |
| 1. Subdomains| Subfinder  | Always                                 | `ctx.subdomains`            |
| 2. DNS       | dnsx       | After step 1                           | `ctx.cache["dns_records"]`  |
| 3. Live HTTP | httpx      | After step 2 (only resolved hosts)     | `ctx.live_hosts`            |
| 4a. Ports    | Naabu      | After step 3 (only live hosts)         | `ctx.ports` (ip → ports)    |
| 4b. Cloud    | Cloud detect| After step 3                          | `ctx.cache["http_results"]` |
| 5. Services  | Nmap       | After step 4a (only on found ports)    | `ctx.cache["nmap_results"]` |
| 6. Crawl     | Katana     | After step 3                           | `ctx.cache["katana_results"]` |
| 7. Vulns     | Nuclei     | After step 3 (tech-aware templates)    | `ctx.cache["nuclei_results"]` |

The order of scanner classes in `scanners/registry.py` controls the chain. The engine never skips ahead: if step 3 finds zero live hosts, steps 4-7 are no-ops.
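
The chaining idea (each step reads what earlier steps left in a shared context and does nothing when its input is empty) can be sketched as follows. The `Context` class and the step functions are simplified stand-ins invented for this example; they are not XFinder's `ScanContext` or its scanner classes, and the hard-coded results replace real tool output.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Context:
    target: str
    subdomains: list[str] = field(default_factory=list)
    live_hosts: list[str] = field(default_factory=list)
    cache: dict[str, object] = field(default_factory=dict)


Step = Callable[[Context], None]


def discover(ctx: Context) -> None:
    # Stand-in for Subfinder: pretend two subdomains were found.
    ctx.subdomains = [f"www.{ctx.target}", f"api.{ctx.target}"]


def probe(ctx: Context) -> None:
    # Stand-in for httpx: pretend only the "www" host answers.
    ctx.live_hosts = [host for host in ctx.subdomains if host.startswith("www.")]


def port_scan(ctx: Context) -> None:
    # Stand-in for Naabu: a no-op when there are no live hosts.
    if not ctx.live_hosts:
        return
    ctx.cache["ports"] = {host: [443] for host in ctx.live_hosts}


def run_chain(ctx: Context, steps: list[Step]) -> Context:
    """Run the steps in list order; that order defines the pipeline."""
    for step in steps:
        step(ctx)
    return ctx
```

Running `run_chain(Context("example.com"), [discover, probe, port_scan])` fills `ctx.cache["ports"]`, while omitting `probe` leaves the cache empty because `port_scan` has nothing to work on.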

---

## Database Schema

12 normalized tables. Every result table carries a `scan_id`, which gives append-only history per scan:

```text
targets             (id, domain, created_at, is_active)
   │
   ├─► scans        (id, target_id, scan_type, status, started_at,
   │                finished_at, duration_seconds, error, output_dir)
   │      │
   │      ├─► subdomains      (id, scan_id, target_id, name, is_resolved,
   │      │                    is_live_http, source, created_at)
   │      │       │
   │      │       ├─► dns_records       (id, scan_id, subdomain_id,
   │      │       │                       record_type, value, ttl)
   │      │       ├─► http_information  (id, scan_id, subdomain_id, url,
   │      │       │                       status_code, title, server_header,
   │      │       │                       content_length, response_time_ms,
   │      │       │                       scheme, webserver, tech_blob)
   │      │       │       │
   │      │       │       └─► technologies  (id, scan_id, http_info_id,
   │      │       │                            category, name, version)
   │      │       ├─► cloud_assets      (id, scan_id, subdomain_id,
   │      │       │                       provider, cdn, waf, is_cloud_hosted)
   │      │       └─► ip_addresses      (id, scan_id, subdomain_id, address,
   │      │                               version, reverse_dns, asn, asn_org,
   │      │                               country, hosting_provider)
   │      │               │
   │      │               └─► ports      (id, scan_id, ip_address_id, port,
   │      │                               protocol, state)
   │      │                       │
   │      │                       └─► services  (id, scan_id, port_id,
   │      │                                        name, product, version, os)
   │      ├─► api_endpoints       (id, scan_id, source_host, method, url,
   │      │                        body, tag)
   │      └─► vulnerabilities     (id, scan_id, template_id, name, severity,
   │                               description, matched_url, matched_at,
   │                               evidence, reference_urls, tags,
   │                               cvss_score, discovered_at)
```

### Schema Initialization

On first run, `python main.py` calls `init_db()`, which runs `Base.metadata.create_all(...)`. The call is idempotent and safe to repeat: it creates only tables that are missing and does not alter existing ones. For production migrations, use Alembic.

---

## JSON Output Structure

Every scan produces a timestamped folder:

```text
output/
└─ example.com/
   └─ 2026-07-01_10-00-00/
      ├─ subdomains.json       # All discovered subdomains
      ├─ dns.json              # DNS records per subdomain
      ├─ http.json             # HTTP fingerprint per live host
      ├─ cloud.json            # Cloud/CDN/WAF classification
      ├─ ports.json            # Open ports per IP
      ├─ services.json         # Nmap service/version/OS
      ├─ technologies.json     # Detected web technologies
      ├─ api.json              # Crawled endpoints
      ├─ vulnerabilities.json  # Nuclei findings
      ├─ changes.json          # Diff vs previous scan
      └─ full_scan.json        # Consolidated summary
```

Reports are **never overwritten**. Each scan writes to its own timestamped directory, so earlier reports remain available as a historical record.
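
One way to get this behavior is to derive the directory name from the scan start time and open files in exclusive-create mode. The sketch below assumes the `YYYY-MM-DD_HH-MM-SS` naming shown in the tree above; the `write_report` helper is hypothetical and is not the code in `reports/json_export.py`.

```python
import json
from datetime import datetime
from pathlib import Path


def write_report(
    output_dir: Path,
    target: str,
    name: str,
    payload: dict,
    started_at: datetime,
) -> Path:
    """Write `<output_dir>/<target>/<timestamp>/<name>.json` without overwriting."""
    scan_dir = output_dir / target / started_at.strftime("%Y-%m-%d_%H-%M-%S")
    scan_dir.mkdir(parents=True, exist_ok=True)
    path = scan_dir / f"{name}.json"
    # Mode "x" raises FileExistsError instead of replacing an existing report.
    with path.open("x", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    return path
```

Because the timestamp is part of the path, two scans of the same target land in different folders, and an accidental second write within one scan fails loudly instead of silently replacing data.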

---

## Change Detection

After each scan, XFinder compares the current scan with the most recent previous completed scan for the same target. The diff is persisted to `changes.json` and stored in the `summary` of `full_scan.json`.

Detected change types:

- **New / Removed Subdomains**
- **New Open Ports / Closed Ports**
- **Technology Changes** (added/removed techs per HTTP service)
- **DNS Changes** (per-subdomain record additions/removals)
- **Cloud Changes** (provider/CDN/WAF transitions)
- **New / Resolved Vulnerabilities** (by template ID + matched URL)
- **New / Removed API Endpoints**

Each category is summarized in a `summary` block with counts.
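
At its core, this kind of comparison is a set difference per category. The following is a simplified sketch, not the logic in `reports/json_export.py`: the function names and the exact shape of the output dict are assumptions. Items can be plain strings (subdomains) or tuples such as `(template_id, matched_url)` for vulnerabilities.

```python
from collections.abc import Hashable, Iterable


def diff_items(previous: Iterable[Hashable], current: Iterable[Hashable]) -> dict[str, list]:
    """Return what was added to and removed from `previous`, sorted for stable output."""
    prev, curr = set(previous), set(current)
    return {"added": sorted(curr - prev), "removed": sorted(prev - curr)}


def detect_changes(previous: dict[str, list], current: dict[str, list]) -> dict:
    """Diff every category found in either scan and attach a summary of counts."""
    changes: dict = {}
    for category in sorted(set(previous) | set(current)):
        changes[category] = diff_items(
            previous.get(category, []), current.get(category, [])
        )
    # The comprehension is evaluated before "summary" is added to the dict.
    changes["summary"] = {
        category: {"added": len(delta["added"]), "removed": len(delta["removed"])}
        for category, delta in changes.items()
    }
    return changes
```

For example, a previous scan with subdomains `a` and `b` and a current scan with `b` and `c` yields `c` as added and `a` as removed, with a count of one each in the summary.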

---

## Scheduler

XFinder uses APScheduler's `BackgroundScheduler` for recurring scans:

```python
from scheduler.scheduler import get_scheduler

sched = get_scheduler()
sched.start()
sched.schedule(
    target="example.com",
    scan_type="full",
    interval_minutes=60,
)
```

Features:
- **No `while True` loops**: uses APScheduler's own event loop
- **Coalesce + max_instances=1**: prevents overlapping runs of the same target
- **Misfire grace time = 300s**: recovers from short downtimes
- **Replace semantics**: scheduling the same (target, scan_type) replaces the existing job
- **Minimum interval = 5 minutes**: protects against accidental DoS of your own infrastructure
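
These guarantees map onto keyword arguments of APScheduler 3.x's `add_job`. The sketch below only builds those arguments, so it runs without APScheduler installed. The helper names and the job ID format are assumptions; how XFinder's `scheduler/scheduler.py` wires them up is not shown in this README.

```python
MIN_INTERVAL_MINUTES = 5


def job_id(target: str, scan_type: str) -> str:
    """One stable ID per (target, scan_type) pair, so rescheduling replaces the job."""
    return f"{scan_type}:{target}"


def build_job_kwargs(target: str, scan_type: str, interval_minutes: int) -> dict:
    """Return keyword arguments suitable for BackgroundScheduler.add_job (APScheduler 3.x)."""
    if interval_minutes < MIN_INTERVAL_MINUTES:
        raise ValueError(
            f"interval must be at least {MIN_INTERVAL_MINUTES} minutes, got {interval_minutes}"
        )
    return {
        "trigger": "interval",
        "minutes": interval_minutes,
        "id": job_id(target, scan_type),
        "replace_existing": True,   # same ID replaces the previous schedule
        "coalesce": True,           # collapse a backlog of missed runs into one
        "max_instances": 1,         # never run the same job concurrently
        "misfire_grace_time": 300,  # still run if up to 300 s late
    }
```

With a real scheduler, the call would look like `scheduler.add_job(run_scan, args=[target, scan_type], **build_job_kwargs(target, scan_type, 60))`, where `run_scan` is a placeholder for whatever function launches a scan.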

---

## Extending XFinder

### Adding a New Scanner

1. Subclass `BaseScanner`:

```python
# scanners/my_tool.py
from scanners.base import BaseScanner, ScanResult
from utils.helpers import run_subprocess

class MyToolScanner(BaseScanner):
    name = "my_tool"
    description = "Does something cool"
    required_tools = ["my_tool"]

    def run(self) -> ScanResult:
        res = run_subprocess(["my_tool", self.ctx.target],
                             timeout=self.ctx.timeout)
        if not res.ok:
            return ScanResult(
                scanner=self.name, success=False,
                duration_seconds=0.0, error=res.stderr,
            )
        return ScanResult(
            scanner=self.name, success=True,
            duration_seconds=0.0,
            data={"result": res.stdout},
        )
```

2. Register it in `scanners/registry.py`:

```python
from scanners.my_tool import MyToolScanner

SCANNERS = {
    # ... existing entries ...
    "my_scan": [SubfinderScanner, MyToolScanner],
}
SCAN_LABELS["my_scan"] = "My Custom Scan"
```

3. Add persistence logic in `scanners/engine.py::_persist_result` if you want DB storage.

That's it: the CLI menu and scheduler will pick up the new scan type automatically.

### Adding a New Enrichment Module

Create a new file in `enrichment/` with an `enrich(target)` function returning a dict. See `enrichment/shodan.py` for the pattern.
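
As an illustration of the `enrich(target)` contract, here is a hypothetical module that classifies an IP address offline using the standard `ipaddress` library. The file name `enrichment/ip_class.py` and the returned keys are invented for this example and may not match the conventions in `enrichment/shodan.py`.

```python
# enrichment/ip_class.py (hypothetical example module)
import ipaddress


def enrich(target: str) -> dict:
    """Classify an IP address without making any network calls."""
    try:
        ip = ipaddress.ip_address(target)
    except ValueError:
        # Return an error entry instead of raising, so one bad input
        # does not abort the rest of the enrichment pass.
        return {"target": target, "error": "not an IP address"}
    return {
        "target": target,
        "version": ip.version,
        "is_private": ip.is_private,
        "is_global": ip.is_global,
    }
```

Returning a dict in both the success and failure case keeps the caller's handling uniform.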

---

## Testing

XFinder ships with **96 automated tests**:

```bash
# Run all tests
python -m pytest tests/ -v

# With coverage
python -m pytest tests/ --cov=. --cov-report=term-missing
```

### Test Categories

| File                            | Coverage                                    |
| ------------------------------- | ------------------------------------------- |
| `test_validators.py`            | Domain/IP/URL validation                    |
| `test_helpers.py`               | Subprocess, JSON parsing, helpers           |
| `test_settings.py`              | Configuration loader                        |
| `test_cloud.py`                 | Cloud/CDN/WAF detection                     |
| `test_change_detection.py`      | Change diff logic                           |
| `test_scanners.py`              | BaseScanner + registry                      |
| `test_database.py`              | Repository layer (SQLite)                   |
| `test_scheduler.py`             | APScheduler integration                     |
| `test_engine_integration.py`    | End-to-end scan with mocked scanners        |
| `test_install.py`               | Dependency detection logic                  |

### Test Results

```text
======================= 96 passed, 33 warnings in 0.67s ========================
```

The warnings are deprecation notices from `datetime.utcnow()` (still functional in Python 3.13, scheduled for removal in a future version).
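
The usual replacement for `datetime.utcnow()` is a timezone-aware call. A minimal sketch follows; the helper name `utc_now` is an assumption, not something defined in XFinder.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Return the current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)
```

Unlike `utcnow()`, the result carries `tzinfo`, so it cannot be compared directly with naive datetimes already stored elsewhere; a migration would need to convert those values consistently.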

---

## Project Structure

```text
XFinder/
├── main.py                  # Entry point
├── cli.py                   # Rich CLI menu
├── config.py                # (Re-export of config.settings)
├── install.py               # Dependency verifier
├── requirements.txt
├── README.md
├── .env.example
├── .gitignore
│
├── config/
│   ├── __init__.py
│   ├── settings.py          # Pydantic settings loader
│   └── database.py          # SQLAlchemy engine + session scope
│
├── scanners/
│   ├── __init__.py
│   ├── base.py              # BaseScanner + ScanContext + ScanResult
│   ├── registry.py          # Scan-type → scanner-class mapping
│   ├── engine.py            # Orchestration engine
│   ├── subfinder.py
│   ├── dnsx.py
│   ├── httpx.py
│   ├── naabu.py
│   ├── nmap.py
│   ├── katana.py
│   └── nuclei.py
│
├── enrichment/
│   ├── __init__.py
│   ├── cloud.py             # Cloud/CDN/WAF detection
│   ├── asn.py               # ASN/org/country via Team Cymru DNS
│   ├── ssl.py               # SSL certificate metadata
│   ├── whois.py             # RDAP + WHOIS fallback
│   ├── shodan.py            # Shodan API
│   └── virustotal.py        # VirusTotal v3 API
│
├── database/
│   ├── __init__.py
│   ├── models.py            # 12 SQLAlchemy ORM models
│   └── repository.py        # Data-access layer (batched writes)
│
├── scheduler/
│   ├── __init__.py
│   └── scheduler.py         # APScheduler wrapper
│
├── reports/
│   ├── __init__.py
│   └── json_export.py       # Per-scan JSON + change detection
│
├── utils/
│   ├── __init__.py
│   ├── logger.py            # Rotating file + console logger
│   ├── helpers.py           # Subprocess, JSON, iteration helpers
│   └── validators.py        # Domain/IP/URL validation
│
├── tests/                   # 96 tests (unit + integration)
│   ├── conftest.py
│   ├── test_validators.py
│   ├── test_helpers.py
│   ├── test_settings.py
│   ├── test_cloud.py
│   ├── test_change_detection.py
│   ├── test_scanners.py
│   ├── test_database.py
│   ├── test_scheduler.py
│   ├── test_engine_integration.py
│   └── test_install.py
│
├── docs/
│   ├── USER_GUIDE.md
│   ├── TROUBLESHOOTING.md
│   ├── ARCHITECTURE.md      # This README's architecture section, expanded
│   └── architecture_diagram.md
│
├── samples/
│   ├── scan_examples/       # Sample JSON output
│   └── db_records/          # Sample DB row examples
│
└── logs/                    # Runtime logs (auto-created)
```

---

## Roadmap

- [ ] Alembic migration scripts for production schema evolution
- [ ] Web UI (FastAPI + React) for browsing scan history
- [ ] Slack/Discord/Teams alerts on new vulnerabilities
- [ ] CVSS-based risk scoring (currently disabled per spec)
- [ ] Multi-target batch scans from a CSV file
- [ ] GraphQL API for programmatic access
- [ ] Docker image (for users who want it, despite the no-Docker spec)
- [ ] Plugin marketplace (install community scanners via pip)

---

## License

MIT License. See [LICENSE](LICENSE) for details.

---

## Disclaimer

XFinder is intended for **authorized security testing only**. Always obtain written permission before scanning infrastructure you do not own or operate. The authors are not responsible for misuse of this tool.