# MCATester: AI-Powered OSINT & Vulnerability Discovery Platform

> Built during a security research internship at the National e-Governance Division (NeGD), MeitY, New Delhi.

[![Python](https://img.shields.io/badge/Python-3.12-blue)](https://python.org)
[![FastAPI](https://img.shields.io/badge/FastAPI-0.100%2B-green)](https://fastapi.tiangolo.com)
[![Groq](https://img.shields.io/badge/AI-Groq%20LLaMA%203.3%2070B-orange)](https://groq.com)
[![License](https://img.shields.io/badge/License-MIT-yellow)](LICENSE)

MCATester is a full-stack OSINT and vulnerability discovery platform that turns passive reconnaissance into **confirmed, zero-false-positive security findings**, with an AI decision layer that makes the scanner adaptive rather than just automated.

---

## The core problem it solves

Most scanners produce noise. Running gobuster + nikto + sqlmap on a real target produces hundreds of raw results that take hours to filter manually. MCATester produces clean findings: a SQLi finding means the database actually executed a sleep command, and an XSS finding means the payload was reflected unescaped in the HTML response.

**On mca.gov.in (before vs after noise reduction):**

```
First version:   68 findings, 61 false positives (all 403 responses)
Current version: 11 findings, 0 false positives
```

The key insight: 403 responses are ambiguous. A WAF returning 403 on `/admin` doesn't mean `/admin` exists. Content confirmation, which checks what the 403 response body actually contains, eliminates this entire class of false positive.
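
The idea can be sketched as a small classifier. This is a minimal illustration, not the project's actual `content_confirmation.py`: the pattern lists, the category name, and the `classify_response` function are all assumptions made for the example, and the real tool is described as using 35 signatures.

```python
import re

# Hypothetical patterns for generic WAF/CDN block pages (assumed, not from the project).
WAF_BLOCK_PATTERNS = [
    re.compile(r"access denied", re.I),
    re.compile(r"request blocked", re.I),
    re.compile(r"web application firewall", re.I),
]

# Hypothetical per-category signatures that prove the resource itself was served.
SIGNATURES = {
    "swagger": [re.compile(r'"(swagger|openapi)"\s*:')],
}


def classify_response(category: str, status: int, body: str) -> str:
    """Label a response; only 'confirmed' would become a finding."""
    if status == 403:
        # A 403 alone proves nothing: separate block pages from everything else,
        # and report neither.
        if any(p.search(body) for p in WAF_BLOCK_PATTERNS):
            return "waf_block"
        return "ambiguous"
    if any(p.search(body) for p in SIGNATURES.get(category, [])):
        return "confirmed"
    return "unconfirmed"
```

Under this sketch the status code never confirms anything by itself; a finding needs a body signature tied to the resource type.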

---

## Real findings โ€” Ministry of Corporate Affairs, India

Discovered during authorized research on `mca.gov.in`:

```
CRITICAL  CVE-2023-27997  CVSS 9.8
          vpnv3.mca.gov.in:4111 - Fortinet SSL VPN pre-auth heap overflow
          Unauthenticated remote code execution, no credentials required

CRITICAL  CVE-2022-40684  CVSS 9.8
          Fortinet authentication bypass - full admin access without credentials
          Affected: FortiOS 7.0.0-7.0.6, 7.2.0-7.2.1

CRITICAL  CVE-2018-13379  CVSS 9.1
          Fortinet path traversal - VPN session credentials readable
          via /remote/fgt_lang without authentication

HIGH      Unauthenticated File-Serving API
          pminternship.mca.gov.in/mca-api/files/get-file-by-path
          No auth required to request arbitrary file paths

HIGH      CVE-2023-24486  CVSS 8.8
          GroupWise WebAccess XSS + session hijack
          mail.mca.gov.in - active groupware installation
```

Responsibly disclosed to CERT-In (`incident@cert-in.org.in`) with a full PDF report.

---

## Confirmed findings on demo.testfire.net (deliberately vulnerable lab)

```
CRITICAL  SQL Injection - Time-based blind (PostgreSQL confirmed)
          URL    : http://demo.testfire.net/search.jsp
          Payload: '; SELECT pg_sleep(3)--
          Evidence: 3.8s response vs 0.6s baseline

CRITICAL  Swagger/OpenAPI UI exposed publicly
          URL    : http://demo.testfire.net/swagger/properties.json
          Email leaked: jsmtih@altoromutual.com

HIGH      Reflected XSS
          URL    : http://demo.testfire.net/search.jsp
          Payload:  reflected unescaped in HTML response

[AI-Agent] Risk: CRITICAL (score: 9.5/10)
[AI-Agent] → Remove public Swagger access
[AI-Agent] → Patch CVE-2025-24813 (Tomcat partial PUT RCE, CVSS 9.8)
[AI-Agent] → Fix SQLi with parameterized queries
```

---

## Architecture

```
┌────────────────────────────────────────────────────────────────┐
│                  MCATester: 16-Stage Pipeline                  │
│                                                                │
│  Stage 1   DNS + Whois + Subdomain Enum (crt.sh/VT/HT)         │
│  Stage 2   Recursive Asset Discovery (parallel, 20 threads)    │
│  Stage 3   Subdomain Takeover Detection (20 services)          │
│  Stage 4   Threat Intel (urlscan / AbuseIPDB / OTX)            │
│  Stage 5   Tech Stack Detection (WhatWeb + headers)            │
│  Stage 6   AI Context Injector: Gemini generates dork queries  │
│  Stage 7   Google Dorking (20+ categories, DDG + Serper)       │
│  Stage 8   Fetch + Content Confirmation (35 patterns)          │
│  Stage 9   Active Probing + WAF Detection                      │
│  Stage 10  Attack Chain Orchestrator                           │
│  Stage 11  Header Security Analysis                            │
│  Stage 12  Payload Injection (SQLi / XSS / Traversal)          │
│  Stage 13  CVE Correlation + NVD Enrichment                    │
│  Stage 14  AI Decision Engine (Groq) ← 5 decision points       │
│  Stage 15  Gemini Report Generation                            │
│  Stage 16  PDF Export + Webhook Alerts                         │
│                                                                │
│  FastAPI backend + SQLite + Real-time dashboard                │
└────────────────────────────────────────────────────────────────┘
```

The AI Decision Engine (Stage 14) is not just report-writing. It makes actual decisions at 5 points: target triage, URL prioritization, injection targeting, CVE exploitability assessment, and final risk ranking.

---

## How MCATester compares

| Capability | MCATester | Nikto | gobuster | Burp Suite Free |
|---|:---:|:---:|:---:|:---:|
| Zero false positives | ✓ | ✗ | ✗ | Manual |
| CVE correlation | ✓ | Partial | ✗ | ✗ |
| SQLi/XSS confirmation | ✓ | ✗ | ✗ | Manual |
| Subdomain takeover | ✓ | ✗ | ✗ | ✗ |
| AI risk scoring | ✓ | ✗ | ✗ | ✗ |
| Attack chain orchestration | ✓ | ✗ | ✗ | Manual |
| Real-time dashboard | ✓ | ✗ | ✗ | ✓ |
| PDF report | ✓ | ✗ | ✗ | Pro only |
| Drift detection | ✓ | ✗ | ✗ | ✗ |
| Webhook alerts | ✓ | ✗ | ✗ | ✗ |

---

## Features

### Passive Recon
- DNS (A/MX/NS/TXT/SOA/Reverse), Whois
- Subdomain enumeration: VirusTotal, crt.sh, HackerTarget, sublist3r (45+ subdomains found on real targets)
- GitHub recon: repositories referencing the target
- Threat intelligence: urlscan.io, AbuseIPDB, OTX AlienVault
- Email discovery + Holehe registration checking (400+ sites)
- IP intelligence: ASN, ISP, geolocation

### Active Discovery
- Parallel recursive asset scanning (28 assets in 3 min)
- WhatWeb tech stack fingerprinting
- 66+ path probes with WAF detection
- AI-targeted probing: Gemini generates paths specific to the detected tech stack
- Subdomain takeover detection (20 services: GitHub Pages, Heroku, Netlify, Vercel, AWS S3, Azure, Shopify, HubSpot, Zendesk...)
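
Subdomain takeover checks of this kind usually pair a CNAME target with an "unclaimed resource" marker in the response body. The sketch below shows that matching step only; the `FINGERPRINTS` table and `detect_takeover` function are illustrative assumptions covering three services, not the project's `subdomain_takeover.py`, and the DNS lookup and HTTP fetch are left to the caller.

```python
from typing import Optional

# Illustrative fingerprints: (CNAME suffix, body marker shown for unclaimed resources).
FINGERPRINTS = {
    "GitHub Pages": (".github.io", "There isn't a GitHub Pages site here"),
    "Heroku": (".herokuapp.com", "No such app"),
    "AWS S3": (".s3.amazonaws.com", "NoSuchBucket"),
}


def detect_takeover(cname: str, body: str) -> Optional[str]:
    """Return the service name if the CNAME looks dangling, else None."""
    cname = cname.rstrip(".").lower()  # DNS answers often carry a trailing dot
    for service, (suffix, marker) in FINGERPRINTS.items():
        # Both conditions are required: the right provider AND its error page.
        if cname.endswith(suffix) and marker in body:
            return service
    return None
```

Requiring both the provider suffix and the provider's own error text keeps a stray "NoSuchBucket" string on an unrelated host from being flagged.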

### Vulnerability Confirmation
- **SQL Injection**: error-based + time-based blind, auto-detects MySQL/MSSQL/PostgreSQL
- **Reflected XSS**: safe payload reflection detection
- **Path Traversal**: file API parameter testing with content confirmation
- **WAF pre-check**: if the WAF blocks all pages, skip injection (saves ~3 min on hardened targets)
- **Content Confirmation**: 35 signatures, kills 403 false positives
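
Two of these confirmation rules reduce to small predicates. The following is a hedged sketch rather than the code in `payload_injector.py`: the function names, the median baseline, and the 0.5 s tolerance are assumptions chosen for illustration.

```python
import html
from statistics import median
from typing import Sequence


def is_time_based_sqli(baseline: Sequence[float], injected: Sequence[float],
                       sleep_seconds: float, tolerance: float = 0.5) -> bool:
    """True if every injected request was delayed by roughly the sleep duration."""
    base = median(baseline)  # median resists a single slow baseline request
    return all(t - base >= sleep_seconds - tolerance for t in injected)


def is_reflected_unescaped(payload: str, body: str) -> bool:
    """True if a payload containing HTML metacharacters appears verbatim in the body."""
    # If escaping would not change the payload, its reflection proves nothing.
    if html.escape(payload) == payload:
        return False
    return payload in body


# Timings reported for demo.testfire.net: 3.8s response vs 0.6s baseline, pg_sleep(3).
print(is_time_based_sqli([0.6], [3.8], sleep_seconds=3.0))
```

Repeating the injected request several times and passing all timings in `injected` makes the check harder to fool with one slow response.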

### Attack Chain Orchestrator
When one finding is confirmed, automatically fires follow-up probes:
- Swagger found → probe 12 API endpoints
- VPN login found → probe Fortinet-specific paths
- File API found → test 10 traversal payloads (deduplicated by base endpoint)
- Webmail found → probe 7 credential paths
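
One way to picture these rules is a lookup table from finding type to follow-up probe, with deduplication on the base endpoint. The names below (`FOLLOW_UPS`, `plan_follow_ups`, the probe labels) are hypothetical and stand in for whatever `orchestrator.py` actually does.

```python
from typing import List, Tuple
from urllib.parse import urlsplit

# Hypothetical mapping from a confirmed finding type to its follow-up probe.
FOLLOW_UPS = {
    "swagger": "api_endpoint_probe",
    "vpn_login": "fortinet_path_probe",
    "file_api": "traversal_probe",
    "webmail": "credential_path_probe",
}


def base_endpoint(url: str) -> str:
    """Drop the query string and fragment so variants of one endpoint collapse."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}{parts.path}"


def plan_follow_ups(findings: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (finding_type, url) pairs into an ordered, deduplicated probe plan."""
    seen = set()
    plan = []
    for kind, url in findings:
        probe = FOLLOW_UPS.get(kind)
        if probe is None:
            continue  # no chain defined for this finding type
        step = (probe, base_endpoint(url))
        if step not in seen:
            seen.add(step)
            plan.append(step)
    return plan
```

Deduplicating on the base endpoint means two URLs that differ only in their query string trigger one round of traversal payloads instead of two.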

### CVE Intelligence
- Static knowledge base: Fortinet, Lotus Domino, GroupWise, Tomcat, Apache, nginx, WordPress, PHP
- NVD API enrichment for confirmed CVEs
- Auto-matches detected tech stack to CVE database
- AI exploitability assessment with confidence levels
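
Matching a detected version against a static knowledge base comes down to a range comparison on parsed version numbers. The sketch below assumes a simple entry layout (`KNOWLEDGE_BASE`, `match_cves` are made-up names); the single entry uses the FortiOS ranges quoted earlier in this README for CVE-2022-40684.

```python
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """'7.0.10' -> (7, 0, 10), so comparison is numeric rather than lexical."""
    return tuple(int(part) for part in text.split("."))


# Assumed entry layout; affected ranges are inclusive on both ends.
KNOWLEDGE_BASE = [
    {
        "cve": "CVE-2022-40684",
        "product": "fortios",
        "ranges": [("7.0.0", "7.0.6"), ("7.2.0", "7.2.1")],
    },
]


def match_cves(product: str, version: str) -> List[str]:
    """Return CVE IDs whose affected ranges include the detected version."""
    detected = parse_version(version)
    hits = []
    for entry in KNOWLEDGE_BASE:
        if entry["product"] != product.lower():
            continue
        for low, high in entry["ranges"]:
            if parse_version(low) <= detected <= parse_version(high):
                hits.append(entry["cve"])
                break
    return hits
```

Comparing tuples of integers avoids the string-comparison trap where `"7.0.10"` sorts before `"7.0.6"`.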

### AI Decision Engine (Groq)
5 real decisions per scan, not just report formatting:

```
Decision 1: Target triage
  → Classifies as government/enterprise/SaaS
  → Identifies high-value subdomains to prioritize

Decision 2: URL ranking
  → Ranks 40+ discovered URLs by exploitation potential
  → VPN login page > generic content page

Decision 3: Injection targeting
  → Selects which pages are worth injection testing
  → Skips pages with no injectable parameters

Decision 4: CVE exploitability
  → Assesses if correlated CVEs are likely exploitable
  → Considers service accessibility + version ranges

Decision 5: Final risk assessment
  → Risk score (0-10)
  → Executive summary (2-3 sentences for management)
  → Technical summary (attack vectors for security team)
  → Specific immediate actions
```

### Dashboard & Reporting
- Real-time web dashboard: live scan status, severity donut, risk trend chart
- Attack Chains page: findings grouped by CVE Intelligence / Active Exploitation / Infrastructure
- Alerts page: all CRITICAL/HIGH findings across all scans, grouped by target with timestamps
- Drift detection: scan-over-scan comparison, flags new/resolved/changed findings
- PDF report: VAPT-style with findings, evidence, CVSS scores, remediation steps
- Webhooks: Slack, Discord, Telegram for HIGH+ findings
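
Drift detection as described above is essentially a keyed diff between two scans. A minimal sketch follows, assuming each finding is a dict with `url`, `title`, and `severity` fields; that schema and the `diff_scans` name are assumptions, not the format used by `delta_detection.py`.

```python
from typing import Dict, List, Tuple

Finding = Dict[str, str]


def finding_key(finding: Finding) -> Tuple[str, str]:
    """Identify a finding by where it is and what it is, not by its severity."""
    return (finding["url"], finding["title"])


def diff_scans(previous: List[Finding], current: List[Finding]) -> Dict[str, List[Finding]]:
    """Classify findings as new, resolved, or changed between two scans."""
    prev = {finding_key(f): f for f in previous}
    curr = {finding_key(f): f for f in current}
    return {
        "new": [curr[k] for k in curr if k not in prev],
        "resolved": [prev[k] for k in prev if k not in curr],
        # Same finding in both scans, but its severity moved.
        "changed": [curr[k] for k in curr
                    if k in prev and curr[k]["severity"] != prev[k]["severity"]],
    }
```

Keeping severity out of the key is what lets a re-scored finding show up as "changed" rather than as one resolved finding plus one new one.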

---

## Installation

**Requirements:** Python 3.10+, Linux or WSL2, nmap

```bash
git clone https://github.com/yourusername/MCATester.git
cd MCATester

python -m venv venv
source venv/bin/activate

pip install -r requirements.txt

# Optional but improves results significantly
pip install groq
sudo apt install nmap whatweb
```

### API Keys (`.env`)

```bash
cp .env.example .env
# Edit .env with your keys
```

| Key | Where to get | Cost |
|---|---|---|
| `GEMINI_API_KEY` | aistudio.google.com | Free (15 req/min) |
| `SERPER_API_KEY` | serper.dev | Free (2500/month) |
| `VIRUSTOTAL_API_KEY` | virustotal.com | Free (500/day) |
| `GROQ_API_KEY` | console.groq.com | Free (fast) |
| `SHODAN_API_KEY` | shodan.io | $49/year |

---

## Usage

### CLI

```bash
# Full scan: all 16 stages
python osint_agent.py mca.gov.in

# Passive only: no active probing or injection
python osint_agent.py mca.gov.in --passive

# Skip recursive discovery (faster: ~5 min vs ~10 min)
python osint_agent.py mca.gov.in --no-recursive
```

### Dashboard

```bash
python server.py
# Open http://localhost:8000
```

Enter domain → Start Scan → watch results populate in real time.

---

## Scan performance

```
mca.gov.in (45 subdomains, WAF protected):
  Total time    : ~10 minutes
  Findings      : 11 (zero false positives)
  False positives: 0 (was 61 in v1)

demo.testfire.net (no WAF, vulnerable):
  Total time    : ~12 minutes
  Findings      : 15 (confirmed SQLi + XSS + CVEs)

Time breakdown (approximate):
  Subdomain enum        : 2 min  (crt.sh + VirusTotal sequential)
  Recursive discovery   : 3 min  (28 assets parallel)
  Dorking               : 2 min  (DDG + Serper)
  Payload injection     : 0 min  (WAF pre-check skips on mca.gov.in)
                          3 min  (full testing on demo.testfire.net)
  CVE + AI decisions    : 1 min  (5 Groq calls)
  Other stages          : 2 min
```

---

## Project structure

```
MCATester/
├── osint_agent.py           # Main pipeline: 16 stages, CLI entry
├── server.py                # FastAPI backend: scan management + API
├── orchestrator.py          # Attack chain engine
├── ai_decision_engine.py    # Groq LLM: 5 decision points per scan
├── ai_context_injector.py   # Gemini: targeted dork + path generation
├── cve_correlation.py       # CVE matching + NVD API enrichment
├── payload_injector.py      # SQLi/XSS/traversal with WAF pre-check
├── subdomain_takeover.py    # Dangling CNAME: 20 services
├── recursive_discovery.py   # Parallel subdomain + port scanner
├── delta_detection.py       # Scan-over-scan diff
├── content_confirmation.py  # 35-pattern false-positive eliminator
├── webhooks.py              # Slack/Discord/Telegram alerts
├── osint_features.py        # PDF report generator
├── osint_identity.py        # IP intel + Holehe
├── search.py                # DDG/Serper wrapper
├── static/
│   └── index.html           # Real-time dashboard SPA
├── requirements.txt
├── .env.example
└── README.md
```

---

## Responsible use

**Only test systems you own or have explicit written permission to test.**

Built-in safety measures:
- Warning banner on every CLI run
- `--passive` mode disables all active testing
- Injection payloads are read-only diagnostics, with no write operations
- Rate limiting (1s between requests)
- WAF pre-check skips injection when target is hardened
- 403 responses never reported as findings

For disclosures in India, contact CERT-In at `incident@cert-in.org.in`.

---

## Tech stack

| Layer | Technology |
|---|---|
| Pipeline | Python 3.12 |
| Backend API | FastAPI + SQLite |
| Frontend | Vanilla JS + CSS custom properties |
| AI decisions | Groq (llama-3.3-70b-versatile) |
| AI context | Google Gemini 2.5 Flash |
| PDF generation | ReportLab |
| Port scanning | Shodan InternetDB + nmap fallback |
| Tech detection | WhatWeb + header inference |
| Subdomain data | crt.sh + VirusTotal + HackerTarget |

---

## Roadmap

- [ ] Screenshot capture: Playwright screenshots of all discovered assets
- [ ] Scheduled scanning: 24h autonomous monitoring with drift alerts
- [ ] nuclei integration: template-based CVE confirmation
- [ ] Multi-target mode: scan an entire organization at once
- [ ] SARIF export: GitHub Security tab integration

---

## Author

**SANKARAYOUGI SRIVASTESWAR**, B.Tech Computer Science, VIT-AP University  
Security research intern, National e-Governance Division (NeGD), MeitY, New Delhi

---

*For authorized security testing and research only. The author is not responsible for misuse.*