Exploit for CVE-2026-7482

2026-05-05 | CVSS 9.1
Source: https://sploitus.com/exploit?id=5211335D-7270-5C40-9861-07AB5B8CE981
# CVE-2026-7482: Ollama Heap Out-of-Bounds Read (1-Day PoC)

This repository contains a 1-day Proof of Concept (PoC) exploitation chain for **CVE-2026-7482**, an unauthenticated Out-of-Bounds (OOB) Read vulnerability in Ollama's GGUF model loader (versions prior to 0.17.1). 

**Note:** This is a 1-day research reproduction. I did not discover the original CVE. This PoC was engineered based on the public advisory details to demonstrate the mechanics of the vulnerability for educational and defensive research purposes.
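
For defenders, the first question is whether a given deployment falls in the affected range. The sketch below compares a version string against the fixed release named above (0.17.1). The helper names are hypothetical, and how you obtain the version string (for example, from the installed binary or your package inventory) is left as an assumption.

```python
# Minimal sketch: decide whether an Ollama version string predates the fix.
# FIXED_VERSION comes from the advisory range quoted in this README;
# parse_version and is_affected are illustrative helpers, not part of Ollama.

FIXED_VERSION = (0, 17, 1)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.17.1' or 'v0.17.1-rc2' into a tuple of ints such as (0, 17, 1).

    Pre-release and build suffixes are dropped, so treat pre-release
    builds of the fixed version with caution.
    """
    core = text.strip().lstrip("v").split("-", 1)[0].split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_affected(version_text: str) -> bool:
    """Return True when the version is older than the fixed release."""
    return parse_version(version_text) < FIXED_VERSION


if __name__ == "__main__":
    for candidate in ("0.16.3", "0.17.0", "0.17.1", "0.18.0"):
        status = "affected" if is_affected(candidate) else "patched"
        print(f"{candidate}: {status}")
```

Tuple comparison keeps the check dependency-free. If your environment already ships a version-parsing library, that is likely the more robust choice.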

## Vulnerability Overview
By supplying a maliciously crafted, truncated GGUF file to the `/api/create` endpoint, an attacker can force the quantization parser in `fs/ggml/gguf.go` and `server/quantization.go` to read past the allocated heap buffer. The leaked memory is then exfiltrated by pushing the resulting model artifact to an attacker-controlled Docker registry via the `/api/push` endpoint.

## Technical Details: The Exploit Primitive
During this 1-day research, reproducing the crash was trivial. Achieving stable exfiltration without crashing the server or tripping API validation required crafting the file in four specific ways:

1. **Frontend Validation Bypass:** The payload must be tagged as `F16` (`general.file_type = 1`) to satisfy the Ollama API's strict pre-flight checks.
2. **Quantizer Coercion:** We request a `Q4_K_M` down-quantization. Because the payload is seen as F16, the C++ `ggml` backend is forced to process the payload rather than performing a safe, 1:1 memory copy.
3. **Perfect Block Alignment:** The target tensor (`token_embd.weight`) must be shaped as a 2D matrix where the innermost dimension is exactly `256` (e.g., `[num_rows, 256]`). This strictly aligns with `Q4_K_M` block requirements, preventing the backend from skipping the layer.
4. **Physical Truncation:** The file on disk is truncated to 32 bytes. When the quantization loop runs, it reaches the end of the real data and keeps reading into the adjacent heap space.
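
From a defensive angle, the four conditions above come down to one missing invariant: the bytes a tensor declares in its header must actually be present in the file. The sketch below shows that bounds check in isolation. It assumes the shape, element width, and data offset have already been parsed from the header. The function names and the example row count are hypothetical, and this is not a description of Ollama's actual patch.

```python
# Minimal sketch of a declared-size versus physical-size check for one tensor.
# All names and the example numbers are illustrative assumptions.
from math import prod

F16_BYTES_PER_ELEMENT = 2  # float16 stores each element in two bytes


def declared_tensor_bytes(shape, bytes_per_element=F16_BYTES_PER_ELEMENT):
    """Number of bytes the header claims this tensor occupies."""
    if any(dim < 0 for dim in shape):
        raise ValueError("tensor dimensions must be non-negative")
    return prod(shape) * bytes_per_element


def fits_in_file(data_offset, shape, file_size,
                 bytes_per_element=F16_BYTES_PER_ELEMENT):
    """True only if the whole declared tensor lies inside the file."""
    end = data_offset + declared_tensor_bytes(shape, bytes_per_element)
    return 0 <= data_offset and end <= file_size


if __name__ == "__main__":
    shape = (1024, 256)  # hypothetical [num_rows, 256] F16 tensor
    print("declared bytes:", declared_tensor_bytes(shape))
    print("fits in a 32-byte file:", fits_in_file(0, shape, 32))
```

A loader that rejects any tensor failing this check before handing the buffer to the quantizer never reaches the over-read, regardless of the requested quantization type.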

## Prerequisites
```bash
pip install requests numpy gguf
```
You also need a publicly accessible HTTP listener (like Ngrok) to catch the exfiltrated Docker layer pushes.

## Usage

**1. Start the Rogue Registry**
Start the listener to catch the leaked memory blobs.
```bash
sudo python3 registry.py
```

**2. Forge the Malicious Payload**
Generate the truncated GGUF file. You can adjust `TARGET_LEAK_SIZE_MB` inside the script to control how much heap memory is scraped per request. (Recommended: 0.5MB to 2.0MB to avoid segfaulting unmapped pages).
```bash
python3 forge.py
```

**3. Fire the Exploit**
Edit `exploit.py` to include your target IP and your rogue registry URL, then execute:
```bash
python3 exploit.py
```

**4. Analyze the Artifact**
The registry will drop the leaked heap dumps into the `exfils/` directory. 

*Note on Data Integrity (The Quantization Trap):* Although the exploit captures and exfiltrates up to several megabytes of server heap memory, that data passes through Ollama's `Q4_K_M` down-quantization during the OOB read. The backend interprets the raw memory bytes as `float16` values and applies a lossy 4-bit block compression scheme, so the original bytes cannot be reconstructed exactly. Standard ASCII extraction tools will yield binary garbage, which makes plaintext credential recovery impractical via this specific coercion path.
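
The loss described in the note can be illustrated with a toy round trip. The sketch below reinterprets an ASCII string as `float16` values, squeezes them through a deliberately simplified 4-bit block quantizer (one scale per block), and converts the result back to bytes. This is an assumption-laden stand-in, not the real `Q4_K_M` format, which is more elaborate. The sample string is made up and chosen so that every value is finite.

```python
# Toy demonstration: bytes -> float16 -> 4-bit codes -> float16 -> bytes.
# The quantizer here is a simplified stand-in, NOT the actual Q4_K_M scheme.
import struct


def bytes_to_f16(raw: bytes) -> list[float]:
    """Reinterpret little-endian byte pairs as float16 values."""
    count = len(raw) // 2
    return list(struct.unpack(f"<{count}e", raw[:count * 2]))


def f16_to_bytes(values: list[float]) -> bytes:
    """Pack floats back into little-endian float16 bytes."""
    return struct.pack(f"<{len(values)}e", *values)


def quantize_4bit(values: list[float]) -> tuple[float, list[int]]:
    """Map each value to a signed 4-bit code using one shared scale."""
    peak = max(abs(v) for v in values)
    scale = peak / 7 if peak else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, codes


def dequantize_4bit(scale: float, codes: list[int]) -> list[float]:
    """Rebuild approximate values from the codes and the shared scale."""
    return [code * scale for code in codes]


original = b"password=hunter2"  # made-up sample data
scale, codes = quantize_4bit(bytes_to_f16(original))
recovered = f16_to_bytes(dequantize_4bit(scale, codes))

if __name__ == "__main__":
    print("original :", original)
    print("recovered:", recovered)
    print("identical:", recovered == original)
```

Each block keeps only sixteen possible levels per value, so many distinct byte pairs collapse onto the same code and the original text cannot be read back out.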

## Disclaimer
This project is for educational and authorized vulnerability research purposes only. Do not use this tool against systems you do not own or have explicit permission to test.