# CVE-2023-24329: Parser Differential Lab

> **Educational use only.** This lab exists to demonstrate a real vulnerability in a safe, isolated environment. Never run this against systems you do not own. Never reuse the intentionally broken filter code in any production system.

A self-contained Docker lab demonstrating **CVE-2023-24329**, a parser differential in Python's `urllib.parse.urlparse()` that allows bypass of URL scheme and host filters on **Python ≤ 3.11.3**.

```bash
cd CVE-2023-24329-lab
```

### Beat 1 & 2: Vulnerable stack

```bash
# Start the vulnerable stack
docker compose -f docker-compose.vulnerable.yml up --build -d

# Beat 1: show the filter working
docker compose -f docker-compose.vulnerable.yml exec attacker python exploit.py baseline

# Beat 2: bypass the filter (local file read + SSRF)
docker compose -f docker-compose.vulnerable.yml exec attacker python exploit.py exploit
```

### Beat 3: Patched stack

```bash
# Swap to the patched interpreter
docker compose -f docker-compose.vulnerable.yml down
docker compose -f docker-compose.fixed.yml up --build -d

# Beat 3: confirm the patch holds
docker compose -f docker-compose.fixed.yml exec attacker python exploit.py verify
```

### Tear down

```bash
docker compose -f docker-compose.fixed.yml down
```

---

## Directory Layout

```
CVE-2023-24329-lab/
├── docker-compose.vulnerable.yml   # Python 3.11.3 (affected)
├── docker-compose.fixed.yml        # Python 3.11.4 (patched)
├── vulnerable-api/
│   ├── app.py                      # Flask API with the naive filter
│   ├── requirements.txt
│   └── Dockerfile
├── internal-service/
│   ├── app.py                      # Fake internal metadata endpoint
│   ├── requirements.txt
│   └── Dockerfile
└── attacker/
    ├── exploit.py                  # Demo driver (baseline / exploit / verify)
    ├── requirements.txt
    └── Dockerfile
```

The vulnerable and fixed API services share the **same source code**; only the base image's Python version differs. This is the key scientific-control property of the lab.

---

## How the Bypass Works

The vulnerable API filter (simplified):

```python
parsed = urllib.parse.urlparse(url)

if parsed.scheme.lower() in {"file", "gopher", "ftp", "data"}:
    return 403  # blocked

if parsed.hostname in {"localhost", "127.0.0.1", "internal-service"}:
    return 403  # blocked

urllib.request.urlopen(url)  # fetch the original, unmodified string
```

The bypass payload is a **single leading space**:

```
 file:///etc/passwd
^
space (0x20)
```

On Python ≤ 3.11.3, `urlparse` sees an empty scheme and no hostname → filter passes. `urlopen` strips the space → fetches `file:///etc/passwd`.

On Python ≥ 3.11.4, `urlparse` strips the space first → correctly sees `scheme=file` → filter blocks with `403`.
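
The disagreement can be observed directly without fetching anything. The sketch below is not part of the lab's code, and `compare_views` is a hypothetical helper name. It assumes that constructing a `urllib.request.Request` is a fair stand-in for what `urlopen` acts on: the constructor normalizes the URL and exposes the resulting scheme as `.type`, but performs no I/O.

```python
import urllib.parse
import urllib.request


def compare_views(url):
    """Return the scheme seen by a urlparse-based filter and by the fetcher."""
    # What a filter built on urlparse sees.
    filter_scheme = urllib.parse.urlparse(url).scheme
    # What urllib.request will act on. Building a Request normalizes the
    # URL but does not open any file or network connection.
    fetcher_scheme = urllib.request.Request(url).type
    return {
        "filter_scheme": filter_scheme,
        "fetcher_scheme": fetcher_scheme,
        "agree": filter_scheme == fetcher_scheme,
    }


if __name__ == "__main__":
    print(compare_views(" file:///etc/passwd"))
```

On an affected interpreter the payload should report an empty `filter_scheme` alongside a `fetcher_scheme` of `file`; on a patched interpreter the two should match.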

---

## The Fix (What Patched Python Does)

CPython [issue #102153](https://github.com/python/cpython/issues/102153): the fix strips C0 control characters and spaces from the start of the URL before parsing. After the patch, both the parser and the fetcher agree on what the URL is, so the filter cannot be bypassed this way.
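
The effect of that normalization can be approximated in a few lines. This is a sketch of the idea, not CPython's actual code; the constant and function names are made up for illustration, and the character set is assumed to be every code point from U+0000 through U+0020.

```python
import urllib.parse

# Assumed set: the C0 control characters (U+0000 to U+001F) plus space (U+0020).
C0_CONTROL_OR_SPACE = "".join(chr(cp) for cp in range(0x21))


def strip_leading_c0_and_space(url):
    """Drop leading C0 controls and spaces, leaving the rest untouched."""
    return url.lstrip(C0_CONTROL_OR_SPACE)


def scheme_after_stripping(url):
    """Scheme a urlparse-based filter sees once the prefix is removed."""
    return urllib.parse.urlparse(strip_leading_c0_and_space(url)).scheme
```

With the prefix removed first, `scheme_after_stripping(" file:///etc/passwd")` yields `file`, which is the value the blocklist needs to see.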

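Upgrading closes this specific gap. A defensive pattern that holds regardless of Python version is to allowlist the scheme, rebuild the URL from the parsed components, and pass the rebuilt URL downstream. Reconstruction alone is not enough: on affected versions `geturl()` returns the input with its leading space intact, so the check must reject any URL whose scheme the parser cannot see.

```python
import urllib.parse
import urllib.request

# Parse → allowlist the scheme → reconstruct from parts → pass the rebuilt URL downstream.
# A payload whose scheme the parser cannot see yields scheme == "" and is rejected.
parsed = urllib.parse.urlparse(url)
if parsed.scheme not in {"http", "https"}:
    raise ValueError("scheme not allowed")
safe_url = parsed.geturl()  # rebuilt from components
urllib.request.urlopen(safe_url)
```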
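
The pattern above can be extended into a reusable check that also allowlists hosts. A minimal sketch, assuming an application that only ever needs to reach a known set of hosts; `validate_outbound_url`, `ALLOWED_SCHEMES`, and `ALLOWED_HOSTS` are hypothetical names, and `example.com` is a placeholder.

```python
import urllib.parse

ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_HOSTS = {"example.com"}  # placeholder allowlist for illustration


def validate_outbound_url(url):
    """Return the rebuilt URL if it passes both allowlists, else raise ValueError."""
    parsed = urllib.parse.urlparse(url)
    # An unparseable or hidden scheme shows up as "" and fails this check.
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"scheme not allowed: {parsed.scheme!r}")
    # hostname is None when the parser found no network location.
    if parsed.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"host not allowed: {parsed.hostname!r}")
    # Hand back the string rebuilt from the components that were checked.
    return parsed.geturl()
```

Because the check is positive rather than a blocklist, a space-prefixed `file:` payload is rejected on affected and patched interpreters alike: either the scheme is empty or it is `file`, and neither is in the allowlist.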

---

## Key Teaching Points

1. **Parser differentials are a vulnerability class, not a one-off bug.** The same idea drives HTTP request smuggling, SAML confusion attacks, and log4j's `${jndi:...}` bypasses.
2. **Blocklists fail when the parser lies to you.** Enumerating bad inputs is a losing game.
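3. **Validate the string you actually fetch.** Check the URL rebuilt from the parsed components against an allowlist and pass that same string downstream, rather than validating one form and fetching another.
4. **One patch release, one tiny change, massive consequence.** The CPython fix is a handful of lines.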
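
The first teaching point suggests a general test strategy: feed the same inputs to both parsers and flag every case where they disagree. The sketch below applies that idea to URL prefixes. It is an illustration, not part of the lab; `find_scheme_disagreements` is a hypothetical name, and a `urllib.request.Request` object is assumed to represent what the fetcher would act on.

```python
import urllib.parse
import urllib.request


def find_scheme_disagreements(base_url, prefixes):
    """List prefixes for which urlparse and urllib.request see different schemes."""
    disagreements = []
    for prefix in prefixes:
        candidate = prefix + base_url
        parser_scheme = urllib.parse.urlparse(candidate).scheme
        try:
            # Constructing a Request normalizes the URL without fetching it.
            fetcher_scheme = urllib.request.Request(candidate).type
        except ValueError:
            fetcher_scheme = None  # the fetcher would refuse this URL outright
        if parser_scheme != fetcher_scheme:
            disagreements.append((prefix, parser_scheme, fetcher_scheme))
    return disagreements
```

Any entry in which the fetcher's scheme is one the filter meant to block marks a bypass candidate worth investigating.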

---

## References

- NVD: [CVE-2023-24329](https://nvd.nist.gov/vuln/detail/CVE-2023-24329)
- CPython issue: [github.com/python/cpython/issues/102153](https://github.com/python/cpython/issues/102153)
- Original disclosure: Yebo Cao (search "CVE-2023-24329 Yebo Cao")
- Broader class: PortSwigger, SSRF filter bypass techniques
- Broader class: James Kettle, HTTP Desync Attacks

---

## Guardrails

- Never expose `internal-service` ports to the host.
- Never run this lab on a machine connected to a production network.
- The filter code in `vulnerable-api/app.py` is **deliberately broken for teaching purposes**; do not copy it into any real system.

---

## License

MIT. Free to use, share, and adapt for educational purposes with attribution.