## https://sploitus.com/exploit?id=A6500AE2-6D3D-5689-B049-BFCE1470ED76
# Copy Fail (CVE-2026-31431) โ Kubernetes Container Escape PoC
A proof-of-concept demonstrating how a **fully unprivileged container** can achieve **node-level code execution** on Kubernetes by exploiting the CVE-2026-31431 Linux kernel page-cache corruption bug through shared container image layers.
> **Disclaimer:** This repository is published for educational and defensive purposes only. Use it exclusively on systems you own or have explicit authorization to test.
## Background
CVE-2026-31431 ("Copy Fail") is a Linux kernel vulnerability in the page-cache Copy-on-Write (CoW) path. An `AF_ALG` splice race allows an unprivileged process to corrupt the page-cache pages of a **read-only** file. The corruption persists in the kernel page cache and is visible to every process that subsequently reads or executes the file โ including processes in other containers or on the host.
For full details on the original vulnerability, see [copy.fail](https://copy.fail/).
## How It Works
The attack chain has three stages: **page-cache corruption**, **cross-container propagation**, and **privileged execution**.
### 1. Page-Cache Corruption via AF_ALG Splice Race
The kernel's `AF_ALG` (crypto) subsystem exposes a socket-based interface for userspace cryptographic operations. The exploit abuses a race condition in how the kernel handles `splice()` from a file into an AF_ALG socket:
1. Open the target binary (e.g. `/usr/sbin/ipset`) **read-only**.
2. Create an AF_ALG AEAD socket bound to `authencesn(hmac(sha256),cbc(aes))`.
3. Send a small payload chunk through the AF_ALG socket with `MSG_MORE`, telling the kernel to expect more data.
4. `splice()` the target file's contents from an fd โ pipe โ AF_ALG socket.
5. Due to the CoW bug, the kernel **writes the attacker's payload bytes into the target file's page-cache pages** instead of properly isolating them.
The exploit repeats this for each 4-byte window until the entire target binary's cached pages are overwritten with a custom payload.
No write permission to the file is needed. The file on disk is unchanged โ only the in-memory page cache is corrupted.
### 2. Cross-Container Propagation via Image Layer Sharing
Container runtimes (containerd, CRI-O) use overlay filesystems. When two containers share the same image layer, the kernel serves their file reads from the **same page-cache pages**.
This PoC image is built `FROM registry.k8s.io/kube-proxy:v1.35.2`. The kube-proxy DaemonSet on every Kubernetes node uses the exact same base layer. As a result, `/usr/sbin/ipset` in both containers maps to the **identical set of page-cache pages**.
When the unprivileged PoC container corrupts ipset's page cache, the corruption is immediately visible to the privileged kube-proxy container on the same node โ with zero cross-container communication.
### 3. Privileged Execution by kube-proxy
kube-proxy runs as a **privileged** DaemonSet with `hostNetwork: true`. It periodically invokes `/usr/sbin/ipset` to manage iptables/ipset rules. When it next executes ipset, the kernel loads the corrupted page-cache pages, executing the attacker's payload with kube-proxy's full privileges:
- Full root on the node
- All capabilities
- Access to host namespaces
The payload in this PoC (`payload/payload.c`) simply mounts the host root filesystem and writes a marker file to `/root/res` as proof of node-level code execution.
### Attack Flow Diagram
```
โโโโโโโโโโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ PoC Container โ โ kube-proxy Container โ
โ (unprivileged) โ โ (privileged) โ
โ โ โ โ
โ 1. Open /usr/sbin/ipset โ โ โ
โ (read-only) โ โ โ
โ โ โ โ
โ 2. AF_ALG splice race โ โ โ
โ corrupts page cache โ โ โ
โ โ โ โ โ
โโโโโโโโโโโโผโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ โ
โผ โ
โโโโโโโโโโโโโโโโโโโโโโโ โ
โ Kernel Page Cache โ โ
โ /usr/sbin/ipset โโโโโโโโโโโโโโโโโโโโโโโ
โ (CORRUPTED) โ 3. kube-proxy executes ipset
โ contains attacker's โ โ loads corrupted pages
โ payload bytes โ โ payload runs as root
โโโโโโโโโโโโโโโโโโโโโโโ on the host
```
## Repository Structure
```
.
โโโ cmd/copyfail/main.go # Entry point; embeds compiled payload
โโโ internal/
โ โโโ exploit/
โ โ โโโ exploit.go # Core exploit: AF_ALG splice race loop
โ โ โโโ patch.go # Splits payload into 4-byte patch windows
โ โโโ alg/
โ โโโ alg.go # AF_ALG AEAD socket abstraction
โโโ payload/
โ โโโ payload.c # Validation payload (mount host fs, write marker)
โ โโโ nolibc/ # Kernel's tiny libc for static, no-dependency payloads
โโโ deploy/
โ โโโ poc.yaml # Kubernetes Deployment manifest
โโโ Dockerfile # Built FROM kube-proxy to share image layers
โโโ Makefile # Build orchestration
โโโ docs/ # Validation evidence from ACK (Alibaba Cloud)
```
## Prerequisites
- Go 1.25+
- A cross-compiler for the nolibc payload (default: `x86_64-linux-gnu-gcc`)
- Docker / Buildx
- A Kubernetes cluster running kube-proxy as a DaemonSet with `imagePullPolicy: IfNotPresent` (the default)
- Linux kernel **before** the CVE-2026-31431 fix
## Building
```bash
# Build payload + Go binary
make build
# Build Docker image
make docker-build
# Build and push to GHCR
make docker-push IMAGE=ghcr.io//copy-fail-poc TAG=latest
```
For `arm64` targets:
```bash
make build CC=aarch64-linux-gnu-gcc GOARCH=arm64
```
## Usage
### Deploy the PoC
```bash
kubectl apply -f deploy/poc.yaml
```
The Deployment creates a single unprivileged pod. It:
1. Runs `/bin/copyfail -target /usr/sbin/ipset` to corrupt the page cache.
2. Sleeps indefinitely so the pod stays running for observation.
### Verify the Escape
After kube-proxy next executes ipset (this typically happens within seconds due to its reconciliation loop, or on its next restart), check the **node**:
```bash
# SSH into the node, or use a privileged debug pod
cat /root/res
# Expected output: [*] success
```
The presence of `/root/res` on the host filesystem proves that attacker-supplied code executed with node-level privileges โ written from inside kube-proxy's privileged container context.
### Clean Up
```bash
kubectl delete -f deploy/poc.yaml
# On the affected node(s), remove the marker and restart kube-proxy:
rm -f /root/res
systemctl restart kubelet # or delete the kube-proxy pod to force re-pull
```
## Why kube-proxy + ipset?
kube-proxy is an ideal target because it is:
1. **Present on every node** โ runs as a DaemonSet.
2. **Highly privileged** โ `privileged: true`, `hostNetwork: true`.
3. **Ships ipset in its image** โ ipset is a setuid binary used for iptables management.
4. **Uses `imagePullPolicy: IfNotPresent`** โ once the attacker's image is pulled and shares the same base layer, the overlay lower-dir pages are shared.
Any privileged DaemonSet whose image contains a predictable binary could be targeted the same way.
## Customizing the Payload
The default payload (`payload/payload.c`) is a validation-only program that writes a marker file. To build a custom payload:
1. Edit `payload/payload.c`. The program is built against `nolibc` (the kernel's minimal C library) for a static, dependency-free binary.
2. Run `make payload` to cross-compile.
3. The compiled payload is embedded into the Go binary via `//go:embed`.
## Affected Versions
- **Linux kernel**: All versions before the CVE-2026-31431 patch.
- **Kubernetes**: Any version using an unpatched node kernel. The vulnerability is in the kernel, not in Kubernetes itself. Kubernetes merely provides the execution context (shared image layers + privileged DaemonSets) that elevates the impact from local page-cache corruption to full container escape.
## Mitigation
- **Patch the kernel.** This is the definitive fix.
- **Enable image layer isolation.** Some runtimes support per-container filesystem snapshots that prevent page-cache sharing.
- **Use read-only root filesystems** for kube-proxy (does not fully mitigate, but limits payload capabilities).
- **Restrict pod scheduling** to prevent untrusted workloads from landing on nodes running privileged DaemonSets with shared base images.
## Credits
- **CVE-2026-31431 discovery and disclosure**: [Theori / Xint](https://copy.fail/)
- **Cross-platform C payload**: [Tony Gies](https://github.com/tgies/copy-fail-c) (LGPL-2.1-or-later OR MIT)
- **nolibc**: Linux kernel selftests (`tools/include/nolibc/`)
## License
The Go exploit code in this repository is provided as-is for research purposes.
The payload (`payload/payload.c`) is derived from [copy-fail-c](https://github.com/tgies/copy-fail-c) and is dual-licensed under **LGPL-2.1-or-later** OR **MIT**. See [LICENSE-LGPL](LICENSE-LGPL) and [LICENSE-MIT](LICENSE-MIT).