containerd: Insecure handling of image volumes  
containerd's cri plugin handles image volumes containing path traversals insecurely. This can be used to copy arbitrary host directories to a container-mounted path.  
OCI images contain a JSON config file described in the OCI image specification. As part of this config, an image can specify "Volumes", which describe 'where the process is likely to write data specific to a container instance' when the image is used to run a container.
When this configuration is converted into an OCI runtime config, containerd tries to follow the spec, which says:
"Implementations SHOULD provide mounts for these locations such that application data is not written to the container's root filesystem. If a converter implements conversion for this field using mountpoints, it SHOULD set the destination of the mountpoint to the value specified in Config.Volumes. An implementation MAY seed the contents of the mount with data in the image at the same location."
The seeding is implemented in (*criService).CreateContainer (cri/server/container_create.go)  
	var volumeMounts []*runtime.Mount
	if !c.config.IgnoreImageDefinedVolumes {
		// Create container image volumes mounts.
		volumeMounts = c.volumeMounts(containerRootDir, config.GetMounts(), ...)
	} else if len(image.ImageSpec.Config.Volumes) != 0 {
		...
	}
func (c *criService) volumeMounts(...) ... {
	var mounts []*runtime.Mount
	for dst := range config.Volumes {
		volumeID := util.GenerateID()
		src := filepath.Join(containerRootDir, "volumes", volumeID)
		mounts = append(mounts, &runtime.Mount{
			ContainerPath:  dst,
			HostPath:       src,
			SelinuxRelabel: true,
		})
	}
	return mounts
}
Image volume mounts are only supported if IgnoreImageDefinedVolumes is false. While the description of this flag says it is "Useful for better resource isolation, security…", the default is false, and none of the major containerd users seem to override it.
So in the default configuration, c.volumeMounts will be called to create new runtime.Mount entries for all Volumes listed in the image config. There is no validation of the listed paths, and the ContainerPath attribute is completely controlled by the image, and therefore by the attacker.
Later in the execution, the harmless HostPaths and the attacker-controlled ContainerPaths are passed to customopts.WithVolumes. While each HostPath is cleaned, the ContainerPath is passed through unchanged:
	if len(volumeMounts) > 0 {
		mountMap := make(map[string]string)
		for _, v := range volumeMounts {
			mountMap[filepath.Clean(v.HostPath)] = v.ContainerPath
		}
		opts = append(opts, customopts.WithVolumes(mountMap))
	}
The WithVolumes function (pkg/cri/opts/container.go) then tries to copy all files under ContainerPath in the container rootfs to the temporary directory at HostPath, which will later be mounted into the container at the same location (this is the optional "seeding" step described in the spec):
	for host, volume := range volumeMounts {
		// The volume may have been defined with a C: prefix, which we can't use here.
		volume = strings.TrimPrefix(volume, "C:")
		for _, mountPath := range mountPaths {
			src := filepath.Join(mountPath, volume)
			if _, err := os.Stat(src); err != nil {
				if os.IsNotExist(err) {
					// Skip copying directory if it does not exist.
					...
				}
				...
			}
			if err := copyExistingContents(src, host); err != nil {
				...
			}
		}
	}
volume is the fully attacker-controlled ContainerPath, and mountPath is a host directory pointing to part of the container's rootfs. Because filepath.Join cleans the joined path, ".." elements in volume are resolved against mountPath and can climb out of it: setting volume to a path like "/../../../../../../../../../etc" makes src "/etc", and copyExistingContents then recursively copies the host's /etc directory into the directory named by host. As that directory will later be mounted into the container, this gives the container full read access to arbitrary files and directories on the host.
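The path arithmetic can be checked on its own. This Go sketch assumes a hypothetical rootfs mount point, /tmp/containerd-mount123; joinVolume is a hypothetical helper that performs the same filepath.Join as the seeding loop, with no prior cleaning of volume.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// joinVolume is a hypothetical helper doing the same join as the seeding loop.
func joinVolume(mountPath, volume string) string {
	return filepath.Join(mountPath, volume)
}

func main() {
	mountPath := "/tmp/containerd-mount123"     // hypothetical rootfs mount point
	volume := "/../../../../../../../../../etc" // attacker-controlled image volume
	// Join cleans "/tmp/containerd-mount123//../../.../etc", so the ".."
	// elements walk up past mountPath and stop at "/".
	fmt.Println(joinVolume(mountPath, volume)) // /etc
}
```

Since filepath.Join calls filepath.Clean on the combined string, the leading ".." elements are resolved against mountPath rather than against the container's root, and anything above "/" is simply dropped.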
Suggested Fix:  
mountMap[filepath.Clean(v.HostPath)] = filepath.Clean(v.ContainerPath)  
should be sufficient to fix the issue, although it might also be reasonable to surface or log misbehaving images.
fwilhelm ~ % buildah inspect volumes-test | jq '.OCIv1.config.Volumes'  
{
  "/../../../../../../../../var/lib/kubelet/pki/": {}
}
fwilhelm ~ % kubectl run shell --rm -i --tty --image=[redacted]/test/volumes-test -- /bin/sh
/ # mount | grep /var/lib/kubelet  
/dev/root on /var/lib/kubelet/pki type ext4 (rw,relatime)  
/ # ls -la /var/lib/kubelet/pki/  
total 20  
drwxrwxrwt 2 root root 4096 Nov 12 15:54 .  
drwxr-xr-x 3 root root 4096 Nov 12 15:54 ..  
-rw-r--r-- 1 root root 1135 Nov 4 08:59 kubelet-client.crt  
-rw------- 1 root root 227 Nov 4 08:59 kubelet-client.key  
-rw------- 1 root root 0 Nov 4 08:59 kubelet-client.lock  
-rw------- 1 root root 1496 Nov 4 08:59 kubelet-server-2021-11-04-08-59-06.pem  
lrwxrwxrwx 1 root root 59 Nov 4 08:59 kubelet-server-current.pem -> /var/lib/kubelet/pki/kubelet-server-2021-11-04-08-59-06.pem  
Let me know if you need access to the PoC image; I did not want to spam the full list with it.
This bug is subject to a 90-day disclosure deadline. If a fix for this issue is made available to users before the end of the 90-day deadline, this bug report will become public 30 days after the fix was made available. Otherwise, this bug report will become public at the deadline. The scheduled deadline is 2022-02-21. For more details, see the Project Zero vulnerability disclosure policy.
Related CVE Numbers: CVE-2022-23648.  
Found by: