Modern Vulnerability Research Techniques on Embedded Systems
This guide takes a look at vetting an embedded system (an ASUS RT-AC51U) using AFL, angr, a cross compiler, and some binary instrumentation, all without access to the physical device. We'll go from a static firmware image to thousands of executions per second of fuzzing on emulated code. (Sorry, no 0days in this post.)
ASUS is kind enough to provide the firmware for their devices online. The firmware is generally a root file system packed into a single image using SquashFS. binwalk can scan this image and identify the filesystem for us.
Binwalk can carve the filesystem out of the firmware image with the -Mre flags, placing the resulting root file system in a folder titled squashfs-root.
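To make the identification step less magical, here is a minimal Python sketch of a signature scan. It only looks for the little-endian SquashFS superblock magic (`hsqs`); binwalk validates far more than this, and the function name `find_squashfs_offsets` is purely illustrative.

```python
from typing import List

# Little-endian SquashFS superblock magic as it appears on disk.
SQUASHFS_MAGIC_LE = b"hsqs"


def find_squashfs_offsets(image: bytes) -> List[int]:
    """Return every offset in the image where the SquashFS magic occurs."""
    offsets = []
    pos = image.find(SQUASHFS_MAGIC_LE)
    while pos != -1:
        offsets.append(pos)
        pos = image.find(SQUASHFS_MAGIC_LE, pos + 1)
    return offsets


if __name__ == "__main__":
    # Synthetic stand-in for a firmware image: header padding, then a filesystem.
    fake_image = b"\x00" * 64 + SQUASHFS_MAGIC_LE + b"\xff" * 32
    print(find_squashfs_offsets(fake_image))
```

A hit is only a candidate offset: the magic can also appear by chance inside compressed data, which is why real tools check the rest of the superblock before carving.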
The LD_PRELOAD trick is a method of hooking symbols in a given binary so that calls resolve to your symbol, which the loader places ahead of the original symbol in its search order. This can be used to hook functions, like malloc and free in the case of libraries like libdheap, to call your own code and perform logging or other instrumentation-based analysis. The general format requires compiling a small stub of C code into a shared library and then running your binary with LD_PRELOAD pointing at that library.
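In a shell this is usually a one-off `LD_PRELOAD=./hook.so ./target` invocation. As a rough sketch of the same thing from Python, the helper below builds the environment for a child process; `./hook.so` and `./target` are placeholder names, not files from this post.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def preload_env(libraries: List[str],
                base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the libraries prepended to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    parts = list(libraries) + ([existing] if existing else [])
    # The dynamic loader accepts a colon-separated list and honours it in order.
    env["LD_PRELOAD"] = ":".join(parts)
    return env


if __name__ == "__main__":
    # Usage (placeholder paths): python preload.py ./hook.so ./target
    if len(sys.argv) == 3:
        hook, target = sys.argv[1], sys.argv[2]
        subprocess.run([target], env=preload_env([hook]), check=False)
```

Prepending rather than replacing keeps any preload that is already set, which matters once several hook libraries are stacked.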
I wanted to try a trick I saw online to create a fast and effective fuzzer for network protocol fuzzing. This github gist shows a PoC of creating an LD_PRELOAD'd library that intercepts libc's call to main and replaces it with our own.
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>

/* Trampoline for the real main() */
static int (*main_orig)(int, char **, char **);

/* Our fake main() that gets called by __libc_start_main() */
int main_hook(int argc, char **argv, char **envp)
{
    // Do my stuff
}

/*
 * Wrapper for __libc_start_main() that replaces the real main
 * function with our hooked version.
 */
int __libc_start_main(
    int (*main)(int, char **, char **),
    int argc,
    char **argv,
    int (*init)(int, char **, char **),
    void (*fini)(void),
    void (*rtld_fini)(void),
    void *stack_end)
{
    /* Save the real main function address */
    main_orig = main;

    /* Find the real __libc_start_main()... */
    typeof(&__libc_start_main) orig = dlsym(RTLD_NEXT, "__libc_start_main");

    /* ... and call it with our custom main function */
    return orig(main_hook, argc, argv, init, fini, rtld_fini, stack_end);
}
My thought was to then call a function inside of the now loaded binary starting from main. Any following calls or symbol look ups from the directly called function should resolve correctly because the main binary is loaded into memory!
Defining a function prototype and then calling the function seemed to work. I can pull a function address out of a binary and jump to it with arbitrary arguments, and the compiler will place the arguments in the registers and stack slots the ABI expects:
/* Our fake main() that gets called by __libc_start_main() */
int main_hook(int argc, char **argv, char **envp)
{
    /* Zero-initialized input buffer; read() needs <unistd.h> */
    char user_buf[512] = {0};
    read(0, user_buf, sizeof(user_buf));

    /* Address of the target function, pulled from the binary */
    int (*do_thing_ptr)() = (int (*)())0x401f30;
    int ret_val = (*do_thing_ptr)(user_buf, 0, 0);
    printf("Ret val %d\n", ret_val);
    return 0;
}
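The same "declare a prototype, bind it to a raw address, call it" idea can be tried without a cross compiler by using Python's ctypes on the host. This is only an analogy to the C hook above: the address is looked up from libc's `strlen` at runtime rather than hardcoded, and the sketch assumes a Linux-like system where `ctypes.CDLL(None)` exposes the symbols already loaded into the process.

```python
import ctypes

# Handle to the symbols already mapped into this process (libc included).
libc = ctypes.CDLL(None)

# Recover a raw numeric address, standing in for a hardcoded value like 0x401f30.
strlen_addr = ctypes.cast(libc.strlen, ctypes.c_void_p).value

# Declare the prototype: size_t (*)(const char *)
strlen_proto = ctypes.CFUNCTYPE(ctypes.c_size_t, ctypes.c_char_p)

# Bind the prototype to the address; ctypes then marshals arguments per the ABI.
strlen_by_addr = strlen_proto(strlen_addr)

result = strlen_by_addr(b"hello")
print("Ret val", result)
```

As with the C version, a wrong prototype is not caught anywhere: the call simply passes whatever the declared types say, so the prototype has to come from reverse engineering.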
This process is very manual and slow... Let's speed it up!
Setting up
The extracted firmware executables are all MIPS little-endian binaries dynamically linked against uClibc.
DockCross does not support uClibc cross compiling yet so I needed to build my own cross compilers. Using buildroot I created a uClibc cross compiler for my Ubuntu 18.04 machine. To save time in the future I've posted this toolchain and a couple others online here. This toolchain enables quick cross compiling of our LD_PRELOADed libraries.
The target is the asusdiscovery service. There has already been a CVE for it, and it proves hard to fuzz manually. The discovery service periodically sends packets out across the network, scanning for other ASUS routers. When another ASUS router sees this discovery packet, it responds with its information and the discovery service parses the response.
These response-based network services can be hard to fuzz through traditional network fuzzing tools like BooFuzz. So we're going to find where it parses the response and fuzz that logic directly with our new-found LD_PRELOAD tricks.
Pulling symbol information from this binary quickly reveals which function does the parsing: ParseASUSDiscoveryPackage.
The function appears to be instantiating a 512 byte buffer and reading from a given network file descriptor through the recvfrom function. A quick visit to recvfrom's manpage reveals that the second argument going into recvfrom will contain the network input, the input we can control.
RECV(2)                    Linux Programmer's Manual                    RECV(2)

NAME
       recv, recvfrom, recvmsg - receive a message from a socket

SYNOPSIS
       #include <sys/types.h>
       #include <sys/socket.h>

       ssize_t recv(int sockfd, void *buf, size_t len, int flags);

       ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags,
                        struct sockaddr *src_addr, socklen_t *addrlen);
This user input is immediately passed to the PROCESS_UNPACK_GET_INFO function. This function is responsible for parsing the user input and relaying that information to the router.
Opening the function in Ghidra reveals a large parsing function. This looks perfect for fuzzing!
The next step is interacting with the function and providing input into that first argument. The first step towards running this as an independent function is recovering the function prototype. Ghidra shows the defined function prototype as below.
Similarly to the PoC of the LD_PRELOAD main hook shown above, I needed to hook the main function. For uClibc that function is __uClibc_main. Using the same trick as above, we'll define a function prototype for the function we want to call, then hook uClibc's main function and then jump directly to the function we want to call with our arguments.
To make this process easier, I created a tool to identify function prototypes and slot them into templated C code. The current iteration of stub-builder accepts a file and a given function to instrument. The tool is imperfect: it uses radare2 to identify function prototypes (often wrongly) and places them into the C stub.
An example of the command can be seen below. The stub builder uses radare2 for its function recovery and fails to identify the first argument as a char*, so we need to fix up main_hook.c by hand.
$ stub_builder -F usr/sbin/asusdiscovery recover name PROCESS_UNPACK_GET_INFO
[+] Modify main_hook.c to call instrumented function
[+] Compile with "gcc main_hook.c -o main_hook.so -fPIC -shared -ldl"
[+] Hook with: LD_PRELOAD=./main_hook.so ./usr/sbin/asusdiscovery
[+] Created main_hook.c
Hardcoded values can be inserted instead. The below command supplies the address, argument prototype and the expected return type:
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <dlfcn.h>

// gcc main_hook.c -o main_hook.so -fPIC -shared -ldl

/* Trampoline for the real main() */
static int (*main_orig)(int, char **, char **);

/* Our fake main() that gets called by __uClibc_main() */
int main_hook(int argc, char **argv, char **envp)
{
    // <arg declarations here>
    char user_buf[512] = {0};
    // scanf("%512s", user_buf);
    read(0, user_buf, sizeof(user_buf));

    /* Address and prototype of PROCESS_UNPACK_GET_INFO */
    int (*do_thing_ptr)(char *, int, int) = (int (*)(char *, int, int))0x401f30;
    int ret_val = (*do_thing_ptr)(user_buf, 0, 0);
    printf("Ret val %d\n", ret_val);
    return 0;
}

/*
 * Wrapper for __uClibc_main() that replaces the real main
 * function with our hooked version.
 */
int __uClibc_main(
    int (*main)(int, char **, char **),
    int argc,
    char **argv,
    int (*init)(int, char **, char **),
    void (*fini)(void),
    void (*rtld_fini)(void),
    void *stack_end)
{
    /* Save the real main function address */
    main_orig = main;

    /* Find the real __uClibc_main()... */
    typeof(&__uClibc_main) orig = dlsym(RTLD_NEXT, "__uClibc_main");

    /* ... and call it with our custom main function */
    return orig(main_hook, argc, argv, init, fini, rtld_fini, stack_end);
}
The code above accepts input from STDIN and passes it directly into the parsing function. This enables us to test the function and read its return values without any networking components.
Running the code
Cross compiling the shared object using the provided cross compilers is shown below. The resulting file will be named main_hook.so
We now have the binary running and it's accepting our input and passing it directly to the function. The next stage is generating a set of valid input data to seed our fuzzer with.
Generating valid input for a test corpus
Sending in random strings of "A"s will not uncover new paths through the parsing function. Looking at the function decompilation, we can see a quick check performed in a function titled UnpackGetInfo_NEW. This is the first function we need to examine to determine whether the initial parsing has any early exits.
This function first checks for a set of magic bytes before continuing. It's looking for "\x0c\x16\x1f\x00" to be the first bytes of the network input. Without these magic bytes it exits early and indicates through its return code that the input should be discarded.
Supplying this magic value immediately returns a different result when running the binary:
$ python2 -c 'print "\x0c\x16\x1f\x00" + "A"*100' | qemu-mipsel -L . -E LD_PRELOAD=/lib/libdl.so.0:/main_hook.so ./usr/sbin/asusdiscovery
Ret val 1
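The one-liner above can be mirrored as a small Python 3 helper, which is handy once inputs stop being printable. This is a sketch under a few assumptions: the magic bytes use the byte order from the command above, the 512-byte cap matches the buffer in main_hook.c, and the helper names (`build_packet`, `qemu_argv`, `run_packet`) are made up for illustration.

```python
import subprocess
from typing import List

# Byte order copied from the command above.
MAGIC = b"\x0c\x16\x1f\x00"


def build_packet(body: bytes, size: int = 512) -> bytes:
    """Prefix the body with the magic bytes and cap it at the hook's buffer size."""
    return (MAGIC + body)[:size]


def qemu_argv(sysroot: str = ".",
              target: str = "./usr/sbin/asusdiscovery") -> List[str]:
    """Build the qemu-mipsel command line used above (-L sysroot, -E guest env)."""
    return ["qemu-mipsel", "-L", sysroot,
            "-E", "LD_PRELOAD=/lib/libdl.so.0:/main_hook.so", target]


def run_packet(packet: bytes) -> str:
    """Feed one packet to the hooked binary on stdin and return what it printed."""
    done = subprocess.run(qemu_argv(), input=packet,
                          stdout=subprocess.PIPE, check=False)
    return done.stdout.decode(errors="replace")


if __name__ == "__main__":
    print(len(build_packet(b"A" * 100)), qemu_argv())
```

Unlike `print` in python2, `build_packet` does not append a trailing newline, so the bytes reaching the parser are exactly the ones constructed here.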
The function returns more than a single value depending on how the parse or unpack goes. There appear to be checks on lines 12, 15, 32, and 33 of the decompilation, and the function returns a result based on the input on line 50.
This is a perfect time to break out angr to create a valid input that hits line 50! The following code creates a 300-byte symbolic buffer and has angr solve the constraints required to pass each check in the unpacking function, yielding all potential return results. We are interested in the analysis path that reaches the furthest part of the parsing function. The script below prints each path's end address and the input required to reach that path.
import angr
import angr.sim_options as so
import claripy

symbol = "UnpackGetInfo_NEW"

# Create a project with history tracking
p = angr.Project('/home/caffix/firmware/asus/RT-AC51U/ext_fw/squashfs-root/usr/sbin/asusdiscovery')
extras = {so.REVERSE_MEMORY_NAME_MAP, so.TRACK_ACTION_HISTORY}

# User input will be 300 symbolic bytes
user_arg = claripy.BVS("user_arg", 300*8)

# State starts at function address
start_addr = p.loader.find_symbol(symbol).rebased_addr
state = p.factory.blank_state(addr=start_addr, add_options=extras)

# Store symbolic user_input buffer
state.memory.store(0x100000, user_arg)
state.regs.a0 = 0x100000

# Run to exhaustion
simgr = p.factory.simgr(state)
simgr.explore()

# Print each path and the inputs required
for path in simgr.unconstrained:
    print("{} : {}".format(path, hex([x for x in path.history.bbl_addrs][-1])))
    u_input = path.solver.eval(user_arg, cast_to=bytes)
    print(u_input)
One of the outputs is shown below, and this input can then be sent back into the program through the above qemu command to validate that it passes the checks.
<SimState @ <BV32 reg_ra_51_32{UNINITIALIZED}>> : 0x401c4c
b'\x0c\x16\x1f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x82\x80\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
Running the input
$ printf '\x0c\x16\x1f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x82\x80\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' | qemu-mipsel -L . -E LD_PRELOAD=/lib/libdl.so.0:/main_hook.so ./usr/sbin/asusdiscovery
Ret val 1
I've put each of these inputs into individual files for AFL to read from later.
$ ls afl_input/
test_case1 test_case2 test_case3 test_case4 test_case5
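Writing the seed files by hand gets tedious as the solver produces more inputs. A minimal sketch of automating it follows; the `write_seeds` helper is hypothetical, and only the `afl_input` directory and `test_caseN` names come from the listing above.

```python
import os
import tempfile
from typing import Iterable, List


def write_seeds(inputs: Iterable[bytes], directory: str = "afl_input") -> List[str]:
    """Write each input to directory/test_caseN (N starting at 1) and return the paths."""
    os.makedirs(directory, exist_ok=True)
    paths = []
    for index, data in enumerate(inputs, start=1):
        path = os.path.join(directory, "test_case{}".format(index))
        # Binary mode: the seeds contain NUL bytes and other non-text data.
        with open(path, "wb") as handle:
            handle.write(data)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Demo in a throwaway directory with two made-up seeds.
    demo_dir = tempfile.mkdtemp()
    for seed_path in write_seeds([b"\x0c\x16\x1f\x00" + b"\x00" * 8, b"AAAA"], demo_dir):
        print(seed_path, os.path.getsize(seed_path))
```

The `u_input` values printed by the angr script can be passed straight in as the `inputs` argument, since they are already `bytes`.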
Fuzzing the function
Following the AFL build process outlined here provides AFL with QEMU mode, which can then fuzz asusdiscovery with the script:
You will get some incredibly slow fuzzing at about 1-2 executions per second. The AFL fork server is taking way too long to spawn newly forked processes.
Setting AFL_NO_FORKSRV=1 prevents AFL from creating a fork server just before main and forking off new processes. For this type of hooking and emulation it runs much faster, at about 85 executions per second:
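For reference, a launcher along these lines can be sketched in Python. The fuzzing script itself is not reproduced in the text, so the details here are assumptions: the preload is passed to the emulated process through QEMU user mode's QEMU_SET_ENV and QEMU_LD_PREFIX environment variables, and the `afl_output` directory name and both helper names are hypothetical.

```python
import os
from typing import Dict, List


def afl_env(sysroot: str, no_forkserver: bool) -> Dict[str, str]:
    """Environment for afl-fuzz in QEMU mode with the main_hook preload."""
    env = dict(os.environ)
    # Assumption: QEMU user mode reads these to find the sysroot and set guest env.
    env["QEMU_LD_PREFIX"] = sysroot
    env["QEMU_SET_ENV"] = "LD_PRELOAD=/lib/libdl.so.0:/main_hook.so"
    if no_forkserver:
        env["AFL_NO_FORKSRV"] = "1"
    else:
        env.pop("AFL_NO_FORKSRV", None)
    return env


def afl_argv(target: str, in_dir: str = "afl_input",
             out_dir: str = "afl_output") -> List[str]:
    """afl-fuzz command line: -Q selects QEMU mode, -m none lifts the memory limit."""
    return ["afl-fuzz", "-i", in_dir, "-o", out_dir, "-m", "none", "-Q", "--", target]


if __name__ == "__main__":
    env = afl_env("./squashfs-root", no_forkserver=True)
    print(env["AFL_NO_FORKSRV"], " ".join(afl_argv("./usr/sbin/asusdiscovery")))
```

Keeping the fork server switch as a single boolean makes it easy to compare the two modes on the same seeds and target.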
We can do better... Specifically, we can use Abiondo's fork of AFL, which he describes in his blog post here. Abiondo implemented an idea for QEMU that is reported to speed up emulation by a factor of 3 to 4. Starting from 85 executions per second, that should put us at roughly 250 to 350 executions per second.
My idea was to move the instrumentation into the translated code by injecting a snippet of TCG IR at the beginning of every TB. This way, the instrumentation becomes part of the emulated program, so we don’t need to go back into the emulator at every block, and we can re-enable chaining.
Downloading and running the fork of AFL follows the exact same build process:
Rerunning the previous fuzzing script WITHOUT the AFL_NO_FORKSRV environment variable produces some absolutely insane results:
Final fuzzing results
After about 24 hours of fuzzing, hardly any new paths were discovered. Doing some more static analysis on the parsing functions revealed very few spots in the functions for any potentially dangerous user input to corrupt anything.
By pairing the LD_PRELOAD trick with jumping directly to the function I wanted to fuzz, I saved tons of time that would otherwise have gone into GDB, trying to see which code paths were valid. By using Abiondo's fork of AFL I was able to get execution speeds on par with AFL's compile-time instrumentation. Getting thousands of executions per second doesn't generally happen when fuzzing applications in AFL's QEMU mode, and I was happy to see 2000-plus executions per second.