Thursday, May 21, 2015

Broken, Abandoned, and Forgotten Code, Part 5

In previous installments I shared proof-of-concept code that would exercise the Netgear R6200's hidden (and badly broken) SetFirmware SOAP action. It satisfied the various wonky conditions necessary to get into the sa_parseRcvCmd() function. Then I showed where in that function a firmware would be decoded from the SOAP request and written to flash. I showed how to identify a code path that leads to firmware writing. In part four, I showed how an undersized malloc() means a stock firmware crashes upnpd. Although we'll work around that bug later, for this and the next several installments we'll be working out how the firmware image gets parsed so we can create our own.

Updated Exploit Code

I last updated the exploit code for part 3, in which I showed how to form the complete SOAP request. In this part, I've added several Python modules to aid in reverse engineering and reconstructing a firmware image. If you've previously cloned the repository, now would be a good time to do a pull. You can clone the git repo from:

Analyzing httpd

We know that the code path in upnpd that accepts a firmware and writes it to flash memory is severely broken. When given a legitimate firmware obtained from Netgear, it crashes. In order to reverse engineer the firmware format, it may be easier to analyze a program that is known to work properly when upgrading: the web interface.

In the next several posts I'll describe analysis of the embedded HTTP daemon to understand how it processes a firmware image file. I'll also describe how to use the Bowcaster exploit development framework to aid in dynamic analysis and to develop an understanding of the firmware header composition. The goal is to generate a firmware image out of an existing filesystem and kernel. Bonus points if we can either create a firmware image that is identical to the original or if we can explain what the differences are and why those differences don't get in the way.

You can debug the web server by copying GDB to the physical R6200 router, or you can debug the embedded httpd in emulation. The first option requires less up-front effort, but the second option is more convenient once you have it working. Running upnpd and httpd in emulation requires faking some hardware and some binary patching. Before proceeding, you may want to read my previous posts on debugging with QEMU and IDA Pro and on patching, emulating and debugging using IDA Pro (which specifically addresses httpd). If you're playing along at home, I strongly recommend getting the web server and the UPnP daemon up and running in QEMU and debugging them with IDA Pro. During the next several posts, there will be a few aspects I don't explain in depth. These these things will be relatively straightforward if you have your working environment set up like mine.

Firmware Composition

Before we actually upload a firmware to the web interface, let's first see how a firmware image file is composed, and identify any sections that are already understood and don't need reverse engineering.

A good starting point is Craig Heffner's binwalk.


zach@devaron:~/code/wifi-reversing/netgear/r6200 (0) $ binwalk R6200-V1.0.0.28_1.0.24.chk

DECIMAL    HEX        DESCRIPTION
-------------------------------------------------------------------------------------------------------------------
58         0x3A       TRX firmware header, little endian, header size: 28 bytes, image size: 8851456 bytes, CRC32: 0xEE839C0 flags: 0x0, version: 1
86         0x56       LZMA compressed data, properties: 0x5D, dictionary size: 65536 bytes, uncompressed size: 3920006 bytes
1328446    0x14453E   Squashfs filesystem, little endian, non-standard signature,  version 3.0, size: 7517734 bytes,  853 inodes, blocksize: 65536 bytes, created: Wed Sep 19 19:27:19 2012

Binwalk identifies three sections: A TRX header at offset 58, an LZMA section at offset 86, and a Squashfs filesystem at offset 1328446. The TRX header is well understood. It's a firmware header format that dates back to at least the venerable Linksys WRT54g.

Here's a diagram (courtesy of the OpenWRT wiki) of the TRX header's format:

0                   1                   2                   3   
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
 +---------------------------------------------------------------+
 |                     magic number ('HDR0')                     |
 +---------------------------------------------------------------+
 |                  length (header size + data)                  |
 +---------------+---------------+-------------------------------+
 |                       32-bit CRC value                        |
 +---------------+---------------+-------------------------------+
 |           TRX flags           |          TRX version          |
 +-------------------------------+-------------------------------+
 |                      Partition offset[0]                      |
 +---------------------------------------------------------------+
 |                      Partition offset[1]                      |
 +---------------------------------------------------------------+
 |                      Partition offset[2]                      |
 +---------------------------------------------------------------+


There's no need for analysis here. In the part_5 directory in the git repo, I've provided a module that generates a TRX header.

We also don't need to analyze the Squashfs filesystem. At least not yet. Although there are many variations of Squashfs, there are also a lot of tools that will generate Squashfs images. We'll investigate more closely later, but for now, this is a known quantity.

When there is only one LZMA section, and it's near the beginning of an image--after the TRX header and before the filesystem--that is often the compressed Linux kernel. That's easy to verify. Extract out that section and decompress it to see if it's a Linux kernel.


zach@devaron:~/code/wifi-reversing/netgear/r6200 (130) $ binwalk R6200-V1.0.0.28_1.0.24.chk
DECIMAL    HEX        DESCRIPTION
-------------------------------------------------------------------------------------------------------------------
58         0x3A       TRX firmware header, little endian, header size: 28 bytes, image size: 8851456 bytes, CRC32: 0xEE839C0 flags: 0x0, version: 1
86         0x56       LZMA compressed data, properties: 0x5D, dictionary size: 65536 bytes, uncompressed size: 3920006 bytes
1328446    0x14453E   Squashfs filesystem, little endian, non-standard signature,  version 3.0, size: 7517734 bytes,  853 inodes, blocksize: 65536 bytes, created: Wed Sep 19 19:27:19 2012

zach@devaron:~/code/wifi-reversing/netgear/r6200 (0) $ dd if=R6200-V1.0.0.28_1.0.24.chk skip=86 count=`expr 1328446 - 86` bs=1 of=kernel.7z
1328360+0 records in
1328360+0 records out
1328360 bytes (1.3 MB) copied, 0.953731 s, 1.4 MB/s
zach@devaron:~/code/wifi-reversing/netgear/r6200 (0) $ p7zip -d kernel.7z

7-Zip (A) [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)

Processing archive: kernel.7z

Extracting  kernel

Everything is Ok

Size:       3920006
Compressed: 1328360
zach@devaron:~/code/wifi-reversing/netgear/r6200 (0) $ strings kernel | grep Linux
Linux version 2.6.22 (peter@localhost.localdomain) (gcc version 4.2.3) #213 PREEMPT Thu Sep 20 10:22:07 CST 2012

So we have the TRX header, compressed Linux kernel, and the squashfs filesystem. The TRX header starts at offset 58, leaving only 58 bytes of unidentified data. Not bad! What are the chances that this 58-byte header is just a haiku about a man from Nantucket?

It's possible this header is documented somewhere, but if so, I'm not aware of it. Even if it is, it's worth going to the trouble of reversing it. Doing so is instructional. It also exposes interesting bugs in the HTTP and UPnP daemons.

Part 5's example code takes advantage of a project I created, called Bowcaster. Bowcaster has a class called OverflowBuffer that generates a pattern string for debugging buffer overflows. It also gives you the ability to replace sections of that string with things like ROP gadgets, fixed strings, and other data types. The pattern string Bowcaster generates for you looks like:

Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8A


In the pattern string, no sequence of three or more characters is ever repeated. OverflowBuffer provides a find_offset() method. This makes it easy to identify at what offset a given value seen in a register or in memory during a debugging session is found.

Even though we're not debugging a buffer overflow, the OverflowBuffer class is still useful. As we identify each field and what value it should contain, it's easy to plug in those values at the right offsets as if they are ROP gadgets.

The following code fragment, taken from part 5's exploit code, uses Bowcaster to generate a stand-in for the header:


from bowcaster.development import OverflowBuffer
from bowcaster.development import SectionCreator

class MysteryHeader(object):
    def __init__(self,endianness,size):
        SC=SectionCreator(endianness,logger=logger)
        self.header=OverflowBuffer(endianness,size,
                            overflow_sections=SC.section_list,
                            logger=logger)


The stand-in header is shown below:

fake header in memory
Above we see Bowcaster's pattern string in memory just prior to the TRX header.

The first parsing of this header takes place in the function abCheckBoardID(), called by http_d(). In this function the first header field that is inspected is a strcmp() between the string "*#$^" and the firmware data starting at offset 0.
check magic signature


This appears to be a magic number or signature. Adding it to our Python header class:

from bowcaster.development import OverflowBuffer
from bowcaster.development import SectionCreator

class MysteryHeader(object):
    MAGIC="*#$^\x00"
    MAGIC_OFF=0
    def __init__(self,endianness,size):
        SC=SectionCreator(endianness,logger=logger)
        #add the magic signature "*#$^"
        SC.string_section(self.MAGIC_OFF,self.MAGIC,
                            description="Magic bytes for header.")
                            
        self.header=OverflowBuffer(endianness,size,
                            overflow_sections=SC.section_list,
                            logger=logger)


If the firmware doesn't have this signature, no other parsing takes place. Also, note that the signature string must be null terminated since the comparison is performed using a strcmp().

The next few things worth pointing out involve what appears to be a size field right after the signature string. Here's a look at a hex dump of our generated firmware header:
hex dump of firmware


Below we see a memcpy() at address 0x0041C550 that uses the size field highlighted in the above hex dump:
memcpy header size


There are a few things worth calling out here. First is the byte order. This is a little endian system, so we would expect to see 0x61413100 in register $s0. The byte order in the register matching the byte order on disk means this data is interpreted as big endian. A couple of basic blocks prior to the location of the memcpy() are where the byte-swapping occurs to convert this big endian value to little endian. This is the first sign that the 58-byte leading header should be big endian even though the rest of the file, and indeed the target hardware itself, is little endian.

Another thing; the null terminator of the "*#$^" string overlaps with the high byte of the size field. It is serendipitous that the size field is big endian encoded and its value is small enough to have a leading zero (the stock firmware's size field contains 0x0000003a). This appears to be an innocuous bug. Instead of a strcmp() to check the signature string, a memcmp() or an integer comparison should have been used.

But wait, there's more! If you haven't guessed already, this is a buffer overflow. It would be a really nice one, too, except that it requires authentication. I won't discuss it in detail here, because we'll see an identical one when we circle back to upnpd. But if you're playing along at home, feel free check it out. Exploitation is straightforward.

The last thing worth noting is the OverflowBuffer class's find_offset() method. The value found in register $s0 is a combination of a null terminator plus three characters of the pattern sequence: "\x001Aa". We can use find_offset() to figure out where in the header this value came from:


zach@devaron:~/code/broken_abandoned/part_5 (0) $ ./buildfw.py find=0x00314161 kernel.lzma squashfs.bin
 [@] Building firmware from input files: ['kernel.lzma', 'squashfs.bin']
 [@] TRX crc32: 0x0ee839c0
 [@] Creating ambit header.
 [@] Finding offset of 0x00314161
 [+] Offset: 4

It's easy to encode the size value into the header using Bowcaster:

#observed size in real-world examples.
#this may be variable
HEADER_SIZE=58
HEADER_SIZE_OFF=4

SC.gadget_section(self.HEADER_SIZE_OFF,self.size,"Size field representing length of ambit header.")

In the next part, I'll continue discussing the abCheckBoardID() function. I'll also discuss a checksum function whose algorithm is difficult to identify and how we deal with that. Then I'll discuss what other functions also are responsible for inspecting and parsing the firmware header.

Thursday, May 14, 2015

Broken, Abandoned, and Forgotten Code, Part 4

In the last post, I described how upnpd's sa_parseRcvCmd() function finds the body of a SOAP request and how it parses that SOAP request. This is a large and complicated function that processes many types of SOAP requests. I demonstrated how to work out the desired path of execution to decode and write firmware. At the end I made an educated guess as to how the SOAP request should be formed, and how the firmware should be represented in the request body.

In this post, we'll start with some prototype code that will exercise the portions of upnpd we have analyzed so far. It satisfies the conditions that I described in parts 1, 2, and 3. Including:

  • The necessary timing games described in parts 1 and 2
  • The minimum Content-Length described in Part 1
  • The HTTP headers I described in Part 2
  • The SOAP request body I described in part 3

PoC Exploit Code

In the previous installment, I updated the git repository with working exploit code that satisfies the above conditions. There is no update to the code for Part 4; the previous part's code is sufficient for now. You can clone the repo from:
https://github.com/zcutlip/broken_abandoned

Emulation and Debugging

Strictly speaking, you don't need to debug upnpd for this installment, although it may help a little. In the next several installations of this series, however, it is assumed that readers who are following along will be emulating and debugging the target processes. If you are following along, it's worth checking out my post on remote debugging with QEMU and IDA Pro. In that article, I walk you through running upnpd in emulation and attaching IDA Pro for debugging.

The exploit code from Part 3 allows you to specify an optional file as a command line argument to encode into the request. If you don't provide an input file, then the entire firmware data will consist of a string "A"s. This is a good starting point as a long string of "A"s is easy to identify in a debugger's memory trace.

soap request in memory
Debugger memory trace showing the SOAP request just before base64 decoding.


Crash! Hopes and Dreams Wrecked

When I got to this point in my analysis, naturally the first thing I did was encode a legitimate firmware file into the request, in hopes the firmware would be successfully written. Surprise. This was not successful. The upnpd deamon crashed processing the request. It was at this point in the summer of 2013 that I chucked my laptop into the river and seriously considered a career change.

Don't chuck your laptop into the river. Instead, let's figure out why the program crashes when given a legitimate firmware. I encoded a legitimate firmware file (obtained from Netgear's support website) into the SOAP request. When I sent that request to the UPnP daemon, the daemon crashed in sa_base64_decode(). My initial assumption was that this was a non-standard, possibly buggy, base64 decoder. I spent some time reversing the base64 decoding function. There was no obvious problem with it. Laptops were chucked.

It turns out, the problem isn't with the base64 decoder, but something more obvious. The problem is with the buffer that the firmware gets decoded into.

undersized malloc
Allocate a 4MB buffer for decoding

In the above screenshot, we see memory being allocated. The resulting buffer is used to hold the base64 decoded firmware. Note the instruction right after the jump to malloc() (On MIPS, the instruction right after a jump gets executed at the same time as the jump):
lui    $a0, 0x40
For those less familiar with MIPS assembly, the lui instruction means "load upper immediate." This will load 0x40 into the upper half of the $a0 register. That means $a0 will contain 0x400000, or 4194304 in decimal. By convention, the $a0 register contains the first argument to a function, in this case malloc(), resulting in a 4MB[1] buffer to decode the firmware into. The size of a typical firmware image for this device is over 8MB:

zach@devaron:~/code/wifi-reversing/netgear/r6200 (0) $ ls -l R6200-V1.0.0.28_1.0.24.chk
-rw-r--r-- 1 zach zach 8851514 Jan 27  2014 R6200-V1.0.0.28_1.0.24.chk

In fact it's closer to 9MB. This is what crashes the program. It's unclear why the decoding isn't done in place or why the distance between the opening and closing <NewFirmware> tags, which is calculated right before this operation, isn't used to allocate the buffer.

distance between open & closing tags


In any case, this is the surest sign yet that the SetFirmware SOAP action isn't completely implemented, and likely never actually worked in production firmware. If we're going to exercise this functionality without crashing the program, it will be necessary to generate a replacement firmware image than is dramatically smaller[2] than the stock firmware. While possible, this is a non-trivial effort and will come with severe limitations. I'll discuss shrinking the firmware in a later post.

Before spending time on making a smaller firmware, we have a few other things to work through. We need to work out (1) what sort of validation, if any, is done on the decoded firmware, and (2) how to satisfy that validation. Further, there may be additional bugs in upnpd preventing a firmware from being written to flash memory. If so, there will be no point in figuring out how to shrink the firmware.

We'll start reverse engineering the firmware format in the next post.

-----------------------------
[1] Technically this should be 4MiB, but in order to write that you have to say "mebibytes," which is dumb. If you hear anyone saying "mebibyte" in public, you should punch them in the face. So I'm kicking it old-school with "MB."

[2] This is the first of two potential buffer overflows that I am aware of in the firmware processing code. Some may see an opportunity here to exploit a heap-based buffer overflow. The approach I went with was to shrink the firmware to avoid crashing upnpd.

Thursday, May 07, 2015

Broken, Abandoned, and Forgotten Code, Part 3

In the previous posts, I talked about the hidden "SetFirmware" SOAP action in the Netgear R6200's UPnP daemon, and the weird timing games we have play to deal with UPnP daemon's broken networking code. I also discussed the haphazard parsing of the HTTP headers across multiple functions. I made a guess at what headers might get our SetFirmware SOAP request passed to the sa_parseRcvCmd() function where hopefully an encapsulated firmware image will be decoded.

In this post I'll discuss how the sa_parseRcvCmd() function actually parses, or attempts to parse, the SOAP request body.

Updated Exploit Code

Previously, I published a git repository containing proof-of-concept code that demonstrates what I discussed in part 2. The repository has been updated for part 3, so if you've cloned it, now is good time to do a pull. The new code will generate the complete SetFirmware SOAP request to flash an updated firmware to the router. You can get the repo here:

Parsing the SOAP Request Body

The sa_parseRcvCmd() function is large and difficult to describe. Attempting to reverse engineer the entire function would be tiresome.

graph view of sa_parseRecvCmd
Graph view of the sa_parseRcvCmd function
The above figure is a bird's eye view of this function. To give some perspective, the following figure is the first basic block, which includes the function prologue that sets up a long list of local variables in addition to the first bit of parsing of the SOAP request body.

prologue of sa_parseRcvCmd
Check out all those local variables.


Rather than try to understand the entire function, an easier approach is to decide where in the function we want execution to reach and work backwards from there. This way offers a better chance of finding out if the desired code is reachable, and if it is, what paths will lead there.

If we spend some time browsing the disassembly, we start to see what appears to be a group of blocks responsible for decoding the firmware from the SOAP request body and writing it to flash memory.

annotated firmare write


Looking even closer, we can identify the actual block where the firmware is written to flash.

write firmware to flash


It's easy to guess that this block writes to flash memory based on the blocks that lead up to it (an earlier block opens /dev/mtd1 for writing) as well as the error string that will be printed if the write fails. This block at 0x0042466C is our goal and the path that leads to it is how we must get there.

Working backwards, we come to a block at 0x00423C38 that appears, based on symbols and error strings, to base64 decode the firmware image.

base64 decoding firmware


From this we can guess that the firmware image should be base64 encoded into the SOAP request body. We might also guess that the sa_CheckBoardID() function in the above figure performs some sort of parsing of the decoded firmware. Once we've worked out the code path that gets to this block, we'll start working forwards again and spend some time investigating this function.

Working backwards even further, we find a cluster of blocks with many outbound paths. One of these paths (the block at 0x004238C8) leads to the base64 decoding section. This part of the function is particularly tortured, so here's the summary. This cluster appears to be a part of a large loop. On each pass through the loop, a variable is checked against a number of constants. Each comparison, if a match, results in a branch to a different path of execution. The constant that leads to the base64 decoding operation is 0xFF3A. While not actionable at the moment, this is worth noting.

Check for 0xFF3A
Looking for several constants. 0xFF3A leads to firmware decoding.


From there we can go backwards a little further and reach the function prologue, discussed earlier. With a general idea of the path that is required to get the firmware decoded and written, we can start working forwards again. We now have a better idea of what code paths to focus on and what ones can be ignored.

It is at the start of sa_parseRcvCmd() where we find the first hints at how the actual body of the SOAP request should be structured. At the very beginning of this function, a substring search for ":Body>" is performed. This would find the canonical <SOAP-ENV:Body> XML tag that surrounds a SOAP message body. It would also find the non-canonical <HOLY-SHIT-THIS-CODE-IS-SHITTY:Body> XML tag. So, you know, whatever.

Search for Body xml tag
Naive string search for ":Body>".


Once the body is located, the function loops over a table of strings, called s_keyword. This is the loop described earlier that checks for a series of constants on each iteration. The s_keyword table is an array of structs that are formed approximately like the following:


struct k_struct
{
    uint32_t action;
    char *keyword;
    uint32_t what_the_shit_is_this;
};

For each of these structures, the request body is searched for an opening and closing XML tag constructed from the corresponding keyword. If a tag is found then the keyword's corresponding action code is checked to determine the code path to take.

s_keword loop

Searching Haystack for s_keyword strings
Perform a strstr() for the first string in the s_keyword table.

Inspecting the s_keyword table reveals the keyword that corresponds to the magic 0xFF3A action code: "NewFirmware".

NewFirmware in s_keyword table


If a <NewFirmware> tag is found inside the soap body tag, then execution proceeds to allocate memory for the decoded firmware, and then on to writing it to flash memory as discussed above.

In the previous part, I made a guess at what HTTP headers would get the request into the sa_parseRecvCmd function. At this point we now have enough information to speculate as to how the body of the SOAP request should be formed.


POST /soap/server_sa/SetFirmware HTTP/1.1
Accept-Encoding: identity
Content-Length: 102401
Soapaction: "urn:DeviceConfig"
Host: 127.0.0.1
User-Agent: Python-urllib/2.7
Connection: close
Content-Type: text/xml ;charset="utf-8"

<SOAP-ENV:Body>
    <NewFirmware>
        <!-- Base64-encoded firmware image goes here? -->
    </NewFirmware>
</Body>


If this guess is right, the function first looks for the opening Body tag. Then it looks for one of a variety of inner tags, NewFirmware being the one we're interested in. And inside that, hopefully, it will find our base64 encoded firmware image and will decode and write it to flash. Are we almost home free? Stay tuned.

Thursday, April 30, 2015

Broken, Abandoned, and Forgotten Code, Part 2

In the part 1, I showed how the Netgear R6200's upnpd binary contains what appears to be a hidden SOAP action related to the string "SetFirmware". I also showed how we can get into the upnp_receive_firmware_packets() function if we play timing games and send our request in multiple parts.

In this part I'll describe additional timing considerations needed to avoid hanging the server. I'll also discuss sloppy parsing of the SOAP request, and I'll make some guesses as to how that request should be formed.

If you're following along, the first proof-of-concept code is available. Clone my git repo from:
https://github.com/zcutlip/broken_abandoned

Each installment in this series that has new or updated code will have a separate directory in the repository. This week's code is under part_2.

Receiving Firmware Bytes

The conditions I described previously are:
  • The request should be broken up into two or more parts, with the first being no larger than 8,190 bytes.
  • "Content-length:" should be somewhere in the data, presumably in the HTTP headers (because this would make sense), but not necessarily.
  • The content length should be greater than 102,401 bytes.
  • The string "SetFirmware" should be somewhere in the data.
If those conditions are satisfied, then upnp_receive_firmware_packets() gets called from upnp_main() at 0x4144E4. In this function, a select(), recv(), and memcpy() loop receives the remainder of the request. This proceeds fairly sanely, with one problem.

upnp receive firmware select loop
The select() and recv() loop doesn't check for closed connections

If the client closes the connection immediately after sending the request, this function gets caught in an infinite loop. The cause for this is a little tricky to explain.

From the select(2) Linux man page:
A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking.

If the peer has closed its end of the connection, then select() indicates the socket is ready because a recv() would not block. The way Unix TCP sockets work, when the remote end of a connection closes, a recv() on that socket returns zero. In the loop, the return value from recv() is checked for errors (negative values), but if there are no errors, it is assumed that data was received, and the loop returns to select(). This results in the function looping indefinitely if the client shuts down the connection too soon.

The only two ways this loop ever terminates are (a) if select() or recv() return an error, or (b) if select() returns zero, indicating a timeout with no file descriptors ready for I/O. This means the requesting client must not close the connection immediately after it has sent the request. It should send the request, and then pause before closing the connection. Sleeping a few seconds should suffice.

However, there's an additional implication. Recall from before that we had to sleep 1-2 seconds in upnp_main() in order to get into this function. It turns out that if we slept longer, then the select() would time out, returning zero, and the loop would end before we had sent the rest of the request. So, while it's critical to sleep a second or two, it's also critical to sleep no more than that.

In review, the steps should be:

  • Send 8,190 bytes or fewer, but hold the connection open
  • Sleep 1-2 seconds, but no more
  • Send the rest of the request, but hold the connection open
  • Sleep a few more seconds
  • Close the connection


The following code fragment sends chunks with appropriate sizes and sleep periods to get us into upnp_receive_firmware_packets() and to avoid getting into an infinite loop with select():


def special_upnp_send(addr,port,data):
    sock=socket.socket(socket.AF_INET,socket.SOCK_STREAM)
    sock.connect((addr,port))
    #only send first 8190 bytes of request
    sock.send(data[:8190]) 
    #sleep to ensure first recv()
    #only gets this first chunk.
    time.sleep(2) 
    #Hopefully in upnp_receiv_firmware_packets()
    #by now, so we can send the rest.
    sock.send(data[8190:])  
    #Sleep a bit more so server doesn't end up
    #in an infinite select() loop.
    #Select's timeout is set to 1 sec,
    #so we need to give enough time
    #for the loop to go back to select,
    #and for the timeout to happen,
    #returning an error.
    time.sleep(10)
    sock.close()

More Broken and Lazy Parsing

Once the entire request has been received, it is parsed, or "parsed" as it were, piecemeal, across several functions. The upnp_receive_firmware_packets() function calls sub_4134A8(). This function inspects the beginning of the received request (the first 1023 bytes, to be precise) for for the HTTP method. If the request is a POST, the soap_method_check() function is called at 0x413774.

Check for POST HTTP method
Checking for the POST HTTP method


Call to soap_method_check
Calling soap_method_check()


In soap_method_check() several naive stristr() calls search for a series of strings across the entire request buffer. Based on several of the more recognizable strings, such as "Public_UPNP_C1", these strings are UPnP control URLs that might be requested by the POST. Although these strings may be placed literally anywhere (starting to sound familiar?) in the request and still trigger their respective code paths, presumably a typical request would be structured like so:

POST /Public_UPNP_C1 HTTP/1.1

One of the control URLs that is checked is "soap/server_sa". If that URL is found in the request, the function sa_method_check() is called. Note that we still don't know for certain where the UPnP daemon actually expects the "SetFirmware" string to be located. However, based on other, similar string references, it seems likely that this string should be part of the UPnP control URL: "soap/server_sa/SetFirmware".

call to sa_method_check
A call to sa_method_check if "soap/server_sa" is found
The sa_method_check() function loops over a list of valid strings corresponding to the "SOAPAction:" header, and for each string in the list performs a naive stristr() across the entire request buffer. The string "DeviceConfig", if found anywhere in the request, results in a call to sub_43292C(). This enormous function repeatedly calls sa_findKeyword(), passing it the request buffer as well as various keys to be looked up in the "s_Event" dictionary.

graph view of sub_43292C
The enormous graph of sub_43292c(). This function looks for keywords in the SOAP request.


The sa_findKeyword() function searches the request buffer for the corresponding string from the "s_Event" dictionary. The original "SetFirmware" string is referenced by the key 49. If it is found, again, anywhere in the request, the function sa_parseRcvCmd() is called.

search for SetFirmware string
Repeated calls of sa_findKeyword(). Index 49 corresponds to "SetFirmware."


The following HTTP request headers should, based on what we have observed so far, get the request into the sa_parseRcvCmd() function.


request="".join["POST /soap/server_sa/SetFirmware HTTP/1.1\r\n",
                 "Accept-Encoding: identity\r\n",
                 "Content-Length: 102401\r\n",
                 "Soapaction: \"urn:DeviceConfig\"\r\n",
                 "Host: 127.0.0.1\r\n",
                 "Connection: close\r\n",
                 "Content-Type: text/xml ;charset=\"utf-8\"\r\n\r\n"]


Forming an HTTP request that would exercise the proper code path was an exercise in guesswork due to the many naive string searches littered along the way and an absence of anything resembling structured parsing.

It is in the sa_parseRcvCmd() function that an encoded firmware image is extracted and decoded from the request body, and assuming the right conditions are met, written to the router's flash storage, replacing the existing firmware.

Up until now, it has remained at least possible, however improbable, that the vendor may have designed a client to send the magic SOAP requests and to play the timing games necessary to exercise the firmware updating functionality. In the next part I'll start discussing sa_parseRcvCmd(),  a complicated function with lots of code paths and lots of bugs. It is also this function where it becomes even clearer that the firmware updating capability of this UPnP server is not completely implemented and cannot actually work under normal conditions.

Thursday, April 23, 2015

Broken, Abandoned, and Forgotten Code, Part 1

Introduction

This series of posts describes how abandoned, partially implemented functionality can be exploited to gain complete, persistent control of Netgear wireless routers. I'll describe a hidden SOAP method in the UPnP stack that, at first glance, appeared to allow unauthenticated remote firmware upload to the router. After some reverse engineering, it became apparent this functionality was never fully implemented, and could never work properly given a well formed SOAP request and firmware image. If it could work at all, it would be with only the most contrived of inputs.

Someone may have thought shipping dead code was okay because an exploit scenario would be so contrived. Someone may not have considered that contrived inputs are the stock-in-trade of vulnerability researchers.

In this series, I'll describe the process of specially crafting a malicious firmware image and a SOAP request in order to route around the many artifacts of incomplete implementation in order to gain persistent control of the router. I'll discuss reverse engineering the proper firmware header format, as well as the the improper one that will work with the broken code. Together, we'll go from discovery to complete, persistent compromise.

Rules of Engagement

In order to make the challenge more interesting and to more clearly demonstrate the thesis, I decided to not take advantage of any shortcuts by exploiting vulnerabilities in the broken code path. I treated all bugs I encountered along the way as hurdles to overcome. For example, there is a buffer overflow that I will describe in a future post. I could exploit this buffer overflow to subvert the flow of execution and execute shellcode that would write my firmware, but that would be cheating. The point of this project is to show that dead code can represent a powerful attack vector, even when it is non-functional.

Target Device

The device I'll be describing in this series is the Netgear R6200 802.11ac router. Here are some specifics about the router:
  • Linux based
  • Little endian MIPS
  • Firmware version 1.0.0.28
  • Originally released in 2012
  • US$200 retail price when released
I only worked with the 1.0.0.28 firmware, which I believe is the original released version. I haven't looked into later versions. That will remain an exercise for the reader. I will add that as recently as January 2015, I ordered an R6200 from Amazon, and it came with firmware 1.0.0.28 installed.

I <3 UPnP

Universal Plug and Play services on SOHO routers make for a nice attack surface for vulnerability research. UPnP services are often capable of system-level modifications that are protected only by a thin veil of obscurity. When I found strings referencing "firmware" in the Netgear R6200[1] 801.11ac router's UPnP binary I knew this daemon was going to be an interesting target. Most SOHO router exploits do not offer persistence[2], owing to their read-only storage. An unauthenticated firmware upload is an opportunity to persist undetected on the gateway device for months or even years.

Firmware Unpacking and Strings Analysis

Upon unpacking the R6200's firmware, you can easily identify the UPnP daemon as /usr/sbin/upnpd. Source code is not available for this application, so research is an exercise in binary analysis.

Initial strings analysis of the binary reveals a "SetFirmware" string:

terminal - strings upnpd
Strings analysis on upnpd binary, showing "SetFirmware"


Hopefully this string is somehow related to modifying the device's firmware. Static analysis reveals how the "SetFirmware" string is referenced in the binary:

SetFirmware reference in upnpd
Reference to "SetFirmware" from upnp_main()

As shown in the above screenshot, "SetFirmware" is referenced exactly once, from upnp_main() at offset 0x4142C4.

Lazy Parsing

When upnpd receives a SOAP request, the upnp_main() function does the following:
  • recv() from a TCP socket
  • check that it received (seemingly arbitrarily) 8,190 bytes or fewer.
  • perform a lazy parse of incoming requests by performing stristr() string searches on the received data.
The upnp_main() function searches for the string "Content-length:" literally anywhere (wtf?) in the received data. If the value following "Content-length:" is greater than or equal to (again, seemingly arbitrary) 102401, as checked by atoi() another stristr() is performed, searching for the "SetFirmware" string. Again, this string may be anywhere in the received data. If the string is found, upnp_receive_firmware_packets() is called at 0x4144E4.

call to upnp_receive_firmware_packets()
A call to upnp_receive_firmware_packets()


It's worth noting the implication of these two size checks, the first for 8,190 or less and the second for a content length greater that 102,401. The request must either have a forged content-length header, or the requesting client must avoid sending the entire request in one operation. In the latter case, the request should send no more than 8,190 bytes, pause, then send the rest.

It is also worth noting that at this stage it is unclear how the "SetFirmware" request should be structured. It also is unclear if it should even be a SOAP request (we will proceed on the assumption that it is), or some other protocol. The only things that are known about the request are:
  • The request should be broken up into two or more parts, with the first being no larger than 8,190 bytes.
  • "Content-length:" should be somewhere in the data, presumably in the HTTP headers (because this would make sense), but not necessarily.
  • The content length should be greater than 102,401 bytes.
  • The string "SetFirmware" should be somewhere in the data.

This is the first bug that suggests this code doesn't actually work, at least not naturally. When you send() a request, you should be able to send the entire request in one operation. Your operating system's TCP/IP stack (usually in the kernel) will handle chunking the data as necessary. Further, the remote host's TCP/IP stack will handle unchunking the data as necessary. These details are abstracted from userspace code, and the receiving program should be able continue receiving until the remote end has closed the connection or until some maximum allowable size has been received. We're able to work around this anomalous behavior by sending only a small chunk of the data, then sleeping before sending the rest.

In the next part, I'll describe another bug, this time a misuse of select(), that also suggests this code never actually worked in the wild. I'll go on to describe how to make it work anyway. I'll also discuss how the broken, lazy parsing makes it difficult to know how the SOAP request should be formed such that execution follows a desirable code path.

------------------------------
[1] Although the R6200 was the primary device researched, preliminary analysis of other devices, including the R6300 v1, indicates presence of the same vulnerabilities described on this blog.
[2] It should be noted that non-persistent exploits are attractive in their own right, as the attacker may remove all traces of the compromise from the device by merely rebooting it.

Wednesday, April 22, 2015

Broken, Abandoned, and Forgotten Code: Prologue

A Secret Passage to Persistant SOHO Router Pwnage


Almost two years ago plus a house selling, a cross-country move, a house buying, a job change, and a wedding, I downloaded and unpacked the firmware for Netgear's then-new R6200 wireless router. This was one of Netgear's first entries into the nascent 802.11ac market. At around US$200 at the time, this device was at the high end of the Netgear lineup. Finding some cool vulnerabilities in some of the newest, swankiest, consumer WiFi gear would make for a neat paper, or at least a good blog post or two.

In June 2013, I started investigating the R6200. Right away, there were suspicious strings and code paths in the UPnP daemon that were too interesting ignore. If I was right, I would be able to flash a malicious firmware to the device from the local network without authentication. Answering the sirens' call, I spent a few weeks trying to unravel this shit-show of a daemon. I finally gave up, deciding the code I was investigating was too broken to ever actually work, and was therefore not exploitable.

Fast forward six months to December. Having worked through my anger from wasted weeks of work over the summer, the project was back on my mind. I decided to revisit it, this time with a new approach. My original approach was to reverse engineer what appeared to be a backdoor update capability. I gave up when I realized the backdoor was likely never completely implemented and could never actually work as intended. My new approach was to see if I could specially craft an exploit that would route around all the broken networking code and broken parsing code in order to get the router to accept my firmware without crashing.

Spoiler: In the end I was successful. The project had become interesting enough that I planned to write it up and submit it to a conference. But, well, life happened, and here we are nearly a year and a half later.

What comes next amounts, I think, to the equivalent of a small book describing this project. Over the next 14 or so posts, I'll cover all of the various challenges involved and how I solved them, including the following:
  • Reverse engineering the upnpd binary
  • Broken networking code and how to deal with it
  • Using Bowcaster to reverse engineer an undocumented firmware header
  • Unpacking, modifying, and repacking the firmware
... and many others. I plan to post about one article a week. I'll include complete, working exploit code as well as code to generate proper headers and to repack the firmware.

My hope is that, with the necessary tools and a little prerequisite reversing experience, you can follow along and reproduce this project.

In the mean time, here's a video to give you a tease. The left window is a minicom serial connection showing you what's going on under the hood. The right window is where actual exploitation is happening.



R6200 Firmware Upload from Zach on Vimeo.

Stay tuned. Hopefully it will be fun.

Update: Part 1 is up! Hope you enjoy it!

Friday, February 20, 2015

Bowcaster Feature: multipart/form-data

Need to reverse engineer or exploit a file upload vulnerability in an embedded web server? I added a multipart/form-data class to Bowcaster to help with that. You can have a look here:
https://github.com/zcutlip/bowcaster/blob/master/src/bowcaster/clients/http.py

Here's some background:
I've been reverse engineering how the Netgear R6200 web server parses a new firmware image when you use the firmware update facility in the web interface. Manually browsing to the router's web interface, then to the firmware update form, then browsing to a firmware file on disk, then clicking "upload" gets really tedious after a few times.

NETGEAR Router R6200 Firmware Upload
Updating the Netgear R6200's firmware through the web interface.


I wanted to use Bowcaster's HttpClient class to do this programmatically. Unfortunately, it lacked the ability to generate a multipart/form-data POST body, so I added a class to help with that. I know; there are 3rd party Python modules to do this already, but I don't want Bowcaster to have any dependencies other than Python, itself.

Anyway, below is an example program that uses the class to upload a firmware image, which you can get here from Bowcaster's example code. In order to identify the fields that need to be populated, I manually uploaded a file through a web browser, and analyzed the POST request in Burp Suite.


#!/usr/bin/env python

"""
Example code to upload a firmware file to the Netgear R6200 firmware update form.
"""

import sys
import os
from bowcaster.common import Logging
from bowcaster.clients import HttpClient
from bowcaster.clients import HTTPError
from bowcaster.clients import MultipartForm

def send_fw(url,fw_file):
    
    logger=Logging(max_level=Logging.DEBUG)
    logger.LOG_INFO("Sending %s" % fw_file)
    logger.LOG_INFO("to %s" % url)

    fw_file_basename=os.path.basename(fw_file)
    
    logger.LOG_INFO("Creating headers.")
    headers={"Accept":
    "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"}
    headers["Accept-Language"]="en-US,en;q=0.5"
    headers["Accept-Encoding"]="gzip, deflate"
    headers["Referer"]="http://192.168.127.141/UPG_upgrade.htm"

    #admin:password
    headers["Authorization"]="Basic YWRtaW46cGFzc3dvcmQ="
    headers["Connection"]="keep-alive"

    logger.LOG_INFO("Creating post data")
    
    mf=MultipartForm()
    mf.add_field("buttonHit","Upgrade")
    mf.add_field("buttonValue","Upload")
    mf.add_field("IS_check_upgrade","0")
    mf.add_field("ver_check_enable","1")
    mf.add_file("mtenFWUpload",fw_file)
    mf.add_field("upfile",fw_file_basename)
    mf.add_field("Upgrade","Upload")
    mf.add_field("progress","")
    post_data=str(mf)
    headers["Content-Length"]=("%s" % len(post_data))
    headers["Content-Type"]=mf.get_content_type()
    client=HttpClient()
    logger.LOG_INFO("Sending request.")
    resp=client.send(url,headers=headers,post_data=post_data,logger=logger)
    
    return resp
    
    
    
def main(fw_file,host=None):
    if not host:
        host="192.168.127.141"
    url="http://%s/upgrade_check.cgi" % host
    resp=send_fw(url,fw_file)
    print resp

if __name__ == "__main__":
    if(len(sys.argv) == 2):
        main(sys.argv[1])
    elif len(sys.argv) == 3:
        main(sys.argv[1],host=sys.argv[2])
    else:
        print("Specify at least firmware file.")
        sys.exit(1)