How the Stack works
André Eichhofer
The stack is the part of a program's memory where local variables are stored. When the operating system runs a program, the executable is loaded into memory. That memory is divided into several areas and looks like the following:
           ┌─────────────────┐
0xFFFFFFFF │                 │
           │                 │
           │     Kernel      │
           │                 │
           │                 │
           ├─────────────────┤
           │                 │
           │                 │
           │      Stack      │
           │                 │
           │                 │
           ├─────────────────┤
           │                 │
           │                 │
           │      Heap       │
           │                 │
           │                 │
           ├─────────────────┤
           │                 │
           │      Data       │
           │                 │
           ├─────────────────┤
           │                 │
           │      Text       │
0x00000000 │                 │
           └─────────────────┘
- Kernel: memory reserved for the operating system kernel. The command line parameters and environment variables that are passed to the program sit just below it, at the top of the stack
- Text: contains the actual code of the program. The text area is read-only, because the code must not be changed
- Data: contains initialized and uninitialized global and static variables
- Heap: contains dynamically allocated memory, typically larger objects (images, files, etc.)
- Stack: holds the local variables of each function of the program. The stack is writable, as the variables may change
When a function is called, its parameters and local variables are pushed onto the top of the stack. Since the stack grows downward, every item pushed onto it makes it grow towards the low memory addresses. For example, if a program calls a function, the parameters of that function are pushed onto the stack, and the top of the stack moves to a lower address.
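The effect of pushing onto a downward-growing stack can be sketched with a small model. This is only an illustration and not how a real CPU is implemented: the class name DownwardStack, the start address 0xbffff000 and the 4-byte slot size are assumptions chosen for a 32-bit example.

```python
WORD_SIZE = 4  # assumed slot size in bytes (32-bit system)


class DownwardStack:
    """Toy model of a stack that grows towards lower addresses."""

    def __init__(self, top_address=0xBFFFF000):  # hypothetical start address
        self.sp = top_address   # models the stack pointer
        self.memory = {}        # address -> stored value

    def push(self, value):
        # Pushing first moves the stack pointer DOWN, then stores the value.
        self.sp -= WORD_SIZE
        self.memory[self.sp] = value

    def pop(self):
        # Popping reads the value, then moves the stack pointer back UP.
        value = self.memory.pop(self.sp)
        self.sp += WORD_SIZE
        return value


stack = DownwardStack()
start = stack.sp
for parameter in (10, 20, 30):
    stack.push(parameter)
    print(f"pushed {parameter} -> stack pointer is now {stack.sp:#x}")
```

Every push lowers the stack pointer by one slot, so the most recently pushed value always sits at the lowest address in use.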
Structure of the stack
Each function call gets its own stack frame, and several registers are used to keep track of it. To simplify, the diagram is turned upside down, which means that the higher memory addresses are at the bottom.
               ┌───────────────────────────┐
0x00000000     │                           │
               │  extended stack pointer   │
               │                           │
               ├───────────────────────────┤
             ▲ │                           │ │
             │ │                           │ │
             │ │                           │ │
             │ │                           │ │
             │ │          Buffer           │ │
             │ │                           │ │
             │ │                           │ │
             │ │                           │ │
stack growth │ │                           │ │ memory
             │ ├───────────────────────────┤ │
             │ │                           │ │
             │ │   extended base pointer   │ │
             │ │                           │ │
             │ ├───────────────────────────┤ │
             │ │                           │ │
             │ │    instruction pointer    │ │
             │ │                           │ ▼
               ├───────────────────────────┤
               │                           │
               │       parent stack        │
               │                           │
0xFFFFFFFF     └───────────────────────────┘
The contents of these registers change while the program runs, and they refer to specific addresses in memory.
- The (extended) stack pointer (esp) points to the top of the stack. It is followed by the buffer, which holds the contents of the variables (or parameters) of a specific function of the program. The function must allocate the size of the buffer. The address held in the esp register changes constantly.
- The (extended) instruction pointer (eip) points to the next instruction the program is about to execute. The slot shown in the diagram holds the return address, which is loaded into eip when the function returns.
- The (extended) base pointer (ebp) stays the same for the whole duration of a function. That means that we can use the base pointer as an anchor to find parameters and local variables.
Note that the names of the registers depend on the system. On 64-bit systems the registers are called rsp, rip, rbp, etc.
Examining the stack
Compile a C file, load it in GDB and disassemble a function. The output is something like
(gdb) disas func
0x00000305 <+0> push %ebp
...
...
Examining addresses in the stack
0x00000305 is the address of the instruction, written in base 16 (hexadecimal). This is where the instruction lives in memory. Note that the address consists of 8 hexadecimal digits. Each digit encodes 4 bits, so the address is 32 bits = 4 bytes long (1 byte = 8 bits). In GDB, a word is 4 bytes, so an address is exactly one word. That's why it's called a 32-bit architecture. Registers are also 32 bits wide.
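The relation between hex digits, bits and bytes can be checked with a few lines of Python. The helper name address_width is made up for this illustration.

```python
def address_width(address_text):
    """Return (hex digits, bits, bytes) for an address written as 0x..."""
    digits = len(address_text) - len("0x")
    bits = digits * 4          # one hex digit encodes 4 bits
    return digits, bits, bits // 8


print(address_width("0x00000305"))          # 32-bit style address
print(address_width("0x0000000000000305"))  # 64-bit style address
```

The first call reports 8 digits, 32 bits and 4 bytes. The second reports 16 digits, 64 bits and 8 bytes.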
On 64-bit systems the address may look like
0x0000000000000305 <+0> push %rbp
...
If the program has not been run yet, the registers are empty, and if you try to inspect them you get the error No registers. Run the program to get the registers filled.
Examine registers in the stack
To examine registers it might be necessary to set a breakpoint first. List the function with
disas/s {name of function}
and set the breakpoint at the specific line
- Example:
b 9
To get an overview of the current frame
info frame
You can examine any register with the following commands:
- info registers: overview of registers
- info registers {register}: show specific register
You can print the memory (contents) that a register points to with:
x/{number of units}{unit} {register / register address}
- number of units: how many units to print
- unit: x (integer, hex), s (string)
- register: register (e.g. $rsp), register address
If you type
x/12x $esp
GDB will print 12 words starting at the address held in the stack pointer register, and the output will be something like this
(gdb) x /12x $esp
0x2fc0: 0x00000000 0x00000000 0x00000000 0x00000000
0x2fd0: 0x00000000 0x00000000 0x00000000 0x00000000
0x2fe0: 0x00000000 0x00003fb8 0xffffffff 0x00000001
Note that GDB prints four words per line and that consecutive words are 4 bytes apart. That means that each column has its own address. In a more verbose view the output would look like this:
(gdb) x /12x $esp
0x2fc0: 0x00000000
0x2fc4: 0x00000000
0x2fc8: 0x00000000
...
0x2fe4: 0x00003fb8
0x2fe8: 0xffffffff
... and so on
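The address of every column follows from the start address of the line and the 4-byte word size. A minimal sketch, with word_addresses as a hypothetical helper name:

```python
WORD_SIZE = 4  # bytes per word in the x/12x output


def word_addresses(start, count, word_size=WORD_SIZE):
    """Address of every word when `count` words are dumped from `start`."""
    return [start + index * word_size for index in range(count)]


# Values of the third output line (0x2fe0) from the example above.
third_line = [0x00000000, 0x00003FB8, 0xFFFFFFFF, 0x00000001]
for address, value in zip(word_addresses(0x2FE0, 4), third_line):
    print(f"{address:#x}: {value:#010x}")
```

For the whole dump, word_addresses(0x2fc0, 12) yields the twelve addresses from 0x2fc0 to 0x2fec.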
To get a line by line view, use the command
- (gdb) x/12s $esp
which prints the content of the memory as strings, one per line.
Each column (or each address line) represents 1 word. Each word is 4 bytes (or 32 bits).
(gdb) x /12x $esp
0x2fc0: 0x00000000 0x00000000
            |          |
            V          V
         1 Word     1 Word
Examine (extended) stack pointer
Get content of the stack
The (extended) stack pointer is a marker (register) that always points to the top of the stack. The stack pointer is labeled $esp in 32-bit architecture and $rsp in 64-bit architecture.
Get the address of the stack pointer with
info frame
info registers
info registers {$rsp} / {$esp}
Get the content of the memory the stack pointer points to with
x /{number of units}{format} $esp / $rsp
Examples:
- x /100x $rsp: list 100 words (4 bytes each) starting at the stack pointer in hex format
- x /100s $rsp: list 100 strings starting at the stack pointer
Find variables in the stack
To find the content of variables in the stack, type
info args
and the output could be something like
variable = 0x7fffffffe6a4 "content"
                 |            |
                 V            V
    address of variable  content of variable
It’s possible that the address of the variable, 0x7fffffffe6a4, is located far beyond the base pointer and seems to be outside the stack of the current frame. However, the stack might contain a pointer to that address:
(gdb) x /100x $rsp
0x7fffffffe2a0: 0xffffffff 0x00000000 0xffffe6a4 <–– address of variable
If you print the stack in string format you see the content of the address the stack points to
(gdb) x /100s $rsp
0x7fffffffe2a8: "\244\346\377\377\377\177"
0x7fffffffe2af: ""
0x7fffffffe2b0: "content" <–– content of 0x7fffffffe6a4
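The escaped string that GDB shows at 0x7fffffffe2a8 is the pointer 0x7fffffffe6a4 itself, stored byte by byte in little-endian order and printed with octal escapes. The following sketch reproduces that string with Python's struct module, assuming a 64-bit little-endian system.

```python
import struct

pointer = 0x7FFFFFFFE6A4

# "<Q" packs an unsigned 64-bit integer in little-endian byte order.
raw = struct.pack("<Q", pointer)

# A C string ends at the first zero byte, so only the bytes before it are shown.
visible = raw.split(b"\x00")[0]
gdb_style = "".join(f"\\{byte:03o}" for byte in visible)

print(" ".join(f"{byte:02x}" for byte in raw))
print(gdb_style)
```

The least significant byte (0xa4, octal 244) comes first, which is why the string starts with \244 and ends with \177.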
Examining (extended) base pointer
The (extended) base pointer is a register that always points to the base of the current stack frame. Seen from the top of the stack, the base lies at a higher memory address. In GDB it is labeled $ebp in 32-bit architecture and $rbp in 64-bit architecture.
Unlike the stack pointer, the base pointer stays at the same address while a function runs. That means that all local variables and parameters are at a fixed offset from the base pointer, even as the stack pointer moves with push and pop.
Get the address of the base pointer with
info frame
info registers
info registers {$rbp} / {$ebp}
Get the content of the base pointer with
x /{format} $rbp / $ebp
Examples:
¯¯¯¯¯¯¯¯¯¯¯¯¯
Get memory address of $rbp
(gdb) info frame
rbp at 0x7fffffffe320
Example: Find $rbp in the stack
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
(gdb) x /100x $rsp
...
...
0x7fffffffe320: 0xffffe340 0x00007fff 0x00400593 0x00000000
                    |          |
                    V          V
        $rbp points to higher memory address
...
Get content of $rbp
(gdb) x /x 0x7fffffffe320
0x7fffffffe320: 0xffffe340 <–– $rbp points to 0xffffe340
Examine (extended) instruction pointer
The instruction pointer is a register that points to the next instruction of the function. In GDB it is labeled $eip in 32-bit architecture and $rip in 64-bit architecture.
On the stack, the saved instruction pointer (the return address) is situated at a slightly higher memory address than the saved base pointer. It is always located
- 4 bytes above the base pointer on 32-bit systems
- 8 bytes above the base pointer on 64-bit systems
Get the address of the instruction pointer with
info frame
info registers
info registers {$eip} / {$rip}
As the instruction pointer is located at a fixed offset from the base pointer, you can find it with
- x /x $ebp+4 on 32-bit systems
- x /x $rbp+8 on 64-bit systems
Example:
¯¯¯¯¯¯¯¯¯¯¯¯¯
(gdb) x /100x $rsp
...
...
0x7fffffffe310: 0xf7de3b40 0x00007fff 0x00000000 0x00000000
0x7fffffffe320: 0xffffe340 0x00007fff 0x00400593 0x00000000
...                 |                     |
...                 V                     V
              base pointer       instruction pointer
In the example the base pointer is located at the memory address 0x7fffffffe320 and points to the memory address 0x7fffffffe340 (shown by GDB as the two words 0xffffe340 and 0x00007fff). The instruction pointer is exactly 8 bytes above the rbp, at the address 0x7fffffffe328, and it points to the return address 0x00400593.
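The arithmetic behind this example can be written out as a short sketch. It assumes a 64-bit little-endian system, where GDB prints each 8-byte value as two 4-byte words with the low word first. The helper name join_words is made up for this illustration.

```python
def join_words(low_word, high_word):
    """Combine two 32-bit words (low word first, as GDB prints them)."""
    return (high_word << 32) | low_word


rbp_address = 0x7FFFFFFFE320
line_at_rbp = [0xFFFFE340, 0x00007FFF, 0x00400593, 0x00000000]

saved_rbp = join_words(line_at_rbp[0], line_at_rbp[1])
return_slot = rbp_address + 8   # 8 bytes above rbp on 64-bit systems
return_address = join_words(line_at_rbp[2], line_at_rbp[3])

print(f"saved rbp      = {saved_rbp:#x}")
print(f"return slot    = {return_slot:#x}")
print(f"return address = {return_address:#x}")
```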
Calculating the buffer size
When testing buffer overflows it's necessary to calculate the buffer size in order to adjust the size of the payload. You can calculate the distance from the beginning of the buffer to the instruction pointer or to any function pointer.
Method 1: Calculate the buffer size manually
To calculate the buffer size you need to make a payload of unique strings and then find a part of that string in the stack.
We create a payload with
- 350 * letter A and
- 100 digits (the two-digit numbers 00 to 49)
==> 450 bytes
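#!/usr/bin/python
# Payload: 350 filler characters followed by the two-digit markers 00..49.
attack = 'A' * 350
for i in range(0, 5):
    for j in range(0, 10):
        # Append one unique two-digit marker, e.g. "37".
        attack += str(i) + str(j)
print(attack)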
Make a payload from the python script with
./calculation_payload > calculation
Open the program with gdb and execute it with the payload
(gdb) run < calculation
Print the stack and notice the value of rip
0x7fffffffe400: 0x32343134 0x34343334 0x38333733 0x30343933
                                          |          |
                                          V          V
                                      Instruction Pointer
Here, the $rip is at 0x7fffffffe408 and it contains the values 0x38333733 and 0x30343933.
You need to convert the hex values to ASCII and then reverse them, as they are stored in little endian:
- 38333733 = 8373 => 3738
- 30343933 = 0493 => 3940
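The conversion can also be done with a small helper instead of by hand. This is a sketch that assumes the words are copied from GDB without the 0x prefix. The name word_to_ascii is hypothetical.

```python
def word_to_ascii(word_hex):
    """Decode one 32-bit word as printed by GDB into the ASCII text in memory."""
    value_bytes = bytes.fromhex(word_hex)      # bytes in the order GDB prints them
    return value_bytes[::-1].decode("ascii")   # reverse to get memory order


words = ["38333733", "30343933"]
marker = "".join(word_to_ascii(word) for word in words)
print(marker)
```

The result is the 8-character marker to look for in the payload.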
Find the numbers in the payload:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
0001020304050607080910111213141516171819202122232425262728293031
3233343536|37383940|414243444546474849
|
V
Ascii value from $rip
Count the A characters = 350
plus the digits up to 37383940 = 74
Total: 424 characters ==> 424 byte buffer size
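Counting by hand is error-prone, so the same offset can be found by rebuilding the payload and searching for the marker. A minimal sketch, with build_payload as a hypothetical helper that mirrors the payload script from this section:

```python
def build_payload():
    """350 filler characters followed by the two-digit markers 00..49."""
    markers = "".join(f"{i}{j}" for i in range(5) for j in range(10))
    return "A" * 350 + markers


payload = build_payload()
offset = payload.find("37383940")   # marker decoded from $rip
print(len(payload), offset)
```

The search returns the number of characters that precede the marker, which is the value counted manually above.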
Method 2: Use pattern create and pattern offset
The method above can be automated with the scripts pattern_create and pattern_offset from the Metasploit Framework.
Method 3: Subtract the memory addresses in GDB
In GDB you can subtract memory addresses to get the length of the buffer.
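The idea behind such tools can be sketched in a few lines. The following is a simplified stand-in written for this article, not Metasploit's own code: the function names merely echo the tool names, and the pattern is assumed to be the familiar Aa0Aa1Aa2... sequence of upper-case letter, lower-case letter and digit.

```python
import itertools
import string


def pattern_create(length):
    """Return `length` characters of an Aa0Aa1Aa2... style pattern."""
    triples = itertools.product(
        string.ascii_uppercase, string.ascii_lowercase, string.digits
    )
    full = "".join("".join(triple) for triple in triples)
    if length > len(full):
        raise ValueError("pattern would start repeating")
    return full[:length]


def pattern_offset(fragment, length):
    """Return the position of `fragment` in the pattern, or -1 if absent."""
    return pattern_create(length).find(fragment)


pattern = pattern_create(450)
fragment = pattern[424:428]   # pretend these 4 characters ended up in $rip
print(fragment, pattern_offset(fragment, 450))
```

As in method 1, a value read from a register must first be converted to ASCII and reversed into memory order before it is looked up.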
(gdb) x /100x $rsp
0x7fffffffe220: 0xffffffff 0x00000000 0xffffe62c 0x00007fff
0x7fffffffe230: 0x90909090 0x90909090 0x90909090 0x90909090
0x7fffffffe240: 0x90909090 0x90909090 0x90909090 0x90909090
0x7fffffffe250: 0x90909090 0x90909090 0x90909090 0x90909090
0x7fffffffe260: 0x90909090 0x90909090 0x90909090 0x90909090
0x7fffffffe270: 0x90909090 0x90909090 0x90909090 0x90909090
0x7fffffffe280: 0x90909090 0x90909090 0xcccccccc 0xcccccccc
0x7fffffffe290: 0xcccccccc 0xcccccccc 0xcccccccc 0xcccccccc
0x7fffffffe2a0: 0xcccccccc 0xcccccccc 0x41414141 0x41414141
                                          |          |
                                          V          V
                                      instruction pointer
In the example above, assume that the buffer begins at the address 0x7fffffffe230 and the instruction pointer is at the address 0x7fffffffe2a8.
As the stack grows downwards, the buffer starts at the lower address and the instruction pointer sits above it, so 0x7fffffffe2a8 is greater than 0x7fffffffe230. With GDB you can subtract the addresses with
(gdb) p/d 0x7fffffffe2a8 - 0x7fffffffe230
$1 = 120
So, the buffer size is 120 bytes.
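The same subtraction can be reproduced outside of GDB, for example when preparing the payload in a script. The two addresses are the ones assumed in the example above.

```python
buffer_start = 0x7FFFFFFFE230   # assumed start of the buffer
return_slot = 0x7FFFFFFFE2A8    # address of the saved instruction pointer

offset = return_slot - buffer_start
print(offset, hex(offset))

# Cross-check against the dump: 7 full lines of 16 bytes plus 2 words of 4 bytes.
print(7 * 16 + 2 * 4)
```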