While the attacks we saw in the first part are indeed powerful, they are limited to simple calls to functions already linked by the program under attack. To appreciate the technique's full power, we should use it to inject arbitrary code into the program.
The first experiment I tried was calling VirtualProtect to change the permissions on the stack. However, it requires knowing the exact address that was used in a previous call to VirtualAlloc to work properly, and I couldn’t find an immediate way to discover that address. After that, I investigated whether I could trick the memory manager into changing permissions using VirtualAlloc. While doing this I found this wonderful blog article. Take your time and read it, it’s very interesting. Now it’s time to try that trick on an executable.
Everything you read here (and in its sequel posts) is just a concrete implementation of many proposed ways to exploit buffer overflows to execute arbitrary code. In particular, this post will focus on bypassing DEP (Data Execution Prevention), which uses the NX bit on AMD chips and the XD bit on Intel chips to prevent code execution from data segments (NX and XD are, as far as I know, the same bit under two different names, presumably so that Intel’s pride was satisfied; just like amd64 technology is also called x86-64, x64, aa64 and em64t. Wasn’t one name enough? :|)
DEP is a much-needed feature that was sorely missing from x86 (ia32 ;)) processors. It is a good protection that effectively makes it harder to write exploits for unchecked-buffer bugs. However, it doesn’t make such exploits impossible; it just makes them slightly harder to write.
If this post (and its sequels) fills you with fear, then remember to update all your applications to the latest versions and check them consistently for bugfixes repairing buffer overflows; and if you are a programmer, triple-check your code for these bugs.
Please note also that nothing in these posts is an original idea: these methods are discussed in theory on many sites around the internet. What was missing was a proof of concept for those exploits, which you can find here.
As an aside to the first part, we also try to optimize the performance of the solutions found there, with special attention to the assembly version.
First, let’s take timings of the two versions presented in part 1, compiled in release mode with optimizations turned on, run on an Athlon64 3000+ at 1.8 GHz, with times taken via timeGetTime.
These are the times:
- Assembly : 1012ms
- C++ : 1605ms
There are a couple of optimizations we could try to make the code faster.