Packing The Unknown: Just-in-Time Decryption w/ Vectored Exception Handling
Background
Attackers can use packers to hide malicious files from security scanners. A packer is an executable that wraps around another binary file; when executed, it unwraps its payload and runs it. Packers can add a number of intermediate steps to delay detection:
Packers are great for avoiding a malware flag before the payload is running. However, dynamic run-time scans can defeat them: these scans analyse the virtual memory of a running program to perform signature matching and API heuristic checks. To try to overcome this, a packer can use the traditional process hollowing approach, injecting its payload into a different process in an attempt to evade run-time memory scanners. There are several problems with this approach:
- The target process must be started in a suspended state. This is a necessity, as injecting into an established, running process can cause unpredictable behavior. A good security scanner will hook the Windows API to capture new process creation and apply the same run-time memory analysis to these processes.
- The API call stack is obvious: the Windows API calls used to perform process injection are very well documented, and heuristics can detect process hollowing attempts by tracking them.
- Integrity checks can defeat it: comparing the original process image on disk with what is executing in memory will expose the hollowed process.
Process Hollowing Alternatives And Challenges
So how does an attacker avoid detection? They have to get creative. Just-in-time (JIT) decryption is similar to run-time decryption; however, portions of the payload are decrypted only when the processor needs them, so there is never a point in time where the payload is visible in its entirety. Virtualisation has a similar effect to JIT decryption and has appeared on the malware scene on many occasions. Whilst these methods avoid detection, they cannot work like a traditional packer. These protections are usually applied at an intermediate stage of compilation, whereas a traditional packer takes a precompiled binary as its input. Applying virtualisation or JIT decryption to a complex instruction set such as x86/x64 after compilation is no trivial task; reliable decompile/recompile tool chains do not exist.
Just-in-Time (JIT) Decryption via VEH
The challenge of applying JIT decryption to compiled x86 assembly is what motivated this post. When thinking of a solution, one idea was to mutate the payload’s assembly to insert jumps to decryption and encryption routines. However, this adds a lot of complexity: relative addressing has to be updated, the stack restored, relocations fixed, and more. That ultimately persuaded me to look for a lazier approach: vectored exception handling (VEH).
The idea is simple: we take an input x86 file, parse the Portable Executable (PE) format it is stored in, and break each section down into blocks. The block size is arbitrary, though ideally it is small enough that no single block contains a solid signature. We then encrypt these blocks and rebuild the PE structure, with additional metadata recording each block’s start and end address. This encrypted PE file is then stored in the run-time stub. Figure 1 shows the run-time stub’s basic architecture, which contains two key buffers: an active buffer, representing the virtual memory for the payload, and an encrypted buffer, holding the encoded blocks. The PEImage class acts as the loader, mimicking the Windows PE loading process and populating the encrypted buffer with the correct regions determined from the PE file.
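As a rough illustration of the block metadata described above, the following Python sketch splits one section’s bytes into fixed-size blocks, records each block’s start and end address, and finds the block covering a given address. It is a model of the bookkeeping only: the names (`Block`, `split_section`, `find_block`), the 16-byte block size and the single-byte XOR "cipher" are hypothetical placeholders chosen for readability, not the stub’s actual format or encryption.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass

BLOCK_SIZE = 16  # hypothetical; the post notes any size works


@dataclass(frozen=True)
class Block:
    start: int   # first virtual address covered (inclusive)
    end: int     # one past the last address covered (exclusive)
    data: bytes  # encoded contents of the block


def xor_encode(data: bytes, key: int) -> bytes:
    # Toy stand-in cipher for illustration only; XOR is its own inverse.
    return bytes(b ^ key for b in data)


def split_section(base: int, raw: bytes, key: int, size: int = BLOCK_SIZE) -> list[Block]:
    """Split a section loaded at `base` into encoded blocks with address metadata."""
    blocks = []
    for off in range(0, len(raw), size):
        chunk = raw[off:off + size]  # the last block may be shorter
        blocks.append(Block(base + off, base + off + len(chunk), xor_encode(chunk, key)))
    return blocks


def find_block(blocks: list[Block], addr: int) -> Block | None:
    """Return the block covering `addr`; `blocks` must be sorted and non-overlapping."""
    starts = [b.start for b in blocks]
    i = bisect_right(starts, addr) - 1  # last block starting at or before addr
    if i >= 0 and addr < blocks[i].end:
        return blocks[i]
    return None
```

For a 40-byte section at 0x1000, this yields blocks covering 0x1000-0x1010, 0x1010-0x1020 and 0x1020-0x1028, and `find_block(blocks, 0x1015)` returns the second one. Keeping the table sorted by start address is what makes the lookup a simple binary search.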
Once the PE file has been parsed, the processor is directed to the payload’s entry point. However, this entry point is not within the ‘encrypted buffer’ where the real, encrypted blocks are kept; execution actually enters the ‘active buffer’. This active buffer contains nothing more than interrupt instructions that trigger the VEH. The exception handler is shown in Figure 2.
As the processor moves through the active buffer, exceptions are triggered. When an exception fires, we look at the address the processor is trying to access and use the aforementioned metadata to find the encrypted block holding the real data. We then decode this data and copy it into the active buffer at the location where the exception was triggered. After this, we set the processor’s instruction pointer back to that address, which now holds the decoded data, and execution resumes with the original payload instructions. This process repeats; each time, the previously decoded block in the active buffer is wiped, since it is no longer required.
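The bookkeeping in this fault/decode/evict cycle can be modelled without any real exception handling. The sketch below is a pure-Python simulation under stated assumptions: a `bytearray` stands in for the active buffer, 0xCC bytes (the x86 INT3 opcode) are assumed as the interrupt filler, and the hypothetical `on_fault` method plays the role of the handler by decoding the block covering the faulting address, wiping the previously decoded block, and returning the address execution would resume at. All names and the toy XOR cipher are illustrative, not the stub’s real code.

```python
from __future__ import annotations

from dataclasses import dataclass

PLACEHOLDER = 0xCC  # assumed filler byte standing in for the interrupts


@dataclass(frozen=True)
class Block:
    start: int   # virtual address, inclusive
    end: int     # virtual address, exclusive
    data: bytes  # encoded bytes


def xor_decode(data: bytes, key: int) -> bytes:
    # Toy stand-in cipher; XOR is its own inverse.
    return bytes(b ^ key for b in data)


class ActiveBufferModel:
    """Simulation of the active buffer; nothing here executes or handles real exceptions."""

    def __init__(self, base: int, size: int, blocks: list[Block], key: int):
        self.base = base
        self.buf = bytearray([PLACEHOLDER]) * size  # starts as pure filler
        self.blocks = blocks
        self.key = key
        self.current: Block | None = None  # block currently decoded, if any

    def on_fault(self, addr: int) -> int:
        """Model of the handler: returns the address execution would resume at."""
        block = next((b for b in self.blocks if b.start <= addr < b.end), None)
        if block is None:
            raise LookupError(f"no block covers {addr:#x}")
        if self.current is not None:
            # Wipe the previously decoded block back to filler bytes.
            lo = self.current.start - self.base
            hi = self.current.end - self.base
            self.buf[lo:hi] = bytes([PLACEHOLDER]) * (hi - lo)
        lo = block.start - self.base
        self.buf[lo:lo + len(block.data)] = xor_decode(block.data, self.key)
        self.current = block
        return addr  # resume at the faulting address, which now holds real bytes
```

This models only the buffer bookkeeping; it does not register a handler or touch any CPU state. After a fault at 0x1004, the first block holds decoded bytes and the rest of the buffer is filler; a later fault at 0x1012 decodes the second block and returns the first to filler, so at most one block is ever decoded at a time.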
Two buffers are needed because, with a single encrypted buffer, the processor’s instruction pointer would be executing encoded bytes. These might raise an invalid instruction exception, but they might equally decode to valid instructions, producing behavior the payload never expected.
Outcome
As a result, the decrypted payload is never visible in its entirety, either on disk or in memory. The greatest downside is the additional overhead, since every entry into a block that is not currently decoded costs an exception and a decryption. Of course, not everything needs to be encrypted. Selectively choosing which blocks to encode and decode can reduce this overhead and perhaps obscure the process from analysis. However, the premise here is that the binary being shielded is unknown to the tool chain, so there is no reliable way to tell which regions are worth protecting.
VEH abuse has been seen in the wild many times before, including in packers. I am not aware of any cases where it has been used for just-in-time decryption, but given the volume of malware today, it likely has been. This approach suffers from some of the same limitations as process hollowing; namely, the API call stack is not hidden. API heuristics are a powerful tool for identifying mutating variants, and this example is no exception.
To hide from security scanners, operating on a compiled binary alone is not enough: evasive measures have to be taken at the source code level and carried through the compilation process. Whilst this work can seem counter-intuitive to security efforts, understanding how attackers defeat detection mechanisms, especially at a malware-as-a-service level, helps us build stronger defences.