Commit Graph

745 Commits

Author SHA1 Message Date
Paolo Bonzini
48b1893ae3 Merge tag 'kvm-x86-pmu-6.4' of https://github.com/kvm-x86/linux into HEAD
KVM x86 PMU changes for 6.4:

 - Disallow virtualizing legacy LBRs if architectural LBRs are available,
   the two are mutually exclusive in hardware

 - Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES)
   after KVM_RUN, and overhaul the vmx_pmu_caps selftest to better
   validate PERF_CAPABILITIES

 - Apply PMU filters to emulated events and add test coverage to the
   pmu_event_filter selftest

 - Misc cleanups and fixes
2023-04-26 15:53:36 -04:00
Paolo Bonzini
807b758496 Merge tag 'kvm-x86-mmu-6.4' of https://github.com/kvm-x86/linux into HEAD
KVM x86 MMU changes for 6.4:

 - Tweak FNAME(sync_spte) to avoid unnecessary writes+flushes when the
   guest is only adding new PTEs

 - Overhaul .sync_page() and .invlpg() to share the .sync_page()
   implementation, i.e. utilize .sync_page()'s optimizations when emulating
   invalidations

 - Clean up the range-based flushing APIs

 - Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single
   A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle
   changed SPTE" overhead associated with writing the entire entry

 - Track the number of "tail" entries in a pte_list_desc to avoid having
   to walk (potentially) all descriptors during insertion and deletion,
   which gets quite expensive if the guest is spamming fork()

 - Misc cleanups
2023-04-26 15:50:01 -04:00
Sean Christopherson
cf9f4c0eb1 KVM: x86/mmu: Refresh CR0.WP prior to checking for emulated permission faults
Refresh the MMU's snapshot of the vCPU's CR0.WP prior to checking for
permission faults when emulating a guest memory access and CR0.WP may be
guest owned.  If the guest toggles only CR0.WP and triggers emulation of
a supervisor write, e.g. when KVM is emulating UMIP, KVM may consume a
stale CR0.WP, i.e. use stale protection bits metadata.

Note, KVM passes through CR0.WP if and only if EPT is enabled as CR0.WP
is part of the MMU role for legacy shadow paging, and SVM (NPT) doesn't
support per-bit interception controls for CR0.  Don't bother checking for
EPT vs. NPT as the "old == new" check will always be true under NPT, i.e.
the only cost is the read of vcpu->arch.cr4 (SVM unconditionally grabs CR0
from the VMCB on VM-Exit).

Reported-by: Mathias Krause <minipli@grsecurity.net>
Link: https://lkml.kernel.org/r/677169b4-051f-fcae-756b-9a3e1bb9f8fe%40grsecurity.net
Fixes: fb509f76ac ("KVM: VMX: Make CR0.WP a guest owned bit")
Tested-by: Mathias Krause <minipli@grsecurity.net>
Link: https://lore.kernel.org/r/20230405002608.418442-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-10 15:25:36 -07:00
Sean Christopherson
9ed3bf4112 KVM: x86/mmu: Move filling of Hyper-V's TLB range struct into Hyper-V code
Refactor Hyper-V's range-based TLB flushing API to take a gfn+nr_pages
pair instead of a struct, and bury said struct in Hyper-V specific code.

Passing along two params generates much better code for the common case
where KVM is _not_ running on Hyper-V, as forwarding the flush on to
Hyper-V's hv_flush_remote_tlbs_range() from kvm_flush_remote_tlbs_range()
becomes a tail call.

Cc: David Matlack <dmatlack@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20230405003133.419177-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-10 15:17:29 -07:00
Sean Christopherson
8a1300ff95 KVM: x86: Rename Hyper-V remote TLB hooks to match established scheme
Rename the Hyper-V hooks for TLB flushing to match the naming scheme used
by all the other TLB flushing hooks, e.g. in kvm_x86_ops, vendor code,
arch hooks from common code, etc.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20230405003133.419177-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-10 15:17:29 -07:00
Sean Christopherson
fb3146b4dc KVM: x86: Add a helper to query whether or not a vCPU has ever run
Add a helper to query if a vCPU has run so that KVM doesn't have to open
code the check on last_vmentry_cpu being set to a magic value.

No functional change intended.

Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Like Xu <like.xu.linux@gmail.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20230311004618.920745-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-04-06 14:57:22 -07:00
Paolo Bonzini
2fdcc1b324 KVM: x86/mmu: Avoid indirect call for get_cr3
Most of the time, calls to get_guest_pgd result in calling
kvm_read_cr3 (the exception is only nested TDP).  Hardcode
the default instead of using the get_cr3 function, avoiding
a retpoline if they are enabled.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Link: https://lore.kernel.org/r/20230322013731.102955-2-minipli@grsecurity.net
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-22 07:46:42 -07:00
Sean Christopherson
f3d90f901d KVM: x86/mmu: Clean up mmu.c functions that put return type on separate line
Adjust a variety of functions in mmu.c to put the function return type on
the same line as the function declaration.  As stated in the Linus
specification:

  But the "on their own line" is complete garbage to begin with. That
  will NEVER be a kernel rule. We should never have a rule that assumes
  things are so long that they need to be on multiple lines.

  We don't put function return types on their own lines either, even if
  some other projects have that rule (just to get function names at the
  beginning of lines or some other odd reason).

Leave the functions generated by BUILD_MMU_ROLE_REGS_ACCESSOR() as-is,
that code is basically illegible no matter how it's formatted.

No functional change intended.

Link: https://lore.kernel.org/mm-commits/CAHk-=wjS-Jg7sGMwUPpDsjv392nDOOs0CtUtVkp=S6Q7JzFJRw@mail.gmail.com
Signed-off-by: Ben Gardon <bgardon@google.com>
Link: https://lore.kernel.org/r/20230202182809.1929122-4-bgardon@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-17 16:02:47 -07:00
Sean Christopherson
eddd9e8302 KVM: x86/mmu: Replace comment with an actual lockdep assertion on mmu_lock
Assert that mmu_lock is held for write in __walk_slot_rmaps() instead of
hoping the function comment will magically prevent introducing bugs.

Signed-off-by: Ben Gardon <bgardon@google.com>
Link: https://lore.kernel.org/r/20230202182809.1929122-3-bgardon@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-17 16:02:07 -07:00
Sean Christopherson
727ae37701 KVM: x86/mmu: Rename slot rmap walkers to add clarity and clean up code
Replace "slot_handle_level" with "walk_slot_rmaps" to better capture what
the helpers are doing, and to slightly shorten the function names so that
each function's return type and attributes can be placed on the same line
as the function declaration.

No functional change intended.

Link: https://lore.kernel.org/mm-commits/CAHk-=wjS-Jg7sGMwUPpDsjv392nDOOs0CtUtVkp=S6Q7JzFJRw@mail.gmail.com
Signed-off-by: Ben Gardon <bgardon@google.com>
Link: https://lore.kernel.org/r/20230202182809.1929122-2-bgardon@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-17 16:02:06 -07:00
David Matlack
9d4655da1a KVM: x86/mmu: Use gfn_t in kvm_flush_remote_tlbs_range()
Use gfn_t instead of u64 for kvm_flush_remote_tlbs_range()'s parameters,
since gfn_t is the standard type for GFNs throughout KVM.

Opportunistically rename pages to nr_pages to make its role even more
obvious.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20230126184025.2294823-6-dmatlack@google.com
[sean: convert pages to gfn_t too, and rename]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-17 15:36:20 -07:00
David Matlack
8c63e8c217 KVM: x86/mmu: Rename kvm_flush_remote_tlbs_with_address()
Rename kvm_flush_remote_tlbs_with_address() to
kvm_flush_remote_tlbs_range(). This name is shorter, which reduces the
number of callsites that need to be broken up across multiple lines, and
more readable since it conveys a range of memory is being flushed rather
than a single address.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20230126184025.2294823-5-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-17 15:16:12 -07:00
David Matlack
28e4b4597d KVM: x86/mmu: Collapse kvm_flush_remote_tlbs_with_{range,address}() together
Collapse kvm_flush_remote_tlbs_with_range() and
kvm_flush_remote_tlbs_with_address() into a single function. This
eliminates some lines of code and a useless NULL check on the range
struct.

Opportunistically switch from ENOTSUPP to EOPNOTSUPP to make checkpatch
happy.

Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20230126184025.2294823-4-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-17 15:08:41 -07:00
Lai Jiangshan
141705b783 KVM: x86/mmu: Track tail count in pte_list_desc to optimize guest fork()
Rework "struct pte_list_desc" and pte_list_{add|remove} to track the tail
count, i.e. number of PTEs in non-head descriptors, and to always keep all
tail descriptors full so that adding a new entry and counting the number
of entries is done in constant time instead of linear time.

No visible performace is changed in tests.  But pte_list_add() is no longer
shown in the perf result for the COWed pages even the guest forks millions
of tasks.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Link: https://lore.kernel.org/r/20230113122910.672417-1-jiangshanlai@gmail.com
[sean: reword shortlog, tweak changelog, add lots of comments, add BUG_ON()]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16 19:07:37 -07:00
Lai Jiangshan
19ace7d6ca KVM: x86/mmu: Skip calling mmu->sync_spte() when the spte is 0
Sync the spte only when the spte is set and avoid the indirect branch.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Link: https://lore.kernel.org/r/20230216235321.735214-5-jiangshanlai@gmail.com
[sean: add wrapper instead of open coding each check]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16 17:19:55 -07:00
Lai Jiangshan
9fd4a4e3a3 KVM: x86/mmu: Remove FNAME(invlpg) and use FNAME(sync_spte) to update vTLB instead.
In hardware TLB, invalidating TLB entries means the translations are
removed from the TLB.

In KVM shadowed vTLB, the translations (combinations of shadow paging
and hardware TLB) are generally maintained as long as they remain "clean"
when the TLB of an address space (i.e. a PCID or all) is flushed with
the help of write-protections, sp->unsync, and kvm_sync_page(), where
"clean" in this context means that no updates to KVM's SPTEs are needed.

However, FNAME(invlpg) always zaps/removes the vTLB if the shadow page is
unsync, and thus triggers a remote flush even if the original vTLB entry
is clean, i.e. is usable as-is.

Besides this, FNAME(invlpg) is largely is a duplicate implementation of
FNAME(sync_spte) to invalidate a vTLB entry.

To address both issues, reuse FNAME(sync_spte) to share the code and
slightly modify the semantics, i.e. keep the vTLB entry if it's "clean"
and avoid remote TLB flush.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Link: https://lore.kernel.org/r/20230216235321.735214-3-jiangshanlai@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16 17:19:54 -07:00
Lai Jiangshan
ed335278bd KVM: x86/mmu: Allow the roots to be invalid in FNAME(invlpg)
Don't assume the current root to be valid, just check it and remove
the WARN().

Also move the code to check if the root is valid into FNAME(invlpg)
to simplify the code.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Link: https://lore.kernel.org/r/20230216235321.735214-2-jiangshanlai@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16 17:19:53 -07:00
Lai Jiangshan
2c86c444e2 KVM: x86/mmu: Use kvm_mmu_invalidate_addr() in nested_ept_invalidate_addr()
Use kvm_mmu_invalidate_addr() instead open calls to mmu->invlpg().

No functional change intended.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Link: https://lore.kernel.org/r/20230216235321.735214-1-jiangshanlai@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16 17:19:53 -07:00
Lai Jiangshan
9ebc3f51da KVM: x86/mmu: Use kvm_mmu_invalidate_addr() in kvm_mmu_invpcid_gva()
Use kvm_mmu_invalidate_addr() instead open calls to mmu->invlpg().

No functional change intended.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Link: https://lore.kernel.org/r/20230216154115.710033-10-jiangshanlai@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16 17:19:52 -07:00
Lai Jiangshan
cd42853e95 kvm: x86/mmu: Use KVM_MMU_ROOT_XXX for kvm_mmu_invalidate_addr()
The @root_hpa for kvm_mmu_invalidate_addr() is called with @mmu->root.hpa
or INVALID_PAGE where @mmu->root.hpa is to invalidate gva for the current
root (the same meaning as KVM_MMU_ROOT_CURRENT) and INVALID_PAGE is to
invalidate gva for all roots (the same meaning as KVM_MMU_ROOTS_ALL).

Change the argument type of kvm_mmu_invalidate_addr() and use
KVM_MMU_ROOT_XXX instead so that we can reuse the function for
kvm_mmu_invpcid_gva() and nested_ept_invalidate_addr() for invalidating
gva for different set of roots.

No fuctionalities changed.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Link: https://lore.kernel.org/r/20230216154115.710033-9-jiangshanlai@gmail.com
[sean: massage comment slightly]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16 17:19:51 -07:00
Sean Christopherson
f94db0c8b9 KVM: x86/mmu: Sanity check input to kvm_mmu_free_roots()
Tweak KVM_MMU_ROOTS_ALL to precisely cover all current+previous root
flags, and add a sanity in kvm_mmu_free_roots() to verify that the set
of roots to free doesn't stray outside KVM_MMU_ROOTS_ALL.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Link: https://lore.kernel.org/r/20230216154115.710033-8-jiangshanlai@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16 17:19:51 -07:00
Lai Jiangshan
c3c6c9fc5d KVM: x86/mmu: Move the code out of FNAME(sync_page)'s loop body into mmu.c
Rename mmu->sync_page to mmu->sync_spte and move the code out
of FNAME(sync_page)'s loop body into mmu.c.

No functionalities change intended.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Link: https://lore.kernel.org/r/20230216154115.710033-6-jiangshanlai@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16 17:19:44 -07:00
Lai Jiangshan
8ef228c20c KVM: x86/mmu: Set mmu->sync_page as NULL for direct paging
mmu->sync_page for direct paging is never called.

And both mmu->sync_page and mm->invlpg only make sense in shadow paging.
Setting mmu->sync_page as NULL for direct paging makes it consistent
with mm->invlpg which is set NULL for the case.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Link: https://lore.kernel.org/r/20230216154115.710033-5-jiangshanlai@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16 12:49:53 -07:00
Lai Jiangshan
51dddf6c49 KVM: x86/mmu: Check mmu->sync_page pointer in kvm_sync_page_check()
Assert that mmu->sync_page is non-NULL as part of the sanity checks
performed before attempting to sync a shadow page.  Explicitly checking
mmu->sync_page is all but guaranteed to be redundant with the existing
sanity check that the MMU is indirect, but the cost is negligible, and
the explicit check also serves as documentation.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Link: https://lore.kernel.org/r/20230216154115.710033-4-jiangshanlai@gmail.com
[sean: increase verbosity of changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16 12:44:19 -07:00
Lai Jiangshan
90e444702a KVM: x86/mmu: Move the check in FNAME(sync_page) as kvm_sync_page_check()
Prepare to check mmu->sync_page pointer before calling it.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Link: https://lore.kernel.org/r/20230216154115.710033-3-jiangshanlai@gmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16 12:42:15 -07:00
Lai Jiangshan
753b43c9d1 KVM: x86/mmu: Use 64-bit address to invalidate to fix a subtle bug
FNAME(invlpg)() and kvm_mmu_invalidate_gva() take a gva_t, i.e. unsigned
long, as the type of the address to invalidate.  On 32-bit kernels, the
upper 32 bits of the GPA will get dropped when an L2 GPA address is
invalidated in the shadowed nested TDP MMU.

Convert it to u64 to fix the problem.

Reported-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Link: https://lore.kernel.org/r/20230216154115.710033-2-jiangshanlai@gmail.com
[sean: tweak changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16 12:41:05 -07:00
Sean Christopherson
258d985f6e KVM: x86/mmu: Use EMULTYPE flag to track write #PFs to shadow pages
Use a new EMULTYPE flag, EMULTYPE_WRITE_PF_TO_SP, to track page faults
on self-changing writes to shadowed page tables instead of propagating
that information to the emulator via a semi-persistent vCPU flag.  Using
a flag in "struct kvm_vcpu_arch" is confusing, especially as implemented,
as it's not at all obvious that clearing the flag only when emulation
actually occurs is correct.

E.g. if KVM sets the flag and then retries the fault without ever getting
to the emulator, the flag will be left set for future calls into the
emulator.  But because the flag is consumed if and only if both
EMULTYPE_PF and EMULTYPE_ALLOW_RETRY_PF are set, and because
EMULTYPE_ALLOW_RETRY_PF is deliberately not set for direct MMUs, emulated
MMIO, or while L2 is active, KVM avoids false positives on a stale flag
since FNAME(page_fault) is guaranteed to be run and refresh the flag
before it's ultimately consumed by the tail end of reexecute_instruction().

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230202182817.407394-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14 10:28:56 -04:00
David Matlack
7f604e92fb KVM: x86/mmu: Make tdp_mmu_allowed static
Make tdp_mmu_allowed static since it is only ever used within
arch/x86/kvm/mmu/mmu.c.

Link: https://lore.kernel.org/kvm/202302072055.odjDVd5V-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20230213212844.3062733-1-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-16 12:29:50 -05:00
Christophe JAILLET
11b36fe7d4 KVM: x86/mmu: Use kstrtobool() instead of strtobool()
strtobool() is the same as kstrtobool().
However, the latter is more used within the kernel.

In order to remove strtobool() and slightly simplify kstrtox.h, switch to
the other function name.

While at it, include the corresponding header file (<linux/kstrtox.h>)

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/670882aa04dbdd171b46d3b20ffab87158454616.1673689135.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:05:49 -08:00
Hou Wenlong
4ad980aea7 KVM: x86/mmu: Cleanup range-based flushing for given page
Use the new kvm_flush_remote_tlbs_gfn() helper to cleanup the call sites
of range-based flushing for given page, which makes the code clear.

Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Link: https://lore.kernel.org/r/593ee1a876ece0e819191c0b23f56b940d6686db.1665214747.git.houwenlong.hwl@antgroup.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:05:48 -08:00
Hou Wenlong
3cdf93746f KVM: x86/mmu: Fix wrong gfn range of tlb flushing in validate_direct_spte()
The spte pointing to the children SP is dropped, so the whole gfn range
covered by the children SP should be flushed. Although, Hyper-V may
treat a 1-page flush the same if the address points to a huge page, it
still would be better to use the correct size of huge page.

Fixes: c3134ce240 ("KVM: Replace old tlb flush function with new one to flush a specified range.")
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Link: https://lore.kernel.org/r/5f297c566f7d7ff2ea6da3c66d050f69ce1b8ede.1665214747.git.houwenlong.hwl@antgroup.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:05:48 -08:00
Hou Wenlong
1b2dc73604 KVM: x86/mmu: Fix wrong start gfn of tlb flushing with range
When a spte is dropped, the start gfn of tlb flushing should be the gfn
of spte not the base gfn of SP which contains the spte. Also introduce a
helper function to do range-based flushing when a spte is dropped, which
would help prevent future buggy use of
kvm_flush_remote_tlbs_with_address() in such case.

Fixes: c3134ce240 ("KVM: Replace old tlb flush function with new one to flush a specified range.")
Suggested-by: David Matlack <dmatlack@google.com>
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Link: https://lore.kernel.org/r/72ac2169a261976f00c1703e88cda676dfb960f5.1665214747.git.houwenlong.hwl@antgroup.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:05:47 -08:00
Hou Wenlong
9ffe926537 KVM: x86/mmu: Fix wrong gfn range of tlb flushing in kvm_set_pte_rmapp()
When the spte of hupe page is dropped in kvm_set_pte_rmapp(), the whole
gfn range covered by the spte should be flushed. However,
rmap_walk_init_level() doesn't align down the gfn for new level like tdp
iterator does, then the gfn used in kvm_set_pte_rmapp() is not the base
gfn of huge page. And the size of gfn range is wrong too for huge page.
Use the base gfn of huge page and the size of huge page for flushing
tlbs for huge page. Also introduce a helper function to flush the given
page (huge or not) of guest memory, which would help prevent future
buggy use of kvm_flush_remote_tlbs_with_address() in such case.

Fixes: c3134ce240 ("KVM: Replace old tlb flush function with new one to flush a specified range.")
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Link: https://lore.kernel.org/r/0ce24d7078fa5f1f8d64b0c59826c50f32f8065e.1665214747.git.houwenlong.hwl@antgroup.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:05:46 -08:00
Hou Wenlong
c667a3baed KVM: x86/mmu: Move round_gfn_for_level() helper into mmu_internal.h
Rounding down the GFN to a huge page size is a common pattern throughout
KVM, so move round_gfn_for_level() helper in tdp_iter.c to
mmu_internal.h for common usage. Also rename it as gfn_round_for_level()
to use gfn_* prefix and clean up the other call sites.

Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Link: https://lore.kernel.org/r/415c64782f27444898db650e21cf28eeb6441dfa.1665214747.git.houwenlong.hwl@antgroup.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:05:45 -08:00
Wei Liu
a7e48ef77f KVM: x86/mmu: fix an incorrect comment in kvm_mmu_new_pgd()
There is no function named kvm_mmu_ensure_valid_pgd().

Fix the comment and remove the pair of braces to conform to Linux kernel
coding style.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128214709.224710-1-wei.liu@kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-01-24 10:05:45 -08:00
Sean Christopherson
8d20bd6381 KVM: x86: Unify pr_fmt to use module name for all KVM modules
Define pr_fmt using KBUILD_MODNAME for all KVM x86 code so that printks
use consistent formatting across common x86, Intel, and AMD code.  In
addition to providing consistent print formatting, using KBUILD_MODNAME,
e.g. kvm_amd and kvm_intel, allows referencing SVM and VMX (and SEV and
SGX and ...) as technologies without generating weird messages, and
without causing naming conflicts with other kernel code, e.g. "SEV: ",
"tdx: ", "sgx: " etc.. are all used by the kernel for non-KVM subsystems.

Opportunistically move away from printk() for prints that need to be
modified anyways, e.g. to drop a manual "kvm: " prefix.

Opportunistically convert a few SGX WARNs that are similarly modified to
WARN_ONCE; in the very unlikely event that the WARNs fire, odds are good
that they would fire repeatedly and spam the kernel log without providing
unique information in each print.

Note, defining pr_fmt yields undesirable results for code that uses KVM's
printk wrappers, e.g. vcpu_unimpl().  But, that's a pre-existing problem
as SVM/kvm_amd already defines a pr_fmt, and thankfully use of KVM's
wrappers is relatively limited in KVM x86 code.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <20221130230934.1014142-35-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:47:35 -05:00
Paolo Bonzini
fc471e8310 Merge branch 'kvm-late-6.1' into HEAD
x86:

* Change tdp_mmu to a read-only parameter

* Separate TDP and shadow MMU page fault paths

* Enable Hyper-V invariant TSC control

selftests:

* Use TAP interface for kvm_binary_stats_test and tsc_msrs_test

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:36:47 -05:00
Sean Christopherson
dfe0ecc6f5 KVM: x86/mmu: Pivot on "TDP MMU enabled" when handling direct page faults
When handling direct page faults, pivot on the TDP MMU being globally
enabled instead of checking if the target MMU is a TDP MMU.  Now that the
TDP MMU is all-or-nothing, if the TDP MMU is enabled, KVM will reach
direct_page_fault() if and only if the MMU is a TDP MMU.  When TDP is
enabled (obviously required for the TDP MMU), only non-nested TDP page
faults reach direct_page_fault(), i.e. nonpaging MMUs are impossible, as
NPT requires paging to be enabled and EPT faults use ept_page_fault().

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221012181702.3663607-8-seanjc@google.com>
[Use tdp_mmu_enabled variable. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:33:26 -05:00
Sean Christopherson
78fdd2f09f KVM: x86/mmu: Pivot on "TDP MMU enabled" to check if active MMU is TDP MMU
Simplify and optimize the logic for detecting if the current/active MMU
is a TDP MMU.  If the TDP MMU is globally enabled, then the active MMU is
a TDP MMU if it is direct.  When TDP is enabled, so called nonpaging MMUs
are never used as the only form of shadow paging KVM uses is for nested
TDP, and the active MMU can't be direct in that case.

Rename the helper and take the vCPU instead of an arbitrary MMU, as
nonpaging MMUs can show up in the walk_mmu if L1 is using nested TDP and
L2 has paging disabled.  Taking the vCPU has the added bonus of cleaning
up the callers, all of which check the current MMU but wrap code that
consumes the vCPU.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221012181702.3663607-9-seanjc@google.com>
[Use tdp_mmu_enabled variable. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:33:25 -05:00
Sean Christopherson
de0322f575 KVM: x86/mmu: Replace open coded usage of tdp_mmu_page with is_tdp_mmu_page()
Use is_tdp_mmu_page() instead of querying sp->tdp_mmu_page directly so
that all users benefit if KVM ever finds a way to optimize the logic.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221012181702.3663607-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:33:25 -05:00
David Matlack
6c882ef4fc KVM: x86/mmu: Rename __direct_map() to direct_map()
Rename __direct_map() to direct_map() since the leading underscores are
unnecessary. This also makes the page fault handler names more
consistent: kvm_tdp_mmu_page_fault() calls kvm_tdp_mmu_map() and
direct_page_fault() calls direct_map().

Opportunistically make some trivial cleanups to comments that had to be
modified anyway since they mentioned __direct_map(). Specifically, use
"()" when referring to functions, and include kvm_tdp_mmu_map() among
the various callers of disallowed_hugepage_adjust().

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220921173546.2674386-11-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:33:24 -05:00
David Matlack
9f33697ac7 KVM: x86/mmu: Stop needlessly making MMU pages available for TDP MMU faults
Stop calling make_mmu_pages_available() when handling TDP MMU faults.
The TDP MMU does not participate in the "available MMU pages" tracking
and limiting so calling this function is unnecessary work when handling
TDP MMU faults.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220921173546.2674386-10-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:33:23 -05:00
David Matlack
9aa8ab43b3 KVM: x86/mmu: Split out TDP MMU page fault handling
Split out the page fault handling for the TDP MMU to a separate
function.  This creates some duplicate code, but makes the TDP MMU fault
handler simpler to read by eliminating branches and will enable future
cleanups by allowing the TDP MMU and non-TDP MMU fault paths to diverge.

Only compile in the TDP MMU fault handler for 64-bit builds since
kvm_tdp_mmu_map() does not exist in 32-bit builds.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220921173546.2674386-9-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:33:22 -05:00
David Matlack
e5e6f8d254 KVM: x86/mmu: Initialize fault.{gfn,slot} earlier for direct MMUs
Move the initialization of fault.{gfn,slot} earlier in the page fault
handling code for fully direct MMUs. This will enable a future commit to
split out TDP MMU page fault handling without needing to duplicate the
initialization of these 2 fields.

Opportunistically take advantage of the fact that fault.gfn is
initialized in kvm_tdp_page_fault() rather than recomputing it from
fault->addr.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220921173546.2674386-8-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:33:21 -05:00
David Matlack
354c908c06 KVM: x86/mmu: Handle no-slot faults in kvm_faultin_pfn()
Handle faults on GFNs that do not have a backing memslot in
kvm_faultin_pfn() and drop handle_abnormal_pfn(). This eliminates
duplicate code in the various page fault handlers.

Opportunistically tweak the comment about handling gfn > host.MAXPHYADDR
to reflect that the effect of returning RET_PF_EMULATE at that point is
to avoid creating an MMIO SPTE for such GFNs.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220921173546.2674386-7-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:33:21 -05:00
David Matlack
cd08d178ff KVM: x86/mmu: Avoid memslot lookup during KVM_PFN_ERR_HWPOISON handling
Pass the kvm_page_fault struct down to kvm_handle_error_pfn() to avoid a
memslot lookup when handling KVM_PFN_ERR_HWPOISON. Opportunistically
move the gfn_to_hva_memslot() call and @current down into
kvm_send_hwpoison_signal() to cut down on line lengths.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220921173546.2674386-6-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:33:20 -05:00
David Matlack
56c3a4e4a2 KVM: x86/mmu: Handle error PFNs in kvm_faultin_pfn()
Handle error PFNs in kvm_faultin_pfn() rather than relying on the caller
to invoke handle_abnormal_pfn() after kvm_faultin_pfn().
Opportunistically rename kvm_handle_bad_page() to kvm_handle_error_pfn()
to make it more consistent with is_error_pfn().

This commit moves KVM closer to being able to drop
handle_abnormal_pfn(), which will reduce the amount of duplicate code in
the various page fault handlers.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220921173546.2674386-5-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:33:19 -05:00
David Matlack
ba6e3fe255 KVM: x86/mmu: Grab mmu_invalidate_seq in kvm_faultin_pfn()
Grab mmu_invalidate_seq in kvm_faultin_pfn() and stash it in struct
kvm_page_fault. The eliminates duplicate code and reduces the amount of
parameters needed for is_page_fault_stale().

Preemptively split out __kvm_faultin_pfn() to a separate function for
use in subsequent commits.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220921173546.2674386-4-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:33:18 -05:00
David Matlack
09732d2b4d KVM: x86/mmu: Move TDP MMU VM init/uninit behind tdp_mmu_enabled
Move kvm_mmu_{init,uninit}_tdp_mmu() behind tdp_mmu_enabled. This makes
these functions consistent with the rest of the calls into the TDP MMU
from mmu.c, and which is now possible since tdp_mmu_enabled is only
modified when the x86 vendor module is loaded. i.e. It will never change
during the lifetime of a VM.

This change also enabled removing the stub definitions for 32-bit KVM,
as the compiler will just optimize the calls out like it does for all
the other TDP MMU functions.

No functional change intended.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220921173546.2674386-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:33:17 -05:00
David Matlack
1f98f2bd8e KVM: x86/mmu: Change tdp_mmu to a read-only parameter
Change tdp_mmu to a read-only parameter and drop the per-vm
tdp_mmu_enabled. For 32-bit KVM, make tdp_mmu_enabled a macro that is
always false so that the compiler can continue omitting cals to the TDP
MMU.

The TDP MMU was introduced in 5.10 and has been enabled by default since
5.15. At this point there are no known functionality gaps between the
TDP MMU and the shadow MMU, and the TDP MMU uses less memory and scales
better with the number of vCPUs. In other words, there is no good reason
to disable the TDP MMU on a live system.

Purposely do not drop tdp_mmu=N support (i.e. do not force 64-bit KVM to
always use the TDP MMU) since tdp_mmu=N is still used to get test
coverage of KVM's shadow MMU TDP support, which is used in 32-bit KVM.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220921173546.2674386-2-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-29 15:33:16 -05:00