Currently if architectures want to support HOTPLUG_SMT they need to
provide a topology_is_primary_thread() telling the framework which
thread in the SMT cannot offline. However arm64 doesn't have a
restriction on which thread in the SMT cannot offline, a simplest
choice is that just make 1st thread as the "primary" thread. So
just make this as the default implementation in the framework and
let architectures like x86 that have special primary thread to
override this function (which they've already done).
There's no need to provide a stub function if !CONFIG_SMP or
!CONFIG_HOTPLUG_SMT. In such case the testing CPU is already
the 1st CPU in the SMT so it's always the primary thread.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20250311075143.61078-2-yangyicong@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The architecture specific parts of resctrl provide helpers like
is_mbm_total_enabled() and is_mbm_local_enabled() to hide accesses to the
rdt_mon_features bitmap.
Exposing a group of helpers between the architecture and filesystem code is
preferable to a single unsigned-long like rdt_mon_features. Helpers can be more
readable and have a well defined behaviour, while allowing architectures to hide
more complex behaviour.
Once the filesystem parts of resctrl are moved, these existing helpers can no
longer live in internal.h. Move them to include/linux/resctrl.h Once these are
exposed to the wider kernel, they should have a 'resctrl_arch_' prefix, to fit
the rest of the arch<->fs interface.
Move and rename the helpers that touch rdt_mon_features directly. is_mbm_event()
and is_mbm_enabled() are only called from rdtgroup.c, so can be moved into that
file.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-19-james.morse@arm.com
When resctrl is fully factored into core and per-arch code, each arch will
need to use some resctrl common definitions in order to define its own
specializations and helpers. Following conventional practice, it would be
desirable to put the dependent arch definitions in an <asm/resctrl.h> header
that is included by the common <linux/resctrl.h> header. However, this can
make it awkward to avoid a circular dependency between <linux/resctrl.h> and
the arch header.
To avoid such dependencies, move the affected common types and constants into
a new header that does not need to depend on <linux/resctrl.h> or on the arch
headers.
The same logic applies to the monitor-configuration defines, move these too.
Some kind of enumeration for events is needed between the filesystem and
architecture code. Take the x86 definition as its convenient for x86.
The definition of enum resctrl_event_id is needed to allow the architecture
code to define resctrl_arch_mon_ctx_alloc() and resctrl_arch_mon_ctx_free().
The definition of enum resctrl_res_level is needed to allow the architecture
code to define resctrl_arch_set_cdp_enabled() and
resctrl_arch_get_cdp_enabled().
The bits for mbm_local_bytes_config et al are ABI, and must be the same on all
architectures. These are documented in Documentation/arch/x86/resctrl.rst
The maintainers entry for these headers was missed when resctrl.h was created.
Add a wildcard entry to match both resctrl.h and resctrl_types.h.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-14-james.morse@arm.com
rdtgroup_rmdir_ctrl() and rdtgroup_rmdir_mon() set the per-CPU pqr_state for
CPUs that were part of the rmdir()'d group.
Another architecture might not have a 'pqr_state', its hardware may need the
values in a different format. MPAM's equivalent of RMID values are not unique,
and always need the CLOSID to be provided too.
There is only one caller that modifies a single value, (rdtgroup_rmdir_mon()).
MPAM always needs both CLOSID and RMID for the hardware value as these are
written to the same system register.
As rdtgroup_rmdir_mon() has the CLOSID on hand, only provide a helper to set
both values. These values are read by __resctrl_sched_in(), but may be written
by a different CPU without any locking, add READ/WRTE_ONCE() to avoid torn
values.
Co-developed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Amit Singh Tomar <amitsinght@marvell.com> # arm64
Tested-by: Shanker Donthineni <sdonthineni@nvidia.com> # arm64
Tested-by: Babu Moger <babu.moger@amd.com>
Link: https://lore.kernel.org/r/20250311183715.16445-10-james.morse@arm.com
When extra warnings are enabled, the cc_mask definition in <asm/coco.h>
causes a build failure with GCC:
arch/x86/include/asm/coco.h:28:18: error: 'cc_mask' defined but not used [-Werror=unused-const-variable=]
28 | static const u64 cc_mask = 0;
Add a cc_get_mask() function mirroring cc_set_mask() for the one
user of the variable outside of the CoCo implementation.
Fixes: a0a8d15a79 ("x86/tdx: Preserve shared bit on mprotect()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250310131114.2635497-1-arnd@kernel.org
--
v2: use an inline helper instead of a __maybe_unused annotaiton.
Pull KVM fixes from Paolo Bonzini:
"arm64:
- Fix a couple of bugs affecting pKVM's PSCI relay implementation
when running in the hVHE mode, resulting in the host being entered
with the MMU in an unknown state, and EL2 being in the wrong mode
x86:
- Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow
- Ensure DEBUGCTL is context switched on AMD to avoid running the
guest with the host's value, which can lead to unexpected bus lock
#DBs
- Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't
properly emulate BTF. KVM's lack of context switching has meant BTF
has always been broken to some extent
- Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as
the guest can enable DebugSwap without KVM's knowledge
- Fix a bug in mmu_stress_tests where a vCPU could finish the "writes
to RO memory" phase without actually generating a write-protection
fault
- Fix a printf() goof in the SEV smoke test that causes build
failures with -Werror
- Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when
PERFMON_V2 isn't supported by KVM"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Explicitly zero EAX and EBX when PERFMON_V2 isn't supported by KVM
KVM: selftests: Fix printf() format goof in SEV smoke test
KVM: selftests: Ensure all vCPUs hit -EFAULT during initial RO stage
KVM: SVM: Don't rely on DebugSwap to restore host DR0..DR3
KVM: SVM: Save host DR masks on CPUs with DebugSwap
KVM: arm64: Initialize SCTLR_EL1 in __kvm_hyp_init_cpu()
KVM: arm64: Initialize HCR_EL2.E2H early
KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
KVM: SVM: Manually context switch DEBUGCTL if LBR virtualization is disabled
KVM: x86: Snapshot the host's DEBUGCTL in common x86
KVM: SVM: Suppress DEBUGCTL.BTF on AMD
KVM: SVM: Drop DEBUGCTL[5:2] from guest's effective value
KVM: selftests: Assert that STI blocking isn't set after event injection
KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the STI shadow
KVM x86 fixes for 6.14-rcN #2
- Set RFLAGS.IF in C code on SVM to get VMRUN out of the STI shadow.
- Ensure DEBUGCTL is context switched on AMD to avoid running the guest with
the host's value, which can lead to unexpected bus lock #DBs.
- Suppress DEBUGCTL.BTF on AMD (to match Intel), as KVM doesn't properly
emulate BTF. KVM's lack of context switching has meant BTF has always been
broken to some extent.
- Always save DR masks for SNP vCPUs if DebugSwap is *supported*, as the guest
can enable DebugSwap without KVM's knowledge.
- Fix a bug in mmu_stress_tests where a vCPU could finish the "writes to RO
memory" phase without actually generating a write-protection fault.
- Fix a printf() goof in the SEV smoke test that causes build failures with
-Werror.
- Explicitly zero EAX and EBX in CPUID.0x8000_0022 output when PERFMON_V2
isn't supported by KVM.
To support multiple PTP clocks, the VDSO data structure needs to be
reworked. All clock specific data will end up in struct vdso_clock and in
struct vdso_time_data there will be array of VDSO clocks. At the moment,
vdso_clock is simply a define which maps vdso_clock to vdso_time_data.
To prepare for the rework of the data structures, replace the struct
vdso_time_data pointer with a struct vdso_clock pointer where applicable.
No functional change.
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250303-vdso-clock-v1-15-c1b5c69a166f@linutronix.de
Andy reported the following build warning from head_32.S:
In file included from arch/x86/kernel/head_32.S:29:
arch/x86/include/asm/pgtable_32.h:59:5: error: "PTRS_PER_PMD" is not defined, evaluates to 0 [-Werror=undef]
59 | #if PTRS_PER_PMD > 1
The reason is that on 2-level i386 paging the folded in PMD's
PTRS_PER_PMD constant is not defined in assembly headers,
only in generic MM C headers.
Instead of trying to fish out the definition from the generic
headers, just define it - it even has a comment for it already...
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/Z8oa8AUVyi2HWfo9@gmail.com
Compared to the SNP Guest Request, the "Extended" version adds data pages for
receiving certificates. If not enough pages provided, the HV can report to the
VM how much is needed so the VM can reallocate and repeat.
Commit
ae596615d9 ("virt: sev-guest: Reduce the scope of SNP command mutex")
moved handling of the allocated/desired pages number out of scope of said
mutex and create a possibility for a race (multiple instances trying to
trigger Extended request in a VM) as there is just one instance of
snp_msg_desc per /dev/sev-guest and no locking other than snp_cmd_mutex.
Fix the issue by moving the data blob/size and the GHCB input struct
(snp_req_data) into snp_guest_req which is allocated on stack now and accessed
by the GHCB caller under that mutex.
Stop allocating SEV_FW_BLOB_MAX_SIZE in snp_msg_alloc() as only one of four
callers needs it. Free the received blob in get_ext_report() right after it is
copied to the userspace. Possible future users of snp_send_guest_request() are
likely to have different ideas about the buffer size anyways.
Fixes: ae596615d9 ("virt: sev-guest: Reduce the scope of SNP command mutex")
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250307013700.437505-3-aik@amd.com
Background:
===========
Currently kernel-mode FPU is not always usable in softirq context on
x86, since softirqs can nest inside a kernel-mode FPU section in task
context, and nested use of kernel-mode FPU is not supported.
Therefore, x86 SIMD-optimized code that can be called in softirq context
has to sometimes fall back to non-SIMD code. There are two options for
the fallback, both of which are pretty terrible:
(a) Use a scalar fallback. This can be 10-100x slower than vectorized
code because it cannot use specialized instructions like AES, SHA,
or carryless multiplication.
(b) Execute the request asynchronously using a kworker. In other
words, use the "crypto SIMD helper" in crypto/simd.c.
Currently most of the x86 en/decryption code (skcipher and aead
algorithms) uses option (b), since this avoids the slow scalar fallback
and it is easier to wire up. But option (b) is still really bad for its
own reasons:
- Punting the request to a kworker is bad for performance too.
- It forces the algorithm to be marked as asynchronous
(CRYPTO_ALG_ASYNC), preventing it from being used by crypto API
users who request a synchronous algorithm. That's another huge
performance problem, which is especially unfortunate for users who
don't even do en/decryption in softirq context.
- It makes all en/decryption operations take a detour through
crypto/simd.c. That involves additional checks and an additional
indirect call, which slow down en/decryption for *everyone*.
Fortunately, the skcipher and aead APIs are only usable in task and
softirq context in the first place. Thus, if kernel-mode FPU were to be
reliably usable in softirq context, no fallback would be needed.
Indeed, other architectures such as arm, arm64, and riscv have already
done this.
Changes implemented:
====================
Therefore, this patch updates x86 accordingly to reliably support
kernel-mode FPU in softirqs.
This is done by just disabling softirq processing in kernel-mode FPU
sections (when hardirqs are not already disabled), as that prevents the
nesting that was problematic.
This will delay some softirqs slightly, but only ones that would have
otherwise been nested inside a task context kernel-mode FPU section.
Any such softirqs would have taken the slow fallback path before if they
tried to do any en/decryption. Now these softirqs will just run at the
end of the task context kernel-mode FPU section (since local_bh_enable()
runs pending softirqs) and will no longer take the slow fallback path.
Alternatives considered:
========================
- Make kernel-mode FPU sections fully preemptible. This would require
growing task_struct by another struct fpstate which is more than 2K.
- Make softirqs save/restore the kernel-mode FPU state to a per-CPU
struct fpstate when nested use is detected. Somewhat interesting, but
seems unnecessary when a simpler solution exists.
Performance results:
====================
I did some benchmarks with AES-XTS encryption of 16-byte messages (which is
unrealistically small, but this makes it easier to see the overhead of
kernel-mode FPU...). The baseline was 384 MB/s. Removing the use of
crypto/simd.c, which this work makes possible, increases it to 487 MB/s,
a +27% improvement in throughput.
CPU was AMD Ryzen 9 9950X (Zen 5). No debugging options were enabled.
[ mingo: Prettified the changelog and added performance results. ]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Link: https://lore.kernel.org/r/20250304204954.3901-1-ebiggers@kernel.org
Also change the alignment of the percpu hot section:
- PERCPU_SECTION(INTERNODE_CACHE_BYTES)
+ PERCPU_SECTION(L1_CACHE_BYTES)
As vSMP will muck with INTERNODE_CACHE_BYTES that invalidates the
too-large-section assert we do:
ASSERT(__per_cpu_hot_end - __per_cpu_hot_start <= 64, "percpu cache hot section too large")
[ mingo: Added INTERNODE_CACHE_BYTES fix & explanation. ]
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250303165246.2175811-3-brgerst@gmail.com
Separate the input from the clobbers in preparation for appending the
input.
Do this in preparation of changing the ASM_CALL_CONSTRAINT primitive.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
smp_store_cpu_info() is just a wrapper around identify_secondary_cpu()
without further value.
Move the extra bits from smp_store_cpu_info() into identify_secondary_cpu()
and remove the wrapper.
[ darwi: Make it compile and fix up the xen/smp_pv.c instance ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250304085152.51092-9-darwi@linutronix.de
Commit:
e0ba94f14f ("x86/tlb_info: get last level TLB entry number of CPU")
introduced u16 "info" arrays for each TLB type.
Since 2012 and each array stores just one type of information: the
number of TLB entries for its respective TLB type.
Replace such arrays with simple variables.
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250304085152.51092-8-darwi@linutronix.de
<asm/cpuid.h> uses static_assert() at multiple locations but it does not
include the CPP macro's definition at linux/build_bug.h.
Include the needed header to make <asm/cpuid.h> self-sufficient.
This gets triggered when cpuid.h is included in new C files, which is to
be done in further commits.
Fixes: 43d86e3cd9 ("x86/cpu: Provide cpuid_read() et al.")
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250304085152.51092-5-darwi@linutronix.de
CALL_NOSPEC macro is used to generate Spectre-v2 mitigation friendly
indirect branches. At compile time the macro defaults to indirect branch,
and at runtime those can be patched to thunk based mitigations.
This approach is opposite of what is done for the rest of the kernel, where
the compile time default is to replace indirect calls with retpoline thunk
calls.
Make CALL_NOSPEC consistent with the rest of the kernel, default to
retpoline thunk at compile time when CONFIG_MITIGATION_RETPOLINE is
enabled.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250228-call-nospec-v3-1-96599fed0f33@linux.intel.com
Fix some related issues (done in a single patch to avoid introducing
intermediate bisect warnings):
1) The SMP version of mwait_play_dead() doesn't return, but its
!SMP counterpart does. Make its calling behavior consistent by
resolving the !SMP version to a BUG(). It should never be called
anyway, this just enforces that at runtime and enables its callers
to be marked as __noreturn.
2) While the SMP definition of mwait_play_dead() is annotated as
__noreturn, the declaration isn't. Nor is it listed in
tools/objtool/noreturns.h. Fix that.
3) Similar to #1, the SMP version of acpi_processor_ffh_play_dead()
doesn't return but its !SMP counterpart does. Make the !SMP
version a BUG(). It should never be called.
4) acpi_processor_ffh_play_dead() doesn't return, but is lacking any
__noreturn annotations. Fix that.
This fixes the following objtool warnings:
vmlinux.o: warning: objtool: acpi_processor_ffh_play_dead+0x67: mwait_play_dead() is missing a __noreturn annotation
vmlinux.o: warning: objtool: acpi_idle_play_dead+0x3c: acpi_processor_ffh_play_dead() is missing a __noreturn annotation
Fixes: a7dd183f0b ("x86/smp: Allow calling mwait_play_dead with an arbitrary hint")
Fixes: 541ddf31e3 ("ACPI/processor_idle: Add FFH state handling")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/e885c6fa9e96a61471b33e48c2162d28b15b14c5.1740962711.git.jpoimboe@kernel.org
The safe_smp_processor_id() function was originally implemented in:
dc2bc768a0 ("stack overflow safe kdump: safe_smp_processor_id()")
to mitigate the CPU number corruption on a stack overflow. At the time,
x86-32 stored the CPU number in thread_struct, which was located at the
bottom of the task stack and thus vulnerable to an overflow.
The CPU number is now located in percpu memory, so this workaround
is no longer needed.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20250303170115.2176553-1-brgerst@gmail.com
When handling an "AP Create" event, return an error if the "requested" SEV
features for the vCPU don't exactly match KVM's view of the VM-scoped
features. There is no known use case for heterogeneous SEV features across
vCPUs, and while KVM can't actually enforce an exact match since the value
in RAX isn't guaranteed to match what the guest shoved into the VMSA, KVM
can at least avoid knowingly letting the guest run in an unsupported state.
E.g. if a VM is created with DebugSwap disabled, KVM will intercept #DBs
and DRs for all vCPUs, even if an AP is "created" with DebugSwap enabled in
its VMSA.
Note, the GHCB spec only "requires" that "AP use the same interrupt
injection mechanism as the BSP", but given the disaster that is DebugSwap
and SEV_FEATURES in general, it's safe to say that AMD didn't consider all
possible complications with mismatching features between the BSP and APs.
Opportunistically fold the check into the relevant request flavors; the
"request < AP_DESTROY" check is just a bizarre way of implementing the
AP_CREATE_ON_INIT => AP_CREATE fallthrough.
Fixes: e366f92ea9 ("KVM: SEV: Support SEV-SNP AP Creation NAE event")
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Link: https://lore.kernel.org/r/20250227012541.3234589-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
CALL_NOSPEC macro is used to generate Spectre-v2 mitigation friendly
indirect branches. At compile time the macro defaults to indirect branch,
and at runtime those can be patched to thunk based mitigations.
This approach is opposite of what is done for the rest of the kernel, where
the compile time default is to replace indirect calls with retpoline thunk
calls.
Make CALL_NOSPEC consistent with the rest of the kernel, default to
retpoline thunk at compile time when CONFIG_MITIGATION_RETPOLINE is
enabled.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250228-call-nospec-v3-1-96599fed0f33@linux.intel.com
Move KVM's snapshot of DEBUGCTL to kvm_vcpu_arch and take the snapshot in
common x86, so that SVM can also use the snapshot.
Opportunistically change the field to a u64. While bits 63:32 are reserved
on AMD, not mentioned at all in Intel's SDM, and managed as an "unsigned
long" by the kernel, DEBUGCTL is an MSR and therefore a 64-bit value.
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: stable@vger.kernel.org
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20250227222411.3490595-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Define independent macros for the RWX protection bits that are enumerated
via EXIT_QUALIFICATION for EPT Violations, and tie them to the RWX bits in
EPT entries via compile-time asserts. Piggybacking the EPTE defines works
for now, but it creates holes in the EPT_VIOLATION_xxx macros and will
cause headaches if/when KVM emulates Mode-Based Execution (MBEC), or any
other features that introduces additional protection information.
Opportunistically rename EPT_VIOLATION_RWX_MASK to EPT_VIOLATION_PROT_MASK
so that it doesn't become stale if/when MBEC support is added.
No functional change intended.
Cc: Jon Kohler <jon@nutanix.com>
Cc: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/20250227000705.3199706-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Those defines are only used in the definition of the various
EPT_VIOLATIONS_ACC_* macros which are then used to extract respective
bits from vmexit error qualifications. Remove the _BIT defines and
redefine the _ACC ones via BIT() macro. No functional changes.
Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/r/20250227000705.3199706-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Commit:
03b122da74 ("x86/sgx: Hook arch_memory_failure() into mainline code")
... added <linux/mm.h> to <asm/set_memory.h> to provide some helpers.
However the following commit:
b3fdf9398a ("x86/mce: relocate set{clear}_mce_nospec() functions")
... moved the inline definitions someplace else, and now <asm/set_memory.h>
just declares a bunch of mostly self-contained functions.
No need for the whole <linux/mm.h> inclusion to declare functions; just
remove that include. This helps avoid circular dependency headaches
(e.g. if <linux/mm.h> ends up including <linux/set_memory.h>).
This change requires a couple of include fixups not to break the
build:
* <asm/smp.h>: including <asm/thread_info.h> directly relies on
<linux/thread_info.h> having already been included, because the
former needs the BAD_STACK/NOT_STACK constants defined in the
latter. This is no longer the case when <asm/smp.h> is included from
some driver file - just include <linux/thread_info.h> to stay out
of trouble.
* sev-guest.c relies on <asm/set_memory.h> including <linux/mm.h>,
so we just need to make that include explicit.
[ mingo: Cleaned up the changelog ]
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20241212080904.2089632-3-kevin.brodsky@arm.com