Commit Graph

Yosry Ahmed
549435aab4 x86/bugs: Move the X86_FEATURE_USE_IBPB check into callers
indirect_branch_prediction_barrier() only performs the MSR write if
X86_FEATURE_USE_IBPB is set, using alternative_msr_write(). In
preparation for removing X86_FEATURE_USE_IBPB, move the feature check
into the callers so that they can be addressed one-by-one, and use
X86_FEATURE_IBPB instead to guard the MSR write.
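
A sketch of the resulting caller-side pattern (illustrative, not a
verbatim hunk from the patch):

  /* Callers now check the policy bit themselves ... */
  if (cpu_feature_enabled(X86_FEATURE_USE_IBPB))
          indirect_branch_prediction_barrier();

  /* ... while the MSR write inside the barrier is guarded by
   * X86_FEATURE_IBPB. */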

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250227012712.3193063-2-yosry.ahmed@linux.dev
2025-02-27 10:57:20 +01:00
Uros Bizjak
adf6819278 x86/bootflag: Micro-optimize sbf_write()
Flip the parity bit with XOR when !parity, instead of masking the bit
out and then conditionally setting it when !parity.

Saves a couple of bytes in the object file.
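
Roughly (a sketch; the parity()/SBF_PARITY naming is assumed from the
bootflag code):

  /* Before: clear the parity bit, then conditionally set it. */
  sbf &= ~SBF_PARITY;
  if (!parity(sbf))
          sbf |= SBF_PARITY;

  /* After: flipping the bit when the overall parity is even
   * always yields odd parity, so one conditional XOR suffices. */
  if (!parity(sbf))
          sbf ^= SBF_PARITY;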

Co-developed-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250226153709.6370-1-ubizjak@gmail.com
2025-02-27 10:14:00 +01:00
Christophe Leroy
d856bc3ac7 static_call_inline: Provide trampoline address when updating sites
In preparation of support of inline static calls on powerpc, provide
trampoline address when updating sites, so that when the destination
function is too far for a direct function call, the call site is
patched with a call to the trampoline.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/5efe0cffc38d6f69b1ec13988a99f1acff551abf.1733245362.git.christophe.leroy@csgroup.eu
2025-02-26 21:09:43 +05:30
Benjamin Berg
5d3b81d4d8 x86/fpu: Avoid copying dynamic FP state from init_task in arch_dup_task_struct()
The init_task instance of struct task_struct is statically allocated and
may not contain the full FP state for userspace. As such, limit the copy
to the valid area of both init_task and 'dst' and ensure all memory is
initialized.

Note that the FP state is only needed for userspace, and as such it is
entirely reasonable for init_task to not contain parts of it.
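
A minimal sketch of the bounded copy (using memcpy_and_pad(), per the
v2 note below; the exact code in arch_dup_task_struct() may differ):

  if (unlikely(src == &init_task))
          /* Copy only init_task's statically allocated area and
           * zero-pad the rest of the destination. */
          memcpy_and_pad(dst, arch_task_struct_size,
                         src, sizeof(init_task), 0);
  else
          memcpy(dst, src, arch_task_struct_size);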

Fixes: 5aaeb5c01c ("x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86")
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20250226133136.816901-1-benjamin@sipsolutions.net
----

v2:
- Fix code if arch_task_struct_size < sizeof(init_task) by using
  memcpy_and_pad.
2025-02-26 15:32:34 +01:00
Borislav Petkov
8442df2b49 x86/bugs: KVM: Add support for SRSO_MSR_FIX
Add support for

  CPUID Fn8000_0021_EAX[31] (SRSO_MSR_FIX). If this bit is 1, it
  indicates that software may use MSR BP_CFG[BpSpecReduce] to mitigate
  SRSO.

Enable BpSpecReduce to mitigate SRSO across guest/host boundaries.

Switch back to enabling the bit when virtualization is enabled and
clearing it when virtualization is disabled, because using an MSR slot
would clear the bit when the guest exits; any training the guest has
done could then influence the host kernel when execution enters the
kernel but hasn't VMRUN the guest yet.
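
The gist, as a minimal sketch (the feature-flag and MSR/bit constant
names below are placeholders, not necessarily the ones in the patch):

  /* On virtualization enable: */
  if (cpu_feature_enabled(X86_FEATURE_SRSO_MSR_FIX))
          msr_set_bit(MSR_BP_CFG, BP_CFG_SPEC_REDUCE_BIT);

  /* On virtualization disable, undo it: */
  if (cpu_feature_enabled(X86_FEATURE_SRSO_MSR_FIX))
          msr_clear_bit(MSR_BP_CFG, BP_CFG_SPEC_REDUCE_BIT);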

More detail on the public thread in Link below.

Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241202120416.6054-1-bp@kernel.org
2025-02-26 15:13:06 +01:00
Peter Zijlstra
dfebe7362f x86/ibt: Optimize the fineibt-bhi arity 1 case
Saves a CALL to an out-of-line thunk for the common case of 1
argument.

Suggested-by: Scott Constable <scott.d.constable@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250224124200.927885784@infradead.org
2025-02-26 13:49:11 +01:00
Peter Zijlstra
0c92385dc0 x86/ibt: Implement FineIBT-BHI mitigation
While WAIT_FOR_ENDBR is specified to be a full speculation stop, it
has been shown that some implementations are 'leaky' to such an extent
that speculation can escape even the FineIBT preamble.

To deal with this, add additional hardening to the FineIBT preamble.

Notably, using a new LLVM feature:

  e223485c9b

which encodes the number of arguments in the kCFI preamble's register.

Using this register<->arity mapping, have the FineIBT preamble CALL
into a stub clobbering the relevant argument registers in the
speculative case.

Scott sayeth thusly:

Microarchitectural attacks such as Branch History Injection (BHI) and
Intra-mode Branch Target Injection (IMBTI) [1] can cause an indirect
call to mispredict to an adversary-influenced target within the same
hardware domain (e.g., within the kernel). Instructions at the
mispredicted target may execute speculatively and potentially expose
kernel data (e.g., to a user-mode adversary) through a
microarchitectural covert channel such as CPU cache state.

CET-IBT [2] is a coarse-grained control-flow integrity (CFI) ISA
extension that enforces that each indirect call (or indirect jump)
must land on an ENDBR (end branch) instruction, even speculatively*.
FineIBT is a software technique that refines CET-IBT by associating
each function type with a 32-bit hash and enforcing (at the callee)
that the hash of the caller's function pointer type matches the hash
of the callee's function type. However, recent research [3] has
demonstrated that the conditional branch that enforces FineIBT's hash
check can be coerced to mispredict, potentially allowing an adversary
to speculatively bypass the hash check:

__cfi_foo:
  ENDBR64
  SUB R10d, 0x01234567
  JZ foo    # Even if the hash check fails and ZF=0, this branch could still mispredict as taken
  UD2
foo:
  ...

The techniques demonstrated in [3] require the attacker to be able to
control the contents of at least one live register at the mispredicted
target. Therefore, this patch set introduces a sequence of CMOV
instructions at each indirect-callable target that poisons every live
register with data that the attacker cannot control whenever the
FineIBT hash check fails, thus mitigating any potential attack.
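
For the arity-1 case the stub is conceptually (a simplified
illustration, not the exact kernel sequence):

__bhi_args_1:         # hypothetical stub name
  CMOVNE RDI, R10     # hash mismatch (ZF=0): poison the 1st argument register
  RET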

The security provided by this scheme has been discussed in detail on
an earlier thread [4].

 [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
 [2] Intel Software Developer's Manual, Volume 1, Chapter 18
 [3] https://www.vusec.net/projects/native-bhi/
 [4] https://lore.kernel.org/lkml/20240927194925.707462984@infradead.org/
 *There are some caveats for certain processors, see [1] for more info

Suggested-by: Scott Constable <scott.d.constable@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250224124200.820402212@infradead.org
2025-02-26 13:49:11 +01:00
Ingo Molnar
ac3144f91b Merge tag 'v6.14-rc4' into x86/fpu, to pick up fixes and refresh the branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-26 13:05:16 +01:00
Peter Zijlstra
97e59672a9 x86/ibt: Add paranoid FineIBT mode
Due to concerns about circumvention attacks against FineIBT on 'naked'
ENDBR, add an additional caller-side hash check to FineIBT. This
should make it impossible to pivot over such a 'naked' ENDBR
instruction, at the cost of an additional load.

The specific pivot reported was against the SYSCALL entry site and
FRED will have all those holes fixed up.

  https://lore.kernel.org/linux-hardening/Z60NwR4w%2F28Z7XUa@ubun/

This specific fineibt_paranoid_start[] sequence was concocted by
Scott.

Suggested-by: Scott Constable <scott.d.constable@intel.com>
Reported-by: Jennifer Miller <jmill@asu.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250224124200.598033084@infradead.org
2025-02-26 12:27:45 +01:00
Peter Zijlstra
029f718fed x86/traps: Decode LOCK Jcc.d8 as #UD
Because overlapping code sequences are all the rage.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250224124200.486463917@infradead.org
2025-02-26 12:24:17 +01:00
Peter Zijlstra
06926c6cdb x86/ibt: Optimize the FineIBT instruction sequence
Scott notes that non-taken branches are faster. Abuse overlapping code
that traps instead of explicit UD2 instructions.

Also, LEA does not modify flags and has fewer dependencies.

Suggested-by: Scott Constable <scott.d.constable@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250224124200.371942555@infradead.org
2025-02-26 12:24:09 +01:00
Peter Zijlstra
e33d805a10 x86/traps: Allow custom fixups in handle_bug()
The normal fixup in handle_bug() is simply continuing at the next
instruction. However, upcoming patches make this the wrong thing to
do, so allow handlers (specifically handle_cfi_failure()) to override
regs->ip.

The call chain is such that the fixup needs to be done before it is
determined whether the exception is fatal; as such, revert any changes
in that case.

Additionally, have handle_cfi_failure() remember the regs->ip value it
starts with for reporting.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250224124200.275223080@infradead.org
2025-02-26 12:22:39 +01:00
Peter Zijlstra
2e044911be x86/traps: Decode 0xEA instructions as #UD
FineIBT will start using 0xEA as #UD. Normally 0xEA is a 'bad',
invalid instruction for the CPU in 64-bit mode.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250224124200.166774696@infradead.org
2025-02-26 12:22:10 +01:00
Nikolay Borisov
6447828875 x86/mce/inject: Remove call to mce_notify_irq()
The call to mce_notify_irq() has been there since the initial version of
the soft inject mce machinery, introduced in

  ea149b36c7 ("x86, mce: add basic error injection infrastructure").

At that time it was functional since injecting an MCE resulted in the
following call chain:

  raise_mce()
    ->machine_check_poll()
        ->mce_log() - sets the notify_user bit
  ->mce_notify_user() (current mce_notify_irq) - consumed the bit and
    called the usermode helper

However, with the introduction of

  011d826111 ("RAS: Add a Corrected Errors Collector")

the code got moved around and the usermode helper began to be called via
the early notifier mce_first_notifier(), rendering the call in
raise_local() defunct, as the mce_need_notify bit (ex notify_user) is
only set from the early notifier.

Remove the noop call and make mce_notify_irq() static.

No functional changes.

Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20250225143348.268469-1-nik.borisov@suse.com
2025-02-26 12:18:37 +01:00
Ingo Molnar
5d703825fd x86/alternatives: Clean up preprocessor conditional block comments
When in the middle of a kernel source code file a kernel developer
sees a lone #else or #endif:

   ...

   #else

   ...

It's not obvious at a glance what those preprocessor blocks are
conditional on when the starting #ifdef is outside the visible range.

So apply the standard pattern we use in such cases elsewhere in
the kernel for large preprocessor blocks:

  #ifdef CONFIG_XXX
  ...
  ...
  ...
  #endif /* CONFIG_XXX */

  ...

  #ifdef CONFIG_XXX
  ...
  ...
  ...
  #else /* !CONFIG_XXX: */
  ...
  ...
  ...
  #endif /* !CONFIG_XXX */

( Note that in the #else case we use the /* !CONFIG_XXX */ marker
  in the final #endif, not /* CONFIG_XXX */, which serves as an easy
  visual marker to differentiate #else or #elif related #endif closures
  from singular #ifdef/#endif blocks. )

Also clean up __CFI_DEFAULT definition with a bit more vertical alignment
applied, and a pointless tab converted to the standard space we use in
such definitions.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
2025-02-26 12:15:15 +01:00
Peter Zijlstra
500a41acb0 x86/ibt: Add exact_endbr() helper
For when we want to exactly match ENDBR, and not everything that we
can scribble it with.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250224124200.059556588@infradead.org
2025-02-26 12:11:18 +01:00
Peter Zijlstra
9a54fb3134 x86/cfi: Add 'cfi=warn' boot option
Rebuilding with CONFIG_CFI_PERMISSIVE=y enabled is such a pain, esp. since
clang is so slow.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250224124159.924496481@infradead.org
2025-02-26 12:10:48 +01:00
Arnd Bergmann
9de7695925 x86/irq: Define trace events conditionally
When both of X86_LOCAL_APIC and X86_THERMAL_VECTOR are disabled,
the irq tracing produces a W=1 build warning for the tracing
definitions:

  In file included from include/trace/trace_events.h:27,
                   from include/trace/define_trace.h:113,
                   from arch/x86/include/asm/trace/irq_vectors.h:383,
                   from arch/x86/kernel/irq.c:29:
  include/trace/stages/init.h:2:23: error: 'str__irq_vectors__trace_system_name' defined but not used [-Werror=unused-const-variable=]

Make the tracepoints conditional on the same symbols that guard
their usage.
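
i.e., roughly (a sketch of the guard, not the literal patch, which may
gate individual tracepoints on the specific symbol each one needs):

  #if defined(CONFIG_X86_LOCAL_APIC) || defined(CONFIG_X86_THERMAL_VECTOR)
  ...
  /* irq_vectors tracepoint definitions */
  ...
  #endif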

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250225213236.3141752-1-arnd@kernel.org
2025-02-25 22:44:35 +01:00
Russell Senior
bebe35bb73 x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems
I still have some Soekris net4826 in a Community Wireless Network I
volunteer with. These devices use an AMD SC1100 SoC. I am running
OpenWrt on them, which uses a patched kernel, that naturally has
evolved over time.  I haven't updated the ones in the field in a
number of years (circa 2017), but have one in a test bed, where I have
intermittently tried out test builds.

A few years ago, I noticed some trouble, particularly when "warm
booting", that is, doing a reboot without removing power, and noticed
the device was hanging after the kernel message:

  [    0.081615] Working around Cyrix MediaGX virtual DMA bugs.

If I removed power and then restarted, it would boot fine, continuing
through the message above, thusly:

  [    0.081615] Working around Cyrix MediaGX virtual DMA bugs.
  [    0.090076] Enable Memory-Write-back mode on Cyrix/NSC processor.
  [    0.100000] Enable Memory access reorder on Cyrix/NSC processor.
  [    0.100070] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
  [    0.110058] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
  [    0.120037] CPU: NSC Geode(TM) Integrated Processor by National Semi (family: 0x5, model: 0x9, stepping: 0x1)
  [...]

In order to continue using modern tools, like ssh, to interact with
the software on these old devices, I need modern builds of the OpenWrt
firmware on the devices. I confirmed that the warm boot hang was still
an issue in modern OpenWrt builds (currently using a patched linux
v6.6.65).

Last night, I decided it was time to get to the bottom of the warm
boot hang, and began bisecting. From preserved builds, I narrowed down
the bisection window from late February to late May 2019. During this
period, the OpenWrt builds were using 4.14.x. I was able to build
using period-correct Ubuntu 18.04.6. After a number of bisection
iterations, I identified a kernel bump from 4.14.112 to 4.14.113 as
the commit that introduced the warm boot hang.

  07aaa7e3d6

Looking at the upstream changes in the stable kernel between 4.14.112
and 4.14.113 (tig v4.14.112..v4.14.113), I spotted a likely suspect:

  https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=20afb90f730982882e65b01fb8bdfe83914339c5

So, I tried reverting just that kernel change on top of the breaking
OpenWrt commit, and my warm boot hang went away.

Presumably, the warm boot hang is due to some register not getting
cleared on a warm boot the way a loss of power clears it. That is
approximately as much as I understand about the problem.

After more poking/prodding and coaching from Jonas Gorski, it looks
like this patch fixes the problem on my board. Tested against
v6.6.67 and v4.14.113.

Fixes: 18fb053f9b ("x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors")
Debugged-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/CAHP3WfOgs3Ms4Z+L9i0-iBOE21sdMk5erAiJurPjnrL9LSsgRA@mail.gmail.com
Cc: Matthew Whitehead <tedheadster@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
2025-02-25 22:44:01 +01:00
Dmytro Maluka
96f41f644c x86/of: Don't use DTB for SMP setup if ACPI is enabled
There are cases when it is useful to use both ACPI and DTB provided by
the bootloader, however in such cases we should make sure to prevent
conflicts between the two. Namely, don't try to use DTB for SMP setup
if ACPI is enabled.

Specifically, this prevents at least:

- incorrectly calling register_lapic_address(APIC_DEFAULT_PHYS_BASE)
  after the LAPIC was already successfully enumerated via ACPI, causing
  noisy kernel warnings and probably potential real issues as well

- failed IOAPIC setup in the case when IOAPIC is enumerated via mptable
  instead of ACPI (e.g. with acpi=noirq), due to
  mpparse_parse_smp_config() overridden by x86_dtb_parse_smp_config()

Signed-off-by: Dmytro Maluka <dmaluka@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250105172741.3476758-2-dmaluka@chromium.org
2025-02-25 22:13:02 +01:00
Thorsten Blum
8e8f030649 x86/mtrr: Remove unnecessary strlen() in mtrr_write()
The local variable length already holds the string length after calling
strncpy_from_user(). Using another local variable linlen and calling
strlen() is therefore unnecessary and can be removed. Remove linlen
and strlen() and use length instead.

No change in functionality intended.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250225131621.329699-2-thorsten.blum@linux.dev
2025-02-25 20:50:55 +01:00
Waiman Long
fe37c699ae x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()
Depending on the type of panic, it was found that the
__register_nmi_handler() function can be called in NMI context from
nmi_shootdown_cpus(), leading to a lockdep splat:

  WARNING: inconsistent lock state
  inconsistent {INITIAL USE} -> {IN-NMI} usage.

   lock(&nmi_desc[0].lock);
   <Interrupt>
     lock(&nmi_desc[0].lock);

  Call Trace:
    _raw_spin_lock_irqsave
    __register_nmi_handler
    nmi_shootdown_cpus
    kdump_nmi_shootdown_cpus
    native_machine_crash_shutdown
    __crash_kexec

In this particular case, the following panic message was printed before:

  Kernel panic - not syncing: Fatal hardware error!

This message seemed to be given out from __ghes_panic() running in
NMI context.

The __register_nmi_handler() function, which takes the nmi_desc lock
with irqs disabled, shouldn't be called from NMI context, as this can
lead to deadlock.

The nmi_shootdown_cpus() function can only be invoked once. After the
first invocation, all other CPUs should be stuck in the newly added
crash_nmi_callback() and cannot respond to a second NMI.

Fix it by adding a new emergency NMI handler to the nmi_desc
structure, and provide a new set_emergency_nmi_handler() helper to set
crash_nmi_callback() in any context. The new emergency handler
preempts the other handlers in the linked list, which eliminates the
need to take any lock and serves the panic-in-NMI use case.
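
With the new helper, the shootdown path reduces to something like this
(a sketch matching the description above):

  /* NMI-safe: no locking, just install the emergency handler. */
  set_emergency_nmi_handler(NMI_LOCAL, crash_nmi_callback);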

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20250206191844.131700-1-longman@redhat.com
2025-02-25 14:38:43 +01:00
Uros Bizjak
dc8bd769e7 x86/ioperm: Use atomic64_inc_return() in ksys_ioperm()
Use atomic64_inc_return(&ref) instead of atomic64_add_return(1, &ref)
to use the optimized implementation on targets that define
atomic64_inc_return(), and to remove the now-unneeded initialization
of the %eax/%edx register pair before the call.

On x86_32 the code improves from:

 1b0:    b9 00 00 00 00           mov    $0x0,%ecx
            1b1: R_386_32    .bss
 1b5:    89 43 0c                 mov    %eax,0xc(%ebx)
 1b8:    31 d2                    xor    %edx,%edx
 1ba:    b8 01 00 00 00           mov    $0x1,%eax
 1bf:    e8 fc ff ff ff           call   1c0 <ksys_ioperm+0xa8>
            1c0: R_386_PC32    atomic64_add_return_cx8
 1c4:    89 03                    mov    %eax,(%ebx)
 1c6:    89 53 04                 mov    %edx,0x4(%ebx)

to:

 1b0:    be 00 00 00 00           mov    $0x0,%esi
            1b1: R_386_32    .bss
 1b5:    89 43 0c                 mov    %eax,0xc(%ebx)
 1b8:    e8 fc ff ff ff           call   1b9 <ksys_ioperm+0xa1>
            1b9: R_386_PC32    atomic64_inc_return_cx8
 1bd:    89 03                    mov    %eax,(%ebx)
 1bf:    89 53 04                 mov    %edx,0x4(%ebx)

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250223161355.3607-1-ubizjak@gmail.com
2025-02-23 19:18:18 +01:00
Linus Torvalds
5cf80612d3 Merge tag 'x86-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:

 - Fix AVX-VNNI CPU feature dependency bug triggered via the 'noxsave'
   boot option

 - Fix typos in the SVA documentation

 - Add Tony Luck as RDT co-maintainer and remove Fenghua Yu

* tag 'x86-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  docs: arch/x86/sva: Fix two grammar errors under Background and FAQ
  x86/cpufeatures: Make AVX-VNNI depend on AVX
  MAINTAINERS: Change maintainer for RDT
2025-02-22 10:45:02 -08:00
Dave Young
a2498e5c45 x86/kexec: Export e820_table_kexec[] to sysfs
Previously the e820_table_kexec[] was exported to sysfs since kexec-tools uses
the memmap entries to prepare the e820 table for the new kernel.

The following commit, ~8 years ago, introduced e820_table_firmware[] and changed
the behavior to export the firmware table instead:

  12df216c61 ("x86/boot/e820: Introduce the bootloader provided e820_table_firmware[] table")

Originally the kexec_file_load and kexec_load syscalls both used e820_table_kexec[].
Since the sysfs exported entries are from e820_table_firmware[] people
now need to tune both tables for kexec.

Restore the old behavior so that the kexec_load and kexec_file_load
syscalls work with only one table update. The e820_table_firmware[] is
used by the hibernation kernel code and works without the sysfs export.
Also remove the SEV e820_table_firmware[] updating code.

Also update the code comments here and drop the comments about setup_data
reservation since it is not needed any more after this change was made
a year ago:

  fc7f27cda8 ("x86/kexec: Do not update E820 kexec table for setup_data")

[ mingo: Tidy up the changelog and comments. ]

Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Link: https://lore.kernel.org/r/Z5jcb1GKhLvH8kDc@darkstar.users.ipa.redhat.com
2025-02-22 12:44:45 +01:00
Uros Bizjak
5bebe2e4fe x86/boot: Change some static bootflag functions to bool
The return values of some functions are of boolean type. Change the
type of these functions to bool and adjust their return values.

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250129154920.6773-1-ubizjak@gmail.com
2025-02-22 12:30:59 +01:00
Borislav Petkov (AMD)
50cef76d5c x86/microcode/AMD: Load only SHA256-checksummed patches
Load patches for which the driver carries a SHA256 checksum of the patch
blob.

This can be disabled by adding "microcode.amd_sha_check=off" on the
kernel cmdline. But it is highly NOT recommended.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-02-22 11:46:05 +01:00
Nuno Das Neves
db912b8954 hyperv: Change hv_root_partition into a function
Introduce hv_curr_partition_type to store the partition type
as an enum.

Right now this is limited to guest or root partition, but there will
be other kinds in the future, and the enum is easily extensible.

Set up hv_curr_partition_type early in Hyper-V initialization with
hv_identify_partition_type(). hv_root_partition() just queries this
value, and shouldn't be called before that.

Making this check into a function sets the stage for adding a config
option to gate the compilation of root partition code. In particular,
hv_root_partition() can be stubbed out to always return false if root
partition support isn't desired.

Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/1740167795-13296-3-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1740167795-13296-3-git-send-email-nunodasneves@linux.microsoft.com>
2025-02-22 02:21:45 +00:00
Brian Gerst
684b12916a x86/arch_prctl/64: Clean up ARCH_MAP_VDSO_32
process_64.c is not built on native 32-bit, so CONFIG_X86_32 will never
be set.

No change in functionality intended.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20250202202323.422113-3-brgerst@gmail.com
2025-02-21 22:32:25 +01:00
Brian Gerst
2df1ad0d25 x86/arch_prctl: Simplify sys_arch_prctl()
Use in_ia32_syscall() instead of a compat syscall entry.

No change in functionality intended.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/20250202202323.422113-2-brgerst@gmail.com
2025-02-21 22:32:25 +01:00
Thorsten Blum
0156338a18 x86/apic: Use str_disabled_enabled() helper in print_ipi_mode()
Remove hard-coded strings by using the str_disabled_enabled() helper.
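
For example (a sketch; the actual call site in print_ipi_mode() may
differ):

  /* Before: */
  pr_info("IPI shorthand broadcast: %s\n",
          apic_ipi_shorthand_off ? "disabled" : "enabled");

  /* After: */
  pr_info("IPI shorthand broadcast: %s\n",
          str_disabled_enabled(apic_ipi_shorthand_off));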

No change in functionality intended.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250209210333.5666-2-thorsten.blum@linux.dev
2025-02-21 16:54:22 +01:00
Rik van Riel
f2c5c21058 x86/mm: Remove pv_ops.mmu.tlb_remove_table call
Every pv_ops.mmu.tlb_remove_table call ends up calling tlb_remove_table.

Get rid of the indirection by simply calling tlb_remove_table directly,
and not going through the paravirt function pointers.
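
i.e., at each call site (a sketch of the kind of change):

  /* Before: indirect call through the paravirt ops table. */
  pv_ops.mmu.tlb_remove_table(tlb, table);

  /* After: plain direct call. */
  tlb_remove_table(tlb, table);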

Suggested-by: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Manali Shukla <Manali.Shukla@amd.com>
Tested-by: Brendan Jackman <jackmanb@google.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20250213161423.449435-3-riel@surriel.com
2025-02-21 16:20:12 +01:00
Rik van Riel
a37259732a x86/mm: Make MMU_GATHER_RCU_TABLE_FREE unconditional
Currently x86 uses CONFIG_MMU_GATHER_RCU_TABLE_FREE when using
paravirt, and not when running on bare metal.

There is no real good reason to do things differently for
each setup. Make them all the same.

Currently get_user_pages_fast synchronizes against page table
freeing in two different ways:

 - on bare metal, by blocking IRQs, which block TLB flush IPIs
 - on paravirt, with MMU_GATHER_RCU_TABLE_FREE

This is done because some paravirt TLB flush implementations
handle the TLB flush in the hypervisor, and will do the flush
even when the target CPU has interrupts disabled.

Always handle page table freeing with MMU_GATHER_RCU_TABLE_FREE.
Using RCU synchronization between page table freeing and get_user_pages_fast()
allows bare metal to also do TLB flushing while interrupts are disabled.

Various places in the mm do still block IRQs or disable preemption
as an implicit way to block RCU frees.

That makes it safe to use INVLPGB on AMD CPUs.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Manali Shukla <Manali.Shukla@amd.com>
Tested-by: Brendan Jackman <jackmanb@google.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20250213161423.449435-2-riel@surriel.com
2025-02-21 16:20:11 +01:00
Mike Rapoport (Microsoft)
efe659ac01 x86/e820: Drop obsolete E820_TYPE_RESERVED_KERN and related code
E820_TYPE_RESERVED_KERN is a relic of ancient history that was used
to reserve setup_data early, see:

  28bb223795 ("x86: move reserve_setup_data to setup.c")

Nowadays setup_data is anyway reserved in memblock and there is no point in
carrying E820_TYPE_RESERVED_KERN that behaves exactly like E820_TYPE_RAM
but only complicates the code.

A bonus of removing E820_TYPE_RESERVED_KERN is a small but measurable
speedup of 20 microseconds in init_mem_mapping() on a VM with 32GB of RAM.

Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20250214090651.3331663-5-rppt@kernel.org
2025-02-21 16:05:00 +01:00
Mike Rapoport (Microsoft)
d45dd0a9b2 x86/boot: Split parsing of boot_params into the parse_boot_params() helper function
Makes setup_arch() a bit easier to comprehend.

No functional changes.

Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250214090651.3331663-4-rppt@kernel.org
2025-02-21 16:05:00 +01:00
Mike Rapoport (Microsoft)
297fb82eba x86/boot: Split kernel resources setup into the setup_kernel_resources() helper function
Makes setup_arch() a bit easier to comprehend.

No functional changes.

Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250214090651.3331663-3-rppt@kernel.org
2025-02-21 16:05:00 +01:00
Mike Rapoport (Microsoft)
2d6bff3139 x86/boot: Move setting of memblock parameters to e820__memblock_setup()
Changing memblock parameters, namely bottom_up and the allocation
upper limit, does not have any effect before memblock initialization
in e820__memblock_setup().

Move the calls to memblock_set_bottom_up() and memblock_set_current_limit()
to e820__memblock_setup() to group all the memblock initial setup and make
setup_arch() more readable.

No functional changes.

Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250214090651.3331663-2-rppt@kernel.org
2025-02-21 16:05:00 +01:00
Guilherme G. Piccoli
d90c9de9de x86/tsc: Always save/restore TSC sched_clock() on suspend/resume
TSC could be reset in deep ACPI sleep states, even with invariant TSC.

That's the reason we have the sched_clock() save/restore functions: to
deal with this situation. But these functions are guarded by a check
for the stability of sched_clock - if it is not considered stable, the
save/restore routines aren't executed.

On top of that, we have a clear comment in native_sched_clock() saying
that *even* with TSC unstable, we continue using TSC for sched_clock due
to its speed.

In other words, if TSC is detected as unstable, sched_clock is marked
unstable as well, so subsequent S3 sleep cycles can produce bogus
sched_clock values due to the missing save/restore mechanism, causing
warnings like this:

  [22.954918] ------------[ cut here ]------------
  [22.954923] Delta way too big! 18446743750843854390 ts=18446744072977390405 before=322133536015 after=322133536015 write stamp=18446744072977390405
  [22.954923] If you just came from a suspend/resume,
  [22.954923] please switch to the trace global clock:
  [22.954923]   echo global > /sys/kernel/tracing/trace_clock
  [22.954923] or add trace_clock=global to the kernel command line
  [22.954937] WARNING: CPU: 2 PID: 5728 at kernel/trace/ring_buffer.c:2890 rb_add_timestamp+0x193/0x1c0

Notice that the above was reproduced even with "trace_clock=global".

The fix is to _always_ save/restore sched_clock across a suspend cycle
_if TSC is used_ as sched_clock - only when we fall back to jiffies
does the sched_clock_stable() check become relevant for the
save/restore.
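
A sketch of the adjusted guard (assuming the __use_tsc static key in
the x86 TSC code; the restore side is analogous):

  void tsc_save_sched_clock_state(void)
  {
          /* Bail out only when TSC is not the sched_clock source
           * AND sched_clock is not considered stable. */
          if (!static_branch_likely(&__use_tsc) && !sched_clock_stable())
                  return;
          ...
  }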

Debugged-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250215210314.351480-1-gpiccoli@igalia.com
2025-02-21 15:27:38 +01:00
Artem Bityutskiy
64aad4749d ACPI/processor_idle: Export acpi_processor_ffh_play_dead()
The kernel test robot reported the following build error:

  >> ERROR: modpost: "acpi_processor_ffh_play_dead" [drivers/acpi/processor.ko] undefined!

Caused by this recently merged commit:

  541ddf31e3 ("ACPI/processor_idle: Add FFH state handling")

The build failure is due to an oversight in the 'CONFIG_ACPI_PROCESSOR=m'
case: the function export is missing. Add it.
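
The fix amounts to adding the missing export next to the function
definition, along the lines of:

  EXPORT_SYMBOL_GPL(acpi_processor_ffh_play_dead);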

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202502151207.FA9UO1iX-lkp@intel.com/
Fixes: 541ddf31e3 ("ACPI/processor_idle: Add FFH state handling")
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/de5bf4f116779efde315782a15146fdc77a4a044.camel@linux.intel.com
2025-02-21 15:17:07 +01:00
Ingo Molnar
affe678f35 Merge tag 'v6.14-rc3' into x86/mm, to pick up fixes before merging new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-21 15:02:56 +01:00
Stanislav Spassov
1937e18cc3 x86/fpu: Fix guest FPU state buffer allocation size
Ongoing work on an optimization to batch-preallocate vCPU state buffers
for KVM revealed a mismatch between the allocation sizes used in
fpu_alloc_guest_fpstate() and fpstate_realloc(). While the former
allocates a buffer sized to fit the default set of XSAVE features
in UABI form (as per fpu_user_cfg), the latter uses its ksize argument,
which is derived (for the requested set of features) in the same way as
the sizes found in fpu_kernel_cfg, i.e. using the compacted in-kernel
representation.

The correct size to use for guest FPU state should indeed be the
kernel one as seen in fpstate_realloc(). The original issue likely
went unnoticed through a combination of UABI size typically being
larger than or equal to kernel size, and/or both amounting to the
same number of allocated 4K pages.

Fixes: 69f6ed1d14 ("x86/fpu: Provide infrastructure for KVM FPU cleanup")
Signed-off-by: Stanislav Spassov <stanspas@amazon.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250218141045.85201-1-stanspas@amazon.de
2025-02-21 14:47:26 +01:00
Dan Carpenter
06dd759b68 x86/module: Remove unnecessary check in module_finalize()
The "calls" pointer can no longer be NULL after the following
commit:

   ab9fea5948 ("x86/alternative: Simplify callthunk patching")

Delete this unnecessary check.

Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/fcbb2f57-0714-4139-b441-8817365c16a1@stanley.mountain
2025-02-21 14:27:42 +01:00
Eric Biggers
5171207284 x86/cpufeatures: Make AVX-VNNI depend on AVX
The 'noxsave' boot option disables support for AVX, but support for the
AVX-VNNI feature was still declared on CPUs that support it.  Fix this.
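
A sketch of the corresponding dependency-table entry (the table lives
in arch/x86/kernel/cpu/cpuid-deps.c; the exact flag spelling may
differ):

  /* Clearing AVX (e.g. via 'noxsave') now also clears AVX-VNNI. */
  { X86_FEATURE_AVX_VNNI, X86_FEATURE_AVX },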

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20250220060124.89622-1-ebiggers@kernel.org
2025-02-21 14:19:16 +01:00
Thomas Gleixner
751dc837da genirq: Introduce common irq_force_complete_move() implementation
CONFIG_GENERIC_PENDING_IRQ requires an architecture specific implementation
of irq_force_complete_move() for CPU hotplug. At the moment, only x86
implements this unconditionally, but for RISC-V irq_force_complete_move()
is only needed when the RISC-V IMSIC driver is in use and not needed
otherwise.

To allow runtime configuration of this mechanism, introduce a common
irq_force_complete_move() implementation in the interrupt core code,
which only invokes the completion function when an interrupt chip in
the hierarchy implements it.

Switch X86 over to the new mechanism. No functional change intended.
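
Conceptually, the common implementation walks the interrupt hierarchy
(a simplified sketch, not the exact core code):

  void irq_force_complete_move(struct irq_desc *desc)
  {
          struct irq_data *d;

          /* Invoke the first chip in the hierarchy that implements
           * the new irq_force_complete_move() chip callback. */
          for (d = irq_desc_get_irq_data(desc); d; d = d->parent_data) {
                  if (d->chip && d->chip->irq_force_complete_move) {
                          d->chip->irq_force_complete_move(d);
                          return;
                  }
          }
  }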

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250217085657.789309-5-apatel@ventanamicro.com
2025-02-20 15:19:26 +01:00
Mario Limonciello
c9e6f7fb1c x86/ACPI: CPPC: Add missing include
Some minimal kernel configurations will fail with
-Werror=implicit-function-declaration due to a missing header include.

Add that header.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/20250211203314.762755-1-superm1@kernel.org
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-02-18 19:15:08 +01:00
Joel Granados
c305a4e983 x86: Move sysctls into arch/x86
Move the following sysctl tables into arch/x86/kernel/setup.c:

  panic_on_{unrecovered_nmi,io_nmi}
  bootloader_{type,version}
  io_delay_type
  unknown_nmi_panic
  acpi_realmode_flags

Variables moved from include/linux/ to arch/x86/include/asm/ because there
is no longer need for them outside arch/x86/kernel:

  acpi_realmode_flags
  panic_on_{unrecovered_nmi,io_nmi}

Include <asm/nmi.h> in arch/x86/kernel/setup.h in order to bring in
panic_on_{io_nmi,unrecovered_nmi}.

This is part of a greater effort to move ctl tables into their
respective subsystems, which will reduce the merge conflicts in
kernel/sysctl.c.

Signed-off-by: Joel Granados <joel.granados@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250218-jag-mv_ctltables-v1-8-cd3698ab8d29@kernel.org
2025-02-18 11:08:36 +01:00
Ingo Molnar
e8f925c320 Merge tag 'v6.14-rc3' into x86/core, to pick up fixes
Pick up upstream x86 fixes before applying new patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-02-18 11:07:15 +01:00
Brian Gerst
38a4968b31 x86/percpu/64: Remove INIT_PER_CPU macros
Now that the load and link addresses of percpu variables are the same,
these macros are no longer necessary.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250123190747.745588-12-brgerst@gmail.com
2025-02-18 10:15:50 +01:00
Brian Gerst
b5c4f95351 x86/percpu/64: Remove fixed_percpu_data
Now that the stack protector canary value is a normal percpu variable,
fixed_percpu_data is unused and can be removed.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250123190747.745588-10-brgerst@gmail.com
2025-02-18 10:15:43 +01:00
Brian Gerst
9d7de2aa8b x86/percpu/64: Use relative percpu offsets
The percpu section is currently linked at absolute address 0, because
older compilers hard-coded the stack protector canary value at a fixed
offset from the start of the GS segment.  Now that the canary is a
normal percpu variable, the percpu section does not need to be linked
at a specific address.

x86-64 will now calculate the percpu offsets as the delta between the
initial percpu address and the dynamically allocated memory, like other
architectures.  Note that GSBASE is limited to the canonical address
width (48 or 57 bits, sign-extended).  As long as the kernel text,
modules, and the dynamically allocated percpu memory are all in the
negative address space, the delta will not overflow this limit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250123190747.745588-9-brgerst@gmail.com
2025-02-18 10:15:27 +01:00