linux

mirror of https://github.com/raspberrypi/linux.git synced 2025-12-06 01:49:46 +00:00

Author	SHA1	Message	Date
Maximilian Dittgen	a24f7afce0	KVM: selftests: fix MAPC RDbase target formatting in vgic_lpi_stress Since GITS_TYPER.PTA == 0, the ITS MAPC command demands a CPU ID, rather than a physical redistributor address, for its RDbase command argument. As such, when MAPC-ing guest ITS collections, vgic_lpi_stress iterates over CPU IDs in the range [0, nr_cpus), passing them as the RDbase vcpu_id argument to its_send_mapc_cmd(). However, its_encode_target() in the its_send_mapc_cmd() selftest handler expects RDbase arguments to be formatted with a 16 bit offset, as shown by the 16-bit target_addr right shift its implementation: its_mask_encode(&cmd->raw_cmd[2], target_addr >> 16, 51, 16) At the moment, all CPU IDs passed into its_send_mapc_cmd() have no offset, therefore becoming 0x0 after the bit shift. Thus, when vgic_its_cmd_handle_mapc() receives the ITS command in vgic-its.c, it always interprets the RDbase target CPU as CPU 0. All interrupts sent to collections will be processed by vCPU 0, which defeats the purpose of this multi-vCPU test. Fix by creating procnum_to_rdbase() helper function, which left-shifts the vCPU parameter received by its_send_mapc_cmd 16 bits before passing it to its_encode_target for encoding. Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de> Link: https://patch.msgid.link/20251020145946.48288-1-mdittgen@amazon.de Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-30 16:12:30 +00:00
Sascha Bischoff	da888524c3	KVM: arm64: vgic-v3: Trap all if no in-kernel irqchip If there is no in-kernel irqchip for a GICv3 host set all of the trap bits to block all accesses. This fixes the no-vgic-v3 selftest again. Fixes: `3193287ddf` ("KVM: arm64: gic-v3: Only set ICH_HCR traps for v2-on-v3 or v3 guests") Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/all/23072856-6b8c-41e2-93d1-ea8a240a7079@sirena.org.uk Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Sebastian Ott <sebott@redhat.com> Tested-by: Mark Brown <broonie@kernel.org> Link: https://patch.msgid.link/20251021094358.1963807-1-sascha.bischoff@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-30 16:11:21 +00:00
Marc Zyngier	ca88ecdce5	arm64: Revamp HCR_EL2.E2H RES1 detection We currently have two ways to identify CPUs that only implement FEAT_VHE and not FEAT_E2H0: - either they advertise it via ID_AA64MMFR4_EL1.E2H0, - or the HCR_EL2.E2H bit is RAO/WI However, there is a third category of "cpus" that fall between these two cases: on CPUs that do not implement FEAT_FGT, it is IMPDEF whether an access to ID_AA64MMFR4_EL1 can trap to EL2 when the register value is zero. A consequence of this is that on systems such as Neoverse V2, a NV guest cannot reliably detect that it is in a VHE-only configuration (E2H is writable, and ID_AA64MMFR0_EL1 is 0), despite the hypervisor's best effort to repaint the id register. Replace the RAO/WI test by a sequence that makes use of the VHE register remnapping between EL1 and EL2 to detect this situation, and work out whether we get the VHE behaviour even after having set HCR_EL2.E2H to 0. This solves the NV problem, and provides a more reliable acid test for CPUs that do not completely follow the letter of the architecture while providing a RES1 behaviour for HCR_EL2.E2H. Suggested-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Jan Kotas <jank@cadence.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/15A85F2B-1A0C-4FA7-9FE4-EEC2203CC09E@global.cadence.com	2025-10-14 08:18:40 +01:00
Oliver Upton	e0b5a7967d	KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when available Marc reports that the performance of running an L3 guest has regressed by 60% as a result of setting MDCR_EL2.TDA to hide bad architecture. That's of course terrible for the single user of recursive NV ;-) While there's nothing to be done on non-FGT systems, take advantage of the precise write trap of MDSCR_EL1 and leave the rest of the debug registers untrapped. Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:44:37 +01:00
Oliver Upton	fb10ddf35c	KVM: arm64: Compute per-vCPU FGTs at vcpu_load() To date KVM has used the fine-grained traps for the sake of UNDEF enforcement (so-called FGUs), meaning the constituent parts could be computed on a per-VM basis and folded into the effective value when programmed. Prepare for traps changing based on the vCPU context by computing the whole mess of them at vcpu_load(). Aggressively inline all the helpers to preserve the build-time checks that were there before. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:44:37 +01:00
Marc Zyngier	5c7cf1e44e	KVM: arm64: selftests: Fix misleading comment about virtual timer encoding The userspace-visible encoding for CNTV_CVAL_EL0 and CNTVCNT_EL0 have been swapped for as long as usersapce has had access to the registers. This is documented in arch/arm64/include/uapi/asm/kvm.h. Despite that, the get_reg_list test has unhelpful comments indicating the wrong register for the encoding. Replace this with definitions exposed in the include file, and a comment explaining again the brokenness. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:43:12 +01:00
Marc Zyngier	4da5a9af78	KVM: arm64: selftests: Add an E2H=0-specific configuration to get_reg_list Add yet another configuration, this time dealing E2H=0. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:41 +01:00
Marc Zyngier	6418330c84	KVM: arm64: selftests: Make dependencies on VHE-specific registers explicit The hyp virtual timer registers only exist when VHE is present, Similarly, VNCR_EL2 only exists when NV2 is present. Make these dependencies explicit. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:41 +01:00
Marc Zyngier	386aac77da	KVM: arm64: Kill leftovers of ad-hoc timer userspace access Now that the whole timer infrastructure is handled as system register accesses, get rid of the now unused ad-hoc infrastructure. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:41 +01:00
Marc Zyngier	892f7c38ba	KVM: arm64: Fix WFxT handling of nested virt The spec for WFxT indicates that the parameter to the WFxT instruction is relative to the reading of CNTVCT_EL0. This means that the implementation needs to take the execution context into account, as CNTVOFF_EL2 does not always affect readings of CNTVCT_EL0 (such as when HCR_EL2.E2H is 1 and that we're in host context). This also rids us of the last instance of KVM_REG_ARM_TIMER_CNT outside of the userspace interaction code. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:41 +01:00
Marc Zyngier	c3be3a48fb	KVM: arm64: Move CNT*CT_EL0 userspace accessors to generic infrastructure Moving the counter registers is a bit more involved than for the control and comparator (there is no shadow data for the counter), but still pretty manageable. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:41 +01:00
Marc Zyngier	8af198980e	KVM: arm64: Move CNT*_CVAL_EL0 userspace accessors to generic infrastructure As for the control registers, move the comparator registers to the common infrastructure. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:41 +01:00
Marc Zyngier	09424d5d7d	KVM: arm64: Move CNT_CTL_EL0 userspace accessors to generic infrastructure Remove the handling of CNT_CTL_EL0 from guest.c, and move it to sys_regs.c, using a new TIMER_REG() definition to encapsulate it. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:40 +01:00
Marc Zyngier	77a0c42eaf	KVM: arm64: Add timer UAPI workaround to sysreg infrastructure Amongst the numerous bugs that plague the KVM/arm64 UAPI, one of the most annoying thing is that the userspace view of the virtual timer has its CVAL and CNT encodings swapped. In order to reduce the amount of code that has to know about this, start by adding handling for this bug in the sys_reg code. Nothing is making use of it yet, as the code responsible for userspace interaction is catching the accesses early. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:40 +01:00
Marc Zyngier	a92d552266	KVM: arm64: Make timer_set_offset() generally accessible Move the timer_set_offset() helper to arm_arch_timer.h, so that it is next to timer_get_offset(), and accessible by the rest of KVM. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:40 +01:00
Marc Zyngier	8625a670af	KVM: arm64: Replace timer context vcpu pointer with timer_id Having to follow a pointer to a vcpu is pretty dumb, when the timers are are a fixed offset in the vcpu structure itself. Trade the vcpu pointer for a timer_id, which can then be used to compute the vcpu address as needed. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:40 +01:00
Marc Zyngier	aa68975c97	KVM: arm64: Introduce timer_context_to_vcpu() helper We currently have a vcpu pointer nested into each timer context. As we are about to remove this pointer, introduce a helper (aptly named timer_context_to_vcpu()) that returns this pointer, at least until we repaint the data structure. Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:40 +01:00
Marc Zyngier	4cab5c857d	KVM: arm64: Hide CNTHV__EL2 from userspace for nVHE guests Although we correctly UNDEF any CNTHV__EL2 access from the guest when E2H==0, we still expose these registers to userspace, which is a bad idea. Drop the ad-hoc UNDEF injection and switch to a .visibility() callback which will also hide the register from userspace. Fixes: `0e45981028` ("KVM: arm64: timer: Don't adjust the EL2 virtual timer offset") Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:42:40 +01:00
Sascha Bischoff	164ecbf73c	Documentation: KVM: Update GICv3 docs for GICv5 hosts GICv5 hosts optionally include FEAT_GCIE_LEGACY, which allows them to execute GICv3-based VMs on GICv5 hardware. Update the GICv3 documentation to reflect this now that GICv3 guests are supports on compatible GICv5 hosts. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:40:58 +01:00
Sascha Bischoff	3193287ddf	KVM: arm64: gic-v3: Only set ICH_HCR traps for v2-on-v3 or v3 guests The ICH_HCR_EL2 traps are used when running on GICv3 hardware, or when running a GICv3-based guest using FEAT_GCIE_LEGACY on GICv5 hardware. When running a GICv2 guest on GICv3 hardware the traps are used to ensure that the guest never sees any part of GICv3 (only GICv2 is visible to the guest), and when running a GICv3 guest they are used to trap in specific scenarios. They are not applicable for a GICv2-native guest, and won't be applicable for a(n upcoming) GICv5 guest. The traps themselves are configured in the vGIC CPU IF state, which is stored as a union. Updating the wrong aperture of the union risks corrupting state, and therefore needs to be avoided at all costs. Bail early if we're not running a compatible guest (GICv2 on GICv3 hardware, GICv3 native, GICv3 on GICv5 hardware). Trap everything unconditionally if we're running a GICv2 guest on GICv3 hardware. Otherwise, conditionally set up GICv3-native trapping. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:40:33 +01:00
Oliver Upton	d5e6310a0d	KVM: arm64: selftests: Actually enable IRQs in vgic_lpi_stress vgic_lpi_stress rather hilariously leaves IRQs disabled for the duration of the test. While the ITS translation of MSIs happens regardless of this, for completeness the guest should actually handle the LPIs. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Zenghui Yu <zenghui.yu@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:28:27 +01:00
Zenghui Yu	2192d348c0	KVM: arm64: selftests: Allocate vcpus with correct size vcpus array contains pointers to struct kvm_vcpu {}. It is way overkill to allocate the array with (nr_cpus * sizeof(struct kvm_vcpu)). Fix the allocation by using the correct size. Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:27:55 +01:00
Mukesh Ojha	c35dd83866	KVM: arm64: Guard PMSCR_EL1 initialization with SPE presence check Commit `efad60e460` ("KVM: arm64: Initialize PMSCR_EL1 when in VHE") does not perform sufficient check before initializing PMSCR_EL1 to 0 when running in VHE mode. On some platforms, this causes the system to hang during boot, as EL3 has not delegated access to the Profiling Buffer to the Non-secure world, nor does it reinject an UNDEF on sysreg trap. To avoid this issue, restrict the PMSCR_EL1 initialization to CPUs that support Statistical Profiling Extension (FEAT_SPE) and have the Profiling Buffer accessible in Non-secure EL1. This is determined via a new helper `cpu_has_spe()` which checks both PMSVer and PMBIDR_EL1.P. This ensures the initialization only affects CPUs where SPE is implemented and usable, preventing boot failures on platforms where SPE is not properly configured. Fixes: `efad60e460` ("KVM: arm64: Initialize PMSCR_EL1 when in VHE") Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:26:36 +01:00
Zenghui Yu	9a7f87eb58	KVM: arm64: selftests: Sync ID_AA64PFR1, MPIDR, CLIDR in guest We forgot to sync several registers (ID_AA64PFR1, MPIDR, CLIDR) in guest to make sure that the guest had seen the written value. Add them to the list. Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev> Reviewed-By: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:17:03 +01:00
Osama Abdelkader	05a02490fa	KVM: arm64: Remove unreachable break after return Remove an unnecessary 'break' statement that follows a 'return' in arch/arm64/kvm/at.c. The break is unreachable. Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:17:03 +01:00
Oliver Upton	a133052666	KVM: selftests: Fix irqfd_test for non-x86 architectures The KVM_IRQFD ioctl fails if no irqchip is present in-kernel, which isn't too surprising as there's not much KVM can do for an IRQ if it cannot resolve a destination. As written the irqfd_test assumes that a 'default' VM created in selftests has an in-kernel irqchip created implicitly. That may be the case on x86 but it isn't necessarily true on other architectures. Add an arch predicate indicating if 'default' VMs get an irqchip and make the irqfd_test depend on it. Work around arm64 VGIC initialization requirements by using vm_create_with_one_vcpu(), ignoring the created vCPU as it isn't used for the test. Reported-by: Sebastian Ott <sebott@redhat.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Acked-by: Sean Christopherson <seanjc@google.com> Fixes: `7e9b231c40` ("KVM: selftests: Add a KVM_IRQFD test to verify uniqueness requirements") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:17:03 +01:00
Oliver Upton	cc4309324d	KVM: arm64: Document vCPU event ioctls as requiring init'ed vCPU KVM rejects calls to KVM_{GET,SET}_VCPU_EVENTS for an uninitialized vCPU as of commit cc96679f3c03 ("KVM: arm64: Prevent access to vCPU events before init"). Update the corresponding API documentation. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:17:03 +01:00
Oliver Upton	0aa1b76fe1	KVM: arm64: Prevent access to vCPU events before init Another day, another syzkaller bug. KVM erroneously allows userspace to pend vCPU events for a vCPU that hasn't been initialized yet, leading to KVM interpreting a bunch of uninitialized garbage for routing / injecting the exception. In one case the injection code and the hyp disagree on whether the vCPU has a 32bit EL1 and put the vCPU into an illegal mode for AArch64, tripping the BUG() in exception_target_el() during the next injection: kernel BUG at arch/arm64/kvm/inject_fault.c:40! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 3 UID: 0 PID: 318 Comm: repro Not tainted 6.17.0-rc4-00104-g10fd0285305d #6 PREEMPT Hardware name: linux,dummy-virt (DT) pstate: 21402009 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) pc : exception_target_el+0x88/0x8c lr : pend_serror_exception+0x18/0x13c sp : ffff800082f03a10 x29: ffff800082f03a10 x28: ffff0000cb132280 x27: 0000000000000000 x26: 0000000000000000 x25: ffff0000c2a99c20 x24: 0000000000000000 x23: 0000000000008000 x22: 0000000000000002 x21: 0000000000000004 x20: 0000000000008000 x19: ffff0000c2a99c20 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 00000000200000c0 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : ffff800082f03af8 x7 : 0000000000000000 x6 : 0000000000000000 x5 : ffff800080f621f0 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 000000000040009b x1 : 0000000000000003 x0 : ffff0000c2a99c20 Call trace: exception_target_el+0x88/0x8c (P) kvm_inject_serror_esr+0x40/0x3b4 __kvm_arm_vcpu_set_events+0xf0/0x100 kvm_arch_vcpu_ioctl+0x180/0x9d4 kvm_vcpu_ioctl+0x60c/0x9f4 __arm64_sys_ioctl+0xac/0x104 invoke_syscall+0x48/0x110 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x34/0xf0 el0t_64_sync_handler+0xa0/0xe4 el0t_64_sync+0x198/0x19c Code: f946bc01 b4fffe61 9101e020 17fffff2 (d4210000) Reject the ioctls outright as no sane VMM would call these before KVM_ARM_VCPU_INIT anyway. Even if it did the exception would've been thrown away by the eventual reset of the vCPU's state. Cc: stable@vger.kernel.org # 6.17 Fixes: `b7b27facc7` ("arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTS") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:17:03 +01:00
Sean Christopherson	cb49b7b862	KVM: arm64: selftests: Track width of timer counter as "int", not "uint64_t" Store the width of arm64's timer counter as an "int", not a "uint64_t". ilog2() returns an "int", and more importantly using what is an "unsigned long" under the hood makes clang unhappy due to a type mismatch when clamping the width to a sane value. arm64/arch_timer_edge_cases.c:1032:10: error: comparison of distinct pointer types ('typeof (width) ' (aka 'unsigned long ') and 'typeof (56) ' (aka 'int ')) [-Werror,-Wcompare-distinct-pointer-types] 1032 \| width = clamp(width, 56, 64); \| ^~~~~~~~~~~~~~~~~~~~ tools/include/linux/kernel.h:47:45: note: expanded from macro 'clamp' 47 \| #define clamp(val, lo, hi) min((typeof(val))max(val, lo), hi) \| ^~~~~~~~~~~~ tools/include/linux/kernel.h:33:17: note: expanded from macro 'max' 33 \| (void) (&_max1 == &_max2); \ \| ~~~~~~ ^ ~~~~~~ tools/include/linux/kernel.h:39:9: note: expanded from macro 'min' 39 \| typeof(x) _min1 = (x); \ \| ^ Fixes: `fad4cf9448` ("KVM: arm64: selftests: Determine effective counter width in arch_timer_edge_cases") Cc: Sebastian Ott <sebott@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:17:03 +01:00
Oliver Upton	890c608b4d	KVM: arm64: selftests: Test effective value of HCR_EL2.AMO A defect against the architecture now allows an implementation to treat AMO as 1 when HCR_EL2.{E2H, TGE} = {1, 0}. KVM now takes advantage of this interpretation to address a quality of emulation issue w.r.t. SError injection. Add a corresponding test case and expect a pending SError to be taken. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:17:03 +01:00
Oliver Upton	a46c09b382	KVM: arm64: Use the in-context stage-1 in __kvm_find_s1_desc_level() Running the external_aborts selftest at EL2 leads to an ugly splat due to the stage-1 MMU being disabled for the walked context, owing to the fact that __kvm_find_s1_desc_level() is hardcoded to the EL1&0 regime. Select the appropriate translation regime for the stage-1 walk based on the current vCPU context. Fixes: `b8e625167a` ("KVM: arm64: Add S1 IPA to page table level walker") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:17:03 +01:00
Marc Zyngier	9a1950f977	KVM: arm64: nv: Don't advance PC when pending an SVE exception Jan reports that running a nested guest on Neoverse-V2 leads to a WARN in the host due to simultaneously pending an exception and PC increment after an access to ZCR_EL2. Returning true from a sysreg accessor is an indication that the sysreg instruction has been retired. Of course this isn't the case when we've pended a synchronous SVE exception for the guest. Fix the return value and let the exception propagate to the guest as usual. Reported-by: Jan Kotas <jank@cadence.com> Closes: https://lore.kernel.org/kvmarm/865xd61tt5.wl-maz@kernel.org/ Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:17:03 +01:00
Oliver Upton	ed25dcfbc4	KVM: arm64: nv: Don't treat ZCR_EL2 as a 'mapped' register Unlike the other mapped EL2 sysregs ZCR_EL2 isn't guaranteed to be resident when a vCPU is loaded as it actually follows the SVE context. As such, the contents of ZCR_EL1 may belong to another guest if the vCPU has been preempted before reaching sysreg emulation. Unconditionally use the in-memory value of ZCR_EL2 and switch to the memory-only accessors. The in-memory value is guaranteed to be valid as fpsimd_lazy_switch_to_{guest,host}() will restore/save the register appropriately. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>	2025-10-13 14:17:02 +01:00
Linus Torvalds	3a86608788	Linux 6.18-rc1	2025-10-12 13:42:36 -07:00
Linus Torvalds	3dd7b81235	Merge tag 'i2c-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: "One revert because of a regression in the I2C core which has sadly not showed up during its time in -next" * tag 'i2c-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: Revert "i2c: boardinfo: Annotate code used in init phase only"	2025-10-12 13:27:56 -07:00
Linus Torvalds	8765f46791	Merge tag 'irq_urgent_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Borislav Petkov: - Skip interrupt ID 0 in sifive-plic during suspend/resume because ID 0 is reserved and accessing reserved register space could result in undefined behavior - Fix a function's retval check in aspeed-scu-ic * tag 'irq_urgent_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/sifive-plic: Avoid interrupt ID 0 handling during suspend/resume irqchip/aspeed-scu-ic: Fix an IS_ERR() vs NULL check	2025-10-12 08:45:52 -07:00
Linus Torvalds	67029a49db	Merge tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: "The previous fix to trace_marker required updating trace_marker_raw as well. The difference between trace_marker_raw from trace_marker is that the raw version is for applications to write binary structures directly into the ring buffer instead of writing ASCII strings. This is for applications that will read the raw data from the ring buffer and get the data structures directly. It's a bit quicker than using the ASCII version. Unfortunately, it appears that our test suite has several tests that test writes to the trace_marker file, but lacks any tests to the trace_marker_raw file (this needs to be remedied). Two issues came about the update to the trace_marker_raw file that syzbot found: - Fix tracing_mark_raw_write() to use per CPU buffer The fix to use the per CPU buffer to copy from user space was needed for both the trace_maker and trace_maker_raw file. The fix for reading from user space into per CPU buffers properly fixed the trace_marker write function, but the trace_marker_raw file wasn't fixed properly. The user space data was correctly written into the per CPU buffer, but the code that wrote into the ring buffer still used the user space pointer and not the per CPU buffer that had the user space data already written. - Stop the fortify string warning from writing into trace_marker_raw After converting the copy_from_user_nofault() into a memcpy(), another issue appeared. As writes to the trace_marker_raw expects binary data, the first entry is a 4 byte identifier. The entry structure is defined as: struct { struct trace_entry ent; int id; char buf[]; }; The size of this structure is reserved on the ring buffer with: size = sizeof(entry) + cnt; Then it is copied from the buffer into the ring buffer with: memcpy(&entry->id, buf, cnt); This use to be a copy_from_user_nofault(), but now converting it to a memcpy() triggers the fortify-string code, and causes a warning. The allocated space is actually more than what is copied, as the cnt used also includes the entry->id portion. Allocating sizeof(entry) plus cnt is actually allocating 4 bytes more than what is needed. Change the size function to: size = struct_size(entry, buf, cnt - sizeof(entry->id)); And update the memcpy() to unsafe_memcpy()" * tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Stop fortify-string from warning in tracing_mark_raw_write() tracing: Fix tracing_mark_raw_write() to use buf and not ubuf	2025-10-11 16:06:04 -07:00
Linus Torvalds	c04022dccb	Merge tag 'kbuild-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux Pull Kbuild fixes from Nathan Chancellor: - Fix UAPI types check in headers_check.pl - Only enable -Werror for hostprogs with CONFIG_WERROR / W=e - Ignore fsync() error when output of gen_init_cpio is a pipe - Several little build fixes for recent modules.builtin.modinfo series * tag 'kbuild-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: kbuild: Use '--strip-unneeded-symbol' for removing module device table symbols s390/vmlinux.lds.S: Move .vmlinux.info to end of allocatable sections kbuild: Add '.rel.*' strip pattern for vmlinux kbuild: Restore pattern to avoid stripping .rela.dyn from vmlinux gen_init_cpio: Ignore fsync() returning EINVAL on pipes scripts/Makefile.extrawarn: Respect CONFIG_WERROR / W=e for hostprogs kbuild: uapi: Strip comments before size type check	2025-10-11 15:47:12 -07:00
Wolfram Sang	a8482d2c90	Revert "i2c: boardinfo: Annotate code used in init phase only" This reverts commit `1a2b423be6` because we got a regression report and need time to find out the details. Reported-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Closes: https://lore.kernel.org/r/29ec0082-4dd4-4120-acd2-44b35b4b9487@oss.qualcomm.com Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>	2025-10-11 23:57:33 +02:00
Linus Torvalds	98906f9d85	Merge tag 'rtc-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: "This cycle, we have a new RTC driver, for the SpacemiT P1. The optee driver gets alarm support. We also get a fix for a race condition that was fairly rare unless while stress testing the alarms. Subsystem: - Fix race when setting alarm - Ensure alarm irq is enabled when UIE is enabled - remove unneeded 'fast_io' parameter in regmap_config New driver: - SpacemiT P1 RTC Drivers: - efi: Remove wakeup functionality - optee: add alarms support - s3c: Drop support for S3C2410 - zynqmp: Restore alarm functionality after kexec transition" * tag 'rtc-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (29 commits) rtc: interface: Ensure alarm irq is enabled when UIE is enabled rtc: tps6586x: Fix initial enable_irq/disable_irq balance rtc: cpcap: Fix initial enable_irq/disable_irq balance rtc: isl12022: Fix initial enable_irq/disable_irq balance rtc: interface: Fix long-standing race when setting alarm rtc: pcf2127: fix watchdog interrupt mask on pcf2131 rtc: zynqmp: Restore alarm functionality after kexec transition rtc: amlogic-a4: Optimize global variables rtc: sd2405al: Add I2C address. rtc: Kconfig: move symbols to proper section rtc: optee: make optee_rtc_pm_ops static rtc: optee: Fix error code in optee_rtc_read_alarm() rtc: optee: fix error code in probe() dt-bindings: rtc: Convert apm,xgene-rtc to DT schema rtc: spacemit: support the SpacemiT P1 RTC rtc: optee: add alarm related rtc ops to optee rtc driver rtc: optee: remove unnecessary memory operations rtc: optee: fix memory leak on driver removal rtc: x1205: Fix Xicor X1205 vendor prefix dt-bindings: rtc: Fix Xicor X1205 vendor prefix ...	2025-10-11 11:56:47 -07:00
Linus Torvalds	2a6edd867b	Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Fixes only in drivers (ufs, mvsas, qla2xxx, target) that came in just before or during the merge window. The most important one is the qla2xxx which reverts a conversion to fix flexible array member warnings, that went up in this merge window but which turned out on further testing to be causing data corruption" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: core: Include UTP error in INT_FATAL_ERRORS scsi: ufs: sysfs: Make HID attributes visible scsi: mvsas: Fix use-after-free bugs in mvs_work_queue scsi: ufs: core: Fix PM QoS mutex initialization scsi: ufs: core: Fix runtime suspend error deadlock Revert "scsi: qla2xxx: Fix memcpy() field-spanning write issue" scsi: target: target_core_configfs: Add length check to avoid buffer overflow	2025-10-11 11:49:00 -07:00
Linus Torvalds	9591fdb061	Merge tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more x86 updates from Borislav Petkov: - Remove a bunch of asm implementing condition flags testing in KVM's emulator in favor of int3_emulate_jcc() which is written in C - Replace KVM fastops with C-based stubs which avoids problems with the fastop infra related to latter not adhering to the C ABI due to their special calling convention and, more importantly, bypassing compiler control-flow integrity checking because they're written in asm - Remove wrongly used static branches and other ugliness accumulated over time in hyperv's hypercall implementation with a proper static function call to the correct hypervisor call variant - Add some fixes and modifications to allow running FRED-enabled kernels in KVM even on non-FRED hardware - Add kCFI improvements like validating indirect calls and prepare for enabling kCFI with GCC. Add cmdline params documentation and other code cleanups - Use the single-byte 0xd6 insn as the official #UD single-byte undefined opcode instruction as agreed upon by both x86 vendors - Other smaller cleanups and touchups all over the place * tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86,retpoline: Optimize patch_retpoline() x86,ibt: Use UDB instead of 0xEA x86/cfi: Remove __noinitretpoline and __noretpoline x86/cfi: Add "debug" option to "cfi=" bootparam x86/cfi: Standardize on common "CFI:" prefix for CFI reports x86/cfi: Document the "cfi=" bootparam options x86/traps: Clarify KCFI instruction layout compiler_types.h: Move __nocfi out of compiler-specific header objtool: Validate kCFI calls x86/fred: KVM: VMX: Always use FRED for IRQs when CONFIG_X86_FRED=y x86/fred: Play nice with invoking asm_fred_entry_from_kvm() on non-FRED hardware x86/fred: Install system vector handlers even if FRED isn't fully enabled x86/hyperv: Use direct call to hypercall-page x86/hyperv: Clean up hv_do_hypercall() KVM: x86: Remove fastops KVM: x86: Convert em_salc() to C KVM: x86: Introduce EM_ASM_3WCL KVM: x86: Introduce EM_ASM_1SRC2 KVM: x86: Introduce EM_ASM_2CL KVM: x86: Introduce EM_ASM_2W ...	2025-10-11 11:19:16 -07:00
Linus Torvalds	2f0a750453	Merge tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: - Simplify inline asm flag output operands now that the minimum compiler version supports the =@ccCOND syntax - Remove a bunch of AS_* Kconfig symbols which detect assembler support for various instruction mnemonics now that the minimum assembler version supports them all - The usual cleanups all over the place * tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm: Remove code depending on __GCC_ASM_FLAG_OUTPUTS__ x86/sgx: Use ENCLS mnemonic in <kernel/cpu/sgx/encls.h> x86/mtrr: Remove license boilerplate text with bad FSF address x86/asm: Use RDPKRU and WRPKRU mnemonics in <asm/special_insns.h> x86/idle: Use MONITORX and MWAITX mnemonics in <asm/mwait.h> x86/entry/fred: Push __KERNEL_CS directly x86/kconfig: Remove CONFIG_AS_AVX512 crypto: x86 - Remove CONFIG_AS_VPCLMULQDQ crypto: X86 - Remove CONFIG_AS_VAES crypto: x86 - Remove CONFIG_AS_GFNI x86/kconfig: Drop unused and needless config X86_64_SMP	2025-10-11 10:51:14 -07:00
Linus Torvalds	6bb71f0fe5	Merge tag 'slab-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fix from Vlastimil Babka: "A NULL pointer deref hotfix" * tag 'slab-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: slab: fix barn NULL pointer dereference on memoryless nodes	2025-10-11 10:40:24 -07:00
Linus Torvalds	fbde105f13	Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Alexei Starovoitov: - Finish constification of 1st parameter of bpf_d_path() (Rong Tao) - Harden userspace-supplied xdp_desc validation (Alexander Lobakin) - Fix metadata_dst leak in __bpf_redirect_neigh_v{4,6}() (Daniel Borkmann) - Fix undefined behavior in {get,put}_unaligned_be32() (Eric Biggers) - Use correct context to unpin bpf hash map with special types (KaFai Wan) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add test for unpinning htab with internal timer struct bpf: Avoid RCU context warning when unpinning htab with internal structs xsk: Harden userspace-supplied xdp_desc validation bpf: Fix metadata_dst leak __bpf_redirect_neigh_v{4,6} libbpf: Fix undefined behavior in {get,put}_unaligned_be32() bpf: Finish constification of 1st parameter of bpf_d_path()	2025-10-11 10:31:38 -07:00
Linus Torvalds	ae13bd2310	Merge tag 'mm-nonmm-stable-2025-10-10-15-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more updates from Andrew Morton: "Just one series here - Mike Rappoport has taught KEXEC handover to preserve vmalloc allocations across handover" * tag 'mm-nonmm-stable-2025-10-10-15-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: lib/test_kho: use kho_preserve_vmalloc instead of storing addresses in fdt kho: add support for preserving vmalloc allocations kho: replace kho_preserve_phys() with kho_preserve_pages() kho: check if kho is finalized in __kho_preserve_order() MAINTAINERS, .mailmap: update Umang's email address	2025-10-11 10:27:52 -07:00
Linus Torvalds	971370a88c	Merge tag 'mm-hotfixes-stable-2025-10-10-15-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "7 hotfixes. All 7 are cc:stable and all 7 are for MM. All singletons, please see the changelogs for details" * tag 'mm-hotfixes-stable-2025-10-10-15-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: hugetlb: avoid soft lockup when mprotect to large memory area fsnotify: pass correct offset to fsnotify_mmap_perm() mm/ksm: fix flag-dropping behavior in ksm_madvise mm/damon/vaddr: do not repeat pte_offset_map_lock() until success mm/rmap: fix soft-dirty and uffd-wp bit loss when remapping zero-filled mTHP subpage to shared zeropage mm/thp: fix MTE tag mismatch when replacing zero-filled subpages memcg: skip cgroup_file_notify if spinning is not allowed	2025-10-11 10:14:55 -07:00
Steven Rostedt	54b91e54b1	tracing: Stop fortify-string from warning in tracing_mark_raw_write() The way tracing_mark_raw_write() records its data is that it has the following structure: struct { struct trace_entry; int id; char buf[]; }; But memcpy(&entry->id, buf, size) triggers the following warning when the size is greater than the id: ------------[ cut here ]------------ memcpy: detected field-spanning write (size 6) of single field "&entry->id" at kernel/trace/trace.c:7458 (size 4) WARNING: CPU: 7 PID: 995 at kernel/trace/trace.c:7458 write_raw_marker_to_buffer.isra.0+0x1f9/0x2e0 Modules linked in: CPU: 7 UID: 0 PID: 995 Comm: bash Not tainted 6.17.0-test-00007-g60b82183e78a-dirty #211 PREEMPT(voluntary) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 RIP: 0010:write_raw_marker_to_buffer.isra.0+0x1f9/0x2e0 Code: 04 00 75 a7 b9 04 00 00 00 48 89 de 48 89 04 24 48 c7 c2 e0 b1 d1 b2 48 c7 c7 40 b2 d1 b2 c6 05 2d 88 6a 04 01 e8 f7 e8 bd ff <0f> 0b 48 8b 04 24 e9 76 ff ff ff 49 8d 7c 24 04 49 8d 5c 24 08 48 RSP: 0018:ffff888104c3fc78 EFLAGS: 00010292 RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 1ffffffff6b363b4 RDI: 0000000000000001 RBP: ffff888100058a00 R08: ffffffffb041d459 R09: ffffed1020987f40 R10: 0000000000000007 R11: 0000000000000001 R12: ffff888100bb9010 R13: 0000000000000000 R14: 00000000000003e3 R15: ffff888134800000 FS: 00007fa61d286740(0000) GS:ffff888286cad000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000560d28d509f1 CR3: 00000001047a4006 CR4: 0000000000172ef0 Call Trace: <TASK> tracing_mark_raw_write+0x1fe/0x290 ? __pfx_tracing_mark_raw_write+0x10/0x10 ? security_file_permission+0x50/0xf0 ? rw_verify_area+0x6f/0x4b0 vfs_write+0x1d8/0xdd0 ? __pfx_vfs_write+0x10/0x10 ? __pfx_css_rstat_updated+0x10/0x10 ? count_memcg_events+0xd9/0x410 ? fdget_pos+0x53/0x5e0 ksys_write+0x182/0x200 ? __pfx_ksys_write+0x10/0x10 ? do_user_addr_fault+0x4af/0xa30 do_syscall_64+0x63/0x350 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7fa61d318687 Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff RSP: 002b:00007ffd87fe0120 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007fa61d286740 RCX: 00007fa61d318687 RDX: 0000000000000006 RSI: 0000560d28d509f0 RDI: 0000000000000001 RBP: 0000560d28d509f0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000006 R13: 00007fa61d4715c0 R14: 00007fa61d46ee80 R15: 0000000000000000 </TASK> ---[ end trace 0000000000000000 ]--- This is because fortify string sees that the size of entry->id is only 4 bytes, but it is writing more than that. But this is OK as the dynamic_array is allocated to handle that copy. The size allocated on the ring buffer was actually a bit too big: size = sizeof(entry) + cnt; But cnt includes the 'id' and the buffer data, so adding cnt to the size of entry actually allocates too much on the ring buffer. Change the allocation to: size = struct_size(entry, buf, cnt - sizeof(entry->id)); and the memcpy() to unsafe_memcpy() with an added justification. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20251011112032.77be18e4@gandalf.local.home Fixes: `64cf7d058a` ("tracing: Have trace_marker use per-cpu data to read user space") Reported-by: syzbot+9a2ede1643175f350105@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68e973f5.050a0220.1186a4.0010.GAE@google.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-10-11 11:27:27 -04:00
Vlastimil Babka	fd6db58867	slab: fix barn NULL pointer dereference on memoryless nodes Phil reported a boot failure once sheaves become used in commits `59faa4da7c` ("maple_tree: use percpu sheaves for maple_node_cache") and `3accabda4d` ("mm, vma: use percpu sheaves for vm_area_struct cache"): BUG: kernel NULL pointer dereference, address: 0000000000000040 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 21 UID: 0 PID: 818 Comm: kworker/u398:0 Not tainted 6.17.0-rc3.slab+ #5 PREEMPT(voluntary) Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.26.0 07/30/2025 RIP: 0010:__pcs_replace_empty_main+0x44/0x1d0 Code: ec 08 48 8b 46 10 48 8b 76 08 48 85 c0 74 0b 8b 48 18 85 c9 0f 85 e5 00 00 00 65 48 63 05 e4 ee 50 02 49 8b 84 c6 e0 00 00 00 <4c> 8b 68 40 4c 89 ef e8 b0 81 ff ff 48 89 c5 48 85 c0 74 1d 48 89 RSP: 0018:ffffd2d10950bdb0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8a775dab74b0 RCX: 00000000ffffffff RDX: 0000000000000cc0 RSI: ffff8a6800804000 RDI: ffff8a680004e300 RBP: ffffd2d10950be40 R08: 0000000000000060 R09: ffffffffb9367388 R10: 00000000000149e8 R11: ffff8a6f87a38000 R12: 0000000000000cc0 R13: 0000000000000cc0 R14: ffff8a680004e300 R15: 00000000000000c0 FS: 0000000000000000(0000) GS:ffff8a77a3541000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000040 CR3: 0000000e1aa24000 CR4: 00000000003506f0 Call Trace: <TASK> ? srso_return_thunk+0x5/0x5f ? vm_area_alloc+0x1e/0x60 kmem_cache_alloc_noprof+0x4ec/0x5b0 vm_area_alloc+0x1e/0x60 create_init_stack_vma+0x26/0x210 alloc_bprm+0x139/0x200 kernel_execve+0x4a/0x140 call_usermodehelper_exec_async+0xd0/0x190 ? __pfx_call_usermodehelper_exec_async+0x10/0x10 ret_from_fork+0xf0/0x110 ? __pfx_call_usermodehelper_exec_async+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Modules linked in: CR2: 0000000000000040 ---[ end trace 0000000000000000 ]--- RIP: 0010:__pcs_replace_empty_main+0x44/0x1d0 Code: ec 08 48 8b 46 10 48 8b 76 08 48 85 c0 74 0b 8b 48 18 85 c9 0f 85 e5 00 00 00 65 48 63 05 e4 ee 50 02 49 8b 84 c6 e0 00 00 00 <4c> 8b 68 40 4c 89 ef e8 b0 81 ff ff 48 89 c5 48 85 c0 74 1d 48 89 RSP: 0018:ffffd2d10950bdb0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8a775dab74b0 RCX: 00000000ffffffff RDX: 0000000000000cc0 RSI: ffff8a6800804000 RDI: ffff8a680004e300 RBP: ffffd2d10950be40 R08: 0000000000000060 R09: ffffffffb9367388 R10: 00000000000149e8 R11: ffff8a6f87a38000 R12: 0000000000000cc0 R13: 0000000000000cc0 R14: ffff8a680004e300 R15: 00000000000000c0 FS: 0000000000000000(0000) GS:ffff8a77a3541000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000040 CR3: 0000000e1aa24000 CR4: 00000000003506f0 Kernel panic - not syncing: Fatal exception Kernel Offset: 0x36a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception ]--- And noted "this is an AMD EPYC 7401 with 8 NUMA nodes configured such that memory is only on 2 of them." # numactl --hardware available: 8 nodes (0-7) node 0 cpus: 0 8 16 24 32 40 48 56 64 72 80 88 node 0 size: 0 MB node 0 free: 0 MB node 1 cpus: 2 10 18 26 34 42 50 58 66 74 82 90 node 1 size: 31584 MB node 1 free: 30397 MB node 2 cpus: 4 12 20 28 36 44 52 60 68 76 84 92 node 2 size: 0 MB node 2 free: 0 MB node 3 cpus: 6 14 22 30 38 46 54 62 70 78 86 94 node 3 size: 0 MB node 3 free: 0 MB node 4 cpus: 1 9 17 25 33 41 49 57 65 73 81 89 node 4 size: 0 MB node 4 free: 0 MB node 5 cpus: 3 11 19 27 35 43 51 59 67 75 83 91 node 5 size: 32214 MB node 5 free: 31625 MB node 6 cpus: 5 13 21 29 37 45 53 61 69 77 85 93 node 6 size: 0 MB node 6 free: 0 MB node 7 cpus: 7 15 23 31 39 47 55 63 71 79 87 95 node 7 size: 0 MB node 7 free: 0 MB Linus decoded the stacktrace to get_barn() and get_node() and determined that kmem_cache->node[numa_mem_id()] is NULL. The problem is due to a wrong assumption that memoryless nodes only exist on systems with CONFIG_HAVE_MEMORYLESS_NODES, where numa_mem_id() points to the nearest node that has memory. SLUB has been allocating its kmem_cache_node structures only on nodes with memory and so it does with struct node_barn. For kmem_cache_node, get_partial_node() checks if get_node() result is not NULL, which I assumed was for protection from a bogus node id passed to kmalloc_node() but apparently it's also for systems where numa_mem_id() (used when no specific node is given) might return a memoryless node. Fix the sheaves code the same way by checking the result of get_node() and bailing out if it's NULL. Note that cpus on such memoryless nodes will have degraded sheaves performance, which can be improved later, preferably by making numa_mem_id() work properly on such systems. Fixes: `2d517aa09b` ("slab: add opt-in caching layer of percpu sheaves") Reported-and-tested-by: Phil Auld <pauld@redhat.com> Closes: https://lore.kernel.org/all/20251010151116.GA436967@pauld.westford.csb/ Analyzed-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/all/CAHk-%3Dwg1xK%2BBr%3DFJ5QipVhzCvq7uQVPt5Prze6HDhQQ%3DQD_BcQ@mail.gmail.com/ Signed-off-by: Vlastimil Babka <vbabka@suse.cz>	2025-10-11 10:47:57 +02:00
Steven Rostedt	bda745ee8f	tracing: Fix tracing_mark_raw_write() to use buf and not ubuf The fix to use a per CPU buffer to read user space tested only the writes to trace_marker. But it appears that the selftests are missing tests to the trace_maker_raw file. The trace_maker_raw file is used by applications that writes data structures and not strings into the file, and the tools read the raw ring buffer to process the structures it writes. The fix that reads the per CPU buffers passes the new per CPU buffer to the trace_marker file writes, but the update to the trace_marker_raw write read the data from user space into the per CPU buffer, but then still used then passed the user space address to the function that records the data. Pass in the per CPU buffer and not the user space address. TODO: Add a test to better test trace_marker_raw. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20251011035243.386098147@kernel.org Fixes: `64cf7d058a` ("tracing: Have trace_marker use per-cpu data to read user space") Reported-by: syzbot+9a2ede1643175f350105@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68e973f5.050a0220.1186a4.0010.GAE@google.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-10-10 23:58:44 -04:00

1 2 3 4 5 ...

1396681 Commits