Commit Graph

Dandan Zhang
494b0792d9 LoongArch: KVM: Remove undefined a6 argument comment for kvm_hypercall()
The kvm_hypercall() set for LoongArch is limited to a1-a5, so the
mention of a6 in the comment refers to an undefined argument and needs
to be rectified.
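
For context, a sketch of the calling convention this refers to,
paraphrased from the kvm_para.h helpers (treat details such as the
hypercall immediate as assumptions): the function id travels in a0 and
arguments in a1-a5, so a6 simply has no role:

	/* Illustrative sketch only, not the verbatim header. */
	static __always_inline long kvm_hypercall1(u64 fid, unsigned long arg)
	{
		register long ret asm("a0");
		register unsigned long fun asm("a0") = fid; /* function id: a0 */
		register unsigned long a1 asm("a1") = arg;  /* args: a1-a5 only */

		__asm__ __volatile__(
			"hvcl "__stringify(KVM_HCALL_SERVICE)
			: "=r" (ret)
			: "r" (fun), "r" (a1)
			: "memory");
		return ret;
	}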

Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Dandan Zhang <zhangdandan@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-07 17:37:14 +08:00
Yuli Wang
296b03ce38 LoongArch: KVM: Remove unnecessary definition of KVM_PRIVATE_MEM_SLOTS
1. "KVM_PRIVATE_MEM_SLOTS" is renamed as "KVM_INTERNAL_MEM_SLOTS".

2. "KVM_INTERNAL_MEM_SLOTS" defaults to zero, so it is not necessary to
define it in LoongArch's asm/kvm_host.h.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bdd1c37a315bc50ab14066c4852bc8dcf070451e
Link: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b075450868dbc0950f0942617f222eeb989cad10
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: Yuli Wang <wangyuli@uniontech.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-07 17:37:14 +08:00
Huacai Chen
4574815abf LoongArch: Use accessors to page table entries instead of direct dereference
As very well explained in commit 20a004e7b0 ("arm64: mm: Use
READ_ONCE/WRITE_ONCE when accessing page tables"), an architecture whose
page table walker can modify the PTE in parallel must use the
READ_ONCE()/WRITE_ONCE() macros to avoid any compiler transformation.

So apply that to LoongArch which is such an architecture, in order to
avoid potential problems.

Similar to commit edf9556472 ("riscv: Use accessors to page table
entries instead of direct dereference").
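
As an illustration of the pattern (a generic sketch, not the
LoongArch-specific patch itself), the accessors simply wrap the
dereference:

	/* Sketch: READ_ONCE()/WRITE_ONCE() keep the compiler from tearing
	   or re-reading the entry while a walker updates it concurrently. */
	static inline pte_t ptep_get(pte_t *ptep)
	{
		return READ_ONCE(*ptep);
	}

	static inline void set_pte(pte_t *ptep, pte_t pte)
	{
		WRITE_ONCE(*ptep, pte);
	}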

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-07 17:37:11 +08:00
Miao Wang
e688c22073 LoongArch: Enable general EFI poweroff method
efi_shutdown_init() can register a general sys_off handler named
efi_power_off(). Enable this by providing efi_poweroff_required(),
like arm and x86. EFI poweroff is also supported on LoongArch, and
enabling it makes the poweroff function usable on hardware which
lacks ACPI S5.

We prefer ACPI poweroff rather than EFI poweroff (like x86), so we only
require EFI poweroff if acpi_gbl_reduced_hardware or acpi_no_s5 is true.
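
A minimal sketch of such a hook, mirroring the x86 behaviour described
above (the LoongArch specifics may differ):

	/* Sketch: only fall back to EFI poweroff when ACPI S5 is absent. */
	bool efi_poweroff_required(void)
	{
		return acpi_gbl_reduced_hardware || acpi_no_s5;
	}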

Cc: stable@vger.kernel.org
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Miao Wang <shankerwangmiao@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-08-07 17:37:11 +08:00
Linus Torvalds
a5dbd76a89 Merge tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:

 - Prevent a deadlock on cpu_hotplug_lock in the aperf/mperf driver.

   A recent change in the ACPI code which consolidated code paths moved
   the invocation of init_freq_invariance_cppc() into a CPU hotplug
   handler. The first invocation on AMD CPUs ends up enabling a static
   branch which deadlocks, because the static branch enable tries to
   acquire cpu_hotplug_lock while that lock is already held for write
   by the hotplug machinery.

   Use static_branch_enable_cpuslocked() instead and take the hotplug
   lock read for the Intel code path which is invoked from the
   architecture code outside of the CPU hotplug operations.

 - Fix the number of reserved bits in the sev_config structure bit field
   so that the bitfield does not exceed 64 bits.

 - Add missing Zen5 model numbers

 - Fix the alignment assumptions of pti_clone_pgtable() and
   clone_entry_text() on 32-bit:

   The code assumes PMD aligned code sections, but on 32-bit the kernel
   entry text is not PMD aligned. So depending on the code size and
   location, which is configuration and compiler dependent, entry text
   can cross a PMD boundary. As the start is not PMD aligned, adding
   PMD size to the start address yields an address larger than the
   end address, which results in partially mapped entry code for user
   space. That causes
   endless recursion on the first entry from userspace (usually #PF).

   Cure this by aligning the start address in the addition so it ends up
   at the next PMD start address.

   clone_entry_text() enforces PMD mapping, but on 32-bit the tail might
   eventually be PTE mapped, which causes a map fail because the PMD for
   the tail is not a large page mapping. Use PTI_LEVEL_KERNEL_IMAGE for
   the clone() invocation which resolves to PTE on 32-bit and PMD on
   64-bit.

 - Zero the 8-byte case for get_user() on range check failure on 32-bit

   The recent consolidation of the 8-byte get_user() case broke the
   zeroing in the failure case again. Re-establish it by clearing ECX
   before the range check rather than afterwards, as the latter
   obviously can't be reached when the range check fails.

* tag 'x86-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/uaccess: Zero the 8-byte get_range case on failure on 32-bit
  x86/mm: Fix pti_clone_entry_text() for i386
  x86/mm: Fix pti_clone_pgtable() alignment assumption
  x86/setup: Parse the builtin command line before merging
  x86/CPU/AMD: Add models 0x60-0x6f to the Zen5 range
  x86/sev: Fix __reserved field in sev_config
  x86/aperfmperf: Fix deadlock on cpu_hotplug_lock
2024-08-04 08:57:08 -07:00
Linus Torvalds
1ddeb0ef3c Merge tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fixes from Thomas Gleixner:

 - Move the smp_processor_id() invocation back into the non-preemptible
   region, so that the result is valid to use

 - Add the missing package C2 residency counters for Sierra Forest CPUs
   to make the newly added support actually useful

* tag 'perf-urgent-2024-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix smp_processor_id()-in-preemptible warnings
  perf/x86/intel/cstate: Add pkg C2 residency counter for Sierra Forest
2024-08-04 08:42:18 -07:00
Rob Herring (Arm)
ff58838015 arm: dts: arm: versatile-ab: Fix duplicate clock node name
Commit 04f08ef291 ("arm/arm64: dts: arm: Use generic clock and
regulator nodenames") renamed nodes and created 2 "clock-24000000" nodes
(at different paths).

The kernel can't handle these duplicate names even though they are at
different paths.  Fix this by renaming one of the nodes to "clock-pclk".

This name is aligned with other Arm boards (those didn't have a known
frequency to use in the node name).

Fixes: 04f08ef291 ("arm/arm64: dts: arm: Use generic clock and regulator nodenames")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-08-04 08:24:15 -07:00
Linus Torvalds
1dd950f288 Merge tag 'parisc-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc architecture fixes from Helge Deller:

 - fix unaligned memory accesses when calling BPF functions

 - adjust memory size constants to fix possible DMA corruptions

* tag 'parisc-for-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: fix a possible DMA corruption
  parisc: fix unaligned accesses in BPF
2024-08-03 09:03:21 -07:00
Linus Torvalds
041b1061d8 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:

 - Expand the speculative SSBS errata workaround to more CPUs

 - Ensure jump label changes are visible to all CPUs with a
   kick_all_cpus_sync() (and also enable jump label batching as part of
   the fix)

 - The shadow call stack sanitiser is currently incompatible with Rust,
   make CONFIG_RUST conditional on !CONFIG_SHADOW_CALL_STACK

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: jump_label: Ensure patched jump_labels are visible to all CPUs
  rust: SHADOW_CALL_STACK is incompatible with Rust
  arm64: errata: Expand speculative SSBS workaround (again)
  arm64: cputype: Add Cortex-A725 definitions
  arm64: cputype: Add Cortex-X1C definitions
2024-08-02 13:46:43 -07:00
Linus Torvalds
725d410fac Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "The bulk of the changes here is a largish change to guest_memfd,
  delaying the clearing and encryption of guest-private pages until they
  are actually added to guest page tables. This started as "let's make
  it impossible to misuse the API" for SEV-SNP; but then it ballooned a
  bit.

  The new logic is generally simpler and more ready for hugepage support
  in guest_memfd.

  Summary:

   - fix latent bug in how usage of large pages is determined for
     confidential VMs

   - fix "underline too short" in docs

   - eliminate log spam from limited APIC timer periods

   - disallow pre-faulting of memory before SEV-SNP VMs are initialized

   - delay clearing and encrypting private memory until it is added to
     guest page tables

   - this change also enables another small cleanup: the checks in
     SNP_LAUNCH_UPDATE that limit it to non-populated, private pages can
     now be moved in the common kvm_gmem_populate() function

   - fix compilation error that the RISC-V merge introduced in selftests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86/mmu: fix determination of max NPT mapping level for private pages
  KVM: riscv: selftests: Fix compile error
  KVM: guest_memfd: abstract how prepared folios are recorded
  KVM: guest_memfd: let kvm_gmem_populate() operate only on private gfns
  KVM: extend kvm_range_has_memory_attributes() to check subset of attributes
  KVM: cleanup and add shortcuts to kvm_range_has_memory_attributes()
  KVM: guest_memfd: move check for already-populated page to common code
  KVM: remove kvm_arch_gmem_prepare_needed()
  KVM: guest_memfd: make kvm_gmem_prepare_folio() operate on a single struct kvm
  KVM: guest_memfd: delay kvm_gmem_prepare_folio() until the memory is passed to the guest
  KVM: guest_memfd: return locked folio from __kvm_gmem_get_pfn
  KVM: rename CONFIG_HAVE_KVM_GMEM_* to CONFIG_HAVE_KVM_ARCH_GMEM_*
  KVM: guest_memfd: do not go through struct page
  KVM: guest_memfd: delay folio_mark_uptodate() until after successful preparation
  KVM: guest_memfd: return folio from __kvm_gmem_get_pfn()
  KVM: x86: disallow pre-fault for SNP VMs before initialization
  KVM: Documentation: Fix title underline too short warning
  KVM: x86: Eliminate log spam from limited APIC timer periods
2024-08-02 10:17:49 -07:00
Paolo Bonzini
1773014a97 Merge branch 'kvm-fixes' into HEAD
* fix latent bug in how usage of large pages is determined for
  confidential VMs

* fix "underline too short" in docs

* eliminate log spam from limited APIC timer periods

* disallow pre-faulting of memory before SEV-SNP VMs are initialized

* delay clearing and encrypting private memory until it is added to
  guest page tables

* this change also enables another small cleanup: the checks in
  SNP_LAUNCH_UPDATE that limit it to non-populated, private pages
  can now be moved in the common kvm_gmem_populate() function
2024-08-02 12:33:43 -04:00
Linus Torvalds
948752d2e0 Merge tag 'riscv-for-linus-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:

 - A fix to avoid dropping some of the internal pseudo-extensions, which
   breaks *envcfg dependency parsing

 - The kernel entry address is now aligned in purgatory, which avoids a
   misaligned load that can lead to crash on systems that don't support
   misaligned accesses early in boot

 - The FW_SFENCE_VMA_RECEIVED perf event was duplicated in a handful of
   perf JSON configurations; one of them has been updated to
   FW_SFENCE_VMA_ASID_SENT

 - The StarFive cache driver is now restricted to 64-bit systems, as it
   isn't 32-bit clean

 - A fix to avoid aliasing legacy-mode perf counters with software
   perf counters

 - VM_FAULT_SIGSEGV is now handled in the page fault code

 - A fix for stalls during CPU hotplug due to IPIs being disabled

 - A fix for memblock bounds checking. This manifests as a crash on
   systems with discontinuous memory maps that have regions that don't
   fit in the linear map

* tag 'riscv-for-linus-6.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix linear mapping checks for non-contiguous memory regions
  RISC-V: Enable the IPI before workqueue_online_cpu()
  riscv/mm: Add handling for VM_FAULT_SIGSEGV in mm_fault_error()
  perf: riscv: Fix selecting counters in legacy mode
  cache: StarFive: Require a 64-bit system
  perf arch events: Fix duplicate RISC-V SBI firmware event name
  riscv/purgatory: align riscv_kernel_entry
  riscv: cpufeature: Do not drop Linux-internal extensions
2024-08-02 09:33:35 -07:00
Linus Torvalds
66242ef25e Merge tag 's390-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:

 - remove unused empty CPU alternatives header file

 - fix recently and erroneously removed exception handling when loading
   an invalid floating point register

 - ptdump fixes to reflect the recent changes due to the uncoupling of
   physical vs virtual kernel address spaces

 - changes to avoid the unnecessary splitting of large pages in kernel
   mappings

 - add the missing MODULE_DESCRIPTION for the CIO modules

* tag 's390-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: Keep inittext section writable
  s390/vmlinux.lds.S: Move ro_after_init section behind rodata section
  s390/mm: Get rid of RELOC_HIDE()
  s390/mm/ptdump: Improve sorting of markers
  s390/mm/ptdump: Add support for relocated lowcore mapping
  s390/mm/ptdump: Fix handling of identity mapping area
  s390/cio: Add missing MODULE_DESCRIPTION() macros
  s390/alternatives: Remove unused empty header file
  s390/fpu: Re-add exception handling in load_fpu_state()
2024-08-02 09:29:54 -07:00
Linus Torvalds
29ccb40f2b Merge tag 'asm-generic-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic fixes from Arnd Bergmann:
 "These are three important bug fixes for the cross-architecture tree,
  fixing a regression with the new syscall.tbl file, the inconsistent
  numbering for the new uretprobe syscall and a bug with iowrite64be on
  alpha"

* tag 'asm-generic-fixes-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  syscalls: fix syscall macros for newfstat/newfstatat
  uretprobe: change syscall number, again
  alpha: fix ioread64be()/iowrite64be() helpers
2024-08-02 09:14:48 -07:00
Will Deacon
cfb00a3578 arm64: jump_label: Ensure patched jump_labels are visible to all CPUs
Although the Arm architecture permits concurrent modification and
execution of NOP and branch instructions, it still requires some
synchronisation to ensure that other CPUs consistently execute the newly
written instruction:

 >  When the modified instructions are observable, each PE that is
 >  executing the modified instructions must execute an ISB or perform a
 >  context synchronizing event to ensure execution of the modified
 >  instructions

Prior to commit f6cc0c5016 ("arm64: Avoid calling stop_machine() when
patching jump labels"), the arm64 jump_label patching machinery
performed synchronisation using stop_machine() after each modification,
however this was problematic when flipping static keys from atomic
contexts (namely, the arm_arch_timer CPU hotplug startup notifier) and
so we switched to the _nosync() patching routines to avoid "scheduling
while atomic" BUG()s during boot.

In hindsight, the analysis of the issue in f6cc0c5016 isn't quite
right: it cites the use of IPIs in the default patching routines as the
cause of the lockup, whereas stop_machine() does not rely on IPIs and
the I-cache invalidation is performed using __flush_icache_range(),
which elides the call to kick_all_cpus_sync(). In fact, the blocking
wait for other CPUs is what triggers the BUG() and the problem remains
even after f6cc0c5016, for example because we could block on the
jump_label_mutex. Eventually, the arm_arch_timer driver was fixed to
avoid the static key entirely in commit a862fc2254
("clocksource/arm_arch_timer: Remove use of workaround static key").

This all leaves the jump_label patching code in a funny situation on
arm64 as we do not synchronise with other CPUs to reduce the likelihood
of a bug which no longer exists. Consequently, toggling a static key on
one CPU cannot be assumed to take effect on other CPUs, leading to
potential issues, for example with missing preempt notifiers.

Rather than revert f6cc0c5016 and go back to stop_machine() for each
patch site, implement arch_jump_label_transform_apply() and kick all
the other CPUs with an IPI at the end of patching.
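
A simplified sketch of that batching shape (not the verbatim patch):

	/* Sketch: patch each site without per-site synchronisation... */
	bool arch_jump_label_transform_queue(struct jump_entry *entry,
					     enum jump_label_type type)
	{
		arch_jump_label_transform(entry, type);
		return true;	/* always "queued" */
	}

	/* ...then issue a single IPI broadcast for the whole batch. */
	void arch_jump_label_transform_apply(void)
	{
		kick_all_cpus_sync();
	}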

Cc: Alexander Potapenko <glider@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Fixes: f6cc0c5016 ("arm64: Avoid calling stop_machine() when patching jump labels")
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20240731133601.3073-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-08-02 15:07:01 +01:00
Arnd Bergmann
343416f0c1 syscalls: fix syscall macros for newfstat/newfstatat
The __NR_newfstat and __NR_newfstatat macros accidentally got renamed
in the conversion to the syscall.tbl format, dropping the 'new' portion
of the name.

In an unrelated change, the two syscalls are no longer architecture
specific but are once more defined on all 64-bit architectures, so the
'newstat' ABI keyword can be dropped from the table as a simplification.

Fixes: 4fe53bf2ba ("syscalls: add generic scripts/syscall.tbl")
Closes: https://lore.kernel.org/lkml/838053e0-b186-4e9f-9668-9a3384a71f23@app.fastmail.com/T/#t
Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-08-02 15:20:47 +02:00
Arnd Bergmann
54233a4254 uretprobe: change syscall number, again
Despite multiple attempts to get the syscall number assignment right
for the newly added uretprobe syscall, we ended up with a bit of a mess:

 - The number is defined as 467 based on the assumption that the
   xattrat family of syscalls would use 463 through 466, but those
   did not make it into 6.11.

 - The include/uapi/asm-generic/unistd.h file still lists the number
   463, but the new scripts/syscall.tbl that was supposed to have the
   same data lists 467 instead as the number for arc, arm64, csky,
   hexagon, loongarch, nios2, openrisc and riscv. None of these
   architectures actually provide a uretprobe syscall.

 - All the other architectures (powerpc, arm, mips, ...) don't list
   this syscall at all.

There are two ways to make it consistent again: either list it with
the same syscall number on all architectures, or only list it on x86
but not in scripts/syscall.tbl and asm-generic/unistd.h.

Based on the most recent discussion, it seems like we won't need it
anywhere else, so just remove the inconsistent assignment and instead
move the x86 number to the next available one in the architecture
specific range, which is 335.

Fixes: 5c28424e9a ("syscalls: Fix to add sys_uretprobe to syscall.tbl")
Fixes: 190fec72df ("uprobe: Wire up uretprobe system call")
Fixes: 63ded11097 ("uprobe: Change uretprobe syscall scope and number")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-08-02 15:18:49 +02:00
David Gow
dd35a09332 x86/uaccess: Zero the 8-byte get_range case on failure on 32-bit
While zeroing the upper 32 bits of an 8-byte getuser on 32-bit x86 was
fixed by commit 8c860ed825 ("x86/uaccess: Fix missed zeroing of ia32 u64
get_user() range checking") it was broken again in commit 8a2462df15
("x86/uaccess: Improve the 8-byte getuser() case").

This is because the register which holds the upper 32 bits (%ecx) is being
cleared _after_ the check_range, so if the range check fails, %ecx is never
cleared.

This can be reproduced with:
./tools/testing/kunit/kunit.py run --arch i386 usercopy

Instead, clear %ecx _before_ check_range in the 8-byte case. This
reintroduces a bit of the ugliness we were trying to avoid by adding
another #ifndef CONFIG_X86_64, but at least keeps check_range from needing
a separate bad_get_user_8 jump.
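
In C terms the ordering fix amounts to the following (an analogy of
the principle; the real change is in getuser.S assembly, and the
helper name here is made up):

	/* Sketch: initialise the result before validating the pointer,
	   so the early-exit failure path cannot leak stale upper bits. */
	int get_user_u64_sketch(const void __user *uaddr, u64 *out)
	{
		*out = 0;			/* clear BEFORE the check */
		if (!access_ok(uaddr, sizeof(u64)))
			return -EFAULT;		/* *out is already zeroed */
		return copy_from_user(out, uaddr, sizeof(u64)) ? -EFAULT : 0;
	}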

Fixes: 8a2462df15 ("x86/uaccess: Improve the 8-byte getuser() case")
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/all/20240731073031.4045579-1-davidgow@google.com
2024-08-01 21:19:10 +02:00
Stuart Menefy
3b6564427a riscv: Fix linear mapping checks for non-contiguous memory regions
The RISC-V kernel already has checks to ensure that memory which would
lie outside of the linear mapping is not used. However those checks
use memory_limit, which is used to implement the mem= kernel command
line option (to limit the total amount of memory, not its address
range). When memory is made up of two or more non-contiguous memory
banks this check is incorrect.

Two changes are made here:
 - add a call in setup_bootmem() to memblock_cap_memory_range() which
   will cause any memory which falls outside the linear mapping to be
   removed from the memory regions.
 - remove the check in create_linear_mapping_page_table() which was
   intended to remove memory which is outside the linear mapping based
   on memory_limit, as it is no longer needed. Note a check for
   mapping more memory than memory_limit (to implement mem=) is
   unnecessary because of the existing call to
   memblock_enforce_memory_limit().

This issue was seen when booting on a SV39 platform with two memory
banks:
  0x00,80000000 1GiB
  0x20,00000000 32GiB
This memory range is 158GiB from top to bottom, but the linear mapping
is limited to 128GiB, so the lower block of RAM will be mapped at
PAGE_OFFSET, and the upper block straddles the top of the linear
mapping.

This causes the following Oops:
[    0.000000] Linux version 6.10.0-rc2-gd3b8dd5b51dd-dirty (stuart.menefy@codasip.com) (riscv64-codasip-linux-gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.41.0.20231213) #20 SMP Sat Jun 22 11:34:22 BST 2024
[    0.000000] memblock_add: [0x0000000080000000-0x00000000bfffffff] early_init_dt_add_memory_arch+0x4a/0x52
[    0.000000] memblock_add: [0x0000002000000000-0x00000027ffffffff] early_init_dt_add_memory_arch+0x4a/0x52
...
[    0.000000] memblock_alloc_try_nid: 23724 bytes align=0x8 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 early_init_dt_alloc_memory_arch+0x1e/0x48
[    0.000000] memblock_reserve: [0x00000027ffff5350-0x00000027ffffaffb] memblock_alloc_range_nid+0xb8/0x132
[    0.000000] Unable to handle kernel paging request at virtual address fffffffe7fff5350
[    0.000000] Oops [#1]
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.10.0-rc2-gd3b8dd5b51dd-dirty #20
[    0.000000] Hardware name: codasip,a70x (DT)
[    0.000000] epc : __memset+0x8c/0x104
[    0.000000]  ra : memblock_alloc_try_nid+0x74/0x84
[    0.000000] epc : ffffffff805e88c8 ra : ffffffff806148f6 sp : ffffffff80e03d50
[    0.000000]  gp : ffffffff80ec4158 tp : ffffffff80e0bec0 t0 : fffffffe7fff52f8
[    0.000000]  t1 : 00000027ffffb000 t2 : 5f6b636f6c626d65 s0 : ffffffff80e03d90
[    0.000000]  s1 : 0000000000005cac a0 : fffffffe7fff5350 a1 : 0000000000000000
[    0.000000]  a2 : 0000000000005cac a3 : fffffffe7fffaff8 a4 : 000000000000002c
[    0.000000]  a5 : ffffffff805e88c8 a6 : 0000000000005cac a7 : 0000000000000030
[    0.000000]  s2 : fffffffe7fff5350 s3 : ffffffffffffffff s4 : 0000000000000000
[    0.000000]  s5 : ffffffff8062347e s6 : 0000000000000000 s7 : 0000000000000001
[    0.000000]  s8 : 0000000000002000 s9 : 00000000800226d0 s10: 0000000000000000
[    0.000000]  s11: 0000000000000000 t3 : ffffffff8080a928 t4 : ffffffff8080a928
[    0.000000]  t5 : ffffffff8080a928 t6 : ffffffff8080a940
[    0.000000] status: 0000000200000100 badaddr: fffffffe7fff5350 cause: 000000000000000f
[    0.000000] [<ffffffff805e88c8>] __memset+0x8c/0x104
[    0.000000] [<ffffffff8062349c>] early_init_dt_alloc_memory_arch+0x1e/0x48
[    0.000000] [<ffffffff8043e892>] __unflatten_device_tree+0x52/0x114
[    0.000000] [<ffffffff8062441e>] unflatten_device_tree+0x9e/0xb8
[    0.000000] [<ffffffff806046fe>] setup_arch+0xd4/0x5bc
[    0.000000] [<ffffffff806007aa>] start_kernel+0x76/0x81a
[    0.000000] Code: b823 02b2 bc23 02b2 b023 04b2 b423 04b2 b823 04b2 (bc23) 04b2
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

The problem is that memblock (unaware that some physical memory cannot
be used) has allocated memory from the top of memory but which is
outside the linear mapping region.
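
A sketch of the first change (the bound computation here is an
assumption for illustration, not the exact patch):

	/* Sketch: in setup_bootmem(), drop memory which the linear
	   mapping cannot cover, instead of checking memory_limit later
	   in create_linear_mapping_page_table(). */
	phys_addr_t max_mapped_addr = phys_ram_base + KERN_VIRT_SIZE;

	memblock_cap_memory_range(phys_ram_base,
				  max_mapped_addr - phys_ram_base);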

Signed-off-by: Stuart Menefy <stuart.menefy@codasip.com>
Fixes: c99127c452 ("riscv: Make sure the linear mapping does not use the kernel mapping")
Reviewed-by: David McKay <david.mckay@codasip.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240622114217.2158495-1-stuart.menefy@codasip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-01 11:46:09 -07:00
Ackerley Tng
aca0ec970d KVM: x86/mmu: fix determination of max NPT mapping level for private pages
The `if (req_max_level)` test was meant to ignore req_max_level if
PG_LEVEL_NONE was returned. Hence, this function should return
max_level instead of the ignored req_max_level.

This is only a latent issue for now, since guest_memfd does not
support large pages.
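
A sketch of the corrected control flow this describes:

	/* Sketch: PG_LEVEL_NONE (0) from the vendor hook means "no
	   constraint", so only use req_max_level to clamp max_level. */
	if (req_max_level)
		max_level = min(max_level, req_max_level);

	return max_level;	/* previously returned req_max_level */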

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Message-ID: <20240801173955.1975034-1-ackerleytng@google.com>
Fixes: f32fb32820 ("KVM: x86: Add hook for determining max NPT mapping level")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-01 14:13:11 -04:00
Mark Rutland
adeec61a47 arm64: errata: Expand speculative SSBS workaround (again)
A number of Arm Ltd CPUs suffer from errata whereby an MSR to the SSBS
special-purpose register does not affect subsequent speculative
instructions, permitting speculative store bypassing for a window of
time.

We worked around this for a number of CPUs in commits:

* 7187bb7d0b ("arm64: errata: Add workaround for Arm errata 3194386 and 3312417")
* 75b3c43eab ("arm64: errata: Expand speculative SSBS workaround")

Since then, similar errata have been published for a number of other Arm
Ltd CPUs, for which the same mitigation is sufficient. This is described
in their respective Software Developer Errata Notice (SDEN) documents:

* Cortex-A76 (MP052) SDEN v31.0, erratum 3324349
  https://developer.arm.com/documentation/SDEN-885749/3100/

* Cortex-A77 (MP074) SDEN v19.0, erratum 3324348
  https://developer.arm.com/documentation/SDEN-1152370/1900/

* Cortex-A78 (MP102) SDEN v21.0, erratum 3324344
  https://developer.arm.com/documentation/SDEN-1401784/2100/

* Cortex-A78C (MP138) SDEN v16.0, erratum 3324346
  https://developer.arm.com/documentation/SDEN-1707916/1600/

* Cortex-A78C (MP154) SDEN v10.0, erratum 3324347
  https://developer.arm.com/documentation/SDEN-2004089/1000/

* Cortex-A725 (MP190) SDEN v5.0, erratum 3456106
  https://developer.arm.com/documentation/SDEN-2832921/0500/

* Cortex-X1 (MP077) SDEN v21.0, erratum 3324344
  https://developer.arm.com/documentation/SDEN-1401782/2100/

* Cortex-X1C (MP136) SDEN v16.0, erratum 3324346
  https://developer.arm.com/documentation/SDEN-1707914/1600/

* Neoverse-N1 (MP050) SDEN v32.0, erratum 3324349
  https://developer.arm.com/documentation/SDEN-885747/3200/

* Neoverse-V1 (MP076) SDEN v19.0, erratum 3324341
  https://developer.arm.com/documentation/SDEN-1401781/1900/

Note that due to the manner in which Arm develops IP and tracks errata,
some CPUs share a common erratum number and some CPUs have multiple
erratum numbers for the same HW issue.

On parts without SB, it is necessary to use ISB for the workaround. The
spec_bar() macro used in the mitigation will expand to a "DSB SY; ISB"
sequence in this case, which is sufficient on all affected parts.

Enable the existing mitigation by adding the relevant MIDRs to
erratum_spec_ssbs_list. The list is sorted alphanumerically (involving
moving Neoverse-V3 after Neoverse-V2) so that this is easy to audit and
potentially extend again in future. The Kconfig text is also updated to
clarify the set of affected parts and the mitigation.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240801101803.1982459-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-08-01 16:11:28 +01:00
Mark Rutland
9ef54a3845 arm64: cputype: Add Cortex-A725 definitions
Add cputype definitions for Cortex-A725. These will be used for errata
detection in subsequent patches.

These values can be found in the Cortex-A725 TRM:

  https://developer.arm.com/documentation/107652/0001/

... in table A-247 ("MIDR_EL1 bit descriptions").

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20240801101803.1982459-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-08-01 16:11:27 +01:00
Mark Rutland
58d245e03c arm64: cputype: Add Cortex-X1C definitions
Add cputype definitions for Cortex-X1C. These will be used for errata
detection in subsequent patches.

These values can be found in the Cortex-X1C TRM:

  https://developer.arm.com/documentation/101968/0002/

... in section B2.107 ("MIDR_EL1, Main ID Register, EL1").

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20240801101803.1982459-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-08-01 16:11:26 +01:00
Nick Hu
3908ba2e0b RISC-V: Enable the IPI before workqueue_online_cpu()
Sometimes the hotplug cpu stalls in arch_cpu_idle() for a while after
workqueue_online_cpu(). When the cpu stalls in the idle loop, the
reschedule IPI is pending, but the IPI enable bit is not set yet, so
the cpu stalls at WFI until the watchdog times out. Therefore enable
the IPI before workqueue_online_cpu() to fix the issue.

Fixes: 63c5484e74 ("workqueue: Add multiple affinity scopes and interface to select them")
Signed-off-by: Nick Hu <nick.hu@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20240717031714.1946036-1-nick.hu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-01 07:15:43 -07:00
Zhe Qiao
0c710050c4 riscv/mm: Add handling for VM_FAULT_SIGSEGV in mm_fault_error()
Handle VM_FAULT_SIGSEGV in the page fault path so that we correctly
kill the process and we don't BUG() the kernel.
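
A sketch of the added handling, written against the riscv fault-path
helpers (details may differ from the actual patch):

	/* Sketch: turn VM_FAULT_SIGSEGV into a signal for user mode and
	   into the regular kernel-fault path otherwise. */
	if (fault & VM_FAULT_SIGSEGV) {
		if (!user_mode(regs)) {
			no_context(regs, addr);
			return;
		}
		do_trap(regs, SIGSEGV, SEGV_MAPERR, addr);
		return;
	}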

Fixes: 07037db5d4 ("RISC-V: Paging and MMU")
Signed-off-by: Zhe Qiao <qiaozhe@iscas.ac.cn>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240731084547.85380-1-qiaozhe@iscas.ac.cn
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-01 07:15:27 -07:00
Daniel Maslowski
fb197c5d2f riscv/purgatory: align riscv_kernel_entry
When alignment handling is delegated to the kernel, everything must be
word-aligned in purgatory, since the trap handler is then set to the
kexec one. Without the alignment, hitting the exception would
ultimately crash. On other occasions, the kernel's handler would take
care of exceptions.
This has been tested on a JH7110 SoC with oreboot and its SBI delegating
unaligned access exceptions and the kernel configured to handle them.

Fixes: 736e30af58 ("RISC-V: Add purgatory")
Signed-off-by: Daniel Maslowski <cyrevolt@gmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240719170437.247457-1-cyrevolt@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-01 07:14:34 -07:00
Arnd Bergmann
b07ce24df7 alpha: fix ioread64be()/iowrite64be() helpers
Compile-testing the crypto/caam driver on alpha showed a pre-existing
problem on alpha with iowrite64be() missing:

ERROR: modpost: "iowrite64be" [drivers/crypto/caam/caam_jr.ko] undefined!

The prototypes were added a while ago when we started using asm-generic/io.h,
but the implementation was still missing. At some point the ioread64/iowrite64
helpers were added, but the big-endian versions are still missing, and
the generic version (using readq/writeq) would not work here.

Change it to wrap ioread64()/iowrite64() instead.
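
The change boils down to something like this sketch (byte-swapping
around the existing little-endian helpers):

	/* Sketch: build the big-endian accessors on top of
	   ioread64()/iowrite64() instead of readq/writeq. */
	u64 ioread64be(const void __iomem *addr)
	{
		return swab64(ioread64(addr));
	}

	void iowrite64be(u64 val, void __iomem *addr)
	{
		iowrite64(swab64(val), addr);
	}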

Fixes: beba3771d9 ("crypto: caam: Make CRYPTO_DEV_FSL_CAAM dependent of COMPILE_TEST")
Fixes: e19d4ebc53 ("alpha: add full ioread64/iowrite64 implementation")
Fixes: 7e772dad99 ("alpha: Use generic <asm-generic/io.h>")
Closes: https://lore.kernel.org/all/CAHk-=wgEyzSxTs467NDOVfBSzWvUS6ztcwhiy=M3xog==KBmTw@mail.gmail.com/
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-08-01 15:48:03 +02:00
Peter Zijlstra
3db03fb499 x86/mm: Fix pti_clone_entry_text() for i386
While x86_64 has PMD aligned text sections, i386 does not have this
luxury. Notably ALIGN_ENTRY_TEXT_END is empty and _etext has PAGE
alignment.

This means that text on i386 can be page granular at the tail end,
which in turn means that the PTI text clones should consistently
account for this.

Make pti_clone_entry_text() consistent with pti_clone_kernel_text().
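
Per that description, the fix is essentially a one-liner; a sketch:

	/* Sketch: clone at PTE granularity where the tail of the image
	   may be PTE-mapped (32-bit); resolves to PMD on 64-bit. */
	static void pti_clone_entry_text(void)
	{
		pti_clone_pgtable((unsigned long) __entry_text_start,
				  (unsigned long) __entry_text_end,
				  PTI_LEVEL_KERNEL_IMAGE); /* was PTI_CLONE_PMD */
	}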

Fixes: 16a3fe634f ("x86/mm/pti: Clone kernel-image on PTE level for 32 bit")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2024-08-01 14:52:56 +02:00
Peter Zijlstra
41e71dbb0e x86/mm: Fix pti_clone_pgtable() alignment assumption
Guenter reported dodgy crashes on an i386-nosmp build using GCC-11
that had the form of endless traps until entry stack exhaustion and then
#DF from the stack guard.

It turned out that pti_clone_pgtable() had alignment assumptions on
the start address, notably it hard assumes start is PMD aligned. This
is true on x86_64, but very much not true on i386.

These assumptions can cause the end condition to malfunction, leading
to a 'short' clone. Guess what happens when the user mapping has a
short copy of the entry text?

Use the correct increment form for addr to avoid alignment
assumptions.
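
A sketch of the increment fix in the PMD-level loop:

	/* Before: assumes addr is PMD-aligned, so a misaligned start can
	   overshoot 'end' and leave a tail of entry text unmapped. */
	addr += PMD_SIZE;

	/* After: advance to the next PMD boundary even from a
	   misaligned start address. */
	addr = round_up(addr + 1, PMD_SIZE);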

Fixes: 16a3fe634f ("x86/mm/pti: Clone kernel-image on PTE level for 32 bit")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240731163105.GG33588@noisy.programming.kicks-ass.net
2024-08-01 14:52:56 +02:00
Borislav Petkov (AMD)
bf514327c3 x86/setup: Parse the builtin command line before merging
Commit in Fixes was added as a catch-all for cases where the cmdline is
parsed before being merged with the builtin one.

And promptly one issue appeared, see Link below. The microcode loader
really needs to parse it that early, but the merging happens later.

Reshuffling the early boot nightmare^W code to handle that properly would
be a painful exercise for another day so do the chicken thing and parse the
builtin cmdline too before it has been merged.

Fixes: 0c40b1c7a8 ("x86/setup: Warn when option parsing is done too early")
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240730152108.GAZqkE5Dfi9AuKllRw@fat_crate.local
Link: https://lore.kernel.org/r/20240722152330.GCZp55ck8E_FT4kPnC@fat_crate.local
2024-07-31 21:46:35 +02:00
Samuel Holland
b75a22e7d4 riscv: cpufeature: Do not drop Linux-internal extensions
The Linux-internal Xlinuxenvcfg ISA extension is omitted from the
riscv_isa_ext array because it has no DT binding and should not appear
in /proc/cpuinfo. The logic added in commit 625034abd5 ("riscv: add
ISA extensions validation callback") assumes all extensions are included
in riscv_isa_ext, and so riscv_resolve_isa() wrongly drops Xlinuxenvcfg
from the final ISA string. Instead, accept such Linux-internal ISA
extensions as if they have no validation callback.

Fixes: 625034abd5 ("riscv: add ISA extensions validation callback")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240718213011.2600150-1-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-31 09:53:13 -07:00
Heiko Carstens
33bd8d153c s390: Keep inittext section writable
There is no added security gained by making the inittext section non-writable;
however it does split part of the kernel mapping into 4K mappings
instead of 1M mappings:

---[ Kernel Image Start ]---
0x000003ffe0000000-0x000003ffe0e00000        14M PMD RO X
0x000003ffe0e00000-0x000003ffe0ec7000       796K PTE RO X
0x000003ffe0ec7000-0x000003ffe0f00000       228K PTE RO NX
0x000003ffe0f00000-0x000003ffe1300000         4M PMD RO NX
0x000003ffe1300000-0x000003ffe1353000       332K PTE RO NX
0x000003ffe1353000-0x000003ffe1400000       692K PTE RW NX
0x000003ffe1400000-0x000003ffe1500000         1M PMD RW NX
0x000003ffe1500000-0x000003ffe1700000         2M PTE RW NX <---
0x000003ffe1700000-0x000003ffe1800000         1M PMD RW NX
0x000003ffe1800000-0x000003ffe187e000       504K PTE RW NX
---[ Kernel Image End ]---

Keep the inittext writable and enable instruction execution protection
(aka noexec) later to prevent this. This also allows using the
generic free_initmem() implementation.

---[ Kernel Image Start ]---
0x000003ffe0000000-0x000003ffe0e00000        14M PMD RO X
0x000003ffe0e00000-0x000003ffe0ec7000       796K PTE RO X
0x000003ffe0ec7000-0x000003ffe0f00000       228K PTE RO NX
0x000003ffe0f00000-0x000003ffe1300000         4M PMD RO NX
0x000003ffe1300000-0x000003ffe1353000       332K PTE RO NX
0x000003ffe1353000-0x000003ffe1400000       692K PTE RW NX
0x000003ffe1400000-0x000003ffe1800000         4M PMD RW NX <---
0x000003ffe1800000-0x000003ffe187e000       504K PTE RW NX
---[ Kernel Image End ]---

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-31 16:30:20 +02:00
Heiko Carstens
75c10d5377 s390/vmlinux.lds.S: Move ro_after_init section behind rodata section
The .data.rel.ro and .got section were added between the rodata and
ro_after_init data section, which adds an RW mapping in between all RO
mapping of the kernel image:

---[ Kernel Image Start ]---
0x000003ffe0000000-0x000003ffe0e00000        14M PMD RO X
0x000003ffe0e00000-0x000003ffe0ec7000       796K PTE RO X
0x000003ffe0ec7000-0x000003ffe0f00000       228K PTE RO NX
0x000003ffe0f00000-0x000003ffe1300000         4M PMD RO NX
0x000003ffe1300000-0x000003ffe1331000       196K PTE RO NX
0x000003ffe1331000-0x000003ffe13b3000       520K PTE RW NX <---
0x000003ffe13b3000-0x000003ffe13d5000       136K PTE RO NX
0x000003ffe13d5000-0x000003ffe1400000       172K PTE RW NX
0x000003ffe1400000-0x000003ffe1500000         1M PMD RW NX
0x000003ffe1500000-0x000003ffe1700000         2M PTE RW NX
0x000003ffe1700000-0x000003ffe1800000         1M PMD RW NX
0x000003ffe1800000-0x000003ffe187e000       504K PTE RW NX
---[ Kernel Image End ]---

Move the ro_after_init data section again right behind the rodata
section to prevent interleaving RO and RW mappings:

---[ Kernel Image Start ]---
0x000003ffe0000000-0x000003ffe0e00000        14M PMD RO X
0x000003ffe0e00000-0x000003ffe0ec7000       796K PTE RO X
0x000003ffe0ec7000-0x000003ffe0f00000       228K PTE RO NX
0x000003ffe0f00000-0x000003ffe1300000         4M PMD RO NX
0x000003ffe1300000-0x000003ffe1353000       332K PTE RO NX
0x000003ffe1353000-0x000003ffe1400000       692K PTE RW NX
0x000003ffe1400000-0x000003ffe1500000         1M PMD RW NX
0x000003ffe1500000-0x000003ffe1700000         2M PTE RW NX
0x000003ffe1700000-0x000003ffe1800000         1M PMD RW NX
0x000003ffe1800000-0x000003ffe187e000       504K PTE RW NX
---[ Kernel Image End ]---

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-31 16:30:20 +02:00
Heiko Carstens
1e72ba5566 s390/mm: Get rid of RELOC_HIDE()
Since __va(0) does not translate to NULL anymore, remove RELOC_HIDE(),
which was only added to get rid of a compile warning with clang W=1:

    arch/s390/mm/vmem.c:666:36: warning: performing pointer arithmetic on
     a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
      666 |                 __set_memory_4k(__va(0), __va(0) + ident_map_size);
          |                                          ~~~~~~~ ^

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-31 16:30:20 +02:00
Heiko Carstens
7e12284c52 s390/mm/ptdump: Improve sorting of markers
Use the sort() from lib/sort.c to sort markers instead of the private
implementation. The current implementation does not sort markers
properly if they have to be moved downwards:

---[ Real Memory Copy Area Start ]---
0x0000035b903ff000-0x0000035b90400000         4K PTE I
---[ vmalloc Area Start ]---
---[ Real Memory Copy Area End ]---

Add a new member to each marker which indicates whether the marker is
the start of an area. If the addresses of two areas are equal, consider
an address which defines the start of an area higher than the address
which defines the end of an area. As a result the output is sorted as
intended:

---[ Real Memory Copy Area Start ]---
0x0000019cedcff000-0x0000019cedd00000         4K PTE I
---[ Real Memory Copy Area End ]---
---[ vmalloc Area Start ]---
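
A sketch of the comparator this describes (the member names are
illustrative):

	/* Sketch: sort by address; on a tie, an area start sorts after
	   an area end, so adjacent areas are ordered correctly. */
	static int markers_cmp(const void *a, const void *b)
	{
		const struct addr_marker *x = a, *y = b;

		if (x->start_address != y->start_address)
			return x->start_address < y->start_address ? -1 : 1;
		return x->is_start - y->is_start;
	}

	/* markers are then sorted with sort() from lib/sort.c: */
	sort(markers, n_markers, sizeof(*markers), markers_cmp, NULL);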

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-31 16:30:20 +02:00
Heiko Carstens
7e4d4cfed6 s390/mm/ptdump: Add support for relocated lowcore mapping
The page table dumper contains a hard coded assumption that the first
mapped area starts at address zero. With a relocated lowcore this is
not true anymore. Subsequently the first entry (lowcore) is printed as
if it contained everything from address zero up to the end of the
lowcore area.

Fix this by adding a single "Kernel Virtual Address Space" entry,
which always starts at address zero. It ends when the lowcore area
starts, which is either address zero or its relocated address.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-31 16:30:20 +02:00
Heiko Carstens
373953444c s390/mm/ptdump: Fix handling of identity mapping area
Since virtual and real addresses are no longer the same, the
assumption that the kernel image is contained within the identity
mapping is also no longer true.

Fix this by adding two explicit areas at the correct locations: one
for the 8kb lowcore area, and one for the identity mapping.

Fixes: c98d2ecae0 ("s390/mm: Uncouple physical vs virtual address spaces")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-31 16:30:20 +02:00
Heiko Carstens
0a34c027a3 s390/alternatives: Remove unused empty header file
Remove the unused and empty arch/s390/kernel/alternative.h header
file which was added by mistake.

Fixes: 5ade5be4ed ("s390: Add infrastructure to patch lowcore accesses")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-31 16:30:20 +02:00
Heiko Carstens
4734406c39 s390/fpu: Re-add exception handling in load_fpu_state()
With the recent rewrite of the fpu code, the exception handling for
the lfpc instruction within load_fpu_state() was erroneously removed.

Add it again to prevent loading invalid floating point register
values from causing an unhandled specification exception.

Fixes: 8c09871a95 ("s390/fpu: limit save and restore to used registers")
Cc: stable@vger.kernel.org
Reported-by: Aristeu Rozanski <aris@redhat.com>
Tested-by: Aristeu Rozanski <aris@redhat.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2024-07-31 16:30:20 +02:00
Li Huafei
f73cefa3b7 perf/x86: Fix smp_processor_id()-in-preemptible warnings
The following bug was triggered on a system built with
CONFIG_DEBUG_PREEMPT=y:

 # echo p > /proc/sysrq-trigger

 BUG: using smp_processor_id() in preemptible [00000000] code: sh/117
 caller is perf_event_print_debug+0x1a/0x4c0
 CPU: 3 UID: 0 PID: 117 Comm: sh Not tainted 6.11.0-rc1 #109
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x4f/0x60
  check_preemption_disabled+0xc8/0xd0
  perf_event_print_debug+0x1a/0x4c0
  __handle_sysrq+0x140/0x180
  write_sysrq_trigger+0x61/0x70
  proc_reg_write+0x4e/0x70
  vfs_write+0xd0/0x430
  ? handle_mm_fault+0xc8/0x240
  ksys_write+0x9c/0xd0
  do_syscall_64+0x96/0x190
  entry_SYSCALL_64_after_hwframe+0x4b/0x53

This is because the commit d4b294bf84 ("perf/x86: Hybrid PMU support
for counters") took smp_processor_id() outside the irq critical section.
If a preemption occurs in perf_event_print_debug() and the task is
migrated to another cpu, we may get incorrect pmu debug information.
Move smp_processor_id() back inside the irq critical section to fix this
issue.
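
The shape of the fix, as a sketch:

	/* Sketch: read the CPU id only after interrupts are off, so the
	   task cannot migrate between reading the id and using it. */
	void perf_event_print_debug(void)
	{
		unsigned long flags;
		int cpu;

		local_irq_save(flags);
		cpu = smp_processor_id();	/* stable inside the critical section */
		/* ... dump the PMU state of 'cpu' ... */
		local_irq_restore(flags);
	}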

Fixes: d4b294bf84 ("perf/x86: Hybrid PMU support for counters")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20240729220928.325449-1-lihuafei1@huawei.com
2024-07-31 12:57:39 +02:00
Perry Yuan
bf5641eccf x86/CPU/AMD: Add models 0x60-0x6f to the Zen5 range
Add some new Zen5 models for the 0x1A family.

  [ bp: Merge the 0x60 and 0x70 ranges. ]
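
With the ranges merged, the family-0x1A model switch ends up looking
roughly like this (a sketch; the pre-existing cases are assumptions):

	/* Sketch: Zen5 model ranges in family 0x1A. */
	switch (c->x86_model) {
	case 0x00 ... 0x2f:
	case 0x40 ... 0x4f:
	case 0x60 ... 0x7f:	/* 0x60-0x6f newly added, merged with 0x70 */
		setup_force_cpu_cap(X86_FEATURE_ZEN5);
		break;
	}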

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240729064626.24297-1-bp@kernel.org
2024-07-30 15:22:52 +02:00
Pavan Kumar Paluri
c14e411458 x86/sev: Fix __reserved field in sev_config
sev_config currently has debug, ghcbs_initialized, and use_cas fields.
However, the __reserved count has not been updated. Fix this.
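
The invariant is simply that the flag bits plus __reserved must sum to
64; a sketch of the corrected layout:

	/* Sketch: 3 flag bits + 61 reserved bits == 64 bits total. */
	struct sev_config {
		__u64 debug		: 1,
		      ghcbs_initialized	: 1,
		      use_cas		: 1,
		      __reserved	: 61;
	};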

Fixes: 34ff659017 ("x86/sev: Use kernel provided SVSM Calling Areas")
Signed-off-by: Pavan Kumar Paluri <papaluri@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240729180808.366587-1-papaluri@amd.com
2024-07-30 10:07:26 +02:00
Linus Torvalds
3894840a7a Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux
Pull ARM updates from Russell King:

 - ftrace: don't assume stack frames are contiguous in memory

 - remove unused mod_unwind_map structure

 - spelling fixes

 - allow use of LD dead code/data elimination

 - fix callchain_trace() return value

 - add support for stackleak gcc plugin

 - correct some reset asm function prototypes for CFI

[ Missed the merge window because Russell forgot to push out ]

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux:
  ARM: 9408/1: mm: CFI: Fix some erroneous reset prototypes
  ARM: 9407/1: Add support for STACKLEAK gcc plugin
  ARM: 9406/1: Fix callchain_trace() return value
  ARM: 9404/1: arm32: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION
  ARM: 9403/1: Alpine: Spelling s/initialiing/initializing/
  ARM: 9402/1: Kconfig: Spelling s/Cortex A-/Cortex-A/
  ARM: 9400/1: Remove unused struct 'mod_unwind_map'
2024-07-29 10:33:51 -07:00
Mikulas Patocka
7ae04ba36b parisc: fix a possible DMA corruption
ARCH_DMA_MINALIGN was defined as 16 - this is too small - it may be
possible that two unrelated 16-byte allocations share a cache line. If
one of these allocations is written using DMA and the other is written
using cached write, the value that was written with DMA may be
corrupted.

This commit changes ARCH_DMA_MINALIGN to be 128 on PA20 and 32 on PA1.1 -
that's the largest possible cache line size.

As different parisc microarchitectures have different cache line size, we
define arch_slab_minalign(), cache_line_size() and
dma_get_cache_alignment() so that the kernel may tune slab cache
parameters dynamically, based on the detected cache line size.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
2024-07-29 16:19:07 +02:00
Mikulas Patocka
1fd2c10acb parisc: fix unaligned accesses in BPF
There were spurious unaligned access warnings when calling BPF code.
Sometimes, the warnings were triggered with any incoming packet, making
the machine hard to use.

The reason for the warnings is this: on parisc64, pointers to functions
are not really pointers to functions, they are pointers to 16-byte
descriptor. The first 8 bytes of the descriptor is a pointer to the
function and the next 8 bytes of the descriptor is the content of the
"dp" register. This descriptor is generated in the function
bpf_jit_build_prologue.

The problem is that the function bpf_int_jit_compile advertises 4-byte
alignment when calling bpf_jit_binary_alloc. bpf_jit_binary_alloc
randomizes the returned array, and if the array happens not to be
aligned on an 8-byte boundary, the descriptor generated in
bpf_jit_build_prologue is also not aligned, which triggers the
unaligned access warning.

Fix this by advertising 8-byte alignment on parisc64 when calling
bpf_jit_binary_alloc.
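
A sketch of the change this describes (the surrounding arguments are
abridged and illustrative):

	/* Sketch: request 8-byte alignment for the JIT image on parisc64
	   so the 16-byte function descriptor built in the prologue is
	   properly aligned. */
	header = bpf_jit_binary_alloc(image_size, &image_ptr,
				      IS_ENABLED(CONFIG_64BIT) ? 8 : 4,
				      jit_fill_hole);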

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
2024-07-29 16:19:07 +02:00
Jonathan Cameron
0f7ced7d62 x86/aperfmperf: Fix deadlock on cpu_hotplug_lock
The broken patch results in a call to init_freq_invariance_cppc() in a CPU
hotplug handler in both the path for initially present CPUs and those
hotplugged later.  That function includes a one time call to
amd_set_max_freq_ratio() which in turn calls freq_invariance_enable() that has
a static_branch_enable() which takes the cpu_hotplug_lock, which is already
held.

Avoid the deadlock by using static_branch_enable_cpuslocked() as the lock will
always be already held.  The equivalent path on Intel does not already hold
this lock, so take it around the call to freq_invariance_enable(), which
results in it being held over the call to register_syscall_ops, which looks to
be safe to do.
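
A sketch of the two call sites after the fix (simplified from the
description above):

	/* AMD path: reached from a hotplug handler, where
	   cpu_hotplug_lock is already held for write. */
	static_branch_enable_cpuslocked(&arch_scale_freq_key);

	/* Intel path: reached outside of hotplug, so take the lock
	   for read around the enable. */
	cpus_read_lock();
	freq_invariance_enable();
	cpus_read_unlock();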

Fixes: c1385c1f0b ("ACPI: processor: Simplify initial onlining to use same path for cold and hotplug")
Closes: https://lore.kernel.org/all/CABXGCsPvqBfL5hQDOARwfqasLRJ_eNPBbCngZ257HOe=xbWDkA@mail.gmail.com/
Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240729105504.2170-1-Jonathan.Cameron@huawei.com
2024-07-29 15:32:37 +02:00
Zhenyu Wang
b1d0e15c87 perf/x86/intel/cstate: Add pkg C2 residency counter for Sierra Forest
The package C2 residency counter is also available on Sierra Forest,
so add support for it in srf_cstates.
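
A sketch of the resulting model table entry (the fields other than the
C2 addition are assumptions):

	/* Sketch: Sierra Forest c-state events with package C2 added. */
	static const struct cstate_model srf_cstates __initconst = {
		.core_events	= BIT(PERF_CSTATE_CORE_C1_RES) |
				  BIT(PERF_CSTATE_CORE_C6_RES),
		.pkg_events	= BIT(PERF_CSTATE_PKG_C2_RES) |	/* new */
				  BIT(PERF_CSTATE_PKG_C6_RES),
		.module_events	= BIT(PERF_CSTATE_MODULE_C6_RES),
	};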

Fixes: 3877d55a0d ("perf/x86/intel/cstate: Add Sierra Forest support")
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Wendy Wang <wendy.wang@intel.com>
Link: https://lore.kernel.org/r/20240717031609.74513-1-zhenyuw@linux.intel.com
2024-07-29 12:16:22 +02:00
Linus Torvalds
1a251f52cf minmax: make generic MIN() and MAX() macros available everywhere
This just standardizes the use of MIN() and MAX() macros, with the very
traditional semantics.  The goal is to use these for C constant
expressions and for top-level / static initializers, and so be able to
simplify the min()/max() macros.

These macro names were used by various kernel code - they are very
traditional, after all - and all such users have been fixed up, with a
few different approaches:

 - trivial duplicated macro definitions have been removed

   Note that 'trivial' here means that it's obviously kernel code that
   already included all the major kernel headers, and thus gets the new
   generic MIN/MAX macros automatically.

 - non-trivial duplicated macro definitions are guarded with #ifndef

   This is the "yes, they define their own versions, but no, the include
   situation is not entirely obvious, and maybe they don't get the
   generic version automatically" case.

 - strange use case #1

   A couple of drivers decided that the way they want to describe their
   versioning is with

	#define MAJ 1
	#define MIN 2
	#define DRV_VERSION __stringify(MAJ) "." __stringify(MIN)

   which adds zero value and I just did my Alexander the Great
   impersonation, and rewrote that pointless Gordian knot as

	#define DRV_VERSION "1.2"

   instead.

 - strange use case #2

   A couple of drivers thought that it's a good idea to have a random
   'MIN' or 'MAX' define for a value or index into a table, rather than
   the traditional macro that takes arguments.

   These values were re-written as C enum's instead. The new
   function-line macros only expand when followed by an open
   parenthesis, and thus don't clash with enum use.

Happily, there weren't really all that many of these cases, and a lot of
users already had the pattern of using '#ifndef' guarding (or in one
case just using '#undef MIN') before defining their own private version
that does the same thing. I left such cases alone.
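
The traditional semantics in question are just the classic ternary
form; a sketch (the tree's actual definition is more careful about
expansion, but behaves the same for constant expressions):

	/* Sketch: classic MIN()/MAX(), usable in C constant expressions
	   and static initializers; arguments may be evaluated twice. */
	#define MIN(a, b) ((a) < (b) ? (a) : (b))
	#define MAX(a, b) ((a) > (b) ? (a) : (b))

	static char ring_buf[MAX(64, 128)];	/* top-level use is fine */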

Cc: David Laight <David.Laight@aculab.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-28 15:49:18 -07:00
Linus Torvalds
4477b39c32 minmax: add a few more MIN_T/MAX_T users
Commit 3a7e02c040 ("minmax: avoid overly complicated constant
expressions in VM code") added the simpler MIN_T/MAX_T macros in order
to avoid some excessive expansion from the rather complicated regular
min/max macros.

The complexity of those macros stems from two issues:

 (a) trying to use them in situations that require a C constant
     expression (in static initializers and for array sizes)

 (b) the type sanity checking

and MIN_T/MAX_T avoids both of these issues.

Now, in the whole (long) discussion about all this, it was pointed out
that the whole type sanity checking is entirely unnecessary for
min_t/max_t which get a fixed type that the comparison is done in.

But that still leaves min_t/max_t unnecessarily complicated due to
worries about the C constant expression case.

However, it turns out that there really aren't very many cases that use
min_t/max_t for this, and we can just force-convert those.

This does exactly that.

Which in turn will then allow for much simpler implementations of
min_t()/max_t().  All the usual "macros in all upper case will evaluate
the arguments multiple times" rules apply.

We should do all the same things for the regular min/max() vs MIN/MAX()
cases, but that has the added complexity of various drivers defining
their own local versions of MIN/MAX, so that needs another level of
fixes first.
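
For reference, the fixed-type forms amount to casting both operands
before comparing; a sketch:

	/* Sketch: force both operands to one type, no sanity checking. */
	#define MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))
	#define MAX_T(type, a, b) ((type)(a) > (type)(b) ? (type)(a) : (type)(b))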

Link: https://lore.kernel.org/all/b47fad1d0cf8449886ad148f8c013dae@AcuMS.aculab.com/
Cc: David Laight <David.Laight@aculab.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-28 13:41:14 -07:00
Linus Torvalds
bf80f1391a Merge tag 'devicetree-fixes-for-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull more devicetree updates from Rob Herring:
 "Most of this is a treewide change to of_property_for_each_u32() which
  was small enough to do in one go before rc1 and avoids the need to
  create of_property_for_each_u32_some_new_name().

   - Treewide conversion of of_property_for_each_u32() to drop internal
     arguments making struct property opaque

   - Add binding for Amlogic A4 SoC watchdog

   - Fix constraints for AD7192 'single-channel' property"

* tag 'devicetree-fixes-for-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  dt-bindings: iio: adc: ad7192: Fix 'single-channel' constraints
  of: remove internal arguments from of_property_for_each_u32()
  dt-bindings: watchdog: add support for Amlogic A4 SoCs
2024-07-27 12:46:16 -07:00