Commit Graph

5152 Commits

Author SHA1 Message Date
Linus Torvalds
3ca112b71f Merge tag 'probes-fixes-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:

 - Documentation update: Add a note about argument and return value
   fetching is the best effort because it depends on the type.

 - objpool: Fix to make internal global variables static in
   test_objpool.c.

 - kprobes: Unify kprobes_exceptions_nofify() prototypes. There are the
   same prototypes in asm/kprobes.h for some architectures, but some of
   them are missing the prototype and it causes a warning. So move the
   prototype into linux/kprobes.h.

 - tracing: Fix to check the tracepoint event and return event at
   parsing stage. The tracepoint event doesn't support %return but if
   $retval exists, it will be converted to %return silently. This finds
   that case and rejects it.

 - tracing: Fix the order of the descriptions about the parameters of
   __kprobe_event_gen_cmd_start() to be consistent with the argument
   list of the function.

* tag 'probes-fixes-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/kprobes: Fix the order of argument descriptions
  tracing: fprobe-event: Fix to check tracepoint event and return
  kprobes: unify kprobes_exceptions_nofify() prototypes
  lib: test_objpool: make global variables static
  Documentation: tracing: Add a note about argument and retval access
2023-11-10 16:35:04 -08:00
Linus Torvalds
ac347a0655 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
 "Mostly PMU fixes and a reworking of the pseudo-NMI disabling on broken
  MediaTek firmware:

   - Move the MediaTek GIC quirk handling from irqchip to core. Before
     the merging window commit 44bd78dd2b ("irqchip/gic-v3: Disable
     pseudo NMIs on MediaTek devices w/ firmware issues") temporarily
     addressed this issue. Fixed now at a deeper level in the arch code

   - Reject events meant for other PMUs in the CoreSight PMU driver,
     otherwise some of the core PMU events would disappear

   - Fix the Armv8 PMUv3 driver driver to not truncate 64-bit registers,
     causing some events to be invisible

   - Remove duplicate declaration of __arm64_sys##name following the
     patch to avoid prototype warning for syscalls

   - Typos in the elf_hwcap documentation"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/syscall: Remove duplicate declaration
  Revert "arm64: smp: avoid NMI IPIs with broken MediaTek FW"
  arm64: Move MediaTek GIC quirk handling from irqchip to core
  arm64/arm: arm_pmuv3: perf: Don't truncate 64-bit registers
  perf: arm_cspmu: Reject events meant for other PMUs
  Documentation/arm64: Fix typos in elf_hwcaps
2023-11-10 12:22:14 -08:00
Arnd Bergmann
abc28463c8 kprobes: unify kprobes_exceptions_nofify() prototypes
Most architectures that support kprobes declare this function in their
own asm/kprobes.h header and provide an override, but some are missing
the prototype, which causes a warning for the __weak stub implementation:

kernel/kprobes.c:1865:12: error: no previous prototype for 'kprobe_exceptions_notify' [-Werror=missing-prototypes]
 1865 | int __weak kprobe_exceptions_notify(struct notifier_block *self,

Move the prototype into linux/kprobes.h so it is visible to all
the definitions.

Link: https://lore.kernel.org/all/20231108125843.3806765-4-arnd@kernel.org/

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-11-10 19:59:05 +09:00
Kevin Brodsky
f86128050d arm64/syscall: Remove duplicate declaration
Commit 6ac19f9651 ("arm64: avoid prototype warnings for syscalls")
added missing declarations to various syscall wrapper macros. It
however proved a little too zealous in __SYSCALL_DEFINEx(), as a
declaration for __arm64_sys##name was already present. A declaration
is required before the call to ALLOW_ERROR_INJECTION(), so keep
the original one and remove the new one.

Fixes: 6ac19f9651 ("arm64: avoid prototype warnings for syscalls")
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20231109141153.250046-1-kevin.brodsky@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-11-09 17:19:14 +00:00
Ilkka Koskinen
403edfa436 arm64/arm: arm_pmuv3: perf: Don't truncate 64-bit registers
The driver used to truncate several 64-bit registers such as PMCEID[n]
registers used to describe whether architectural and microarchitectural
events in range 0x4000-0x401f exist. Due to discarding the bits, the
driver made the events invisible, even if they existed.

Moreover, PMCCFILTR and PMCR registers have additional bits in the upper
32 bits. This patch makes them available although they aren't currently
used. Finally, functions handling PMXEVCNTR and PMXEVTYPER registers are
removed as they not being used at all.

Fixes: df29ddf4f0 ("arm64: perf: Abstract system register accesses away")
Reported-by: Carl Worth <carl@os.amperecomputing.com>
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Acked-by: Will Deacon <will@kernel.org>
Closes: https://lore.kernel.org/..
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20231102183012.1251410-1-ilkka@os.amperecomputing.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-11-07 11:00:57 +00:00
Linus Torvalds
8f6f76a6a2 Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
 "As usual, lots of singleton and doubleton patches all over the tree
  and there's little I can say which isn't in the individual changelogs.

  The lengthier patch series are

   - 'kdump: use generic functions to simplify crashkernel reservation
     in arch', from Baoquan He. This is mainly cleanups and
     consolidation of the 'crashkernel=' kernel parameter handling

   - After much discussion, David Laight's 'minmax: Relax type checks in
     min() and max()' is here. Hopefully reduces some typecasting and
     the use of min_t() and max_t()

   - A group of patches from Oleg Nesterov which clean up and slightly
     fix our handling of reads from /proc/PID/task/... and which remove
     task_struct.thread_group"

* tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits)
  scripts/gdb/vmalloc: disable on no-MMU
  scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n
  .mailmap: add address mapping for Tomeu Vizoso
  mailmap: update email address for Claudiu Beznea
  tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions
  .mailmap: map Benjamin Poirier's address
  scripts/gdb: add lx_current support for riscv
  ocfs2: fix a spelling typo in comment
  proc: test ProtectionKey in proc-empty-vm test
  proc: fix proc-empty-vm test with vsyscall
  fs/proc/base.c: remove unneeded semicolon
  do_io_accounting: use sig->stats_lock
  do_io_accounting: use __for_each_thread()
  ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error()
  ocfs2: fix a typo in a comment
  scripts/show_delta: add __main__ judgement before main code
  treewide: mark stuff as __ro_after_init
  fs: ocfs2: check status values
  proc: test /proc/${pid}/statm
  compiler.h: move __is_constexpr() to compiler.h
  ...
2023-11-02 20:53:31 -10:00
Linus Torvalds
ecae0bd517 Merge tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
 "Many singleton patches against the MM code. The patch series which are
  included in this merge do the following:

   - Kemeng Shi has contributed some compation maintenance work in the
     series 'Fixes and cleanups to compaction'

   - Joel Fernandes has a patchset ('Optimize mremap during mutual
     alignment within PMD') which fixes an obscure issue with mremap()'s
     pagetable handling during a subsequent exec(), based upon an
     implementation which Linus suggested

   - More DAMON/DAMOS maintenance and feature work from SeongJae Park i
     the following patch series:

	mm/damon: misc fixups for documents, comments and its tracepoint
	mm/damon: add a tracepoint for damos apply target regions
	mm/damon: provide pseudo-moving sum based access rate
	mm/damon: implement DAMOS apply intervals
	mm/damon/core-test: Fix memory leaks in core-test
	mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval

   - In the series 'Do not try to access unaccepted memory' Adrian
     Hunter provides some fixups for the recently-added 'unaccepted
     memory' feature. To increase the feature's checking coverage. 'Plug
     a few gaps where RAM is exposed without checking if it is
     unaccepted memory'

   - In the series 'cleanups for lockless slab shrink' Qi Zheng has done
     some maintenance work which is preparation for the lockless slab
     shrinking code

   - Qi Zheng has redone the earlier (and reverted) attempt to make slab
     shrinking lockless in the series 'use refcount+RCU method to
     implement lockless slab shrink'

   - David Hildenbrand contributes some maintenance work for the rmap
     code in the series 'Anon rmap cleanups'

   - Kefeng Wang does more folio conversions and some maintenance work
     in the migration code. Series 'mm: migrate: more folio conversion
     and unification'

   - Matthew Wilcox has fixed an issue in the buffer_head code which was
     causing long stalls under some heavy memory/IO loads. Some cleanups
     were added on the way. Series 'Add and use bdev_getblk()'

   - In the series 'Use nth_page() in place of direct struct page
     manipulation' Zi Yan has fixed a potential issue with the direct
     manipulation of hugetlb page frames

   - In the series 'mm: hugetlb: Skip initialization of gigantic tail
     struct pages if freed by HVO' has improved our handling of gigantic
     pages in the hugetlb vmmemmep optimizaton code. This provides
     significant boot time improvements when significant amounts of
     gigantic pages are in use

   - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code
     rationalization and folio conversions in the hugetlb code

   - Yin Fengwei has improved mlock()'s handling of large folios in the
     series 'support large folio for mlock'

   - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has
     added statistics for memcg v1 users which are available (and
     useful) under memcg v2

   - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
     prctl so that userspace may direct the kernel to not automatically
     propagate the denial to child processes. The series is named 'MDWE
     without inheritance'

   - Kefeng Wang has provided the series 'mm: convert numa balancing
     functions to use a folio' which does what it says

   - In the series 'mm/ksm: add fork-exec support for prctl' Stefan
     Roesch makes is possible for a process to propagate KSM treatment
     across exec()

   - Huang Ying has enhanced memory tiering's calculation of memory
     distances. This is used to permit the dax/kmem driver to use 'high
     bandwidth memory' in addition to Optane Data Center Persistent
     Memory Modules (DCPMM). The series is named 'memory tiering:
     calculate abstract distance based on ACPI HMAT'

   - In the series 'Smart scanning mode for KSM' Stefan Roesch has
     optimized KSM by teaching it to retain and use some historical
     information from previous scans

   - Yosry Ahmed has fixed some inconsistencies in memcg statistics in
     the series 'mm: memcg: fix tracking of pending stats updates
     values'

   - In the series 'Implement IOCTL to get and optionally clear info
     about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap
     which permits us to atomically read-then-clear page softdirty
     state. This is mainly used by CRIU

   - Hugh Dickins contributed the series 'shmem,tmpfs: general
     maintenance', a bunch of relatively minor maintenance tweaks to
     this code

   - Matthew Wilcox has increased the use of the VMA lock over
     file-backed page faults in the series 'Handle more faults under the
     VMA lock'. Some rationalizations of the fault path became possible
     as a result

   - In the series 'mm/rmap: convert page_move_anon_rmap() to
     folio_move_anon_rmap()' David Hildenbrand has implemented some
     cleanups and folio conversions

   - In the series 'various improvements to the GUP interface' Lorenzo
     Stoakes has simplified and improved the GUP interface with an eye
     to providing groundwork for future improvements

   - Andrey Konovalov has sent along the series 'kasan: assorted fixes
     and improvements' which does those things

   - Some page allocator maintenance work from Kemeng Shi in the series
     'Two minor cleanups to break_down_buddy_pages'

   - In thes series 'New selftest for mm' Breno Leitao has developed
     another MM self test which tickles a race we had between madvise()
     and page faults

   - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups
     and an optimization to the core pagecache code

   - Nhat Pham has added memcg accounting for hugetlb memory in the
     series 'hugetlb memcg accounting'

   - Cleanups and rationalizations to the pagemap code from Lorenzo
     Stoakes, in the series 'Abstract vma_merge() and split_vma()'

   - Audra Mitchell has fixed issues in the procfs page_owner code's new
     timestamping feature which was causing some misbehaviours. In the
     series 'Fix page_owner's use of free timestamps'

   - Lorenzo Stoakes has fixed the handling of new mappings of sealed
     files in the series 'permit write-sealed memfd read-only shared
     mappings'

   - Mike Kravetz has optimized the hugetlb vmemmap optimization in the
     series 'Batch hugetlb vmemmap modification operations'

   - Some buffer_head folio conversions and cleanups from Matthew Wilcox
     in the series 'Finish the create_empty_buffers() transition'

   - As a page allocator performance optimization Huang Ying has added
     automatic tuning to the allocator's per-cpu-pages feature, in the
     series 'mm: PCP high auto-tuning'

   - Roman Gushchin has contributed the patchset 'mm: improve
     performance of accounted kernel memory allocations' which improves
     their performance by ~30% as measured by a micro-benchmark

   - folio conversions from Kefeng Wang in the series 'mm: convert page
     cpupid functions to folios'

   - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about
     kmemleak'

   - Qi Zheng has improved our handling of memoryless nodes by keeping
     them off the allocation fallback list. This is done in the series
     'handle memoryless nodes more appropriately'

   - khugepaged conversions from Vishal Moola in the series 'Some
     khugepaged folio conversions'"

[ bcachefs conflicts with the dynamically allocated shrinkers have been
  resolved as per Stephen Rothwell in

     https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/

  with help from Qi Zheng.

  The clone3 test filtering conflict was half-arsed by yours truly ]

* tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits)
  mm/damon/sysfs: update monitoring target regions for online input commit
  mm/damon/sysfs: remove requested targets when online-commit inputs
  selftests: add a sanity check for zswap
  Documentation: maple_tree: fix word spelling error
  mm/vmalloc: fix the unchecked dereference warning in vread_iter()
  zswap: export compression failure stats
  Documentation: ubsan: drop "the" from article title
  mempolicy: migration attempt to match interleave nodes
  mempolicy: mmap_lock is not needed while migrating folios
  mempolicy: alloc_pages_mpol() for NUMA policy without vma
  mm: add page_rmappable_folio() wrapper
  mempolicy: remove confusing MPOL_MF_LAZY dead code
  mempolicy: mpol_shared_policy_init() without pseudo-vma
  mempolicy trivia: use pgoff_t in shared mempolicy tree
  mempolicy trivia: slightly more consistent naming
  mempolicy trivia: delete those ancient pr_debug()s
  mempolicy: fix migrate_pages(2) syscall return nr_failed
  kernfs: drop shared NUMA mempolicy hooks
  hugetlbfs: drop shared NUMA mempolicy pretence
  mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets()
  ...
2023-11-02 19:38:47 -10:00
Linus Torvalds
6803bd7956 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Generalized infrastructure for 'writable' ID registers, effectively
     allowing userspace to opt-out of certain vCPU features for its
     guest

   - Optimization for vSGI injection, opportunistically compressing
     MPIDR to vCPU mapping into a table

   - Improvements to KVM's PMU emulation, allowing userspace to select
     the number of PMCs available to a VM

   - Guest support for memory operation instructions (FEAT_MOPS)

   - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
     bugs and getting rid of useless code

   - Changes to the way the SMCCC filter is constructed, avoiding wasted
     memory allocations when not in use

   - Load the stage-2 MMU context at vcpu_load() for VHE systems,
     reducing the overhead of errata mitigations

   - Miscellaneous kernel and selftest fixes

  LoongArch:

   - New architecture for kvm.

     The hardware uses the same model as x86, s390 and RISC-V, where
     guest/host mode is orthogonal to supervisor/user mode. The
     virtualization extensions are very similar to MIPS, therefore the
     code also has some similarities but it's been cleaned up to avoid
     some of the historical bogosities that are found in arch/mips. The
     kernel emulates MMU, timer and CSR accesses, while interrupt
     controllers are only emulated in userspace, at least for now.

  RISC-V:

   - Support for the Smstateen and Zicond extensions

   - Support for virtualizing senvcfg

   - Support for virtualized SBI debug console (DBCN)

  S390:

   - Nested page table management can be monitored through tracepoints
     and statistics

  x86:

   - Fix incorrect handling of VMX posted interrupt descriptor in
     KVM_SET_LAPIC, which could result in a dropped timer IRQ

   - Avoid WARN on systems with Intel IPI virtualization

   - Add CONFIG_KVM_MAX_NR_VCPUS, to allow supporting up to 4096 vCPUs
     without forcing more common use cases to eat the extra memory
     overhead.

   - Add virtualization support for AMD SRSO mitigation (IBPB_BRTYPE and
     SBPB, aka Selective Branch Predictor Barrier).

   - Fix a bug where restoring a vCPU snapshot that was taken within 1
     second of creating the original vCPU would cause KVM to try to
     synchronize the vCPU's TSC and thus clobber the correct TSC being
     set by userspace.

   - Compute guest wall clock using a single TSC read to avoid
     generating an inaccurate time, e.g. if the vCPU is preempted
     between multiple TSC reads.

   - "Virtualize" HWCR.TscFreqSel to make Linux guests happy, which
     complain about a "Firmware Bug" if the bit isn't set for select
     F/M/S combos. Likewise "virtualize" (ignore) MSR_AMD64_TW_CFG to
     appease Windows Server 2022.

   - Don't apply side effects to Hyper-V's synthetic timer on writes
     from userspace to fix an issue where the auto-enable behavior can
     trigger spurious interrupts, i.e. do auto-enabling only for guest
     writes.

   - Remove an unnecessary kick of all vCPUs when synchronizing the
     dirty log without PML enabled.

   - Advertise "support" for non-serializing FS/GS base MSR writes as
     appropriate.

   - Harden the fast page fault path to guard against encountering an
     invalid root when walking SPTEs.

   - Omit "struct kvm_vcpu_xen" entirely when CONFIG_KVM_XEN=n.

   - Use the fast path directly from the timer callback when delivering
     Xen timer events, instead of waiting for the next iteration of the
     run loop. This was not done so far because previously proposed code
     had races, but now care is taken to stop the hrtimer at critical
     points such as restarting the timer or saving the timer information
     for userspace.

   - Follow the lead of upstream Xen and ignore the VCPU_SSHOTTMR_future
     flag.

   - Optimize injection of PMU interrupts that are simultaneous with
     NMIs.

   - Usual handful of fixes for typos and other warts.

  x86 - MTRR/PAT fixes and optimizations:

   - Clean up code that deals with honoring guest MTRRs when the VM has
     non-coherent DMA and host MTRRs are ignored, i.e. EPT is enabled.

   - Zap EPT entries when non-coherent DMA assignment stops/start to
     prevent using stale entries with the wrong memtype.

   - Don't ignore guest PAT for CR0.CD=1 && KVM_X86_QUIRK_CD_NW_CLEARED=y

     This was done as a workaround for virtual machine BIOSes that did
     not bother to clear CR0.CD (because ancient KVM/QEMU did not bother
     to set it, in turn), and there's zero reason to extend the quirk to
     also ignore guest PAT.

  x86 - SEV fixes:

   - Report KVM_EXIT_SHUTDOWN instead of EINVAL if KVM intercepts
     SHUTDOWN while running an SEV-ES guest.

   - Clean up the recognition of emulation failures on SEV guests, when
     KVM would like to "skip" the instruction but it had already been
     partially emulated. This makes it possible to drop a hack that
     second guessed the (insufficient) information provided by the
     emulator, and just do the right thing.

  Documentation:

   - Various updates and fixes, mostly for x86

   - MTRR and PAT fixes and optimizations"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (164 commits)
  KVM: selftests: Avoid using forced target for generating arm64 headers
  tools headers arm64: Fix references to top srcdir in Makefile
  KVM: arm64: Add tracepoint for MMIO accesses where ISV==0
  KVM: arm64: selftest: Perform ISB before reading PAR_EL1
  KVM: arm64: selftest: Add the missing .guest_prepare()
  KVM: arm64: Always invalidate TLB for stage-2 permission faults
  KVM: x86: Service NMI requests after PMI requests in VM-Enter path
  KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
  KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs
  KVM: arm64: Refine _EL2 system register list that require trap reinjection
  arm64: Add missing _EL2 encodings
  arm64: Add missing _EL12 encodings
  KVM: selftests: aarch64: vPMU test for validating user accesses
  KVM: selftests: aarch64: vPMU register test for unimplemented counters
  KVM: selftests: aarch64: vPMU register test for implemented counters
  KVM: selftests: aarch64: Introduce vpmu_counter_access test
  tools: Import arm_pmuv3.h
  KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
  KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
  KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
  ...
2023-11-02 15:45:15 -10:00
Linus Torvalds
1e0c505e13 Merge tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull ia64 removal and asm-generic updates from Arnd Bergmann:

 - The ia64 architecture gets its well-earned retirement as planned,
   now that there is one last (mostly) working release that will be
   maintained as an LTS kernel.

 - The architecture specific system call tables are updated for the
   added map_shadow_stack() syscall and to remove references to the
   long-gone sys_lookup_dcookie() syscall.

* tag 'asm-generic-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  hexagon: Remove unusable symbols from the ptrace.h uapi
  asm-generic: Fix spelling of architecture
  arch: Reserve map_shadow_stack() syscall number for all architectures
  syscalls: Cleanup references to sys_lookup_dcookie()
  Documentation: Drop or replace remaining mentions of IA64
  lib/raid6: Drop IA64 support
  Documentation: Drop IA64 from feature descriptions
  kernel: Drop IA64 support from sig_fault handlers
  arch: Remove Itanium (IA-64) architecture
2023-11-01 15:28:33 -10:00
Linus Torvalds
56ec8e4cd8 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
 "No major architecture features this time around, just some new HWCAP
  definitions, support for the Ampere SoC PMUs and a few fixes/cleanups.

  The bulk of the changes is reworking of the CPU capability checking
  code (cpus_have_cap() etc).

   - Major refactoring of the CPU capability detection logic resulting
     in the removal of the cpus_have_const_cap() function and migrating
     the code to "alternative" branches where possible

   - Backtrace/kgdb: use IPIs and pseudo-NMI

   - Perf and PMU:

      - Add support for Ampere SoC PMUs

      - Multi-DTC improvements for larger CMN configurations with
        multiple Debug & Trace Controllers

      - Rework the Arm CoreSight PMU driver to allow separate
        registration of vendor backend modules

      - Fixes: add missing MODULE_DEVICE_TABLE to the amlogic perf
        driver; use device_get_match_data() in the xgene driver; fix
        NULL pointer dereference in the hisi driver caused by calling
        cpuhp_state_remove_instance(); use-after-free in the hisi driver

   - HWCAP updates:

      - FEAT_SVE_B16B16 (BFloat16)

      - FEAT_LRCPC3 (release consistency model)

      - FEAT_LSE128 (128-bit atomic instructions)

   - SVE: remove a couple of pseudo registers from the cpufeature code.
     There is logic in place already to detect mismatched SVE features

   - Miscellaneous:

      - Reduce the default swiotlb size (currently 64MB) if no ZONE_DMA
        bouncing is needed. The buffer is still required for small
        kmalloc() buffers

      - Fix module PLT counting with !RANDOMIZE_BASE

      - Restrict CPU_BIG_ENDIAN to LLVM IAS 15.x or newer move
        synchronisation code out of the set_ptes() loop

      - More compact cpufeature displaying enabled cores

      - Kselftest updates for the new CPU features"

 * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits)
  arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer
  arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n
  arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper
  perf: hisi: Fix use-after-free when register pmu fails
  drivers/perf: hisi_pcie: Initialize event->cpu only on success
  drivers/perf: hisi_pcie: Check the type first in pmu::event_init()
  arm64: cpufeature: Change DBM to display enabled cores
  arm64: cpufeature: Display the set of cores with a feature
  perf/arm-cmn: Enable per-DTC counter allocation
  perf/arm-cmn: Rework DTC counters (again)
  perf/arm-cmn: Fix DTC domain detection
  drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init()
  drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally
  drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process
  clocksource/drivers/arm_arch_timer: limit XGene-1 workaround
  arm64: Remove system_uses_lse_atomics()
  arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused
  drivers/perf: xgene: Use device_get_match_data()
  perf/amlogic: add missing MODULE_DEVICE_TABLE
  arm64/mm: Hoist synchronization out of set_ptes() loop
  ...
2023-11-01 09:34:55 -10:00
Paolo Bonzini
45b890f768 Merge tag 'kvmarm-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.7

 - Generalized infrastructure for 'writable' ID registers, effectively
   allowing userspace to opt-out of certain vCPU features for its guest

 - Optimization for vSGI injection, opportunistically compressing MPIDR
   to vCPU mapping into a table

 - Improvements to KVM's PMU emulation, allowing userspace to select
   the number of PMCs available to a VM

 - Guest support for memory operation instructions (FEAT_MOPS)

 - Cleanups to handling feature flags in KVM_ARM_VCPU_INIT, squashing
   bugs and getting rid of useless code

 - Changes to the way the SMCCC filter is constructed, avoiding wasted
   memory allocations when not in use

 - Load the stage-2 MMU context at vcpu_load() for VHE systems, reducing
   the overhead of errata mitigations

 - Miscellaneous kernel and selftest fixes
2023-10-31 16:37:07 -04:00
Linus Torvalds
3cf3fabccb Merge tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Info Molnar:
 "Futex improvements:

   - Add the 'futex2' syscall ABI, which is an attempt to get away from
     the multiplex syscall and adds a little room for extentions, while
     lifting some limitations.

   - Fix futex PI recursive rt_mutex waiter state bug

   - Fix inter-process shared futexes on no-MMU systems

   - Use folios instead of pages

  Micro-optimizations of locking primitives:

   - Improve arch_spin_value_unlocked() on asm-generic ticket spinlock
     architectures, to improve lockref code generation

   - Improve the x86-32 lockref_get_not_zero() main loop by adding
     build-time CMPXCHG8B support detection for the relevant lockref
     code, and by better interfacing the CMPXCHG8B assembly code with
     the compiler

   - Introduce arch_sync_try_cmpxchg() on x86 to improve
     sync_try_cmpxchg() code generation. Convert some sync_cmpxchg()
     users to sync_try_cmpxchg().

   - Micro-optimize rcuref_put_slowpath()

  Locking debuggability improvements:

   - Improve CONFIG_DEBUG_RT_MUTEXES=y to have a fast-path as well

   - Enforce atomicity of sched_submit_work(), which is de-facto atomic
     but was un-enforced previously.

   - Extend <linux/cleanup.h>'s no_free_ptr() with __must_check
     semantics

   - Fix ww_mutex self-tests

   - Clean up const-propagation in <linux/seqlock.h> and simplify the
     API-instantiation macros a bit

  RT locking improvements:

   - Provide the rt_mutex_*_schedule() primitives/helpers and use them
     in the rtmutex code to avoid recursion vs. rtlock on the PI state.

   - Add nested blocking lockdep asserts to rt_mutex_lock(),
     rtlock_lock() and rwbase_read_lock()

  .. plus misc fixes & cleanups"

* tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
  futex: Don't include process MM in futex key on no-MMU
  locking/seqlock: Fix grammar in comment
  alpha: Fix up new futex syscall numbers
  locking/seqlock: Propagate 'const' pointers within read-only methods, remove forced type casts
  locking/lockdep: Fix string sizing bug that triggers a format-truncation compiler-warning
  locking/seqlock: Change __seqprop() to return the function pointer
  locking/seqlock: Simplify SEQCOUNT_LOCKNAME()
  locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath()
  locking/atomic, xen: Use sync_try_cmpxchg() instead of sync_cmpxchg()
  locking/atomic/x86: Introduce arch_sync_try_cmpxchg()
  locking/atomic: Add generic support for sync_try_cmpxchg() and its fallback
  locking/seqlock: Fix typo in comment
  futex/requeue: Remove unnecessary ‘NULL’ initialization from futex_proxy_trylock_atomic()
  locking/local, arch: Rewrite local_add_unless() as a static inline function
  locking/debug: Fix debugfs API return value checks to use IS_ERR()
  locking/ww_mutex/test: Make sure we bail out instead of livelock
  locking/ww_mutex/test: Fix potential workqueue corruption
  locking/ww_mutex/test: Use prng instead of rng to avoid hangs at bootup
  futex: Add sys_futex_requeue()
  futex: Add flags2 argument to futex_requeue()
  ...
2023-10-30 12:38:48 -10:00
Oliver Upton
123f42f0ad Merge branch kvm-arm64/pmu_pmcr_n into kvmarm/next
* kvm-arm64/pmu_pmcr_n:
  : User-defined PMC limit, courtesy Raghavendra Rao Ananta
  :
  : Certain VMMs may want to reserve some PMCs for host use while running a
  : KVM guest. This was a bit difficult before, as KVM advertised all
  : supported counters to the guest. Userspace can now limit the number of
  : advertised PMCs by writing to PMCR_EL0.N, as KVM's sysreg and PMU
  : emulation enforce the specified limit for handling guest accesses.
  KVM: selftests: aarch64: vPMU test for validating user accesses
  KVM: selftests: aarch64: vPMU register test for unimplemented counters
  KVM: selftests: aarch64: vPMU register test for implemented counters
  KVM: selftests: aarch64: Introduce vpmu_counter_access test
  tools: Import arm_pmuv3.h
  KVM: arm64: PMU: Allow userspace to limit PMCR_EL0.N for the guest
  KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run
  KVM: arm64: Add {get,set}_user for PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR}
  KVM: arm64: PMU: Set PMCR_EL0.N for vCPU based on the associated PMU
  KVM: arm64: PMU: Add a helper to read a vCPU's PMCR_EL0
  KVM: arm64: Select default PMU in KVM_ARM_VCPU_INIT handler
  KVM: arm64: PMU: Introduce helpers to set the guest's PMU

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:24:19 +00:00
Oliver Upton
53ce49ea75 Merge branch kvm-arm64/mops into kvmarm/next
* kvm-arm64/mops:
  : KVM support for MOPS, courtesy of Kristina Martsenko
  :
  : MOPS adds new instructions for accelerating memcpy(), memset(), and
  : memmove() operations in hardware. This series brings virtualization
  : support for KVM guests, and allows VMs to run on asymmetrict systems
  : that may have different MOPS implementations.
  KVM: arm64: Expose MOPS instructions to guests
  KVM: arm64: Add handler for MOPS exceptions

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:21:19 +00:00
Oliver Upton
a87a36436c Merge branch kvm-arm64/writable-id-regs into kvmarm/next
* kvm-arm64/writable-id-regs:
  : Writable ID registers, courtesy of Jing Zhang
  :
  : This series significantly expands the architectural feature set that
  : userspace can manipulate via the ID registers. A new ioctl is defined
  : that makes the mutable fields in the ID registers discoverable to
  : userspace.
  KVM: selftests: Avoid using forced target for generating arm64 headers
  tools headers arm64: Fix references to top srcdir in Makefile
  KVM: arm64: selftests: Test for setting ID register from usersapce
  tools headers arm64: Update sysreg.h with kernel sources
  KVM: selftests: Generate sysreg-defs.h and add to include path
  perf build: Generate arm64's sysreg-defs.h and add to include path
  tools: arm64: Add a Makefile for generating sysreg-defs.h
  KVM: arm64: Document vCPU feature selection UAPIs
  KVM: arm64: Allow userspace to change ID_AA64ZFR0_EL1
  KVM: arm64: Allow userspace to change ID_AA64PFR0_EL1
  KVM: arm64: Allow userspace to change ID_AA64MMFR{0-2}_EL1
  KVM: arm64: Allow userspace to change ID_AA64ISAR{0-2}_EL1
  KVM: arm64: Bump up the default KVM sanitised debug version to v8p8
  KVM: arm64: Reject attempts to set invalid debug arch version
  KVM: arm64: Advertise selected DebugVer in DBGDIDR.Version
  KVM: arm64: Use guest ID register values for the sake of emulation
  KVM: arm64: Document KVM_ARM_GET_REG_WRITABLE_MASKS
  KVM: arm64: Allow userspace to get the writable masks for feature ID registers

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:21:09 +00:00
Oliver Upton
54b44ad26c Merge branch kvm-arm64/sgi-injection into kvmarm/next
* kvm-arm64/sgi-injection:
  : vSGI injection improvements + fixes, courtesy Marc Zyngier
  :
  : Avoid linearly searching for vSGI targets using a compressed MPIDR to
  : index a cache. While at it, fix some egregious bugs in KVM's mishandling
  : of vcpuid (user-controlled value) and vcpu_idx.
  KVM: arm64: Clarify the ordering requirements for vcpu/RD creation
  KVM: arm64: vgic-v3: Optimize affinity-based SGI injection
  KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available
  KVM: arm64: Build MPIDR to vcpu index cache at runtime
  KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff()
  KVM: arm64: Use vcpu_idx for invalidation tracking
  KVM: arm64: vgic: Use vcpu_idx for the debug information
  KVM: arm64: vgic-v2: Use cpuid from userspace as vcpu_id
  KVM: arm64: vgic-v3: Refactor GICv3 SGI generation
  KVM: arm64: vgic-its: Treat the collection target address as a vcpu_id
  KVM: arm64: vgic: Make kvm_vgic_inject_irq() take a vcpu pointer

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:19:13 +00:00
Oliver Upton
df26b77915 Merge branch kvm-arm64/stage2-vhe-load into kvmarm/next
* kvm-arm64/stage2-vhe-load:
  : Setup stage-2 MMU from vcpu_load() for VHE
  :
  : Unlike nVHE, there is no need to switch the stage-2 MMU around on guest
  : entry/exit in VHE mode as the host is running at EL2. Despite this KVM
  : reloads the stage-2 on every guest entry, which is needless.
  :
  : This series moves the setup of the stage-2 MMU context to vcpu_load()
  : when running in VHE mode. This is likely to be a win across the board,
  : but also allows us to remove an ISB on the guest entry path for systems
  : with one of the speculative AT errata.
  KVM: arm64: Move VTCR_EL2 into struct s2_mmu
  KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe()
  KVM: arm64: Rename helpers for VHE vCPU load/put
  KVM: arm64: Reload stage-2 for VMID change on VHE
  KVM: arm64: Restore the stage-2 context in VHE's __tlb_switch_to_host()
  KVM: arm64: Don't zero VTTBR in __tlb_switch_to_host()

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:18:56 +00:00
Oliver Upton
51e6079614 Merge branch kvm-arm64/nv-trap-fixes into kvmarm/next
* kvm-arm64/nv-trap-fixes:
  : NV trap forwarding fixes, courtesy Miguel Luis and Marc Zyngier
  :
  :  - Explicitly define the effects of HCR_EL2.NV on EL2 sysregs in the
  :    NV trap encoding
  :
  :  - Make EL2 registers that access AArch32 guest state UNDEF or RAZ/WI
  :    where appropriate for NV guests
  KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
  KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs
  KVM: arm64: Refine _EL2 system register list that require trap reinjection
  arm64: Add missing _EL2 encodings
  arm64: Add missing _EL12 encodings

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:18:46 +00:00
Oliver Upton
25a35c1a3d Merge branch kvm-arm64/smccc-filter-cleanups into kvmarm/next
* kvm-arm64/smccc-filter-cleanups:
  : Cleanup the management of KVM's SMCCC maple tree
  :
  : Avoid the cost of maintaining the SMCCC filter maple tree if userspace
  : hasn't writen a rule to the filter. While at it, rip out the now
  : unnecessary VM flag to indicate whether or not the SMCCC filter was
  : configured.
  KVM: arm64: Use mtree_empty() to determine if SMCCC filter configured
  KVM: arm64: Only insert reserved ranges when SMCCC filter is used
  KVM: arm64: Add a predicate for testing if SMCCC filter is configured

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:18:37 +00:00
Oliver Upton
d47dcb67fc Merge branch kvm-arm64/feature-flag-refactor into kvmarm/next
* kvm-arm64/feature-flag-refactor:
  : vCPU feature flag cleanup
  :
  : Clean up KVM's handling of vCPU feature flags to get rid of the
  : vCPU-scoped bitmaps and remove failure paths from kvm_reset_vcpu().
  KVM: arm64: Get rid of vCPU-scoped feature bitmap
  KVM: arm64: Remove unused return value from kvm_reset_vcpu()
  KVM: arm64: Hoist NV+SVE check into KVM_ARM_VCPU_INIT ioctl handler
  KVM: arm64: Prevent NV feature flag on systems w/o nested virt
  KVM: arm64: Hoist PAuth checks into KVM_ARM_VCPU_INIT ioctl
  KVM: arm64: Hoist SVE check into KVM_ARM_VCPU_INIT ioctl handler
  KVM: arm64: Hoist PMUv3 check into KVM_ARM_VCPU_INIT ioctl handler
  KVM: arm64: Add generic check for system-supported vCPU features

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:18:14 +00:00
Oliver Upton
054056bf98 Merge branch kvm-arm64/misc into kvmarm/next
* kvm-arm64/misc:
  : Miscellaneous updates
  :
  :  - Put an upper bound on the number of I-cache invalidations by
  :    cacheline to avoid soft lockups
  :
  :  - Get rid of bogus refererence count transfer for THP mappings
  :
  :  - Do a local TLB invalidation on permission fault race
  :
  :  - Fixes for page_fault_test KVM selftest
  :
  :  - Add a tracepoint for detecting MMIO instructions unsupported by KVM
  KVM: arm64: Add tracepoint for MMIO accesses where ISV==0
  KVM: arm64: selftest: Perform ISB before reading PAR_EL1
  KVM: arm64: selftest: Add the missing .guest_prepare()
  KVM: arm64: Always invalidate TLB for stage-2 permission faults
  KVM: arm64: Do not transfer page refcount for THP adjustment
  KVM: arm64: Avoid soft lockups due to I-cache maintenance
  arm64: tlbflush: Rename MAX_TLBI_OPS
  KVM: arm64: Don't use kerneldoc comment for arm64_check_features()

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30 20:18:00 +00:00
Catalin Marinas
14dcf78a6c Merge branch 'for-next/cpus_have_const_cap' into for-next/core
* for-next/cpus_have_const_cap: (38 commits)
  : cpus_have_const_cap() removal
  arm64: Remove cpus_have_const_cap()
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBI
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419
  arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419
  arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0
  arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64}
  arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2
  arm64: Avoid cpus_have_const_cap() for ARM64_SSBS
  arm64: Avoid cpus_have_const_cap() for ARM64_MTE
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGE
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXT
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_RNG
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING
  arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DIT
  ...
2023-10-26 17:10:18 +01:00
Catalin Marinas
2baca17e6a Merge branch 'for-next/feat_lse128' into for-next/core
* for-next/feat_lse128:
  : HWCAP for FEAT_LSE128
  kselftest/arm64: add FEAT_LSE128 to hwcap test
  arm64: add FEAT_LSE128 HWCAP
2023-10-26 17:10:07 +01:00
Catalin Marinas
023113fe66 Merge branch 'for-next/feat_lrcpc3' into for-next/core
* for-next/feat_lrcpc3:
  : HWCAP for FEAT_LRCPC3
  selftests/arm64: add HWCAP2_LRCPC3 test
  arm64: add FEAT_LRCPC3 HWCAP
2023-10-26 17:10:05 +01:00
Catalin Marinas
2a3f8ce3bb Merge branch 'for-next/feat_sve_b16b16' into for-next/core
* for-next/feat_sve_b16b16:
  : Add support for FEAT_SVE_B16B16 (BFloat16)
  kselftest/arm64: Verify HWCAP2_SVE_B16B16
  arm64/sve: Report FEAT_SVE_B16B16 to userspace
2023-10-26 17:10:01 +01:00
Catalin Marinas
1519018ccb Merge branches 'for-next/sve-remove-pseudo-regs', 'for-next/backtrace-ipi', 'for-next/kselftest', 'for-next/misc' and 'for-next/cpufeat-display-cores', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
  perf: hisi: Fix use-after-free when register pmu fails
  drivers/perf: hisi_pcie: Initialize event->cpu only on success
  drivers/perf: hisi_pcie: Check the type first in pmu::event_init()
  perf/arm-cmn: Enable per-DTC counter allocation
  perf/arm-cmn: Rework DTC counters (again)
  perf/arm-cmn: Fix DTC domain detection
  drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init()
  drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally
  drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process
  drivers/perf: xgene: Use device_get_match_data()
  perf/amlogic: add missing MODULE_DEVICE_TABLE
  docs/perf: Add ampere_cspmu to toctree to fix a build warning
  perf: arm_cspmu: ampere_cspmu: Add support for Ampere SoC PMU
  perf: arm_cspmu: Support implementation specific validation
  perf: arm_cspmu: Support implementation specific filters
  perf: arm_cspmu: Split 64-bit write to 32-bit writes
  perf: arm_cspmu: Separate Arm and vendor module

* for-next/sve-remove-pseudo-regs:
  : arm64/fpsimd: Remove the vector length pseudo registers
  arm64/sve: Remove SMCR pseudo register from cpufeature code
  arm64/sve: Remove ZCR pseudo register from cpufeature code

* for-next/backtrace-ipi:
  : Add IPI for backtraces/kgdb, use NMI
  arm64: smp: Don't directly call arch_smp_send_reschedule() for wakeup
  arm64: smp: avoid NMI IPIs with broken MediaTek FW
  arm64: smp: Mark IPI globals as __ro_after_init
  arm64: kgdb: Implement kgdb_roundup_cpus() to enable pseudo-NMI roundup
  arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI
  arm64: smp: Add arch support for backtrace using pseudo-NMI
  arm64: smp: Remove dedicated wakeup IPI
  arm64: idle: Tag the arm64 idle functions as __cpuidle
  irqchip/gic-v3: Enable support for SGIs to act as NMIs

* for-next/kselftest:
  : Various arm64 kselftest updates
  kselftest/arm64: Validate SVCR in streaming SVE stress test

* for-next/misc:
  : Miscellaneous patches
  arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer
  arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n
  arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper
  clocksource/drivers/arm_arch_timer: limit XGene-1 workaround
  arm64: Remove system_uses_lse_atomics()
  arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused
  arm64/mm: Hoist synchronization out of set_ptes() loop
  arm64: swiotlb: Reduce the default size if no ZONE_DMA bouncing needed

* for-next/cpufeat-display-cores:
  : arm64 cpufeature display enabled cores
  arm64: cpufeature: Change DBM to display enabled cores
  arm64: cpufeature: Display the set of cores with a feature
2023-10-26 17:09:52 +01:00
Marc Zyngier
3f7915ccc9 KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI
When trapping accesses from a NV guest that tries to access
SPSR_{irq,abt,und,fiq}, make sure we handle them as RAZ/WI,
as if AArch32 wasn't implemented.

This involves a bit of repainting to make the visibility
handler more generic.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231023095444.1587322-6-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25 00:25:06 +00:00
Miguel Luis
41f6c93447 arm64: Add missing _EL2 encodings
Some _EL2 encodings are missing. Add them.

Signed-off-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
[maz: dropped secure encodings]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231023095444.1587322-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25 00:24:49 +00:00
Miguel Luis
d5cb781b77 arm64: Add missing _EL12 encodings
Some _EL12 encodings are missing. Add them.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Miguel Luis <miguel.luis@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231023095444.1587322-2-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25 00:24:34 +00:00
Raghavendra Rao Ananta
4d20debf9c KVM: arm64: PMU: Set PMCR_EL0.N for vCPU based on the associated PMU
The number of PMU event counters is indicated in PMCR_EL0.N.
For a vCPU with PMUv3 configured, the value is set to the same
value as the current PE on every vCPU reset.  Unless the vCPU is
pinned to PEs that has the PMU associated to the guest from the
initial vCPU reset, the value might be different from the PMU's
PMCR_EL0.N on heterogeneous PMU systems.

Fix this by setting the vCPU's PMCR_EL0.N to the PMU's PMCR_EL0.N
value. Track the PMCR_EL0.N per guest, as only one PMU can be set
for the guest (PMCR_EL0.N must be the same for all vCPUs of the
guest), and it is convenient for updating the value.

To achieve this, the patch introduces a helper,
kvm_arm_pmu_get_max_counters(), that reads the maximum number of
counters from the arm_pmu associated to the VM. Make the function
global as upcoming patches will be interested to know the value
while setting the PMCR.N of the guest from userspace.

KVM does not yet support userspace modifying PMCR_EL0.N.
The following patch will add support for that.

Reviewed-by: Sebastian Ott <sebott@redhat.com>
Co-developed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Link: https://lore.kernel.org/r/20231020214053.2144305-5-rananta@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24 22:59:30 +00:00
Marc Zyngier
fe49fd940e KVM: arm64: Move VTCR_EL2 into struct s2_mmu
We currently have a global VTCR_EL2 value for each guest, even
if the guest uses NV. This implies that the guest's own S2 must
fit in the host's. This is odd, for multiple reasons:

- the PARange values and the number of IPA bits don't necessarily
  match: you can have 33 bits of IPA space, and yet you can only
  describe 32 or 36 bits of PARange

- When userspace set the IPA space, it creates a contract with the
  kernel saying "this is the IPA space I'm prepared to handle".
  At no point does it constraint the guest's own IPA space as
  long as the guest doesn't try to use a [I]PA outside of the
  IPA space set by userspace

- We don't even try to hide the value of ID_AA64MMFR0_EL1.PARange.

And then there is the consequence of the above: if a guest tries
to create a S2 that has for input address something that is larger
than the IPA space defined by the host, we inject a fatal exception.

This is no good. For all intent and purposes, a guest should be
able to have the S2 it really wants, as long as the *output* address
of that S2 isn't outside of the IPA space.

For that, we need to have a per-s2_mmu VTCR_EL2 setting, which
allows us to represent the full PARange. Move the vctr field into
the s2_mmu structure, which has no impact whatsoever, except for NV.

Note that once we are able to override ID_AA64MMFR0_EL1.PARange
from userspace, we'll also be able to restrict the size of the
shadow S2 that NV uses.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231012205108.3937270-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-23 18:48:46 +00:00
Jeremy Linton
23b727dc20 arm64: cpufeature: Display the set of cores with a feature
The AMU feature can be enabled on a subset of the cores in a system.
Because of that, it prints a message for each core as it is detected.
This becomes tedious when there are hundreds of cores. Instead, for
CPU features which can be enabled on a subset of the present cores,
lets wait until update_cpu_capabilities() and print the subset of cores
the feature was enabled on.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@bytedance.com>
Tested-by: Punit Agrawal <punit.agrawal@bytedance.com>
Link: https://lore.kernel.org/r/20231017052322.1211099-2-jeremy.linton@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-23 16:40:37 +01:00
Oliver Upton
27cde4c0fe KVM: arm64: Rename helpers for VHE vCPU load/put
The names for the helpers we expose to the 'generic' KVM code are a bit
imprecise; we switch the EL0 + EL1 sysreg context and setup trap
controls that do not need to change for every guest entry/exit. Rename +
shuffle things around a bit in preparation for loading the stage-2 MMU
context on vcpu_load().

Link: https://lore.kernel.org/r/20231018233212.2888027-5-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-20 17:52:01 +00:00
Marc Zyngier
5eba523e1e KVM: arm64: Reload stage-2 for VMID change on VHE
Naturally, a change to the VMID for an MMU implies a new value for
VTTBR. Reload on VMID change in anticipation of loading stage-2 on
vcpu_load() instead of every guest entry.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231018233212.2888027-4-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-20 17:52:01 +00:00
Andre Przywara
851354cbd1 clocksource/drivers/arm_arch_timer: limit XGene-1 workaround
The AppliedMicro XGene-1 CPU has an erratum where the timer condition
would only consider TVAL, not CVAL. We currently apply a workaround when
seeing the PartNum field of MIDR_EL1 being 0x000, under the assumption
that this would match only the XGene-1 CPU model.
However even the Ampere eMAG (aka XGene-3) uses that same part number, and
only differs in the "Variant" and "Revision" fields: XGene-1's MIDR is
0x500f0000, our eMAG reports 0x503f0002. Experiments show the latter
doesn't show the faulty behaviour.

Increase the specificity of the check to only consider partnum 0x000 and
variant 0x00, to exclude the Ampere eMAG.

Fixes: 012f188504 ("clocksource/drivers/arm_arch_timer: Work around broken CVAL implementations")
Reported-by: Ross Burton <ross.burton@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20231016153127.116101-1-andre.przywara@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-18 10:58:59 +01:00
Gavin Shan
0899a6278a arm64: Remove system_uses_lse_atomics()
There are two variants of system_uses_lse_atomics(), depending on
CONFIG_ARM64_LSE_ATOMICS. The function isn't called anywhere when
CONFIG_ARM64_LSE_ATOMICS is disabled. It can be directly replaced
by alternative_has_cap_likely(ARM64_HAS_LSE_ATOMICS) when the kernel
option is enabled.

No need to keep system_uses_lse_atomics() and just remove it.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20231017005036.334067-1-gshan@redhat.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-18 10:58:59 +01:00
Catalin Marinas
dba2ff4922 arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused
This argument is not used by the arm64 implementation. Mark it as
__always_unused and also remove the unnecessary 'addr' increment in
set_ptes().

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310140531.BQQwt3NQ-lkp@intel.com/
Cc: Will Deacon <will@kernel.org>
Tested-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/ZS6EvMiJ0QF5INkv@arm.com
2023-10-18 10:58:31 +01:00
Ryan Roberts
3425cec42c arm64/mm: Hoist synchronization out of set_ptes() loop
set_ptes() sets a physically contiguous block of memory (which all
belongs to the same folio) to a contiguous block of ptes. The arm64
implementation of this previously just looped, operating on each
individual pte. But the __sync_icache_dcache() and mte_sync_tags()
operations can both be hoisted out of the loop so that they are
performed once for the contiguous set of pages (which may be less than
the whole folio). This should result in minor performance gains.

__sync_icache_dcache() already acts on the whole folio, and sets a flag
in the folio so that it skips duplicate calls. But by hoisting the call,
all the pte testing is done only once.

mte_sync_tags() operates on each individual page with its own loop. But
by passing the number of pages explicitly, we can rely solely on its
loop and do the checks only once. This approach also makes it robust for
the future, rather than assuming if a head page of a compound page is
being mapped, then the whole compound page is being mapped, instead we
explicitly know how many pages are being mapped. The old assumption may
not continue to hold once the "anonymous large folios" feature is
merged.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20231005140730.2191134-1-ryan.roberts@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 18:27:31 +01:00
Mark Rutland
e8d4006dc2 arm64: Remove cpus_have_const_cap()
There are no longer any users of cpus_have_const_cap(), and therefore it
can be removed.

Remove cpus_have_const_cap(). At the same time, remove
__cpus_have_const_cap(), as this is a trivial wrapper of
alternative_has_cap_unlikely(), which can be used directly instead.

The comment for __system_matches_cap() is updated to no longer refer to
cpus_have_const_cap(). As we have a number of ways to check the cpucaps,
the specific suggestions are removed.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Kristina Martsenko <kristina.martsenko@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:07 +01:00
Mark Rutland
47759eca76 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBI
In arch_tlbbatch_should_defer() we use cpus_have_const_cap() to check
for ARM64_WORKAROUND_REPEAT_TLBI, but this is not necessary and
alternative_has_cap_*() would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

The cpus_have_const_cap() check in arch_tlbbatch_should_defer() is an
optimization to avoid some redundant work when the
ARM64_WORKAROUND_REPEAT_TLBI cpucap is detected and forces the immediate
use of TLBI + DSB ISH. In the window between detecting the
ARM64_WORKAROUND_REPEAT_TLBI cpucap and patching alternatives this is
not a big concern and there's no need to optimize this window at the
expsense of subsequent usage at runtime.

This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime. The ARM64_WORKAROUND_REPEAT_TLBI cpucap is added to
cpucap_is_possible() so that code can be elided entirely when this is
not possible without requiring ifdeffery or IS_ENABLED() checks at each
usage.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:07 +01:00
Mark Rutland
0d48058ef8 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP
In has_useable_cnp() we use cpus_have_const_cap() to check for
ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but this is not necessary and
cpus_have_cap() would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

We use has_useable_cnp() to determine whether we have the system-wide
ARM64_HAS_CNP cpucap. Due to the structure of the cpufeature code, we
call has_useable_cnp() in two distinct cases:

1) When finalizing system capabilities, setup_system_capabilities() will
   call has_useable_cnp() with SCOPE_SYSTEM to determine whether all
   CPUs have the feature. This is called after we've detected any local
   cpucaps including ARM64_WORKAROUND_NVIDIA_CARMEL_CNP, but prior to
   patching alternatives.

   If the ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will not
   detect ARM64_HAS_CNP.

2) After finalizing system capabilties, verify_local_cpu_capabilities()
   will call has_useable_cnp() with SCOPE_LOCAL_CPU to verify that CPUs
   have CNP if we previously detected it.

   Note that if ARM64_WORKAROUND_NVIDIA_CARMEL_CNP was detected, we will
   not have detected ARM64_HAS_CNP.

For case 1 we must check the system_cpucaps bitmap as this occurs prior
to patching the alternatives. For case 2 we'll only call
has_useable_cnp() once per subsequent onlining of a CPU, and as this
isn't a fast path it's not necessary to optimize for this case.

This patch replaces the use of cpus_have_const_cap() with
cpus_have_cap(), which will only generate the bitmap test and avoid
generating an alternative sequence, resulting in slightly simpler annd
smaller code being generated. The ARM64_WORKAROUND_NVIDIA_CARMEL_CNP
cpucap is added to cpucap_is_possible() so that code can be elided
entirely when this is not possible.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:07 +01:00
Mark Rutland
a98a5eac4d arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154
In gic_read_iar() we use cpus_have_const_cap() to check for
ARM64_WORKAROUND_CAVIUM_23154 but this is not necessary and
alternative_has_cap_*() would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

The ARM64_WORKAROUND_CAVIUM_23154 cpucap is detected and patched early
on the boot CPU before the GICv3 driver is initialized and hence before
gic_read_iar() is ever called. Thus it is not necessary to use
cpus_have_const_cap(), and alternative_has_cap() is equivalent.

In addition, arm64's gic_read_iar() lives in irq-gic-v3.c purely for
historical reasons. It was originally added prior to 32-bit arm support
in commit:

  6d4e11c5e2 ("irqchip/gicv3: Workaround for Cavium ThunderX erratum 23154")

When support for 32-bit arm was added, 32-bit arm's gic_read_iar()
implementation was placed in <asm/arch_gicv3.h>, but the arm64 version
was kept within irq-gic-v3.c as it depended on a static key local to
irq-gic-v3.c and it was easier to add ifdeffery, which is what we did in
commit:

  7936e914f7 ("irqchip/gic-v3: Refactor the arm64 specific parts")

Subsequently the static key was replaced with a cpucap in commit:

  a4023f6827 ("arm64: Add hypervisor safe helper for checking constant capabilities")

Since that commit there has been no need to keep arm64's gic_read_iar()
in irq-gic-v3.c.

This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime. For consistency, move the arm64-specific gic_read_iar()
implementation over to arm64's <asm/arch_gicv3.h>. The
ARM64_WORKAROUND_CAVIUM_23154 cpucap is added to cpucap_is_possible() so
that code can be elided entirely when this is not possible.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:07 +01:00
Mark Rutland
412cb3801d arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198
We use cpus_have_const_cap() to check for ARM64_WORKAROUND_2645198 but
this is not necessary and alternative_has_cap() would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

The ARM64_WORKAROUND_2645198 cpucap is detected and patched before any
userspace translation table exist, and the workaround is only necessary
when manipulating usrspace translation tables which are in use. Thus it
is not necessary to use cpus_have_const_cap(), and alternative_has_cap()
is equivalent.

This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime.  The ARM64_WORKAROUND_2645198 cpucap is added to
cpucap_is_possible() so that code can be elided entirely when this is
not possible, and redundant IS_ENABLED() checks are removed.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:07 +01:00
Mark Rutland
48b57d9199 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098
In elf_hwcap_fixup() we use cpus_have_const_cap() to check for
ARM64_WORKAROUND_1742098, but this is not necessary and cpus_have_cap()
would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

The ARM64_WORKAROUND_1742098 cpucap is detected and patched before
elf_hwcap_fixup() can run, and hence it is not necessary to use
cpus_have_const_cap(). We run cpus_have_const_cap() at most twice: once
after finalizing system cpucaps, and potentially once more after
detecting mismatched CPUs which support AArch32 at EL0. Due to this,
it's not necessary to optimize for many calls to elf_hwcap_fixup(), and
it's fine to use cpus_have_cap().

This patch replaces the use of cpus_have_const_cap() with
cpus_have_cap(), which will only generate the bitmap test and avoid
generating an alternative sequence, resulting in slightly simpler annd
smaller code being generated. For consistenct with other cpucaps, the
ARM64_WORKAROUND_1742098 cpucap is added to cpucap_is_possible() so that
code can be elided when this is not possible. However, as we only define
compat_elf_hwcap2 when CONFIG_COMPAT=y, some ifdeffery is still required
within user_feature_fixup() to avoid build errors when CONFIG_COMPAT=n.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:06 +01:00
Mark Rutland
0a285dfe87 arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419
In count_plts() and is_forbidden_offset_for_adrp() we use
cpus_have_const_cap() to check for ARM64_WORKAROUND_843419, but this is
not necessary and cpus_have_final_cap() would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

It's not possible to load a module in the window between detecting the
ARM64_WORKAROUND_843419 cpucap and patching alternatives. The module VA
range limits are initialized much later in module_init_limits() which is
a subsys_initcall, and module loading cannot happen before this. Hence
it's not necessary for count_plts() or is_forbidden_offset_for_adrp() to
use cpus_have_const_cap().

This patch replaces the use of cpus_have_const_cap() with
cpus_have_final_cap() which will avoid generating code to test the
system_cpucaps bitmap and should be better for all subsequent calls at
runtime. Using cpus_have_final_cap() clearly documents that we do not
expect this code to run before cpucaps are finalized, and will make it
easier to spot issues if code is changed in future to allow modules to
be loaded earlier. The ARM64_WORKAROUND_843419 cpucap is added to
cpucap_is_possible() so that code can be elided entirely when this is not
possible, and redundant IS_ENABLED() checks are removed.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:06 +01:00
Mark Rutland
c2ef5f1e15 arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0
In arm64_kernel_unmapped_at_el0() we use cpus_have_const_cap() to check
for ARM64_UNMAP_KERNEL_AT_EL0, but this is only necessary so that
arm64_get_bp_hardening_vector() and this_cpu_set_vectors() can run prior
to alternatives being patched. Otherwise this is not necessary and
alternative_has_cap_*() would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

The ARM64_UNMAP_KERNEL_AT_EL0 cpucap is a system-wide feature that is
detected and patched before any translation tables are created for
userspace. In the window between detecting the ARM64_UNMAP_KERNEL_AT_EL0
cpucap and patching alternatives, most users of
arm64_kernel_unmapped_at_el0() do not need to know that the cpucap has
been detected:

* As KVM is initialized after cpucaps are finalized, no usaef of
  arm64_kernel_unmapped_at_el0() in the KVM code is reachable during
  this window.

* The arm64_mm_context_get() function in arch/arm64/mm/context.c is only
  called after the SMMU driver is brought up after alternatives have
  been patched. Thus this can safely use cpus_have_final_cap() or
  alternative_has_cap_*().

  Similarly the asids_update_limit() function is called after
  alternatives have been patched as an arch_initcall, and this can
  safely use cpus_have_final_cap() or alternative_has_cap_*().

  Similarly we do not expect an ASID rollover to occur between cpucaps
  being detected and patching alternatives. Thus
  set_reserved_asid_bits() can safely use cpus_have_final_cap() or
  alternative_has_cap_*().

* The __tlbi_user() and __tlbi_user_level() macros are not used during
  this window, and only need to invalidate additional entries once
  userspace translation tables have been active on a CPU. Thus these can
  safely use alternative_has_cap_*().

* The xen_kernel_unmapped_at_usr() function is not used during this
  window as it is only used in a late_initcall. Thus this can safely use
  cpus_have_final_cap() or alternative_has_cap_*().

* The arm64_get_meltdown_state() function is not used during this
  window. It only used by arm64_get_meltdown_state() and KVM code, both
  of which are only used after cpucaps have been finalized. Thus this
  can safely use cpus_have_final_cap() or alternative_has_cap_*().

* The tls_thread_switch() uses arm64_kernel_unmapped_at_el0() as an
  optimization to avoid zeroing tpidrro_el0 when KPTI is enabled
  and this will be trampled by the KPTI trampoline. It doesn't matter if
  this continues to zero the register during the window between
  detecting the cpucap and patching alternatives, so this can safely use
  alternative_has_cap_*().

* The sdei_arch_get_entry_point() and do_sdei_event() functions aren't
  reachable at this time as the SDEI driver is registered later by
  acpi_init() -> acpi_ghes_init() -> sdei_init(), where acpi_init is a
  subsys_initcall. Thus these can safely use cpus_have_final_cap() or
  alternative_has_cap_*().

* The uses under drivers/ aren't reachable at this time as the drivers
  are registered later:

  - TRBE is registered via module_init()
  - SMMUv3 is registred via module_driver()
  - SPE is registred via module_init()

* The arm64_get_bp_hardening_vector() and this_cpu_set_vectors()
  functions need to run on boot CPUs prior to patching alternatives.
  As these are only called during the onlining of a CPU, it's fine to
  perform a system_cpucaps bitmap test using cpus_have_cap().

This patch modifies this_cpu_set_vectors() to use cpus_have_cap(), and
replaced all other use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime. The ARM64_UNMAP_KERNEL_AT_EL0 cpucap is added to
cpucap_is_possible() so that code can be elided entirely when this is
not possible.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:06 +01:00
Mark Rutland
a76521d160 arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64}
In system_supports_{sve,sme,sme2,fa64}() we use cpus_have_const_cap() to
check for the relevant cpucaps, but this is only necessary so that
sve_setup() and sme_setup() can run prior to alternatives being patched,
and otherwise alternative_has_cap_*() would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

All of system_supports_{sve,sme,sme2,fa64}() will return false prior to
system cpucaps being detected. In the window between system cpucaps being
detected and patching alternatives, we need system_supports_sve() and
system_supports_sme() to run to initialize SVE and SME properties, but
all other users of system_supports_{sve,sme,sme2,fa64}() don't depend on
the relevant cpucap becoming true until alternatives are patched:

* No KVM code runs until after alternatives are patched, and so this can
  safely use cpus_have_final_cap() or alternative_has_cap_*().

* The cpuid_cpu_online() callback in arch/arm64/kernel/cpuinfo.c is
  registered later from cpuinfo_regs_init() as a device_initcall, and so
  this can safely use cpus_have_final_cap() or alternative_has_cap_*().

* The entry, signal, and ptrace code isn't reachable until userspace has
  run, and so this can safely use cpus_have_final_cap() or
  alternative_has_cap_*().

* Currently perf_reg_validate() will un-reserve the PERF_REG_ARM64_VG
  pseudo-register before alternatives are patched, and before
  sve_setup() has run. If a sampling event is created early enough, this
  would allow perf_ext_reg_value() to sample (the as-yet uninitialized)
  thread_struct::vl[] prior to alternatives being patched.

  It would be preferable to defer this until alternatives are patched,
  and this can safely use alternative_has_cap_*().

* The context-switch code will run during this window as part of
  stop_machine() used during alternatives_patch_all(), and potentially
  for other work if other kernel threads are created early. No threads
  require the use of SVE/SME/SME2/FA64 prior to alternatives being
  patched, and it would be preferable for the related context-switch
  logic to take effect after alternatives are patched so that ths is
  guaranteed to see a consistent system-wide state (e.g. anything
  initialized by sve_setup() and sme_setup().

  This can safely ues alternative_has_cap_*().

This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime. The sve_setup() and sme_setup() functions are modified to
use cpus_have_cap() directly so that they can observe the cpucaps being
set prior to alternatives being patched.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:06 +01:00
Mark Rutland
af64543977 arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2
In arm64_apply_bp_hardening() we use cpus_have_const_cap() to check for
ARM64_SPECTRE_V2 , but this is not necessary and alternative_has_cap_*()
would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

The cpus_have_const_cap() check in arm64_apply_bp_hardening() is
intended to avoid the overhead of looking up and invoking a per-cpu
function pointer when no branch predictor hardening is required. The
arm64_apply_bp_hardening() function itself is called in two distinct
flows:

1) When handling certain exceptions taken from EL0, where the PC could
   be a TTBR1 address and hence might have trained a branch predictor.

   As cpucaps are detected and alternatives are patched long before it
   is possible to execute userspace, it is not necessary to use
   cpus_have_const_cap() for these cases, and cpus_have_final_cap() or
   alternative_has_cap() would be preferable.

2) When switching between tasks in check_and_switch_context().

   This can be called before cpucaps are detected and alternatives are
   patched, but this is long before the kernel mounts filesystems or
   accepts any input. At this stage the kernel hasn't loaded any secrets
   and there is no potential for hostile branch predictor training. Once
   cpucaps have been finalized and alternatives have been patched,
   switching tasks will invalidate any prior predictions. Hence it is
   not necessary to use cpus_have_const_cap() for this case.

This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:05 +01:00
Mark Rutland
94324bcbc9 arm64: Avoid cpus_have_const_cap() for ARM64_MTE
In system_supports_mte() we use cpus_have_const_cap() to check for
ARM64_MTE, but this is not necessary and cpus_have_final_boot_cap()
would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

The ARM64_MTE cpucap is a boot cpu feature which is detected and patched
early on the boot CPU under smp_prepare_boot_cpu(). In the window
between detecting the ARM64_MTE cpucap and patching alternatives,
nothing depends on the ARM64_MTE cpucap:

* The kasan_hw_tags_enabled() helper depends upon the kasan_flag_enabled
  static key, which is initialized later in kasan_init_hw_tags() after
  alternatives have been applied.

* No KVM code is called during this window, and KVM is not initialized
  until after system cpucaps have been detected and patched. KVM code
  can safely use cpus_have_final_cap() or alternative_has_cap_*().

* We don't context-switch prior to patching boot alternatives, and thus
  mte_thread_switch() is not reachable during this window. Thus, we can
  safely use cpus_have_final_boot_cap() or alternative_has_cap_*() in
  the context-switch code.

* IRQ and FIQ are masked during this window, and we can only take SError
  and Debug exceptions. SError exceptions are fatal at this point in
  time, and we do not expect to take Debug exceptions, thus:

  - It's fine to lave TCO set for exceptions taken during this window,
    and mte_disable_tco_entry() doesn't need to do anything.

  - We don't need to detect and report asynchronous tag cehck faults
    during this window, and neither mte_check_tfsr_entry() nor
    mte_check_tfsr_exit() need to do anything.

  Since we want to report any SErrors taken during thiw window, these
  cannot safely use cpus_have_final_boot_cap() or cpus_have_final_cap(),
  but these can safely use alternative_has_cap_*().

* The __set_pte_at() function is not used during this window. It is
  possible for this to be used on kernel mappings prior to boot cpucaps
  being finalized, so this cannot safely use cpus_have_final_boot_cap()
  or cpus_have_final_cap(), but this can safely use
  alternative_has_cap_*().

* No userspace translation tables have been created yet, and swap has
  not been initialized yet. Thus swapping is not possible and none of
  the following are called:

  - arch_thp_swp_supported()
  - arch_prepare_to_swap()
  - arch_swap_invalidate_page()
  - arch_swap_invalidate_area()
  - arch_swap_restore()

  These can safely use system_has_final_cap() or
  alternative_has_cap_*().

* The elfcore functions are only reachable after userspace is brought
  up, which happens after system cpucaps have been detected and patched.
  Thus the elfcore code can safely use cpus_have_final_cap() or
  alternative_has_cap_*().

* Hibernation is only possible after userspace is brought up, which
  happens after system cpucaps have been detected and patched. Thus the
  hibernate code can safely use cpus_have_final_cap() or
  alternative_has_cap_*().

* The set_tagged_addr_ctrl() function is only reachable after userspace
  is brought up, which happens after system cpucaps have been detected
  and patched. Thus this can safely use cpus_have_final_cap() or
  alternative_has_cap_*().

* The copy_user_highpage() and copy_highpage() functions are not used
  during this window, and can safely use alternative_has_cap_*().

This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which avoid generating code to test the
system_cpucaps bitmap and should be better for all subsequent calls at
runtime.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:05 +01:00
Mark Rutland
b54b525764 arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGE
We use cpus_have_const_cap() to check for ARM64_HAS_TLB_RANGE, but this
is not necessary and alternative_has_cap_*() would be preferable.

For historical reasons, cpus_have_const_cap() is more complicated than
it needs to be. Before cpucaps are finalized, it will perform a bitmap
test of the system_cpucaps bitmap, and once cpucaps are finalized it
will use an alternative branch. This used to be necessary to handle some
race conditions in the window between cpucap detection and the
subsequent patching of alternatives and static branches, where different
branches could be out-of-sync with one another (or w.r.t. alternative
sequences). Now that we use alternative branches instead of static
branches, these are all patched atomically w.r.t. one another, and there
are only a handful of cases that need special care in the window between
cpucap detection and alternative patching.

Due to the above, it would be nice to remove cpus_have_const_cap(), and
migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(),
or cpus_have_cap() depending on when their requirements. This will
remove redundant instructions and improve code generation, and will make
it easier to determine how each callsite will behave before, during, and
after alternative patching.

In the window between detecting the ARM64_HAS_TLB_RANGE cpucap and
patching alternative branches, we do not perform any TLB invalidation,
and even if we were to perform TLB invalidation here it would not be
functionally necessary to optimize this by using range invalidation.
Hence there's no need to use cpus_have_const_cap(), and
alternative_has_cap_unlikely() is sufficient.

This patch replaces the use of cpus_have_const_cap() with
alternative_has_cap_unlikely(), which will avoid generating code to test
the system_cpucaps bitmap and should be better for all subsequent calls
at runtime.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2023-10-16 14:17:05 +01:00