Commit Graph

7565 Commits

Author SHA1 Message Date
Christophe Leroy
28da734d58 powerpc/machdep: Define 'compatibles' property in ppc_md and use it
Most probe functions that do not use the 'compatible' string do
nothing else than checking whether the machine is compatible with
one of the strings in a NULL terminated table of strings.

Define that table of strings in ppc_md structure and check it directly
from probe_machine() instead of using ppc_md.probe() for that.

Keep checking in ppc_md.probe() only for more complex probing.

All .compatible could be replaced with a single element NULL
terminated list but that's not worth the churn. Can be do incrementaly
in follow-up patches.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231214103152.12269-4-mpe@ellerman.id.au
2024-03-03 22:20:29 +11:00
Christophe Leroy
3c8016e681 powerpc: Refactor __kernel_map_pages()
__kernel_map_pages() is almost identical for PPC32 and RADIX.

Refactor it.

On PPC32 it is not needed for KFENCE, but to keep it simple
just make it similar to PPC64.

Move the prototype of hash__kernel_map_pages() into mmu_decl.h to allow
IS_ENABLED() to work on 32-bit.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/3656d47c53bff577739dac536dbae31fff52f6d8.1708078640.git.christophe.leroy@csgroup.eu
2024-03-03 22:18:45 +11:00
Linus Torvalds
ab0a97cffa Merge tag 'powerpc-6.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:

 - Fix a crash when hot adding a PCI device to an LPAR since
   recent changes

 - Fix nested KVM level-2 guest reboot failure due to empty
   'arch_compat'

Thanks to Amit Machhiwal, Aneesh Kumar K.V (IBM), Brian King, Gaurav
Batra, and Vaibhav Jain.

* tag 'powerpc-6.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  KVM: PPC: Book3S HV: Fix L2 guest reboot failure due to empty 'arch_compat'
  powerpc/pseries/iommu: DLPAR add doesn't completely initialize pci_controller
2024-02-24 16:49:51 -08:00
Nathan Lynch
fad87dbd48 powerpc/rtas: use correct function name for resetting TCE tables
The PAPR spec spells the function name as

  "ibm,reset-pe-dma-windows"

but in practice firmware uses the singular form:

  "ibm,reset-pe-dma-window"

in the device tree. Since we have the wrong spelling in the RTAS
function table, reverse lookups (token -> name) fail and warn:

  unexpected failed lookup for token 86
  WARNING: CPU: 1 PID: 545 at arch/powerpc/kernel/rtas.c:659 __do_enter_rtas_trace+0x2a4/0x2b4
  CPU: 1 PID: 545 Comm: systemd-udevd Not tainted 6.8.0-rc4 #30
  Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NL1060_028) hv:phyp pSeries
  NIP [c0000000000417f0] __do_enter_rtas_trace+0x2a4/0x2b4
  LR [c0000000000417ec] __do_enter_rtas_trace+0x2a0/0x2b4
  Call Trace:
   __do_enter_rtas_trace+0x2a0/0x2b4 (unreliable)
   rtas_call+0x1f8/0x3e0
   enable_ddw.constprop.0+0x4d0/0xc84
   dma_iommu_dma_supported+0xe8/0x24c
   dma_set_mask+0x5c/0xd8
   mlx5_pci_init.constprop.0+0xf0/0x46c [mlx5_core]
   probe_one+0xfc/0x32c [mlx5_core]
   local_pci_probe+0x68/0x12c
   pci_call_probe+0x68/0x1ec
   pci_device_probe+0xbc/0x1a8
   really_probe+0x104/0x570
   __driver_probe_device+0xb8/0x224
   driver_probe_device+0x54/0x130
   __driver_attach+0x158/0x2b0
   bus_for_each_dev+0xa8/0x120
   driver_attach+0x34/0x48
   bus_add_driver+0x174/0x304
   driver_register+0x8c/0x1c4
   __pci_register_driver+0x68/0x7c
   mlx5_init+0xb8/0x118 [mlx5_core]
   do_one_initcall+0x60/0x388
   do_init_module+0x7c/0x2a4
   init_module_from_file+0xb4/0x108
   idempotent_init_module+0x184/0x34c
   sys_finit_module+0x90/0x114

And oopses are possible when lockdep is enabled or the RTAS
tracepoints are active, since those paths dereference the result of
the lookup.

Use the correct spelling to match firmware's behavior, adjusting the
related constants to match.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 8252b88294 ("powerpc/rtas: improve function information lookups")
Reported-by: Gaurav Batra <gbatra@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240222-rtas-fix-ibm-reset-pe-dma-window-v1-1-7aaf235ac63c@linux.ibm.com
2024-02-23 17:21:12 +11:00
David Hildenbrand
4d5bf0b618 mm/mmu_gather: add tlb_remove_tlb_entries()
Let's add a helper that lets us batch-process multiple consecutive PTEs.

Note that the loop will get optimized out on all architectures except on
powerpc.  We have to add an early define of __tlb_remove_tlb_entry() on
ppc to make the compiler happy (and avoid making tlb_remove_tlb_entries()
a macro).

[arnd@kernel.org: change __tlb_remove_tlb_entry() to an inline function]
  Link: https://lkml.kernel.org/r/20240221154549.2026073-1-arnd@kernel.org
Link: https://lkml.kernel.org/r/20240214204435.167852-8-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22 15:27:17 -08:00
David Hildenbrand
f7dc4d689e powerpc/pgtable: define PFN_PTE_SHIFT
We want to make use of pte_next_pfn() outside of set_ptes().  Let's simply
define PFN_PTE_SHIFT, required by pte_next_pfn().

Link: https://lkml.kernel.org/r/20240129124649.189745-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Russell King (Oracle) <linux@armlinux.org.uk>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22 10:24:50 -08:00
Christophe Leroy
09ca1b1171 powerpc: Implement set_memory_rox()
Same as x86 and s390, add set_memory_rox() to avoid doing
one pass with set_memory_ro() and a second pass with set_memory_x().

See commit 60463628c9 ("x86/mm: Implement native set_memory_rox()")
and commit 22e99fa564 ("s390/mm: implement set_memory_rox()") for
more information.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/dc9a794f82ab62572d7d0be5cb4b8b27920a4f78.1708078316.git.christophe.leroy@csgroup.eu
2024-02-22 21:55:33 +11:00
Christophe Leroy
d5835fb60b powerpc: Use user_mode() macro when possible
There is a nice macro to check user mode.

Use it instead of open coding anding with MSR_PR to increase
readability and avoid having to comment what that anding is for.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/fbf74887dcf1f1ba9e1680fc3247cbb581b00662.1708078228.git.christophe.leroy@csgroup.eu
2024-02-22 21:55:33 +11:00
Christophe Leroy
9e00743aba powerpc/trace: Restrict hash_fault trace event to HASH MMU
'perf list' on powerpc 8xx shows an event named "1:hash_fault".

This event is pointless because trace_hash_fault() is called only
from mm/book3s64/hash_utils.c

Only define it when CONFIG_PPC_64S_HASH_MMU is selected.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/85a86e51b4ab26ce4b592984cc0a0851a3cc9479.1708076780.git.christophe.leroy@csgroup.eu
2024-02-22 21:55:33 +11:00
Nicholas Piggin
6735fef14c powerpc/ps3: Fix lv1 hcall assembly for ELFv2 calling convention
Stack-passed parameters begin at a different offset in the caller's
stack in the ELFv2 ABI.

Reported-by: Geoff Levand <geoff@infradead.org>
Fixes: 8c5fa3b5c4 ("powerpc/64: Make ELFv2 the default for big-endian builds")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231227072405.63751-2-npiggin@gmail.com
2024-02-21 23:14:52 +11:00
Nicholas Piggin
6e9de2054e powerpc/pseries: Set CPU_FTR_DBELL according to ibm,pi-features
PAPR will define a new ibm,pi-features bit which says that doorbells
should not be used even on architectures where they exist. This could be
because they are emulated and slower than using the interrupt controller
directly for IPIs.

Wire this bit into the pi-features parser to clear CPU_FTR_DBELL, and
ensure CPU_FTR_DBELL is not in CPU_FTRS_ALWAYS.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240207035220.339726-2-npiggin@gmail.com
2024-02-21 23:14:52 +11:00
Madhavan Srinivasan
c2ed087ed3 powerpc: Add Power11 architected and raw mode
Add CPU table entries for raw and architected mode. Most fields are
copied from the Power10 table entries.

CPU, MMU and user (ELF_HWCAP) features are unchanged vs P10. However
userspace can detect P11 because the AT_PLATFORM value changes to
"power11".

The logical PVR value of 0x0F000007, passed to firmware via the
ibm_arch_vec, indicates the kernel can support a P11 compatible CPU,
which means at least ISA v3.1 compliant.

Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240221044623.1598642-1-mpe@ellerman.id.au
2024-02-21 23:11:00 +11:00
Shrikanth Hegde
8c328de8fd powerpc: Remove duplicate/unnecessary ifdefs
When an ifdef is used in the below manner, second one could be considered
as duplicate.

  ifdef DEFINE_A
  ...code block...
  ifdef DEFINE_A       <-- This is a duplicate.
  ...code block...
  endif
  else
  ifndef DEFINE_A     <-- This is also duplicate.
  ...code block...
  endif
  endif

More details about the script and methods used to find these code
patterns are in cover letter of [1].

Few places in arch/powerpc where this pattern was seen:

  paca.h:
  Hunk1: Code is under check of CONFIG_PPC64 from line 13, hence the
  second CONFIG_PPC64 at line 166 is a duplicate.
  Hunk2: CONFIG_PPC_BOOK3S_64 was defined back to back. Merged the two
  ifdefs.

  asm-offsets.c:
  Code is under check of CONFIG_PPC64 from line 176 hence second
  CONFIG_PPC64 at line 249 is a duplicate.

  powermac/feature.c:
  #ifndef CONFIG_PPC64 is used at line 2066. And then in #else again
  #ifdef CONFIG_PPC64 is used. Which is a duplicate since in #else means
  CONFIG_PPC64 is defined.

  xmon.c:
  Code is under the check of CONFIG_SMP from line 521 hence the same
  check of CONFIG_SMP at line 646 is a duplicate.

No functional change is intended here. It only aims to improve code
readability.

[1] https://lore.kernel.org/all/20240118080326.13137-1-sshegde@linux.ibm.com/

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240216053016.528906-1-sshegde@linux.ibm.com
2024-02-21 15:15:40 +11:00
Gaurav Batra
a5c57fd2e9 powerpc/pseries/iommu: DLPAR add doesn't completely initialize pci_controller
When a PCI device is dynamically added, the kernel oopses with a NULL
pointer dereference:

  BUG: Kernel NULL pointer dereference on read at 0x00000030
  Faulting instruction address: 0xc0000000006bbe5c
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: rpadlpar_io rpaphp rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs xsk_diag bonding nft_compat nf_tables nfnetlink rfkill binfmt_misc dm_multipath rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_umad ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm mlx5_ib ib_uverbs ib_core pseries_rng drm drm_panel_orientation_quirks xfs libcrc32c mlx5_core mlxfw sd_mod t10_pi sg tls ibmvscsi ibmveth scsi_transport_srp vmx_crypto pseries_wdt psample dm_mirror dm_region_hash dm_log dm_mod fuse
  CPU: 17 PID: 2685 Comm: drmgr Not tainted 6.7.0-203405+ #66
  Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_008) hv:phyp pSeries
  NIP:  c0000000006bbe5c LR: c000000000a13e68 CTR: c0000000000579f8
  REGS: c00000009924f240 TRAP: 0300   Not tainted  (6.7.0-203405+)
  MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24002220  XER: 20040006
  CFAR: c000000000a13e64 DAR: 0000000000000030 DSISR: 40000000 IRQMASK: 0
  ...
  NIP sysfs_add_link_to_group+0x34/0x94
  LR  iommu_device_link+0x5c/0x118
  Call Trace:
   iommu_init_device+0x26c/0x318 (unreliable)
   iommu_device_link+0x5c/0x118
   iommu_init_device+0xa8/0x318
   iommu_probe_device+0xc0/0x134
   iommu_bus_notifier+0x44/0x104
   notifier_call_chain+0xb8/0x19c
   blocking_notifier_call_chain+0x64/0x98
   bus_notify+0x50/0x7c
   device_add+0x640/0x918
   pci_device_add+0x23c/0x298
   of_create_pci_dev+0x400/0x884
   of_scan_pci_dev+0x124/0x1b0
   __of_scan_bus+0x78/0x18c
   pcibios_scan_phb+0x2a4/0x3b0
   init_phb_dynamic+0xb8/0x110
   dlpar_add_slot+0x170/0x3b8 [rpadlpar_io]
   add_slot_store.part.0+0xb4/0x130 [rpadlpar_io]
   kobj_attr_store+0x2c/0x48
   sysfs_kf_write+0x64/0x78
   kernfs_fop_write_iter+0x1b0/0x290
   vfs_write+0x350/0x4a0
   ksys_write+0x84/0x140
   system_call_exception+0x124/0x330
   system_call_vectored_common+0x15c/0x2ec

Commit a940904443 ("powerpc/iommu: Add iommu_ops to report capabilities
and allow blocking domains") broke DLPAR add of PCI devices.

The above added iommu_device structure to pci_controller. During
system boot, PCI devices are discovered and this newly added iommu_device
structure is initialized by a call to iommu_device_register().

During DLPAR add of a PCI device, a new pci_controller structure is
allocated but there are no calls made to iommu_device_register()
interface.

Fix is to register the iommu device during DLPAR add as well.

Fixes: a940904443 ("powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains")
Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240215221833.4817-1-gbatra@linux.ibm.com
2024-02-19 16:16:34 +11:00
Linus Torvalds
c02197fc90 Merge tag 'powerpc-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
 "This is a bit of a big batch for rc4, but just due to holiday hangover
  and because I didn't send any fixes last week due to a late revert
  request. I think next week should be back to normal.

   - Fix ftrace bug on boot caused by exit text sections with
     '-fpatchable-function-entry'

   - Fix accuracy of stolen time on pseries since the switch to
     VIRT_CPU_ACCOUNTING_GEN

   - Fix a crash in the IOMMU code when doing DLPAR remove

   - Set pt_regs->link on scv entry to fix BPF stack unwinding

   - Add missing PPC_FEATURE_BOOKE on 64-bit e5500/e6500, which broke
     gdb

   - Fix boot on some 6xx platforms with STRICT_KERNEL_RWX enabled

   - Fix build failures with KASAN enabled and 32KB stack size

   - Some other minor fixes

  Thanks to Arnd Bergmann, Benjamin Gray, Christophe Leroy, David
  Engraf, Gaurav Batra, Jason Gunthorpe, Jiangfeng Xiao, Matthias
  Schiffer, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nysal Jan K.A,
  R Nageswara Sastry, Shivaprasad G Bhat, Shrikanth Hegde, Spoorthy,
  Srikar Dronamraju, and Venkat Rao Bagalkote"

* tag 'powerpc-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/iommu: Fix the missing iommu_group_put() during platform domain attach
  powerpc/pseries: fix accuracy of stolen time
  powerpc/ftrace: Ignore ftrace locations in exit text sections
  powerpc/cputable: Add missing PPC_FEATURE_BOOKE on PPC64 Book-E
  powerpc/kasan: Limit KASAN thread size increase to 32KB
  Revert "powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add"
  powerpc: 85xx: mark local functions static
  powerpc: udbg_memcons: mark functions static
  powerpc/kasan: Fix addr error caused by page alignment
  powerpc/6xx: set High BAT Enable flag on G2_LE cores
  selftests/powerpc/papr_vpd: Check devfd before get_system_loc_code()
  powerpc/64: Set task pt_regs->link to the LR value on scv entry
  powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add
  powerpc/pseries/papr-sysparm: use u8 arrays for payloads
2024-02-17 16:59:31 -08:00
Ricardo B. Marliere
14ce0dbb56 powerpc: ibmebus: make ibmebus_bus_type const
Since commit d492cc2573 ("driver core: device.h: make struct
bus_type a const *"), the driver core can properly handle constant
struct bus_type, move the ibmebus_bus_type variable to be a constant
structure as well, placing it into read-only memory which can not be
modified at runtime.

Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240212-bus_cleanup-powerpc2-v2-5-8441b3f77827@marliere.net
2024-02-15 00:14:06 +11:00
Ricardo B. Marliere
112202f34e powerpc: pmac: make macio_bus_type const
Since commit d492cc2573 ("driver core: device.h: make struct
bus_type a const *"), the driver core can properly handle constant
struct bus_type, move the macio_bus_type variable to be a constant
structure as well, placing it into read-only memory which can not be
modified at runtime.

Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240212-bus_cleanup-powerpc2-v2-4-8441b3f77827@marliere.net
2024-02-15 00:14:06 +11:00
Ricardo B. Marliere
8e3d0b8d99 powerpc: mpic: make mpic_subsys const
Since commit d492cc2573 ("driver core: device.h: make struct
bus_type a const *"), the driver core can properly handle constant
struct bus_type, move the mpic_subsys variable to be a constant
structure as well, placing it into read-only memory which can not be
modified at runtime.

Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240212-bus_cleanup-powerpc2-v2-3-8441b3f77827@marliere.net
2024-02-15 00:14:06 +11:00
Ricardo B. Marliere
565206aaa6 powerpc: vio: make vio_bus_type const
Since commit d492cc2573 ("driver core: device.h: make struct
bus_type a const *"), the driver core can properly handle constant
struct bus_type, move the vio_bus_type variable to be a constant
structure as well, placing it into read-only memory which can not be
modified at runtime.

Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240212-bus_cleanup-powerpc2-v2-2-8441b3f77827@marliere.net
2024-02-15 00:14:06 +11:00
Christophe Leroy
c5aebb53b3 powerpc: Force inlining of arch_vmap_p{u/m}d_supported()
arch_vmap_pud_supported() and arch_vmap_pmd_supported() are
expected to constant-fold to false when RADIX is not enabled.

Force inlining in order to avoid following failure which
leads to unexpected call of non-existing pud_set_huge() and
pmd_set_huge() on powerpc 8xx.

In function 'pud_huge_tests',
    inlined from 'debug_vm_pgtable' at mm/debug_vm_pgtable.c:1399:2:
./arch/powerpc/include/asm/vmalloc.h:9:33: warning: inlining failed in call to 'arch_vmap_pud_supported.isra': call is unlikely and code size would grow [-Winline]
    9 | #define arch_vmap_pud_supported arch_vmap_pud_supported
      |                                 ^~~~~~~~~~~~~~~~~~~~~~~
./arch/powerpc/include/asm/vmalloc.h:10:20: note: in expansion of macro 'arch_vmap_pud_supported'
   10 | static inline bool arch_vmap_pud_supported(pgprot_t prot)
      |                    ^~~~~~~~~~~~~~~~~~~~~~~
./arch/powerpc/include/asm/vmalloc.h:9:33: note: called from here
    9 | #define arch_vmap_pud_supported arch_vmap_pud_supported
mm/debug_vm_pgtable.c:458:14: note: in expansion of macro 'arch_vmap_pud_supported'
  458 |         if (!arch_vmap_pud_supported(args->page_prot) ||
      |              ^~~~~~~~~~~~~~~~~~~~~~~

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402131836.OU1TDuoi-lkp@intel.com/
Fixes: 8309c9d717 ("powerpc: inline huge vmap supported functions")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/bbd84ad52bf377e8d3b5865a906f2dc5d99964ba.1707832677.git.christophe.leroy@csgroup.eu
2024-02-15 00:14:05 +11:00
Michael Ellerman
0875f1ceba powerpc/smp: Remap boot CPU onto core 0 if >= nr_cpu_ids
If nr_cpu_ids is too low to include the boot CPU, remap the boot CPU
onto logical core 0.

This is achieved in two stages. In early_init_dt_scan_cpus() the boot
CPU is renumbered to be on logical core 0, and the original boot core's
hardware ID is recorded.

Later in smp_setup_cpu_maps(), if the original boot core ID is set, the
logical CPU numbers on the 0th core are skipped in the normal device
tree search over CPU device tree nodes. Then the search is continued
until the device tree node matching the boot core is found, and those
CPUs are assigned the CPU numbers starting at 0.

This allows kdump kernels to be booted with low values for nr_cpu_ids
to conserve memory, while also allowing the crashing/boot CPU to be
any CPU.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Wen Xiong <wenxiong@us.ibm.com>
Link: https://msgid.link/20231229120107.2281153-5-mpe@ellerman.id.au
2024-02-15 00:14:05 +11:00
Naveen N Rao
ea73179e64 powerpc/ftrace: Ignore ftrace locations in exit text sections
Michael reported that we are seeing an ftrace bug on bootup when KASAN
is enabled and we are using -fpatchable-function-entry:

  ftrace: allocating 47780 entries in 18 pages
  ftrace-powerpc: 0xc0000000020b3d5c: No module provided for non-kernel address
  ------------[ ftrace bug ]------------
  ftrace faulted on modifying
  [<c0000000020b3d5c>] 0xc0000000020b3d5c
  Initializing ftrace call sites
  ftrace record flags: 0
   (0)
   expected tramp: c00000000008cef4
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:2180 ftrace_bug+0x3c0/0x424
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0-rc3-00120-g0f71dcfb4aef #860
  Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries
  NIP:  c0000000003aa81c LR: c0000000003aa818 CTR: 0000000000000000
  REGS: c0000000033cfab0 TRAP: 0700   Not tainted  (6.5.0-rc3-00120-g0f71dcfb4aef)
  MSR:  8000000002021033 <SF,VEC,ME,IR,DR,RI,LE>  CR: 28028240  XER: 00000000
  CFAR: c0000000002781a8 IRQMASK: 3
  ...
  NIP [c0000000003aa81c] ftrace_bug+0x3c0/0x424
  LR [c0000000003aa818] ftrace_bug+0x3bc/0x424
  Call Trace:
   ftrace_bug+0x3bc/0x424 (unreliable)
   ftrace_process_locs+0x5f4/0x8a0
   ftrace_init+0xc0/0x1d0
   start_kernel+0x1d8/0x484

With CONFIG_FTRACE_MCOUNT_USE_PATCHABLE_FUNCTION_ENTRY=y and
CONFIG_KASAN=y, compiler emits nops in functions that it generates for
registering and unregistering global variables (unlike with -pg and
-mprofile-kernel where calls to _mcount() are not generated in those
functions). Those functions then end up in INIT_TEXT and EXIT_TEXT
respectively. We don't expect to see any profiled functions in
EXIT_TEXT, so ftrace_init_nop() assumes that all addresses that aren't
in the core kernel text belongs to a module. Since these functions do
not match that criteria, we see the above bug.

Address this by having ftrace ignore all locations in the text exit
sections of vmlinux.

Fixes: 0f71dcfb4a ("powerpc/ftrace: Add support for -fpatchable-function-entry")
Cc: stable@vger.kernel.org # v6.6+
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Reviewed-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240213175410.1091313-1-naveen@kernel.org
2024-02-14 14:24:06 +11:00
Michael Ellerman
f1acb10950 powerpc/kasan: Limit KASAN thread size increase to 32KB
KASAN is seen to increase stack usage, to the point that it was reported
to lead to stack overflow on some 32-bit machines (see link).

To avoid overflows the stack size was doubled for KASAN builds in
commit 3e8635fb2e ("powerpc/kasan: Force thread size increase with
KASAN").

However with a 32KB stack size to begin with, the doubling leads to a
64KB stack, which causes build errors:
  arch/powerpc/kernel/switch.S:249: Error: operand out of range (0x000000000000fe50 is not between 0xffffffffffff8000 and 0x0000000000007fff)

Although the asm could be reworked, in practice a 32KB stack seems
sufficient even for KASAN builds - the additional usage seems to be in
the 2-3KB range for a 64-bit KASAN build.

So only increase the stack for KASAN if the stack size is < 32KB.

Fixes: 18f14afe28 ("powerpc/64s: Increase default stack size to 32KB")
Reported-by: Spoorthy <spoorthy@linux.ibm.com>
Reported-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Benjamin Gray <bgray@linux.ibm.com>
Link: https://lore.kernel.org/linuxppc-dev/bug-207129-206035@https.bugzilla.kernel.org%2F/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240212064244.3924505-1-mpe@ellerman.id.au
2024-02-14 14:24:06 +11:00
Michael Ellerman
1fba2bf8e9 Revert "powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add"
This reverts commit ed8b94f6e0.

Gaurav reported that there are still problems with the patch and it
should be reverted pending a fuller fix.

Link: https://lore.kernel.org/all/4f6fc1ac-7a76-4447-9d0e-f55c0be373f8@linux.ibm.com/
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2024-02-14 14:24:00 +11:00
Linus Torvalds
4356e9f841 work around gcc bugs with 'asm goto' with outputs
We've had issues with gcc and 'asm goto' before, and we created a
'asm_volatile_goto()' macro for that in the past: see commits
3f0116c323 ("compiler/gcc4: Add quirk for 'asm goto' miscompilation
bug") and a9f180345f ("compiler/gcc4: Make quirk for
asm_volatile_goto() unconditional").

Then, much later, we ended up removing the workaround in commit
43c249ea0b ("compiler-gcc.h: remove ancient workaround for gcc PR
58670") because we no longer supported building the kernel with the
affected gcc versions, but we left the macro uses around.

Now, Sean Christopherson reports a new version of a very similar
problem, which is fixed by re-applying that ancient workaround.  But the
problem in question is limited to only the 'asm goto with outputs'
cases, so instead of re-introducing the old workaround as-is, let's
rename and limit the workaround to just that much less common case.

It looks like there are at least two separate issues that all hit in
this area:

 (a) some versions of gcc don't mark the asm goto as 'volatile' when it
     has outputs:

        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110420

     which is easy to work around by just adding the 'volatile' by hand.

 (b) Internal compiler errors:

        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110422

     which are worked around by adding the extra empty 'asm' as a
     barrier, as in the original workaround.

but the problem Sean sees may be a third thing since it involves bad
code generation (not an ICE) even with the manually added 'volatile'.

but the same old workaround works for this case, even if this feels a
bit like voodoo programming and may only be hiding the issue.

Reported-and-tested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Andrew Pinski <quic_apinski@quicinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-02-09 15:57:48 -08:00
Paolo Bonzini
6bda055d62 KVM: define __KVM_HAVE_GUEST_DEBUG unconditionally
Since all architectures (for historical reasons) have to define
struct kvm_guest_debug_arch, and since userspace has to check
KVM_CHECK_EXTENSION(KVM_CAP_SET_GUEST_DEBUG) anyway, there is
no advantage in masking the capability #define itself.  Remove
the #define __KVM_HAVE_GUEST_DEBUG from architecture-specific
headers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-08 08:41:06 -05:00
Paolo Bonzini
d750951c9e KVM: powerpc: move powerpc-specific structs to uapi/asm/kvm.h
While this in principle breaks the appearance of KVM_PPC_* ioctls on architectures
other than powerpc, this seems unlikely to be a problem considering that there are
already many "struct kvm_ppc_*" definitions in arch/powerpc/include/uapi.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-08 08:41:05 -05:00
Matthias Schiffer
a038a3ff8c powerpc/6xx: set High BAT Enable flag on G2_LE cores
MMU_FTR_USE_HIGH_BATS is set for G2_LE cores and derivatives like e300cX,
but the high BATs need to be enabled in HID2 to work. Add register
definitions and add the needed setup to __setup_cpu_603.

This fixes boot on CPUs like the MPC5200B with STRICT_KERNEL_RWX enabled
on systems where the flag has not been set by the bootloader already.

Fixes: e4d6654ebe ("powerpc/mm/32s: rework mmu_mapin_ram()")
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240124103838.43675-1-matthias.schiffer@ew.tq-group.com
2024-02-05 22:54:49 +11:00
Gaurav Batra
ed8b94f6e0 powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add
When a PCI device is dynamically added, the kernel oopses with a NULL
pointer dereference:

  BUG: Kernel NULL pointer dereference on read at 0x00000030
  Faulting instruction address: 0xc0000000006bbe5c
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in: rpadlpar_io rpaphp rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs xsk_diag bonding nft_compat nf_tables nfnetlink rfkill binfmt_misc dm_multipath rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_umad ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm mlx5_ib ib_uverbs ib_core pseries_rng drm drm_panel_orientation_quirks xfs libcrc32c mlx5_core mlxfw sd_mod t10_pi sg tls ibmvscsi ibmveth scsi_transport_srp vmx_crypto pseries_wdt psample dm_mirror dm_region_hash dm_log dm_mod fuse
  CPU: 17 PID: 2685 Comm: drmgr Not tainted 6.7.0-203405+ #66
  Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NH1060_008) hv:phyp pSeries
  NIP:  c0000000006bbe5c LR: c000000000a13e68 CTR: c0000000000579f8
  REGS: c00000009924f240 TRAP: 0300   Not tainted  (6.7.0-203405+)
  MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24002220  XER: 20040006
  CFAR: c000000000a13e64 DAR: 0000000000000030 DSISR: 40000000 IRQMASK: 0
  ...
  NIP sysfs_add_link_to_group+0x34/0x94
  LR  iommu_device_link+0x5c/0x118
  Call Trace:
   iommu_init_device+0x26c/0x318 (unreliable)
   iommu_device_link+0x5c/0x118
   iommu_init_device+0xa8/0x318
   iommu_probe_device+0xc0/0x134
   iommu_bus_notifier+0x44/0x104
   notifier_call_chain+0xb8/0x19c
   blocking_notifier_call_chain+0x64/0x98
   bus_notify+0x50/0x7c
   device_add+0x640/0x918
   pci_device_add+0x23c/0x298
   of_create_pci_dev+0x400/0x884
   of_scan_pci_dev+0x124/0x1b0
   __of_scan_bus+0x78/0x18c
   pcibios_scan_phb+0x2a4/0x3b0
   init_phb_dynamic+0xb8/0x110
   dlpar_add_slot+0x170/0x3b8 [rpadlpar_io]
   add_slot_store.part.0+0xb4/0x130 [rpadlpar_io]
   kobj_attr_store+0x2c/0x48
   sysfs_kf_write+0x64/0x78
   kernfs_fop_write_iter+0x1b0/0x290
   vfs_write+0x350/0x4a0
   ksys_write+0x84/0x140
   system_call_exception+0x124/0x330
   system_call_vectored_common+0x15c/0x2ec

Commit a940904443 ("powerpc/iommu: Add iommu_ops to report capabilities
and allow blocking domains") broke DLPAR add of PCI devices.

The above added iommu_device structure to pci_controller. During
system boot, PCI devices are discovered and this newly added iommu_device
structure is initialized by a call to iommu_device_register().

During DLPAR add of a PCI device, a new pci_controller structure is
allocated but there are no calls made to iommu_device_register()
interface.

Fix is to register the iommu device during DLPAR add as well.

Fixes: a940904443 ("powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains")
Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com>
[mpe: Trim oops and tweak some change log wording]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240122222407.39603-1-gbatra@linux.ibm.com
2024-02-05 22:37:37 +11:00
Nathan Lynch
8ded03ae48 powerpc/pseries/papr-sysparm: use u8 arrays for payloads
Some PAPR system parameter values are formatted by firmware as
nul-terminated strings (e.g. LPAR name, shared processor attributes).
But the values returned for other parameters, such as processor module
info and TLB block invalidate characteristics, are binary data with
parameter-specific layouts. So char[] isn't the appropriate type for
the general case. Use u8/__u8.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 905b9e4878 ("powerpc/pseries/papr-sysparm: Expose character device to user space")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240202-papr-sysparm-ioblock-data-use-u8-v1-1-f5c6c89f65ec@linux.ibm.com
2024-02-05 17:01:25 +11:00
Christophe Leroy
79d9f965ec powerpc: Simplify strict_kernel_rwx_enabled()
Now that rodata_enabled is always declared, remove #ifdef
and define a single version of strict_kernel_rwx_enabled().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2024-02-02 10:21:25 -08:00
Tanzir Hasan
66a5c40f60 kernel.h: removed REPEAT_BYTE from kernel.h
This patch creates wordpart.h and includes it in asm/word-at-a-time.h
for all architectures. WORD_AT_A_TIME_CONSTANTS depends on kernel.h
because of REPEAT_BYTE. Moving this to another header and including it
where necessary allows us to not include the bloated kernel.h. Making
this implicit dependency on REPEAT_BYTE explicit allows for later
improvements in the lib/string.c inclusion list.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Tanzir Hasan <tanzirh@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Link: https://lore.kernel.org/r/20231226-libstringheader-v6-1-80aa08c7652c@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
2024-02-01 09:47:59 -08:00
Linus Torvalds
bd736f38c0 Merge tag 'tty-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial updates from Greg KH:
 "Here is the big set of tty and serial driver changes for 6.8-rc1.

  As usual, Jiri has a bunch of refactoring and cleanups for the tty
  core and drivers in here, along with the usual set of rs485 updates
  (someday this might work properly...)

  Along with those, in here are changes for:

   - sc16is7xx serial driver updates

   - platform driver removal api updates

   - amba-pl011 driver updates

   - tty driver binding updates

   - other small tty/serial driver updates and changes

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (197 commits)
  serial: sc16is7xx: refactor EFR lock
  serial: sc16is7xx: reorder code to remove prototype declarations
  serial: sc16is7xx: refactor FIFO access functions to increase commonality
  serial: sc16is7xx: drop unneeded MODULE_ALIAS
  serial: sc16is7xx: replace hardcoded divisor value with BIT() macro
  serial: sc16is7xx: add explicit return for some switch default cases
  serial: sc16is7xx: add macro for max number of UART ports
  serial: sc16is7xx: add driver name to struct uart_driver
  serial: sc16is7xx: use i2c_get_match_data()
  serial: sc16is7xx: use spi_get_device_match_data()
  serial: sc16is7xx: use DECLARE_BITMAP for sc16is7xx_lines bitfield
  serial: sc16is7xx: improve do/while loop in sc16is7xx_irq()
  serial: sc16is7xx: remove obsolete loop in sc16is7xx_port_irq()
  serial: sc16is7xx: set safe default SPI clock frequency
  serial: sc16is7xx: add check for unsupported SPI modes during probe
  serial: sc16is7xx: fix invalid sc16is7xx_lines bitfield in case of probe error
  serial: 8250_exar: Set missing rs485_supported flag
  serial: omap: do not override settings for RS485 support
  serial: core, imx: do not set RS485 enabled if it is not supported
  serial: core: make sure RS485 cannot be enabled when it is not supported
  ...
2024-01-18 11:37:24 -08:00
Linus Torvalds
09d1c6a80f Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "Generic:

   - Use memdup_array_user() to harden against overflow.

   - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all
     architectures.

   - Clean up Kconfigs that all KVM architectures were selecting

   - New functionality around "guest_memfd", a new userspace API that
     creates an anonymous file and returns a file descriptor that refers
     to it. guest_memfd files are bound to their owning virtual machine,
     cannot be mapped, read, or written by userspace, and cannot be
     resized. guest_memfd files do however support PUNCH_HOLE, which can
     be used to switch a memory area between guest_memfd and regular
     anonymous memory.

   - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
     per-page attributes for a given page of guest memory; right now the
     only attribute is whether the guest expects to access memory via
     guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
     TDX or ARM64 pKVM is checked by firmware or hypervisor that
     guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in
     the case of pKVM).

  x86:

   - Support for "software-protected VMs" that can use the new
     guest_memfd and page attributes infrastructure. This is mostly
     useful for testing, since there is no pKVM-like infrastructure to
     provide a meaningfully reduced TCB.

   - Fix a relatively benign off-by-one error when splitting huge pages
     during CLEAR_DIRTY_LOG.

   - Fix a bug where KVM could incorrectly test-and-clear dirty bits in
     non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with
     a non-huge SPTE.

   - Use more generic lockdep assertions in paths that don't actually
     care about whether the caller is a reader or a writer.

   - let Xen guests opt out of having PV clock reported as "based on a
     stable TSC", because some of them don't expect the "TSC stable" bit
     (added to the pvclock ABI by KVM, but never set by Xen) to be set.

   - Revert a bogus, made-up nested SVM consistency check for
     TLB_CONTROL.

   - Advertise flush-by-ASID support for nSVM unconditionally, as KVM
     always flushes on nested transitions, i.e. always satisfies flush
     requests. This allows running bleeding edge versions of VMware
     Workstation on top of KVM.

   - Sanity check that the CPU supports flush-by-ASID when enabling SEV
     support.

   - On AMD machines with vNMI, always rely on hardware instead of
     intercepting IRET in some cases to detect unmasking of NMIs

   - Support for virtualizing Linear Address Masking (LAM)

   - Fix a variety of vPMU bugs where KVM fail to stop/reset counters
     and other state prior to refreshing the vPMU model.

   - Fix a double-overflow PMU bug by tracking emulated counter events
     using a dedicated field instead of snapshotting the "previous"
     counter. If the hardware PMC count triggers overflow that is
     recognized in the same VM-Exit that KVM manually bumps an event
     count, KVM would pend PMIs for both the hardware-triggered overflow
     and for KVM-triggered overflow.

   - Turn off KVM_WERROR by default for all configs so that it's not
     inadvertantly enabled by non-KVM developers, which can be
     problematic for subsystems that require no regressions for W=1
     builds.

   - Advertise all of the host-supported CPUID bits that enumerate
     IA32_SPEC_CTRL "features".

   - Don't force a masterclock update when a vCPU synchronizes to the
     current TSC generation, as updating the masterclock can cause
     kvmclock's time to "jump" unexpectedly, e.g. when userspace
     hotplugs a pre-created vCPU.

   - Use RIP-relative address to read kvm_rebooting in the VM-Enter
     fault paths, partly as a super minor optimization, but mostly to
     make KVM play nice with position independent executable builds.

   - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
     CONFIG_HYPERV as a minor optimization, and to self-document the
     code.

   - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV
     "emulation" at build time.

  ARM64:

   - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base
     granule sizes. Branch shared with the arm64 tree.

   - Large Fine-Grained Trap rework, bringing some sanity to the
     feature, although there is more to come. This comes with a prefix
     branch shared with the arm64 tree.

   - Some additional Nested Virtualization groundwork, mostly
     introducing the NV2 VNCR support and retargetting the NV support to
     that version of the architecture.

   - A small set of vgic fixes and associated cleanups.

  Loongarch:

   - Optimization for memslot hugepage checking

   - Cleanup and fix some HW/SW timer issues

   - Add LSX/LASX (128bit/256bit SIMD) support

  RISC-V:

   - KVM_GET_REG_LIST improvement for vector registers

   - Generate ISA extension reg_list using macros in get-reg-list
     selftest

   - Support for reporting steal time along with selftest

  s390:

   - Bugfixes

  Selftests:

   - Fix an annoying goof where the NX hugepage test prints out garbage
     instead of the magic token needed to run the test.

   - Fix build errors when a header is delete/moved due to a missing
     flag in the Makefile.

   - Detect if KVM bugged/killed a selftest's VM and print out a helpful
     message instead of complaining that a random ioctl() failed.

   - Annotate the guest printf/assert helpers with __printf(), and fix
     the various bugs that were lurking due to lack of said annotation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits)
  x86/kvm: Do not try to disable kvmclock if it was not enabled
  KVM: x86: add missing "depends on KVM"
  KVM: fix direction of dependency on MMU notifiers
  KVM: introduce CONFIG_KVM_COMMON
  KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd
  KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache
  RISC-V: KVM: selftests: Add get-reg-list test for STA registers
  RISC-V: KVM: selftests: Add steal_time test support
  RISC-V: KVM: selftests: Add guest_sbi_probe_extension
  RISC-V: KVM: selftests: Move sbi_ecall to processor.c
  RISC-V: KVM: Implement SBI STA extension
  RISC-V: KVM: Add support for SBI STA registers
  RISC-V: KVM: Add support for SBI extension registers
  RISC-V: KVM: Add SBI STA info to vcpu_arch
  RISC-V: KVM: Add steal-update vcpu request
  RISC-V: KVM: Add SBI STA extension skeleton
  RISC-V: paravirt: Implement steal-time support
  RISC-V: Add SBI STA extension definitions
  RISC-V: paravirt: Add skeleton for pv-time support
  RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr()
  ...
2024-01-17 13:03:37 -08:00
Linus Torvalds
c299010061 Merge tag 'asm-generic-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic cleanups from Arnd Bergmann:
 "A series from Baoquan He cleans up the asm-generic/io.h to remove the
  ioremap_uc() definition from everything except x86, which still needs
  it for pre-PAT systems. This series notably contains a patch from
  Jiaxun Yang that converts MIPS to use asm-generic/io.h like every
  other architecture does, enabling future cleanups.

  Some of my own patches fix -Wmissing-prototype warnings in
  architecture specific code across several architectures. This is now
  needed as the warning is enabled by default. There are still some
  remaining warnings in minor platforms, but the series should catch
  most of the widely used ones make them more consistent with one
  another.

  David McKay fixes a bug in __generic_cmpxchg_local() when this is used
  on 64-bit architectures. This could currently only affect parisc64 and
  sparc64.

  Additional cleanups address from Linus Walleij, Uwe Kleine-König,
  Thomas Huth, and Kefeng Wang help reduce unnecessary inconsistencies
  between architectures"

* tag 'asm-generic-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic: Fix 32 bit __generic_cmpxchg_local
  Hexagon: Make pfn accessors statics inlines
  ARC: mm: Make virt_to_pfn() a static inline
  mips: remove extraneous asm-generic/iomap.h include
  sparc: Use $(kecho) to announce kernel images being ready
  arm64: vdso32: Define BUILD_VDSO32_64 to correct prototypes
  csky: fix arch_jump_label_transform_static override
  arch: add do_page_fault prototypes
  arch: add missing prepare_ftrace_return() prototypes
  arch: vdso: consolidate gettime prototypes
  arch: include linux/cpu.h for trap_init() prototype
  arch: fix asm-offsets.c building with -Wmissing-prototypes
  arch: consolidate arch_irq_work_raise prototypes
  hexagon: Remove CONFIG_HEXAGON_ARCH_VERSION from uapi header
  asm/io: remove unnecessary xlate_dev_mem_ptr() and unxlate_dev_mem_ptr()
  mips: io: remove duplicated codes
  arch/*/io.h: remove ioremap_uc in some architectures
  mips: add <asm-generic/io.h> including
2024-01-10 18:13:44 -08:00
Michael Ellerman
44a1aad2fe Merge branch 'topic/ppc-kvm' into next
Merge our topic branch containing KVM related patches.
2023-12-29 15:30:45 +11:00
Vaibhav Jain
eb8446e164 powerpc/hvcall: Reorder Nestedv2 hcall opcodes
Reorder the newly introduced hcall opcodes for Nestedv2 to follow the
increasing-opcode-number convention followed in 'hvcall.h'.

Also updates the value for MAX_HCALL_OPCODE which is used in various
places in arch code for range checking. Notably in the KVM enabled-hcall
logic, and in hcall tracing.

Fixes: 19d31c5f11 ("KVM: PPC: Add support for nestedv2 guests")
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231219092309.118151-1-vaibhav@linux.ibm.com
2023-12-21 22:13:06 +11:00
Aneesh Kumar K.V
6f4b7052da powerpc/sched: Cleanup vcpu_is_preempted()
No functional change in this patch. A helper is added to find if
vcpu is dispatched by hypervisor. Use that instead of opencoding.
Also clarify some of the comments.

Signed-off-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231114071219.198222-1-aneesh.kumar@linux.ibm.com
2023-12-15 13:50:39 +11:00
Sathvika Vasireddy
b20f98e8b3 powerpc/Kconfig: Select FUNCTION_ALIGNMENT_4B
Commit d49a062621 ("arch: Introduce CONFIG_FUNCTION_ALIGNMENT")
introduced a generic function-alignment infrastructure. Move to using
FUNCTION_ALIGNMENT_4B on powerpc, to use the same alignment as that of
the existing _GLOBAL macro.

Signed-off-by: Sathvika Vasireddy <sv@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/21892186ec44abe24df0daf64f577dac0e78783f.1702045299.git.naveen@kernel.org
2023-12-13 21:49:22 +11:00
Naveen N Rao
2ec36570c3 powerpc/ftrace: Fix indentation in ftrace.h
Replace seven spaces with a tab character to fix an indentation issue
reported by the kernel test robot.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311221731.alUwTDIm-lkp@intel.com/
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/9f058227bd9243f0842786ef7228d87ab10d29f6.1702045299.git.naveen@kernel.org
2023-12-13 21:49:22 +11:00
Nathan Lynch
905b9e4878 powerpc/pseries/papr-sysparm: Expose character device to user space
Until now the papr_sysparm APIs have been kernel-internal. But user
space needs access to PAPR system parameters too. The only method
available to user space today to get or set system parameters is using
sys_rtas() and /dev/mem to pass RTAS-addressable buffers between user
space and firmware. This is incompatible with lockdown and should be
deprecated.

So provide an alternative ABI to user space in the form of a
/dev/papr-sysparm character device with just two ioctl commands (get
and set). The data payloads involved are small enough to fit in the
ioctl argument buffer, making the code relatively simple.

Exposing the system parameters through sysfs has been considered but
it would be too awkward:

* The kernel currently does not have to contain an exhaustive list of
  defined system parameters. This is a convenient property to maintain
  because we don't have to update the kernel whenever a new parameter
  is added to PAPR. Exporting a named attribute in sysfs for each
  parameter would negate this.

* Some system parameters are text-based and some are not.

* Retrieval of at least one system parameter requires input data,
  which a simple read-oriented interface can't support.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-11-e9eafd0c8c6c@linux.ibm.com
2023-12-13 21:38:21 +11:00
Nathan Lynch
514f6ff436 powerpc/pseries: Add papr-vpd character driver for VPD retrieval
PowerVM LPARs may retrieve Vital Product Data (VPD) for system
components using the ibm,get-vpd RTAS function.

We can expose this to user space with a /dev/papr-vpd character
device, where the programming model is:

  struct papr_location_code plc = { .str = "", }; /* obtain all VPD */
  int devfd = open("/dev/papr-vpd", O_RDONLY);
  int vpdfd = ioctl(devfd, PAPR_VPD_CREATE_HANDLE, &plc);
  size_t size = lseek(vpdfd, 0, SEEK_END);
  char *buf = malloc(size);
  pread(devfd, buf, size, 0);

When a file descriptor is obtained from ioctl(PAPR_VPD_CREATE_HANDLE),
the file contains the result of a complete ibm,get-vpd sequence. The
file contents are immutable from the POV of user space. To get a new
view of the VPD, the client must create a new handle.

This design choice insulates user space from most of the complexities
that ibm,get-vpd brings:

* ibm,get-vpd must be called more than once to obtain complete
  results.

* Only one ibm,get-vpd call sequence should be in progress at a time;
  interleaved sequences will disrupt each other. Callers must have a
  protocol for serializing their use of the function.

* A call sequence in progress may receive a "VPD changed, try again"
  status, requiring the client to abandon the sequence and start
  over.

The memory required for the VPD buffers seems acceptable, around 20KB
for all VPD on one of my systems. And the value of the
/rtas/ibm,vpd-size DT property (the estimated maximum size of VPD) is
consistently 300KB across various systems I've checked.

I've implemented support for this new ABI in the rtas_get_vpd()
function in librtas, which the vpdupdate command currently uses to
populate its VPD database. I've verified that an unmodified vpdupdate
binary generates an identical database when using a librtas.so that
prefers the new ABI.

Along with the papr-vpd.h header exposed to user space, this
introduces a common papr-miscdev.h uapi header to share a base ioctl
ID with similar drivers to come.

Tested-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-9-e9eafd0c8c6c@linux.ibm.com
2023-12-13 21:38:21 +11:00
Nathan Lynch
adf7a019e5 powerpc/rtas: Facilitate high-level call sequences
On RTAS platforms there is a general restriction that the OS must not
enter RTAS on more than one CPU at a time. This low-level
serialization requirement is satisfied by holding a spin
lock (rtas_lock) across most RTAS function invocations.

However, some pseries RTAS functions require multiple successive calls
to complete a logical operation. Beginning a new call sequence for such a
function may disrupt any other sequences of that function already in
progress. Safe and reliable use of these functions effectively
requires higher-level serialization beyond what is already done at the
level of RTAS entry and exit.

Where a sequence-based RTAS function is invoked only through
sys_rtas(), with no in-kernel users, there is no issue as far as the
kernel is concerned. User space is responsible for appropriately
serializing its call sequences. (Whether user space code actually
takes measures to prevent sequence interleaving is another matter.)
Examples of such functions currently include ibm,platform-dump and
ibm,get-vpd.

But where a sequence-based RTAS function has both user space and
in-kernel uesrs, there is a hazard. Even if the in-kernel call sites
of such a function serialize their sequences correctly, a user of
sys_rtas() can invoke the same function at any time, potentially
disrupting a sequence in progress.

So in order to prevent disruption of kernel-based RTAS call sequences,
they must serialize not only with themselves but also with sys_rtas()
users, somehow. Preferably without adding more function-specific hacks
to sys_rtas(). This is a prerequisite for adding an in-kernel call
sequence of ibm,get-vpd, which is in a change to follow.

Note that it has never been feasible for the kernel to prevent
sys_rtas()-based sequences from being disrupted because control
returns to user space on every call. sys_rtas()-based users of these
functions have always been, and continue to be, responsible for
coordinating their call sequences with other users, even those which
may invoke the RTAS functions through less direct means than
sys_rtas(). This is an unavoidable consequence of exposing
sequence-based RTAS functions through sys_rtas().

* Add an optional mutex member to struct rtas_function.

* Statically define a mutex for each RTAS function with known call
  sequence serialization requirements, and assign its address to the
  .lock member of the corresponding function table entry, along with
  justifying commentary.

* In sys_rtas(), if the table entry for the RTAS function being
  called has a populated lock member, acquire it before taking
  rtas_lock and entering RTAS.

* Kernel-based RTAS call sequences are expected to access the
  appropriate mutex explicitly by name. For example, a user of the
  ibm,activate-firmware RTAS function would do:

        int token = rtas_function_token(RTAS_FN_IBM_ACTIVATE_FIRMWARE);
        int fwrc;

        mutex_lock(&rtas_ibm_activate_firmware_lock);

        do {
                fwrc = rtas_call(token, 0, 1, NULL);
        } while (rtas_busy_delay(fwrc));

        mutex_unlock(&rtas_ibm_activate_firmware_lock);

There should be no perceivable change introduced here except that
concurrent callers of the same RTAS function via sys_rtas() may block
on a mutex instead of spinning on rtas_lock.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-6-e9eafd0c8c6c@linux.ibm.com
2023-12-13 21:38:20 +11:00
Nathan Lynch
9592aa5ad5 powerpc/rtas: Add function return status constants
Not all of the generic RTAS function statuses specified in PAPR have
symbolic constants and descriptions in rtas.h. Fix this, providing a
little more background, slightly updating the existing wording, and
improving the formatting.

Reviewed-by: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231212-papr-sys_rtas-vs-lockdown-v6-4-e9eafd0c8c6c@linux.ibm.com
2023-12-13 21:38:20 +11:00
Jiri Slaby (SUSE)
f32fcbedbe tty: hvc: convert to u8 and size_t
Switch character types to u8 and sizes to size_t. To conform to
characters/sizes in the rest of the tty layer.

Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Amit Shah <amit@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux.dev
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20231206073712.17776-13-jirislaby@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08 12:02:37 +01:00
Aneesh Kumar K.V (IBM)
a59c14f6b4 powerpc/book3s64: Avoid __pte_protnone() check in __pte_flags_need_flush()
This reverts commit 1abce0580b ("powerpc/64s: Fix __pte_needs_flush()
false positive warning")

The previous patch dropped the usage of _PAGE_PRIVILEGED with PAGE_NONE.
Hence this check can be dropped.

Signed-off-by: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231204093638.71503-2-aneesh.kumar@kernel.org
2023-12-07 23:34:11 +11:00
Aneesh Kumar K.V (IBM)
773b93f1d1 powerpc/book3s/hash: Drop _PAGE_PRIVILEGED from PAGE_NONE
There used to be a dependency on _PAGE_PRIVILEGED with pte_savedwrite.
But that got dropped by
commit 6a56ccbcf6 ("mm/autonuma: use can_change_(pte|pmd)_writable() to replace savedwrite")

With the change in this patch numa fault pte (pte_protnone()) gets mapped as regular user pte
with RWX cleared (no-access) whereas earlier it used to be mapped _PAGE_PRIVILEGED.

Hash fault handling code gets some WARN_ON added in this patch because
those functions are not expected to get called with _PAGE_READ cleared.
commit 18061c17c8 ("powerpc/mm: Update PROTFAULT handling in the page
fault path") explains the details.

Signed-off-by: "Aneesh Kumar K.V (IBM)" <aneesh.kumar@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231204093638.71503-1-aneesh.kumar@kernel.org
2023-12-07 23:34:11 +11:00
Jordan Niethe
db1dcfae1d KVM: PPC: Book3S HV nestedv2: Register the VPA with the L0
In the nestedv2 case, the L1 may register the L2's VPA with the L0. This
allows the L0 to manage the L2's dispatch count, as well as enable
possible performance optimisations by seeing if certain resources are
not being used by the L2 (such as the PMCs).

Use the H_GUEST_SET_STATE call to inform the L0 of the L2's VPA
address. This can not be done in the H_GUEST_VCPU_RUN input buffer.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231201132618.555031-11-vaibhav@linux.ibm.com
2023-12-07 23:33:08 +11:00
Jordan Niethe
e0d4acbcba KVM: PPC: Book3S HV nestedv2: Avoid reloading the tb offset
The kvmppc_get_tb_offset() getter reloads KVMPPC_GSID_TB_OFFSET from the
L0 for nestedv2 host. This is unnecessary as the value does not change.
KVMPPC_GSID_TB_OFFSET also need not be reloaded in
kvmppc_{s,g}et_dec_expires().

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231201132618.555031-3-vaibhav@linux.ibm.com
2023-12-07 23:33:07 +11:00
Jordan Niethe
7d370e1812 KVM: PPC: Book3S HV nestedv2: Invalidate RPT before deleting a guest
An L0 must invalidate the L2's RPT during H_GUEST_DELETE if this has not
already been done. This is a slow operation that means H_GUEST_DELETE
must return H_BUSY multiple times before completing. Invalidating the
tables before deleting the guest so there is less work for the L0 to do.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231201132618.555031-2-vaibhav@linux.ibm.com
2023-12-07 23:33:07 +11:00