Commit Graph

20375 Commits

Author SHA1 Message Date
Shivank Garg
f74642d81c x86/cpu: Remove redundant CONFIG_NUMA guard around numa_add_cpu()
Remove unnecessary CONFIG_NUMA #ifdef around numa_add_cpu() since the
function is already properly handled in <asm/numa.h> for both NUMA and
non-NUMA configurations. For !CONFIG_NUMA builds, numa_add_cpu() is
defined as an empty function.

Simplify the code without any functionality change.

Testing: Build CONFIG_NUMA=n

Signed-off-by: Shivank Garg <shivankg@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241112072346.428623-1-shivankg@amd.com
2024-11-12 11:00:50 +01:00
Andy Shevchenko
62e724494d x86/cpu: Make sure flag_is_changeable_p() is always being used
When flag_is_changeable_p() is unused, it prevents kernel builds
with clang, `make W=1` and CONFIG_WERROR=y:

arch/x86/kernel/cpu/common.c:351:19: error: unused function 'flag_is_changeable_p' [-Werror,-Wunused-function]
  351 | static inline int flag_is_changeable_p(u32 flag)
      |                   ^~~~~~~~~~~~~~~~~~~~

Fix this by moving core around to make sure flag_is_changeable_p() is
always being used.

See also commit 6863f5643d ("kbuild: allow Clang to find unused static
inline functions for W=1 build").

While at it, fix the argument type to be unsigned long along with
the local variables, although it currently only runs in 32-bit cases.
Besides that, makes it return boolean instead of int. This induces
the change of the returning type of have_cpuid_p() to be boolean
as well.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Link: https://lore.kernel.org/all/20241108153105.1578186-1-andriy.shevchenko%40linux.intel.com
2024-11-08 09:08:48 -08:00
Ard Biesheuvel
577c134d31 x86/stackprotector: Work around strict Clang TLS symbol requirements
GCC and Clang both implement stack protector support based on Thread Local
Storage (TLS) variables, and this is used in the kernel to implement per-task
stack cookies, by copying a task's stack cookie into a per-CPU variable every
time it is scheduled in.

Both now also implement -mstack-protector-guard-symbol=, which permits the TLS
variable to be specified directly. This is useful because it will allow to
move away from using a fixed offset of 40 bytes into the per-CPU area on
x86_64, which requires a lot of special handling in the per-CPU code and the
runtime relocation code.

However, while GCC is rather lax in its implementation of this command line
option, Clang actually requires that the provided symbol name refers to a TLS
variable (i.e., one declared with __thread), although it also permits the
variable to be undeclared entirely, in which case it will use an implicit
declaration of the right type.

The upshot of this is that Clang will emit the correct references to the stack
cookie variable in most cases, e.g.,

  10d:       64 a1 00 00 00 00       mov    %fs:0x0,%eax
                     10f: R_386_32   __stack_chk_guard

However, if a non-TLS definition of the symbol in question is visible in the
same compilation unit (which amounts to the whole of vmlinux if LTO is
enabled), it will drop the per-CPU prefix and emit a load from a bogus
address.

Work around this by using a symbol name that never occurs in C code, and emit
it as an alias in the linker script.

Fixes: 3fb0fdb3bb ("x86/stackprotector/32: Make the canary into a regular percpu variable")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1854
Link: https://lore.kernel.org/r/20241105155801.1779119-2-brgerst@gmail.com
2024-11-08 13:16:00 +01:00
Mike Rapoport (Microsoft)
9bfc4824fd x86/module: prepare module loading for ROX allocations of text
When module text memory will be allocated with ROX permissions, the memory
at the actual address where the module will live will contain invalid
instructions and there will be a writable copy that contains the actual
module code.

Update relocations and alternatives patching to deal with it.

[rppt@kernel.org: fix writable address in cfi_rewrite_endbr()]
  Link: https://lkml.kernel.org/r/ZysRwR29Ji8CcbXc@kernel.org
Link: https://lkml.kernel.org/r/20241023162711.2579610-7-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: kdevops <kdevops@lists.linux.dev>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Song Liu <song@kernel.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07 14:25:16 -08:00
Oscar Salvador
1317a5e7f7 arch/x86: teach arch_get_unmapped_area_vmflags to handle hugetlb mappings
We want to stop special casing hugetlb mappings and make them go through
generic channels, so teach arch_get_unmapped_area_{topdown_}vmflags to
handle those.

x86 specific hugetlb function does not set either info.start_gap or
info.align_offset so the same here for compatibility.

Link: https://lkml.kernel.org/r/20241007075037.267650-4-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06 20:11:10 -08:00
Mario Limonciello
b79276dcac ACPI: processor: Move arch_init_invariance_cppc() call later
arch_init_invariance_cppc() is called at the end of
acpi_cppc_processor_probe() in order to configure frequency invariance
based upon the values from _CPC.

This however doesn't work on AMD CPPC shared memory designs that have
AMD preferred cores enabled because _CPC needs to be analyzed from all
cores to judge if preferred cores are enabled.

This issue manifests to users as a warning since commit 21fb59ab4b
("ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"):
```
Could not retrieve highest performance (-19)
```

However the warning isn't the cause of this, it was actually
commit 279f838a61 ("x86/amd: Detect preferred cores in
amd_get_boost_ratio_numerator()") which exposed the issue.

To fix this problem, change arch_init_invariance_cppc() into a new weak
symbol that is called at the end of acpi_processor_driver_init().
Each architecture that supports it can declare the symbol to override
the weak one.

Define it for x86, in arch/x86/kernel/acpi/cppc.c, and for all of the
architectures using the generic arch_topology.c code.

Fixes: 279f838a61 ("x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()")
Reported-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219431
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/20241104222855.3959267-1-superm1@kernel.org
[ rjw: Changelog edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-06 21:31:36 +01:00
Masami Hiramatsu (Google)
4638d7ebef x86/kprobes: Cleanup kprobes on ftrace code
Cleanup kprobes on ftrace code for x86.
 - Set instruction pointer (ip + MCOUNT_INSN_SIZE) after pre_handler only
   when p->post_handler exists.
 - Use INT3_INSN_SIZE instead of 1.
 - Use instruction_pointer/instruction_pointer_set() functions instead of
   accessing regs->ip directly.

Link: https://lore.kernel.org/all/172951436219.167263.18330240454389154327.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-11-07 01:16:59 +09:00
Tony Luck
9bce6e94c4 x86/resctrl: Support Sub-NUMA cluster mode SNC6
Support Sub-NUMA cluster mode with 6 nodes per L3 cache (SNC6) on some
Intel platforms.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/20241031220213.17991-1-tony.luck@intel.com
2024-11-06 10:49:04 +01:00
Dave Airlie
bf99ceb6e0 Merge tag 'drm-intel-next-2024-11-04' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
drm/i915 feature pull #2 for v6.13:

Features and functionality:

- Pantherlake (PTL) Xe3 LPD display enabling for xe driver (Clint, Suraj,
  Dnyaneshwar, Matt, Gustavo, Radhakrishna, Chaitanya, Haridhar, Juha-Pekka, Ravi)
- Enable dbuf overlap detection on Lunarlake and later (Stanislav, Vinod)
- Allow fastset for HDR infoframe changes (Chaitanya)
- Write DP source OUI also for non-eDP sinks (Imre)

Refactoring and cleanups:
- Independent platform identification for display (Jani)
- Display tracepoint fixes and cleanups (Gustavo)
- Share PCI ID headers between i915 and xe drivers (Jani)
- Use x100 version for full version and release checks (Jani)
- Conversions to struct intel_display (Jani, Ville)
- Reuse DP DPCD and AUX macros in gvt instead of duplication (Jani)
- Use string choice helpers (R Sundar, Sai Teja)
- Remove unused underrun detection irq code (Sai Teja)
- Color management debug improvements and other cleanups (Ville)
- Refactor panel fitter code to a separate file (Ville)
- Use try_cmpxchg() instead of open-coding (Uros Bizjak)

Fixes:
- PSR and Panel Replay fixes and workarounds (Jouni)
- Fix panel power during connector detection (Imre)
- Fix connector detection and modeset races (Imre)
- Fix C20 PHY TX MISC configuration (Gustavo)
- Improve panel fitter validity checks (Ville)
- Fix eDP short HPD interrupt handling while runtime suspended (Imre)
- Propagate DP MST DSC BW overhead/slice calculation errors (Imre)
- Stop hotplug polling for eDP connectors (Imre)
- Workaround panels reporting bad link status after PSR enable (Jouni)
- Panel Replay VRR VSC SDP related workaround and refactor (Animesh, Mitul)
- Fix memory leak on eDP init error path (Shuicheng)
- Fix GVT KVMGT Kconfig dependencies (Arnd Bergmann)
- Fix irq function documentation build warning (Rodrigo)
- Add platform check to power management fuse bit read (Clint)
- Revert kstrdup_const() and kfree_const() usage for clarity (Christophe JAILLET)
- Workaround horizontal odd panning issues in display versions 20 and 30 (Nemesa)
- Fix xe drive HDCP GSC firmware check (Suraj)

Merges:
- Backmerge drm-next to get some KVM changes (Rodrigo)
- Fix a build failure originating from previous backmerge (Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>

# Conflicts:
#	drivers/gpu/drm/i915/display/intel_dp_mst.c
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87h68ni0wd.fsf@intel.com
2024-11-06 09:08:53 +10:00
Mario Limonciello
a5ca1dc46a x86/CPU/AMD: Clear virtualized VMLOAD/VMSAVE on Zen4 client
A number of Zen4 client SoCs advertise the ability to use virtualized
VMLOAD/VMSAVE, but using these instructions is reported to be a cause
of a random host reboot.

These instructions aren't intended to be advertised on Zen4 client
so clear the capability.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219009
2024-11-05 17:48:32 +01:00
Marco Elver
93190bc35d seqlock, treewide: Switch to non-raw seqcount_latch interface
Switch all instrumentable users of the seqcount_latch interface over to
the non-raw interface.

Co-developed-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241104161910.780003-5-elver@google.com
2024-11-05 12:55:35 +01:00
Al Viro
6348be02ee fdget(), trivial conversions
fdget() is the first thing done in scope, all matching fdput() are
immediately followed by leaving the scope.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-03 01:28:06 -05:00
Thomas Weißschuh
7175126a6d x86/vdso: Allocate vvar page from C code
Allocate the vvar page through the standard union vdso_data_store
and remove the custom linker script logic.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-14-b64f0842d512@linutronix.de
2024-11-02 12:37:34 +01:00
Yazen Ghannam
e9876dafa2 x86/mce/apei: Handle variable SMCA BERT record size
The ACPI Boot Error Record Table (BERT) is being used by the kernel to report
errors that occurred in a previous boot. On some modern AMD systems, these
very errors within the BERT are reported through the x86 Common Platform Error
Record (CPER) format which consists of one or more Processor Context
Information Structures.

These context structures provide a starting address and represent an x86 MSR
range in which the data constitutes a contiguous set of MSRs starting from,
and including the starting address.

It's common, for AMD systems that implement this behavior, that the MSR range
represents the MCAX register space used for the Scalable MCA feature. The
apei_smca_report_x86_error() function decodes and passes this information
through the MCE notifier chain. However, this function assumes a fixed
register size based on the original HW/FW implementation.

This assumption breaks with the addition of two new MCAX registers viz.
MCA_SYND1 and MCA_SYND2. These registers are added at the end of the MCAX
register space, so they won't be included when decoding the CPER data.

Rework apei_smca_report_x86_error() to support a variable register array size.
This covers any case where the MSR context information starts at the MCAX
address for MCA_STATUS and ends at any other register within the MCAX register
space.

  [ Yazen: Add Avadhut as co-developer for wrapper changes.]
  [ bp: Massage. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20241022194158.110073-5-avadhut.naik@amd.com
2024-10-31 10:45:59 +01:00
Avadhut Naik
d4fca1358e x86/MCE/AMD: Add support for new MCA_SYND{1,2} registers
Starting with Zen4, AMD's Scalable MCA systems incorporate two new registers:
MCA_SYND1 and MCA_SYND2.

These registers will include supplemental error information in addition to the
existing MCA_SYND register. The data within these registers is considered
valid if MCA_STATUS[SyndV] is set.

Userspace error decoding tools like rasdaemon gather related hardware error
information through the tracepoints.

Therefore, export these two registers through the mce_record tracepoint so
that tools like rasdaemon can parse them and output the supplemental error
information like FRU text contained in them.

  [ bp: Massage. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20241022194158.110073-4-avadhut.naik@amd.com
2024-10-31 10:36:07 +01:00
Avadhut Naik
750fd23926 x86/mce: Add wrapper for struct mce to export vendor specific info
Currently, exporting new additional machine check error information
involves adding new fields for the same at the end of the struct mce.
This additional information can then be consumed through mcelog or
tracepoint.

However, as new MSRs are being added (and will be added in the future)
by CPU vendors on their newer CPUs with additional machine check error
information to be exported, the size of struct mce will balloon on some
CPUs, unnecessarily, since those fields are vendor-specific. Moreover,
different CPU vendors may export the additional information in varying
sizes.

The problem particularly intensifies since struct mce is exposed to
userspace as part of UAPI. It's bloating through vendor-specific data
should be avoided to limit the information being sent out to userspace.

Add a new structure mce_hw_err to wrap the existing struct mce. The same
will prevent its ballooning since vendor-specifc data, if any, can now be
exported through a union within the wrapper structure and through
__dynamic_array in mce_record tracepoint.

Furthermore, new internal kernel fields can be added to the wrapper
struct without impacting the user space API.

  [ bp: Restore reverse x-mas tree order of function vars declarations. ]

Suggested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20241022194158.110073-2-avadhut.naik@amd.com
2024-10-30 17:18:59 +01:00
Usama Arif
b2473a3597 of/fdt: add dt_phys arg to early_init_dt_scan and early_init_dt_verify
__pa() is only intended to be used for linear map addresses and using
it for initial_boot_params which is in fixmap for arm64 will give an
incorrect value. Hence save the physical address when it is known at
boot time when calling early_init_dt_scan for arm64 and use it at kexec
time instead of converting the virtual address using __pa().

Note that arm64 doesn't need the FDT region reserved in the DT as the
kernel explicitly reserves the passed in FDT. Therefore, only a debug
warning is fixed with this change.

Reported-by: Breno Leitao <leitao@debian.org>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Fixes: ac10be5cdb ("arm64: Use common of_kexec_alloc_and_setup_fdt()")
Link: https://lore.kernel.org/r/20241023171426.452688-1-usamaarif642@gmail.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2024-10-29 15:32:45 -05:00
Ard Biesheuvel
223abe96ac x86/xen: Avoid relocatable quantities in Xen ELF notes
Xen puts virtual and physical addresses into ELF notes that are treated
by the linker as relocatable by default. Doing so is not only pointless,
given that the ELF notes are only intended for consumption by Xen before
the kernel boots. It is also a KASLR leak, given that the kernel's ELF
notes are exposed via the world readable /sys/kernel/notes.

So emit these constants in a way that prevents the linker from marking
them as relocatable. This involves place-relative relocations (which
subtract their own virtual address from the symbol value) and linker
provided absolute symbols that add the address of the place to the
desired value.

Tested-by: Jason Andryuk <jason.andryuk@amd.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Message-ID: <20241009160438.3884381-11-ardb+git@google.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2024-10-29 17:23:36 +01:00
Jani Nikula
f719c2a2d1 drm/intel/pciids: rename i915_pciids.h to just pciids.h
In preparation of sharing the PCI ID macros between i915 and xe, rename
i915_pciids.h to pciids.h.

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Tvrtko Ursulin <tursulin@ursulin.net>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/835143845faa5310e4bb58405a8a0848392bbf06.1729590029.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-10-29 16:14:04 +02:00
Sabyrzhan Tasbolatov
1db272864f x86/traps: move kmsan check after instrumentation_begin
During x86_64 kernel build with CONFIG_KMSAN, the objtool warns following:

  AR      built-in.a
  AR      vmlinux.a
  LD      vmlinux.o
vmlinux.o: warning: objtool: handle_bug+0x4: call to
    kmsan_unpoison_entry_regs() leaves .noinstr.text section
  OBJCOPY modules.builtin.modinfo
  GEN     modules.builtin
  MODPOST Module.symvers
  CC      .vmlinux.export.o

Moving kmsan_unpoison_entry_regs() _after_ instrumentation_begin() fixes
the warning.

There is decode_bug(regs->ip, &imm) is left before KMSAN unpoisoining, but
it has the return condition and if we include it after
instrumentation_begin() it results the warning "return with
instrumentation enabled", hence, I'm concerned that regs will not be KMSAN
unpoisoned if `ud_type == BUG_NONE` is true.

Link: https://lkml.kernel.org/r/20241016152407.3149001-1-snovitoll@gmail.com
Fixes: ba54d194f8 ("x86/traps: avoid KMSAN bugs originating from handle_bug()")
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-28 21:40:39 -07:00
Qiuxu Zhuo
754269ccf0 x86/mce/intel: Use MCG_BANKCNT_MASK instead of 0xff
Use the predefined MCG_BANKCNT_MASK macro instead of the hardcoded
0xff to mask the bank number bits.

No functional changes intended.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20241025024602.24318-3-qiuxu.zhuo@intel.com
2024-10-28 14:27:34 +01:00
Qiuxu Zhuo
325c3376af x86/mce/mcelog: Use xchg() to get and clear the flags
Using xchg() to atomically get and clear the MCE log buffer flags,
streamlines the code and reduces the text size by 20 bytes.

  $ size dev-mcelog.o.*

       text	   data	    bss	    dec	    hex	filename
       3013	    360	    160	   3533	    dcd	dev-mcelog.o.old
       2993	    360	    160	   3513	    db9	dev-mcelog.o.new

No functional changes intended.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20241025024602.24318-2-qiuxu.zhuo@intel.com
2024-10-28 14:07:47 +01:00
Borislav Petkov (AMD)
e6e6a303f8 x86/cpu: Fix formatting of cpuid_bits[] in scattered.c
Realign initializers to accomodate for longer X86_FEATURE define names.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2024-10-28 13:51:05 +01:00
Perry Yuan
0c487010cb x86/cpufeatures: Add X86_FEATURE_AMD_WORKLOAD_CLASS feature bit
Add a new feature bit that indicates support for workload-based heuristic
feedback to OS for scheduling decisions.

When the bit set, threads are classified during runtime into enumerated
classes. The classes represent thread performance/power characteristics
that may benefit from special scheduling behaviors.

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241028020251.8085-4-mario.limonciello@amd.com
2024-10-28 13:44:44 +01:00
Linus Torvalds
ea1fda89f5 Merge tag 'x86_urgent_for_v6.12_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - Prevent a certain range of pages which get marked as hypervisor-only,
   to get allocated to a CoCo (SNP) guest which cannot use them and thus
   fail booting

 - Fix the microcode loader on AMD to pay attention to the stepping of a
   patch and to handle the case where a BIOS config option splits the
   machine into logical NUMA nodes per L3 cache slice

 - Disable LAM from being built by default due to security concerns

* tag 'x86_urgent_for_v6.12_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Ensure that RMP table fixups are reserved
  x86/microcode/AMD: Split load_microcode_amd()
  x86/microcode/AMD: Pay attention to the stepping dynamically
  x86/lam: Disable ADDRESS_MASKING in most cases
2024-10-27 09:01:36 -10:00
Thorsten Blum
7565caab47 x86/cpu: Use str_yes_no() helper in show_cpuinfo_misc()
Remove hard-coded strings by using the str_yes_no() helper function.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241026110808.78074-1-thorsten.blum@linux.dev
2024-10-26 15:37:15 +02:00
Mario Limonciello
3eef25ab0d x86/amd: Use heterogeneous core topology for identifying boost numerator
AMD heterogeneous designs include two types of cores:

 * Performance
 * Efficiency

Each core type has different highest performance values configured by the
platform.  Drivers such as amd_pstate need to identify the type of core to
correctly set an appropriate boost numerator to calculate the maximum
frequency.

X86_FEATURE_AMD_HETEROGENEOUS_CORES is used to identify whether the SoC
supports heterogeneous core type by reading CPUID leaf Fn_0x80000026.

On performance cores the scaling factor of 196 is used.  On efficiency cores
the scaling factor is the value reported as the highest perf.  Efficiency
cores have the same preferred core rankings.

Suggested-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241025171459.1093-6-mario.limonciello@amd.com
2024-10-25 20:51:17 +02:00
Pawan Gupta
45239ba39a x86/cpu: Add CPU type to struct cpuinfo_topology
Sometimes it is required to take actions based on if a CPU is a performance or
efficiency core. As an example, intel_pstate driver uses the Intel core-type
to determine CPU scaling. Also, some CPU vulnerabilities only affect
a specific CPU type, like RFDS only affects Intel Atom. Hybrid systems that
have variants P+E, P-only(Core) and E-only(Atom), it is not straightforward to
identify which variant is affected by a type specific vulnerability.

Such processors do have CPUID field that can uniquely identify them. Like,
P+E, P-only and E-only enumerates CPUID.1A.CORE_TYPE identification, while P+E
additionally enumerates CPUID.7.HYBRID. Based on this information, it is
possible for boot CPU to identify if a system has mixed CPU types.

Add a new field hw_cpu_type to struct cpuinfo_topology that stores the
hardware specific CPU type. This saves the overhead of IPIs to get the CPU
type of a different CPU. CPU type is populated early in the boot process,
before vulnerabilities are enumerated.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/20241025171459.1093-5-mario.limonciello@amd.com
2024-10-25 20:44:26 +02:00
Perry Yuan
b0979e5364 x86/cpu: Enable SD_ASYM_PACKING for PKG domain on AMD
Enable the SD_ASYM_PACKING domain flag for the PKG domain on AMD heterogeneous
processors.  This flag is beneficial for processors with one or more CCDs and
relies on x86_sched_itmt_flags().

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Link: https://lore.kernel.org/r/20241025171459.1093-4-mario.limonciello@amd.com
2024-10-25 20:43:22 +02:00
Perry Yuan
1ad4667066 x86/cpufeatures: Add X86_FEATURE_AMD_HETEROGENEOUS_CORES
CPUID leaf 0x80000026 advertises core types with different efficiency
rankings.

Bit 30 indicates the heterogeneous core topology feature, if the bit
set, it means not all instances at the current hierarchical level have
the same core topology.

This is described in the AMD64 Architecture Programmers Manual Volume
2 and 3, doc ID #25493 and #25494.

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241025171459.1093-3-mario.limonciello@amd.com
2024-10-25 20:31:16 +02:00
Mario Limonciello
104edc6efc x86/cpufeatures: Rename X86_FEATURE_FAST_CPPC to have AMD prefix
This feature is an AMD unique feature of some processors, so put
AMD into the name.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241025171459.1093-2-mario.limonciello@amd.com
2024-10-25 20:09:16 +02:00
Linus Torvalds
86e6b1547b x86: fix user address masking non-canonical speculation issue
It turns out that AMD has a "Meltdown Lite(tm)" issue with non-canonical
accesses in kernel space.  And so using just the high bit to decide
whether an access is in user space or kernel space ends up with the good
old "leak speculative data" if you have the right gadget using the
result:

  CVE-2020-12965 “Transient Execution of Non-Canonical Accesses“

Now, the kernel surrounds the access with a STAC/CLAC pair, and those
instructions end up serializing execution on older Zen architectures,
which closes the speculation window.

But that was true only up until Zen 5, which renames the AC bit [1].
That improves performance of STAC/CLAC a lot, but also means that the
speculation window is now open.

Note that this affects not just the new address masking, but also the
regular valid_user_address() check used by access_ok(), and the asm
version of the sign bit check in the get_user() helpers.

It does not affect put_user() or clear_user() variants, since there's no
speculative result to be used in a gadget for those operations.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Link: https://lore.kernel.org/all/80d94591-1297-4afb-b510-c665efd37f10@citrix.com/
Link: https://lore.kernel.org/all/20241023094448.GAZxjFkEOOF_DM83TQ@fat_crate.local/ [1]
Link: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-1010.html
Link: https://arxiv.org/pdf/2108.10771
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> # LAM case
Fixes: 2865baf540 ("x86: support user address masking instead of non-speculative conditional")
Fixes: 6014bc2756 ("x86-64: make access_ok() independent of LAM")
Fixes: b19b74bc99 ("x86/mm: Rework address range check in get_user() and put_user()")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-10-25 09:53:03 -07:00
Chang S. Bae
9a819753b0 x86/microcode/intel: Remove unnecessary cache writeback and invalidation
Currently, an unconditional cache flush is performed during every
microcode update. Although the original changelog did not mention
a specific erratum, this measure was primarily intended to address
a specific microcode bug, the load of which has already been blocked by
is_blacklisted(). Therefore, this cache flush is no longer necessary.

Additionally, the side effects of doing this have been overlooked. It
increases CPU rendezvous time during late loading, where the cache flush
takes between 1x to 3.5x longer than the actual microcode update.

Remove native_wbinvd() and update the erratum name to align with the
latest errata documentation, document ID 334163 Version 022US.

  [ bp: Zap the flaky documentation URL. ]

Fixes: 91df9fdf51 ("x86/microcode/intel: Writeback and invalidate caches before updating microcode")
Reported-by: Yan Hua Wu <yanhua1.wu@intel.com>
Reported-by: William Xie <william.xie@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Ashok Raj <ashok.raj@intel.com>
Tested-by: Yan Hua Wu <yanhua1.wu@intel.com>
Link: https://lore.kernel.org/r/20241001161042.465584-2-chang.seok.bae@intel.com
2024-10-25 18:12:03 +02:00
Borislav Petkov (AMD)
1d81d85d1a x86/microcode/AMD: Split load_microcode_amd()
This function should've been split a long time ago because it is used in
two paths:

1) On the late loading path, when the microcode is loaded through the
   request_firmware interface

2) In the save_microcode_in_initrd() path which collects all the
   microcode patches which are relevant for the current system before
   the initrd with the microcode container has been jettisoned.

   In that path, it is not really necessary to iterate over the nodes on
   a system and match a patch however it didn't cause any trouble so it
   was left for a later cleanup

However, that later cleanup was expedited by the fact that Jens was
enabling "Use L3 as a NUMA node" in the BIOS setting in his machine and
so this causes the NUMA CPU masks used in cpumask_of_node() to be
generated *after* 2) above happened on the first node. Which means, all
those masks were funky, wrong, uninitialized and whatnot, leading to
explosions when dereffing c->microcode in load_microcode_amd().

So split that function and do only the necessary work needed at each
stage.

Fixes: 94838d230a ("x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/91194406-3fdf-4e38-9838-d334af538f74@kernel.dk
2024-10-22 16:48:00 +02:00
Borislav Petkov (AMD)
d1744a4c97 x86/microcode/AMD: Pay attention to the stepping dynamically
Commit in Fixes changed how a microcode patch is loaded on Zen and newer but
the patch matching needs to happen with different rigidity, depending on what
is being done:

1) When the patch is added to the patches cache, the stepping must be ignored
   because the driver still supports different steppings per system

2) When the patch is matched for loading, then the stepping must be taken into
   account because each CPU needs the patch matching its exact stepping

Take care of that by making the matching smarter.

Fixes: 94838d230a ("x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/91194406-3fdf-4e38-9838-d334af538f74@kernel.dk
2024-10-22 16:37:13 +02:00
Linus Torvalds
d129377639 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Fix the guest view of the ID registers, making the relevant fields
     writable from userspace (affecting ID_AA64DFR0_EL1 and
     ID_AA64PFR1_EL1)

   - Correcly expose S1PIE to guests, fixing a regression introduced in
     6.12-rc1 with the S1POE support

   - Fix the recycling of stage-2 shadow MMUs by tracking the context
     (are we allowed to block or not) as well as the recycling state

   - Address a couple of issues with the vgic when userspace
     misconfigures the emulation, resulting in various splats. Headaches
     courtesy of our Syzkaller friends

   - Stop wasting space in the HYP idmap, as we are dangerously close to
     the 4kB limit, and this has already exploded in -next

   - Fix another race in vgic_init()

   - Fix a UBSAN error when faking the cache topology with MTE enabled

  RISCV:

   - RISCV: KVM: use raw_spinlock for critical section in imsic

  x86:

   - A bandaid for lack of XCR0 setup in selftests, which causes trouble
     if the compiler is configured to have x86-64-v3 (with AVX) as the
     default ISA. Proper XCR0 setup will come in the next merge window.

   - Fix an issue where KVM would not ignore low bits of the nested CR3
     and potentially leak up to 31 bytes out of the guest memory's
     bounds

   - Fix case in which an out-of-date cached value for the segments
     could by returned by KVM_GET_SREGS.

   - More cleanups for KVM_X86_QUIRK_SLOT_ZAP_ALL

   - Override MTRR state for KVM confidential guests, making it WB by
     default as is already the case for Hyper-V guests.

  Generic:

   - Remove a couple of unused functions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits)
  RISCV: KVM: use raw_spinlock for critical section in imsic
  KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups
  KVM: selftests: x86: Avoid using SSE/AVX instructions
  KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory
  KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()
  KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL
  KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range()
  KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot
  x86/kvm: Override default caching mode for SEV-SNP and TDX
  KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic
  KVM: Remove unused kvm_vcpu_gfn_to_pfn
  KVM: arm64: Ensure vgic_ready() is ordered against MMIO registration
  KVM: arm64: vgic: Don't check for vgic_ready() when setting NR_IRQS
  KVM: arm64: Fix shift-out-of-bounds bug
  KVM: arm64: Shave a few bytes from the EL2 idmap code
  KVM: arm64: Don't eagerly teardown the vgic on init error
  KVM: arm64: Expose S1PIE to guests
  KVM: arm64: nv: Clarify safety of allowing TLBI unmaps to reschedule
  KVM: arm64: nv: Punt stage-2 recycling to a vCPU request
  KVM: arm64: nv: Do not block when unmapping stage-2 if disallowed
  ...
2024-10-21 11:22:04 -07:00
Linus Torvalds
db87114dcf Merge tag 'x86_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - Explicitly disable the TSC deadline timer when going idle to address
   some CPU errata in that area

 - Do not apply the Zenbleed fix on anything else except AMD Zen2 on the
   late microcode loading path

 - Clear CPU buffers later in the NMI exit path on 32-bit to avoid
   register clearing while they still contain sensitive data, for the
   RDFS mitigation

 - Do not clobber EFLAGS.ZF with VERW on the opportunistic SYSRET exit
   path on 32-bit

 - Fix parsing issues of memory bandwidth specification in sysfs for
   resctrl's memory bandwidth allocation feature

 - Other small cleanups and improvements

* tag 'x86_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Always explicitly disarm TSC-deadline timer
  x86/CPU/AMD: Only apply Zenbleed fix for Zen2 during late microcode load
  x86/bugs: Use code segment selector for VERW operand
  x86/entry_32: Clear CPU buffers after register restore in NMI return
  x86/entry_32: Do not clobber user EFLAGS.ZF
  x86/resctrl: Annotate get_mem_config() functions as __init
  x86/resctrl: Avoid overflow in MB settings in bw_validate()
  x86/amd_nb: Add new PCI ID for AMD family 1Ah model 20h
2024-10-20 12:04:32 -07:00
Kirill A. Shutemov
8e690b817e x86/kvm: Override default caching mode for SEV-SNP and TDX
AMD SEV-SNP and Intel TDX have limited access to MTRR: either it is not
advertised in CPUID or it cannot be programmed (on TDX, due to #VE on
CR0.CD clear).

This results in guests using uncached mappings where it shouldn't and
pmd/pud_set_huge() failures due to non-uniform memory type reported by
mtrr_type_lookup().

Override MTRR state, making it WB by default as the kernel does for
Hyper-V guests.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Suggested-by: Binbin Wu <binbin.wu@intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241015095818.357915-1-kirill.shutemov@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-20 07:07:02 -04:00
Zheng Yejian
3bf19a0fb6 x86/unwind/orc: Fix unwind for newly forked tasks
When arch_stack_walk_reliable() is called to unwind for newly forked
tasks, the return value is negative which means the call stack is
unreliable. This obviously does not meet expectations.

The root cause is that after commit 3aec4ecb3d ("x86: Rewrite
 ret_from_fork() in C"), the 'ret_addr' of newly forked task is changed
to 'ret_from_fork_asm' (see copy_thread()), then at the start of the
unwind, it is incorrectly interprets not as a "signal" one because
'ret_from_fork' is still used to determine the initial "signal" (see
__unwind_start()). Then the address gets incorrectly decremented in the
call to orc_find() (see unwind_next_frame()) and resulting in the
incorrect ORC data.

To fix it, check 'ret_from_fork_asm' rather than 'ret_from_fork' in
__unwind_start().

Fixes: 3aec4ecb3d ("x86: Rewrite ret_from_fork() in C")
Signed-off-by: Zheng Yejian <zhengyejian@huaweicloud.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-10-17 15:13:07 -07:00
Josh Poimboeuf
ed1cb76ebd objtool: Detect non-relocated text references
When kernel IBT is enabled, objtool detects all text references in order
to determine which functions can be indirectly branched to.

In text, such references look like one of the following:

   mov    $0x0,%rax        R_X86_64_32S     .init.text+0x7e0a0
   lea    0x0(%rip),%rax   R_X86_64_PC32    autoremove_wake_function-0x4

Either way the function pointer is denoted by a relocation, so objtool
just reads that.

However there are some "lea xxx(%rip)" cases which don't use relocations
because they're referencing code in the same translation unit.  Objtool
doesn't have visibility to those.

The only currently known instances of that are a few hand-coded asm text
references which don't actually need ENDBR.  So it's not actually a
problem at the moment.

However if we enable -fpie, the compiler would start generating them and
there would definitely be bugs in the IBT sealing.

Detect non-relocated text references and handle them appropriately.

[ Note: I removed the manual static_call_tramp check -- that should
  already be handled by the noendbr check. ]

Reported-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2024-10-17 15:13:06 -07:00
Bart Van Assche
f642974c0b x86/acpi: Switch to irq_get_nr_irqs() and irq_set_nr_irqs()
Use the irq_get_nr_irqs() and irq_set_nr_irqs() functions instead of the
global variable 'nr_irqs'. Prepare for changing 'nr_irqs' from an
exported global variable into a variable with file scope.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20241015190953.1266194-7-bvanassche@acm.org
2024-10-16 21:56:57 +02:00
Zhang Rui
ffd95846c6 x86/apic: Always explicitly disarm TSC-deadline timer
New processors have become pickier about the local APIC timer state
before entering low power modes. These low power modes are used (for
example) when you close your laptop lid and suspend. If you put your
laptop in a bag and it is not in this low power mode, it is likely
to get quite toasty while it quickly sucks the battery dry.

The problem boils down to some CPUs' inability to power down until the
CPU recognizes that the local APIC timer is shut down. The current
kernel code works in one-shot and periodic modes but does not work for
deadline mode. Deadline mode has been the supported and preferred mode
on Intel CPUs for over a decade and uses an MSR to drive the timer
instead of an APIC register.

Disable the TSC Deadline timer in lapic_timer_shutdown() by writing to
MSR_IA32_TSC_DEADLINE when in TSC-deadline mode. Also avoid writing
to the initial-count register (APIC_TMICT) which is ignored in
TSC-deadline mode.

Note: The APIC_LVTT|=APIC_LVT_MASKED operation should theoretically be
enough to tell the hardware that the timer will not fire in any of the
timer modes. But mitigating AMD erratum 411[1] also requires clearing
out APIC_TMICT. Solely setting APIC_LVT_MASKED is also ineffective in
practice on Intel Lunar Lake systems, which is the motivation for this
change.

1. 411 Processor May Exit Message-Triggered C1E State Without an Interrupt if Local APIC Timer Reaches Zero - https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/revision-guides/41322_10h_Rev_Gd.pdf

Fixes: 279f146143 ("x86: apic: Use tsc deadline for oneshot when available")
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20241015061522.25288-1-rui.zhang%40intel.com
2024-10-15 05:45:18 -07:00
Christophe JAILLET
29eaa79583 x86/resctrl: Slightly clean-up mbm_config_show()
'mon_info' is already zeroed in the list_for_each_entry() loop below.  There
is no need to explicitly initialize it here. It just wastes some space and
cycles.

Remove this un-needed code.

On a x86_64, with allmodconfig:

  Before:
  ======
     text	   data	    bss	    dec	    hex	filename
    74967	   5103	   1880	  81950	  1401e	arch/x86/kernel/cpu/resctrl/rdtgroup.o

  After:
  =====
     text	   data	    bss	    dec	    hex	filename
    74903	   5103	   1880	  81886	  13fde	arch/x86/kernel/cpu/resctrl/rdtgroup.o

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/b2ebc809c8b6c6440d17b12ccf7c2d29aaafd488.1720868538.git.christophe.jaillet@wanadoo.fr
2024-10-14 18:58:24 +02:00
John Allen
ee4d4e8d2c x86/CPU/AMD: Only apply Zenbleed fix for Zen2 during late microcode load
Commit

  f69759be25 ("x86/CPU/AMD: Move Zenbleed check to the Zen2 init function")

causes a bit in the DE_CFG MSR to get set erroneously after a microcode late
load.

The microcode late load path calls into amd_check_microcode() and subsequently
zen2_zenbleed_check(). Since the above commit removes the cpu_has_amd_erratum()
call from zen2_zenbleed_check(), this will cause all non-Zen2 CPUs to go
through the function and set the bit in the DE_CFG MSR.

Call into the Zenbleed fix path on Zen2 CPUs only.

  [ bp: Massage commit message, use cpu_feature_enabled(). ]

Fixes: f69759be25 ("x86/CPU/AMD: Move Zenbleed check to the Zen2 init function")
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240923164404.27227-1-john.allen@amd.com
2024-10-11 21:26:45 +02:00
Steven Rostedt
7888af4166 ftrace: Make ftrace_regs abstract from direct use
ftrace_regs was created to hold registers that store information to save
function parameters, return value and stack. Since it is a subset of
pt_regs, it should only be used by its accessor functions. But because
pt_regs can easily be taken from ftrace_regs (on most archs), it is
tempting to use it directly. But when running on other architectures, it
may fail to build or worse, build but crash the kernel!

Instead, make struct ftrace_regs an empty structure and have the
architectures define __arch_ftrace_regs and all the accessor functions
will typecast to it to get to the actual fields. This will help avoid
usage of ftrace_regs directly.

Link: https://lore.kernel.org/all/20241007171027.629bdafd@gandalf.local.home/

Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
Cc: "x86@kernel.org" <x86@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Paul  Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas  Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav  Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/20241008230628.958778821@goodmis.org
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-10 20:18:01 -04:00
Johannes Wikner
c62fa117c3 x86/bugs: Do not use UNTRAIN_RET with IBPB on entry
Since X86_FEATURE_ENTRY_IBPB will invalidate all harmful predictions
with IBPB, no software-based untraining of returns is needed anymore.
Currently, this change affects retbleed and SRSO mitigations so if
either of the mitigations is doing IBPB and the other one does the
software sequence, the latter is not needed anymore.

  [ bp: Massage commit message. ]

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Johannes Wikner <kwikner@ethz.ch>
Cc: <stable@kernel.org>
2024-10-10 10:38:21 +02:00
Johannes Wikner
0fad287864 x86/bugs: Skip RSB fill at VMEXIT
entry_ibpb() is designed to follow Intel's IBPB specification regardless
of CPU. This includes invalidating RSB entries.

Hence, if IBPB on VMEXIT has been selected, entry_ibpb() as part of the
RET untraining in the VMEXIT path will take care of all BTB and RSB
clearing so there's no need to explicitly fill the RSB anymore.

  [ bp: Massage commit message. ]

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Johannes Wikner <kwikner@ethz.ch>
Cc: <stable@kernel.org>
2024-10-10 10:35:53 +02:00
Johannes Wikner
3ea87dfa31 x86/cpufeatures: Add a IBPB_NO_RET BUG flag
Set this flag if the CPU has an IBPB implementation that does not
invalidate return target predictions. Zen generations < 4 do not flush
the RSB when executing an IBPB and this bug flag denotes that.

  [ bp: Massage. ]

Signed-off-by: Johannes Wikner <kwikner@ethz.ch>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
2024-10-10 10:34:29 +02:00
Nathan Chancellor
d5fd042bf4 x86/resctrl: Annotate get_mem_config() functions as __init
After a recent LLVM change [1] that deduces __cold on functions that only call
cold code (such as __init functions), there is a section mismatch warning from
__get_mem_config_intel(), which got moved to .text.unlikely. as a result of
that optimization:

  WARNING: modpost: vmlinux: section mismatch in reference: \
  __get_mem_config_intel+0x77 (section: .text.unlikely.) -> thread_throttle_mode_init (section: .init.text)

Mark __get_mem_config_intel() as __init as well since it is only called
from __init code, which clears up the warning.

While __rdt_get_mem_config_amd() does not exhibit a warning because it
does not call any __init code, it is a similar function that is only
called from __init code like __get_mem_config_intel(), so mark it __init
as well to keep the code symmetrical.

CONFIG_SECTION_MISMATCH_WARN_ONLY=n would turn this into a fatal error.

Fixes: 05b93417ce ("x86/intel_rdt/mba: Add primary support for Memory Bandwidth Allocation (MBA)")
Fixes: 4d05bf71f1 ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: <stable@kernel.org>
Link: 6b11573b8c [1]
Link: https://lore.kernel.org/r/20240917-x86-restctrl-get_mem_config_intel-init-v3-1-10d521256284@kernel.org
2024-10-08 21:05:10 +02:00
Martin Kletzander
2b5648416e x86/resctrl: Avoid overflow in MB settings in bw_validate()
The resctrl schemata file supports specifying memory bandwidth associated with
the Memory Bandwidth Allocation (MBA) feature via a percentage (this is the
default) or bandwidth in MiBps (when resctrl is mounted with the "mba_MBps"
option).

The allowed range for the bandwidth percentage is from
/sys/fs/resctrl/info/MB/min_bandwidth to 100, using a granularity of
/sys/fs/resctrl/info/MB/bandwidth_gran. The supported range for the MiBps
bandwidth is 0 to U32_MAX.

There are two issues with parsing of MiBps memory bandwidth:

* The user provided MiBps is mistakenly rounded up to the granularity
  that is unique to percentage input.

* The user provided MiBps is parsed using unsigned long (thus accepting
  values up to ULONG_MAX), and then assigned to u32 that could result in
  overflow.

Do not round up the MiBps value and parse user provided bandwidth as the u32
it is intended to be. Use the appropriate kstrtou32() that can detect out of
range values.

Fixes: 8205a078ba ("x86/intel_rdt/mba_sc: Add schemata support")
Fixes: 6ce1560d35 ("x86/resctrl: Switch over to the resctrl mbps_val list")
Co-developed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Martin Kletzander <nert.pinx@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
2024-10-08 16:17:38 +02:00